Abstract
Background
Myotonic dystrophy type 1 (DM1) is an incurable multisystem disease
caused by a CTG-repeat expansion in the DM1 protein kinase (DMPK) gene.
The OPTIMISTIC clinical trial demonstrated positive and heterogenous
effects of cognitive behavioral therapy (CBT) on the capacity for
activity and social participations in DM1 patients. Through a process
of reverse engineering, this study aims to identify druggable molecular
biomarkers associated with the clinical improvement in the OPTIMISTIC
cohort.
Methods
Based on full blood samples collected during OPTIMISTIC, we performed
paired mRNA sequencing for 27 patients before and after the CBT
intervention. Linear mixed effect models were used to identify
biomarkers associated with the disease-causing CTG expansion and the
mean clinical improvement across all clinical outcome measures.
Results
We identified 608 genes for which their expression was significantly
associated with the CTG-repeat expansion, as well as 1176 genes
significantly associated with the average clinical response towards the
intervention. Remarkably, all 97 genes associated with both returned to
more normal levels in patients who benefited the most from CBT. This
main finding has been replicated based on an external dataset of mRNA
data of DM1 patients and controls, singling these genes out as
candidate biomarkers for therapy response. Among these candidate genes
were DNAJB12, HDAC5, and TRIM8, each belonging to a protein family that
is being studied in the context of neurological disorders or muscular
dystrophies. Across the different gene sets, gene pathway enrichment
analysis revealed disease-relevant impaired signaling in, among others,
insulin-, metabolism-, and immune-related pathways. Furthermore,
evidence for shared dysregulations with another neuromuscular disease,
Duchenne muscular dystrophy, was found, suggesting a partial overlap in
blood-based gene dysregulation.
Conclusions
DM1-relevant disease signatures can be identified on a molecular level
in peripheral blood, opening new avenues for drug discovery and therapy
efficacy assessments.
Supplementary Information
The online version contains supplementary material available at
10.1186/s12916-022-02591-y.
Keywords: Myotonic dystrophy type 1, Biomarker, RNA-seq, Peripheral
blood, Therapeutic Response, Lifestyle intervention
Background
Myotonic dystrophy type 1 (DM1) is a neuromuscular disease with a
worldwide average prevalence of around 1 in 8000 people and a high
unmet clinical need [[33]1]. DM1 is considered the most frequently
occurring adult-onset form of muscular dystrophy. This degenerative
multisystem disease is characterized by a wide range of symptoms
including myotonia, muscle weakness and dystrophy, fatigue, apathy,
cataracts, obesity, and insulin resistance. Next to a severe decrease
of life quality, DM1 patients suffer from a reduced life expectancy
mostly due to problems with cardiac and respiratory function.
Currently, no curative therapy exists.
DM1 is caused by the expansion of a CTG trinucleotide microsatellite
repeat in the 3′ UTR of the DM1 protein kinase (DMPK) gene
[[34]2–[35]4]. Unaffected individuals carry up to approximately 37 CTG
triplets in DMPK, while in DM1 patients this ranges from 50 to even a
few thousand repetitions. Depending on the inherited repeat length, DM1
can become manifest at birth or early in life but more frequently
becomes apparent in adulthood [[36]1]. In general, the disease
manifestation is earlier and more severe with longer repeat expansions.
Interruption of the CTG repeat by variants such as CCG or CGG is
associated with milder symptoms [[37]5]. The expanded CTG repeat is
thought to cause disease mainly via an mRNA gain-of-function mechanism,
in which aberrant hairpin structures formed by long CUG repeats are
central [[38]6–[39]9]. Directly or indirectly, these hairpin structures
dysregulate the function of RNA binding proteins from the
muscleblind-like (MBNL) and CUGBP Elav-like (CELF) families, leading to
widespread disturbed RNA processing and consequently altered functions
of various proteins [[40]10–[41]12]. Although proven to be the
disease-causing mutation, clinical symptoms of DM1 are only moderately
associated with the CTG repeat or the dysregulation of specific
proteins which suggests an involvement of other mechanisms in symptom
expression [[42]13–[43]16].
While there are many promising therapeutic oligonucleotides, small
molecule drugs, and gene therapies in the (pre) clinical pipeline for
some of the signs and symptoms of DM1, none is expected to reach
widespread clinical application soon. Physical training and increasing
activity are currently being applied to relieve DM1 symptoms with
marked improvements in relatively mildly affected DM1 patients [[44]17,
[45]18], which has furthermore shown to induce biochemical responses in
DM1 mouse models [[46]19, [47]20].
The to-date largest clinical trial in DM1 was OPTIMISTIC: Observational
Prolonged Trial In Myotonic dystrophy type 1 to Improve Quality of
Life-Standards, a Target Identification Collaboration [[48]18]. The
OPTIMISTIC clinical trial included over 250 well-characterized DM1
patients from four centers in Europe, where the effects of cognitive
behavioral therapy (CBT) and optional graded exercise therapy were
closely monitored over 16 months via more than twenty outcome measures.
Notably, the CBT intervention was tailored towards the specific needs
of the patient in a shared decision-making process between the patient
and the psychotherapist, allowing for a personalized intervention. The
trial has shown significant, yet heterogenous improvements for various
signs and symptoms, as well as the capacity for social activity and
participation in DM1 [[49]21].
Here, we set out to find molecular profiles associated with the
disease-causing CTG repeat and therapy response based on full blood
mRNA sequencing before and after the CBT intervention of 27 patients
from the OPTIMISTIC cohort. Given the accessibility of peripheral
blood, it has increasingly been used for the successful identification
of disease biomarkers for a variety of neurological and psychiatric
disorders such as Duchenne muscular dystrophy (DMD), Huntington’s
disease, major depressive disorder, and DM1 [[50]22–[51]25].
Furthermore, the multisystem nature of DM1 is known to be reflected by
various laboratory abnormalities of blood samples, supporting the
relevance of peripheral blood for the identification of
disease-relevant information [[52]26]. We analyzed gene expression
levels as a function of CTG repeat size (as a proxy for disease
load/severity) and of the therapy response. Next, we combined these
findings and compared the results to various previously published
datasets. We were able to identify 608 genes significantly associated
with the CTG repeat and further illustrate that 97 of these genes
returned towards more normal expression levels in clinical CBT
responders.
Methods
The cognitive behavioral therapy intervention
Patients of the OPTIMISTIC intervention arm were treated with a
personalized form of CBT. The customization of the intervention was
based on a selection of different treatment modules: regulating
sleep-wake patterns, compensating for the reduced patient initiative,
formulating helpful beliefs about fatigue and myotonic dystrophy type
1, optimizing social interactions, and coping with pain [[53]18]. The
individual module selection was made based on a shared decision-making
process between experienced and specifically trained CBT therapists and
patients.
Sample source and patient sampling
Samples and metadata used for this study were all gathered during the
OPTIMISTIC clinical trial [[54]18]. At the different time points in the
trial, blood was drawn and a wide range of clinical outcome measures
were recorded. Figure [55]1 has been generated to illustrate the
heterogeneity in changes across all grouped outcome measures, with
annotation of all individual outcome measures in the legend. In order
to maximize the generalizability of the study findings, the goal of the
patient sampling was to obtain a balanced subset of the OPTIMISTIC
intervention group (n=128). Additionally, by capturing the whole range
of therapy responses in a continuous uniform distribution, as assessed
by the primary clinical trial outcome DM1-Activ-c [[56]27], strong
linear associations between non-responders and responders could be
studied. To promote future research, the sampling was furthermore done
on the most completely characterized patients. To achieve this, several
filter steps have been applied before the random sampling. Patients
were selected for which the DM1-Activ-c questionnaire results were
available at each time point (n=104), with less than 20% missing values
for other outcome measures (n=81) and without a variant CTG repeat
(n=74). Homogeneity of baseline disease severity was accounted for by
selecting patients that were within one interquartile range (IQR) of
the mean for the baseline variables DM1-Activ-c, 6MWT, and CTG-repeat
length (n=45). One patient was excluded because of polypharmacy (n=44).
One patient was excluded because of a drop of 57 points of the
DM1-Activ-c score between the baseline and 5-month assessments with a
subsequent increase of 55 points between the 5- and 10-month
assessments (n=43).
Fig. 1.
[57]Fig. 1
[58]Open in a new tab
Distribution of changes in outcome measures per patient. Per outcome
measure, changes between baseline and 10 months of CBT were scaled by
the root mean square. Additionally, for some outcome measures, a sign
adjustment was performed so that an increased score is always
associated with improved health status. All outcome measures are shown
per patient, where patients were ordered along the y-axis by their
change in DM1-Activ-c scores (purple squares). Boxes enclose the 25th
to 75th percentiles, divided by a thick line that represents the mean
compound response score. The whiskers represent the lowest/highest
value no further than 1.5 times the interquartile range. Quality of
life (red): Myotonic Dystrophy Health Index, Individualized
Neuromuscular Quality of Life Questionnaire, Adult Social Behavioural
Questionnaire, Illness Management Questionnaire, Checklist individual
strength — Subscale activity. Physical assessments (blue): Six-Minute
Walk Test, BORG Scale, accelometery measures. Fatigue scores (black):
Fatigue and Daytime Sleepiness Scale, Checklist Individual Strength —
Subscale fatigue, Jacobsen Fatigue Catastrophizing Scale. Cognition and
other (gray): Trail Making Test, Stroop Color-Word Interference Test,
McGill Pain Questionnaire, Beck Depression Inventory — Fast Screen,
Social Support — Discrepancies and Negative Interactions, Apathy
Evaluation Scale — Clinical version, Self-Efficacy Scale 28
The distribution of these 43 patients over the clinical sites A, B, C,
and D was A12, B11, C16, and D4. Therefore, all patients from site D
were selected. For the remaining sites, a stratified random sampling
approach was implemented, where patients were randomly sampled from the
different sites and a maximum of two patients with the same change in
DM1-Activ-c were selected. This process of random selection was
repeated until a reasonable site and delta-DM1-Activ-c distribution was
achieved, defined as more than 7 patients for sites A, B, and C,
resulting in the final selection of 30 patients. Due to the
unavailability of samples and unsuccessful RNA sequencing, three
patients were later excluded (n=27). The final selection featured a
site distribution of 5 times center A, 8 B, 10 C, 4 D, and 22 unique
changes DM1-Activ-c scores, with no change in the DM1-Activ-c score
being present more than twice.
RNA sequencing
Blood drawn during the OPTIMISTIC trial was collected in Tempus tubes
and centrally stored at the New Castle MRC Centre for Rare &
Neuromuscular Diseases biobank with strict SOPs and temperature control
(−80°C). RNA was locally isolated in Nijmegen using the Tempus Spin RNA
Isolation Kit (Applied Biosystems/Thermo Fisher Scientific) according
to the manufacturer’s instructions. The concentration and RNA Integrity
Number (RIN) were checked using Fragment Analyzer (Thermo Fisher
Scientific). The mean RIN value was 8.9 and all were > 7.5. Hemoglobin
mRNA was depleted using the Globinclear kit (Thermo Fisher Scientific).
Libraries were prepared using NEBNext Ultra II Directional RNA Library
Prep Kit (Illumina) according to the manufacturer’s instructions for a
polyA mRNA workflow using UMI-indexed adapters. The size distribution
(between 300 and 500 bp) was confirmed using Fragment Analyzer. A total
of 150-bp paired end sequencing was performed with a NovaSeq6000
machine (Illumina) at a library concentration of 1.1 nM, generating >
30 M read pairs per sample. All raw sequencing data and associated
genotype/phenotype/experimental information is stored in the European
Genome-phenome Archive (EGA) under controlled access with Dataset ID
EGAS00001005830 [[59]28].
RNA-sequencing primary data analysis
Adapter sequences and low-quality base calls were removed from fastq
files using cutadapt 3.4 via TrimGalore 0.6.6 at no other default
parameters than the --paired flag [[60]29]. Trimmed fastq files were
mapped to the human genome version hg38.95 using STAR 2.7.0 at default
parameters and --outSAMtype BAM SortedByCoordinate [[61]30]. After
indexing using samtools [[62]31] at default parameters, PCR duplicates
were removed from the bam files using umi-tools dedup with the flags
--spliced-is-unique, --paired and --output-stats (Additional file
[63]1: Table S1) [[64]32]. Strandedness was verified via RSeQC’s
infer_experiment [[65]33]. After indexing the deduplicated bam files,
reads were counted for overlap with hg38.95 genes via HTSeq with
parameters --format bam --order pos and --stranded=yes [[66]34]. EPIC,
quanTIseq, and xCell algorithms were applied to the count tables to
verify that the cell type compositions were similar at the two time
points [[67]35–[68]37]. GATK HaplotypeCaller and Picard
GenotypeConcordance were used to check the correct matching of samples
from the same patient [[69]38, [70]39].
Splice analysis was performed using rMATS v4.1.0 [[71]40] via the same
gtf as for STAR/HTSeq with the parameters and flags: -t paired
--readLength 150 --variable-read-length --novelSS --libType
fr-firststrand --statoff.
Comparative DM1 datasets
In order to validate our findings, we obtained several external DM1
gene expression datasets of different tissue types (tibialis muscle
[[72]11, [73]41], heart [[74]15, [75]42], brain [[76]12, [77]43], and
peripheral blood [[78]25, [79]44]). Additionally, table S5 from Wang et
al. was directly obtained from the publication [[80]11]. The dataset
EV10 from Signorelli et al. was obtained for the association between
gene expression (logFC) and body or performance test in DMD patients
[[81]22]. To compare DM1 samples to controls, a two-sided two-sample
Wilcoxon test was performed using row_wilcoxon_twosample in matrixTests
R package on normalized, log-transformed gene counts [[82]45].
Statistical analysis
All statistical analyses were carried out in R [[83]46]. For gene
expression analysis, firstly, genes with low read counts before and
after CBT were filtered using edgeR filterByExpr with group =
before/after CBT and min.count = 50 [[84]47]. Following, normalized
logCPM values and weights were calculated from the filtered read counts
with Voom in Limma [[85]48].
To achieve an overarching measure for CBT response, first, the changes
for each outcome measure were calculated per patient by subtracting the
value after 10 months of CBT from that at baseline. Where applicable,
outcome measures were multiplied by −1 in order to always associate
positive changes with an improved health status. Using R base scale,
the changes per outcome measure were then scaled without centering to
account for the different scales of the outcome measures. Finally, for
each patient, a “Compound Response” score was calculated based on the
mean of all scaled outcome measures. Individual contributions towards
this compound response score were visualized (Fig. [86]1). L5ENMO, the
mean activity during rest, was a control parameter in OPTIMISTIC and
was excluded from this analysis.
We first set out to explore differential gene expression before and
after the CBT intervention. Gene expression values from Voom (in
logCPM) were separately modeled using mixed effect models with
before/after CBT (categorical) as fixed effect and patient identity as
random effect ([87]1). Gene weights were also carried over from Voom.
This analysis has been implemented using lme in the lme4-wrapper
lmerTest [[88]49, [89]50]. lmerTest estimates a p-value for the
contribution of fixed effects to the model via Satterthwaite’s degrees
of freedom method. Parameters of the fits were extracted with R base
summary and p-values were FDR corrected via the Benjamini and Hochberg
method with stats p.adjust [[90]51].
[MATH: Geneexpre
ssion=C<
mi>BT0/1+1|Patient,geneweigh
ts :MATH]
1
In order to study the cohort heterogeneity of gene expression changes,
we calculated the logCPM-based difference in gene expression between
before and after the intervention for each patient for the 560 genes
significantly associated with the CBT predictor of ([91]1) (adjusted p
< 0.05). Patients and genes with similar expression changes were
clustered using the R package heatmap3 based on the complete linkage
method for hierarchical clustering, with gene expression values being
centered and scaled per gene [[92]52]. Changes in clinical response
(DM1-Activ-c, Six-Minute Walk Test (6MWT), and compound response) were
visualized using the corrplot function and added to the heatmap
[[93]53]. The changes in DM1-Activ-c and 6MWT were scaled using R base
scale without centering to account for the different scales.
Next, using the same methodology as for the CBT intervention effect, we
set out to explore the associations of the different clinical outcome
measures and CTG-repeat length with gene expression. For this purpose,
we separately modeled gene expression values with either one of the
outcome measures or the CTG-repeat length (at the trial start, ([94]2)
as fixed effect and patient identity as random effect. The categorical
CBT covariate (before/after) was included for each fit to correct for
differences between the two time points.
[MATH: Gen
eexpre
ssion=C<
mi>BT0/1+CTG_repeat*+1|Patient,geneweigh
ts*SamerepeatlengthfittedforbothT0andT10M
:MATH]
2
Analogous to the methodology described for the CTG-repeat association
analyses, genes associated with overarching clinical response were
identified by fitting separate mixed effect models for each gene with
the two fixed effects CBT (categorical) and compound response, as well
as patient identity as random effect ([95]3). Notably, the compound
response variable has only been fitted for gene expression after the
CBT intervention (zero at baseline). As such, the compound response
predictor reflects the difference between the two time points that can
be attributed towards therapy responsiveness, while accounting for
non-therapy-specific differences by including the categorical CBT
predictor.
[MATH: Gen
eexpre
ssion=C<
mi>BT0/1+Compound_Respons
e0/cont.*+1|Patient,geneweigh
ts*Compo<
mi>und_ResponseiszeroatT0andconti
nuousatT10M
:MATH]
3
Potential biomarker candidates were discovered by intersecting the
genes significantly associated with the CTG_repeat predictor from model
([96]2) and the Compound_Response predictor from model ([97]3). Pearson
correlation coefficients and the associated nominal p-values were
calculated between the change in gene expression and the change in
clinical score (compound response, delta-DM1-Activ-c) for these
potential biomarkers using the corr.test function of the R package
“psych” [[98]54].
For the splicing analysis, the PSI values for splice exclusion (SE)
events were extracted from the rMATS output and fitted in linear models
similar to those for gene expression. Splice events were filtered by
excluding exons from the analysis with less than three mapping reads
and one junction spanning read in at least 14 samples.
The R package ggplot2 was used for representation in volcano and
scatter plots [[99]55]. The R package VennDiagram was used to generate
the Venn diagram [[100]56].
Gene set enrichment analysis
Gene set enrichment analyses have been independently implemented for
the gene sets associated with CBT, CTG-repeat length, compound response
scores, and the genes significantly associated with both CTG-repeat
length and compound response using gProfiler [[101]57]. For the CBT,
CTG-repeat length, and compound response-associated genes, the 500
genes with the lowest nominal p-values were ordered (decreasing) based
on their absolute regression coefficients. Subsequent enrichment
analyses were implemented using the R client of gProfiler with the
parameter orderd_querey = TRUE against a custom background of 10,292
genes expressed in our samples. Multiple testing correction was based
on the default setting “g_SCS.” We tested for enrichment (one-sided)
pathways within the WikiPathway database. The gene set associated with
both CTG-repeat length and compound response was based on an FDR
threshold of 10% for the respective regression coefficients, resulting
in a gene set of 311 genes. A regular, non-order weighted ORA
(over-representation analysis) analysis was run for this gene set with
ordered_quere = FALSE. Similar to the other analyses, one-sided
(enrichment) pathway discovery was based on the WikiPathway database
with the default setting “g_SCS” to correct for multiple testing. For
all enrichment analyses, only significant pathways (p-adjusted < 0.05)
are reported.
The exact scripts and the resulting datasets of the statistical
analyses are available via
[102]https://github.com/cmbi/DM1_blood_RNAseq
Results
Patient sampling, procedure, and analysis of outcome measures
For the identification of blood biomarkers that are associated with the
clinical response to the CBT intervention, 27 patients from the
OPTIMISTIC cohort were selected based on a random stratified sampling
procedure. These patients reflected a uniform continuous distribution
of therapy responses as assessed by the primary trial outcome, the
DM1-Activ-c questionnaire.
The sampled set consisted of 14 females and 13 males aged 19–63 years
and represented a wide range of CTG-repeat lengths (Additional file
[103]2: Fig. S1). All sampled patients received CBT, and
mRNA-sequencing profiles were obtained at baseline and after 10 months
of CBT, the primary endpoint of the OPTIMISTIC trial. Large clinical
heterogeneity of clinical responses after CBT was observed across all
of the different outcome measures. Figure [104]1 highlights the scaled
differences across these outcome measures, color coded into five
different groups (cognition and other, DM1-Activ-c, fatigue scores,
physical assessments, and quality of life). Because of the large
heterogeneity, we defined a compound response score. The compound
response score is the mean of all scaled outcome measures (Fig.
[105]1).
Marked changes in gene expression after CBT
We first studied the molecular changes that occurred after the CBT
intervention by comparing the mRNA expression levels in blood before
and after the CBT intervention. We found that 560 genes were
significantly up- or downregulated after CBT (277 genes down, 283 up,
fold changes ranging between 0.64 and 2.35, Fig. [106]2A). Hierarchical
clustering of patients based on the changes in these 560 genes revealed
substantial molecular heterogeneity within this 10-month timeframe.
There was no evident concordance between the clustering of samples
based on changes in gene expression and the changes in DM1-Activ-c,
6MWT, or compound response score (Fig. [107]2B). The four genes with
the lowest p-values were GGCX, ZNF16, SERBP1, and SLC39A8 (Fig.
[108]2C). Biological pathways significantly associated with these genes
were limited to an immunological pathway and an electron transport
chain in mitochondria (Table [109]1). DMPK expression did not
significantly change over the course of the study, nor was it
associated with the CTG-repeat length (Additional file [110]2: Fig.
S2).
Fig. 2.
[111]Fig. 2
[112]Open in a new tab
Changes in gene expression after cognitive behavioral therapy. A linear
mixed effect model was fitted for each gene, estimating the fixed
effect of CBT while accounting for random effects of the individual.
The p-values for the fixed effect were estimated via Satterthwaite’s
degrees of freedom method and FDR corrected. A Volcano plot of
significance (−log10 of the nominal p-value) and the effect size for
changed expression after 10 months of CBT. Genes for which the effect
size of CBT is significant (FDR < 0.05) are visualized in black. B
Heatmap of changes in normalized logCPM values between the baseline and
the 10-month assessment for the 560 genes significantly associated with
the CBT effect size (scores ranging from −3 (dark red) to +3 (dark
blue)). Patients and genes were clustered based on the complete linkage
method for hierarchical clustering, and values were centered and scaled
per gene. For each patient, changes in DM1-Activ-c (delta-DM1-Activ-c)
and Six-Minute Walk Test (delta-6MWT) as well as the compound response
score were added (scores ranging from −1.6 (dark red) to +2.76 (dark
blue)). Delta-DM1-Activ-c and delta-6MWT were scaled by their root mean
square. C Expression values (logCPM) at baseline (blue) and after CBT
(red) of the four genes with the lowest nominal p-values from panel A
including their Pearson correlations
Table 1.
Gene set enrichment analysis
Query* Adjusted p-value** Term size*** Intersection size**** Term ID
Term name Intersection with WikiPathway
CBT_ORA 0.014 49 5 WP:WP5039 SARS-CoV-2 innate immunity evasion and
cell-specific immune response CXCL1,PPBP,PF4,CXCR2,CXCL5
CBT_ORA 0.037 91 8 WP:WP111 Electron transport chain (OXPHOS system in
mitochondria) MT-ND3,ATP5F1E,MT-ND1,MT-ATP6,MT-CO2,MT-CO3,MT-ND4,MT-CO1
CTG_ORA <0.001 46 10 WP:WP286 IL-3 signaling pathway
FOS,CSF2RB,GAB2,HCK,LYN,MAPK3,RAF1,STAT3,STAT5B,PIK3CD
CTG_ORA 0.001 49 10 WP:WP395 IL-4 signaling pathway
FOS,IRS2,CEBPB,GAB2,NFIL3,MAPK3,STAT6,STAT3,STAT5B,PIK3CD
CTG_ORA 0.002 26 3 WP:WP4564 Neural crest cell migration during
development MMP9,EPHB4,FOS
CTG_ORA 0.003 28 3 WP:WP4565 Neural crest cell migration in cancer
MMP9,EPHB4,FOS
CTG_ORA 0.004 29 7 WP:WP3972 PDGFR-beta pathway
FOS,MAPK3,RAF1,MAP2K4,STAT6,STAT3,STAT5B
CTG_ORA 0.006 29 2 WP:WP4808 Endochondral ossification with skeletal
dysplasias MMP9,ALPL
CTG_ORA 0.006 29 2 WP:WP474 Endochondral osssification MMP9,ALPL
CTG_ORA 0.006 56 2 WP:WP2324 AGE/RAGE pathway MMP9,ALPL
CTG_ORA 0.008 48 10 WP:WP304 Kit receptor signaling pathway
FOS,GAB2,SH2B2,LYN,MAPK3,RAF1,STAT3,STAT5B,CRK,RPS6KA1
CTG_ORA 0.011 37 8 WP:WP127 IL-5 signaling pathway
FOS,CSF2RB,LYN,MAPK3,RAF1,STAT3,STAT5B,RPS6KA1
CTG_ORA 0.013 31 7 WP:WP313 Signaling of hepatocyte growth factor
receptor FOS,PTEN,MAPK3,RAF1,STAT3,CRK,PXN
CTG_ORA 0.015 110 16 WP:WP3929 Chemokine signaling pathway
CXCR2,TIAM2,HCK,NCF1,LYN,ARRB2,GNB2,PREX1,MAPK3,RAF1,STAT3,STAT5B,CRK,W
AS,PXN,PIK3CD
CTG_ORA 0.031 15 3 WP:WP3599 Transcription factor regulation in
adipogenesis IRS2,CEBPD,CEBPB
CTG_CR_ORA 0.002 110 13 WP:WP3929 Chemokine signaling pathway
WAS,PXN,HCK,MAPK3,PREX1,ARRB2,PIK3R5,VAV1,NCF1,CXCL16,PIK3CD,GNB2,GRB2
CTG_CR_ORA 0.006 46 8 WP:WP286 IL-3 signaling pathway
GAB2,CSF2RB,HCK,MAPK3,PTPN6,VAV1,PIK3CD,GRB2
CTG_CR_ORA 0.009 48 8 WP:WP304 Kit receptor signaling pathway
GAB2,MAPK3,PTPN6,RPS6KA1,VAV1,SH2B2,JUNB,GRB2
CTG_CR_ORA 0.016 115 12 WP:WP306 Focal adhesion
ACTN1,PXN,HCK,MAPK3,CCND2,VASP,TLN1,VAV1,ZYX,ITGA5,PIK3CD,GRB2
CTG_CR_ORA 0.032 31 6 WP:WP3937 Microglia pathogen phagocytosis pathway
HCK,PTPN6,VAV1,NCF1,ITGB2,PIK3CD
[113]Open in a new tab
*Query refers to the gene sets used. CBT: 560 genes differentially
expressed after the intervention; CTG: 608 genes associated with the
CTG repeat; CTG_CR: 97 genes associated with both CTG repeat and
compound response; ORA over-representation analysis
**p-values associated with the finding after correction for multiple
testing
***Number of genes in the identified pathway
****Number of genes in the provided gene set overlapping with the
pathway
CTG-repeat length associations reflect molecular dysregulation in blood
across different studies
Having a readily accessible fingerprint reflecting the molecular
dysregulations associated with DM1 is potentially of high value for
both clinical and research settings. To this end, we studied the
correlations of blood-based gene expression with the disease-causing
CTG-repeat length assessed at the start of the trial. The assessments
were based on small pool PCR, Acil digestion, and Southern blot, as
opposed to (estimations) of CTG-repeat length at birth or disease
onset. This was done to minimize potential confounding effects related
to the progressive nature of the disease and to assure homogeneity in
the measurement methodology.
Based on this approach, we identified 608 genes significantly
associated with the CTG repeat at an FDR cutoff of 5% (474 positively,
134 negatively, fold changes ranging between 0.76 and 1.23 per 100
CTGs, Fig. [114]3A). The four genes with the lowest p-values RNF170,
IRS2, NDE1, and PRIMPOL showed a clear linear relationship between the
CTG-repeat length at baseline and expression values, both before and
following the intervention (Fig. [115]3B). Most enriched pathways were
related to immunological processes (IL-3, IL-4, IL-5 signaling;
chemokine signaling pathway), yet also pathways related to
adipogenesis, hepatocyte signaling, and AGE/RAGE were discovered.
Interestingly, the gene MMP9 was among several of the
CTG-repeat-associated pathways.
Fig. 3.
[116]Fig. 3
[117]Open in a new tab
Gene expression levels associated with CTG-repeat length. For each
gene, a mixed effect model was fitted with before/after CBT and
CTG-repeat length as fixed effects, while accounting for (random)
effects of the individual. The p-values for the fixed effects were
estimated via Satterthwaite’s freedom method and FDR corrected. A
Volcano plot of significance (−log10 of the nominal p-value) and effect
size of the CTG-repeat length (per 100 CTGs) on gene expression. Genes
for which the effect of CTG-repeat is significant (FDR < 0.05) are
visualized in black. B For the four genes with the lowest nominal
p-values from panel A, the gene expression values (logCPM) are plotted
against the CTG-repeat length. Blue dots represent baseline expression
values, and red dots expression values after CBT. The regression line
is fitted over all values, independent of the time point. The Pearson
correlation coefficients for the association between CTG-repeat length
and gene expression are displayed for each gene
To replicate and validate that these findings are disease relevant, we
compared the CTG-repeat length effect size with effect sizes reflecting
the differences in gene expression between DM1 and controls in various
published datasets (Additional file [118]2: Fig. S3). We found a strong
correlation with a study that performed mRNA sequencing on DM1 and
control blood samples (Pearson rho: 0.59, Additional file [119]2: Fig.
S3A [[120]25]). However, similar correlations were not observed for
studies profiling DM1 and control tissues other than blood (heart
[[121]15], brain [[122]12], and muscle [[123]11], Additional file
[124]2: Fig. S3B-S3D). Neither were correlations observed with inferred
MBNL activity or muscle strength measures from another study
(Additional file [125]2: Fig. S3E-S3F [[126]11]). Although no
correlations were found with the effect size of blood-based physical
test assessments in DMD (Additional file [127]2: Fig. S3G), the
CTG-repeat effect size did correlate well with the effect size of
blood-based DMD body measures (Pearson rho: 0.41, Additional file
[128]2: Fig. S3H) [[129]22]. In the latter study, comparisons with
physical tests and body measures were independently based on the first
principal component of a set of different clinical assessments of DMD
patients. Thus, while DM1-related molecular dysregulations in blood can
be validated in other studies, even from other neuromuscular disorders,
they do not necessarily reflect expression dysregulation observed in
other tissues.
Since DM1 is known as a splicing disease, we also studied the
association of the CTG-repeat length with alternative splicing events.
Here, four events in three genes (RBM39, FLNA, and CTSZ) reached an FDR
threshold of <5% (Additional file [130]2: Fig. S4). Given the limited
and small effects observed, we did not further explore these
associations.
Non-significant associations between gene expression and individual clinical
outcome measures
Next, we studied phenotype-genotype relationships by estimating the
associations of gene expression values with individual clinical outcome
measures used in the OPTIMISTIC trial. Here, we found virtually no
significant associations between gene expression and clinical outcomes.
No significant associations after multiple testing correction were
found for the DM1-Activ-c questionnaire (Additional file [131]2: Fig.
S5). The four genes with the lowest nominal p-values were SREBF2,
ZNF283, SF3B3, and GSKIP.
Significant associations with average clinical response
To account for the heterogenic changes in individual outcome measures,
we calculated a compound CBT response score that reflects the average
scaled therapy response of all outcome measures used in OPTIMISTIC
(Fig. [132]1). Noteworthy, similar ranges of change are observed for
the variety of outcome measures due to the applied scaling. As such,
each outcome measure contributes similarly to the compound response
score. The compound response score can be interpreted as an estimate of
an overall difference in capacity between the end and the start of the
intervention. We were able to identify 1176 genes significantly (FDR <
0.05) associated with the compound response score (384 positive, 792
negative, Fig. [133]4A). The four hits with the lowest p-values
(PPP1R9B, CSNK1G2, PPP6R1, FBXO48) show a clear linear relationship
between changes in gene expression during the trial and the compound
response score (Fig. [134]4B). No enriched pathways were identified for
this gene set.
Fig. 4.
[135]Fig. 4
[136]Open in a new tab
Gene expression association with compound response scores. For each
gene, a mixed effect model was fitted with before/after CBT and
compound response scores as fixed effects, while accounting for
(random) effects of the individual. Compound response scores were
fitted for gene expression values after CBT and set to be zero at
baseline; the effect size of this covariate therefore expresses changes
in gene expression compared to the baseline values that are associated
with clinical response. The p-values for the fixed effects were
estimated via Satterthwaite’s freedom method and FDR corrected. A
Volcano plot of significance (−log10 of the nominal p-value) and the
effect size of the compound response score on gene expression. Genes
for which the effect size of compound response is significant (FDR <
0.05) are visualized in black. B For the four genes with the lowest
nominal p-values from panel A, the changes in gene expression (delta
logCPM after-before CBT) are plotted against the compound response
scores, including Pearson correlations
Clinical improvement linked to the reversal of disease-induced gene
expression
Since we were able to identify genes significantly associated with both
the CTG-repeat length as well as the average clinical response, we were
interested in their intersection. Among the significant hits of both
analyses, we found an overlap of 97 genes (Fig. [137]5A). To further
investigate this relationship, we plotted the effect size of the
compound response score against the effect size of the CTG-repeat
length for each expressed gene (Fig. [138]5B). For the 97 genes
significantly associated with both predictors, a remarkable pattern
emerged: genes that were lower expressed in patients with a long CTG
repeat showed an increase in expression levels in the patients with a
good clinical response and vice versa. This anticorrelation pattern was
confirmed by analyzing an earlier dataset comparing gene expression in
DM1 and control blood samples [[139]25]. Here, the 97 genes showed a
similar association with the DM1 phenotype as has been found with the
CTG length in our study, confirming that the gene expression of
patients with a good CBT response changed into the direction of the
levels observed in healthy controls (Fig. [140]5C). This remarkable
relationship could not be explained by possible confounding between
CTG-repeat length and compound response, as the compound response
effect size is only slightly affected by first regressing out the
CTG-repeat length effect (Additional file [141]2: Fig. S6). The four
genes with the lowest p-values with both the CTG-repeat length as well
as the compound response score (either FDR < 0.025) were DNAJB12 (CTG
Pearson rho=0.49; CRS Pearson rho=−0.43), HDAC5 (CTG rho=0.65; CRS
rho=−0.46), TRIM8 (CTG rho=0.59; CRS rho=−0.59), and ZNF22 (CTG
rho=−0.52; CRS rho=0.52). For 81 of the 97 candidate biomarkers, the
Pearson correlations were nominally significant for the change in gene
expression and the compound response score. Yet, only 2 of the 97
Pearson correlations were nominally significant for the change in gene
expression and the change in DM1-Activ-c scores. Enrichment analysis
for these 97 genes resulted in mostly immunological pathways (chemokine
and IL-3 signaling, microglia pathogen phagocytosis pathway; Table
[142]1).
Fig. 5.
[143]Fig. 5
[144]Open in a new tab
Clinical improvement is linked to normalization of expression of
CTG-repeat-associated genes. Linear mixed effect models were fitted for
each expressed gene, with CBT as a fixed effect, patient as a random
effect, and either CTG repeat or compound response as the predictor.
p-values for the regression coefficients were estimated via
Satterthwaite’s degrees of freedom and considered significant for
values smaller than 0.05 after FDR correction. Furthermore, differences
in gene expression of blood samples from DM1 patients and controls were
calculated based on an external study using a Wilcoxon signed-rank test
on normalized, log-transformed gene counts. A Venn diagram illustrating
the number of significant genes associated with CTG-repeat length and
compound response, as well as their overlap (disease-relevant changes).
B For all expressed genes, the regression coefficients of the compound
response scores are plotted against the regression coefficients of the
CTG-repeat lengths, including their Pearson correlation. For
illustrative purposes, the regression coefficients of the CTG repeat
have been multiplied by 100. Furthermore, the x-axis has been scaled
between −0.25 and 0.25, removing 12 outliers from the figure. Colored
in purple are the genes for which both regression coefficients were
significant (FDR < 0.05). C For all expressed genes, the compound
response effect size is plotted against the DM1 effect size based on an
external study comparing blood expression profiles from DM1 patients
and controls, including their Pearson correlation [[145]25]. For
illustrative purposes, the x-axis has been scaled between −1.5 and 1.5,
removing 6 outliers from the figure. Colored in purple are the same 97
genes as in panel B
In summary, these results suggest that for a subset of genes
significantly associated with the biochemical phenotype caused by the
CTG-repeat expansion a reversal of disease-induced gene expression
occurred in clinical responders. The association with both molecular
dysregulation and clinical response makes this subset of genes highly
relevant for the discovery of novel therapeutic targets.
Discussion
The purpose of this study was the identification of DM1-specific
therapeutic biomarkers in peripheral blood. The multisystem nature of
DM1 is known to be reflected by laboratory abnormalities of peripheral
blood, making it together with its accessibility a promising tissue for
biomarker studies in this disease [[146]26]. Hence, we used blood
samples of 27 DM1 patients from the OPTIMISTIC cohort to study the
associations of gene expression with disease severity as well as the
response towards the CBT intervention. In an effort to fairly represent
the whole OPTIMISTIC cohort and to facilitate the generalizability of
the study findings, a stratified random sampling procedure was
implemented which resulted in a balanced patient cohort with respect to
age, CTG-repeat length, sex, therapy response, and clinical site
distributions. Patients with an interrupted CTG repeat were excluded in
order to limit molecular heterogeneity induced by slower disease
progression rates [[147]5]. Nonetheless, we identified substantial
heterogeneity in molecular expression profile changes after the
10-month CBT intervention. Furthermore, we identified gene sets that
were significantly associated with the disease-causing CTG repeat as
well as with the average response towards the CBT intervention across
different clinical outcome measures. Most interestingly, an overlap of
97 genes among these latter two gene sets has been identified, showing
a clear trend of more normal expression levels in clinical responders.
Based on these different gene sets, several biological pathways
associated with DM1 have been discovered, as well as specific
genes/gene families with ties to neuro(-muscular) disorders.
The OPTIMISTIC study has shown that DM1 patients significantly improve
in their capacity for activity and social participation after the CBT
intervention [[148]18]. It was furthermore hypothesized that CBT may
directly or indirectly improve other biological systems affected by the
disease. This hypothesis has been confirmed for muscles of the lower
extremity, showing an increase in cross-sectional area as a result of
the intervention [[149]58]. Here, we set out to further explore this
hypothesis by investigating changes on the molecular level. These
changes may be the result of increased physical activity, which has
been linked to differences in gene expression in previous studies
[[150]59], but may also be a more direct effect of the
psychotherapeutic CBT intervention [[151]60].
In line with the results of an earlier study, we have illustrated that
the clinical response towards the CBT intervention was rather
heterogenous [[152]21]. A novel addition to this finding was the
illustration that this heterogeneity extends towards changes in
molecular profiles within a 10-month timeframe. Importantly, this
heterogeneity could not be explained by changes in the cellular
composition of the blood samples between the two time points, as the
similarity of cell type composition has been verified. Additionally,
this heterogeneity could not be explained by changes of different
outcome measures such as the DM1-Activ-c, 6MWT, or compound response.
While the CBT intervention likely played a part in this heterogeneity,
the magnitude of this contribution could not be assessed due to the
lack of a control group. Other factors, such as aging or seasonal
effects, may also have contributed to this finding.
Across the different gene sets identified in this study, several of the
genes with the lowest p-values (SLC39A8, IRS2, FBXO48) and one
WikiPathway (transcription factor regulation in adipogenesis) were
associated with insulin signaling or more broadly related to
metabolism/adipogenesis [[153]61–[154]63]. Dysregulation of insulin
signaling has been linked to clinical features of DM1 and is an
actively ongoing field of investigation [[155]64]. Aberrant insulin
signaling has also been found in other diseases of the nervous system
such as depression, with indirect improvements being observed after CBT
[[156]65]. Interestingly, the anti-diabetic drug metformin has been
shown to improve the mobility of DM1 patients with effect sizes of the
6MWT comparable to those observed in the OPTIMISTIC study [[157]66].
With increasing therapeutic interest in this area, our findings suggest
that disease-relevant insulin signaling can be studied on a molecular
level in blood samples, highlighting the utility of peripheral blood in
this setting.
Similarly, across most of the gene sets, we identified several
WikiPathways associated with immunological functions (cell-specific
immune response, chemokine signaling pathway, IL-3, 4, and 5
signaling). While this may be in part due to a bias introduced by the
profiled tissue, the immune system likely plays a role in the DM1
pathophysiology like for many other chronic diseases [[158]67]. As
such, blood sample-based immunology studies may be an interesting field
of future investigation.
The intersection of the genes significantly associated with the
disease-causing CTG repeat, as well as the average CBT response across
different outcome measures, revealed a subset of 97 genes. These genes
are of particular interest for the identification of therapeutic
biomarkers, as their disease association has been confirmed in an
external dataset and they showed normalization of expression levels in
clinical responders. Among the genes with the lowest p-values
associated to both CTG-repeat length as well as CBT response were
HDAC5, DNAJB12, and TRIM8. In total, the subset of 97 genes consisted
of two HDACs (histone deacetylases, HDAC5, HDAC7). HDACs play an
important role in transcriptional regulation and compounds that inhibit
HDAC enzymes are being studied for their potential effect on a range of
human diseases, including neurological disorders [[159]68]. The DNAJB12
protein is a member of the heat shock protein family, with some
evidence supporting positive effects of their induction for muscular
dystrophy and other muscle wasting conditions [[160]69]. The TRIM
family protein TRIM72 has been shown to be an essential component of
the cellular membrane repair in muscles, with evidence supporting some
positive effects in mouse models of muscular dystrophy [[161]70].
Authors of the same study suggest the potential of other TRIM family
members as potential targets in similar disease states, which may
support the further investigation of TRIM8 in DM1. Although mostly
associated with therapy response, RARA and RXRA were also among the
overlapping 97 genes. Stimulating retinoic acid signaling has been
linked to muscle regeneration in mouse models via increased
proliferation of fibro/adipogenic progenitor cells, highlighting the
relevance of this pathway as another potentially DM1-relevant drug
target [[162]71]. Taken together, these findings confirm the value of
whole blood-based expression profiling for the discovery of therapeutic
biomarkers in DM1.
Interestingly, the genes significantly associated with the CTG repeat
showed a moderate correlation with the genes associated with DMD body
measurements of an external study. We hypothesize that these body
measures are likely also correlated with age, which in turn reflects
disease progression. This suggests that some of our results may
therefore not be DM1 specific, but rather reflect non-specific
molecular dysregulations shared across different (neuromuscular)
disorders. This hypothesis is supported by the significant association
between the CTG repeat with MMP9, which is known to be a non-specific
biomarker that has for instance been linked to cardiac remodeling after
myocardial infarction, inflammation, and DMD [[163]72, [164]73]. We
therefore deem further exploration of shared dysregulations as highly
valuable, as this may lead to the discovery of therapeutic targets
relevant to a variety of diseases.
Although DM1 is known as an alternative splicing disease, only four
splice events have been significantly linked to the disease-causing CTG
repeat in this study. This may be the result of relatively low DMPK
expression in blood [[165]74] and is in line with the absence of strong
splice aberrations in blood from DM1 patients compared to controls
[[166]25]. DMPK’s low expression in blood cells may also explain the
lack of concordance between our disease severity-associated gene
expression differences observed in blood with gene expression
differences observed in the muscle and brain. So, while whole
blood-based transcription profiling can identify disease-relevant
molecular dysregulations, these dysregulations do likely not fully
reflect the dysregulations observed in other tissue types. Yet, we
found a high correlation of the CTG-repeat effect size with the DM1
phenotype effect size of a different blood-based study, as well as with
a principal component derived body measure association of a DMD-based
study. While the former validates our findings, the latter suggests the
possibility of shared, disease-relevant, dysregulations across
different neuro(muscular) disorders detectable in peripheral blood. If
true, associated pathways might reveal highly interesting targets for
drug discovery, as they may have a positive influence on multiple
diseases.
Limitations of this study
To find disease-relevant gene expression in blood, we searched for
linear associations with the disease-causing CTG-repeat length. While
the CTG-repeat length is thought to be the main driver of molecular
dysregulation, associations between the progenitor allele length of the
CTG repeat with several clinical outcome measures, including
DM1-Activ-c and 6MWT, have been found to be only weak-moderate
[[167]13]. In line with the previously published challenges to directly
relate gene expression to clinical phenotypes, we were not able to find
significant, direct associations between clinical outcome measures and
gene expression. Still, among the genes with the lowest p-values for
the DM1-Activ-c questionnaire was GSKIP, a gene encoding for an
inhibitor protein of the known DM1 drug target GSK3-β
[[168]75–[169]77]. Given the biological relevance of this finding, we
deem it likely that the current study design was underpowered to study
the association of gene expression with individual clinical outcome
measures, especially when taking clinical and molecular heterogeneity
into account.
The clinical heterogeneity in therapy response may in part be explained
by the personalized nature of the CBT intervention, with therapy foci
being tailored towards the needs and wishes of the individual patient.
As a consequence, one might expect different outcome measures to be
more appropriate for CBT efficacy assessments for different patients.
Yet, the identification of molecular signatures associated with therapy
response necessitates the use of the same clinical outcome measure. For
this reason, and to average out some of the uncertainty inherently
associated with the recording of the different outcome measures, we
settled on the use of the compound response score. While the scaling
assured a more or less equal contribution of each outcome measure to
this score, we acknowledge that this combined score is biased by the
outcome measures that were used in OPTIMISTIC.
Even though we statistically corrected for non-specific molecular
changes between the two time points, the lack of RNA-seq profiles from
the OPTIMISTIC control arm makes it difficult to state with certainty
that the observed molecular changes are due to the therapy itself.
However, this does not discount their value as potential therapeutic
targets, as they are, regardless of the mediation factor, significantly
associated with improved clinical status. Moreover, for this reason, we
deemed studying the RNA-seq profiles of the OPTIMISTIC control arm to
be less valuable, as these patients did not significantly clinically
declined over the 10-month timeframe [[170]18].
Conclusions
Starting from DM1-specific disease determinants, the OPTIMISTIC study
has shown that patient-tailored CBT can increase the health status of
DM1 patients by improving social participation and activity. It was
furthermore hypothesized that the CBT intervention positively
challenges the biological system, which has already been confirmed by
increased cross-sectional area for muscles of the lower extremities.
Making use of the clinical heterogeneity in therapy response, we here
additionally confirmed disease-relevant molecular changes in peripheral
blood. Not only do our results highlight the utility of peripheral
blood to study the multisystem nature of the disease, but also
generated the foundation for an upcoming, multi-omics-based drug
repurposing study.
Supplementary Information
[171]12916_2022_2591_MOESM1_ESM.docx^ (18.9KB, docx)
Additional file 1: Table S1. PCR duplicates.
[172]12916_2022_2591_MOESM2_ESM.zip^ (12.6MB, zip)
Additional file 2: Figure S1. Patient characteristics. Box-plots
illustrating the distribution of age (A) and CTG-repeat length (B) of
the 27 patients at baseline, separated by sex. C) Box-plots
illustrating the change in DM1-Activ-c score after the intervention
versus before, as expressed in Delta DM1-Activ-c scores, separated by
sex. Figure S2. DMPK expression. Panel A shows the expression values of
DMPK in logCPM before and after the CBT intervention for all 27
patients. Panel B shows the association between the CTG repeat length
and the change in DMPK expression, as calculated by expression levels
after the intervention minus the expression levels at the start of the
trial. In addition, the Pearson correlation of this association is
shown. Figure S3. Comparison of gene expression associated to DM1 in
other studies with that to the CTG repeat in this study. The mean
differences in normalized gene counts were calculated between DM1 and
control samples for four studies comparing blood (A (25, 44)), heart (B
(15, 42)), brain (C (12, 43)) and tibialis muscle (D, (11, 41)) and
plotted against the effect sizes for the CTG-repeat in this study
(ReCognitION) for genes that were measured in both studies. In E and F,
the correlation of expression to the inferred MBNL activity and muscle
strength(11), was compared to our effects for the CTG-repeat. In G and
H, gene expression associations in blood of the results of physical
tests and several body measurements for Duchenne muscular dystrophy
(DMD) patients (22) are compared to CTG-repeat associations from our
study. In (22), associations with physical tests and body measurements
were based on the first principal component over associated measures,
each reflecting respectively 78% and 70% of the associated measurements
variances. Depicted on the top left in each graph is the Pearson
correlation coefficient for the plotted values with the associated
p-values. Figure S4. Splice exclusion and CTG-repeat length. PSI values
for splice exclusion events were determined using rMATS [. For each of
the PSI values a linear mixed effect model was fitted with the modal
CTG repeat length as covariate and patient as random effect. A) Volcano
plot of significance (-log 10 of the nominal p-values of the modal
CTG effect size) and the CTG effect size for the PSI values.
Significant results after FDR correction (p < 0.05) are marked in
black. B) For the four PSI values with the lowest nominal p-values from
A, the PSI values are plotted against the modal CTG repeat lengths
before (blue) and after the CBT intervention (red) including the
Pearson correlation coefficients. Figure S5. Gene expression
association with DM1-Activ-c. For each gene a mixed effect model was
fitted with before/after CBT and DM1-Activ-c scores as fixed effects,
while accounting for (random) effects of the individual. The p-values
for the fixed effects were estimated via Satterthwaite’s freedom method
and FDR corrected. A) Volcano plot of the significance (-10log of the
nominal p-value) and effect size of the DM1-Activ-c scores on
gene expression. B) For the four genes with the lowest nominal p-values
from panel A, the DM1-Activ-c scores are plotted against the gene
expression values (logCPM). Blue dots represent baseline values, red
dots values after CBT. The regression line indicates the linear
association independent of the timepoints. Similarly, the Pearson
correlation coefficients shown are independent of the
timepoints. Figure S6. Shared explained variance among CTG-repeat and
Compound Response predictors. To assess the overlap in gene expression
level variance explained by the CTG-repeat length and the Compound
Response score, the Compound Response score was fitted on the residuals
of the CTG-repeat length as fixed effect, while accounting for random
effects of the patient. A) The effect sizes of the Compound Response
score as estimated on the CTG-repeat model residuals are
plotted against the effect sizes Compound Response scores as presented
in this study. The Rho score reflects the Pearson correlation
coefficient. B) Analogous to Figure 5B, the Compound Response score
effect size as estimated on the CTG-repeat model residuals are plotted
against the CTG repeat effect size size as estimated on the CTG-repeat
model residuals are plotted against the CTG repeat effect size scaled
between -0.25 and 0.25, resulting in the removal of 12 outliers.
Colored in purple are the same 97 genes as in Figure 5B.
Acknowledgements