Abstract

Background

   Myotonic dystrophy type 1 (DM1) is an incurable multisystem disease
   caused by a CTG-repeat expansion in the DM1 protein kinase (DMPK) gene.
   The OPTIMISTIC clinical trial demonstrated positive and heterogenous
   effects of cognitive behavioral therapy (CBT) on the capacity for
   activity and social participations in DM1 patients. Through a process
   of reverse engineering, this study aims to identify druggable molecular
   biomarkers associated with the clinical improvement in the OPTIMISTIC
   cohort.

Methods

   Based on full blood samples collected during OPTIMISTIC, we performed
   paired mRNA sequencing for 27 patients before and after the CBT
   intervention. Linear mixed effect models were used to identify
   biomarkers associated with the disease-causing CTG expansion and the
   mean clinical improvement across all clinical outcome measures.

Results

   We identified 608 genes for which their expression was significantly
   associated with the CTG-repeat expansion, as well as 1176 genes
   significantly associated with the average clinical response towards the
   intervention. Remarkably, all 97 genes associated with both returned to
   more normal levels in patients who benefited the most from CBT. This
   main finding has been replicated based on an external dataset of mRNA
   data of DM1 patients and controls, singling these genes out as
   candidate biomarkers for therapy response. Among these candidate genes
   were DNAJB12, HDAC5, and TRIM8, each belonging to a protein family that
   is being studied in the context of neurological disorders or muscular
   dystrophies. Across the different gene sets, gene pathway enrichment
   analysis revealed disease-relevant impaired signaling in, among others,
   insulin-, metabolism-, and immune-related pathways. Furthermore,
   evidence for shared dysregulations with another neuromuscular disease,
   Duchenne muscular dystrophy, was found, suggesting a partial overlap in
   blood-based gene dysregulation.

Conclusions

   DM1-relevant disease signatures can be identified on a molecular level
   in peripheral blood, opening new avenues for drug discovery and therapy
   efficacy assessments.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12916-022-02591-y.

   Keywords: Myotonic dystrophy type 1, Biomarker, RNA-seq, Peripheral
   blood, Therapeutic Response, Lifestyle intervention

Background

   Myotonic dystrophy type 1 (DM1) is a neuromuscular disease with a
   worldwide average prevalence of around 1 in 8000 people and a high
   unmet clinical need [[33]1]. DM1 is considered the most frequently
   occurring adult-onset form of muscular dystrophy. This degenerative
   multisystem disease is characterized by a wide range of symptoms
   including myotonia, muscle weakness and dystrophy, fatigue, apathy,
   cataracts, obesity, and insulin resistance. Next to a severe decrease
   of life quality, DM1 patients suffer from a reduced life expectancy
   mostly due to problems with cardiac and respiratory function.
   Currently, no curative therapy exists.

   DM1 is caused by the expansion of a CTG trinucleotide microsatellite
   repeat in the 3′ UTR of the DM1 protein kinase (DMPK) gene
   [[34]2–[35]4]. Unaffected individuals carry up to approximately 37 CTG
   triplets in DMPK, while in DM1 patients this ranges from 50 to even a
   few thousand repetitions. Depending on the inherited repeat length, DM1
   can become manifest at birth or early in life but more frequently
   becomes apparent in adulthood [[36]1]. In general, the disease
   manifestation is earlier and more severe with longer repeat expansions.
   Interruption of the CTG repeat by variants such as CCG or CGG is
   associated with milder symptoms [[37]5]. The expanded CTG repeat is
   thought to cause disease mainly via an mRNA gain-of-function mechanism,
   in which aberrant hairpin structures formed by long CUG repeats are
   central [[38]6–[39]9]. Directly or indirectly, these hairpin structures
   dysregulate the function of RNA binding proteins from the
   muscleblind-like (MBNL) and CUGBP Elav-like (CELF) families, leading to
   widespread disturbed RNA processing and consequently altered functions
   of various proteins [[40]10–[41]12]. Although proven to be the
   disease-causing mutation, clinical symptoms of DM1 are only moderately
   associated with the CTG repeat or the dysregulation of specific
   proteins which suggests an involvement of other mechanisms in symptom
   expression [[42]13–[43]16].

   While there are many promising therapeutic oligonucleotides, small
   molecule drugs, and gene therapies in the (pre) clinical pipeline for
   some of the signs and symptoms of DM1, none is expected to reach
   widespread clinical application soon. Physical training and increasing
   activity are currently being applied to relieve DM1 symptoms with
   marked improvements in relatively mildly affected DM1 patients [[44]17,
   [45]18], which has furthermore shown to induce biochemical responses in
   DM1 mouse models [[46]19, [47]20].

   The to-date largest clinical trial in DM1 was OPTIMISTIC: Observational
   Prolonged Trial In Myotonic dystrophy type 1 to Improve Quality of
   Life-Standards, a Target Identification Collaboration [[48]18]. The
   OPTIMISTIC clinical trial included over 250 well-characterized DM1
   patients from four centers in Europe, where the effects of cognitive
   behavioral therapy (CBT) and optional graded exercise therapy were
   closely monitored over 16 months via more than twenty outcome measures.
   Notably, the CBT intervention was tailored towards the specific needs
   of the patient in a shared decision-making process between the patient
   and the psychotherapist, allowing for a personalized intervention. The
   trial has shown significant, yet heterogenous improvements for various
   signs and symptoms, as well as the capacity for social activity and
   participation in DM1 [[49]21].

   Here, we set out to find molecular profiles associated with the
   disease-causing CTG repeat and therapy response based on full blood
   mRNA sequencing before and after the CBT intervention of 27 patients
   from the OPTIMISTIC cohort. Given the accessibility of peripheral
   blood, it has increasingly been used for the successful identification
   of disease biomarkers for a variety of neurological and psychiatric
   disorders such as Duchenne muscular dystrophy (DMD), Huntington’s
   disease, major depressive disorder, and DM1 [[50]22–[51]25].
   Furthermore, the multisystem nature of DM1 is known to be reflected by
   various laboratory abnormalities of blood samples, supporting the
   relevance of peripheral blood for the identification of
   disease-relevant information [[52]26]. We analyzed gene expression
   levels as a function of CTG repeat size (as a proxy for disease
   load/severity) and of the therapy response. Next, we combined these
   findings and compared the results to various previously published
   datasets. We were able to identify 608 genes significantly associated
   with the CTG repeat and further illustrate that 97 of these genes
   returned towards more normal expression levels in clinical CBT
   responders.

Methods

The cognitive behavioral therapy intervention

   Patients of the OPTIMISTIC intervention arm were treated with a
   personalized form of CBT. The customization of the intervention was
   based on a selection of different treatment modules: regulating
   sleep-wake patterns, compensating for the reduced patient initiative,
   formulating helpful beliefs about fatigue and myotonic dystrophy type
   1, optimizing social interactions, and coping with pain [[53]18]. The
   individual module selection was made based on a shared decision-making
   process between experienced and specifically trained CBT therapists and
   patients.

Sample source and patient sampling

   Samples and metadata used for this study were all gathered during the
   OPTIMISTIC clinical trial [[54]18]. At the different time points in the
   trial, blood was drawn and a wide range of clinical outcome measures
   were recorded. Figure [55]1 has been generated to illustrate the
   heterogeneity in changes across all grouped outcome measures, with
   annotation of all individual outcome measures in the legend. In order
   to maximize the generalizability of the study findings, the goal of the
   patient sampling was to obtain a balanced subset of the OPTIMISTIC
   intervention group (n=128). Additionally, by capturing the whole range
   of therapy responses in a continuous uniform distribution, as assessed
   by the primary clinical trial outcome DM1-Activ-c [[56]27], strong
   linear associations between non-responders and responders could be
   studied. To promote future research, the sampling was furthermore done
   on the most completely characterized patients. To achieve this, several
   filter steps have been applied before the random sampling. Patients
   were selected for which the DM1-Activ-c questionnaire results were
   available at each time point (n=104), with less than 20% missing values
   for other outcome measures (n=81) and without a variant CTG repeat
   (n=74). Homogeneity of baseline disease severity was accounted for by
   selecting patients that were within one interquartile range (IQR) of
   the mean for the baseline variables DM1-Activ-c, 6MWT, and CTG-repeat
   length (n=45). One patient was excluded because of polypharmacy (n=44).
   One patient was excluded because of a drop of 57 points of the
   DM1-Activ-c score between the baseline and 5-month assessments with a
   subsequent increase of 55 points between the 5- and 10-month
   assessments (n=43).

Fig. 1.

   [57]Fig. 1
   [58]Open in a new tab

   Distribution of changes in outcome measures per patient. Per outcome
   measure, changes between baseline and 10 months of CBT were scaled by
   the root mean square. Additionally, for some outcome measures, a sign
   adjustment was performed so that an increased score is always
   associated with improved health status. All outcome measures are shown
   per patient, where patients were ordered along the y-axis by their
   change in DM1-Activ-c scores (purple squares). Boxes enclose the 25th
   to 75th percentiles, divided by a thick line that represents the mean
   compound response score. The whiskers represent the lowest/highest
   value no further than 1.5 times the interquartile range. Quality of
   life (red): Myotonic Dystrophy Health Index, Individualized
   Neuromuscular Quality of Life Questionnaire, Adult Social Behavioural
   Questionnaire, Illness Management Questionnaire, Checklist individual
   strength — Subscale activity. Physical assessments (blue): Six-Minute
   Walk Test, BORG Scale, accelometery measures. Fatigue scores (black):
   Fatigue and Daytime Sleepiness Scale, Checklist Individual Strength —
   Subscale fatigue, Jacobsen Fatigue Catastrophizing Scale. Cognition and
   other (gray): Trail Making Test, Stroop Color-Word Interference Test,
   McGill Pain Questionnaire, Beck Depression Inventory — Fast Screen,
   Social Support — Discrepancies and Negative Interactions, Apathy
   Evaluation Scale — Clinical version, Self-Efficacy Scale 28

   The distribution of these 43 patients over the clinical sites A, B, C,
   and D was A12, B11, C16, and D4. Therefore, all patients from site D
   were selected. For the remaining sites, a stratified random sampling
   approach was implemented, where patients were randomly sampled from the
   different sites and a maximum of two patients with the same change in
   DM1-Activ-c were selected. This process of random selection was
   repeated until a reasonable site and delta-DM1-Activ-c distribution was
   achieved, defined as more than 7 patients for sites A, B, and C,
   resulting in the final selection of 30 patients. Due to the
   unavailability of samples and unsuccessful RNA sequencing, three
   patients were later excluded (n=27). The final selection featured a
   site distribution of 5 times center A, 8 B, 10 C, 4 D, and 22 unique
   changes DM1-Activ-c scores, with no change in the DM1-Activ-c score
   being present more than twice.

RNA sequencing

   Blood drawn during the OPTIMISTIC trial was collected in Tempus tubes
   and centrally stored at the New Castle MRC Centre for Rare &
   Neuromuscular Diseases biobank with strict SOPs and temperature control
   (−80°C). RNA was locally isolated in Nijmegen using the Tempus Spin RNA
   Isolation Kit (Applied Biosystems/Thermo Fisher Scientific) according
   to the manufacturer’s instructions. The concentration and RNA Integrity
   Number (RIN) were checked using Fragment Analyzer (Thermo Fisher
   Scientific). The mean RIN value was 8.9 and all were > 7.5. Hemoglobin
   mRNA was depleted using the Globinclear kit (Thermo Fisher Scientific).
   Libraries were prepared using NEBNext Ultra II Directional RNA Library
   Prep Kit (Illumina) according to the manufacturer’s instructions for a
   polyA mRNA workflow using UMI-indexed adapters. The size distribution
   (between 300 and 500 bp) was confirmed using Fragment Analyzer. A total
   of 150-bp paired end sequencing was performed with a NovaSeq6000
   machine (Illumina) at a library concentration of 1.1 nM, generating >
   30 M read pairs per sample. All raw sequencing data and associated
   genotype/phenotype/experimental information is stored in the European
   Genome-phenome Archive (EGA) under controlled access with Dataset ID
   EGAS00001005830 [[59]28].

RNA-sequencing primary data analysis

   Adapter sequences and low-quality base calls were removed from fastq
   files using cutadapt 3.4 via TrimGalore 0.6.6 at no other default
   parameters than the --paired flag [[60]29]. Trimmed fastq files were
   mapped to the human genome version hg38.95 using STAR 2.7.0 at default
   parameters and --outSAMtype BAM SortedByCoordinate [[61]30]. After
   indexing using samtools [[62]31] at default parameters, PCR duplicates
   were removed from the bam files using umi-tools dedup with the flags
   --spliced-is-unique, --paired and --output-stats (Additional file
   [63]1: Table S1) [[64]32]. Strandedness was verified via RSeQC’s
   infer_experiment [[65]33]. After indexing the deduplicated bam files,
   reads were counted for overlap with hg38.95 genes via HTSeq with
   parameters --format bam --order pos and --stranded=yes [[66]34]. EPIC,
   quanTIseq, and xCell algorithms were applied to the count tables to
   verify that the cell type compositions were similar at the two time
   points [[67]35–[68]37]. GATK HaplotypeCaller and Picard
   GenotypeConcordance were used to check the correct matching of samples
   from the same patient [[69]38, [70]39].

   Splice analysis was performed using rMATS v4.1.0 [[71]40] via the same
   gtf as for STAR/HTSeq with the parameters and flags: -t paired
   --readLength 150 --variable-read-length --novelSS --libType
   fr-firststrand --statoff.

Comparative DM1 datasets

   In order to validate our findings, we obtained several external DM1
   gene expression datasets of different tissue types (tibialis muscle
   [[72]11, [73]41], heart [[74]15, [75]42], brain [[76]12, [77]43], and
   peripheral blood [[78]25, [79]44]). Additionally, table S5 from Wang et
   al. was directly obtained from the publication [[80]11]. The dataset
   EV10 from Signorelli et al. was obtained for the association between
   gene expression (logFC) and body or performance test in DMD patients
   [[81]22]. To compare DM1 samples to controls, a two-sided two-sample
   Wilcoxon test was performed using row_wilcoxon_twosample in matrixTests
   R package on normalized, log-transformed gene counts [[82]45].

Statistical analysis

   All statistical analyses were carried out in R [[83]46]. For gene
   expression analysis, firstly, genes with low read counts before and
   after CBT were filtered using edgeR filterByExpr with group =
   before/after CBT and min.count = 50 [[84]47]. Following, normalized
   logCPM values and weights were calculated from the filtered read counts
   with Voom in Limma [[85]48].

   To achieve an overarching measure for CBT response, first, the changes
   for each outcome measure were calculated per patient by subtracting the
   value after 10 months of CBT from that at baseline. Where applicable,
   outcome measures were multiplied by −1 in order to always associate
   positive changes with an improved health status. Using R base scale,
   the changes per outcome measure were then scaled without centering to
   account for the different scales of the outcome measures. Finally, for
   each patient, a “Compound Response” score was calculated based on the
   mean of all scaled outcome measures. Individual contributions towards
   this compound response score were visualized (Fig. [86]1). L5ENMO, the
   mean activity during rest, was a control parameter in OPTIMISTIC and
   was excluded from this analysis.

   We first set out to explore differential gene expression before and
   after the CBT intervention. Gene expression values from Voom (in
   logCPM) were separately modeled using mixed effect models with
   before/after CBT (categorical) as fixed effect and patient identity as
   random effect ([87]1). Gene weights were also carried over from Voom.
   This analysis has been implemented using lme in the lme4-wrapper
   lmerTest [[88]49, [89]50]. lmerTest estimates a p-value for the
   contribution of fixed effects to the model via Satterthwaite’s degrees
   of freedom method. Parameters of the fits were extracted with R base
   summary and p-values were FDR corrected via the Benjamini and Hochberg
   method with stats p.adjust [[90]51].
   [MATH: <mrow><mi>G</mi><mi>e</mi><mi>n</mi><mi>e</mi><mspace
   width="4pt"></mspace><mi>e</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>e</mi>
   <mi>s</mi><mi>s</mi><mi>i</mi><mi>o</mi><mi>n</mi><mo>=</mo><mi>C</mi><
   mi>B</mi><mi>T</mi><mfenced close=")" open="("><mn>0</mn><mo
   stretchy="false">/</mo><mn>1</mn></mfenced><mo>+</mo><mfenced close=")"
   open="("><mn>1</mn><mo
   stretchy="false">|</mo><mi>P</mi><mi>a</mi><mi>t</mi><mi>i</mi><mi>e</m
   i><mi>n</mi><mi>t</mi></mfenced><mo>,</mo><mi>g</mi><mi>e</mi><mi>n</mi
   ><mi>e</mi><mspace
   width="4pt"></mspace><mi>w</mi><mi>e</mi><mi>i</mi><mi>g</mi><mi>h</mi>
   <mi>t</mi><mi>s</mi></mrow> :MATH]
   1

   In order to study the cohort heterogeneity of gene expression changes,
   we calculated the logCPM-based difference in gene expression between
   before and after the intervention for each patient for the 560 genes
   significantly associated with the CBT predictor of ([91]1) (adjusted p
   < 0.05). Patients and genes with similar expression changes were
   clustered using the R package heatmap3 based on the complete linkage
   method for hierarchical clustering, with gene expression values being
   centered and scaled per gene [[92]52]. Changes in clinical response
   (DM1-Activ-c, Six-Minute Walk Test (6MWT), and compound response) were
   visualized using the corrplot function and added to the heatmap
   [[93]53]. The changes in DM1-Activ-c and 6MWT were scaled using R base
   scale without centering to account for the different scales.

   Next, using the same methodology as for the CBT intervention effect, we
   set out to explore the associations of the different clinical outcome
   measures and CTG-repeat length with gene expression. For this purpose,
   we separately modeled gene expression values with either one of the
   outcome measures or the CTG-repeat length (at the trial start, ([94]2)
   as fixed effect and patient identity as random effect. The categorical
   CBT covariate (before/after) was included for each fit to correct for
   differences between the two time points.
   [MATH: <mstyle displaystyle="true"
   scriptlevel="0"><mrow><mtable><mtr><mtd><mrow><mi>G</mi><mi>e</mi><mi>n
   </mi><mi>e</mi><mspace
   width="4pt"></mspace><mi>e</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>e</mi>
   <mi>s</mi><mi>s</mi><mi>i</mi><mi>o</mi><mi>n</mi><mo>=</mo><mi>C</mi><
   mi>B</mi><mi>T</mi><mfenced close=")" open="("><mn>0</mn><mo
   stretchy="false">/</mo><mn>1</mn></mfenced><mo>+</mo><mi>C</mi><mi>T</m
   i><mi>G</mi><mi>_</mi><mi>r</mi><mi>e</mi><mi>p</mi><mi>e</mi><mi>a</mi
   ><mi>t</mi><mo>*</mo><mo>+</mo><mfenced close=")"
   open="("><mn>1</mn><mo
   stretchy="false">|</mo><mi>P</mi><mi>a</mi><mi>t</mi><mi>i</mi><mi>e</m
   i><mi>n</mi><mi>t</mi></mfenced><mo>,</mo><mi>g</mi><mi>e</mi><mi>n</mi
   ><mi>e</mi><mspace
   width="4pt"></mspace><mi>w</mi><mi>e</mi><mi>i</mi><mi>g</mi><mi>h</mi>
   <mi>t</mi><mi>s</mi></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow><mr
   ow></mrow><mo>*</mo><mtext>Same</mtext><mspace
   width="4pt"></mspace><mtext>repeat</mtext><mspace
   width="4pt"></mspace><mtext>length</mtext><mspace
   width="4pt"></mspace><mtext>fitted</mtext><mspace
   width="4pt"></mspace><mtext>for</mtext><mspace
   width="4pt"></mspace><mtext>both</mtext><mspace
   width="4pt"></mspace><mtext>T</mtext><mn>0</mn><mspace
   width="4pt"></mspace><mtext>and</mtext><mspace
   width="4pt"></mspace><mtext>T</mtext><mn>10</mn><mtext>M</mtext></mrow>
   </mtd></mtr></mtable></mrow></mstyle> :MATH]
   2

   Analogous to the methodology described for the CTG-repeat association
   analyses, genes associated with overarching clinical response were
   identified by fitting separate mixed effect models for each gene with
   the two fixed effects CBT (categorical) and compound response, as well
   as patient identity as random effect ([95]3). Notably, the compound
   response variable has only been fitted for gene expression after the
   CBT intervention (zero at baseline). As such, the compound response
   predictor reflects the difference between the two time points that can
   be attributed towards therapy responsiveness, while accounting for
   non-therapy-specific differences by including the categorical CBT
   predictor.
   [MATH: <mstyle displaystyle="true"
   scriptlevel="0"><mrow><mtable><mtr><mtd><mrow><mi>G</mi><mi>e</mi><mi>n
   </mi><mi>e</mi><mspace
   width="4pt"></mspace><mi>e</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>e</mi>
   <mi>s</mi><mi>s</mi><mi>i</mi><mi>o</mi><mi>n</mi><mo>=</mo><mi>C</mi><
   mi>B</mi><mi>T</mi><mfenced close=")" open="("><mn>0</mn><mo
   stretchy="false">/</mo><mn>1</mn></mfenced><mo>+</mo><mi>C</mi><mi>o</m
   i><mi>m</mi><mi>p</mi><mi>o</mi><mi>u</mi><mi>n</mi><mi>d</mi><mi>_</mi
   ><mi>R</mi><mi>e</mi><mi>s</mi><mi>p</mi><mi>o</mi><mi>n</mi><mi>s</mi>
   <mi>e</mi><mfenced close=")" open="("><mn>0</mn><mo
   stretchy="false">/</mo><mi>c</mi><mi>o</mi><mi>n</mi><mi>t</mi><mo>.</m
   o></mfenced><mo>*</mo><mo>+</mo><mfenced close=")"
   open="("><mn>1</mn><mo
   stretchy="false">|</mo><mi>P</mi><mi>a</mi><mi>t</mi><mi>i</mi><mi>e</m
   i><mi>n</mi><mi>t</mi></mfenced><mo>,</mo><mi>g</mi><mi>e</mi><mi>n</mi
   ><mi>e</mi><mspace
   width="4pt"></mspace><mi>w</mi><mi>e</mi><mi>i</mi><mi>g</mi><mi>h</mi>
   <mi>t</mi><mi>s</mi></mrow></mtd></mtr><mtr><mtd><mrow><mrow></mrow><mr
   ow></mrow><mo>*</mo><mi>C</mi><mi>o</mi><mi>m</mi><mi>p</mi><mi>o</mi><
   mi>u</mi><mi>n</mi><mi>d</mi><mi>_</mi><mi>R</mi><mi>e</mi><mi>s</mi><m
   i>p</mi><mi>o</mi><mi>n</mi><mi>s</mi><mi>e</mi><mspace
   width="4pt"></mspace><mi>i</mi><mi>s</mi><mspace
   width="4pt"></mspace><mi>z</mi><mi>e</mi><mi>r</mi><mi>o</mi><mspace
   width="4pt"></mspace><mi>a</mi><mi>t</mi><mspace
   width="4pt"></mspace><mi>T</mi><mn>0</mn><mspace
   width="4pt"></mspace><mi>a</mi><mi>n</mi><mi>d</mi><mspace
   width="4pt"></mspace><mi>c</mi><mi>o</mi><mi>n</mi><mi>t</mi><mi>i</mi>
   <mi>n</mi><mi>u</mi><mi>o</mi><mi>u</mi><mi>s</mi><mspace
   width="4pt"></mspace><mi>a</mi><mi>t</mi><mspace
   width="4pt"></mspace><mi>T</mi><mn>10</mn><mi>M</mi></mrow></mtd></mtr>
   </mtable></mrow></mstyle> :MATH]
   3

   Potential biomarker candidates were discovered by intersecting the
   genes significantly associated with the CTG_repeat predictor from model
   ([96]2) and the Compound_Response predictor from model ([97]3). Pearson
   correlation coefficients and the associated nominal p-values were
   calculated between the change in gene expression and the change in
   clinical score (compound response, delta-DM1-Activ-c) for these
   potential biomarkers using the corr.test function of the R package
   “psych” [[98]54].

   For the splicing analysis, the PSI values for splice exclusion (SE)
   events were extracted from the rMATS output and fitted in linear models
   similar to those for gene expression. Splice events were filtered by
   excluding exons from the analysis with less than three mapping reads
   and one junction spanning read in at least 14 samples.

   The R package ggplot2 was used for representation in volcano and
   scatter plots [[99]55]. The R package VennDiagram was used to generate
   the Venn diagram [[100]56].

Gene set enrichment analysis

   Gene set enrichment analyses have been independently implemented for
   the gene sets associated with CBT, CTG-repeat length, compound response
   scores, and the genes significantly associated with both CTG-repeat
   length and compound response using gProfiler [[101]57]. For the CBT,
   CTG-repeat length, and compound response-associated genes, the 500
   genes with the lowest nominal p-values were ordered (decreasing) based
   on their absolute regression coefficients. Subsequent enrichment
   analyses were implemented using the R client of gProfiler with the
   parameter orderd_querey = TRUE against a custom background of 10,292
   genes expressed in our samples. Multiple testing correction was based
   on the default setting “g_SCS.” We tested for enrichment (one-sided)
   pathways within the WikiPathway database. The gene set associated with
   both CTG-repeat length and compound response was based on an FDR
   threshold of 10% for the respective regression coefficients, resulting
   in a gene set of 311 genes. A regular, non-order weighted ORA
   (over-representation analysis) analysis was run for this gene set with
   ordered_quere = FALSE. Similar to the other analyses, one-sided
   (enrichment) pathway discovery was based on the WikiPathway database
   with the default setting “g_SCS” to correct for multiple testing. For
   all enrichment analyses, only significant pathways (p-adjusted < 0.05)
   are reported.

   The exact scripts and the resulting datasets of the statistical
   analyses are available via
   [102]https://github.com/cmbi/DM1_blood_RNAseq

Results

Patient sampling, procedure, and analysis of outcome measures

   For the identification of blood biomarkers that are associated with the
   clinical response to the CBT intervention, 27 patients from the
   OPTIMISTIC cohort were selected based on a random stratified sampling
   procedure. These patients reflected a uniform continuous distribution
   of therapy responses as assessed by the primary trial outcome, the
   DM1-Activ-c questionnaire.

   The sampled set consisted of 14 females and 13 males aged 19–63 years
   and represented a wide range of CTG-repeat lengths (Additional file
   [103]2: Fig. S1). All sampled patients received CBT, and
   mRNA-sequencing profiles were obtained at baseline and after 10 months
   of CBT, the primary endpoint of the OPTIMISTIC trial. Large clinical
   heterogeneity of clinical responses after CBT was observed across all
   of the different outcome measures. Figure [104]1 highlights the scaled
   differences across these outcome measures, color coded into five
   different groups (cognition and other, DM1-Activ-c, fatigue scores,
   physical assessments, and quality of life). Because of the large
   heterogeneity, we defined a compound response score. The compound
   response score is the mean of all scaled outcome measures (Fig.
   [105]1).

Marked changes in gene expression after CBT

   We first studied the molecular changes that occurred after the CBT
   intervention by comparing the mRNA expression levels in blood before
   and after the CBT intervention. We found that 560 genes were
   significantly up- or downregulated after CBT (277 genes down, 283 up,
   fold changes ranging between 0.64 and 2.35, Fig. [106]2A). Hierarchical
   clustering of patients based on the changes in these 560 genes revealed
   substantial molecular heterogeneity within this 10-month timeframe.
   There was no evident concordance between the clustering of samples
   based on changes in gene expression and the changes in DM1-Activ-c,
   6MWT, or compound response score (Fig. [107]2B). The four genes with
   the lowest p-values were GGCX, ZNF16, SERBP1, and SLC39A8 (Fig.
   [108]2C). Biological pathways significantly associated with these genes
   were limited to an immunological pathway and an electron transport
   chain in mitochondria (Table [109]1). DMPK expression did not
   significantly change over the course of the study, nor was it
   associated with the CTG-repeat length (Additional file [110]2: Fig.
   S2).

Fig. 2.

   [111]Fig. 2
   [112]Open in a new tab

   Changes in gene expression after cognitive behavioral therapy. A linear
   mixed effect model was fitted for each gene, estimating the fixed
   effect of CBT while accounting for random effects of the individual.
   The p-values for the fixed effect were estimated via Satterthwaite’s
   degrees of freedom method and FDR corrected. A Volcano plot of
   significance (−log10 of the nominal p-value) and the effect size for
   changed expression after 10 months of CBT. Genes for which the effect
   size of CBT is significant (FDR < 0.05) are visualized in black. B
   Heatmap of changes in normalized logCPM values between the baseline and
   the 10-month assessment for the 560 genes significantly associated with
   the CBT effect size (scores ranging from −3 (dark red) to +3 (dark
   blue)). Patients and genes were clustered based on the complete linkage
   method for hierarchical clustering, and values were centered and scaled
   per gene. For each patient, changes in DM1-Activ-c (delta-DM1-Activ-c)
   and Six-Minute Walk Test (delta-6MWT) as well as the compound response
   score were added (scores ranging from −1.6 (dark red) to +2.76 (dark
   blue)). Delta-DM1-Activ-c and delta-6MWT were scaled by their root mean
   square. C Expression values (logCPM) at baseline (blue) and after CBT
   (red) of the four genes with the lowest nominal p-values from panel A
   including their Pearson correlations

Table 1.

   Gene set enrichment analysis
   Query* Adjusted p-value** Term size*** Intersection size**** Term ID
   Term name Intersection with WikiPathway
   CBT_ORA 0.014 49 5 WP:WP5039 SARS-CoV-2 innate immunity evasion and
   cell-specific immune response CXCL1,PPBP,PF4,CXCR2,CXCL5
   CBT_ORA 0.037 91 8 WP:WP111 Electron transport chain (OXPHOS system in
   mitochondria) MT-ND3,ATP5F1E,MT-ND1,MT-ATP6,MT-CO2,MT-CO3,MT-ND4,MT-CO1
   CTG_ORA <0.001 46 10 WP:WP286 IL-3 signaling pathway
   FOS,CSF2RB,GAB2,HCK,LYN,MAPK3,RAF1,STAT3,STAT5B,PIK3CD
   CTG_ORA 0.001 49 10 WP:WP395 IL-4 signaling pathway
   FOS,IRS2,CEBPB,GAB2,NFIL3,MAPK3,STAT6,STAT3,STAT5B,PIK3CD
   CTG_ORA 0.002 26 3 WP:WP4564 Neural crest cell migration during
   development MMP9,EPHB4,FOS
   CTG_ORA 0.003 28 3 WP:WP4565 Neural crest cell migration in cancer
   MMP9,EPHB4,FOS
   CTG_ORA 0.004 29 7 WP:WP3972 PDGFR-beta pathway
   FOS,MAPK3,RAF1,MAP2K4,STAT6,STAT3,STAT5B
   CTG_ORA 0.006 29 2 WP:WP4808 Endochondral ossification with skeletal
   dysplasias MMP9,ALPL
   CTG_ORA 0.006 29 2 WP:WP474 Endochondral osssification MMP9,ALPL
   CTG_ORA 0.006 56 2 WP:WP2324 AGE/RAGE pathway MMP9,ALPL
   CTG_ORA 0.008 48 10 WP:WP304 Kit receptor signaling pathway
   FOS,GAB2,SH2B2,LYN,MAPK3,RAF1,STAT3,STAT5B,CRK,RPS6KA1
   CTG_ORA 0.011 37 8 WP:WP127 IL-5 signaling pathway
   FOS,CSF2RB,LYN,MAPK3,RAF1,STAT3,STAT5B,RPS6KA1
   CTG_ORA 0.013 31 7 WP:WP313 Signaling of hepatocyte growth factor
   receptor FOS,PTEN,MAPK3,RAF1,STAT3,CRK,PXN
   CTG_ORA 0.015 110 16 WP:WP3929 Chemokine signaling pathway
   CXCR2,TIAM2,HCK,NCF1,LYN,ARRB2,GNB2,PREX1,MAPK3,RAF1,STAT3,STAT5B,CRK,W
   AS,PXN,PIK3CD
   CTG_ORA 0.031 15 3 WP:WP3599 Transcription factor regulation in
   adipogenesis IRS2,CEBPD,CEBPB
   CTG_CR_ORA 0.002 110 13 WP:WP3929 Chemokine signaling pathway
   WAS,PXN,HCK,MAPK3,PREX1,ARRB2,PIK3R5,VAV1,NCF1,CXCL16,PIK3CD,GNB2,GRB2
   CTG_CR_ORA 0.006 46 8 WP:WP286 IL-3 signaling pathway
   GAB2,CSF2RB,HCK,MAPK3,PTPN6,VAV1,PIK3CD,GRB2
   CTG_CR_ORA 0.009 48 8 WP:WP304 Kit receptor signaling pathway
   GAB2,MAPK3,PTPN6,RPS6KA1,VAV1,SH2B2,JUNB,GRB2
   CTG_CR_ORA 0.016 115 12 WP:WP306 Focal adhesion
   ACTN1,PXN,HCK,MAPK3,CCND2,VASP,TLN1,VAV1,ZYX,ITGA5,PIK3CD,GRB2
   CTG_CR_ORA 0.032 31 6 WP:WP3937 Microglia pathogen phagocytosis pathway
   HCK,PTPN6,VAV1,NCF1,ITGB2,PIK3CD
   [113]Open in a new tab

   *Query refers to the gene sets used. CBT: 560 genes differentially
   expressed after the intervention; CTG: 608 genes associated with the
   CTG repeat; CTG_CR: 97 genes associated with both CTG repeat and
   compound response; ORA over-representation analysis

   **p-values associated with the finding after correction for multiple
   testing

   ***Number of genes in the identified pathway

   ****Number of genes in the provided gene set overlapping with the
   pathway

CTG-repeat length associations reflect molecular dysregulation in blood
across different studies

   Having a readily accessible fingerprint reflecting the molecular
   dysregulations associated with DM1 is potentially of high value for
   both clinical and research settings. To this end, we studied the
   correlations of blood-based gene expression with the disease-causing
   CTG-repeat length assessed at the start of the trial. The assessments
   were based on small pool PCR, Acil digestion, and Southern blot, as
   opposed to (estimations) of CTG-repeat length at birth or disease
   onset. This was done to minimize potential confounding effects related
   to the progressive nature of the disease and to assure homogeneity in
   the measurement methodology.

   Based on this approach, we identified 608 genes significantly
   associated with the CTG repeat at an FDR cutoff of 5% (474 positively,
   134 negatively, fold changes ranging between 0.76 and 1.23 per 100
   CTGs, Fig. [114]3A). The four genes with the lowest p-values RNF170,
   IRS2, NDE1, and PRIMPOL showed a clear linear relationship between the
   CTG-repeat length at baseline and expression values, both before and
   following the intervention (Fig. [115]3B). Most enriched pathways were
   related to immunological processes (IL-3, IL-4, IL-5 signaling;
   chemokine signaling pathway), yet also pathways related to
   adipogenesis, hepatocyte signaling, and AGE/RAGE were discovered.
   Interestingly, the gene MMP9 was among several of the
   CTG-repeat-associated pathways.

Fig. 3.

   [116]Fig. 3
   [117]Open in a new tab

   Gene expression levels associated with CTG-repeat length. For each
   gene, a mixed effect model was fitted with before/after CBT and
   CTG-repeat length as fixed effects, while accounting for (random)
   effects of the individual. The p-values for the fixed effects were
   estimated via Satterthwaite’s freedom method and FDR corrected. A
   Volcano plot of significance (−log10 of the nominal p-value) and effect
   size of the CTG-repeat length (per 100 CTGs) on gene expression. Genes
   for which the effect of CTG-repeat is significant (FDR < 0.05) are
   visualized in black. B For the four genes with the lowest nominal
   p-values from panel A, the gene expression values (logCPM) are plotted
   against the CTG-repeat length. Blue dots represent baseline expression
   values, and red dots expression values after CBT. The regression line
   is fitted over all values, independent of the time point. The Pearson
   correlation coefficients for the association between CTG-repeat length
   and gene expression are displayed for each gene

   To replicate and validate that these findings are disease relevant, we
   compared the CTG-repeat length effect size with effect sizes reflecting
   the differences in gene expression between DM1 and controls in various
   published datasets (Additional file [118]2: Fig. S3). We found a strong
   correlation with a study that performed mRNA sequencing on DM1 and
   control blood samples (Pearson rho: 0.59, Additional file [119]2: Fig.
   S3A [[120]25]). However, similar correlations were not observed for
   studies profiling DM1 and control tissues other than blood (heart
   [[121]15], brain [[122]12], and muscle [[123]11], Additional file
   [124]2: Fig. S3B-S3D). Neither were correlations observed with inferred
   MBNL activity or muscle strength measures from another study
   (Additional file [125]2: Fig. S3E-S3F [[126]11]). Although no
   correlations were found with the effect size of blood-based physical
   test assessments in DMD (Additional file [127]2: Fig. S3G), the
   CTG-repeat effect size did correlate well with the effect size of
   blood-based DMD body measures (Pearson rho: 0.41, Additional file
   [128]2: Fig. S3H) [[129]22]. In the latter study, comparisons with
   physical tests and body measures were independently based on the first
   principal component of a set of different clinical assessments of DMD
   patients. Thus, while DM1-related molecular dysregulations in blood can
   be validated in other studies, even from other neuromuscular disorders,
   they do not necessarily reflect expression dysregulation observed in
   other tissues.

   Since DM1 is known as a splicing disease, we also studied the
   association of the CTG-repeat length with alternative splicing events.
   Here, four events in three genes (RBM39, FLNA, and CTSZ) reached an FDR
   threshold of <5% (Additional file [130]2: Fig. S4). Given the limited
   and small effects observed, we did not further explore these
   associations.

Non-significant associations between gene expression and individual clinical
outcome measures

   Next, we studied phenotype-genotype relationships by estimating the
   associations of gene expression values with individual clinical outcome
   measures used in the OPTIMISTIC trial. Here, we found virtually no
   significant associations between gene expression and clinical outcomes.
   No significant associations after multiple testing correction were
   found for the DM1-Activ-c questionnaire (Additional file [131]2: Fig.
   S5). The four genes with the lowest nominal p-values were SREBF2,
   ZNF283, SF3B3, and GSKIP.

Significant associations with average clinical response

   To account for the heterogenic changes in individual outcome measures,
   we calculated a compound CBT response score that reflects the average
   scaled therapy response of all outcome measures used in OPTIMISTIC
   (Fig. [132]1). Noteworthy, similar ranges of change are observed for
   the variety of outcome measures due to the applied scaling. As such,
   each outcome measure contributes similarly to the compound response
   score. The compound response score can be interpreted as an estimate of
   an overall difference in capacity between the end and the start of the
   intervention. We were able to identify 1176 genes significantly (FDR <
   0.05) associated with the compound response score (384 positive, 792
   negative, Fig. [133]4A). The four hits with the lowest p-values
   (PPP1R9B, CSNK1G2, PPP6R1, FBXO48) show a clear linear relationship
   between changes in gene expression during the trial and the compound
   response score (Fig. [134]4B). No enriched pathways were identified for
   this gene set.

Fig. 4.

   [135]Fig. 4
   [136]Open in a new tab

   Gene expression association with compound response scores. For each
   gene, a mixed effect model was fitted with before/after CBT and
   compound response scores as fixed effects, while accounting for
   (random) effects of the individual. Compound response scores were
   fitted for gene expression values after CBT and set to be zero at
   baseline; the effect size of this covariate therefore expresses changes
   in gene expression compared to the baseline values that are associated
   with clinical response. The p-values for the fixed effects were
   estimated via Satterthwaite’s freedom method and FDR corrected. A
   Volcano plot of significance (−log10 of the nominal p-value) and the
   effect size of the compound response score on gene expression. Genes
   for which the effect size of compound response is significant (FDR <
   0.05) are visualized in black. B For the four genes with the lowest
   nominal p-values from panel A, the changes in gene expression (delta
   logCPM after-before CBT) are plotted against the compound response
   scores, including Pearson correlations

Clinical improvement linked to the reversal of disease-induced gene
expression

   Since we were able to identify genes significantly associated with both
   the CTG-repeat length as well as the average clinical response, we were
   interested in their intersection. Among the significant hits of both
   analyses, we found an overlap of 97 genes (Fig. [137]5A). To further
   investigate this relationship, we plotted the effect size of the
   compound response score against the effect size of the CTG-repeat
   length for each expressed gene (Fig. [138]5B). For the 97 genes
   significantly associated with both predictors, a remarkable pattern
   emerged: genes that were lower expressed in patients with a long CTG
   repeat showed an increase in expression levels in the patients with a
   good clinical response and vice versa. This anticorrelation pattern was
   confirmed by analyzing an earlier dataset comparing gene expression in
   DM1 and control blood samples [[139]25]. Here, the 97 genes showed a
   similar association with the DM1 phenotype as has been found with the
   CTG length in our study, confirming that the gene expression of
   patients with a good CBT response changed into the direction of the
   levels observed in healthy controls (Fig. [140]5C). This remarkable
   relationship could not be explained by possible confounding between
   CTG-repeat length and compound response, as the compound response
   effect size is only slightly affected by first regressing out the
   CTG-repeat length effect (Additional file [141]2: Fig. S6). The four
   genes with the lowest p-values with both the CTG-repeat length as well
   as the compound response score (either FDR < 0.025) were DNAJB12 (CTG
   Pearson rho=0.49; CRS Pearson rho=−0.43), HDAC5 (CTG rho=0.65; CRS
   rho=−0.46), TRIM8 (CTG rho=0.59; CRS rho=−0.59), and ZNF22 (CTG
   rho=−0.52; CRS rho=0.52). For 81 of the 97 candidate biomarkers, the
   Pearson correlations were nominally significant for the change in gene
   expression and the compound response score. Yet, only 2 of the 97
   Pearson correlations were nominally significant for the change in gene
   expression and the change in DM1-Activ-c scores. Enrichment analysis
   for these 97 genes resulted in mostly immunological pathways (chemokine
   and IL-3 signaling, microglia pathogen phagocytosis pathway; Table
   [142]1).

Fig. 5.

   [143]Fig. 5
   [144]Open in a new tab

   Clinical improvement is linked to normalization of expression of
   CTG-repeat-associated genes. Linear mixed effect models were fitted for
   each expressed gene, with CBT as a fixed effect, patient as a random
   effect, and either CTG repeat or compound response as the predictor.
   p-values for the regression coefficients were estimated via
   Satterthwaite’s degrees of freedom and considered significant for
   values smaller than 0.05 after FDR correction. Furthermore, differences
   in gene expression of blood samples from DM1 patients and controls were
   calculated based on an external study using a Wilcoxon signed-rank test
   on normalized, log-transformed gene counts. A Venn diagram illustrating
   the number of significant genes associated with CTG-repeat length and
   compound response, as well as their overlap (disease-relevant changes).
   B For all expressed genes, the regression coefficients of the compound
   response scores are plotted against the regression coefficients of the
   CTG-repeat lengths, including their Pearson correlation. For
   illustrative purposes, the regression coefficients of the CTG repeat
   have been multiplied by 100. Furthermore, the x-axis has been scaled
   between −0.25 and 0.25, removing 12 outliers from the figure. Colored
   in purple are the genes for which both regression coefficients were
   significant (FDR < 0.05). C For all expressed genes, the compound
   response effect size is plotted against the DM1 effect size based on an
   external study comparing blood expression profiles from DM1 patients
   and controls, including their Pearson correlation [[145]25]. For
   illustrative purposes, the x-axis has been scaled between −1.5 and 1.5,
   removing 6 outliers from the figure. Colored in purple are the same 97
   genes as in panel B

   In summary, these results suggest that for a subset of genes
   significantly associated with the biochemical phenotype caused by the
   CTG-repeat expansion a reversal of disease-induced gene expression
   occurred in clinical responders. The association with both molecular
   dysregulation and clinical response makes this subset of genes highly
   relevant for the discovery of novel therapeutic targets.

Discussion

   The purpose of this study was the identification of DM1-specific
   therapeutic biomarkers in peripheral blood. The multisystem nature of
   DM1 is known to be reflected by laboratory abnormalities of peripheral
   blood, making it together with its accessibility a promising tissue for
   biomarker studies in this disease [[146]26]. Hence, we used blood
   samples of 27 DM1 patients from the OPTIMISTIC cohort to study the
   associations of gene expression with disease severity as well as the
   response towards the CBT intervention. In an effort to fairly represent
   the whole OPTIMISTIC cohort and to facilitate the generalizability of
   the study findings, a stratified random sampling procedure was
   implemented which resulted in a balanced patient cohort with respect to
   age, CTG-repeat length, sex, therapy response, and clinical site
   distributions. Patients with an interrupted CTG repeat were excluded in
   order to limit molecular heterogeneity induced by slower disease
   progression rates [[147]5]. Nonetheless, we identified substantial
   heterogeneity in molecular expression profile changes after the
   10-month CBT intervention. Furthermore, we identified gene sets that
   were significantly associated with the disease-causing CTG repeat as
   well as with the average response towards the CBT intervention across
   different clinical outcome measures. Most interestingly, an overlap of
   97 genes among these latter two gene sets has been identified, showing
   a clear trend of more normal expression levels in clinical responders.
   Based on these different gene sets, several biological pathways
   associated with DM1 have been discovered, as well as specific
   genes/gene families with ties to neuro(-muscular) disorders.

   The OPTIMISTIC study has shown that DM1 patients significantly improve
   in their capacity for activity and social participation after the CBT
   intervention [[148]18]. It was furthermore hypothesized that CBT may
   directly or indirectly improve other biological systems affected by the
   disease. This hypothesis has been confirmed for muscles of the lower
   extremity, showing an increase in cross-sectional area as a result of
   the intervention [[149]58]. Here, we set out to further explore this
   hypothesis by investigating changes on the molecular level. These
   changes may be the result of increased physical activity, which has
   been linked to differences in gene expression in previous studies
   [[150]59], but may also be a more direct effect of the
   psychotherapeutic CBT intervention [[151]60].

   In line with the results of an earlier study, we have illustrated that
   the clinical response towards the CBT intervention was rather
   heterogenous [[152]21]. A novel addition to this finding was the
   illustration that this heterogeneity extends towards changes in
   molecular profiles within a 10-month timeframe. Importantly, this
   heterogeneity could not be explained by changes in the cellular
   composition of the blood samples between the two time points, as the
   similarity of cell type composition has been verified. Additionally,
   this heterogeneity could not be explained by changes of different
   outcome measures such as the DM1-Activ-c, 6MWT, or compound response.
   While the CBT intervention likely played a part in this heterogeneity,
   the magnitude of this contribution could not be assessed due to the
   lack of a control group. Other factors, such as aging or seasonal
   effects, may also have contributed to this finding.

   Across the different gene sets identified in this study, several of the
   genes with the lowest p-values (SLC39A8, IRS2, FBXO48) and one
   WikiPathway (transcription factor regulation in adipogenesis) were
   associated with insulin signaling or more broadly related to
   metabolism/adipogenesis [[153]61–[154]63]. Dysregulation of insulin
   signaling has been linked to clinical features of DM1 and is an
   actively ongoing field of investigation [[155]64]. Aberrant insulin
   signaling has also been found in other diseases of the nervous system
   such as depression, with indirect improvements being observed after CBT
   [[156]65]. Interestingly, the anti-diabetic drug metformin has been
   shown to improve the mobility of DM1 patients with effect sizes of the
   6MWT comparable to those observed in the OPTIMISTIC study [[157]66].
   With increasing therapeutic interest in this area, our findings suggest
   that disease-relevant insulin signaling can be studied on a molecular
   level in blood samples, highlighting the utility of peripheral blood in
   this setting.

   Similarly, across most of the gene sets, we identified several
   WikiPathways associated with immunological functions (cell-specific
   immune response, chemokine signaling pathway, IL-3, 4, and 5
   signaling). While this may be in part due to a bias introduced by the
   profiled tissue, the immune system likely plays a role in the DM1
   pathophysiology like for many other chronic diseases [[158]67]. As
   such, blood sample-based immunology studies may be an interesting field
   of future investigation.

   The intersection of the genes significantly associated with the
   disease-causing CTG repeat, as well as the average CBT response across
   different outcome measures, revealed a subset of 97 genes. These genes
   are of particular interest for the identification of therapeutic
   biomarkers, as their disease association has been confirmed in an
   external dataset and they showed normalization of expression levels in
   clinical responders. Among the genes with the lowest p-values
   associated to both CTG-repeat length as well as CBT response were
   HDAC5, DNAJB12, and TRIM8. In total, the subset of 97 genes consisted
   of two HDACs (histone deacetylases, HDAC5, HDAC7). HDACs play an
   important role in transcriptional regulation and compounds that inhibit
   HDAC enzymes are being studied for their potential effect on a range of
   human diseases, including neurological disorders [[159]68]. The DNAJB12
   protein is a member of the heat shock protein family, with some
   evidence supporting positive effects of their induction for muscular
   dystrophy and other muscle wasting conditions [[160]69]. The TRIM
   family protein TRIM72 has been shown to be an essential component of
   the cellular membrane repair in muscles, with evidence supporting some
   positive effects in mouse models of muscular dystrophy [[161]70].
   Authors of the same study suggest the potential of other TRIM family
   members as potential targets in similar disease states, which may
   support the further investigation of TRIM8 in DM1. Although mostly
   associated with therapy response, RARA and RXRA were also among the
   overlapping 97 genes. Stimulating retinoic acid signaling has been
   linked to muscle regeneration in mouse models via increased
   proliferation of fibro/adipogenic progenitor cells, highlighting the
   relevance of this pathway as another potentially DM1-relevant drug
   target [[162]71]. Taken together, these findings confirm the value of
   whole blood-based expression profiling for the discovery of therapeutic
   biomarkers in DM1.

   Interestingly, the genes significantly associated with the CTG repeat
   showed a moderate correlation with the genes associated with DMD body
   measurements of an external study. We hypothesize that these body
   measures are likely also correlated with age, which in turn reflects
   disease progression. This suggests that some of our results may
   therefore not be DM1 specific, but rather reflect non-specific
   molecular dysregulations shared across different (neuromuscular)
   disorders. This hypothesis is supported by the significant association
   between the CTG repeat with MMP9, which is known to be a non-specific
   biomarker that has for instance been linked to cardiac remodeling after
   myocardial infarction, inflammation, and DMD [[163]72, [164]73]. We
   therefore deem further exploration of shared dysregulations as highly
   valuable, as this may lead to the discovery of therapeutic targets
   relevant to a variety of diseases.

   Although DM1 is known as an alternative splicing disease, only four
   splice events have been significantly linked to the disease-causing CTG
   repeat in this study. This may be the result of relatively low DMPK
   expression in blood [[165]74] and is in line with the absence of strong
   splice aberrations in blood from DM1 patients compared to controls
   [[166]25]. DMPK’s low expression in blood cells may also explain the
   lack of concordance between our disease severity-associated gene
   expression differences observed in blood with gene expression
   differences observed in the muscle and brain. So, while whole
   blood-based transcription profiling can identify disease-relevant
   molecular dysregulations, these dysregulations do likely not fully
   reflect the dysregulations observed in other tissue types. Yet, we
   found a high correlation of the CTG-repeat effect size with the DM1
   phenotype effect size of a different blood-based study, as well as with
   a principal component derived body measure association of a DMD-based
   study. While the former validates our findings, the latter suggests the
   possibility of shared, disease-relevant, dysregulations across
   different neuro(muscular) disorders detectable in peripheral blood. If
   true, associated pathways might reveal highly interesting targets for
   drug discovery, as they may have a positive influence on multiple
   diseases.

Limitations of this study

   To find disease-relevant gene expression in blood, we searched for
   linear associations with the disease-causing CTG-repeat length. While
   the CTG-repeat length is thought to be the main driver of molecular
   dysregulation, associations between the progenitor allele length of the
   CTG repeat with several clinical outcome measures, including
   DM1-Activ-c and 6MWT, have been found to be only weak-moderate
   [[167]13]. In line with the previously published challenges to directly
   relate gene expression to clinical phenotypes, we were not able to find
   significant, direct associations between clinical outcome measures and
   gene expression. Still, among the genes with the lowest p-values for
   the DM1-Activ-c questionnaire was GSKIP, a gene encoding for an
   inhibitor protein of the known DM1 drug target GSK3-β
   [[168]75–[169]77]. Given the biological relevance of this finding, we
   deem it likely that the current study design was underpowered to study
   the association of gene expression with individual clinical outcome
   measures, especially when taking clinical and molecular heterogeneity
   into account.

   The clinical heterogeneity in therapy response may in part be explained
   by the personalized nature of the CBT intervention, with therapy foci
   being tailored towards the needs and wishes of the individual patient.
   As a consequence, one might expect different outcome measures to be
   more appropriate for CBT efficacy assessments for different patients.
   Yet, the identification of molecular signatures associated with therapy
   response necessitates the use of the same clinical outcome measure. For
   this reason, and to average out some of the uncertainty inherently
   associated with the recording of the different outcome measures, we
   settled on the use of the compound response score. While the scaling
   assured a more or less equal contribution of each outcome measure to
   this score, we acknowledge that this combined score is biased by the
   outcome measures that were used in OPTIMISTIC.

   Even though we statistically corrected for non-specific molecular
   changes between the two time points, the lack of RNA-seq profiles from
   the OPTIMISTIC control arm makes it difficult to state with certainty
   that the observed molecular changes are due to the therapy itself.
   However, this does not discount their value as potential therapeutic
   targets, as they are, regardless of the mediation factor, significantly
   associated with improved clinical status. Moreover, for this reason, we
   deemed studying the RNA-seq profiles of the OPTIMISTIC control arm to
   be less valuable, as these patients did not significantly clinically
   declined over the 10-month timeframe [[170]18].

Conclusions

   Starting from DM1-specific disease determinants, the OPTIMISTIC study
   has shown that patient-tailored CBT can increase the health status of
   DM1 patients by improving social participation and activity. It was
   furthermore hypothesized that the CBT intervention positively
   challenges the biological system, which has already been confirmed by
   increased cross-sectional area for muscles of the lower extremities.
   Making use of the clinical heterogeneity in therapy response, we here
   additionally confirmed disease-relevant molecular changes in peripheral
   blood. Not only do our results highlight the utility of peripheral
   blood to study the multisystem nature of the disease, but also
   generated the foundation for an upcoming, multi-omics-based drug
   repurposing study.

Supplementary Information

   [171]12916_2022_2591_MOESM1_ESM.docx^ (18.9KB, docx)

   Additional file 1: Table S1. PCR duplicates.
   [172]12916_2022_2591_MOESM2_ESM.zip^ (12.6MB, zip)

   Additional file 2: Figure S1. Patient characteristics. Box-plots
   illustrating the distribution of age (A) and CTG-repeat length (B) of
   the 27 patients at baseline, separated by sex. C) Box-plots
   illustrating the change in DM1-Activ-c score after the intervention
   versus before, as expressed in Delta DM1-Activ-c scores, separated by
   sex. Figure S2. DMPK expression. Panel A shows the expression values of
   DMPK in logCPM before and after the CBT intervention for all 27
   patients. Panel B shows the association between the CTG repeat length
   and the change in DMPK expression, as calculated by expression levels
   after the intervention minus the expression levels at the start of the
   trial. In addition, the Pearson correlation of this association is
   shown. Figure S3. Comparison of gene expression associated to DM1 in
   other studies with that to the CTG repeat in this study. The mean
   differences in normalized gene counts were calculated between DM1 and
   control samples for four studies comparing blood (A (25, 44)), heart (B
   (15, 42)), brain (C (12, 43)) and tibialis muscle (D, (11, 41)) and
   plotted against the effect sizes for the CTG-repeat in this study
   (ReCognitION) for genes that were measured in both studies. In E and F,
   the correlation of expression to the inferred MBNL activity and muscle
   strength(11), was compared to our effects for the CTG-repeat. In G and
   H, gene expression associations in blood of the results of physical
   tests and several body measurements for Duchenne muscular dystrophy
   (DMD) patients (22) are compared to CTG-repeat associations from our
   study. In (22), associations with physical tests and body measurements
   were based on the first principal component over associated measures,
   each reflecting respectively 78% and 70% of the associated measurements
   variances. Depicted on the top left in each graph is the Pearson
   correlation coefficient for the plotted values with the associated
   p-values. Figure S4. Splice exclusion and CTG-repeat length. PSI values
   for splice exclusion events were determined using rMATS [. For each of
   the PSI values a linear mixed effect model was fitted with the modal
   CTG repeat length as covariate and patient as random effect. A) Volcano
   plot of significance (-log 10 of the nominal p-values of the modal
   CTG effect size) and the CTG effect size for the PSI values.
   Significant results after FDR correction (p < 0.05) are marked in
   black. B) For the four PSI values with the lowest nominal p-values from
   A, the PSI values are plotted against the modal CTG repeat lengths
   before (blue) and after the CBT intervention (red) including the
   Pearson correlation coefficients. Figure S5. Gene expression
   association with DM1-Activ-c. For each gene a mixed effect model was
   fitted with before/after CBT and DM1-Activ-c scores as fixed effects,
   while accounting for (random) effects of the individual. The p-values
   for the fixed effects were estimated via Satterthwaite’s freedom method
   and FDR corrected. A) Volcano plot of the significance (-10log of the
   nominal p-value) and effect size of the DM1-Activ-c scores on
   gene expression. B) For the four genes with the lowest nominal p-values
   from panel A, the DM1-Activ-c scores are plotted against the gene
   expression values (logCPM). Blue dots represent baseline values, red
   dots values after CBT. The regression line indicates the linear
   association independent of the timepoints. Similarly, the Pearson
   correlation coefficients shown are independent of the
   timepoints. Figure S6. Shared explained variance among CTG-repeat and
   Compound Response predictors. To assess the overlap in gene expression
   level variance explained by the CTG-repeat length and the Compound
   Response score, the Compound Response score was fitted on the residuals
   of the CTG-repeat length as fixed effect, while accounting for random
   effects of the patient. A) The effect sizes of the Compound Response
   score as estimated on the CTG-repeat model residuals are
   plotted against the effect sizes Compound Response scores as presented
   in this study. The Rho score reflects the Pearson correlation
   coefficient. B) Analogous to Figure 5B, the Compound Response score
   effect size as estimated on the CTG-repeat model residuals are plotted
   against the CTG repeat effect size size as estimated on the CTG-repeat
   model residuals are plotted against the CTG repeat effect size scaled
   between -0.25 and 0.25, resulting in the removal of 12 outliers.
   Colored in purple are the same 97 genes as in Figure 5B.

Acknowledgements