Abstract
Recent genome-wide association studies (GWASs) of several individual
sleep traits have identified hundreds of genetic loci, suggesting
diverse mechanisms. Moreover, sleep traits are moderately correlated,
so together may provide a more complete picture of sleep health, while
illuminating distinct domains. Here we construct novel sleep health
scores (SHSs) incorporating five core self-report measures: sleep
duration, insomnia symptoms, chronotype, snoring, and daytime
sleepiness, using additive (SHS-ADD) and five principal
components-based (SHS-PCs) approaches. GWASs of these six SHSs identify
28 significant novel loci adjusting for multiple testing on six traits
(p < 8.3e-9), along with 341 previously reported loci (p < 5e-08). The
heritability of the first three SHS-PCs equals or exceeds that of
SHS-ADD (SNP-h^2 = 0.094), while revealing sleep-domain-specific
genetic discoveries. Significant loci enrich in multiple brain tissues
and in metabolic and neuronal pathways. Post-GWAS analyses uncover
novel genetic mechanisms underlying sleep health and reveal connections
(including potential causal links) to behavioral, psychological, and
cardiometabolic traits.
Subject terms: Genome-wide association studies, Sleep disorders
__________________________________________________________________
Data-driven composite sleep health scores, combining self-reported
sleep duration, snoring, chronotype, insomnia, and sleepiness, provide
heritable, interpretable phenotypes and novel GWAS discoveries
elucidating regulatory pathways.
Introduction
Sleep is an essential biological process, orchestrated by interrelated
neurologic and physiologic regulatory processes, responding to
individual, social, and environmental influences^[75]1–[76]3. Positive
sleep traits have been associated with lower rates of cardiometabolic
and neuropsychiatric diseases, as well as higher productivity and
well-being^[77]4. Moreover, general sleep health has come into recent
focus as a consequential and modifiable health factor, with the
combined presence of multiple healthy sleep factors frequently being a
stronger predictor of positive health outcomes^[78]5,[79]6. As a
composite, sleep health is recognized to involve multiple domains,
including regularity, satisfaction, alertness, timing, efficiency, and
duration^[80]1.
Several recent studies leveraging biobank-scale data have resulted in
well-powered genome-wide association studies (GWASs) of sleep
phenotypes, capturing several aspects of sleep health, including
self-reported sleep duration^[81]7, insomnia^[82]8,[83]9,
sleepiness^[84]10, snoring^[85]11, and chronotype^[86]12. These GWASs
have begun to elucidate the genetic architecture of sleep, while
revealing the presence of widespread genetic correlations across both
sleep and related neuropsychiatric and cardiometabolic
traits^[87]2,[88]3. Associated genomic loci and pathways are often
shared across multiple sleep traits, suggesting a shared genetic basis
and co-regulated processes. Therefore, a more complete and robust
understanding of sleep may be achieved by describing patterns across
multiple traits, pointing toward underlying domains, highlighting the
potential utility in analyzing composite sleep health scores (SHSs).
Recently, an additive sleep health score (SHS-ADD) consisting of five
self-reported sleep behaviors was studied in unrelated individuals in
the UK Biobank (UKB), yielding new genetic findings^[89]13. However,
additive scores compress data across multiple domains to a single
metric, resulting in potential information loss, increased genetic
heterogeneity, and weaker signal for genetic analyses.
Prior data-driven phenotyping of complex traits and correlated disease
conditions have often relied on principal components analysis (PCA).
The PCA algorithm is well-recognized for its ability to sensibly
restructure multi-dimensional phenotypes, by constructing statistically
independent linear composites, or principal components (PCs),
sequentially prioritized in terms of variance explained. The
construction of PCs is informed by underlying relationships across
traits, with correlated traits tending to be combined, given the
independence constraint, since a composite of correlated traits will
explain more of the remaining total variance. In the broader field of
genomics, PCA has been effectively applied as a dimension reduction
technique to derive composite phenotypes and to refine and improve
investigations of the genetic architecture of complex traits, with
applications to both subject-level phenotypic data and GWAS summary
statistics. Multivariate analyses with PCA have been used for various
target phenotypes in genetic analyses, including recently in cardiac
imaging^[90]14, anthropometric traits reflecting body shape^[91]15,
metabolic profiling^[92]16 and lipid profiling^[93]17.
In this study, we expanded the prior UKB SHS-ADD GWAS to a larger UKB
sample and constructed five novel PC-derived SHSs as linear
combinations of the same five underlying traits. We conjectured that,
when compared with SHS-ADD, SHS-PCs would create more precisely
targeted phenotypes, resulting in potentially greater heritability, as
well as distinct and interpretable domain-specific associations in
secondary analyses.
Results
Sleep health score construction in UKB
The study population consisted of 413,904 UKB participants of European
ancestry with complete sleep and genomics data (“Methods”). Sample
characteristics are provided in Supplementary Data [94]1. Self-reported
sleep traits (sleep duration, insomnia, chronotype, snoring, and
daytime sleepiness) derived from the baseline questionnaire were used
to construct SHS traits (“Methods”). Briefly, SHS-ADD was
operationalized as in previous UKB studies^[95]5,[96]13, defined as the
sum of five dichotomized positive sleep health characteristics: sleep
duration of 7–8 h, morning chronotype preference, no snoring,
infrequent insomnia symptoms and infrequent daytime sleepiness
(“Methods”). SHS-PCs were extracted from the same underlying sleep
traits on their original integer scales, treated as linear continuous
measures (after being mean-centered and variance-standardized), and then
oriented to positively correlate with self-assessed overall health.
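The two constructions described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's pipeline: the trait codings, the dichotomization cut-offs for insomnia, sleepiness, and chronotype, and the synthetic health rating are all assumptions introduced here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for the five self-reported traits (hypothetical codings).
duration = rng.integers(4, 11, size=n)    # hours per night, 4..10
insomnia = rng.integers(1, 4, size=n)     # 1 = never/rarely ... 3 = usually
chronotype = rng.integers(1, 5, size=n)   # 1 = definite morning ... 4 = definite evening
snoring = rng.integers(0, 2, size=n)      # 0 = no, 1 = yes
sleepiness = rng.integers(1, 4, size=n)   # 1 = never/rarely ... 3 = often

# SHS-ADD: sum of five dichotomized healthy-sleep indicators (range 0-5).
shs_add = (
    ((duration >= 7) & (duration <= 8)).astype(int)
    + (chronotype <= 2).astype(int)       # assumed cut-off for morning preference
    + (snoring == 0).astype(int)
    + (insomnia == 1).astype(int)         # assumed cut-off for "infrequent"
    + (sleepiness == 1).astype(int)       # assumed cut-off for "infrequent"
)

# SHS-PCs: PCA on mean-centered, variance-standardized traits.
X = np.column_stack([duration, insomnia, chronotype, snoring, sleepiness]).astype(float)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = (Z.T @ Z) / (n - 1)                # Pearson correlation matrix
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # re-sort by variance explained
eigvals, loadings = eigvals[order], eigvecs[:, order]
scores = Z @ loadings                     # one column per SHS-PC
var_explained = eigvals / eigvals.sum()

# Orient each PC to correlate positively with a health rating (synthetic here).
overall_health = rng.normal(size=n)
for k in range(scores.shape[1]):
    if np.corrcoef(scores[:, k], overall_health)[0, 1] < 0:
        scores[:, k] *= -1
        loadings[:, k] *= -1
```

Because the PCs are eigenvectors of the correlation matrix, the resulting score columns are mutually uncorrelated, and flipping the sign of a component changes its orientation but not the variance it explains.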
Several findings emerged in constructing the Sleep Health Score
Principal Components (SHS-PCs) in the UK Biobank, providing context and
guiding interpretation. SHS-PCs 1–5 individually explained from 25.2%
to 14.9% of the phenotypic variation (Fig. [97]1 and Supplementary
Data [98]2). Based on their PC loadings (Fig. [99]1 and Supplementary
Data [100]2), higher scores on SHS-PCs are interpreted as follows –
SHS-PC1: longer sleep with less-frequent insomnia symptoms and
sleepiness; SHS-PC2: healthier sleep with less-frequent sleepiness and
without snoring (i.e., without symptoms of sleep apnea syndrome);
SHS-PC3: morningness chronotype; SHS-PC4: snoring with less frequent
sleepiness; SHS-PC5: shorter sleep duration with less-frequent insomnia
symptoms. Substantial non-normality was observed for SHS-PC2, SHS-PC3,
and SHS-PC4 (Supplementary Fig. [101]1), resulting from the underlying
trait distributions (“Methods”). The direction and loadings of the PCs
follow from underlying covariance and corresponding Pearson
correlations among the self-report sleep traits (Supplementary
Data [102]3): SHS-PC1 was driven primarily by correlations between
sleep duration and insomnia (r = −0.24) and between insomnia and
sleepiness (r = 0.09); SHS-PC2 was driven by the correlation between
sleepiness and snoring (r = 0.08); SHS-PC3 loaded on chronotype, which
was independent of the other underlying traits; whereas SHS-PC4 and
SHS-PC5 appear to be driven largely by the PC independence constraint,
such that they loaded on both positive and negative sleep attributes;
this does not imply that these combinations constitute clusters in the data.
Fig. 1. Loadings and variance explained for the five principal component
(PC)-based composite sleep health scores (SHS).
a SHS-PC loadings on self-reported sleep phenotypes. Interpretations of
higher SHS-PC scores, based on loadings, are as follows. PC1: longer
sleep with less-frequent insomnia and lower sleepiness; PC2: absence of
snoring and lower sleepiness; PC3: morningness chronotype; PC4: presence
of snoring and less-frequent sleepiness; PC5: shorter sleep and
less-frequent insomnia. Radar plots display loading magnitudes (radial distance). Red
dots: positive loadings; Blue dots: negative loadings. b Percent of the
phenotypic variance explained by each sleep health score. SHS sleep
health score, PC principal component.
The interpretation of the SHSs was further clarified by their Spearman
rank correlations (r[s]) with objective traits not used in their
construction (Supplementary Figs. [105]2a and [106]3a). For example,
SHS-PC1 was the SHS trait most positively correlated with
accelerometry-based sleep duration (r[s] = 0.09) and sleep efficiency
(r[s] = 0.08) metrics. SHS-PC2 was positively correlated with higher
sleep efficiency (r[s] = 0.06), lower daytime inactivity
(|r[s]| = 0.10), lower BMI (|r[s]| = 0.19), and being female
(|r[s]| = 0.17), features which suggest the absence of sleep apnea.
SHS-PC3 correlated with earlier accelerometry-based measures of maximum
activity timing (|r[s]| = 0.29) and sleep midpoint (|r[s]| = 0.27),
both measurements related to circadian timing. Notably, SHS-PC4 showed no
significant correlations with self-assessed overall health (p > 0.05)
but was nonetheless correlated with other measures of overall health,
including lower levels of self-reported disability (|r[s]| = 0.06) and
lower numbers of treatments/medications taken (|r[s]| = 0.05). Notably,
SHS-ADD had the strongest association with self-reported overall health
(|r[s]| = 0.21), as well as with the accelerometry-derived sleep
regularity index (|r[s]| = 0.10).
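For readers less familiar with the statistic used above, Spearman's rank correlation is the Pearson correlation of the ranked values (with tied values sharing their average rank). The sketch below is a generic illustration in NumPy; the function names are hypothetical and it is not taken from the study's code.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over the run of values tied with sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

# A monotone but non-linear relationship still gives a perfect rank correlation.
rho = spearman([1, 2, 3, 4], [1, 4, 9, 16])
```

Rank-based correlation is a natural choice here because several of the SHS traits are markedly non-normal and the self-report items are ordinal.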
Genome-wide association analysis
We performed GWAS for the six SHS traits using linear mixed regression
models, adjusting for age, sex, genotyping array, ten genetic PCs, and a
genetic relatedness matrix (“Methods”). We identified 31,188
genome-wide significant (GWS) SNPs (p < 5e-8), resulting in 45 loci for
SHS-ADD (SNP-h^2 = 0.094), and 91, 48, 166, 26, and 24 loci for
SHS-PC1-5 (SNP-h^2 = 0.117, 0.093, 0.153, 0.070, and 0.068),
respectively (Fig. [107]2a, Table [108]1, Supplementary Figs. [109]4
and [110]5, and Supplementary Data [111]4 and [112]5; “Methods”).
Functional annotation of all SNPs in linkage disequilibrium (LD;
r^2 ≥ 0.6) with the lead SNPs in the risk loci was performed using
FUMA^[113]18 (“Methods”; Fig. [114]2e, h).
Fig. 2. Genome-wide significant SNPs associated with SHS.
a Number of Genome-wide significant (GWS) SNPs (p < 5e-08) and risk
loci (“Methods”). b Number of risk variants that colocalized for each
pair of SHS (HyPrColoc; “Methods”). c Number of loci not reported by
previous sleep GWASs in biobanks (GWS loci with a lead variant at least
500 kb from any of the previously published GWS sleep variants;
“Methods”). d Number of loci reported by previous sleep GWASs in
biobanks (lead variant within 500 kb of published GWS sleep variants;
“Methods”). e Distribution of Functional consequences (FUMA^[117]18;
“Methods”) of all annotated SNPs in LD with independent GWS SNPs by
SHS; 3.2% of the annotated SNPs were in functional regions (exon, UTR,
and splice site). f Regulome DB score distribution (FUMA; “Methods”) of
all annotated SNPs in LD with independent GWS SNPs by SHS; 3.4% of the
annotated SNPs were in regulatory regions with Regulome DB score < 2. g
CADD score distribution (FUMA; “Methods”) of all annotated SNPs in LD
with independent GWS SNPs by SHS; 6.8% of the annotated SNPs had a likely
deleterious effect (CADD score > 10). h Chromatin state distribution
(FUMA; “Methods”) of all annotated SNPs in LD with independent GWS SNPs
by SHS; 74% of the annotated SNPs were in open chromatin regions with a
minimum chromatin state between 1 and 7. OSA obstructive sleep apnea,
RLS restless leg syndrome, SHS sleep health score, UTR5 5′ untranslated
region, UTR3 3′ untranslated region, ncRNA non-coding RNA.
Table 1.
Significant novel loci associated with SHS with a lead variant at least
500 kb from any of the previously published genome-wide significant
sleep variants
SHS | Loc. | SNP | Chr. | Position (GRCh37) | Nearest gene(s) | Alleles (E/A) | EAF | INFO | BETA | SE | P
ADD | 1 | rs75607302 | 4 | 159677732 | PPID/FNIP2 | A/AT | 0.428 | 0.977 | −0.013 | 0.002 | 2.8E-09
ADD | 2 | rs10808575 | 8 | 130062388 | AC068570.1/LINC00977 | T/C | 0.733 | 0.997 | 0.014 | 0.002 | 2.8E-09
ADD | 3 | rs12257317 | 10 | 19457469 | UBE2V2P1/MALRD1 | A/G | 0.685 | 0.997 | 0.013 | 0.002 | 5.7E-09
ADD | 4 | rs7924036 | 10 | 65191645 | JMJD1C | G/T | 0.497 | 1.000 | −0.012 | 0.002 | 7.4E-09
ADD | 5 | rs72896891 | 18 | 42632654 | SETBP1 | A/T | 0.833 | 0.993 | −0.018 | 0.003 | 6.8E-10
PC1 | 6 | rs12759956 | 1 | 18432831 | RP11-174G17.2/IGSF21 | T/A | 0.713 | 0.995 | 0.017 | 0.003 | 4.0E-10
PC1 | 7 | rs12470733 | 2 | 200968215 | C2orf47/SPATS2L | C/A | 0.798 | 0.993 | −0.019 | 0.003 | 4.6E-10
PC1 | 8 | rs1571582 | 9 | 103663962 | RP11-394D2.1/RP11-62L10.1 | T/C | 0.498 | 0.997 | −0.016 | 0.002 | 3.4E-11
PC1 | 9 | rs201449027 | 12 | 9142784 | KLRG1/RP11-259O18.4 | TG/T | 0.561 | 0.965 | 0.016 | 0.002 | 1.4E-10
PC1 | 10 | rs4559781 | 13 | 28303803 | NPM1P4/GSX1 | C/T | 0.153 | 0.994 | 0.020 | 0.003 | 2.1E-09
PC1 | 11 | rs139221256 | 15 | 85357857 | ZNF592/ALPK3 | T/TA | 0.749 | 0.991 | 0.016 | 0.003 | 5.8E-09
PC1 | 12 | rs12601771 | 17 | 4108822 | ANKFY1 | G/A | 0.433 | 0.995 | 0.015 | 0.002 | 5.6E-10
PC1 | 13 | rs8074498^a | 17 | 79954544 | ASPSCR1 | T/A | 0.418 | 0.983 | 0.016 | 0.002 | 1.2E-10
PC1 | 14 | rs9610500 | 22 | 22221167 | MAPK1 | A/G | 0.634 | 0.983 | −0.015 | 0.003 | 4.8E-09
PC2 | 15 | rs2821226 | 1 | 203517292 | OPTC/ATP2B4 | A/G | 0.473 | 0.984 | −0.013 | 0.002 | 5.2E-09
PC2 | 16 | rs17559978 | 7 | 84677860 | SEMA3D | G/A | 0.688 | 0.990 | 0.015 | 0.002 | 2.3E-10
PC2 | 17 | rs11111069 | 12 | 102271962 | DRAM1 | C/G | 0.791 | 0.995 | −0.016 | 0.003 | 2.4E-09
PC2 | 18 | rs113851179 | 16 | 1733479 | LA16c-431H6.6/HN1L | C/CT | 0.925 | 0.987 | 0.026 | 0.004 | 8.6E-10
PC2 | 19 | rs12979056 | 19 | 17862131 | FCHO1 | G/A | 0.542 | 0.992 | −0.014 | 0.002 | 1.5E-09
PC2 | 20 | rs3788337 | 22 | 23412017 | RTDR1 | G/A | 0.647 | 0.993 | −0.014 | 0.002 | 6.5E-09
PC3 | 21 | rs56049037 | 7 | 32947201 | AVL9 | G/A | 0.713 | 0.988 | 0.016 | 0.002 | 1.3E-11
PC3 | 22 | 8:11053467-GA:G | 8 | 11053467 | XKR6 | GA/G | 0.555 | 0.914 | 0.014 | 0.002 | 1.6E-09
PC3 | 23 | rs11494758 | 12 | 9116542 | KLRG1 | C/T | 0.622 | 0.997 | −0.014 | 0.002 | 3.4E-10
PC3 | 24 | rs71272625 | 15 | 78166843 | LINGO1/CSPG4P13 | C/CT | 0.331 | 0.931 | 0.014 | 0.002 | 8.8E-10
PC3 | 25 | rs11373181 | 22 | 42705672 | TCF20 | A/AC | 0.482 | 0.990 | −0.013 | 0.002 | 3.6E-10
PC4 | 26 | 13:58551593-GA:G | 13 | 58551593 | PCDH17/RNA5SP30 | GA/G | 0.763 | 0.993 | −0.016 | 0.002 | 2.4E-11
PC5 | 27 | rs138572890^b | 16 | 50264953 | PAPD5 | C/CTTTA | 0.929 | 0.968 | 0.023 | 0.004 | 7.7E-10
PC5 | 28 | 19:59007970-CA:C | 19 | 59007970 | SLC27A5 | CA/C | 0.366 | 0.948 | −0.013 | 0.002 | 7.0E-11
The significance threshold was adjusted for multiple testing across six
traits (p < 8.3 × 10^−9).
Loc. locus, Chr. chromosome, E/A effect and alternative alleles, EAF
effect allele frequency, INFO imputation quality score.
^aMissense variant.
^b3′ UTR variant.
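The multiple-testing threshold quoted in the table footnote follows from a Bonferroni correction of the conventional genome-wide significance level across the six SHS traits. The short check below reproduces that arithmetic and applies it to a few P-values copied from Table 1; the variable names are illustrative only.

```python
GWS_ALPHA = 5e-8   # conventional genome-wide significance level
N_TRAITS = 6       # SHS-ADD plus SHS-PC1 to SHS-PC5

# Bonferroni-adjusted threshold: 5e-8 / 6, reported as p < 8.3e-9.
threshold = GWS_ALPHA / N_TRAITS

# A few lead-variant P-values taken from Table 1.
table1_p = {
    "rs75607302": 2.8e-09,
    "rs7924036": 7.4e-09,
    "rs3788337": 6.5e-09,
    "rs56049037": 1.3e-11,
}
passes = {snp: p < threshold for snp, p in table1_p.items()}
```

Even the largest P-value in Table 1 (7.4E-09 for rs7924036) falls below the adjusted threshold, consistent with all 28 listed loci being classed as novel.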
To determine novelty, we compared the 400 unique GWS loci against those
found to be GWS in prior biobank GWASs of individual sleep traits and
the previously developed SHS-ADD in the UKB (“Methods”; Fig. [120]2c,
d; Supplementary Data [121]5). This identified 59 unreported
SHS-associated GWS loci with a lead variant at least 500 kb away from
previously reported sleep variants (Supplementary Data [122]4). Of
these unreported loci, 28 passed a stricter significance threshold
(p < 8.3e-9) accounting for multiple testing on six traits, which are
defined as novel loci (Table [123]1). Locus zoom plots of the 28 novel
loci are shown in Supplementary Fig. [124]6. Among the 28 novel loci,
two were independent (r^2 < 0.1) but within 50 kb of one another:
rs201449027 (associated with SHS-PC1) and rs11494758 (with SHS-PC3),
both in the KLRG1 locus (a gene with reported immune system
function^[125]19). The other 26 loci were distinctly associated with
one SHS trait. In addition, there were two functional variants among
the lead SNPs: First, SHS-PC1 associated with rs8074498, a missense
variant in ASPSCR1, a gene regulating GLUT4 in glucose sequestration
and transportation in response to insulin. Second, SHS-PC5 associated
with rs138572890 in the 3’ UTR of PAPD5, a non-canonical poly(A)
polymerase involved in the surveillance and degradation of aberrant
RNAs, including the glucose transporter GLUT1. Functional annotation
for the lead SNPs at the 28 novel and 31 additional unreported loci is
provided in Supplementary Data [126]6 and [127]7.
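The novelty rule described above (a lead variant at least 500 kb from every previously reported sleep variant) and the resulting locus counts can be sketched as below. The list of known variants is hypothetical and serves only to exercise the distance check; the two lead-variant positions are taken from Table 1.

```python
WINDOW_BP = 500_000

def is_unreported(chrom, pos, known_variants, window=WINDOW_BP):
    """True if no known variant on the same chromosome lies within `window` bp."""
    return all(
        k_chrom != chrom or abs(k_pos - pos) >= window
        for k_chrom, k_pos in known_variants
    )

# Hypothetical previously reported variants (chromosome, GRCh37 position).
known = [("12", 9_800_000), ("4", 159_300_000)]

# Lead variants from Table 1: rs201449027 (chr12) and rs75607302 (chr4).
chr12_lead_unreported = is_unreported("12", 9_142_784, known)
chr4_lead_unreported = is_unreported("4", 159_677_732, known)

# Locus bookkeeping using the counts reported in the text.
loci_per_trait = {"SHS-ADD": 45, "SHS-PC1": 91, "SHS-PC2": 48,
                  "SHS-PC3": 166, "SHS-PC4": 26, "SHS-PC5": 24}
total_loci = sum(loci_per_trait.values())          # 400
unreported = 59
novel = 28
previously_reported = total_loci - unreported      # 341
additional_unreported = unreported - novel         # 31
```

With these made-up known variants, the chromosome 12 lead is more than 500 kb from its nearest neighbor and counts as unreported, while the chromosome 4 lead lies within 500 kb of one and would be treated as previously reported.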
Among all 400 GWS loci, there were 62 loci colocalizing SHS traits
(Fig. [128]2b), including 18 loci colocalizing multiple SHS-PCs,
suggesting the presence of instances of pleiotropy across SHS-PC traits
(“Methods”; Fig. [129]2b and Supplementary Data [130]8). Several
variants colocalized across multiple SHS and were previously reported
in multiple sleep GWASs. For example, rs113851554 at MEIS1 (colocalizing
SHS-ADD, SHS-PC1, SHS-PC3, and SHS-PC5) was reported in GWASs of
insomnia, chronotype, and restless leg syndrome; rs2863957 at PAX8
(colocalizing SHS-PC1, SHS-PC2, and SHS-PC5) was reported for sleep
duration and insomnia; and rs1421085 at FTO (colocalizing SHS-PC3 and
SHS-PC4), a widely recognized obesity gene, was reported for sleep
duration, chronotype, snoring, and OSA. In addition, SHS-PC4 and
SHS-PC5 colocalized at rs576981040 in a novel locus containing
TNFRSF14, a gene involved in T-cell activation and signaling. The
infrequent colocalization of loci associated with SHS-PC traits is
consistent with their being largely genetically and phenotypically
independent (Supplementary Figs. [131]2b and [132]3b).
Genetic overlap with individual sleep traits
Genetic correlations between SHS traits and individual sleep and
accelerometry traits were qualitatively similar to the analogous
phenotypic correlations described above, though often stronger
(Supplementary Fig. [133]2). GWS SHS loci and their corresponding
genetic risk scores (GRS; “Methods”) were associated with underlying
self-reported and accelerometry-derived sleep traits, largely in
keeping with expectation, based on the construction of the SHS traits
(Supplementary Data [134]9 and [135]10, Supplementary Note [136]1).
Conversely, approximately 50% of 1039 GWS loci previously reported for
SHS-ADD or individual sleep traits were associated with one or more SHS
traits (p < 5e-8; Supplementary Data [137]11 and [138]12).
Sensitivity analysis
We performed 22 distinct sensitivity analyses (“Methods”), for each of
the 400 GWS loci across the six SHS traits in unrelated individuals
(n = 308,902), adjusting for various factors or restricting to one of
three subsets (males-only, n = 145,186; females-only, n = 163,716;
healthy-only, n = 115,297). Specific covariate adjustments led to
modest average attenuation (<15%) in SHS genetic effects across the GWS
loci (Supplementary Data [139]13 and Supplementary Fig. [140]7). For
example, effects were attenuated after adjustment for adiposity measures
in SHS-PC2 (9.3%) and SHS-ADD (7.9%), after adjustment for mood variables
in SHS-ADD (14.1%) and SHS-PC1 (9.6%), and in a healthy subset without
chronic diseases in SHS-PC1 (13.5%). Nearly all individual loci remained
nominally significant (p < 0.05) with similar effect sizes (change < 2
standard errors [SE]) after additional adjustment.
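One plausible way to quantify the attenuation summarized above is the relative change in effect size between the base and adjusted models, averaged across loci, together with the absolute change expressed in standard-error units. The exact definitions used in the study are given in its Methods and may differ; the formulas and effect sizes below are assumptions for illustration.

```python
import numpy as np

def percent_attenuation(beta_base, beta_adj):
    """Relative shrinkage of the adjusted effect toward zero, in percent."""
    return 100.0 * (beta_base - beta_adj) / beta_base

def change_in_se_units(beta_base, beta_adj, se_base):
    """Absolute change in effect size, expressed in base-model SEs."""
    return np.abs(beta_adj - beta_base) / se_base

# Hypothetical per-locus effects before and after covariate adjustment.
beta_base = np.array([0.020, -0.015, 0.030])
beta_adj = np.array([0.018, -0.0135, 0.027])
se_base = np.array([0.003, 0.002, 0.004])

attenuation = percent_attenuation(beta_base, beta_adj)
mean_attenuation = attenuation.mean()
se_change = change_in_se_units(beta_base, beta_adj, se_base)
```

In this toy example each effect shrinks by about 10% and moves by less than one standard error, the pattern the text describes as modest attenuation with similar effect sizes.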
Replication and validation analyses
We tested the replication of the 400 GWS loci in the Hispanic Community
Health Study/Study of Latinos (HCHS/SOL; n = 11,144) using the same SHS
PC loadings in UKB. Notably, HCHS/SOL differs in sample size and
continental ancestry, compared with UKB, as well as in the distribution
of key variables, such as age and sleep duration (“Methods”;
Supplementary Data [141]14). Chronotype was only collected in a subset
of the HCHS/SOL cohort and was imputed to the entire sample
(“Methods”). Of the 400 loci tested, 12 demonstrated nominal
significance (P < 0.05) and consistent directional effects compared to
UKB (Supplementary Data [142]15). However, we were able to validate the
polygenic risk score (PRS) constructed using genome-wide summary
statistics for each SHS in HCHS/SOL (P < 0.008), except for SHS-PC3,
which showed nominal significance (P < 0.05) (“Methods”; Table [143]2).
We further examined the PRS associations with sleep phecodes in the MGB
biobank (“Methods”; Table [144]2). The PRSs of both SHS-ADD and SHS-PC1
were associated with lower odds of 6 of the 13 sleep phecodes:
insomnia, obstructive sleep apnea, restless legs syndrome, sleep
disorders (unspecified), organic or persistent insomnia, and sleep
apnea (unspecified). PRS for SHS-PC2, SHS-PC4, and SHS-PC5 were
associated with 2 sleep phecodes: obstructive sleep apnea and sleep
apnea. Whereas the SHS-PC4 PRS was associated with higher odds for
sleep apnea disorders (likely reflecting its positive association with
snoring), increases in PRS for SHS-PC2 and SHS-PC5 were associated with
lower odds for sleep apnea.
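A polygenic risk score of the kind validated above is, in its simplest form, a weighted sum of effect-allele dosages, with weights taken from GWAS effect estimates. The sketch below shows only that core computation; the dosages and weights are made up, and the study's actual PRS method (including any shrinkage or LD adjustment) is described in its Methods.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (rows: individuals, columns: variants)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

def standardize(x):
    """Scale a score to mean 0 and SD 1 so effects are per SD of PRS."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical dosages (0, 1, or 2 effect alleles) and per-allele weights.
dosages = np.array([[0, 1, 2],
                    [2, 0, 1],
                    [1, 1, 0],
                    [2, 2, 2]])
weights = np.array([0.10, -0.20, 0.05])

raw_prs = polygenic_score(dosages, weights)
std_prs = standardize(raw_prs)
```

Standardizing the score before regression is a common convention because it lets betas and odds ratios be read as the change per standard deviation of the PRS.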
Table 2.
Novel SHS Polygenic Risk Score (SHS-PRS) associations in independent
studies
Estimates are Beta (SE) for HCHS/SOL and OR (95% CI) for MGB biobank.
Study | Phenotype | Phecode | N^a | SHS-ADD estimate | SHS-ADD P^‡ | SHS-PC1 estimate | SHS-PC1 P^‡ | SHS-PC2 estimate | SHS-PC2 P^‡ | SHS-PC3 estimate | SHS-PC3 P^‡ | SHS-PC4 estimate | SHS-PC4 P^‡ | SHS-PC5 estimate | SHS-PC5 P^‡
HCHS/SOL | Corresponding SHS^b | | 11,144 | 0.063 (0.010) | 3.86E-10 | 0.086 (0.009) | 9.22E-20 | 0.082 (0.009) | 3.86E-18 | 0.024 (0.009) | 1.16E-02 | 0.047 (0.009) | 5.41E-07 | 0.094 (0.009) | 3.04E-23
MGB biobank | Insomnia | 327.4 | 5748/30226 | 0.91 (0.88, 0.94) | 5.87E-10 | 0.9 (0.87, 0.92) | 1.15E-12 | 0.99 (0.96, 1.02) | 0.416 | 0.97 (0.94, 1) | 0.026 | 1.01 (0.98, 1.04) | 0.372 | 0.96 (0.93, 0.98) | 0.002
MGB biobank | Obstructive sleep apnea | 327.32 | 6722/30226 | 0.86 (0.84, 0.88) | 6.73E-25 | 0.93 (0.9, 0.95) | 7.65E-08 | 0.86 (0.84, 0.89) | 3.88E-26 | 1.00 (0.97, 1.03) | 0.998 | 1.08 (1.06, 1.11) | 5.21E-09 | 0.94 (0.92, 0.97) | 1.82E-05
MGB biobank | Restless legs syndrome | 327.71 | 1281/30226 | 0.86 (0.81, 0.91) | 3.06E-07 | 0.86 (0.81, 0.91) | 1.94E-07 | 0.93 (0.88, 0.99) | 0.015 | 0.96 (0.91, 1.02) | 0.179 | 1.00 (0.94, 1.06) | 0.982 | 0.91 (0.86, 0.97) | 0.002
MGB biobank | Sleep disorders | 327 | 3048/30226 | 0.88 (0.85, 0.92) | 1.30E-09 | 0.9 (0.87, 0.94) | 2.80E-07 | 0.94 (0.91, 0.98) | 0.003 | 0.96 (0.93, 1) | 0.053 | 1.01 (0.97, 1.05) | 0.564 | 0.98 (0.94, 1.02) | 0.239
MGB biobank | Organic or persistent insomnia | 327.41 | 1343/30226 | 0.87 (0.82, 0.92) | 2.10E-06 | 0.87 (0.82, 0.92) | 3.27E-06 | 0.97 (0.92, 1.03) | 0.325 | 0.98 (0.93, 1.03) | 0.419 | 1.02 (0.96, 1.08) | 0.530 | 0.93 (0.88, 0.98) | 0.008
MGB biobank | Sleep apnea | 327.3 | 4161/30226 | 0.85 (0.82, 0.88) | 1.06E-19 | 0.93 (0.9, 0.96) | 2.61E-05 | 0.84 (0.81, 0.87) | 2.03E-23 | 0.99 (0.95, 1.02) | 0.403 | 1.07 (1.04, 1.11) | 3.84E-05 | 0.93 (0.9, 0.97) | 6.77E-05
MGB biobank | Sleep-related movement disorders | 327.7 | 445/30226 | 0.9 (0.81, 0.99) | 0.036 | 0.89 (0.81, 0.99) | 0.024 | 0.99 (0.89, 1.08) | 0.763 | 0.95 (0.86, 1.04) | 0.252 | 1.01 (0.92, 1.11) | 0.782 | 0.99 (0.9, 1.09) | 0.774
MGB biobank | Circadian rhythm sleep disorder | 327.6 | 302/30226 | 0.85 (0.76, 0.96) | 0.010 | 0.88 (0.78, 0.99) | 0.033 | 0.88 (0.78, 0.99) | 0.034 | 0.94 (0.83, 1.05) | 0.246 | 0.97 (0.86, 1.08) | 0.559 | 0.99 (0.88, 1.11) | 0.882
MGB biobank | Parasomnia | 327.5 | 497/30226 | 0.94 (0.85, 1.03) | 0.171 | 0.92 (0.83, 1) | 0.061 | 0.99 (0.9, 1.09) | 0.842 | 0.95 (0.87, 1.04) | 0.277 | 1.01 (0.93, 1.11) | 0.784 | 1.01 (0.93, 1.11) | 0.771
MGB biobank | Hypersomnia | 327.1 | 989/30226 | 0.89 (0.84, 0.96) | 0.001 | 0.94 (0.88, 1) | 0.064 | 0.9 (0.84, 0.96) | 0.001 | 0.94 (0.88, 1) | 0.049 | 1.02 (0.96, 1.09) | 0.554 | 0.92 (0.87, 0.99) | 0.017
MGB biobank | Central/nonobstructive sleep apnea | 327.31 | 555/30226 | 0.91 (0.83, 1) | 0.046 | 1.06 (0.97, 1.15) | 0.232 | 0.93 (0.85, 1.01) | 0.089 | 0.96 (0.89, 1.05) | 0.407 | 1.07 (0.98, 1.16) | 0.117 | 0.88 (0.8, 0.96) | 0.003
MGB biobank | Sleep-related leg cramps | 327.72 | 223/30226 | 1.00 (0.86, 1.15) | 0.949 | 0.95 (0.83, 1.1) | 0.511 | 1.04 (0.91, 1.19) | 0.563 | 1.07 (0.94, 1.22) | 0.318 | 1.07 (0.94, 1.22) | 0.305 | 1.00 (0.87, 1.14) | 0.990
MGB biobank | Cataplexy and narcolepsy | 347 | 325/30860 | 0.88 (0.78, 0.99) | 0.032 | 0.97 (0.86, 1.08) | 0.558 | 0.88 (0.79, 0.98) | 0.025 | 0.98 (0.88, 1.09) | 0.660 | 1.01 (0.91, 1.13) | 0.842 | 0.96 (0.86, 1.07) | 0.432
^aTotal sample size is provided for continuous outcome in HCHS/SOL,
sample sizes in cases vs controls are provided for binary sleep
phenotypes in MGB biobank.
^bSHS in HCHS/SOL were calculated from the SHS-ADD definition and UKB
SHS-PC loadings.
^‡Significant P-values of PRS associations with corresponding SHS in
HCHS/SOL accounting for 6 SHS traits (P < 0.008) and with sleep phecode
in MGB biobank accounting for the number of phecode and SHS traits
(P < 0.0006) shown in bold.
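The odds ratios and intervals in Table 2 can be related to log-odds coefficients as sketched below, assuming Wald-type 95% confidence intervals (the table does not state how the intervals were computed). The example back-calculates a standard error from the obstructive sleep apnea entry for SHS-ADD, OR 0.86 (0.84, 0.88), and also reproduces the two significance thresholds in the footnote; function names are illustrative.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile

def odds_ratio_ci(beta, se, z=Z95):
    """Odds ratio and Wald confidence interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=Z95):
    """Recover the log-odds standard error implied by a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Table 2, SHS-ADD PRS vs obstructive sleep apnea: OR 0.86 (0.84, 0.88).
se_log_or = se_from_ci(0.84, 0.88)
or_est, ci_low, ci_high = odds_ratio_ci(math.log(0.86), se_log_or)

# Bonferroni thresholds from the footnote.
hchs_threshold = 0.05 / 6          # six SHS traits, reported as P < 0.008
mgb_threshold = 0.05 / (13 * 6)    # 13 phecodes x six SHS traits, P < 0.0006
```

Because the interval is symmetric on the log scale rather than the odds-ratio scale, rounded table entries will not always sit exactly midway between their reported bounds.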
Implicated genes
We prioritized the genes at GWS loci using three mapping methods
(position, eQTL, and Chromatin Interaction [CI]) as well as
MAGMA^[146]20 positional gene-based analysis in FUMA^[147]18
(“Methods”; Supplementary Data [148]16–[149]19; for gene-based GWAS
Manhattan and QQ plots see Supplementary Figs. [150]8 and [151]9).
Hundreds of genes were implicated. Here we highlight only those with
the strongest evidence, as prioritized by all mapping methods. The five
novel SHS-ADD loci were mapped to 39 genes, with five genes in two loci
supported by all four mapping methods. The latter included FNIP2, which
binds to AMP-activated protein kinase (AMPK) and plays a crucial role
in mTORC1 signaling and the regulation of heat shock protein-90
(Hsp90)^[152]21, which has been previously linked to sleep homeostasis
and behavioral rhythms^[153]22,[154]23; NRBF2, which is involved in
circadian rhythm via control of autophagy, and nutrient and cellular
homeostasis^[155]24; and JMJD1C, which plays a role in DNA
repair^[156]25 and has been associated with Rett syndrome (OMIM
312750), which co-occurs with epilepsy and sleep disturbance^[157]26.
Nine novel SHS-PC1 loci were mapped to 140 genes, with five genes in
three loci mapped by all four methods. Of these, NMB encodes the
neuromedin B neuropeptide linked to the endocrine and exocrine systems,
body temperature, and blood pressure^[158]27, while at the same locus,
WDR73 is highly expressed in cerebellar Purkinje neurons, and ZNF592
has been implicated in cerebellar atrophy; ANKFY1 is also involved in
the maintenance of cerebellar Purkinje cells that play a role in
sleep-wake regulation^[159]28,[160]29; MAPK1 is part of the
mitogen-activated protein kinase (MAPK)/extracellular signal-regulated
kinase (ERK) signaling pathway that is linked to mental health and the
circadian system^[161]30. Five novel SHS-PC2 loci were mapped to 97
genes (nine genes in four loci mapped by all methods). Of these, SEMA3D
encodes a member of the class III semaphorins that are involved in axon
guidance during neuronal development^[162]31 (two more class III
semaphorins at this locus, SEMA3A and SEMA3E, also mapped by CI); DRAM1
is a regulator of autophagy in the context of mitochondrial
dysfunction, implicated in neurodegeneration^[163]32; MAPK8IP3 is
another MAPK/ERK gene essential for the function and maintenance of
neurons, with links to neurodevelopmental disorders^[164]33; FCHO1 is
involved in clathrin-coat assembly and clathrin-mediated endocytosis
and has been implicated in immune deficiency^[165]34,[166]35. Three
novel SHS-PC3 loci were mapped to 62 genes, with five genes in three
loci mapped by all four methods. These include PDE1C, a
phosphodiesterase bound by calmodulin that regulates proliferation of
vascular smooth muscle cells and may play a pathological role in
cardiac remodeling and dysfunction^[167]36 and TCF20, involved in
neurodevelopmental diseases and sleep disturbances^[168]37,[169]38. One
novel SHS-PC4 locus was mapped to three genes, including PCDH17 (mapped
by position, eQTL and CI in hippocampal and neural progenitor cells)
involved in forming and maintaining neuronal synapses^[170]39. The two
novel SHS-PC5 loci mapped to 71 genes, of which two genes were mapped
by all four methods. These include SLC27A5, involved in bile acid
synthesis and metabolism^[171]40, which has also been implicated in
brain health and neural development^[172]41. In addition, fifty genes
in 28 novel loci have shown drug interactions (Supplementary
Data [173]16). Implicated genes for additional unreported and reported
variants are summarized in Supplementary Data [174]17 and [175]18. For
completeness, all significant genes (p < 2.64e-6) in gene-based
analysis are reported in Supplementary Data [176]19. Note that genes
implicated by fewer mapping methods may also merit prioritization. One
intriguing example is that two GWS unreported SHS-PC2-associated
variants, rs2821226 (OPTC/ATP2B4; 5.20E-09) and rs1610263 (COL8A1;
4.10E-08), both implicate collagen-pathway genes that may play a key
role in the physiology of obstructive sleep apnea (OSA)^[177]42. The
absence of causal tissues (e.g., specific brain and connective tissues)
from the mapping resources may result in incorrect mappings. Any
definitive conclusions based on mapped genes will require functional
follow-up.
Gene set enrichment analyses
We performed pathway enrichment analysis, applying PASCAL^[178]43 to
SHS GWAS summary statistics, and identified significant enrichments of
SHS-ADD variants in MAPK and NGF signaling pathways; SHS-PC1 variants
in neuronal system, ubiquitin-mediated proteolysis, and MAPK signaling
pathways; SHS-PC2 in ion transport; SHS-PC3 in circadian, mRNA
processing and splicing, G-protein, and metabolic pathways; SHS-PC4 in
neuronal system, neurotransmission at synapses, GABA receptor, and long
term depression pathways; and SHS-PC5 in the gap junction pathway
(Empirical p < 5e-4; Fig. [179]3a and Supplementary Data [180]20).
Fig. 3. Pathway and tissue enrichment analysis.
a Pathway gene sets significantly enriched for SHS genes
(PASCAL^[183]43, “Methods”). b Tissue-specific expression gene sets
enriched for SHS genes (MAGMA^[184]20 via FUMA^[185]18, “Methods”).
We also performed gene-set enrichment analyses^[186]20 (tissue, trait,
cell type, and pathway) on genes mapped by FUMA using MAGMA
(Supplementary Data [187]21–[188]23). Tissue enrichment analyses
identified multiple brain tissues for all SHS traits except SHS-PC4,
with the highest enrichments in cerebellum, hypothalamus, and frontal
cortex for SHS-PC1 and frontal cortex, cerebellum, and nucleus
accumbens for SHS-PC3. All enriched tissue findings were brain tissues,
except the pituitary tissue for SHS-PC1 (p < 0.05/54/6 = 1.5e-4;
Fig. [189]3b and Supplementary Data [190]21), which has both neural and
endocrine functions, the posterior pituitary containing distal axons of
hypothalamic neurons. Cell type enrichment analysis identified multiple
brain cell types for each SHS, and especially for SHS-PC1, including
GABAergic neurons in the human midbrain (Supplementary Fig. [191]10).
Enrichment of SHS-associated genes with phenotype-associated gene sets
from the GWAS catalog (Supplementary Data [192]22) revealed
associations with psychological traits (intelligence, neuroticism,
impulsivity/risk-taking, mood, and psychiatric disorders), behavior
(e.g., regular activity patterns, alcohol consumption), inflammatory
markers and diseases (C-reactive protein, IgG, and inflammatory bowel
diseases), blood pressure, adiposity (especially in SHS-PC2 and PC4),
and reproductive aging. Gene sets for Alzheimer’s disease and related
biomarkers (cerebrospinal fluid tau and amyloid β) were enriched in
SHS-PC1, SHS-PC4, and most strongly SHS-PC5. A hippocampal volume gene
set was enriched in SHS-PC2 and SHS-PC5. Dentate gyrus brain volume
and kidney disease gene sets were enriched in SHS-PC2. Gene sets
associated with intracranial and subcortical brain region volumes,
craniofacial microsomia, idiopathic pulmonary fibrosis, and aortic root
size were enriched in SHS-PC4. An iron biomarker gene set was enriched
in SHS-PC5.
Genetic and causal relationships between SHS and other common complex traits
LD score regression (LDSC)^[193]44 revealed numerous phenotypes
genetically correlated to SHS traits. Among 375 phenome-wide
representative heritable traits (“Methods”), 256 traits were
genetically correlated with at least one SHS (p < 0.05/375/6 = 2.2e-5;
Fig. [194]4 and Supplementary Data [195]24). Genetic correlations with
SHS were strongest (magnitude ~0.3 to 0.5) with physical and mental
health, and (inversely) with socio-economic status (SES), stress, pain,
mental and emotional distress, and recognized health conditions and
risk factors. However, these genetic correlations had discernable
patterns, unique to each SHS trait. Compared with the SHS-PCs, SHS-ADD
had stronger genetic correlations with non-specific health markers,
e.g., traits related to overall health, physical conditioning, markers
of socio-economic status, healthy lifestyle factors, as well as
stronger inverse genetic relationships with pain, activities
interfering with sleep, and depression.
Fig. 4. SHS genetic correlations with selected phenotypes.
a SHS genetic correlations (LDSC^[198]44, Methods) with sleep traits
(*p < 0.05/132). b SHS genetic correlations (LDSC^[199]44, Methods)
with selected health outcomes (*p < 0.05/2250). SHS sleep health score,
act. derived from actigraphy, max. maximum, N number, noct. nocturnal,
BMI body mass index, SBP systolic blood pressure, DBP diastolic blood
pressure, LDL low-density lipoprotein cholesterol, TG triglycerides,
HbA1c glycosylated hemoglobin, phys. act. physical activity, freq.
frequency, T2D type-II diabetes, MI myocardial infarction, dx
self-report of physician diagnosis to trained interviewer, EHR
electronic health record-derived, SES socioeconomic status, CVD
cardiovascular and metabolic disease, psych. psychological.
Compared with the other SHSs, SHS-PC1 had stronger inverse genetic
correlations with anxiety traits, alcohol addiction, and self-harm
behavior; SHS-PC2 had comparatively stronger genetic correlations with
daytime napping, diagnosed sleep disorders, and more moderate but,
relative to other SHSs, still comparatively stronger inverse genetic
correlations with metabolic and adiposity traits. Association patterns
for SHS-PC3 differed markedly from other SHS, having lower correlations
with overall health and most individual health conditions, and an inverse
association with educational attainment. Compared with the other SHS,
genetic correlations involving SHS-PC4 were often inverted, including a
(non-significant) inverse association with overall health rating
(despite a weak positive phenotypic correlation) as well as positive
associations with BMI, and cardiovascular traits, congruent with its
positive loading on snoring. SHS-PC5 was preferentially genetically
correlated with fluid intelligence, educational attainment, and
mother’s age.
We further conducted bidirectional Mendelian randomization (MR)
analyses to investigate potentially causal links between sleep health
and 50 selected traits (Methods). We identified potential causal
effects, estimated via inverse variance weighted (IVW) method, of lower
SHS-PC1 on codeine or tramadol medication use (β[IVW] = 0.47;
p = 4.1e-07); lower SHS-PC5 on bipolar disorder (β[IVW] = 1.66;
p = 1.6e-05); and lower SHS-ADD on smoking initiation
(β[IVW] = -0.42; p = 6.3e-05) and years of schooling (β[IVW] = 0.34;
p = 1.1e-04) (Supplementary Data [200]25). In the reverse direction,
MR identified potential causal effects of greater years of education on
higher SHS-ADD (β[IVW] = 0.15; p = 1.2e-07) and SHS-PC5
(β[IVW] = 0.10; p = 2.7e-06); lower BMI on higher SHS-PC2
(β[IVW] = -0.10; p = 7.2e-07); and Alzheimer’s disease risk on
higher SHS-PC5 (β[IVW] = 0.012; p = 1.5e-05) (Supplementary
Data [201]26).
Comparison with composite sleep health scores constructed using genetic
correlations in addition to phenotypic correlations
As a sensitivity analysis we constructed composite sleep scores
informed by genetic correlations rather than phenotypic correlations,
using linear combinations of the same underlying sleep phenotypes as
used in the PC-based approach. We constructed these phenotypes using
the ‘MaxH’ maximally heritable approach^[202]45, which provides linear
combinations of phenotypes of maximum heritability, under an
independence constraint. The SHS-MaxH phenotypes were largely
phenotypically concordant with SHS-PCs, with similar heritabilities
(Supplementary Data [203]2 and [204]27). SHS-MaxH1 correlated
phenotypically with SHS-PC3; SHS-MaxH2 with SHS-PC1; SHS-MaxH3 with
SHS-PC2 (r > 0.9). Less concordantly, SHS-MaxH4 and MaxH5 correlated
with both SHS-PC4 and PC5 (|r| > 0.5) but were no greater in
heritability.
Discussion
We performed the first large-scale sleep health GWAS investigating five
novel PC-based SHSs and compared these with an updated GWAS of SHS-ADD
by including related individuals in UKB. Each SHS was based on five
underlying self-reported sleep traits: sleep duration, insomnia symptom
frequency, daytime sleepiness frequency, chronotype, and snoring,
resulting in distinct sleep health composites interpretable via their
loadings. The SHS approach emphasizes the co-occurrence of multiple
sleep traits, aligning with the multi-dimensional view of sleep health,
in which individual components of sleep do not confer sleep health in
isolation.
We identified 28 novel significant (p < 8.3e-9) loci and 31 additional
GWS (p < 5e-8) loci that were not reported by previous sleep GWASs. Our
findings were supported by sensitivity analysis and PRS validation and
associations with clinical sleep outcomes in independent studies. These
loci mapped to genes implicated in neurodevelopment, synaptic
signaling, ion channel transportation, cellular energy production, and
metabolic processes. The findings collectively suggest that studying
SHSs has advanced genetic discovery by linking to plausible biological
mechanisms and aligning with established sleep health domains, thereby
uncovering novel insights.
Findings for SHS-ADD, combining 5 positive binary sleep traits, suggest
a phenotype that captures global sleep health by integrating multiple
independent regulatory signals and sleep domains. Notably, SHS-ADD was
the SHS most strongly associated with accelerometry-derived sleep
regularity index, both phenotypically and genetically, suggesting the
conjunction of multiple independent sleep health traits may be a
prerequisite for sleep regularity. Genetic correlations were strongest
between SHS-ADD and overall health and SES, with MR indicating a
potential bidirectional causal relationship with SES. SHS-ADD was
sensitive to several adjustment factors, including health behaviors and
health status, as well as both psychological factors (with SHS-PC1) and
BMI (with PC2). Moreover, for SHS-ADD, evidence of enrichments,
particularly in neuronal tissues and cell types, was not as strong as
for other traits such as SHS-PC1 and SHS-PC3. Together these findings
suggest SHS-ADD to be a broad sleep health phenotype that retains
significant heterogeneity, combining multiple independent sleep domains
and/or sleep disorders with distinct etiologies. It nevertheless
highlights the connections between global sleep health, sleep
regularity, and overall health and well-being.
SHS-PC1, longer sleep with less frequent insomnia and sleepiness,
reflects correlated self-reported sleep traits across the domains of
satisfaction, duration, and alertness, while demonstrating higher
heritability than SHS-ADD and each of its underlying sleep traits.
SHS-PC1 was the SHS most strongly genetically correlated with
accelerometry-based sleep efficiency and duration, suggesting shared
genetics with both longer and more efficient objective sleep.
Additional genetic correlation and MR analyses suggest negative
relationships with chronic pain (potentially causal), anxiety, and
neuroticism. Enrichment analyses identified the ubiquitin-proteasome
system pathway, important for circadian rhythm regulation and sleep
homeostasis^[205]46, as well as the GABAergic neuronal cell-type,
central to neural orchestration of sleep. Overall, these findings
suggest a phenotype capturing neurobiological sleep regulation, and
support a bidirectional relationship with anxiety^[206]47 via shared
GABAergic regulation^[207]48, while pointing to further mechanisms
involving presence or perception of distressing stimuli, including
chronic pain.
SHS-PC2, interpreted as healthy sleep characterized by absent snoring
and sleepiness, cardinal symptoms of sleep apnea syndrome, may serve as
a surrogate indicating absence of under-diagnosed and poorly captured
clinical sleep-disordered breathing conditions. Correspondingly,
SHS-PC2 showed strong inverse (potentially causal) associations with
adiposity-related measures like BMI, consistent with the strong
association between obstructive sleep apnea (OSA) and obesity. Likewise,
SHS-PC2 was the SHS most
sensitive to BMI adjustment, which nevertheless only modestly
attenuated estimated GWAS effects. SHS-PC2 had inverse genetic
correlations with cardiometabolic traits, notably type 2 diabetes,
which has previously been linked bidirectionally to OSA^[208]49.
Enrichment analyses also link SHS-PC2 with neuronal pathways and
hippocampal volume, suggesting neurological involvement in OSA, and
with connective-tissue genes (collagen pathway) and traits (adolescent
idiopathic scoliosis, aortic root size), consistent with a role for
connective tissue in pharyngeal collapsibility related to
sleep-disordered breathing^[209]50.
SHS-PC3 largely recapitulates chronotype and confirmed known
associations with circadian genes and pathways. It is notable that
chronotype was largely independent not only from other SHS-PCs but was
also more weakly genetically correlated with phenome-wide health
outcomes, while being moderately genetically correlated with healthy
lifestyle behaviors, such as physical activity and time spent outdoors
in summer.
Though SHS-PC4 and SHS-PC5 have more complex interpretations due to
both positive and negative loadings on healthy sleep traits, they were
nonetheless roughly as heritable as the least heritable individual
trait (sleepiness) and contributed novel genetic findings. SHS-PC4,
interpreted as snoring without sleepiness, showed genetic enrichment in
neurotransmission pathways and craniofacial structure, suggesting
mechanisms that could lead to sleepiness without snoring, and/or
snoring without sleepiness. The latter would be consistent with a
sleep-disordered breathing phenotype resulting from reduced
craniofacial dimensions that cause pharyngeal narrowing and turbulent
airflow (or snoring) without the severe airway collapsibility, sleep
fragmentation, and inflammation characteristic of obstructive sleep
apnea syndrome^[210]51,[211]52.
SHS-PC5, characterized by short sleep without insomnia, was genetically
correlated with later objective sleep midpoint and fewer
accelerometry-measured sleep episodes, as well as positive genetic
correlations with markers of cognitive decline, including memory loss,
cerebrospinal fluid t-tau levels, and Alzheimer’s disease, and with
enrichment of genes in the gap junction pathway, implicated in
amyloid-β clearance by astrocytes in Alzheimer’s disease (AD)^[212]53.
Additionally, MR showed a potential reverse causal association between
AD and SHS-PC5. This suggests that SHS-PC5 may characterize shorter,
delayed sleep, without insomnia symptoms, indicative of accelerated
brain aging, rather than a natural short sleep or ‘super-sleeper’
phenotype^[213]3. The low likelihood of insomnia in this subtype is
in line with the lack of consistent data implicating insomnia in
cognitive decline, potentially due to the heterogeneity of conditions
underlying insomnia.
Colocalization analyses revealed selective cases of shared regulation,
but also pointed to sometimes differing relationships with underlying
components, particularly in the special cases of PC2 vs PC4, and PC1 vs
PC5, which share underlying traits with opposite loadings. For example,
DLEU7, proximal to rs592333, colocalized opposite associations with PC2
and PC4, consistent with a role in snoring (which loaded positively in
PC4 and negatively in PC2) and consistent with prior associations of
the DLEU7 locus with adiposity^[214]54. Conversely, rs1846644
colocalized positive associations with both PC2 and PC4 at the KSR2
locus, a gene highly expressed in cerebellar Purkinje neurons^[215]55,
suggesting a link to sleepiness regardless of the presence of snoring,
in keeping with a role for Purkinje neurons in sleep-wake
transition^[216]56. Similarly, for SHS-PC1 and PC5, PAX8 colocalized
with opposite directions, in keeping with opposite loadings on sleep
duration, while MEIS1 colocalized with similar direction in keeping
with consistent loading on insomnia.
Several additional findings pointed to mechanisms shared across SHS
indicating broad-based involvement in sleep health. For example, PC2
and PC4 colocalized at the KSR2 locus, a gene highly expressed in
cerebellar Purkinje neurons, which were again implicated in PC1 by two
novel loci, ANKFY1 and WDR73/ZNF592, suggesting cerebellar regulation of
sleep maintenance efficiency. Additional findings reinforced the
role of neural development and consistently implicated
neurotransmitters and synaptic signaling. The association of SHS-PC2
with FCHO1 at rs12979056 gives further support for a role of
clathrin-coat vesicle transport in sleep health, presumably in synaptic
function, a mechanism previously implicated by the STON1-GTF2A1L,
TOR1A, TOR1B, AP2B1 (PC3) and AP3B2 (PC1) loci. Pathways and genes
related to MAPK, gap junction, immune signaling, and energy metabolism
processes were found to associate across multiple SHS. Interestingly,
the two identified novel functional variants, rs8074498 in PC1 and
rs138572890 in PC5, the PC phenotypes loading on duration/insomnia,
were connected to glucose transporters (respectively GLUT4 and GLUT1).
Moreover, enrichment analysis for individual SHS all highlighted gene
expression in brain tissues and cells, and associations with metabolic,
inflammatory, and psychiatric traits, which reinforce critical roles
for central nervous system, metabolic, and immune system function in
sleep regulation. Multi-trait genetic correlation analyses further
suggested broader interrelatedness of sleep health with overall
health, lifestyle, behavioral and psychological traits, as well as
pain, physical frailty, and deconditioning. However, some caution is
warranted as some apparent genetic relationships could be induced by
factors relating to the subjective rating of sleep health, remaining
heterogeneity within SHS traits, or unadjusted confounders, including
remaining population stratification^[217]57.
This study has several limitations. First, SHS-PCs incorporated sleep
traits in a linear fashion, which may not reflect the complexity of
their contribution to sleep health, e.g., the U-shaped contribution of
sleep duration as modeled in SHS-ADD. Second, SHS-PCs only focus on
five self-reported sleep questions to maximize sample size and to be
more comparable to SHS-ADD as previously published in UKB by using
the same underlying sleep phenotypes. This could limit our ability to
fully capture a comprehensive sleep health composite, due to
limitations of subjective assessment or the ability of these questions
to fully capture well-being in sleep health. Third, the data-driven PC
approach may limit generalizability across different studies and
populations. We were able to identify HCHS/SOL as having similar
questionnaire data; however, PC loading generalization to this cohort
was challenging due to population and age differences. A potentially
productive approach to future validation studies would be to average PC
loadings across different studies in a meta-analysis^[218]58.
Nonetheless, using the same PC loadings as in UKB, we did not replicate
individual GWS loci but validated the PRSs constructed from genome-wide
variants. Lastly, we note that the SHS-PCs were based on phenotypic
correlations, which could limit the ability to derive heritable
composite phenotypes. However, in a sensitivity analysis we constructed
sleep scores informed by genetic correlations using the ‘MaxH’
maximally heritable approach^[219]45, for which the loadings and
heritability of the derived phenotypes were numerically and
qualitatively similar. Nonetheless, genetic correlation information
could be valuable in future research integrating additional objective
sleep-related phenotypes.
In summary, this study introduces a novel approach to understanding
sleep genetics using PC-based SHS, which effectively distinguishes
differing mechanisms of distinct, domain-specific sleep-related traits.
In keeping with the large influence of insomnia (related to SHS-PC1)
and sleep apnea (related to PC2) on sleep health in the population,
along with the independent role of chronotype (PC3), our approach
appears to have enhanced genetic discovery by separately targeting
these domain-specific sleep health scales. Future research involving
SHSs incorporating accelerometry data and other objectively measured
sleep and activity data may provide enhanced targeting of psychosocial
domains and neuroregulatory sleep mechanisms, resulting in further
genetic discovery.
Methods
Population and study design
The discovery analysis was conducted on participants of European
ancestry from the UK Biobank study^[220]59. The UK Biobank is a
prospective study that has enrolled over 500,000 people aged 40–69
living in the United Kingdom. Baseline measures collected between
2006 and 2010, including self-reported health questionnaires and
anthropometric assessments, were used in this analysis. Participants
taking any self-reported sleep medication (described
before^[221]7,[222]8,[223]10) were excluded. The UK Biobank study was
approved by the National Health Service National Research Ethics
Service (ref. 11/NW/0382), and all participants provided written
informed consent to participate in the UK Biobank study.
Genotype
DNA samples of 502,631 participants in the UK Biobank were genotyped on
two arrays: UK BiLEVE (807,411 markers) and UKB Axiom (825,927
markers). 488,377 samples and 805,426 genotyped markers passed standard
QC^[224]60 and were available in the full data release. 452,071
individuals of European ancestry (based on K-means clusters on genomic
PCs) were studied with available phenotypes and genotyping passing
quality control. SNPs were imputed to a combined Haplotype Reference
Consortium (HRC) and 1000 Genomes panel. SNPs with minor allele
frequency (MAF) > 0.001, BGEN imputation score > 0.3, and a maximum
per-SNP missingness of 10%, and samples with a maximum per-sample
missingness of 40%, were kept in the GWAS.
Sleep trait assessment
The UK Biobank baseline questionnaire assessed chronotype, sleep
duration, insomnia symptoms, snoring, and excessive daytime sleepiness
via self-report responses. Self-reported sleep duration was recorded as
an integer-valued variable via responses to the question “About how
many hours sleep do you get in every 24 h? (please include naps).” The
remaining questions had ordinal responses, which were assigned to an
integer scale as follows. Chronotype (morningness): “Do you consider
yourself to be:” -2. Definitely an ‘evening’ person; -1. More an
‘evening’ than a ‘morning’ person; 0. Do not know; 1. More a ‘morning’
than an ‘evening’ person; 2. Definitely a ‘morning’ person; NA. Prefer
not to answer. Insomnia Symptoms: “Do you have trouble falling asleep
at night, or wake up in the middle of the night?” 1. Never/rarely; 2.
Sometimes; 3. Usually; NA. Prefer not to answer. Snoring: “Does your
partner or a close relative or friend complain about your snoring?” 1.
Yes; 0. No; NA. Do not know or Prefer not to answer. Subjective daytime
sleepiness: “How likely are you to doze off or fall asleep during the
daytime when you don’t mean to? (e.g. when working, reading, or
driving?)” 0. Never/rarely; 1. Sometimes; 2. Often; 3. All of the time;
NA. Do not know or Prefer not to answer. Individuals with any
missingness (NA) from any sleep questionnaires were excluded from the
analysis.
Sleep health score construction
For the UK Biobank additive sleep health score (SHS-ADD), consistent
with previous studies^[225]5,[226]13, we assigned one point to each of
five positive sleep traits, as follows: Chronotype: More a ‘morning’
than an ‘evening’ person or Definitely a ‘morning’ person; Sleep
Duration: from 7 to 8 h (inclusive); Insomnia Symptoms: Never/rarely;
Snoring: No; Subjective daytime sleepiness: 0. Never/rarely. No
subjects were excluded; those who did not report the positive
attributes were coded as zero for that trait. The final SHS-ADD rating
is the total number of positive sleep traits for each individual on a
scale of 0-5. We performed principal component analysis of the
above-described integer-scale self-report sleep question responses,
after centering and scaling to standardize each response to mean zero,
variance one, to compute SHS-PC1 through SHS-PC5. SHS-PCs were oriented
so that they were positively correlated with self-assessed overall
health (UKB Data-Field 2178: Overall health rating).
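The scoring rules above can be illustrated with a minimal Python sketch. It assumes the responses have already been coded on the integer scales described under "Sleep trait assessment" and that the overall health rating is supplied with higher values meaning better health. The function names (shs_add, shs_pcs) and the eigendecomposition route to the principal components are illustrative choices, not the authors' code.

```python
import numpy as np

def shs_add(chronotype, duration, insomnia, snoring, sleepiness):
    """Additive score: one point per positive sleep trait, giving 0-5."""
    chronotype, duration, insomnia, snoring, sleepiness = (
        np.asarray(a) for a in (chronotype, duration, insomnia, snoring, sleepiness)
    )
    return (
        (chronotype >= 1).astype(int)                      # morning type (codes 1 or 2)
        + ((duration >= 7) & (duration <= 8)).astype(int)  # 7 to 8 h, inclusive
        + (insomnia == 1).astype(int)                      # never/rarely
        + (snoring == 0).astype(int)                       # no snoring
        + (sleepiness == 0).astype(int)                    # never/rarely
    )

def shs_pcs(X, health):
    """PCA of standardized responses (rows = people, columns = 5 traits).

    Each component is sign-flipped, if needed, so that it correlates
    positively with `health` (assumed coded so that higher = better).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # mean 0, variance 1
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    loadings = eigvecs[:, order]
    scores = Z @ loadings
    for k in range(scores.shape[1]):
        if np.corrcoef(scores[:, k], health)[0, 1] < 0:
            scores[:, k] *= -1
            loadings[:, k] *= -1
    return scores, loadings, eigvals[order]
```

In this sketch the columns of `loadings` play the role of the PC loadings used to interpret each SHS-PC, and the same loadings could be applied to standardized responses from another cohort.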
Covariate measurements
Covariates used in the sensitivity analyses included potential
confounders (depression, socio-economic deprivation based on
residential area, alcohol intake frequency, smoking status, caffeine
intake, employment status, marital status, neurodegenerative disorders,
and use of psychiatric medications). Depression was recorded as a
binary variable (yes/no) corresponding to question “Ever depressed for
a whole week?”. Socio-economic status was measured by the Townsend
Deprivation Index based on aggregated data from national census output
areas in the UK. Alcohol intake frequency was coded as a continuous
variable corresponding to “daily or almost daily”, “three or four times
a week”, “once or twice a week”, “once to three times a month”,
“special occasions only”, and “never” drinking alcohol. Smoking status
was categorized as “current”, “past”, or “never” smoked. Caffeine
intake was coded continuously corresponding to self-reported cups of
tea/coffee per day. Employment status was categorized as “employed”,
“retired”, “looking after home and/or family”, “unable to work because
of sickness or disability”, “unemployed”, “doing unpaid or voluntary
work”, or “full or part-time student”. Neurodegenerative disorder cases
(N = 517) were identified as a union of International Classification of
Diseases (ICD)-10 coded Parkinson’s disease (G20-G21), Alzheimer’s
disease (G30), and other degenerative diseases of nervous system (G23,
G31-G32).
Accelerometry data
Accelerometry data were collected using Axivity AX3 wrist-worn triaxial
accelerometers in 103,711 individuals from the UK Biobank for up to 7
days, 3–10 years after baseline^[227]61. Sleep period time (SPT)-window
and activity levels were extracted using a heuristic algorithm using
the R package [228]GGIR^[229]62. Briefly, for each individual, a 5-min
rolling median of the absolute change in z-angle (representing the
dorsal-ventral direction when the wrist is in the anatomical position)
was calculated across a 24-h period. The 10th percentile of the output
was used to
construct an individual’s threshold, distinguishing periods with
movement from non-movement. Inactivity bouts were defined as inactivity
of at least 30 min duration. Inactivity bouts with gaps of less than
60 min were combined into blocks. The SPT-window was defined as the
longest inactivity block, with sleep onset as the start of the block
and waking time as the end of the block. This algorithm provides
comparable estimates of sleep onset time, waking time, SPT-window
duration, and sleep duration within the SPT-window with
polysomnography-derived metrics^[230]62. After quality control based on
missingness, wear time, and calibration, eight metrics were generated
and analyzed in this study, namely: M10 (midpoint of the 10 consecutive
hours of maximum activity), L5 (midpoint of the 5 consecutive hours of
minimum activity), sleep midpoint, sleep duration, sleep efficiency,
diurnal inactivity, number of nocturnal sleep episodes, and sleep
regularity index (accounting for wake after sleep onset and daytime
napping^[231]63).
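The SPT-window heuristic described above can be sketched in Python as follows. This is an illustration of the stated steps rather than the GGIR implementation: the 5-s epoch length, the multiplier `scale` applied to the 10th percentile, and the function names are assumptions introduced here, since the text states only that the 10th percentile was used to construct the threshold.

```python
import numpy as np

def rolling_median(x, window):
    """Rolling median over all full windows (output is shorter than input)."""
    views = np.lib.stride_tricks.sliding_window_view(x, window)
    return np.median(views, axis=1)

def runs_of_true(mask):
    """(start, end) index pairs, end exclusive, for each run of True values."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

def spt_window(z_angle, epoch_s=5, scale=15.0):
    """Return (onset, wake) as epoch indices of the smoothed series, or None.

    epoch_s and scale are assumed values, not taken from the text.
    """
    change = np.abs(np.diff(z_angle))                     # absolute change in z-angle
    smooth = rolling_median(change, int(5 * 60 / epoch_s))  # 5-min rolling median
    threshold = np.percentile(smooth, 10) * scale         # individual threshold
    inactive = smooth < threshold

    min_bout = int(30 * 60 / epoch_s)                     # bouts of at least 30 min
    max_gap = int(60 * 60 / epoch_s)                      # merge gaps under 60 min
    bouts = [(s, e) for s, e in runs_of_true(inactive) if e - s >= min_bout]
    if not bouts:
        return None
    blocks = [list(bouts[0])]
    for s, e in bouts[1:]:
        if s - blocks[-1][1] < max_gap:
            blocks[-1][1] = e                             # extend the current block
        else:
            blocks.append([s, e])
    onset, wake = max(blocks, key=lambda b: b[1] - b[0])  # longest inactivity block
    return int(onset), int(wake)
```

Multiplying the returned indices by `epoch_s` converts them to seconds from the start of the recording; the small offset introduced by the rolling window is ignored in this sketch.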
Genome-wide association analysis
We performed a genome-wide association analysis (GWAS) of six SHS
traits as continuous variables using linear mixed regression models
adjusting for age, sex, genotyping array, 10 PCs, and genetic
relatedness matrix in BOLT-LMM. Association testing was based on
p-value from the non-infinitesimal mixed model association test
provided in the BOLT-LMM output. Reference 1000 Genomes
European-ancestry (EUR) LD scores and genetic map (hg19) were utilized
in this analysis. X-chromosome data were imputed and analyzed
separately (with males coded as 0/2 and female genotypes coded as
0/1/2) using the same analytical approach in BOLT-LMM as was done for
analysis of autosomes. FUMA was used to annotate the genome-wide
significant (GWS) risk loci (p < 5e-8), lead independent SNPs
(r^2 < 0.1), and all SNPs in LD with independent SNPs (r^2 ≥ 0.6)
within a genomic region (including ANNOVAR functional consequence, CADD
score, RegulomeDB score, as well as 15 chromatin states, and GWAS
Catalog associations). LD is derived from the built-in “UKB/release2b
EUR_10k” reference panel. GWAS loci were defined in FUMA^[232]18 by
merging LD blocks within 250 kb of one another. We compared the GWS
loci to loci reported by biobank-based GWAS of sleep traits published
by June 2022, including insomnia^[233]8,[234]9, sleep duration^[235]7,
daytime sleepiness^[236]10, chronotype^[237]12, snoring^[238]11,
daytime napping^[239]64, obstructive sleep apnea (OSA)^[240]65,
restless legs syndrome (RLS)^[241]66, and prior SHS-ADD in UKB
unrelated individuals^[242]13. GWS loci with a lead variant at least
500 kb from any of the previously published GWS sleep variants were
noted as unreported. To account for performing six simultaneous GWAS,
we report significant novel loci defined as the unreported loci passing
a stricter significance threshold correcting for six traits
(p < 8.3e-9).
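The locus classification rule above can be expressed as a short Python sketch. The function name, the (chromosome, position) tuple layout for previously published variants, and the label strings are hypothetical; only the 5e-8 threshold, the six-trait correction, and the 500 kb distance come from the text.

```python
def classify_locus(chrom, pos, p, known_variants, window_bp=500_000, n_traits=6):
    """Label a lead variant following the rule described in the text.

    known_variants: iterable of (chromosome, position) pairs for previously
    published GWS sleep variants (a hypothetical input format).
    """
    gws = 5e-8                    # genome-wide significance
    strict = gws / n_traits       # 5e-8 / 6, about 8.3e-9
    if p >= gws:
        return "not significant"
    near_known = any(
        c == chrom and abs(pos - q) < window_bp for c, q in known_variants
    )
    if near_known:
        return "previously reported"
    return "novel (six-trait corrected)" if p < strict else "unreported GWS"
```

A lead variant exactly 500 kb from the nearest published variant counts as unreported here, matching the "at least 500 kb" wording.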
The constructed SHS traits, as sums of ordinal and integer-valued
variables, were somewhat non-normal, which has the potential to affect
Type-I error. Consistent with prior UKB-GWAS of ordinal sleep
phenotypes, conducting GWAS using linear mixed models implemented in
BOLT is expected to mostly ameliorate this concern. BOLT has previously
been shown to preserve Type-I error adequately for non-normal ordinal
and binary phenotypes, as long as the sample sizes in different groups
are not markedly imbalanced^[243]67.
Gene mapping and gene-based analysis
We used FUMA to map SNPs with r^2 ≥ 0.6 with the lead-independent SNP
in each locus to genes using three methods: positional mapping
(≤10 kb), cis-eQTL (≤1 Mb) in GTEx v8 tissues (FDR < 0.05), and 3D
Chromatin interaction (CI) in 127 tissue/cell types (FDR < 1e-6). We also
looked up Ensembl phenotypes using the R biomaRt package and drug
interaction evidence using DGIdb for mapped genes. We next performed a
supplementary gene-based association analysis using genome-wide summary
statistics using MAGMA^[244]20 in FUMA. Input SNPs were mapped to
18,931 protein-coding genes. Genome-wide significance level was defined
as p < 0.05/18,931 = 2.641e-6. Gene-level association results were used
to aid prioritization of genes at loci identified as significant in the
primary SNP-level multi-GWAS analysis.
Gene set enrichment analysis
We performed pathway enrichment analysis using PASCAL, which estimated
a combined association p-value from the summary statistics of multiple
SNPs in a gene^[245]43. Significant KEGG, Reactome, and BIOCARTA
pathways for each SHS trait were identified using empirical p < 0.05.
We also performed gene set enrichment analysis on positional, eQTL and
CI-mapped genes in FUMA MAGMA^[246]20 adjusted for gene size.
Significant tissues were identified using p < 0.05/54/6 accounting for
54 tissues in GTEx v8 and six traits. Significant pathways (KEGG,
Reactome, and GO pathways in MSigDB), and GWAS gene set (GWAS Catalog)
were identified using adjusted p < 0.05. PASCAL pathway enrichment
incorporates effect sizes and LD at individual loci, whereas FUMA MAGMA
gene set enrichment is based only on the overlap in sets of associated
genes and does not account for LD, impacting interpretation.
Colocalization analysis
We performed colocalization analysis across SHS traits using
HyPrColoc^[247]68 and genome-wide summary statistics to assess the
shared genetic risk factors. Given the independence of the SHS-PCs, we
expected less colocalization and for colocalized loci to reflect
pleiotropic effects and play a central regulatory role.
Sensitivity analyses
Sensitivity analyses of all GWS loci were performed additionally
adjusting for potential confounders (including adiposity,
socio-economic status, alcohol intake frequency, smoking status,
caffeine intake, employment status, marital status, and psychiatric
problems) individually in 308,902 unrelated individuals, in
addition to adjusting for age, sex, genotyping array and 10 PCs, using
PLINK 1.9. We used a hard-call genotype threshold of 0.1, SNP
imputation quality threshold of 0.80, and a MAF threshold of 0.001. We
also performed the analysis excluding shift workers and individuals
with chronic health or psychiatric illnesses (N = 115,297) and in males
(N = 145,196) and females (N = 163,716) (without adjusting for sex).
Genetic risk score analysis
We constructed a weighted Genetic Risk Score (GRS) comprised of GWS
loci for each SHS and tested for associations with other self-reported
sleep traits (sleep duration, long sleep duration, short sleep
duration, insomnia, chronotype, and snoring), and 7-day accelerometry
traits in the UK Biobank. Association testing was conducted in a sample
overlapping with the sample used for the SHS GWAS, consisting of
the available unrelated complete-case subset of UKB for each tested
outcome. Weighted GRS analyses were performed by summing the products
of risk allele count and the effect estimate reported in the
SHS GWASs using the R package gtx
([248]http://cran.nexr.com/web/packages/gtx/gtx.pdf). We also tested
the GRSs of reported loci for sleep traits using the same approach.
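As an illustration of the weighted sum described above, the following Python sketch computes a per-individual score from allele counts and GWAS effect estimates. It is not the gtx implementation; the function name and the array layout (individuals in rows, variants in columns) are assumptions made for the example.

```python
import numpy as np

def weighted_grs(allele_counts, betas):
    """Weighted GRS: sum over variants of allele count times effect estimate.

    allele_counts: (n_individuals, n_variants) counts or dosages (0 to 2) of
    the allele to which each effect estimate refers.
    betas: length n_variants vector of effect estimates.
    """
    allele_counts = np.asarray(allele_counts, dtype=float)
    betas = np.asarray(betas, dtype=float)
    return allele_counts @ betas
```

The resulting score can then be used as a single predictor in a regression on each sleep outcome.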
Genetic correlation analysis
We estimated genetic correlations among SHS and with other
self-reported and accelerometry sleep traits using LDSC^[249]44 and
genome-wide SNPs mapped to the HapMap3 reference panel. To understand
the genetic overlap with a range of common health problems, we selected
381 representative UKB traits by choosing the 232 most heritable traits
using UKB hierarchical phenotype categories (selecting at most one
‘level’ per phenotype, at most 5 phenotypes per phenotype category, and
at most 25 phenotypes per category group), as well as all 195 (partly
overlapping) traits in the PanUKBB maximally independent set of
phenotypes
([250]https://pan.ukbb.broadinstitute.org/blog/2022/04/11/h2-qc-updated-sumstats/index.html).
Of the 381 selected traits, 375 passed
heritability thresholds and were carried on for LDSC analyses. Traits
passed heritability QC if stratified LD score regression in the UKB
Europeans defined in the Pan UKBB GWAS (sldsc_25bin_h2_pval_EUR) was
significant with p < 0.05/381. Multiple-testing-corrected significance
level for genetic correlation was defined as p < 0.05/375/6 = 2.2e-5.
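For reference, the two Bonferroni thresholds used in this section work out as follows; the symbols \alpha_{h^2} and \alpha_{r_g} are notation introduced here for illustration only.

```latex
\[
\alpha_{h^2} = \frac{0.05}{381} \approx 1.3 \times 10^{-4},
\qquad
\alpha_{r_g} = \frac{0.05}{375 \times 6} = \frac{0.05}{2250} \approx 2.2 \times 10^{-5}.
\]
```

The denominator of 2250 is the same count of trait-by-SHS tests that appears in the significance marker of Fig. 4b.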
Mendelian randomization analysis
Bidirectional Mendelian randomization (MR) analyses, as implemented in
the TwoSampleMR R package^[251]69, were conducted to investigate
potentially causal links between sleep health and 50 representative
traits from across the phenome, selected (prior to performing MR) based
on their relationships with sleep traits, either in prior literature,
or based on results of interest from the genetic correlation analysis.
The final list of traits (below) was also determined by manual review
of availability of non-UKB GWAS summary statistics in the IEU open GWAS
project database (non-overlapping samples required for valid
inference). We report associations that were significant under the
two-sample inverse-variance weighted (IVW) method after correcting for
testing 50 phenotypes and 6 sleep health traits (p < 0.05/300 =
1.7e-4). We additionally required that effects estimated under MR-Egger
be consistent in direction with the IVW estimate, and that the putative
causal direction not be invalidated by a Steiger directionality test,
i.e., that the selected instruments for the exposure explained more
variance (R^2) in the exposure phenotype than in the outcome phenotype
(neither the MR-Egger nor the Steiger test was required to be
statistically significant). SNP heterogeneity was evaluated by
Cochran’s Q (for IVW)^[252]70 and Rücker’s Q’ (for MR-Egger)^[253]71
statistics. Horizontal pleiotropy was assessed by the MR-Egger
intercept.
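As a rough illustration of the reporting criteria, the Python sketch below computes a textbook fixed-effect IVW estimate with Cochran's Q from per-SNP summary statistics and then applies the three conditions described above. It is not the TwoSampleMR implementation, which may differ in details such as the handling of overdispersion; all function and argument names are hypothetical.

```python
import math

# Illustrative sketch only; not the TwoSampleMR implementation.
N_TESTS = 50 * 6             # 50 phenotypes x 6 sleep health traits
P_THRESHOLD = 0.05 / N_TESTS


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate, its standard error, and Cochran's Q."""
    # First-order inverse-variance weights for the per-SNP Wald ratios.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se_ivw = math.sqrt(1.0 / total_weight)
    cochran_q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    return beta_ivw, se_ivw, cochran_q


def is_reported(p_ivw, beta_ivw, beta_egger, r2_exposure, r2_outcome):
    """Apply the three reporting conditions described in the text."""
    significant = p_ivw < P_THRESHOLD            # Bonferroni over 300 tests
    same_direction = beta_ivw * beta_egger > 0   # MR-Egger agrees in sign
    steiger_ok = r2_exposure > r2_outcome        # direction not invalidated
    return significant and same_direction and steiger_ok
```

Note that neither the MR-Egger estimate nor the Steiger comparison enters through a p-value here, mirroring the statement that those tests were not required to be statistically significant.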
The following GWAS traits were selected for MR from the IEU open GWAS
project database: finn-b-R18_COUGH: Cough, finn-b-RX_CODEINE_TRAMADOL:
Codeine or tramadol medication, ieu-a-73: Waist-to-hip ratio,
ieu-b-103: HbA1C, finn-b-F5_NEUROTIC: Neurotic, stress-related and
somatoform disorders, ieu-a-832: Rheumatoid arthritis, ieu-a-962: Ever
vs never smoked, finn-b-G6_PARKINSON_EXMORE: Parkinson’s disease (more
controls excluded), ieu-a-44: Asthma, ieu-b-18: multiple sclerosis,
ieu-a-89: Height, finn-b-KRA_PSY_ANXIETY: Anxiety disorders,
ebi-a-GCST002216: Triglycerides, finn-b-R18_PAIN_THROAT_CHEST: Pain in
throat and chest, ieu-b-2: Alzheimer’s disease, finn-b-ANTIDEPRESSANTS:
Depression medications,
finn-b-Z21_PERSONS_W_POTEN_HEALTH_HAZARDS_RELATED_SOCIO_PSYCHOSO_CIRCUMSTANC:
Persons with potential health hazards related to socioeconomic
and psychosocial circumstances, finn-b-K11_DIVERTIC: Diverticular
disease of intestine, ieu-a-113: Neo-agreeableness, ieu-a-115:
Neo-extraversion, ieu-a-1009: Subjective well-being, finn-b-PAIN: Pain
(limb, back, neck, head abdominally), finn-b-ALCOHOL_RELATED: Alcohol
related diseases and deaths, all endpoints, ieu-b-4855: FEV1/FVC,
ieu-a-294: Inflammatory bowel disease, ieu-a-16: Childhood
intelligence, finn-b-F5_SUBSNOALCO: Substance use, excluding alcohol,
finn-b-F5_PANIC: Panic disorder, ieu-a-114: Neo-conscientiousness,
ieu-b-4859: Physical activity, ieu-a-116: Neo-neuroticism,
finn-b-F5_GAD: Generalized anxiety disorder, finn-b-I9_IHD: Ischemic
heart disease, wide definition, finn-b-F5_DEPRESSIO: Depression,
ieu-b-4820: Age at first birth, ebi-a-GCST002223: HDL cholesterol,
ieu-a-1001: Years of schooling, finn-b-RX_PARACETAMOL_NSAID:
Paracetamol of NSAID medication, ebi-a-GCST003116: Coronary artery
disease, ieu-b-38: systolic blood pressure, ieu-b-41: bipolar disorder,
finn-b-M13_ENTESOPATHYLOW: Enthesopathies of lower limb, excluding
foot, ieu-b-73: Alcoholic drinks per week, finn-b-K11_REFLUX:
Gastro-esophageal reflux disease, finn-b-E4_DIABETES: Diabetes
mellitus, ieu-a-1095: Age at menarche, ieu-a-835: Body mass index,
ieu-a-117: Neo-openness to experience, ieu-b-39: diastolic blood
pressure, ebi-a-GCST002222: LDL cholesterol.
Replication analysis
HCHS/SOL is a community-based study in the USA, which includes 16,415
adults aged 18–74 with self-identified Hispanic/Latino
background^[254]72,[255]73. Individuals were recruited from randomly
selected households near four centers in Miami, San Diego, Chicago and
the Bronx area of New York. Self-reported sleep duration, insomnia
(assessed by Women’s Health Initiative Insomnia Rating Scale [WHIIRS]),
daytime sleepiness (assessed by Epworth Sleepiness Scale [ESS]), and
snoring were collected in 13,268 individuals at baseline. Chronotype
was only available in 1855 individuals enrolled in the Sueño sleep
ancillary study.
We converted the sleep data in HCHS/SOL to the UKB scale. For insomnia, we
used two questions that were part of the WHIIRS questionnaire in the
HCHS/SOL: (1) “Did you have trouble falling asleep?” (2) “Did you wake
up several times at night?”. Each question provided 5 choices: 1. No,
not in the past four weeks; 2. Yes, less than once a week; 3. Yes, 1 or
2 times a week; 4. Yes, 3 or 4 times a week; 5. Yes, 5 or more times a
week. We converted the sum score of the two questions (2–10) to UKB
scale as: 2–4 = “Never/rarely”; 5–8 = “Sometimes”; 9–10 = “Usually”.
For sleepiness, we converted the ESS score (0–24) to UKB scale as:
0–10 = “Never/Rarely”; 11–14 = “Sometimes”; 15–18 = “Often”;
19–24 = “All the time”. For snoring, we converted the 4-level answers
for the question “How often do you snore now?” to UKB scale as: 0 if
the answer was “Never” or “Rarely” and 1 if the answer was “Sometimes”
or “Always”. Chronotype was collected in Sueño using the same
questionnaire as in UKB. We performed multiple imputation of the
missing chronotype in the remaining HCHS/SOL samples (N = 11,413) using
the chained equations method with linear regression on relevant
variables, specifically sex, age, BMI, and four sleep timing
questions “What time do you usually go to bed in the weekday?”, “What
time do you usually go to bed in the weekend?”, “What time do you
usually wake up in the weekday?”, and “What time do you usually wake up
in the weekend?”. We found that the post-imputation distribution of
the chronotype responses matched that observed in Sueño. We then
constructed the SHSs in HCHS/SOL using the same loadings as in UKB.
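The insomnia, sleepiness, and snoring conversions described above can be sketched in Python as follows. The cut points and category labels are taken from the text; the function names and the input validation are assumptions added for illustration.

```python
# Illustrative sketch only; function names and validation are hypothetical.

def insomnia_to_ukb(trouble_falling_asleep: int, night_wakings: int) -> str:
    """Map the sum of two WHIIRS items (each coded 1-5) to the UKB scale."""
    for item in (trouble_falling_asleep, night_wakings):
        if item not in range(1, 6):
            raise ValueError("each WHIIRS item must be coded 1-5")
    total = trouble_falling_asleep + night_wakings  # ranges from 2 to 10
    if total <= 4:
        return "Never/rarely"
    if total <= 8:
        return "Sometimes"
    return "Usually"


def sleepiness_to_ukb(ess_score: int) -> str:
    """Map an Epworth Sleepiness Scale score (0-24) to the UKB scale."""
    if not 0 <= ess_score <= 24:
        raise ValueError("ESS score must be between 0 and 24")
    if ess_score <= 10:
        return "Never/Rarely"
    if ess_score <= 14:
        return "Sometimes"
    if ess_score <= 18:
        return "Often"
    return "All the time"


def snoring_to_ukb(answer: str) -> int:
    """Map the 4-level snoring answer to the binary UKB coding."""
    mapping = {"Never": 0, "Rarely": 0, "Sometimes": 1, "Always": 1}
    return mapping[answer]
```

For example, item responses of 3 and 4 sum to 7 and map to “Sometimes”, and an ESS score of 15 maps to “Often”.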
Of the 13,268 individuals with imputed phenotype data, 11,144 had
genotype data and had consented to genetic research, and were available
for replication. Genotyping was conducted using an Illumina Omni2.5M
SNP array with additional customized content, comprising 2,536,661
SNPs, and genotypes were imputed to the TOPMed reference panel using
the TOPMed imputation server. Genetic association analysis for the 400
GWS loci was performed using a linear mixed model in the GENESIS R
package, adjusting for age, sex, study center, sampling weights, and
five genetic principal components representing ancestry, with random
effects corresponding to kinship, household, and block unit.
We constructed a polygenic risk score (PRS) for each SHS trait in the
HCHS/SOL data using Polygenic Risk Score–Continuous Shrinkage
(PRS-CS)^[256]74 with the UKB SHS genome-wide summary statistics and
the 1000 Genomes European linkage disequilibrium (LD) reference panel.
PLINK was used to compute each PRS by summing the product of the
effect allele count and the effect size across all SNPs in the PRS.
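The scoring step amounts to a weighted sum, which the following minimal Python sketch makes explicit. It is not the PLINK implementation; the function name, the dictionary-based inputs, and the choice to raise an error on missing SNPs are assumptions made for illustration.

```python
# Illustrative sketch only; not the PLINK implementation.

def polygenic_score(effect_allele_counts, effect_sizes):
    """Sum of (effect allele count x effect size) over all SNPs in the score.

    effect_allele_counts: dict mapping SNP ID to one individual's count
        (or imputed dosage) of the effect allele, between 0 and 2.
    effect_sizes: dict mapping SNP ID to its posterior effect size.
    """
    missing = [snp for snp in effect_sizes if snp not in effect_allele_counts]
    if missing:
        # Assumption: every scored SNP must be genotyped or imputed.
        raise KeyError(f"missing genotypes for: {missing}")
    return sum(effect_allele_counts[snp] * beta
               for snp, beta in effect_sizes.items())
```

With hypothetical counts of 2, 1, and 0 and effect sizes of 0.1, -0.05, and 0.5, the score is 2(0.1) + 1(-0.05) + 0(0.5) = 0.15.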
Validation of SHS PRS on clinical phenotypes in a clinical biobank
The Mass General Brigham (MGB) Biobank is a clinical biobank enriched
for disease states supplemented with genetic data from the MGB
healthcare network in Massachusetts. Since 2009, patients have been
recruited through online channels or in person from various MGB
community-based primary care facilities and specialty tertiary care
centers. Among the enrolled patients, a subset (n = 64,639) provided
blood samples for genotyping. DNA extracted from samples was genotyped
using the Infinium Global Screening Array-24 version 2.0 (Illumina).
Imputation was carried out through the Michigan Imputation server with
the Trans-Omics for Precision Medicine (TOPMed) (version r2) reference
panel, and haplotype phasing was performed using Eagle version 2.3.
Low-quality genetic markers and samples were excluded^[257]75. Pairs of
related individuals (kinship > 0.0625) were identified, and one sample
from each related pair was excluded^[258]75. To correct for
population substructure, principal components of ancestry were computed
using TRACE and the Human Genome Diversity Project^[259]76,[260]77.
Individuals of non-European ancestry were excluded to limit genetic
heterogeneity in the present analysis.
Among 47,082 adult patients included in the present analysis (mean age
= 60.4 ± 17.0 years; 53.8% female), a PRS for each SHS trait was
generated using PRS-CS^[261]74. Case ascertainment for sleep disorders
was based
on clinical phenotypes identified from ICD-9/-10 billing codes and
mapped to PheWAS codes (i.e., “phecodes”) based on clinical similarity
generated by the PheWAS R package^[262]78. A total of 13 sleep
disorders were considered in the analysis. For each disorder,
participants with at least two codes were set as cases, and those with
no relevant codes were set as controls. Associations between the PRSs
and each disorder were tested via the Wald test using logistic
regressions adjusted for age, sex, genotyping array, batch, and PCs of
ancestry. Significance was determined using a Bonferroni-adjusted P
value threshold for the total number of tests (0.05/(6 PRSs × 13
phenotypes) = 6.4e-4).
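The case definition and significance threshold can be sketched in Python as below. The text does not state how participants with exactly one relevant code were handled; the sketch assumes they were excluded from that disorder's analysis, and this assumption is marked in the code. The function name is hypothetical.

```python
from typing import Optional

# Illustrative sketch only; function name is hypothetical.
N_PRS = 6
N_DISORDERS = 13
BONFERRONI_THRESHOLD = 0.05 / (N_PRS * N_DISORDERS)


def case_status(n_relevant_codes: int) -> Optional[int]:
    """Return 1 for a case, 0 for a control, None if neither applies."""
    if n_relevant_codes < 0:
        raise ValueError("code count cannot be negative")
    if n_relevant_codes >= 2:
        return 1   # at least two codes: case
    if n_relevant_codes == 0:
        return 0   # no relevant codes: control
    return None    # ASSUMPTION: exactly one code is treated as excluded
```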
Comparison with composite sleep health scores constructed using genetic
correlations in addition to phenotypic correlations
We applied the MaxH method^[263]45 to our phenotypic and genetic data
to construct linear combinations of the self-reported sleep phenotypes
that maximize heritability of the composite, subject to an independence
constraint. From BOLT-REML analyses conducted with the regression model
corresponding to the GWAS described above, we obtained the residual
phenotypic covariance in the analytic sample for sleep duration,
insomnia, snoring, sleepiness, and chronotype, as well as the
corresponding genetic covariance. The heritability-optimized MaxH
phenotype loadings were derived from eigenvectors of the Rayleigh
quotient matrix, following the MaxH method. Calculations were done in
base R, from BOLT-REML outputs.
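For orientation, the optimization behind a maximally heritable composite can be written as a generalized eigenvalue problem. The notation below is introduced here for illustration and is an assumption: \Sigma_g denotes the genetic covariance matrix of the five sleep traits, \Sigma_p the covariate-adjusted phenotypic covariance matrix, and w a vector of loadings. The exact normalization and form of the independence constraint used by MaxH may differ.

```latex
h^2(w) = \frac{w^\top \Sigma_g w}{w^\top \Sigma_p w},
\qquad
w_1 = \arg\max_{w \neq 0} h^2(w),
\qquad
\Sigma_g w_k = \lambda_k \Sigma_p w_k,
\qquad
w_j^\top \Sigma_p w_k = 0 \quad (j \neq k).
```

Under these assumptions, each eigenvalue \lambda_k is the heritability of the k-th composite, and the last condition makes distinct composites phenotypically uncorrelated.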
Statistics and reproducibility
Statistics used are as described above. P-values calculated from the
described regression models are based on Wald tests of the appropriate
parameter, unless otherwise specified. Reproducibility was maintained
by use of versioned scripts and package management systems, such as the
R package renv.
Supplementary information
[264]Supplementary Information^ (12.6MB, pdf)
[265]42003_2025_7514_MOESM2_ESM.pdf^ (119.5KB, pdf)
Description of Additional Supplementary Materials
[266]Supplementary Data 1-27^ (2MB, xlsx)
[267]Transparent Peer Review file^ (905.9KB, pdf)
Acknowledgements