Abstract
The propensity of a trait to vary within a population may have
evolutionary, ecological, or clinical significance. In the present
study we deploy sibling models to offer a novel and unbiased way to
ascertain loci associated with the extent to which phenotypes vary
(variance-controlling quantitative trait loci, or vQTLs). Previous
methods for vQTL-mapping either exclude genetically related individuals
or treat genetic relatedness among individuals as a complicating factor
addressed by adjusting estimates for non-independence in phenotypes.
The present method uses genetic relatedness as a tool to obtain
unbiased estimates of variance effects rather than as a nuisance. The
family-based approach, which utilizes random variation between siblings
in minor allele counts at a locus, also allows controls for parental
genotype, mean effects, and non-linear (dominance) effects that may
spuriously appear to generate variation. Simulations show that the
approach performs equally well as two existing methods (squared Z-score
and DGLM) in controlling type I error rates when there is no unobserved
confounding, and performs significantly better than these methods in
the presence of small degrees of confounding. Using height and BMI as
empirical applications, we investigate SNPs that alter within-family
variation in height and BMI, as well as pathways that appear to be
enriched. One significant SNP for BMI variability, in the MAST4 gene,
replicated. Pathway analysis revealed one gene set, encoding members of
several signaling pathways related to gap junction function, which
appears significantly enriched for associations with within-family
height variation in both datasets (while not enriched in analysis of
mean levels). We recommend approximating laboratory random assignment
of genotype using family data and more careful attention to the
possible conflation of mean and variance effects.
Introduction
From effects of loci on trait means to effects of loci on trait variance
The extent to which a complex trait varies in a population is a product
of mutation, genetic drift and natural selection, as well as
environmental variation and its interaction with genotype. Moreover,
genotypes may differ in their propensities to vary. That is, particular
genotypes might be more sensitive than others to changes in the
environment or to the effects of new mutations. Sensitivity to the
environment could come in the form of phenotypic plasticity, whereby
stereotyped phenotypic changes occur under particular circumstances, or
in the form of developmental instability, whereby random fluctuations
in the internal or external environment lead to different phenotypic
outcomes ([[35]1–[36]3]).
Sensitivity to the effects of mutations is related to cryptic genetic
variation. In a population in which individuals are relatively
insensitive to the effects of mutations, allelic variation may
accumulate, while not presenting phenotypic effects. Replacing an
allele that confers less sensitivity to mutational effects with one
that confers more sensitivity (or introducing an environmental
perturbation with similar effect) will cause the accumulated variation
to have phenotypic consequences ([[37]4, [38]5]). This release of
cryptic genetic variation might have major implications for the
adaptation of populations to environmental change, as well as for the
genetics of human complex traits ([[39]6–[40]9]). Indeed, it has been
proposed that release of cryptic genetic variation might be responsible
for the increased prevalence of human “diseases of modernity” such as
diabetes ([[41]7]).
Model organisms have been used to identify genes that control
sensitivity to mutations or the environment [[42]5]; [[43]10]). Two
approaches have been taken. One approach is to test for differences in
phenotypic variance between wild-type and mutant strains. For example,
the molecular chaperone Hsp90, when impaired, reveals cryptic genetic
variation in the fly Drosophila melanogaster, the flowering plant
Arabidopsis thaliana, the fish Danio rerio and the budding yeast
Saccharomyces cerevisiae ([[44]11–[45]14]).
Evidence on whether Hsp90 controls environmental sensitivity is mixed
([[46]15–[47]17]). However, screens of the S. cerevisiae genome
identified hundreds of genes that, when mutated, cause increased
variation in cell morphology among isogenic cells raised in the same
environment. That is, these mutations increased sensitivity to
fluctuations in the internal or external environment ([[48]18,
[49]19]). A test of one of these genes, which showed a major increase
in environmental sensitivity upon deletion, revealed a high degree of
epistasis with new mutations but no net increase in mutational
sensitivity upon deletion ([[50]2, [51]20, [52]20]). A similar result
was found for impairment of Hsp90 ([[53]21]). In general, the
relationship between suppression of the effects of environmental
variation and suppression of the effects of mutational variation is
unclear ([[54]5, [55]10]).
The second approach to identifying variance-controlling genes is to use
linkage or association analysis to map natural genetic variants that
confer differential sensitivity. Such searches for variance-controlling
quantitative trait loci (vQTLs) have been conducted to identify
determinants of microenvironmental sensitivity (i.e., inherent
stochasticity or sensitivity to fluctuations within a nominally
constant environment) and/or phenotypic plasticity (variation across
controlled environments) of various phenotypes, including morphological
and life-history traits as well as expression levels of individual
genes, in a range of organisms including humans ([[56]22–[57]36]).
Natural genetic variation affecting sensitivity to other segregating
alleles has been studied as well, and indeed there are natural variants
of the Hsp90 gene that appear to reveal cryptic variation
([[58]37–[59]39]).
Existing methods for detecting vQTL
The first study claiming to map a locus that controls variance of a
human trait examined body mass index (BMI) using data from 38 cohorts
that participate in the GIANT consortium for genome-wide analysis (GWA)
of single-nucleotide polymorphisms (SNPs) that affect human height
([[60]40])(although a prior study had examined variance in order to
prioritize the search for gene-by-environment and gene-by-gene
interaction effects [[61]41]). Yang et al. computed a Z-score
(inverse-normal transformation) for BMI for each of 133,154 individuals
then performed GWA on the squared Z-score. This squared Z-score
captures the magnitude of each individual’s deviation from the mean
phenotype and therefore is meant to act as an individual-based measure
of variance. In this discovery sample they found SNPs in the FTO gene
and the RCOR1 gene that appear to control variation in human BMI. In a
replication sample of 36,327 individuals from 13 cohorts, one SNP in
the FTO gene was confirmed [[62]40].
Challenge one: Isolating loci that affect variance in a trait from loci that
affect mean of a trait
There are several challenges with association analyses that identify
vQTL. The first issue is mean-variance confounding. A given locus can
have: 1) effects on mean levels of a trait (mean effects), 2) effects
on variance in a trait (variance effects), or 3) effects on both mean
levels of a trait and variance in a trait. For isolating variance
effects, it is critical to check that any detected effects on
“variance” adequately distinguish between the three types of loci. In
particular, this conflation of variance and mean effects was a concern
in the analysis of Yang et al. ([[63]40]) because normalized variation
scores will tend to be higher for populations with higher mean levels
and because FTO is one of the most well-established genes that affects
the mean level of BMI in human populations ([[64]42–[65]50]). Research
examining MMP3 protein levels in cerebrospinal fluid found similar
overlap, where SNPs in linkage disequilibrium with a locus
well-established in predicting mean levels of the trait (rs679620 of
the MMP3 gene) were associated with both higher mean levels and higher
variance in the trait ([[66]51]).
Yang et al. addressed the effect of mean BMI by showing that there is
no global correlation between mean effects of SNPs and their variance
effects. However, it must be considered that a lack of correlation
between mean and variance effects across the genome could be caused
merely by the fact that the vast majority of SNPs have negligible or
nonexistent effects on both mean and variation. In the present study,
we show through simulation that the squared Z-score method has an
inflated type I error rate and detects variance effects for a trait
simulated to have mean effects only.
In addition to mean-variance confounding, methods to investigate vQTL
must address several other issues, some of which are shared with the
estimation of mean effects and others of which, such as mean-variance
confounding, are unique to or particularly acute in the case of
variance effects ([[67]31]). Methods for vQTL analysis beyond the
squared Z-score method can be grouped into: 1) non-parametric methods
that test whether the three genotypes at a biallelic locus
(minor-allele homozygote, heterozygote, and major-allele homozygote)
have equal or unequal variance, and 2) parametric methods that relax
the assumption in GWA linear regressions that the residual error is
identically distributed across all genotypes.
Non-parametric methods include Levene’s test, which uses a test
statistic derived from the squared difference between an individual’s
level of a trait and the genotype-level mean or median, the latter of
which is used to make the method more robust to a non-normally
distributed trait ([[68]41, [69]52]) and the Fligner-Killeen (FK) test,
which is similar to Levene’s test, in that it uses the absolute
difference between an individual’s level of a trait and the
genotype-level median, but then computes the test statistic based on
ranks of these differences. The FK test can be used either as a
standalone test for variance effects ([[70]25]), or as the test
statistic for the scale component of the Lepage or other joint
scale-location test ([[71]53]). More recently proposed non-parametric
tests consider not only the variance of the trait distribution but
other features as well, such as skew ([[72]54, [73]55]).
The main drawback of non-parametric tests for detecting vQTL’s is that
the tests, which compute differences between discrete genotype groups,
cannot directly control for important covariates, such as age, sex, and
population structure. Some adopt a two-stage regression procedure for
including covariate controls: first the trait is regressed on
covariates, then the variance test of interest is performed on the
residualized dependent variable ([[74]55, [75]56]). However, two-stage
procedures have been shown in simulations to reduce power and induce
bias ([[76]51]).
Challenge two: Controlling for unobserved confounders that can bias the
estimate on minor allele count
This problem—how to include covariates in a way that does not induce
bias—is acute because of the importance of two types of controls when
investigating variance-controlling loci: control for population
stratification and controls for nonrandom association between genotype
and environment. The former control is needed in all GWA analyses to
separate the effect of any particular locus from the effects of all
other loci shared by virtue of common ancestry. The latter control
might be particularly relevant to vQTL analyses because mean effects of
genotypes might impact the environment that is experienced, which might
in turn impact variance ([[77]57]).
Parametric approaches that use generalized linear models to jointly
estimate the mean and variance of a trait address this problem
([[78]51, [79]58, [80]59]); these models can include controls for
population stratification as well as controls for observed covariates
that influence genetic distribution into variance-affecting
environments. The double generalized linear model (DGLM) approach
begins with the typical linear model for estimating mean effects where
the residual variance is the same across genotypes, then the model is
relaxed to allow residual variance to differ by genotype and to
incorporate non-genetic covariates that might contribute to residual
variance; it iterates between estimating parameters for the mean versus
parameters for the variance until convergence ([[81]59]). DGLM thus
allows joint estimation of mean effects and variance effects,
attempting to address mean-variance confounding, and permits controls
for population stratification directly in the model. The main drawback
of DGLM is that, similar to other methods that control for confounding
by controlling for observed covariates (age; sex; the top principal
components) correlated with both the individual’s genotype and the
outcome variable, the method cannot control for unobserved confounding
that may bias the estimate of a SNP’s effect.
In particular, two types of unobserved confounders may be correlated
both with an individual’s genotype and mean or variance effects in a
trait. First is population stratification. Population stratification is
typically controlled for in these methods by inclusion of principal
components of the sample population’s genotypes among the vectors of
covariates predicting the mean and/or residual variance of a trait.
This control is especially important for traits, such as BMI, that are
expected to show considerable environment-dependence. Especially when
pooling such traits across cohorts, there is a risk that systematic
differences in environment correlate regionally with systematic
differences in genetic variation. That is, it is plausible, due to
population stratification, that any genetic signal is merely acting as
a proxy for culture and environment—a potential confounder that has
been well-illustrated by the “chopsticks gene” example ([[82]60]).
Controlling for population stratification using principal components
(PCs) addresses some confounding, but there is residual between-family
confounding even with these controls. This residual confounding can
occur when environmentally influential factors are not randomly
distributed across families but also do not correlate with the
eigenvectors in the genetic matrix.
The other critical covariate that might bias vQTL estimates is
genotype-environment correlation (rGE). Genotype-environment
correlations may be caused by niche construction, whereby individual
organisms shape the environment (in a genotype-dependent way)
([[83]61–[84]65])–dynamics we might expect to occur for phenotypes that
have significant behavioral and environmental etiologies, such as BMI.
As a result, genotypes may be associated with variance in BMI and other
traits not through direct genetic effects but through interaction with
alternative environments associated with variance such as more versus
less sedentary lifestyles. An analogous situation that illustrates this
potentially confounding genotype-by-environment interaction effect is
that of caffeine consumption. A variant in a gene that encodes a
caffeine-metabolizing enzyme can lead to greater variation with no
effect on the mean through a mechanism of niche construction—i.e.
individuals with the minor allele avoid coffee altogether, or if they
are unable to do so, they end up drinking more than those with the
major allele, thus leading to greater variance thanks to the coffee
“environment” ([[85]57]).
Approaches such as DGLM can control for observed covariates that are
correlated with genotype and influence construction of
variance-affecting environments. However, these methods cannot control
for unobserved differences that produce these correlations. The present
paper uses a family-based model to control for these unobserved
differences. Two existing family-based models allow for investigations
of variance-based loci in samples among related as opposed to unrelated
individuals, but do not leverage the family-based structure of the data
to control for unobserved confounders that vary between families
([[86]58]; [[87]66]). First is a family-based version of the likelihood
ratio test that adds a random effect meant to capture familial
correlation in a trait. Although the family-level random effect helps
control for unobserved variation between families that may influence
variance in a trait through pathways other than genotype, the model
relies on the strong assumption of independence between these
unobserved features of family and the observed covariates. We show via
simulation (see [88]Results) that when there is non-zero correlation
between unobserved features of a family and observed covariates, random
effects approaches generate biased estimates of a SNPs’ effects,
confirming results shown in non-genetics contexts ([[89]67]).
Similarly, DGLM in a sample containing monozygotic and dizygotic twins,
which has the advantage of isolating non-genetic from genetic sources
of variance, does not control for unobserved features that vary between
families and that affect construction of variance-affecting
environments ([[90]66]).
Proposed solution: The sibling standard deviation method
We offer an alternative methodology—comparisons of variation within
sibling sets while controlling for parental genotype—that does not
assume independence between observed covariates and unobserved
between-family differences. As a result, the method better approximates
random assignment of genotype in a laboratory. Utilizing a
regression-based framework, the approach retains the advantages of the
DGLM and Bayesian regression approaches: the ability to include
covariates and control for mean effects when estimating variance
effects through estimation of parameters capturing both. The model uses
sibling pairs as the unit of analysis and regresses the standard
deviation of the sibling pair’s trait on the pair’s count of minor
alleles with sibling pair-level controls that include controls for the
mean level of the trait in the sibling pair, parental genotype, pair
sex (MM or FM or FF), mean pair age, and the within-pair age difference
(for the full model specification, as well as alternative
specifications tested, see [91]Methods). The control for the mean level
of the trait in a sibling pair avoids inflated Type I error rates for
SNPs that affect the mean of a trait but not the trait’s variance.
The proposed methodology, although restricted in applicability to
datasets that have a family-based design with at least two offspring,
makes a trade-off. The method has reduced statistical power because the
sample size is halved when we treat siblings rather than individuals as
the unit of analysis. However, the advantage is an estimate of a minor
allele’s contribution to variance that, in the presence of unobserved
confounders, correctly fails to find variance effects when a locus only
has mean effects.
This is a particularly important trade-off to make when we wish to rule
out gene-environment correlations across populations as well as
population stratification as alternative explanations to variance-locus
associations. As noted above, this is especially critical for a
phenotype such as BMI and a locus such as FTO given that environment
and behavior (such as sedentariness) alter the FTO-BMI relationship and
may vary significantly across cohorts/societies ([[92]68]). In cases
where the goal is not to study control of variance per se, but instead
is to probe the existence of gene-environment or gene-gene interactions
in a way that avoids the high-dimensional parameter space problems of
traditional approaches, the use of vQTL approaches that have higher
statistical power but also a higher rate of false positives for SNPs
that affect the mean of a trait but not the variance might be
warranted.
In addition to power, another important feature of an approach to
detect vQTL is the flexibility to capture non-linear effects of
alleles, which the DGLM, the parametric bootstrap-based likelihood
ratio test, and Bayesian regressions allow for by allowing genotypes to
be specified using three indicator variables to capture
non-linearities. The present family-based approach is potentially
susceptible to confounding of variance effects by non-linear effects of
alleles, because the association mapping is done on the sibship unit so
the genotype is represented as the total number of major or minor
alleles of each sib pair. Therefore, if dominance were at play with
respect to mean levels, then this might generate spurious effects on
variance ([[93]69]). That is, if among heterozygotes there was no
effect of an allele on mean levels but among homozygotes there was,
then this itself would generate apparent, but spurious, effects on
variance even when controlling for linear mean effects. However, by
comparing subgroups among sibship pairs that have two minor alleles
(out of four possible in total), we are able to check for this
possibility. Specifically, by comparing those 2-minor allele pairs
where both siblings are heterozygotes (i.e. each individual has one
minor allele) with those where both are homozygotes (where one sibling
has zero minor alleles and the other has two), we can rule out this
possible statistical artifact of non-linear effects on mean levels.
Although the possibility of non-linearities that do not reflect true
variance effects can never be totally eliminated ([[94]69]), this
approach guards against a primary form of non-linearity—dominance.
Preview of the results
Below we report two sets of analyses. In simulations, we show how
unobserved confounding can bias estimates of a SNP’s effect. Two
approaches to estimating variance effects—the squared Z-score method
and DGLM—display inflated type I error rates in the presence of this
confounding. In contrast, the sibling standard deviation approach that
we propose detects variance effects when these effects are present but,
by controlling for the mean of the trait across siblings, correctly
fails to find variance effects when only mean effects are present.
In an empirical application, we then use the sibling SD approach to
perform genome-wide analyses of variance effects on two phenotypes:
height and BMI. We replicate one genome-wide statistically suggestive
hit for variation in BMI from the Framingham Heart Study (FHS) data,
our discovery sample, in our replication sample, the Minnesota Twin
Family Study (MTFS). We also test whether our potential
variance-related alleles are merely reflecting dominance effects among
heterozygotes; we find no evidence for this. Finally, we perform
gene-based and pathway enrichment analysis. We find one pathway,
related to gap junction function, that is significantly enriched in
both our discovery and replication samples for associations with
variance in height. We then discuss the implications of our findings
for prior and future research.
Results
Simulations
The simulations are divided into two parts. First, we show that while
we can address unobserved confounding when estimating the mean of a
trait using a fixed effects estimator that identifies the effect of an
allele off of between-sibling variation, combining this approach with
two current approaches to variance detection—the squared Z-score method
and DGLM—fails to correct for this bias. This introduces the challenge:
how can we estimate the effect of an allele on trait variance in the
presence of unobserved confounding? Part two evaluates the sibling SD
method as a solution.
Part one: Two approaches to variance estimation in the absence versus
presence of an unobserved confounder
When estimating mean effects, we can remove bias in the estimate of the
effect of a minor allele caused by unobserved confounding by using a
fixed effects estimator that demeans the outcome, genotype, and other
observed covariates by the mean within a family ([[95]67]). Examining
the trait simulated to have neither mean nor variance effects, Fig A in
[96]S1 File shows that in the presence of any unobserved confounding,
other estimators (a random effects model that assumes zero correlation
between unobserved confounders and observed covariates; a pooled
regression model with one randomly sampled sibling) return biased
estimates of the effect of an additional minor allele on the trait’s
mean when there is unobserved confounding. More specifically, it shows
that across the 1000 replicates, we see upward bias in the random
effects and pooled regression estimators when we move from the case of
no confounding or a very low amount of confounding ρ = 0.01 to the case
of some confounding (ρ = 0.05 and ρ = 0.1). Table A in [97]S1 File,
which presents the results of a Hausman test comparing the null
hypothesis that the estimated random effects coefficients are equal to
the estimated fixed effects coefficients, rejects the null at levels of
confounding greater than ρ = 0.01.
These results show how a fixed effects estimator that identifies the
effect from sibling deviations from a family’s mean count of alleles
recovers an unbiased estimate. How can we translate these findings from
the case of investigating the effect of an additional minor allele on
the trait’s mean (QTL) to investigating the effect of an additional
minor allele on the trait’s variance (vQTL)?
One approach is to use existing methods for variance detection on a
transformed version of the data. In particular, the fixed effects
estimator is generated by demeaning the outcome, genotype, and other
covariates by the mean across the grouping unit (family in this case),
as represented as follows, where i indexes an individual, j indexes a
family, k indexes a SNP, and X represents genotype and other
covariates. α represents an intercept specific to each family, β
represents coefficients on covariates that vary across individuals,
families, and variants (e.g., minor allele count), and ϵ represents an
error term. The first equation shows the demeaning of each of the terms
and the second equation shows that this demeaning removes the
family-level intercept from the regression.
[MATH:
yijk=(αj-<
msub>α¯j)
mrow>+βk(<
mi>Xijk-X¯jk)+(ϵ<
mrow>ijk-ϵ¯jk)
:MATH]
[MATH:
yijk=βk(X
ijk-
X¯jk)+(ϵ<
mrow>ijk-ϵ¯jk)
:MATH]
We can represent this demeaned version of the data more compactly as
follows:
[MATH:
yijk=βkX¨ijk+ϵ¨ijk :MATH]
(1)
One approach to translating the FE estimator to the vQTL context is to
use the transformed version of the data represented in [98]Eq 1 with
two existing methods for vQTL detection: the squared Z-score method
([[99]40]) and the DGLM ([[100]59]). We explore the properties of each
approach in the sections that follow on three versions of the dependent
variable:
1. Dependent variable simulated to have neither mean nor variance
effects: the dependent variable is simulated to have no
relationship between the count of minor alleles and either the mean
or variance of the outcome (see [101]Methods)
2. Dependent variable simulated to have mean effects only: the
dependent variable is simulated to have a positive relationship
between the count of minor alleles and the mean of the outcome (but
not the outcome’s variance) (see [102]Methods)
3. Dependent variable simulated to have variance effects only: the
dependent variable is simulated to have a positive relationship
between the count of minor alleles and the variance of the outcome
(but not the outcome’s mean) (see [103]Methods)
4. Dependent variable simulated to have mean and variance effects: the
dependent variable is simulated to have a positive relationship
between the count of minor alleles and both the mean and the
variance of the outcome (see [104]Methods)
Squared Z-score results
Table B in [105]S1 File presents the results of regressing the squared
Z-score (estimated separately by sex) on the minor allele count with
controls for sex, age, and ancestry across the 1000 replicates. The
table summarizes the percentage of replicates where the coefficient on
the minor allele count ≠ 0 at p < 0.05. We want this percentage to be
low when the dependent variable is simulated to have no variance
effects: so either null effects or mean effects only in order to
control for type I error. We want this percentage to be high when the
dependent variable is simulated to have variance effects: so either
variance effects only or mean and variance effects. The results shows
that in the absence of an unobserved confounder, we see moderately
inflated type I error rates (estimating variance effects when only mean
effects are present in 5.9% of simulations). In the presence of an
unobserved confounder, we see more inflated type I error rates: 8% of
simulations estimate variance effects when only mean effects are
present. Table C in [106]S1 File shows that demeaning the data does not
address these inflated type I error rates.
DGLM results
The squared Z-score method generates one coefficient of interest that
represents an allele’s contribution to variability in the form of
increasing an individual’s squared Z-score. The DGLM method, which
estimates a linear regression for the trait mean and a gamma regression
from the squared residuals from the first model, generates two
coefficients of interest: (β refers to coefficients on mean effects; γ
refers to coefficients on variance effects). [107]Table 1 summarizes
the true β and γ for different types of simulated traits:
Table 1. Outcomes versus coefficients.
Simulated dependent variable Coefficient on minor allele count from
DGLM
Neither mean nor variance effects β = 0; γ = 0
Mean effects only β ≠ 0; γ = 0
Variance effects only β = 0; γ ≠ 0
[108]Open in a new tab
Table D and Table E in [109]S1 File look at these three cases. For the
trait with null effects, DGLM should estimate β = 0 for mean effects
and γ = 0 for variance effects. For the trait with mean effects, DGLM
should estimate β ≠ 0 and γ = 0. For the trait with variance effects,
DGLM should estimate β = 0 and γ ≠ 0. Therefore, the table summarizes
the percentage of β ≠ 0 and γ ≠ 0 at the p < 0.05 level.
The results show that in the presence of between-family confounding
between the genotype and outcome variable, the method has two types of
type I error. First, when the outcome has neither mean nor variance
effects and there is confounding, the method detects mean effects when
none are present in over 77% of simulations. Second, when the outcome
only has variance effects and there is confounding, the method detects
mean effects when none are present in over 70% of simulations.
Transforming the data via de-meaning reduces these type I error rates
but fails to detect variance effects when these are present (type II
error).
Problems part one reveals
The previous section reveals a problem when choosing a method for
detecting the effects of an additional minor allele on the mean or
variance of a trait in the presence of unobserved confounding between
an individual’s genotype and the trait. In particular, while the fixed
effects estimator provides an unbiased estimate of the effect of minor
alleles on the mean of a trait, de-meaning the data and then trying to
estimate variance effects using the squared Z-score or DGLM approach
leads to inflated rates of type I error when the loci has null or mean
effects but no variance effects. In addition, the DGLM estimated on
demeaned data fails to detect variance effects when these effects are
present.
These problems point to the need for a method that corrects for bias
caused by unobserved confounding but that is also able to detect
effects of minor alleles on a trait’s variance when these effects are
present. The next section investigates properties of the sibling
standard deviation method as a proposed solution.
Part two: Properties of the sibling standard deviation method
To examine variance effects, we estimate the sibling standard deviation
method (see [110]Methods). Before moving to results across the 1000
replicates, [111]Fig 1, which presents the sibling count of minor
alleles and non-adjusted mean sibling standard deviation for pairs with
that count for one randomly chosen replicate, highlights that the
standard deviation of the trait increases (y axis) as we see an
increase in the sibling count of minor alleles (x axis) for the 1.
trait simulated to have variance effects only and 2. trait simulated to
have both mean and variance effects. In contrast, the standard
deviation appropriately does not increase for the 1. trait simulated to
have mean effects only and 2. the trait simulated to have neither mean
nor variance effects.
Fig 1. Raw means of sibling standard deviations (before age, sex, mean of
trait, and parental genotype controls) by count of minor alleles.
[112]Fig 1
[113]Open in a new tab
The graph shows that the sibling standard deviation increases with the
count of minor allele snps that have effects on variance only, or
effects on both the mean and variance of a trait, while stays flat for
snp’s that only effect the mean or that not associated with the trait.
[114]Fig 2 shows that the results from the [115]Fig 1 generalize across
the replicates. The figure shows the distribution of γ on sibling minor
allele count from regressing the sibling standard deviation of a trait
on this count for two traits: the trait simulated to have mean effects
only and the trait simulated to have variance effects only. The figure
shows the distribution is properly centered around zero for the trait
with mean effects and properly centered around γ ≠ 0 for the trait with
variance effects.
Fig 2. Results of sibling standard deviation method across 1000 replicates.
[116]Fig 2
[117]Open in a new tab
The figure shows that both in the presence and absence of family-level
confounding between the genotype and outcome variable, the method,
which examines the effect of an additional minor allele in the sibling
pair on the trait’s standard deviation, correctly estimates no variance
effects (β = 0) when the outcome is simulated to have mean effects
only, and correctly detects variance effects (β ≠ 0) when the outcome
is simulated to have variance effects only.
Table F in [118]S1 File summarizes the percentage of simulations that
reject the null of γ on minor allele count equaling zero. The null
hypothesis in this case, rather than a specific value for the γ on
minor allele count, follows other methods in posing the null of whether
the coefficient is equal to zero [[119]51]. In the presence of no
unobserved confounding, the sibling SD approach has the expected type I
error rate of correctly failing to detect between 95.4 and 95.8% of
variance effects when only mean effects are present depending on the
specification, and correctly failing to detect between 94.6 and 95.7%
of variance effects when neither mean nor variance effects are present.
The squared Z-score method correctly fails to detect 94.1% of variance
effects when only mean effects are present, indicating that the sibling
SD method performs equally well as that existing method in the no
confounding case.
The lower type I error rate of the sibling SD method comes with
comparable type II error rates as other methods when we do not control
for parental genotype: the sibling SD method detects variance effects
when these are present in 81.6% of cases, while the squared Z-score
method detects variance effects when these are present in roughly 87%
of cases and the DGLM method detects variance effects in 75.3% of
cases. When we control for parental genotype in the sibling SD method
as the most stringent form of control on confounding, the method has a
higher type II error rate than the squared Z-score and DGLM. At the
sample size used in the simulation (1000 sibling pairs), the method is
under-powered to detect variance effects when we control for parental
genotype, detecting these effects in roughly 40% of cases. Later, we
discuss larger family-based study designs that are large enough to
control for parent genotype and still have sufficient power to detect
variance effects.
In the presence of some unobserved confounding, the sibling SD method
greatly outperforms the other approaches in controlling type I error
rates. While DGLM and the squared Z-score method detect mean effects
when only variance effects are present in 8-17% of simulations
depending on whether ancestry controls or included and whether the data
are demeaned, the sibling SD method detects mean effects when only
variance effects are present at rates of between 4.2% and 5.6% of cases
depending on the specification. In addition, the method correctly
detects variance effects when these effects are present in 80.8% (no
parent genotype control), with the power dropping to 44.1% (parent
genotype control) at the simulation sample size of 1000 sibling pairs.
Empirical application to height and BMI
Sibling SD results: Height and BMI
The simulation results show that in the absence of an unobserved
confounder, the sibling SD method performs equally well as existing
approaches to variance detection (squared Z-score; DGLM); in the
presence of a small degree of unobserved confounding (ρ = 0.1 between a
family indicator and offspring genotype), the sibling SD method
performs significantly better than these two approaches in correctly
failing to detect variance effects when only mean effects are present.
We now apply this method to two phenotypes: height and BMI. When we use
data from quartets in the FHS and control for mean sibling-pair height,
sex, mean pair age, within-pair age difference, and parental genotype
in genome-wide regressions on the standard deviation of the sibling
pair height we find four SNPs that are genome-wide suggestively
significant (p<10-5) and meet other Hardy-Weinberg equilibrium (HWE)
and minor-allele frequency (MAF) controls (see [120]Methods): rs2804263
(MAF 30.8%); rs2073302 (MAF 39.1%); rs8126205 (MAF 37.1%) and rs4834078
(MAF 24.0%) ([121]Fig 3). For BMI, there were two SNPs that meet
genome-wise suggestive significance: rs30731 (MAF 48.7%) and rs41508049
(MAF 10.3%)([122]Fig 4, with a regional linkage map of the SNP that
replicates in Fig B in [123]S1 File). Our method, as expected, controls
well for population structure: QQ plots for the height and BMI p-values
do not show the telltale “early liftoff” typical of failure to control
this confounder (Fig C in [124]S1 File).
Fig 3. Manhattan plot of sibling variation in height among FHS 3rd generation
sibling pairs.
[125]Fig 3
[126]Open in a new tab
Results for the pairwise sibling standard deviation in height regressed
against the sibling-pair minor count of alleles with controls for sex
of sibship, mean age of siblings, age difference of siblings, sibling
mean height, parental genotype.
Fig 4. Manhattan plot of sibling variation in BMI among FHS 3rd generation
sibling pairs.
[127]Fig 4
[128]Open in a new tab
Results for the pairwise sibling standard deviation in BMI regressed
against the sibling minor allele count with controls for sex of
sibship, mean age of siblings, age difference of siblings, sibling mean
BMI, and parental genotype.
The absence of genome-wide significant SNPs but presence of genome-wide
suggestive SNPs may indicate that the method’s statistical power is
low. A power analysis for the sample size utilized in the discovery
dataset indeed supports this suggestion. For example, a single SNP
would need to explain over 4.8% of the variation in the trait for the
study to achieve 80% statistical power at the FHS sample size and a
significance threshold of p < 10^−5 (Fig D in [129]S1 File). An effect
size of this magnitude is not expected for human complex traits (e.g.,
the largest effect of a single SNP for mean height explains
approximately 1% of the variation). Power is very low near R^2 = 0.01
(Fig D in [130]S1 File) in this particular sample. However, the figure
shows how newly-released samples are adequately powered to detect the
effects and highlight the method’s potential utility for better-powered
studies.
Apparent effects on variance can be generated if effects of alleles at
a locus are not additive (i.e., there is dominance) or if means and
variances are correlated. Addressing the first issue, our approach does
not control for non-additivity of allele effects at a locus, as it
assumes a linear model. However, it does allow a test of whether an
effect on variation net of mean was actually an artifact of non-linear
effects on average rather than an actual variance effect. If the true
relationship between phenotype and a sibling’s minor allele dosage were
non-linear (i.e. revealed dominance effects) our initial findings could
be entirely driven by divergence among those sibling pairs with two
minor alleles. For example, if an individual with two minor alleles
were significantly taller than an individual with either one or zero
minor alleles (recessive effect) then when we collapsed the sibling
pairs with two minor alleles, we could generate artifactual variation
effects because among those sibships with two minor alleles, some would
be distributed 0-2 (and thus one sibling would be taller than the
other) while other sibships would be 1-1 (and thus would be the same
height). Put together, it would appear that two minor alleles increased
the variation net of mean effects. And if strong enough, such a
misspecified effect could exert enough leverage to make a linear effect
on variation appear across all allele numbers (zero to four for the
sibship). These concerns appear not to apply to our analysis. A
two-sample t-test of equality of means that compares homozygotes and
heterozygotes for each significant SNP on the respective trait finds no
differences in levels for either trait at the p < 0.05. [131]Fig 5,
which presents the mean and standard errors, shows the lack of
significant differences.
Fig 5. Test for spurious association with variance due to non-linear effects
on mean levels.
[132]Fig 5
[133]Open in a new tab
Mean and standard error for height (inches) and BMI among with two
minor alleles is shown separately for homozygotes (one sibling with
zero minor alleles and the other sibling with two) and heterozygotes
(each sibling has one minor allele), for each genome-wide suggestively
significant SNP for the respective trait (A. height; B. BMI). One
significant SNP for height (rs8029740) is not depicted because there is
only one sibling pair with the 1-1 allele combination and 0 sibling
pairs with the 0-2 combination. A two-sample t-test for equality of
means, estimated separately for each SNP, revealed no significant
differences between the two groups for the top hits for each trait.
For the second issue, we can investigate correlated mean and variance
effects in the present data. Others have shown this correlation depends
on the minor allele frequency of the SNP and the effect size (the
effect of the SNP on z^2 is (1 − 2p)a^2, where p represents the minor
allele frequency and a represents the additive effect [[134]40]). To
test these in an independent sample, the top SNP for variance in BMI
found by Yang et al. ([[135]40]) shows clear association with mean BMI
in the same data set from the GIANT consortium (Fig E in [136]S1 File),
whereas the four suggestively significant SNPs in our analysis do not
show an association with mean height in the GIANT consortium data (Fig
F in [137]S1 File). Further highlighting the importance of our
regression-based control for sibling-pair mean are the observations
that: 1) sibling-pair standard deviation shows a strong positive
correlation with sibling-pair mean, and 2) this correlation is not
eliminated by using the sibling-pair coefficient of variation rather
than the standard deviation (Fig G in [138]S1 File).
For replication analysis, we used respondents from the MTFS. (For list
of proxy MTFS SNPs with information on MAF and linkage with FHS SNPs
see Table G in [139]S1 File) These families included phenotypic and
genotypic information on pairs of twins as well as their parents,
allowing us to replicate the sibling-based analysis with parental
genetic controls so as to mimic random assignment of alleles. The MTFS
has both dizygotic (DZ) and monozygotic (MZ) twins. Because MZ sibships
do not vary in terms of cryptic genetic variation and may experience
much more similar environments to each other than do genetically
distinct siblings, we also repeated our replication analysis only with
DZ twin sets but found that exclusion of MZ twins did not affect
results. Another concern is that twins (even DZ twins) may experience
more similar environments than singleton siblings; thus, our
replication analysis may suffer from attenuation bias to the extent
that the cause of variation is environmental and not cryptic genetic
differences (which should, by contrast, be equivalent for singleton
full siblings and DZ twins). Among the SNPs that were genome-wide
suggestively significant for the height analysis, only two of the four
had viable proxy SNPs in the MTFS dataset after quality control:
rs2804263 and rs4834078 both had proxies whereas rs2073302 and
rs8126205 did not (Table G in [140]S1 File). When we ran the analysis
for the proxy SNPs in the MTFS dataset, none achieved statistical
significance. Among the SNPs that were genome-wide suggestively
significant for the BMI analysis, both SNPs had viable proxy SNPs in
the MTFS dataset after quality control (Table G in [141]S1 File). When
we ran the analysis for the proxy SNPs in the MTFS dataset, one SNP
achieved statistical significance: rs30731 (proxy in MTFS: rs28636).
In addition to checking whether the significant SNPs from the FHS
analysis replicate in the MTFS sample, we also 1) took the top hits
from the FHS sample, 2) found proxy SNPs in Yang et al., and 3)
investigated the p-values for these top hits with the squared Z-score
in their analysis. None of the top hits from our study significantly
predicted the squared Z-score in their study. There could be multiple
reasons for this lack of overlap. First, the Yang et al. hits could be
false positives due to failing to control for unobserved confounding.
Alternately, the present analysis may have had false negatives that
miss top hits from Yang et al. due to the lack of statistical power. We
cannot distinguish between these two observations in the present
analysis.
Investigating rs30731/rs28636 for within-sibling variation in BMI
rs30731/rs28636, a SNP that significantly affects within-sibling
variation in BMI and that replicated in the MTFS sample, is located on
the MAST4 gene, which encodes a member of the microtubule-associated
serine/threonine protein kinases. The proteins in this family contain a
domain that gives the kinase the ability to determine its own scaffold
to control the effects of their kinase activities.
GWAS studies have uncovered several significant associations between
other SNPs on this gene and traits ranging from BMI to autism/PDD-NOS
(Table H in [142]S1 File). One exception is a GWAS of childhood obesity
conducted by the Early Growth Genetics Consortium ([[143]70]). The
study found a genome-wide suggestively significant hit for rs28636 in
the discovery sample that did not replicate. This non-replication could
be due to the SNP being a variance-affecting locus that might show up
in estimation of mean effects.
Pathway and gene set analyses for all significant SNPs
In addition to investigating the gene function for the replicated SNP,
we also performed two analyses that pool SNPs: gene-based and
pathway-based pooling (see [144]Methods). For the gene analysis, we
found using PASCAL that the gene on which the significant SNP for BMI
variability was located (MAST4) was significantly enriched (p = 0.0015
in the FHS data; p = 0.1 in the replication data). Table I in [145]S1
File shows other significant gene sets identified using PASCAL that
replicated at various p-value thresholds in the MTFS data. VEGAS1
yielded no further significant gene sets.
i-GSEA4GWAS and PASCAL for pathway analysis each yielded some pathways
that appeared to be significantly enriched ([146]Table 2). One of these
pathways, associated with within-sibship variance in height in the FHS
data, replicated in the MTFS data: HSA04540 Gap Junction (p = 0.002 for
each data set)(Fig H in [147]S1 File). HSA04540 includes members of
several signaling pathways, including growth factors and their
receptors, although any connection between these factors and organismal
growth, as manifested in ultimate height, remains to be determined.
Importantly, this pathway was not significant in the GWA for mean
levels effects in either dataset. Table J in [148]S1 File shows
replicated pathways for BMI and height variability estimated using
PASCAL. Important to note, however, is that while HSA-04540 was
significant across datasets using the i-GSEA4GWAS tool, the PASCAL tool
does not test for this pathway so we could not check its robustness
across multiple pathway analysis softwares.
Table 2. Enriched canonical pathways for height and BMI sibling-pair standard
deviations in FHS, estimated using i-GSEA4GWAS.
Pathway/Gene set name P-value FDR Sign. Genes/Slctd. Genes/All Genes
HEIGHT
GLYCEROLIPID METABOLISM < 0.001 6.0000000000000001E-3 20/35/45
HSA00561 GLYCEROLIPID METABOLISM < 0.001 8.9999999999999993E-3 27/49/58
VEGFPATHWAY < 0.001 1.5666665999999999E-2 47046
HSA03030 DNA POLYMERASE 1E-3 2.76E-2 45920
BILE ACID BIOSYNTHESIS 3.0000000000000001E-3 3.175E-2 46712
HSA04540 GAP JUNCTIONM[149]^* 2E-3 4.1666667999999997E-2 42/78/98
HSA00564 GLYCEROPHOSPHOLIPID METABOLISM 5.0000000000000001E-3
0.24111110999999999 20/49/68
BMI
HSA00591 LINOLEIC ACID METABOLISM < 0.001 6.0000000000000001E-3 11652
[150]Open in a new tab
*Replicates in MTFS data
Discussion
Our analysis extends earlier work that aimed to map
variance-controlling loci in humans ([[151]40]). Although the prior
work enjoyed greater statistical power, it also had more potential for
bias—due both to environmental confounding and to conflation of mean
and variance effects. Indeed, Yang et al. identified a locus regulating
BMI variability that is also strongly associated with mean levels and
for which a gene-by-environment interaction effect on mean has been
shown. In the present study, we first show via simulation that two
widely-used methods for detecting variance-effecting loci—the squared
Z-score method and DGLM—fail to adequately distinguish between a locus
affecting the trait mean and a locus affecting the trait variance in
the presence of unobserved confounding. The sibling SD method, by
controlling for the sibling mean of a trait and identifying off of
random, between-sibling variation in allele counts, does distinguish
between these effects. Applying the method to data, we were able to
perform within-family analysis on two samples of white Americans,
completely free of population stratification, largely devoid of rGE
confounding, and with controls for mean level effects as well as checks
for non-linear (i.e. dominance) effects on mean levels. One SNP for BMI
variability, located on the MAST4 gene, replicated. Notably, the SNP,
unlike many on FTO, has not been found to affect mean levels of BMI.
Like the latest methods to map variance-controlling loci in controlled
crosses ([[152]59]), our approach therefore avoids common confounds. At
the same time it overcomes problems specifically associated with human
traits, including the construction of variance-affecting environments,
that existing regression-based methods for detecting vQTLs fail to
address because they allow controls solely for observed confounders
(e.g., [[153]51, [154]58, [155]59, [156]71]).
Though underpowered in the FHS and MTFS sample sizes used in the
present analysis (Fig D in [157]S1 File), our results strongly support
the benefits of approximating a randomized genetic experiment by
analyzing within-family variation while controlling for parental
genotype. Such an analysis addresses the possibility that it is merely
cross-family environments interacting with a mean effect and/or
population structure that produce apparent association with
variability. Meanwhile, parameterizing the estimand as spread (SD) net
of sibship mean levels provides a robust, flexible way to conceive of
variation—that is, rather than parameterizing the relationship between
mean and variance a priori by using the coefficient of variation or
some similar summary statistic. The trade-off inherent to our approach
is that environmental and phenotypic variation within sibships may be
attenuated, reducing statistical leverage; the extent of such a dynamic
is wholly dependent on phenotype, of course. Although the within and
between family components of the variation in the phenotype can be
measured to determine whether or not the phenotype is suitable for such
an approach, the extent of variation within and between sibships in the
unmeasured environmental factors that matter is, of course, unknown.
Moving forward, we see three applications of the method: (1) combining
the method with twin studies to better distinguish between GxG versus
GxE effects that each contribute to trait variability; (2) examining
the heritability of plasticity in a trait as a supplement to examining
the heritability of levels of a trait; and (3) generating weights for
polygenic scores to predict trait variance. We discuss each in turn.
Variability across MZ versus DZ twins
First, we can combine the present within-family approach to measuring
phenotypic variability with the classic twin comparison approach of
behavior genetics, we obtain a method to distinguish between GxE and
GxG interaction effects that may be revealed in a vGWAS for variation
regulating loci. Namely, if a particular allele produces more
variability among dizygotic twins than among monozygotic ones, we can
infer that the difference between those allelic effects is attributable
to two forces: 1. Putatively greater environmental differences within
DZ twin pairs than within MZ twin-ships; and/or 2. The greater
(cryptic) genetic variation within DZ pairs as compared to their MZ
counterparts. Since prior work ([[158]72]) shows that the equal
environments assumption seems to hold for a wide range of outcomes,
thus weakening support for 1 as the explanation, we can attribute the
bulk of the difference in trait variation between DZ versus MZ twins to
the theory that the allele in question is not only buffering the
environment but also serving as a phenotypic capacitor (i.e. repressing
cryptic genetic variation).
Estimating the heritability of plasticity
Another way to combine the present approach with classical statistical
genetic techniques is to supplement estimates of the heritability of
levels of a trait with estimates of the heritability of plasticity in a
trait. For instance, GREML involves partitioning the observed
phenotypic distance in a trait between individuals into the sum of
genetic and environmental contributors to this distance. If we switch
from individuals as the unit of analysis when measuring this distance
to sibling pairs as the unit of analysis, we can calculate the
phenotypic distance between siblings in each pair and the average
genotype at each locus across the two siblings. The, we can place
unrelated sibling pairs in a Genetic Relatedness Matrix and contrast
the genetic distance between sibling pairs to the amount of variability
for the phenotype the pairs display to recover estimates of the
heritability of this variability.
Constructing vPGS
Currently, researchers develop and use polygenic scores (PGS) that
predict mean levels of a trait. We can extend the polygenic score
approach to develop scores that predict variance in a trait (vPGS).
Coefficients for a vPGS construction in a prediction sample would be
obtained from a vGWAS done within families with sibling sets in the
discovery sample to obtain estimates that better distinguish between
mean and variance effects, but the application of coefficients could
then be to the individual person. Fig D in [159]S1 File shows that the
method is adequately powered to detect effects of SNPs that explain
fewer than 1% of the variation in traits in samples like the UK Biobank
that could be used at this discovery stage.
Having a polygenic risk score that predicts particular forms of
phenotypic variability may be helpful for researchers hoping for
non-null results with respect to a given phenotypic measure who are
therefore looking to recruit sensitive subjects for experimentation
that involves specific environmental exposures. Per the earlier
discussion, if calculated from pairs of MZ twins, such a polygenic
score would capture only environmental sensitivity. But if a vGWAS of
MZ twins and DZ twins were conducted, the results could be differenced
out to provide a measure of phenotypic capacitance—i.e. regulation of
internal, genetic variation.
Beyond telling experimenters which subjects may be more genetically
sensitive, such a phenotypic capacitance score may have important
predictive power in terms of disease. Namely, cancer, autoimmune
diseases, metabolic syndrome and other irregularities of cell or system
stability may themselves be predicted by a genetic architecture that is
less robust. Thus, a genetic screening for a tendency toward
developmental plasticity (i.e. if the plasticity score was calculated
on developmental indicators such as height) may be diagnostic. If
applied to behavioral phenotypes, such a score could be predictive of
mental disorders that reflect a lack of canalization of mind, so to
speak, such as schizophrenia.
In light of this discussion, we think that there is benefit to
combining prior, pedigree-based approaches with newer GWAS methods to
better estimate variance effects (as well as levels effects). Thus, we
recommend that consortia of cohorts with genome-wide data on sibling
pairs at the minimum, quartets ideally, be formed to advance GWA to a
more solid foundation of inference that approximates the unbiased
estimates of lab-based genetic manipulations by taking advantage of
random differences in sibling genotypes.
Materials and methods
Simulation study
Generating genotype and trait data
The simulation proceeds in four steps. First, we generate genotypes for
parents and offspring. Second, we generate an unobserved family-level
confounder that is correlated to varying degrees with the observed
sibling genotype. Third, we use the genotype to generate four traits:
1. Trait with neither mean nor variance effects
2. Trait with mean effects but not variance effects
3. Trait with variance effects but not mean effects
4. Trait with both mean and variance effects
In this third step, we generate four versions of each of the four
traits: 1) a version that is not affected by between-family
confounding; and three versions affected by varying degrees of
between-family confounding. Fourth, we explore how the process in steps
one through three generates between-family confounding that is not
fully addressed by controls for the subpopulation that generates the
genotype. Finally, we compare the performance of the sibling standard
deviation approach to other methods. We describe the first four steps
in the present section, and summarize the results of step five in the
results.
All steps were repeated for 1000 replicates of size 2000 (1000 sibling
pairs with 2 offspring each, generated from 4 ancestral
sub-populations).
Step one: Generating genotypes
We use the following process to generate genotypes for parents and
offspring, repeated separately for each of the 1000 replicates:
1. Generate parent genotypes: we use the function simMD taken from the
source code for R’s popgen package to generate parent genotypes,
and use the following parameters in the present simulation:
+ N = 2000 parents, N = 1000 families with 2 offspring per
family
+ 4 subpopulations, with c = 0.01 representing the extent to
which each subpopulation differs in allele frequencies of SNPs
from typical values
+ 1 causal snp. Traits with no effects were thus generated by
setting the coefficient on the parameter that governs the
relationship between 1) the allele on trait mean, and 2) the
allele on trait variance, to zero.
+ Allele frequency (p) of each SNP randomly drawn from a uniform
distribution with bounds at [0.1, 0.9]
2. Generate offspring genotypes: we use the parent genotypes generated
in step 1 to generate offspring genotypes assuming random mating
and segregation
3. Step one and step two result in parent and offspring genotypes we
use in steps two and three of the simulation
Step two: Generating a family-level confounder correlated with genotype
Consider the following model for the relationship between a SNP and
phenotype for an individual i, nested in family j, for snp k. For now,
we just consider identifying the causal effect of an allele on the mean
of Y:
[MATH:
yijk=α+βk<
mi>Xijk+ϵijk
:MATH]
Linear regressions such as those run in GWAS, which often restrict the
sample to unrelated individuals, ignore the grouping structure of the
family and estimate the following model:
[MATH:
yik=α+βkX<
mrow>ik+ϵ
ik :MATH]
Random effects models acknowledge this grouping structure by positing
that each family has its own intercept that shifts the outcome up or
down. The random effects models then estimate these α[j] using a
distribution that pulls some of the family-specific intercepts (
[MATH: αj^
:MATH]
) towards the mean intercept across families (μ[α]) depending on σ[α]
([[160]73]):
[MATH:
yijk=αj+β<
mi>kXijk+ϵijkϵ∼N(μα,σα
mi>2) :MATH]
Random effects methods generate unbiased estimates for β[k] (the causal
effect of an allele for snp k on a trait) when there is no unobserved
confounding at the family level that is correlated with genotype of
other observed covariates (cor(X[ijk], α[j]) = 0))([[161]67]). However,
these methods generate biased estimates when cor(X[ijk], α[j]) ≠ 0). To
generate simulations to test this bias in the genetics context, we use
the following process to generate a family intercept that is correlated
with offspring genotype, and through the process described in step
three, is also correlated with the trait:
1. Operationalize genotype as the sibling pairs’ summed minor allele
count at that locus
2. Choose a ρ parameter for the degree of correlation, and construct a
variance-covariance matrix Q representing the correlation between
the genotype and family-level intercept. E.g., if ρ = 0.1:
[MATH:
[1
0.10.11] :MATH]
3. Take the cholesky decomposition of Q
4. Generate a starting value for the family-level intercept that is
not correlated with genotype (α[j] ∼ N(0, 1)); this starting value
provides a baseline set of intercepts uncorrelated with genotype
that we will then rescale to be correlated with genotype to varying
degrees.
5. Multiply cholesky decomposition by matrix containing the genotype
and the non-correlated intercept to generate α[j] correlated with
the sibling pair’s genotype
6. Repeat the previous steps for four values of ρ representing
different degrees of confounding:
1. ρ = 0 (no confounding)
2. ρ = 0.01
3. ρ = 0.05
4. ρ = 0.1
Step three: Generating traits
To generate traits, we used a similar process for simulating traits as
used in [[162]58]. We generated four general types of traits (traits
with neither mean nor variance effects; traits with mean effects only;
traits with variance effects only; traits with both mean and variance
effects) using the following general setup, and varying the γ and ϵ
parameters, where i represents an individual and k indexes a SNP. Sex
was simulated from a binomial distribution with p = 0.5. Age was
simulated from a normal distribution ∼ N(μ = 50, sd = 10). G[1]
indicates heterozygotes, while G[2] indicates minor allele homozygotes.
Across all simulations with mean or variance effects, an additional
minor allele results in increases in the mean or in the variance:
[MATH:
yik=G1ikγ1+G2
ikγ2
+0.5×sexi+0.05×agei+
ϵik :MATH]
(2)
1. Neither mean nor variance effects:
+ γ[1] = γ[2] = 0
+ ϵ ∼ N(0, 1)
2. Mean effects only
+ γ[1] = 0.15; γ[2] = 0.35
+ ϵ ∼ N(0, 1)
3. Variance effects only
+ γ[1] = γ[2] = 0
+ ϵ ∼ N(0, 1) for major allele homozygotes; ϵ ∼ N(0, 1.15^2) for
heterozygotes; ϵ ∼ N(0, 1.4^2) for minor allele homozygotes
4. Mean and variance effects
+ γ[1] = 0.15; γ[2] = 0.35
+ ϵ ∼ N(0, 1) for major allele homozygotes; ϵ ∼ N(0, 1.15^2) for
heterozygotes; ϵ ∼ N(0, 1.4^2) for minor allele homozygotes
For the simulation to address the possibility of confounding by
unobserved, between-family factors, we also modify [163]Eq 2 to include
the family-level intercept that is correlated with observed genotype at
varying levels (no correlation, medium correlation, high correlation),
and refer to the latter two outcomes as “confounded outcomes”, where i
refers to an individual and j indexes the family that the sibling pair
was generated from:
[MATH:
yijk=G1ijkγ1+G<
/mi>2ijk<
mi>γ2+0.5×
sexij+0.05×a
geij+
αj+ϵ<
mi>ijk :MATH]
(3)
Step four: Controlling for ancestry
To control for ancestral background in the methods that follow, we
control for an indicator for which of the four subpopulations generated
the individual/pair’s genotype. We used an indicator for ancestry
rather than ancestry derived from genotype because, and following the
general simulation method used in [[164]58] where traits were generated
using one SNP because each SNP-trait association is estimated
separately, the genotype has only SNP.
As the strength of between-family confounding increases, the
correlation between these indicators for population stratification and
the family-level intercept increases. To illustrate this increase, we
run the following regression for each of the four degrees of
correlation between family-level intercepts and observed genotypes:
[MATH: familyintercepti=α+β1×
subpopi+ϵ<
mi>i :MATH]
(4)
We then calculated the percent of β[1] ≠ 0 with p < 0.05 across the
1000 replicates with that simulated level of correlation. Table K in
[165]S1 File summarizes the results. It shows that as we move from zero
confounding to some confounding, the relationship between the ancestral
indicator and the family-level intercept becomes more significant.
However, the results also show, and as we show in the results that
follow, while this correlation between the population stratification
indicators and family-level confounding reduces bias in estimates that
this confounding causes, the control does not fully eliminate bias.
[166]Eq 4 looks at the strength of relationship between ancestral
background and the family intercept at various degrees of confounding,
and shows how this strength increases as the degree of confounding
increases. The reason this confounding biases estimates of the effect
of the minor allele count on a trait is because this family intercept
is also correlated with genotype. To show this, we run the following
regressions at each level of confounding:
[MATH: siblingminorallelecounti=<
/mo>α+β1×familyintercepti+ϵi :MATH]
(5)
Table L in [167]S1 File presents the mean β[1] across each level of
confounding, and shows that as confounding increases, the relationship
between family intercept and the minor allele count increases. In the
presence of zero between-family confounding, the slightly positive β[1]
is caused by the correlation between parent and offspring genotypes
that generates correlated genotypes between siblings of the same
parents.
Power
For the trait simulated to have variance effects, the average effect
size was R^2 = 0.01. With the N = 1000 sibling pairs sample sized used
in the analysis, the power to detect these effects is 81%, results we
see confirmed in Table F in [168]S1 File, which shows roughly 80% power
to detect variance effects when these are present.
Analysis
All analysis was performed using R. The following packages were used to
estimate the models (see [169]Results)M with code that names all the
arguments/parameters from the functions used in each package posted at
the following link:
[170]https://scholar.princeton.edu/dconley/princeton-biosociology-lab
* Power analysis: pwr packages with pwr.f2.test command
* Pooled regressions: lm with default settings
* Random effects model: plm with model = “random”
* Fixed effects model: plm with effects = “within” and index of the
family identifier
* DGLM: dglm with a gaussian-family link function and REML as
estimation method
* Squared Z score method: after estimating the Z-score separately by
sex, lm with default settings
* Sibling SD method: after estimating the sibling standard deviation,
lm with default settings
Empirical application
Data
Data for discovery analysis come from the Framingham Heart Study (FHS),
second (parental) and third (sibling) generation respondents. (This
dataset is publicly available through dbGaP
[171]http://www.ncbi.nlm.nih.gov/gap. QC code can be obtained from the
FHS investigators ([[172]74]) The FHS is, in fact, one of the cohorts
included in the GIANT meta-analysis performed by Yang et al.
([[173]40]). Height and weight were taken from clinical measurements
and then BMI was calculated as (weight in kilograms)2 / height in
meters. Genotypes were assayed using the Affymetrix GeneChip Human
Mapping 500K Array and the 50K Human Gene Focused Panel. Genotypes were
determined using the BRLMM algorithm. Our analysis began with the
original 500,568 SNPs, and resulted in 260,469 SNPs available for
analysis after cleaning (e.g., HWE screens and a MAF cut-off of 0.05).
The screens were conducted using all available individuals with genetic
data, not only those that were included in this analysis. Genome-region
association plots were produced using SNAP ([[174]75]), except for
those of published GIANT consortium data, which were produced using
LocusZoom ([[175]76]). Regional linkage maps were produced using SNAP
([[176]75]) and data from the 1,000 Genomes CEU Panel ([[177]77]).
Among third-generation respondents, the numbers in our sample by
sibship size are presented in [178]Table 3. The 200 families with only
one sibling in the data drop from the sibling analysis. Those with more
than two contribute multiple pairs to the data; however, our final
analysis selects only one pair per second-generation family as the more
complicated error structure with multiple pairs leads to early takeoff
on QQ plots. The siblings are genetically related but are not DZ or MZ
twins.
Table 3. Distribution of third generation siblings included in data by
sibship size.
# 3rd-G Sibs in Family: 1 2 3 4 5 6 7 8 9
N (pairs): – 292 483 504 250 150 189 28 36
Families: 200 292 161 84 25 10 9 1 1
[179]Open in a new tab
Total actual N of sibling pairs after random selection of one per
family: 583
The Minnesota Twin Family Study (MTFS) replication data were genotyped
on the Illumina 660W Quad array ([[180]78]) and phenotypes can be found
elsewhere ([[181]79]) Quality control procedures were applied
separately to each individual cohort. Individuals with a call rate <
0.95 (N = 22), estimated inbreeding coefficient > 0.15 (N = 2), or
showing evidence of non-European descent from multidimensional scaling
(N = 298, mainly individuals with Mexican ancestry) were removed.
Individuals were considered outlying from European descent if one or
more of the first four eigenvectors were more than three standard
deviations removed from the mean. SNPs with MAF < 0.01, call rate <
0.95 or HWE-test p-value < 0.001 were removed. The reason for the lower
MAF threshold in the MTFS sample rather than FHS sample is that for the
latter data, internal QC procedures remove SNPs at this threshold prior
to releasing the data to researchers. For our analysis, we included
both the MZ and DZ twin pairs because restricting to DZ twin pairs that
more closely approximate the sibships in the FHS discovery sample does
not change the substantive findings. Because the discovery sample and
the replication sample were genotyped on different arrays, we deployed
SNAP to find corresponding SNPs ([[182]75]). The resulting sample size
for our analysis was 1,048 pairs.
The Boston University Medical Center IRB approved the use of the
discovery analysis data (Framingham Heart Study). The University of
Minnesota IRB approved the use of the replication analysis data
(Minnesota Twin Family Study). The Princeton University IRB approved
the secondary analysis of each data (Protocol 8322).
In addition to the GWA analysis in the discovery and replication
sample, we performed two sets of analyses that pooled SNPs across
multiple software implementations. First, we investigated gene-based
pooling using VEGAS1 ([[183]80]) with the following parameters: CEU
subpopulation specified, no assumed allele frequency difference by sex,
and using all SNPs (not just best hits) within the gene region itself
(+/-0 KB). We estimated pathway-based pooling using i-GSEA4GWAS
([[184]81]). Because estimation of significant pathways and gene sets
can be sensitive to different algorithms’ methods of computing p-values
and controlling type I error rates, we test the robustness of these
findings with PASCAL ([[185]82]). These analyses were performed for
both BMI and height in both the discovery and replication sample. For
the discovery sample, gene-based and pathway-based analyses were
performed using 260,434 variants input; 239,526 variants used; 14,783
genes mapped; 221 gene sets selected. For the replication sample,
gene-based and pathway-based analyses were performed using 522,726
variants input; 487,692 variants used; 16,840 genes mapped; 259 gene
sets selected.
Statistical analysis
All analysis was performed using R. The power analysis depicted in Fig
D in [186]S1 File to estimate the power of the approach at varying
putative effect sizes was performed using the pwr package in R
[[187]83], varying n to reflect the present sample sizes and
recently-released samples with larger sibling populations and showing
power separately for two p value thresholds (p < 10^−5 for discovery
analyses; p < 0.05 for replication). More precisely, we used the
following procedure:
1. Vary the putative R^2 explained by a SNP from 0.0 to 0.01,
incrementing by 0.0001
2. Translate R^2 into effect size (
[MATH:
R21-<
msup>R2 :MATH]
)
3. Since we are analyzing power to estimate the β in a linear
regression, using the pwr.f2.test command to estimate power at
varying effect sizes
Heteroscedasticity-robust standard errors should not substantially
affect power under the present sample size ([[188]84]).
For the main analysis, sibling-pair standard deviations (SD) were fit
by linear regression using the lm command with default options to the
following model, where the key regressor is the number of minor alleles
for the pair of siblings at a given locus. Because this number is for
two individuals, the range is 0 to 4:
[MATH: SDj=α+γk
×siblingminoralleledosagejk<
/mi>+δ×parentminoralleledosagejk<
/mi>+γZj+ϵjk<
/mtd> :MATH]
j indexes a sibling pair, k indexes a SNP, minor allele dosage is the
total number of minor alleles in a sibling pair, parent minor allele
dosage is the total number of minor alleles across the parents, Z[j] is
a vector of sibling pair-level controls that includes controls for the
mean level of the trait in the sibling pair, pair sex (MM or FM or FF),
mean pair age, and the within-pair age difference, and ϵ[jk] is the
residual for sibling pair j at snp k. As we show in the simulation
study, the method returns similar results if parental genotype is or is
not included. And if we do not include parental genotype, we can
include an additional term—the difference in minor allele counts
between siblings—that helps increase the precision of the estimate of
γ[k]. Qualitative results do not change if we instead specify the
mother’s and father’s genotypes separately. In the model, SD[j] is the
standard deviation of a trait within a sibling pair, calculated as
follows, where i indexes an individual sibling in the pair, j indexes
the pair, x refers to the trait, and
[MATH: x¯ :MATH]
refers to the mean of the trait across the two siblings:
[MATH:
SDj=∑i=
mo>12(xij-xj¯)<
/mo>22-1
=
∑i=12(xij-xj¯)<
/mo>2 :MATH]
Huber-White standard errors robust to clustering on pedigree ID (to
account for correlated errors among cousins: sibling pairs that share
the same grandparents but not the same parents) were calculated for the
FHS analysis in the following way, where i = sibling pair 1, j =
sibling pair 2, n = total sibling pairs, g = pedigree grouping, and k =
SNP. σ refers to the residual variance across the sample, and σ[i]
refers to the residual variance for an individual.
For the simple case where
[MATH:
σ2=σi2∀i :MATH]
and
[MATH:
SSTx2=∑i=1n(x
i-x¯)2
msup> :MATH]
:
[MATH: var(β^k)
mrow>=∑i=
mo>1n(xi-x¯)2
msup>σ2SS
Tx2
mfrac> :MATH]
To account for errors that may be heteroskedastic and correlated within
a shared pedigree, we adjust the variance to be robust to cases where
[MATH:
σ2≠σi2 :MATH]
and where for
[MATH:
g=g′
:MATH]
(two sibling pairs share same grandparent/pedigree ID).
[MATH: u^ig :MATH]
represents the observed residual for participant i belonging to
pedigree g. For the case of pedigree ID’s with two or more sibling
pairs, this becomes the following variance robust to heteroskedastic
and correlated errors, with an indicator function for when the sibling
pairs belong to the same pedigree:
[MATH: var(β^k)
mrow>=∑j=
mo>1n∑i<
/mi>=1n(xi-x¯)2
msup>(xj-<
mover
accent="true">x¯)2
msup>u^igu^jg
mi>′1[g=
g′]SSTx2
:MATH]
For the case of pedigree ID’s with one sibling pair only, the above
equation reduces to the following variance robust to heteroskedastic
errors ([[189]85]):
[MATH: var(β^k)
mrow>=∑i=
mo>1n(xi-x¯)2
msup>u^ig2SST
x2 :MATH]
Supporting information
S1 File
Fig A, Estimated coefficients on SNPs for simulated dependent variable
with no effects and confounding between a family-level indicator,
genotype, and outcome. The red dashed line represents the true snp
level effect (β = 0), while the density curves show the range of
estimated
[MATH: β^ :MATH]
for each of the models. We see the fixed effects model correctly
centers the
[MATH: β^ :MATH]
near the β = 0, while the other family-level random effects (random
intercept) and pooled regression show estimates with significant upward
bias in the presence of confounding above ρ = 0.1. However, the random
effects has the advantage of smaller sampling variance (more efficient
estimator) across all levels of confounding because it pools estimates
across families. Fig B, Regional linkage map for FHS genome-wide
suggestive SNPs for sibling-pair standard deviation in BMI from 1,000
Genomes, CEU Panel. Maps produced by SNAP ([[190]75]). Fig C, QQ plots
associated with Manhattan plots in [191]Fig 1. A) Observed versus
expected p-value distributions for analysis of sibling-pair standard
deviation in height for FHS generation-three respondents with controls
for parental genotype, mean height of sibling pair, sex, and sex
difference. B) Same as in (A) except for BMI instead of height. Shaded
gray regions depict 95% confidence intervals. Fig D, Power to detect an
effect size of R^2. The figure contrasts power at three potential
sample sizes (defined as the number of sibling pairs in the data)(see
[192]Methods): 1) the Framingham Heart Study (FHS) sample used in the
present analysis; 2) the Adolescent and Longitudinal Study of Health
(AddHealth) sample; and 3) the UK Biobank sample. Likewise, the figure
contrasts two potential p-value thresholds: p < 10^−5 for the discovery
analysis; p < 0.05 for the confirmation analysis. The figure shows that
although the sample used in the present analysis (FHS) is not
adequately powered to detect realistic effect sizes of R^2 < 0.01,
newly-released datasets with larger sibling subsamples are adequately
powered to detect effects using the method. Fig E, Regional Association
Plot of rs7202116, top hit for variance in BMI found by Yang et al.
(2012), on mean level of BMI from GIANT consortium data. Figure
produced using LocusZoom ([[193]76]). Fig F, Regional Association Plot
of genome-wide suggestively significant (p<10-5) hits from [194]Fig 1
on mean height from GIANT consortium data. (A—D) Plots for the SNPs
rs2804263, rs2073302, rs8126205, and rs4834078, respectively, show no
markers in the respective regions that approach even genome-wide
suggestive significance (p<10^−5). Figures produced using LocusZoom
([[195]76]). Fig G, Relationship between sibling-pair mean BMI and
sibling-pair standard deviation (SD) or coefficient of variation (CV).
A) Sibling-pair SD versus mean (ρ = 0.43). B) Sibling-pair CV versus
mean (ρ = 0.25). Fig H, Manhattan plots for enriched pathway HSA04540
Gap Junction for height variability. A) FHS discovery sample; B) MTFS
replication sample. Table A, Results of Hausman test comparing
[MATH: β^FE :MATH]
with
[MATH: β^RE :MATH]
across the simulations. The results show that at zero and very small
amounts of confounding, the test fails to reject the null hypothesis
that
[MATH: β^RE :MATH]
is a consistent estimator, but at higher levels of confounding, the
test rejects this null hypothesis. Table B, Results of regressing
squared Z-score of trait on minor allele count across 1000 replicates
with non-demeaned data. The results show an inflated type I error rate
for the trait simulated to have either null effects or mean effects but
no variance effects in the presence of an unobserved confounder between
genotype and outcome (underlined rows). Table C, Results of regressing
squared Z-score of trait on minor allele count across 1000 replicates.
Regressions are estimated using the demeaned data. The results show an
inflated type I error rate for the trait simulated to have either null
effects or mean effects but no variance effects in the presence of an
unobserved confounder between genotype and outcome (underlined rows)
even after transforming the data. Table D, Results of DGLM on
non-demeaned (non-transformed) for simulated DV with different types of
effects across 1000 replicates. The results show two types of inflated
type I error rates. First, when a variant has null effects (neither
effects on the mean nor effects on the variance) and there is
confounding, the DGLM has an inflated type I error rate, detecting β ≠
0 in 77.8% of simulations. Second, when a variant has variance effects
but no mean effects, the method also has an inflated type I error rate,
detecting β ≠ 0 in 11% of cases in the absence of confounding and 70%
of cases in the presence of confounding. Table E, Results of DGLM on
demeaned (non-transformed) for simulated DV with different types of
effects across 1000 replicates. The results show an inflated type I
error rate (estimate β ≠ 0 despite the presence of allele affects on
the variance and not the mean) that is smaller but still present in the
demeaned data. The results also show that while demeaning reduces the
type I error rate (false detection of mean effects), the transformation
leads to type II errors (fails to detect variance effects when these
are present). Table F Results of regressing sibling SD of trait on
minor allele count across 1000 replicates. The results show the
percentage of simulations for which the coefficient on the minor allele
count is significant at the p < 0.05 level when we regress the sibling
standard deviation of the trait on this count and controls. In order
for the method to adequately control for Type I error, we want this
percentage to be low for the traits simulated to have 1. neither mean
nor variance effects or 2. mean effects only. In order for the method
to be adequately powered, we want this percentage to be high for the
traits simulated to have: 1. variance effects only and 2. both mean and
variance effects. For the first goal, the results show that in contrast
to the squared Z-score and DGLM, which each, in the presence of an
unobserved confounder, display type I error rates of around 20% in
detecting variance effects in traits simulated to have mean effects
only, the sibling SD method avoids this type of error (underlined rows)
both with and without controls for parental genotype. The results also
illustrate that the method detects variance effects when the trait
either has variance effects only or when the trait exhibits both mean
and variance effects. The first half of the table also shows a lower
type I error rate than squared Z-score when there is no unobserved
confounder. Table G, Proxy SNPs and results for replication analysis
using Minnesota Twin Family Study data. Table H, Replicated GWAS hits
for other SNPs on MAST4 Results are from the NCBI Phenotype-Genotype
Integrator. Table I, Gene set analysis results using PASCAL. The table
shows significant gene sets in FHS that replicated in MTFS at different
p-value thresholds (MAST4, the location of the replicated SNP, does not
appear because although it was p < 0.01 in the FHS dataset, it was p =
0.1 in the MTFS dataset). Table J, Pathway analysis results using
PASCAL The table shows significant pathways in FHS that replicated in
MTFS at different p-value thresholds. The pathway that replicated using
the i-GSEA4GWAS tool is not among those tested by PASCAL. Table K,
Illustrating correlation between population indicators and family-level
intercept across 1000 replicates at four degrees of family-level
confounding. The results show that at low levels of confounding (ρ ≤
0.1), the broad ancestry indicators are correlated with the indicator
for family genotype but are too broad to fully capture the confounding.
As the confounding increases beyond low levels, the ancestry indicators
better capture the confounding. Table L, Relationship between family
intercept and observed genotype. The table shows that as the degree of
between-family confounding increases, there is a stronger relationship
between the intercept that shifts levels of a trait up or down between
families and the genotype.
(PDF)
[196]Click here for additional data file.^ (4MB, pdf)
Acknowledgments