Abstract
Background
Genome-wide Association Studies (GWAS) have proved invaluable for the
identification of disease susceptibility genes. However, the
prioritization of candidate genes and regions for follow-up studies
often proves difficult due to false-positive associations caused by
statistical noise and multiple testing. To address this issue,
we propose the novel GWAS noise reduction (GWAS-NR) method as a way to
increase the power to detect true associations in GWAS, particularly in
complex diseases such as autism.
Methods
GWAS-NR utilizes a linear filter to identify genomic regions
demonstrating correlation among association signals in multiple
datasets. We used computer simulations to compare the ability of GWAS-NR
to detect association with that of the commonly used joint analysis and
Fisher's methods. Furthermore, we applied GWAS-NR to a family-based
autism GWAS of 597 families and a second existing autism GWAS of 696
families from the Autism Genetic Resource Exchange (AGRE) to arrive at
a compendium of autism candidate genes. These genes were manually
annotated and classified by a literature review and functional grouping
in order to reveal biological pathways which might contribute to autism
aetiology.
Results
Computer simulations indicate that GWAS-NR achieves a significantly
higher classification rate for true positive association signals than
either the joint analysis or Fisher's methods and that it can also
achieve this when there is imperfect marker overlap across datasets or
when the closest disease-related polymorphism is not directly typed. In
two autism datasets, GWAS-NR analysis resulted in 1535 significant
linkage disequilibrium (LD) blocks overlapping 431 unique reference
sequencing (RefSeq) genes. Moreover, we identified the nearest RefSeq
gene to the non-gene overlapping LD blocks, producing a final candidate
set of 860 genes. Functional categorization of these implicated genes
indicates that a significant proportion of them cooperate in a coherent
pathway that regulates the directional protrusion of axons and
dendrites to their appropriate synaptic targets.
Conclusions
As statistical noise is likely to particularly affect studies of
complex disorders, where genetic heterogeneity or interaction between
genes may confound the ability to detect association, GWAS-NR offers a
powerful method for prioritizing regions for follow-up studies.
Applying this method to autism datasets, GWAS-NR analysis indicates
that a large subset of genes involved in the outgrowth and guidance of
axons and dendrites is implicated in the aetiology of autism.
Background
Genome-wide association studies (GWAS) have provided a powerful tool
for identifying disease susceptibility genes. However, analysis of GWAS
data has been focused on single-point tests, such as the traditional
allele-based chi-squared test or the Cochran-Armitage Trend test
[1], which proceed by testing each single nucleotide polymorphism
(SNP) independently. As it is likely that the disease variants have not
been directly genotyped in a GWAS, tests that account for multiple
flanking SNPs in linkage disequilibrium (LD) with the disease variants
may increase the power to detect association [2].
Several approaches have been proposed in order to test for association
based on multiple markers, which include the haplotype-based approach
[3-5] and the multivariate approach [6,7]. Akey et al.
[8] used analytical approaches to demonstrate that multilocus
haplotype tests can be more powerful than single-marker tests. For the
multivariate approach, tests such as Hotelling's T^2 test are often
used to account for multiple markers jointly [6,9]. Although
statistical power can be increased by such multi-marker approaches, it
is not a straightforward operation to select markers for testing.
Including all markers in a gene or region may not be feasible since it
greatly increases the degrees of freedom in the test, which can reduce
the power.
Follow-up studies, such as fine mapping and sequencing, are necessary
to validate association signals, but they are also challenging
[2]. Prioritization of genes or regions for follow-up studies is
often decided by a threshold of P-values or ranking for significant
markers [10,11]. However, many false positives can still exist
in the markers classified as significant for follow-up as a result of
statistical noise and genome-wide multiple testing. Joint and/or
meta-analysis of GWAS data can achieve greater power if these data or
P-values are available from different datasets. If P-values from
individual and joint analyses are available, it is possible to further
increase the power by assigning more weight to markers with replicated
association signals in several datasets or to markers that have
flanking markers with an association signal.
We propose the use of the GWAS noise reduction (GWAS-NR) approach which
uses P-values from individual analyses, as well as joint analysis of
multiple datasets, and which accounts for association signals from
surrounding markers in LD. GWAS-NR is a novel approach to extending the
power of GWAS to detect association. Noise reduction is
achieved by applying a linear filter within a sliding window in order
to identify genomic regions demonstrating correlated profiles of
association across multiple datasets. As noise reduction (NR)
techniques are widely used to boost signal identification in
applications such as speech recognition, data transmission and image
enhancement, we expect that GWAS-NR may complement other GWAS analysis
methods in identifying candidate loci that may then be prioritized for
follow-up analysis or analysed in the context of biological pathways.
Enhancing statistical power is particularly important in the study of
complex diseases such as autism. There is overwhelming evidence from
twin and family studies for a strong genetic component to autism, with
estimates of heritability greater than 80% [12-14]. Autism is
generally diagnosed before the age of 4, based on marked qualitative
differences in social and communication skills, often accompanied by
unusual patterns of behaviour (for example, repetitive, restricted,
stereotyped) [15]. Altered sensitivity to sensory stimuli and
difficulties of motor initiation and coordination are also frequently
present. Identifying the underlying genes and characterizing the
molecular mechanisms of autism will provide immensely useful guidance
in the development of effective clinical interventions.
Numerous autism candidate genes have been reported based on association
evidence, expression analysis, copy number variation (CNV), and
cytogenetic screening. These genes involve processes including cell
adhesion (NLGN3, NLGN4 [16], NRXN1 [17], CDH9/CDH10
[18,19]), axon guidance (SEMA5A [20]), synaptic scaffolding
(SHANK2, DLGAP2 [21], SHANK3 [22]), phosphatidylinositol
signalling (PTEN [23], PIK3CG [24]), cytoskeletal regulation
(TSC1/TSC2 [24,25], EPAC2/RAPGEF4 [26], SYNGAP1 [21]),
transcriptional regulation (MECP2 [27], EN2 [28]) and
excitatory/inhibitory balance (GRIN2A [29], GABRA4, GABRB1
[30]). However, aside from rare mutations and 'syndromic' autism
secondary to known genetic disorders, the identification of specific
genetic mechanisms in autism has remained elusive.
Over the past decade, the vast majority of genetic studies of autism
(both linkage and focused candidate gene studies) have failed to
broadly replicate suspected genetic variations. For this reason, the
assumption that autism is governed by strong and pervasive genetic
variations has given way to the view that autism may involve numerous
genetic variants, each having a small effect size at the population
level. This may arise from common variations having small individual
effects in a large number of individuals (the common disease-common
variant [CDCV] hypothesis) or rare variations having large individual
effects in smaller subsets of individuals (the rare variant [RV]
hypothesis).
Given the potential genetic heterogeneity among individuals with autism
and the likely involvement of numerous genes of small effect at the
population level, we expected that GWAS-NR could improve the power
to identify candidate genes for follow-up analysis. We applied GWAS-NR
to autism GWAS data from multiple sources and conducted simulation
studies in order to compare the performance of GWAS-NR with traditional
joint and meta-analysis approaches. These data demonstrate that GWAS-NR
is a useful tool for prioritizing regions for follow-up studies such as
next-generation sequencing.
Methods
GWAS-NR
The GWAS-NR algorithm produces a set of weighted P-values for use in
prioritizing genomic regions for follow-up study. Roeder and Wasserman
[31] characterize the statistical properties of such weighting
approaches in GWAS, observing that informative weights can improve
power substantially, while the loss in power is usually small even if
the weights are uninformative. The GWAS-NR algorithm computes a weight
at each locus based on the strength and correlation of association
signals at surrounding markers and in multiple datasets, without
relying on prior information or scientific hypotheses. The weights are
applied to the P-values derived from joint analysis of the complete
data and the resulting weighted P-values are then used to prioritize
regions for follow-up analysis.
Noise reduction methods are frequently applied when extracting a common
signal from multiple sensors. The filter used by GWAS-NR is similar to
the method proposed by de Cheveigné and Simon [32] for sensor noise
suppression in magneto- and electro-encephalograph recordings. Each
sensor is projected onto the other sensors and the fitted values from
these regressions are used in place of the original values. The fitted
values of such regressions retain sources of interest that are common
to multiple sensors. As the regression residuals are orthogonal to the
fitted values, uncorrelated components are suppressed.
In a genomic context, the 'sensors' take the form of probit-transformed
P-values derived from independent datasets, as well as P-values derived
from joint analysis of the full dataset. The filter inherently
highlights cross-validating associations, by preserving signals that
jointly occur in a given genomic region and attenuating spikes that are
not correlated across subsets of the data. However, GWAS-NR can achieve
no advantage over simple joint analysis when an association signal is
restricted to a single marker and flanking markers provide no
supplementary information.
We estimate ordinary least-squares regressions of the form
$$Z_{ij} = \alpha_{jk} + \beta_{jk} Z_{ik} + v_{jk}$$
and compute projections
$$\hat{Z}_{ij} = \alpha_{jk} + \beta_{jk} Z_{ik}$$
where $Z_{ij}$ and $Z_{ik}$ are the probits $\Phi^{-1}(1 - p)$ of the
P-values at locus i in two datasets j and k. $\Phi^{-1}(\cdot)$ denotes
the inverse of the cumulative standard normal distribution. The estimates
are computed within a centred sliding window of w markers and the
$\beta_{jk}$ are constrained to be nonnegative, which sets $\hat{Z}_{ij}$
to the mean $\bar{Z}_{ij}$ in regions having zero or negative correlation
across sensors. As $\beta_{jk}$ is driven by the covariance between
probits in datasets j and k, probits that demonstrate positive local
correlation will tend to be preserved, while probits demonstrating weak
local correlation will be attenuated. One local regression is computed
for each locus and is used to compute a single fitted value
$\hat{Z}_{ij}$ for that locus. The same method is used to compute
projections $\hat{Z}_{ik}$.
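As a minimal sketch of the local projection described above, the following Python code regresses the probits of dataset j on those of dataset k within a centred window and returns one fitted value per locus, with the slope clipped at zero. The function names, the truncation of the window at the ends of the series and the use of the raw probit of dataset k as the regressor are illustrative assumptions; this is not the Matlab implementation used in the study.

```python
from statistics import NormalDist


def probit(p):
    """Probit transform used in the text: Phi^{-1}(1 - p), for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)


def local_projection(z_j, z_k, w):
    """Fitted values of z_j regressed on z_k within a centred window of w markers.

    The slope is constrained to be nonnegative, so a window with zero or
    negative covariance returns the local mean of z_j.
    """
    n = len(z_j)
    half = w // 2
    fitted = []
    for i in range(n):
        # Assumption: the window is simply truncated at the ends of the series.
        lo, hi = max(0, i - half), min(n, i + half + 1)
        ys, xs = z_j[lo:hi], z_k[lo:hi]
        my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        beta = max(0.0, sxy / sxx) if sxx > 0 else 0.0
        alpha = my - beta * mx
        fitted.append(alpha + beta * z_k[i])
    return fitted
```

Calling `local_projection(z_j, z_k, w)` and then `local_projection(z_k, z_j, w)` would give the two sets of projections referred to in the text.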
In order to capture association signals at adjacent loci in different
datasets without estimating numerous parameters, the regressor at each
locus is taken to be the probit of the lowest P-value among that locus
and its two immediate neighbours. Quality control (QC) failure or
different genotyping platforms can cause SNP genotypes to be missing in
different datasets. Missing genotypes for a locus having no immediately
flanking neighbours are assigned a probit of zero. The window width w
is calculated as w = 2h + 1, where h is the lag at which the
autocorrelation of the probits declines below a pre-defined threshold.
In practice, we estimate the autocorrelation profile for each series of
probits and use the average value of h with an autocorrelation
threshold of 0.20.
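The regressor construction and the window-width rule can be sketched as follows. This is an illustrative Python outline under stated assumptions: missing P-values are represented by `None`, the autocorrelation is the usual sample estimate, and h is taken as the first lag at which it falls below the threshold. The function names are hypothetical.

```python
from statistics import NormalDist


def neighbour_regressor(p_values):
    """Probit of the lowest P-value among each locus and its two immediate neighbours.

    A missing locus (None) with no observed neighbour is assigned a probit of zero.
    P-values must lie strictly between 0 and 1.
    """
    nd = NormalDist()
    out = []
    for i in range(len(p_values)):
        window = p_values[max(0, i - 1):i + 2]
        observed = [p for p in window if p is not None]
        out.append(nd.inv_cdf(1.0 - min(observed)) if observed else 0.0)
    return out


def autocorrelation(z, lag):
    """Sample autocorrelation of the series z at the given lag."""
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((z[t] - mean) * (z[t + lag] - mean) for t in range(n - lag))
    return cov / var


def window_width(z, threshold=0.20):
    """w = 2h + 1, where h is the first lag with autocorrelation below threshold."""
    h = 1
    while h < len(z) and autocorrelation(z, h) >= threshold:
        h += 1
    return 2 * h + 1
```

In the procedure described above, h is averaged over the probit series before w is computed; the sketch shows the calculation for a single series only.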
After computing the projections $\hat{Z}_{j}$ and $\hat{Z}_{k}$, the
resulting values are converted back to P-values and a set of
filtered P-values is computed from these projections using Fisher's
method. The same algorithm is executed again, this time using the
probits of the filtered P-values and the P-values obtained from the
joint association analysis of the complete data. The resulting Fisher
P-values are then treated as weighting factors and are multiplied by
the corresponding raw P-values from the joint analysis, producing a set
of weighted P-values. To aid interpretation, we apply a monotonic
transformation to these weighted P-values, placing them between 0 and 1
by fitting parameters of an extreme value distribution. The GWAS-NR
algorithm was executed as a Matlab script.
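The combination and weighting steps can be illustrated with a short Python sketch. It converts fitted probits back to P-values, combines P-values with Fisher's method (using the closed-form chi-square tail for even degrees of freedom) and multiplies a joint-analysis P-value by the resulting weight. The function names are hypothetical, and the final rescaling by a fitted extreme value distribution is not reproduced here because its parameters are not given in the text.

```python
import math
from statistics import NormalDist


def probit_to_p(z):
    """Invert the probit transform z = Phi^{-1}(1 - p)."""
    return 1.0 - NormalDist().cdf(z)


def fisher_combined(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # Survival function of a chi-square with 2k df:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def weighted_p_value(joint_p, weight_p):
    """Raw joint-analysis P-value multiplied by the Fisher P-value used as a weight."""
    return joint_p * weight_p
```

For example, `fisher_combined([probit_to_p(zj), probit_to_p(zk)])` would give the filtered P-value at one locus from its two projections.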
Simulations
Although noise reduction has been shown to be useful in other
biomedical applications [32], understanding its properties for
identifying the true positives in disease association studies is also
important. We used computer simulations to compare the performance of
GWAS-NR with the joint association in the presence of linkage (APL)
analysis and Fisher's method under a variety of disease models. We used
genomeSIMLA [33] to simulate LD structures based on the Affymetrix
5.0 chip and performed the sliding-window haplotype APL [34] test
to measure association. Detailed descriptions for the simulation
settings are provided in Additional File 1 and detailed haplotype
configurations can be found in Additional File 2.
An important goal for the proposed approach is to help prioritize
candidate regions for follow-up studies such as next-generation
sequencing. Top regions or genes ranked by their P-values are often
considered priority regions for follow-up studies. In order to
investigate the proportion of true positives that occur in the top
regions, we treated the association tests as binary classifiers. The
markers were ranked by their P-values and markers that occurred in the
top k ranking were classified as significant, where k was pre-specified
as a cut-off threshold. The markers that were not in the top k ranking
were classified as non-significant. We then compared the sensitivity
and specificity of GWAS-NR with the joint and Fisher's tests. The
sensitivity was calculated based on the proportion of the three markers
associated with the disease that were correctly classified as
significant. The specificity was calculated based on the proportion of
markers not associated with the disease that were correctly classified
as non-significant. The sensitivity and specificity were averaged over
1000 replicates.
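The top-k classification rule can be written down in a few lines. The sketch below is a hypothetical Python helper, not the simulation code used in the study: it ranks markers by P-value, labels the top k as significant and returns the sensitivity and specificity for one replicate.

```python
def top_k_rates(p_values, causal, k):
    """Sensitivity and specificity when the k smallest P-values are called significant.

    p_values: one P-value per marker; causal: indices of disease-associated markers.
    """
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    significant = set(order[:k])
    causal = set(causal)
    true_pos = len(significant & causal)
    # Non-associated markers that were correctly left out of the top k.
    true_neg = len(p_values) - len(significant | causal)
    sensitivity = true_pos / len(causal)
    specificity = true_neg / (len(p_values) - len(causal))
    return sensitivity, specificity
```

Averaging the two returned values over replicates, and repeating for a range of k, yields the points of an ROC curve.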
Ascertainment and sample description
We ascertained autism patients and their affected and unaffected family
members through the Hussman Institute for Human Genomics (HIHG,
University of Miami Miller School of Medicine, FL, USA), and the
Vanderbilt Center for Human Genetics Research (CHGR, Vanderbilt
University Medical Center, Tennessee, USA; UM/VU). Participating
families were enrolled through a multi-site study of autism genetics
and recruited via support groups, advertisements and clinical and
educational settings. All participants and families were ascertained
using a standard protocol. These protocols were approved by appropriate
Institutional Review Boards. Written informed consent was obtained from
parents, as well as from minors who were able to give informed consent;
for individuals unable to give consent due to age or developmental
problems, assent was obtained whenever possible.
The core inclusion criteria were as follows: (1) chronological age
between 3 and 21 years of age; (2) presumptive clinical diagnosis of
autism; and (3) expert clinical determination of autism diagnosis using
Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV criteria
supported by the Autism Diagnostic Interview-Revised (ADI-R) in the
majority of cases and all available clinical information. The ADI-R is
a semi-structured diagnostic interview which provides diagnostic
algorithms for classification of autism [35]. All ADI-R interviews
were conducted by formally trained interviewers who had achieved
reliability according to established methods. Thirty-eight individuals
did not have an ADI-R and, for those cases, we implemented a
best-estimate procedure to determine a final diagnosis using all
available information from the research record and data from other
assessment procedures. This information was reviewed by a clinical
panel that was led by an experienced clinical psychologist and included two
other psychologists and a paediatric medical geneticist, all of whom
were experienced in autism. Following a review of case material, the
panel discussed the case until a consensus diagnosis was obtained. Only
those cases in which a consensus diagnosis of autism was reached were
included. (4) The final criterion was a minimal developmental level of
18 months as determined by the Vineland Adaptive Behavior Scale (VABS)
[36] or the VABS-II [37] or intelligence quotient equivalent
>35. These minimal developmental levels assure that ADI-R results are
valid and reduce the likelihood of including individuals with severe
mental retardation only. We excluded participants with severe sensory
problems (for example, visual impairment or hearing loss), significant
motor impairments (for example, failure to sit by 12 months or walk by
24 months) or identified metabolic, genetic or progressive neurological
disorders.
A total of 597 Caucasian families (707 individuals with autism) were
genotyped at HIHG. This dataset consisted of 99 multiplex families
(more than one affected individual) and 498 singleton (parent-child
trio) families. A subset of these data had been previously reported
[19]. In addition, GWAS data were obtained from the Autism Genetic
Resource Exchange (AGRE) [35] as an additional dataset for
analysis. The full AGRE dataset is publicly available and contains
families with the full spectrum of autism spectrum disorders. From
AGRE, we selected only families with one or more individuals diagnosed
with autism (using DSM-IV and ADI-R); affected individuals with
non-autism diagnosis within these families were excluded from the
analysis. This resulted in a dataset of 696 multiplex families (1240
individuals with autism) from AGRE [35].
Genotyping, quality control and population stratification
We extracted DNA for individuals from whole blood by using Puregene
chemistry (QIAGEN, MD, USA). We performed genotyping using the Illumina
Beadstation and the Illumina Infinium Human 1 M beadchip following the
recommended protocol, but with a more stringent GenCall score
threshold of 0.25. Genotyping efficiency was greater than 99%, and
quality assurance was achieved by the inclusion of one CEPH control per
96-well plate that was genotyped multiple times. Technicians were
blinded to affection status and quality-control samples. The AGRE data
were genotyped using the Illumina HumanHap550 BeadChip with over
550,000 SNP markers. All samples and SNPs underwent stringent GWAS
quality control measures as previously described in detail in Ma et al.
[19].
Although population substructure does not cause a type I error in
family-based association tests, multiple founder effects could result
in a reduced power to detect an association in a heterogeneous disease
such as autism. Thus, we conducted EIGENSTRAT [38] analysis on all
parents from analysed families for evidence of population substructure
using the overlapping SNPs genotyped in both the UM/VU and AGRE
datasets. In order to ensure the most homogeneous groups for
association screening and replication, we excluded all families
containing outliers, defined by EIGENSTRAT [38] as lying beyond four
standard deviations on principal components 1 and 2.
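The exclusion rule can be illustrated with a simplified, single-pass sketch in Python. It assumes that principal component scores have already been computed for each parent and flags any family with a member lying more than four standard deviations from the sample mean on either of the first two components. The data layout and function name are hypothetical, and this is not EIGENSTRAT's own outlier-removal routine.

```python
from statistics import mean, pstdev


def families_with_outliers(samples, n_sd=4.0):
    """Return family IDs having a member beyond n_sd standard deviations on PC1 or PC2.

    samples: list of (family_id, pc1, pc2) tuples, one per genotyped parent.
    """
    flagged = set()
    for axis in (1, 2):
        values = [s[axis] for s in samples]
        centre, spread = mean(values), pstdev(values)
        if spread == 0:
            continue  # no variation on this component, so nothing to flag
        for s in samples:
            if abs(s[axis] - centre) > n_sd * spread:
                flagged.add(s[0])
    return flagged
```

All members of a flagged family would then be dropped before association testing.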
Haplotype block definition
We used haplotype blocks to define regions of interest. Significant
regions can be used for follow-up analysis such as next-generation
sequencing. We applied the haplotype block definition method proposed
by Gabriel et al. [39] to the UM/VU dataset. We performed GWAS-NR
based on single-marker APL P-values from UM/VU, AGRE and joint tests.
We also performed GWAS-NR on P-values obtained from sliding-window
haplotype tests with a haplotype length of three markers for the UM/VU,
AGRE and joint datasets. Since the true haplotype length is not known,
we chose a fixed length of three markers across the genome and used
GWAS-NR to sort out true signals from the P-values. Each marker was
ranked by the minimum (MIN_NR) of its GWAS-NR P-value from the
single-marker tests and its GWAS-NR P-value from the three-marker
haplotype tests, and blocks containing the top 5000 markers were selected
for further analysis.
Combined P-values for haplotype block scoring
In order to test for the significance of the haplotype blocks, we
calculated the combined P-value for each block using a modified version
of the Truncated Product Method (TPM) [40]. TPM has been shown to
have correct type I error rates and more power than other methods
combining P-values [40] under different simulation models.
Briefly, a combined score was calculated from the markers in each
block, based on the product of the MIN_NR values that were below a
threshold of 0.05. We used the Monte Carlo algorithm [40] with a slight
modification to test the significance of the combined score.
Specifically, a correlation matrix was applied to account for
correlation among P-values for the markers in the same block. The null
hypothesis is that none of the markers in the haplotype block are
associated with the disease. In order to simulate the null distribution
for the combined score, we generated two correlated sets of L uniform
numbers based on the correlation of 0.67 for CAPL and HAPL P-values,
where L denotes the number of tests in the block. The minimum values
were selected from each pair in the two sets, which resulted in a
vector of L minimum values. Then the correlation matrix was applied to
the vector of L minimum values and a null combined GWAS-NR score was
calculated for the haplotype block.
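A simplified Monte Carlo version of this block test is sketched below in Python. It computes the truncated product on the log scale and builds the null distribution from the minimum of two correlated uniform P-values per test. Several choices are assumptions made only for illustration: the paired uniforms are drawn through a Gaussian copula with `rho` applied on the normal scale, the correlation matrix among markers within a block is omitted, and the empirical P-value uses the (hits + 1) / (n_sim + 1) estimator.

```python
import math
import random
from statistics import NormalDist


def truncated_log_product(p_values, tau=0.05):
    """Log of the product of the P-values below tau (0.0 if none qualify)."""
    return sum(math.log(p) for p in p_values if p < tau)


def null_min_p(n_tests, rho, rng):
    """One null draw: per test, the minimum of two correlated uniform P-values."""
    nd = NormalDist()
    draws = []
    for _ in range(n_tests):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Floor guards against log(0) in the extreme tail.
        draws.append(max(min(nd.cdf(z1), nd.cdf(z2)), 1e-300))
    return draws


def tpm_p_value(p_values, tau=0.05, rho=0.67, n_sim=5000, seed=1):
    """Empirical P-value of the truncated product for one block of L tests."""
    rng = random.Random(seed)
    observed = truncated_log_product(p_values, tau)
    hits = sum(
        truncated_log_product(null_min_p(len(p_values), rho, rng), tau) <= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)
```

Because marker-to-marker correlation is ignored, this sketch would be expected to be anti-conservative for blocks in strong LD, which is the issue the correlation matrix in the modified procedure addresses.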
Functional analysis
In order to investigate functional relationships among genes in the
candidate set, each candidate was manually annotated and
cross-referenced, based on a review of current literature, with
attention to common functions, directly interacting proteins and
binding domains. Supplementary functional annotations were obtained
using DAVID (The Database for Annotation, Visualization and Integrated
Discovery) version 6.7 [41-43].
Results
Simulations
We present the simulation results for the three-marker haplotype
disease models in Figures 1 and 2. Figure 1 presents
receiver operating characteristic (ROC) curves to show the sensitivity
and specificity of GWAS-NR, the joint APL analysis and Fisher's tests,
based on varying cut-off values of ranking for significance.
Fisher's test for combining P-values was used here as a standard
meta-analysis approach. The performance of a classification model can
be judged based on the area under the ROC curve (AUC). For scenario 1
(identical marker coverage in each dataset), GWAS-NR produced a greater
AUC than the joint and Fisher's tests. It can also be observed from the
figure that, given the same specificity, GWAS-NR achieved a higher
sensitivity for classifying true positives as significant than the joint
and Fisher's tests.
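For readers less familiar with the AUC, the following Python sketch computes it directly as the probability that a randomly chosen associated marker is ranked above a randomly chosen non-associated marker, which equals the area under the ROC curve. The function name and the convention that higher scores indicate stronger evidence (for example, -log10 of the P-value) are assumptions for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half).

    scores: higher means more significant; labels: truthy for associated markers.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of associated from non-associated markers.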
Figure 1.
Comparative classification rates for genome-wide association studies
noise reduction (GWAS-NR), joint analysis and Fisher's test. GWAS-NR
has area under the curve (AUC) of 0.703 and the joint and Fisher's
tests have AUC of 0.64 and 0.615, respectively, for the recessive
model. Also GWAS-NR has AUC of 0.899 and the joint and Fisher's tests
have AUC of 0.795 and 0.777, respectively, for the multiplicative
model. For the dominant model, AUC for GWAS-NR, the joint and Fisher's
tests are 0.981, 0.880 and 0.867, respectively. For the additive model,
AUC for GWAS-NR, the joint and Fisher's tests are 0.932, 0.822, and
0.807, respectively.
Figure 2.
Comparative classification rates for genome-wide association studies
noise reduction (GWAS-NR), joint analysis and Fisher's
test with 20% and 50% missing markers. GWAS-NR has area under the curve
(AUC) of 0.689 and the joint and Fisher's tests have AUC of 0.622 and
0.598, respectively, for the recessive model. Also GWAS-NR has AUC of
0.883 and the joint and Fisher's tests have AUC of 0.776 and 0.760,
respectively, for the multiplicative model. For the dominant model, AUC
for GWAS-NR, the joint and Fisher's tests are 0.961, 0.852 and 0.844,
respectively. For the additive model, AUC for GWAS-NR, the joint and
Fisher's tests are 0.895, 0.785, and 0.775, respectively.
As independent datasets may have an imperfect overlap of markers, which
is true of the UM/VU and AGRE autism data, and the omission of the
closest disease-related polymorphism from the data can have substantial
negative impact on the power of GWAS [44], we also compared the
performance of GWAS-NR with the joint APL tests and Fisher's tests
under a range of missing marker scenarios: 20% of the simulated markers
in one dataset were randomly omitted for the recessive and
multiplicative models and 50% of the simulated markers were randomly
omitted in one dataset for the dominant and additive models. This
performance is shown in Figure 2. Again, GWAS-NR produced a
greater AUC than the joint and Fisher's tests and achieved a higher
sensitivity for classifying true positives at each level of
specificity.
The results for the two-marker haplotype disease models are shown in
Additional File 3. The same pattern is observed there: GWAS-NR
produced a greater AUC than the joint and Fisher's tests.
We also evaluated the type I error rates of the modified TPM for
identifying significant LD blocks using a truncation threshold of 0.05.
For the scenario assuming full marker coverage as described in
Additional File 1, the modified TPM had type I error rates of
0.035 and 0.004 at the significance levels of 0.05 and 0.01,
respectively. For the missing-marker scenario, the type I error rates
for the modified TPM were 0.046 and 0.007 at the significance levels of
0.05 and 0.01, respectively.
Autism GWAS-NR results
We applied GWAS-NR to the autism data using the UM/VU, AGRE and joint
(UM/VU)/AGRE datasets. A flow diagram of the data analysis process is
provided in the supplemental data (Additional File 4). The selection
of haplotype blocks based on Gabriel's definition resulted in a total
of 2680 blocks based on the top 5000 markers. In addition, the 141 of
these 5000 markers that did not fall in any block were also selected.
Blocks of LD were scored based on the truncated product of P-values
below a threshold of 0.05 and a P-value for each block was obtained
through Monte Carlo simulation. The P-values for 141 markers not in any
blocks were also calculated using the Monte Carlo algorithm to account
for the minimum statistics. All of the 141 markers had P-values less
than 0.05 and were selected. A total of 725 LD blocks achieved a
significance threshold of P ≤ 0.01, and an additional 810 blocks achieved
a threshold of P ≤ 0.05. A complete list of these blocks is presented
in Additional File 5.
In order to determine what genes reside within the 1535 significant LD
blocks, we used the University of California Santa Cruz (UCSC) Genome
Browser Table Browser. The 1535 regions were converted into start and
end positions based on the SNP positions in the March 2006
(NCBI36/hg18) human genome assembly. These start and end positions were
used to define regions in the UCSC Table Browser. We searched each
region for overlap with the RefSeq annotation track in the UCSC
Browser. This search resulted in 431 unique genes which mapped back to
646 significant LD blocks and 50 single markers. These genes are
presented in Additional File 6. For the remaining 839 LD blocks
that did not overlap a RefSeq gene, we identified the nearest RefSeq
gene using Galaxy [45]. The distance to these nearest genes
averaged 417,377 bp with a range from 5296 to 5,547,466 bp. These
nearest genes include candidate genes for which strong proximal
associations with autism have previously been reported, such as CDH9
[18,19] and SEMA5A [20]. We considered these genes for
follow-up because GWAS-NR, by construction, may capture association
information from nearby regions that may not be in strict LD with a
given SNP and because these proximal locations may also incorporate
regulatory elements. These genes are presented in Additional File
7. Combining these sets resulted in a candidate set of 860 unique
genes (presented in Additional File 8). For genes assigned to more
than one significant LD block, the lowest P-value among these blocks is
used for sorting and discussion purposes.
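The annotation logic, which in the study was carried out with the UCSC Table Browser and Galaxy, can be illustrated with a small Python sketch. It reports the genes overlapping a block or, failing that, the nearest gene and its distance, and then keeps the lowest block P-value per gene. The data layout, gene names in the usage example and function names are hypothetical, and all coordinates are assumed to lie on the same chromosome.

```python
def annotate_block(block, genes):
    """Return (overlapping gene names, None) or ([], (nearest gene, distance in bp)).

    block: (start, end); genes: list of (name, start, end) on the same chromosome.
    """
    b_start, b_end = block
    overlapping = [name for name, g_start, g_end in genes
                   if g_start <= b_end and b_start <= g_end]
    if overlapping:
        return overlapping, None

    def gap(gene):
        _, g_start, g_end = gene
        return g_start - b_end if g_start > b_end else b_start - g_end

    nearest = min(genes, key=gap)
    return [], (nearest[0], gap(nearest))


def lowest_p_per_gene(assignments):
    """assignments: iterable of (gene, block P-value); keep the minimum per gene."""
    best = {}
    for gene, p in assignments:
        best[gene] = min(p, best.get(gene, 1.0))
    return best
```

For instance, with `genes = [("GENE_A", 100, 200), ("GENE_B", 1000, 1500)]`, the block `(400, 450)` overlaps neither gene and is assigned to `GENE_A` at a distance of 200 bp.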
The most significant LD block we identified is located at 2p24.1 (ch2
204444539-20446116; P = 1.8E-06) proximal to PUM2. One LD block located
within the PUM2 exon also had nominally significant association (P =
0.024). Additional top-ranking candidates, in order of significance,
include CACNA1I (P = 1.8E-05), EDEM1 (P = 1.8E-05), DNER (P = 2.7E-05),
A2BP1 (P = 3.6E-05), ZNF622 (P = 8.11E-05), SEMA4D (P = 9.09E-05) and
CDH8 (P = 9.09E-05). Gene ontology classifications and InterPro binding
domains reported by DAVID [41-43] to be most enriched in the
candidate gene set are presented in Tables 1 and 2,
respectively, providing a broad functional characterization of the
candidate genes identified by GWAS-NR in autism.
Table 1.
Common functions of autism candidate genes identified by genome-wide
association studies-noise reduction (GWAS-NR)
Gene ontology (GO) term | No. of genes | GO term identification | P-value^1 | Examples
Cell adhesion | 76 | 0007155 | 6.29E-13 | CDH8, NCAM2
Biological adhesion | 76 | 0022610 | 6.64E-13 | CDH2, CTNNB1
Cell-cell adhesion | 35 | 0016337 | 6.24E-08 | CTNNA2, AMIGO2
Homophilic cell adhesion | 21 | 0007156 | 1.21E-06 | PTPRM, FAT1
Cell motion | 44 | 0006928 | 6.65E-06 | SEMA5A, FYN
Neuron differentiation | 41 | 0030182 | 1.14E-05 | EN2, NRXN1
Enzyme linked receptor protein signalling pathway | 33 | 0007167 | 5.40E-05 | NCK2, FGFR2
Neuron development | 32 | 0048666 | 1.07E-04 | ROBO2, RTN4R
Negative regulation of gene expression | 42 | 0010629 | 1.27E-04 | SIX3, CUX2
Axonogenesis | 22 | 0007409 | 1.31E-04 | SEMA6A, SLITRK5
Cell morphogenesis involved in differentiation | 25 | 0000904 | 2.16E-04 | PRKCA, PTK2
Cell motility | 29 | 0048870 | 2.40E-04 | DNER, PPAP2B
Localization of cell | 29 | 0051674 | 2.40E-04 | PTEN, NRP2
Negative regulation of transcription | 38 | 0016481 | 3.19E-04 | RBPJ, MEIS2
Cell morphogenesis involved in neuron differentiation | 22 | 0048667 | 3.94E-04 | PARD3, KALRN
Transmembrane receptor protein tyrosine kinase signalling | 23 | 0007169 | 3.98E-04 | SOCS2, DOK5
Neuron projection development | 25 | 0031175 | 4.40E-04 | RTN4R, NGF
Neuron projection morphogenesis | 22 | 0048812 | 5.07E-04 | PVRL1, CDH4
Regulation of cell projection organization | 13 | 0031344 | 5.33E-04 | SEMA4D, CDC42EP4
Negative regulation of nucleobase, nucleoside, nucleotide, and nucleic acid metabolic process | 40 | 0045934 | 6.79E-04 | BCL6, ZHX2
Table 2.
Common binding domains of autism candidate genes identified by
genome-wide association studies-noise reduction (GWAS-NR).
INTERPRO term | No. of genes | INTERPRO identification | P-value^1
Immunoglobulin I-set | 20 | IPR013098 | 8.97E-06
Cadherin | 16 | IPR002126 | 6.98E-05
Cadherin cytoplasmic region | 7 | IPR000233 | 1.14E-04
Pleckstrin homology | 26 | IPR001849 | 5.03E-04
Immunoglobulin | 21 | IPR013151 | 5.61E-04
Immunoglobulin subtype 2 | 21 | IPR003598 | 6.77E-04
Fibronectin, type III-like fold | 19 | IPR008957 | 1.19E-03
Fibronectin, type III | 19 | IPR003961 | 1.72E-03
Epidermal growth factor (EGF) | 14 | IPR006209 | 3.71E-03
Meprin/A5-protein/PTPmu (MAM) | 5 | IPR000998 | 6.78E-03
Protein-tyrosine phosphatase, receptor/non-receptor type | 7 | IPR000242 | 7.36E-03
Pleckstrin homology-type | 24 | IPR001993 | 7.41E-03
von Willebrand factor, type A | 10 | IPR002035 | 7.41E-03
Immunoglobulin-like | 35 | IPR007110 | 7.57E-03
Cell adhesion represented the most common functional annotation
reported for the candidate gene set, with a second set of common
functional annotations relating to neuronal morphogenesis and motility,
including axonogenesis and neuron projection development. Given the
enrichment scores reported by DAVID [[131]41-[132]43] implicating
neurite development and motility, and because numerous cell adhesion
molecules are known to regulate axonal and dendritic projections
[[133]46,[134]47], we examined the known functional roles of the
individual candidate genes responsible for these enrichment scores. A
total of 183 candidate genes were represented among the top 20
functional classifications reported by DAVID [[135]41-[136]43]. Based
on annotations manually curated from a review of current literature, we
observed that 76 (41.5%) of these genes have established roles in the
regulation of neurite outgrowth and guidance. These include 39 (51.3%)
of the candidate genes contained in the cell adhesion, biological
adhesion, cell-cell adhesion and homophilic cell adhesion pathways.
Gene ontology [[137]48] specifically associates two pathways with the
narrow synonym 'neurite outgrowth': neuron projection development
(pathway 0031175) and transmembrane receptor protein tyrosine
kinase activity (pathway 0004714). To further test for functional
enrichment of genes related to neurite outgrowth, we formed a
restricted composite of these two pathways. Enrichment analysis using
the EASE function of DAVID [[138]41-[139]43] rejected the hypothesis
that this composite pathway is randomly associated with the autism
candidate set (P = 2.07E-05).
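As an illustration of the kind of calculation that underlies such an enrichment P-value, the following Python sketch computes a one-sided hypergeometric (Fisher exact) tail probability together with a more conservative variant in which one hit is removed from the candidate list, which is how the EASE score is commonly described. The function names and the example counts are hypothetical and are not taken from the analysis reported here, and DAVID's own construction of the contingency table may differ in detail.

```python
from math import comb


def hypergeom_upper_tail(hits, list_total, pop_hits, pop_total):
    """P(X >= hits) when list_total genes are drawn without replacement
    from pop_total genes, pop_hits of which belong to the pathway."""
    upper = min(list_total, pop_hits)
    total = comb(pop_total, list_total)
    favourable = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


def ease_like_score(hits, list_total, pop_hits, pop_total):
    """Conservative variant: recompute the tail with one hit removed.
    Assumption: this mirrors the usual description of the EASE score,
    not necessarily DAVID's exact implementation."""
    if hits < 1:
        return 1.0
    return hypergeom_upper_tail(hits - 1, list_total, pop_hits, pop_total)


# Hypothetical counts: 3 of 300 list genes fall in a pathway that
# contains 40 of 30000 background genes.
plain = hypergeom_upper_tail(3, 300, 40, 30000)
conservative = ease_like_score(3, 300, 40, 30000)
print(plain, conservative)
```

Removing one hit penalizes pathways supported by very few genes, so the conservative score is never smaller than the plain tail probability.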
Although many of the candidate genes identified by the GWAS-NR remain
uncharacterized or have no known neurological function, we identified
125 genes within the full candidate set having established and
interconnected roles in the regulation of neurite outgrowth and
guidance. These genes are involved in diverse sub-processes including
cell adhesion, axon guidance, phosphatidylinositol signalling,
establishment of cell polarity, Rho-GTPase signalling, cytoskeletal
regulation and transcription. Table [140]3 presents a summary of these
genes by functional category. Additional File [141]9 presents
annotations for these 125 candidates. Additional File [142]10 presents
104 additional candidates which have suggestive roles in neurite
regulation based on putative biological function or homology to known
neurite regulators but where we did not find evidence specific to
neurite outgrowth and guidance in the current literature.
Table 3.
Autism candidate genes with known roles in neurite outgrowth and
guidance.
Function | Candidate gene (by lowest P-value)
Cadherin-catenin function | CDH8, CDH2, CDH11, CTNNB1, CTNNA2, PKP4, CTNND2, CDH4, CTNND1, CTNNA3
Cell adhesion | NCAM2, CNTN3, OPCML, ODZ4, NID1, CNTN5, F3, PVRL1, PTPRG, PARVA, FLRT2, ODZ2, NRXN1, ITGA9, ELMO1, FUT9, AMIGO2, KIRREL3, CNTNAP2, NTM
Ion channel | CACNA1I, CACNA1G
Axon guidance | SEMA4D, RTN4R, ROBO2, SEMA5A, PLXDC2, SLITRK5, SEMA6A, RGMA, UNC5D, ALCAM, NTNG2, RTN4RL1, PLXNC1, NRP2
Vesicle transport | STX2, STX16, STXBP5, SYT6
Post-synaptic scaffold | DLGAP2, MAGI1, MAGI2
Signal transduction | DNER, SPRY4, FRK, PRKCA, DOK6, PDE3A, FER, IRS2, SOCS2, SPRY2, FRS3, DOK5, FYN, LZTS1, PTPRD, FGFR2, NRG3, PPP2R2B, ALK, RYR2, PALM2-AKAP2, MAP3K7, NTRK3, NGF, PPM1H, GDNF, CXCR4, PTK2, NEDD9, PTPN1, LEPR
Phosphatidylinositol signalling | PLA2G6, PIK3C2B, PTEN, PLA2G4A
Cell polarity | FAT1, PARD3, PARD6G, DCHS2
Rho-GTPase signalling | NCK2, DOCK1, PREX1, CDC42EP4, RND3, RGNEF, DOCK8, CIT, SRGAP3, KALRN, IQGAP2
Cytoskeletal regulation | SGK1, MYLK, GPR56, APBB1IP, PTPRM, WIPF3, PTPRT, MAP3K8, MICAL2, DGKG, COBL, CALD1
Transcription | PUM2, A2BP1, NKX6-1, SOX14, EN2, EBF1, MAP3K1, FOXG1, NFIC, BCL11A
Outside of functions relating to neuritogenesis, the most significant
functional annotation reported by DAVID for the candidate gene set
relates to transmission of nerve impulses (P = 9.02E-04). We identified
40 genes in the candidate set related to neurotransmission
(synaptogenesis, neuronal excitability, synaptic plasticity, and
vesicle exocytosis) which did not have overlapping roles in neurite
regulation. Table [144]4 presents a summary of these genes by
functional category.
Table 4.
Autism candidate genes with roles in synaptic function.
Function | Candidate gene (by lowest P-value)
Synaptogenesis | LRRTM4, SYN3
Excitatory/inhibitory balance | KCNIP1, KCNQ1, KCNQ5, KCNJ4, SLC6A13, IQCF1, GABBR2, GRIK4, OAT, KCNN3, GRM3, GCOM1, CACNA2D1, GRM7, ADRB2, KCNH7, KCNIP4, GRIK2, CACNG2, KCNMA1, KCNG1
Synaptic plasticity | RIMS1, PTGER2, SLC24A2, NETO1, PTGS2
Vesicle exocytosis | PTPRN2, AMPH, RAB11B, SYNPR
Other | TPH2, CHRNA9, RIMBP2, ATXN1, CHRNB4, NOVA1, SNCAIP, CHRM3
In order to investigate how the GWAS-NR results compared with the joint
APL tests and Fisher's tests, we examined the lists of the top 5000
markers selected based on GWAS-NR, joint APL test and Fisher's test
P-values. A total of 3328 markers overlap between the lists for GWAS-NR
and the joint APL tests, while 1951 markers overlap between the lists
for GWAS-NR and Fisher's tests. Thus, GWAS-NR had a higher concordance
with the joint APL tests than with Fisher's tests. The results suggest
that Fisher's test may have the lowest sensitivity to identify true
positives, which is consistent with our simulation results. Moreover,
120 markers that are not shared between the Illumina Infinium Human 1M
BeadChip and the Illumina HumanHap550 BeadChip were among the top 5000
markers selected based on GWAS-NR. Some of these 120 markers lie in
significant genes identified by haplotype blocks, such as PUM2, A2BP1,
DNER and SEMA4D.
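The list comparison described above amounts to intersecting sets of top-ranked markers. The following Python sketch shows one way this could be done; the function names, marker identifiers and P-values are hypothetical toy inputs, not data from this study.

```python
def top_k_markers(pvalues, k):
    """Return the set of marker IDs with the k smallest P-values.
    `pvalues` maps marker ID -> P-value."""
    ranked = sorted(pvalues, key=pvalues.get)
    return set(ranked[:k])


def overlap_count(pvalues_a, pvalues_b, k):
    """Number of markers shared by the top-k lists of two methods."""
    return len(top_k_markers(pvalues_a, k) & top_k_markers(pvalues_b, k))


# Toy P-values for five hypothetical markers under two methods.
method_a = {"rs1": 0.001, "rs2": 0.20, "rs3": 0.03, "rs4": 0.50, "rs5": 0.004}
method_b = {"rs1": 0.002, "rs2": 0.01, "rs3": 0.40, "rs4": 0.60, "rs5": 0.003}

print(overlap_count(method_a, method_b, k=3))
```

In the analysis reported here the same comparison was made with k = 5000; markers typed on only one platform simply have no entry in one of the dictionaries.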
In order to similarly investigate the overlap of candidate genes
identified by GWAS-NR and joint APL tests, we repeated the haplotype
block scoring method with the top 5000 markers as identified by joint
APL: this analysis resulted in 1924 significant LD blocks. Of these,
1257 overlapped with the blocks selected by GWAS-NR analysis.
Identification of the RefSeq genes within these 1257 shared
regions showed that 380 potential candidate genes were shared by the
two methods. In addition, GWAS-NR analysis produced 53 non-overlapping
genes while the joint APL analysis produced 349 non-overlapping genes.
As GWAS-NR amplifies association signals that are replicated in
multiple flanking markers and across data sets, the method can be
expected to produce a reduced list of higher confidence candidate
regions for follow-up, compared with standard single-locus methods. At
the same time, GWAS-NR does not generate a large number of significant
candidates in regions that would otherwise be ranked as insignificant.
While it is not possible to exclude a role in autism for the 349
additional candidate genes produced by the joint APL analysis, it is
notable that among the top 20 gene ontology pathways reported by DAVID
[[146]41-[147]43] for this set of genes, not one is specific to
neuronal function (data not shown). This analysis highlights the
utility of GWAS-NR to narrow and prioritize follow-up gene lists.
Discussion
We propose the use of GWAS-NR, a noise-reduction method for genome-wide
association studies which aims to enhance the power to detect true
positive associations for follow-up analysis. Our results demonstrate
that GWAS-NR is a powerful method for enhancing the detection of
genetic associations. Simulation evidence using a variety of disease
models indicates that, when markers are ranked by P-values and
candidates are selected based on a threshold rank, GWAS-NR achieves
higher classification rates than the use of joint P-values or Fisher's
method. In simulated data, the GWAS-NR also achieves strong performance
when there is imperfect marker overlap across datasets and when the
closest disease-related polymorphism is not typed. As Müller-Myhsok and
Abel have observed, when less-than-maximum LD exists between a disease
locus and the closest biallelic marker, the required sample size to
achieve a given level of power may increase dramatically, particularly
if there is a substantial difference in allele frequencies at the
disease marker and the analysed marker [[148]49].
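To make the notion of a classification rate based on ranked P-values concrete, the following Python sketch computes the area under the ROC curve (AUC) as the probability that a randomly chosen true association receives a smaller P-value than a randomly chosen null marker, with ties counted as one half. This is a generic rank-based AUC, offered as a minimal illustration; the function name and the toy inputs are hypothetical and do not reproduce the simulations described in this paper.

```python
def auc_from_pvalues(pvalues, is_true):
    """Rank-based AUC: fraction of (true, null) marker pairs in which the
    true marker has the smaller P-value; ties contribute 0.5."""
    true_p = [p for p, t in zip(pvalues, is_true) if t]
    null_p = [p for p, t in zip(pvalues, is_true) if not t]
    if not true_p or not null_p:
        raise ValueError("need at least one true and one null marker")
    wins = 0.0
    for p in true_p:
        for q in null_p:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(true_p) * len(null_p))


# Toy example: markers 0 and 2 are simulated true associations.
toy_p = [0.30, 0.20, 0.03, 0.50]
toy_truth = [True, False, True, False]
print(auc_from_pvalues(toy_p, toy_truth))
```

An AUC of 0.5 corresponds to a ranking no better than chance, and 1.0 to a ranking that places every true association ahead of every null marker.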
In the context of allelic association, noise can be viewed as observed
but random association evidence (for example, false positives) that is
not the result of true LD with a susceptibility or causative variant.
Such noise is likely to confound studies of complex disorders, where
genetic heterogeneity among affected individuals or complex
interactions among multiple genes may result in modest association
signals that are difficult to detect. The influence of positive noise
components is also likely to contribute to the so-called 'winner's
curse' phenomenon, whereby the estimated effect of a putatively
associated marker is often exaggerated in the initial findings,
compared with estimated effects in follow-up studies [[149]50]. GWAS-NR
appears to be a promising approach to address these challenges.
By amplifying signals in regions where association evidence is locally
correlated across datasets, the GWAS-NR captures information that may
be omitted or underutilized in single-marker analysis. However, the
GWAS-NR can achieve no advantage over simple joint analysis when
flanking markers provide no supplementary information. This is likely
to be true when a true risk locus is typed directly and a single-marker
association method is used or when a true risk haplotype is typed
directly and the number of markers examined in a haplotype-based
analysis is of the same length.
Joint analysis generally has more power than individual tests due to
the increase of sample size. Therefore, GWAS-NR, which uses P-values
from individual analyses as well as joint analysis of multiple
datasets, is expected to have more power than individual tests.
However, if there are subpopulations in the sample and the association
is specific to a subpopulation, joint analysis may not be as powerful
as an individual test for the subpopulation with the association
signal. If samples from multiple populations are analysed jointly, test
results for individual datasets should also be carefully examined
alongside the GWAS-NR results.
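For reference, Fisher's method, one of the comparison approaches used in this study, combines independent P-values from individual datasets through the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The Python sketch below uses the closed-form survival function available for even degrees of freedom; the function name is hypothetical and the inputs are illustrative only.

```python
from math import exp, factorial, log


def fisher_combined_pvalue(pvalues):
    """Combine k independent P-values with Fisher's method.
    Uses the exact chi-square survival function for 2k degrees of freedom:
    exp(-x/2) * sum_{j<k} (x/2)^j / j!"""
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)  # equals X / 2
    return exp(-half_x) * sum(half_x ** j / factorial(j) for j in range(k))


# Two datasets each giving P = 0.05 at the same marker (toy values).
print(fisher_combined_pvalue([0.05, 0.05]))
```

Because the statistic depends only on the product of the P-values, Fisher's method does not use genotype-level data or information from flanking markers.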
It is common for linear filters to include a large set of estimated
parameters to capture cross-correlations in the data at multiple leads
and lags. However, in a genomic context, the potentially uneven spacing
of markers and varying strength of linkage disequilibrium between
markers encouraged us to apply a parsimonious representation that would
be robust to data structure. We expect that a larger, well-regularized
parameterization may enhance the performance of the noise filter,
particularly if the filter is constructed to adapt to varying linkage
disequilibrium across the genome. This is a subject of further
research.
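The idea of a parsimonious linear filter can be illustrated with a deliberately simplified toy example. The Python sketch below converts P-values from two datasets to normal scores and forms, for each marker, a weighted sum of its own score, the score at the same marker in the second dataset, and the mean score of the immediately flanking markers in both datasets. This is not the GWAS-NR filter: the fixed weights `own`, `cross` and `flank` are hypothetical assumptions chosen for illustration, whereas the method described in this paper estimates its parameters from the data. The output is a ranking score, not a P-value.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def to_normal_score(p):
    """Probit transform: small P-values map to large positive scores."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def toy_filter(p_a, p_b, own=0.4, cross=0.4, flank=0.2):
    """Toy smoothing of dataset A's signal using dataset B and flanking
    markers. Weights are hypothetical, not estimated parameters."""
    if len(p_a) != len(p_b):
        raise ValueError("datasets must cover the same markers")
    z_a = [to_normal_score(p) for p in p_a]
    z_b = [to_normal_score(p) for p in p_b]
    n = len(z_a)
    scores = []
    for i in range(n):
        left, right = max(i - 1, 0), min(i + 1, n - 1)
        flank_mean = (z_a[left] + z_a[right] + z_b[left] + z_b[right]) / 4.0
        scores.append(own * z_a[i] + cross * z_b[i] + flank * flank_mean)
    return scores


# Marker 1 is replicated in both datasets; marker 4 appears in A only.
p_a = [0.5, 0.001, 0.5, 0.5, 0.001, 0.5]
p_b = [0.5, 0.001, 0.5, 0.5, 0.5, 0.5]
print(toy_filter(p_a, p_b))
```

In this toy setting the replicated signal at marker 1 receives a higher score than the isolated signal at marker 4, which is the qualitative behaviour a noise-reduction filter aims for.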
Our simulation results indicate that applying the modified TPM to
select LD blocks based on GWAS-NR can have conservative type I error
rates. The original TPM reported by Zaykin et al. [[150]40] produced
the expected level of type I error, as a known correlation matrix was
used in the simulations to account for correlation among P-values due
to LD among markers. However, the true correlation is unknown in real
datasets. Accordingly, we estimated correlations in our simulations and
analysis by bootstrapping replicates of samples, as well as using the
sample correlation between P-values obtained through single marker APL
and sliding window haplotype analysis. It is possible that the use of
estimated correlations may introduce extra variation in the
Monte-Carlo simulations of TPM, which may contribute to conservative
type I error rates. As we have demonstrated that GWAS-NR achieves
higher sensitivity at each level of specificity, the resulting regions
with top rankings can be expected to be enriched for true associations
when such associations are actually present in the data, even if the LD
block selection procedure is conservative. Overall, the simulation
results suggest that GWAS-NR can be expected to produce a condensed set
of higher confidence follow-up regions, and that this prioritization
strategy can control the number of false positives at or below the
expected number in analysis.
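The truncated product statistic itself is simple to compute: it is the product of all P-values in a block that fall at or below a truncation point τ. The Python sketch below evaluates its significance by Monte-Carlo simulation under the simplifying assumption that the P-values are independent and uniform under the null hypothesis. This assumption is the key difference from the modified TPM used in this paper, which accounts for correlation among P-values; the function names, the default τ = 0.05 and the number of simulations are likewise illustrative assumptions.

```python
import random
from math import log


def log_truncated_product(pvalues, tau=0.05):
    """Log of the product of P-values <= tau (0.0 if none qualify,
    because the empty product equals 1)."""
    return sum(log(p) for p in pvalues if p <= tau)


def tpm_pvalue(pvalues, tau=0.05, n_sim=10000, seed=0):
    """Monte-Carlo P-value for the truncated product statistic,
    ASSUMING independent uniform P-values under the null."""
    rng = random.Random(seed)
    observed = log_truncated_product(pvalues, tau)
    n = len(pvalues)
    as_extreme = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        simulated = [1.0 - rng.random() for _ in range(n)]
        if log_truncated_product(simulated, tau) <= observed:
            as_extreme += 1
    return as_extreme / n_sim


# Toy block of three markers with small P-values.
print(tpm_pvalue([0.001, 0.002, 0.003], n_sim=2000))
```

When markers within a block are in LD, replacing the independent uniform draws with correlated draws changes the null distribution of the statistic, which is why the correlation structure has to be estimated in real data.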
Autism
Our data identify potential candidate genes for autism that encode a
large subset of proteins involved in the outgrowth and guidance of
axons and dendrites to their appropriate synaptic targets. Our results
also suggest secondary involvement of genes involved in synaptogenesis
and neurotransmission which further contribute to the assembly and
function of neural circuitry. Taken together, these findings augment
existing genetic, epigenetic and neuropathological evidence suggestive
of altered neurite morphology, cell migration, synaptogenesis and
excitatory-inhibitory balance in autism [[151]49].
Altered dendritic structure is among the most consistent
neuroanatomical findings in autism [[152]51,[153]52] and several other
neurodevelopmental syndromes including Down, Rett and fragile-X
[[154]53,[155]54]. Recent neuroanatomical findings include evidence of
subcortical, periventricular, hippocampal and cerebellar heterotopia
[[156]55] and altered microarchitecture of cortical minicolumns
[[157]56], suggestive of dysregulated neuronal migration and guidance.
In recent years, evidence from neuroanatomical and neuroimaging studies
has led a number of researchers to propose models of altered cortical
networks in autism, emphasizing the possible disruption of long-range
connectivity and a developmental bias toward the formation of
short-range connections [[158]57,[159]58].
Neurite regulation is a common function of numerous top-ranking
candidates. PUM2 codes for pumilio homolog 2, which regulates dendritic
outgrowth, arborization, spine formation and filopodial extension of
developing and mature neurons [[160]59]. DNER regulates the
morphogenesis of cerebellar Purkinje cells [[161]60] and acts as an
inhibitor to retinoic-acid induced neurite outgrowth [[162]61]. A2BP1
binds with ATXN2 (SCA2), a dosage-sensitive regulator of actin filament
formation that is suggested to mediate the loss of
cytoskeleton-dependent dendritic structure [[163]62]. SEMA4D induces
axonal growth cone collapse [[164]63] and promotes dendritic branching
and complexity in later stages of development [[165]64,[166]65]. CDH8
regulates hippocampal mossy fibre axon fasciculation and targeting,
complementing N-cadherin (CDH2) in the assembly of synaptic circuits
[[167]66].
Neurite outgrowth and guidance can be conceptualized as a process
whereby extracellular signals are transduced to cytoplasmic signalling
molecules which, in turn, regulate membrane protrusion and neuronal
growth cone navigation by reorganizing the architecture of the neuronal
cytoskeleton. In general, neurite extension is dependent on microtubule
organization, while the extension and retraction of finger-like
filopodia and web-like lamellipodia from the neuronal growth cone is
dependent on actin dynamics. Gordon-Weeks [[168]67] and Bagnard
[[169]68] provide excellent overviews relating to growth cone
regulation and axon guidance. Figure [170]3 provides a simplified
overview of some of these molecular interactions.
Figure 3.
Simplified schematic illustrating molecular mechanisms of neurite
regulation. Extracellular events such as cell contact [[173]79],
guidance cues [[174]64], neurotransmitter release [[175]80], and
interactions with extracellular matrix components [[176]46] are
detected by receptors and cell adhesion molecules at the membrane
surface and are transduced via cytoplasmic terminals and multidomain
scaffolding proteins [[177]47] to downstream signalling molecules
[[178]81-[179]83]. Polarity and directional navigation are achieved by
coordinating local calcium concentration [[180]84], Src family kinases
[[181]85], cyclic nucleotide activation (cAMP and cGMP) [[182]86], and
phosphoinositide signalling molecules which affect the spatial
distribution and membrane recruitment of proteins that regulate the
neuronal cytoskeleton [[183]87]. Chief among these regulators are the
small Rho family GTPases RhoA, Rac and Cdc42, which serve as molecular
'switches' to activate downstream effectors of cytoskeletal remodelling
[[184]88]. In developed neurons, this pathway further regulates the
formation of actin-dependent microarchitecture such as mushroom-like
dendritic spines at the postsynaptic terminals of excitatory and
inhibitory synapses [[185]89]. This simplified schematic presents
components in an exploded format for tractability, and includes an
abridged set of interactions. Additional File [186]9 presents autism
candidate genes identified by GWAS-NR having known roles in neurite
regulation. RPTP (receptor protein tyrosine phosphatase); EphR (Eph
receptor); FGFR (fibroblast growth factor receptor);
PLXN (plexin); NRP (neuropilin); Trk (neurotrophin
receptor); ECM (extracellular matrix); NetR (netrin receptor); NMDAR
(NMDA receptor); mGluR (metabotropic glutamate receptor); AA
(arachidonic acid); PLCγ (phospholipase C, gamma); MAGI (membrane
associated guanylate kinase homolog); IP3 (inositol
1,4,5-trisphosphate); DAG (diacylglycerol); PIP2 (phosphatidylinositol
4,5-bisphosphate); PIP3 (phosphatidylinositol 3,4,5-trisphosphate);
PI3K (phosphoinositide-3-kinase); nNOS (neuronal nitric oxide
synthase); NO (nitric oxide); IP3R (inositol trisphosphate receptor);
RyR (ryanodine receptor); GEF (guanine exchange factor); GAP (GTPase
activating protein); MAPK (mitogen-activated protein kinase); and JNK
(c-Jun N-terminal kinase).
The autism gene candidates identified by GWAS-NR show functional
enrichment in processes including adhesion, cell motility,
axonogenesis, cell morphogenesis and neuron projection development.
Notably, a recent analysis of rare CNVs in autism by the Autism Genome
Project Consortium indicates similar functional enrichment in the
processes of neuronal projection, motility, proliferation, and Rho/Ras
GTPase signalling [[187]21].
We propose that, in autism, these processes are not distinct functional
classifications but instead cooperate as interacting parts of a
coherent molecular pathway regulating the outgrowth and guidance of
axons and dendrites. Consistent with this view, the candidate set is
enriched for numerous binding domains commonly found in proteins that
govern neuritogenesis. These include immunoglobulin, cadherin,
pleckstrin homology, MAM, fibronectin type-III and protein tyrosine
phosphatase (PTP) domains [[188]69-[189]71].
The cytoskeletal dynamics of extending neurites are largely governed by
the activity of Rho-GTPases, which act as molecular switches to induce
actin remodelling. Molecular evidence suggests that disassociation of
catenin from cadherin promotes the activation of Rho-family GTPases Rac
and Cdc42, resulting in cytoskeletal rearrangement [[190]72]. Guanine
nucleotide exchange factors (GEFs) such as DOCK1 [[191]73] and KALRN
[[192]74] activate Rho-GTPases by exchanging bound guanosine
diphosphate (GDP) for guanosine triphosphate (GTP), while GTPase
activating proteins (GAPs) such as SRGAP3 [[193]75] increase the rate
of intrinsic GTP hydrolysis to inactivate GTPases. Pleckstrin homology
domains, characteristic of several GEFs and GAPs, bind to
phosphoinositides to establish membrane localization and also may play
a signalling role in GTPase function [[194]76]. Certain GTPases outside
of the Rho family, particularly Rap and Ras, also exert an influence on
cytoskeletal dynamics and neurite differentiation [[195]77,[196]76].
Several genes in the candidate set with established roles in neurite
formation and guidance have been previously implicated in autism. These
include A2BP1 (P = 3.60E-05), ROBO2 (2.00E-03), SEMA5A (2.30E-03), EN2
(4.00E-03), CACNA1G (6.00E-03), PTEN (8.00E-03), NRXN1 (1.10E-02), FUT9
(1.80E-02), DOCK8 (2.10E-02), NRP2 (2.60E-02) and CNTNAP2 (2.70E-02).
Other previously reported autism candidate genes with suggestive roles
in neurite regulation include PCDH9 (1.76E-03), CDH9 (6.00E-03) and
CSMD3 (2.10E-02).
The enriched presence of transcription factors in the candidate set is
intriguing, as many of these candidates, including CUX2, SIX3, MEIS2
and ZFHX1B, have established roles in the specification of GABAergic
cortical interneurons [[197]76]. Many guidance mechanisms in the
neuritogenic pathway, such as Slit-Robo, semaphorin-neuropilin, and
CXCR4 signalling also direct the migration and regional patterning of
interneurons during development. Proper targeting of interneurons is
vital to the organization of cortical circuitry, including minicolumnar
architecture which is reported to be altered in autism [[198]78]. Thus,
the functional roles of the candidate genes we identify may embrace
additional forms of neuronal motility and targeting.
Conclusions
We proposed a noise-reduction methodology, GWAS-NR, to enhance the
ability to detect associations in GWAS data. By amplifying signals in
regions where association evidence is locally correlated across
datasets, the GWAS-NR captures information that may be omitted or
underutilized in single-marker analysis. Simulation evidence
demonstrates that under a variety of disease models, GWAS-NR achieves
higher classification rates for true positive associations, compared
with the use of joint P-values or Fisher's method.
The GWAS-NR method was applied to autism data, with the objective of
prioritizing regions of association for follow-up analysis. Gene set
analysis was conducted in order to examine if the identified autism
candidate genes were over-represented in any biological pathway
relative to the background genes. The significance of a given pathway
suggests that the pathway may be associated with autism due to the
enrichment of autism candidate genes in that pathway. We find that many
of the implicated genes cooperate within a coherent molecular
mechanism. This neuritogenic pathway regulates the transduction of
membrane-associated signals to downstream cytoskeletal effectors that
induce the directional protrusion of axons and dendrites. This
mechanism provides a framework that embraces numerous genetic findings
in autism to date, and is consistent with neuroanatomical evidence.
While confirmation of this pathway will require additional evidence
such as the identification of functional variants, our results suggest
that autistic pathology may be mediated by the dynamic regulation of
the neuronal cytoskeleton, with resulting alterations in dendritic and
axonal connectivity.
Abbreviations
ADI-R: Autism Diagnostic Interview - Revised; AGRE: Autism Genetic
Resource Exchange; APL: association in the presence of linkage; AUC:
area under the curve; CNV: copy number variation; DAVID: Database for
Annotation, Visualization and Integrated Discovery; GTP: guanosine
triphosphate;
LD: linkage disequilibrium; GWAS: Genome-wide association studies; NR:
noise reduction; RefSeq: Reference Sequence; ROC: receiver operating
characteristic; SNP: single nucleotide polymorphism; TPM: truncated
product method.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
All co-authors contributed to writing the manuscript. JPH was the
primary author of the manuscript, developed the statistical methods and
the design for their implementation and contributed pathway analysis of
candidate genes. RHC contributed to the development of the statistical
methods and study design and also conducted statistical analyses. AJG
contributed to molecular analysis and interpretation. JMJ, DS and DM
conducted statistical analyses. IK and PLW performed molecular analysis
and interpretation. JMV performed molecular analysis and contributed to
the study design. ERM provided input into study design, methods
development, statistical analyses, and interpretation of findings. MLC
analysed clinical data and contributed to the study design. JRG
performed molecular analysis, interpreted data and contributed to the
study design. JLH provided input to study design and statistical
analyses. MPV contributed to the design of the study, development of
methods, coordination of statistical and molecular analysis and
interpretation of data. She is also the primary investigator on the
parent study.
Supplementary Material
Additional File 1
Appendix
File (27KB, DOC)
Additional File 2
Table S7: Haplotype configuration. Association configuration for the
power simulations.
File (31KB, DOC)
Additional File 3
Comparative classification rates for genome-wide association studies -
noise reduction (GWAS-NR), Joint analysis and Fisher's Test. GWAS-NR
has an area under the curve (AUC) of 0.679 and the joint and Fisher's
tests have AUC of 0.624 and 0.604, respectively, for the recessive
model. Also GWAS-NR has AUC of 0.855 and the joint and Fisher's tests
have AUC of 0.781 and 0.751, respectively, for the multiplicative
model. For the dominant model, AUC for GWAS-NR, the joint and Fisher's
tests are 0.964, 0.871 and 0.853, respectively. For the additive model,
AUC for GWAS-NR, the joint and Fisher's tests are 0.893, 0.806 and
0.771, respectively.
File (152KB, DOC)
Additional File 4
Flow Chart: GWAS-NR analysis workflow in autism datasets. A flow chart
demonstrating the data analysis and candidate gene selection of the
autism datasets presented. HIHG: Hussman Institute for Human Genomics
dataset, AGRE: Autism Genetic Resource Exchange dataset, APL:
Association in the Presence of Linkage, GWAS-NR: Genome-wide
Association Study - Noise Reduction, DAVID: Database for Annotation,
Visualization and Integrated Discovery.
File (89.3KB, PDF)
Additional File 5
Table S1: Linkage disequilibrium (LD) blocks identified by Genome-wide
Association Study - Noise Reduction (GWAS-NR). Every LD block
identified by GWAS-NR and haplotype analysis with a P-value < 0.05 is
listed with the chromosome start and stop position, the length in
basepairs of the LD block, and the minimum GWAS-NR P-value of the
block.
File (191KB, XLS)
Additional File 6
Table S2: RefSeq genes overlapping linkage disequilibrium (LD) blocks
identified by Genome-wide Association Study - Noise Reduction
(GWAS-NR). Every LD block identified by GWAS-NR and haplotype analysis
with a P-value < 0.05 and that overlaps a gene in the RefSeq database
is listed with the chromosome start and stop position, the length in
basepairs of the LD block, the minimum GWAS-NR P-value of the block,
and the RefSeq name of the gene(s) that overlap the block.
File (77KB, XLS)
Additional File 7
Table S3: RefSeq genes nearest to linkage disequilibrium (LD) blocks
identified by Genome-wide Association Study - Noise Reduction
(GWAS-NR). Every LD block identified by GWAS-NR and haplotype analysis
with a P-value < 0.05 that does not overlap with a gene in the
reference sequence (RefSeq) database is listed with the chromosome
start and stop position, the length in basepairs of the LD block, the
minimum GWAS-NR P-value of the block and the RefSeq name of the gene(s)
that is nearest to the block.
File (131KB, XLS)
Additional File 8
Table S4: Autism candidate genes identified by Genome-wide Association
Study - Noise Reduction (GWAS-NR). A complete list of reference
sequence (RefSeq) genes either overlapping or nearest to every LD
block, with the P-value of either the overlapping or nearest block.
File (81.5KB, XLS)
Additional File 9
Table S5: Autism candidate genes [Genome-wide Association Study - Noise
Reduction (GWAS-NR)] having known roles in neurite outgrowth and
guidance. A list of autism candidate genes with known roles in neurite
outgrowth and axon guidance followed by a comment on molecular function
and PubMed identifications of supporting literature.
File (78.5KB, XLS)
Additional File 10
Table S6: Autism candidate genes [Genome-wide Association Study - Noise
Reduction (GWAS-NR)] having suggestive roles in neurite outgrowth and
guidance. A list of autism candidate genes with presumptive roles in
neurite outgrowth and axon guidance followed by a comment on molecular
function and PubMed identifications of supporting literature.
File (38.5KB, XLS)
Contributor Information
John P Hussman, Email: hussman@hussmanfoundation.org.
Ren-Hua Chung, Email: RChung@med.miami.edu.
Anthony J Griswold, Email: agriswold@med.miami.edu.
James M Jaworski, Email: jjaworski@med.miami.edu.
Daria Salyakina, Email: DSalyakina@med.miami.edu.
Deqiong Ma, Email: dma@med.miami.edu.
Ioanna Konidari, Email: IKonidari@med.miami.edu.
Patrice L Whitehead, Email: PWhitehead@med.miami.edu.
Jeffery M Vance, Email: JVance@med.miami.edu.
Eden R Martin, Email: EMartin1@med.miami.edu.
Michael L Cuccaro, Email: mcuccaro@med.miami.edu.
John R Gilbert, Email: jgilbert@med.miami.edu.
Jonathan L Haines, Email: jonathan@chgr.mc.vanderbilt.edu.
Margaret A Pericak-Vance, Email: mpericak@med.miami.edu.
Acknowledgements