Abstract

Background
The causes of poor respiratory function and COPD are incompletely understood, but it is clear that genes and the environment both play a role. As DNA methylation is under both genetic and environmental control, we hypothesised that investigating differential methylation associated with these phenotypes would provide mechanistic insights and improve prediction of COPD. We investigated genome-wide differential DNA methylation patterns using the recently released 850 K Illumina EPIC array. This is the largest single-population, whole-genome epigenetic study to date.

Methods
Epigenome-wide association studies (EWASs) of respiratory function and COPD were performed in peripheral blood samples from the Generation Scotland: Scottish Family Health Study (GS:SFHS) cohort (n = 3781; 274 COPD cases and 2919 controls). In independent COPD incidence data (n = 149), significantly differentially methylated sites (DMSs; p < 3.6 × 10^−8) were evaluated, using receiver operating characteristic (ROC) analysis, for the predictive power they added to a model including clinical variables (age, sex, height and smoking history). The Lothian Birth Cohort 1936 (LBC1936) was used to replicate the association (n = 895) and prediction (n = 178) results.

Findings
We identified 28 DMSs associated with respiratory function and/or COPD, which mapped to genes involved in alternative splicing, JAK-STAT signalling and axon guidance. In prediction analyses, adding DMSs to a clinical model significantly improved discrimination between COPD cases and controls in the independent GS:SFHS (p = .016) and LBC1936 (p = .010) datasets.

Interpretation
Identification of novel DMSs has provided insight into the molecular mechanisms regulating respiratory function and aided prediction of COPD risk. Further studies are needed to assess the causality and clinical utility of the identified associations.

Funding
Wellcome Trust Strategic Award 10436/Z/14/Z.
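The abstract reports that adding DMSs to a clinical model improved case-control discrimination in ROC analysis. As a minimal sketch, assuming each participant already has a predicted risk score from a clinical-only model and from a clinical-plus-DMS model (for example fitted probabilities from a logistic regression, which is an assumption here), the area under the ROC curve (AUC) can be computed through the Mann–Whitney formulation: the probability that a randomly chosen case scores higher than a randomly chosen control. The function names `roc_auc` and `auc_gain` are hypothetical, and the paired significance test behind the reported p-values (for example DeLong's test) is not shown.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic (hypothetical helper).

    labels: 1 = COPD case, 0 = control. A tied case/control pair counts as half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case with every control; O(n_cases * n_controls) is fine
    # for prediction sets of a few hundred participants.
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def auc_gain(
    clinical_scores: Sequence[float],
    combined_scores: Sequence[float],
    labels: Sequence[int],
) -> float:
    """Change in AUC when DMS-based predictors are added to the clinical model."""
    return roc_auc(combined_scores, labels) - roc_auc(clinical_scores, labels)
```

Because both models are scored on the same participants, any formal test of the AUC difference has to account for that pairing; the sketch only reports the point difference.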
Research in context

Evidence before this study
We searched PubMed for articles published in English up to July 25, 2018, using the search term “DNA methylation” combined with either “respiratory function” or “COPD”. We found some evidence for association between differential DNA methylation and both respiratory function and COPD. Of the twelve previous studies identified, eight used peripheral blood samples (sample size [N] range = 100–1,085) and four used lung tissue samples (N range = 24–160). The number of CpG loci analysed ranged from 27,578 to 485,512. These studies have not identified consistent changes in methylation, most likely because of a combination of factors including small sample sizes, technical issues, phenotypic definitions and study design. In addition, no previous study has analysed a sample from a large single cohort; used the recently released Illumina EPIC array (which assesses ~850,000 CpG loci); adjusted both the methylation data and the phenotype for smoking history; or used both prevalent and incident COPD electronic health record data.

Added value of this study
To our knowledge, this is the largest single-cohort epigenome-wide association study (EWAS) of respiratory function and COPD to date (n = 3,781). After applying stringent genome-wide significance criteria (p < 3.6 × 10^−8), we found that DNA methylation levels at 28 CpG sites in peripheral blood were associated with respiratory function or COPD. Of these 28, seven were testable in an independent population sample: all seven showed a consistent direction of effect between the two samples, and three replicated (p < .007 [0.05/7 CpG sites tested]).
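The replication rule stated above (the same sign of effect in both samples, and a replication p-value below 0.05 divided by the number of sites tested) can be written out as a small helper. This is a sketch only: the function names and inputs are hypothetical, and the effect estimates and p-values would come from the discovery (GS:SFHS) and replication (LBC1936) EWASs. With seven testable sites the Bonferroni threshold is 0.05/7 ≈ 0.0071, which the text rounds to .007.

```python
from typing import Dict, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def replication_summary(
    discovery_betas: Sequence[float],
    replication_betas: Sequence[float],
    replication_pvals: Sequence[float],
    alpha: float = 0.05,
) -> Dict[str, float]:
    """Count CpG sites with a consistent direction and those that replicate.

    Hypothetical helper: a site counts as replicated if its effect has the same
    sign in both samples AND its replication p-value is below alpha / n_sites.
    """
    n_sites = len(discovery_betas)
    if not (n_sites == len(replication_betas) == len(replication_pvals)):
        raise ValueError("all inputs must have one entry per CpG site")
    threshold = bonferroni_threshold(alpha, n_sites)
    # A product > 0 means both effects are non-zero and share the same sign.
    consistent = [d * r > 0 for d, r in zip(discovery_betas, replication_betas)]
    replicated = [
        same and p < threshold for same, p in zip(consistent, replication_pvals)
    ]
    return {
        "threshold": threshold,
        "n_consistent": sum(consistent),
        "n_replicated": sum(replicated),
    }
```

For example, `bonferroni_threshold(0.05, 7)` returns about 0.00714, the cut-off used for the seven testable sites.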
Our results suggest that adjusting both the phenotypic and the DNA methylation probe data for smoking history, which has not been done in previous studies, reduces confounding by smoking, identifies more associations and reduces the heterogeneity of effects across smoking strata. We used gene set enrichment and pathway analyses, together with an approach that combines DNA methylation results with gene expression data, to provide evidence for enrichment of differentially methylated sites in genes linked to alternative splicing, JAK-STAT signalling and axon guidance. Finally, we demonstrated that including DNA methylation data improves COPD risk prediction over established clinical variables alone in two independent datasets.

Implications of all the available evidence
There is now accumulating evidence that DNA methylation in peripheral blood is associated with respiratory function and COPD. Our study has shown that DNA methylation levels at 28 CpG sites are robustly associated with respiratory function and COPD, provide mechanistic insights and can improve prediction of COPD risk. Further studies are warranted to improve understanding of the aetiology of COPD, to explore causality and to assess the utility of DNA methylation profiling in the clinical management of this condition.

1. Introduction
Respiratory function is influenced by both environmental and genetic factors, with heritability estimates ranging from 39 to 66% [1,2]. Epigenetic modifications are at the interface of genetics and the environment. DNA methylation, the covalent addition of a methyl group to the fifth carbon (C5) of the cytosine in cytosine-phosphate-guanine (CpG) dinucleotides, is an epigenetic modification of DNA that is associated with gene expression. Epigenome-wide association studies (EWASs) have the potential to provide mechanistic insights into impaired respiratory function and COPD pathogenesis.
Previous EWASs of spirometric measures of respiratory function and respiratory disease have, however, produced inconsistent results, with some identifying significant associations [3–6] and others not [7–9]. Moreover, there has been little consistency between the positive findings reported [9,10]. Studies of lung tissue [5,8] have been constrained by sample availability, with the largest study to date comprising 160 subjects [5]. Inconsistency amongst the results of the peripheral blood-based studies [3,6,7,9,11] is likely to be due to a number of factors, including small sample size (e.g., two studies had <200 samples) [6,11] and/or investigation of a relatively small number (~27,000) of CpG loci [3,11]. The study with the largest number of samples (n = 1085) analysed only ~27,000 CpG loci, while the largest study using the 450 K array (the predecessor of the array used here) analysed 920 samples [12]. Differences in spirometric measures, definitions of COPD, study population characteristics and study design, in particular the method used to adjust for smoking history, are also likely to be important sources of variation [9,10]. Smoking is an established major risk factor for COPD [13], and previous genome-wide DNA methylation studies have focused on DNA methylation associated with smoking and COPD [5,14,15]. However, not all smokers develop COPD, and >25% of COPD cases occur in never smokers [16]. A growing number of studies suggest that impaired respiratory function and COPD are strongly associated with risk factors other than smoking [17–19] and have a strong genetic component [20–22] that generally acts independently of smoking [23].
To understand the pathological mechanisms of impaired respiratory function and COPD beyond smoking, we sought to identify robust associations by assessing methylation in a large single-cohort sample, applying a more rigorous correction for smoking history and performing sensitivity analyses. In contrast to prior studies, we used the recently released Illumina EPIC array, which interrogates over 850,000 methylation sites. All 3781 individuals in our sample were from a single cohort with extensive and consistent phenotyping comprising clinical investigation, questionnaire and linkage to routine medical health records. The cross-sectional design of prior studies has limited their capacity to distinguish cause and effect [10]. To identify predictive biomarkers of COPD, and to provide insights into the causal nature of our findings, we tested our findings for their predictive power. We used an independent subpopulation of 150 participants with incident COPD who were disease-free at the time of blood sampling. Finally, where data were available, we attempted to replicate our EWAS and prediction findings in an independent cohort, LBC1936, drawn from the same population.

2. Material and methods
A flow chart of the overall study design is shown in Fig. 1, and a full description of the methods is provided in the appendix.

Fig. 1. Flow-chart showing the analysis pipeline. The direction of the arrows represents the workflow of the study design, with the analyses performed indicated. Lemon and blue boxes represent the discovery Generation Scotland: Scottish Family Health Study cohort and the replication Lothian Birth Cohort 1936 (LBC1936) data sets, respectively. The grey box indicates input data of COPD case-control differential expression in lung tissue. The green boxes indicate the analyses undertaken. The black arrows and gold boxes indicate output of significant results. (For interpretation of the references to colour in