Abstract

Background

Proteomic biomarkers related to cardiovascular disease risk factors may offer insights into the pathogenesis of cardiovascular disease. We investigated whether modifiable lifestyle risk factors for cardiovascular disease are associated with distinctive proteomic signatures.

Methods and Results

We analyzed 1305 circulating plasma proteomic biomarkers (assayed using the SomaLogic platform) in 897 FHS (Framingham Heart Study) Generation 3 participants (mean age 46±8 years; 56% women; discovery sample) and 1121 FOS (Framingham Offspring Study) participants (mean age 52 years; 54% women; validation sample). Participants were free of hypertension, diabetes mellitus, and clinical cardiovascular disease. We used linear mixed effects models (adjusting for age, sex, body mass index, and family structure) to relate levels of each inverse‐normal transformed protein to 3 lifestyle factors (ie, smoking, alcohol consumption, and physical activity). A Bonferroni‐adjusted P value indicated statistical significance (based on number of proteins and traits tested, P<4.2×10^−6 in the discovery sample; P<6.85×10^−4 in the validation sample). We observed statistically significant associations of 60 proteins with smoking (37/40 top proteins validated in FOS), 30 proteins with alcohol consumption (23/30 proteins validated), and 5 proteins with physical activity (2/3 proteins associated with the physical activity index validated). We assessed the associations of protein concentrations with previously identified genetic variants (protein quantitative trait loci) linked to lifestyle‐related disease traits in the genome‐wide association study catalogue. The protein quantitative trait loci were associated with coronary artery disease, inflammation, and age‐related mortality.

Conclusions

Our cross‐sectional study from a community‐based sample elucidated distinctive sets of proteins associated with 3 key lifestyle factors.
Keywords: alcohol consumption, lifestyle, physical activity, proteomics, smoking

Subject Categories: Proteomics, Risk Factors, Cardiovascular Disease, Lifestyle, Exercise

Nonstandard Abbreviations and Acronyms

FHS: Framingham Heart Study
FOS: Framingham Offspring Study
GWAS: genome‐wide association study
PAI: physical activity index
pQTL: protein quantitative trait loci

Clinical Perspective

What Is New?

* Using data from 2 generations of Framingham Heart Study participants, we identified associations between 3 modifiable lifestyle risk factors (smoking, alcohol consumption, and physical activity) and distinct sets of plasma protein concentrations related to known biological pathways.
* The proteomic signature of smoking showed high representation of proteins related to cytokine–cytokine receptor interaction, Th1 and Th2 cell differentiation, interleukin‐17 signaling pathway, chemokine signaling pathway, and complement and coagulation cascades.

What Are the Clinical Implications?

* Our analysis integrates information from high‐throughput proteomics assays, genetic associations, biological pathways, and weighted predictive models to identify possible mechanisms by which modifiable lifestyle factors may affect the risk of chronic disease.
Modifiable lifestyle factors (eg, smoking, diet, physical activity, and alcohol consumption) are associated with at least 50% of the burden of myocardial infarction.^1 Previous studies reported associations of specific genotypes and gene expression patterns with select cardiovascular disease (CVD) risk factors.^2,3,4,5,6 However, how established CVD risk factors affect the pathogenesis of subclinical and clinical disease progression at the molecular level remains incompletely understood.^7,8,9 Subclinical CVD progression is a complex process involving changes at the level of target tissues and circulating proteins.^10,11,12 Until recently, technological limitations have challenged our ability to study systemic responses at the level of circulating proteins. The recent availability of high‐throughput aptamer‐based proteomics assays has enabled us to identify sets of circulating proteins statistically associated with specific CVD risk factors (herein referred to as “proteomic signatures”), including those associated with modifiable lifestyle factors. Such proteomic studies could identify new plasma biomarkers of disease risk and potential therapeutic pathways and molecular targets.^13

We performed a cross‐sectional analysis of data from 2 generations of FHS (Framingham Heart Study) participants to assess the relations between plasma concentrations of >1300 proteins and CVD risk factors. We focused on the relations between circulating proteins and 3 modifiable behavioral risk factors (ie, smoking, alcohol consumption, and physical activity) for which the directionality of association is more likely to be from the risk factor to the protein concentrations (rather than vice versa).
We also examined the biological pathways enriched with proteins associated with CVD risk factors and the association of protein concentrations with genetic variants linked to lifestyle‐related disease traits.

Methods

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Study Sample

The community‐based, prospective FHS includes multiple cohorts of individuals with residential or familial connections to Framingham, Massachusetts or surrounding towns.^14,15 The FOS (Framingham Offspring Study) began in 1971 and the Third Generation study began in 2002. Our discovery study sample included 897 FHS Generation 3 participants (56% women; mean age 46 years) who were free of hypertension, diabetes mellitus, and clinical CVD at their second examination (2008–2011). Hypertension was defined as a systolic blood pressure of 140 mm Hg or higher, a diastolic blood pressure of 90 mm Hg or higher, or current use of antihypertensive medication.^16 Diabetes mellitus was defined as a fasting blood glucose level of 126 mg/dL or higher or current use of insulin or glucose‐lowering medication.^17 CVD was defined as a composite of coronary artery disease, heart failure, or stroke.^18 Our validation study sample included 1121 FOS participants (54% women; mean age 52 years) without hypertension, diabetes mellitus, or clinical CVD who attended their fifth examination (1991–1995). These examinations were chosen because of the availability of blood proteomic profiling at the given time points. We used the FHS Generation 3 sample as the discovery sample rather than the validation sample because of the greater number of proteins analyzed and the availability of additional objectively measured physical activity phenotypes. The study was approved by the Boston University Medical Center institutional review board, and all participants provided written informed consent.
Risk Factors

Our risk factors of interest were smoking, current alcohol consumption (g/d derived from the number of drinks [12‐oz beer, 5‐oz wine, or 1.5‐oz 80‐proof liquor] consumed per week over the course of a year), and physical activity. We analyzed 2 smoking variables: current smoker (versus not current smoker, serving as referent) and the number of packs of cigarettes smoked per day. Individuals were considered current smokers if they self‐reported smoking at least 1 cigarette per day. Participants were considered former smokers or former consumers of alcohol if they had self‐identified as being a current smoker or consumer of alcohol, respectively, at any previous examination cycle but not at the present examination cycle.

In Generation 3 participants who wore an accelerometer attached to a belt worn around the waist for at least 10 hours per day for at least 3 days (out of 8 days possible), we analyzed several accelerometer‐based physical activity variables: minutes of sedentary physical activity/d (≤100 counts/min), minutes of moderate physical activity/d (1535–3959 counts/min), minutes of moderate–vigorous physical activity/d (≥1535 counts/min), minutes of vigorous physical activity/d (≥3960 counts/min), and steps/d (a summary measure of the accelerometry data).^19,20,21 These data were not measured at the fifth FOS examination because that examination was conducted in the pre‐accelerometry era. Therefore, we analyzed another measure (ie, the physical activity index [PAI]) that was available in both the FOS and Generation 3 participants at the respective examinations of interest. The PAI was assessed based on participants' self‐reported distribution of physical activity intensity during a typical 24‐hour period, as detailed previously.^22 Activities were designated as specific intensity levels based on the metabolic equivalent of task or oxygen consumption.
The PAI was calculated as (hours sleeping + 1.1×hours engaged in sedentary activities + 1.5×hours engaged in slight physical activity + 2.4×hours engaged in moderate physical activity + 5×hours engaged in heavy physical activity).^22 In secondary analyses, we also considered associations with former versus never smoking status and former versus never alcohol consumption.

Protein Quantification

Blood plasma samples were collected and stored at −80°C until assayed. In the discovery study sample, we analyzed 1305 circulating plasma proteomic biomarkers using the SOMAscan platform Version 1.3k (SomaLogic Inc., Boulder, CO). In the validation study sample, we analyzed 1061 overlapping circulating plasma proteomic biomarkers that were available on an earlier version of the SOMAscan platform (Version 1.1k).^23,24 The median intra‐ and interassay coefficients of variation across all proteins assayed in FHS Generation 3 were 2.3% and 4.4%, respectively. For the FOS data, the median intra‐assay coefficient of variation was <4% and the median interassay coefficient of variation was <7% across all proteins. The protocol to ensure data quality is described in Figure S1 and Data S1.

Statistical Analysis

We used linear mixed effects models (the “LMEKIN” function of the Kinship Package in R) to relate the circulating concentrations of proteins (dependent variables) to select lifestyle factors (independent variables). We applied a rank‐based inverse‐normal transformation to all protein biomarkers based on overall distributions. Continuous phenotypes (number of packs of cigarettes smoked and all physical activity phenotypes) were natural log‐transformed (zeros were treated as ones for purposes of natural log‐transformations). Alcohol consumption (an independent variable) was not natural log‐transformed, for ease of interpretation.
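The preprocessing steps described above can be illustrated with a short Python sketch. It is not the study's code (the analyses were run in R), and the function names are hypothetical. The Blom offset of 3/8 in the rank‐based inverse‐normal transformation is an assumption, because the text does not state which offset was used. The PAI weights and the substitution of ones for zeros before taking logs follow the description in the text.

```python
import math
from statistics import NormalDist


def physical_activity_index(sleep_h, sedentary_h, slight_h, moderate_h, heavy_h):
    """PAI as the weighted sum of hours per day, using the weights given in the text."""
    return (1.0 * sleep_h + 1.1 * sedentary_h + 1.5 * slight_h
            + 2.4 * moderate_h + 5.0 * heavy_h)


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse-normal transform.

    offset=3/8 (Blom) is an assumption; the source does not specify it.
    """
    n = len(values)
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
            for r in average_ranks(values)]


def log_with_zero_as_one(x):
    """Natural log in which a zero is treated as one, so log(0) becomes log(1) = 0."""
    return math.log(1.0 if x == 0 else x)


if __name__ == "__main__":
    # Hypothetical 24-hour day: 8 h sleep, 8 h sedentary, 4 h slight,
    # 3 h moderate, and 1 h heavy activity.
    print(physical_activity_index(8, 8, 4, 3, 1))
    # Hypothetical protein measurements for five participants.
    print(inverse_normal_transform([12.1, 3.4, 8.8, 3.4, 20.0]))
    # Hypothetical packs/d values, including a nonsmoker coded as zero.
    print([log_with_zero_as_one(p) for p in (0, 0.5, 1, 2)])
```

Averaging the ranks of tied values keeps the transformed scores symmetric around zero, which is one common convention; other tie‐handling rules would give slightly different values.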
Results of previous analyses with alcohol consumption did not differ substantially whether alcohol consumption was transformed or not.^25 Except where specified differently, all models adjusted for age, sex, and body mass index.

To account for multiple testing, we adopted a Bonferroni correction. We considered a P value to be significant if P<0.05/(1305 proteins×9 lifestyle variables)=4.2×10^−6 (rounded down to be conservative). We considered proteins to be part of a putative proteomic signature if the associations met this statistical threshold. For the validation sample, we assessed the 40 proteins with the lowest P values for the association with smoking (based on either current smoking or number of packs of cigarettes smoked; all 40 proteins were associated with current smoking), all 30 proteins significantly associated with alcohol consumption, and the 3 proteins significantly associated with physical activity (using the PAI) based on the Bonferroni‐corrected P value threshold. In the FOS validation sample, we used a Bonferroni threshold based on the total number of proteins assessed for all 3 risk factors (P<0.05/73=6.85×10^−4). We used the same covariates in the validation sample as in the discovery sample.

To understand potential biological functions of the genes corresponding to the top proteins, we performed a pathway enrichment analysis. Biological pathways were identified from the Kyoto Encyclopedia of Genes and Genomes pathway database.^26 We excluded pathways that contained <5 proteins or included >2000 proteins. The enrichment of top proteins in pathways was assessed by the hypergeometric test, and enriched pathways were defined as those with a false discovery rate <0.05.^27,28

Additionally, we estimated prediction models based on protein signatures from the discovery sample, and then validated the association of the derived predictors with lifestyle factors in the validation sample.
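As a worked illustration of the multiple‐testing thresholds and of the hypergeometric enrichment test described in this section, the following Python sketch uses only the standard library. The function names are hypothetical, and the counts passed to the enrichment functions are illustrative assumptions: the background size and the number of top proteins entering the pathway analysis are not reported in the text, so the sketch does not reproduce Table 3.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable.

    population: number of background genes (assumed value in the demo)
    successes:  number of background genes annotated to the pathway
    draws:      number of top genes tested for enrichment
    k:          observed overlap between top genes and the pathway
    """
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


def enrichment_ratio(k, population, successes, draws):
    """Observed overlap divided by the overlap expected by chance."""
    expected = draws * successes / population
    return k / expected


if __name__ == "__main__":
    # Discovery threshold: 1305 proteins x 9 lifestyle variables.
    print(bonferroni_threshold(0.05, 1305 * 9))
    # Validation threshold: 73 proteins carried forward.
    print(bonferroni_threshold(0.05, 73))
    # Hypothetical enrichment example: 1300 background genes, 100 of them
    # in a pathway, 50 top genes, 12 of which fall in the pathway.
    print(hypergeom_upper_tail(12, 1300, 100, 50))
    print(enrichment_ratio(12, 1300, 100, 50))
```

The first value is 0.05/11 745, approximately 4.26×10^−6, which the text rounds down to 4.2×10^−6; the second is 0.05/73, approximately 6.85×10^−4. The enrichment P values from such a test would still need a false discovery rate adjustment across pathways, which is not shown here.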
The predictors were derived from the weighted sum of all the proteins associated significantly with each lifestyle factor, where the weight was the beta estimate for each protein.^29,30 The area under the curve was used to measure the performance of predictors for dichotomous traits, and the correlation coefficient was used to measure the performance for continuous traits.

We used results from a previously performed genome‐wide association study (GWAS) of 40 proteins with genotypes of single nucleotide polymorphisms (SNPs) imputed to the 1000 Genomes Project Phase I Version 3 reference. A total of 1622 FOS participants with genotypes and protein levels whose data met quality control criteria (call rate >97%, no excessive heterozygosity or high Mendelian error rate) were included in the GWAS. A total of 378 163 high‐quality genotyped SNPs (call rate ≥97%, P value for the Hardy‐Weinberg test statistic ≥1×10^−6, Mishap P≥1×10^−9, <100 Mendelian errors, minor allele frequency ≥1%) from Affymetrix 500K and Molecular Inversion Probe 50K arrays were included as the backbone for the imputation. The imputed genotypes included ≈16 million imputed SNPs. SNPs that were significantly associated with lifestyle‐related proteins (P<5.0×10^−8; threshold for genome‐wide significance) were probed for significant associations with lifestyle‐related disease traits using the GWAS Catalog, accounting for kinship and population structure.^31,32 Additionally, for SNPs significantly associated with top proteins associated with smoking, we separately investigated their associations with lung function, lung cancer, and chronic obstructive pulmonary disease in the published literature.^33,34,35,36,37,38,39

Results

Our discovery sample included 897 middle‐aged adults free of hypertension, diabetes mellitus, and cardiovascular disease (Table 1).
Associations between all 1305 proteins and all 9 risk factor variables are listed in Tables S1 and S2. Based on the Bonferroni‐corrected P value threshold (P<4.2×10^−6), 53 proteins were significantly associated with current smoking, 49 proteins with number of packs of cigarettes smoked, 30 proteins with alcohol consumption, and 5 proteins with at least 1 of the physical activity risk factors (Table 2). In the FOS sample (n=1121; mean age=52 years), we observed significant associations for 37 of the 40 proteins with the strongest associations with smoking in the discovery sample, 23 of the 30 proteins associated with alcohol consumption, and 2 of the 3 proteins associated with PAI (at a Bonferroni P<6.85×10^−4) (Table 2). We also estimated protein signature predictors by weighting the association within the discovery sample, and then predicting the lifestyle factors in the validation sample.^29,30 The protein signature predictors reached an area under the curve of 0.83 (P=9.9×10^−51) for current smokers, a correlation coefficient of 0.55 (P=1.2×10^−88) for packs of cigarettes, and a correlation coefficient of 0.40 (P=2.9×10^−44) for alcohol consumption.

Table 1.
Characteristics of the Samples

| Characteristic | Generation 3, N | Generation 3, Mean (SD) or % (n) | Framingham Offspring Study (N=1121), Mean (SD) or % (n) |
|---|---|---|---|
| Age, y | 897 | 45.6 (8) | 52 (9.5) |
| BMI, kg/m^2 | 897 | 26.7 (4.8) | 26.2 (4.3) |
| Waist circumference, cm | 897 | 93.2 (13.2) | 88.9 (13.2) |
| Overweight (BMI ≥25 and <30 kg/m^2) | 897 | 37.8% (339) | 40.4% (453) |
| Obese (BMI ≥30 kg/m^2) | 897 | 21.4% (192) | 16.1% (180) |
| Total cholesterol, mg/dL | 897 | 186 (32.7) | 201.2 (35.9) |
| Fasting blood glucose, mg/dL | 889 | 93.3 (10.9) | 93.4 (8.9) |
| Systolic blood pressure, mm Hg | 897 | 113.4 (12.3) | 117.1 (12.0) |
| Diastolic blood pressure, mm Hg | 896 | 72.9 (8.7) | 71.3 (8.2) |
| Current smoker | 897 | 8.5% (76) | 21.0% (236) |
| Alcohol consumption, g/d | 896 | 10.2 (13.1) | 10.4 (15.1) |
| Sedentary activity (≤100 counts/min), min/d | 785 | 699.7 (76.2) | … |
| Moderate activity (1535–3959 counts/min), min/d | 785 | 18.1 (14.9) | … |
| Moderate or vigorous activity (≥1535 counts/min), min/d | 785 | 23.3 (23.1) | … |
| Vigorous activity (≥3960 counts/min), min/d | 785 | 5.3 (12.2) | … |
| Steps/d | 785 | 8262 (3672) | … |
| Physical activity index | 785 | 36.5 (5.9) | 34.8 (6.1) |

BMI indicates body mass index.

Table 2.
Proteins Significantly Associated With Smoking, Alcohol Consumption, and Physical Activity

| Protein | Estimated Beta | P Value | Validation P Value |
|---|---|---|---|
| Current smoking (n=896) | | | |
| Polymeric immunoglobulin receptor (PIGR) | 1.171 | 7.22×10^−28 | 4.41×10^−55 |
| Osteomodulin (OMD) | −0.928 | 1.05×10^−16 | 1.05×10^−28 |
| Secretory leukocyte peptidase inhibitor (SLPI) | 0.910 | 4.40×10^−16 | 9.96×10^−19 |
| Major histocompatibility class I–related protein (MIC‐1) | 0.800 | 6.56×10^−15 | 4.62×10^−7 |
| Repulsive guidance molecule BMP co‐receptor b (RGMB) | −0.852 | 9.26×10^−15 | 3.49×10^−19 |
| Interferon gamma–induced protein (IP‐10) | −0.832 | 3.34×10^−13 | 1.10×10^−11 |
| Neural cell adhesion molecule (NCAM‐120) | −0.735 | 5.72×10^−13 | 1.06×10^−26 |
| Trefoil factor 2 | 0.733 | 1.48×10^−12 | 6.24×10^−14 |
| Neuronal cell adhesion molecule (Nr‐CAM) | −0.761 | 2.20×10^−12 | 5.50×10^−9 |
| Leukotriene A‐4 hydrolase (LKHA4) | 0.772 | 2.50×10^−11 | 2.44×10^−15 |
| Interleukin 23 (IL‐23) | −0.731 | 3.45×10^−11 | 6.00×10^−5 |
| Adhesion G protein–coupled receptor E2 (EMR2) | −0.750 | 5.67×10^−11 | 1.30×10^−22 |
| Heparin cofactor II | 0.721 | 6.98×10^−11 | 1.01×10^−16 |
| Tetraspanin 5 (NET4) | 0.697 | 1.82×10^−10 | 3.45×10^−6 |
| Intercellular adhesion molecule 5 (sICAM‐5) | 0.682 | 1.83×10^−10 | 5.21×10^−21 |
| Growth arrest specific 1 (GAS1) | −0.664 | 1.92×10^−10 | 1.40×10^−8 |
| Leucine‐rich repeat‐containing protein 11 (SLIK5) | −0.655 | 2.55×10^−10 | 1.45×10^−16 |
| Mevalonate diphosphate decarboxylase 2 (MDC) | 0.674 | 7.30×10^−10 | 1.06×10^−13 |
| Brevican core protein (PGCB) | −0.622 | 1.49×10^−9 | 8.31×10^−10 |
| Capping actin protein, gelsolin‐like (CAPG) | 0.686 | 1.56×10^−9 | 5.92×10^−17 |
| Latent transforming growth factor beta binding protein 4 (LTBP4) | −0.647 | 1.72×10^−9 | 1.04×10^−6 |
| Ubiquitin‐conjugating enzyme E2G2 (UB2G2) | 0.622 | 2.03×10^−9 | 6.68×10^−2 |
| Matrix metallopeptidase 9 (MMP‐9) | 0.687 | 2.19×10^−9 | 3.48×10^−12 |
| Matrix metallopeptidase 10 (MMP‐10) | 0.674 | 2.88×10^−9 | 4.19×10^−5 |
| Notch‐3 | −0.615 | 6.20×10^−9 | 2.26×10^−10 |
| Cathepsin H | 0.627 | 1.01×10^−8 | 5.35×10^−8 |
| Endocan | −0.622 | 1.13×10^−8 | 1.55×10^−11 |
| Semaphorin 3E | −0.605 | 1.38×10^−8 | 1.86×10^−10 |
| Pleiotrophin (PTN) | −0.544 | 1.65×10^−8 | 1.53×10^−3 |
| Periostin | −0.614 | 2.51×10^−8 | 3.41×10^−15 |
| Eotaxin | 0.583 | 3.23×10^−8 | 5.99×10^−14 |
| Bone morphogenetic protein receptor type 1A (BMPR1A) | −0.571 | 3.44×10^−8 | 1.43×10^−10 |
| Unc‐5 netrin receptor D (UNC5H4) | −0.499 | 4.43×10^−8 | 2.84×10^−9 |
| Chemokine (C‐C motif) ligand 21 (6Ckine) | 0.583 | 1.37×10^−7 | |
| Jagged canonical Notch ligand 1 (JAG1) | −0.567 | 1.40×10^−7 | 5.93×10^−16 |
| Calgranulin B | 0.601 | 1.49×10^−7 | |
| SPARC‐related modular calcium binding 1 (SMOC1) | −0.579 | 1.66×10^−7 | |
| Dermatopontin (DERM) | −0.579 | 1.76×10^−7 | 2.17×10^−20 |
| Regenerating family member 4 (REG4) | 0.555 | 1.92×10^−7 | |
| Xtp3a‐related NTP pyrophosphatase (XTP3A) | −0.587 | 2.27×10^−7 | |
| Carbonic anhydrase 6 | −0.526 | 3.14×10^−7 | 9.96×10^−14 |
| Nectin‐like protein 1 | −0.521 | 3.42×10^−7 | |
| Glypican 3 | −0.557 | 3.56×10^−7 | |
| Sialic acid binding Ig‐like lectin 7 (Siglec‐7) | 0.564 | 5.03×10^−7 | 5.56×10^−15 |
| Gelsolin | −0.556 | 9.80×10^−7 | |
| Neurotrophic receptor tyrosine kinase 2 (TrkB) | −0.566 | 1.18×10^−6 | |
| Adrenomedullin | 0.568 | 1.48×10^−6 | 3.69×10^−3 |
| Haptoglobin, mixed type | 0.513 | 1.96×10^−6 | |
| Adiponectin | −0.453 | 2.07×10^−6 | |
| Kynureninase (KYNU) | 0.541 | 2.48×10^−6 | 1.63×10^−9 |
| Ephrin receptor A5 (EphA5) | −0.522 | 3.78×10^−6 | |
| Neurotrophic receptor tyrosine kinase 3 (TrkC) | −0.481 | 3.80×10^−6 | |
| Chemokine (C‐C motif) ligand 17 (TARC) | 0.523 | 4.17×10^−6 | 1.02×10^−12 |
| Packs of cigarettes (n=895) | | | |
| Polymeric immunoglobulin receptor (PIGR) | 0.087 | 1.15×10^−29 | 1.59×10^−65 |
| Secretory leukocyte peptidase inhibitor (SLPI) | 0.066 | 2.08×10^−16 | 8.51×10^−22 |
| Osteomodulin (OMD) | −0.060 | 9.89×10^−14 | 1.83×10^−29 |
| Major histocompatibility class I–related protein (MIC‐1) | 0.053 | 5.88×10^−13 | 2.43×10^−10 |
| Trefoil factor 2 | 0.053 | 1.24×10^−12 | 2.33×10^−15 |
| Neural cell adhesion molecule (NCAM‐120) | −0.052 | 1.62×10^−12 | 7.52×10^−28 |
| Repulsive guidance molecule BMP co‐receptor b (RGMB) | −0.055 | 3.45×10^−12 | 6.31×10^−22 |
| Leukotriene A‐4 hydrolase (LKHA4) | 0.056 | 1.60×10^−11 | 3.90×10^−16 |
| Capping actin protein, gelsolin‐like (CAPG) | 0.053 | 7.93×10^−11 | 1.42×10^−19 |
| Leucine‐rich repeat‐containing protein 11 (SLIK5) | −0.047 | 1.85×10^−10 | 4.65×10^−18 |
| Adhesion G protein–coupled receptor E2 (EMR2) | −0.052 | 3.45×10^−10 | 1.06×10^−23 |
| Matrix metallopeptidase 9 (MMP‐9) | 0.052 | 3.61×10^−10 | 1.03×10^−12 |
| Tetraspanin 5 (NET4) | 0.049 | 4.00×10^−10 | 2.19×10^−6 |
| Sialic acid binding Ig‐like lectin 7 (Siglec‐7) | 0.050 | 5.70×10^−10 | 1.57×10^−16 |
| Cathepsin H | 0.049 | 8.77×10^−10 | 9.15×10^−11 |
| Growth arrest specific 1 (GAS1) | −0.046 | 9.34×10^−10 | 6.93×10^−9 |
| Neuronal cell adhesion molecule (Nr‐CAM) | −0.047 | 2.04×10^−9 | 8.30×10^−10 |
| Ubiquitin‐conjugating enzyme E2G2 (UB2G2) | 0.044 | 2.58×10^−9 | 6.13×10^−2 |
| Mevalonate diphosphate decarboxylase 2 (MDC) | 0.047 | 3.52×10^−9 | 2.15×10^−15 |
| Brevican core protein (PGCB) | −0.044 | 3.96×10^−9 | 2.37×10^−10 |
| Interleukin 23 (IL‐23) | −0.046 | 8.71×10^−9 | 8.37×10^−6 |
| Carbonic anhydrase 6 | −0.043 | 8.92×10^−9 | 3.44×10^−16 |
| Endocan | −0.044 | 1.47×10^−8 | 1.85×10^−11 |
| Heparin cofactor II | 0.045 | 1.80×10^−8 | 3.63×10^−17 |
| Jagged canonical Notch ligand 1 (JAG1) | −0.044 | 1.92×10^−8 | 1.72×10^−17 |
| Kynureninase (KYNU) | 0.046 | 2.20×10^−8 | 5.59×10^−10 |
| Dermatopontin (DERM) | −0.044 | 3.11×10^−8 | 1.97×10^−19 |
| Eotaxin | 0.042 | 3.58×10^−8 | 4.03×10^−13 |
| Matrix metallopeptidase 10 (MMP‐10) | 0.044 | 7.00×10^−8 | 1.00×10^−4 |
| Adrenomedullin | 0.046 | 7.42×10^−8 | 2.91×10^−3 |
| Chemokine (C‐C motif) ligand 17 (TARC) | 0.044 | 7.46×10^−8 | 8.72×10^−12 |
| Unc‐5 netrin receptor D (UNC5H4) | −0.035 | 7.56×10^−8 | 2.22×10^−10 |
| Glypican 3 | −0.042 | 7.60×10^−8 | |
| Chemokine (C‐C motif) ligand 21 (6Ckine) | 0.043 | 9.03×10^−8 | |
| Matrix metallopeptidase 12 (MMP‐12) | 0.042 | 1.51×10^−7 | |
| Interferon gamma–induced protein (IP‐10) | −0.044 | 1.52×10^−7 | 1.18×10^−12 |
| Calgranulin B | 0.043 | 1.62×10^−7 | |
| Thrombin | −0.040 | 3.01×10^−7 | |
| Intercellular adhesion molecule 5 (sICAM‐5) | 0.039 | 4.72×10^−7 | 2.05×10^−23 |
| SPARC‐related modular calcium binding 1 (SMOC1) | −0.040 | 7.34×10^−7 | |
| Nectin‐like protein 1 | −0.036 | 8.28×10^−7 | |
| Euchromatic histone lysine methyltransferase 2 (NG36) | −0.036 | 8.88×10^−7 | |
| Transferrin (TF) | 0.041 | 1.03×10^−6 | |
| Latent transforming growth factor beta binding protein 4 (LTBP4) | −0.037 | 1.61×10^−6 | 2.72×10^−7 |
| Nicotinamide phosphoribosyltransferase (PBEF) | −0.040 | 1.82×10^−6 | |
| Oxidized low‐density lipoprotein receptor 1 (OLR1) | 0.040 | 2.23×10^−6 | |
| Interleukin‐9 (IL‐9) | −0.040 | 2.29×10^−6 | |
| Notch‐3 | −0.036 | 2.30×10^−6 | 1.30×10^−10 |
| Pleiotrophin (PTN) | −0.033 | 2.42×10^−6 | 1.44×10^−3 |
| Alcohol consumption (g/d; n=896) | | | |
| Thyroxine‐binding globulin | −0.019 | 5.23×10^−17 | 1.75×10^−16 |
| Laminin | 0.018 | 9.74×10^−14 | 2.13×10^−11 |
| Angiotensinogen | 0.017 | 1.32×10^−12 | 3.15×10^−25 |
| Carnosine dipeptidase 1 (CNDP1) | 0.017 | 1.35×10^−12 | 7.44×10^−10 |
| Cadherin E | 0.016 | 1.62×10^−10 | 1.71×10^−7 |
| GDNF family receptor alpha 1 (GFRa‐1) | −0.014 | 2.34×10^−10 | 9.74×10^−14 |
| Apolipoprotein L1 | 0.014 | 1.11×10^−9 | 7.19×10^−4 |
| Plasminogen activator, tissue type (tPA) | 0.013 | 2.10×10^−9 | 9.23×10^−5 |
| Trypsin 2 | 0.014 | 5.24×10^−9 | 3.63×10^−17 |
| Coagulation factor IXab | 0.013 | 1.11×10^−8 | 4.84×10^−7 |
| Insulin‐like growth factor binding protein 4 (IGFBP‐4) | 0.014 | 1.60×10^−8 | 3.49×10^−6 |
| Aminoacylase‐1 | 0.013 | 1.94×10^−8 | 8.98×10^−8 |
| Coagulation factor IX | 0.013 | 2.26×10^−8 | 4.11×10^−7 |
| PolyUbiquitin K63 | 0.014 | 2.48×10^−8 | 1.26×10^−1 |
| Serine peptidase inhibitor, Kunitz type 2 (SPINT2) | −0.014 | 2.67×10^−8 | 6.74×10^−8 |
| Erb‐b2 receptor tyrosine kinase 3 (ERBB3) | 0.014 | 3.85×10^−8 | 7.98×10^−16 |
| a2‐HS‐glycoprotein | −0.014 | 5.28×10^−8 | 2.05×10^−5 |
| Ectonucleotide pyrophosphatase/phosphodiesterase 7 (ENPP7) | 0.013 | 7.07×10^−8 | 7.83×10^−6 |
| Phosphoglycerate mutase 1 | 0.013 | 8.65×10^−8 | 1.05×10^−1 |
| Phosphate‐induced (PHI) | 0.013 | 9.42×10^−8 | 2.11×10^−1 |
| HtrA serine peptidase 2 (HTRA2) | 0.013 | 1.87×10^−7 | 5.52×10^−2 |
| Apoprotein A‐I | 0.013 | 2.17×10^−7 | 2.22×10^−11 |
| Insulin‐like growth factor II receptor | 0.012 | 2.47×10^−7 | 3.19×10^−6 |
| Ferritin | 0.011 | 3.19×10^−7 | 8.12×10^−3 |
| Activated leukocyte cell adhesion molecule (ALCAM) | −0.011 | 4.62×10^−7 | 2.16×10^−3 |
| Interleukin 1 receptor–like 2 (IL‐1Rrp2) | −0.013 | 6.37×10^−7 | 4.23×10^−5 |
| Phosphodiesterase 11 (PDE11) | 0.013 | 6.60×10^−7 | 6.12×10^−13 |
| Coagulation factor X | 0.013 | 7.23×10^−7 | 4.68×10^−15 |
| Complement component C9 | −0.012 | 2.19×10^−6 | 3.71×10^−7 |
| Osteomodulin (OMD) | −0.012 | 2.32×10^−6 | 6.07×10^−6 |
| PAI (n=887) | | | |
| Leptin | −0.680 | 6.64×10^−8 | 2.94×10^−3 |
| Creatine kinase (CK‐MB) | 1.054 | 7.74×10^−7 | 1.25×10^−5 |
| Creatine kinase (CK‐MM) | 0.986 | 3.43×10^−6 | 1.41×10^−6 |
| Steps/d (n=784) | | | |
| L1 cell adhesion molecule (NCAM‐L1) | 0.00005 | 7.80×10^−9 | |
| Sed PA (n=784) | | | |
| None | | | |
| Mod PA (n=784) | | | |
| None | | | |
| Mod/Vig PA (n=784) | | | |
| Lumican | 0.007 | 1.39×10^−6 | |
| Vigorous PA (n=784) | | | |
| None | | | |

The proteomic signature of smoking shows high representation of proteins related to cytokine–cytokine receptor interaction, type 1 T helper and type 2 T helper cell differentiation, interleukin‐17 (IL‐17) signaling pathway, chemokine signaling pathway, and complement and coagulation cascades (Figure 1 and Table 3). Additionally, 4 proteins (major histocompatibility class I–related protein, matrix metallopeptidase 9, polymeric immunoglobulin receptor, and secretory leukocyte peptidase inhibitor) were specifically associated with current smoking (versus current non‐smokers) whereas no proteins were specifically associated with past smoking. The distributions of the regression residuals for the top 40 proteins that were significantly associated with smoking varied between smokers and non‐smokers (Figure 2). In proteomic profiles of alcohol consumption and physical activity, no pathway met the false discovery rate threshold (Figures 3 and 4, Table 3). There were 26 proteins specifically associated with current alcohol consumption, but no proteins specifically associated with past (non‐current) alcohol consumption.

Figure 1. Volcano plots for the association between proteins and (A) smoking (vs never smoking; n=896) and (B) smoking (packs/d; n=895).

The dotted line indicates the Bonferroni‐adjusted P value of 4.2×10^−6. The x‐axis refers to a standard deviation change in the proteomic biomarker.
CAPG indicates capping actin protein gelsolin‐like; IP‐10, interferon gamma–induced protein 10; LKHA4, leukotriene A‐4 hydrolase; MIC‐1, major histocompatibility class I–related protein; NCAM‐120, neural cell adhesion molecule; Nr‐CAM, neuronal cell adhesion molecule; OMD, osteomodulin; PIGR, polymeric immunoglobulin receptor; RGMB, repulsive guidance molecule BMP co‐receptor b; SLIK5, leucine‐rich repeat‐containing protein 11; and SLPI, secretory leukocyte peptidase inhibitor.

Table 3. Top 10 Pathways Enriched With Genes Associated With Each Trait

| Trait | Pathway | Total Number of Genes in the Pathway | Number of Overlapping Genes | Enrichment Ratio* | P Value | False Discovery Rate | Overlapping Genes |
|---|---|---|---|---|---|---|---|
| Current smoker | Cytokine–cytokine receptor interaction† | 294 | 13 | 6.43 | 5.47×10^−8 | 1.78×10^−5 | IL12B, CCL21, IL4, CXCL11, CCL22, CCL17, CXCL10, GDF15, BMPR1A, TNFRSF6B, CCL11, IL9, CXCL9 |
| Current smoker | Th1 and Th2 cell differentiation† | 92 | 6 | 9.48 | 3.54×10^−5 | 4.09×10^−3 | IL12B, IL4, JAG1, NOTCH1, NOTCH3, TYK2 |
| Current smoker | IL‐17 signaling pathway† | 93 | 6 | 9.38 | 3.76×10^−5 | 4.09×10^−3 | MMP9, IL4, CCL17, CXCL10, CCL11, S100A9 |
| Current smoker | Chemokine signaling pathway† | 189 | 7 | 5.39 | 2.78×10^−4 | 2.27×10^−2 | CCL21, CXCL11, CCL22, CCL17, CXCL10, CCL11, CXCL9 |
| Current smoker | Asthma | 31 | 3 | 14.07 | 1.20×10^−3 | 7.83×10^−2 | IL4, CCL11, IL9 |
| Current smoker | Cell adhesion molecules (CAMs) | 144 | 5 | 5.05 | 2.94×10^−3 | 1.60×10^−1 | CNTN1, CADM3, SELP, NCAM1, NRCAM |
| Current smoker | Notch signaling pathway | 48 | 3 | 9.09 | 4.26×10^−3 | 1.81×10^−1 | JAG1, NOTCH1, NOTCH3 |
| Current smoker | Endocrine resistance | 98 | 4 | 5.94 | 4.45×10^−3 | 1.81×10^−1 | MMP9, JAG1, NOTCH1, NOTCH3 |
| Current smoker | Toll‐like receptor signaling pathway | 104 | 4 | 5.59 | 5.49×10^−3 | 1.99×10^−1 | IL12B, CXCL11, CXCL10, CXCL9 |
| Current smoker | Complement and coagulation cascades | 79 | 3 | 5.52 | 1.68×10^−2 | 5.33×10^−1 | SERPIND1, F2, F9 |
| Smoking (pack year) | Cytokine–cytokine receptor interaction† | 294 | 10 | 5.26 | 1.37×10^−5 | 4.48×10^−3 | IL12B, CCL21, CCL22, CCL17, TGFB3, CXCL10, GDF15, BMPR1A, CCL11, IL9 |
| Smoking (pack year) | Complement and coagulation cascades† | 79 | 5 | 9.79 | 1.43×10^−4 | 2.33×10^−2 | PLAUR, SERPIND1, A2M, F2, F3 |
| Smoking (pack year) | IL‐17 signaling pathway† | 93 | 5 | 8.32 | 3.08×10^−4 | 3.35×10^−2 | MMP9, CCL17, CXCL10, CCL11, S100A9 |
| Smoking (pack year) | Endocrine resistance | 98 | 4 | 6.31 | 3.55×10^−3 | 2.89×10^−1 | MMP9, EGFR, JAG1, NOTCH3 |
| Smoking (pack year) | Chemokine signaling pathway | 189 | 5 | 4.09 | 7.13×10^−3 | 4.65×10^−1 | CCL21, CCL22, CCL17, CXCL10, CCL11 |
| Smoking (pack year) | Proteoglycans in cancer | 198 | 5 | 3.91 | 8.63×10^−3 | 4.69×10^−1 | IL12B, MMP9, PLAUR, EGFR, GPC3 |
| Smoking (pack year) | Cell adhesion molecules (CAMs) | 144 | 4 | 4.30 | 1.36×10^−2 | 6.34×10^−1 | CNTN1, CADM3, NCAM1, NRCAM |
| Smoking (pack year) | Asthma | 31 | 2 | 9.98 | 1.69×10^−2 | 6.88×10^−1 | CCL11, IL9 |
| Smoking (pack year) | Th1 and Th2 cell differentiation | 92 | 3 | 5.04 | 2.13×10^−2 | 7.71×10^−1 | IL12B, JAG1, NOTCH3 |
| Smoking (pack year) | Axon guidance | 175 | 4 | 3.54 | 2.59×10^−2 | 7.95×10^−1 | NTN4, SEMA6A, UNC5D, SEMA3E |
| Alcohol | Complement and coagulation cascades | 79 | 4 | 11.50 | 3.68×10^−4 | 7.51×10^−2 | PLAT, C9, F10, F9 |
| Alcohol | African trypanosomiasis | 35 | 3 | 19.48 | 4.61×10^−4 | 7.51×10^−2 | APOL1, APOA1, SELE |
| Alcohol | Cell adhesion molecules (CAMs) | 144 | 3 | 4.73 | 2.48×10^−2 | 1.00×10^0 | CDH1, SELE, ALCAM |
| Alcohol | Glycolysis/Gluconeogenesis | 68 | 2 | 6.68 | 3.57×10^−2 | 1.00×10^0 | PGAM1, GPI |
| Alcohol | PPAR signaling pathway | 74 | 2 | 6.14 | 4.16×10^−2 | 1.00×10^0 | APOA1, UBC |
| Alcohol | Biosynthesis of amino acids | 75 | 2 | 6.06 | 4.27×10^−2 | 1.00×10^0 | ACY1, PGAM1 |
| Alcohol | Amoebiasis | 96 | 2 | 4.73 | 6.63×10^−2 | 1.00×10^0 | LAMA1, C9 |
| Alcohol | PI3K‐Akt signaling pathway | 354 | 4 | 2.57 | 6.79×10^−2 | 1.00×10^0 | EFNA2, ERBB3, LAMA1, IL2 |
| Alcohol | Glucagon signaling pathway | 103 | 2 | 4.41 | 7.49×10^−2 | 1.00×10^0 | PGAM1, GCG |
| Alcohol | 2‐Oxocarboxylic acid metabolism | 18 | 1 | 12.62 | 7.64×10^−2 | 1.00×10^0 | ACY1 |
| Physical activity | JAK‐STAT signaling pathway | 162 | 3 | 9.62 | 3.30×10^−3 | 6.30×10^−1 | IL10RA, LEP, PIAS4 |
| Physical activity | Arginine and proline metabolism | 50 | 2 | 20.77 | 4.00×10^−3 | 6.30×10^−1 | CKB, CKM |
| Physical activity | Proteoglycans in cancer | 198 | 3 | 7.87 | 5.80×10^−3 | 6.30×10^−1 | ITGAV, LUM, PLAU |
| Physical activity | Complement and coagulation cascades | 79 | 2 | 13.15 | 9.75×10^−3 | 7.94×10^−1 | C5, PLAU |
| Physical activity | NF‐kappa B signaling pathway | 95 | 2 | 10.93 | 1.39×10^−2 | 9.05×10^−1 | PIAS4, PLAU |
| Physical activity | Cytokine–cytokine receptor interaction | 294 | 3 | 5.30 | 1.71×10^−2 | 9.29×10^−1 | BMP6, IL10RA, LEP |
| Physical activity | Ubiquitin‐mediated proteolysis | 136 | 2 | 7.64 | 2.73×10^−2 | 1.00×10^0 | PIAS4, STUB1 |
| Physical activity | Fluid shear stress and atherosclerosis | 138 | 2 | 7.53 | 2.80×10^−2 | 1.00×10^0 | ITGAV, PIAS4 |
| Physical activity | Cell adhesion molecules (CAMs) | 144 | 2 | 7.21 | 3.03×10^−2 | 1.00×10^0 | ITGAV, L1CAM |
| Physical activity | Tuberculosis | 179 | 2 | 5.80 | 4.52×10^−2 | 1.00×10^0 | CEBPB, IL10RA |

* Enrichment ratio: the number of observed genes divided by the number of expected genes from each pathway.
† Meets the false discovery rate threshold.

Figure 2. Box plots comparing the residual distributions of the top 40 proteins that were significantly associated with smoking for nonsmokers (NS; n=821) vs smokers (S; n=76).

The y‐axis represents the residuals of the protein concentrations adjusted for age, sex, and body mass index. (A) represents proteins with inverse associations and (B) represents proteins with positive associations. *Proteins: (1) osteomodulin (OMD), (2) repulsive guidance molecule BMP co‐receptor b (RGMB), (3) neural cell adhesion molecule (NCAM‐120), (4) interferon gamma–induced protein (IP‐10), (5) neuronal cell adhesion molecule (Nr‐CAM), (6) adhesion G protein–coupled receptor E2 (EMR2), (7) interleukin 23 (IL‐23), (8) leucine‐rich repeat‐containing protein 11 (SLIK5), (9) growth arrest specific 1 (GAS1), (10) brevican core protein (PGCB), (11) latent transforming growth factor beta binding protein 4 (LTBP4), (12) endocan, (13) dermatopontin (DERM), (14) neurogenic locus notch homolog protein 3 (notch‐3), (15) bone morphogenetic protein receptor type 1A (BMPR1A), (16) pleiotrophin (PTN), (17) unc‐5 netrin receptor D (UNC5H4), (18) jagged canonical notch ligand 1 (JAG1), (19) periostin, (20) semaphorin 3E, (21) carbonic anhydrase 6, (22) polymeric immunoglobulin receptor (PIGR), (23) major histocompatibility class I–related protein (MIC‐1), (24) secretory leukocyte peptidase inhibitor (SLPI), (25) leukotriene A‐4 hydrolase (LKHA4), (26) tetraspanin 5 (NET4), (27) trefoil factor 2, (28) capping actin protein gelsolin‐like (CAPG), (29) heparin cofactor II, (30) eotaxin, (31) matrix metallopeptidase 10 (MMP‐10), (32)
intercellular adhesion molecule 5 (sICAM‐5), (33) MMP‐9, (34) mevalonate disphosphate decarboxylase 2 (MDC), (35) ubiquitin‐conjugating enzyme E2G2 (UB2G2), (36) kynureninase (KYNU), (37) sialic acid binding Ig‐like lectin 7 (siglec‐7), (38) cathepsin H, (39) adrenomedullin, and (40) chemokine (C‐C motif) ligand 17 (TARC). Horizontal line in the boxes = median, top border of the boxes = 75th percentile, bottom border of the boxes = 25th percentile, whiskers above the boxes = min(max(x), 75th percentile + 1.5 * inter‐quartile range), whiskers below the boxes = max(min(x), 25th percentile ‐ 1.5 * inter‐quartile range). Figure 3. Volcano plot for the association between proteins and alcohol consumption (n=896). Figure 3 [111]Open in a new tab The dotted line indicates the Bonferroni‐adjusted P‐value of 4.2×10^−6. The x‐axis refers to a standard deviation change in the proteomic biomarker. CNDP1 indicates carnosine dipeptidase 1; GFRa‐1, GDNF family receptor alpha 1; SPINT2, serine peptidase inhibitor, kunitz type 2; and tPA, plasminogen activator tissue type. Figure 4. Volcano plots for the association between proteins and (A) physical activity index (n=887), (B) steps/d (n=784), (C) minutes of sedentary activity/d (n=784), (D) minutes of moderate physical activity/d (n=784), (E) minutes of moderate–vigorous physical activity/d (n=784), and (F) minutes of vigorous physical activity/d (n=784). Figure 4 [112]Open in a new tab The physical activity index was calculated as (hours sleeping+1.1×hours engaged in sedentary activities+1.5×hours engaged in slight physical activity+2.4×hours engaged in moderate physical activity+5×hours engaged in heavy physical activity).[113] ^22 The dotted line indicates the Bonferroni‐adjusted P value of 4.2×10^−6. The x‐axis refers to a standard deviation change in the proteomic biomarker. CK‐MB indicates creatine kinase M‐type: B‐type heterodimer; CK‐MM, creatine kinase M‐type; and NCAM‐L1, L1 cell adhesion molecule. 
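The physical activity index and the box‐plot whisker rule given in the figure captions above are plain arithmetic and can be sketched in a few lines. The following Python is a minimal illustration, not the study's code: the function and argument names are hypothetical, the example inputs are invented, and the "inclusive" quartile method is an assumption, because the captions do not state which percentile definition was used.

```python
from statistics import quantiles


def physical_activity_index(sleep, sedentary, slight, moderate, heavy):
    """Weighted sum of daily hours, using the weights quoted in the Figure 4 caption."""
    return sleep + 1.1 * sedentary + 1.5 * slight + 2.4 * moderate + 5.0 * heavy


def whisker_limits(values):
    """Whisker ends as written in the Figure 2 caption.

    Quartiles use the 'inclusive' (linear interpolation) method, which is
    an assumption; the caption does not name a percentile definition.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(max(values), q3 + 1.5 * iqr)  # capped at the largest observation
    lower = max(min(values), q1 - 1.5 * iqr)  # floored at the smallest observation
    return lower, upper


# Hypothetical day: 8 h sleep, 8 h sedentary, 6 h slight, 2 h moderate, 0 h heavy.
pai = physical_activity_index(8, 8, 6, 2, 0)

# Hypothetical residuals containing one extreme value.
lower, upper = whisker_limits([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

With these invented inputs the index is about 30.6; because every weight is at least 1, a 24‐hour day spent entirely asleep gives the minimum value of 24. In the second example the upper whisker stops at 13 (75th percentile plus 1.5 times the interquartile range) instead of extending to the extreme value of 100.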
Of the 20 proteins most strongly associated with smoking, 12 were significantly associated with at least 1 of 1671 genetic variants (protein quantitative trait loci [pQTLs]; P<5.0×10^−8) in GWAS analyses. These pQTLs included 14 SNPs associated with coronary artery disease, 2 with myocardial infarction, 6 with stroke, 4 with metabolic syndrome or metabolic traits, 2 with inflammatory biomarkers (including C‐reactive protein), 3 with chronic inflammatory diseases, 7 with allergic disease, allergy, or allergic sensitization, 1 with age‐related diseases and mortality, and 1 with cannabis use (Table S3).^31 None of the 1671 pQTLs were associated with lung cancer, lung function, or chronic obstructive pulmonary disease.^31, ^33, ^34, ^35, ^36, ^37, ^38, ^39

Of the 20 proteins most strongly associated with alcohol consumption, 13 had significant associations with at least 1 SNP in our database. There were a total of 2280 significant associations between the 13 proteins linked to alcohol consumption in our sample and pQTLs in our database, including 11 involving alcohol intake, alcohol dependence, or alcoholic chronic pancreatitis, 3 with hypertriglyceridemia, 2 with gout, 2 with liver enzymes, and 2 with metabolic syndrome (Table S3).^31 One of these SNPs was also associated with alcohol consumption and alcohol use disorder in a report of association studies involving up to 1.2 million individuals.^40

Discussion

Given the need to better understand how lifestyle risk factors mediate effects at the molecular level, we used a nontargeted approach to determine whether concentrations of 1305 proteins were strongly associated with 3 modifiable lifestyle risk factors (smoking, alcohol, and physical activity) in 2 generations of FHS participants.
To probe the possible biological meaning of the proteomic signatures for these 3 lifestyle risk factors, we considered the pathways enriched with key proteins as well as the pQTL variants associated with protein concentrations. As additional evidence that the proteomic signatures identified in the discovery cohort might be clinically meaningful, weighted predictive models estimated from the proteomic signatures for each smoking and alcohol consumption variable strongly predicted the respective lifestyle factor in the validation cohort. Our analysis provides a way of integrating information from high‐throughput proteomics assays, genetic associations, and biological pathways to identify possible mechanisms by which modifiable lifestyle factors may mediate systemic responses implicated in the pathogenesis of CVD. It builds on our previous investigation into the proteomic correlates of dietary patterns.^41

The first modifiable lifestyle factor we considered was smoking. We observed that 60 proteins were significantly associated with smoking. The top protein hits included proteins associated with SNPs of genes (eg, polymeric immunoglobulin receptor, IL‐23, heparin cofactor II, intercellular adhesion molecule 5 [sICAM‐5]) related to coronary artery disease, inflammation processes, and age‐related mortality,^31 suggesting that the proteomic profile that we identified may be related to several disease outcomes and related biological processes.
Additionally, pathways enriched with genes associated with smoking included the complement and coagulation cascades and the IL‐17 signaling pathway, both of which have been previously associated with cardiovascular health.^42, ^43 Certain proteins associated with smoking in our analysis have been previously associated with smoking in the toxicologic or epidemiological literature (eg, polymeric immunoglobulin receptor, capping actin protein gelsolin‐like, gelsolin, matrix metallopeptidase 10, and matrix metallopeptidase 12).^44, ^45, ^46, ^47, ^48, ^49 Similarly, 19 of the proteins that we identified as significantly associated with smoking were included in a previously published plasma protein expression model for smoking.^29 Notably, we evaluated a larger set of proteins than most previous studies. To our knowledge, many of the associations we observed between smoking and protein concentrations are novel, potentially suggesting a larger molecular impact of smoking on biological systems than previously identified. Our observations are consistent with evidence that long‐term smoking affects a large proportion of the protein concentrations in bronchoalveolar lavage^50 and sputum.^51 They are also consistent with evidence that smoking is associated with an epigenetic signature,^52 a transcriptomic signature,^53 and a microRNA signature.^54

As with smoking, we observed blood proteins associated with alcohol consumption. The proteomic correlates of smoking and alcohol consumption were almost mutually exclusive; only 1 protein (osteomodulin) was associated with both lifestyle factors. Our observations may suggest that although modifiable lifestyle behaviors cluster within individuals,^55, ^56, ^57 the corresponding proteomic signatures may be distinctive. We observed both novel (to our knowledge) and previously identified proteomic associations with alcohol consumption.
Certain proteins associated with alcohol consumption have been previously linked to alcohol consumption in the toxicological (eg, apolipoprotein‐A1 and phosphoglycerate mutase)^58, ^59 and epidemiological (eg, activated leukocyte cell adhesion molecule, thyroxine‐binding globulin, and trypsin 2) literature.^29 Furthermore, certain proteins (eg, α2‐HS‐glycoprotein) and SNPs of genes coding for certain proteins (eg, trypsin 2) have been associated with alcohol‐related disease processes previously.^31, ^60 In addition, top pathways enriched with genes encoding proteins (pQTLs) associated with alcohol consumption included complement and coagulation cascades, cell adhesion molecules, the peroxisome proliferator‐activated receptor (PPAR) signaling pathway, and biosynthesis of amino acids. These pathways are notable because several have been implicated in CVD progression^42, ^61; however, none met our false discovery rate threshold. Further investigation is needed to determine whether our observations can be replicated in a larger study.

The proteomic correlates of physical activity included fewer proteins than the corresponding associations for either smoking or alcohol consumption. Although physical activity has been associated with hundreds of proteins,^29, ^62, ^63 we observed only 5 proteins that were significantly associated with any of the physical activity variables. It is possible that exercise‐related associations with protein concentrations are tissue specific (eg, in skeletal muscle).^62 Additionally, our assay may not have included many of the proteins associated with physical activity. Nevertheless, 2 of the 3 proteins (creatine kinase M‐type and creatine kinase M‐type B‐type heterodimer) assessed in our FOS validation sample had significant associations.
Creatine kinase M‐type B‐type heterodimer is part of a previously published plasma protein expression model for physical activity energy expenditure.^29 In addition to having a known association with myocardial damage,^64 creatine kinase M‐type protein concentrations and gene expression have been associated with long‐term physical activity patterns and short‐term changes caused by physical activity.^65, ^66, ^67 Leptin, the third protein associated with PAI in the discovery sample, is best known for its impact on eating behavior and energy balance,^68 but it has also been associated with blood pressure, vascular function, and cardiovascular disease.^69 Furthermore, in some populations, physical activity is associated with blood leptin concentrations.^29, ^70, ^71

Our analysis had several strengths, including the large number of proteins considered, the use of the FOS cohort as a validation dataset, and the integration of existing GWAS catalog and molecular pathway databases. Our investigation also had several limitations. First, we only analyzed associations of lifestyle factors with a subset of plasma protein concentrations. It is possible that the relations would differ if we had considered a larger panel of circulating proteins or protein concentrations in different tissues. Second, our analysis was cross‐sectional. Although we assumed that it was more likely for the modifiable lifestyle factors to influence protein concentrations, it is possible that the directionality is reversed or bidirectional. Third, our findings may be limited in generalizability because all of our participants were White individuals of European ancestry.
Finally, although SOMAscan aptamer assays allow for high‐throughput quantification of protein concentrations, such technology could introduce errors because of cross‐reactivity and nonspecificity of select aptamers.^72 Future work could address these limitations by prospectively examining the associations between CVD risk factors and blood concentrations of a larger panel of proteins in a more diverse population. Additionally, future studies could explore the extent to which proteomic signatures may mediate the relations between lifestyle factors and risk of CVD.

Our study focused on characterizing the proteomic signatures of modifiable lifestyle factors. By showing that lifestyle factors including smoking, alcohol consumption, and physical activity are associated with individuals' proteomic profiles and by relating these profiles to known biological pathways, we suggest potential mechanisms through which the lifestyle factors might affect the risk of chronic disease. More generally, characterizing proteomic signatures associated with common lifestyle factors can help elucidate the molecular mechanisms associated with disease initiation and progression, develop newer biomarkers of related processes, and inform future primordial prevention efforts.

Sources of Funding

The Framingham Heart Study acknowledges the support of contracts NO1‐HC‐25195, HHSN268201500001I, and 75N92019D00031 from the National Heart, Lung, and Blood Institute (NHLBI). Other support for this work came from TOPMed X01HL139389, RF1AG063507, NHLBI grants R01HL132320 and T32‐HL‐125232, Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD) grant K12HD092535, and Tufts University School of Medicine. Dr Vasan is supported in part by the Evans Medical Foundation and the Jay and Louis Coffman Endowment from the Department of Medicine, Boston University School of Medicine.
No funder had any role in the design or conduct of the study, the collection, management, analysis, or interpretation of the data, the preparation, review, or approval of the manuscript, or the decision to submit the manuscript for publication. The authors declare that they have no conflicts of interest, including relevant financial interests, activities, relationships, or affiliations. The views and opinions expressed in the article do not necessarily represent those of the NHLBI, the National Institutes of Health, or the Department of Health and Human Services.

Disclosures

None.

Supporting information

Data S1
Tables S1–S3
Figure S1

Acknowledgments