Abstract

While previous studies identified common genetic variants associated with longevity in centenarians, the role of the rare loss-of-function (LOF) mutation burden remains largely unexplored. Here, we investigated the burden of rare LOF mutations in Ashkenazi Jewish individuals from the Longevity Genes Project and LonGenity study cohorts using whole-exome sequencing data. We found that centenarians had a significantly lower burden (11-22%) of LOF mutations compared to controls. Similar effects were also observed in their offspring. Gene-level burden analysis identified 35 genes with depleted LOF mutations in centenarians, with 14 of these validated in the UK Biobank. Mendelian randomization and multi-omic analyses on these genes identified RGP1, PCNX2, and ANO9 as longevity genes with consistent causal effects on multiple aging-related traits and altered expression during aging. Our findings suggest that a protective genetic background, characterized by a reduced burden of damaging variants, contributes to exceptional longevity, likely acting in concert with specific protective variants to promote healthy aging.

Subject terms: Genotype, Ageing, Genetic variation

Previous studies have identified common genetic variants linked to longevity, but the impact of rare damaging mutations remains unclear. Here, the authors show that centenarians carry fewer harmful loss-of-function mutations and identify genes that may contribute to extreme longevity and healthy aging.

Introduction

Aging is a complex process characterized by an accumulation of molecular damage, progressive decline in physiological function, increased susceptibility to disease, and, ultimately, higher risk of mortality^[42]1.
While chronological age is a major risk factor, there is remarkable variability in how individuals age, with some experiencing severe disability and premature death while others maintain good health well into old age^[43]2,[44]3. This heterogeneity suggests that aging is a multifactorial process shaped by both genetic and environmental factors^[45]4. At the extreme end of the lifespan spectrum are centenarians, individuals with exceptional longevity who have reached the age of 100 years or more. Centenarians represent a rare and valuable model of successful aging, often displaying delayed onset or escape from major age-related diseases such as cardiovascular disease, diabetes, and dementia^[46]5,[47]6. Furthermore, many maintain physical and cognitive function, as well as independence, well into old age^[48]7. Understanding the factors that contribute to their exceptional longevity could provide valuable insights into the biology of healthy aging and lifespan determination. Studies in model organisms have firmly established that lifespan has a significant genetic component. Single gene mutations in pathways related to insulin/insulin-like growth factor-1 (IGF-1) signaling, mechanistic target of rapamycin (mTOR) signaling, and AMP-activated protein kinase (AMPK) signaling have been shown to dramatically extend lifespan in yeast, worms, flies, and mice^[49]8. Many of these pathways are evolutionarily conserved, suggesting they may play a role in human aging as well. Indeed, functional variants in the IGF-1 receptor have been identified in centenarians, supporting a role for this pathway in exceptional longevity^[50]9. In humans, genome-wide association studies (GWAS) have identified numerous common genetic variants associated with longevity, defined as attaining exceptional old age or having long-lived parents^[51]10,[52]11. 
However, these variants explain only a small portion of the heritability (12%)^[53]11, suggesting that rare variants may also play an important role^[54]12. Rare variants, particularly those that lead to loss of gene function (LOF), are of great interest in studying human lifespan. LOF variants, including nonsense, splice-site, and frameshift mutations, are generally deleterious and subject to strong purifying selection^[55]13. An increased burden of LOF mutations has been observed in individuals with shorter lifespans and a shorter healthspan (the period of life spent free of disease), suggesting that this burden may significantly impact human health^[56]14. However, LOF variants that confer protective effects, such as those in the APOC3 and PCSK9 genes associated with a lower risk of cardiovascular disease, have also been identified^[57]15,[58]16. Despite the growing evidence for the importance of rare variants in aging, the overall burden of LOF mutations in exceptionally long-lived individuals compared to controls has not been systematically examined. A previous study observed no difference in the burden of pathogenic variants between centenarians, their offspring, and controls^[59]17. However, this study did not specifically focus on LOF variants or incorporate key covariates that may introduce batch effects confounding the results. Furthermore, the sample size was smaller than in the present study, limiting the power to detect significant differences. Another study found that the burden of the rarest protein-truncating variants (PTVs) in two large cohorts was negatively associated with human healthspan and lifespan, accounting for 0.4 and 1.3 years of their variability, respectively^[60]14. In this study, we leveraged whole-exome sequencing data from a large cohort of Ashkenazi Jewish centenarians and controls to comprehensively compare the burden of rare LOF variants (Fig. [61]1a).
By focusing on a genetically homogeneous population, we minimized the potential confounding effects of population stratification. Importantly, we incorporated the dates of recruitment and birth as covariates in our analysis to control for cohort effects and potential secular trends in environmental and lifestyle factors that may impact lifespan. Our results suggest that centenarians have a lower burden of LOF mutations compared to controls. This depletion was observed across multiple categories of predicted deleterious variants. Furthermore, we performed a genome-wide association study to identify specific genes and pathways that were enriched for protective variants in centenarians. Several genes reached suggestive significance levels, and pathway analysis revealed a depletion of variants in pathways related to hyaluronan metabolism, G-protein receptors, post-translational protein modification, and mitochondrial translation. Notably, 14 of the 35 gene associations were validated in an independent cohort from the UK Biobank based on parental lifespan-related traits, supporting the reproducibility of our findings. Together, these results provide new insights into the genetic architecture of exceptional human longevity and highlight potential molecular mechanisms that may contribute to healthy aging. Further studies will be necessary to validate and functionally characterize the roles of these genes and pathways in promoting longevity.

Fig. 1. Demographic characteristics and mutation burden of study cohorts. a Schematic graph of the study design and analyses performed. Created in BioRender. Biosciences, R. (2022) BioRender.com/i68q986. b Date of birth and date of recruitment for centenarians, offspring, and controls from the Longevity Genes Project (LGP) and LonGenity study cohorts. Each vertical bar represents an individual, color-coded by their cohort status.
c Distribution of the cumulative mutation burden across different categories of predicted deleterious variants in centenarians, offspring, and controls. The categories include pLOF only, pLOF and missense, pLOF and predicted deleterious missense (5/5 algorithms predict a deleterious variant), and pLOF and predicted deleterious missense (at least 1/5 algorithms predict a deleterious variant). Each dot represents an individual, with the violin plot showing the distribution of mutation counts within each category and cohort. OPEL: Offspring of Parents with Exceptional Longevity; OPUS: Offspring of Parents with Usual Survival. Source data are provided as a Source Data file.

Results

Whole-exome sequencing data were obtained from 637 centenarians, 917 offspring of centenarians, and 595 controls from the Longevity Genes Project (LGP) and LonGenity study cohorts of Ashkenazi Jewish individuals (Table [64]1, Fig. [65]1a, Methods)^[66]18. Based on the demographic characteristics, participants were recruited continuously over a period of 20 years (2000–2020). However, the recruited centenarians were mostly born between 1900 and 1920, while most of the offspring and controls were born between 1920 and 1960 (Fig. [67]1b). This suggests that the direct comparison of mutation burden between centenarians and controls may potentially be confounded by the date of recruitment and date of birth.

Table 1.
Demographic characteristics of the study cohorts

| Cohort | Sample size | Males | Females | Mean age at visit (age range, years) | Standard deviation of age at visit (years) |
| Longevity Genes Project Control | 224 | 125 | 99 | 70.99 (42.26-93.33) | 9.85 |
| Longevity Genes Project Control (Filtered) | 147 | 84 | 63 | 66.27 (42.26-84.51) | 7.96 |
| Longevity Genes Project Offspring | 473 | 246 | 227 | 67.53 (42.88-92.45) | 7.98 |
| Longevity Genes Project Proband (Centenarians) | 637 | 464 | 173 | 97.70 (84.45-110.10) | 3.35 |
| Longevity Genes Project Proband (Centenarians, Filtered) | 338 | 249 | 89 | 99.46 (84.45-110.10) | 3.27 |
| LonGenity Study Offspring of Parents with Exceptional Longevity | 444 | 265 | 179 | 74.20 (61.90-94.08) | 6.13 |
| LonGenity Study Offspring of Parents with Usual Survival | 371 | 196 | 175 | 76.32 (64.67-97.93) | 7.02 |
| LonGenity Study Offspring of Parents with Usual Survival (Filtered) | 273 | 141 | 132 | 73.34 (64.67-87.33) | 5.22 |

The table presents the sample size, number of males and females, female proportion, mean age at visit, and standard deviation of age at visit for each cohort in the Longevity Genes Project and LonGenity Study.

We identified loss-of-function (LOF) mutations among variants that passed the following criteria: alternate allele frequency (AAF) < 1%, Hardy–Weinberg equilibrium (HWE) P-value > 10^−15, and variant missingness <10%. We classified the variants into different categories based on their predicted deleteriousness: pLOF only, pLOF and missense, pLOF and predicted deleterious missense (5/5 algorithms predict a deleterious variant), and pLOF and predicted deleterious missense (at least 1/5 algorithms predict a deleterious variant). The deleteriousness of missense variants was assessed using five different computational methods (Methods). We counted the cumulative mutation burden in centenarians, their offspring, and controls across different categories of predicted deleterious variants.
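The four nested categories and the per-individual count described above can be illustrated with a short Python sketch. It is not the authors' pipeline: the consequence labels, the category names, and the choice to count each carried variant once (rather than each allele) are assumptions made for illustration.

```python
# Hypothetical consequence labels; a real annotation tool may use other terms.
LOF_CONSEQUENCES = {"nonsense", "splice_site", "frameshift"}

# Assumed names for the four nested burden categories, narrowest first.
CATEGORIES = (
    "pLOF",
    "pLOF+missense_5of5",
    "pLOF+missense_1of5",
    "pLOF+missense_all",
)


def categories_for(consequence, n_damaging):
    """Return the burden categories that one rare variant counts toward.

    n_damaging is how many of the five prediction algorithms call the
    variant damaging (only meaningful for missense variants).
    """
    if consequence in LOF_CONSEQUENCES:
        return set(CATEGORIES)  # pLOF variants count in every category
    if consequence != "missense":
        return set()  # e.g. synonymous variants are ignored
    cats = {"pLOF+missense_all"}
    if n_damaging >= 1:
        cats.add("pLOF+missense_1of5")
    if n_damaging == 5:
        cats.add("pLOF+missense_5of5")
    return cats


def burden_by_category(carried_variants):
    """Count the variants carried by one individual in each category."""
    counts = {c: 0 for c in CATEGORIES}
    for consequence, n_damaging in carried_variants:
        for c in categories_for(consequence, n_damaging):
            counts[c] += 1
    return counts


example = [
    ("frameshift", 0),
    ("missense", 5),
    ("missense", 2),
    ("missense", 0),
    ("synonymous", 0),
]
print(burden_by_category(example))
# -> {'pLOF': 1, 'pLOF+missense_5of5': 2, 'pLOF+missense_1of5': 3, 'pLOF+missense_all': 4}
```

Because the categories are nested, an individual's count can only grow from the pLOF-only category to the pLOF-plus-all-missense category, which matches the widening range of effect sizes reported across categories.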
Consistent with the potential confounding effects of dates of recruitment and birth, we initially observed a similar distribution of LOF mutations across the different categories in centenarians and controls (Fig. [69]1c). We performed quality control and filtering, retaining 338 centenarians with recorded age over 100 years old and 420 controls with age less than 90 years old (Table [70]1, Methods). We observed a similar distribution of raw mutation counts after filtering (Supplementary Fig. [71]1). We then performed the count-based burden test using linear regression models. Furthermore, we found that even without adjusting for potential confounders, offspring but not centenarians showed a significantly lower mutation burden in all pLOF categories (Supplementary Fig. [72]2). This is likely due to the smaller batch effect between the offspring group and controls, compared to the centenarian group. We also found no significant difference between centenarians and their offspring (Supplementary Fig. [73]3). To account for these potential confounders, we binned the dates of recruitment and birth and added them as covariates in the burden test model. After adjusting for these covariates, we found a consistent and significant trend of lower burden of LOF mutations in centenarians and their offspring compared to controls across all categories of predicted deleterious variants (Fig. [74]2). Notably, the depletion of LOF variants was statistically significant for centenarians in all categories, including the pLOF-only category (b = −5.5, p = 0.0453). The effect sizes for centenarians ranged from −5.5 to −39.6, indicating an 11% to 22% reduction in mutation burden compared to controls. Furthermore, the offspring of centenarians also exhibited a significantly lower mutation burden compared to controls in both the LGP (related to LGP centenarians) and LonGenity (unrelated to LGP centenarians) cohorts (Fig. [75]2).
The effect sizes for offspring were smaller than those observed for centenarians, but still significant, with p-values ranging from 1.17e-07 to 4.99e-4 in the LGP cohort and from 4.52e-4 to 0.021 in the LonGenity cohort. These results suggest that the protective effect of a lower LOF mutation burden may be inherited by the offspring of centenarians, contributing to their increased likelihood of exceptional longevity.

Fig. 2. Burden test results for different categories of predicted deleterious variants, adjusting for potential confounders. The points show the effect sizes (beta coefficients) and the error bars show the 95% confidence intervals from the linear regression models, with the mutation burden as the dependent variable and the cohort status (centenarian or offspring, compared to control) as the independent variable. The models were adjusted for the binned date of recruitment and date of birth as covariates. The p-values for each comparison are shown on the right side of the plot. The categories of predicted deleterious variants include pLOF only, pLOF and missense, pLOF and predicted deleterious missense (5/5 algorithms predict a deleterious variant), and pLOF and predicted deleterious missense (at least 1/5 algorithms predict a deleterious variant). Source data are provided as a Source Data file.

We also performed a sensitivity analysis by using different covariates, including age at recruitment, top 10 genetic principal components, numerical date of birth, and date of recruitment, and found consistent results for centenarian offspring in the LGP cohort (Supplementary Fig. [78]4). Statistical significance for centenarians and the LonGenity cohort was sensitive to the choice of covariates, suggesting that the genetic associations with longevity are complex and possibly influenced by unmeasured factors.
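For reference, the adjusted burden model described in this section can be written out explicitly. The notation below is an assumption introduced here, not the authors' own: B_i is the cumulative mutation burden of individual i, G_i is 1 for a centenarian (or offspring) and 0 for a control, R_i and D_i are the bins of the recruitment date and birth date, and the first bin of each serves as the reference level.

```latex
B_i = \beta_0 + \beta_1 G_i
      + \sum_{j=2}^{J} \gamma_j \,\mathbf{1}[R_i = j]
      + \sum_{k=2}^{K} \delta_k \,\mathbf{1}[D_i = k]
      + \varepsilon_i
```

Under this reading, the reported effect size b corresponds to \beta_1, so a negative estimate (for example b = −5.5 in the pLOF-only category) means fewer qualifying variants in centenarians than in controls within the same recruitment and birth bins.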
To identify specific genes and pathways that carry a lower mutation burden in centenarians, we performed a gene-level and pathway-level burden test. The gene-level analysis identified 35 genes that reached the significance level at FDR < 0.05 (Fig. [79]3a). Remarkably, 14 out of these 35 genes were validated in an independent study from the UK Biobank using parental lifespan-related traits (Fig. [80]3a)^[81]19. Note that this is an indirect validation, as the genetics of exceptional longevity and parental lifespan, while similar, may still differ. Pathway-level analysis revealed that processes related to hyaluronan metabolism, Class A/1 (Rhodopsin-like receptors), post-translational protein modification, and mitochondrial translation reached the significance level at FDR < 0.05 (Fig. [82]3b). We observed a mild inflation in our test statistics, with a genomic inflation factor (λ) of 1.57. After adjusting for the inflation, the top three pathways still reached the suggestive FDR threshold of 0.2. These results suggest that the depletion of mutations in these pathways may contribute to exceptional longevity.

Fig. 3. Gene-level and pathway-level burden test results. a Manhattan plot showing the -log10(p-values) for the gene-level burden test in centenarians compared to the control group. Each dot represents a gene, with the x-axis showing the genomic position and the y-axis showing the statistical significance. The red horizontal line indicates the P-value threshold at FDR < 0.05 after adjustment for multiple comparisons. One-sided tests were performed since we only focused on genes with fewer LOF variants in centenarians and the offspring. The top genes reaching FDR < 0.05 are labeled, with those validated in the UK Biobank cohort marked with a “+”. b Pathway enrichment analysis results in centenarians compared to the control group.
The scatter plot shows the enriched pathways, with the x-axis representing the expected -log10(p-values) and the y-axis representing the observed -log10(p-values). Two horizontal dashed lines show the P-value threshold at FDR < 0.05 and the suggestive threshold of FDR < 0.2 after adjustment for multiple comparisons. One-sided tests were performed. Top pathways are labeled, with pathways that passed FDR < 0.05 colored in black and pathways that passed the suggestive threshold of FDR < 0.2 colored in gray. The genomic inflation factor (λ) was 1.57. Top pathways still reach the suggestive FDR threshold after adjusting for inflation. Source data are provided as a Source Data file.

To further investigate the potential causal effects of the identified longevity-associated genes on lifespan-related traits, we performed Mendelian Randomization (MR) analyses using public blood gene expression QTL data from eQTLgen and GWAS summary statistics of multiple lifespan-related traits (Fig. [85]4a)^[86]20. It is important to note that while the MR analysis uses common variants (eQTLs) rather than rare coding variants, it can provide complementary evidence about a gene’s role in longevity through different mechanisms. MR analysis revealed that seven genes had significant causal effects on multiple lifespan-related traits, such as frailty index, healthspan, lifespan, extreme longevity (90th and 99th percentiles), and lifespan-GIP1 (the genetic principal component of healthy longevity, Methods). Among them, three genes (RGP1, PCNX2, and ANO9) showed consistent pro-longevity effects across the multiple traits tested, supporting their potential roles in promoting longevity as suggested by burden analysis, while the other four genes showed anti-longevity effects.
On the other hand, two of the genes (DYNC1H1 and GALNT12) each show a significant protective effect on only one trait (lifespan and extreme longevity at the 99th percentile, respectively), while PKP4 shows a significant positive effect on healthspan but not on other traits. The other four genes (ZNF446, PLA2G4B, EFNA3, and ABCF3) show inconsistent effects on lifespan-related traits.

Fig. 4. Mendelian randomization and multi-omic analyses of longevity-associated genes. a Mendelian Randomization (MR) analyses of blood gene expression on aging-related traits. The points show the estimated causal effect sizes (beta) and the error bars show the 95% confidence intervals for each gene on various aging-related traits, including aging-GIP1, frailty index, healthspan, lifespan, and extreme longevity (90th and 99th percentiles). The forest plot is colored based on the significance and consistency of the causal effect estimates across different traits. Genes with FDR < 0.05 after adjustment for multiple comparisons are colored in orange. Two-sided tests were performed. The blood eQTL summary statistics were obtained from the eQTLgen study (n = 31,684). b–e Bar plots showing the -log10(P-value) multiplied by the sign of the effect size (beta) for each longevity-associated gene. Associations with nominal P < 0.05 are colored in yellow, and those with FDR < 0.05 after adjustment for multiple comparisons are colored in red. Two-sided tests were performed. Data sources include exome-wide association analysis with parental lifespan from GeneBass (b), age-related changes in promoter DNA methylation using data from 500 individuals in the MGB biobank (c), age-related changes in blood gene expression using data from the transcriptome-wide association study (TWAS) for aging by Peters et al. 2015 (d), and age-related changes in plasma protein levels using Olink data from 53,015 UK Biobank participants (e).
Positive values indicate an increase, while negative values indicate a decrease with age. f Heatmap showing the significance scores of longevity-associated genes across different aging signatures, including human aging, rodent aging, and interventions (caloric restriction, rapamycin, and growth hormone deficiency). Color intensity represents the significance of the association, with red indicating a positive significance score and blue a negative significance score. Associations that reach nominal significance (P < 0.05) are marked with small circles, and those with FDR < 0.05 after adjustment for multiple comparisons are marked with large circles. Two-sided tests were performed. Source data are provided as a Source Data file.

We then profiled the multi-omic associations of the identified longevity-associated genes to provide a systematic evaluation of their expression and regulation during aging (Fig. [89]4b–e). Comparison with exome-wide gene-level associations with parental lifespan obtained from GeneBass (Fig. [90]4b)^[91]19 showed that six out of seven causal genes were significantly associated with parental lifespan, and that three genes (MLXIP, PCNX2, and DYNC1H1) remained significant after FDR correction for multiple testing. Analysis of age-related changes in promoter DNA methylation using data from 500 individuals in the Massachusetts General Brigham (MGB) biobank (Fig. [92]4c) revealed significant changes for most longevity-associated genes, except two (RGP1 and BCLAF1). Similarly, age-related changes in blood gene expression obtained from the transcriptome-wide association study (TWAS) for aging by Peters et al. (Fig. [93]4d) showed significant changes for genes such as OPN3, PCNX2, GALNT12, and RGP1^[94]21. Furthermore, age-related changes in plasma protein levels using Olink data from 53,015 UK Biobank participants (Fig. [95]4e) revealed significant changes for proteins encoded by DYNC1H1 and FLT4 genes.
The results suggest that the expression and regulation of these longevity-associated genes are altered during the aging process.

To gain further insights into the potential relevance of the identified longevity-associated genes in aging and interventions, we further compared their significance scores across different signatures of aging and longevity interventions (Fig. [96]4f)^[97]22,[98]23. The signature analysis yielded 69 significant associations after adjusting for multiple testing of 266 tests using FDR (Fig. [99]4f). It revealed that many of these genes (18 out of 21 tested) were also significantly associated with aging in humans and rodents, with interventions known to extend lifespan, such as caloric restriction (ABCF3, CKAP2L, and CEP68), rapamycin treatment (PKP4, CTNND1, and RTRAF), and growth hormone deficiency (HOGA1, ANKRD33, and MLXIP), and with overall lifespan after intervention (HOGA1). Together, this multi-layered evidence supports the potential roles of these genes in regulating healthy aging and longevity.

Discussion

In this study, we have discovered that centenarians, within the large cohort we examined, possess a significantly lower burden of predicted deleterious LOF variants compared to controls. This finding suggests that a protective genetic background, characterized by the depletion of damaging coding mutations, contributes to the exceptional longevity of centenarians. Notably, we also observed a lower mutation burden in centenarian offspring, although the effect was less pronounced. These findings support the notion of a heritable component to longevity outside of protective and common variants and suggest that the combined genetic background, including protective variants and depletion of damaging variants, may be transmitted across generations to support exceptional longevity.
Our results are consistent with previous studies that reported an increased burden of LOF variants in individuals with shorter lifespans and age-related diseases^[100]14,[101]24, and provide further evidence for the role of rare coding variants in extreme human longevity. Our study extends these findings by demonstrating that the depletion of LOF variants in centenarians is not limited to the rarest variants but is observed across multiple categories of predicted deleterious variants. However, our findings contrast with those of another study that observed no difference in the burden of pathogenic variants between centenarians, their offspring, and controls^[102]17. This discrepancy may be due to differences in study design, such as the focus on LOF variants specifically, the larger sample size of our study, and the adjustment for potential confounding factors such as date of recruitment, age at recruitment, and date of birth. Besides, due to the retrospective nature of the centenarian study, the centenarians usually have different demographic properties (age, date of birth, and potentially other early life exposures) compared to the control group. While this can be addressed by including these features as covariates, this demographic disparity between centenarians and controls emerges as a critical factor limiting the statistical power of centenarian studies (Fig. [103]1). In contrast, centenarian offspring, demographically more similar to controls, yield stronger statistical evidence, corroborating our findings in centenarians. Future prospective studies with improved demographic matching are essential to elucidate the role of LOF variants in exceptional longevity. Our pathway analysis revealed that centenarian exomes are depleted of LOF variants in several pathways related to aging and disease, including Class A/1 (Rhodopsin-like receptors), hyaluronan metabolism, post-translational protein modification, and mitochondrial translation. 
Class A/1 (Rhodopsin-like) receptors are involved in various physiological processes and have been implicated in age-related diseases, suggesting their potential role in longevity^[104]25. Hyaluronan is a key component of the extracellular matrix that has been shown to decline with age, and its increase contributes to the extension of lifespan^[105]26. Variants that maintain hyaluronan homeostasis may, therefore, promote healthy aging in humans. Post-translational protein modifications play crucial roles in protein function and stability, and their dysregulation has been associated with various age-related diseases^[106]1. Mitochondrial translation has also been linked to lifespan extension in model organisms^[107]27. To complement our analysis of rare LOF variants, we also investigated the causal role of identified longevity genes in aging-related traits using MR analyses. This approach allows us to infer potential causal relationships between gene expression and phenotypes of interest by using eQTLs (common variants that are associated with gene expression) as instrumental variables. Our MR analyses provided evidence for the causal effects of several longevity-associated genes, including RGP1, PCNX2, and ANO9, on multiple aging-related traits. PCNX2 was identified to be associated with longevity in an independent GWAS study^[108]28, while ANO9 was associated with various cancers^[109]29. These findings suggest that these genes may directly influence the aging process and contribute to the extended healthspan and lifespan. The consistent causal effect estimates across different aging-related traits further support the robustness of these associations. Interestingly, our analyses also revealed genes with more nuanced effects on longevity.
For instance, DYNC1H1 and GALNT12 showed significant deleterious effects on only one trait each (lifespan and extreme longevity at the 99th percentile, respectively), while PKP4 demonstrated a significant positive effect solely on healthspan. This suggests that these genes may influence particular aspects of the aging process rather than having a broad impact on all longevity-related traits. Moreover, the inconsistent effects observed for genes such as ZNF446, PLA2G4B, EFNA3, and ABCF3 across different lifespan-related traits underscore the complexity of genetic influences on aging and longevity. The multi-omic analyses revealed that the expression and regulation of many longevity-associated genes are altered during aging: specifically, 29 out of 31 for DNA methylation, 4 out of 11 for gene expression, and 2 out of 2 for plasma protein (Fig. [110]4). Follow-up studies are needed to elucidate the specific mechanisms by which these genes and their encoded proteins contribute to healthy aging and longevity. Future studies could also explore the relationship between the burden of deleterious germline mutations and the rate of biological aging in centenarians and the general population. Epigenetic clocks, which measure biological age based on DNA methylation patterns, have emerged as a promising tool for assessing the pace of aging^[111]30,[112]31. Previous studies have shown that centenarians exhibit slower epigenetic aging rates compared to the general population^[113]32. Integrating rare variant burden data with epigenetic clock measures could provide novel insights into the interplay between genetic and epigenetic factors in shaping the rate of aging and exceptional longevity, especially with current standardized tools like ClockBase and Biolearn^[114]33,[115]34, as well as advanced aging clocks, including GrimAge2^[116]35, DunedinPACE^[117]36, and causality-enriched clocks^[118]37.
Such studies may uncover whether the reduced burden of harmful mutations observed in centenarians contributes to their slower biological aging rates. Our study also has several limitations. First, while we adjusted for several important covariates, there may be other confounding factors that were not accounted for, such as environmental exposures and lifestyle factors. Second, our study focused on a specific population (Ashkenazi Jews), although validation analysis in the UK Biobank suggests that the result may be generalizable to other ethnic groups. Future studies in diverse populations will be necessary to confirm the generalizability of our findings. Third, the validation analysis is based on parental lifespan traits in the UK Biobank. Although previous studies on common variants show a substantial similarity between parental lifespan and exceptional longevity (r[g] = 0.81)^[119]38, it is unclear how similarly rare genetic variants contribute to these two traits. Future validation and meta-analysis with other centenarian cohorts may help strengthen the robustness of our findings. Fourth, our study relied on computational predictions of variant deleteriousness, which may not always reflect the true biological impact of a variant. Functional studies will be necessary to validate the causal roles of the identified variants and genes in longevity. It is important to acknowledge that some LOF and missense variants can be protective, as demonstrated by previous studies^[120]39–[121]41. However, our hypothesis is that the overall probability of LOF variants being protective is lower than the probability of them being deleterious. This is because damaging a component in a complex system is more likely to have a detrimental effect than a protective one^[122]42. Additionally, there is a selection bias, as highly damaging mutations are under-represented in the population, while highly protective mutations are preserved^[123]43.
These factors may explain the small effect sizes observed in our study. It should also be noted that we did not identify any protective LOF variants (i.e., enrichment of LOF variants in centenarians) as demonstrated in a previous study^[124]18, because we used a one-tailed test, focusing only on the depletion of LOF variants. In conclusion, our study provides new insights into the genetic architecture of human exceptional longevity, exemplified by individuals who live to 100 years or beyond, highlighting the importance of rare LOF variants and identifying novel genes and pathways that may promote healthy aging. We demonstrate that centenarians have a lower burden of predicted deleterious LOF variants compared to controls and that this protective genetic background may be transmitted across generations. Our findings also underscore the complex interplay between genetic variation, environmental factors, and age-related diseases in shaping human lifespan. Further studies in diverse populations and integrating multiple omics data will be necessary to fully elucidate the mechanisms underlying exceptional longevity and develop targeted interventions to promote healthy aging. Nonetheless, our results represent an important step towards understanding the genetic basis of human longevity and provide a foundation for future studies in this field.

Methods

Study population and data collection

The study population was derived from two ongoing studies of aging and longevity in the Ashkenazi Jewish population: the cross-sectional Longevity Genes Project (LGP) and the longitudinal LonGenity study^[125]18. The LGP cohort consisted of 637 individuals with exceptional longevity, 473 offspring of long-lived individuals, and 224 controls, while the LonGenity cohort included 444 offspring of centenarians and 371 controls. All participants provided written informed consent, and the study was approved by the Institutional Review Board at Albert Einstein College of Medicine.
For the analysis, we applied filtering criteria to ensure the inclusion of appropriate individuals in each group. In the centenarian group, we removed individuals with a death or dropout record before 100 years, retaining 338 exceptionally long-lived centenarians. Similarly, in the control group, we removed individuals without a death or dropout record before 90 years, resulting in 147 individuals from the LGP cohort and 273 individuals from the LonGenity cohort being included in the analysis (Table [126]1).

Whole-exome sequencing

DNA samples from all participants were subjected to whole-exome sequencing using the Illumina HiSeq 2000 platform at the Regeneron Genetics Center^[127]17. The sequencing reads were aligned to the human reference genome (hg38) using the Burrows-Wheeler Aligner (BWA-mem v0.7.17)^[128]44, and duplicate reads were removed using Picard tools (version 1.96, [129]http://broadinstitute.github.io/picard/). Variant calling was performed using the Genome Analysis Toolkit (GATK v3.7)^[130]45.

Quality control and variant annotation

After genomic principal component analysis (PCA), four individuals with non-European ancestry were excluded from the study. Quality control (QC) filtering was applied to remove potentially false-positive variants and genotype calls. Variants were retained if they met the following criteria: alternate allele frequency (AAF) < 1% in the Ashkenazi Jewish population, Hardy–Weinberg equilibrium (HWE) P-value > 10^−15, and variant missingness < 10%, as suggested by a previous study^[131]46. After QC filtering, autosomal variants with a minimum allele count (MAC) of 1 were divided into sets for centenarians, offspring, and controls for downstream analysis. Loss-of-function (LOF) variants were defined as nonsense, splice-site, or frameshift mutations.
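The variant-level filters and the LOF definition described above can be sketched as two simple predicates. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, and the consequence labels are hypothetical placeholders for whatever the variant-annotation output actually provides. Only the numeric thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text; all names below are illustrative assumptions.
MAX_AAF = 0.01           # alternate allele frequency < 1%
MIN_HWE_P = 1e-15        # Hardy-Weinberg equilibrium P-value > 10^-15
MAX_MISSINGNESS = 0.10   # variant missingness < 10%
MIN_MAC = 1              # minimum allele count of 1
AUTOSOMES = {str(i) for i in range(1, 23)}
# Hypothetical consequence labels standing in for the annotation output.
LOF_CONSEQUENCES = {"nonsense", "splice_site", "frameshift"}


@dataclass(frozen=True)
class Variant:
    chrom: str          # e.g. "1" or "chr1"
    aaf: float          # alternate allele frequency in the cohort
    hwe_p: float        # Hardy-Weinberg equilibrium P-value
    missingness: float  # fraction of samples with a missing genotype
    mac: int            # allele count of the alternate allele
    consequence: str    # predicted functional consequence


def passes_qc(v: Variant) -> bool:
    """Return True if the variant is a rare, well-genotyped autosomal variant."""
    chrom = v.chrom[3:] if v.chrom.startswith("chr") else v.chrom
    return (
        chrom in AUTOSOMES
        and v.aaf < MAX_AAF
        and v.hwe_p > MIN_HWE_P
        and v.missingness < MAX_MISSINGNESS
        and v.mac >= MIN_MAC
    )


def is_lof(v: Variant) -> bool:
    """Return True for nonsense, splice-site, or frameshift variants."""
    return v.consequence in LOF_CONSEQUENCES
```

In practice these filters would normally be applied with dedicated tools on the full call set; the predicates only make the inclusion logic explicit.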
Missense variants were classified as (1) possible deleterious missense mutations if they were predicted to be damaging by at least one of five algorithms (SIFT^[132]47, Polyphen2_HDIV^[133]48, Polyphen2_HVAR^[134]48, LRT^[135]49, and MutationTaster^[136]50) or (2) deleterious missense mutations if all five algorithms predicted them to be damaging. SIFT (v6.2.1), Polyphen2_HDIV (v2.2.2), Polyphen2_HVAR (v2.2.2), LRT (v2016), and MutationTaster (v2021) were used in this analysis.

Burden test analysis

Prior to the burden test, we removed individuals in the extreme longevity group with a lifespan or last reported age of less than 100 years, retaining 338 centenarians. Similarly, individuals in the control group with a last reported age greater than 90 years were removed, leaving 147 individuals from LGP and 273 individuals from LonGenity (Table [137]1). Descriptive statistics were used to summarize the demographic characteristics of the study population. The cumulative mutation burden for each individual was calculated as the total number of population-level LOF (pLOF) and predicted deleterious missense variants. Mutation burden was calculated for four categories of predicted deleterious variants: pLOF only; pLOF and deleterious missense (5/5 algorithms); pLOF and possible deleterious missense (≥1/5 algorithms); and pLOF and all missense. Count-based burden tests were performed using linear models with binned covariates to account for potential confounding factors, such as date of recruitment, date of birth, gender, age at visit, and top four genomic principal components^[138]51. The cumulative mutation burden was used as the dependent variable, and the independent variables included centenarian status (or offspring status), binned date of recruitment, and binned date of birth.
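The count-based burden model described above can be sketched as an ordinary least-squares fit of the per-individual burden on a status indicator plus dummy-coded covariate bins. The sketch below is an illustration only, written with the standard library so it is self-contained; the function names, the covariate keys, and the dummy coding (first level of each binned covariate dropped) are assumptions, and it returns only the point estimate for the status coefficient, not a standard error or P-value.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def design_row(is_case: int, bins: Dict[str, str],
               levels: Dict[str, List[str]]) -> List[float]:
    """Intercept, status indicator, then dummy columns for each binned covariate."""
    row = [1.0, float(is_case)]
    for name, lv in levels.items():
        # Drop the first level of each covariate to avoid collinearity.
        row.extend(1.0 if bins[name] == level else 0.0 for level in lv[1:])
    return row


def burden_effect(burdens: Sequence[float], status: Sequence[int],
                  covariate_bins: Sequence[Dict[str, str]]) -> float:
    """OLS coefficient of case status (e.g. centenarian = 1) on mutation burden."""
    levels = {name: sorted({b[name] for b in covariate_bins})
              for name in covariate_bins[0]}
    x = [design_row(s, b, levels) for s, b in zip(status, covariate_bins)]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, burdens)) for i in range(p)]
    return solve_linear_system(xtx, xty)[1]  # index 1 is the status column
```

A negative returned coefficient corresponds to a lower burden in the case group after adjusting for the binned covariates. A statistics package would be the natural choice for real analyses, since it also provides standard errors and tests.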
Sensitivity analyses were conducted by including additional covariates, such as age at recruitment, top 10 genetic principal components, numerical date of birth (i.e., number of days since 1900-01-01), and date of recruitment.

Gene-level and pathway-level burden analysis

Gene-level and pathway-level burden tests were performed using linear models, with the cumulative mutation burden in each gene or pathway as the dependent variable and centenarian status as the independent variable. Only genes containing at least five pLOF variants across the cohort were included. In total, 4925 unique genes were tested, and the significance threshold for gene-level tests was set at FDR < 0.05. Significant gene-level associations were replicated using summary statistics from a gene-based association study of paternal or maternal lifespan in GeneBass, which is based on the UK Biobank^[139]19. The significance threshold for replication was set at P < 0.05.

Mendelian randomization

To investigate the causal relationships between gene expression and aging-related traits, we performed Mendelian randomization (MR) analyses using blood cis-eQTL data from eQTLgen, which includes 31,684 blood samples from 37 studies^[140]20. The outcome traits included aging-GIP1, frailty index, healthspan, lifespan, and extreme longevity (90th and 99th percentiles). The parental lifespan GWAS was used as a proxy for individual lifespan and included 512,047 mothers and 500,193 fathers of European ancestry^[141]11. The extreme longevity GWAS included 11,262 European subjects with a lifespan above the 90th percentile and 25,483 controls below the 60th percentile^[142]10. Healthspan, defined as the age of the first incidence of major age-related diseases or death, was analyzed using a GWAS of 300,447 UK Biobank participants aged 37–73^[143]52. The frailty index GWAS included 164,610 UK Biobank participants aged 60–70 and 10,616 Swedish TwinGene participants aged 41–87^[144]53.
Aging-GIP1, the first genetic principal component of six human aging traits, captures both length of life and well-being indices^[145]54. We performed cis-Mendelian randomization following the approach described by Ying et al.^[146]37. Genetic variants strongly associated with whole blood gene expression levels (FDR < 0.05) were selected as instrumental variables for the MR analysis. To minimize pleiotropic effects, only cis-eQTLs (located within 2 Mb of target genes) were used, and LD clumping was applied to remove eQTLs in strong LD (r^2 > 0.3). We employed three MR methods based on the number of available eQTLs: Wald ratio for a single eQTL, generalized inverse variance weighted (gIVW) for at least two eQTLs, and generalized MR-Egger regression (gEgger) for at least three eQTLs^[147]55. Because the gEgger method is robust to directional pleiotropy, we reported the gEgger P value when the gEgger intercept indicated pleiotropy.

Multi-omic analysis of the identified longevity-associated genes

To systematically evaluate the expression and regulation of the identified longevity-associated genes during aging, we profiled their multi-omic associations using various datasets. We obtained the exome-wide gene associations with parental lifespan using summary statistics from GeneBass^[148]19. Blood gene expression changes with age were obtained from the transcriptome-wide association study (TWAS) for aging by Peters et al.^[149]21. Age-related changes in promoter DNA methylation were assessed using data from 500 individuals in the Mass General Brigham (MGB) Biobank, as described previously^[150]56. DNA methylation profiles were generated using the Illumina Infinium MethylationEPIC v2.0 array, which covers over 935,000 CpG sites enriched for regulatory regions^[151]56. The cohort comprised subjects of diverse ages, roughly balanced between male and female, and generally representative of the racial/ethnic distribution of the local area.
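As a companion to the Mendelian randomization procedure described in the preceding section, the three estimator families can be sketched in their simplest textbook forms. This is an illustration under explicit assumptions: it treats instruments as uncorrelated, whereas the generalized IVW and Egger methods named in the text account for residual LD between eQTLs, and it returns point estimates without the pleiotropy test. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def wald_ratio(beta_exp: float, beta_out: float, se_out: float) -> Tuple[float, float]:
    """Single-instrument causal estimate and its first-order standard error."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw(beta_exp: Sequence[float], beta_out: Sequence[float],
        se_out: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted estimate (independent instruments)."""
    if len(beta_exp) < 2:
        raise ValueError("IVW needs at least two instruments")
    denom = sum(bx * bx / (se * se) for bx, se in zip(beta_exp, se_out))
    numer = sum(bx * by / (se * se)
                for bx, by, se in zip(beta_exp, beta_out, se_out))
    return numer / denom, math.sqrt(1.0 / denom)


def egger(beta_exp: Sequence[float], beta_out: Sequence[float],
          se_out: Sequence[float]) -> Tuple[float, float]:
    """Weighted regression with an intercept; returns (slope, intercept).

    A non-zero intercept is the usual signal of directional pleiotropy.
    """
    if len(beta_exp) < 3:
        raise ValueError("MR-Egger needs at least three instruments")
    # Orient every instrument so that its exposure effect is positive.
    xs: List[float] = [abs(bx) for bx in beta_exp]
    ys: List[float] = [by if bx >= 0 else -by for bx, by in zip(beta_exp, beta_out)]
    ws: List[float] = [1.0 / (se * se) for se in se_out]
    w_total = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

Choosing between these by the number of available eQTLs mirrors the rule stated in the text: one instrument uses the Wald ratio, two or more allow IVW, and three or more allow Egger regression.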
For each CpG site associated with our identified longevity-associated genes, we performed a linear regression of the methylation beta value on age and recorded the regression coefficient and P value. The CpG with the strongest association with age was used to represent the result. Age-related changes in plasma protein levels were investigated using Olink proteomics data from 53,015 UK Biobank participants (UK Biobank Record Table [152]1072). Only two of our identified longevity-associated genes are present in the Olink panel. For these, we performed a linear regression of the protein level on age and recorded the regression coefficient and P value. FDR correction was applied to account for multiple testing across all 471 sites tested and, more generally, within each omic layer.

Longevity signature analysis

To further explore the potential relevance of the identified longevity-associated genes in aging and interventions, we compared their significance scores across different signatures of aging and longevity interventions using the GENtervention database^[153]57. For transcriptomic signatures of lifespan-extending interventions, we selected those reflecting the most established longevity interventions, which were identified based on gene expression data from at least three independent sources, as described in Tyshkovskiy et al. 2019^[154]23. The signatures included human aging, rodent aging, and interventions (caloric restriction, rapamycin treatment, and growth hormone deficiency). We also included signatures of lifespan across interventions based on a larger set of lifespan-extending and lifespan-shortening interventions^[155]22. The significance scores were calculated as the -log10(P-value) multiplied by the sign of the effect size (beta) for each gene in each signature. Nominal significance was set at P < 0.05. Hierarchical clustering with Euclidean distance was performed for the genes based on significance score.
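The signed significance score and the Euclidean distance used as input to the clustering can be sketched as follows. This is a minimal illustration: the function names and the dictionary layout are assumptions, and the linkage rule used for the hierarchical clustering is not stated in the text, so the sketch stops at the pairwise distance matrix that any agglomerative method would start from.

```python
import math
from typing import Dict, List, Sequence, Tuple


def signed_score(p_value: float, beta: float) -> float:
    """Return -log10(P) carrying the sign of the effect size (0 if beta is 0)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    sign = (beta > 0) - (beta < 0)
    return -math.log10(p_value) * sign


def score_profiles(
    results: Dict[str, Sequence[Tuple[float, float]]]
) -> Dict[str, List[float]]:
    """Map each gene to its signed scores, one per signature.

    `results[gene]` is a sequence of (p_value, beta) pairs, in a fixed
    signature order shared by all genes (an assumed input layout).
    """
    return {gene: [signed_score(p, b) for p, b in pairs]
            for gene, pairs in results.items()}


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two score profiles of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(
    profiles: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of genes, keyed by sorted gene names."""
    genes = sorted(profiles)
    return {(g1, g2): euclidean(profiles[g1], profiles[g2])
            for i, g1 in enumerate(genes) for g2 in genes[i + 1:]}
```

With this scoring, a gene that is nominally significant (P < 0.05) in a signature has an absolute score above roughly 1.3, and the sign indicates the direction of its association in that signature.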
Statistics & reproducibility

The study included a total of 2149 participants: 338 centenarians (aged 100 or older), 917 offspring of long-lived individuals, and 894 controls. Detailed age and sex/gender breakdowns for each group are provided in Table [156]1. Sex and gender were considered in the study design and determined based on self-report at the time of recruitment. All participants provided written informed consent as stated in the “Study population and data collection” section. Participants were not compensated for their involvement in the study. No statistical method was used to predetermine the sample size. Data exclusion criteria are detailed in the “Study population and data collection” section. No other data were excluded from the analyses. Statistical analyses primarily employed linear models for burden tests and Mendelian randomization, with adjustments for potential confounding factors as described in the “Burden test analysis” and “Mendelian randomization” sections. Multiple testing corrections were applied using FDR. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment, as this was an observational genetic study. Reproducibility was addressed through replication in independent datasets (UK Biobank).

Reporting summary

Further information on research design is available in the [157]Nature Portfolio Reporting Summary linked to this article.

Supplementary information

[158]Supplementary Information (304.9KB, pdf)
[159]Peer Review File (2.7MB, pdf)
[160]41467_2024_52967_MOESM3_ESM.pdf (83KB, pdf) Description of Additional Supplementary Files
[161]Supplementary Data 1 (643.5KB, csv)
[162]Supplementary Data 2 (4MB, csv)
[163]Reporting Summary (1.1MB, pdf)

Source data

[164]Source Data (1.5MB, zip)

Acknowledgements