Abstract
Extreme longevity in humans has a strong genetic component, but whether this involves genetic variation in the same longevity pathways as found in model organisms is unclear. Using whole-exome sequences of a large cohort of Ashkenazi Jewish centenarians to examine enrichment for rare coding variants, we found that most longevity-associated rare coding variants converge upon the conserved insulin/insulin-like growth factor 1 signaling (IIS) and AMP-activated protein kinase (AMPK) signaling pathways. Centenarians have a similar number of pathogenic rare coding variants as control individuals, suggesting that the rare variants detected in the conserved longevity pathways are protective against age-related pathology. Indeed, we detected a pro-longevity effect of rare coding variants in the WNT signaling pathway in individuals harboring the known common risk allele APOE4. The genetic component of extreme human longevity consists, at least in part, of rare coding variants in pathways that protect against aging, including those that control longevity in model organisms.
Keywords: rare variants, aging, lifespan, longevity, centenarians, human genetics
Species-specific lifespan is limited by aging, a multifactorial process accompanied by a general decline in tissue function and an increased risk for many diseases^1. Rather than being a passive, entropic process of deterioration, aging is subject to active modulation by signaling pathways and transcription factors conserved across species^2,3. In model organisms, single-gene mutations have been demonstrated to affect lifespan^4. At the extreme end, for example, the lifespan of nematode worms can be increased nearly ten-fold by mutations in genes involved in insulin/insulin-like growth factor 1 signaling (IIS)^5,6.
But even in more complex organisms, such as flies and mice, lifespan can be extended by up to 50% by mutations affecting the same pathway^7–9 or other pathways involved in growth, metabolism and nutrient sensing, such as the mechanistic target of rapamycin (mTOR) and AMP-activated protein kinase (AMPK) pathways^10. On the basis of homology, it is widely hypothesized that these conserved signaling pathways are similarly involved in human aging and longevity.
In humans, lifespan is a complex trait affected by multiple factors that vary considerably within human populations. While non-genetic factors, including diet, physical activity, health habits, and psychosocial factors, are important, human population-based studies indicate that lifespan clearly has a genetic component^11,12. At increasingly older ages, especially beyond 100 years, this genetic component becomes exceedingly strong^13,14. As a highly complex trait, human lifespan likely has genetic underpinnings that encompass different types of genetic variants, and epistasis among them, across the allele frequency spectrum.
Common variants associated with human survival have been extensively searched for in many recent genome-wide association studies (GWAS) using a variety of trait definitions and study designs^15. Together, these studies identified more than 50 longevity-associated genetic loci of genome-wide significance, of which only a few, notably APOE, were replicated by multiple studies^16. In contrast to these genome-wide searches, several previous studies used candidate gene approaches to detect associations of human longevity with variants in aging genes, such as insulin signaling genes^17 and FOXO3^18,19. Most of these longevity-associated SNPs have small effect sizes, and common variants collectively explain only a very small proportion of the heritability of human longevity.
As several recent studies suggest, rare variants likely account for at least some of the 'missing' heritability^20–22. Here we examined rare coding variants in a cohort of 515 Ashkenazi Jewish centenarians by whole-exome sequencing (WES) and tested for enrichment using a case-control design. The exceptional longevity of this cohort and its homogeneous genetic background increased our power to detect causal rare variants^23. As controls, we used 496 Ashkenazi Jewish individuals aged ~70 to 95 years, mostly from the same households as the centenarians, without a parental history of extreme longevity (neither parent survived beyond 95 years of age) (Table 1 and Supplementary Table S1).
Table 1. Characteristics of the study cohorts.
Cohort / characteristic                       Centenarians    Controls (or non-centenarians)
Rare variant association study cohort
  Number of subjects                          515             496
  Female (%)                                  72.4            53
  Age at enrollment, years (mean ± SD)        97.6 ± 3.5      73.3 ± 8.4
Disease PRS study cohort
  Number of subjects                          479             431
  Female (%)                                  73.1            51.3
  Age at enrollment, years (mean ± SD)        97.6 ± 3.5      73.2 ± 8.7
Lifespan study cohort
  Number of subjects                          356             197
  Female (%)                                  74.2            44.2
  Age at enrollment, years (mean ± SD)        97.7 ± 3.7      77.9 ± 7.8
  Age at death, years (mean ± SD)             100.5 ± 3.4     84.2 ± 7.3
RESULTS
Longevity genes and pathways implicated by rare variants
Using a joint genotyping procedure and stringent quality control metrics, we identified 130,297 rare coding variants (126,405 SNPs and 3,892 indels) with minor allele frequencies < 0.01 and missing rates < 0.1 in 17,561 genes in centenarians and controls. Of the SNPs, 45,493 were synonymous. The remaining 84,804 variants (non-synonymous SNPs and all indels) include 75,567 missense variants, more than 3,500 loss-of-function variants (1,755 frameshift, 1,736 stop-gain, and 79 stop-loss variants), and other variants with multiple functional annotations.
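The rare-variant filter and the tallies above can be illustrated with a short Python sketch. The `Variant` record, its field names and the consequence labels are hypothetical stand-ins for whatever annotation output is used; only the thresholds (MAF < 0.01, missing rate < 0.1) and the reported counts come from the text, and the final assertions simply check that those counts add up.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    # Hypothetical annotation record; field names are illustrative only.
    gene: str
    kind: str            # "SNP" or "indel"
    consequence: str     # e.g. "synonymous", "missense", "frameshift", "stop_gained", "stop_lost"
    maf: float           # minor allele frequency across centenarians and controls
    missing_rate: float  # fraction of samples without a genotype call

LOSS_OF_FUNCTION = {"frameshift", "stop_gained", "stop_lost"}

def passes_rare_filter(v: Variant, maf_max: float = 0.01, missing_max: float = 0.1) -> bool:
    """Keep variants with MAF < 0.01 and missing rate < 0.1, the thresholds stated in the text."""
    return v.maf < maf_max and v.missing_rate < missing_max

def tally(variants: list[Variant]) -> dict[str, int]:
    kept = [v for v in variants if passes_rare_filter(v)]
    counts = {"total": len(kept), "SNP": 0, "indel": 0, "synonymous": 0, "loss_of_function": 0}
    for v in kept:
        counts[v.kind] += 1
        counts["synonymous"] += int(v.consequence == "synonymous")
        counts["loss_of_function"] += int(v.consequence in LOSS_OF_FUNCTION)
    # Everything that is not synonymous: non-synonymous SNPs plus all indels
    counts["non_synonymous_or_indel"] = counts["total"] - counts["synonymous"]
    return counts

# Arithmetic consistency of the counts reported in the text
assert 126_405 + 3_892 == 130_297     # SNPs + indels
assert 130_297 - 45_493 == 84_804     # all rare variants minus synonymous SNPs
assert 1_755 + 1_736 + 79 == 3_570    # frameshift + stop-gain + stop-loss (> 3,500)
```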
We did not exclude synonymous rare variants from our analysis, as not all of them are functionally silent^24. At the whole-exome level, we found no significant difference in the number of rare coding variants between centenarians and controls (P = 0.243, logistic regression including gender and the top 10 multidimensional scaling (MDS) components as covariates).
We next examined the association of rare variants with longevity at the variant and gene levels. At the variant level, we applied Firth logistic regression for rare variant association tests to examine the association between the minor allele count of each rare coding variant and longevity status. The variant with the strongest association signal was rs2229426 in FASN (fatty acid synthase) (P = 6.23E-05) (Supplementary Table S2). At the gene level, we applied two complementary region-based association tests^25, the burden test of rare variants and the Sequence Kernel Association Test (SKAT), to examine the association between the aggregate effect of rare coding variants in each gene and longevity. The burden test searches for a significant excess of rare alleles in longevity cases or controls, whereas SKAT implements a variance component test that can detect effects of variants on longevity even when those effects have opposite directions. CLCN6 (chloride voltage-gated channel 6) showed the strongest gene-level association with longevity (burden test; P = 3.45E-06 and 1.03E-05 as the lowest and the combined P-values, respectively) (Supplementary Table S3).
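To make the contrast between the two gene-level tests concrete, here is a minimal NumPy sketch of their score-type statistics. It assumes an intercept-only null model (no gender or MDS covariates), Beta(1, 25) variant weights (the default weighting in the SKAT R package), and permutation P-values in place of the analytical (Davies) P-values SKAT normally reports; it illustrates the two statistics and is not the pipeline used in this study.

```python
import numpy as np

def beta_weights(maf: np.ndarray) -> np.ndarray:
    """Beta(1, 25) density at each variant's MAF, i.e. 25 * (1 - maf)**24 (up-weights rarer variants)."""
    return 25.0 * (1.0 - maf) ** 24

def burden_and_skat_stats(y: np.ndarray, G: np.ndarray, w: np.ndarray) -> tuple[float, float]:
    """Score-type statistics under an intercept-only null model.

    y: (n,) 0/1 longevity status; G: (n, m) minor-allele counts; w: (m,) variant weights.
    """
    r = y - y.mean()                      # null-model residuals
    s = G.T @ r                           # per-variant score statistics
    q_burden = float((w @ s) ** 2)        # collapse first, then square: opposite effects cancel
    q_skat = float(np.sum((w * s) ** 2))  # square per variant, then sum: r' G W G' r with W = diag(w^2)
    return q_burden, q_skat

def permutation_pvalues(y, G, w, n_perm=2000, seed=0):
    """Permutation P-values for both statistics (valid here because the null model has no covariates)."""
    rng = np.random.default_rng(seed)
    observed = np.array(burden_and_skat_stats(y, G, w))
    exceed = np.zeros(2)
    for _ in range(n_perm):
        exceed += np.array(burden_and_skat_stats(rng.permutation(y), G, w)) >= observed
    return (exceed + 1) / (n_perm + 1)

# Toy gene: variant 0 is carried only by cases, variant 1 only by controls
n = 20
y = np.array([1.0] * 10 + [0.0] * 10)
G = np.zeros((n, 2))
G[0:5, 0] = 1
G[10:15, 1] = 1
w = beta_weights(G.sum(axis=0) / (2 * n))
q_burden, q_skat = burden_and_skat_stats(y, G, w)
p_burden, p_skat = permutation_pvalues(y, G, w)
```

In this toy gene the burden statistic cancels to zero (P = 1), while SKAT still flags the gene, which is the opposite-direction scenario described above.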
Although these top associations at the variant and gene levels did not reach genome-wide significance after multiple-test correction, quantile-quantile (QQ) plots of association signals showed upward deviation in the tails (the lowest P-values were smaller than expected under a uniform distribution on (0, 1)) for several groups of rare variants, such as functional rare variants (CADD^26 score ≥ 20) and functional but recessive benign rare variants (CADD score ≥ 20 and PrimateAI^27 score < 0.5) (see the variant masking in Methods for variant groups and their interpretation) (Figures 1A and 1B, Supplementary Figures S1 and S2, and Supplementary Table S4). These rare variants fall in several genes known to be related to aging, such as FASN^28 and the DNA repair gene BLM RecQ like helicase (BLM)^29.
Figure 1. Longevity association of rare variants.
(A) QQ plots for single rare variant association tests. P-values from tests of 2,787 functional rare variants (CADD score ≥ 20) and 3,127 synonymous rare variants were used to construct two separate QQ plots. Only rare variants with a minor allele count ≥ 15 in the case-control cohort were included. (B) QQ plot for gene-based rare variant association tests. SKAT P-values from tests of functional but recessive benign rare variants (CADD score ≥ 20 and PrimateAI score < 0.5) in 3,717 genes were used to construct the QQ plot. Only genes with two or more masked rare variants in the case-control study cohort were included. (QQ plots under different rare-variant masks are shown in Supplementary Figure S2.) (C) Pathway enrichment analysis of genes implicated by rare variants aggregated in a gene functional network. The top 100 IGSP-scored genes showing the trend of network aggregation were analyzed. The top 10 non-redundant enriched pathways (covering unique putative longevity genes) are shown.
For each gene in a pathway in the heatmap, the color of its cell indicates a weighted burden of rare variants in centenarians (deep pink) or controls (blue) (see Supplementary Table S5). Genes in the heatmap are ordered by hierarchical clustering. (D) Gene-set rare variant association for aging-related pathways. P* denotes the P-value corrected for 6 categories of rare variants using the minimal-P value test from Flannick et al.^52 (Methods). The text for each significant association gives the lowest nominal P-value among the different categories of rare variants and the FDR.
The extreme rarity of centenarians in human populations limits the feasibility of the large studies needed to discover rare variants through statistically significant genetic associations with longevity. Instead of a candidate gene approach, we used Integrated Gene Signal Processing (IGSP) to prioritize genes in an unbiased manner, based on the longevity association of rare variants, through data integration^30. About 94% of human protein-coding genes were in the functional linkage network used by IGSP, but only half of them also had knockout phenotype data for their mouse homologs. To include most genes in our analysis, we opted for network-only integration rather than full integration, which requires both the gene network and mouse phenotype data. Individual genes were scored by jointly analyzing the longevity association of genes implicated by rare variants in a gene functional linkage network (predicted from independent high-throughput genomic data)^31, which implicitly incorporates information on gene-gene functional similarity. Data simulation showed that such integrated scoring greatly increases prioritization power and effectively uncovers risk genes with marginal association signals^30.
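IGSP itself is not reproduced here. As a hedged illustration of the general idea behind network integration, namely that a gene with a marginal signal can borrow strength from functionally linked neighbors, the sketch below propagates gene-level scores over a weighted network by random walk with restart, a generic network-propagation technique swapped in for IGSP. The restart probability, the toy network and the use of −log10 P as input scores are assumptions.

```python
import numpy as np

def random_walk_with_restart(A, p0, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic network propagation (not IGSP): iterate p = (1 - restart) * W p + restart * p0.

    A: (g, g) symmetric, non-negative edge weights of a gene functional network.
    p0: (g,) non-negative per-gene association scores, e.g. -log10 P from gene-level tests.
    """
    A = np.asarray(A, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0           # avoid division by zero for isolated genes
    W = A / degree                      # column-normalized transition matrix
    p0 = np.asarray(p0, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy network: genes 0 and 1 carry signal; gene 2 has none but links to both; gene 3 is isolated
A = np.array([[0, 0, 1, 0],
              [0, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])
p0 = np.array([3.0, 3.0, 0.0, 0.0])     # hypothetical -log10 P per gene
scores = random_walk_with_restart(A, p0)
```

Gene 2 ends up ranked above gene 3 despite having no signal of its own, which is the kind of behavior that lets network-integrated scoring surface genes with marginal association signals.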
A negative-control evaluation showed that the ~100 top-ranked genes had higher IGSP scores when scored with real data than with randomized data (P = 0.037) (Supplementary Figure S3 and Table S5), suggesting that longevity-associated genes implicated by rare variants cluster in the gene network and that network integration in gene scoring captures this clustering^30. Subsequent pathway enrichment analysis showed that these predicted longevity genes are significantly enriched in insulin signaling (FDR = 0.00879) and mTOR signaling (FDR = 0.0129) (Figure 1C and Supplementary Figure S4). Some predicted longevity genes, such as PSMB9, are connected to insulin signaling only indirectly, through the pathway of signaling by the insulin receptor. Interestingly, many of the putative longevity genes carry a burden of rare variants in centenarians, including genes in which potentially protective rare variants were also found in previous studies, such as ABCA1^32 and PLCG2^33 (Supplementary Table S5).
To further increase power, the longevity association of rare coding variants can be studied at the pathway level. Because aging is characterized by evolutionarily conserved, parallel and interacting mechanistic hallmarks, we next analyzed rare variants collectively in 20 pathways covering all nine hallmarks of aging^1 (Supplementary Table S6). Functional but recessively benign rare variants in the insulin signaling (SKAT, P = 5.57E-05, FDR = 0.012) and AMPK signaling (SKAT, P = 1.59E-04, FDR = 0.017) pathways were significantly associated with extreme longevity (Figure 1D and Supplementary Tables S7 and S8) after multiple-testing correction that accounted for the total numbers of pathways, tests, and variant masks.
When studying genetic variants in association studies, it is important to validate the results by replicating the observed association signals in unrelated cohorts.
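The pathway-level analysis applies each test within variant masks and then controls the false discovery rate over all pathway × test × mask combinations. The sketch below encodes the masks named in the text as predicates (the CADD, PrimateAI and ultra-rare AAF thresholds come from the text; the dictionary-based variant records and function names are hypothetical) and applies the Benjamini–Hochberg procedure. The text does not specify which FDR procedure was used, so Benjamini–Hochberg is an assumption, and the minimal-P test across masks is not shown.

```python
import numpy as np

# Variant masks described in the text; the record layout ({"cadd", "primate_ai", "aaf"}) is hypothetical.
MASKS = {
    "functional": lambda v: v["cadd"] >= 20,
    "functional_recessive_benign": lambda v: v["cadd"] >= 20 and v["primate_ai"] < 0.5,
    "pathogenic": lambda v: v["primate_ai"] >= 0.9,
}

def apply_mask(variants, mask_name, max_aaf=None):
    """Select variants passing a mask; max_aaf=0.0005 gives the ultra-rare (AAF < 0.05%) version."""
    keep = MASKS[mask_name]
    return [v for v in variants if keep(v) and (max_aaf is None or v["aaf"] < max_aaf)]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]  # make adjusted values monotone in rank
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

In practice one would collect the P-values of every pathway, test and mask combination into a single list before adjusting, so that the correction covers the full search space.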
Our approach followed the sequence-based replication strategy, which is more powerful than the variant-based strategy that analyzes only the rare variants uncovered in the discovery cohort^34. Specifically, we examined three replication cohorts for longevity association of rare coding variants in the insulin and AMPK signaling pathways (Supplementary Table S9): a German longevity cohort of 1,265 centenarians (mean age: 99 years) and 4,195 blood donors (mean age: 35 years) as controls; a UK Biobank longevity cohort of 104 participants with at least one long-lived parent (lifespan ≥ 100 years) and 23,405 participants whose parents had usual survival (lifespan < 95 years); and an Alzheimer's Disease Sequencing Project (ADSP) longevity cohort of 1,121 non-AD individuals aged ≥ 90 years and 38 non-AD individuals aged < 75 years^35. In the German longevity cohort, we detected a significant longevity association of functional but recessive benign ultra-rare variants (AAF < 0.05% among non-Finnish Europeans in gnomAD^36) in insulin signaling (SKAT, P = 4.41E-04, FDR = 0.018) after multiple-testing correction (Extended Data Figure 1A). In the UK Biobank longevity cohort, we identified significant longevity associations of functional rare variants in insulin signaling (SKAT, P = 9.64E-06, FDR = 3.87E-04) and of functional but recessive benign ultra-rare variants in AMPK signaling (SKAT, P = 2.08E-03, FDR = 0.041) (Extended Data Figure 2A). In the ADSP longevity cohort, we identified a significant longevity association of recessive pathogenic rare variants in insulin signaling (burden test, P = 8.98E-05, FDR = 3.6E-03; effect direction toward controls) (Extended Data Figure 3).
Next, we focused on identifying rare variants associated with human age-related disease.
A genetic relationship between extreme human longevity and disease is supported by the repeated observation, in independent studies, of an association between extreme longevity and APOE, a locus causally related to both cardiovascular and neurodegenerative disease^37. We therefore hypothesized that rare genetic variants associated with human longevity may exert their beneficial effects, at least in part, by protecting against chronic disease. To test this, we examined rare coding variants in the 20 aging hallmark pathways in more refined subgroups of our cohort based on APOE haplotype status, analyzing APOE4 carriers and non-carriers (hereinafter APOE4+ and APOE4−, respectively) as separate longevity sub-cohorts to identify longevity-associated pathways in these two distinct genetic backgrounds. Among APOE4−, functional but recessively benign rare variants in both the insulin and AMPK signaling pathways were again significantly associated with longevity (SKAT, P = 6.21E-06 and 7.9E-05, FDR = 2.63E-03 and 0.013, respectively) (Extended Data Figure 4 and Supplementary Tables S10 and S11). Interestingly, among APOE4+, we detected a significant association between longevity and 152 functional rare variants in WNT signaling genes after multiple-testing correction, using both the burden test and SKAT (burden test, P = 9.16E-05, FDR = 0.013; SKAT, P = 3.40E-04, FDR = 0.036) (Extended Data Figure 4 and Supplementary Tables S12 and S13). The direction of association suggests that these rare variants are enriched for protective variants among APOE4+ centenarians in our cohort (Supplementary Table S14). Indeed, only six of them were predicted to be highly pathogenic (PrimateAI score ≥ 0.9), and these six are not enriched, individually or collectively, among APOE4+ centenarians.
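APOE4 carrier status, used here to split the cohort, follows from the two SNPs rs429358 and rs7412. A minimal sketch, assuming unphased allele counts and the standard haplotype definitions (ε2: rs429358-T with rs7412-T; ε3: T with C; ε4: C with C) and ignoring the very rare ε1 haplotype (C with T), which is why a double heterozygote is called ε2/ε4 rather than ε1/ε3. The function names are hypothetical.

```python
def apoe_genotype(rs429358_c: int, rs7412_t: int) -> str:
    """APOE genotype from unphased allele counts (0-2) of rs429358-C and rs7412-T.

    Assumes no e1 haplotype (rs429358-C on the same chromosome as rs7412-T), so every
    rs429358-C marks an e4 allele and every rs7412-T marks an e2 allele.
    """
    e4, e2 = rs429358_c, rs7412_t
    e3 = 2 - e4 - e2
    if not (0 <= e4 <= 2 and 0 <= e2 <= 2) or e3 < 0:
        raise ValueError("allele counts inconsistent with e2/e3/e4 haplotypes")
    return "/".join(["e2"] * e2 + ["e3"] * e3 + ["e4"] * e4)

def is_apoe4_carrier(rs429358_c: int, rs7412_t: int) -> bool:
    """APOE4+ if at least one e4 allele is present."""
    return "e4" in apoe_genotype(rs429358_c, rs7412_t)
```

Under the no-ε1 assumption, carrier status depends on rs429358 alone, consistent with the statement that APOE4 status is determined mainly by rs429358.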
The WNT association was replicated in the UK Biobank longevity cohort, with a significant longevity association of functional rare variants in the WNT signaling pathway among APOE4+ (SKAT, P = 1.79E-10, FDR = 2.14E-08) (Extended Data Figure 2B and Supplementary Table S9). We did not detect significant longevity association signals from rare variants in the WNT signaling pathway in either of the APOE4-stratified German longevity sub-cohorts (Extended Data Figure 1B).
We further examined the protective effect of functional rare variants in WNT signaling genes on individual lifespan in our lifespan cohort of 553 individuals with verifiable ages at death. Starting from a full linear model of lifespan that included gender, APOE4 status, the alternative allele count of protective rare variants in WNT signaling, and all two-way and three-way interaction terms among them, we found that the only statistically significant interaction was between APOE4 status and the allele count in WNT signaling (P = 1.13E-04) (Supplementary Table S15). Because APOE4 status is determined mainly by rs429358, a common variant (MAF = 0.14) associated with aging and age-related diseases, this result indicates epistasis between rare variants and aging-associated common variants in the genetic architecture of human aging.
We then analyzed the relationship between individual lifespan and the alternative allele count of protective rare variants in WNT signaling in sub-cohorts stratified by both longevity and APOE4 status (Figure 2A). Among centenarians, the allele count in WNT signaling had no effect on lifespan regardless of APOE4 status. Among non-centenarians, the burden of WNT rare variants was significantly positively correlated with lifespan in APOE4+ (r = 0.406, P = 8.39E-03, FDR = 0.026) (Figure 2A, middle blue panel), compared with non-carriers.
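The APOE4 × WNT interaction can be illustrated with ordinary least squares. This sketch assumes the log-transformed age at death as the outcome (as in the Figure 2 legend) and keeps only gender, APOE4 status, the WNT allele count and their APOE4 × WNT product, rather than the full model with all two- and three-way interactions; the data are simulated, not from the cohort, and t-statistics are reported instead of P-values to keep the example dependency-free beyond NumPy.

```python
import numpy as np

def fit_lifespan_interaction(age_at_death, female, apoe4, wnt_count):
    """OLS of log(age at death) on gender, APOE4 status, WNT allele count and APOE4 x WNT.

    Returns {term: (coefficient, t statistic)}.
    """
    y = np.log(np.asarray(age_at_death, dtype=float))
    apoe4 = np.asarray(apoe4, dtype=float)
    wnt = np.asarray(wnt_count, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(female, dtype=float), apoe4, wnt, apoe4 * wnt])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])          # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # coefficient standard errors
    terms = ["intercept", "female", "apoe4", "wnt_count", "apoe4:wnt_count"]
    return {t: (float(b), float(b / s)) for t, b, s in zip(terms, beta, se)}

# Simulated example (not study data): the WNT burden only matters in APOE4 carriers
rng = np.random.default_rng(1)
n = 400
female = rng.integers(0, 2, n)
apoe4 = rng.integers(0, 2, n)
wnt = rng.poisson(1.5, n)
log_age = 4.45 + 0.02 * female - 0.05 * apoe4 + 0.03 * apoe4 * wnt + rng.normal(0, 0.02, n)
fit = fit_lifespan_interaction(np.exp(log_age), female, apoe4, wnt)
```

A significant `apoe4:wnt_count` term with a near-zero `wnt_count` main effect is the signature of a rare-variant burden that matters only on the APOE4 background.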
The relationship between APOE4 status and the allele count in WNT signaling is more readily appreciated by comparing the lifespans of sub-cohorts stratified by both APOE and WNT signaling. Among APOE4+, the median difference in lifespan between individuals with low and high allele counts in WNT signaling was over nine years (P = 2.38E-03) (Figure 2B). Moreover, the negative effect of APOE4 on lifespan was weaker among individuals with a high burden of potentially protective WNT rare variants (Figure 2C). Interestingly, the aforementioned 152 rare variants in WNT signaling genes are also associated with disease status in the ADSP (SKAT, P = 4.82E-03). Finally, using the same framework, we analyzed the lifespans of centenarians and non-centenarians separately and observed a similar protective effect of the 152 rare variants in WNT signaling genes (Supplementary Table S14) on lifespan among non-centenarian APOE4+ (Supplementary Table S9 and Extended Data Figures 5 and 6).
Figure 2. Protective rare variants in WNT signaling genes for APOE4+.
P denotes the uncorrected P-value from linear regression with the log-transformed age at death as the outcome and gender as a covariate (see Methods). 'WNT low' and 'WNT high' denote an alternative allele count of rare variants in WNT signaling genes ≤ 1 and > 1 (the median), respectively. Numbers of individuals are given in parentheses. MD, median difference. (A) Correlation between lifespan and the alternative allele count of protective rare variants in WNT signaling genes. (B) Lifespan difference between individuals carrying a high and a low burden of protective rare variants in WNT signaling genes. The horizontal lines and vertical thick lines in the violin plots represent the median and interquartile range, respectively.
(C) Negative effects of APOE4 on lifespan compensated by protective rare variants in WNT signaling genes.
Longevity and common polygenic risk of age-related diseases
The phenotypic outcome in individuals who carry rare variants of large effect can also be influenced by the background of common polygenic variation. To assess how rare variants may interact with the genetic background of common variants to affect human aging, we examined in our longevity cohort common variants associated with seven age-related diseases: Alzheimer's disease, coronary artery disease, type 2 diabetes, stroke, breast cancer, prostate cancer, and pancreatic cancer. This analysis cohort consists of 479 centenarians and 431 controls with both WES and SNP array data available (Table 1 and Supplementary Table S1). We calculated individuals' polygenic risk scores (PRS) for these diseases using summary statistics from the corresponding GWAS (see Methods). Empirical P-values provided by PRSice-2, which account for over-fitting, indicated significant genetic overlap between longevity and each of Alzheimer's disease, coronary artery disease, and type 2 diabetes (Figure 3, Extended Data Figure 7, and Supplementary Table S16), which was further supported by cross-validation (Supplementary Table S17). PRS for Alzheimer's disease, coronary artery disease, and type 2 diabetes explained 1.93% (P = 0.0019), 1.32% (P = 0.013) and 1.29% (P = 0.015) of the variance in longevity status, respectively. As measured by PRS, centenarians tend to have reduced genetic susceptibility not only to Alzheimer's disease and coronary artery disease (Bonferroni-Holm P* = 0.0067 and 0.039, respectively), both previously found to be associated with healthy aging^38, but also to type 2 diabetes (P* = 0.039). The predictive power of PRS for Alzheimer's disease was driven mainly by the APOE haplotype defined by SNPs rs7412 and rs429358 (Figures 3B and 3C).
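The two quantities reported here, a PRS and the variance in longevity status it explains (Nagelkerke's R²), can be sketched as follows. The score is a plain P-value-thresholded weighted allele sum; clumping, PRSice-2's threshold search and its permutation-based empirical P-values are omitted. R² is computed as Nagelkerke's R² of the model with the PRS minus that of the covariate-only model, which we assume mirrors how PRSice-2 attributes variance to the PRS. All function names are hypothetical and the example data are simulated.

```python
import numpy as np

def polygenic_score(dosages, betas, pvals, p_threshold):
    """Sum over SNPs with GWAS P < p_threshold of effect size x effect-allele dosage (0-2)."""
    keep = np.asarray(pvals) < p_threshold
    return np.asarray(dosages, dtype=float)[:, keep] @ np.asarray(betas, dtype=float)[keep]

def logistic_loglik(X, y, n_iter=100):
    """Maximized log-likelihood of a logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    mu = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu)))

def nagelkerke_r2(ll_intercept, ll_model, n):
    """Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - np.exp(2.0 * (ll_intercept - ll_model) / n)
    return cox_snell / (1.0 - np.exp(2.0 * ll_intercept / n))

def prs_r2(prs, y, covariates=None):
    """Nagelkerke R^2 gained by adding the PRS to an intercept (+ covariates) model."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ones = np.ones((n, 1))
    base = ones if covariates is None else np.column_stack([ones, covariates])
    ll0 = logistic_loglik(ones, y)
    r2_base = nagelkerke_r2(ll0, logistic_loglik(base, y), n)
    r2_full = nagelkerke_r2(ll0, logistic_loglik(np.column_stack([base, prs]), y), n)
    return r2_full - r2_base
```

In the analysis described here, the covariates would be gender and MDS1–10, and the outcome the centenarian status.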
In contrast, the predictive power of PRS for coronary artery disease was not driven mainly by the APOE haplotype (Extended Data Figure 7B)^39. To further examine the genetic overlap between longevity and diseases, we applied an 'extreme-longevity phenotyping' strategy, comparing people aged ≥ 100 years with those aged < 80 years, and found that the variance explained by PRS for Alzheimer's disease and coronary artery disease increased almost fourfold, to 4–7% (Figure 3, Supplementary Figure S5, and Supplementary Table S16). PRS for type 2 diabetes showed a stronger association with longevity status among males than among females in our cohort, whereas PRS for Alzheimer's disease and coronary artery disease showed no such gender difference (Supplementary Figure S6).
Figure 3. Common polygenic risk of age-related diseases.
(A) Common polygenic risk of seven age-related diseases was calculated for each subject using PRS for the corresponding disease. Nagelkerke's R^2 is based on the association between the disease PRS and centenarian status. Bar color denotes the statistical significance of R^2 after adjusting for MDS1–10 and gender as covariates (except for breast cancer and prostate cancer, which were tested in females and males, respectively). Statistical significance is based on permutation P-values from PRSice-2. For Alzheimer's disease and coronary artery disease, the middle bars show the results of PRS analyses excluding SNPs within 1 Mb of the APOE haplotype SNPs rs429358 and rs7412. The bottom bars (for Alzheimer's disease, coronary artery disease, breast cancer and prostate cancer) show the results of PRS analyses using extreme-longevity phenotypes (cases and controls aged ≥ 100 years and < 80 years, respectively; see Supplementary Table S16). (B–D) PRS analyses of Alzheimer's disease using all SNPs, excluding SNPs within 1 Mb of rs7412 or rs429358, or using extreme-longevity phenotypes.
In the boxplots, points represent individuals, and horizontal lines represent, from top to bottom, the upper fence (the largest value ≤ Q3 + 1.5×IQR), the upper quartile (Q3), the median, the lower quartile (Q1), and the lower fence (the smallest value ≥ Q1 − 1.5×IQR); IQR, interquartile range (25th to 75th percentile). n = 910 biologically independent samples. Above the boxplot on the right are raw and adjusted (in parentheses) P-values for the best prediction in the Nagelkerke's R^2 plot on the left, calculated by logistic regression and by the permutation test in PRSice-2, respectively.
Pathogenic rare variants and longevity
Since the genetic component of extreme longevity could be explained, at least in part, by a reduced burden of pathogenic variants compared with the general population, we compared the counts of predicted pathogenic rare coding variants (PrimateAI score ≥ 0.9) between centenarians and controls. No significant difference was observed (P = 0.243, logistic regression including gender and the top 10 MDS components as covariates; Figure 4A). Using our lifespan cohort, we next investigated whether pathogenic rare coding variants affect lifespan, whether the effect depends on the common polygenic disease background, and whether the effect differs between centenarians and controls. Consistent with general observations, females had significantly better survivorship than males in our lifespan cohort (P = 1.71E-07, Extended Data Figure 8), as did APOE4− compared with APOE4+ individuals (P = 9.32E-04, Extended Data Figure 9). In our lifespan cohort, 853 pathogenic rare variants were identified. No correlation between the exome-wide burden of pathogenic rare variants and lifespan was observed among centenarians and non-centenarians together (the full lifespan cohort) or in either group separately (Figure 4B).
Figure 4. Analysis of pathogenic rare variants.
(A) Exome-wide burden of pathogenic rare variants in centenarians and controls. (B) Correlation between lifespan and the exome-wide burden of pathogenic rare variants. The left panel shows the result for all 553 individuals; the middle and right panels show the results for individuals with lifespan ≥ 95 years and < 95 years, respectively. (C) Correlation between lifespan and the exome-wide burden of pathogenic rare variants among individuals with high genetic risk of age-related diseases. The left panel shows the correlation among 94 APOE4+; the right panel shows the correlation among 20 APOE4+ with PRS in the top 45% for CAD and T2D (see Supplementary Table S18 for results using other cutoffs).
Human extreme longevity could be causally driven by a lack of genetic risk factors for chronic disease, by protective variants, or by both. As measured by PRS, centenarians in our cohort tend to have reduced genetic susceptibility to Alzheimer's disease (AD), coronary artery disease (CAD), and type 2 diabetes (T2D) among the seven age-related diseases that we examined (Figure 3 and Supplementary Table S16). Using APOE4 status and PRS for CAD and T2D, we stratified our lifespan cohort according to common genetic risk of AD, CAD, and T2D, the three age-related diseases with significant genetic overlap with longevity in our cohort, and examined how the common polygenic disease risk background and pathogenic rare variants may together affect human lifespan. We first re-examined the effect of pathogenic rare variants on lifespan against an AD risk background defined by APOE4 status and found a weak negative correlation among APOE4+ that did not reach significance (r = −0.184, P = 0.064).
However, this negative correlation became much stronger and significant (r = −0.605, P = 2.85E-03, FDR = 7.13E-03) when substantial genetic risk of both CAD and T2D was also taken into account, that is, among APOE4+ with PRS for both diseases above the respective median of the longevity cohort (Figure 4C and Supplementary Table S18). These results suggest that pathogenic rare variants and disease-associated common variants interact. Such genetic interactions may modulate the deleterious effect of pathogenic rare variants on human lifespan, a possibility that we formally investigated using a full linear model of lifespan including gender, APOE4 status, separate PRS for CAD and T2D, the pathogenic rare variant count, and all two-way and higher-order interaction terms among them. Subsequent stepwise model selection identified multiple interactions, the most significant of which is a three-way interaction among the pathogenic rare variant count and the common polygenic disease risk of AD and T2D in our lifespan cohort (P = 3.12E-04) (Supplementary Table S15). Our analyses of stratified sub-cohorts showed that the negative effect of common polygenic disease risk on human lifespan can intensify under a high burden of pathogenic rare variants. For example, APOE4 reduced lifespan by ~1.5 years on average in our cohort; however, among individuals with ≥ 7 pathogenic rare variants (median = 3), APOE4+ lived ~17 years less than non-carriers (P = 2.77E-04; FDR = 8.31E-04) (Supplementary Figure S7).
To replicate this finding on the relationship between pathogenic rare variants and lifespan, we constructed a UK Biobank parental lifespan cohort (Methods) of 20,823 participants with known parental ages at death and no first-degree relatives among them, and examined the relationship between the exome-wide burden of pathogenic rare coding variants and parental lifespan in this cohort (Supplementary Figure S8).
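The high-risk stratum used in this analysis (APOE4+ with PRS for both CAD and T2D above the cohort median) and the correlation test can be sketched as below. The Fisher z-transformation gives an approximate two-sided P-value; the text does not state which test produced its P-values, so this choice, the function names and the simulated data are assumptions.

```python
import math
import numpy as np

def high_risk_stratum(apoe4, prs_cad, prs_t2d):
    """Boolean mask: APOE4 carriers whose PRS for both CAD and T2D exceed the cohort medians."""
    prs_cad = np.asarray(prs_cad, dtype=float)
    prs_t2d = np.asarray(prs_t2d, dtype=float)
    return ((np.asarray(apoe4) == 1)
            & (prs_cad > np.median(prs_cad))
            & (prs_t2d > np.median(prs_t2d)))

def pearson_fisher(x, y):
    """Pearson r with a two-sided P-value from the Fisher z approximation (needs n > 3, |r| < 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    z = math.atanh(r) * math.sqrt(x.size - 3)
    return r, math.erfc(abs(z) / math.sqrt(2.0))

# Simulated example (not study data): lifespan declines with the pathogenic rare-variant count
rng = np.random.default_rng(2)
count = rng.integers(0, 10, 30)
lifespan = 95 - 1.5 * count + rng.normal(0, 2, 30)
r, p = pearson_fisher(count, lifespan)
```

Note that the medians are taken over the whole cohort before restricting to carriers, so the stratum shrinks quickly; with only ~20 individuals left, the exact t-based test would give somewhat larger P-values than this normal approximation.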
In the UK Biobank parental lifespan cohort, we observed a negative correlation between the pathogenic rare variant burden and parental lifespan among APOE4+ (r = −0.024, P = 0.044) (Supplementary Figure S9 and Supplementary Table S9). The stepwise model selection procedure identified a significant interaction between APOE4 status and the exome-wide burden of pathogenic rare coding variants for parental lifespan (P = 5.48E-05).
DISCUSSION
In summary, in this first large-scale genetic study of rare coding variants and human longevity, our network-integrated analysis identified an enrichment of longevity-associated rare coding variants in conserved aging pathways, and gene-set association tests confirmed the longevity association of rare variants in the insulin and AMPK signaling pathways. These results suggest that rare variants in conserved pathways important for aging in model organisms also affect human lifespan and constitute part of the genetic architecture of human longevity. As expected from the many species-specific characteristics of aging, the pattern is not completely identical between human and animal longevity. For example, we did not find any association of extreme longevity with variants in the mTOR pathway, which has been associated with longevity in model organisms, including the mouse. On the other hand, we found other pathways critical to human aging that have not yet been identified in model organisms; for example, we demonstrated protective effects of rare variants in WNT signaling on human lifespan. Interestingly, in the klotho-knockout mouse model of accelerated aging, continuous WNT exposure triggered accelerated cellular senescence, implicating WNT signaling in mammalian aging^40. Finally, our results confirm previous reports that centenarians do not have a lower burden of pathogenic variants. Instead, our present study suggests that rare protective variants suppress the adverse effects of pathogenic variants on longevity.
To investigate whether the same conserved pathways are important for aging in both model organisms and humans, we can examine the effects of rare variants on lifespan-related traits. This is particularly challenging, however, because of the strong intrinsic stochasticity of aging processes: among isogenic C. elegans in a constant environment, the lifespan of long-lived (age-1) mutants overlaps with that of wild-type controls^41,42. Thus, the same genetic variants may have highly variable effects on lifespan among different individuals. While this stochasticity complicates the identification of longevity-associated variants in conserved aging pathways, appropriate statistical tests and study cohorts can help overcome the challenge.
In this study, we identified rare coding variants in aging pathways that affect human longevity. Future studies of their molecular functions could generate actionable biological insights into aging. In particular, uncovering the downstream pathways that mediate the protective effects of rare variants in WNT signaling genes is imperative for translating this finding into therapeutic interventions against age-related diseases. Experiments with mouse cells suggest that APOE4 may inhibit WNT signaling^43, and dysregulation of WNT signaling contributes to different types of age-related diseases, such as cancer^44, AD^45, and cardiovascular disease^46. Thus, protective rare variants in WNT signaling genes could counteract the adverse effects of APOE4-induced WNT inhibition on the progression of downstream age-related diseases and thereby affect lifespan. While coding variants are more likely to reduce than to enhance the function of the protein product, conclusive confirmation and understanding of their functional effects at the molecular, cellular, and organismal levels require experimental validation using functional assays and genome editing^16.
Our study suggests that rare variants can have distinct effects on lifespan depending on the genetic background of age-related disease risk (such as APOE4 status), underlining the difficulty of detecting and replicating effects of rare variants on lifespan without considering other genetic factors. In addition, while common variants associated with age-related diseases are known to influence lifespan^47, our finding of potential genetic interactions between common and rare variants in the context of human lifespan provides novel insights into the mechanism of disease resilience as part of the genetics of healthy aging among centenarians. How perturbation of conserved aging pathways contributes to human longevity and healthspan cannot be answered by genetics alone. However, given that the molecular mechanisms of many conserved aging pathways have been widely studied in model organisms, our findings about rare variants, especially those from centenarians with high common polygenic risk of age-related diseases, can help translate established longevity-regulating mechanisms in model organisms into therapeutic targets for healthy human aging. A limitation of the whole-exome sequencing used in the present study is that it does not capture rare non-coding variants, which have been implicated in aging of model organisms^48-50 and are thus of potential interest for human longevity. These include, for example, rare variants in non-coding RNAs or other regulatory elements relevant for tissue specificity, and variants in long tandem repeats connected to brain health and various neurological disorders. Identifying the latter requires long sequencing reads^51.

METHODS

Our WES study of the Einstein longevity cohorts complies with all relevant ethical regulations and was approved by the Institutional Review Board at Albert Einstein College of Medicine. Informed consent was obtained from participants, or from a proxy if the participant lacked decisional capacity.
The WES studies of all three replication cohorts obtained informed consent from participants and were approved by the respective ethics committees or institutions: the Ethics Committee of the Medical Faculty of Kiel University for the German longevity cohort; the Ethics Advisory Committee and the external ethics committees for the UK Biobank; and the ethics committees of the Broad Institute, Baylor College of Medicine's Human Genome Sequencing Center, and Washington University's McDonnell Genome Institute for the ADSP cohort.

Recruitment of Einstein longevity cohorts

The study subjects were Ashkenazi Jewish participants from two longevity cohorts, the Longevity Genes Project (LGP) and the LonGenity study, who have been recruited and characterized at the Albert Einstein College of Medicine since 1998. Cases (centenarians) were defined as individuals aged ≥ 95 years, and individuals aged < 95 years without a parental history of longevity (neither parent survived beyond 95 years of age) were classified as controls. The centenarians' dates of birth were confirmed by birth certificates or government-issued identification. Vital status and, where applicable, date of death were determined as of April 3, 2019, based on documentation of the last contact with the study participant, reports from next of kin, and searches of publicly available databases. In the LGP and LonGenity cohorts, 555 and 508 individuals were classified as longevity cases (mean age: 101 years) and controls (mean age: 83 years), respectively. Mortality status was confirmed for 650 individuals, who were the subjects of the lifespan analysis.

SNP-array genotyping

SNP-array genotyping was performed using the Illumina Global Screening Array-24 v1.0 BeadChip with 642,824 markers; 7,201 of these could not be lifted over to human genome assembly GRCh38 and were therefore removed. In total, 2,026 samples were genotyped by SNP array.
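The case and control definitions given under "Recruitment of Einstein longevity cohorts" amount to a small decision rule, sketched below for illustration only. The `Participant` record and its field names are hypothetical; individuals younger than 95 with a long-lived parent are simply left unclassified, since the text does not say how they were handled; and missing parental ages are treated as no documented parental longevity, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

CASE_AGE = 95  # years; threshold taken from the recruitment criteria


@dataclass
class Participant:
    """Hypothetical record; ages in years, None when unknown."""
    age: float
    father_age: Optional[float] = None  # age at death, or current age if alive
    mother_age: Optional[float] = None


def classify(p: Participant) -> Optional[str]:
    """Return 'case', 'control', or None (eligible as neither)."""
    if p.age >= CASE_AGE:
        return "case"
    parent_ages = [a for a in (p.father_age, p.mother_age) if a is not None]
    # Controls are younger than 95 and have no parent who survived beyond 95.
    # Assumption: unknown parental ages count as "no parental longevity".
    if all(a <= CASE_AGE for a in parent_ages):
        return "control"
    return None
```

For example, `classify(Participant(80, 97, 85))` returns `None` because the father survived beyond 95, so this individual qualifies neither as a case nor as a control.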
After removing duplicates and samples not in our longevity studies, 635,623 variants in 1,830 samples were processed and analyzed (1,740 of these samples also have WES data). Quality control of the array-based genotype data was carried out using PLINK software (version 1.9)^53. First, we checked the missing rates of SNPs and samples: SNPs and samples missing more than 20% of genotype calls were removed, and this missingness filtering was then repeated with a more stringent threshold of 2%. Individuals whose self-reported gender differed from the sex predicted from sex-chromosome heterozygosity were removed. SNPs whose genotype frequencies deviated from Hardy-Weinberg equilibrium (χ^2-test P < 1E-6 among controls, followed by P < 1E-10 among cases) were removed. Finally, samples whose heterozygosity deviated by more than three standard deviations from the mean were removed.

Exome sequencing and genotyping

Exome sequencing of 2,112 individuals in the LGP and LonGenity cohorts was performed at the Regeneron Genetics Center (RGC). Sample preparation and whole-exome sequencing were performed using previously described methods^54 (Supplementary Note). Variants in our centenarian cohort were called on human genome assembly GRCh38. For our rare variant analyses using both binary (cases vs. controls) longevity data and continuous lifespan data, only rare variants with missing rates < 0.1 in the corresponding study cohorts were analyzed; all samples in our study cohorts have a missing rate < 0.01 on rare variants that passed quality control (Supplementary Note).
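The array-level quality-control steps listed above were run in PLINK 1.9. The pure-Python sketch below only mirrors their logic on a toy genotype matrix (rows = SNPs, columns = samples, values = alternative-allele counts, `None` = missing) to make the filters concrete. The function names are hypothetical, the sex check is omitted, and the heterozygosity filter uses the raw observed heterozygosity rate rather than the inbreeding coefficient that PLINK reports. The HWE P-value follows the χ^2 test named in the text (1 degree of freedom).

```python
import math
from statistics import mean, pstdev

# Toy layout: geno[snp][sample] = alternative-allele count (0/1/2), or None if missing.


def missing_rate(calls):
    return sum(c is None for c in calls) / len(calls)


def missingness_filter(geno, sample_ids, threshold):
    """One pass: drop SNPs, then samples, whose missing rate exceeds `threshold`."""
    geno = [row for row in geno if missing_rate(row) <= threshold]
    keep = [j for j in range(len(sample_ids))
            if geno and missing_rate([row[j] for row in geno]) <= threshold]
    return [[row[j] for j in keep] for row in geno], [sample_ids[j] for j in keep]


def hwe_chi2_p(calls):
    """Chi-square (1 df) goodness-of-fit P-value for Hardy-Weinberg equilibrium."""
    obs = [sum(1 for c in calls if c == k) for k in (0, 1, 2)]
    n = sum(obs)
    if n == 0:
        return 1.0
    p = (2 * obs[0] + obs[1]) / (2 * n)  # reference-allele frequency
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def heterozygosity_outliers(geno, sample_ids, n_sd=3.0):
    """Samples whose observed heterozygosity rate lies > n_sd SDs from the mean."""
    rates = []
    for j in range(len(sample_ids)):
        calls = [row[j] for row in geno if row[j] is not None]
        rates.append(sum(c == 1 for c in calls) / len(calls))
    mu, sd = mean(rates), pstdev(rates)
    return [s for s, r in zip(sample_ids, rates) if abs(r - mu) > n_sd * sd]
```

The two-pass missingness filter would be called first with `threshold=0.20` and then with `threshold=0.02`, followed by dropping SNPs with `hwe_chi2_p` below 1E-6 in controls and below 1E-10 in cases.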
Aggregation of SNP-array and WES data

For PRS-related analyses, we used genotype data aggregated from WES and SNP arrays (Extended Data Figure 10A) for two reasons: (1) genotypes of common variants across the whole genome (not just the exome) need to be imputed (see the next subsection) for PRS calculation; and (2) genome-wide imputation based on genotype data from both WES (for better accuracy) and SNP arrays (for better coverage) outperforms imputation based on WES data alone. After the aggregation process (Supplementary Note), ~1,203k variants were kept in the merged VCF file.

Genotype imputation

We used the Michigan Imputation Server (Minimac3)^55 for genotype imputation (n = 1,740), with the Haplotype Reference Consortium (HRC, r1.1 2016)^56 as the reference panel, Eagle v2.3 for phasing, and the European population (EUR) for quality control. After post-imputation processing (Supplementary Note and Supplementary Figure S10), we obtained ~14,079k polymorphic variants in our cohort. We evaluated the suitability of the HRC reference panel for cross-ethnicity genotype imputation in our study using 196 Ashkenazi Jewish individuals in our cohort for whom whole-genome sequencing data were available. Imputation was highly accurate: in 183 of the 196 individuals, genotypes of >99% of 2,020 randomly selected non-coding variants that were genotyped by neither WES nor the SNP array were correctly imputed (Supplementary Figure S11).

Polygenic risk score analysis

We calculated polygenic risk scores (PRSs) using PRSice-2^57,58 to analyze disease risk from common variants in our longevity cohort. We first collected summary statistics from the most recent GWAS, conducted in populations of European or predominantly European ancestry, of seven complex diseases: AD^59, CAD^60, T2D^61, stroke^62, prostate cancer^63, breast cancer^64, and pancreatic cancer^65.
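PRSice-2 builds each score from effect-size-weighted allele dosages of SNPs that survive LD clumping and pass a GWAS P-value cutoff. The sketch below illustrates only the thresholding step on SNPs assumed to be already clumped. The dictionary layout and function name are hypothetical, and it uses a plain sum of beta times dosage, whereas PRSice-2 can also report averaged scores; it is not the study's code.

```python
from typing import Dict, List, Sequence, Tuple


def prs_at_thresholds(
    sumstats: Dict[str, Tuple[float, float]],  # SNP -> (GWAS beta, GWAS P-value), post-clumping
    dosages: Dict[str, Sequence[float]],       # SNP -> effect-allele dosage (0-2) per individual
    thresholds: Sequence[float],
) -> Dict[float, List[float]]:
    """Score every individual at each P-value cutoff, using SNPs with P <= cutoff."""
    n = len(next(iter(dosages.values())))
    scores = {}
    for t in thresholds:
        selected = [s for s, (_, p) in sumstats.items() if p <= t and s in dosages]
        # PRS_i = sum over selected SNPs of beta_s * dosage_{s,i}
        scores[t] = [sum(sumstats[s][0] * dosages[s][i] for s in selected)
                     for i in range(n)]
    return scores
```

Each cutoff yields one candidate score per individual; the score at the cutoff with the best fit (in the study, the best Nagelkerke R^2) is then tested for association with the outcome.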
From the combined genotype data after imputation for 1,740 samples, we selected common SNPs (MAF > 5%) and performed LD clumping, treating SNPs as correlated if they were within 250 kb of each other and had R^2 > 0.1. After clumping, we used 19 P-value cutoffs (1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.01, 1E-3, 1E-4, 1E-5, 1E-6, 1E-7, 1E-8, 1E-9, and 1E-10) to select SNPs for scoring and, for AD, additional cutoffs to restrict selection to the most strongly AD-associated SNPs. After removing outliers identified by multidimensional scaling (MDS) analysis (Supplementary Figure S12), non-Ashkenazi Jewish individuals, and related individuals, 910 centenarians and controls among the 1,740 samples were used to evaluate the association between disease PRS and longevity in the cohort (Extended Data Figure 10A). To adjust for population substructure and gender differences, we included the top 10 MDS components derived from common SNPs in the combined genotype dataset, together with gender, as covariates in the regression analysis of PRS association. When analyzing the PRS of prostate cancer and breast cancer, only male and only female individuals, respectively, were considered.

Rare variant association analysis

Among 2,021 Ashkenazi Jewish individuals with WES data, 536 were centenarians and 506 were controls. Pairs of individuals with a proportion of alleles shared identical-by-descent (IBD) > 0.4 were identified as related (i.e., monozygotic twins, parent-child pairs, and full siblings), and one sample per pair was excluded, choosing which sample to retain so as to maximize the number of cases, favor older cases, and favor younger controls. In our study cohort, we identified 31 participants as related to other participants owing to high IBD. After excluding them, we had 515 cases (mean age: 101 years) and 496 controls (mean age: 83 years) for rare variant association analysis (Table 1 and Extended Data Figure 10B).
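The rule for resolving related pairs (IBD > 0.4) in the rare variant analysis can be sketched as a greedy pass over related pairs. This is a hypothetical illustration: the text states the goals (more cases, older cases, younger controls) but not the exact algorithm, so the priority ordering and the pair-by-pair greedy procedure below are assumptions, as are the `Sample` record and function names.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Sample:
    is_case: bool
    age: float


def retention_priority(s: Sample) -> Tuple[int, float]:
    # Larger tuple = more desirable to keep: any case beats any control,
    # older cases beat younger cases, younger controls beat older controls.
    return (1, s.age) if s.is_case else (0, -s.age)


def prune_related(samples: Dict[str, Sample],
                  ibd_pairs: List[Tuple[str, str, float]],
                  threshold: float = 0.4) -> Set[str]:
    """Greedy pruning: for each pair with IBD > threshold, drop the lower-priority sample."""
    kept = set(samples)
    for a, b, pi_hat in ibd_pairs:
        if pi_hat <= threshold or a not in kept or b not in kept:
            continue  # not related, or already resolved by an earlier exclusion
        drop = min(a, b, key=lambda sid: retention_priority(samples[sid]))
        kept.discard(drop)
    return kept
```

A greedy pass is simple but not guaranteed to minimize the number of exclusions when individuals belong to larger families; an exhaustive or graph-based search would be needed for that.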
In this study, we analyzed rare variants with alternative allele frequencies (AAFs) < 1% in Ashkenazi Jewish populations. AAFs were calculated as the average of (i) the allele frequencies in 731 Ashkenazi Jewish individuals from our centenarian cohort (n = 2,021) who were unrelated (up to first-degree kinship) and not included in our study sample (i.e., excluding centenarians and the other individuals in Table 1) and (ii) the allele frequencies in Ashkenazi Jews reported in gnomAD. Longevity association was assessed at the variant, gene, and gene-set levels. We evaluated the longevity association of each rare coding variant using Firth logistic regression^66. For association tests at the gene and gene-set levels, we performed the burden test and SKAT (implemented in R^67; version 1.3) to test the longevity association of six different subsets of rare variants within each gene or gene set. The variant-masking scheme^52 was designed to group rare variants with similar properties based on CADD (version 1.4) and PrimateAI (version 0.2) annotations. CADD is widely used as a variant annotation tool to predict the functionality of variants (i.e., functional or neutral), whereas PrimateAI predicts their clinical impact (i.e., pathogenic or benign). We defined different classes of variants based on the recommended thresholds of CADD and PrimateAI scores (Supplementary Table S4): all rare variants (without masking); functional (or non-neutral)^26 rare variants (CADD score ≥ 20); dominant pathogenic rare variants (PrimateAI score > 0.8); recessive pathogenic rare variants (PrimateAI score > 0.7); functional but dominant benign rare variants (CADD score ≥ 20 and PrimateAI score < 0.6); and functional but recessive benign rare variants (CADD score ≥ 20 and PrimateAI score < 0.5). The minimum P-value test^52 was used to combine the P-values of these six sets of rare variants at the gene or gene-set level.
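To make the variant-masking scheme concrete, the sketch below assigns a variant to the six masks using the CADD and PrimateAI thresholds listed above, applies the averaged-AAF rarity filter, and combines mask-level P-values with a minimum-P rule. Two assumptions are flagged: variants lacking a PrimateAI score fail every PrimateAI-based criterion, and the combination uses a Šidák correction, which is exact only for independent tests. The study used the minimum P-value test of reference 52, whose handling of correlation between overlapping masks may differ. All names are hypothetical.

```python
from typing import Dict, List, Optional


def assign_masks(cadd: float, primate_ai: Optional[float]) -> List[str]:
    """Masks a rare variant belongs to, restating the thresholds in the text."""
    masks = ["all"]
    functional = cadd >= 20
    if functional:
        masks.append("functional")
    # Assumption: a missing PrimateAI score fails every PrimateAI-based criterion.
    if primate_ai is not None:
        if primate_ai > 0.8:
            masks.append("dominant_pathogenic")
        if primate_ai > 0.7:
            masks.append("recessive_pathogenic")
        if functional and primate_ai < 0.6:
            masks.append("functional_dominant_benign")
        if functional and primate_ai < 0.5:
            masks.append("functional_recessive_benign")
    return masks


def is_rare(aaf_cohort_ref: float, aaf_gnomad_asj: float, cutoff: float = 0.01) -> bool:
    """AAF = mean of the in-cohort reference estimate and the gnomAD Ashkenazi Jewish AAF."""
    return (aaf_cohort_ref + aaf_gnomad_asj) / 2 < cutoff


def min_p_combined(p_values: Dict[str, float]) -> float:
    """Šidák-style minimum-P combination across masks.

    Exact if the k tests were independent; typically conservative when they are
    positively correlated, as overlapping masks are.
    """
    k = len(p_values)
    p_min = min(p_values.values())
    return 1 - (1 - p_min) ** k
```

Because the masks are nested or overlapping, their test statistics are positively correlated, which is why a correction that accounts for that correlation can be less conservative than the Šidák bound shown here.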
For gene-based association tests, a gene was tested for a given rare variant category only if it contained multiple rare variants after masking; 15,935 genes were tested for at least one variant category. For gene-set association tests, we compiled 20 gene sets of aging pathways for nine aging hallmarks^1 (Supplementary Table S6) and used the burden test and SKAT to test the longevity association of the six sets of rare variants within each of these 20 gene sets. FDR was used to correct for 130,297 P-values at the variant level, 31,870 (2 × 15,935) combined P-values at the gene level, and 40 (2 × 20) combined P-values at the gene-set level. For rare variant association at the gene-set level, we conducted an independent analysis using the same framework in two sub-cohorts: APOE4+ and APOE4−. FDR was used to correct for 80 (2 × 2 × 20) combined P-values in this analysis. Gender and the top 10 MDS components were included as covariates in all rare variant association analyses in the discovery cohort.

Network/pathway enrichment of rare variants

In addition to conventional approaches for rare-variant association studies, we investigated whether longevity-associated rare variants aggregate in a gene network and in pathways. We first used IGSP^30 to score longevity-associated genes by integrating gene-level rare-variant association tests with a gene functional network^31. To incorporate information from all rare coding variants in IGSP scoring, we obtained gene association signals by applying the weighted burden test (R package SKAT) to the rare coding variants of each gene, weighting each variant by its CADD score. We then used the Wilcoxon rank-sum test to assess whether the top 100 genes scored higher than the top 100 genes derived from randomized rare variant association signals.
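The CADD-weighted burden statistic used as input to IGSP was computed with the R package SKAT, which also adjusts for covariates. As a simplified, hypothetical stand-in, the sketch below computes a weighted burden score test for a binary trait under an intercept-only logistic null model: each individual's burden is the CADD-weighted sum of their rare alternative alleles in the gene, and the score statistic compares burden between cases and controls. Covariate adjustment is deliberately left out.

```python
import math
from typing import List, Sequence


def weighted_burden_score_test(
    genotypes: List[Sequence[int]],  # per individual: alt-allele counts at each rare variant in the gene
    weights: Sequence[float],        # per-variant weight, e.g. its CADD score
    is_case: Sequence[int],          # 1 = case (longevity), 0 = control
) -> float:
    """Two-sided P-value of a weighted burden score test (intercept-only logistic null)."""
    burden = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    n = len(is_case)
    y_bar = sum(is_case) / n   # fitted case probability under the null
    b_bar = sum(burden) / n
    # Score for the burden coefficient: U = sum_i (y_i - y_bar) * b_i
    u = sum((y - y_bar) * b for y, b in zip(is_case, burden))
    # Efficient information after profiling out the intercept
    var_u = y_bar * (1 - y_bar) * sum((b - b_bar) ** 2 for b in burden)
    if var_u == 0:
        return 1.0  # no burden variation: the test is uninformative
    z = u / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
```

Weighting by CADD lets variants predicted to be more deleterious contribute more to the gene-level burden, so the test has power when such variants drive the association.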
To identify, in an unbiased manner, pathways enriched among these top 100 genes, which were implicated by the longevity association of rare variants together with the functional gene network, we first performed pathway enrichment analysis using ToppGene Suite^68, in which 1,245 pathways from different pathway databases were analyzed concurrently, to summarize the top enriched pathways across databases. In addition, we compared the top enriched KEGG and Reactome pathways identified by ToppGene Suite and by three other widely used pathway-enrichment tools (Enrichr^69, g:Profiler^70, and GSEA^71) to derive enriched pathways supported by multiple analysis tools.

Lifespan analysis of rare variants

In our longevity cohort, after removing related individuals, we have dates of death, and thus definitive lifespan information, for 553 Ashkenazi Jewish individuals (202 males and 351 females) (Table 1, Extended Data Figures 8 and 10), of whom 550 (~99.5%) had lifespans ≥ 65 years. Because the lifespan cohort contained no censored data (i.e., all subjects reached the endpoint, death), in all lifespan analyses of rare variants we tested the association between lifespan and the burden of rare variants using a unified accelerated life linear model^72 with log-transformed age at death as the outcome and gender as a covariate. Unlike the rare-variant association analyses, which aimed to discover longevity-associated rare variants using a longevity case-control design, our lifespan analyses quantitatively investigate how the pathogenic and protective rare variants discovered in our case-control study affect human lifespan.

Pathogenic rare variants and lifespan

We investigated whether pathogenic rare variants can adversely affect lifespan.
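The lifespan model described above (log-transformed age at death regressed on a rare-variant burden, with gender as a covariate) reduces to an ordinary least-squares fit when, as here, no observations are censored. The NumPy sketch below is illustrative rather than the study's code: the function name is hypothetical, and the P-value uses a normal approximation to the t distribution, which is adequate for cohorts of several hundred individuals.

```python
import math

import numpy as np


def fit_log_lifespan(age_at_death, burden, male):
    """OLS of log(age at death) on rare-variant burden plus a gender covariate.

    Returns (burden coefficient, its standard error, two-sided P-value).
    """
    y = np.log(np.asarray(age_at_death, dtype=float))
    X = np.column_stack([
        np.ones_like(y),                  # intercept
        np.asarray(burden, dtype=float),  # alternative-allele count of the rare variants
        np.asarray(male, dtype=float),    # gender covariate (1 = male, 0 = female)
    ])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of the estimates
    se = math.sqrt(cov[1, 1])
    z = beta[1] / se
    p = math.erfc(abs(z) / math.sqrt(2))         # normal approximation to the t-test
    return float(beta[1]), se, float(p)
```

Because the outcome is log-transformed, exp(β) can be read as the multiplicative change in lifespan per additional alternative allele, which is the accelerated-life interpretation of the coefficient.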
We used PrimateAI^27, which was specifically designed and optimized for predicting disease-causing variants^73, to select highly pathogenic rare coding variants using a stringent score threshold (≥ 0.9), and assessed how the total count of their alternative alleles (the exome-wide burden) may affect lifespan. PrimateAI is a machine learning-based method that expands the training data set by including common variants from non-human primates to improve the power to predict human pathogenic variants. PrimateAI was not used to compare variant effects on longevity between humans and non-human primates.

Protective rare variants and lifespan

Our rare-variant association tests uncovered a burden of rare variants in WNT signaling genes that may have pro-longevity effects among APOE4+ individuals (Supplementary Table S14). We investigated the impact of these protective rare variants on lifespan in our lifespan cohort through several analyses. First, we evaluated whether the alternative allele count of these protective rare variants in WNT signaling genes is correlated with lifespan among APOE4+ and APOE4− individuals, respectively. Second, from a complementary angle, we investigated whether APOE4 differentially affects the lifespans of individuals with a high or low burden of these protective rare variants. Finally, we examined centenarians and non-centenarians separately in the lifespan analysis to distinguish it from the association study, in which longevity status was used.

Replication studies

To maximize the extent to which the human longevity association of rare variants could be replicated, we set up longevity case-control replication studies using cohort-specific criteria for defining longevity cases and controls. First, longevity cases are individuals older than the human life expectancy. Second, longevity controls are individuals substantially (> 15 years) younger than cases.
We used WES data from three cohorts (a German longevity cohort, a UK Biobank longevity cohort, and a longevity cohort from the ADSP) to replicate the longevity association of rare variants discovered in our Ashkenazi Jewish longevity cohort. The German sample comprised 1,265 long-lived individuals (age range: 94-110 years; mean age: 99 years), as described previously^74, and 4,195 younger controls (mean age: 35 years) recruited as part of the FoCus cohort^75 and as blood donors at the University Hospital Schleswig-Holstein in Kiel and Lübeck, Germany. For exome sequencing and data analysis (including alignment and variant calling), the same wet-lab processes and bioinformatic pipelines at the Regeneron Genetics Center were used as for the Einstein cohort. The UK Biobank longevity cohort was drawn from the 49,960 individuals with whole-exome sequencing in the UK Biobank^76 and consisted of 104 cases and 23,405 controls of British and white ethnicity: cases had at least one long-lived parent (lifespan ≥ 100 years; mean longest parental lifespan: 101 years), and controls had parents of usual survival (lifespan < 95 years; mean longest parental lifespan: 80 years). The ADSP longevity cohort consists of 1,121 non-AD individuals aged ≥ 90 years (ADSP-recorded age is right-truncated at 90) as cases and 38 non-AD individuals aged < 75 years (mean age: 71 years) as controls. Both the German and the UK Biobank longevity cohorts were used to replicate our findings in the full and APOE4-stratified Ashkenazi Jewish longevity cohorts. The ADSP longevity cohort was used only to replicate findings in the full discovery cohort because of its limited number of control samples. Related individuals (up to first-degree kinship) were removed from all three cohorts. We applied the same rare variant association framework used in the discovery study to the replication analysis.
We tested the six masking groups separately for rare variants (AAF < 1%) and for ultra-rare variants (AAF < 0.05%); ultra-rare variants were not examined specifically in our discovery cohort because of the limited sample size of the allele-frequency reference panel for Ashkenazi Jews. Accordingly, the minimum P-value test was used to correct for the 12 test P-values of each tested gene set. Rare variants in the three longevity cohorts were determined based on their AAFs in the corresponding WES data (5,460, 49,960, and 10,267 individuals in the German, UK Biobank, and ADSP WES data, respectively). Ultra-rare variants were further selected from among the rare variants based on their AAFs in the large non-Finnish European reference panel in gnomAD (v2; 56,885 individuals). Rare variants with genotype missing rates ≥ 0.1 were excluded from our analyses. Gender and the top 10 principal components from PCA, accounting for subpopulation structure, were used as covariates in the burden test and SKAT. FDR was used to correct for 4 combined P-values (2 gene sets: insulin and AMPK; 2 tests: SKAT and the burden test) for gene-set-level replication tests in the full cohorts, and for 12 combined P-values (3 gene sets: insulin, AMPK, and WNT; 2 tests: SKAT and the burden test; 2 sub-cohorts: APOE4+ and APOE4−) in the APOE4-stratified cohorts. Because of the lack of long-lived individuals with WES data (the longest lifespan is ~80 years), we used UK Biobank WES data from a parental lifespan cohort to replicate the relationship between pathogenic rare variants and lifespan discovered in our Ashkenazi Jewish lifespan cohort. After removing related individuals (up to first-degree kinship), this cohort consisted of 20,823 individuals of British and white ethnicity with an average parental lifespan ≥ 65 years. We used the same regression framework for replication as for discovery, including the top 10 principal components as covariates.
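The text reports FDR correction without naming the procedure. Assuming the Benjamini-Hochberg method, a common choice, adjusting a small family of combined P-values (for example, the four gene-set P-values of a full-cohort replication) could look like the minimal sketch below; the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene set would then be called significant if its adjusted value falls below the chosen FDR level (e.g., 0.05).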
Statistics and reproducibility

No statistical methods were used to predetermine sample size, as all available samples with WES data were considered. We used various statistical methods to analyze the data; see the Methods subsections above for details. We used three independent longevity cohorts, in which we successfully replicated our finding of a longevity association of rare variants in aging pathways. No data were excluded from the analyses. The experiments were not randomized, as randomization was not relevant to the study design. The investigators were not blinded to allocation during experiments and outcome assessment, as blinding was not relevant to the study design.

Extended Data

Extended Data Figure 1. The replication study of gene-set longevity association using the WES data of the German longevity cohort. The longevity case-control study consists of 1,265 longevity cases and 4,195 longevity controls. P* denotes the P-value corrected for 12 categories of rare variants using the minimum P-value test from Flannick et al^52 (Methods). The text for each significant association gives the lowest raw P-value among the different groups of tested rare variants, together with the FDR. (A) Full longevity cohort. (B) APOE4-stratified cohorts.

Extended Data Figure 2. The replication study of gene-set longevity association using the UK Biobank WES data. The longevity case-control study consists of 104 cases with at least one parent whose age at death was ≥ 100 years and 23,405 controls whose parents both died before age 95. P* denotes the P-value corrected for 12 categories of rare variants using the minimum P-value test from Flannick et al^52 (Methods). The text for each significant association gives the lowest raw P-value among the different groups of tested rare variants, together with the FDR. (A) Full longevity cohort. (B) APOE4-stratified cohorts.

Extended Data Figure 3.
The replication study of gene-set longevity association using the ADSP WES data. The longevity case-control study consists of 1,121 non-AD cases aged ≥ 90 years and 38 non-AD controls aged < 75 years. P* denotes the P-value corrected for 12 categories of rare variants using the minimum P-value test from Flannick et al^52 (Methods). The text for each significant association gives the lowest raw P-value among the different groups of tested rare variants, together with the FDR.

Extended Data Figure 4. Gene-set rare variant association in the APOE4-stratified cohorts of the discovery (Ashkenazi Jewish) longevity cohort. P* denotes the P-value corrected for 6 categories of tested variants using the minimum P-value test from Flannick et al^52 (Methods). The text for each significant association gives the lowest raw P-value among the different groups of tested rare variants, together with the FDR.

Extended Data Figure 5. Lifespan analysis of protective variants in WNT signaling genes for non-centenarians. P denotes the uncorrected P-value from linear regression with log-transformed age at death as the outcome and gender as a covariate (see Methods). 'WNT low' and 'WNT high' denote an alternative allele count of rare variants in WNT signaling genes ≤ 1 and > 1 (the median), respectively. Numbers of individuals are given in parentheses. MD stands for 'median difference'. (A) Lifespan difference between individuals carrying a high and a low burden of protective rare variants in WNT signaling genes. (B) Negative effects of APOE4 on lifespan with a high and a low burden of protective rare variants in WNT signaling.

Extended Data Figure 6. Lifespan analysis of protective variants in WNT signaling genes for centenarians.
P denotes the uncorrected P-value from linear regression with log-transformed age at death as the outcome and gender as a covariate (see Methods). 'WNT low' and 'WNT high' denote an alternative allele count of rare variants in WNT signaling genes ≤ 1 and > 1 (the median), respectively. Numbers of individuals are given in parentheses. MD stands for 'median difference'. (A) Lifespan difference between individuals carrying a high and a low burden of protective rare variants in WNT signaling genes. (B) Negative effects of APOE4 on lifespan with a high and a low burden of protective rare variants in WNT signaling for centenarians.

Extended Data Figure 7. Disease-PRS analyses for centenarians and controls. This figure shows the results of PRS analyses for age-related diseases in the centenarian cohort. In the boxplots, points represent individuals, and horizontal lines represent, from top to bottom, the upper fence (maximum within Q3 + 1.5×IQR), upper quartile (Q3), median, lower quartile (Q1), and lower fence (minimum within Q1 − 1.5×IQR); IQR: interquartile range (25th to 75th percentile). n = 910 biologically independent samples in the boxplots on the right panels for coronary artery disease, type 2 diabetes, stroke, and pancreatic cancer; n = 339 and 571 biologically independent samples in the boxplots on the right panels for prostate cancer and breast cancer, respectively. Above each boxplot on the right are the raw and adjusted (in parentheses) P-values for the best prediction in the Nagelkerke R^2 plot on the left, calculated by logistic regression and by the permutation test in PRSice-2, respectively. For stroke, breast cancer, prostate cancer, and pancreatic cancer, no robust association was observed between PRS and longevity status as originally defined in our cohort. (A) Coronary artery disease.
(B) Coronary artery disease, excluding SNPs within 1 Mb of rs7412 or rs429358 (the SNPs defining the APOE haplotype). (C) Type 2 diabetes. (D) Stroke. (E) Prostate cancer; only males are considered. (F) Breast cancer; only females are considered. (G) Pancreatic cancer.

Extended Data Figure 8. Basic statistics of the lifespan cohort. (A) Lifespan distribution of 553 individuals. (B) Survival curves of the 202 males and 351 females composing the analyzed cohort. Females have a significantly higher survival rate than males based on a Cox regression model (P = 1.71E-07; coxph in the R package survival).

Extended Data Figure 9. Correlation between lifespan and common-variant genetic risk of age-related diseases. P-values are based on linear regression (log lifespan regressed on genetic disease risk) adjusted for gender. (A) Alzheimer's disease. The plots on the left and right show boxplots and survival curves, respectively, for APOE4+ and APOE4− individuals. MD stands for 'median difference'. In the boxplots, points represent individuals, and horizontal lines represent, from top to bottom, the upper fence (maximum within Q3 + 1.5×IQR), upper quartile (Q3), median, lower quartile (Q1), and lower fence (minimum within Q1 − 1.5×IQR); IQR: interquartile range (25th to 75th percentile). n = 553 biologically independent samples. (B) Coronary artery disease; r denotes the correlation coefficient. (C) Type 2 diabetes.

Extended Data Figure 10. Flowcharts of sample collection for different analyses. (A) Flowchart of sample collection for PRS analyses and for lifespan analyses of rare variants and disease PRS. Refer to the 'Rare variant association analysis' subsection for the strategy of removing related individuals in PRS analyses that involve longevity status.
In lifespan analyses, related individuals were removed by randomly excluding one individual from each pair with a proportion of alleles shared identical-by-descent (IBD) > 0.4. (B) Flowchart of sample collection for rare variant association tests, network-integrated analyses, and lifespan analyses of rare variants (and APOE4).

Supplementary Material

Supplementary Tables (NIHMS2105468-supplement-Supplementary_Tables.xlsx, 69.2 KB)
Supplementary Note and Figures (NIHMS2105468-supplement-Supplementary_Note_and_Figures.pdf, 3 MB)

ACKNOWLEDGEMENTS