Abstract

   Extreme longevity in humans has a strong genetic component, but whether
   this involves genetic variation in the same longevity pathways as found
   in model organisms is unclear. Using whole exome sequences of a large
   cohort of Ashkenazi Jewish centenarians to examine enrichment for rare
   coding variants, we found most longevity-associated rare coding
   variants converge upon conserved insulin/insulin-like growth factor 1
   signaling (IIS) and AMP-activated protein kinase (AMPK) signaling
   pathways. Centenarians have a similar number of pathogenic rare coding
   variants as control individuals, suggesting the rare variants detected
   in the conserved longevity pathways are protective against age-related
   pathology. Indeed, we detected a pro-longevity effect of rare coding
   variants in the WNT signaling pathway on individuals harboring the
   known common risk allele APOE4. The genetic component of extreme human
   longevity constitutes, at least in part, rare coding variants in
   pathways that protect against aging, including those that control
   longevity in model organisms.

   Keywords: rare variants, aging, lifespan, longevity, centenarians,
   human genetics
     __________________________________________________________________

   Species-specific lifespan is limited by aging, a multifactorial process
   accompanied by a general decline in tissue function and increased risk
   for many diseases^[77]1. Instead of a passive, entropic process of
   deterioration, aging is subject to active modulation by signaling
   pathways and transcription factors conserved across
   species^[78]2,[79]3. In model organisms, single gene mutations have
   been demonstrated to affect lifespan^[80]4. For example, at the extreme
   end, the lifespan of nematode worms can be increased up to nearly
   ten-fold by mutations in genes involved in insulin/insulin-like growth
   factor 1 signaling (IIS)^[81]5,[82]6. But even in more complicated
   organisms, such as flies and mice, lifespan can be extended up to 50%
   by mutations affecting the same pathway^[83]7-[84]9 or other pathways
   involved in growth, metabolism and nutrient sensing, such as the
   mechanistic target of rapamycin (mTOR) and AMP-activated protein
   kinase (AMPK)^[85]10. On the basis of homology, it is widely
   hypothesized that these conserved signaling pathways are similarly
   involved in human aging and longevity.

   In humans, lifespan is a complex trait affected by multiple factors
   that vary considerably within human populations. While non-genetic
   factors, including diet, physical activity, health habits, and
   psychosocial factors are important, lifespan clearly has a genetic
   component as suggested by human population-based studies^[86]11,[87]12.
   At increasingly older ages, especially beyond 100 years, this genetic
   component becomes exceedingly strong^[88]13,[89]14. As a highly complex
   trait, the genetic underpinnings of human lifespan likely encompass
   different types of genetic variants and epistasis across the allele
   frequency spectrum. Common variants associated with human survival have
   been extensively searched for in many recent genome-wide association
   studies (GWAS) using a variety of trait definitions and study
   designs^[90]15. Together, these studies identified more than 50
   longevity-associated genetic loci of genome-wide significance, among
   which only a few, most notably APOE, were replicated by multiple
   studies^[91]16. On the other hand, several previous studies detected
   association of human longevity with variants in several aging genes –
   such as insulin signaling genes^[92]17 and FOXO3^[93]18,[94]19 – by
   using candidate gene approaches. Most of these longevity-associated
   SNPs have small effect sizes, and currently common variants
   collectively only explain a very small proportion of heritability for
   human longevity. As several recent studies suggest, rare variants
   likely account for at least some of the 'missing'
   heritability^[95]20-[96]22.

   Here we examined rare coding variants in a cohort of 515 Ashkenazi
   Jewish centenarians by whole-exome sequencing (WES) and tested for
   enrichment using a case-control design. The exceptional longevity of
   this cohort and their homogeneous genetic background provided us with
   increased power to detect causal rare variants^[97]23. As controls we
   used 496 Ashkenazi Jewish individuals, mostly from the same households
   as the centenarians, between age ~70 and 95 without a parental history
   of extreme longevity (neither parent survived beyond 95 years of age)
   ([98]Table 1 and [99]Supplementary Table S1).

Table 1. Characteristics of the study cohorts.

  Characteristic                         Centenarians   Controls (or non-centenarians)

  Rare variant association study cohort
    Number of subjects                   515            496
    Female, %                            72.4%          53%
    Age at enrollment, years (mean±SD)   97.6±3.5       73.3±8.4

  Disease PRS study cohort
    Number of subjects                   479            431
    Female, %                            73.1%          51.3%
    Age at enrollment, years (mean±SD)   97.6±3.5       73.2±8.7

  Lifespan study cohort
    Number of subjects                   356            197
    Female, %                            74.2%          44.2%
    Age at enrollment, years (mean±SD)   97.7±3.7       77.9±7.8
    Age at death, years (mean±SD)        100.5±3.4      84.2±7.3

RESULTS

Longevity genes and pathways implicated by rare variants

   Using a joint genotyping procedure and stringent quality control
   metrics, we identified 130,297 rare coding variants, including 126,405
   SNPs and 3,892 indels, with minor allele frequencies < 0.01 and missing
   rates < 0.1 in 17,561 genes in centenarians and controls. Of all SNPs,
   a total of 45,493 SNPs were found to be synonymous. The remaining
   84,804 non-synonymous SNPs and all indels include 75,567 missense
   variants, more than 3,500 loss-of-function variants (1,755 frameshift,
   1,736 stop-gain, and 79 stop-loss variants), and other variants with
   multiple functional annotations. We did not exclude synonymous rare
   variants from our analysis as not all of them are functionally
   silent^[101]24. At the whole exome level, we found no significant
   difference in the number of rare coding variants between centenarians
   and controls (P = 0.243, logistic regression including gender and the
   top 10 multidimensional scaling (MDS) components as covariates).
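
   The two filtering thresholds above (minor allele frequency < 0.01 and
   missing rate < 0.1) can be illustrated with a minimal sketch. The
   `Variant` record, its field names, and the toy values are hypothetical
   and are not taken from the study pipeline; only the thresholds, the
   cohort size (515 + 496 = 1,011), and the variant counts come from the
   text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary; field names are illustrative only."""
    variant_id: str
    minor_allele_count: int  # minor alleles among called genotypes
    called_genotypes: int    # samples with a non-missing genotype
    total_samples: int       # all samples in the case-control cohort


def is_rare_and_well_called(v, maf_max=0.01, missing_max=0.1):
    """Apply the two thresholds quoted in the text (MAF < 0.01, missing < 0.1)."""
    if v.called_genotypes == 0:
        return False
    missing_rate = 1 - v.called_genotypes / v.total_samples
    # Diploid genomes: two alleles per called sample.
    maf = v.minor_allele_count / (2 * v.called_genotypes)
    return maf < maf_max and missing_rate < missing_max


# Toy records (made-up values); 1,011 = 515 centenarians + 496 controls.
variants = [
    Variant("toy_rare", 5, 1000, 1011),
    Variant("toy_common", 60, 1000, 1011),
    Variant("toy_missing", 3, 800, 1011),
]
kept = [v.variant_id for v in variants if is_rare_and_well_called(v)]

# Counts quoted in the text; the sums are plain consistency checks.
n_snps, n_indels, n_synonymous = 126_405, 3_892, 45_493
total_rare = n_snps + n_indels                                  # 130,297
non_synonymous_plus_indels = n_snps - n_synonymous + n_indels   # 84,804
```

   In this toy input only `toy_rare` passes both filters, and the two
   sums agree with the totals reported above.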

   We next examined rare variant association with longevity at the variant
   or the gene level. At the variant level, we applied the “Firth logistic
   regression for rare variant association tests” to examine association
   between the minor allele count of each rare coding variant and the
   longevity status. The variant with the strongest association signal was
   rs2229426 in FASN (fatty acid synthase) (P = 6.23E-05)
   ([102]Supplementary Table S2). At the gene level, we applied two
   complementary region-based association tests^[103]25 – the “burden test
   of rare variants” and Sequence Kernel Association Test (SKAT) – to
   examine the association between the aggregate effect of rare coding
   variants in each gene and longevity. The burden test searches for a
   significant excess of rare alleles in longevity cases or controls,
   while SKAT implements a variance component test to detect effects of
   variants on longevity even if they act in opposite directions. CLCN6
   (chloride voltage-gated channel 6) showed the strongest gene-level
   association with longevity (burden test; P = 3.45E-06 and 1.03E-05
   as the lowest and the combined P-values, respectively)
   ([104]Supplementary Table S3). Although these top associations at the
   variant or the gene level did not reach genome-wide significance after
   multiple-test correction, quantile-quantile (QQ) plots of association
   signals showed upward deviation in the tails – the lowest P-values were
   smaller than expected under a uniform distribution on (0, 1) – for several
   groups of rare variants such as functional rare variants (CADD^[105]26
   score ≥ 20) and functional but recessive benign rare variants (CADD
   score ≥ 20 and PrimateAI^[106]27 score < 0.5) (see the variant masking
   in [107]Methods for variant groups and their interpretation)
   ([108]Figures 1A and [109]1B, [110]Supplementary Figures S1 and
   [111]S2, and [112]Supplementary Table S4). These rare variants include
   several genes known to be related to aging such as FASN^[113]28 and the
   DNA repair gene BLM RecQ like helicase (BLM)^[114]29.
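
   The contrast between the burden test and SKAT described above can be
   sketched with unadjusted score statistics. This is a simplified
   illustration under stated assumptions, not the procedure used in the
   study: covariates are omitted, all variant weights are set to 1, no
   P-values are derived, and the function names and toy genotypes are
   hypothetical.

```python
def variant_scores(genotypes, phenotype):
    """Per-variant score S_j = sum_i g_ij * (y_i - mean(y)).

    genotypes: list of rows (one per individual) of minor allele counts.
    phenotype: list of 0/1 labels (1 = centenarian, 0 = control).
    """
    mean_y = sum(phenotype) / len(phenotype)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * (y - mean_y) for row, y in zip(genotypes, phenotype))
        for j in range(n_variants)
    ]


def burden_statistic(genotypes, phenotype):
    """Collapse variants first, then square: (sum_j S_j)^2 with unit weights."""
    return sum(variant_scores(genotypes, phenotype)) ** 2


def skat_statistic(genotypes, phenotype):
    """Square each variant first, then sum: sum_j S_j^2 with unit weights."""
    return sum(s ** 2 for s in variant_scores(genotypes, phenotype))


# Toy gene with two variants acting in opposite directions (made-up data):
# variant 0 is carried only by cases, variant 1 only by controls.
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = [
    [1, 0], [1, 0], [0, 0], [0, 0],
    [0, 1], [0, 1], [0, 0], [0, 0],
]
burden = burden_statistic(genotypes, phenotype)  # opposite effects cancel
skat = skat_statistic(genotypes, phenotype)      # opposite effects add up
```

   In this toy gene the burden statistic is 0 while the SKAT statistic is
   2, which is the situation in which a variance-component test retains
   signal that a collapsing test loses.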

Figure 1. Longevity association of rare variants.

   (A) The QQ plots for single rare variant association tests. P-values
   from tests of 2,787 functional rare variants (CADD score ≥ 20) and
   3,127 synonymous rare variants were used to construct two separate QQ
   plots. Only rare variants with a minor allele count ≥ 15 in the
   case-control cohort were included in the plots. (B) The QQ plot for
   gene-based rare variant association tests. SKAT P-values from tests of
   functional but recessive benign rare variants (CADD score ≥ 20 and
   PrimateAI scores < 0.5) in 3,717 genes were used to construct the QQ
   plot. Only genes with two or more masked rare variants in the
   case-control study cohort were included in the QQ plot. (QQ plots under
   different rare-variant masks are in [116]Supplementary Figure S2.) (C)
   Pathway enrichment analysis of genes implicated by rare variants
   aggregated in a gene functional network. Top 100 IGSP-scored genes
   showing the trend of network aggregation were analyzed. Top 10
   non-redundant (covering unique putative longevity genes) enriched
   pathways are shown. For a gene in a pathway in the heatmap, the color
   of its cell indicates a weighted burden of rare variants in
   centenarians (deep pink) or controls (blue) (see [117]Supplementary
   Table S5). Genes in the heatmap were ordered based on their
   hierarchical clustering. (D) Gene-set rare variant association for
   aging-related pathways. P* denotes P-value corrected for 6 categories
   of rare variants using the minimal-P value test from Flannick et
   al^[118]52 ([119]Methods). The text for the significant association
   denotes the lowest nominal P-value among different categories of rare
   variants and FDR.
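
   The QQ plots in panels A and B compare observed P-values with those
   expected under a uniform null distribution. A minimal sketch of how
   such points can be computed is given below; the plotting position
   i/(n+1) is one common convention and is an assumption here, as are the
   function name and the toy P-values.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale.

    P-values are sorted in ascending order; the i-th smallest (1-based) is
    paired with the expected uniform quantile i / (n + 1). The first pair
    therefore corresponds to the most significant test.
    """
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    n = len(p_values)
    ordered = sorted(p_values)
    return [
        (-math.log10(i / (n + 1)), -math.log10(p))
        for i, p in enumerate(ordered, start=1)
    ]


# Toy input (made-up values): the smallest P-value lies above the diagonal.
points = qq_points([0.5, 0.01, 0.2])
top_expected, top_observed = points[0]
```

   An observed value larger than its expected value in the upper tail, as
   for the first point here, is what is meant by upward deviation.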

   The extreme rarity of centenarians in human populations essentially
   constrains the possibility of performing the large studies necessary to
   discover rare variants through statistically significant genetic
   associations with longevity. Instead of a candidate gene approach, we
   used Integrated Gene Signal Processing (IGSP) to prioritize genes based
   on the longevity association of rare variants in an unbiased manner
   through data integration^[120]30. About 94% of human protein-coding
   genes were in the functional linkage network used by IGSP, and only
   half of them also had knockout phenotype data for their mouse homologs.
   To include most genes in our analysis, we opted for a network-only
   integration rather than a full integration, which requires both gene
   network and mouse phenotype data. Individual genes were scored by jointly analyzing
   the longevity association of genes implicated by rare variants in a
   gene functional linkage network (predicted based on independent genomic
   high-throughput data)^[121]31, which implicitly incorporates
   information of gene-gene functional similarity. Data simulation showed
   that such integrated scoring greatly increases the prioritization power
   and effectively uncovers risk genes with marginal association
   signals^[122]30. The negative-control evaluation showed that ~100 top
   ranked genes had higher IGSP scores when scored by real data than by
   randomized data (P = 0.037) ([123]Supplementary Figure S3 and
   [124]Table S5), which suggests a clustering of longevity-associated
   genes implicated by rare variants in the gene network captured by a
   network integration in gene scoring^[125]30. Subsequent pathway
   enrichment analysis showed that these predicted longevity genes are
   significantly enriched in insulin signaling (FDR = 0.00879) and mTOR
   signaling (FDR = 0.0129) ([126]Figure 1C and [127]Supplementary Figure
   S4). Some predicted longevity genes have an indirect connection to
   insulin signaling as they are in the pathway of signaling by the
   insulin receptor (e.g., PSMB9). Interestingly, many of the putative
   longevity genes carry a burden of rare variants in centenarians, among
   which potential protective rare variants were also found in previous
   studies such as ABCA1^[128]32 and PLCG2^[129]33 ([130]Supplementary
   Table S5).

   To further increase power, longevity association of rare coding
   variants can be studied at the pathway level. Since aging is
   characterized by evolutionarily conserved, parallel and interacting
   mechanistic hallmarks, we next analyzed rare variants collectively in
   20 pathways of all nine aging hallmarks^[131]1 ([132]Supplementary
   Table S6). Functional but recessively benign rare variants in insulin
   signaling (SKAT, P = 5.57E-05, FDR = 0.012) and AMPK signaling (SKAT, P
   = 1.59E-04, FDR = 0.017) pathways were found to be significantly
   associated with extreme longevity ([133]Figure 1D, [134]Supplementary
   Tables S7 and [135]S8) after multiple testing correction that took into
   account the total numbers of pathways, tests, and variant masks.
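
   As an illustration of false discovery rate control across many
   pathway-level tests, the following sketch implements the
   Benjamini-Hochberg adjustment. The text does not state which FDR
   procedure was used, so the choice of Benjamini-Hochberg is an
   assumption, and the input P-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P-values (hypothetical, not from the study).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_p)
```

   In an analysis like the one above, the list would contain one P-value
   per combination of pathway, test, and variant mask.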

   When studying genetic variants in association studies, it is important
   to validate the results by replicating any observed association signals
   in unrelated cohorts. Our approach followed the sequence-based
   replication strategy, which is more powerful than the variant-based
   strategy that only analyzes rare variants uncovered in the discovery
   cohort^[136]34. Specifically, we examined three replication cohorts for
   longevity association of rare coding variants in insulin and AMPK
   signaling pathways ([137]Supplementary Table S9): a German longevity
   cohort of 1,265 centenarians (mean age: 99 years) and 4,195 blood
   donors (mean age: 35 years) as controls, a UK Biobank longevity cohort
   of 104 participants with at least one long-lived parent (lifespan ≥ 100
   years) and 23,405 participants with parents of usual survival (lifespan
   < 95 years), and an Alzheimer's Disease Sequencing Project (ADSP)
   longevity cohort of 1,121 non-AD individuals aged ≥ 90 years and 38
   non-AD individuals aged < 75 years^[138]35.

   In the German longevity cohort, we detected a significant longevity
   association of functional but recessive benign ultra-rare variants (AAF
   < 0.05% among non-Finnish European in gnomAD^[139]36) in insulin
   signaling (SKAT, P = 4.41E-04, FDR = 0.018) after appropriate multiple
   testing correction ([140]Extended Data Figure 1A). In the UK Biobank
   longevity cohort, we identified significant longevity associations of
   functional rare variants in insulin signaling (SKAT, P = 9.64E-06, FDR
   = 3.87E-04) and functional but recessive benign ultra-rare variants in
   AMPK signaling (SKAT, P = 2.08E-03, FDR = 0.041) pathways
   ([141]Extended Data Figure 2A). In the ADSP longevity cohort, we
   identified significant longevity associations of recessive pathogenic
   rare variants in insulin signaling (burden test, P = 8.98E-05, FDR =
   3.6E-03; enriched in controls) ([142]Extended Data Figure 3).

   Next, we focused on identifying rare variants associated with human
   age-related disease. A genetic relationship between extreme human
   longevity and disease is supported by multiple observations in
   independent studies of a genetic association between extreme human
   longevity and APOE, a locus causally related to both cardiovascular and
   neurodegenerative disease^[143]37. Here we hypothesized that rare
   genetic variants associated with human longevity can exert their
   beneficial effects, at least in part, by protecting against chronic
   disease. Hence, we examined the rare coding variants in the 20 aging
   hallmark pathways in more refined subgroups of our cohort based on
   their APOE haplotype status and analyzed separately the longevity
   sub-cohorts of APOE4 carriers and non-carriers (hereinafter APOE4+ and
   APOE4−, respectively) to identify longevity-associated pathways in
   these two distinct genetic backgrounds. Among APOE4−, functional but
   recessively benign rare variants in both insulin and AMPK signaling
   pathways were again found significantly associated with longevity
   (SKAT, P = 6.21E-06 and 7.9E-05, FDR = 2.63E-03 and 0.013,
   respectively) ([144]Extended Data Figure 4, [145]Supplementary Tables
   S10 and [146]S11). Interestingly, among APOE4+, we detected a
   significant association between longevity and 152 functional rare
   variants in WNT signaling genes after multiple testing correction using
   both the burden test and SKAT (the burden test, P = 9.16E-05, FDR =
   0.013; SKAT, P = 3.40E-04, FDR = 0.036) ([147]Extended Data Figure 4,
   [148]Supplementary Tables S12 and [149]S13). The direction of
   association suggests that these rare variants are enriched for
   protective variants among centenarian APOE4+ in our cohort
   ([150]Supplementary Table S14). Indeed, only six of them were predicted
   as highly pathogenic rare variants (PrimateAI score ≥ 0.9), and they
   are not enriched, individually or collectively, among APOE4+
   centenarians. The WNT association was replicated in the UK Biobank
   longevity cohort with a significant longevity association of functional
   rare variants in WNT signaling pathway among APOE4+ (SKAT, P =
   1.79E-10, FDR = 2.14E-08) ([151]Extended Data Figure 2B and
   [152]Supplementary Table S9). We did not detect significant longevity
   association signals from rare variants in WNT signaling pathway in
   either of the APOE4-stratified German longevity sub-cohorts ([153]Extended
   Data Figure 1B).

   We examined further the protective effect of functional rare variants
   in WNT signaling genes on individual human lifespan in our lifespan
   cohort of 553 individuals with verifiable ages at death. Starting with
   the full linear model of lifespan that included gender, APOE4 status,
   the alternative allele count of protective rare variants in WNT
   signaling, and all two-way and three-way interaction terms among them,
   we identified a single statistically significant interaction, that
   between APOE4 status and the allele count in WNT signaling (P =
   1.13E-04) ([154]Supplementary Table S15). As the APOE4 status is
   determined mainly by rs429358, a common variant (MAF = 0.14) associated
   with aging and age-related diseases, the lifespan analysis result
   indicates the existence of epistasis between rare variants and
   aging-associated common variants in the genetic architecture of human
   aging. We then analyzed the relationship between individual lifespan
   and the alternative allele count of protective rare variants in WNT
   signaling in sub-cohorts stratified by the status of both longevity and
   APOE4 ([155]Figure 2A). Among centenarians, the allele count in WNT
   signaling has no effect on the lifespan regardless of the APOE4 status.
   Among non-centenarians, there was a significant positive correlation
   between the burden of WNT rare variants and lifespan among APOE4+ (r =
   0.406, P = 8.39E-03, FDR = 0.026) ([156]Figure 2A, the middle blue
   panel), compared with non-carriers. The relationship between APOE4
   status and the allele count in WNT signaling can also be more readily
   appreciated by comparing the average lifespan of sub-cohorts stratified
   based on both APOE and WNT signaling. Among APOE4+, the median
   difference in lifespan was over nine years (P = 2.38E-03) between
   individuals with low or high allele counts in WNT signaling
   ([157]Figure 2B). And the negative effect of APOE4 on lifespan became
   weaker among individuals with a high burden of potentially protective
   WNT rare variants ([158]Figure 2C). Interestingly, the aforementioned
   152 rare variants in WNT signaling genes are associated with the
   disease status of individuals in the ADSP (SKAT, P = 4.82E-03).
   Finally, using the same framework, we analyzed lifespans of
   centenarians and non-centenarians separately and demonstrated a similar
   protective effect of the 152 rare variants in WNT signaling genes
   ([159]Supplementary Table S14) on lifespan among non-centenarian APOE4+
   ([160]Supplementary Table S9, [161]Extended Data Figures 5 and [162]6).

Figure 2. Protective rare variants in WNT signaling genes for APOE4+.

   P denotes uncorrected P-value derived from linear regression with the
   log-transformed age at death as the outcome and the gender as a
   covariate (See [164]Methods). 'WNT low' and 'WNT high' represent the
   alternative allele count of rare variants in WNT signaling genes ≤ 1
   and > 1 (the median), respectively. In parentheses are the numbers of
   individuals. MD stands for 'median difference'. (A) Correlation between
   lifespan and the alternative allele count of protective rare variants
   in WNT signaling genes. (B) The lifespan difference of individuals
   carrying a high and low burden of protective rare variants in WNT
   signaling genes. The horizontal lines and vertical thick lines in
   violin plots represent median and interquartile range, respectively.
   (C) Negative effects of APOE4 on lifespan compensated by protective
   rare variants in WNT signaling genes.
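
   The 'WNT low' versus 'WNT high' comparison in panel B can be sketched
   as a simple stratified median difference. The cutoff (allele count ≤ 1
   versus > 1) follows the caption; the function name and all records
   below are hypothetical, and the regression-based P-value described in
   the caption is not reproduced.

```python
from statistics import median


def median_lifespan_difference(records, threshold=1):
    """Median age at death of 'WNT high' minus that of 'WNT low'.

    records: iterable of (allele_count, age_at_death) pairs.
    'WNT low' is allele_count <= threshold, 'WNT high' is > threshold.
    """
    low = [age for count, age in records if count <= threshold]
    high = [age for count, age in records if count > threshold]
    if not low or not high:
        raise ValueError("both groups must contain at least one individual")
    return median(high) - median(low)


# Toy APOE4+ sub-cohort (made-up allele counts and ages at death).
toy_records = [
    (0, 80.0), (1, 82.0), (1, 78.0),
    (2, 90.0), (3, 88.0), (4, 92.0),
]
toy_md = median_lifespan_difference(toy_records)
```

   A positive value means that individuals in the high-count group lived
   longer in the median than those in the low-count group.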

Longevity and common polygenic risk of age-related diseases

   The phenotypic outcome of individuals that carry rare variants of large
   effects can also be influenced by the background of common polygenic
   variation. To assess how rare variants may interact with the genetic
   background of common variants to affect human aging, we specifically
   examined in our longevity cohort common variants associated with seven
   age-related diseases: Alzheimer's disease, coronary artery disease,
   type 2 diabetes, stroke, breast cancer, prostate cancer, and pancreatic
   cancer. This analyzed cohort consists of 479 centenarians and 431
   controls with both WES and SNP array data available ([165]Table 1 and
   [166]Supplementary Table S1). We calculated polygenic risk scores (PRS)
   of individuals for these diseases using summary statistics from their
   corresponding GWAS (See [167]Methods). Empirical P-values provided by
   PRSice2 that account for over-fitting indicated significant genetic
   overlap between longevity and each of Alzheimer's disease, coronary
   artery disease, and type 2 diabetes ([168]Figure 3, [169]Extended Data
   Figure 7, and [170]Table S16), which was further supported by the
   significant results of cross-validation ([171]Supplementary Table S17).
   PRS for Alzheimer's disease, coronary artery disease, and type 2
   diabetes explained 1.93% (P = 0.0019), 1.32% (P = 0.013) and 1.29% (P =
   0.015) variance of the longevity status, respectively. Measured by PRS,
   centenarians tend to have reduced genetic susceptibility to not only
   Alzheimer's disease and coronary artery disease (Bonferroni-Holm P* =
   0.0067 and 0.039, respectively), which were previously found associated
   with healthy aging^[172]38 but also type 2 diabetes (P* = 0.039). The
   predictive power of PRS for Alzheimer's disease was mainly driven by
   the APOE haplotype defined by SNPs rs7412 and rs429358 ([173]Figure 3B
   and [174]3C). This was not the case for coronary artery disease
   ([175]Extended Data Figure 7B)^[176]39. To further examine the genetic
   overlap between longevity and diseases, we applied an
   'extreme-longevity phenotyping' strategy and found that the variance
   explained by PRS for Alzheimer’s disease and coronary artery disease
   increased almost fourfold, to 4–7%, when comparing people aged ≥ 100 years
   with those aged < 80 years ([177]Figure 3, [178]Supplementary Figure S5, and
   [179]Table S16). PRS for type 2 diabetes showed a stronger association
   with the longevity status among males than females in our cohort. PRS
   for Alzheimer's disease and coronary artery disease, however, showed no
   such gender difference ([180]Supplementary Figure S6).
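
   At its core, a polygenic risk score is a weighted sum of risk-allele
   dosages, with weights taken from GWAS summary statistics. The sketch
   below shows only this core calculation under simplifying assumptions;
   it does not reproduce PRSice-2, which additionally performs steps such
   as clumping and P-value thresholding. The function name, SNP
   identifiers, and values are hypothetical.

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Weighted sum of allele dosages for one individual.

    effect_sizes: dict mapping SNP id -> per-allele effect size (e.g. log
                  odds ratio) from a disease GWAS.
    dosages:      dict mapping SNP id -> effect-allele count (0, 1 or 2)
                  for the individual.
    SNPs without a genotype for the individual are skipped.
    """
    return sum(
        beta * dosages[snp]
        for snp, beta in effect_sizes.items()
        if snp in dosages
    )


# Toy summary statistics and one toy individual (made-up values).
toy_effects = {"snp_a": 0.2, "snp_b": -0.1, "snp_c": 0.05}
toy_person = {"snp_a": 2, "snp_b": 1}  # snp_c is not genotyped
toy_score = polygenic_risk_score(toy_effects, toy_person)  # 0.2*2 - 0.1*1
```

   Scores computed this way for every individual can then be used as a
   predictor of centenarian status in a logistic regression.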

Figure 3. Common polygenic risk of age-related diseases.

   (A) Common polygenic risk for seven age-related diseases was
   calculated for each subject using the PRS of the corresponding disease.
   Nagelkerke's R^2 is based on correlation between the disease PRS and
   the centenarian status. The bar color denotes the statistical
   significance of R^2 after adjusting for MDS1-10 and gender as
   covariates (gender was omitted for breast cancer and prostate cancer,
   which were tested in females and males, respectively). The statistical
   significance is based on the permutation P-values from PRSice-2. For
   Alzheimer's disease and coronary artery disease, the middle bars show
   the results of PRS analyses excluding SNPs within 1 Mbp of the APOE
   haplotype SNPs – rs429358
   and rs7412. The bottom bars (for Alzheimer's disease, coronary artery
   disease, breast cancer and prostate cancer) show the results of PRS
   analyses using extreme-longevity phenotypes (cases and controls with
   ages ≥ 100 years and < 80 years, respectively. See [182]Supplementary
   Table S16). (B-D) PRS analyses of Alzheimer's disease as it is,
   excluding SNPs within 1 Mbp of rs7412 or rs429358, or using
   extreme-longevity phenotypes. In the boxplots, points represent
   individuals, and horizontal lines represent upper fence (maximum in
   Q3+1.5×IQR), upper quartile (Q3), median, lower quartile (Q1), lower
   fence (minimum in Q1–1.5×IQR), sequentially from top to bottom; IQR:
   interquartile range (25th to the 75th percentile). n = 910 biologically
   independent samples. Above the boxplot on the right are raw and
   adjusted (in parentheses) P-values for the best prediction in the
   Nagelkerke's R^2 plot on the left, which were calculated based on
   logistic regression and the permutation test in PRSice2, respectively.

Pathogenic rare variants and longevity

   Since the genetic component of extreme longevity could be explained, at
   least in part, by a reduced burden of pathogenic variants as compared
   with that of the general population, we compared the counts of
   predicted pathogenic rare coding variants (PrimateAI score ≥ 0.9). No
   significant difference between centenarians and controls (P = 0.243,
   logistic regression including gender and the top 10 MDS components as
   covariates; [183]Figure 4A) was observed. Using our lifespan cohort, we
   next investigated whether pathogenic rare coding variants affect
   lifespan, whether the effect depends on the common polygenic disease
   background, and whether the effect is different between centenarians
   and controls. Consistent with general observations, females had
   significantly better survivorship than males in our lifespan cohort (P
   = 1.71E-07, [184]Extended Data Figure 8), as did APOE4− individuals
   compared with APOE4+ individuals (P = 9.32E-04, [185]Extended Data
   Figure 9). In our lifespan cohort, 853
   pathogenic rare variants were identified. No correlation between the
   exome-wide burden of pathogenic rare variants and the lifespan was
   observed among centenarians and non-centenarians together (the full
   lifespan cohort) or either of them separately ([186]Figure 4B).

Figure 4. Analysis of pathogenic rare variants.

   (A) Exome-wide burden of pathogenic rare variants in centenarians and
   controls. (B) Correlation between lifespan and the exome-wide burden of
   pathogenic rare variants. The left panel shows the result based on all
   553 individuals. The middle and the right panels show the results based
   on individuals with lifespan ≥ 95 years and < 95 years, respectively.
   (C) Correlation between lifespan and the exome-wide burden of
   pathogenic rare variants among individuals with high genetic risk of
   age-related diseases. The left panel shows the correlation among 94
   APOE4+. The right panel shows the correlation among 20 APOE4+ with PRS
   among top 45% for CAD and T2D (see [188]Supplementary Table S18 for the
   results of using other cutoffs).

   Human extreme longevity could be causally driven by a lack of genetic
   risk factors for chronic disease, by protective variants or both.
   Measured by the polygenic risk score (PRS), centenarians in our cohort
   tend to have reduced genetic susceptibility to Alzheimer's disease
   (AD), coronary artery disease (CAD), and type 2 diabetes (T2D) among
   seven age-related diseases that we examined ([189]Figure 3 and
   [190]Supplementary Table S16). Using the APOE4 status and PRS of CAD
   and T2D, we stratified our lifespan cohort according to their common
   genetic risk of AD, CAD, and T2D – the three age-related diseases with
   significant genetic overlap with longevity in our cohort – and examined
   how the common polygenic disease risk background and pathogenic rare
   variants may together affect human lifespan. We first re-examined the
   effect of pathogenic rare variants on lifespan on an AD risk background
   based on the APOE4 status and found a weak negative correlation (r =
   −0.184, P = 0.064) among APOE4+. However, this relationship became
   significantly stronger (r = −0.605, P = 2.85E-03, FDR = 7.13E-03) if
   substantial genetic risk of both CAD and T2D was also present (i.e.,
   APOE4+ with PRS for both diseases higher than the respective median of
   the longevity cohort) ([191]Figure 4C and [192]Supplementary Table
   S18). These results suggest that pathogenic rare variants and
   disease-associated common variants interact. Such genetic interactions
   may affect the deleterious effect of pathogenic rare variants on human
   lifespan, a possibility that we formally investigated using a full
   linear model of lifespan including gender, APOE4 status, separate PRS
   of CAD and T2D, the pathogenic rare variant counts, and all two-way and
   higher-order interaction terms among them. The subsequent stepwise
   model selection identified multiple interactions, among which the most
   significant is a three-way interaction among the pathogenic rare
   variant count and the common polygenic disease risk of AD and T2D in
   our lifespan cohort (P = 3.12E-04) ([193]Supplementary Table S15). Our
   analyses of stratified sub-cohorts showed that the negative effect of
   common polygenic disease risk on human lifespan can intensify under a
   high burden of pathogenic rare variants. For example, the presence of
   APOE4 reduced lifespan by ~1.5 years on average in our cohort. However,
   among individuals with ≥ 7 pathogenic rare variants (the median = 3),
   APOE4+ lived ~17 years less than non-carriers in general (P = 2.77E-04;
   FDR = 8.31E-04) ([194]Supplementary Figure S7). To replicate this
   discovery of the relationship between pathogenic rare variants and
   lifespan, we first constructed a UK Biobank parental lifespan cohort
   ([195]Methods), which consists of 20,823 unrelated (to the first-degree
   kinship) participants with known parental ages at death, and then
   examined the relationship between the exome-wide burden of pathogenic
   rare coding variants and the parental lifespan in this cohort
   ([196]Supplementary Figure S8). We observed a negative correlation
   between the two among APOE4+ participants (r = −0.024, P = 0.044)
   ([197]Supplementary Figure S9 and [198]Supplementary Table S9). The
   stepwise model selection procedure
   identified a significant interaction related to parental lifespan
   between the APOE4 status and the exome-wide burden of pathogenic rare
   coding variants (P = 5.48E-05).
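
   To give a sense of the model space searched by the stepwise selection,
   the sketch below enumerates every two-way and higher-order interaction
   term among the five predictors named above. The predictor labels are
   hypothetical names chosen for illustration, and the snippet only lists
   candidate terms; it does not fit the model or perform the selection.

```python
from itertools import combinations

# Hypothetical labels for the five predictors named in the text.
PREDICTORS = ["gender", "APOE4", "PRS_CAD", "PRS_T2D", "pathogenic_count"]


def interaction_terms(predictors):
    """Return every interaction term of order >= 2 as a tuple of names."""
    terms = []
    for order in range(2, len(predictors) + 1):
        terms.extend(combinations(predictors, order))
    return terms


terms = interaction_terms(PREDICTORS)

# Count candidate terms per interaction order (2-way, 3-way, ...).
by_order = {}
for term in terms:
    by_order[len(term)] = by_order.get(len(term), 0) + 1

print(len(terms))
print(by_order)
```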

DISCUSSION

   In summary, in this first large-scale genetic study of rare coding
   variants and human longevity, our network-integrated analysis
   identified an enrichment of longevity-associated rare coding variants
   in conserved aging pathways and gene-set association tests confirmed
   longevity association of rare variants in insulin and AMPK signaling
   pathways. These results suggest that rare variants in conserved aging
   pathways important for aging of model organisms also affect human
   lifespan and constitute a part of the genetic architecture of human
   longevity. As expected, based on the many species-specific
   characteristics of aging, the pattern is not completely identical
   between human and animal longevity. For example, we did not find any
   association of extreme longevity with variants in the mTOR pathway,
   which has been associated with longevity in model organisms, including
   the mouse. On the other hand, we did find other pathways critical to
   human aging not yet identified in model organisms. For example, we
   demonstrated protective effects of rare variants in WNT signaling on
   human lifespan. Interestingly, in the klotho-knockout mouse model of
   accelerated aging, continuous WNT exposure triggered accelerated
   cellular senescence, implicating WNT signaling in mammalian
   aging^[199]40. Finally, our results confirm previous reports that
   centenarians do not have a lower burden of pathogenic variants.
   Instead, from our present study, it appears that rare protective
   variants suppress the adverse effects of pathogenic variants on
   longevity.

   To investigate whether the same conserved pathways are important to
   aging of both model organisms and humans, we can examine the effects of
   rare variants on lifespan-related traits. This is particularly
   challenging, however, due to a strong intrinsic stochasticity in aging
   processes: among isogenic C. elegans in a constant environment,
   lifespan of long-lived (age-1) mutants overlaps with that of the
   wildtype controls^[200]41,[201]42. Thus, the same genetic variants may
   have highly variable effects on lifespan among different individuals.
   While this stochasticity complicates the identification of
   longevity-associated variants in conserved aging pathways, using
   appropriate statistical tests and study cohorts can help overcome the
   challenge.

   In this study, we identified rare coding variants in aging pathways
   that affect human longevity. Future studies of their molecular
   functions could generate actionable biological insights on aging. In
   particular, uncovering the downstream pathways that mediate protective
   effects of rare variants found in WNT signaling genes is imperative to
   translate this finding into therapeutic interventions against
   age-related diseases. Experiments with mouse cells suggest that APOE4
   may inhibit WNT signaling^[202]43. Dysregulation of WNT signaling
   contributes to different types of age-related diseases such as
   cancer^[203]44, AD^[204]45, and cardiovascular disease^[205]46. Thus,
   protective rare variants in WNT signaling genes could counteract the
   adverse effects of APOE4-induced WNT inhibition on the progression of
   downstream age-related diseases and thus affect lifespan. While coding
   variants are more likely to reduce than to enhance the function of the
   protein product, conclusive confirmation and understanding of the
   functional effects at the molecular, cellular, and organismal levels
   require experimental validation using functional assays and
   genome-editing^[206]16.

   Our study suggests that rare variants can have distinct effects on
   lifespan on different genetic backgrounds of age-related diseases (such
   as the APOE4 status), underscoring the difficulty of detecting and
   replicating effects of rare variants on lifespan without considering
   other genetic
   factors. On the other hand, while common variants associated with
   age-related diseases are known to influence lifespan^[207]47, our
   finding of potential genetic interactions between common and rare
   variants in the context of human lifespan provides novel insights into
   the mechanism of disease resilience as a part of the genetics of
   healthy aging among centenarians. How perturbation of conserved aging
   pathways contributes to human longevity and healthspan cannot be
   answered by genetics alone. However, while the molecular mechanisms of
   many conserved aging pathways have been widely studied in model
   organisms, our findings about rare variants – especially those from
   centenarians with high common polygenic risk of age-related diseases –
   can help translate those established longevity-regulating mechanisms in
   model organisms to therapeutic targets for healthy aging of humans.

   A limitation of whole-exome sequencing, used in our present study, is
   the absence of rare, non-coding variants that have been implicated in
   aging of model organisms^[208]48-[209]50 and thus of potential interest
   for human longevity. These include, for example, rare variants in
   non-coding RNAs or other regulatory elements relevant for tissue
   specificities and variants in long tandem repeats connected to brain
   health and various neurological disorders. To identify the latter, long
   sequencing reads are required^[210]51.

METHODS

   Our WES study on the Einstein longevity cohorts complies with all
   relevant ethical regulations and was approved by the Institutional
   Review Board at Albert Einstein College of Medicine. Informed consent
   was obtained from participants or from a proxy if the participant
   lacked decisional capacity. The WES studies of all three replication
   cohorts obtained informed consent from participants and were approved
   by the respective ethics committees or institutions: the Ethics
   Committee at Medical Faculty of Kiel University for the German
   longevity cohort; the Ethics Advisory Committee and the external ethics
   committees for the UK Biobank; and the ethics committees of the Broad
   Institute, Baylor College of Medicine’s Human Genome Sequencing Center,
   and Washington University’s McDonnell Genome Institute for the ADSP
   cohort.

Recruitment of Einstein longevity cohorts

   The study subjects were Ashkenazi Jewish participants from two
   longevity cohorts, the Longevity Genes Project (LGP) and the LonGenity
   study, who have been recruited and characterized at the Albert Einstein
   College of Medicine since 1998. Cases (centenarians) were defined as
   individuals with age ≥ 95 years, and individuals with age < 95 without
   a parental history of longevity (neither parent survived beyond 95
   years of age) were classified as controls. The centenarians' dates of
   birth were confirmed by birth certificates or government-issued
   identification. Vital status and date of death, where applicable, were
   determined as of April 3, 2019, based on documentation of last contact
   with the study participant, reports from the next of kin, and search of
   publicly available databases. In the LGP and LonGenity cohorts, 555 and
   508 individuals were classified as longevity cases (mean age: 101) and
   controls (mean age: 83), respectively. Mortality status was confirmed
   for 650 individuals, and these individuals were subjects for the
   lifespan analysis.

SNP-array genotyping

   SNP-array genotyping was performed using Illumina Global Screening
   Array-24 v1.0 BeadChip with 642,824 markers, 7,201 of which could not be
   'lifted over' to human genome assembly GRCh38 and were thus removed. In
   total, 2,026 samples were genotyped by SNP-arrays. After removing
   duplicates and
   samples not in our longevity studies, 635,623 variants in 1,830 samples
   were processed and analyzed (1,740 samples also have WES data). Quality
   control of array-based genotyped data was carried out using PLINK
   software (version 1.9)^[211]53. First, we checked the missing rate of
   SNPs and samples. SNPs and samples missing over 20% of genotype calls
   were removed, and this missingness filtering was repeated with a more
   stringent threshold of 2%. Individuals whose self-reported gender
   differed from the one predicted based on sex chromosome heterozygosity
   were removed. SNPs whose genotype frequencies deviated from
   Hardy-Weinberg equilibrium with a χ^2-test P < 1E-6 among controls,
   followed by P < 1E-10 among cases, were removed. Finally, samples whose
   heterozygosity deviated more than three standard deviations from the
   mean were removed.
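
   The two-stage missingness filter can be sketched as follows. This is a
   toy illustration in plain Python, not the PLINK commands used in the
   study. The data layout, the function names, and the choice to filter
   SNPs before samples within each pass are assumptions made for the
   example.

```python
def missing_rate(calls):
    """Fraction of genotype calls that are missing (None)."""
    if not calls:
        return 1.0
    return sum(c is None for c in calls) / len(calls)


def filter_missingness(genotypes, threshold):
    """One pass: drop SNPs, then samples, with missing rate > threshold.

    `genotypes` maps sample ID -> {SNP ID -> call}; a call is 0/1/2 or None.
    Assumption: SNPs are filtered before samples within a pass.
    """
    snps = list(next(iter(genotypes.values())))
    keep_snps = [
        snp for snp in snps
        if missing_rate([calls[snp] for calls in genotypes.values()]) <= threshold
    ]
    kept = {}
    for sample, calls in genotypes.items():
        subset = {snp: calls[snp] for snp in keep_snps}
        if missing_rate(list(subset.values())) <= threshold:
            kept[sample] = subset
    return kept


def make_calls(missing):
    """Toy sample with six SNPs (a-f); SNPs listed in `missing` have no call."""
    return {snp: (None if snp in missing else 0) for snp in "abcdef"}


toy = {
    "s1": make_calls("f"), "s2": make_calls("f"), "s3": make_calls("f"),
    "s4": make_calls("d"), "s5": make_calls("ab"),
}
after_pass1 = filter_missingness(toy, 0.20)          # lenient 20% pass
after_pass2 = filter_missingness(after_pass1, 0.02)  # stringent 2% pass
print(sorted(after_pass1), sorted(after_pass2["s1"]))
```

   In the toy data, the lenient pass drops SNP f and sample s5, and the
   stringent pass then drops SNP d, which is missing in one of the four
   remaining samples.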

Exome sequencing and genotyping

   Exome sequencing of 2,112 individuals in LGP and LonGenity cohorts was
   performed at the Regeneron Genetics Center (RGC). Sample preparation
   and whole-exome sequencing were performed using previously described
   methods^[212]54 ([213]Supplementary Note). Variants in our centenarian
   cohort were called on human genome assembly GRCh38. For our rare
   variant analyses using both binary (cases vs. controls) longevity and
   continuous lifespan data, only rare variants with missing rates < 0.1
   in the corresponding study cohorts were analyzed; all samples in our
   study cohorts have a missing rate < 0.01 on rare variants that passed
   the quality control ([214]Supplementary Note).

Aggregation of SNP-array and WES data

   For PRS-related analyses, we used genotypic data aggregated from WES
   and SNP-array ([215]Extended Data Figure 10A) for two reasons: (1)
   genotypes of common variants from the whole genome (not just the exome)
   need to be imputed (see the next sub-section) for PRS calculation; and
   (2) genome-wide imputation based on genotypic data from both WES (for
   better accuracy) and SNP-array (for better coverage) is better than
   imputation based on WES data alone. After the aggregation process
   ([216]Supplementary Note), ~1,203k variants were kept in the merged VCF
   file.

Genotype imputation

   We used the Michigan Imputation Server (Minimac3)^[217]55 for genotype
   imputation (n = 1,740). The Haplotype Reference Consortium (HRC, r1.1
   2016)^[218]56 was used as the reference panel, Eagle v2.3 for phasing,
   and the European population (EUR) for quality control. After the
   post-imputation process ([219]Supplementary Note and [220]Supplementary
   Figure S10), we obtained ~14,079k polymorphic variants in our cohort.
   We evaluated the suitability of the HRC reference panel for
   cross-ethnicity genotype imputation in our study, using 196 Ashkenazi
   Jewish individuals in our cohort for whom the whole-genome sequencing
   data are available. Genotype imputation that we performed was highly
   accurate: in 183 individuals (out of 196), genotypes of >99% of 2,020
   randomly selected non-coding variants that were not genotyped by either
   WES or SNP array data can be correctly imputed ([221]Supplementary
   Figure S11).
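
   The accuracy check described above amounts to a per-individual
   concordance between imputed and sequenced genotypes. A minimal sketch,
   assuming hard-called genotypes coded as 0/1/2 and using made-up toy
   data; the function names are hypothetical:

```python
def concordance(imputed, sequenced):
    """Fraction of variants whose imputed genotype matches the sequenced one."""
    if len(imputed) != len(sequenced):
        raise ValueError("genotype vectors must have the same length")
    return sum(i == s for i, s in zip(imputed, sequenced)) / len(sequenced)


def count_accurate(pairs, min_concordance=0.99):
    """Number of individuals whose concordance exceeds `min_concordance`."""
    return sum(
        concordance(imputed, sequenced) > min_concordance
        for imputed, sequenced in pairs.values()
    )


def with_errors(genotypes, n_errors):
    """Copy of `genotypes` with the first `n_errors` calls changed (toy data)."""
    return [(g + 1) % 3 if k < n_errors else g for k, g in enumerate(genotypes)]


# Toy data: 200 variants per individual, with 0, 1, and 5 imputation errors.
sequenced = [0, 1, 2, 1] * 50
pairs = {
    "ind1": (with_errors(sequenced, 0), sequenced),
    "ind2": (with_errors(sequenced, 1), sequenced),
    "ind3": (with_errors(sequenced, 5), sequenced),
}
n_accurate = count_accurate(pairs)
print(n_accurate, "of", len(pairs), "individuals exceed 99% concordance")
```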

Polygenic risk score analysis

   We calculated polygenic risk scores (PRSs) using
   PRSice-2^[222]57,[223]58 to analyze disease risk from common variants
   in our longevity cohort. We first collected summary statistics from the
   most recent GWAS of seven complex diseases of European or predominantly
   European ancestry: AD^[224]59, CAD^[225]60, T2D^[226]61,
   stroke^[227]62, prostate cancer^[228]63, breast cancer^[229]64, and
   pancreatic cancer^[230]65. From the combined genotype data after
   imputation for 1,740 samples, we selected common SNPs (MAF > 5%) in the
   cohort and carried out LD clumping of SNPs within 250 kb of each other
   with R^2 > 0.1.
   After clumping, we used 19 P-values (1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4,
   0.3, 0.2, 0.1, 0.01, 1E-3, 1E-4, 1E-5, 1E-6, 1E-7, 1E-8, 1E-9, and
   1E-10) as cutoffs to select SNPs for scoring and, for AD, additional
   ones to restrict selection to most AD-associated SNPs. After removing
   outliers based on Multidimensional Scaling (MDS) analysis
   ([231]Supplementary Figure S12), non-Ashkenazi Jewish individuals, and
   related individuals, 910 centenarians and controls among the 1,740
   samples were used to
   evaluate association between disease PRS and longevity in the cohort
   ([232]Extended Data Figure 10A). To remove population substructure and
   sex difference, we included the top 10 MDS components derived from
   common SNPs in the combined genotype dataset and gender as covariates
   in the regression analysis to evaluate PRS association. When analyzing
   PRS of prostate cancer and breast cancer, only male and only female
   individuals were considered, respectively.
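
   The P-value thresholding step can be illustrated with a simplified
   score: for each cutoff, sum the effect-size-weighted allele dosages over
   the clumped SNPs whose GWAS P-value passes the cutoff. This is a sketch
   only. The SNP identifiers, effect sizes, and dosages are invented, and
   the plain weighted sum is an assumption that may differ from the
   scoring PRSice-2 applies by default.

```python
# The 19 P-value cutoffs listed in the text.
CUTOFFS = [1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.01,
           1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9, 1e-10]


def polygenic_score(dosages, snps, p_cutoff):
    """Weighted dosage sum over SNPs with GWAS P <= p_cutoff.

    Returns (score, number of SNPs used).
    """
    selected = [s for s in snps if s["p"] <= p_cutoff]
    score = sum(s["beta"] * dosages[s["id"]] for s in selected)
    return score, len(selected)


# Hypothetical clumped summary statistics and one individual's dosages.
snps = [
    {"id": "snp1", "beta": 0.20, "p": 5e-9},
    {"id": "snp2", "beta": -0.10, "p": 3e-4},
    {"id": "snp3", "beta": 0.05, "p": 0.35},
]
dosages = {"snp1": 2, "snp2": 1, "snp3": 0}

for cutoff in CUTOFFS:
    score, n_used = polygenic_score(dosages, snps, cutoff)
    print(f"P <= {cutoff:g}: {n_used} SNPs, score = {score:.3f}")
```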

Rare variant association analysis

   Among 2,021 Ashkenazi Jewish individuals with WES data, 536 were
   centenarians, and 506 were controls. Pairs of individuals with the
   proportion of alleles shared identity-by-descent (IBD) > 0.4 were
   identified as related – i.e., monozygotic twins, parents and children,
   and full siblings – and one sample per pair was excluded, with the
   retained sample chosen to favor more cases, older cases, and younger
   controls. In our study cohort, we identified 31 participants as
   related to other participants due to high IBD. After excluding them, we
   had 515 cases (mean age: 101 years) and 496 controls (mean age: 83
   years) for rare variant association analysis ([233]Table 1 and
   [234]Extended Data Figure 10B). In this study, we analyzed rare
   variants with alternative allele frequencies < 1% in Ashkenazi Jewish
   populations. Each frequency was calculated as the average of two
   estimates: the allele frequency in 731 unrelated (to the first-degree
   kinship) Ashkenazi Jewish individuals from our cohort of 2,021
   (excluding centenarians and other individuals included in our study;
   [235]Table 1) and the allele frequency in Ashkenazi Jews reported in
   gnomAD. The longevity
   association was assessed on the variant, gene, and gene-set levels. We
   evaluated the longevity association of each rare coding variant using
   Firth logistic regression^[236]66. For association tests at the gene
   and gene-set levels, we performed the burden test and SKAT (implemented
   in R^[237]67; version 1.3) to test longevity association of six
   different subsets of rare variants within each gene or gene-set. The
   variant-masking scheme^[238]52 was designed to group similar rare
   variants of specific properties based on CADD (version 1.4) and
   PrimateAI (version 0.2) annotation. CADD is widely used as a variant
   annotation tool to predict the functionality (i.e., being functional or
   neutral) of variants. In contrast, PrimateAI predicts their clinical
   impact (i.e., being pathogenic or benign). We defined different classes
   of variants based on the recommended thresholds of CADD and PrimateAI
   scores ([239]Supplementary Table S4): all rare variants (without
   masking), functional (or non-neutral)^[240]26 rare variants (CADD score
   ≥ 20), dominant pathogenic rare variants (PrimateAI score > 0.8),
   recessive pathogenic rare variants (PrimateAI score > 0.7), functional
   but dominant benign rare variants (CADD score ≥ 20 & PrimateAI score <
   0.6), and functional but recessive benign rare variants (CADD score ≥
   20 & PrimateAI score < 0.5). The minimum P-value test^[241]52 was used
   to combine P-values of the aforementioned six sets of rare variants at
   the gene or gene-set level. For gene-based association tests, only
   genes with multiple rare variants after masking were tested for the
   corresponding rare variant category. 15,935 genes were tested for at
   least one variant category. For gene-set association tests, we compiled
   20 gene sets of aging pathways for nine aging hallmarks^[242]1
   ([243]Supplementary Table S6) and used the burden test and SKAT to test
   longevity association of those six sets of rare variants within each of
   those 20 gene sets. FDR was used to correct for 130,297 P-values at the
   variant level, 31,870 (2 × 15,935) combined P-values at the gene level,
   and 40 (2 × 20) combined P-values at the gene set level, respectively.
   For rare variant association at the gene-set level, we conducted an
   independent analysis using the same framework but in two sub-cohorts:
   APOE4+ and APOE4−. FDR was used to correct for 80 (2 × 2 × 20) combined
   P-values in this analysis. Gender and top 10 MDS were included as
   covariates in all rare variant association analyses in the discovery
   cohort.
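
   The six variant classes defined above can be expressed as a small
   classification function. The sketch below uses the thresholds stated in
   the text; the class labels, the function name, and the handling of
   variants without a PrimateAI score (treated as belonging to no
   PrimateAI-based class) are assumptions made for illustration.

```python
def variant_masks(cadd, primate_ai=None):
    """Return the masking classes a rare variant falls into.

    `cadd` is the CADD score; `primate_ai` is the PrimateAI score, or None
    when no score is available (assumption: such variants are placed in no
    PrimateAI-based class).
    """
    masks = ["all"]  # every rare variant belongs to the unmasked class
    functional = cadd is not None and cadd >= 20
    if functional:
        masks.append("functional")
    if primate_ai is not None:
        if primate_ai > 0.8:
            masks.append("dominant_pathogenic")
        if primate_ai > 0.7:
            masks.append("recessive_pathogenic")
        if functional and primate_ai < 0.6:
            masks.append("functional_dominant_benign")
        if functional and primate_ai < 0.5:
            masks.append("functional_recessive_benign")
    return masks


# Hypothetical variants: (CADD score, PrimateAI score).
examples = {"varA": (25, 0.85), "varB": (22, 0.45), "varC": (10, 0.75)}
for name, (cadd, pai) in examples.items():
    print(name, variant_masks(cadd, pai))
```

   Note that the classes overlap by construction: a variant with a
   PrimateAI score above 0.8 is counted in both the dominant and the
   recessive pathogenic classes.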

Network/pathway enrichment of rare variants

   In addition to conventional approaches of rare-variant association
   study, we investigated whether longevity-associated rare variants
   aggregate in a gene network and pathways. We first used IGSP^[244]30 to
   score longevity-associated genes by integrating rare-variant
   association tests at the gene level with gene functional
   network^[245]31. To consider information of all rare coding variants in
   IGSP scoring, we collected gene association signals by applying the
   weighted burden test (using the R package SKAT) on rare coding variants
   of each gene weighted by the corresponding CADD scores. We then tested
   whether the top 100 genes tend to be scored higher than the top 100
   genes derived from randomized rare variant association signals using the
   Wilcoxon rank-sum test. To investigate enriched pathways of those top
   100 genes implicated by longevity association of rare variants and the
   functional gene network in an unbiased manner, we first performed the
   pathway enrichment analysis using ToppGene Suite^[246]68, in which
   1,245 pathways from different pathway databases were analyzed
   concurrently, to summarize top enriched pathways across pathway
   databases. In addition, we compared the top enriched KEGG and Reactome
   pathways identified by ToppGene Suite and three other widely used tools
   for pathway-enrichment analysis – Enrichr^[247]69, g:Profiler^[248]70,
   and GSEA^[249]71 – to derive enriched pathways supported by multiple
   analysis tools.
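
   For readers unfamiliar with pathway enrichment, the sketch below shows
   a generic over-representation test based on the hypergeometric
   distribution. It is an illustration only and may differ from what
   ToppGene Suite, Enrichr, g:Profiler, or GSEA compute internally. The
   pathway size and overlap in the usage example are invented numbers.

```python
from math import comb


def enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric P-value for pathway over-representation.

    Probability of seeing at least `n_overlap` pathway genes among
    `n_selected` genes drawn without replacement from `n_background` genes,
    of which `n_pathway` belong to the pathway.
    """
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 8 of the top 100 genes fall in a 150-gene pathway,
# with the 15,935 tested genes taken as the background.
p_value = enrichment_p(15935, 150, 100, 8)
print(f"enrichment P = {p_value:.3g}")
```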

Lifespan analysis of rare variants

   In our longevity cohort, after removing the kinship relatedness, we
   have date of death – and thus definitive lifespan information – on 553
   Ashkenazi Jewish individuals (202 males and 351 females) ([250]Table 1,
   [251]Extended Data Figures 8 and [252]10), of whom 550 (~99.5%)
   had lifespans ≥ 65 years. Since no censored data were
   included in our lifespan cohort – i.e., all subjects reached the
   endpoint (death) – for all lifespan analyses of rare variants, we tested
   the association between lifespan and the burden of rare variants in the
   lifespan cohort using a unified accelerated life linear model^[253]72
   with the log-transformed age at death as the outcome and the gender as
   a covariate. Unlike the rare-variant association analyses that
   aimed to discover longevity-associated rare variants using a longevity
   case-control design, our lifespan analyses of rare variants investigate
   how pathogenic rare variants and protective rare variants discovered in
   our case-control study impact human lifespan through quantitative
   analyses.
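
   The regression of log-transformed age at death on variant burden with
   gender as a covariate can be sketched with ordinary least squares. The
   code below is a minimal illustration on synthetic data and omits the
   P-value computation. The coefficient values used to generate the data
   are invented, and the function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_lifespan(ages, burdens, is_female):
    """OLS fit of log(age at death) ~ intercept + burden + gender."""
    rows = [[1.0, float(b), float(f)] for b, f in zip(burdens, is_female)]
    y = [math.log(a) for a in ages]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # [intercept, burden effect, gender effect]


# Synthetic data generated from invented coefficients (no noise).
burdens = [0, 1, 2, 3, 4, 5]
is_female = [0, 1, 0, 1, 1, 0]
ages = [math.exp(4.5 - 0.02 * b + 0.05 * f) for b, f in zip(burdens, is_female)]

coef = fit_log_lifespan(ages, burdens, is_female)
# On the log scale, exp(coefficient) - 1 is the relative change per allele.
print(f"lifespan change per extra allele: {(math.exp(coef[1]) - 1) * 100:.2f}%")
```

   Because the outcome is log-transformed, each coefficient is read as a
   multiplicative effect on lifespan rather than a difference in years.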

Pathogenic rare variants and lifespan

   We investigated whether pathogenic rare variants can adversely affect
   lifespan. We used PrimateAI^[254]27, which was specially designed and
   optimized for predicting disease-causing variants^[255]73, to select
   highly pathogenic rare coding variants using a stringent score
   threshold ≥ 0.9 and assessed how the total count of their alternative
   allele (the exome-wide burden) may affect lifespan. PrimateAI is a
   machine learning-based method that expands the data set for training by
   including common variants from non-human primates to improve the power
   for predicting human pathogenic variants. No direct comparison of
   variant effects on longevity was made between human and non-human
   primates by using PrimateAI.
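
   The exome-wide burden described above is the total alternative-allele
   count over variants passing the PrimateAI threshold. A minimal sketch
   with invented variant identifiers and scores; treating variants without
   a score as not pathogenic is an assumption of the example:

```python
def pathogenic_burden(genotype, primate_ai_scores, threshold=0.9):
    """Total alternative-allele count over highly pathogenic rare variants.

    `genotype` maps variant ID -> alternative allele count (0, 1, or 2) for
    one individual. Variants with no PrimateAI score are skipped
    (assumption made for this sketch).
    """
    return sum(
        count
        for variant, count in genotype.items()
        if primate_ai_scores.get(variant, 0.0) >= threshold
    )


# Hypothetical scores and one individual's rare-variant genotype.
scores = {"v1": 0.95, "v2": 0.90, "v3": 0.85}
genotype = {"v1": 1, "v2": 2, "v3": 1, "v4": 1}

burden = pathogenic_burden(genotype, scores)
print(burden)
```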

Protective rare variants and lifespan

   Our rare-variant association tests uncovered a burden of rare variants
   in WNT signaling genes that may have pro-longevity effects among APOE4+
   ([256]Supplementary Table S14). We investigated their impact on
   lifespan by examining those protective rare variants in our lifespan
   cohort through several analyses: We evaluated whether the alternative
   allele count of those protective rare variants in WNT signaling genes
   is correlated with lifespan among APOE4+ and APOE4−, respectively; from
   a complementary angle, we investigated whether APOE4 differentially
   affects lifespans of individuals with a high or low burden of those
   protective rare variants; and finally, we also examined centenarians
   and non-centenarians separately in the lifespan analysis to
   differentiate it from the association study in which the longevity
   status was used.

Replication studies

   To maximize the extent of replicating human longevity association of
   rare variants, we prepared longevity case-control replication studies
   using cohort-specific criteria of determining longevity cases and
   controls. First, longevity cases are individuals older than the human
   life expectancy. Second, longevity controls are individuals
   substantially (> 15 years) younger than cases. We used the WES data
   from three cohorts – a German longevity cohort, a UK Biobank longevity
   cohort, and a longevity cohort from ADSP – to replicate the longevity
   association of rare variants discovered in our Ashkenazi Jewish
   longevity cohort. The German sample comprised 1,265 long-lived
   individuals (age range: 94 - 110 years; mean age: 99 years) as
   described previously^[257]74 and 4,195 younger controls (mean age: 35
   years) recruited as part of the FoCus cohort^[258]75 and as blood
   donors at the University Hospital Schleswig-Holstein in Kiel and
   Lübeck, Germany. For exome sequencing and data analysis (including
   alignment and variant calling), the same wet lab processes and
   bioinformatic pipelines at the Regeneron Genetics Center were employed
   as for the Einstein cohort. The UK Biobank longevity cohort was
   collected from 49,960 individuals whole-exome sequenced in the UK
   Biobank^[259]76, consisting of 104 cases and 23,405 controls of British
   and white ethnicity with at least one long-lived parent (lifespan ≥ 100
   years) (mean longest-lifespan of parents: 101 years) and with parents
   of usual survival (lifespan < 95 years) (mean longest-lifespan of
   parents: 80 years), respectively. The ADSP longevity cohort consists of
   1,121 non-AD individuals aged ≥ 90 years (the ADSP recorded age is
   right truncated at 90) as cases and 38 non-AD individuals aged < 75
   years as controls (mean age: 71 years). Both the German and the UK
   Biobank longevity cohorts were used to replicate our findings in the
   full and APOE4-stratified Ashkenazi Jewish longevity cohorts. The ADSP
   longevity cohort was used only to replicate findings made in the full
   discovery cohort due to the limited number of its control samples.

   Relatives up to first-degree kinship were removed from all three
   cohorts. We applied the same framework of rare variant association
   analysis from our discovery study in the replication analysis. We
   tested the six masking groups separately for rare variants (AAF < 1%)
   and for ultra-rare variants (AAF < 0.05%); the latter were not examined
   specifically in our discovery cohort due to the limited sample size of
   the allele reference panel in Ashkenazi Jews. The minimum P-value test
   was accordingly used to correct for the 12 test P-values of each tested
   gene set. Rare variants in the three longevity cohorts were determined
   based on their AAF in the corresponding WES data (5,460, 49,960, and
   10,267 individuals in the German, UK Biobank and ADSP WES data,
   respectively). Ultra-rare variants were further determined from rare
   variants based on their AAF reported in the large Non-Finnish European
   reference panel in gnomAD (v2; 56,885 individuals). Rare variants with
   genotype missing rates ≥ 0.1 were excluded from our analyses. Gender
   and top 10 principal components from the PCA analyses accounting for
   the subpopulation structure were used as covariates in the burden test
   and SKAT. FDR was used to correct for 4 (2 gene sets: Insulin and AMPK;
   2 tests: SKAT and the burden test) and 12 (3 gene sets: Insulin, AMPK
   and WNT; 2 tests: SKAT and the burden test; 2 sub-cohorts: APOE4+ and
   APOE4−) combined
   P-values for replication tests at a gene-set level in the full cohort
   and APOE4 stratified cohorts, respectively.
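
   The text does not name the FDR procedure used. Assuming the common
   Benjamini-Hochberg method, the correction of a small family of combined
   P-values can be sketched as follows; the four input values are invented
   and stand in for the 2 gene sets × 2 tests of the full-cohort
   replication.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical combined P-values (e.g. Insulin/AMPK × SKAT/burden).
combined_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(combined_p)
print([round(q, 4) for q in fdr])
```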

   We used the UK Biobank WES data of a parental lifespan cohort to
   replicate the relationship between pathogenic rare variants and
   lifespan discovered in our Ashkenazi Jewish lifespan cohort, due to the
   lack of long-lived individuals with WES data (the longest lifespan is
   ~80 years). After removing relatedness to the first-degree kinship,
   this cohort consists of 20,823 individuals of British and white
   ethnicity with the average parental lifespan ≥ 65 years. We used the
   same regression framework for replication as we used for discovery,
   including top 10 principal components as covariates.

Statistics and reproducibility

   No statistical methods were used to predetermine sample size as all the
   available samples from the WES data were considered. We used various
   statistical methods to analyze the data; please see the [260]Methods
   subsections above for details. We used three independent longevity
   cohorts in which we successfully replicated our finding on the longevity
   association of rare variants in aging pathways. No data were excluded
   from the analyses. The experiments were not randomized as this approach
   was not relevant to the study design. The investigators were not
   blinded to allocation during experiments and outcome assessment as this
   was not relevant to the study design.

Extended Data

Extended Data Figure 1. The replication study of gene-set longevity
association using the WES data of the German longevity cohort.

   The longevity case-control study consists of 1,265 longevity cases and
   4,195 longevity controls. P* denotes the P-value corrected for 12
   categories of rare variants using the minimum P-value test from
   Flannick et al.^[262]52 ([263]Methods). The text for the significant
   association denotes the lowest raw P-value among different groups of
   tested rare variants and FDR. (A) Full longevity cohort. (B) APOE4
   stratified cohorts.

Extended Data Figure 2. The replication study of gene-set longevity
association using the UK Biobank WES data.

   The longevity case-control study consists of 104 cases with at least
   one parent whose age at death was ≥ 100 years and 23,405 controls whose
   parents both died at < 95 years. P* denotes the P-value corrected for 12
   categories of rare variants using the minimum P-value test from
   Flannick et al.^[265]52 ([266]Methods). The text for the significant
   association denotes the lowest raw P-value among different groups of
   tested rare variants and FDR. (A) Full longevity cohort. (B) APOE4
   stratified cohorts.

Extended Data Figure 3. The replication study of gene-set longevity
association using the ADSP WES data.

   The longevity case-control study consists of 1,121 non-AD cases with
   age ≥ 90 years and 38 non-AD controls with age < 75 years. P* denotes
   the P-value corrected for 12 categories of rare variants using the
   minimum P-value test from Flannick et al.^[268]52 ([269]Methods). The
   text for the significant association denotes the lowest raw P-value
   among different groups of tested rare variants and FDR.

Extended Data Figure 4. Gene-set rare variant association in the
APOE4-stratified cohorts of the discovery (Ashkenazi Jewish) longevity cohort.

   P* denotes the P-value corrected for 6 categories of tested variants
   using the minimum P-value test from Flannick et al.^[271]52 ([272]Methods).
   The text for the significant association denotes the lowest raw P-value
   among different groups of tested rare variants and FDR.

Extended Data Figure 5. Lifespan analysis of protective variants in WNT
signaling genes for non-centenarians.

   P denotes uncorrected P-value derived from linear regression with the
   log-transformed age at death as the outcome and the gender as a
   covariate (See [274]Methods). 'WNT low' and 'WNT high' represent the
   alternative allele count of rare variants in WNT signaling genes ≤ 1
   and > 1 (the median), respectively. In parentheses are the numbers of
   individuals. MD stands for 'median difference'. (A) The lifespan
   difference of individuals carrying a high and low burden of protective
   rare variants in WNT signaling genes. (B) Negative effects of APOE4 on
   lifespan with high and low burden of protective rare variants in WNT
   signaling for non-centenarians.

Extended Data Figure 6. Lifespan analysis of protective variants in WNT
signaling genes for centenarians.

   P denotes uncorrected P-value derived from linear regression with the
   log-transformed age at death as the outcome and the gender as a
   covariate (See [276]Methods). 'WNT low' and 'WNT high' represent the
   alternative allele count of rare variants in WNT signaling genes ≤ 1
   and > 1 (the median), respectively. In parentheses are the numbers of
   individuals. MD stands for 'median difference'. (A) The lifespan
   difference of individuals carrying a high and low burden of protective
   rare variants in WNT signaling genes. (B) Negative effects of APOE4 on
   lifespan with high and low burden of protective rare variants in WNT
   signaling for centenarians.

Extended Data Figure 7. Disease-PRS analyses for centenarians and controls.

   This shows the results of PRS analyses for age-related diseases in the
   centenarian cohort. In the boxplots, points represent individuals, and
   horizontal lines represent upper fence (maximum in Q3+1.5×IQR), upper
   quartile (Q3), median, lower quartile (Q1), lower fence (minimum in
   Q1–1.5×IQR), sequentially from top to bottom; IQR: interquartile range
   (25th to the 75th percentile). n = 910 biologically independent samples
   in the boxplots on the right panels for coronary artery disease, type 2
   diabetes, stroke, and pancreatic cancer. n = 339 and 571 biologically
   independent samples in the boxplots on the right panels for prostate
   cancer and breast cancer, respectively. Above the boxplot on the right
   are raw and adjusted (in parentheses) P-values for the best prediction
   in the Nagelkerke R^2 plot on the left, which were calculated based on
   logistic regression and the permutation test in PRSice2, respectively.
   For stroke, breast cancer, prostate cancer, and pancreatic cancer, no
   robust association was observed between their PRS and the longevity
   status as originally defined in our cohort. (A) Coronary artery
   disease. (B) Coronary artery disease without considering SNPs within
   1Mbps of rs7412 or rs429358 (SNPs for the APOE haplotype). (C) Type 2
   diabetes. (D) Stroke. (E) Prostate cancer. Only males are considered.
   (F) Breast cancer. Only females are considered. (G) Pancreatic cancer.

Extended Data Figure 8. Basic statistics of the lifespan cohort.

   (A) Lifespan distribution of 553 individuals. (B) Survival curves of
   the 202 males and 351 females composing the analyzed cohort. Females
   have a significantly higher survival rate than males based on a Cox
   regression model (P = 1.71E-07; coxph in R).

Extended Data Figure 9. Correlation between lifespan and common-variant
genetic risk of age-related diseases.

   P-values are from linear regression of log lifespan on genetic
   disease risk, corrected for gender. (A) Alzheimer's disease. The left
   plot shows boxplots and the right plot shows survival curves for
   APOE4+ and APOE4− individuals. MD stands for 'Median Difference'. In
   the boxplots, points represent individuals, and
   horizontal lines represent upper fence (maximum in Q3+1.5×IQR), upper
   quartile (Q3), median, lower quartile (Q1), lower fence (minimum in
   Q1–1.5×IQR), sequentially from top to bottom; IQR: interquartile range
   (25th to the 75th percentile). n = 553 biologically independent
   samples. (B) Coronary artery disease. r represents 'correlation
   coefficient'. (C) Type 2 diabetes.
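
   The regression described in this legend (log lifespan on genetic
   disease risk, with gender as a covariate) can be sketched with
   ordinary least squares. This is an illustrative sketch only: the
   function names, the 0/1 coding of gender, and the use of the normal
   equations are assumptions, and the sketch returns point estimates
   without the P-values reported in the figure.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_log_lifespan(lifespans, risk_scores, is_female):
    """Fit log(lifespan) = b0 + b_risk * risk + b_gender * gender by OLS.

    Assumption: gender is coded 0/1; the legend does not give the coding.
    """
    rows = [[1.0, float(r), float(g)] for r, g in zip(risk_scores, is_female)]
    y = [math.log(v) for v in lifespans]
    k = 3
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    intercept, beta_risk, beta_gender = solve_linear_system(xtx, xty)
    return {"intercept": intercept, "risk": beta_risk, "gender": beta_gender}
```

   A negative coefficient for risk would correspond to shorter lifespan
   at higher genetic disease risk after adjusting for gender.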

Extended Data Figure 10. Flowcharts of sample collection for different
analyses.

   (A) Flowchart of sample collection for PRS analyses and lifespan
   analyses of rare variants and disease PRS. Refer to the '[281]Rare
   variant association analysis' subsection for the strategy used to
   remove kinship in PRS analyses that involve longevity status. In
   lifespan analyses, kinship was removed by randomly excluding one
   individual from each pair whose proportion of alleles shared
   identity-by-descent (IBD) was > 0.4. (B) Flowchart of sample collection for
   rare variant association tests, network-integrated analyses, and
   lifespan analyses of rare variants (and APOE4).
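
   The kinship filter used for the lifespan analyses (randomly dropping
   one member of each pair with IBD sharing > 0.4) can be sketched as
   below. The function name prune_related, the input layout, and the
   fixed random seed are hypothetical. The legend does not say how an
   individual appearing in several related pairs was handled; this
   sketch simply skips any pair already broken by an earlier exclusion.

```python
import random


def prune_related(sample_ids, ibd_pairs, threshold=0.4, seed=0):
    """Randomly drop one individual from each pair with IBD > threshold.

    ibd_pairs: iterable of (id_a, id_b, ibd_proportion) tuples.
    Assumption: pairs are processed in the order given, and a pair is
    skipped if one of its members has already been excluded.
    """
    rng = random.Random(seed)  # fixed seed keeps the exclusion reproducible
    kept = set(sample_ids)
    for id_a, id_b, ibd in ibd_pairs:
        if ibd > threshold and id_a in kept and id_b in kept:
            kept.discard(rng.choice([id_a, id_b]))
    # Preserve the original sample order in the output.
    return [s for s in sample_ids if s in kept]
```

   After pruning, no remaining pair of individuals exceeds the IBD
   threshold, although which member of a pair is retained depends on
   the seed.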

Supplementary Material

   Supplementary Tables
   [282]NIHMS2105468-supplement-Supplementary_Tables.xlsx^ (69.2KB, xlsx)
   Supplementary Note and Figures
   [283]NIHMS2105468-supplement-Supplementary_Note_and_Figures.pdf^ (3MB,
   pdf)

ACKNOWLEDGEMENTS