Abstract Age at menarche (AAM) and age at natural menopause (ANM) are highly heritable traits and have been linked to various health outcomes. We aimed to identify circulating proteins associated with altered ANM and AAM using an unbiased two-sample Mendelian randomization (MR) and colocalization approach. By testing causal effects of 1,271 proteins on AAM, we identified 22 proteins causally associated with AAM in MR, among which 13 proteins (GCKR, FOXO3, SEMA3G, PATE4, AZGP1, NEGR1, LHB, DLK1, ANXA2, YWHAB, DNAJB12, RMDN1 and HPGDS) colocalized. Among 1,349 proteins tested for causal association with ANM using MR, we identified 19 causal proteins among which 7 proteins (CPNE1, TYMP, DNER, ADAMTS13, LCT, ARL and PLXNA1) colocalized. Follow-up pathway and gene enrichment analyses demonstrated links between AAM-related proteins and obesity and diabetes, and between AAM and ANM-related proteins and various types of cancer. In conclusion, we identified proteomic signatures of reproductive ageing in women, highlighting biological processes at both ends of the reproductive lifespan. Subject terms: Genetic association study, Genome-wide association studies __________________________________________________________________ Mendelian randomization and colocalization analyses identified 13 circulating proteins causally associated with age at menarche and 7 proteins associated with age at natural menopause, highlighting the proteomic signatures encompassing both ends of women’s reproductive lifespans. Introduction Menarche and menopause are two major events in a woman’s reproductive life. Growing evidence indicates that early or late occurrences of these events, as part of their natural age variation in the general population, are associated with increased risks of adverse health conditions such as endometriosis, breast cancer, adult obesity, type 2 diabetes, cardiovascular disease, and increased mortality^[34]1. In an effort to prevent these adverse health effects, hormonal-blocking or replacement treatments aim to directly rectify the physiological changes caused by extreme variation in ages at menarche or menopause. Understanding the pathways underlying normal pubertal and menopausal timing in women is paramount to differentiate between natural variations in the reproductive lifespan versus changes due to pathological causes. This knowledge can guide medical decisions to treat women with earlier or later occurrences of menarche or menopause to mitigate related long-term adverse effects. In the majority of cases, premature or delayed menarche or menopause are extreme presentations of the normal physiological variation in the timing of puberty or reproductive aging^[35]2. Simple hormonal measurements in the blood often cannot help differentiate between these normal variations and epiphenomena of underlying pathological processes. Thus, there is an unmet need for discovery of predictive biomarkers of extreme physiological variations in the age of female reproductive aging, while the same molecules could be targets for pharmacological treatment. The large amount of data generated in high throughput proteomic studies offers a source to comprehensively study differences in levels of circulating proteins associated with variation in age at menarche (AAM) and age at natural menopause (ANM), which individually, or as part of a network, can provide insight into the molecular mechanisms that affect timing of menarche and of natural menopause^[36]3,[37]4. Circulating proteins in the blood have been a valuable source for biomarker discovery and drug target characterisation for various diseases^[38]5. However, the serum proteome can be highly sensitive to sample processing, and its measurement is costly and prone to potential bias due to differences in the sample groups. Case-control studies comparing proteomic profiles between individuals affected with diseases versus controls can be subject to reverse causation, but if our goal is prevention, it is necessary to detect changes in proteomic profiles prior to disease onset. Due to the above limitations, it is complicated to comprehensively characterize and compare the serum/plasma proteome in case-control studies^[39]6. Better evidence is needed to understand if circulating proteins play a causal role in female reproductive aging. In the recent years, large-scale proteomic GWAS have identified thousands of variants controlling levels of circulating proteins^[40]7–[41]11. Also, AAM and ANM are highly heritable traits with estimates of heritability for both AAM and ANM varying between 50% and 80%^[42]12–[43]15. These estimates have been confirmed by recent large-scale genome-wide association studies (GWAS) on AAM and ANM, leveraging data from over a million women^[44]16–[45]18. These GWAS provide a valuable opportunity to utilize Mendelian randomization (MR), a method allowing to test causal associations between modifiable exposures (such as circulating proteins) and outcomes, such as AAM and ANM. Under certain assumptions, MR uses genetic variants as instruments to estimate the effect of an exposure on an outcome of interest^[46]19, exploiting the random allocation of these variants at conception to infer causal effects that are robust to confounding^[47]19 and reverse causation, which are major limitations of conventional observational studies^[48]20. In this study, we aimed to leverage data from large proteomic GWAS, to identify circulating proteins, genetically predicted levels of which are associated with AAM or ANM within the MR framework. To further prioritize the MR-identified proteins, we undertook colocalization, followed by pathway, enrichment analyses and analyses using expression profiles. Results To evaluate the causal role of circulating proteins on AAM, and ANM, we identified cis-pQTLs from six proteomic GWAS (Sun et al.^[49]7, N = 3301; Emilsson et al.^[50]8 N = 3200; Suhre et al.^[51]9, N = 1000; Folkersen et al.^[52]10, N = 21,758; Yao et al.^[53]11, N = 6861, and Ferkingstad et al.^[54]21, N = 35,559) and used them as genetic instruments (Fig. [55]1). To assess the association of cis-pQTL with AAM, we retrieved the effects of these cis-pQTL on AAM in 370,000 European women from the REPROGEN Consortium GWAS^[56]16 (Supplementary Data [57]1). We obtained effects of the cis-pQTLs on ANM from the largest GWAS meta-analysis on ANM equally from the REPROGEN Consortium^[58]18, totaling 200,000 women of European descent from 21 studies. (Supplementary Data [59]1). Fig. 1. Flowchart with the study design. [60]Fig. 1 [61]Open in a new tab Mendelian Randomization was carried out with the R package TwoSampleMR. Colocalization was carried out with the R package coloc and the SuSIE plug-in. The results of the MR analysis, expressing changes in AAM or ANM in years per 1 standard deviation (SD) increase in the levels of a circulating protein, are presented in Fig. [62]2. Our power analysis^[63]22 showed that at an alpha = 0.05/1,300 = 3.8 x 10^−^5 (after Bonferroni correction), our power to detect an effect as small as 0.02 years in AAM or of 0.11 years in ANM per SD change in a specific protein level was 100% for the majority of the MR-prioritized proteins, based on the variance explained of each protein by its SNP-instrument, and a reported SD for AAM of 1.3 years and for ANM of 4 years (Supplementary Data [64]2 and [65]3). Fig. 2. MR estimates for 22 proteins associated with AAM, and 18 proteins associated with ANM in our MR analysis. [66]Fig. 2 [67]Open in a new tab Forest plots show the association between genetically determined level of each protein with AAM or ANM using MR. β coefficients represent the effect of a 1 SD higher protein level derived from the genetic instrument on the respective AAM, and ANM trait measured in years. a Biomarkers associated with AAM. b Biomarkers associated with ANM. See Supplementary Data [68]4 and [69]9 for more information. ^†P = 0.05/1,271 or 3.9 x 10^-5; ^‡P < 3.7 × 10^−5 (0.05/1349). Causal effects of circulating proteins on AAM In total, 1271 distinct proteins (instrumented by 1265 directly matched cis-pQTL and 6 proxies) were tested for association with AAM using MR. Considering Bonferroni correction (P = 0.05/1271 or 3.9 x 10^−^5, Supplementary Data [70]4), we identified 22 proteins associated with AAM. The cis-pQTL of the 22 proteins had all an F-statistic > 10, ranging from 35.12 (NEGR1) to 25208.6 (TXNDC15), explaining from 0.1% (FOXO3) to 77.1% (MST1) of the variance of the protein levels. The 22 proteins presented MR beta coefficients (β) on AAM ranging from -0.45 (FOXO3) to 0.46 (MANF) years per 1 SD increase in protein levels (Fig. [71]2a, Supplementary Data [72]5). We cross-validated the findings for LHB and TIE1 levels, for which cis-pQTLs effects were available in both Sun et al.^[73]7 and Emilsson et al.^[74]8, and the MR effect sizes on AAM were comparable (Fig. [75]2a). Similarly, the result using a cis-pQTL for YWHAB in Ferkingstad et al.^[76]21 was replicated using a cis-pQTL from Emilsson et al.^[77]8, the finding for MST1 using a cis-pQTL from Suhre et al.^[78]9 was replicated using a cis-pQTL from Emilsson et al.^[79]8 (Fig. [80]2a), and the MR effect of MANF using a cis-pQTL from Ferkingstad et al.^[81]21 was consistent with that using a cis-pQTL from Emilsson et al.^[82]8 (Supplementary Data [83]4). The cis-pQTL associated with the cross-validated proteins differed between the two proteomic GWAS-sources for all of the 5 aforementioned proteins. To assess if our MR results were driven by confounding due to linkage disequilibrium (LD), we tested whether the MR-prioritized proteins shared a single causal variant with AAM using colocalization, where a posterior probability 4 (H4) indicates presence of a single causal SNP for the protein and AAM, while a posterior probability 3 (H3) indicated the presence of two causal variants (one for the protein and the other for AAM) in LD. For 18 of the 22 MR-prioritized proteins, summary-level GWAS results were publicly available in the Sun et al.^[84]7, Suhre et al.^[85]9, or Ferkingstad et al.^[86]21 GWAS, allowing colocalization analyses. Five proteins [GCKR (cis-pQTL: rs1260326, H4 = 97%), FOXO3 (cis-pQTL : rs3813498, H4 = 95%), SEMA3G (cis-pQTL rs2016575, H4 = 92%), PATE4 (cis-pQTL rs665677, H4 = 89%) and AZGP1 (cis-pQTL rs61381915, H4 = 83%)] showed high evidence of colocalization with AAM with a > 80% posterior probability H4 of sharing a single causal variant with AAM. We found lower evidence of colocalization, but with the H4 being the greatest among all the posterior probabilities, for DBAJB12 (cis-pQTL: rs9416018, H4 = 77%), HPGDS (cis-pQTLrs1965049, H4 = 66%), LHB (cis-pQTL rs79502742, H4 = 59%) and YWHAB (cis-pQTL rs6031848, H4 = 53%) (Supplementary Data [87]6). The remaining 9 proteins among the 18 tested for colocalization showed evidence of association with AAM through two different causative variants in LD (all H3> 93%) (Supplementary Data [88]6). Using the Sum of Single Effects (SuSiE) plug-in in colocalization^[89]23, we found that 8 additional proteins (NEGR1, LHB, DLK1, ANXA2, YWHAB, DNAJB12, RMDN1 and HPGDS) colocalized with AAM with an H4 > 80% if we considered multiple causal variants within a 1 Mb region of each protein’s cis-pQTL (Supplementary Data [90]7). Causal effects of circulating proteins on ANM We used 1349 cis-pQTLs as genetic instruments to test for causal associations of an equal number of circulating proteins with ANM. Considering Bonferroni correction (P = 0.05/1349 or 3.7 x 10^−^5), we identified 18 proteins significantly associated with ANM, with MR beta coefficients (β) ranging from −0.80 (PNP) to 1.80 years (ITPKA) per 1 SD increase of protein levels (Fig. [91]2b). All 19 cis-pQTL of the candidate proteins had F-statistics > 10 ranging from 47.4 (TYMP) to 1273.3 (CPNE1), and explained between 0.2% (ITPKA) and 28.5% (CPNE1) of the variance in the protein levels (Supplementary Data [92]8). Furthermore, we cross-validated the findings for 4 proteins (LCT, PPA1, ARL3, PLXNA1) in at least two GWAS studies except for CPNE1 which replicated across 3 proteomic GWAS studies (Supplementary Data [93]9). The cis-pQTL for LCT and PPA1 were identical in the two proteomic GWAS-sources, but for the three other cross-validated proteins (ARL3, PPA1 and CPNE1), the cis-pQTL differed. Next, we performed colocalization analyses for ANM and the 19 candidate proteins using summary-level results from Sun et al.^[94]7, Suhre et al.^[95]9 or Ferkingstad et al.^[96]21. We found evidence of colocalization for four proteins: CPNE1 (cis-pQTL: rs12481228, H4=100%) TYMP (cis-pQTL: rs131805, H4=98.4%), DNER (cis-pQTL: rs34412673, H4=97.2%), and ADAMTS13 (cis-pQTL: rs71503194, H4=91.3%). Our analysis indicated that 12 MR-prioritized proteins were associated with ANM through two causal variants in LD (H3> 80%) (Supplementary Data [97]10). Using colocalization with SuSiE^[98]23, we found that 3 additional proteins (LCT, ARL and PLXNA1) colocalized with ANM with a posterior probability H4 > 0.8 if we considered multiple causal variants within a 1Mb region of each protein’s cis-pQTL (Supplementary Data [99]11). By comparing the MR-prioritized proteins of the two studied traits, we observed that GCKR (cis-pQTL: rs1260326 in Ferkingstad et al.^[100]21), which showed an increasing effect on ANM in our MR analysis, demonstrated an effect in the same direction on AAM. By using the R package HyPrColoc^[101]24 we did not find evidence that GCKR colocalized simultaneously with AAM and ANM (Supplementary Data [102]12). This suggests that GCKR levels could predict later age at both natural menopause and menarche. Reverse MR analyses In order to test the presence of reverse causation in the association of the candidate proteins for AAM and ANM, we first performed a Steiger directionality test for each protein MR. We obtained a ‘TRUE’ results in all Steiger tests, which confirmed that the direction of causality was from the protein to AAM or ANM and not the opposite (Supplementary Data [103]13 and [104]14). To further explore the direction of the significant MR associations, we performed reverse MR studies, using AAM or ANM as exposures and the candidate proteins as outcomes. For these analyses, we used 172 and 193 genome-wide significant (GWAS p-value < 5 x 10^−^8) and independent (LD R^2 < 0.001) SNPs as instruments for AAM and ANM respectively, which we retrieved from the same REPROGEN Consortium GWAS described above. Our reverse MR analyses were restricted to proteins with cis-pQTLs identified in Sun et al, Suhre et al, and Ferkingstad et al. since full summary-level results of these GWAS were available. We tested reverse MRs for 20 out of the 22 candidate proteins for AAM and for all 19 candidate proteins for ANM. The results of these analyses appear in Supplementary Data [105]15 (for AAM) and 16 (for ANM). For each reverse MR analysis, we computed MR estimates using the inverse variance weighted (IVW) method, and three other pleiotropy-robust methods (MR-Egger, weighted median and weighted mode). For ANM, our reverse MR showed evidence of association (MR p-values < 0.05) in one out of the four MR methods for 6 proteins (DLK1, DBAJB12, GCKR, NEGR1, TXNDC15, MST1), but for LHB we found stronger evidence of a reverse causal effect of AAM on the level of this protein, with significant results in three out of the four MR methods (Supplementary Data [106]15). For ANM we found weak evidence of reverse causation for PNP and TXNDC15, with only one of the four MR methods obtaining estimates with nominally significant p-values (Supplementary Data [107]16). Phenoscanner search We used Phenoscanner to look up for GWAS associations of the cis-pQTL of the candidate proteins for AAM and ANM with possible confounders of the protein-AAM or ANM association. Our Phenoscanner search revealed associations below the genome-wide suggestive p-value threshold of 10^-5 for the majority of the cis-pQTL of the MR-prioritized proteins for both AAM and ANM (Supplementary Data [108]17 and [109]18). We note a predominance of associations with anthropometric traits, such as weight, height, body mass index (BMI) and body composition measurements. Specifically, among 690 genome-side suggestive associations for 23 cis-pQTL of candidate proteins for AAM, 232 associations (around one-third) were with anthropometric traits. Similarly, among 535 genome-suggestive associations for 17 cis-pQTL of candidate proteins for ANM, we note 177 associations (around one-third) with anthropometric traits. The cis-pQTL with the largest number of genome-wide suggestive associations (n = 293) was rs1260326, the MR instrument for GCKR, a protein associated with both AAM and ANM, with a predominance of associations with lipid, glucose metabolism and blood count traits. In summary, for AAM, 8 proteins (ANXA2, ECM1, GCKR, LHB, MST1, NEGR1, SEMA3G, TIE1) showed low evidence of pleiotropy (i.e., < 40% of the total GWAS associations were pleiotropic) while this applied for 8 proteins for ANM (C4A C4B, GCKR, ITPKA, LY9, PLXNA1, PPA1, TNFRSF17, TYMP). Two-step network MR and multivariable MR In order to explore a mediating effect of BMI in our candidate protein-AAM or candidate protein-ANM associations, we conducted a two-step network MR^[110]25 with BMI as a potential mediator (Fig. [111]3) (Supplementary Data [112]19 and [113]20). For the AAM analysis, we used a large childhood BMI GWAS by Vogelezang et al.^[114]26 while in the ANM analysis we used a large adult BMI GWAS by Yengo et al.^[115]27. The analyses were restricted to proteins prioritized by both MR and colocalization, i.e. 13 proteins for AAM and 7 proteins for ANM. To conduct the two-step MR, we first step was an MR between protein level and BMI, and the second step was an MR between the BMI and the outcome (AAM or ANM). The indirect effect of the protein on the outcome (ie the effect explained by the BMI) was estimated by multiplying the effect of proteins on BMI and the effect of BMI on outcomes. The standard error and the test of the indirect effect were carried out with the Sobel test. Note that for the ANM analysis, cis-pQTL for only 4 proteins were available in the GWAS of adult BMI and we could not find proxies for the 3 other proteins in SNiPA. Fig. 3. Two-step network MR analysis for AAM and ANM. [116]Fig. 3 [117]Open in a new tab For the AAM analysis, we used, as mediator, a large childhood BMI GWAS by Vogelezang et al while in the ANM analysis we used, as a mediator, a large adult BMI GWAS by Yengo et al. The analyzes were restricted to proteins prioritized by both MR and colocalization, i.e., 13 proteins for AAM and 7 proteins for ANM. See Supplementary Data [118]19 and [119]20 for the result. We demonstrated that the effect of NEGR1 and AZGP1 proteins on AAM was mediated by pediatric BMI. Respectively, for NEGR1 the indirect effect is 0.118 years (mediated proportion = 67%, p = 0.0026) while for AZGP1 the indirect effect is 0.047 years (mediated proportion = 48%, p = 0.0140). In addition, we tested the simultaneous effects of BMI and proteins on AAM or ANM by conducting multivariable MR (MVMR^[120]28- see Supplementary Data [121]21 and [122]22). To do this, for each protein, we used its cis-pQTL and SNP-instruments for BMI (pediatric or adult), and retrieved their effects on both exposures and on AAM or ANM. For the ANM MVMR, we used 18 SNPs, 1 cis-pQTL for each protein and 17 independent genome-wide significant SNPs for pediatric BMI. For ANM, we used 522 SNPs, 1 cis-pQTL for each protein and 521 independent SNPs from the BMI GWAS. Our MVMR analyses for AAM showed that pediatric BMI had significant effects on AAM, after considering the simultaneous effect of each of 13 candidate proteins. Contrarily the MR effect of all 13 proteins was non-significant after accounting for pediatric BMI. In the MVMR analyses for ANM, the effect of adult BMI was consistently not significant, but two proteins (ADAMTS13 and CPNE1) maintained a significant effect on ANM after accounting for adult BMI. Protein-altering variant (PAV) assessment GWAS-identified cis-pQTL may be, or may not be PAVs or in LD with these. Assessing this is important, since PAVs can influence the measurement of candidate proteins by altering the affinity and binding of the molecules used to quantify the proteins level^[123]29. We assessed if the cis-pQTL of the proteins prioritized by colocalization for AAM and ANM were PAVs, or in LD (r^2 > 0.8) with PAVs. None of the cis-pQTLs of proteins related to AAM were missense variants, suggesting the absence of protein-altering effects for these proteins (Supplementary Data [124]23). Contrarily, rs1049564, the cis-pQTL for PNP, and the cis-pQTL for GCKR (rs35073769), both candidate proteins for ANM, are missense variants, and therefore could be subject to potential binding effects (Supplementary Data [125]24). Pathway and enrichment analyses All proteins with positive evidence for colocalization (H4 > 0.8 using colocalization with or without the SuSiE plug-in), or in total 13 proteins for AAM and 7 proteins for ANM, were retained for downstream pathway and enrichment analyses, to further explore the function of the candidate proteins. Our GeneMANIA analysis showed that the majority of genes encompassing the 13 candidate proteins colocalizing with AAM had physical interaction, co-expression, co-localization or shared a biological pathway (Supplementary Data [126]25). Using Metascape, we found that 4 genes (AZGP1, DLK1, GCKR and NEGR1) are linked to diabetes (Supplementary Data [127]26). FOXO3 and YWHAB are linked to neurodevelopmental disease, while FOXO3 and ANXA2 are associated with cardiovascular diseases. LHB has been linked to delayed puberty. Finally, 8 out of the 13 genes have been associations with various types of cancer (predominantly breast, prostate and lung cancers). Gene network, enriched ontology, and protein-protein interaction^[128]30 analyses in relation to AAM highlighted the involvement of DLK1 in notch signaling and of HPGDS in prostaglandin metabolic process as well as unsaturated fatty acid biosynthetic process (FDR P-value < 0.05) (Fig. [129]4a). Using FUMA, we were able to demonstrate that the genes of the AAM-related proteins were mainly expressed in the following tissues: adrenal gland, adipose subcutaneous tissue, pituitary and esophagus gastroesophageal junction (Fig. [130]5a, b, Supplementary Data [131]27). Fig. 4. Protein-protein interactions and gene network analyses using GeneMANIA for 13 and 7 candidate proteins for AAM and ANM respectively. [132]Fig. 4 [133]Open in a new tab The genes of the coloc-prioritized proteins (a for AAM and b for ANM) are shown as larger inner circles, while genes from the GeneMANIA extension are smaller and appear in the outer circle. Here, the red line represents the Physical Interaction, Purple line represents Co-Expression, Green line represents the Genetic interaction, and blue line represents Pathway and Co-location. (For interpretation of the color references in this figure legend, the reader is referred to the