Abstract

   Osteoarthritis (OA) has been implicated in the development and
   progression of early-stage endometrial cancer (EC), suggesting shared
   pathogenic factors between the two diseases. This study aimed to
   investigate the causal relationship between OA and EC and to identify
   causative genes common to both early-stage EC and OA. A two-sample
   Mendelian randomization (MR) analysis was first performed to assess the
   causal relationship between OA and EC. Differentially expressed genes
   associated with early-stage EC and OA were identified using the limma
   package. Overlapping genes were extracted to determine common causative
   genes, followed by enrichment analysis. The causal relationship between
   these genes and EC was verified through drug-target MR. Genes with
   diagnostic value were identified using multiple machine learning
   algorithms to construct EC prediction models, and model performance was
   evaluated. Additionally, the study examined the correlation between
   diagnostic-value genes and immune cell infiltration. Inverse-variance
   weighted (IVW) analysis indicated that OA was a risk factor for the
   development of EC (P < 0.05). Seven common causative genes (CDKN2A,
   DDA1, LRRC42, POLB, ADCYAP1R1, DNMT3A, and GLRX5) were identified for
   OA and EC, showing significant enrichment in related Gene Ontology
   terms such as heterochromatin. Drug-target MR analysis revealed that
   CDKN2A, DDA1, LRRC42, and POLB were causally associated with EC, and
   these genes had diagnostic value for EC. The EC prediction model based
   on these four genes demonstrated high performance (AUC = 0.974 for the
   training set; AUC = 0.966 for the validation set), and these genes were
   significantly associated with immune cell infiltration (P < 0.05).
   CDKN2A, DDA1, LRRC42, and POLB may be common causative genes for OA and
   early-stage EC, potentially serving as targets for drug intervention.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-025-04470-x.

   Keywords: Mendelian randomization, Osteoarthritis, Endometrial cancer,
   Enrichment analysis, Nomogram

   Subject terms: Cancer, Drug discovery, Molecular biology, Diseases,
   Risk factors

Introduction

   Endometrial cancer (EC) is the sixth most common cancer among women
   and is particularly prevalent in high-income countries. Its incidence
   and mortality rates are rising globally^1. In 2020, there were 417,367
   new confirmed cases and 97,370 new deaths worldwide, and these numbers
   are expected to increase over the next decade. The primary symptom of
   EC is abnormal vaginal bleeding; however, because no distinctive early
   changes have been identified, many patients are diagnosed at an
   advanced stage and miss the optimal treatment window^1,2. The diagnosis
   of EC primarily relies on transvaginal ultrasound and endometrial
   biopsy, which are invasive^3,4. Thus, understanding the etiology and
   pathogenesis of the disease is essential for improving diagnosis and
   treatment.

   Osteoarthritis (OA) is the most common degenerative joint disease,
   affecting an estimated 350 million people worldwide. Its prevalence
   increases sharply with age, and its primary clinical manifestations are
   joint pain and limited mobility. Although EC and OA affect different
   systems, they share common causative factors, such as advanced age,
   obesity, inflammation, and estrogens^5–8. For instance, adipose tissue
   can aromatize adrenal androgens to estrogens, which stimulate
   endometrial proliferation by inducing mitotic activity in endometrial
   cells, thereby increasing the risk of EC^1,5. Adipose tissue also
   contributes to OA pathogenesis by secreting cytokines that may
   influence metabolic processes in bone and joints^9. Obesity mediates the
   development of EC through elevated fasting insulin levels and is also a
   major cause of OA: excess weight overloads the joints, leading to the
   destruction of articular cartilage, while obesity promotes fat
   deposition and insulin resistance, further contributing to OA
   development^6,10. Inflammation is also implicated in the development of
   both diseases. Interferon-induced monocyte cytokines have been
   positively associated with EC development^8, and imbalances in
   macrophage polarization, as well as the pro-inflammatory and
   anti-inflammatory mediators produced by macrophages, are closely
   associated with OA^10,11. A pro-inflammatory environment, characterized
   by elevated C-reactive protein, interleukin-6, and tumor necrosis
   factor-α, together with a relative lack of protective immune cell types
   in the endometrium, may contribute to EC development^1. Notably, one
   cohort study found that OA was one of the most common comorbidities of
   EC, with 35% of EC patients also having OA^12.

   In summary, EC and OA may share common pathogenic factors. However,
   research on these shared factors is scarce, and the available evidence
   is very limited. The causes of EC and OA remain unclear, and their
   early stages vary considerably among individuals, without uniform,
   typical clinical manifestations. This variability complicates the early
   diagnosis of both diseases. Early diagnosis is crucial for reducing the
   mortality rate of EC and is particularly beneficial for alleviating
   pain and improving the quality of life of OA patients. Thus, there is
   an urgent need to investigate the common pathogenic factors and
   potential pharmacological intervention targets for EC and OA.

   Mendelian randomization (MR) uses genetic variants as instrumental
   variables to assess the causal relationship between environmental
   exposures and disease. It rests on Mendel's law of independent
   assortment: because alleles are allocated randomly at conception,
   genetic instruments are largely independent of the confounding factors
   that bias observational studies. This approach overcomes the
   limitations of traditional epidemiological studies and yields more
   reliable causal inferences^13. Recently, MR analysis has become widely
   used for drug target development and drug repurposing^14. Advances in
   genome-wide association studies (GWAS) and in the identification of
   molecular mechanisms have provided a foundation for MR-based
   strategies, facilitating the identification of potential therapeutic
   targets across various diseases^14–16. Additionally, drug-target MR has
   proven useful for modeling the pharmacological effects of drug targets
   in clinical trials and for predicting the clinical benefits and adverse
   effects of treatments^17,18. Therefore, this study aims to investigate
   the causal association between EC and OA using GWAS data, combined with
   MR analysis and transcriptome analysis, to explore potential
   pharmacological intervention targets. This research will offer new
   insights into EC and OA and provide a new direction for identifying
   potential drug intervention targets.

Materials and methods

Research design

   This study examined the causal relationship between EC and OA using a
   two-sample MR method. Genes differentially expressed in EC and OA were
   identified, and their enriched pathways were explored. Causal
   associations with EC were investigated using drug-target MR based on
   the eQTLs of the genes common to both conditions, and the consistency
   between the direction of each causal association and the expected
   direction was verified. Additionally, several algorithms were employed
   to identify key genes for constructing a nomogram, and its diagnostic
   and predictive performance for EC patients was evaluated. The study
   also explored the correlation between diagnostic genes and immune cell
   infiltration. The MR analysis adhered to the three assumptions of MR
   research^19 (Fig. 1) and the STROBE-MR guidelines^20 (Supplementary
   Table 1).

Fig. 1.


   Schematic diagram of MR associated with EC. Three major assumptions: ①
   The assumption of association: the instrumental variable is closely
   related to the exposure factor. ② The assumption of exclusivity: the
   instrumental variable affects the outcome only through the exposure
   factor. ③ The assumption of independence: the instrumental variable is
   not correlated with confounders.

Table 1.

   Summary information on the GWAS database in the MR study.
 Datasource       Phenotype             Sample size Cases  Population Adjustment
 ebi-a-GCST007092 OA of the hip or knee 417,596     39,427 European   -
 ebi-a-GCST006464 EC                    121,885     12,906 European   -

Data source

   Data on EC and OA were obtained from the IEU Open GWAS database
   (https://gwas.mrcieu.ac.uk/), accessed on June 11, 2024 (Table 1).
   Bioinformatic analysis included data from The Cancer Genome Atlas
   (TCGA) database (EC data) and the GEO database (GSE12021, OA data)
   (Table 2). The original studies obtained informed consent from
   participants, so this part of the study did not require ethics
   committee approval.

Table 2.

   Inclusion and exclusion criteria.
 TCGA-UCEC expression data          Excluded sample size Remaining sample size
 UCEC samples                       -                    554
 Relapse samples                    1                    553
 UCEC Stage I tissue samples        -                    371
 UCEC Stage II + III tissue samples -                    182
 Normal tissue samples              -                    35
 Total cases                        -                    588

 GSE12021-OA expression data        Excluded sample size Remaining sample size
 OA samples                         -                    10
 Control samples                    -                    9
 Total cases                        -                    19

Screening of instrumental variables

   To minimize bias from weak instrumental variables, SNPs associated
   with the exposure at P < 1 × 10^–5 were selected, and instrumental
   variables with F-statistics greater than 10 were preferentially
   retained. Additionally, an MR-Egger regression intercept significantly
   different from zero (P < 0.05) was taken to indicate directional
   pleiotropy, that is, invalid instruments.
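   The two screening rules above can be sketched in a few lines of
   Python. This is a minimal illustration, not the authors' R code. It
   assumes that each SNP is summarized by its effect size (beta) and
   standard error (se) on the exposure. It uses the common approximation
   F ≈ (beta/se)^2 and derives a two-sided Wald P-value from the same
   ratio. The SNP identifiers and numbers are hypothetical.

```python
import math

def wald_p(beta: float, se: float) -> float:
    """Two-sided P-value for the Wald statistic z = beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2))

def f_statistic(beta: float, se: float) -> float:
    """Approximate per-SNP F-statistic: the squared Wald ratio (beta / se)^2."""
    return (beta / se) ** 2

def strong_instruments(snps, p_threshold=1e-5, f_threshold=10.0):
    """Return IDs of SNPs passing both the P-value and the F-statistic filters."""
    kept = []
    for snp_id, (beta, se) in snps.items():
        if wald_p(beta, se) < p_threshold and f_statistic(beta, se) > f_threshold:
            kept.append(snp_id)
    return kept

# Hypothetical exposure summary statistics: {snp_id: (beta, se)}
example = {
    "snp_a": (0.050, 0.010),  # z = 5.0
    "snp_b": (0.020, 0.010),  # z = 2.0, fails both filters
    "snp_c": (0.045, 0.010),  # z = 4.5
}
print(strong_instruments(example))
```

   Under this approximation, a two-sided P < 1 × 10^–5 already implies
   |z| > 4.4 and hence F > 19. The F > 10 rule therefore matters mainly
   when F is computed differently, for example from the variance
   explained and the sample size.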

Instrumental variable screening for EC and OA

   Linkage disequilibrium (LD) clumping with an r^2 threshold of 0.001
   within a 10,000 kb window ensured that the individual SNPs were
   independent. To limit pleiotropy, SNPs associated with confounders or
   the outcome were removed using LDlink
   (https://ldlink.nih.gov/?tab=ldtrait). Relevant SNPs were then
   extracted from the GWAS summary data for EC, with a minimum r^2 of 0.8.
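   Clumping of this kind is normally run against a reference LD panel.
   In TwoSampleMR this is done with clump_data(), whose defaults
   (clump_kb = 10000, clump_r2 = 0.001) appear to match the settings
   reported here. The sketch below shows only the greedy logic and
   assumes that the pairwise r^2 values are already known. The SNP names,
   positions, and r^2 values are hypothetical.

```python
def clump(snps, ld_r2, r2_threshold=0.001, window_kb=10_000):
    """Greedy LD clumping.

    snps:  list of dicts with 'id', 'chrom', 'pos' (bp) and 'pval'.
    ld_r2: dict mapping frozenset({id1, id2}) to pairwise r^2; missing pairs count as 0.
    Returns IDs of the retained index SNPs, most significant first.
    """
    remaining = sorted(snps, key=lambda s: s["pval"])
    window_bp = window_kb * 1_000
    kept = []
    while remaining:
        index = remaining.pop(0)  # most significant SNP not yet clumped
        kept.append(index["id"])
        # Drop SNPs within the window that are in LD with the index SNP
        remaining = [
            s for s in remaining
            if not (
                s["chrom"] == index["chrom"]
                and abs(s["pos"] - index["pos"]) <= window_bp
                and ld_r2.get(frozenset((s["id"], index["id"])), 0.0) > r2_threshold
            )
        ]
    return kept

snps = [
    {"id": "a", "chrom": "1", "pos": 1_000_000, "pval": 1e-8},
    {"id": "b", "chrom": "1", "pos": 1_050_000, "pval": 1e-6},
    {"id": "c", "chrom": "1", "pos": 9_000_000, "pval": 1e-7},
    {"id": "d", "chrom": "2", "pos": 1_000_000, "pval": 1e-6},
]
ld = {frozenset(("a", "b")): 0.40, frozenset(("a", "c")): 0.0005}
print(clump(snps, ld))  # "b" is removed as an LD partner of "a"
```

   SNP "c" lies inside the 10,000 kb window of "a" but is retained
   because its r^2 is below the 0.001 threshold.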

Screening of instrumental variables for differential genes and EC

   For the instrumental variables of the differential genes, the LD
   threshold was set at r^2 = 0.3 within a 300 kb window, with MAF > 0.01,
   to ensure SNP independence. SNPs associated with confounders and
   outcomes were removed using LDlink. Instrumental variables were located
   within ± 300 kb of the cis-acting region of the drug target gene and
   were extracted from the eQTL data of the drug target genes. The
   corresponding SNPs were then extracted from the GWAS summary data for
   the outcome EC, excluding palindromic SNPs with MAF > 0.42. SNPs
   directly associated with the outcome (P < 1 × 10^–5) were excluded, and
   outlier SNPs were removed using MR-PRESSO.
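   The cis-window and allele filters can be illustrated as follows. This
   is a hypothetical sketch: the gene coordinates, SNP records, and field
   names are invented. The field eaf is the effect-allele frequency, from
   which the MAF is derived. Palindromic SNPs (A/T or C/G) are dropped
   only when their MAF exceeds 0.42, because at such frequencies the
   strand cannot be inferred reliably from allele frequency.

```python
PALINDROMIC = {frozenset("AT"), frozenset("CG")}

def is_palindromic(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs read the same on both strands."""
    return frozenset((a1.upper(), a2.upper())) in PALINDROMIC

def select_cis_instruments(snps, chrom, gene_start, gene_end,
                           flank_bp=300_000, min_maf=0.01, max_palindromic_maf=0.42):
    """Keep SNPs within +/- flank_bp of the gene that pass the MAF and palindrome filters."""
    kept = []
    for s in snps:
        in_cis = s["chrom"] == chrom and gene_start - flank_bp <= s["pos"] <= gene_end + flank_bp
        maf = min(s["eaf"], 1 - s["eaf"])
        ambiguous = is_palindromic(s["a1"], s["a2"]) and maf > max_palindromic_maf
        if in_cis and maf > min_maf and not ambiguous:
            kept.append(s["id"])
    return kept

# Hypothetical gene on chromosome 9 spanning 1,000,000-1,030,000 bp
snps = [
    {"id": "s1", "chrom": "9", "pos": 1_200_000, "a1": "A", "a2": "G", "eaf": 0.30},
    {"id": "s2", "chrom": "9", "pos": 1_500_000, "a1": "A", "a2": "G", "eaf": 0.30},  # outside window
    {"id": "s3", "chrom": "9", "pos": 900_000, "a1": "A", "a2": "T", "eaf": 0.45},    # ambiguous palindrome
    {"id": "s4", "chrom": "9", "pos": 950_000, "a1": "C", "a2": "G", "eaf": 0.10},    # palindrome, low MAF: kept
    {"id": "s5", "chrom": "9", "pos": 1_010_000, "a1": "A", "a2": "C", "eaf": 0.995}, # MAF 0.005: too rare
]
print(select_cis_instruments(snps, "9", 1_000_000, 1_030_000))
```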

MR analysis method

   This study used five regression models to assess the results: MR-Egger
   regression, random-effects inverse-variance weighted (IVW), weighted
   median estimator (WME), weighted mode, and simple mode. The IVW method
   was used as the primary method to assess causality, while MR-Egger
   served as a complementary approach, particularly in the presence of
   horizontal pleiotropy. Heterogeneity among SNPs was evaluated using
   Cochran's Q test and the I^2 statistic. A Cochran's Q P-value < 0.05
   indicated heterogeneity, and I^2 > 50% suggested some heterogeneity in
   the IVW results. I^2 was calculated as
   $I^2 = \frac{Q - (k - 1)}{Q} \times 100\%$, where Q is the Cochran's Q
   statistic and k is the number of SNPs; negative values were set to 0.
   An MR-Egger intercept not significantly different from zero (P > 0.05)
   indicated no evidence of directional pleiotropy among the SNPs. The
   leave-one-out method was used in sensitivity analyses to examine the
   influence of each SNP on the results and to assess their robustness.
   All MR analyses were conducted using the TwoSampleMR package in R 4.1.0,
   with a significance level of α = 0.05.
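   The sketch below implements these estimators in plain Python: the IVW
   estimate as a weighted regression through the origin, Cochran's Q, the
   I^2 formula above, MR-Egger point estimates, and a leave-one-out loop.
   It is an illustration under simplifying assumptions and is not a
   substitute for TwoSampleMR. It uses first-order weights 1/se_out^2 and
   a multiplicative random-effects standard error, and it omits the
   standard error and P-value of the Egger intercept. The input SNP
   effects are hypothetical.

```python
import math

def i_squared(q: float, k: int) -> float:
    """I^2 = (Q - (k - 1)) / Q, truncated at 0 (k = number of SNPs)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q)

def ivw(bx, by, sy):
    """IVW estimate: weighted regression of by on bx through the origin, weights 1/sy^2.

    Returns (beta, se, Q, I^2); se uses a multiplicative random-effects correction.
    """
    w = [1.0 / s ** 2 for s in sy]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sxx
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    k = len(bx)
    # Inflate the fixed-effect SE when the residual dispersion sqrt(Q / df) exceeds 1
    dispersion = math.sqrt(q / (k - 1)) if k > 1 else 1.0
    se = max(1.0, dispersion) / math.sqrt(sxx)
    return beta, se, q, i_squared(q, k)

def mr_egger(bx, by, sy):
    """MR-Egger point estimates (intercept, slope) by weighted least squares.

    SNPs are first oriented so that every bx is positive.
    """
    y = [v if x >= 0 else -v for x, v in zip(bx, by)]
    x = [abs(v) for v in bx]
    w = [1.0 / s ** 2 for s in sy]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def leave_one_out(bx, by, sy):
    """IVW estimate recomputed with each SNP removed in turn."""
    return [
        ivw(bx[:i] + bx[i + 1:], by[:i] + by[i + 1:], sy[:i] + sy[i + 1:])[0]
        for i in range(len(bx))
    ]

# Hypothetical SNP effects on the exposure (bx) and outcome (by), and outcome SEs (sy)
bx = [0.10, 0.20, 0.30]
by = [0.03, 0.05, 0.07]
sy = [0.01, 0.01, 0.01]
print(ivw(bx, by, sy))
print(mr_egger(bx, by, sy))  # intercept close to 0.01, slope close to 0.2
print(leave_one_out(bx, by, sy))
```

   Because the hypothetical outcome effects equal 0.01 + 0.2 × bx, MR-Egger
   recovers a non-zero intercept (directional pleiotropy), while IVW, which
   forces the line through the origin, overestimates the slope. Applied to
   the values in Table 3, i_squared(110.61, 92) gives about 0.18, which
   matches the reported I^2 of 18%.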

Differential gene acquisition and functional enrichment analysis

   This study used the limma (Linear Models for Microarray Data)
   package^21 to perform differential expression analysis on several sets
   of transcriptome expression data: ① stage I EC tissue (371 cases)
   versus normal control tissue (35 cases); ② stage I EC tissue (371
   cases) versus stage II and III EC tissue (182 cases); and ③ OA (10
   cases) versus normal control (9 cases) tissue from the GEO dataset
   GSE12021. The differential genes identified in these comparisons were
   intersected to find common genes. To further elucidate their potential
   functions, pathway enrichment analysis of the differential genes was
   performed. GO (BP/CC/MF) and KEGG enrichment analyses were conducted
   using the R package 'clusterProfiler' with a q-value cutoff of
   0.05^22–24. KEGG analysis provides insights into the high-level
   functions and utilities of the biological systems related to the
   differential genes, while GO analysis annotates genes by molecular
   function (MF), biological process (BP), and cellular component (CC).
   The top 10 significantly enriched terms were visualized for the BP, CC,
   and MF analyses, and the top 30 significantly enriched pathways were
   visualized for the KEGG analysis.
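   The over-representation test behind GO and KEGG enrichment
   (clusterProfiler uses a hypergeometric model) can be sketched with
   exact binomial coefficients. This is a hypothetical illustration: the
   background size, term size, and overlap below are invented. The
   subsequent multiple-testing adjustment to q-values is not shown.

```python
from math import comb

def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation P-value, P(X >= k), with X hypergeometric.

    N: annotated background genes; K: background genes in the term;
    n: genes in the query list;    k: query genes in the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical: 3 of 7 query genes fall in a term of 150 genes, background of 18,000
print(enrichment_p(k=3, n=7, K=150, N=18_000))
```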

Steiger directionality test of differential gene eQTLs and EC

   For the differential genes common to early EC and OA, eQTL summary data
   and the EC GWAS summary data (ebi-a-GCST006464) were obtained from the
   IEU Open GWAS project database to validate the drug-target MR causal
   associations. The directional consistency of causality between the
   intermediate variables (gene expression) and the final outcome was
   assessed using the Steiger directionality test. This method calculates
   the variance explained by the eQTLs in the expression of each
   differential gene and in EC, and tests whether the variance explained
   in EC is smaller than that explained in gene expression. If so, the MR
   Steiger result is 'TRUE', indicating that causality is in the expected
   direction. A 'FALSE' result suggests that causality runs in the
   opposite direction.
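   The Steiger comparison can be sketched as follows. Broadly following
   the approach of TwoSampleMR's mr_steiger(), the variance explained on
   each side is summed over SNPs, and the two correlations are compared
   with a Fisher z-test for independent samples. For simplicity, this
   sketch approximates r^2 for both traits from t-statistics
   (r^2 = t^2 / (t^2 + N − 2)). For a binary outcome such as EC, a
   liability-scale conversion would normally be used instead. The SNP
   effects are hypothetical. The sample sizes are those listed in
   Tables 1 and 4.

```python
import math

def r2_from_beta(beta, se, n):
    """Approximate variance explained by one SNP from its t-statistic."""
    t2 = (beta / se) ** 2
    return t2 / (t2 + n - 2)

def steiger(exposure, outcome, n_exp, n_out):
    """Compare variance explained in the exposure versus the outcome.

    exposure, outcome: lists of (beta, se) for the same SNPs, in the same order.
    Returns (correct_direction, z, p), where correct_direction is r2_exp > r2_out.
    """
    r2_exp = sum(r2_from_beta(b, s, n_exp) for b, s in exposure)
    r2_out = sum(r2_from_beta(b, s, n_out) for b, s in outcome)
    # Fisher z-transform of each correlation, compared as independent samples
    z_exp = math.atanh(math.sqrt(r2_exp))
    z_out = math.atanh(math.sqrt(r2_out))
    z = (z_exp - z_out) / math.sqrt(1 / (n_exp - 3) + 1 / (n_out - 3))
    p = math.erfc(abs(z) / math.sqrt(2))
    return r2_exp > r2_out, z, p

# Hypothetical cis-eQTL effects on expression and on EC for the same two SNPs
eqtl = [(0.08, 0.01), (0.06, 0.01)]
ec = [(0.010, 0.008), (0.005, 0.008)]
print(steiger(eqtl, ec, n_exp=31_684, n_out=121_885))
```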

Construction and evaluation of EC prediction models

   To identify candidate biomarkers and build a prediction model for EC,
   key genes were screened using Random Forest and LASSO regression
   algorithms.

   Random Forest (RF) is an ensemble learning method that builds many
   decision trees on bootstrap samples of the data, considering a random
   subset of features at each split, and aggregates their predictions.
   This randomization increases diversity among trees and improves the
   generalization of the forest. The contribution of each feature to
   predictive accuracy provides a measure of its importance. The LASSO
   (Least Absolute Shrinkage and Selection Operator) model is a penalized
   linear regression method. Adding an L1 penalty term to the ordinary
   least squares loss function shrinks the regression coefficients and
   sets some of them exactly to zero, so the method performs feature
   selection and regularization simultaneously. Both techniques help
   prevent overfitting, reduce model complexity, and improve
   interpretability.
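   The effect of the L1 penalty can be seen in a small coordinate-descent
   sketch. This is an illustration under stated assumptions rather than
   the authors' pipeline. It minimizes the squared-error LASSO objective
   (1/2n)||y − Xb||^2 + λ||b||_1 on standardized predictors. A binary
   EC/normal outcome would typically use a logistic LASSO instead (for
   example, via glmnet). The data are hypothetical. Each update applies
   soft-thresholding, which is what sets weak coefficients exactly to
   zero.

```python
def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes y is centred and each column of X is centred with (1/n) * sum(x_ij^2) = 1.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the current b (initially b = 0)
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + b[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

# Hypothetical standardized expression of two genes (columns) in four samples
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]  # centred response, driven mainly by the first gene
print(lasso_cd(X, y, lam=0.5))
```

   With λ = 0.5, the weakly associated second gene is dropped (its
   coefficient becomes exactly zero), while the first gene's coefficient
   is shrunk from 2.0 towards zero.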

   Based on the genes with diagnostic value, a nomogram was constructed
   using the R package 'rms'. Receiver operating characteristic (ROC)
   curves were plotted, and the area under the curve (AUC) was used to
   evaluate the diagnostic effectiveness of these genes for EC. Finally,
   calibration curves and decision curve analysis (DCA) were used to
   assess the performance of the nomogram prediction model for endometrial
   cancer.
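   The two headline metrics can be written out directly. The sketch below
   computes the ROC AUC as the probability that a randomly chosen case
   receives a higher predicted score than a randomly chosen control, with
   ties counted as one half. It also computes the decision-curve net
   benefit at a threshold probability p_t as TP/n − (FP/n) × p_t / (1 − p_t).
   The predicted probabilities and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC = P(score of a random case > score of a random control), ties counted as 1/2."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def net_benefit(probs, labels, pt):
    """Decision-curve net benefit when patients with predicted risk >= pt are called positive."""
    n = len(labels)
    tp = sum(1 for p, lab in zip(probs, labels) if p >= pt and lab == 1)
    fp = sum(1 for p, lab in zip(probs, labels) if p >= pt and lab == 0)
    return tp / n - (fp / n) * pt / (1 - pt)

probs = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]  # hypothetical nomogram risks
labels = [1, 1, 1, 0, 0, 0]                   # 1 = EC, 0 = control
print(roc_auc(probs, labels))                 # 8 of 9 case-control pairs ranked correctly
print(net_benefit(probs, labels, pt=0.5))
```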

Multiple hypothesis testing correction

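   P-values from the MR analyses were corrected for multiple testing by
   calculating the false discovery rate (FDR), which controls the expected
   proportion of false positives among the results declared significant.
   The formula is $q_{(i)} = \min_{j \ge i} \frac{m \, p_{(j)}}{j}$
   (capped at 1), where p_(1) ≤ p_(2) ≤ ... ≤ p_(m) are the m ordered
   P-values (Benjamini-Hochberg procedure).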
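   Assuming the Benjamini-Hochberg procedure, the adjustment can be
   sketched as below. It mirrors the step-up definition, taking a running
   minimum of m·p_(j)/j from the largest P-value downwards, and returns
   the adjusted values in their original order. The input P-values are
   hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down, enforcing monotonicity with a running minimum
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

print(bh_adjust([0.01, 0.04, 0.03, 0.20]))  # hypothetical P-values
```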

Correlation analysis of diagnostic genes and immune cell infiltration

   The diagnostic gene expression matrix was uploaded to CIBERSORTx
   (https://cibersortx.stanford.edu/) to calculate immune cell
   infiltration for each sample. Correlations between key genes and
   immune cell infiltration were assessed using Spearman rank correlation
   coefficients.
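   CIBERSORTx is a web service and is not reproduced here, but the
   correlation step can be sketched. Spearman's coefficient is the
   Pearson correlation of the ranks, with tied values assigned their
   average rank. The expression and infiltration values are hypothetical,
   and no P-value is computed in this sketch.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

gene_expr = [5.1, 6.3, 6.3, 7.8, 8.2]         # hypothetical expression of one gene
nk_fraction = [0.02, 0.03, 0.05, 0.04, 0.09]  # hypothetical CIBERSORTx fractions
print(spearman(gene_expr, nk_fraction))
```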

Results

OA associated with increased risk of EC

   A total of 92 SNPs associated with OA were selected from the OA GWAS
   data as instrumental variables. All had an F-statistic > 10, making
   them suitable for assessing the causal relationship between OA and EC
   (Supplementary Table 2). IVW analysis indicated that OA was a
   significant risk factor for the development of EC (OR = 1.104, 95% CI:
   1.008–1.209, P = 0.032). The I^2 statistic was 18% and Cochran's Q was
   110.610 (P = 0.079), suggesting no significant heterogeneity among the
   SNPs used as instrumental variables. The MR-Egger intercept (P = 0.861)
   indicated no significant pleiotropy among the instrumental SNPs
   (Table 3). The scatter plot (Fig. 2A) and funnel plot (Fig. 2C) showed
   that the distribution of all included SNPs was generally symmetrical,
   suggesting that the causal estimate was unlikely to be substantially
   affected by bias. The leave-one-out analysis (Fig. 2B) showed that
   excluding each SNP in turn did not significantly alter the results,
   and no single SNP had a substantial impact on the causal estimate,
   indicating that the MR results of this study are robust.
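   As a quick consistency check on the reported IVW estimate, the
   standard error on the log-odds scale can be recovered from the 95% CI,
   and a two-sided Wald P-value can then be computed from it. This
   assumes that the CI was computed as exp(log OR ± 1.96 × SE). Because
   the OR and CI are rounded to three decimals, the recovered P-value
   should only approximately match the reported P = 0.032.

```python
import math

def se_and_p_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover the log-OR SE, Wald z, and two-sided P from an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# IVW estimate for OA -> EC as reported in Table 3
se, z, p = se_and_p_from_ci(1.104, 1.008, 1.209)
print(f"SE = {se:.4f}, z = {z:.2f}, P = {p:.3f}")
```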

Table 3.

   Results of MR regression causal association between OA and EC.
 Exposure Outcome Nsnp OR (95% CI)         MR P  I^2 (%) Cochran's Q Heterogeneity P Egger intercept SE    Pleiotropy P MR-PRESSO P
 OA       EC      92   1.104 (1.008–1.209) 0.032 18      110.61      0.079           -               -     -            0.082
 OA       EC      92   1.136 (0.815–1.585) 0.454 19      110.572     0.07            -0.001          0.008 0.861        -
 OA       EC      92   1.091 (0.966–1.231) 0.16  -       -           -               -               -     -            -
 OA       EC      92   1.019 (0.740–1.402) 0.91  -       -           -               -               -     -            -
 OA       EC      92   1.038 (0.774–1.394) 0.803 -       -           -               -               -     -            -

Fig. 2.


   Two-sample MR analysis of OA and EC. (A) Scatter plot (B)
   ‘Leave-one-out’ sensitivity analysis (C) Funnel plot.

Differential gene acquisition and functional enrichment analysis of EC and OA

   Transcriptome expression data from the TCGA-UCEC dataset were analyzed
   for expression differences using the limma package. First, stage I EC
   was compared with normal tissue using thresholds of |log2FC| > 0.5 and
   P < 0.05, yielding 9,399 differential genes (Fig. 3A). Next, stage I EC
   tissue was compared with stage II and III EC tissue, identifying 2,607
   differential genes (Fig. 3B). Intersecting the two comparisons and
   retaining genes whose expression changed in opposite directions yielded
   661 genes (Fig. 3D). Transcriptome expression data from the GSE12021
   dataset were also analyzed using the limma package, with adjusted
   thresholds of |log2FC| > 0.5 and P < 0.05, yielding 627 differentially
   expressed genes (Fig. 3C). Among these, seven genes with consistent
   expression differences across both datasets were identified (Fig. 3E).

Fig. 3.


   Differential genes for EC and OA. (A) Differentially expressed genes in
   stage I EC tissues versus normal endometrial tissues. (B)
   Differentially expressed genes in stage I EC tissues versus stage II
   and III EC tissues. (C) Differentially expressed genes in OA tissues
   versus normal tissues. (D) Genes differentially expressed in opposite
   directions in (A) and (B). (E) Genes with the same direction of
   differential expression in (C) and (D). (F) GO enrichment analysis.

   GO enrichment analysis revealed that the differential genes were
   significantly enriched in terms related to heterochromatin, response to
   ethanol, response to alcohol, and aging. Response to ethanol, response
   to alcohol, and aging are biological process (BP) terms, while
   heterochromatin is a cellular component (CC) term (Fig. 3F). KEGG
   enrichment analysis did not identify any significant pathways.

Causal association analysis of differential gene eQTLs with EC, and
multiple hypothesis testing correction

   After the screening criteria were applied, eQTL data were extracted
   from the IEU Open GWAS project for six of the seven differential genes
   (Table 4). A total of 130 cis-eQTLs for these genes were identified
   (Supplementary Table 3). IVW results indicated that four genes were
   associated with an increased risk of EC: CDKN2A (OR = 1.546, 95% CI:
   1.298–1.842, P < 0.001), DDA1 (OR = 1.111, 95% CI: 1.021–1.208, P =
   0.014), LRRC42 (OR = 1.112, 95% CI: 1.026–1.206, P = 0.01), and POLB
   (OR = 1.072, 95% CI: 1.004–1.145, P = 0.038) (Fig. 4). There was no
   significant heterogeneity or pleiotropy among the instrumental
   variables for CDKN2A, DDA1, and LRRC42 (Figs. S1–S3). Although POLB
   appeared to be positively associated with EC, its instrumental
   variables may be pleiotropic (MR-Egger intercept P = 0.044), so its
   causality requires further assessment (Fig. S4). The Steiger
   directionality test returned 'TRUE' for all four differential genes,
   indicating that the causal relationship between these genes and the
   outcome was in the expected direction (Table 5). These four genes may
   be common causative factors for both OA and EC, participating in and
   mediating the development of EC, although their specific mechanisms
   require further investigation. In addition, the MR P-values were
   corrected using FDR, and all corrected P-values were below 0.05 (Table
   S4).

Table 4.

   Summary information on eQTLs and GWAS databases in the MR study.
 Datasource                                     Phenotype Sample size Cases  Population Adjustment
 IEU Open GWAS project (eqtl-a-ENSG00000070501) POLB      31,684      -      European   Males and Females
 IEU Open GWAS project (eqtl-a-ENSG00000116212) LRRC42    31,684      -      European   Males and Females
 IEU Open GWAS project (eqtl-a-ENSG00000119772) DNMT3A    31,684      -      European   Males and Females
 IEU Open GWAS project (eqtl-a-ENSG00000147889) CDKN2A    31,684      -      European   Males and Females
 IEU Open GWAS project (eqtl-a-ENSG00000182512) GLRX5     31,684      -      European   Males and Females
 IEU Open GWAS project (eqtl-a-ENSG00000130311) DDA1      14,263      -      European   Males and Females
 IEU Open GWAS project (ebi-a-GCST006464)        EC        121,885     12,906 European   -

Fig. 4.


   Forest plot of the results of causal association analysis between
   differential gene eQTLs and EC.

Table 5.

   Results of causal association analysis of differential gene eQTLs with
   EC.
 Exposure Outcome Nsnp OR (95% CI)         MR P    I^2 (%) Cochran's Q Heterogeneity P Egger intercept SE    Pleiotropy P MR-PRESSO P Steiger direction
 CDKN2A   EC      6    1.546 (1.298–1.842) < 0.001 12      5.675       0.339           -               -     -            0.391       TRUE
 CDKN2A   EC      6    1.322 (0.654–2.673) 0.48    26      5.399       0.249           0.014           0.032 0.675        -           -
 CDKN2A   EC      6    1.608 (1.276–2.025) < 0.001 -       -           -               -               -     -            -           -
 CDKN2A   EC      6    1.678 (1.142–2.465) 0.046   -       -           -               -               -     -            -           -
 CDKN2A   EC      6    1.649 (1.137–2.392) 0.046   -       -           -               -               -     -            -           -
 DDA1     EC      19   1.111 (1.021–1.208) 0.014   0       16.632      0.549           -               -     -            0.607       TRUE
 DDA1     EC      19   1.306 (1.042–1.635) 0.033   0       14.333      0.643           -0.02           0.013 0.148        -           -
 DDA1     EC      19   1.093 (0.970–1.230) 0.144   -       -           -               -               -     -            -           -
 DDA1     EC      19   1.112 (0.909–1.359) 0.315   -       -           -               -               -     -            -           -
 DDA1     EC      19   1.103 (0.940–1.293) 0.245   -       -           -               -               -     -            -           -
 LRRC42   EC      18   1.112 (1.026–1.206) 0.01    0       16.78       0.469           -               -     -            0.491       TRUE
 LRRC42   EC      18   1.165 (0.993–1.367) 0.079   2       16.331      0.43            -0.007          0.01  0.516        -           -
 LRRC42   EC      18   1.100 (0.971–1.245) 0.133   -       -           -               -               -     -            -           -
 LRRC42   EC      18   1.045 (0.846–1.290) 0.69    -       -           -               -               -     -            -           -
 LRRC42   EC      18   1.076 (0.927–1.249) 0.349   -       -           -               -               -     -            -           -
 POLB     EC      18   1.072 (1.004–1.145) 0.038   0       13.629      0.693           -               -     -            0.744       TRUE
 POLB     EC      18   0.967 (0.864–1.084) 0.574   0       8.845       0.92            0.023           0.01  0.044        -           -
 POLB     EC      18   1.045 (0.953–1.146) 0.349   -       -           -               -               -     -            -           -
 POLB     EC      18   1.046 (0.923–1.185) 0.494   -       -           -               -               -     -            -           -
 POLB     EC      18   1.048 (0.954–1.152) 0.339   -       -           -               -               -     -            -           -

Construction and evaluation of EC prediction models

   Based on the differential expression and MR results, four genes with a
   consistent causal direction and significant MR results were identified.
   These genes were screened using both the LASSO (Fig. 5A) and RF
   (Fig. 5B) models, and the intersection of the two methods retained all
   four genes as having diagnostic value (Fig. 5C). To enhance diagnostic
   and predictive performance for EC, a nomogram was constructed using
   logistic regression based on these four genes (Fig. 6A). The ROC curves
   showed AUC values greater than 0.96 for both the training (Fig. 6B) and
   validation (Fig. 6C) sets, suggesting strong diagnostic efficacy. The
   calibration curves demonstrated that the predicted probabilities of the
   nomogram closely matched the ideal model (Fig. 6D, E). Additionally,
   DCA curves indicated that decision-making based on the nomogram may
   improve EC diagnosis (Fig. 6F, G).

Fig. 5.


   Screening of genes with diagnostic value using multiple machine
   learning algorithms. (A) Construction of an EC prediction model using
   LASSO. (B) Construction of an EC prediction model using RF. (C)
   Intersection of the genes selected by models A and B.

Fig. 6.


   Construction of the prediction model. (A) Nomogram for the 4 genes with
   diagnostic value. (B) ROC curves for the 4 genes with diagnostic value
   in the training set. (C) ROC curves for the 4 genes with diagnostic
   value in the testing set. (D) Calibration curve of the nomogram
   prediction model in the training set. (E) Calibration curve of the
   nomogram prediction model in the testing set. (F) DCA curve of the
   nomogram prediction model in the training set. (G) DCA curve of the
   nomogram prediction model in the testing set.

Correlation analysis of genes of diagnostic value with immune cell
infiltration

   Analysis of immune cell infiltration in EC revealed that expression of
   the key gene CDKN2A was significantly positively correlated with
   activated NK cells (P < 0.001) and negatively correlated with activated
   memory CD4 T cells (P < 0.001). LRRC42 was significantly positively
   correlated with M1 macrophages, M2 macrophages, and activated dendritic
   cells (P < 0.001). DDA1 expression was negatively correlated with
   resting memory CD4 T cells (P < 0.001) but positively correlated with
   activated NK cells (P < 0.01). POLB expression was significantly
   positively correlated with follicular helper T cells and memory B cells
   (P < 0.01) (Fig. 7).

Fig. 7.


   Correlation analysis of immune cell infiltration (ns, p ≥ 0.05; *, p <
   0.05; **, p < 0.01; ***, p < 0.001).

Discussion

   In this study, we explored the potential causal relationship between OA
   and EC and investigated potential drug target genes for EC. MR analysis
   revealed a small effect size (OR = 1.104), which is nevertheless
   important from an epidemiological perspective. First, the prevalence of
   OA is high among middle-aged and older women, so a risk increase of
   about 10% may carry a substantial disease burden at the population
   level. Second, MR analysis assesses the effects of 'lifetime exposure'
   at the genetic level, so its effect estimates are generally more
   conservative than those for environmental exposures, but the inferred
   causal relationships are more robust. In addition, this study
   identified shared causative genes such as CDKN2A, which reinforces the
   biological plausibility of this risk association. Therefore, although
   OA is only a mild risk factor, it has potential clinical translational
   value in disease prediction, mechanistic research, and the management of
   high-risk populations, and it can be considered a meaningful risk
   factor for EC. This study also identified seven common causative genes
   for OA and early EC (CDKN2A, DDA1, LRRC42, POLB, ADCYAP1R1, DNMT3A, and
   GLRX5). Of these, CDKN2A, DDA1, LRRC42, and POLB are potential drug
   targets for EC with diagnostic value.

   OA is recognized as a chronic systemic inflammatory disease, often
   associated with increased serum levels of various inflammatory markers,
   including interleukins, adipokines, chemokines, and tumor necrosis
   factor-alpha (TNF-α). Local and systemic inflammatory responses
   contribute to the destruction and remodeling of articular cartilage,
   facilitating OA development^7. Monokine induced by gamma interferon
   (MIG/CXCL9) is a high-risk factor for EC development, as a chronic
   inflammatory environment can promote tumor transformation. Alterations
   in inflammatory markers in OA patients have been strongly linked to the
   development of cancers such as breast, lung, ovarian, and bladder
   cancers^8,11,25. This study clarified the causal relationship between OA
   and EC from a genetic perspective through MR analysis, indicating that
   OA is a significant risk factor for EC. Further research is needed to
   determine whether female patients with OA should be routinely screened
   for EC.

   CDKN2A is a cyclin-dependent kinase inhibitor that plays an important
   role in cell cycle regulation. Its expression products include two
   proteins: p16INK4a and p14ARF. CDKN2A has been shown to influence
   colorectal tumour development and progression by regulating copper ion
   concentration^26, and copper deficiency may compromise cartilage
   integrity and increase the prevalence of OA^27. An MR study showed that
   physiologically higher circulating copper levels were positively
   associated with the risk of developing OA^28. Other studies have shown
   that abnormal copper metabolism also influences the development of many
   gynaecological tumours, including EC. Copper may contribute to cell
   proliferation and neoplasia by affecting physiological processes such
   as mitochondrial respiration, redox balance, autophagy, and antioxidant
   defences. Copper also promotes the growth and migration of vascular
   endothelial cells by regulating the synthesis and secretion of major
   pro-angiogenic mediators, and it may participate in tumour spread
   together with its binding proteins^29,30. In addition, senescent cells
   accumulate in the body with age. When joint damage occurs, these
   senescent cells induce CDKN2A expression, which promotes the expression
   of the downstream protein p16INK4a, leading to synovial inflammation
   and accelerating the progression of OA^31. Meanwhile, methylation of
   this gene leads to loss of p16INK4a function and promotes the onset and
   progression of EC^32. CDKN2A is involved in cell mitosis and regulates
   the cell cycle, and its high expression correlates with poor cancer
   outcomes: tumors with elevated CDKN2A expression tend to be more
   aggressive and have shorter recurrence intervals^33. This study suggests
   that CDKN2A is a common causative gene for OA and EC. It may promote OA
   by affecting cellular copper metabolism, while interfering with the
   normal cell cycle and promoting the onset and progression of EC.
   Therefore, we suggest that CDKN2A may be a potential target for
   pharmacological intervention in OA and EC and has diagnostic value for
   early EC and OA.

   DDA1 is a DNA damage repair-related gene that can form a complex with
   DET1 and DDB1, which functions in conjunction with
   ubiquitin-conjugating enzyme E2^[130]34. DDA1 is also a core subunit
   of the E3 ubiquitin ligase complex and can control
   transcription-coupled repair by regulating ubiquitination
   activity^[131]35. It has been shown that E3 ubiquitin ligases are
   involved in mitochondrial dysfunction, inflammasome activation,
   and matrix-degrading enzyme overexpression in the development of
   OA^[132]36, and curcumin reduces inflammation and oxidative stress in
   OA by regulating E3 ubiquitin ligase function^[133]37. Deubiquitinating
   enzymes can affect chondrocyte function in OA patients by regulating
   the NF-κB signalling pathway and the Wnt/β-catenin signalling
   pathway^[134]38. In addition, overexpression of DDA1 is closely
   associated with the development and progression of a variety of
   malignant tumours. DDA1 overexpression promotes lung tumour cell
   proliferation^[135]34. In breast cancer tissues, DDA1 is a target of
   STAT3, and its overexpression favours cancer cell proliferation,
   metastasis and invasion; dihydroartemisinin inhibits proliferation and
   induces apoptosis in cisplatin-resistant breast cancer cells by
   regulating the STAT3/DDA1 signalling pathway^[136]39. In colon cancer
   tissues, DDA1 is activated, potentially promoting colon carcinogenesis
   through the NF-κB/CSN2/GSK3β signalling pathway^[137]40.
   Ubiquitin-conjugating enzyme E2 is strongly associated with EC
   progression, and it may have a role in promoting tumour metastasis and
   invasion^[138]41. DDA1 was identified here as a common pathogenic gene
   for OA and EC, potentially acting through the regulation of
   ubiquitinating enzymes, which in turn affects the initiation and
   progression of both OA and early EC. Therefore, DDA1 may be a
   biomarker for predicting EC, and dihydroartemisinin may inhibit the
   progression of EC by regulating DDA1, although the specific mechanism
   requires further exploration.

   LRRC42 is a member of the LRR superfamily and encodes a protein
   characterised by leucine-rich repeat sequences. LRR proteins have
   multiple functions, including apoptosis, nuclear mRNA transport, cell
   adhesion, neuronal development, and immune response^[139]42. In
   addition, LRR superfamily expression is upregulated in many types of
   cancer. It has been shown that LRRC59 overexpression is associated with
   poor prognosis in lung cancer and promotes the proliferation and
   metastasis of lung cancer cells^[140]43. LRRC15 plays an important role
   in ovarian cancer metastasis by promoting adhesion to, colonisation of,
   and invasion into the omentum, suggesting that LRRC15-targeted therapy
   can inhibit ovarian cancer progression to a certain extent^[141]44.
   However, there are relatively few reports on the role of LRRC42 in
   cancer progression. One study showed that LRRC42 was overexpressed in
   lung cancer cells and that down-regulating LRRC42 inhibited their
   growth, suggesting that LRRC42 may be an important growth-promoting
   factor in lung cancer^[142]45. Another study showed that LRRC42
   expression was increased in hepatocellular carcinoma cell lines and
   that knocking out the gene significantly inhibited cell
   proliferation^[143]42. We therefore speculate that the occurrence of
   early EC may also be associated with LRRC42 overexpression. Although
   the relationship between LRRC42 and OA has not been clearly reported,
   LRRC39, a member of the same family, has been shown to be specifically
   and highly expressed in skeletal muscle^[144]46, suggesting that LRRC42
   may also be involved in the onset and development of OA.
   Inflammatory infiltration of the intra-articular microenvironment in
   patients with OA is dominated by macrophages and has been implicated
   as a cause of OA^[145]47,[146]48. Our study showed that LRRC42 is a
   common pathogenic gene in OA and EC and that its expression is
   significantly and positively correlated with immune cell infiltration,
   especially with activated M1 and M2 macrophages and dendritic cells.
   In EC, these cells can suppress the immune response by secreting
   anti-inflammatory factors and accelerate tumour progression by
   secreting pro-angiogenic factors that promote angiogenesis^[147]49. In
   summary, LRRC42 may mediate the onset and progression of early EC and
   OA through M1 and M2 macrophages, making it a potential target for
   pharmacological intervention and offering new insights into the
   diagnosis and treatment of both diseases.

   POLB is a DNA repair polymerase involved in base excision repair,
   recombination and drug resistance. POLB is often overexpressed in
   colorectal cancer^[148]50, and its overexpression leads to cisplatin
   resistance and poorer prognosis in colorectal cancer patients^[149]51.
   In gastric cancer, POLB overexpression stimulates tumour proliferation
   and promotes invasion and metastasis^[150]52. Another study reinforces
   the important role of POLB in tumour development^[151]53. In acute
   lymphoblastic leukaemia, tumour cells overexpressing POLB are more
   drug-resistant, and the POLB inhibitor oleanolic acid increases the
   sensitivity of resistant cells to thiopurines^[152]54. Additionally,
   the dRP lyase-deficient POLB variant (Leu22Pro, L22P) increases
   genomic instability associated with mitotic dysfunction, leading to
   cytoplasmic DNA-mediated inflammatory responses, and inhibition of
   poly(ADP-ribose) polymerase 1 (PARP1) exacerbates chromosomal
   instability and enhances these responses^[153]55. This is relevant
   because OA pathogenesis is primarily associated with a local
   inflammatory response^[154]8. Overall, POLB may be linked to the
   development of early EC and OA, as supported by the MR results of this
   study, although its exact pathogenic mechanism requires further
   investigation.

   Although our MR analyses support a causal relationship between OA and
   EC, several limitations warrant consideration. First, the study
   population was limited to individuals of European ancestry, so the
   findings should be extrapolated with caution to populations with
   different genetic backgrounds and environmental exposures (for
   instance, Asian populations). Second, the data were derived from
   public databases with relatively small sample sizes and no long-term
   follow-up. Finally, the effect of potential pleiotropy cannot be
   completely excluded in MR studies, and thorough experimental
   validation is essential before any clinical application. Future
   studies should repeat the validation in multi-ethnic cohorts to test
   the robustness and generalisability of these findings.

Conclusion

   OA is a significant risk factor for the development of EC and may
   influence its progression. The CDKN2A, DDA1, LRRC42, and POLB genes
   could be common causative factors for both early EC and OA. This
   suggests that OA and EC may share a common genetic susceptibility in
   certain populations. Validation of this finding in a large prospective
   cohort could provide a basis for clinical development of screening
   strategies for high-risk populations; for example, regular endometrial
   cancer screening could be considered for female OA patients carrying
   high-risk variants. In addition, common causative genes such as CDKN2A
   and DDA1 may become shared drug intervention targets, and subsequent
   studies could combine functional experiments and drug databases to
   explore the druggability of these genes and their potential in
   combination therapy.

Electronic supplementary material

   Below is the link to the electronic supplementary material.
   [155]Supplementary Material 1 (41.1KB, docx)
   [156]Supplementary Material 2 (66.3KB, xlsx)
   [157]Supplementary Material 3 (2MB, docx)

Abbreviations

   EC
          Endometrial cancer

   OA
          Osteoarthritis

   MR
          Mendelian randomization

   log2FC
          Log2 fold change

   GEO
          Gene Expression Omnibus

   GWAS
          Genome-wide association studies

   IVW
          Inverse-variance weighted

   eQTLs
          Expression quantitative trait loci

Author contributions

   Yy.B., D.L. and S.L. conceived the project. Yy.B., S.L., Rz.S., Lw.Y.
   and Xm.Z. wrote the manuscript. Yy.B., S.L. and Lw.Y. performed the
   computational analysis. All authors read and approved the final
   manuscript.

Funding

   This study was supported by the Yinchuan Science and Technology
   Tackling Project (2023SFZD02) and Special Funds for the Central
   Government to Guide Local Scientific and Technological Development
   (2024FRD05067). The funders had no roles in study design, data
   collection and analysis, publication decision, or manuscript
   preparation.

Data availability

   The data analysed in this study were obtained from the GEO
   ([158]http://www.ncbi.nlm.nih.gov/geo/) (accession number:
   [159]GSE12021) and TCGA ([160]https://portal.gdc.cancer.gov/)
   databases (accession number: TCGA-UCEC) and from the IEU OpenGWAS
   project ([161]https://gwas.mrcieu.ac.uk/) (accession numbers: OA:
   ebi-a-GCST007092; endometrial cancer: ebi-a-GCST006464; 7 genes:
   eqtl-a-ENSG00000070501, eqtl-a-ENSG00000116212,
   eqtl-a-ENSG00000119772, eqtl-a-ENSG00000147889, eqtl-a-ENSG00000182512,
   eqtl-a-ENSG00000130311).

Declarations

Competing interests

   The authors declare no competing interests.

Ethics approval and consent to participate

   The data used in this study were obtained from publicly available GWAS
   databases of European populations. Informed consent had been obtained
   from participants in the original studies; therefore, additional
   ethics committee approval was not required for this study.

Footnotes

   Publisher’s note

   Springer Nature remains neutral with regard to jurisdictional claims in
   published maps and institutional affiliations.

References