Abstract Knee osteoarthritis (KOA) is a prevalent degenerative joint disorder, yet its underlying molecular mechanisms remain puzzling. This study aimed to uncover the genes with a causal relationship to KOA using Mendelian randomization (MR), transcriptomic profiling, and machine learning methods. MR analysis was conducted utilizing expression quantitative trait loci (eQTL) data from the eQTLGen consortium alongside KOA-related GWAS summary statistics to identify candidate genes. Subsequently, differential expression analysis and WGCNA were applied to synovial tissue microarray datasets obtained from the GEO database. The intersecting genes were further refined using three machine learning algorithms: LASSO, random forest, and SVM–RFE. Diagnostic efficacy was assessed via ROC curve analysis and nomogram construction. Validation was ultimately performed using qPCR on clinical synovial tissue samples. Twelve genes with putative causal associations to KOA were identified, with MEG3 and MAPK3 emerging as the most diagnostically robust. Both exhibited high sensitivity and specificity in ROC analysis, and their differential expression was corroborated by qPCR. This study underscores the diagnostic utility of MEG3 and MAPK3 in KOA and offers a promising molecular framework for early disease detection. Nonetheless, validation in larger, independent cohorts and further mechanistic investigations are warranted to substantiate these findings. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-025-06175-7. Keywords: Knee osteoarthritis, MEG3, MAPK3, Mendelian randomization, EQTL Subject terms: Computational biology and bioinformatics, Diseases Introduction Knee osteoarthritis (KOA) is a common age-related degenerative joint disease frequently observed among the elderly with rising global incidence and significant impact on mobility and quality of life^[34]1,[35]2. Aberrant inflammatory responses during synovitis initiate and exacerbate KOA by promoting cartilage degradation and disrupting the subchondral bone microenvironment, a process regarded as the primary trigger of the pathogenic cascade^[36]3. However, the exact pathophysiology of KOA remains incompletely understood, and no current treatment can fully cure the disease or halt its progression^[37]4. Thus, early identification of sensitive and specific diagnostic markers is crucial for effective intervention^[38]5. Mendelian randomization (MR) is a study design that utilizes genetic variants (SNPs) that are robustly associated with the exposures of interest to generate more reliable evidence. The randomization of SNP-based assignment at conception reduces confounding bias and lessens the impact of reverse causation, hence improving the reliability of causal findings^[39]6,[40]7. Transcriptome analysis based on data from the GEO database has been extensively applied in biomedical research^[41]8. This study identified key KOA-related genes by integrating Mendelian randomization (MR) with transcriptomic analyses and multiple machine learning algorithms. Publicly available GWAS and eQTL data were used to screen genes with causal associations via MR, Meanwhile, synovial transcriptome datasets obtained from GEO were subjected to differential expression analysis (DEA) and weighted gene co-expression network analysis (WGCNA) to uncover modules associated with disease pathology. The intersection of MR and transcriptomic results yielded core candidate genes, which were further prioritized through LASSO, Random Forest, and SVM. MEG3 and MAPK3 were ultimately identified as principal diagnostic targets. Their diagnostic performance was evaluated through ROC and nomogram models, and their expression was validated in synovial tissues via qPCR. This study evaluates MEG3 and MAPK3 PGRN as possible biomarkers for diagnosing KOA (Fig. [42]1). Fig. 1. [43]Fig. 1 [44]Open in a new tab Flow chart for this study. Materials and methods Data source and data processing Exposure factors were extracted using eQTL from the eQTLGen consortium, comprising 31,684 blood alongside peripheral blood mononuclear cell samples representing 19,942 genes^[45]9. In the variable screening process, the cis region was delineated as the area extending 10,000 kb both upstream and downstream of the gene coding sequence. Cis-eQTLs with P < 5 × 10⁻⁸ were identified as suitable variables for bidirectional MR study. Only SNPs having an effect allele frequency > 0.01 and an F-statistic > 10 were considered. Furthermore, the threshold of R² < 0.001 was utilized for linkage disequilibrium clustering, resulting in the exclusion of palindromic SNPs. The outcome can be downloaded from GWAS summary data ([46]https://gwas.mrcieu.ac.uk/), using “knee osteoarthritis” as the keyword to obtain the OA dataset (GWAS ID: ebi-a-GCST007090), which encompasses 403,124 samples: 24,955 patients and 378,169 controls. Transcriptome data were obtained from multiple human microarray datasets pertaining to KOA in the GEO collection, [47]GSE12021-[48]GPL96, [49]GSE1919-[50]GPL91, [51]GSE206848-[52]GPL570, [53]GSE55235-[54]GPL96, and [55]GSE55457-[56]GPL96. The initial three datasets, encompassing 43 samples: 21 healthy control (HC) and 22 KOA samples, served as the training set, whilst the latter two datasets, consisting of 40 samples (20 HC and 20 KOA samples), functioned as the validation set (Supplementary Table S1). The statistical analyses were done in R 4.3.2. The limma (version 3.62.2) and SVA (version 3.54.0) packages were utilized to mitigate batch effects that existed between datasets from various platforms. The sample data were normalized and integrated, followed by visualization and analysis by principal component analysis (PCA)^[57]10,[58]11. MR analysis This portion of the research was executed using publicly accessible GWAS pooled statistics and eQTL data, obviating the necessity for ethical approval. Initially, eligible SNPs were identified as IVs^[59]12adhering to three fundamental assumptions: (1) the correlation assumption, indicating that SNPs were significantly related to the exposure variable (eQTL data); (2) the independence assumption, ensuring that the IVs were not connected to potential confounders associated with KOA; and (3) the exclusion assumption, stipulating that the IVs were solely linked to KOA through the eQTL and did not influence the outcome through alternative pathways. The TwoSampleMR (version R 0.6.15) package was utilized to harmonize the IVs with the outcome data, subsequently applying five methodologies to evaluate causal effects: Inverse Variance Weighting (IVW), Weighted Median, MR Egger, Weighted Mode, and Simple Mode^[60]13. The Cochrane Q test assessed heterogeneity among the various SNP techniques. The Wald ratio approach was employed to ascertain the effect size of each SNP on both exposure and outcome. The sensitivity study utilized the MR Egger regression to measure pleiotropy risk and the leave-one-out approach to assess the influence of particular SNPs on the overall findings. Differential expression gene screening and WGCNA To determine differentially expressed genes (DEGs) between HC and OA samples, the limma package was deployed with a threshold of P < 0.05. The resultant DEGs were depicted using the function heatmaps of the pheatmap (version 1.0.12) package. The WGCNA (version 1.73) package generated gene co-expression networks for HC and OA samples^[61]14. Gene subsets exhibiting standard deviations over 0.5 were initially chosen for subsequent study. The WGCNA function goodSampleGenes was employed to validate the dataset, confirming no missing values or anomalies. The ideal soft threshold was subsequently identified using the WGCNA function pickSoftThreshold. Utilizing this threshold, we transformed the gene expression matrix into a comparable adjacency matrix. Gene modules were discovered by topological overlap-based clustering methods, and a hierarchical clustering dendrogram was created by calculating module eigengenes and consolidating comparable modules. Ultimately, phenotypic data (KOA vs. control) were incorporated to calculate gene significance (GS), defined as the absolute value of the Pearson correlation between gene expression and phenotype. Module significance (MS) was then determined as the average GS of all genes within each module. This analysis revealed the potential clinical relevance of key modules and their association with disease status. Screening of core genes The MR findings were filtered according to these criteria: (1) P < 0.05 for the IVW technique; (2) consistent odds ratio (OR) directions across all five methods, with all ORs either > 1 or < 1; and (3) p > 0.05 for heterogeneity in multiplicity analysis. Genes that satisfied the criteria of P < 0.05 and |logFC| > 0.585 in the DEA were classified as DEGs. Modules linked to disease, with P < 0.05 in the WGCNA, were classified as important modules. The VennDiagram (version 1.7.3) package was employed to conduct an intersection analysis of genes found in the MR, DEGs, and principal modular genes from the WGCNA^[62]15. The overlapping genes identified were designated as core genes. GO and KEGG enrichment analyses and chromosomal mapping The core genes underwent GO and KEGG enrichment assessment using the clusterProfiler (version 4.16.0) and enrichplot (version 1.28.2) packages^[63]16with significance levels established at P < 0.05 and q < 0.05. The GO analysis categorized the genes into three principal domains: biological processes (BP), cellular components (CC), and molecular functions (MF). Enrichment analysis of KEGG pathways revealed key signaling routes, offering critical insights into the biological relevance of the genomic data^[64]17. The enrichment outcomes were illustrated according to their statistical significance, including p-values. The circlize (version 0.4.16) package was utilized to map the genomic positions of the key genes, illustrating their chromosomal distribution in a visually relevant style. Immune cell infiltration profiling and associated correlation analysis In this study, the CIBERSORT, an analytical tool from the Alizadeh Lab and Newman Lab to estimate the relative abundance of various immune cell types from tissue gene expression profiles, was employed to assess immune cell infiltration in KOA compared to the control group. We determined the composition and relative abundance of 22 immune cell types in HC and OA samples, employing support vector regression for deconvolution analysis^[65]18. The outcomes were depicted using a heatmap produced by the heatmap function. The rank-sum test was utilized to evaluate variations in ICI between both groups, with a significance threshold established at P < 0.05. Additionally, Spearman correlation analysis was performed to investigate the interconnections between core genes and ICI. All results were visually displayed utilizing the ggplot2 (version 3.5.2) package, offering extensive perspectives into the interaction between immune cells and key genes. Screening of disease-related target genes Target genes were identified through three distinct machine learning approaches: LASSO regression, Random Forest (RF), and Support Vector Machine–Recursive Feature Elimination (SVM-RFE). The LASSO regression, implemented with the glmnet (version 4.1–8) package, applies L1 regularization to select relevant features and reduce model complexity, effectively preventing overfitting and improving generalizability in high-dimensional data^[66]19. The RF algorithm, run via the randomForest (version 4.7–1.2) package, employs ensemble learning to generate multiple decision trees from random data and feature subsets, enhancing model robustness and identifying genes with strong predictive importance^[67]20. SVM-RFE, executed with the mRFE (version 0.0.0.9000) package, is a supervised method that iteratively removes less informative features based on their contribution to class separation, providing accurate gene selection in high-dimensional datasets^[68]21. Results from the three computational models were integrated to pinpoint reliable biomarkers for KOA diagnosis. Development of ROC curves and nomogram models for predicting clinical outcomes The diagnostic efficacy of marker genes for KOA was assessed by constructing an ROC curve utilizing the pROC (version 1.18.5) package. The area under the curve (AUC) was calculated to assess diagnostic accuracy, with an AUC value greater than 0.7 indicating strong predictive performance. A nomogram risk prognostic model was built utilizing the rms (version 8.0–0) package to enhance the prediction of KOA risk. This model calculates a cumulative score derived from the contributions of specific marker genes, with elevated scores indicating a greater likelihood of developing KOA. The model’s prediction accuracy was assessed by generating a calibration curve with the regplot (version 1.1) package. A calibration curve slope approaching 1 indicates enhanced prediction accuracy and congruence between anticipated and actual results. qPCR validation The synovial sample collection strategy for HC and KOA was evaluated by the Ethics Committee of the Second Affiliated Hospital of Anhui Medical University (Ethics No. YX2022-104) and followed the applicable ethical principles and standards. The HC synovial samples were collected from three patients who had amputations resulting from automotive accidents, while KOA synovial samples were acquired from three patients with severe KOA who received surgical intervention. Immediately following specimen collection, the specimens were immersed in an RNA-preserving solution, thereby extracting RNA with a Trizol extraction kit (Invitrogen, Carlsbad, CA). The RNA was subjected to reverse transcription to cDNA by a reverse transcription kit (Thermo Scientific, Massachusetts, USA). The cDNA amplification via polymerase chain reaction was conducted with a qRT-PCR kit (GeneCopoeia, Guangzhou, China). The qPCR reaction utilized the SYBR Green detection technique, designating GAPDH as the internal reference gene. The ΔΔCt method, a relative quantitative tool, was deployed to determine the relative miRNA expression across various groups. Each sample underwent three technical duplicates to ensure result reliability. The utilized primer sequences are as follows: MAPK3 (F: CTACACGCAGCTGCAGTACATC, R: GTGCGCTGACAGTAGGTTTG), MEG3(F: TACACCTCACGAGGGCACTA, R: CAGGGCTTAATGCCCAATGC). Statistical analysis Quantitative data were obtained from three different experiments and shown as the mean ± standard deviation (X ± SD). Statistical analysis was performed with GraphPad Prism 8.0. An unpaired t-test was used for pairwise comparisons, with P < 0.05 being statistically significant. The outcomes were graphically depicted using bar charts. Results Causal relation between gene expression and KOA In the investigation of gene causality in KOA, genome-wide SNPs were screened and assessed using eQTL datasets combined with MR. After eliminating linkage disequilibrium, 5,430 genes were identified, encompassing 26,152 strongly associated SNPs (F > 10) (Supplementary Table 1), Based on exposure data-derived instrumental variables (SNPs), corresponding outcome data were extracted from the GWAS meta-analysis for KOA, resulting in a dataset of 25,490 SNPs significantly associated with the disease (Supplementary Table 2), Subsequent analyses verified that these instrumental factors displayed no significant heterogeneity (Supplementary Table 3) or pleiotropy (Supplementary Table 4), After standardizing the directions of reference alleles, an MR was performed, and the associated ORs were computed. Utilizing predefined screening criteria, 307 genes were ultimately identified as being linked to the development of OA. This comprised 138 and 169 high- and low-risk genes, respectively (Supplementary Table 5). Variations in these gene expression levels exhibited a strong association with KOA susceptibility. The activation of high-risk genes was significantly correlated with a heightened probability of developing KOA, while the upregulation of low-risk genes was negatively correlated, suggesting a preventive effect against the condition. Principal component analysis and differential gene screening Before data normalization and merging, PCA indicated that the three datasets were independently distributed without overlapping (Fig. [69]2A). However, after integration, a notable overlap in the distributions of the datasets was observed (Fig. [70]2B), indicating that the merged dataset could serve as a cohesive unit for subsequent analyses. Fig. 2. [71]Fig. 2 [72]Open in a new tab Principal Component Analysis and Identification of Differentially Expressed Genes (DEGs). (A–B) PCA: Sample distribution before and after data integration. (C) Volcano plot: DEG distribution between knee osteoarthritis (KOA) and healthy control (HC) samples. (D) Heatmap: Expression profiles of identified DEGs in KOA and control samples. Subsequently, DEA was performed to compare expression profiles between the HC and OA groups, uncovering considerable disparities between both groups (Fig. [73]2C–D). Out of the 2,340 discovered DEGs, 1,151 were highly upregulated, and 1,189 were significantly downregulated. The batch effect correction results and final expression matrices are provided in supplementary tables S2–S8. Screening and identification of WGCNA key modules Here, WGCNA was used to explore the correlation between genes and phenotypic features (i.e., OA vs. healthy control status). The appropriate soft threshold β = 4 (with a scale-free fitting exponent R² = 0.9) was initially established via network topology analysis and thereafter employed to construct the gene co-expression network for the HC and OA samples (Fig. [74]3C–D). Gene clustering was executed utilizing the dynamic tree-cutting algorithm, identifying six discrete co-expression modules, each designated by a distinct hue (Fig. [75]3A). Subsequently, Pearson’s correlation coefficients and significance levels between the modules and the osteoarthritis case-control status were calculated, and the results were illustrated in heat maps (Fig. [76]3B). The brown (434 genes), green (362 genes), yellow (372 genes), and turquoise (1,807 genes) modules were ultimately recognized as critical modules, and the genes within these modules were chosen for further research. Fig. 3. [77]Fig. 3 [78]Open in a new tab WGCNA Results and Key Module. (A) Gene hierarchical clustering dendrogram: OA vs. HC samples, where each branch represents a gene, and the branch color corresponds to the co-expression module to which it belongs. (B) Heatmap: Pearson correlations and their significance levels between each co-expression module and clinical features. (C–D) Scale-free network fit indices and average connectivity curves for different soft-threshold powers determining the optimal soft threshold β = 4. (E) Venn Diagram: Overlap among MR-associated genes, DEGs, and key WGCNA module genes. Screening for intersection core genes Here, we performed an intersection analysis encompassing 307 MR genes, 2,340 DEGs, and 2,795 pivotal genes from the WGCNA, all of which are causally associated with KOA progression. The study identified 12 essential genes (Fig. [79]3E). The MR findings for the 12 principal genes were illustrated through a forest plot to assess the causal association between these core genes and KOA. Genes exhibiting a P < 0.05, determined by the IVW approach, were recognized as being linked to the development of KOA. Genes were stratified into high- and low-risk categories according to their odds ratios (ORs), as depicted in the forest plot. Genes with ORs < 1 (left of the dashed line) indicated a diminished risk of KOA, whereas genes with ORs > 1(right of the dashed line) indicated an elevated KOA risk. Subsequently, we categorized these 12 core genes into five high-risk genes and seven low-risk genes, all within 95% confidence ranges (Supplementary Figure S1). To elucidate the potential relations among these genes, we conducted a correlation study between high-risk and low-risk genes, visualizing their functional connections or co-regulation through heatmaps or circular plots (Supplementary Figure S2). Core gene annotation and pathway enrichment analysis GO and KEGG enrichment analyses indicated involvement of the core genes in osteogenesis, immune regulation, cellular differentiation, and stress response. Molecular function and cellular component analyses revealed associations with receptor activity, enzymatic regulation, and intracellular signaling structures. KEGG pathways further implicated inflammatory responses, metabolic processes, and endocrine signaling. These findings suggest potential mechanistic roles of the core genes in KOA pathogenesis. Full enrichment results and chromosomal distributions are provided in supplementary tables S9 and S10 and supplementary Figure S3. Analysis of the relation between ICI of KOA and core gene co-expression To investigate the complex interplay between ICI and core gene expression in KOA, we analyzed the composition and relative abundance of 22 immune cell types in both KOA and HC samples utilizing the CIBERSORT algorithm (Fig. [80]4A). Subsequently, we performed a differential analysis of ICI (Fig. [81]4B), which indicated that OA samples exhibited a substantial increase in the infiltration of resting mast cells (P = 0.001) and follicular helper T cells (Tfh) (P = 0.046). Conversely, HC samples demonstrated a significantly greater infiltration of resting killer cells (P = 0.049). To further elucidate the immunological relevance of key genes, we assessed their correlations with the infiltration levels of the 22 immune cell types across all samples (Fig. [82]4C). Fig. 4. [83]Fig. 4 [84]Open in a new tab Immune cell infiltration (ICI) assessment and visualization analysis. (A) Composition and abundance distribution of 22 immune cells in KOA vs. HC samples. (B) Fiddle diagram: Differential analysis of immune cells. (C) Correlation analysis: Core gene expression with ICI level in the samples. Screening and identification of target genes for disease characterization In this study, twelve core genes were subjected to feature selection using a trio of machine learning approaches—Random Forest (RF), LASSO regression, and Support Vector Machine (SVM)—to isolate the most robust target candidates. Feature genes were initially discovered using the RF method, leading to the selection of four genes with relative importance scores beyond 0.9 (Fig. [85]5A–B): ASNS, GABBR1, MEG3, and MAPK3. The LASSO regression approach was subsequently employed to identify gene combinations with minimal cross-validation errors across 10 iterations (Fig. [86]5C–D), resulting in eight significant genes: ASNS, GABBR1, MEG3, MAPK3, PHACTR2, ST6GAL1, RAI14, and TCN1. The SVM technique was subsequently employed with 10 rounds of cross-validation to ascertain gene combinations with minimal errors (Fig. [87]5E–F), ultimately identifying five genes: MEG3, MAPK3, TCN1, PHACTR2, and ST6GAL1. Ultimately, overlapping results from the three computational models (Fig. [88]5G) revealed two candidate target genes, MEG3 and MAPK3, which may serve as biomarkers for OA diagnosis and prognosis. Fig. 5. [89]Fig. 5 [90]Open in a new tab Feature gene screening and result visualization based on RF, LASSO, and SVM algorithms. (A) Importance ranking plot based on RF model (B) Plot of the number of trees in the RF model vs. classification error rate (C–D) Plots of ten-fold cross-validation curves and trajectories of their coefficients for the LASSO model, with vertical dashed lines denoting optimal lambda values. (E–F) Trend plots of accuracy vs. cross-validation error for SVM models for optimizing hyperparameter selection. (G) Venn diagram: Screening results of the three algorithms. Causal association between MEG3, MAPK3 and KOA Building upon insights from prior MR research, we further investigated and established the connection between MEG3, MAPK3, and KOA (Fig. [91]6A, E). Fig. 6. [92]Fig. 6 [93]Open in a new tab Results of MR between MEG3, MAPK3, and OA. (A, E) Scatter plots: Causal effects of MEG3 and MAPK3 on the risk of developing OA. (B, F) Forest plots: Specific effect estimates and their confidence intervals for each SNP on OA risk. (C, G) Funnel plots: Causal relationship between MEG3 and MAPK3 expression levels and OA, with a largely symmetrical distribution of scatter points indicating no significant bias in the results. (D, H) Sensitivity analysis plots: Individual SNPs removed one by one to verify the stability of the causal effects of MEG3 and MAPK3 on OA risk. The IVW analysis demonstrated a significant interplay between the expression of these two marker genes and KOA risk. The OR for MEG3 was 0.962 (95% CI: 0.932–0.993, P = 0.017), suggesting a possible protective impact. In contrast, MAPK3 demonstrated an OR of 1.094 (95% CI: 1.048–1.148, P < 0.001), indicating a heightened risk of OA. Furthermore, we examined the distribution of effect values of SNPs on OA (Fig. [94]6B, F), noting that the funnel plots exhibited nearly symmetrical distributions. Moreover, MR-Egger regression analysis revealed no significant horizontal pleiotropy bias (Fig. [95]6C, G), hence strengthening the robustness of the causal relationships. To verify reliability, we performed sensitivity analyses by systematically removing specific SNPs and reanalyzing the remaining datasets. The results consistently corresponded with the initial findings (Fig. [96]6D, H), suggesting that the identified causal connections were collectively influenced by all SNPs rather than being predominantly affected by a particular SNP. These data underscore the strength and importance of the causal link among MEG3, MAPK3, and KOA. Development and performance evaluation of clinical prediction models To assess the individual specificity and sensitivity performance of MEG3 and MAPK3 in KOA, ROC curve analysis was conducted separately for each gene. In the training set, MEG3 and MAPK3 attained AUC values of 0.768 and 0.784, correspondingly (Fig. [97]7A), In the validation set, AUC values of 0.745 for MEG3 and 0.713 for MAPK3 further substantiated their diagnostic capability (Fig. [98]7H). Fig. 7. [99]Fig. 7 [100]Open in a new tab Construction, evaluative analysis, and synovial tissue qPCR validation of ROC curves utilizing the nomogram clinical prognostic model. (A, H) ROC curve analysis (B, I) and Nomogram risk scoring models in the training and validation cohorts. Calibration curves for both sets evaluating the model’s prediction consistency. The DCA for both sets evaluating the clinical decision-making efficacy of the model. The CIC for both sets further validating the model for clinical applications. Boxplots: Differential expression of MEG3 and MAPK3 in the (F–G) training and (M–N) validation set samples. (O–P) RT-qPCR results: Molecular expression of MEG3 and MAPK3 in synovial tissues. Control and experimental groups (n = 3/group). Utilizing the training and validation datasets, we constructed a nomogram-based clinical prognostic model that integrates MEG3 and MAPK3 as marker genes to evaluate their diagnostic efficacy (Fig. [101]7B, I). The model exhibited an accuracy of 90% in forecasting KOA risk across both datasets. The calibration curves (Fig. [102]7C, J) demonstrated slopes near 1, signifying exceptional prediction accuracy. Additional validation via decision analysis curves (DAC) and clinical impact curves (CIC) substantiated the model’s reliability. In the DAC, a net gain value beyond 0 indicated outstanding predictive performance (Fig. [103]7D, K), but the congruence of the red solid line and blue dashed line in the CIC underscored significant clinical utility (Fig. [104]7E, L). Differential expression of MEG3 and MAPK3 was confirmed in both the training and validation datasets. In the training set, Box plot analysis indicated that MEG3 expression was significantly increased in HC samples (P = 0.0021), while MAPK3 expression was substantially boosted in KOA samples (P = 0.0011), (Fig. [105]7F, G). The validation set demonstrated consistent tendencies, with MEG3 (P = 0.0073) and MAPK3 (P = 0.021) exhibiting analogous expression patterns (Fig. [106]7M–N). Moreover, the expression patterns of MEG3 and MAPK3 were corroborated using qPCR assays, revealing a considerable increase in MEG3 expression in HC samples, while MAPK3 expression was markedly enhanced in KOA samples (Fig. [107]7O, P). aligning with the causal associations established by MR analysis. Discussion Knee osteoarthritis (KOA) is a multifactorial musculoskeletal disease marked by chronic inflammation and progressive joint deterioration, frequently resulting in impaired mobility or disability^[108]22. Synovial inflammation plays a key role in cartilage degradation and subchondral bone remodeling through immune cell infiltration (ICI), accelerating joint dysfunction and underscoring the importance of synovial involvement in KOA pathogenesis^[109]23–[110]25. Early diagnosis is often challenging due to subtle initial symptoms, while disease progression typically results in persistent pain, cartilage destruction, and, ultimately, joint replacement surgery^[111]24. Consequently, identifying particular biomarker targets is crucial for the prompt diagnosis and efficient management of KOA. Publicly available transcriptomic datasets from GEO have been extensively applied in KOA-related research. For example, Zhou et al. integrated GEO and HAGR datasets to pinpoint aging-related key genes, employing differential gene analysis, WGCNA, and SVM techniques^[112]26. Functional enrichment analyses indicated the implication of these genes in immune response, apoptosis, and inflammatory pathways. The findings indicated that synovial senescence may expedite KOA progression through immune-mediated inflammation. This study exclusively utilized SVM for gene screening. While SVM is proficient in classification and regression tasks, its susceptibility to overfitting and constraints in managing intricate feature interactions may result in the neglect of essential features. Jun Qin et al. consolidated GEO and HADb autophagy gene databases to ascertain DEARGs, emphasizing the significance of CDKN1A, DDIT3, MAP1LC3B, and MYC in KOA^[113]27. This study identified significant autophagy-related genes; however, its limited sample size and exclusive dependence on PPI network analysis, without multi-algorithm cross-validation, compromised the robustness and trustworthiness of the results. These instances highlight the necessity for rigorous techniques, cross-validation, and extensive datasets to improve the accuracy and relevance of biomarker studies in KOA. Here, we integrated Mendelian randomization, differential expression analysis, and WGCNA, and identified twelve core genes associated with KOA, including five potentially promoting disease progression and seven with presumed protective effects. Validation with the IVW technique identified MEG3 (OR: 0.962, 95% CI: 0.932–0.993, P = 0.017) and MAPK3 (OR: 1.094, 95% CI: 1.048–1.140, P < 0.001) as important core genes. Although the effect sizes were moderate, such odds ratios are frequently observed in complex diseases and may still reflect biologically meaningful associations.A subsequent correlation study demonstrated that MEG3 exhibits a positive association with RAI14, MAPK3, GABBR1, CSTA, PHACTR2, and ASNS while showing a negative correlation with RYR1 and CD79B. Likewise, MAPK3 had positive associations with PHACTR2, ASNS, and CD79B while demonstrating negative connections with RYR1 and TCN1. Both GO and KEGG enrichment analyses revealed substantial participation of these core genes in essential biological processes and pathways, including the immune system, signal transduction, neurological system, and endocrine system regulation. Enrichment was observed in pathways related to neurological diseases, amino acid metabolism, some malignancies, excretory systems, glycan production and metabolism, and endocrine and metabolic disorders. These findings emphasize the significance of MEG3 and MAPK3 as prospective biomarker targets in OA, illustrating their role in several biological processes and pathways essential to the pathogenesis and progression of the condition. Studies have demonstrated that immune-mediated inflammation plays a pivotal role in the onset and progression of osteoarthritis (OA). The infiltration of immune cells within the synovial tissue of OA patients triggers synovial inflammation and accelerates the degradation of articular cartilage, we investigated the relationship between ICI and significant genes in KOA, indicating a significant rise in the infiltration of resting mast cells (P = 0.001) and Tfh (P = 0.046) in KOA samples, While Tfh cells and mast cells have not been traditionally emphasized in OA pathogenesis, emerging studies suggest their potential involvement in inflammatory joint conditions. In rheumatoid arthritis (RA), Deng et al. reported that excessive differentiation of Tfh cells correlates with disease activity^[114]28. Zhang et al. demonstrated that Tfh cells can directly drive inflammation independent of B cells in inflammatory bowel disease models^[115]29​. Furthermore, Pan et al. reported increased proportions of Tfh cells and higher concentrations of cytokines (IL-21, BLyS) in the synovial fluid of RA patients relative to those with OA^[116]30. Although direct evidence in OA is limited, these studies suggest that Tfh cells might contribute to the inflammatory environment in joint diseases by supporting B cell activation and cytokine production Similarly, mast cells can release inflammatory mediators such as tryptase and growth factors, which promote chondrocyte apoptosis and synovial inflammation. Previous studies have shown increased mast cell markers and factors like FGF2 in OA synovium, particularly in obese patients, suggesting a potential role in cartilage degradation^[117]31,[118]32. A correlation study of immune cells revealed that these genes were strongly linked to several immune cell types. These findings indicate that these genes may be crucial in the initiation and advancement of OA by regulating the activity or function of certain immune cells. To enhance the clinical applicability and representative value of the selected genes, we employed an integrative approach using three machine learning models—LASSO, RF, and SVM—to evaluate the 12 core candidates. Cross-validation of these methods converged on MEG3 and MAPK3 as the key targets. To verify the credibility of the screening results, we additionally evaluated the predictive efficacy of the two genes in both sets, and found that they exhibited consistently high AUC values, demonstrating robust diagnostic performance. A nomogram incorporating MEG3 and MAPK3 expression levels was constructed and validated, exhibiting strong calibration and promising clinical applicability. Importantly, this model assesses KOA risk by quantifying gene expression levels, and may be applicable to peripheral blood samples. Since the eQTL data used for gene identification were derived from blood, and both genes were also differentially expressed in synovial tissues, the integration of blood-based genetic regulation with tissue-level expression profiles supports the feasibility of a blood-based diagnostic tool. This offers a non-invasive and accessible strategy for early KOA screening and risk stratification in clinical settings. Additionally, the expression profiles of MEG3 and MAPK3 were examined in individuals affected by KOA. The results indicated that MEG3 was markedly under-expressed in the patient cohort (Figs. [119]7F, M), while MAPK3 was considerably over-expressed (Figs. [120]7G, N). To further corroborate these findings, we performed a qPCR to verify the expression levels of MEG3 and MAPK3 (Figs. [121]7O, P). The qPCR results confirmed our prior analyses, indicating that MEG3 was expressed at reduced levels in the patient group, whereas MAPK3 had elevated expression. The concordance between the qPCR results and previous analysis substantiates the expression profiles of these two genes in KOA patients and underscores their potential biological significance. However, it should be noted that the qPCR validation was conducted on a limited number of synovial tissue samples (N = 3 per group), which may constrain the statistical power of the findings. Therefore, the findings should be considered preliminary. Additionally, although control samples lacked clinical or radiological evidence of KOA, no specific imaging or histological assessments were conducted to exclude subclinical OA, leaving the possibility of asymptomatic degeneration. Further validation in larger, independent cohorts is warranted to substantiate the diagnostic utility of these biomarkers and assess their applicability in early KOA detection. MEG3 is a long non-coding RNA (lncRNA) and serves a crucial function in numerous biological processes^[122]33. Tang et al. demonstrated in a study on the pathogenesis of psoriasis that MEG3 overexpression promotes autophagy and mitigates inflammatory responses^[123]34. Hence diminishing the inflammatory response and mitigating psoriasis-related inflammation. In investigations concerning liver fibrosis (LF), WU et al. discovered that MEG3 modulates its expression through direct interaction with NLRC5, to regulate NF-κB signaling, influencing downstream inflammatory mediators^[124]35. The overexpression of MEG3 counteracted NLRC5 activation and mitigated the advancement of LF. Our findings indicate that MEG3 may function as a pivotal gene in osteoarthritis, this observation resonates with the study by Wang et al., which demonstrated MEG3’s chondroprotective role through the miR-361-5p/FOXO1 signaling axis^[125]36. Conversely, You et al. reported that MEG3 inhibits chondrogenesis of synovium-derived mesenchymal stem cells via the EZH2/TRIB2 pathway, underscoring its potentially context-dependent or stage-specific functions in OA^[126]37. While earlier studies have primarily focused on its role in cartilage biology, our findings suggest that MEG3 may also participate in synovial inflammation and systemic immune modulation, meriting further functional exploration. MAPK3, commonly referred to as ERK1, has been recognized as a key modulator in the pathogenesis of osteoarthritis (OA). Loeaser et al. demonstrated that ERK1/2 signaling is abnormally activated in OA cartilage, promoting chondrocyte catabolism and inflammation^[127]38. Although earlier ERK inhibitors showed therapeutic promise, their clinical development was hindered by systemic toxicity and low isoform specificity. Recent advances in nanomedicine, such as pH-responsive nanocarriers, have improved local delivery, enhanced tissue selectivity, and reduced adverse effects^[128]39. These innovations may be applicable for intra-articular treatment in OA joints. Furthermore, early-phase clinical trials have shown that optimized dosing regimens and routes of administration can mitigate toxicity, further supporting the feasibility of MAPK3 as a therapeutic target^[129]40. This study highlights the probable functions of MEG3 and MAPK3 in KOA etiology from a fresh viewpoint and establishes a basis for further investigation of these genes as prospective disease indicators targets. Furthermore, by using the eQTL dataset derived from blood samples, we established a link between the blood expression levels of MEG3 and MAPK3 and KOA, suggesting that KOA is a systemic condition. These results enhance insights into KOA pathogenesis and highlight MEG3 and MAPK3 as potential biomarkers with diagnostic and prognostic relevance, offering meaningful implications for early identification and clinical surveillance. Conclusion This investigation underscored the pivotal involvement of MEG3 and MAPK3 in KOA, revealing their substantial diagnostic and prognostic value. The constructed risk model supports early detection and facilitates evaluation of rehabilitation progress. Electronic supplementary material Below is the link to the electronic supplementary material. [130]Supplementary Material 1^ (39.1KB, docx) [131]Supplementary Material 2^ (3.1MB, csv) [132]Supplementary Material 3^ (5.3MB, csv) [133]Supplementary Material 4^ (531.9KB, csv) [134]Supplementary Material 5^ (227.2KB, csv) [135]Supplementary Material 6^ (3.5MB, csv) [136]Supplementary Material 7^ (12.1MB, tif) [137]Supplementary Material 8^ (7.4MB, tif) [138]Supplementary Material 9^ (17.4MB, tif) [139]Supplementary Material 10^ (2.6KB, csv) [140]Supplementary Material 11^ (3.9KB, csv) [141]Supplementary Material 12^ (33.4KB, pdf) [142]Supplementary Material 13^ (68KB, pdf) [143]Supplementary Material 14^ (503.4KB, csv) [144]Supplementary Material 15^ (2.1MB, csv) [145]Supplementary Material 16^ (2.2MB, csv) [146]Supplementary Material 17^ (1.5MB, csv) [147]Supplementary Material 18^ (1.6MB, csv) [148]Supplementary Material 19^ (16KB, docx) [149]Supplementary Material 20^ (5.9MB, csv) [150]Supplementary Material 21^ (8.2MB, csv) Acknowledgements