Abstract Knee osteoarthritis (KOA) is a common chronic joint disease globally. Synovial inflammation plays a pivotal role in its pathogenesis, preceding cartilage damage. Identifying biomarkers in osteoarthritic synovial tissues holds promise for early diagnosis and targeted interventions. Gene expression profiles were obtained from the Gene Expression Omnibus database. Subsequent analyses included differential expression gene (DEG) analysis and weighted gene co-expression network analysis (WGCNA) on the combined datasets. We performed functional enrichment analysis on the overlapping genes between DEGs and module genes and constructed a protein–protein interaction network. Using Cytoscape software, we identified hub genes related to the disease and conducted gene set enrichment analysis on these hub genes. The CIBERSORT algorithm was employed to evaluate the correlation between hub genes and the abundance of immune cells within tissues. Finally, Mendelian randomization analysis was utilized to assess the potential of these hub genes as biomarkers. We identified 46 differentially expressed genes (DEGs), comprising 20 upregulated and 26 downregulated genes. Using WGCNA, we constructed a gene co-expression network and selected the most relevant modules, resulting in 24 intersecting genes with the DEGs. KEGG enrichment analysis of the intersecting genes identified the IL-17 signaling pathway, associated with inflammation, as the most significant pathway. Cytoscape software was utilized to rank the candidate genes, with JUN, ATF3, FOSB, NR4A2, and IL6 emerging as the top five based on the Degree algorithm. A nomogram model incorporating these five genes, supported by ROC curve analysis, validated their diagnostic efficacy. Immune infiltration and correlation analysis revealed that macrophages were significantly associated with JUN (p < 0.01), FOSB (p < 0.01), and NR4A2 (p < 0.05). Additionally, T follicular helper cells showed significant associations with ATF3 (p < 0.05), FOSB (p < 0.05), and JUN (p < 0.05). Mendelian randomization analysis provided strong evidence linking JUN (IVW: OR = 0.910, p = 0.005) and IL6 (IVW: OR = 1.024, p = 0.026) with KOA. Through the utilization of various bioinformatics analysis methods, we have pinpointed key hub genes relevant to knee osteoarthritis. These findings hold promise for advancing pre-symptomatic diagnostic strategies and enhancing our understanding of the biological underpinnings behind knee osteoarthritis susceptibility genes. Keywords: Knee osteoarthritis, Synovial tissue, Biomarkers, Weighted correlation network analysis, Gene set enrichment analysis, Bioinformatics analysis, Immune infiltration, Mendelian randomization Subject terms: Biomarkers, Diseases, Medical research, Pathogenesis Introduction Knee Osteoarthritis (KOA) is a common chronic degenerative joint disease affecting over 300 million people globally and is recognized as a primary cause of non-traumatic disability^[40]1. Various factors, primarily mechanical injury, chronic inflammation, and immune dysregulation, interact to drive the formation and pathological changes of KOA, including cartilage degradation, synovial inflammation, subchondral bone remodeling, and osteophyte formation^[41]2. As research progresses, the significance of synovial inflammatory reactions in the course of KOA has garnered increasing attention from the academic community. Recent studies suggest that synovial lesions may occur prior to cartilage changes^[42]3, and prolonged chronic and mild synovial inflammation leads to continuous erosion of chondrocytes, resulting in a vicious cycle of mutual damage between synovium and cartilage^[43]4. Thus, exploring the pathogenesis of osteoarthritis and diagnostic markers from the perspective of synovium holds clinical significance in identifying therapeutic targets for osteoarthritis. With the advancement of bioinformatics technology and the rapid development of high-throughput sequencing, whole transcriptome sequencing has been applied to explore the pathogenesis of KOA. Through the integration of multiple analysis methods, the identification of key pathogenic genes playing pivotal roles in the occurrence and progression of KOA, and the deeper understanding of their biological functions, contribute to a better comprehension of the disease mechanism and aid in the selection of future therapeutic targets. Materials and methods Data source and processing In the GEO database ([44]https://www.ncbi.nlm.nih.gov/geo/), sequencing datasets of KOA synovial tissue were retrieved using the search term ‘knee osteoarthritis’. Six datasets, namely [45]GSE55235, [46]GSE55457, [47]GSE12021, [48]GSE32317, [49]GSE82107, and [50]GSE46750, were obtained. The synovial tissues in [51]GSE55235 and [52]GSE55457 originated from a three-center study conducted by Dirk Woetzel’s team^[53]5, with detailed characteristics of all cases, including age, gender, and medical history, presented in Table [54]1. The remaining data sets were used for validation. Table [55]2 summarizes key details of the datasets sourced from GEO. Table 1. Characteristics of the study cohort. Patients (total number) Ethnic rigin Gender (male/female) Age (Years) Disease duration (years) RF (+ / −) ESR (mm/1 h) CRP^a (mg/l) Concomitant medication (n) Control n = 20 Caucasian 15/5 54.7 ± 4.0 0.3 ± 0.3^b n.d n.d n.d NSAIDs (n = 1) (n.d. = 13) None (n = 7) (n.d. = 12) Osteoarthritis n = 26 Caucasian 4/22 71.0 ± 1.4 7.0 ± 1.3 3/18 22.4 ± 2.7 5.3 ± 1.5 NSAIDs (n = 16) (n.d. = 1) (n.d. = 5) (n.d. = 5) (n.d. = 3) None (n = 10) [56]Open in a new tab For the parameters age, disease duration, ESR, CRP, means ± standard error of the mean are given; for the remaining parameters, numbers are provided. n.a., not applicable; n.d., not determined; NSAID, nonsteroidal anti-inflammatory drug; RF, rheumatoid factor; ESR, erythrocyte sedimentation rate; CRP, C-reactive protein. ^aNormal range, < 5 mg /l; ^bDisease duration in joint trauma patients. Table 2. Information of the GEO datasets. Dataset Platform Manufacturer Group Tissue type Trait Normal OA [57]GSE55235 [58]GPL96 Affymetrix HG-U133A 10 10 Synovial membrane train [59]GSE55457 [60]GPL96 Affymetrix HG-U133A 10 10 Synovial membrane train [61]GSE12021 [62]GPL96 Affymetrix HG-U133A 9 10 Synovial membrane test [63]GSE82107 [64]GPL570 Affymetrix HG-U133 Plus 2.0 Array 7 10 Synovial membrane test [65]GSE32317 [66]GPL570 Affymetrix HG-U133 Plus 2.0 Array 0 10 (Early) 9 (End) Synovial membrane test [67]GSE46750 [68]GPL10558 Illumina Human HT-12 V4.0 0 12 (Normal) 12 (Inflammatory) Synovial membrane test [69]Open in a new tab Using Perl software ([70]https://www.perl.org/; version 5.32.1), we annotated the probes in each dataset series matrix file, and the average expression values of all probes corresponding to the same gene were calculated. Subsequently, the [71]GSE55235 and [72]GSE55457 datasets were merged, and genes expressed in all samples were included in the study to obtain the integrated gene expression matrix. Batch normalization was performed using the “limma” and “sva” packages in the R software (version 4.3.1) to eliminate differences between different batches. Identification of differentially expressed genes (DEGs) We performed differentially expressed genes (DEGs) analysis screening using the “limma” package. Afterward, a False Discovery Rate (FDR) test was conducted on the differential expression results. Differentially expressed genes were filtered based on the criteria of adjusted p-value < 0.05 and |log2 FC (fold change)|≥ 2. Volcano plots and DEGs expression heatmaps were generated by processing the “ggplot2” and the “pheatmap” R packages. Weighted gene co-expression network analysis (WGCNA) WGCNA, a systematic methodology in biology, is frequently employed to unveil intricate patterns of genetic associations among diverse samples. By scrutinizing the intricate interplay between genes and their phenotypic manifestations, WGCNA serves as a valuable tool for pinpointing promising marker candidates^[73]6. We utilized the “WGCNA” R package to construct a gene co-expression network of KOA. First, a sample-clustering tree was drawn to assess the presence of outliers. Second, the sft function in the pickSoftThreshold function automatically recommends an appropriate soft threshold after pre-calculation. After manually reviewing the pre-calculation results and considering factors such as the fit index and the average connectivity of network nodes, we set the soft thresholding power β to 3 when R^2 > 0.8 and the average connectivity was high, which was consistent with the function's recommendation. Subsequently, a topological overlap matrix (TOM) was constructed to establish the network topology. Third, the “DynamicTreeCut” method was used to classify genes with similar expression profiles into the same gene modules (the parameters were adjusted to minModuleSize = 60 and deepSplit = 2). We measured the association between module genes and sample trait information by computing the Pearson correlation coefficient and P-value. This analysis served to evaluate the relationships between various modules and clinical traits. The module exhibiting the highest correlation coefficient was chosen for further analysis, and the conclusive results were visually represented through a heatmap depicting the correlations. GO / KEGG analysis for candidate genes Gene Ontology (GO) is a set of extensively used gene annotation terms, employed to identify biological features in high-throughput genomic or transcriptomic data, encompassing biological processes, cellular components, and molecular functions^[74]7. The KEGG (Kyoto Encyclopedia of Genes and Genomes) database stands as an invaluable resource for systematically analyzing gene functions, and seamlessly linking genomic data with broader functional insights, particularly gleaned from extensive molecular datasets^[75]8,[76]9. We utilized the R programming language to perform GO analysis and KEGG pathway enrichment analysis. Before initiating the analysis, we identified the intersection of DEGs and WGCNA-derived hub genes using the “VennDiagram” R package. The genes at this intersection were designated as candidate hub genes associated with the pathogenesis of KOA. Subsequently, we applied R packages, including “clusterProfiler”, “org.Hs.eg.db”, and “enrichplot”, for GO enrichment analysis and KEGG analysis on the core genes obtained from the intersection. This allowed us to unravel the enriched functional pathways of these core genes, aiding in comprehending the molecular mechanisms underlying KOA pathogenesis and progression. Finally, we visualized the results using the ggplot2 package. Protein–protein interaction (PPI) networks of candidate hub genes We employed the STRING online database ([77]http://string-db.org) and the Cytoscape software platform to establish and visualize PPI and molecular interaction networks. Initially, we inputted intersecting genes into the STRING database to extract and illustrate PPI networks. Subsequently, using the Cytoscape software, we ranked significant genes within these networks. Gene set enrichment analysis (GSEA) of candidate hub genes To further investigate the potential signaling pathways through which hub genes influence disease onset and progression, we conducted a more in-depth enrichment analysis. According to the median gene expression, KOA synovial tissue samples were divided into high- and low-expression groups. Gene set enrichment analysis was performed via the GSEA^[78]10 software (version 4.3.3). GO (‘c5.go.bp.v2023.2.Hs.symbols.gmt’) and KEGG (‘c2.cp.kegg_medicus.v2023.2.Hs.symbols.gmt’) gene sets were selected as the reference gene set. The numbers of maximum and minimum gene sets were set as 500 and 10 as filtering conditions, respectively. The final results were obtained based on a P-value of < 0.05 and |NES|> 1. Finally, we used the “GseaVis” package to visualize and combine the GESA analysis results. Construction of a nomogram model based on hub genes We conducted a Receiver Operating Characteristic (ROC) curve analysis to evaluate the predictive potential of the identified core genes as potential biomarkers. The Area under the Curve (AUC) was utilized to assess the diagnostic accuracy of these genes, with a higher AUC indicating enhanced precision in detection. Additionally, a nomogram model was constructed using the “rms” package^[79]11,[80]12 to predict the risk of KOA. Evaluation of immune cell infiltration To assess immune cell infiltration in the synovium of knee osteoarthritis patients, explore correlations, and analyze differences among immune cells, we investigated the relationships between immune cells and characteristic genes. The study utilized the CIBERSORT deconvolution algorithm to calculate tissue expression profiles for all KOA samples in the immune database. Through 1000 simulated calculations, we obtained the distribution of 22 immune cell infiltrations in KOA tissue samples. Employing a significance threshold of p < 0.05, we calculated the percentage of each immune cell in the samples, establishing correlations between genes and immune cells. Concurrently, immune cell analyses were conducted on characteristic genes, revealing the immune cells associated with characteristic genes in KOA. Mendelian randomization (MR) We employed a two-sample Mendelian randomization analysis to confirm the potential causal relationships between the identified core genes and related diseases. Data corresponding to hub genes were retrieved from open databases, with detailed information available in the data availability statement at the end of the manuscript. We first screened for significant SNPs (p < 5 × 10^−6) and removed those in linkage disequilibrium (LD, R^2 = 0.01, kb = 5000). For IL6, the number of filtered instrumental variables was insufficient, so we relaxed the p-value threshold to 1 × 10^−5. We tested and excluded weak instrumental variables (F > 10). For ATF3 and NR4A2, which lacked GWAS data, we used cis-eQTL data from the eQTLGen database as exposure data. Gene information, including chromosome number and BP location (GRCh37 version), was retrieved from the NCBI Gene database ([81]https://www.ncbi.nlm.nih.gov/gene/). All SNPs within 1000 kb upstream and downstream of the gene loci were extracted as eQTL data for these genes. We then selected SNPs associated with the gene (p-value threshold less than 5 × 10^−6, LD threshold R^2 < 0.2, and kb = 500)^[82]13. After completing these steps, we performed a two-sample Mendelian randomization analysis using KOA GWAS data. Our study primarily relied on results derived from the inverse-variance weighted (IVW) method. We used Cochran’s Q test to assess heterogeneity. To investigate potential pleiotropy, we employed the MR-Egger intercept method proposed by Emdin et al.^[83]14 and Dudbridge^[84]15. An MR-Egger intercept p-value greater than 0.05 indicated the absence of pleiotropy. Additionally, we implemented a “leave-one-out” analysis to systematically assess the influence of individual SNPs on the MR results. Results DEGs screening Merging [85]GSE55235 and [86]GSE55457 datasets and removing batch effects. Comparative the KOA and normal groups in the merged dataset led to the identification of 46 Differentially Expressed Genes (DEGs), comprising 20 upregulated and 26 downregulated genes. The detailed results of the DEGs are visually presented in the figure (Fig. [87]1a, b; Supplementary Table [88]2). Fig. 1. [89]Fig. 1 [90]Open in a new tab Differentially expression analysis. (a) Differentially expressed gene expression volcano plot between KOA and normal controls. (b) Differentially expressed gene expression heatmap between KOA and normal controls (adjusted p < 0.05 and a |log2 FC|≥ 2). Construction of WGCNA network and identification of knee osteoarthritis-related module Due to differences in gene expression levels, the outlier sample ([91]GSM1337307) was removed following clustering analysis. A soft-thresholding parameter (β) of 3 was set. At this threshold, the network approximates scale-free network characteristics (R^2 = 0.84) while maintaining high average connectivity. Utilizing the “dynamicTreeCut” package, the clustering distance threshold was defined as 0.25. After merging certain gene modules, a total of 7 gene modules were identified. Figure [92]2d illustrates the relationship heatmap between gene modules and traits. In the module-trait relationship heatmap, the blue module, consisting of 889 genes, exhibited a significant correlation with KOA (p = 8e–10). This blue module was selected as the characteristic module for further analysis (Fig. [93]2e; Supplementary Table [94]3). Fig. 2. [95]Fig. 2 [96]Open in a new tab Identification of knee osteoarthritis-related gene modules in the GEO dataset using WGCNA. (a) Scale independence corresponding to different soft thresholds (β). (b) Mean connectivity corresponding to different soft thresholds. (c) The scale-free topology is checked at a soft threshold of 3. (d) Module-trait heatmap showing the correlation between the clustered gene modules and KOA in the merged dataset. Each module contains the corresponding correlation coefficient and p-value. (e) The scatter plot of the blue module, which shows the strongest positive correlation with KOA in the merged dataset. GO / KEGG analyses The overlap between the identified DEGs and candidate genes from WGCNA yielded 24 common genes (depicted in Fig. [97]3a and detailed in Supplementary Table [98]4). These 24 genes underwent GO and KEGG pathway analyses. The GO functional annotation indicated that the biological processes these genes were involved in were primarily related to chemotaxis and migration of chemokines and inflammatory cells. The main molecular functions included RNA polymerase II transcription regulator complex, and the cellular components were primarily associated with cytokine activity and its receptors, as well as chemokine activity and its receptors (Fig. [99]3b). KEGG pathway analysis demonstrated that the genes in the module were mainly involved in pathways related to immune regulation and inflammation, such as IL-17 Signaling Pathway, NOD-like Receptor Signaling Pathway, NF-kappa B Signaling Pathway, TNF Signaling Pathway, Viral Protein Interaction with Cytokine and Cytokine Receptor. Additionally, pathways associated with infection and microbial interaction included Epithelial Cell Signaling in Helicobacter pylori Infection, Legionellosis, Kaposi Sarcoma-Associated Herpesvirus Infection, Viral Protein Interaction with Cytokine and Cytokine Receptor. Metabolic and atherosclerosis-related pathways encompassed Lipid and Atherosclerosis (Fig. [100]3c). Fig. 3. [101]Fig. 3 [102]Open in a new tab Screening and validation of candidate hub genes. (a) Venn diagram revealed 23 overlapping candidate hub genes. (b) GO enrichment analysis of candidate hub genes. (d) KEGG pathway analysis of candidate hub genes. PPI network analysis for hub genes Utilizing the STRING online tool, we constructed a Protein–Protein Interaction (PPI) network encompassing the hub genes that displayed overlap, as illustrated in Fig. [103]4a. Following this, we visualized the top five highly ranked genes using Cytoscape software (Fig. [104]4b). Briefly, JUN, ATF3, FOSB, NR4A2, and IL6 were sorted out. The intensity of coloration within the visualization reflects the degree of significance, with darker hues indicating higher scores. Fig. 4. [105]Fig. 4 [106]Open in a new tab The construction of PPI network. (a) PPI network of overlapping hub genes. (b) The core genes of the interaction network were obtained by DEGREE algorithm. (The deeper the color, the higher the score). Construction of nomogram model for knee osteoarthritis risk prediction We selected the aforementioned 5 significantly meaningful core genes (JUN, ATF3, FOSB, NR4A2, IL6) for the construction of a nomogram model (Fig. [107]5a). By plotting axial perpendicular lines based on the scores, we calculated the total score, facilitating the prediction of the patient’s disease risk. Subsequently, ROC curves were generated for the five hub genes to assess their diagnostic effectiveness. The AUC values for JUN, ATF3, FOSB, NR4A2, and IL6 are, respectively, 0.960, 0.930, 0.887, 0.900, and 0.848 (Fig. [108]5b). Fig. 5. [109]Fig. 5 [110]Open in a new tab Predicting the risk of knee osteoarthritis using nomograms. (a) Nomogram model of hub genes. (b) ROC curves to assess the diagnostic efficacy of nomogram model and each hub gene. Expression of hub genes and validation of external datasets To thoroughly investigate the expression of hub genes in the synovial tissue of patients with KOA, we conducted an analysis across multiple datasets. The datasets were processed using the same methods as previously described. The [111]GSE12021 dataset provided insights into the expression of hub genes in the context of disease, with all five hub genes showing differential expression (refer to the figures). Combining datasets [112]GSE82107 and [113]GSE32317 using the same methodology will aid in evaluating the expression levels of hub genes at different stages of KOA. Notably, ATF3 expression was found to be significantly upregulated in the early stages of the disease (p < 0.05) and subsequently downregulated in the later stages (p < 0.01). The [114]GSE46750 dataset allowed us to explore the expression of genes in various areas of the synovium in KOA patients, revealing significant disparities in the expression of NR4A2 and IL6 in inflammatory areas (p < 0.01) (Fig. [115]6a–e). Fig. 6. [116]Fig. 6 [117]Open in a new tab Verification of the five hub genes. (a–e) The “disease” group data are derived from [118]GSE12021, verifying gene expression under disease conditions. The “stage” group data are sourced from [119]GSE82107 and [120]GSE31317, verifying gene expression at different stages. The “area” group data are obtained from [121]GSE46750, verifying gene expression in inflamed and non-inflamed synovium areas of OA patients. The dark gray triangles in the figure represent the mean expression values of each group. The significance of inter-group comparisons is indicated in the figures. *p < 0.05, **p < 0.01, ***p < 0.001. Potential signaling mechanisms for hub genes In the high gene expression subgroup, we discovered that the 18 biological processes associated with JUN (p < 0.01) mainly revolve around Cell Cycle and Division-related processes, Metabolic and Transport Regulation, and Transcription and Signal Transduction. KEGG pathway enrichment revealed 6 pathways (p < 0.05), such as “IL1 IL1R JNK”, “ENV FACTOR E2 TO RAS ERK”, and “ENV FACTOR METALS TO RAS ERK”. Detailed results are presented in Fig. [122]7a. The 64 biological processes associated with IL6 (p < 0.01) mainly involve Actin Nucleation and Its Regulation, Exosomal and Extracellular Vesicle Secretion, and Immune Responses. KEGG pathway enrichment showed 9 related pathways (p < 0.05), such as “CCR CXCR GNB G PI3K RAC” and “TSH DUOX2 TG” (Fig. [123]7b). FOSB is associated with 22 biological processes (p < 0.01), including Cell Migration and Development, Cell Junction and Structure, and RNA and Protein Regulation. KEGG enrichment analysis revealed 6 significantly related pathways (p < 0.01), such as “ENV FACTOR METALS TO RAS ERK” and “KSHV VGPCR TO GNB G ERK” (Fig. [124]7c). ATF3 is associated with 13 biological processes (p < 0.01), including Cell Apoptosis and Immune Regulation, and Protein Modification and Metabolism: DNA Damage and Signal Transduction. KEGG enrichment analysis revealed 3 related pathways (p < 0.05), such as “IL1 IL1R P38” (Fig. [125]7d). NR4A2 is associated with 17 biological processes (p < 0.01), including Cell Division and Chromosome-related Processes, Immune and Cellular Responses, Transcription and Signal Transduction, and Metabolism and Biosynthesis. KEGG enrichment analysis revealed 14 related pathways (p < 0.05), including “IL1 IL1R P38” and several integrin-related pathways (Fig. [126]7e). The biological processes enriched for each gene in GOBP and the remaining KEGG pathways can be found in the Supplementary materials. Fig. 7. [127]Fig. 7 [128]Open in a new tab Single-gene GSEA analysis. (a–e) Representative KEGG pathways enriched under high expression of each hub gene. From left to right: JUN, IL6, FOSB, ATF3, NR4A2. Assessment of immune cell infiltration in knee osteoarthritis Since the synovial microenvironment affects the prognosis of KOA patients, and immune cells play an important role in the synovial microenvironment, we conducted an in-depth analysis of immune cell infiltration within the synovial tissue. The barplot shows the composition of 22 immune cells (Fig. [129]8a). Notable correlations include a marked negative association between Macrophages and M2 Mast cells resting (R =  − 0.68), as well as between Macrophages M2 and B cells memory (R =  − 0.66). Conversely, a notable positive correlation emerges between Macrophages M0 and T cells gamma delta (R = 0.71), while NK cells resting exhibit a conspicuous positive correlation with T cells CD4 naive (R = 0.61), as depicted in Fig. [130]8b. The violin plot revealed significant downregulation of CD4 memory resting T cells, regulatory T cells, activated NK cells, and activated mast cells in the KOA group while resting mast cells were upregulated (Fig. [131]8c). Immune infiltration and correlation analysis revealed that macrophages were significantly associated with JUN (p < 0.01), FOSB (p < 0.01), and NR4A2 (p < 0.05). Additionally, T follicular helper cells showed significant associations with ATF3 (p < 0.05), FOSB (p < 0.05), and JUN (p < 0.05). T cells gamma delta are associated with JUN (p < 0.05), and plasma cells are associated with NR4A2 (p < 0.05) (Fig. [132]8d). Fig. 8. [133]Fig. 8 [134]Open in a new tab Immune infiltration analysis. (a) The barplot of immune cell infiltration. (b) Correlation heatmap of immune cells in all samples. Red squares indicate positive correlation, and blue squares indicate negative correlation; the deeper color squares indicate stronger correlations. (c) Violin diagram of the proportion of 22 different immune cell types and functional states between KOA and normal samples. (d) Correlation between hub genes and immune cells. Causal association between hub genes and KOA provided by MR analysis The SNP characteristics of hub genes and KOA are shown in Supplementary Table [135]S5. All SNPs were not weak instrumental variables. MR-PRESSO analysis identified no outliers. On the premise that there is no heterogeneity, the IVW method is selected as the main analysis method. We found that JUN was associated with KOA with an OR of 0.910 (95% CI = 0.852–0.971, p = 0.005). The MR–Egger method showed no significant statistical significance (OR = 0.821, 95% CI = 0.717–0.942, p = 0.014). The IL6 was identified as a potential risk factor for KOA (IVW: OR = 1.024, 95% CI: 1.003–1.046, p = 0.026; MR–Egger: OR = 1.048, 95% CI: 1.006–1.092, p = 0.043). FOSB (IVW: OR = 1.127, 95% CI: 0.969–1.311, p = 0.122), ATF3 (IVW: OR = 1.062, 95% CI: 0.906–1.244, p = 0.461), and NR4A2 (IVW: OR = 0.970, 95% CI: 0.840–1.122, p = 0.685) did not show significant associations in the IVW method, as shown in Fig. [136]9. The MR Egger regression analysis found no indication of horizontal pleiotropy, providing further evidence that pleiotropy did not influence the causal effect. Furthermore, leave-one-out analysis indicates the consistent nature of the results even after systematically excluding individual SNPs. Scatter plots, funnel plots, and SNP forest plots are available in the supplementary materials. Fig. 9. [137]Fig. 9 [138]Open in a new tab Mendelian randomization study results. Heterogeneity was tested using Cochran’s Q test, with p > 0.05 indicating no heterogeneity. Horizontal pleiotropy was assessed using MR-Egger, with p > 0.05 indicating no horizontal pleiotropy. The “BH” method was used to perform FDR correction on the p-values of each test method separately, with FDR_P < 0.05 indicating statistical significance. Discussion For a considerable period, the clinical management of KOA has predominantly focused on symptom alleviation. However, due to its insidious onset and the absence of early diagnostic biomarkers, patients often miss optimal treatment opportunities, resulting in unfavorable prognoses. Hence, early detection and intervention for KOA are imperative. While synovial inflammatory reactions were previously believed to stem from debris generated by cartilage wear^[139]16, recent studies indicate that these reactions may serve as early indicators of osteoarthritis onset, manifesting before radiological evidence of cartilage damage^[140]17. Mounting evidence suggests that chronic synovial inflammation significantly contributes to joint injury, destruction, and disability^[141]18, with anti-inflammatory drugs and interventions showing promising efficacy^[142]19–[143]21. Consequently, identifying biomarkers in osteoarthritic synovial tissues is critical for elucidating underlying mechanisms and identifying novel therapeutic targets. In this study, employing bioinformatics methodologies, we identified 5 hub genes with the highest scores: JUN, ATF3, FOSB, NR4A2, and IL6. Validation with external datasets confirmed that five hub genes are indeed associated with KOA and exhibit consistent expression trends, further corroborating the WGCNA analysis results. Notably, Zhuang Z Jin et al. discovered that in the synovial tissue of osteoarthritis patients, JUN mRNA levels decrease while its protein expression levels increase. They suggested that this paradoxical phenomenon may be related to O-GlcNAc glycosylation, protein stability, and negative feedback regulation induced by iron overload^[144]22. This observation is similar to the expression changes we observed in our hub genes and warrants further investigation into the underlying mechanisms. In the early stages of KOA, ATF3 expression significantly increases, followed by a marked decrease in the later stages of the disease. Although no significant differences were detected in the other genes, the comparison of means indicated similar expression trends. We speculate that the increased expression of hub genes in the early stages may be one of the initiating factors of KOA, and subsequently, under complex pathophysiological conditions, their expression gradually decreases in the later stages of the disease. We believe that with an expanded sample size, the expression differences between different disease stages may become more apparent. Similarly, significant upregulation of NR4A2 and IL-6 expression was observed in the inflamed synovium. Although the expression of other genes did not show significant differences, an upward trend was observed to varying degrees, emphasizing the diagnostic significance of focusing on inflamed synovial areas. The subsequent discussion will focus on the correlation between individual genes and KOA. JUN encodes c-Jun, a pivotal mediator in immune and inflammatory responses, activated by stimuli such as pro-inflammatory cytokines, microbial products, and various cellular stressors^[145]23. Previous studies have demonstrated that JUN can exacerbate osteoarthritis by accelerating cartilage degradation^[146]24–[147]28. Our findings indicate that JUN is significantly associated with immune-related pathways, notably the IL-17 and TNF signaling pathways. IL-17, a critical inflammatory mediator in OA progression, enhances the production of vascular cell adhesion molecule 1 (VCAM-1) by synovial fibroblasts, promoting monocyte adhesion and intensifying the inflammatory response^[148]29. This suggests that JUN may promote synovial inflammation through this mechanism. Further analysis revealed that high JUN expression correlates with the upregulation of pro-inflammatory pathways such as the “IL1/IL1R/JNK” pathway and the RAS/ERK signaling pathway. Gehrke et al. showed that IL1R knockout, which blocks this pathway, could mitigate the inflammatory response^[149]30. TNF stimulation perpetuates chronic inflammation by affecting T cell signaling, particularly through the selective activation of the RAS/ERK pathway^[150]31. Gaur et al. discovered that T cells gamma delta cells exacerbate OA by producing IL-17^[151]32. Moreover, an increased frequency of specific T cells follicular helper (TFH) subsets (IL-21 + CXCR5 + CD4 +) correlates positively with OA disease activity^[152]33. This finding aligns with our observation of a significant correlation between JUN and both T cells gamma delta and TFH. Zhang et al. noted that TNF activates synovial tissue and synovial fluid macrophages (M0) to polarize into M1 macrophages, which secrete pro-inflammatory cytokines. During this polarization, the JNK pathway’s activation upregulates JUN activity, amplifying the immune response and worsening OA^[153]34. MR provided robust results, confirming a causal relationship between JUN and KOA. Given JUN’s regulatory roles in normal tissue morphology development, transcription, signal transduction, and the cell cycle, its effects in the complex inflammatory environment of KOA may differ. This could explain why the OR in the MR analysis is less than 1, warranting further experimental validation. FOSB, a member of the Fos transcription factor family, forms an AP-1 complex through dimerization with JUN protein, binding to the TPA response element (5’-TGAC /GTCA-3’) in target gene promoters and enhancer regions^[154]35, regulating various physiological and pathological processes^[155]36,[156]37. Studies by Hiraku Motomura et al. demonstrated that AP-1 inhibitors significantly reduce the mRNA expression of MMPs (1, 3, and 13) and inflammatory cytokines (IL-1β, TNF-α, and IL-6) in a mouse model of osteoarthritis, effectively mitigating cartilage degradation and osteophyte formation^[157]38. Enrichment analysis revealed that FOSB is closely associated with the IL-17 inflammatory pathway, and in the FOSB high-expression subgroup, pathways related to metal environmental factors were upregulated. This finding aligns with the research of Li Chen et al.^[158]39, who noted that heavy metals prevalent in the environment, such as cadmium (Cd), lead (Pb), and cobalt (Co), can directly cause chondrocyte death through cytotoxicity. Moreover, these metals activate inflammatory signaling pathways (e.g., NF-κB), increasing the expression of cytokines like IL-6 and TNF-α, thereby exacerbating inflammation and tissue damage. Routes of heavy metal exposure are not limited to dietary intake; Bin Wang's transcriptomic study identified that heavy metals in cosmetics stimulate the expression of pro-inflammatory factors such as JUN and FOSB^[159]40. Correlation analysis found that FOSB is closely related to macrophages, which is consistent with the analysis of JUN. Previous studies have also confirmed that FOSB expression is positively correlated with IL-6 levels, a key signaling molecule for macrophage polarization towards a pro-inflammatory phenotype^[160]41. Despite the lack of positive evidence from Mendelian randomization analysis, we remain confident in the significant role of FOSB in osteoarthritis. Interleukin 6 (IL-6) is a crucial inflammatory factor. Elevated levels of IL-6 in synovial fluid and serum are positively correlated with the severity of lesions observed in X-ray images of osteoarthritis patients^[161]42–[162]46. Maurizio Pacifici noted that IL-6 regulates synovial tissue metabolism, joint cartilage degradation, and the expression of pain-related factors via the JAK/STAT signaling pathway^[163]47. In IL-6 gene knockout mice models, there is a tendency for aggravated degenerative changes^[164]48, whereas IL-6 injection reduces polysaccharide loss and induces osteophyte formation in chronic arthritis inflammation in mice^[165]49. This illustrates the dual role of IL-6 in both the catabolic and anabolic processes within OA joints. Enrichment analysis revealed that the PI3K/RAC signaling pathway is upregulated in the context of high IL-6 expression. Carlo C Campa reported that specific chemokine receptors (such as CCR5, CCR7, CXCR2, CXCR4, etc.) activate G proteins and initiate the downstream PI3K/RAC signaling pathway upon binding with corresponding chemokines. This process recruits neutrophils, monocytes, and certain TH cells to the diseased synovium, releasing inflammatory factors, including IL-6, and promoting the inflammatory response^[166]50. The enrichment results also indicate a close relationship between IL-6 and the IL-17 signaling pathway. Both IL-6 and IL-17 promote fibroblast-like synoviocytes to secrete VEGF^[167]51, leading to the excessive development of the synovial vascular network^[168]52,[169]53. This network provides a pathway for inflammatory cells (such as lymphocytes, macrophages, and plasma cells) to enter the joint cavity and synovium from the bloodstream, exacerbating the inflammatory response and creating a vicious cycle that ultimately results in chronic inflammation^[170]54. Furthermore, IL-17 has been shown to reduce chondrocyte autophagy, inhibit the synthesis of chondrocyte proteoglycans, and promote the production of MMPs, culminating in cartilage degradation^[171]55,[172]56. Mendelian randomization results also support IL-6 as a risk factor for knee osteoarthritis. Given the variations in IL-6 expression across different diseases, we believe that combining IL-6 with other biomarkers as diagnostic markers for osteoarthritis remains of practical significance. ATF3, a member of the ATF/CREB protein family, responds to stress and inflammatory signals in various cell types and is involved in regulating cell growth, differentiation, and apoptosis^[173]57,[174]58. In the ATF3 high-expression subgroup, we observed upregulation of the “IL1/IL1R/P38/MAPK” pathway, which is associated with the induction of ATF3 expression by IL-1. Studies have shown that inflammatory factors such as IL-1 and Oncostatin M (OSM) can induce ATF3 expression through specific signaling pathways like PKC, PKD3, or JNK/SAPK (c-Jun N-terminal kinase/stress-activated protein kinase)^[175]58,[176]59. ATF3 can directly bind to the promoter region of the matrix metalloproteinase 13 (MMP13) gene, particularly to the AP-1 element, thereby regulating the transcription of MMP13^[177]60. Overexpression of MMP13 promotes the degradation of type II collagen, leading to ECM loss and disruption of cartilage homeostasis. In OA animal models, the absence of ATF3 has been found to alleviate OA symptoms and progression^[178]61. Correlation analysis of hub genes and immune cells revealed that ATF3 is associated with TFH cells. Previous studies have shown that ATF3 plays a role in the early differentiation stage of TFH cells by binding to the promoter region of B-cell lymphoma 6 protein (Bcl6), increasing the expression of Bcl6, and promoting the differentiation of T cells from the initial state to the TFH cell state^[179]57. A study using flow cytometry detected the frequency of TFH cells and the concentration of serum IL-21 in the peripheral blood of OA patients. Compared to healthy controls, OA patients showed higher expression of TFH cell-related markers, which positively correlated with OA disease activity^[180]62. This suggests that TFH cells might mediate this effect by secreting IL-21. Although Mendelian randomization analysis did not provide evidence of a causal relationship between ATF3 and OA, we believe this might be due to a false-negative result caused by insufficient SNP numbers. Therefore, we still believe in the significant role of ATF3 in OA. The studies by Jiang et al.^[181]63 and Li et al.^[182]64 also support the potential of ATF3 as an early diagnostic biomarker and a potential therapeutic target for OA. NR4A2, a member of the orphan nuclear receptor NR4A subfamily, plays a critical role in diseases characterized by chronic inflammatory responses^[183]65,[184]66. In synovial cells, NR4A2 functions as a transcription factor, binding to specific DNA sequences in the IL-8 gene promoter region to activate IL-8 gene transcription^[185]67. Clinical trials have demonstrated that methotrexate treatment in patients with inflammatory arthritis can significantly inhibit NR4A2 expression, thereby reducing IL-8 levels and alleviating symptoms^[186]68. This regulatory role highlights NR4A2’s potential as a therapeutic target for inflammatory joint diseases. Kimberlee S. Mix et al. observed that TNFα, a prevalent factor in the inflammatory microenvironment of arthritis, rapidly and selectively induces NR4A2 expression in synovial cells. This interaction promotes MMP-13 gene expression while inhibiting TIMP-2 expression, leading to cartilage degradation, connective tissue destruction, and an exacerbated inflammatory response within the joint^[187]69. Our enrichment analysis showed that in the context of high NR4A2 expression, the “IL1/IL1R/P38” pathway and several integrin-related pathways are significantly upregulated. This may be associated with NR4A2’s ability to regulate the phosphorylation levels of key downstream proteins in the MAPK pathway (ERK1/2, P38, and JNK)^[188]70. In osteoblasts, integrins aggregate and autophosphorylate upon binding with the ECM, activating MAPK via the FAK/RAS pathway, thus influencing cell proliferation and differentiation^[189]71. These findings suggest that NR4A2 is involved not only in the inflammatory process of OA but also in bone tissue repair and regeneration. Notably, studies have reported that NR4A2 can induce the expression of M2 macrophage marker genes such as arginase 1, mannose receptor, and Ym1, which aid in anti-inflammatory responses and tissue repair^[190]72. Similarly, Huangtai Miao observed that NR4A2 promotes macrophage polarization towards the M2 phenotype, reducing myocardial cell loss and myocardial damage^[191]73. This is inconsistent with our finding of NR4A2’s association with M0 macrophages, indicating the need for further in-depth research. In summary, these findings provide valuable insights into the role of NR4A2 in OA and offer potential targets for the development of new therapeutic strategies. Undoubtedly, our study has certain limitations. While public datasets offer valuable resources, they also present challenges such as limited sample sizes, infrequent data updates, and restricted data coverage. Additionally, merging datasets from different platforms and chips cannot completely eliminate batch effects and inherent technical errors of the chips. Therefore, we selected datasets from uniform sources for our analysis, resulting in a relatively smaller sample size in the WGCNA and GSEA analyses. Thus, we introduced additional external datasets and merged datasets to validate the stability of our results, attempting to compensate for the limitation of insufficient sample size. In the future, we will continue to validate the accuracy of our findings by including more datasets from uniform sources. Secondly, due to the lack of GWAS data for ATF3 and NR4A2, we used eQTL data extracted from the eQTLGEN database as a substitute. However, the insufficient number of SNPs included in the MR analysis may lead to potential false-negative results. As more studies publish related GWAS data in the future, we will further refine our analysis. Thirdly, we identified these core genes through bioinformatics analysis methods. Although evidence supporting our findings can be found in published studies, we still lack our own experimental validation. We plan to address this issue in future research through animal experiments. Looking ahead, our future efforts will include further exploration of the expression patterns of hub genes in different disease stages and synovial regions, as well as studying the factors influencing hub gene expression to provide insights into the diagnosis and treatment of KOA. Conclusion Through the utilization of differential expression analysis, as well as bioinformatics methods such as WGCNA, we have identified JUN, ATF3, FOSB, NR4A2, and IL6 as potential biomarkers for KOA. Through two-sample Mendelian randomization analysis, we have validated the causal association between JUN and IL-6 with KOA. We plan to further investigate the potential value of these biomarkers in the diagnosis and treatment of KOA. Supplementary Information [192]Supplementary Information 1.^ (43.5KB, docx) [193]Supplementary Information 2.^ (220.9MB, zip) Abbreviations KOA Knee osteoarthritis GEO Gene Expression Omnibus DEGs Differentially expressed genes WGCNA Weighted Gene Co-expression Network Analysis PPI Protein–Protein Interaction MR Mendelian randomization FC Fold-change TOM Topological overlap matrix GO Gene Ontology KEGG Kyoto Encyclopedia of Genes and Genomes ROC Receiver Operating Characteristic AUC Area under the Curve LD Linkage disequilibrium IVW Inverse Variance Weighting JUN Jun proto-oncogene ATF3 Activating transcription factor 3 FOSB FosB proto-oncogene NR4A2 Nuclear receptor subfamily 4 group A member 2 IL-6 Interleukin-6 IL-21 Interleukin 21 JNK C-Jun N-terminal kinase VCAM-1 Vascular cell adhesion molecule 1 AP-1 Activator Protein 1 MMP13 Matrix metalloproteinase 13 JAK Janus kinase STAT Signal Transducers and Activators of Transcription SAPK Stress-activated protein kinase PI3K Phosphoinositide 3-kinase MAPK Mitogen-Activated Protein Kinase GSEA Gene Set Enrichment Analysis TFH T cells follicular helper eQTL Expression Quantitative Trait Locus ECM Extracellular matrix CCR C–C chemokine receptor type VEGF Vascular Endothelial Growth Factor ERK Extracellular signal-regulated kinase Author contributions L.X., J.M., and C.Z. conceived and designed the study. X.W. and K.Z. collected the G.E.O. datasets and carried out initial data analysis. L.X. and J.M. completed the data analysis and drafted the manuscript. C.Z. and X.L. revised and finalized the manuscript. Z.S., Y.C., and T.C. were of immense help in the preparation of the manuscript. All authors contributed to the article and approved the submitted version. Funding This study was funded by LIN Xianming’s Famous Chinese Medicine Doctor Studio Project (Z.W.F[2018]69); Zhejiang Engineering Research Center for Intelligent Technology and Equipment (Z.F.G.G.J[2022]209–64). Data availability The datasets supporting the conclusions of this article are available in the Gene Expression Omnibus (GEO) database ([194]http://www.ncbi.nlm.nih.gov/geo/). The specific names of the databases and their respective accession numbers for the datasets are provided within the text of the article or in the Supplementary materials. The GWAS data for IL6 (ieu-a-GCST90000478) and KOA (ieu-a-GCST007090) can be downloaded from the IEU database ([195]https://gwas.mrcieu.ac.uk/); The FOSB data are available from the UKB-PPP database's pQTL data ([196]https://www.synapse.org/Synapse:syn51364943/files/), which can be accessed by searching for “FOSB”. The eQTL data can be obtained through the link ([197]https://www.eqtlgen.org/cis-eqtls.html). Competing interests The authors declare no competing interests. Footnotes Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. These authors have contributed equally: Lilei Xu, Jiaqi Ma, Chuanlong Zhou, Zhe Shen, Kean Zhu and Xuewen Wu. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-024-73188-z. References