Abstract Background Polycystic ovarian syndrome (PCOS) is a common endocrine and metabolic disorder in women of reproductive age. In this study, we aimed to investigate the potential prognostic value of mitophagy-related genes in PCOS patients and to analyze their role in immune infiltration during PCOS pathogenesis and progression. Methods Training datasets were used for differential expression genes. Gene ontology annotations and Kyoto encyclopedia of genes and genomes signaling pathway enrichment analysis were performed. The potential biomarkers of mitophagy-related and immune infiltration in PCOS were screened by protein-protein interaction network using different algorithms, and the area under the curve was calculated to analyze their diagnostic value. The test datasets were used to validate the expression of hub genes, and receiver operating characteristic curves were used to evaluate the predictive effect of hub genes. Results Five hub genes: CTSD, IGF2R, ATP13A2, NAPA and GRN were identified as the potential diagnostic biomarkers of immune-mitophagy-related PCOS through five algorithms. GRN and NAPA were validated to be significantly different in oocytes and granulosa cells between primary and secondary follicles, respectively. Based on the single sample Gene Set Enrichment Analysis score, the infiltration of 4 immune cell types in PCOS was associated with PCOS and mitophagy. Specifically, the hub gene GRN showed a positive correlation with monocytes and plasmacytoid dendritic cells, while hub gene NAPA was negatively correlated with gamma delta T cell. Conclusions The current study identified immune-mitophagy-related hub genes for prognostic biomarkers of PCOS, which provided an innovative insight for the prevention and treatment of PCOS. Keywords: Polycystic ovarian syndrome, Mitophagy, Immune infiltration, Biomarkers 1. Introduction Polycystic ovarian syndrome (PCOS) is a common complex endocrine and metabolic disorder characterized by polycystic ovaries, hyperandrogenism, and chronic ovulatory dysfunction [[27]1]. Globally, PCOS impacts 6–10 % of women within the reproductive age and becomes a major cause of infertility in women of childbearing age by affecting oocytes quality [[28]2]. However, PCOS is a heterogeneous disorder of uncertain aetiology, and the molecular mechanisms underlying the onset and progression of PCOS remain unclear. PCOS is an endocrine and metabolic disease. Recent studies indicated that mitochondria, as the organelles mainly responsible for metabolic functions within cells, have a significant relationship with the pathogenesis and disease characteristics of PCOS. Many studies showed that mitophagy, an impaired autophagy occurred in mitochondria, participated in the progression and related complications of PCOS [[29]4]. It also has been reported that the oocyte quality and maturation in PCOS are correlated with mitophagy [[30]11]. Shen et al. showed that oxidative stress-related mitophagy was the main cause of granulosa cells (GCs) death in PCOS [[31]5], and they also found that activation of the PI3K-AKT-MTOR pathway was a necessary condition for FSH to inhibit mitophagy in GCs [[32]6]. And the critical role of inflammation and immune regulation induced by mitophagy would increase the development of PCOS [[33]9]. Several studies have demonstrated that women diagnosed with PCOS exhibit elevated levels of certain immune cells, such as T-helper 17 (Th17) cells and natural killer (NK) cells, which may contribute to the inflammatory response [[34]9]. It also been reported that chronic low-grade inflammation and PCOS have an important relationship and interaction [[35]7,[36]8]. The persistent low levels of inflammatory markers (e.g. TNF-α, IL-6, IL-1β and IL-8) secreted by these immune cells have been implicated in the chronic anovulatory state of PCOS, together with the impaired GCs caused by abnormal mitophagy [[37]7,[38]8,[39]10]. Therefore, it is necessary to further explore the relationship between mitophagy and the development of PCOS. Due to the extremely complex pathogenesis of PCOS, few diagnostic markers were identified. However, the relationship between mitochondrial dysfunction and the development of PCOS made it possible to screen key mitophagy genes to assist in the diagnosis of PCOS. Recently, more bioinformatics articles showed some candidate prognostic biomarkers of PCOS and the correlations with the immune cells were presented [[40][12], [41][13], [42][14]]. However, different algorithms obtained different hub genes and immune infiltrating cells, and the relationship between mitophagy and immune cells and the regulatory mechanisms in the progression of PCOS have not been fully elucidated. Therefore, we analyzed the correlation between immune infiltration and mitophagy-related genes using the CIBERSORT algorithm, which is used to characterize the composition and expression of immune cells based on RNA sequencing data. In the current study, we aimed to screen hub genes in mitophagy related to PCOS by using protein-protein interaction network and analyze the relationship between the hub genes and immune infiltration by using single sample Gene Set Enrichment Analysis (ssGSEA). We also validate the expression of hub genes in the test dataset and evaluate the predictive effect of hub genes by ROC curve. Through the RNAactDrug database, we further explored the potential small molecule compounds for the treatment of PCOS. 2. Materials and methods 2.1. Data collection and pre-processing A total of 3 microarray datasets: [43]GSE34526 [[44]15], [45]GSE84958 [[46]16] and [47]GSE107746 [[48]17] were downloaded from the Gene Expression Omnibus (GEO) database ([49]https://www.ncbi.nlm.nih.gov/geo), which contains count and FPKM data from PCOS patients and normal controls, as well as clinical information. The three datasets were sequenced on [50]GPL570, [51]GPL16791 and [52]GPL20795, respectively (Affymetrix). [53]Table 1 showed the overall database information. In detail, dataset [54]GSE84958 contained a total of 38 untreated adipose tissue samples, including 23 normal controls and 15 patients with PCOS. Dataset [55]GSE34526 contained a total of 10 GC samples, including 3 normal controls and 7 patients with PCOS. Dataset [56]GSE107746 contained a total of 24 GC samples and 23 oocyte samples from antral follicles, 18 GC samples and 3 oocyte samples from preovulatory follicles, 15 GC samples and 25 oocyte samples from primary follicles, 8 GC samples and 17 oocyte samples from primordial follicles, 6 GC samples and 12 oocyte samples from secondary follicles. However, only 56 oocyte and GC samples were formally included in the analysis, including 15 GC samples and 23 oocyte samples from the primary follicle, 6 GC samples and 12 oocyte samples from the secondary follicle. The clinical information of participants, including sample size, age range and gender, was shown in [57]Table S1. Table 1. GEO dataset information summary. [58]GSE84958 [59]GSE34526 [60]GSE107746 Organism Homo sapiens Homo sapiens Homo sapiens Experiment type Counts Expression profiling by array FPKM Platforms [61]GPL16791 [62]GPL570 [63]GPL20795 Diagnosis (number) Normal 23 10 PCOS 15 3 Oocyte from primary follicle 0 0 23 Oocyte from secondary follicle 0 0 12 Granulosa cells from primary follicle 0 0 15 Granulosa cells from secondary follicle 0 0 6 total 38 13 56 [64]Open in a new tab PCOS: Polycystic ovarian syndrome. We used the R package affy v1.78.0 [[65]18] for reading, rma background noise reduction and quantile normalisation of the CEL files from the [66]GSE34526 dataset, while the counts in [67]GSE84958 were processed logarithmically. 2.2. Gene differential expression analysis We used the limma package v3.56.1 [[68]19] in the [69]GSE84958 and [70]GSE34526 datasets to perform differential expression genes (DEGs) analysis, with |log2FC| > 1 and P < 0.05 as the threshold, and the DEGs were visualised by the volcano map. The 4655 mitophagy genes ([71]Table S2) were downloaded from GeneCards ([72]https://www.genecards.org/). Previous literature (9 genes), several databases (57 genes) [[73]20,[74]21], GEO database (34 genes) [[75]22] and GSEA database (34 genes) [[76]23] ([77]Table S3), were searched with Mesh ‘mitophagy’ and de-duplicated. These final mitophagy-related DEGs were obtained by the intersection of the DEGs and mitophagy genes and visualised by the heatmap. Meanwhile, the correlation analysis of the mitophagy-related DEGs was performed to generate a correlation analysis heatmap. 2.3. Functional enrichment analysis The identified DEGs from both datasets were then used for gene ontology (GO) analysis and Kyoto Eecyclopedia of Genes and Genomes (KEGG) enrichment analysis to analyze the potential functions. GO enrichment analysis is a common method for large-scale functional enrichment studies of genes in different dimensions and at different levels, generally at three levels: biological process (BP), molecular function (MF) and cellular component (CC) [[78]24]. KEGG is a widely used database that stores information on genomes, biological pathways, diseases and drugs [[79]25]. GO functional annotation of all significant DEGs was performed using the R package clusterProfiler v4.8.1 [[80]26,[81]27] to identify significantly enriched biological processes, and enrichment results were visualised using the R package GOplot v1.0.2 [[82]28] and topGO, with a significance threshold of P < 0.05 for enrichment analysis. 2.4. Finding the appropriate genes using ROC Using the pROC package v1.18.2 [[83]29], the intersection of DEGs and mitophagy-related genes was used perform receiver operating characteristic (ROC) analysis, calculate the area under the curve (AUC) for ROC analysis of genes in both datasets, enumerate genes with AUC >0.8 and plot box plots of expression profiles. The FDR value was false and the discovery rate was <0.1. 2.5. Overall analysis of immune infiltration in PCOS Immune-related genes were downloaded from the literature [[84]30]. The gene set contains 782 genes and 28 cell types, with immunity types including adaptive and innate. We used the ssGSEA algorithm to calculate immune scores (GSVA package v1.48.0 [[85]31]) for the intersection of DEGs and mitochondrial autophagy-related genes in the [86]GSE84958 and [87]GSE34526 datasets, obtained immune infiltration matrices, plotted correlation heat map, and performed correlation analysis for the corresponding scores and intersection genes of the immune cells involved, and plotted correlation analysis heat map with P < 0.05 combinations, and correlation coefficients were labelled in the plots. 2.6. Protein-protein interaction (PPI) network and the identification of hub genes Based on the [88]GSE34526 and [89]GSE84958 datasets and mitophagy-related genes, we used the Cytoscape v3.9.0 software [[90]32] to calculate the network properties of each node and found hub genes using five algorithms (Degree, MCC, MNC, DMNC, EPC algorithms) from the CytoHubba plugin [[91]33], took the intersection of the five algorithms and used the software to plot the PPI network for each algorithm and the intersecting genes of the five algorithms. The miRNAs involved in the hub genes were predicted using the Tarbase database [[92]34] ([93]http://www.microrna.gr/tarbase), and the RNA binding protein (RBP) was predicted based on the hub genes in the RBP database (RNAct database [[94]35]). To reveal the relationship between these hub genes and small molecule drug sensitivity, drug sensitivity data (IC50 values) were downloaded from the RNAact Drug database [[95]36], which integrates three databases CCLE [[96]37], CellMiner [[97]38] and GDSC [[98]39]. Correlations between hub genes and small molecule drug IC50 values were then calculated separately and presented on a database-by-database basis using bubble plots. 2.7. Analysis of the differences in immune cells We performed ssGSEA of all genes in the [99]GSE84958 and [100]GSE34526 datasets, box plots of differences in ssGSEA scores for 28 immune cells, ssGSEA of mitochondrial autophagy and overlapping genes in the [101]GSE84958 and [102]GSE34526 datasets in [103]GSE107746, and a heat map in oocytes and GCs of the primary follicle versus that of the secondary follicle. 2.8. Analysis of the correlation between hub genes and immune cells Hub gene expression was correlated with immune cell ssGSEA score, and the combination of P < 0.01 and r > 0.4 was plotted as a scatter plot for correlation analysis. Hub gene expression was calculated in the [104]GSE84958 and [105]GSE34526 datasets and in the [106]GSE107746 dataset for oocyte and GCs from primary follicle versus that from secondary follicle. To determine the ability of the hub genes to predict the prognostic value of PCOS, we used the pROC package [[107]26] to construct ROC curves for these hub genes and calculated the AUC. 2.9. Statistical analysis All data calculations and statistical analyses were performed using the R program ([108]https://www.r-project.org/, version 4.1.2). For comparisons between two groups of continuous variables, statistical significance of normally distributed variables was estimated using the independent Student t-test, and differences between non-normally distributed variables were analyzed using Mann-Whitney U tests (i.e. Wilcoxon rank sum test). All statistical P values were two-tailed, with P < 0.05 considered statistically significant. 3. Results 3.1. Analysis process The whole analysis flowchart was showed in [109]Fig. 1. The [110]GSE34526 and [111]GSE84958 datasets were used to perform DEGs analysis (control vs. PCOS group) and enrichment analysis for GO and KEGG based on DEGs. DEGs and mitophagy genes were used for intersection, followed by immuno-infiltration analysis and PPI network construction to search for hub genes. Hub genes were used to retrieve relevant miRNA and RBP predictions, and finally incorporated into the [112]GSE107746 dataset to calculate the ssGSEA score, validate hub gene expression, and use ROC curves to evaluate the predictive power of hub genes. Fig. 1. [113]Fig. 1 [114]Open in a new tab The flowchart of the analysis. 3.2. Gene differential expression analysis Gene differential expression analysis was performed in the [115]GSE84958 and [116]GSE34526 datasets. A total of 3510 up-regulated genes, 1208 down-regulated genes and 17740 genes without significant change were finally obtained from the [117]GSE84958 dataset ([118]Fig. 2A); a total of 555 up-regulated genes, 183 down-regulated genes and 22,782 genes with no significant change were finally obtained from the [119]GSE34526 dataset ([120]Fig. 2B). A total of 43 mitophagy-related DEGs were intersected between the two datasets for DEGs and mitophagy-related genes ([121]Fig. 2C). Clustering heat maps were then constructed to demonstrate that in [122]Fig. 2D and E. ZNF185, MS4A4A, ARMCX2, ADAP2, FAM20A, C3orf70, P2RX7 were the genes with significantly lower expression in the [123]GSE84958 dataset, whereas TKT, CTSD, TSPO and FABP5 were expressed at relatively high levels. In the [124]GSE34526 dataset, TUBB2A and UABP5 were significantly more highly expressed in the control group, while C3orf70 was significantly less expressed in PCOS. P2RX7 and C3orf70 were largely negatively correlated with all other genes in the [125]GSE84958 dataset ([126]Fig. 2F), whereas TUBB2A, ARMCX2, UAP1, SUCLA2 and C3orf70 were largely negatively correlated with all other genes in the [127]GSE34526 dataset ([128]Fig. 2G). Fig. 2. [129]Fig. 2 [130]Open in a new tab Gene differential expression analysis (A) Volcano plot of differential gene expression between the control and PCOS groups in the [131]GSE84958 dataset, with down-regulated genes in blue, up-regulated genes in yellow and unchanged genes in grey, each dot representing a gene. (B) Volcano plot of differential gene expression between control and PCOS groups in the [132]GSE34526 dataset, with down-regulated genes in blue, up-regulated genes in yellow and unchanged genes in grey, each dot representing a gene. (C) Venn diagram of mitochondrial autophagy-related genes and differentially expressed genes in the [133]GSE84958 and [134]GSE34526 datasets. (D) Heat map of intersectional expression clusters of 43 differentially expressed genes and mitochondrial autophagy-associated genes in [135]GSE84958, with red blocks representing high expression and purple blocks representing low expression. (E) Heat map of the intersectional expression clusters of 43 differentially expressed genes and mitochondrial autophagy-associated genes in [136]GSE34526, with red coloured blocks as high expression and purple coloured blocks as low expression. (F) Heat map of the intersectional correlation analysis of 43 differentially expressed genes and mitochondrial autophagy-associated genes in [137]GSE84958, blue indicates correlation coefficient equal to 1, red indicates correlation coefficient equal to −1. (G) Heat map of the intersectional correlation analysis of 43 differentially expressed genes and mitochondrial autophagy-associated genes in [138]GSE34526, blue indicates correlation coefficient equal to 1, red indicates correlation coefficient equal to −1. PCOS: Polycystic ovarian syndrome. 3.3. GO/KEGG enrichment analysis Using DEGs for KEGG, GO enrichment analysis ([139]Table S3), 15 KEGG-enriched pathways were selected and shown as bar graphs in [140]Fig. 3A and B. The specific names of the pathways enriched in the [141]GSE84958 dataset ([142]Fig. 3A) were as follows: Ribosome, Fatty acid metabolism, Salmonella infection, Protein processing in endoplasmic reticulum, PPAR signalling pathway, Pathways of neurodegeneration-multiple diseases, Thermogenesis, Chemical carcinogenesis-reactive oxygen species, Oxidative phosphorylation, Non-alcoholic fatty liver disease, Parkinson's disease, Prion disease, Huntington's disease, Amyotrophic lateral sclerosis, Coronavirus disease-COVID-19; The specific names of the pathways enriched in the [143]GSE34526 dataset ([144]Fig. 3B) are as follows: Phagosome, Tuberculosis, Leishmaniasis, Osteoclast differentiation, Staphylococcus aureus infection, Rheumatoid arthritis, Hematopoietic cell lineage, Viral myocarditis, Lysosome, Inflammatory bowel disease, Pertussis, Antigen processing and presentation, Allograft rejection, Neutrophil extracellular trap formation, Cell adhesion molecules. Fig. 3. [145]Fig. 3 [146]Open in a new tab GO/KEGG enrichment analysis (A) Histogram of KEGG enrichment analysis of differentially expressed genes in [147]GSE84958, the colour of the bars varies with the size of P. adapt, the x-axis is -log10 (p value) and the y-axis is the KEGG term and pathway enriched in. (B) Histogram of KEGG enrichment analysis of differentially expressed genes in [148]GSE34526, the colour of the bars varies with the size of P. adapt, the x-axis is -log10 (p value) and the y-axis is the KEGG term and pathway enriched to. (C) GO analysis string diagram of the [149]GSE84958 dataset for differentially expressed genes, red dots represent up-regulated genes, blue dots represent down-regulated genes, pathway numbers and specific names are listed in the table below the diagram. (D) GO analysis string diagram of the [150]GSE34526 dataset, red dots represent up-regulated genes, blue dots represent down-regulated genes, pathway numbers and specific names are listed in the table below the diagram. GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes. The top 10 GO-enriched pathways in descending order of selected P-values are shown as string plots in [151]Fig. 3C and D. The specific names of the pathways enriched in the [152]GSE84958 dataset ([153]Fig. 3C) are as follows: cotranslational protein targeting to membrane, SRP-dependent cotranslational protein targeting to membrane, protein targeting to ER, establishment of protein localisation to endoplasmic reticulum, protein localisation to endoplasmic reticulum, nuclear transcribed mRNA catabolic process, nonsense-mediated decay, translation initiation, viral gene expression, protein targeting, viral transcription; the specific names of the pathways enriched in the [154]GSE34526 dataset ([155]Fig. 3D) are as follows: tertiary granule, macrophage activation, cellular response to biotic stimulus, cellular response to molecule of bacterial origin, tertiary granule membrane, ficolin-1-rich granule, cellular response to lipopolysaccharide, regulation of myeloid leukocyte-mediated immunity, chemokine production, microglial cell activation. 3.4. Screening appropriate genes using ROC analysis Using ROC analysis a collection of 12 genes with relatively good prediction were selected and overlapping genes were taken from the mitophagy genes combined with the [156]GSE84958 and [157]GSE34526 datasets, including ARMCX2, MS4A4A, SRM, CFL2, ATP13A2, STX10, FABP5, LIPA, PPP1CA, UAP1, TSPO, NAPA. All 12 genes in the [158]GSE84958 dataset were significantly different in the control and PCOS groups (ARMCX2, P = 0.00026; MS4A4A, P = 0.0006; SRM, P = 0.00013; CFL2, P = 0.00014; ATP13A2, P = 0.00079; STX10, P = 0.00083; FABP5, P = 0.00077; LIPA, P = 0.00039; PPP1CA, P = 0.00189; UAP1, P = 0.00169; TSPO, P = 0.00189; NAPA, P = 0.0017; [159]Fig. 4A); all 10 genes in the [160]GSE34526 dataset, except FABP5 and NAPA, were significantly different in the control and PCOS group: LIPA, P = 0.017; UAP1, P = 0.017; TSPO, P = 0.017; PPP1CA, P = 0.033; CFL2, P = 0.017; ARMCX2, P = 0.033; MS4A4A, P = 0.017; SRM, P = 0.017; STX10, P = 0.033; ATP13A2, P = 0.017 ([161]Fig. 4B). Fig. 4. [162]Fig. 4 [163]Open in a new tab ROC analysis to find the appropriate gene (A) Expression of genes in [164]GSE84958 dataset with area under the curve >0.8 for ROC analysis in both datasets, blue for control and red for patient group. (B) Expression of genes in [165]GSE34526 dataset with area under the curve >0.8 for ROC analysis in both datasets, blue for control and red for patient group. P < 0.05 was considered significant. ROC: Receiver operating characteristic; PCOS: Polycystic ovarian syndrome. 3.5. Overall analysis of immune infiltration in PCOS A total of 43 overlapping genes from the DEGs of both datasets and mitophagy-related genes were used for ssGSEA analysis, enriching for a total of 4 immune cells. In the [166]GSE84958 dataset, monocytes showed a lower immune score in the PCOS ([167]Fig. 5A), CD56 bright natural killer cells showed a significant correlation (P < 0.05) with the hub gene CTSD. The correlation coefficient was 0.77. There was a significant negative correlation (P < 0.05) between regulatory T cells and ARF5, with a correlation coefficient of −0.68. ([168]Fig. 5C). Fig. 5. [169]Fig. 5 [170]Open in a new tab Overall analysis of DEG in immune infiltration in PCOS (A) Heat map of the ssGSEA scores of the six immune cells in the control and PCOS groups in the [171]GSE84958 dataset. Red represents high abundance, blue represents low abundance. (B) Heat map of ssGSEA scores of six immune cells in the control and disease groups in the [172]GSE34526 dataset. Red represents high abundance, blue represents low abundance. (C) Heat map of correlation analysis of immune cells and differentially expressed genes in the [173]GSE84958 dataset with overlapping genes of mitochondrial autophagy genes, red indicates correlation coefficients equal to 1, purple indicates correlation coefficients equal to −1, and all combinations with correlation coefficients marked in the figure are combinations with P < 0.05. (D) Heat map of the correlation analysis of immune cells and differentially expressed genes in the [174]GSE34526 dataset with overlapping genes of mitochondrial autophagy genes, red indicates correlation coefficients equal to 1, purple indicates correlation coefficients equal to −1, and all combinations with correlation coefficients marked in the figure are combinations with P < 0.05. ssGSEA: Single Sample Gene Set Enrichment Analysis. In contrast, in the control and PCOS groups of the [175]GSE34526 dataset, Regulatory T cells showed a very significant difference, with a higher immune score in the PCOS group ([176]Fig. 5B). Gamma delta T cells were significantly correlated with FABP5 (P < 0.05, r = 0.81), a combination that was also reflected in [177]GSE84958 (P < 0.05, r = 0.76), and monocyte showed a significant negative correlation with FABP5 (P < 0.05, r = −0.6) ([178]Fig. 5D). These results indicated that the immune infiltration changed in PCOS. 3.6. Protein-protein interaction network and the identification of hub genes To further identify the hub biomarkers, five algorithms were used to calculate the interaction. Degree algorithm yielded 10 hub genes: CTSD, IGF2R, ITGAM, PPP1CA, ATP13A2, EIF5A, NAPA, RAB5C, RAB5B, GRN, among which CTSD has a higher rank and 5 genes have reciprocal relationship with it ([179]Fig. 6A). The MCC algorithm yielded 10 hub genes: CTSD, IGF2R, ITGAM, NAPA, PPP1CA, ATP13A2, EIF5A, RAB5C, RAB5B, GRN, among which CTSD had the highest rank and with six genes had reciprocal relationships with it ([180]Fig. 6B). The MNC algorithm yielded 10 hub genes: NAPA, IGF2R, PPP1CA, ATP13A2, GRN, CTSD, STX10, CAPNS1, EIF5A, CD44, of which NAPA and IGF2R have a higher rank and an interaction relationship with each other ([181]Fig. 6C). The DMNC algorithm yielded 10 hub-gene interactions: NAPA, IGF2R, PPP1CA, ATP13A2, GRN, CTSD, STX10, CAPNS1, EIF5A, CD44 among which NAPA and IGF2R were of higher rank and they had an interaction relationship with each other ([182]Fig. 6D). The EPC algorithm yielded 10 hub-gene interactions: CTSD, IGF2R, NAPA, RAB5B, RAB5C, ATP13A2, STX10, GRN, AP1S2, ARF5, with CTSD having the highest rank ([183]Fig. 6E). Fig. 6. [184]Fig. 6 [185]Open in a new tab PPI analysis construction of ten hub gene (A) Map of the 10 hub-gene interaction networks obtained by the Degree algorithm. (B) Map of the 10 hub-gene interactions obtained by the MCC algorithm. (C) Map of the 10 hub-gene interaction networks obtained by the MNC algorithm. (D) Map of the 10 hub-gene interaction networks obtained by the DMNC algorithm. (E) Map of the 10 hub-gene interactions obtained by the EPC algorithm. (F) Intersection of the five algorithms with the hub gene interaction network graph. MCC: Maximum Clique Centrality; MNC: Maximum Neighbourhood Component; DMNC: Density of Maximum Neighbourhood Component; EPC: Edge Percolated Component. The interoperability network diagram of the five algorithmic intersection hub genes (CTSD, IGF2R, ATP13A2, NAPA, GRN) was shown in [186]Fig. 6F. It can be observed that GRN, IGF2R, and ATP13A2 all interact in relation to CTSD. After identifying the final 5 hub genes (CTSD, IGF2R, ATP13A2, NAPA, GRN), we proceeded to conduct RBP prediction, miRNA prediction ([187]Figure S1), and drug small molecule prediction based on these hub genes. A total of 27 RBP intersections were predicted for the 5 hub genes ([188]Fig. 7A). To show the relationship between these 5 hub genes and small molecule drug sensitivity, data from three drug sensitivity databases were integrated and bubble plots of the correlation between gene expression values and small molecule drug IC50 values were plotted separately. A higher IC50 value indicates higher drug resistance, meaning that a positive correlation suggests that increased gene expression contributes to enhanced resistance, while a negative correlation implies that higher gene expression leads to reduced resistance. The small molecule drugs which have a significant correlation with the hub genes were selected for further exploring the function in polycystic ovary syndrome (PCOS). The bubble plots in [189]Fig. 7B, C, and 7D illustrate drug sensitivity analyses using the GDSC, CellMiner, and CCLE databases, respectively. The drugs provided in these databases are usually those that have been widely tested in clinical settings. In the three databases, the GRN gene showed correlations with most small molecule drugs, and the GRN gene showed significant negative correlations with drugs Trametinib, Selumetinib, Refametinib and PD0325901 with relatively small p-values in the GDSC database ([190]Fig. 7B). In the CellMiner database, CTSD was negatively correlated with most small molecules, and the only small molecules correlated with GRN were kinetin riboside and 1,1′-biphenyl, 4,4′-bis(thiomethyl)- ([191]Fig. 7C). In the CCLE database, CTSD, ATP13A2 and GRN genes were positively associated with most small molecule drugs, while NAPA was only negatively associated with Panobinostat ([192]Fig. 7D). Fig. 7. [193]Fig. 7 [194]Open in a new tab Small molecule drug prediction based on hub genes (A) Venn diagram of RBP prediction results for 5 hub genes (CTSD, IGF2R, ATP13A2, NAPA, GRN). (B) Bubble plots of the GDSC database for drug sensitivity analysis. (C) Bubble plots of the drug sensitivity analysis from the CellMiner database. (D) Bubble plots of drug sensitivity analysis from the CCLE database. The bubble size represents -log2FDR and the colour shade represents the correlation coefficient. RBP: RNA binding protein; GDSC: Genomics of Drug Sensitivity in Cancer; CellMiner: a web-based suite of genomic and pharmacological tools for exploring transcript and drug patterns in the NCI-60 cell line set; CCLE: Cancer Cell Line Encyclopedia. 3.7. Analysis of the differences in immune cells between normal and PCOS In the [195]GSE84958 and [196]GSE34526 datasets, immune infiltration analysis and differential expression calculations were performed for 28 immune cells between PCOS and controls. In [197]GSE84958, immune cells with significantly different ssGSEA scores ([198]Fig. 8A) included Activated CD4 T cells (P = 7.45E-05), Regulatory T cells (P = 0.0201), Type 2 T helper cells (P = 0.0258), Effector memory CD4 T cells (P = 0.0280), Central memory CD4 T cells (P = 0.0412), Eosinophils (P = 0.0412), and MDSCs (P = 0.0444). In [199]GSE34526, immune cells with significant differences ([200]Fig. 8B) included Activated dendritic cells, Central memory CD4/CD8 T cells, Gamma delta T cells, Immature dendritic cells, Macrophages, Mast cells, MDSCs, Monocytes, Natural killer T cells, Neutrophils, Plasmacytoid dendritic cells, Regulatory T cells, T follicular helper cells, Type 1 T helper cells (all P = 0.0167), Activated B cells, CD56dim natural killer cells, Effector memory CD8 T cells, and Immature B cells (all P = 0.033). Fig. 8. [201]Fig. 8 [202]Open in a new tab Analysis of the differences in immune cells between samples (A) Analysis of the difference in immune cell ssGSEA scores between the control and disease groups in the [203]GSE84958 dataset, whereh red represents the control group and blue represents the PCOS group. (B) Analysis of the difference in immune cell ssGSEA scores between the control and PCOS groups in the [204]GSE34526 dataset, red represents the control group and blue represents the PCOS group. p < 0.05 is considered significant. (C) Clustered heat map of ssGSEA scoring results for oocytes from primary follicles vs. oocytes of secondary follicles in [205]GSE107746. (D) Heat map of clustering results for ssGSEA scoring of GCs from primary follicles vs. GCs from secondary follicles in [206]GSE107746. However, immuno-infiltration analysis with 43 intersecting genes in [207]GSE107746 showed a higher ssGSEA score for Regulatory T cells in oocytes from secondary follicles compared to oocytes from primary follicles ([208]Fig. 8C), with no significant difference was observed when comparing GCs from primary and secondary follicles ([209]Fig. 8D). 3.8. Analysis of the correlation between hub genes and immune cells The ssGSEA scores calculated for the five hub genes (CTSD, IGF2R, ATP13A2, NAPA, GRN) were correlated with all genes involved in [210]GSE107746 and the combinations of P < 0.01 and |r| > 0.4 were plotted as scatter plots for correlation analysis, with the specific P and r values shown in [211]Fig. 9. Among the combinations with negative correlation are the following: Effector memory CD4 T cell and CTSD ([212]Fig. 9B), Activated B cell and IGF2R ([213]Fig. 9E), Activated CD4 T cell and IGF2R ([214]Fig. 9F), Immature B cell and IGF2R ([215]Fig. 9G), Gamma delta T cell and NAPA ([216]Fig. 9H); and the combinations that show a positive correlation are: CD56dim natural killer cell and CTSD ([217]Fig. 9A), MDSC and CTSD ([218]Fig. 9C), Natural killer cell and CTSD ([219]Fig. 9D), Monocyte and GRN ([220]Fig. 9I), Plasmacytoid dendritic cell and GRN ([221]Fig. 9J). Fig. 9. [222]Fig. 9 [223]Open in a new tab Analysis of the correlation between hub genes and immune cells (A) Scatter plot of the correlation analysis between CD56dim natural killer cells and CTSD. (B) Scatter plot of correlation analysis between Central memory CD4 T cellls and CTSD. (C) Scatter plot of correlation analysis between effector memory CD4 T cells and CTSD. (D) Scatter plot of correlation analysis between Effector memory CD8 T cells and CTSD. (E) Scatter plot of correlation analysis between MDSC and CTSD. (F) Scatter plot of correlation analysis between Natural killer cells and CTSD. (G) Scatter plot of correlation analysis between Activated B cells and IGF2R. (H) Scatter plot of correlation analysis between Immature B cells and IGF2R. (I) Scatter plot of correlation analysis between Gamma delta T cells and NAPA. (J) Scatter plot of correlation analysis between Monocyte and GRN. (K) Scatter plot of correlation analysis between Plasmacytoid dendritic cells and GRN. The differential expression of the five hub genes (CTSD, IGF2R, ATP13A2, NAPA, GRN) in the [224]GSE84958 and [225]GSE34526 datasets is shown in [226]Fig. 10A and B. Except for CTSD, the cases were significantly different from the control in the [227]GSE84958 dataset for all four genes (NAPA, P = 0.0017; GRN, P = 0.00599; IGF2R, P = 0.01552; ATP13A2, P = 0.00079) ([228]Fig. 10A), while in the [229]GSE34526 dataset, GRN (P = 0.033), IGF2R (P = 0.033) and ATP13A2 (P = 0.017) were all significantly different ([230]Fig. 10B). The ROC curves of the [231]GSE84958 dataset showed that the AUCs of all five hub genes were high, with AUCs of 0.629 for CTSD, 0.736 for IGF2R, equal to 0.822 for ATP13A2, 0.806 for NAPA and 0.768 for GRN ([232]Fig. 10C), whereas the AUC of all four genes in [233]GSE34526 was above 0.9, except for NAPA (AUC = 0.81) ([234]Fig. 10D). In the [235]GSE107746 dataset, we compared hub gene expression in oocytes and GCs of primary follicles versus oocytes and GCs of secondary follicles, and found that GRN and NAPA were significantly different (P = 2.9e-8, P = 2.8e-6) in oocytes from primary follicles compared to oocytes of secondary follicles ([236]Fig. 10E), and only GRN and NAPA were significantly different between GCs of primary follicles compared to secondary follicles (P = 0.0007, P = 0.0024) ([237]Fig. 10F). Fig. 10. [238]Fig. 10 [239]Open in a new tab Hub gene expression and ROC analysis (A) Differential expression of CTSD, IGF2R, ATP13A2, NAPA, GRN in the [240]GSE84958 dataset. (B) Differential expression of CTSD, IGF2R, ATP13A2, NAPA and GRN in the [241]GSE34526 dataset. (C) ROC curves for CTSD, IGF2R, ATP13A2, NAPA, GRN in the [242]GSE84958 dataset, where each colour represents one gene. (D) ROC curves for CTSD, IGF2R, ATP13A2, NAPA, GRN in the [243]GSE34526 dataset, where each colour represents one gene. ROC: Receiver operating characteristic; AUC: Area under the curve. 4. Discussion PCOS is a heterogeneous and complex endocrine-metabolic disorder of uncertain aetiologies and variable phenotype in women of reproductive age. The Rottterdam criteria and clinical symptoms have been the mainstay of the diagnosis of PCOS due to the lack of an exact pathogenesis mechanism. Recently, it has been increasingly recognized that mitophagy and immune regulation plays a critical role in the development of PCOS, however, immune-mitophagy-related biomarkers remain underdiagnosed. In this current study, we identified five hub genes (CTSD, IGF2R, ATP13A2, NAPA and GRN) for mitophagy-related PCOS and screened the association between these hub genes and immune cells infiltration by ssGSEA. Meanwhile, we validated the expression of these hub genes in the test dataset to investigate the relationship between the oocytes and immune infiltration. In addition, we found six small molecule compounds for PCOS treatment through drug sensitive IC50. Therefore, these immune-mitophagy-related hub genes have the potential role of guiding individualized therapy, developing new therapeutic strategies and identifying new targets of PCOS. The pathogenesis of PCOS is complex and the exact regulatory genes remain elusive. We firstly screen out the differential gene expression in The [244]GSE34526 and [245]GSE84958 datasets. The GO enrichment analysis results showed that the DEGs were involved in positive regulation of neutrophil activation in immune response, neutrophil mediated immunity and protein targeting, which revealed that immune inflammatory state might be one of the important factors in the pathogenesis of PCOS. The KEGG analysis results suggested that the DEGs were associated with PPAR signaling pathway, oxidative phosphorylation and phagosome. It has been reported that the increased expression of PPAR in the GCs involved in the progress of PCOS [[246]49]. Oxidative phosphorylation and phagosome shared the mitophagy pathway. Several studies indicated the relationship between mitophagy and pathology of PCOS. Therefore, mitophagy disfunction might contribute to the PCOS disorder. After intersected with mitophagy-related genes,43 genes were yielded ([247]Fig. 2). The identification of diagnostic markers for PCOS has been challenging due to its complex pathogenesis. Selecting hub genes from mitochondria-related genes could greatly aid in the diagnosis of PCOS. Based on our results, five hub genes were screened for the diagnostic biomarkers of PCOS by PPI network, revealing their involvement in PCOS progression. Interestingly, the CTSD hub protein interacted with the other four hub proteins in the PPI analysis, highlighting CTSD's critical role in PCOS-associated mitophagy. CTSD, a normal and major component of lysosomes, is a lysosomal enzyme and is present in the autolysosome during autophagy [[248]40]. The increased expression of CTSD was induced by autophagy and promotes apoptosis [[249]40]. In our analysis, CTSD consistently scored highest among hub genes identified through five algorithms, suggesting that autophagy is activated in PCOS and that CTSD may contribute to PCOS progression via autophagy-dependent apoptosis in granulosa cells (GCs). Similarly, IGF2R is an insulin-like growth factor type 2 receptor that plays an important role in the transport of phosphorylated lysosomal enzymes from the Golgi complex and the cell surface to the lysosomes [[250]41]. IGF2R also scored highly in our analysis, indicating its potential involvement in PCOS development through activated mitophagy, alongside CTSD. Furthermore, ATP13A2, which can inhibit autophagy by activating mTORC [[251]42] and involved in insulin resistance [[252]43,[253]44], which was consistent with our analysis that ATP13A2 was low expressed in the [254]GSE84958 dataset with adipose tissue sample. Two additional biomarkers, GRN and NAPA, were identified using ssGSEA score correlation analysis and validated in the test dataset. GRN, also known as PGRN, is localized in the GCs of developing follicles in pigs [[255]45], and was also reported to be involved in development of oocyte and the proliferation of GCs in PCOS [[256]46,[257]47]. Meanwhile, as a novel proinflammatory biomarker, PGRN might promote the chronic inflammatory state of PCOS. The protein encoded by NAPA was aSNAP, which has been reported involving the regulating of yeast autophagy biogenesis [[258]48]. However, its relationship with PCOS remains unexplored, warranting further investigation to establish the role of these two genes in mitophagy-associated PCOS. Given the importance of immune regulation in PCOS [[259]9], immune infiltration analysis was first performed in PCOS and normal patients using CIBERSORT. Four immune cells: Monocyte, Regulatory T cell, CD56 bright natural killer cell and Gamma delta T cell was obtained, and were associated with the development of PCOS and mitophagy, which was consistent with the previous result [[260]22]. These immune cells were associated with different DEGs across different datasets, further confirming the heterogeneous nature of PCOS. Further analysis of hub genes and immune infiltration cells using ssGSEA score association revealed specific correlations: effector memory CD4 T cells were negatively correlated with CTSD, and IGF2R was negatively correlated with activated B cells and activated CD4 T cells. Gamma delta T cells were negatively correlated with immature B cells and NAPA. Positive correlations included CD56dim natural killer cells with CTSD, MDSCs with CTSD, natural killer cells with CTSD, monocytes with GRN, and plasmacytoid dendritic cells with GRN. Finally, Regulatory T cells were found to have a higher ssGSEA score in the oocyte of the secondary follicle compared to the primary follicle after validation in the test dataset, but no difference was found in the granular cell, showing that Regulatory T cells might play a critical role in the selection and development of the dominant follicle. These findings have implications for the treatment of immune problems in PCOS, allowing individualized treatment. However, for specific individuals with genetic mutations and drug-resistant patients, the development of new immune-targeting drugs needs to be further explored and optimized. Our article reveals the potential molecular mechanism and diagnostic ability of mitochondrial autophagy in PCOS. After identifying the hub genes in this study, predictions for RNA-binding proteins and miRNAs were conducted. In order to reveal the potential role of these targets in clinical practice and screen out possible drug targets, we conducted an analysis in combination with the drug sensitivity database. Additionally, several small-molecule drugs significantly associated with the hub genes were identified. By integrating drug sensitivity data from the CCLE, CellMiner, and GDSC databases, the potential roles of these compounds in regulating core genes linked to the pathogenesis of PCOS were explored. The analysis revealed that CTSD exhibited a negative correlation with most small molecules in the GDSC and CellMiner databases, while showing a positive correlation in the CCLE database. Given that the drugs in these databases have undergone extensive clinical testing, these findings may offer promising treatment options or inspire new research directions for PCOS. This bioinformatics analysis is the first to combine mitophagy and immune infiltration to find potential diagnostic biomarkers for PCOS and to provide a new therapeutic target. However, this analysis still has some limitations that need to be addressed. Firstly, there are only two train data sets with limited sample size and from different tissues, resulting in a partial understanding of the pathogenic mechanism of PCOS. More train data sets should be analyzed that include different phenotypic characteristics of PCOS in the future. Secondly, we assessed the prognostic prediction ability of hub genes with ROC curves in this study. More performance metrics, such as Matthews Correlation Coefficient (MCC), Accuracy (ACC), Sensitivity (Sn), Specificity (Sp), and etc. would offer a more robust evaluation. And more machine learning algorithms (e.g., logistic regression, support vector machines) would potentially enhance the predictive accuracy and biomarkers for PCOS diagnosis. So, more functional indicators to enhance prediction accuracy in future studies. A randomized controlled trial could help to verify the efficacy of immune therapy on PCOS based on these hub genes. Thirdly, 12 hub genes were obtained from two datasets, however, each cell will have its own transcriptional signature that is also related to the microenvironment, we did not analyze the different cell types to gain a comprehensive understanding of individual cells, therefore, we will perform individual analyses taking into account the uniqueness of each cell type in the future. Finally, and importantly, there is a lack of wet experiments to further verify the results. Therefore, further in vivo and in vitro research is needed to explore the relationship of hub genes with immune infiltration in PCOS and the mechanism. PCOS is a indeed a syndrome and most like represents multiple types of underlying causative etiologies and physiologic alterations that may be present. The most common presentation of PCOS with insulin resistance and obesity goes along very well with the underlying hypothesis, the presented data and conclusions of this study. But other phenotypic presentations of PCOS such a thin body habitus with absolutely no evidence of metabolic syndrome or insulin resistance may be totally unrelated to the findings of this study. Therefore, future in-depth experimental research is warranted to fully understand their potential mechanisms involved in the pathogenesis of different phenotypes. and to explore effective treatments for PCOS. 5. Conclusion In conclusion, our work firstly combined mitophagy and immune infiltration to find potential diagnostic biomarkers for PCOS and evaluated their potential interactions with immune cells during the pathogenesis of PCOS, which enriched the biomarkers of mitophagy and immune-related pathological features of PCOS. CRediT authorship contribution statement Hongjuan Ye: Writing – original draft. Xicheng Wang: Data curation. Zhiying He: Supervision, Investigation, Funding acquisition. Ethical approval and consent to participate Not applicable. Consent for publication All authors have given consent for publication. Availability of data and materials All data generated or analyzed during this study are included in this published article. Funding Major Program of National Key Research and Development Project [grant number 2020YFA0112600], and Shanghai Engineering Research Center of Stem Cells Translational Medicine [grant number 20DZ2255100]. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgements