Abstract
Polycystic ovarian syndrome (PCOS) is one of the most common endocrinopathies among women of reproductive age worldwide, contributing substantially to the incidence of female infertility and gynecological cancers. It is a complex condition combining multiple symptoms such as androgen excess, uncontrolled weight gain, alopecia and hirsutism. PCOS was conventionally associated with obesity, but it is now often found among lean women as well, which makes the disease harder to diagnose and treat. The disorder affects several signal transduction pathways, including steroidogenesis, steroid hormone activity, gonadotrophin regulation, insulin secretion, energy balance, and chronic inflammation. Understanding the aetiology and pathophysiology of PCOS is difficult because of its multiple causes, which include environmental factors, intricate genetic predisposition, and epigenetic modifications. Despite research supporting the role of familial aggregation in PCOS outcomes, the inheritance pattern remains unknown. Hence, to reduce the burden of PCOS, early diagnosis and intervention through personalized medicine are essential. Against this background, it was imperative to elucidate the genetic architecture of PCOS with BMI as a controlling factor. This study aims to investigate the genetic basis of obesity-mediated PCOS, focusing on both obese and lean individuals. It uses a comprehensive bioinformatics methodology to characterize pathway and functional enrichment, supporting cost-effective risk prediction and management. In the present research, the representative study participants (N = 2) were chosen from a cross-sectional epidemiological survey based on their anthropometric parameters and confirmation of PCOS. After voluntary participation and written consent, biological samples (whole blood and buccal swab) were collected, from which DNA was extracted.
Clinical exome sequencing was performed on the Illumina next-generation sequencing platform using the Twist Human Comprehensive Exome Kit. A comprehensive bioinformatics methodology was employed to identify the most important unique and common genes. A total of 26,550 variants were identified in clinically important exomes from the two samples; after filtering, 5170 were common to both, and 2232 and 2322 were unique to the lean and obese PCOS phenotypes, respectively. Of these, 262 common variants, 120 lean-specific variants and 94 obese-specific variants were PCOS-specific. Three filters were applied to shortlist the most potent variants, yielding 4 unique variants in lean PCOS, 2 unique variants in obese PCOS, and 5 variants common to both. The study found that impaired leptin signalling and insulin resistance, as well as mutations in CYP1A1, CYP19A1, ESR1, AR, AMH, AdipoR1, NAMPT, NPY, PTEN, EGFR, and Akt, play significant roles in PCOS in the studied group. In the studied young women from West Bengal, India, comorbidity analysis indicated a higher risk of estrogen resistance, leptin receptor deficiency, folate deficiency, T2DM, and acanthosis nigricans co-occurring with PCOS, with obesity being a common phenotypic expression.
Keywords: Polycystic ovarian syndrome (PCOS), Next-generation sequencing (NGS), Clinical-exome sequencing (CES), Comorbidity analysis, Mutation-phenotype interactions, Molecular screening
Subject terms: Biological techniques, Genetics, Endocrinology, Molecular medicine
Introduction
Polycystic ovarian syndrome (PCOS), one of the most common heterogeneous and severe endocrinopathies, affects 15–20% of women of reproductive age globally and has been linked to infertility and even some gynaecological cancers^1–3. The condition is classified as a syndrome rather than a disease because of its diverse phenotypic features, such as ultrasonographic polycystic ovarian morphology (PCOM), clinical or biochemical hyperandrogenism, hirsutism, acne, acanthosis nigricans, obesity, infertility, and hair loss^4,5.
Understanding the aetiology and pathophysiology of PCOS has been difficult because of its many causes, including environmental variables, genetic predisposition, and epigenetic changes^2,3. Moreover, despite sufficient evidence confirming the importance of familial aggregation in PCOS outcomes^6–10, the inheritance pattern remains unclear. PCOS affects multiple signal transduction pathways, including steroidogenesis, steroid hormone activity, gonadotrophin control, insulin secretion, energy balance, and chronic inflammation^2–4. According to the database^11,12, 533 potential genes and 145 SNPs have been linked to PCOS, and numerous studies are being conducted worldwide to understand the molecular signals and their intricate underlying mechanisms. Unfortunately, many women are still not fully aware of their reproductive health. In our previous survey, we found that the majority of the population was unaware of their menstrual health despite experiencing symptoms such as irregular menstruation and dysmenorrhea^13. The situation is similar in developing and developed nations, which may be related to traditional and cultural views^14–16. Over the last two decades, the global burden of PCOS-related infertility has grown dramatically^17. Consistent with the observations of Dhar et al.^13, a study showed that, at 30 years of age, women with a prior diagnosis of PCOS were 4 times more likely to have reported difficulties becoming pregnant than undiagnosed women, and regularly sought medical treatment^18. In contrast, an earlier study reported that PCOM was present in 14% of healthy women and was more prevalent among women aged < 35 years^19.
In addition, irregular menstrual bleeding is an unreliable sign unless it is prolonged, and moderate hair growth and chronic anovulation are normal in late puberty and early adolescence; both can delay diagnosis^20 and compromise PCOS management. Accumulating evidence therefore reinforces the importance of detecting PCOS early in youth or before puberty to ensure timely diagnosis and adequate health care (Fig. 1).
Fig. 1. Gaps and scope of the present research hypothesis.
In modern medicine, the introduction of next-generation sequencing (NGS) has caused a paradigm shift in genomics research, providing unprecedented capabilities for analysing DNA and RNA molecules in a high-throughput manner. It provides insights into genome structure, genetic variants, gene expression levels, and epigenetic alterations, broadening research into rare genetic diseases, cancer genomics, and precision medicine^21. Clinical exome sequencing (CES) is a cutting-edge molecular diagnostic technology that rapidly identifies disease-causing genetic abnormalities in any human gene and is gaining popularity in clinical settings. Its high diagnostic yield makes it cost-effective and suitable for routine diagnosis in patients with a variety of phenotypes^22. In recent years, a plethora of studies have sought to elucidate the genetic architecture of PCOS, most of them using whole-exome sequencing^8,9,23–26. Current genomic research on PCOS has focused primarily on family cohorts to better understand the disease's inheritance pattern, whereas the genetic mechanism behind obesity-mediated PCOS remains unknown. Against this background, the present study explores the clinically relevant mutational landscape of PCOS, with emphasis on obese and lean subjects.
In addition, this study presents detailed pathway and functional enrichment analyses using a holistic bioinformatics approach so that future risk can be predicted and managed cost-effectively.
Materials and methods
Selection of study subjects
This study was conducted in urban and peri-urban areas of West Bengal, India, based on an epidemiological survey^13. The Rotterdam criteria were used to confirm PCOS, and two representative probands (N = 2) were included in this study. These subjects were selected because of their contrasting anthropometric features. First, all demographic and lifestyle details were collected. Pathological screening, including blood parameters and pelvic ultrasonography, was performed along with other phenotypic assessments such as hirsutism, acne, alopecia, acanthosis nigricans, and skin tags/moles. Pedigree data were also collected to assess familial aggregation. Subjects provided written informed consent for the genetic research studies, which were performed following the study protocol approved by the Institutional Ethical Committee of the University of Calcutta (Ref. No. 07/ET/20-21/1777) and in accordance with the Declaration of Helsinki of 1975, as revised in 2008.
Biological sample collection and preparation
Whole blood (3 mL) was collected from each subject. Peripheral blood mononuclear cells (PBMCs) were isolated by density gradient centrifugation using HiSep™ (Himedia, India, LS001), and genomic DNA was then extracted by the TRIzol (Invitrogen™, United States) method. The concentration and purity of the extracted DNA were assessed using a NanoDrop instrument (NanoDrop Technologies, Wilmington, DE, USA).
Clinical exome sequencing
For quality control, genomic DNA was further quantified using a Qubit assay (Invitrogen, USA). Exome regions were captured using the Twist Human Comprehensive Exome Kit (Twist Bioscience, San Francisco, CA, USA).
Paired-end sequencing (2 × 150 bp) was performed on an Illumina HiSeq X/NovaSeq next-generation sequencing platform (San Diego, CA, USA), yielding a minimum read depth of 80–100×. Reads were aligned to the human reference assembly (hg38). Data were analysed using the standard GATK pipeline for exome sequencing, which has been shown to yield higher-quality results.
Variant screening using a bioinformatics approach
Stringent filtering was performed using several criteria: VEP impact, variation origin, OMIM report, and variation class. Functional enrichment analysis, covering both pathway and ontology enrichment, was performed using the Enrichr bioinformatics tool^27–29, and results were visualized using the Appyters tool (https://appyters.maayanlab.cloud/Enrichment_Analysis_Visualizer). For pathway enrichment analysis, six pathway libraries were used, viz. Reactome 2022, KEGG 2021, Elsevier Pathway Collection, WikiPathway 2023, BioCarta 2016 and Panther 2016. For ontology analysis, the Biological Process, Cellular Component and Molecular Function categories were considered. The protein–protein interaction (PPI) network was constructed using another web-based tool, NetworkAnalyst v3.0 (https://www.networkanalyst.ca/NetworkAnalyst/).
Results
Description of the study participants
This study encompasses two Indian Hindu Bengali families with two PCOS probands, one with the lean PCOS phenotype and the other with the obese PCOS phenotype. Demographic and anthropometric features are given in Table 1.
Table 1. Demographic and anthropometric details of the subjects participating in the present study.
Feature | PCOS-Lean | PCOS-Obese
Age (years) | 28 | 29
Residential status | Urban | Peri-urban
Economic status | Higher-middle class | Higher-middle class
Literacy | Yes | Yes
Food habit | Non-vegetarian | Non-vegetarian
Any medications | Yes (OCP + metformin + folate supplement) | Yes (OCP only)
Physical activity | High | Low to moderate
Family history | No | Yes
Blood pressure (mmHg) | 136/89 | 118/78
Pulse rate (beats/min) | 94 | 101
Body weight (kg) | 45 | 79.4
Total fat content (%) | 17.7 | 37.6
Visceral fat content (%) | 0.5 | 9
BMI (kg/m^2) | 19.5 | 29.5
Resting metabolism | 919 | 1532
Body age (years) | 18 | 46
Subcutaneous mass content (%): whole body | 14 | 34.5
Subcutaneous mass content (%): trunk | 9.7 | 30.4
Subcutaneous mass content (%): arm | 28.6 | 52.2
Subcutaneous mass content (%): leg | 24.4 | 50.2
Skeletal muscle mass content (%): whole body | 29.4 | 23.4
Skeletal muscle mass content (%): trunk | 25.8 | 17.3
Skeletal muscle mass content (%): arm | 37.1 | 20.5
Skeletal muscle mass content (%): leg | 39.6 | 37.1
Proband-1 (PCOS lean phenotype): At the time of this study, the subject was 28 years old, and the age of onset was 22 years. Pelvic ultrasonography clearly showed bilateral polycystic ovaries; the right ovary was enlarged (volume 16.8 cc) and the left ovary was normal in size (volume 11.6 cc), with multiple peripheral pea-sized follicles. The levels of thyroid-stimulating hormone (TSH), prolactin, luteinizing hormone (LH), follicle-stimulating hormone (FSH), anti-Müllerian hormone (AMH) and fasting insulin were 1.3 µIU/mL, 8.9 ng/mL, 1.9 mIU/mL, 2.7 mIU/mL, 7.29 ng/mL and 9.3 µU/mL, respectively. Acne, acanthosis nigricans and hirsutism were not seen, but slight alopecia was observed. Therefore, the subject belongs to Phenotype-D, i.e., the non-hyperandrogenic PCOS category. The family pedigree is depicted in Fig. 2a.
Fig. 2. Familial aggregation chart. (a) Lean PCOS subject, (b) Obese PCOS subject.
Proband-2 (PCOS obese phenotype): At the time of this study, the subject was 29 years old, and the age of onset was 16 years. The levels of TSH, prolactin, LH, FSH, AMH and fasting glucose were 4.76 µIU/mL, 26.70 ng/mL, 10.46 mIU/mL, 4.74 mIU/mL, 2.85 ng/mL and 110.5 mg/dL, respectively.
Pelvic ultrasonography showed bilateral polycystic ovaries, both enlarged (right ovary volume 13 cc; left ovary volume 14 cc), with multiple peripheral pea-sized follicles. The subject exhibited extreme hirsutism and slight alopecia, whereas acne and acanthosis nigricans were not observed. Therefore, the subject belongs to Phenotype-A, i.e., the full-blown PCOS category. The family history is depicted in Fig. 2b.
Variants summary
A total of 26,550 variants were identified in the clinically important exomes from the two samples, of which 93.56% were SNPs and the rest were InDels. After filtering and removing synonymous variants, 92.68% of the remaining variants were missense, and the rest were nonsense, start-loss, stop-loss, frameshift InDel and in-frame InDel variants. After variant filtering, 5170 variants were common to both samples, whereas 2232 and 2322 variants were unique to the lean and obese PCOS phenotypes, respectively. Of the 5170 common variants, only 262 variants from 131 genes were PCOS-specific according to the PCOSKb database (https://pcoskb.bicnirrh.res.in/index.php). Similarly, 120 variants from 70 genes were PCOS-specific in lean PCOS, and 94 variants from 70 genes in obese PCOS.
Classification of variants and functional enrichment analysis
Figure 3 shows the distribution of the various types of mutations. To increase the chance of detecting biological processes related to these genetic variants, all variants, regardless of their expected effect, were included in the functional and pathway enrichment analyses. Functional enrichment analysis of the 131 genes harbouring the 262 common PCOS-specific variants (Supplementary Figs. 1 and 2) showed predominant biological processes such as positive regulation of protein phosphorylation, cytokine production, protein kinase B signalling and regulation of the MAPK cascade.
These genes are involved in molecular functions such as steroid hydroxylase activity, heme binding, IGF binding, transcription regulatory region nucleic acid binding and Wnt receptor activity. In the lean-PCOS subject, analysis of 70 genes (Supplementary Fig. 3) identified key biological processes such as cellular response to peptide hormone stimulus, response to insulin, regulation of protein phosphorylation and regulation of glucose import, involving molecular functions such as transmembrane receptor protein tyrosine kinase activity, sequence-specific dsDNA binding, steroid hydroxylase activity, ATP binding and transcription cis-regulatory region binding (Fig. 4). Similarly, in the obese-PCOS subject, analysis of 70 genes (Supplementary Fig. 4) identified key biological processes such as regulation of glucose import, negative regulation of cell differentiation, cellular response to insulin stimulus and regulation of fat cell differentiation, involving molecular functions such as LDL particle binding, phosphatidylinositol phospholipase activity, IGF binding, Ca^2+ binding and protein homodimerization activity (Fig. 5).
Fig. 3. Distribution of different types of mutations found in the study participants.
Fig. 4. Functional enrichment analysis in lean PCOS. (a,c,e) Bar charts of the top 10 terms from the GO-2023 gene set library, based on a combined score of p-value and q-value; (b,d,f) Volcano plots of terms from the GO-2023 gene set library; each point represents a single term, plotted by its odds ratio (x-axis) and -log10(p-value) (y-axis) from the enrichment results of the input query gene set; the larger and darker the point, the more significantly the input gene set is enriched for the term.
Fig. 5. Functional enrichment analysis in obese PCOS.
(a,c,e) Bar charts of the top 10 terms from the GO-2023 gene set library, based on a combined score of p-value and q-value; (b,d,f) Volcano plots of terms from the GO-2023 gene set library; each point represents a single term, plotted by its odds ratio (x-axis) and -log10(p-value) (y-axis) from the enrichment results of the input query gene set; the larger and darker the point, the more significantly the input gene set is enriched for the term.
The pathway enrichment analysis of the 131 common genes is shown in Supplementary Figs. 5 and 6. In lean PCOS (Supplementary Fig. 7), HIF-1 signalling, PI3K-Akt signalling, AMPK signalling, and the complement and coagulation cascades appear to play major roles in PCOS manifestation (Fig. 6), whereas in obese PCOS (Supplementary Fig. 8), the longevity-regulating pathway, pathways in cancer, FoxO signalling and GPCR downstream signalling appear to play crucial roles (Fig. 7).
Fig. 6. Pathway enrichment analysis in lean PCOS. (a,c,e,g,i,k) Bar charts of the top 10 terms from the pathway database gene set libraries, based on a combined score of p-value and q-value; (b,d,f,h,j,l) Volcano plots of terms from the pathway database gene set libraries; each point represents a single term, plotted by its odds ratio (x-axis) and -log10(p-value) (y-axis) from the enrichment results of the input query gene set; the larger and darker the point, the more significantly the input gene set is enriched for the term.
Fig. 7. Pathway enrichment analysis in obese PCOS.
(a,c,e,g,i,k) Bar charts of the top 10 terms from the pathway database gene set libraries, based on a combined score of p-value and q-value; (b,d,f,h,j,l) Volcano plots of terms from the pathway database gene set libraries; each point represents a single term, plotted by its odds ratio (x-axis) and -log10(p-value) (y-axis) from the enrichment results of the input query gene set; the larger and darker the point, the more significantly the input gene set is enriched for the term.
The PPI networks for both subjects are shown in Fig. 8. Genes with higher node degree and betweenness scores were considered 'hub' genes. The hub genes were EGFR, NFKB1, AR, NCOR1, ERBB2, PIK3R1, INSR, VEGFA, RUNX2 and AKT3 in the PCOS subject with the lean phenotype, and AKT1, YAP1, PTEN, PIK3CA, BRCA1, TLR2 and INSR in the PCOS subject with the obese phenotype.
Fig. 8. Protein–protein interaction network. (a) Lean PCOS, (b) Obese PCOS.
Variant prioritization
To shortlist the most potent clinically relevant variants, three filters were applied to the clinical exome sequencing data: only germline variants with a 'High' VEP impact and an OMIM ID were selected. This yielded 4 unique variants in the lean-PCOS subject, 2 unique variants in the obese-PCOS subject and 5 variants common to both (Table 2). Among them, VDR (rs2228570), CYP21A2 (rs7755898), KISS1 (rs71745629) and XDH (rs773456900) lie in genes previously reported as PCOS-associated.
Table 2. Summary of the important candidate variants from clinical exome sequencing of the PCOS subjects.
Subject | Gene (rsID) | Variation class | Inheritance | Clinical significance | Zygosity | Associated phenotypes | 1000G AF | gnomAD SAS AF
Both | TMEM216 (rs10897158) | Intronic splice acceptor | Autosomal recessive | Benign | Homozygous | Joubert syndrome, Meckel syndrome | 0.719649 | 0.819658
Both | VDR* (rs2228570) | Start-loss | Autosomal recessive | Likely pathogenic | Heterozygous | Vitamin D deficiency with or without alopecia | 0.671526 | 0.738688
Both | CTU2 (rs11278302) | Intronic splice donor | Autosomal recessive | Benign | Homozygous | Microcephaly, facial dysmorphism, renal agenesis, and ambiguous genitalia syndrome | 0.773762 | 0.844626
Both | SON (rs34373121; rs34377180) | Frameshift insertion; frameshift deletion | Autosomal dominant | Benign | Homozygous | ZTTK syndrome | 1 | 0.999793
L-PCOS | CHIT1 (rs150192398) | Nonsense | Autosomal recessive | Benign | Heterozygous | Chitotriosidase deficiency | 0.289137 | 0.427436
L-PCOS | FUT6 (rs145035679) | Nonsense | NA | Uncertain | Heterozygous | Fucosyltransferase-6 deficiency | 0.0289537 | 0.0532974
L-PCOS | CYP21A2* (rs7755898) | Nonsense | Autosomal recessive | Pathogenic | Heterozygous | Hyperandrogenism, congenital adrenal hyperplasia due to 21-hydroxylase deficiency | Not found | 0.0128205
L-PCOS | FAM20C (rs774848096) | Nonsense | Autosomal recessive | Benign | Heterozygous | Raine syndrome | Not found | 0.317136
Ob-PCOS | KISS1* (rs71745629) | Stop-loss | Autosomal recessive | Benign | Heterozygous | Congenital idiopathic hypogonadotropic hypogonadism | 0.221845 | 0.194549
Ob-PCOS | XDH* (rs773456900) | Frameshift insertion | Autosomal recessive | Pathogenic | Heterozygous | Xanthinuria | Not found | 0.0008285
AF, allele frequency; NA, not available. * PCOS-specific genes according to the PCOSKb database.
For mitochondrial DNA, 7 unique candidate variants were found in the lean-PCOS subject, 3 unique variants in the obese-PCOS subject and 5 variants common to both (Table 3).
Table 3. Summary of shortlisted variants in the mitochondrial DNA of the PCOS women.
Subject | Gene (rsID) | Variation class | Variation impact | Gene type | Zygosity | Allele frequency | Ontology
Both | CYB (rs527236041) | Missense (Thr7Ile) | Moderate | Protein | Homozygous | NA | Oxidoreductase and ubiquinol-cytochrome-c reductase activity
Both | CYB (rs2853508) | Missense (Thr194Ala) | Moderate | Protein | Homozygous | NA | Oxidoreductase and ubiquinol-cytochrome-c reductase activity
Both | RNR1 (rs2001030) | Exonic non-coding | Modifier | rRNA | Homozygous | NA | NA
Both | RNR1 (rs2853518) | Exonic non-coding | Modifier | rRNA | Homozygous | NA | NA
Both | ATP6 (rs2001031) | Missense (Thr112Ala) | Moderate | Protein | Homozygous | NA | ATP hydrolysis activity and proton transmembrane transporter activity
L-PCOS | ATP6 (rs879150284) | Missense (Thr45Ala) | Moderate | Protein | Homozygous | NA | ATP hydrolysis activity and proton transmembrane transporter activity
L-PCOS | ATP6 (novel) | Missense (Thr13Ala) | Moderate | Protein | Homozygous | NA | ATP hydrolysis activity and proton transmembrane transporter activity
L-PCOS | TL2 (rs2853498) | Exonic non-coding | Modifier | tRNA | Homozygous | NA | NA
L-PCOS | ND1 (rs200180511) | Missense (Thr263Ala) | Moderate | Protein | Homozygous | NA | NADH dehydrogenase (ubiquinone) activity
L-PCOS | ND2 (rs878939965) | Missense (Trp239Cys) | Moderate | Protein | Homozygous | NA | NADH dehydrogenase (ubiquinone) activity
L-PCOS | ND5 (rs28359178) | Missense (Ala458Thr) | Moderate | Protein | Homozygous | NA | NADH dehydrogenase (ubiquinone) activity
L-PCOS | RNR2 (novel) | Exonic non-coding | Modifier | rRNA | Homozygous | NA | NA
O-PCOS | RNR2 (rs200040509) | Exonic non-coding | Modifier | rRNA | Heterozygous | NA | NA
O-PCOS | TC (rs9659239) | Exonic non-coding | Modifier | tRNA | Homozygous | NA | NA
O-PCOS | CO1 (novel) | Missense (Ile416Thr) | Moderate | Protein | Homozygous | NA | Fe-ion binding and electron transfer activity
Comment
Principal findings
PCOS has become a global concern because of its significant role in female infertility as well as ovarian and/or endometrial cancer. It is one of the most complex transgenerationally inherited endocrine-related metabolic disorders and is difficult to detect early in life because of its intricate phenotypic nature. To date, many studies have revealed genetic associations and familial aggregation of PCOS, but the genetic architecture of the disease remains unclear. The present study highlights important genetic variants in PCOS in the Bengali population of West Bengal, India, and attempts to understand the molecular crosstalk of PCOS stratified by BMI.
The results may indicate genetic differences between lean and obese PCOS, which in turn could aid the development of precision medicine.
Results in the context of what is known
Numerous reports have documented an increasing trend in PCOS prevalence worldwide. In 2016, Bozdag et al. reported that PCOS prevalence according to the NIH, Rotterdam and AE-PCOS Society criteria was 6%, 10% and 10%, respectively^27, whereas a 2024 report gave corresponding figures of 5.5%, 11% and 7.1%^28. Overall, the global prevalence of hirsutism, hyperandrogenism, PCOM and oligo-anovulation was 13%, 11%, 28% and 15%, respectively^27. Studies have also shown that the proportions of the PCOS-A, PCOS-B, PCOS-C and PCOS-D phenotypes were 44.8%, 14.9%, 16.2% and 19.5%, respectively, in the USA and Europe^29, compared with 23.9%, 46.3%, 21.6% and 8.2%, respectively, in Middle Eastern countries^30, while PCOS-D was more prevalent in East Asian countries^31. In India, recent studies have shown that the prevalence of phenotype A has decreased from 67.7% to 22.72%, whereas that of phenotype D has increased sharply from 3.6% to 49.53%^32,33. However, Dadachanji et al. reported that phenotype D was the most prevalent (63.1%), followed by phenotype A (30%), among lean subjects, whereas the prevalences of phenotypes A (46.8%) and D (42.2%) were relatively close in the obese group^34. In general, PCOS-A is characterized by insulin resistance and excess LH, testosterone (T), 17αOHPG and ovarian estradiol, involving core ovarian pathways, whereas PCOS-D is characterized by excess progesterone, involving adrenal regulatory pathways^35. In the present study, Proband-1 exhibited an LH:FSH ratio ≤ 1 and an elevated AMH level, suggesting possible functional hypothalamic amenorrhea (FHA), which causes ovulatory dysfunction and ovarian cyst formation^36.
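The phenotype labels and the LH:FSH ratio discussed above can be reproduced from the values reported for the two probands. The Python sketch below is illustrative only and is not the authors' workflow: the helper names are hypothetical, the A–D mapping is the phenotype scheme commonly used with the Rotterdam criteria, ovulatory dysfunction is assumed present in both probands (as implied by their reported Phenotype-D and Phenotype-A assignments), and the Proband-2 LH and FSH values assume the hormone order reported for Proband-1.

```python
from typing import Optional


def rotterdam_phenotype(ha: bool, od: bool, pcom: bool) -> Optional[str]:
    """Hypothetical helper: map the three Rotterdam criteria to phenotypes A-D.

    ha   = clinical or biochemical hyperandrogenism
    od   = ovulatory dysfunction (oligo-/anovulation)
    pcom = polycystic ovarian morphology on ultrasound
    Returns None when fewer than 2 of the 3 criteria are met.
    """
    if ha and od and pcom:
        return "A"  # full-blown PCOS
    if ha and od:
        return "B"
    if ha and pcom:
        return "C"
    if od and pcom:
        return "D"  # non-hyperandrogenic PCOS
    return None  # Rotterdam requires at least 2 of 3 criteria


def lh_fsh_ratio(lh: float, fsh: float) -> float:
    """Hypothetical helper: LH and FSH are both in mIU/mL, so the ratio is unitless."""
    return lh / fsh


# Reported values (mIU/mL). ASSUMPTIONS: od=True for both probands is inferred
# from their phenotype labels; Proband-2 hormones follow Proband-1's order.
probands = {
    "Proband-1 (lean)": {"ha": False, "od": True, "pcom": True, "lh": 1.9, "fsh": 2.7},
    "Proband-2 (obese)": {"ha": True, "od": True, "pcom": True, "lh": 10.46, "fsh": 4.74},
}

for name, p in probands.items():
    phenotype = rotterdam_phenotype(p["ha"], p["od"], p["pcom"])
    ratio = lh_fsh_ratio(p["lh"], p["fsh"])
    print(f"{name}: phenotype {phenotype}, LH:FSH = {ratio:.2f}")
```

Running it prints phenotype D with an LH:FSH ratio of 0.70 for Proband-1 and phenotype A with a ratio of 2.21 for Proband-2, consistent with the ratio ≤ 1 noted for Proband-1.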
In contrast, in Proband-2, the high fasting glucose level (suggestive of insulin resistance) might disrupt ovarian steroidogenesis and thereby cause PCOS by initiating an HPO feedback loop^35. In previous studies, the VDR polymorphism rs2228570 (also known as FokI) was associated with systemic lupus erythematosus^37, chronic spontaneous urticaria^38, T2DM^39, Parkinson's disease^40, autoimmune thyroiditis^41, vitamin D deficiency^42, and many other disorders^43–45. In PCOS, rs2228570 showed a significant association with phenotype C, i.e., hyperandrogenemia and PCOM^46, and the homozygous mutant genotype was significantly associated with traits such as hirsutism, infertility, alopecia and acne^47. Lurie et al.^45 reported that each copy of the mutant allele was associated with a modest 9% increase in the risk of invasive ovarian carcinoma. In our study, the heterozygous FokI genotype was found in both PCOS subjects and may therefore be related to their phenotypic features. The CYP21A2 polymorphism rs7755898 has been associated with congenital adrenal hyperplasia due to 21-hydroxylase deficiency^48. To our knowledge, this rare variant (~1% of the population) is reported here in a lean PCOS individual for the first time. The KISS1 gene and its polymorphisms have not yet been reported to be directly associated with PCOS. However, follicular development is controlled by both estrogen production and feedback, with low estrogen levels fine-tuning GnRH/LH pulses via negative feedback. Estrogen production rises with follicular development, resulting in a GnRH/LH surge and ovulation via positive feedback. Kisspeptin neurons, which are connected to GnRH neurons, play an important role in the preovulatory LH surge and ovulation. Kisspeptin's positive feedback and raised LH levels play important roles in PCOS pathogenesis^49.
In this study, KISS1 (rs71745629) was found as a unique variant in the obese PCOS subject. Notably, this stop-loss variant was previously associated with metastatic cancer via VEGF-dependent angiogenesis signalling^50. In addition, a very rare XDH variant (rs773456900), not previously reported to be associated with any disease, was identified in our study. Apart from these PCOS-associated genes, TMEM216 (rs10897158), CTU2 (rs11278302) and SON (rs34373121 and rs34377180) variants were found in both PCOS subjects, whereas CHIT1 (rs150192398), FUT6 (rs145035679) and FAM20C (rs774848096) variants were found in the lean PCOS subject. The family pedigrees showed that T2DM was the most predominant metabolic disorder in both probands' families, suggesting a genetic architecture shared with T2DM. In addition, hypertension was frequent in the lean-PCOS family, indicating cardiometabolic risk in that subject. PCOS was not detected in any family member of the lean-PCOS proband, although hypomenorrhea was noted on the paternal side, whereas it was present in the mother of the obese-PCOS proband; this is consistent with a non-Mendelian inheritance pattern of PCOS. Dhar et al.^51 predicted mutations in 11 loci, i.e., LepR, CAPN10, INSR, IRS1, CYP1A1, CYP11A1, CYP17A1, CYP19A1, ESR1, FSHR and HSD11B1, which are functionally co-expressed with 5 loci (Lep, INS, FDX1, PTPN1 and PTPN11). In another study, mutations in 6 loci, i.e., ERBB4, GATA4, INSR, LHCGR, SUOX and YAP1, were predicted to have a potential effect on the manifestation of PCOS^52. In this study, mutations in LepR, CAPN10, INSR, GATA4, SUOX, CYP1A1, CYP19A1 and ESR1 were observed in both subjects, while mutations in Lep and ERBB4 were found in lean PCOS and mutations in YAP1 in obese PCOS.
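The comparison in the preceding paragraph is essentially set intersection and difference. As an illustration only (not the authors' pipeline), the Python sketch below encodes the gene lists exactly as quoted in that paragraph, keeping the text's symbols such as LepR and Lep, and derives the loci shared by both probands, the subject-specific loci, and the previously predicted loci that were observed. The same intersection/difference logic is one straightforward way to split variant calls into common and unique sets; it says nothing about genes not named in the text.

```python
# Loci predicted by Dhar et al. (11 loci) and their 5 co-expressed loci,
# symbols as written in the text.
dhar_loci = {"LepR", "CAPN10", "INSR", "IRS1", "CYP1A1", "CYP11A1",
             "CYP17A1", "CYP19A1", "ESR1", "FSHR", "HSD11B1"}
dhar_coexpressed = {"Lep", "INS", "FDX1", "PTPN1", "PTPN11"}
# Loci predicted in the second cited study (6 loci)
other_loci = {"ERBB4", "GATA4", "INSR", "LHCGR", "SUOX", "YAP1"}
predicted = dhar_loci | dhar_coexpressed | other_loci  # union: INSR counted once

# Loci reported as mutated in each proband in this study
lean_pcos = {"LepR", "CAPN10", "INSR", "GATA4", "SUOX",
             "CYP1A1", "CYP19A1", "ESR1", "Lep", "ERBB4"}
obese_pcos = {"LepR", "CAPN10", "INSR", "GATA4", "SUOX",
              "CYP1A1", "CYP19A1", "ESR1", "YAP1"}

common = lean_pcos & obese_pcos      # intersection: found in both probands
lean_only = lean_pcos - obese_pcos   # difference: unique to lean PCOS
obese_only = obese_pcos - lean_pcos  # difference: unique to obese PCOS
replicated = predicted & (lean_pcos | obese_pcos)  # predicted and observed

print("common:", sorted(common))
print("lean only:", sorted(lean_only))
print("obese only:", sorted(obese_only))
print("predicted loci observed in either proband:", len(replicated))
```

With these lists, `common` holds the eight loci reported in both subjects, `lean_only` is {Lep, ERBB4}, `obese_only` is {YAP1}, and all 11 observed loci fall within the 21 distinct previously predicted loci.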
Taken together, these results suggest that impaired leptin signalling and insulin resistance play a major role in PCOS in the study population. In addition, mutations in the CYP1A1, CYP19A1 and ESR1 genes disrupt ovarian steroidogenesis and initiate a negative feedback loop, leading to estrogen sensitivity and aromatase deficiency^53,54. Mutations were also observed in AR (affecting androgen signalling), AMH (affecting AMH secretion), AdipoR1 (affecting adiponectin signalling), NAMPT and NPY (affecting energy homeostasis), and PTEN, EGFR and Akt (cancer signalling).
Clinical implications
The comorbidity analysis based on data from the present study is shown in Fig. 9. Based on the genetic networks of the study participants, the risk of estrogen resistance, leptin receptor deficiency, MTHFR/folate deficiency, T2DM and acanthosis nigricans co-occurring with PCOS was much higher, while shared-gene-based analysis showed a very high risk of T2DM co-occurring with PCOS. Overall, this study may provide a comprehensive overview of PCOS among young women from West Bengal, India, with obesity as the major phenotypic expression. Hence, folate supplementation and insulin sensitizers may be effective therapeutic options targeting the associated genetic risk factors.
Fig. 9. Comorbidity analysis of PCOS individuals based on (a) network-based separation of shared genes, (b) shared genes.
Research implications
Previously, a genome-wide association study of women from Asian and European populations revealed that DENND1A, THADA, FSHR, LHCGR, AMH, AMHR2, ADIPOQ, FTO, HNF1A, CYP19, YAP1, HMGA2, RAB5B, SUOX, INSR and TOX3 were common genes associated with PCOS^55. Table 4 provides a comprehensive summary of genomic studies on PCOS subjects. Against this background, our study is, to our knowledge, the first attempt to identify and differentiate the genetic profiles of PCOS between obese and lean subjects.
Our findings support future studies with a sample size large enough for adequate statistical power to validate the high-throughput data. The genetic variants identified in this study must also be validated by Sanger sequencing in a larger population to establish them as early indicators of PCOS. Nonetheless, the protein–protein interaction network and comorbidity analyses, taken together, reveal the primary signalling pathways and their downstream consequences, suggesting the potential for risk detection at the pre-pubertal stage and prevention of long-term complications.
Table 4. Comprehensive summary of identified genetic variations among different study populations.
Study population | Study design | Methodology | Sample size | Observation | References