Abstract

Limitations in cognitive functioning and adaptive behavior are hallmarks of intellectual disability (ID), a neurodevelopmental disorder. Specific genetic disorders that result in ID can also involve immune system anomalies, such as changes in T cell (CD4^+ and CD8^+) activity. This work aimed to compare single-cell RNA-sequencing (scRNA-seq) and bulk transcriptome data to find T cell-linked biomarkers that could potentially be used for the diagnosis and assessment of ID. After integrating genes and performing a comparative analysis, 196 genes were identified as shared differentially expressed genes (DEGs). Enrichment analysis with the DAVID online platform and FunRich software highlighted signal transduction and translation, immune response, the MHC (Major Histocompatibility Complex) class II protein complex, antigen processing and presentation, allograft rejection, and type I diabetes mellitus as important terms and pathways. Six ribosomal proteins (RPS27A, RPS21, RPS18, RPS7, RPS5, and RPL9) were identified as hub genes of ID from the protein-protein interaction (PPI) network, and only one of them, RPS27A, was selected by all eleven topological algorithms. Through analysis of the regulatory network, we identified several crucial transcription factors (TFs), including FOXC1, FOXL1, and GATA2, and microRNAs such as mir-92a-3p and mir-16-5p. This study used scRNA-seq and transcriptomics data analysis to define biomarkers associated with T cell types in ID. Ongoing research on the activity of ID genes is contributing to a greater understanding of the pathophysiology of ID.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-85162-4.
Keywords: Intellectual disability, Single-cell RNA-sequencing, T cell, Biomarkers, Hub genes, RPS27A, FOXL1

Subject terms: Computational biology and bioinformatics, Drug discovery, Systems biology, Biomarkers, Molecular medicine

Introduction

Intellectual disability (ID) is defined by major limitations in intellectual and adaptive skills that begin during development^1. It is distinguished primarily by deficits in cognitive and adaptive functioning; its prevalence is estimated to range from 1 to 3% of the population, with geographical variation^2. The prevalence is approximately 1.5% in Western countries and may rise to 4% in socioeconomically disadvantaged regions of the world^3. The average occurrence of ID in Western nations is estimated at between 1.5% and 2%, with a further 0.3–0.5% experiencing severe impairment, characterized by IQ levels below 50^4. Synapse development and synaptic transmission are among the most frequently discussed biological functions, but variations in genes connected to ID affect a variety of biological processes^5. These include centrosome function, protein modification, chromatin remodeling, transcriptional and translational regulation, and the development of neural and supporting nervous system cells^6,7. Severe forms of ID are believed to result from specific genetic causes, chromosomal anomalies, or single-gene defects. Considerable advances have been made in recent years in understanding the genetic elements that produce severe ID. Approximately 15% of cases are caused by chromosomal abnormalities that are cytogenetically detectable^4. No causal relationship has been established between T cells and intellectual disability.
However, people who have specific genetic disorders that result in intellectual disability can also have immune system anomalies, such as changes in T cell activity^8. An individual's vulnerability to infections and ability to combat them may be influenced by T cell function. Patients with autism, for example, show altered expression of activation markers on CD8^+ T cells and markedly altered adaptive cellular immune function, which may reflect behavioral abnormalities, developmental disorders, and defective immunological activation^9. Mild types of ID are thought to lie at the low end of the normal range for IQ, as the consequence of numerous genetic as well as nongenetic elements. As the list of genes related to ID has expanded, numerous scientists have searched for patterns in the functions that these genes encode. With hundreds of genes currently recognized as contributing to the phenotype, there is a great deal of genetic variability associated with ID^10. Several academic articles have reported biomarkers associated with ID. Notable X-linked genes include MECP2, which was first associated with Rett syndrome and is now linked to several male- and female-specific ID symptoms^11. There have since been two reports of mutations in the DYNC1H1 gene in other ID patients^12. The pathophysiology of various neuropsychiatric traits is also influenced by genes related to ID. Even with the latest advances in gene discovery, only a small percentage of ID cases can currently be explained. Significant clinical and genetic heterogeneity has made large-scale genetic research difficult. Novel biomarker discovery through computational analysis is therefore urgently needed for a better understanding of the molecular mechanisms of ID. RNA sequencing and next-generation sequencing (NGS) have contributed to recent rapid progress in identifying differentially expressed genes in disorders such as ID.
Some recent studies^13,14 have used single-cell RNA sequencing (scRNA-seq) to investigate disease pathogenesis and prognosis, and the technique is commonly used to identify new biomarkers. We used scRNA-seq in this study to investigate transcriptional patterns at the level of gene expression^15. Our objective was to identify key genes (KGs), key microRNAs (miRNAs), and key transcription factors (TFs) that might be used for the personalized diagnosis and prognosis of ID. Bioinformatics analysis to identify significant genes, miRNAs, TFs, and associated signaling pathways in the context of ID therefore holds great potential for advancing future research in this field.

Results

Processing of collected datasets

After reading the raw dataset, we created a Seurat object for further analysis. During data import, initial quality control removed low-quality cells with fewer than 200 expressed features (genes) and lowly expressed features detected in fewer than three cells. We removed zero-count instances and selected 10,000 read counts from the complete gene expression matrix to illustrate gene expression patterns prior to and during normalization. Seurat identified approximately 2,000 highly variable features in our dataset. We identified 28 clusters employing the KNN algorithm. After cluster analysis, the 28 scRNA-seq clusters were annotated using reference-based annotation and represented in a UMAP plot (Fig. 1). The annotation used five sets of cell markers, for T cells, B cells, natural killer cells, monocytes, and dendritic cells, and each marker set yielded multiple subsets. Supplementary Table S1 displays the clustering information for the 28 detected and annotated clusters. We merged only the seven T cell-related clusters (0, 1, 4, 6, 8, 9, and 10), which include Th1 cells (CD4^+ T cells), non-Vd2 gd T cells, and MAIT cells.

Fig. 1.
UMAP plot of the 28 clusters that were found. The colored regions show a significant number of related DEGs, with different colors representing separate clusters. Generated using ggplot2 (v 3.5.0) in R (v 4.3.1).

Statistical analysis and identification of shared DEGs between scRNA-seq and bulk RNA-seq datasets

By performing several statistical operations, we found 1318, 1417, 1223, 1376, 1055, 1420, and 1246 DEGs in clusters 0, 1, 4, 6, 8, 9, and 10, respectively. After merging all DEGs identified in the seven T cell clusters and removing duplicates, we obtained 3510 unique DEGs in ID patients. Separately, by analyzing the bulk RNA-seq dataset (GSE46831) through the GREIN database, we identified 3459 significant DEGs of ID at cut-offs of P < 0.05 and |log2FC| > 1. Cross-matching the merged T cell DEGs from scRNA-seq against the DEGs from the RNA-seq dataset (GSE46831) yielded a total of 196 shared DEGs for further analysis (Fig. 2). Among the 196 shared DEGs, 102 genes were up-regulated and 89 genes were down-regulated, as shown in Table S2. Five DEGs (HLA-DQA1, HLA-DRA, HLA-DRB1, CSNK2B, and PPP1R18) were found in both the up- and down-regulated sets.

Fig. 2. The Venn diagram represents the 196 common DEGs between the T-cell clusters of scRNA-seq and the bulk RNA-seq (GSE46831) datasets; generated using Venny (v2.0.1).

Functional pathways enrichment analysis

For the gene ontology (GO) analysis, the 196 shared DEGs were used in the FunRich software. In the biological process (BP) category, signal transduction and translation were each enriched with 15.7% of genes, cytoplasmic translation with 15.2%, and immune response with 9%. In the cellular component (CC) category, 27.7% of genes were enriched in extracellular vesicular exosome, 13.6% in the cytosolic ribosome, 12% in focal adhesion, and 3.8% in the MHC class II protein complex.
In the molecular function (MF) category, 84.4% of genes were enriched in protein binding, 20.6% in RNA binding, and 15.6% in structural constituent of the ribosome (Fig. 3).

Fig. 3. Bar diagram, constructed with FunRich (v 3.4.1), illustrating the significant gene ontology (GO) terms of ID. Based on the p-value (< 0.05), the top 10 terms are shown for (A) biological process (BP), (B) cellular component (CC), and (C) molecular function (MF).

We used the KEGG, Reactome, WikiPathways, and BioCarta databases for metabolic pathway enrichment analysis. The most important pathways were selected based on the p-value (cut-off < 0.05) and the enrichment score. In the KEGG analysis, the main enriched pathways were allograft rejection (21%), graft-versus-host disease (20%), and type I diabetes mellitus (18%) (Fig. 4). In the Reactome database, the most enriched pathways for the shared T cell DEGs included viral mRNA translation (24.32%), eukaryotic translation elongation (24.32%), and peptide chain elongation (24.15%). In WikiPathways, cytoplasmic ribosomal proteins (18.77%), allograft rejection (7.47%), Ebola virus infection in host cells, and the B-cell receptor signaling pathway were among the most enriched pathways. In BioCarta, the most enriched pathways were antigen processing and presentation (23.05%) and the BCR signaling pathway (5.97%) (Fig. 4).

Fig. 4. The bubble graph represents the significant metabolic pathways of ID. Based on the p-value (< 0.05), the top 10 pathways are shown for the KEGG, Reactome, WikiPathways, and BioCarta databases, respectively. The databases were accessed through DAVID (v 6.0) and the results were visualized using the SRplot web server.
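The percentages and p-values reported for GO terms and pathways come from over-representation analysis. The sketch below illustrates the general idea with a one-sided hypergeometric test. It is not the DAVID or FunRich implementation, the function names are hypothetical, and every gene count in the usage lines is an invented placeholder rather than a value from this study.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_query, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_background: genes in the background set (assumed universe)
    n_in_term:    background genes annotated to the term or pathway
    n_query:      genes in the query list (e.g. a shared DEG list)
    n_overlap:    query genes annotated to the term
    """
    upper = min(n_in_term, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def percent_of_query(n_overlap, n_query):
    """Share of the query list annotated to a term, in percent."""
    return 100.0 * n_overlap / n_query


# Hypothetical counts, for illustration only.
p_value = hypergeom_enrichment_p(
    n_background=20000, n_in_term=150, n_query=196, n_overlap=30
)
share = percent_of_query(n_overlap=30, n_query=196)
print(f"p = {p_value:.3g}, share of query = {share:.1f}%")
```

A term would then be retained when its p-value falls below the chosen cut-off (0.05 in this study); multiple-testing correction is omitted from the sketch for brevity.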
Identification of the hub protein from protein-protein interaction (PPI) network

In STRING, the 196 shared DEGs of T cells were used to build the PPI network. The network has 191 nodes and 376 edges, with an average node degree of 3.94 and an enrichment p-value < 1.0e-16. We analyzed the physical sub-network at a confidence score of 700 (high confidence), shown in Fig. 5. Based on their scores, the top 15 genes were selected from each of eleven topological methods (betweenness, stress, bottleneck, eccentricity, radiality, EPC, MNC, closeness, degree, DMNC, and MCC) in the cytoHubba plugin of Cytoscape. Using these genes, we created an UpSet plot to represent the most significant hub genes (Fig. 6). We retained the genes that were selected by seven or more methods. This yielded six key hub genes (KGs) for T cells: RPS27A, RPS18, RPS5, RPS7, RPS21, and RPL9. Among these, only RPS27A was selected by all 11 algorithms of the cytoHubba plugin. Table 1 summarizes the biological function of the six potential hub genes in the human body.

Fig. 5. Visualization of the PPI network of 196 DEGs. The sky-blue color indicates the hub genes, the network nodes symbolize target proteins, and the edges denote the relationships between proteins. STRING (v11.0) was used to construct the network and Cytoscape (v3.10.1) to visualize it.

Fig. 6. UpSet plot displaying hub genes identified across eleven cytoHubba topological algorithms, generated using the SRplot web server. RPS27A is present across all methods. The X-axis denotes the eleven cytoHubba algorithms, while the Y-axis represents interaction size.

Table 1. Evaluation of the six key hub genes through a variety of regulatory processes.
Gene symbol | Gene | Function description | UniProt ID
RPS27A | Ribosomal protein S27a | This gene encodes a fusion protein consisting of ubiquitin at the amino (N) terminus and ribosomal protein S27a at the carboxyl (C) terminus. | P62979
RPS18 | Ribosomal protein S18 | Multiple processed pseudogenes of this gene are dispersed throughout the genome, which is a common characteristic of genes that encode ribosomal proteins. | P62269
RPS21 | Ribosomal protein S21 | This protein is a member of the S21E ribosomal protein group and is found in the cytoplasm. | P63220
RPS7 | Ribosomal protein S7 | This protein is a member of the S7E family of ribosomal proteins. | P62081
RPS5 | Ribosomal protein S5 | A small 40S subunit and a large 60S subunit make up ribosomes, which are organelles that carry out protein synthesis. | P46782
RPL9 | Ribosomal protein L9 | This gene encodes a ribosomal protein that is a constituent of the 60S subunit. | P32969

Chord plot showing significant paths and key hub genes interaction

In a chord plot, elements are arranged radially and the arcs (chords) that connect them indicate how the elements interact. Data groupings are distinguished from each other using different arc colors. The GO terms (BP, CC, and MF) most enriched with hub genes, including cytoplasmic translation, cytosolic ribosome, structural constituent of the ribosome, translation, ribosome, and small ribosomal subunit, have strong connections with the main targets of ID, as depicted in Fig. 7. Furthermore, the top nine molecular pathways, including ribosome, Coronavirus disease - COVID-19, eukaryotic translation elongation, the response of EIF2AK4 (GCN2) to amino acid deficiency, viral mRNA translation, peptide chain elongation, and cytoplasmic ribosomal proteins, were related to the core ID targets, as shown in Fig. 8.

Fig. 7.
Mapping of the gene ontology terms and their association with the key hub genes, visualized through the chord plot module on the SRplot web server. Based on the log2FC value of key hub genes, different colors indicate the different pathways.

Fig. 8. Mapping of the most enriched molecular pathways and their association with the hub genes, visualized through the chord plot module on the SRplot web server. Based on the log2FC value of key hub genes, different colors indicate the different pathways.

TFs-hub gene and hub genes-miRNAs interaction networks

Transcriptional and post-transcriptional regulatory networks were identified using network-based techniques to analyze the TF and miRNA linkage networks of the key hub genes. TFs and miRNAs are displayed as squares, and hub genes are displayed as circles, as shown in Fig. 9. FOXC1, FOXL1, GATA2, TFAP2C, NR3C1, HINFP, and SREBF1 were the most significant TF regulators identified from the analysis of the JASPAR database. We identified seven miRNAs (hsa-mir-186-5p, hsa-mir-193b-3p, hsa-mir-93-5p, hsa-mir-16-5p, hsa-mir-92a-3p, hsa-mir-5011-5p, and hsa-mir-1277-5p) from the miRTarBase database (Fig. 9). The biological functions of the reported biomolecules are given in Table 2.

Fig. 9. Regulatory and therapeutic interaction network of the key hub genes with transcription factors, microRNAs, and chemicals. In the figure, ovals indicate key hub genes, angled shapes indicate TFs, deep-yellow rectangles indicate miRNAs, and green rectangles indicate chemicals. The interaction network was constructed with NetworkAnalyst (v 3.0) and visualized in Cytoscape (v3.10.1).

Table 2. Potential transcriptional and post-transcriptional regulatory biomolecules of ID.
Transcriptional regulatory biomolecules (TFs)
TFs | Description | Function
FOXC1 | Forkhead box C1 | The activity of DNA-binding transcription factors and the specificity of RNA polymerase II.
GATA2 | GATA binding protein 2 | The activity of transcription factors that bind to DNA and their association with chromatin.
FOXL1 | Forkhead box L1 | The transcription factor is essential for the appropriate proliferation and differentiation processes within the gastrointestinal epithelium.
TFAP2C | Transcription Factor AP-2 Gamma | A DNA-binding protein with sequence specificity that engages with inducible viral and cellular enhancer elements.
NR3C1 | Nuclear Receptor Subfamily 3 Group C Member 1 | Aids in the rapid degradation of messenger RNAs (mRNAs) by binding to their 5' untranslated regions (UTRs).
HINFP | Histone H4 Transcription Factor | The transcriptional repressor exhibits binding affinity to the consensus sequence 5'-CGGACGTT-3' as well as the RB1 promoter.
SREBF1 | Sterol Regulatory Element Binding Transcription Factor 1 | The precursor of the transcription factor form, known as processed sterol regulatory element binding protein 1 (SREBP-1), is localized within the membrane of the endoplasmic reticulum.

Post-transcription regulatory biomolecules (miRNAs)
miRNAs | Description | Function
mir-16-5p | MicroRNA 16 | Prevents replication of multiple viruses (EV71).
mir-193b-3p | MicroRNA 193b | Demonstrates reciprocal interaction with MYC and inhibits the growth and metastasis.
mir-93-5p | MicroRNA 93 | Contributes to the genesis of different diseases.
mir-92a-3p | MicroRNA 92a | Reduces PTEN and prevents Eca-109 cells from being phosphorylated and inhibited by Akt.
mir-186-5p | MicroRNA 186 | Inhibits tumorigenesis of glioblastoma multiforme both in vitro and in vivo.
Protein chemical interaction analysis of key hub genes

Protein-chemical interactions (PCI) are essential for understanding the functions of proteins that support molecular mechanisms within the cell; this knowledge could be of great assistance in drug discovery. In this investigation, the protein-chemical interaction networks of ID were identified. Thirteen potentially interrelated chemical compounds were identified; chloropicrin, sodium selenite, arsenic trioxide, estradiol, enzyme inhibitors, and cupric oxide were among the most highly enriched chemical agents (Fig. 9).

Performance evaluation by ROC curve analysis

The six hub genes and seven TFs were evaluated by ROC analysis. The area under the curve (AUC) in a ROC analysis serves as a performance benchmark. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, so values well above 0.5 indicate an acceptable classifier. In our test, the AUC of the hub genes in GSE7329 ranged from 0.751 (RPS5) to 0.891 (RPS27A); in the GSE25507 dataset, the range was 0.511 (RPS5) to 0.771 (RPS18), as shown in Fig. 10a and b, respectively. The TFs also showed appreciable AUC scores, although the lowest value in GSE7329 fell below 0.5. In GSE7329, the AUC values ranged from 0.436 to 0.818, with FOXC1 and FOXL1 sharing the highest value (Fig. 10c). In GSE25507, the range was 0.519 to 0.844, with GATA2 showing the highest value (Fig. 10d).

Fig. 10. ROC curves of potential biomarkers (hub genes and TFs). The ROC curves of the six hub genes are shown for two GEO profiles, (a) GSE7329 and (b) GSE25507. The ROC curves of the seven TFs are shown for the same two GEO profiles, (c) GSE7329 and (d) GSE25507. The ROC curves were constructed with the pROC package (v 1.18.5) in R (v 4.3.1).
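To make the AUC values above easier to interpret, the following minimal sketch computes an AUC directly from its rank-based definition: the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half. This is an illustration only, not the pROC procedure used in the study; the function name and the expression values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    scores: marker values, e.g. expression of one gene per sample
    labels: 1 for case, 0 for control
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for a single gene in six samples.
expression = [5.1, 4.8, 6.0, 3.2, 3.9, 4.8]
status = [1, 1, 1, 0, 0, 0]
auc = roc_auc(expression, status)
print(round(auc, 3))
```

Under this definition, an AUC below 0.5 means the marker tends to be lower in cases than in controls, so reversing the direction of the comparison would give 1 minus that value.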
Discussion

Using single-cell RNA sequencing (scRNA-seq), a cutting-edge data analysis technique, we uncovered previously unreported biomarkers associated with intellectual disability (ID). Our research has helped identify genes and pathways that are altered in individuals with ID, suggesting their potential relevance as diagnostic markers for this condition. Based on our raw dataset, the PBMCs clustered into 28 cellular subsets using the advanced tools of scRNA-seq^16. We observed consistent cell proportions for each cell type in the five experimental groups, namely natural killer cells, naïve B cells, T cells, monocytes, and dendritic cells. In our clustering analysis, we designated Th1 cells (CD4^+ T cells), non-Vd2 gd T cells, and MAIT cells as T cells. Subsequently, the DEGs from these T cell clusters were compared with the DEGs identified in the RNA-seq dataset (GSE46831), and we analyzed the differentially expressed genes shared by the clusters and the dataset. We then identified noteworthy Gene Ontology (GO) terms within the domains of biological process (BP), cellular component (CC), and molecular function (MF). The GO resource is specifically designed to support the computational representation of biological systems and provides relevant information about gene function^17. Based on our analysis, translation is among the prominent BP terms. This process is responsible for translating RNA molecules into proteins^18. The process of small subunit biogenesis holds significant importance in individuals with intellectual disability, as evidenced by previous research^19.
The B cell receptor signaling pathway has been suggested to have potential connections with gene mutations associated with intellectual disability^20. Among the CC terms, the most enriched terms are strongly associated with ID. MHC class II protein complex binding and MHC class II receptor activity have been found to correlate significantly with immune disorders. Insufficiency of this protein has been identified as a potential etiological factor that can contribute to ID^21. Furthermore, there is a substantial correlation between 5'-UTR mRNA binding and RNA binding, particularly in the context of fragile X syndrome, a specific type of ID^22. Certain genes associated with autism spectrum disorder (ASD) can cause an inappropriate immunological response, which is also related to ID^23. We also examined the KEGG, Reactome, WikiPathways, and BioCarta pathways. The most important KEGG pathways were related to allograft rejection, graft-versus-host disease, asthma, type I diabetes mellitus, autoimmune thyroid disease, and coronavirus disease. A recent study shows that people with ID are at increased risk for respiratory problems and asthma and require special care from a trained caregiver^24. Diabetes is prevalent among people with ID^23. There is a documented correlation between autoimmune thyroid disease and intellectual disability, particularly in the pediatric population^25. Individuals with intellectual disabilities are more likely to contract coronavirus disease 2019 (COVID-19) and to have negative consequences from it^26.
The most enriched Reactome pathways were SARS-CoV-1 modulation of the host translation machinery, CD22-mediated BCR regulation, ZAP-70 translocation to the immunological synapse, selenocysteine synthesis, and peptide chain elongation. Selenocysteine synthesis is the route by which selenium is incorporated into selenoproteins. SEPSECS mutations have been associated with selenium insufficiency in severe intellectual disabilities^27. Using WikiPathways, we found that the significant pathways were mainly the B cell receptor complex, allograft rejection, antigen processing and presentation, and the BCR signaling pathway. A large number of biological activities, including signal transduction and transcriptional control, are based on protein-protein interactions^28. The PI3K/AKT/mTOR-vitamin D3 signaling pathway exhibited notable enrichment within the BioCarta pathways. A comprehensive study has revealed the pathogenic involvement of mTOR signaling in various neurological disorders, including epilepsy, autism (ASD), intellectual disability (ID), dementia, traumatic brain injury, brain tumors, and hypoxic-ischemic injury^29. A PPI network facilitates the formation of protein complexes and regulates many cellular processes such as signaling, regulation, and transport^30. In our analysis, we identified six highly connected central hub genes, namely RPS27A, RPS21, RPS18, RPS7, RPS5, and RPL9, using eleven distinct computational methods available within the Cytoscape cytoHubba plugin. All of our hub genes belong to the ribosomal protein (RP) family. These RPs play a crucial role in various biological processes, including ribosome biogenesis, protein synthesis, cellular growth, development, and programmed cell death (apoptosis)^31.
Immunometabolism is the process by which immune cell activity and metabolic pathways are combined, and ribosomal proteins are involved in this process. During activation, T cells undergo metabolic reprogramming, changing from oxidative phosphorylation to glycolysis. This requires increased ribosome biogenesis and function^32. Certain ribosomal proteins interact directly with RNA molecules or affect the activities of transcription factors, regulating the expression of genes essential for T cell growth, differentiation, and function^33. Ribosome biogenesis plays a crucial role in regulating cell development and proliferation. Dysregulation of this process can lead to abnormal cell growth and pathological conditions such as cancer and metabolic diseases^32. Some metabolic conditions, especially those that affect the nervous system or brain, might cause behavioral problems, cognitive problems, or developmental delays that resemble autism spectrum disorders^34. A study of perisylvian polymicrogyria shows that RPS27A was highly enriched among genes extracted from ID patients^35. RPS27A dysfunction impacts ribosome assembly, protein synthesis, and the ubiquitin-proteasome system (UPS), resulting in cellular stress and immunological dysfunction. These effects may influence neurodevelopment, potentially leading to ID^36. In our study, RPS6, although not one of our central genes, was expressed among the T cell genes. Multiple studies have found a significant increase in RPS6 expression among individuals diagnosed with ID^37. Numerous investigations have substantiated the involvement of this protein in the translation of 5' terminal oligopyrimidine tract (TOP) mRNAs, alongside its role in regulating cell size and proliferation^38.
RPS7 mutants show activation of matrix metalloproteinase (MMP) family genes, suggesting improper cell migration that may lead to further dysfunctional development, and RPS5 knockdown leads to brain abnormalities in zebrafish^39. Further investigation may reveal that mutations in RPS5 and RPS7 could be biomarkers of ID. According to a recent study, significant changes in the mRNA levels of RPL9 in individuals with autism spectrum disorder (ASD) were found to be consistent in direction when validated by quantitative polymerase chain reaction (qPCR)^40. RPL10 has been reported in people with intellectual impairment, dysmorphism, and autism^41. The two additional hub genes, RPS18 and RPS21, exhibit limited significance in the identification process. RPS18 has been used as the reference (housekeeping) gene in certain studies on ASD^42. T cells are essential for the immune response and play an important role in neuroinflammation. Excessive T cell activation can cause prolonged neuroinflammatory states, disrupting normal brain development and function. Disruptions to neuronal growth, synapse formation, and cognitive processes can lead to ID^43. TFs play an important role in controlling transcription, which means that their levels may help identify people with diseases like ID. Our study suggests that the TFs that regulate DEGs through the TF-hub gene interaction network play a key role in the development of ID. The TFs identified in our study were FOXC1, FOXL1, GATA2, TFAP2C, NR3C1, HINFP, and SREBF1. Two separate case report studies have established a correlation between the presence of FOXC1 on ring chromosome 6 and intellectual disability (ID), short stature, and various facial deformities^44. According to whole-genome sequencing, a GATA2 mutation produces a rare syndromic congenital neutropenia with ID^45.
FOXL1 is a Forkhead box (FOX) protein whose dysregulation promotes the Wnt/β-catenin signaling pathway^45. Previous research suggests a potential correlation between TFAP2C and the development of intellectual disability^46. Nuclear receptor subfamily 3, group C, member 1 (NR3C1) has been identified by Ingenuity Pathway Analysis (IPA) as an upstream transcriptional regulator for ID^47. Previous research has not shown HINFP or SREBF1 to have significant relevance to intellectual disability (ID). However, these factors may be of value in future investigations. Using the miRTarBase v8.0 database, we investigated the interaction between miRNAs and hub genes. MicroRNAs control gene expression post-transcriptionally, in contrast to transcription factors, making them a potentially valuable tool for diagnostic testing and biomarker research^48. Based on topological analysis, the following miRNAs were identified as significant: hsa-mir-186-5p, hsa-mir-193b-3p, hsa-mir-93-5p, hsa-mir-16-5p, hsa-mir-92a-3p, hsa-mir-5011-5p, and hsa-mir-1277-5p. The microRNA mir-92a-3p shows a significant correlation with the structure and function. It has also been recognized as a biomarker in peripheral blood for schizophrenia^49. In addition, dysregulation of mir-92a-3p was found in the peripheral blood of patients in a study on gene regulation linked to autism in the Chinese population^50. Finally, miR-16-5p has previously been described in rat neurons as a negative regulator of dendritic complexity that mediates BDNF-induced dendritogenesis by regulating the translation of the BDNF mRNA itself, supporting the hypothesis that miR-16-5p plays a role in neuronal development^51. We identified chloropicrin, sodium selenite, arsenic trioxide, estradiol, enzyme inhibitors, and cupric oxide via protein-chemical interaction analysis.
Some studies have indicated that sodium selenite treatment improves cognitive performance in triple transgenic Alzheimer’s disease mice by reducing lipid peroxidation^[154]52. A retrospective cohort study investigated the association between soil arsenic concentration and intellectual disability (ID) in pregnant women. Its findings revealed that elevated soil arsenic concentration was significantly associated with increased odds of diagnosed ID only during the first trimester of pregnancy^[155]53. In the final phase, we evaluated the diagnostic performance of our biomarkers using ROC analysis. Our predicted biomarkers (hub genes and TFs) showed decent performance in this analysis. Our atlas offers a valuable framework for improving our understanding of this intricate immune-mediated disease. Despite several limitations, including the small number of datasets, the restriction to human-derived samples (cell line data were excluded), a total sample number of less than ten, and the lack of wet laboratory validation of the potential biomarkers, we provide a thorough overview of the variations. In summary, we believe that our approach is scientifically valid and provides meaningful insights into the role of T cell-associated biomarkers in ID. As a result, this work can serve as a resource to guide future studies aimed at identifying the most effective approach for managing or potentially curing the disease.

Materials and methods

A graphical representation of the combined systems biology and analytical approach used to identify biological markers and pathways in the blood tissues of ID patients is presented in Fig. [156]11.

Fig. 11. Graphical illustration of the workflow in our study, constructed with Adobe Illustrator CC (v 2019).
Parameters of single cell RNA-seq dataset analysis

Collection and pre-processing

The “Single Cell Portal” was used to access publicly available scRNA-seq data (PMID: 28212749), which were collected from patient peripheral blood (Single Cell Comparison: PBMC data - Single Cell Portal (broadinstitute.org))^[159]54. Data processing was performed with the Seurat package (version 4.3.0) in R (version 4.3.1)^[160]55. Expression data in this study are represented as log-transformed counts per 10,000. This representation is derived from unique molecular identifier (UMI) counts for all methods except Smart-seq2, where it is based on read counts^[161]56. Initially, we converted the raw dataset into a Seurat object.

Quality control, normalization, and feature selection

For quality control, genes were retained only if they were detected in at least three cells, and cells were retained only if they contained at least 200 features with nonzero counts^[162]55. By using a function that calculates the percentage of counts originating from a given set of features, 51 mitochondrial quality control (QC) measures can be obtained^[163]57. The filtered data from the previous step were then normalized^[164]58; to simplify the downstream analysis, a log transformation was applied to the filtered data. Normalization is necessary to account for and minimize cell-specific bias, which could affect downstream uses such as the detection of differential gene expression^[165]58. The purpose of normalization is to correct for measurement differences between samples and/or features (genes, for example) that are caused by unwanted biological effects (batch effects, for example) or technical artifacts rather than by the biological effects of interest^[166]59. Next, we identified the features that are highly variable from cell to cell, meaning that they are strongly expressed in some cells but not in the rest^[167]60.
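The filtering and normalization steps described above can be illustrated with a minimal sketch in plain Python. This is not the Seurat code used in the study; it only mirrors the stated thresholds (genes detected in at least three cells, cells with at least 200 detected features) and the log counts-per-10,000 representation. The function names (filter_counts, percent_mito, log_normalize), the dictionary layout of the count matrix, the order of the two filters, and the "MT-" prefix used to recognize mitochondrial genes are all assumptions made for illustration.

```python
import math


def filter_counts(counts, min_cells=3, min_features=200):
    """Filter a count matrix given as {gene: [count per cell]}.

    Cells are kept if they have at least `min_features` genes with
    nonzero counts; genes are then kept if they are detected in at
    least `min_cells` of the remaining cells (assumed order).
    Returns the filtered matrix and the indices of the kept cells.
    """
    if not counts:
        return {}, []
    n_cells = len(next(iter(counts.values())))
    kept_cells = [
        j for j in range(n_cells)
        if sum(1 for row in counts.values() if row[j] > 0) >= min_features
    ]
    filtered = {}
    for gene, row in counts.items():
        kept_row = [row[j] for j in kept_cells]
        if sum(1 for c in kept_row if c > 0) >= min_cells:
            filtered[gene] = kept_row
    return filtered, kept_cells


def percent_mito(counts, prefix="MT-"):
    """Percentage of each cell's counts that come from genes whose
    name starts with `prefix` (hypothetical mitochondrial naming)."""
    if not counts:
        return []
    n_cells = len(next(iter(counts.values())))
    result = []
    for j in range(n_cells):
        total = sum(row[j] for row in counts.values())
        mito = sum(
            row[j] for gene, row in counts.items() if gene.startswith(prefix)
        )
        result.append(100.0 * mito / total if total else 0.0)
    return result


def log_normalize(counts, scale=10_000):
    """Scale each cell to `scale` total counts, then apply log(1 + x)."""
    if not counts:
        return {}
    n_cells = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_cells)]
    return {
        gene: [
            math.log1p(c / totals[j] * scale) if totals[j] else 0.0
            for j, c in enumerate(row)
        ]
        for gene, row in counts.items()
    }
```

In Seurat, comparable steps are commonly performed with CreateSeuratObject (arguments min.cells and min.features), PercentageFeatureSet, and NormalizeData; the text does not state which exact calls the authors used, so the correspondence is only indicative.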
Cell clustering, annotation, and visualization

Clustering is a useful technique for identifying groups of cells with similar gene expression patterns. Initially, a graph-based clustering algorithm based on K-nearest neighbors (KNN) was used. Seurat v3 implements a graph-based clustering methodology, which has also been used in previous studies^[168]61. Our approach to cell clustering uses modularity optimization techniques, such as the Louvain algorithm, to systematically group cells together, with the aim of maximizing the standard modularity function^[169]62. Reference-based cluster annotation is implemented in a separate module, SingleR^[170]62, and begins by converting the information from the Seurat data into its own format. Finally, annotation based on external references enables side-by-side