Abstract

   Intellectual disability (ID) is a neurodevelopmental disorder
   characterized by limitations in cognitive functioning and adaptive
   behavior. Individuals with specific genetic disorders that cause ID may
   also have immune system anomalies, such as altered T cell (CD4^+ and
   CD8^+) activity. This work aimed to compare single-cell RNA-sequencing
   (scRNA-seq) and bulk transcriptome data to identify T cell-associated
   biomarkers that could potentially be used for the diagnosis and
   assessment of ID. After integrating and comparing the two datasets, we
   identified 196 shared differentially expressed genes (DEGs). Enrichment
   analysis with the DAVID online platform and FunRich software
   highlighted signal transduction, translation, immune response, the MHC
   (major histocompatibility complex) class II protein complex, antigen
   processing and presentation, allograft rejection, and type I diabetes
   mellitus as important terms and pathways. From the protein-protein
   interaction (PPI) network, six ribosomal protein genes (RPS27A, RPS21,
   RPS18, RPS7, RPS5, and RPL9) were identified as hub genes of ID, and
   RPS27A was the only hub protein detected by all eleven topological
   algorithms. Regulatory network analysis identified several key
   transcription factors (TFs), including FOXC1, FOXL1, and GATA2, and
   microRNAs, including miR-92a-3p and miR-16-5p. This study used
   scRNA-seq and transcriptomic data analysis to define unique biomarkers
   associated with T cell types throughout the progression of ID.
   Continued research on the activity of ID-associated genes will further
   clarify the pathophysiology of ID.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-025-85162-4.

   Keywords: Intellectual disability, Single-cell RNA-sequencing, T cell,
   Biomarkers, Hub genes, RPS27A, FOXL1

   Subject terms: Computational biology and bioinformatics, Drug
   discovery, Systems biology, Biomarkers, Molecular medicine

Introduction

   Intellectual disability (ID) is defined by significant limitations in
   intellectual and adaptive functioning that originate during the
   developmental period^1. ID is characterized primarily by deficits in
   cognitive and adaptive functioning, and its prevalence is estimated at
   1 to 3% of the population, with geographical variation^2. The
   condition is common, with a prevalence of approximately 1.5% in Western
   countries that may rise to 4% in socioeconomically disadvantaged
   regions of the world^3. The average prevalence of ID in Western nations
   is estimated at 1.5% to 2%, with a further 0.3–0.5% experiencing
   severe impairment, characterized by an IQ below 50^4.

   Synapse development and transmission are among the biological
   functions most frequently discussed in relation to ID, but variants in
   ID-associated genes affect a wide variety of biological processes^5.
   These include centrosome function, protein modification, chromatin
   remodeling, transcriptional and translational regulation, and the
   development of neurons and supporting nervous system cells^6,7. Severe
   forms of ID, on the other hand, are believed to result from specific
   genetic causes: chromosomal anomalies or single-gene defects.
   Considerable progress has been made in recent years in understanding
   the genetic factors that produce severe ID; approximately 15% of cases
   are caused by cytogenetically detectable chromosomal abnormalities^4.
   No causal relationship has been established between T cells and ID.
   However, people with specific genetic disorders that cause ID can also
   have immune system anomalies, such as changes in T cell activity^8.
   T cell function may influence an individual's susceptibility to
   infections and ability to fight them. Patients with autism show
   markedly altered adaptive cellular immune function, including altered
   expression of activation markers on CD8^+ T cells, which may reflect
   behavioral abnormalities, developmental disorders, and defective
   immune activation^9.

   Mild forms of ID are thought to represent the low end of the normal IQ
   distribution, resulting from numerous genetic and nongenetic factors.
   As the list of ID-related genes has expanded, many researchers have
   searched for patterns in the functions these genes encode. With
   hundreds of genes now known to contribute to the phenotype, ID shows
   substantial genetic heterogeneity^10. Several studies have reported
   biomarkers associated with ID. Notable X-linked genes include MECP2,
   which was first associated with Rett syndrome and is now known to
   underlie several male- and female-specific ID phenotypes^11. Two
   reports have since described mutations in the DYNC1H1 gene in other ID
   patients^12. ID-related genes also influence the pathophysiology of
   various neuropsychiatric traits. Despite recent advances in gene
   discovery, only a small percentage of ID cases can be explained, and
   substantial clinical and genetic heterogeneity has made large-scale
   genetic research difficult. Discovery of novel biomarkers through
   computational analysis is therefore urgently needed to better
   understand the molecular mechanisms of ID.

   RNA sequencing and other next-generation sequencing (NGS) technologies
   have driven rapid recent progress in identifying differentially
   expressed genes in conditions such as ID. Several recent studies^13,14
   have used single-cell RNA sequencing (scRNA-seq) to investigate disease
   pathogenesis and prognosis, and the technique is now commonly used to
   identify new biomarkers. In this study, we used scRNA-seq to
   investigate transcriptional patterns at the level of gene
   expression^15. Our objective was to identify key genes (KGs), key
   microRNAs (miRNAs), and key transcription factors (TFs) that might be
   used for the personalized diagnosis and prognosis of ID. Bioinformatic
   identification of significant genes, miRNAs, TFs, and associated
   signaling pathways in ID therefore holds great potential for advancing
   future research in this field.

Results

Processing of collected datasets

   After reading the raw dataset, we created a Seurat object for further
   analysis. Initial quality control during data loading removed
   low-quality cells with fewer than 200 expressed features (genes) and
   lowly expressed features detected in fewer than three cells. We removed
   zero-count entries and selected 10,000 read counts from the complete
   gene expression matrix to illustrate gene expression patterns before
   and during normalization. Seurat identified approximately 2,000 highly
   variable features in our dataset. Clustering with the k-nearest
   neighbor (KNN) algorithm yielded 28 clusters, which were annotated using
   reference-based annotation and visualized in a UMAP plot (Fig. 1).
   Five sets of cell markers, for T cells, B cells, natural killer cells,
   monocytes, and dendritic cells, were then identified, each comprising
   multiple subsets. Supplementary Table S1 gives the clustering
   information for all 28 detected and annotated clusters. We merged only
   the seven clusters (0, 1, 4, 6, 8, 9, and 10) carrying T cell-related
   markers, comprising Th1 cells (CD4^+ T cells), non-Vδ2 γδ T cells, and
   MAIT cells.
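
   The two quality-control thresholds described above (cells need at
   least 200 detected genes; genes must be detected in at least three
   cells) can be sketched in dependency-free Python. This is a minimal
   illustration, not Seurat's code: the function names are hypothetical,
   counts are held in nested lists rather than a sparse matrix, and the
   normalization step assumes Seurat's default LogNormalize scheme (each
   count divided by the cell's total, multiplied by a scale factor of
   10,000, then transformed with log1p), which the text does not name
   explicitly.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers; counts are a genes x cells matrix stored as nested lists.
Matrix = List[List[float]]


def qc_filter(counts: Matrix, genes: Sequence[str],
              min_features: int = 200, min_cells: int = 3
              ) -> Tuple[List[str], Matrix, List[int]]:
    """Drop low-quality cells, then rarely detected genes.

    Cells with fewer than `min_features` detected (non-zero) genes are removed
    first; genes detected in fewer than `min_cells` of the remaining cells are
    removed next. Returns (kept genes, filtered matrix, kept cell indices).
    """
    n_cells = len(counts[0]) if counts else 0
    keep_cells = [j for j in range(n_cells)
                  if sum(1 for row in counts if row[j] > 0) >= min_features]
    counts = [[row[j] for j in keep_cells] for row in counts]
    kept = [(g, row) for g, row in zip(genes, counts)
            if sum(1 for v in row if v > 0) >= min_cells]
    return [g for g, _ in kept], [row for _, row in kept], keep_cells


def log_normalize(counts: Matrix, scale_factor: float = 10_000) -> Matrix:
    """Seurat-style LogNormalize: log1p(count / cell total * scale_factor)."""
    n_cells = len(counts[0]) if counts else 0
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    # Cells with a zero total stay at 0.0 instead of dividing by zero.
    return [[math.log1p(v / totals[j] * scale_factor) if totals[j] else 0.0
             for j, v in enumerate(row)]
            for row in counts]


if __name__ == "__main__":
    raw = [[1, 0, 3], [0, 0, 2], [5, 1, 0], [0, 0, 0]]
    kept_genes, mat, kept_cells = qc_filter(raw, ["A", "B", "C", "D"],
                                            min_features=2, min_cells=2)
    print(kept_genes, mat, kept_cells)  # ['A'] [[1, 3]] [0, 2]
```

   The toy call lowers both thresholds so the effect is visible on a 4 x 3
   matrix; with real data the defaults of 200 and 3 apply.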

Fig. 1.

   UMAP plot of the 28 identified clusters. Each color represents a
   separate cluster of cells with related expression profiles. Generated
   using ggplot2 (v3.5.0) in R (v4.3.1).

Statistical analysis and identification of shared DEGs between scRNA-seq and bulk RNA-seq datasets

   Differential expression analysis identified 1318, 1417, 1223, 1376,
   1055, 1420, and 1246 DEGs in clusters 0, 1, 4, 6, 8, 9, and 10,
   respectively. After merging the DEGs from the seven T cell clusters and
   removing duplicates, we obtained 3510 unique DEGs in ID patients. In
   parallel, analysis of the bulk RNA-seq dataset (GSE46831) through the
   GREIN database identified 3459 significant DEGs of ID at a cut-off of
   P < 0.05 and |log2FC| > 1. Cross-matching the merged T cell DEGs from
   the scRNA-seq data with the bulk RNA-seq DEGs (GSE46831) yielded 196
   shared DEGs for further analysis (Fig. 2). Of these 196 shared DEGs,
   102 were up-regulated and 89 were down-regulated (Table S2), and five
   (HLA-DQA1, HLA-DRA, HLA-DRB1, CSNK2B, and PPP1R18) appeared in both the
   up- and down-regulated lists.
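
   The merge-and-intersect logic behind the 196 shared DEGs reduces to set
   operations. The sketch below assumes each DEG list is already available
   as gene symbols; the helper names and gene names are hypothetical
   placeholders. It reproduces only the bookkeeping (union across clusters
   with duplicate removal, threshold filtering of the bulk list, and
   intersection), not the Seurat or GREIN statistics that produce the
   lists.

```python
from typing import Dict, Iterable, Set, Tuple


def merge_cluster_degs(cluster_degs: Dict[str, Iterable[str]]) -> Set[str]:
    """Union of the per-cluster DEG lists; using a set drops duplicates."""
    merged: Set[str] = set()
    for genes in cluster_degs.values():
        merged.update(genes)
    return merged


def filter_bulk_degs(rows: Iterable[Tuple[str, float, float]],
                     p_cutoff: float = 0.05,
                     lfc_cutoff: float = 1.0) -> Set[str]:
    """Keep genes with P < p_cutoff and |log2FC| > lfc_cutoff.

    Each row is (gene, log2_fold_change, p_value).
    """
    return {gene for gene, lfc, p in rows
            if p < p_cutoff and abs(lfc) > lfc_cutoff}


def shared_degs(sc_degs: Set[str], bulk_degs: Set[str]) -> Set[str]:
    """Genes present in both lists: the overlap region of the Venn diagram."""
    return sc_degs & bulk_degs


if __name__ == "__main__":
    clusters = {"0": ["GENE_A", "GENE_B"], "1": ["GENE_B", "GENE_C"]}
    bulk = [("GENE_A", 1.5, 0.01), ("GENE_B", -2.0, 0.04),
            ("GENE_C", 0.5, 0.001), ("GENE_D", 3.0, 0.2)]
    print(sorted(shared_degs(merge_cluster_degs(clusters),
                             filter_bulk_degs(bulk))))  # ['GENE_A', 'GENE_B']
```

   Note that the cut-offs are strict inequalities, matching the stated
   criteria (P < 0.05 and |log2FC| > 1), so a gene with |log2FC| exactly 1
   is excluded.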

Fig. 2.

   Venn diagram of the 196 DEGs shared between the T cell clusters of the
   scRNA-seq dataset and the bulk RNA-seq dataset (GSE46831), generated
   using Venny (v2.0.1).

Functional pathways enrichment analysis

   The 196 shared DEGs were analyzed for gene ontology (GO) enrichment in
   FunRich. In the biological process (BP) category, signal transduction
   and translation were each enriched with 15.7% of the genes, cytoplasmic
   translation with 15.2%, and immune response with 9%. In the cellular
   component (CC) category, 27.7% of the genes were enriched in
   extracellular vesicular exosome, 13.6% in cytosolic ribosome, 12% in
   focal adhesion, and 3.8% in the MHC class II protein complex. In the
   molecular function (MF) category, 84.4% of the genes were enriched in
   protein binding, 20.6% in RNA binding, and 15.6% in structural
   constituent of ribosome (Fig. 3).
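
   Over-representation results of this kind are commonly scored with a
   one-sided hypergeometric test. The following is a minimal
   standard-library sketch of that test; we assume, but the text does not
   state, that FunRich's p-values come from a test of this type and that
   its percentages are the share of annotated input genes falling in each
   term. The function names are hypothetical.

```python
from math import comb
from typing import Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes annotated to the term,
    n: input genes, k: input genes annotated to the term.
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible splits contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def term_enrichment(k: int, n: int, K: int, N: int) -> Tuple[float, float]:
    """Return (percentage of input genes in the term, one-sided p-value)."""
    return 100.0 * k / n, hypergeom_sf(k, N, K, n)


if __name__ == "__main__":
    # Toy case: all 5 input genes fall in a term covering 5 of 10 background genes.
    pct, p = term_enrichment(k=5, n=5, K=5, N=10)
    print(pct, p)  # p equals 1/252
```

   Exact integer arithmetic via `math.comb` keeps the sketch dependency-free;
   for very large backgrounds a library implementation working in log space
   would be faster.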

Fig. 3.

   Bar diagrams, constructed with FunRich (v3.4.1), of the significant
   gene ontology (GO) terms of ID. The top 10 terms (p < 0.05) are shown
   for (A) biological process (BP), (B) cellular component (CC), and (C)
   molecular function (MF).

   We used KEGG, Reactome, WikiPathways, and BioCarta for pathway
   enrichment analysis. Pathways with p < 0.05 were ranked by enrichment
   score, and the most important pathways were retained. In the KEGG
   analysis, the main enriched pathways were allograft rejection (21%),
   graft-versus-host disease (20%), and type I diabetes mellitus (18%)
   (Fig. 4). In Reactome, the most enriched pathways for the shared T cell
   DEGs were viral mRNA translation (24.32%), eukaryotic translation
   elongation (24.32%), and peptide chain elongation (24.15%). Among
   WikiPathways, cytoplasmic ribosomal proteins (18.77%), allograft
   rejection (7.47%), Ebola virus infection in host cells, and the B cell
   receptor signaling pathway were among the most enriched. In BioCarta,
   the most enriched pathways were antigen processing and presentation
   (23.05%) and the BCR signaling pathway (5.97%) (Fig. 4).
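
   The selection rule described here (a p < 0.05 filter, then ranking,
   keeping the top 10 shown in Fig. 4) can be sketched as a filter and a
   sort. In this sketch -log10(p) stands in for the enrichment score; that
   is an assumption for illustration, not DAVID's definition, and the
   function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def top_pathways(results: Sequence[Tuple[str, float]],
                 p_cutoff: float = 0.05,
                 top_n: int = 10) -> List[Tuple[str, float, float]]:
    """Keep pathways with p < p_cutoff and rank them by -log10(p).

    Returns (pathway, p_value, score) tuples, highest score first. The score
    is a stand-in for an enrichment score; p-values of 0 are floored at 1e-300
    so the logarithm stays finite.
    """
    kept = [(name, p, -math.log10(max(p, 1e-300)))
            for name, p in results if p < p_cutoff]
    kept.sort(key=lambda item: item[2], reverse=True)
    return kept[:top_n]


if __name__ == "__main__":
    toy = [("pathway_1", 0.01), ("pathway_2", 0.2),
           ("pathway_3", 1e-5), ("pathway_4", 0.049)]
    for name, p, score in top_pathways(toy):
        print(f"{name}\tp={p:g}\tscore={score:.2f}")
```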

Fig. 4.

   Bubble graph of the significant enriched pathways of ID. The top 10
   pathways (p < 0.05) from the KEGG, Reactome, WikiPathways, and BioCarta
   databases are shown; the databases were accessed through DAVID (v6.0),
   and the results were visualized using the SRplot web server.

Identification of the hub protein from protein-protein interaction (PPI) network

   The 196 shared T cell DEGs were submitted to STRING to construct a PPI
   network. The network comprised 191 nodes and 376 edges, with an average
   node degree of 3.94 and a PPI enrichment p-value < 1.0e-16. We analyzed
   the physical subnetwork at a confidence score of 700 (high confidence),
   shown in Fig. 5. The top 15 genes ranked by each of eleven topological
   methods (betweenness, stress, bottleneck, eccentricity, radiality, EPC,
   MNC, closeness, degree, DMNC, and MCC) in the cytoHubba plugin of
   Cytoscape were selected, and an UpSet plot of these genes was created
   to identify the most significant hub genes (Fig. 6). Genes appearing in
   seven or more methods were extracted as key hub genes (KGs). This
   yielded six KGs for T cells: RPS27A, RPS18, RPS5, RPS7, RPS21, and
   RPL9. Of these, only RPS27A was identified by all 11 cytoHubba
   algorithms. Table 1 summarizes the biological functions of the six
   potential hub genes in the human body.
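
   Two parts of this step lend themselves to a short sketch: the reported
   average degree follows from 2|E|/|V| (2 × 376 / 191 ≈ 3.94), and the
   hub-gene rule keeps genes that rank in the top 15 of at least seven
   methods. The Python below is a hedged illustration with hypothetical
   function names; it implements only the degree method (one of
   cytoHubba's eleven), and the consensus step assumes each method's
   top-15 list has already been exported from Cytoscape.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def average_degree(n_nodes: int, n_edges: int) -> float:
    """Mean degree of an undirected graph: each edge adds 1 to two nodes."""
    return 2 * n_edges / n_nodes


def degree_top_k(edges: Iterable[Tuple[str, str]], k: int = 15) -> List[str]:
    """Rank nodes by degree (ties broken alphabetically) and keep the top k."""
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]


def consensus_hubs(rankings: Dict[str, List[str]],
                   min_methods: int = 7) -> List[Tuple[str, int]]:
    """Genes appearing in at least `min_methods` of the per-method top lists."""
    votes = Counter(gene for top in rankings.values() for gene in set(top))
    hubs = [(gene, n) for gene, n in votes.items() if n >= min_methods]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(round(average_degree(191, 376), 2))  # 3.94, matching the STRING summary
```

   A gene found by all eleven methods, as reported for RPS27A, would
   appear in `consensus_hubs` with a vote count of 11.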

Fig. 5.

   PPI network of the 196 DEGs. Nodes represent target proteins, edges
   denote interactions between proteins, and sky-blue nodes indicate the
   hub genes. STRING (v11.0) was used to construct the network and
   Cytoscape (v3.10.1) for visualization.

Fig. 6.

   UpSet plot of the hub genes identified across eleven cytoHubba
   topological algorithms, generated using the SRplot web server. RPS27A
   is present in all methods. The X-axis denotes the eleven cytoHubba
   algorithms, and the Y-axis represents intersection size.

Table 1.

   The six key hub genes and their biological functions.
   Gene symbol | Gene name | Function | UniProt ID
   RPS27A | Ribosomal protein S27a | Encodes a fusion protein consisting of ubiquitin at the amino (N) terminus and ribosomal protein S27a at the carboxyl (C) terminus. | P62979
   RPS18 | Ribosomal protein S18 | Multiple processed pseudogenes of this gene are dispersed throughout the genome, a common feature of genes encoding ribosomal proteins. | P62269
   RPS21 | Ribosomal protein S21 | Cytoplasmic ribosomal protein belonging to the S21E family. | P63220
   RPS7 | Ribosomal protein S7 | Member of the S7E family of ribosomal proteins. | P62081
   RPS5 | Ribosomal protein S5 | Ribosomes, the organelles that catalyze protein synthesis, consist of a small 40S subunit and a large 60S subunit; RPS5 is a component of the 40S subunit. | P46782
   RPL9 | Ribosomal protein L9 | Encodes a ribosomal protein that is a component of the 60S subunit. | P32969

Chord plots of significant pathways and key hub gene interactions

   In a chord plot, elements are arranged radially around a circle, and
   the arcs (chords) connecting them indicate how they interact; different
   arc colors distinguish data groups. The most enriched GO terms (BP, CC,
   and MF) linked to the hub genes (HUBGs), including cytoplasmic
   translation, cytosolic ribosome, structural constituent of ribosome,
   translation, ribosome, and small ribosomal subunit, are strongly
   connected with the main targets of ID (Fig. 7). In addition, the top
   nine molecular pathways, including ribosome, coronavirus disease
   (COVID-19), eukaryotic translation elongation, the response of EIF2AK4
   (GCN2) to amino acid deficiency, viral mRNA translation, peptide chain
   elongation, and cytoplasmic ribosomal proteins, were related to the
   core ID targets (Fig. 8).

Fig. 7.

   Chord plot mapping the gene ontology terms to the key hub genes,
   visualized with the chord plot module of the SRplot web server.
   Different colors indicate different terms, based on the log2FC values
   of the key hub genes.

Fig. 8.

   Chord plot mapping the most enriched molecular pathways to the key hub
   genes, visualized with the chord plot module of the SRplot web server.
   Different colors indicate different pathways, based on the log2FC
   values of the key hub genes.

TF-hub gene and hub gene-miRNA interaction networks

   To identify transcriptional and post-transcriptional regulators of the
   key hub genes, we constructed TF-hub gene and hub gene-miRNA
   interaction networks. In Fig. 9, TFs and miRNAs are displayed as
   squares and hub genes as circles. FOXC1, FOXL1, GATA2, TFAP2C, NR3C1,
   HINFP, and SREBF1 were the most significant TF regulators identified
   from the JASPAR database. From the miRTarBase database, we identified
   seven miRNAs (hsa-mir-186-5p, hsa-mir-193b-3p, hsa-mir-93-5p,
   hsa-mir-16-5p, hsa-mir-92a-3p, hsa-mir-5011-5p, and hsa-mir-1277-5p)
   (Fig. 9). Table 2 describes the biological functions of these
   biomolecules.
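
   Ranking regulators by connectivity is one simple way to read such a
   network. The sketch below counts, for each TF or miRNA, the number of
   distinct hub genes it targets (its degree toward the hub genes). The
   regulator names and edges in the example are placeholders rather than
   results from this study, and whether the databases and tools cited
   above rank regulators by exactly this criterion is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_regulators(edges: Iterable[Tuple[str, str]],
                    hub_genes: Set[str]) -> List[Tuple[str, int]]:
    """Rank regulators (TFs or miRNAs) by how many distinct hub genes they target.

    edges: (regulator, target_gene) pairs, e.g. exported from a TF-gene or
    miRNA-gene interaction table. Targets outside `hub_genes` are ignored,
    and repeated edges are counted once.
    """
    targets: Dict[str, Set[str]] = defaultdict(set)
    for regulator, gene in edges:
        if gene in hub_genes:
            targets[regulator].add(gene)
    return sorted(((reg, len(genes)) for reg, genes in targets.items()),
                  key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    hubs = {"RPS27A", "RPS18", "RPS21", "RPS7", "RPS5", "RPL9"}
    toy_edges = [("TF_X", "RPS5"), ("TF_X", "RPS7"), ("TF_Y", "RPS5"),
                 ("TF_X", "RPS5"), ("TF_Z", "NOT_A_HUB")]
    print(rank_regulators(toy_edges, hubs))  # [('TF_X', 2), ('TF_Y', 1)]
```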

Fig. 9.

   Regulatory and chemical interaction network of the key hub genes with
   transcription factors, microRNAs, and chemicals. Key hub genes are
   shown as ovals, TFs as angular shapes, miRNAs as deep-yellow
   rectangles, and chemicals as green rectangles. The network was
   constructed with NetworkAnalyst (v3.0) and visualized in Cytoscape
   (v3.10.1).

Table 2.

   Potential transcriptional and post-transcriptional regulatory
   biomolecules of ID.
   Transcriptional regulatory biomolecules (TFs)
   TF | Description | Function
   FOXC1 | Forkhead box C1 | DNA-binding transcription factor activity, RNA polymerase II-specific.
   GATA2 | GATA binding protein 2 | DNA-binding transcription factor activity and chromatin binding.
   FOXL1 | Forkhead box L1 | Essential for proper proliferation and differentiation of the gastrointestinal epithelium.
   TFAP2C | Transcription factor AP-2 gamma | Sequence-specific DNA-binding protein that interacts with inducible viral and cellular enhancer elements.
   NR3C1 | Nuclear receptor subfamily 3 group C member 1 | Promotes rapid degradation of target messenger RNAs (mRNAs) by binding to their 5' untranslated regions (UTRs).
   HINFP | Histone H4 transcription factor | Transcriptional repressor that binds the consensus sequence 5'-CGGACGTT-3' as well as the RB1 promoter.
   SREBF1 | Sterol regulatory element binding transcription factor 1 | The precursor of the processed transcription factor form, sterol regulatory element-binding protein 1 (SREBP-1), is embedded in the endoplasmic reticulum membrane.
   Post-transcriptional regulatory biomolecules (miRNAs)
   miRNA | Description | Function
   miR-16-5p | MicroRNA 16 | Inhibits replication of multiple viruses, including enterovirus 71 (EV71).
   miR-193b-3p | MicroRNA 193b | Interacts reciprocally with MYC and inhibits growth and metastasis.
   miR-93-5p | MicroRNA 93 | Contributes to the pathogenesis of various diseases.
   miR-92a-3p | MicroRNA 92a | Reduces PTEN expression and affects Akt phosphorylation in Eca-109 cells.
   miR-186-5p | MicroRNA 186 | Inhibits tumorigenesis of glioblastoma multiforme in vitro and in vivo.

Protein chemical interaction analysis of key hub genes

   Protein-chemical interactions (PCIs) are essential for understanding
   protein function and the molecular mechanisms that proteins support
   within the cell, and this knowledge can greatly assist drug discovery.
   In this investigation, we identified the protein-chemical interaction
   network of the ID hub genes. Thirteen potentially interrelated chemical
   compounds were identified; the most highly enriched included
   chloropicrin, sodium selenite, arsenic trioxide, estradiol, enzyme
   inhibitors, and cupric oxide (Fig. 9).

Performance evaluation by ROC curve analysis

   Six hub genes and seven TFs were evaluated by ROC analysis, using the
   area under the curve (AUC) as the performance benchmark. An AUC of 0.5
   corresponds to chance-level discrimination and an AUC of 1.0 to perfect
   discrimination, so values closer to 1.0 indicate a better classifier.
   For the hub genes, AUC values ranged from 0.751 (RPS5) to 0.891
   (RPS27A) in GSE7329 and from 0.511 (RPS5) to 0.771 (RPS18) in GSE25507
   (Fig. 10a and b, respectively). For the TFs, AUC values ranged from
   0.436 to 0.818 in GSE7329, with FOXC1 and FOXL1 showing the highest
   values (Fig. 10c), and from 0.519 to 0.844 in GSE25507, with GATA2
   showing the highest value (Fig. 10d).
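
   For a single gene used as a classifier, the empirical AUC equals the
   probability that a randomly chosen case has a higher expression score
   than a randomly chosen control, with ties counting one half (the
   Mann-Whitney formulation), which is why 0.5 corresponds to chance. The
   sketch below computes that quantity directly; the function name is
   hypothetical, it assumes higher expression indicates ID (pROC can
   instead choose the direction automatically), and it omits the
   confidence intervals and smoothing that pROC offers.

```python
from typing import Sequence


def roc_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical ROC AUC via the Mann-Whitney statistic.

    AUC = P(case score > control score) + 0.5 * P(tie), over all case/control
    pairs; higher scores are assumed to indicate disease.
    """
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # 8 of the 9 case/control pairs are ordered correctly, so AUC = 8/9.
    print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

   The pairwise loop is O(n·m), which is fine for GEO-sized cohorts; a
   rank-based version gives the same value in O((n+m) log(n+m)).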

Fig. 10.

   ROC curves of the potential biomarkers (hub genes and TFs): the six hub
   genes in two GEO datasets, (a) GSE7329 and (b) GSE25507, and the seven
   TFs in (c) GSE7329 and (d) GSE25507. The ROC curves were constructed
   with the pROC package (v1.18.5) in R (v4.3.1).

Discussion

   By combining single-cell RNA sequencing (scRNA-seq) with bulk
   transcriptome analysis, we uncovered previously unreported biomarkers
   associated with intellectual disability (ID). In summary, our research
   identified genes and pathways that are altered in individuals with ID,
   suggesting their potential relevance as diagnostic markers for this
   condition.

   The PBMCs in our raw dataset clustered into 28 cellular subsets with
   scRNA-seq analysis tools^16. We observed consistent cell proportions
   for each of five major cell types: natural killer cells, naïve B cells,
   T cells, monocytes, and dendritic cells. In our clustering analysis, we
   designated Th1 cells (CD4^+ T cells), non-Vδ2 γδ T cells, and MAIT
   cells as T cells. The DEGs of these T cell clusters were then compared
   with the DEGs identified in the bulk RNA-seq dataset (GSE46831).

   We analyzed the DEGs shared between the T cell clusters and the bulk
   dataset and identified notable Gene Ontology (GO) terms in the
   biological process (BP), cellular component (CC), and molecular
   function (MF) domains. GO is designed to support the computational
   representation of biological systems and provides structured
   information on gene function^17. Our analysis showed that translation
   is among the most prominent BP terms. Translation is the process by
   which mRNA is decoded to synthesize specific proteins^18. Previous
   research indicates that ribosomal small subunit biogenesis is important
   in individuals with ID^19, and the B cell receptor signaling pathway
   has been suggested to be connected with gene mutations associated with
   ID^20. Among the CC terms, the most enriched terms are strongly
   associated with ID. MHC class II protein complex binding and MHC class
   II receptor activity are significantly correlated with immune
   disorders, and MHC class II deficiency has been identified as a
   potential etiological factor in ID^21. Furthermore, 5'-UTR mRNA binding
   and RNA binding are closely associated with fragile X syndrome, a
   specific form of ID^22. Certain genes associated with autism spectrum
   disorder (ASD) can cause an inappropriate immune response, which is
   also related to ID^23.

   We also performed KEGG, Reactome, WikiPathways, and BioCarta pathway
   analyses. The most important KEGG pathways were allograft rejection,
   graft-versus-host disease, asthma, type I diabetes mellitus, autoimmune
   thyroid disease, and coronavirus disease. A recent study showed that
   people with ID are at increased risk of respiratory problems and asthma
   and require special care from trained caregivers^24. Diabetes is
   prevalent among people with ID^23. A correlation between autoimmune
   thyroid disease and ID has been documented, particularly in
   children^25. Individuals with ID are more likely to contract
   coronavirus disease 2019 (COVID-19) and to suffer adverse outcomes from
   it^26. The most enriched Reactome pathways were SARS-CoV-1 modulation of
   host translation machinery, CD22-mediated BCR regulation, translocation
   of ZAP-70 to the immunological synapse, selenocysteine synthesis, and
   peptide chain elongation. The selenocysteine synthesis pathway
   describes how selenocysteine is synthesized and incorporated into
   selenoproteins, and SEPSECS mutations, which disrupt this pathway, have
   been associated with selenium-related deficiency and severe ID^27.
   Using WikiPathways, we found significant enrichment mainly in the B
   cell receptor complex, allograft rejection, antigen processing and
   presentation, and the BCR signaling pathway. Many biological
   activities, including signal transduction and transcriptional control,
   depend on protein-protein interactions^28. The PI3K/AKT/mTOR-vitamin D3
   signaling pathway was notably enriched among BioCarta pathways. A
   comprehensive study has revealed the pathogenic involvement of mTOR
   signaling in various neurological disorders, including epilepsy,
   autism (ASD), ID, dementia, traumatic brain injury, brain tumors, and
   hypoxic-ischemic injury^29.

   Protein-protein interactions underlie the formation of protein
   complexes and regulate many cellular processes, such as signaling,
   regulation, and transport^30. Using eleven distinct methods in the
   Cytoscape cytoHubba plugin, we identified six highly connected hub
   genes: RPS27A, RPS21, RPS18, RPS7, RPS5, and RPL9. All of these hub
   genes belong to the ribosomal protein (RP) family. RPs play crucial
   roles in ribosome biogenesis, protein synthesis, cell growth,
   development, and programmed cell death (apoptosis)^31. Ribosomal
   proteins are also involved in immunometabolism, the integration of
   immune cell activity with metabolic pathways. During activation, T
   cells undergo metabolic reprogramming from oxidative phosphorylation to
   glycolysis, which requires increased ribosome biogenesis and
   function^32. Certain ribosomal proteins interact directly with RNA
   molecules or affect the activity of transcription factors, regulating
   the expression of genes essential for T cell growth, differentiation,
   and function^33. Ribosome biogenesis plays a crucial role in regulating
   cell development and proliferation, and its dysregulation can lead to
   abnormal cell growth and pathological conditions such as cancer and
   metabolic diseases^32. Some metabolic conditions, especially those
   affecting the nervous system or brain, can cause behavioral problems,
   cognitive problems, or developmental delays that resemble autism
   spectrum disorder^34.

   A study of perisylvian polymicrogyria found RPS27A to be highly
   enriched among genes extracted from a patient with ID^35. RPS27A
   dysfunction impairs ribosome assembly, protein synthesis, and the
   ubiquitin-proteasome system (UPS), resulting in cellular stress and
   immune dysfunction; these effects may disrupt neurodevelopment and
   potentially lead to ID^36. In our study, RPS6, although not classified
   as a hub gene, was expressed in the T cell clusters. A significant
   increase in RPS6 expression has been reported in individuals with
   ID^37. Numerous investigations have confirmed the involvement of RPS6
   in the translation of 5' terminal oligopyrimidine tract (TOP) mRNAs, as
   well as its role in regulating cell size and proliferation^38. RPS7
   mutants show activation of matrix metalloproteinase (MMP) family genes,
   suggesting improper cell migration that may lead to further
   developmental dysfunction, and RPS5 knockdown causes brain
   abnormalities in zebrafish^39. Further investigation may reveal that
   mutations in RPS5 and RPS7 are biomarkers of ID. A recent study found
   significant changes in RPL9 mRNA levels in individuals with autism
   spectrum disorder (ASD) that were confirmed, in a consistent direction,
   by quantitative polymerase chain reaction (qPCR) validation^40. The
   related ribosomal protein gene RPL10 has been implicated in people with
   intellectual impairment, dysmorphism, and autism^41. The two remaining
   hub genes, RPS18 and RPS21, have limited reported relevance to ID;
   RPS18 has even been used as a housekeeping reference gene in some ASD
   studies^42. Overall, T cells are essential for the immune response and
   play an important role in neuroinflammation. Excessive T cell
   activation can cause prolonged neuroinflammatory states that disrupt
   normal brain development and function, and the resulting disruption of
   neuronal growth, synapse formation, and cognitive processes can lead to
   ID^43.

   Because TFs control transcription, their levels may help identify
   individuals with diseases such as ID. Our results suggest that the TFs
   regulating the hub genes in the TF-hub gene interaction network play a
   key role in the development of ID. The TFs identified in our study were
   FOXC1, FOXL1, GATA2, TFAP2C, NR3C1, HINFP, and SREBF1. Two separate
   case reports have linked FOXC1 involvement in ring chromosome 6 with
   ID, short stature, and various facial dysmorphisms^44. According to a
   whole-genome sequencing study, a GATA2 mutation causes a rare syndromic
   congenital neutropenia with ID^45. FOXL1 is a forkhead box (FOX)
   protein whose dysregulation promotes the Wnt/β-catenin signaling
   pathway^45. Previous research has suggested a potential correlation
   between TFAP2C and the development of ID^46. Ingenuity pathway analysis
   (IPA) has identified nuclear receptor subfamily 3, group C, member 1
   (NR3C1) as an upstream transcriptional regulator in ID^47. HINFP and
   SREBF1 have not shown significant relevance in previous ID studies;
   however, they may be of value in future investigations.

   Using the miRTarBase v8.0 database, we investigated the interactions
   between miRNAs and hub genes. Unlike transcription factors, microRNAs
   regulate gene expression post-transcriptionally, which makes them
   potentially valuable for diagnostic testing and biomarker
   research^[150]48. Topological analysis identified the following miRNAs
   as significant: hsa-mir-186-5p, hsa-mir-193b-3p, hsa-mir-93-5p,
   hsa-mir-16-5p, hsa-mir-92a-3p, hsa-mir-5011-5p, and hsa-mir-1277-5p.
   The microRNA mir-92a-3p shows a significant correlation with the
   structure and function, and it has been recognized as a peripheral
   blood biomarker for schizophrenia^[151]49. In addition, a study of
   autism-linked gene regulation in a Chinese population found
   dysregulation of mir-92a-3p in the peripheral blood of
   patients^[152]50. Finally, miR-16-5p has been described in rat neurons
   as a negative regulator of dendritic complexity that mediates
   BDNF-induced dendritogenesis by regulating translation of the BDNF
   mRNA itself, supporting the hypothesis that miR-16-5p plays a role in
   neuronal development^[153]51.
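   As a minimal sketch of one topological measure, the example below
   ranks miRNAs by degree, that is, by the number of distinct hub genes
   each one targets in a miRNA-hub gene network. The text does not state
   which topological algorithm was used, so degree is an assumption here,
   and the function name and interactions are hypothetical placeholders
   rather than miRTarBase records.

```python
from collections import Counter


def rank_mirnas_by_degree(edges):
    """Rank miRNAs by the number of distinct hub genes they target (degree)."""
    # Deduplicate so a repeated miRNA-gene pair is counted only once.
    unique_edges = set(edges)
    degree = Counter(mirna for mirna, _gene in unique_edges)
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy interactions (placeholder names, not study results).
edges = [
    ("miR_A", "GENE_1"), ("miR_A", "GENE_2"), ("miR_A", "GENE_3"),
    ("miR_B", "GENE_1"), ("miR_B", "GENE_2"),
    ("miR_C", "GENE_3"), ("miR_C", "GENE_3"),  # duplicate row
]
print(rank_mirnas_by_degree(edges))
# [('miR_A', 3), ('miR_B', 2), ('miR_C', 1)]
```

   Degree is only one of several centrality measures; others (for
   example, betweenness) can rank the same network differently.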

   Protein-chemical interaction analysis identified chloropicrin, sodium
   selenite, arsenic trioxide, estradiol, enzyme inhibitors, and cupric
   oxide. Some studies have indicated that sodium selenite treatment
   improves cognitive performance in triple-transgenic Alzheimer’s
   disease mice by reducing lipid peroxidation^[154]52. A retrospective
   cohort study investigated the association between soil arsenic
   concentration during pregnancy and ID in offspring; elevated soil
   arsenic concentration was significantly associated with increased odds
   of diagnosed ID only during the first trimester of pregnancy^[155]53.
   In the final phase, we evaluated the diagnostic performance of our
   predicted biomarkers (hub genes and TFs) using receiver operating
   characteristic (ROC) analysis, in which they performed reasonably
   well. Our atlas offers a valuable framework for improving our
   understanding of this intricate immune-mediated disease. Despite
   several limitations, including the small number of datasets and
   human-derived samples (cell line data were excluded), a total sample
   size of fewer than ten, and the lack of wet-laboratory validation of
   the candidate biomarkers, we provide a thorough overview of the
   variations. In summary, we believe that our approach is scientifically
   valid and provides meaningful insights into the role of CTCF mutations
   in ID, and that this work can serve as a resource to guide future
   studies aimed at identifying the most effective approach for managing
   or potentially curing the disease.
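   For reference, the area under the ROC curve (AUC) that summarizes such
   an analysis equals the probability that a randomly chosen case scores
   higher than a randomly chosen control, with ties counted as one half
   (the Mann-Whitney formulation). The sketch below computes the AUC
   directly from that definition; the labels and scores are hypothetical,
   and the software actually used for the ROC analysis is not specified
   in this section.

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney formulation: P(case score > control score)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case (1) and one control (0)")
    wins = 0.0
    # Compare every case with every control.
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


# Hypothetical scores of one candidate biomarker (1 = ID, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

   An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to
   perfect separation. The pairwise loop is quadratic in cohort size,
   which is acceptable for small cohorts like the one described here.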

Materials and methods

   A graphical representation of the combined systems biology and
   analytical approach used to identify biological markers and pathways
   in the blood of ID patients is presented in Fig. [156]11.

Fig. 11.


   Graphical illustration of the workflow in our study, constructed with
   Adobe Illustrator CC (v 2019).

Parameters of single cell RNA-seq dataset analysis

Collection and pre-processing

   Publicly available scRNA-seq data (PMID: 28212749) collected from
   patient peripheral blood were accessed through the Single Cell Portal
   (Single Cell Comparison: PBMC data - Single Cell Portal
   (broadinstitute.org))^[159]54. Data processing was performed with the
   Seurat package (version 4.3.0) in R (version 4.3.1)^[160]55. Expression
   data in this study are represented as logarithmic counts per 10,000.
   This representation is derived from unique molecular identifier (UMI)
   counts for all methods except Smart-seq2, for which it is based on
   read counts^[161]56. Initially, we converted the raw dataset into a
   Seurat object.
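   A minimal sketch of the "logarithmic counts per 10,000"
   representation, assuming Seurat's LogNormalize convention: each count
   is divided by its cell's total, multiplied by a scale factor of
   10,000, and natural-log transformed with log1p. The matrix is laid out
   genes x cells as in a Seurat object; the function name and toy values
   are hypothetical.

```python
import numpy as np


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize a genes x cells count matrix (LogNormalize-style)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0)  # total UMI (or read) counts per cell
    if np.any(totals == 0):
        raise ValueError("every cell needs at least one count")
    # Broadcasting divides each column (cell) by its own total,
    # then scales to 10,000 and applies the natural log of (1 + x).
    return np.log1p(counts / totals * scale_factor)


toy_counts = [[1, 0],
              [3, 5]]  # hypothetical: 2 genes x 2 cells
print(log_normalize(toy_counts).round(3))
```

   In Seurat itself, the same step corresponds to
   NormalizeData(object, normalization.method = "LogNormalize",
   scale.factor = 10000).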

Quality control, normalization, and feature selection

   For quality control, only genes detected in at least three cells and
   cells with at least 200 features with nonzero counts were
   retained^[162]55. Using a function that calculates the percentage of
   counts derived from a given set of features, 51 mitochondrial quality
   control (QC) measures were obtained^[163]57. The filtered data were
   then normalized^[164]58 by applying a log transformation, which
   simplifies the subsequent analysis.
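   The QC step can be sketched as follows, assuming the thresholds
   described above (genes detected in at least three cells, cells with at
   least 200 detected features) and assuming mitochondrial genes are
   recognized by the "MT-" prefix of human gene symbols. The helper name
   qc_filter is hypothetical, applying the cell filter before the gene
   filter is an assumption of this sketch, and no percent_mt cutoff is
   applied because none is stated in the text.

```python
import numpy as np


def qc_filter(counts, gene_names, min_cells=3, min_features=200, mt_prefix="MT-"):
    """Hypothetical helper: filter a genes x cells matrix and compute percent_mt."""
    counts = np.asarray(counts, dtype=float)
    gene_names = np.asarray(gene_names, dtype=str)
    # Keep cells with at least `min_features` genes having nonzero counts.
    keep_cells = (counts > 0).sum(axis=0) >= min_features
    counts = counts[:, keep_cells]
    # Keep genes detected (nonzero) in at least `min_cells` remaining cells.
    keep_genes = (counts > 0).sum(axis=1) >= min_cells
    counts, gene_names = counts[keep_genes], gene_names[keep_genes]
    # Percentage of each cell's counts that come from mitochondrial genes.
    is_mt = np.char.startswith(gene_names, mt_prefix)
    totals = counts.sum(axis=0)
    mt_counts = counts[is_mt].sum(axis=0)
    percent_mt = 100.0 * mt_counts / np.where(totals > 0, totals, 1.0)
    return counts, gene_names, percent_mt


# Hypothetical 4 genes x 4 cells; thresholds lowered to fit the toy data.
toy = [[5, 0, 2, 1],
       [1, 3, 0, 2],
       [0, 0, 0, 1],
       [2, 1, 4, 0]]
names = ["MT-CO1", "GENE_A", "GENE_B", "GENE_C"]
filtered, kept, pct = qc_filter(toy, names, min_cells=2, min_features=2)
print(kept.tolist(), pct.round(1).tolist())
```

   Here GENE_B is dropped because it is detected in only one cell, and
   percent_mt can then be inspected to flag cells with unusually high
   mitochondrial content.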

   Normalization is necessary to account for and minimize cell-specific
   bias, which could affect downstream analyses such as the detection of
   differentially expressed genes^[165]58. Its purpose is to correct for
   measurement discrepancies between samples and/or features (genes, for
   example) that are caused by technical artifacts (batch effects, for
   example) or unwanted biological variation rather than by the
   biological effects of interest^[166]59. Next, we identified the
   features that are highly variable from cell to cell, that is, genes
   that are highly expressed in some cells but not in the
   rest^[167]60.
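   A simplified sketch of highly variable feature selection, ranking
   genes by dispersion (variance divided by mean) of log-normalized
   expression across cells. This is an illustrative stand-in, not
   Seurat's default variance-stabilizing ("vst") procedure; the function
   name is hypothetical, and n_top=2000 mirrors Seurat's default number
   of variable features.

```python
import numpy as np


def top_variable_genes(log_expr, gene_names, n_top=2000):
    """Rank genes (rows) by dispersion = variance / mean across cells (columns)."""
    log_expr = np.asarray(log_expr, dtype=float)
    mean = log_expr.mean(axis=1)
    var = log_expr.var(axis=1, ddof=1)  # sample variance across cells
    # Genes with zero mean get dispersion 0 instead of dividing by zero.
    dispersion = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    # Highest dispersion first; a stable sort keeps input order on ties.
    order = np.argsort(-dispersion, kind="stable")[:n_top]
    return [gene_names[i] for i in order]


toy_log_expr = [
    [1.0, 1.0, 1.0, 1.0],  # constant across cells: not variable
    [0.0, 4.0, 0.0, 4.0],  # expressed in only some cells: highly variable
    [2.0, 2.0, 3.0, 3.0],
]
print(top_variable_genes(toy_log_expr, ["G1", "G2", "G3"], n_top=2))
# ['G2', 'G3']
```

   Restricting later steps (dimensionality reduction, clustering) to such
   genes focuses them on cell-to-cell differences rather than on genes
   that are uniformly expressed.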

Cell clustering, annotation, and visualization

   Clustering is a useful technique for identifying groups of cells with
   similar patterns of gene expression. We first applied a graph-based
   clustering algorithm built on a K-nearest neighbor (KNN) graph; Seurat
   v3 implements this graph-based clustering methodology, which has also
   been used in previous studies^[168]61. Cells were then grouped using
   modularity optimization techniques, such as the Louvain algorithm,
   which aim to maximize the standard modularity function^[169]62.
   Reference-based cluster annotation is implemented in a separate
   module, SingleR^[170]62, which starts by converting the Seurat data to
   its own format.
   Finally, annotation based on external references enables side-by-side