Abstract Background Type 2 diabetes mellitus (T2DM) is a heterogeneous, metabolic, and chronic condition affecting vast numbers of the world’s population. The related variables and T2DM associations have not been fully understood due to their diverse nature. However, functional genomics can facilitate understanding of the disease. This information will be useful in drug design, advanced diagnostic, and prognostic markers. Aim To understand the genetic causes of T2DM, this study was designed to identify the differentially expressed genes (DEGs) of the disease. Methods We investigated 20 publicly available disease-specific cDNA datasets from Gene Expression Omnibus (GEO) containing several attributes including gene symbols and clone identifiers, GenBank accession numbers, and phenotypic feature coordinates. We analyzed an integrated system-level framework involving Gene Ontology (GO), protein motifs and co-expression analysis, pathway enrichment, and transcriptional factors to reveal the biological information of genes. A co-expression network was studied to highlight the genes that showed a coordinated expression pattern across a group of samples. The DEGs were validated by quantitative PCR (qPCR) to analyze the expression levels of case and control samples (50 each) using glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as the reference gene. Results From the list of 50 DEGs, we ranked three T2DM-related genes (p < 0.05): SRR, NFKB1, and PDE4B. The enriched terms revealed a significant functional role in amino acid metabolism, signal transduction, transmembrane and intracellular transport, and other vital biological functions. DMBX1, TAL1, ZFP161, NFIC (66.7%), and NR1H4 (33.3%) are transcriptional factors associated with the regulatory mechanism. We found substantial enrichment of insulin signaling and other T2DM-related pathways, such as valine, leucine and isoleucine biosynthesis, serine and threonine metabolism, adipocytokine signaling pathway, P13K/Akt pathway, and Hedgehog signaling pathway. The expression profiles of these DEGs verified by qPCR showed a substantial level of twofold change (FC) expression [MATH: (2− ΔΔCT) :MATH] in the genes SRR (FC ≤ 0.12), NFKB1 (FC ≤ 1.09), and PDE4B (FC ≤ 0.9) compared to controls (FC ≥ 1.6). The downregulated expression of these genes is associated with pathophysiological development and metabolic disorders. Conclusion This study would help to modulate the therapeutic strategies for T2DM and could speed up drug discovery outcomes. Keywords: cDNA datasets, T2DM, differential expressed genes, enrichment analysis, qPCR, expression profiling Introduction Type 2 diabetes mellitus (T2DM) is a metabolic and complicated condition characterized by insulin resistance and insulin deficiency due to hyperglycemia ([31]1). Currently, about 285 million individuals are estimated to have type 1 and 2 diabetes, which makes up around 90% of the total ([32]2). Tissues including the pancreas, liver, skeletal muscle, adipose, and intestinal tissues have a role in its progress ([33]3), and despite this, several critical variables such as lifestyle, food, obesity, and genetic factors were identified in insulin and T2DM development ([34]4–[35]6). The fundamental mechanisms are still unclear, and due to its high morbidity and increased clinical impact, this disease has become an extremely severe global issue ([36]2). Due to the multifaceted mechanisms and diverse contributory factors of T2DM, there is an urgent need to identify vital biological molecules as potential therapeutic targets and improve the treatment strategies. Different oral and injectable treatments for T2DM are available; however, the principal difficulties associated with the use of these medications are due to a lack of effectiveness, hypoglycemia, weight gain, and increased susceptibility to infections. It is thus a responsibility of the scientist to study the causes and the mechanisms of the disease to properly manage the risks. Recent developments have shown that diabetes is diverse, with complicated genetic causes. Many studies have demonstrated the significance of various genes in the etiology of disease. Most genes participating in T2DM have been recognized as prospective therapeutic targets; nevertheless, the limited knowledge of the complexity and interaction of these systems has been a significant problem in the development of effective T2DM treatments. Most cases of diabetes include several genes, each of which contributes to an increased risk of type 2 diabetes ([37]2), and similar genes related to T2DM poorly show recognized pathways of insulin signaling ([38]5). The present techniques for finding statistically significant functional classes of genes associated with T2DM have detected regulatory cell cycles ([39]7, [40]8). However, in T2DM and in the molecular biology of insulin resistance, the functional categories and the potential therapeutic role of the expressed genes have not been fully understood ([41]8). There are, therefore, still substantial gaps in the clinical outcomes, encouraging scientists to seek further improvements in each of these issues. The differentially expressed genes (DEGs) associated with T2DM, such as SRR, NFKB1, and PDE4B, were examined in cases and controls to identify the genetic causes of the disease. Based on quantitative PCR (qPCR)-validated genome- and transcriptome-level studies, CHL1, LRFN2, RASGRP1, NFKB1, and PPM1K have been substantially linked with insulin secretion and T2DM. In non-diabetic individuals, the influence of genetic diversity on continuous hyperglycemia events has largely shown that insulin secretion has been disturbed ([42]9). Systems biology based on genome-wide association studies (GWAS) reveals the genetic reasons for T2DM pathogenesis and insulin resistance ([43]9). Likewise, aberrant glucagon secretion resulted in inflammation of the islets in T2DM, and interleukin-6 has been found to stimulate secretion ([44]10). Quantitative real-time PCR (qRT-PCR) is a method of choice in gene expression profiling and follow-up validation based on its high accuracy rate with an increased dynamic range and sensitivity ([45]11, [46]12). An important technique to observe relative gene expression changes in qPCR studies is 2^-ΔΔC[T] . This approach is useful in assessing the relative levels of gene expression among various samples by using the qPCR technology directly to compute the threshold cycle (CT) ([47]13). Genome expression in insulin signaling and integrated pathology can alter any of these genes that might develop clinically important insulin resistance and diabetes ([48]2). The systems biology method may combine these biological networks and assist in uncovering important etiological factors. As genes are crucial to a better understanding of the network of systems biology, complementary DNA (cDNA) microarray technology is an effective tool to simultaneously analyze the expression levels of thousands of genes. The significant number of expression datasets in the public field provides a valuable source of genome-wide information on T2DM and gives an opportunity to investigate the expression of a large number of samples. This work aimed to discover the insulin resistance and T2DM genetic variations by qPCR experimental validation. System-level analysis screened out the SRR, NFKB1, and PDE4B genes from cDNA datasets. qPCR analysis examined the dysregulation of these genes and their pathological role in T2DM. These findings would help in understanding the genetic basis of the disease and will update therapeutic strategies against T2DM. Materials and Methods Ethical Approval and Collection of Blood Samples We collected a total of 100 individual blood samples with an equal ratio of control/cases (n = 50) from the local hospital in Multan, Punjab Province. For qPCR analysis, blood samples were collected from type 2 diabetic patients and healthy individuals based on inclusion and exclusion criteria. The approval of the study and informed consent were obtained from the Research Ethics Committee of the Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan (ref. no. IMBB/2019/002). Inclusion and Exclusion Criteria We used 1) pathologically confirmed cases of T2DM; 2) new patients diagnosed by the Nishtar Hospital, Multan, for the first time; 3) patients aged ≥18 years; 4) patients who agreed to provide blood samples for scientific research and consent for the publication of research data; and 5) healthy individuals with no history of diabetes, cancer, and cardiovascular diseases, who were included as controls. The exclusion criteria were: 1) pathologically confirmed local vascular invasion; 2) cases with multiple and complex diseases; 3) cases with cancer or immune disorders; and 4) cases with a history of surgery in the past 3 years. Normalization and Differential Expression Analysis Twenty T2DM-related cDNA expression datasets from the NCBI Gene Expression Omnibus (GEO) database were downloaded ([49] Supplementary Table S1 ). DEGs were identified using the Affymetrix U133 Plus 2.0 Array platform and a hgu133plus2 annotation probe. The R platform with Affy, AffyQCReport, AnnotationDbi, Annotate, Biobase, Lima, and hgu133a2cdf Bioconductor packages were used to analyze the computable outcomes. AffyRNAdeg, summary AffyRNAdeg, and plotAffyRNAdeg were used for RNA degradation analysis and for checking the quality of RNA samples. Perfect match (PM) and mismatch (MM) were measured using the Robust Multi-Array Average (RMA) normalization approach, and normalization of the microarray datasets was utilized for comparison. These datasets were grouped into recognizable pheno-data files that include the accession numbers, sample types, number of samples, disease, and clinical conditions ([50]14). RMA was applied for noise reduction from local signals ([51]15, [52]16). RMA has been a frequently used method to generate an expression matrix from Affymetrix data. The raw expression values of genomic data were background corrected, log2 transformed, and then quantitatively standardized, followed by the linear model being applied to acquire an expression measure for each probe set on each cDNA array. Normalization was carried out to observe the perfect matches through the median level. For normalization and background correction, PM and MM, the following equation was applied: [MATH: PMij< mi>k= BGijk+Sijk :MATH] where PM indicates a perfect match, BG is “background,” S is nonspecific binding, and ijk represents the signal for probe j for the probe set k on microarray i. [MATH: BG(PM ijk)=E[Sijk< /mi> |PMijk]>0 :MATH] [MATH: Sijk∼Exp(λijk< /mi>) BGijk∼N(βi,σ2) :MATH] In differential analysis, the PM highlights the communal signals of background (BG) and expression (E). The “ArrayQualityMetrics” Bioconductor package was used to measure the quality of the samples, indicating the median expression level. N(β[i] , σ ^2) is the normal distribution involving BG[ijk] . In this case, β[i] is the mean regression parameter that assesses the independent variable gene expression i. σ ^2 is the least squares regression representing the dependent variables based on the linear model. λ[ijk] is a distribution rate parameter that reads the transcription of lane k to gene j from sample i. The gene square matrix for each dataset was calculated across the microarrays and missing values were ignored. For a probe set, we utilized the RMA method to compute a summary of the values among samples. [MATH: Xnorm =F2−1(F1(x)) :MATH] where F [1] and F [2] show the distribution functions of the case and the reference DNA chips, respectively. In this study, we detected the T2DM-related DEGs from each cDNA dataset by pairwise comparison ([53]17). DEGs and duplicated spots with quality signals were shortlisted. The statistical parameters were recorded and the genes were ranked. False discovery rate (FDR) <0.05, p-value ≤0.05, average expression level (AEL) ≥40%, and an absolute log-fold change (LFC) >1 were used as significant cutoff values ([54]18, [55]19). K-Fold Cross-Validation We applied k-fold cross-validation to assess accuracy in the differential comparison using the Bioconductor “boot” package, and this method had the edges eventually using all samples for the training and test datasets ([56]18). The k-fold approach is usually less biased in comparison with other methods because it ensures that all observations from the original dataset are shown in the training and test sets. In molecular studies, bootstrapping has been successfully used to correct mismatches and background noise ([57]20). The generalized Gaussian linear model was used to check the k-fold cross-validation with the cv.glm technique. It calculates the true error as the average error. [MATH: E=1/k∑i=1kEi< /mstyle> :MATH] The Gaussian rule is applied based on leave-one-out-cross-validation (LOOCV). In this case, the LOOCV procedure is considered the test set while the remaining data are used as a training set. LOOCV is a modified k-fold, where k = N and k(i) = i. For training and other testing, we applied N number of subsets. Increasing the number of samples would decrease errors and improve the validity ([58]18). For the validation and assessment of errors, the following formula was used: [MATH: E=1/N∑i=1NEi< /mstyle> :MATH] Gene Ontology and Pathway Enrichment Analysis Gene Ontology (GO) is a common categorization method used to define the key signaling pathways for the biological and molecular activities of DEGs and cellular components. To recognize the function and biological pathways of the T2DM-related DEGs, GO analysis was carried out using a GOnet online web server that shows the gene product and its biological functions ([59]21). The pathway enriched terms (p < 0.05) were analyzed using the FunRich tool, version 3.1.3 ([60]22). Identifying Transcription and Regulatory Motifs To develop a connected transcription network consisting of transcription factors (TFs) and other signaling molecules, we anticipated the possible regulators of T2DM-related DEGs using the FunRich online tool. These gene regulators have different biological and pathological roles ([61]23). The motifs are signatures of protein families that define the link between the secondary structural components of proteins, and, in every instance, the spatial sequences of the amino acid residues encoded by genes may be comparable in any order. We used the Motif Search online tool to analyze the relationship between the primary sequence and the tertiary structure of proteins to predict protein functions ([62]24). Mutation Analysis Mutations resulting from inherited and non-inherited diseases can be understood to decode genetic variations by the association of genotype with phenotype. There are hundreds of single nucleotide variations (SNVs) in the human genome, and many are known to develop diseases. Nearly 21% of the amino acid variations cause single nucleotide missense mutations at different protein sites (known as posttranslational modifications) that are responsible for disease development. Therefore, chemical alterations of amino acids mostly affect the functions of proteins. The online ActiveDriverDB tool was used to analyze the mutations of T2DM-related DEGs ([63]25). In this analysis, the needle plot shows a visual summary of the location, frequency, and functionality of all alterations found in our DEGs. Posttranslational modification (PTM) sites with all changes and the predicted disordered regions of the protein sequences were analyzed. The position of the pins in the plot correlates with the gene and protein sequences, while the associated mutation effect and PTMs may be observed in the figure legend. Protein Co-Expression Network Analysis Co-expression network analysis enables the identification of protein modules whose biological activities are characterized by expression patterns. These networks are the most versatile for probing diseases. It correlates at the transcript level, although it may also be used to analyze correlations at all biological levels. The STRING database version 11.0 ([64]26, [65]27) was used to study the protein–protein interactions (PPIs) of the top-ranked T2DM-related DEGs, and a protein co-expression network was constructed to highlight the important gene signatures directly or indirectly interacting with the DEGs. The protein network based on neighborhood scoring was constructed with high confidence (score > 0.99) ([66]26). In this co-expression network, the RNA levels and protein regulation were analyzed and annotated keywords (FDR < 0.05) were studied. RNA Extraction and Quantification Each 300 μl blood sample was transferred to a 700-μl triazole-containing Eppendorf tube (1.5 ml). These tubes were homogenized and incubated gently at 25°C for 5 min. Then, 400 µl of chloroform was added and kept for 3 min, followed by centrifugation at 12,000 rpm for 10 min at 4°C for phase separation. Sequentially, the aqueous top layer was taken into a new tube while keeping it on ice with equal proportions of isopropyl alcohol. These tubes were retained on ice at −20°C for 10 min in a horizontal position to precipitate RNA, followed by centrifugation at 12,000 rpm (4°C) for 10 min while the supernatant was discarded. The pellet was washed twice and air dried for 5 min using 1 ml of 70% ethanol at 7,500 rpm. Forty microliters of RNase-free water was added and then RNA was stored at −80°C ([67]28) using an RNA stabilizer. Finally, RNA was quantified at 260, 280, and 320 nm by Nanodrop (Skanit RE 4.1, Thermo Scientific, Waltham, MA, USA) ([68]29). cDNA Synthesis The isolated RNA was converted into cDNA using a cDNA synthesis kit (Vivantis cDSK01-050). As per the manufacturer’s protocol, each RNA primer of the DEGs was co-mixed with 10 µl of the cDNA synthesis mix. After centrifugation at 10,000 rpm, the samples were incubated at 40°C for 60 min. The tubes were then incubated at 85°C for 5 min to terminate the reaction. Finally, the tubes were chilled on ice and centrifuged at the same conditions. The synthesized cDNA was directly used for further analysis ([69]29, [70]30). Quantitative Real-Time PCR Analysis The 260:280 ratios between 1.5 and 2.7 indicated high-quality RNA with 800–1250 ng/μl quantity. RNA quantification presented a substantial level for further cDNA synthesis. The relative expressions of the T2DM-related genes SRR, PDE4B, and NFKB1 were studied and estimated based on the relative [MATH: 2−ΔΔ< mtext>CT :MATH] method. CT values were obtained by absolute quantification presenting quality and significant expression at the real level. We used PrimerBank, an online server, to design the primers of the T2DM-related DEGs ([71]31) ([72] Table 1 ). For optimization, gradient PCRs were carried out using a Galaxy-XP thermal cycler (Bioer, Hangzhou, China) at standard conditions. To validate the DEGs of T2DM, qRT-PCR was performed using MIC-PCR (Bio Molecular Systems, Upper Coomera, Australia) at optimized conditions ([73]32). A final volume of 10 µl of the reaction mixture was prepared using 2.6 µl of cDNA (1:10), and 5 µl of the SYBR Green Master Mix and 0.4 µl of each gene-specific reverse and forward primer were added. The final composition of the reaction mixture was kept for DEGs and the reference genes. GAPDH was used as the internal reference, and a two-step qPCR procedure was applied to measure the expressions of the test and reference genes. The relative expression level of each differential gene was calculated with the GAPDH expression level as a “1” standard value. The qPCR cycling conditions were as follows: denaturation at 95°C for 12 min, 40 cycles of 95°C for 15 s, 57°C for 20 s, and 72°C for 20 s. A final extension step was carried out at 72°C for 10 min ([74]33). In amplification, during each cycle, the amplified products were doubled in an exponential form. The PCR products were analyzed by melting the curve graph to assess the dissociation characteristics of double-stranded DNA during heating and to observe the absorbance intensity ([75]34). C [T] is a logarithmic value converted to a relative quantity ([76]32), and the average C [T] values were measured for both DEGs and the reference gene. In the next step, ΔC [T] (delta threshold) was calculated for the target and the reference gene, and the relative expressions of the T2DM-related DEGs were studied and estimated based on the relative 2^−ΔΔC[T] formula. Table 1. Primer sequences and amplicon size of the differentially expressed genes SRR, NFKB1, and PDE4B used in the qPCR reaction. Gene symbol Probe ID Protein name Forward primer Reverse primer Amplicon size (bp) SRR 219204_s_at Serine racemase ATGTGTGCTCAGTATTGCATCTC AAGATTGCGCCCTGTTAGTTG 126 NFKB1 201502_s_at Nuclear factor NF-kappa-B p105 AACAGAGAGGATTTCGTTTCCG TTTGACCTGAGGGTAAGACTTCT 104 PDE4B 203708_at cAMP-3′,5′-cyclic phosphodiesterase 4B AACGCTGGAGGAATTAGACTGG GCTCCCGGTTCAGCATTCT 110 GAPDH (reference gene) – Glyceraldehyde-3-phosphate dehydrogenase GGAGCGAGATCCCTCCAAAAT GGCTGTTGTCATACTTCTCATGG 197 [77]Open in a new tab [MATH: ΔCT=C< /mi>Tvalue of target gene−CTvalue of the internal reference gene (GAPDH)< /mrow> :MATH] The ΔΔC [T] (delta–delta threshold) specifies the differences between the expression levels of the T2DM-related DEGs and the reference gene ([78]35). Finally, [MATH: 2−ΔΔ< mtext>CT :MATH] was calculated, indicating the fold difference of the expressions of DEGs from that of the control ([79]36). We assessed the absolute correlation of these DEGs to show the complete expression levels. Hierarchical clustering analysis of genes to illustrate the expression pattern was observed ([80]37) using the online one-matrix CIMminer tool ([81]38). Results Normalization of Gene Expression Data and Cross-Validation To study the T2DM-related DEGs, we retrieved 20 publicly available Affymetrix cDNA datasets. Each dataset contains a number of distinct samples and genes obtained by messenger RNA (mRNA) expression profiling utilizing several Affymetrix T2DM platforms. The data were normalized missing values were corrected, and the normalized distance between the DNA chip array and the different arrays of each dataset reveals the quality of the arrays for a medium expression level. The genegene covariance matrix in all arrays of every dataset was calculated by removing the missing values to determine whether the arrays were on the same level. A quality histogram depicting the normalized intensity arrays of the entire DNA chip was constructed during the quantile standardization. Array intensity distributions show boxplots representing summaries of the signal intensity distributions of the arrays and normalization with similar positions and widths. The patterns revealed the distribution of the arrays having similar shapes and ranges ([82] Figure 1 ). From the list of 50 DEGs ([83] Figure 2 ), we identified and selected three DEGs of T2DM—SRR, NFKB1, and PDE4B—based on the FDR (<0.05), p-value (≤0.05), and logFC (>1) parameters. Figure 1. [84]Figure 1 [85]Open in a new tab Array intensity distributions showing boxplots representing summaries of the signal intensity distributions of the arrays and normalization with similar positions and widths. Each box corresponds to one array. Outlier detection was performed by computing the Kolmogorov–Smirnov K [a] statistic between each distribution of the array and the distribution of the pooled data. Figure 2. [86]Figure 2 [87]Open in a new tab Differential expressions of 50 genes obtained from microarray cDNA datasets. The heat map indicates the differential expressions of these genes in each type 2 diabetes mellitus (T2DM)-related dataset. An automated method was employed for comparing biologically similar groups in pairs. We excluded any subgroup without recurrence comparison, accuracy, and differential analysis verification and assessed errors of the cross-validation using the generalized linear model “cv.glm.” This generalized linear model normalizes the linear regression, enabling the linear model to be associated with the dependent variables through a correlation and allowing a function of the anticipated value of the variance of each measurement. The Gaussian dispersion criterion was 0.00459, indicating the degree of confidence ([88] Table 2 ). LOOCV is a type of cross-validation technique where each observation is treated as a validation set and the remaining observations (N − 1) as a training set. The model was fitted in LOOCV and a validation set was used to estimate it. With k-fold valuation, we found the same delta value of 0.0050 we used in the LOOCV approach. The significant codes (0.1, 0.01, 0.001, and 0.05) with residuals of low variance showed that the differential analysis was consistent. We evaluated the confidence of the dataset in the original samples in order to identify the variations at the transcription level. The normalization procedure was utilized to standardize the sampling methods and to evaluate the optimum RNA quality by utilizing refined measures to analyze statistics and algorithms. Table 2. K-fold cross-validation using the Bioconductor “boot” package based on Gaussian dispersion parameters. Estimate SE t value Pr(>|t|) (Intercept) 0.000107 0.000211 3.99 <1.00E^−13*** x [1] 0.030034 0.001602 20.018 <1.00E^−11*** x [2] −0.01041 0.001105 −4.017 <1.96E^−10*** x [3] 0.110112 0.003101 22.015 <1.00E^−13*** x [4] 0.110410 0.001212 20.200 <1.00E^−11*** x [5] 0.016013 0.002130 28.003 <1.00E^−10*** x [6] 0.131220 0.003361 21.012 <1.00E^−10*** x [7] −0.01201 0.001461 −21.112 <1.00E^−9*** x [8] 0.001212 0.002411 19.115 <1.00E^−12*** x [9] 0.102111 0.003802 62.0716 <1.00E^−13*** x [10] 0.010020 0.000500 4.001 <1.00E^−11*** x [11] 0.010521 0.001003 22.003 <1.00E^−11*** x [12] −0.01102 0.002014 −2.014 0.0068* x [13] −0.12421 0.002809 −50.023 <1.00E^−7*** x [14] 0.010021 0.001230 1.312 5.28E^−8*** x [15] −0.015581 0.001200 −17.102 <1.00E^−11*** [89]Open in a new tab Number of Fisher scoring iterations: 2; $K: [1] 10; $delta: [1] 0.00455 = 0.00459. Null deviance: 100,502.1 with 50,101 degrees of freedom. Residual deviance: 2,504.1 with 40,119 degrees of freedom. Signif. codes: 0.001 ‘**’. Gene Ontology and Pathway Enrichment Analysis The GO of the T2DM-related DEGs showed significantly enriched terms. SRR, PDE4B, and NFKB1 were directly and indirectly linked to cell morphogenesis, anatomical structure development, amino acid metabolism, biosynthetic and catabolic processes, mRNA processing, signal transduction, protein transport, transmembrane and intracellular transport, and other vital biological functions (p < 0.05). The molecular annotation of SRR showed associations with ATP, magnesium, phosphate, calcium, and serine–ammonia lyase binding, and threonine racemase activity. Similarly, at the molecular level, NFKB1 and PDE4B were involved in actinin chromatin binding, transcription regulation, and cAMP-binding phosphodiesterase activity. The role of the gene, its regulation, subtypes, and the cellular processes are essential in the comprehension of biological functions, and abnormalities in these processes can cause T2DM and other metabolic disturbances ([90] Figure 3A ). The pathway enrichment showed the role of valine, leucine, and isoleucine biosynthesis; glycine, serine, and threonine metabolism; the adipocytokine signaling pathway; the Hedgehog signaling pathway; and insulin and antifolate resistance in T2DM ([91] Figure 3B ). These DEGs associated with signaling pathways are critically linked to pathophysiological mechanisms. Figure 3. [92]Figure 3 [93]Open in a new tab (A) Gene Ontology network of the differentially expressed genes indicating important biological functions. (B) Pathway enrichment analysis showing the enriched terms associated with type 2 diabetes mellitus (T2DM). Transcription and Motif Analysis We identified the transcriptional factors of the T2DM-related DEGs, such as DMBX1, TAL1, ZFP161, NFIC (66.7%), and NR1H4 (33.3%), and others with a substantial p-value (<0.05) ([94] Figure 4A ). These motifs play a crucial role in the protein interactions among network components. These motifs help to discover TF binding sites and expression of gene regulation. The outcomes showed that the remodeling of a variety of motifs established the promiscuous protein characteristic resulting in different biological functions. We found eight motifs of the nuclear factor NF-kappa-B subunit (NFKB1) protein containing Pfam-annotated DNA binding, cell death, dimerization domain, and ankyrin repeats (proteins that facilitate the attachment of fundamental membrane proteins). Pfam annotation was predicted based on multiple sequence alignments and a hidden Markov model. Under diabetic conditions, the aberrant expression of NFKB1 ankyrin repeats dysregulate glucose homeostasis and protein kinase (AMP) activity. For the other DEGs, SRR contained PALP and DUF4126, while PDE4B had the PDEase_1 and PDE4_UCR motifs. These motifs are biologically associated with cyclic nucleotide phosphodiesterase and pyridoxal phosphate-dependent enzyme activity. The dysregulation of these transcriptional regulators changes the phosphodiesterase activity, resulting in insulin resistance and T2DM. The motifs were scanned with a significant cutoff value of e < 0.0002 at the default parameters ([95] Figure 4B ). Figure 4. Figure 4 [96]Open in a new tab (A) Transcriptional factors of the differentially expressed genes (DEGs) showing DMBX1, TAL1, ZFP161, NFIC, and other regulatory factors. (B) Motif analysis highlights a significant number of functional motifs associated with important biological functions. Mutation Analysis NFKB1 had 39 PTM sites with 119 repeated mutations at the positive strand of chromosome 4 encoding 968 protein residues, indicating 29.96% of the predicted disordered region. The mutation visualization plot displayed NFKB1 isoforms distal, proximal, direct, and network-rewiring mutation impacts between 450 and 900, with reference amino acid (AA) residues Q (Gln), D (Asp), N (Asn), M (Met), H (His), and A (Ala) and mutated amino acid residues E (Glu), K (Lys), R (Arg), D (Asp), T (Thr), and Y (Tyr), and others in the protein. The mutated site of NFKB1-Q900E at position 897 contained S amino acid residues enriched with a phosphorylation network-rewiring mutation impact. NFKB1 D442Y contained K residues with acetylation and ubiquitination proximal mutation impacts at the 440 position of a protein. On the other hand, PDE4B showed 38% disordered regions in a sequence. The PDE4B R196S isoform revealed the mutation at position 197 presenting amino acid residue R to the mutated amino acid residue S. At this position, the S amino acid residue sites enriched with phosphorylation indicated network-rewiring PTM impact. PDE4B-A200V contained the S residue with a phosphorylation proximal mutation impact at the 201 position of a protein. Similarly, the differentially expressed SRR gene showed a 4.12% disordered region with 44 mutations predicted at chromosome 17 on the positive strand involving 340 amino acid residues and seven PTM sites. The SRR R58G isoform showed the mutation at position 58 presenting amino acid residue R to the mutated amino acid residue G with ubiquitination, indicating proximal PTM impact ([97] Figure 5 ). Similarly, at position 203 of SRR proteins, the AA residue S was altered by N, indicating proximal PTM impact. Figure 5. [98]Figure 5 [99]Open in a new tab Mutation analysis of type 2 diabetes mellitus (T2DM)-related differentially expressed genes (DEGs) indicating posttranslational modifications with significant cutoff parameters. It highlights the significant disordered regions of the proteins contributing a pathophysiological role and to disease development. Protein Product Co-Expression Network Analysis The SRR, PDE4B, and NFKB1 genes were studied for possible interactions with each other using the STRING database. It was predicted that these DEGs would have significant interactions. The PPI network contained 33 numbers of nodes (each node indicates proteins), and the edges present the interactions. The PDE4B network showed the enriched co-expressed genes (PPI enrichment, p < 0.05) functionally associated with adenine phosphoribosyl transferase (APRT), deoxycytidine kinase (DCK), adenyl succinate lyase (ADSL), AMP phosphotransferase (AK3), protein phosphatase-1 regulatory subunit 1B (PPP1R1B), adenosine kinase (ADK), and cAMP-dependent protein kinase catalytic subunit alpha (PRKACA). PDE4B is directly connected to the SRR source gene via DISC1 (schizophrenia 1 protein). SRR is linked to d-amino-acid oxidase (DAO), l-serine ammonia-lyase (SDS), cystathionine beta-synthase-like protein (CBSL), phosphoserine phosphatase (PSPH), and other gene signatures. Based on the protein product co-expression data, we evaluated the expressions of the genes SRR, PDE4B, and NFKB1A and observed that these were downregulated. Similarly, it has been observed that NFKB1A interacts with important target proteins such as transcription factor A (RELA), TNFAIP3-interacting protein 2 (TNIP2), an inhibitor of nuclear factor kappa-B kinase subunit alpha (CHUK), an inhibitor of nuclear factor kappa B kinase subunit beta (IKBKB), and other gene signatures. The RNA co-expression pattern and protein co-regulation showed the significant level of association of the co-expressed genes with annotated keywords: hyperlipidemia, very low-density lipoprotein (VLDL), low-density lipoprotein (LDL), high-density lipoproteins (HDL), lipid metabolism, and disease mutation ([100] Figure 6 ). Figure 6. [101]Figure 6 [102]Open in a new tab Protein product co-expression network analysis using the STRING database, version 11.0. The protein network was calculated based on the neighborhood score with higher confidence (confidence score > 0.99). Nodes represent proteins and edges indicate interactions. The co-expression scores based on RNA expression patterns and protein co-regulation were studied and annotated keywords (FDR < 0.05) were observed. Validation qRT-PCR Assay and Expression Profiling The RT-PCR was consistent enough to prevent the reaction from displaying an increased fluorescence signal. The relative expression showed that SRR, NFKB1A, and PDE4B were downregulated with significant fold change values [MATH: (2− ΔΔCT) :MATH] compared to controls. These DEGs were aberrantly expressed and related to disease development and growth ([103] Figure 7A ). Based on the 2^−ΔΔC[T] method, we observed a substantial level (R ^2= 0.80, p < 0.05) of the relationship between the expression levels of the T2DM-related DEGs assessed by array analysis and the expression levels evaluated by individual qRT-PCR ([104] Figure 7B ). Figure 7. [105]Figure 7 [106]Open in a new tab (A) Abnormal expression levels of the differentially expressed genes (DEGs) in type 2 diabetes mellitus (T2DM) cases and controls based on the fold change of gene expression. (B) Quantitative real-time PCR (qRT-PCR) array validation of the DEGs. Plot graph shows the correlation between the expression levels of type 2 diabetes mellitus (T2DM)-related DEGs measured by array analysis and the expression levels measured by individual qRT-PCR. The 2^−ΔΔC[T] method was applied for this analysis. In our analysis, the cluster study specified the gene expression profiles of the cases and controls (two groups) with significant expression level differences. [107]Figure 8 shows that most of the samples have differential expression profiles compared to the left-hand dendrogram, where some genes have similar expression patterns ([108] Figure 8 ). The genes NFKB1A, SRR, and PDE4B were analyzed in 50 cases and controls, and the expression levels of the cases and controls (log fold values) were studied individually and indicated on a heat map. In this figure, the columns indicate the samples and the rows represent the DEGs. These genes may have a parallel biological function or contribute to the same physiological role. Genes with increased differential expression were termed as a gene cluster, and we found important gene clusters that were expressed differently in a number of samples (fold change ≥2 and p < 0.05). Figure 8. [109]Figure 8 [110]Open in a new tab Hierarchical cluster analysis heat map indicates the expression profiles of differentially expressed genes (SRR, NFKB1, and PDE4B). The columns in the figure represent the sample numbers and the rows represent the differentially expressed genes. Discussion In developing countries, T2DM risk factors are frequently recorded, and the prevalence of diabetes is increasing, yet its control worldwide is insufficient ([111]39, [112]40). Human gene expression has been shown to be of significance for the identification of phenotypic genetic determinants and for discovering complex genetic features. Many genes interact with the most prevalent conditions in individuals and thus need an integrative biological approach to resolve the complications and causes behind these issues ([113]41). Progress in microarray analysis enabled scientists to simultaneously examine a large number of genes and to uncover genetic evidence for different diseases ([114]42). In this study, publicly accessible cDNA datasets were analyzed to identify the DEGs for T2DM. It was revealed that SRR, PDE4B, and NFKB1 variants were involved in interactions with known T2DM-associated genes, including AK3, DISC1, ADK, ASPH, SDS, RELB, HDAC1, REL, and EP300. The dysregulation and functional abnormality of these differential genes have been investigated in T2DM development ([115]43–[116]45). The ontology of these genes and pathway analysis have demonstrated insulin resistance, insulin and adipokine signaling, amino acid metabolism, and T2DM. The amino acid metabolism, biosynthetic and catabolic processes, signal transduction, and other vital molecular functions were directly and indirectly associated with SRR, PDE4B, and NFKB1. We observed that PIk3, S1P1, endothelin, CALM1, TGFB, IL, STAT6, and the adipocytokine signaling pathways linked to these DEGs ([117]40, [118]44, [119]46) were regulated by transcriptional factors such as DMBX1, TAL1, ZFP161, NFIC, and NR1H4. The co-expression network showed the direct and indirect functional interactions of the gene signatures with the source genes. The diverse role of the family of NFKB1A genes has been studied in type 2 diabetes ([120]44) associated with impairment of glucose metabolism. NFKB1A is the gene expression regulatory factor for many pro-inflammatory proteins. Animal studies have shown that the dysregulated and increased activity of this gene causes the pathogenesis of insulin resistance and muscle atrophy ([121]47). It has been observed that amplified NFKB1A signaling may be involved in the pathogenesis of insulin resistance. The modified activity of this TF is connected to muscle loss and weakness ([122]48), common characteristics observed in diabetic individuals. Increased expressions of interleukin (IL)-1β and NFKB1 and enhanced infiltration of macrophages have been observed in pancreatic islets of patients with T2DM. Methylation of the NFKB1 gene was negatively correlated with the levels of IL-1Ra in individuals with T2DM ([123]49). cDNA differential expression analysis and integrative enrichment studies indicated the substantial association of IL-6, NFKB1, and PIK3CG with T2DM ([124]50). NFKB1, USF2, HINFP, MEF2A, and SRF are important genes that are differentially expressed in T2DM ([125]51). In another study, it has been observed that NFKB1 is engaged in the generation of mild inflammation and oxidative stress, which cause diabetic issues ([126]52). Similarly, we observed the association of the PDE4B gene with cyclic AMP and the SRC, INSRR, GRIN, DISC1, PRKACA, PRKACB, AK3, ADK, and APRT gene signatures, their dysregulation causing insulin resistance and type 2 diabetes ([127]53–[128]55). PDEs are a large family of phosphodiesterases that catalyze cAMP and cGMP to 5′-AMP and 5′-GMP, respectively. It plays a vital role in intracellular signaling pathways and is linked to the regulation of glucagon-like peptide-1 release ([129]56). The dysregulation of this enzyme leads to T2DM ([130]57), making it a potential therapeutic target. PDE4 controls the level of cAMP and forms cAMP signaling, which delicately directs signals from various environmental stimuli to specific microenvironments. PDE4-cAMP signaling dysregulation signifies an important pathophysiological path in a metabolic disorder, as shown by its important role in processes such as inflammation, lipid and glucose metabolism, hepatic steatosis, abnormal lipolysis, and disturbed neuroendocrine functions ([131]58). It has been observed that PDE3A, PDE3B, PDE4B, PDE4D, and PDE8B in rat islets and in INS-1E cells activate multiple signal pathways critical for pancreatic beta cell function, and their abnormalities are associated with insulin resistance ([132]53). It has been reported that PDE4B is a potential therapeutic target to treat obesity-related metabolic diseases. These findings from rodent studies showed that genetic and pharmacological variations in PDE4B lead to insulin resistance, insulin and adipocytokine secretion, and T2DM ([133]59). In another study, it has been observed that the serine racemase (SRR) gene is involved in type 2 diabetes ([134]60). GWAS have highlighted the polymorphic association of the SRR gene with T2DM ([135]61). This analysis has identified >100 loci independently contributing to T2DM risk, signifying the evidence for a role in insulin secretion of T2DM susceptibility genes, including PRC1, GRIN, SRR, ZFAND3, and ZFAND6 ([136]62). Collectively, we found that the genes SRR, PDE4B, and NFKB1A were directly and indirectly associated with insulin secretion, insulin resistance, and T2DM ([137]49, [138]58, [139]63). The pathway analysis diagram ([140] Figure 9 ) showed that source DEGs were primarily enriched in cAMP signaling, amino acid metabolism, protein biosynthesis, the MAPK signaling pathway, regulation of lipolysis in adipocytes, the PI3K/Akt signaling pathway, insulin resistance, and T2DM ([141]40, [142]57, [143]62, [144]64, [145]65). qPCR analysis was used to validate the outcomes of the differential analysis indicating the significant relationship of SRR, PDE4B, and NFKB1 with T2DM. A significant fold variation was observed in the target samples using the [MATH: 2−ΔΔ< mtext>CT :MATH] method ([146]66), and real-time expressions of the T2DM-related DEGs demonstrated substantial level expression compared to control. Figure 9. [147]Figure 9 [148]Open in a new tab The pathways analysis indicates the mechanism and functional association of source genes involved in the pathophysiology of insulin resistance and type 2 diabetes mellitus. Conclusions In this study, we found a significant correlation of the DEGs SRR, PDE4B, and NFKB1 with T2DM in cases compared to controls. Systems biology analysis of the cDNA data allowed us to discover potential therapeutic targets for T2DM, and our comprehensive and integrated steps were helpful in revealing the genome to phenome association with diabetes. The qPCR-based validation and expression profiling of these genes specified their abnormal expression and relates their pathological role in the disease process. These findings clearly demonstrate that the dysregulated expressions of selected genes are correlated with the pathophysiology of T2DM. Therefore, these genes would be considered as possible drug targets that would help update the therapeutic strategies for insulin resistance, T2DM, and other metabolic disorders. Data Availability Statement The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/[149] Supplementary Material . Ethics Statement The approval of the study and informed consent were obtained from the Research Ethical Committee of Institute of Molecular Biology and Biotechnology, BZ University, Multan (ref. no. IMBB/2019/002). The patients/participants provided written informed consent to participate in this study. Author Contributions WR and JG collected the materials and performed the work. MQ and SM designed the study, interpreted the data, and wrote the manuscript. MQ and SM supervised the study. BB interpreted the data. All authors contributed to the article and approved the submitted version. Funding This study was funded and supported by the Higher Education Commission (HEC) of Pakistan under Award Project no. 6913/Punjab/NRPU/R&D/HEC/2017. This funding body had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript. Conflict of Interest The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Publisher’s Note All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher. Supplementary Material The Supplementary Material for this article can be found online at: [150]https://www.frontiersin.org/articles/10.3389/fendo.2021.774696/ful l#supplementary-material Supplementary Table 1 List of cDNA datasets analyzed in this study. [151]Click here for additional data file.^ (14.7KB, docx) References