Abstract

With the increasing prevalence of diabetes mellitus worldwide, type 2 diabetes mellitus (T2D) combined with cognitive impairment and aging has become a common and important complication of diabetes. It seriously affects patients’ quality of life and imposes a heavy burden on their families and on society. There are currently no specific treatments for cognitive impairment and aging in T2D, so identifying potential biomarkers of T2D combined with cognitive impairment and aging is of great significance for future precision treatment. We downloaded three gene expression datasets from the GEO database: GSE161355 (T2D with cognitive impairment and aging), and GSE122063 and GSE5281 (Alzheimer’s disease). Differentially expressed genes (DEGs) were identified, followed by gene set enrichment analysis (GSEA). A protein-protein interaction (PPI) network was constructed using the STRING database, and the top 15 hub genes were identified using the CytoHubba plugin in Cytoscape. Core genes were then determined using three machine learning methods: LASSO regression, Support Vector Machine Recursive Feature Elimination (SVM-RFE), and Linear Discriminant Analysis (LDA). The diagnostic performance of these genes was assessed by ROC curve analysis and validated in an independent dataset (GSE5281). Ferroptosis-related regulatory genes were screened from the FerrDb database, and their biological functions were explored through GO and KEGG enrichment analyses. Finally, the CIBERSORT algorithm was used to analyze immune cell infiltration, the correlation between core genes and immune cell infiltration levels was calculated, and an mRNA-miRNA regulatory network was constructed. In the GSE161355 and GSE122063 datasets, 217 common DEGs were identified.
GSEA revealed that they were enriched in the PI3K-PLC-TRK signaling pathway, the TP53 regulation of metabolic genes pathway, and the Notch signaling pathway, among others. PPI network analysis identified 15 candidate core genes, and further selection using the LASSO, LDA, and SVM-RFE algorithms yielded 6 core genes: BCL6, TP53, HSP90AA1, CRYAB, IL1B, and DNAJB1. ROC curve analysis indicated that these genes had good diagnostic performance in the GSE161355 dataset: TP53 and IL1B achieved an AUC of 0.9, the highest predictive accuracy, while BCL6, HSP90AA1, CRYAB, and DNAJB1 had AUCs greater than 0.8, indicating moderate predictive accuracy. Validation in the independent dataset GSE5281 showed that these core genes also had diagnostic value in Alzheimer’s disease samples (AUC > 0.6). Ferroptosis-related analysis revealed that IL1B and TP53 play significant roles in apoptosis and the immune response. Immune cell infiltration analysis showed that IL1B is significantly positively correlated with the infiltration levels of monocytes and NK cells, while TP53 is significantly negatively correlated with the infiltration level of follicular helper T cells. The miRNA-mRNA regulatory network suggested that miR-150a-5p might play a key role in the regulation of T2D-associated cognitive impairment and aging by TP53. By integrating bioinformatics and machine learning methods, this study identified BCL6, TP53, HSP90AA1, CRYAB, IL1B, and DNAJB1 as potential diagnostic biomarkers for T2D with cognitive impairment and aging, and highlighted the significance of TP53 and IL1B in immune cell infiltration. These findings enhance our understanding of the molecular mechanisms linking type 2 diabetes to cognitive impairment and aging, provide new targets for early diagnosis and treatment, and offer new directions for basic research.
Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-74480-8.

Subject terms: Biochemistry, Medical research

Introduction

Type 2 diabetes mellitus (T2DM) is a common chronic metabolic disease. In 2019, 463 million people worldwide were diagnosed with T2DM, and this number is expected to rise to 700 million by 2045, a 51% increase^1. Cognitive impairment is a common complication of diabetes: people with diabetes are 1.5–2.0 times more likely than non-diabetics to have cognitive impairment or dementia^2, and the risk of developing cognitive impairment with diabetes is 2.25–2.91 times higher^3. Diabetic cognitive impairment (DCI) is a chronic complication of diabetes characterized by deficits in episodic memory, verbal ability and spatial memory^4. DCI can occur at all stages of diabetes; in the pre-diabetic stage, patients age 50% faster than normal and progress 50% faster to the next stage of cognitive impairment^5. Cognitive decline in T2DM patients mainly occurs in middle and old age. It has an insidious onset and a high disability rate, and in the late stage patients lose the ability to live independently and require care from others, which places a heavy economic and care burden on society and families^6. Epidemiological surveys^7 show that DCI is a non-communicable disease that affects nearly 400 million people, and its incidence is gradually increasing with population aging and with changes in dietary structure and lifestyle. Cognitive impairment is neurological damage caused by cerebral microangiopathy due to disorders of glucolipid metabolism, and is one of the serious clinical complications of T2DM. DCI is closely related to nerve damage, dysglycaemia, peripheral insulin resistance, blood-brain barrier damage, central insulin resistance, neuroinflammation and other factors^8.
There are currently no specific clinical treatments for cognitive impairment in type 2 diabetes mellitus. Cognitive impairment as a complication of diabetes has therefore gradually become a research hotspot, and effectively reducing diabetic cognitive impairment is of great significance. The pathogenesis of cognitive impairment in diabetes mellitus is complex and is often accompanied by neuronal loss and death. In recent years, several modes of neuronal cell death have been identified, including apoptosis, pyroptosis, ferroptosis, and autophagy^9–11. Insulin resistance and abnormal glucose metabolism, which are at the heart of diabetes, can cause an imbalance between neuronal oxidant and antioxidant activity, leading to activation of the NLRP3 inflammasome and, in turn, to cell death and neuroinflammation^12,13; targeting these processes may reduce the risk of cognitive impairment. Diabetic cognitive impairment may also be related to immuno-inflammation, oxidative stress, β-amyloid aggregation, abnormal phosphorylation of Tau protein, and alterations in the brain microenvironment. Clarifying the pathogenesis of diabetic cognitive impairment is therefore of great significance to the clinical prevention and treatment of dementia^14,15. In this study, we used bioinformatics to explore potential biomarkers of type 2 diabetes mellitus combined with cognitive impairment and aging and to elucidate its potential pathogenesis, with the aim of identifying new diagnostic and therapeutic targets for these patients. To date, however, few ferroptosis-associated diagnostic biomarkers in T2DM combined with cognitive impairment and aging have been explored by bioinformatics analysis.
Meanwhile, the mechanisms underlying abnormal ferroptosis metabolism and abnormal immune signalling in T2DM combined with cognitive impairment and aging are unclear. There is therefore an urgent need to investigate the roles of ferroptosis and immune status in the pathogenesis of T2DM combined with cognitive impairment and aging and to elucidate the associated signalling pathways. Such studies can enhance our understanding of the roles of ferroptosis and immune infiltration in the development of this condition and thus provide new molecular targets for its diagnosis and treatment. In this study, diagnostic genes for T2DM combined with cognitive impairment and aging were predicted by combining bioinformatics approaches related to immune infiltration and ferroptosis. In addition, external datasets were used as validation sets. The overall workflow is shown in Supplementary Fig. 1.

Materials and methods

Data collection and preprocessing

We obtained gene expression profile data related to “type 2 diabetes with cognitive impairment and aging” and “Alzheimer’s disease” from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) of the National Center for Biotechnology Information (NCBI). We retrieved three independent datasets and downloaded them from the GEO database using the GEOquery package (version 2.64.2): GSE161355, GSE122063, and GSE5281.

GSE161355 dataset: This dataset includes temporal lobe tissue samples from 6 cases of T2DM with cognitive impairment and aging and from 5 control cases. It directly reflects the gene expression profiles of T2D patients under conditions of cognitive impairment and aging.

GSE122063 dataset: This dataset contains temporal lobe autopsy tissue samples from 28 Alzheimer’s disease (AD) cases and 22 non-demented control cases.
This dataset is useful for studying specific aspects of cognitive impairment, particularly gene expression changes associated with AD.

GSE5281 dataset: This dataset includes brain tissue samples from 16 AD patients and 12 non-AD patients and was used as an external validation dataset. By performing validation analyses on this dataset, we assessed the stability and consistency of the key genes across different datasets and sample conditions, thereby enhancing the reliability of the study results (Supplementary Table 8).

The data were normalised again with the normalizeBetweenArrays function of limma [3.52.2]^16. The normalised datasets were subjected to Principal Component Analysis (PCA) in R (version 4.2.1) and plotted using ggplot2 [version 3.3.6]; the PCA plots were used to inspect clustering between sample subgroups. Differential analysis was then performed with the limma package^17.

Identification of common differentially expressed genes (DEGs)

The samples in GSE161355 and GSE122063 were analysed separately with the limma package [3.52.2]. In GSE161355, we obtained the DEGs between cases with type 2 diabetes combined with cognitive impairment and aging and control cases; in GSE122063, the same method was used to obtain the DEGs between AD and non-AD cases. The two result sets were then de-duplicated^18. P values were corrected for multiple hypothesis testing using the FDR, and genes with |log2FC| > 1 and P < 0.05 were considered statistically significant. To better understand the distribution of DEGs, the R package “ggplot2” was used to draw volcano plots and “ComplexHeatmap [version 2.13.1]” was used to draw heat maps^19. Finally, we used “VennDiagram [version 3.6.3]” to obtain the overlapping DEGs and the “ggplot2” package to plot the Venn diagram.
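The thresholding and overlap steps described above can be sketched in a few lines. The block below is a minimal Python illustration, not the authors' limma/R pipeline: the `GeneStat` record, the function names, and the toy values are hypothetical, and only the cut-offs (|log2FC| > 1, P < 0.05) come from the text.

```python
from typing import Dict, Iterable, NamedTuple, Set


class GeneStat(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    gene: str
    log2_fc: float
    p_value: float


def select_degs(rows: Iterable[GeneStat],
                lfc_cutoff: float = 1.0,
                p_cutoff: float = 0.05) -> Dict[str, str]:
    """Return {gene: 'up' | 'down'} for rows passing both thresholds."""
    selected: Dict[str, str] = {}
    for row in rows:
        if abs(row.log2_fc) > lfc_cutoff and row.p_value < p_cutoff:
            selected[row.gene] = "up" if row.log2_fc > 0 else "down"
    return selected


def common_degs(*deg_sets: Dict[str, str]) -> Set[str]:
    """Genes called as DEGs in every dataset (the Venn-diagram overlap)."""
    gene_sets = [set(d) for d in deg_sets]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy values for illustration only; they are not taken from the study.
dataset_a = [GeneStat("GENE1", 1.8, 0.001), GeneStat("GENE2", -1.2, 0.03),
             GeneStat("GENE3", 0.4, 0.001), GeneStat("GENE4", 2.1, 0.20)]
dataset_b = [GeneStat("GENE1", 1.1, 0.010), GeneStat("GENE2", 1.5, 0.04),
             GeneStat("GENE3", 1.9, 0.002)]

degs_a = select_degs(dataset_a)
degs_b = select_degs(dataset_b)
shared = common_degs(degs_a, degs_b)
print(sorted(shared))
```

Note that this overlap is taken on gene names only, so a gene that is up-regulated in one dataset and down-regulated in the other (GENE2 in the toy data) still counts as shared; whether the study required a consistent direction is not stated in the text.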
GSEA enrichment analysis

To explore the biological functions and signalling pathways of genes in type 2 diabetes combined with cognitive impairment and aging, we used the clusterProfiler package [4.4.4]^20 for gene set enrichment analysis (GSEA)^21. The species was Homo sapiens, and c2.cp.v7.2.symbols.gmt [Curated] from the MSigDB Collections gene set database was used as the reference gene set. Gene sets with an absolute normalised enrichment score |NES| > 1, false discovery rate (FDR) < 0.25 and p.adjust < 0.05 were considered significantly enriched. The corresponding enriched pathways were screened out, together with the core genes that play key roles in these pathways.

Protein-protein interaction (PPI) networks of common DEGs

The STRING database (https://string-db.org/) was used to construct and evaluate the PPI network^22. The common DEGs screened in this study were imported into STRING, whose analysis tools allow the potential connections between these DEGs to be explored further. Interactions with a combined score > 0.4 (medium confidence) were imported into Cytoscape (version 3.9.1) for network analysis and visualisation^23. The MCC algorithm of the CytoHubba plugin was applied to select the top 15 genes at key positions in the PPI network, which were labelled as hub genes^24,25. Pairwise correlations between hub genes were estimated with the Spearman method using igraph [1.3.4] and ggraph [2.1.0], and the results were visualised as chord diagrams^26.
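To make the hub-gene ranking step more concrete, the sketch below filters edges by combined score and scores nodes with Maximal Clique Centrality (MCC). It assumes the commonly cited definition of MCC, in which a node's score is the sum of (|C| − 1)! over all maximal cliques C that contain it; this is an illustration in Python, not the CytoHubba implementation, and the edge list, node names, and function names are hypothetical.

```python
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str, float]],
                min_score: float = 0.4) -> Dict[str, Set[str]]:
    """Undirected adjacency map keeping edges with score > min_score."""
    graph: Dict[str, Set[str]] = {}
    for a, b, score in edges:
        if score > min_score and a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}   # v has been fully explored
            x = x | {v}   # exclude v from later cliques at this level

    expand(set(), set(graph), set())
    return cliques


def mcc_scores(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assumed MCC definition: sum of (|C| - 1)! over maximal cliques."""
    scores = {node: 0 for node in graph}
    for clique in maximal_cliques(graph):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


# Hypothetical interactions (protein A, protein B, combined score).
toy_edges = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
             ("C", "D", 0.5), ("D", "E", 0.3)]
graph = build_graph(toy_edges)
scores = mcc_scores(graph)
ranking = sorted(scores, key=lambda n: (-scores[n], n))
print(ranking[:3])
```

In the toy network the D-E edge falls below the 0.4 cut-off, so E never enters the graph, and C ranks first because it belongs to both the A-B-C triangle and the C-D edge. Taking the first 15 entries of such a ranking would correspond to the "top 15 hub genes" step in the text.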
Identification and evaluation of diagnostic biomarkers associated with type 2 diabetes mellitus combined with cognitive impairment and aging

To further refine the list of hub DEGs, we independently applied three methods to the top 15 hub genes: LASSO, SVM-RFE and LDA. The LASSO regression model was constructed on the cleaned data using the glmnet package [4.1.7]^27. We first performed LASSO coefficient screening with ten-fold cross-validation (seed: 2022) to obtain the lambda values together with the corresponding likelihood values or classification error rates. The optimal parameter (λ) was determined by 10-fold cross-validation, and the partial likelihood deviance was plotted against log(λ). Visualisation was also performed with the glmnet package [4.1.7]. In parallel, we used support vector machine recursive feature elimination (SVM-RFE; e1071 package [1.7-13])^28,29. By adding a feature ranking step to the outer layer of cross-validation, the algorithm obtains an unbiased estimate of the generalisation error^30. The basic idea is that, in each iteration, the algorithm (1) trains a simple linear SVM model, (2) ranks the features by their weights in the SVM solution, and (3) eliminates the feature with the lowest weight. Linear discriminant analysis (LDA) is also a widely used feature engineering technique that seeks one or more linear combinations of features to better handle binary or multi-class classification problems^31,32. We ran a recursive feature elimination algorithm with the auxiliary function ldaFuncs from the caret package [6.0-94]^33 (repeated cross-validation, repeats = 10, number = 10). Finally, the intersection of the LASSO, SVM-RFE and LDA results was taken to obtain the diagnostic biomarkers related to type 2 diabetes combined with cognitive impairment and aging, which are presented in a Venn diagram.
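The final intersection step can be checked directly against the gene lists that this paper reports for each method in its Results (Fig. 4). The short Python sketch below only reproduces the set arithmetic; it does not rerun LASSO, SVM-RFE or LDA, and the variable names are illustrative.

```python
from collections import Counter

# Gene lists as reported in the Results section of this paper.
lasso = {"BCL6", "CD44", "CRYAB", "DNAJA1", "DNAJB1", "FCGR2C",
         "HSP90AA1", "IL1B", "KITLG", "NLRP3", "TGFBR1", "TP53"}
lda = {"ITGAM", "BCL6", "DNAJB1", "IL1B", "HSPB8", "CRYAB", "TP53",
       "CD44", "SELL", "DNAJA1", "HSP90AA1", "FCGR2A", "KITLG"}
svm_rfe = {"ITGAM", "HSPB8", "TP53", "CRYAB", "HSP90AA1", "FCGR2A",
           "DNAJB1", "TGFBR1", "IL1B", "BCL6", "SELL"}

# Genes retained by all three methods (centre of the Venn diagram).
core_genes = sorted(lasso & lda & svm_rfe)

# How many of the three methods selected each gene.
votes = Counter(gene for method in (lasso, lda, svm_rfe) for gene in method)

print(core_genes)
print({gene: n for gene, n in votes.items() if n == 2})
```

The intersection recovers the six core genes named in the paper. The vote counts also show which candidates narrowly missed: for example, CD44 is selected by LASSO and LDA but not by SVM-RFE, and TGFBR1 by LASSO and SVM-RFE but not by LDA.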
Statistical analysis and differential expression between groups of type 2 diabetes mellitus combined with cognitive impairment and aging hub genes

To observe differences in the diagnostic genes between the experimental group and the control group, statistical analyses were carried out in R (version 4.2.1), and differences between groups were assessed using Student’s t-test or Welch’s t-test. Data were tested for normality and homogeneity of variance. If the data were close to a normal distribution (P > 0.05), a t-test was used: when the variances of the two groups were equal (P > 0.05), the independent-samples t-test was applied, and when the variances were statistically unequal (P < 0.05), Welch’s t-test was applied. Finally, combined dot plots, box plots, and violin plots were drawn with ggplot2, with the following significance markers: ns denotes p ≥ 0.05; “*” denotes p < 0.05.

ROC analysis of genes associated with type 2 diabetes combined with cognitive impairment and aging

In the GSE161355 and GSE5281 datasets, ROC curves were analysed with the pROC package^34 to determine the sensitivity and specificity of the above genes, to obtain the ROC-related statistics of each variable at its cut-off value, and to assess the accuracy of the genes for the diagnosis of type 2 diabetes mellitus combined with cognitive impairment and aging. The GSE161355 dataset was used to screen and identify the target genes and to establish ROC curves, with area under the ROC curve (AUC) values between 0.5 and 1. The closer the AUC is to 1, the better the diagnostic performance: accuracy is low for an AUC of 0.5 to 0.7, medium for 0.7 to 0.9, and high for 0.9 and above.
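As an illustration of how an AUC and its accuracy band can be derived for a single gene, the following Python sketch computes the AUC as the probability that a randomly chosen case has a higher expression value than a randomly chosen control (ties count one half), which is equivalent to the area under the empirical ROC curve. It is not the pROC implementation: it assumes higher expression in cases, the expression values are invented, and assigning the boundary value 0.7 to the "medium" band is an assumption, since the text leaves that boundary ambiguous.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison of cases (1) against controls (0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties contribute half a win
    return wins / (len(cases) * len(controls))


def accuracy_band(auc: float) -> str:
    """Bands from the text; 0.7 is treated as 'medium' (assumption)."""
    if auc >= 0.9:
        return "high"
    if auc >= 0.7:
        return "medium"
    if auc >= 0.5:
        return "low"
    return "below chance"


# Invented expression values: 6 cases and 5 controls, matching only the
# group sizes of GSE161355, not its data.
expression = [7.1, 6.8, 6.5, 6.9, 7.4, 5.9, 5.8, 6.0, 6.2, 5.5, 6.6]
group = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

auc = roc_auc(expression, group)
print(round(auc, 3), accuracy_band(auc))
```

With 6 cases and 5 controls there are only 30 case-control pairs, so the AUC can change only in steps of 1/30 (or 1/60 with ties); estimates from groups this small are therefore coarse.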
Subsequently, external validation was performed using the independent dataset GSE5281, which includes 16 brain tissue samples from Alzheimer’s disease (AD) patients and 12 brain tissue samples from non-AD patients. The results were again quantified as the area under the ROC curve (AUC), and genes with AUC > 0.6 were considered diagnostic and visualised using ggplot2. Using six independent variables from the data, we constructed a logistic regression model with a Generalized Linear Model (GLM). We then applied the pROC package to perform ROC analysis and evaluate the combined effect of the indicators. The results were visualised using the ggplot2 package.

Screening of ferroptosis-related genes associated with type 2 diabetes combined with cognitive impairment and aging

The ferroptosis-related genes database FerrDb (http://www.zhounan.org/ferrdb/current/) was used to view the regulators among differentially expressed ferroptosis-related genes (DE-FRGs) in diabetes and to create the DEG landscape. Ferroptosis-related genes (FRGs) were obtained from the same database and include 369 driver genes, 348 suppressor genes, 11 marker genes, and 116 unclassified genes. The unique and shared parts of the FRGs and the top 15 hub genes were then analysed with an UpSet plot and finally intersected with the machine-learning-derived diagnostic biomarkers to obtain the ferroptosis genes related to T2DM combined with cognitive impairment and aging; the results were visualised with the ggplot2 package. The correlations of these genes at the RNA and protein levels were also examined. Subsequently, GO and KEGG enrichment analyses^20–35 and GSEA were performed to assess their biological functions and related pathways.
Enrichment analysis of ferroptosis genes associated with T2DM combined with cognitive impairment and aging

Gene Ontology (GO) functional enrichment analysis, with Homo sapiens as the background, was carried out using the DAVID online database (https://david.ncifcrf.gov/). Gene functions were annotated and classified as biological process (BP), cellular component (CC), and molecular function (MF), while Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was used to identify the biological pathways that may be involved^36–38. We used the clusterProfiler package [4.4.4] for enrichment analysis^39, the org.Hs.eg.db package for ID conversion, and the GOplot package [1.0.2] to calculate z-score values^40. The minimum and maximum gene set sizes were set to 10 and 500, and P < 0.05 and FDR < 0.2 were considered statistically significant, in order to obtain the major enriched functions and pathways of the ferroptosis genes related to T2DM combined with cognitive impairment and aging^41. Finally, we drew bubble plots, heat maps, and clustering trees, and used the ggplot2 package to visualise the results of the enrichment analysis.

Evaluation of immune cell infiltration

The CIBERSORT algorithm^42,43, run with the CIBERSORT.R script, is a deconvolution algorithm based on linear support vector regression that is applied to the expression matrix of human immune cell subtypes.
We used the signature matrix provided by the CIBERSORTx website (https://cibersortx.stanford.edu/), LM22.txt, a collection of gene expression signatures for 22 immune cell subtypes consisting of 547 genes and covering seven T cell types, two B cell types, plasma cells, NK cells, myeloid cell subsets, and the granulocyte lineage. With it, we calculated the proportions of the 22 immune cell types in temporal lobe tissue from patients with T2DM combined with cognitive impairment and aging versus controls. All analyses and visualisations were performed in R 4.2.1. We used the “ggplot2” package^44 to draw dot plots, box plots and violin plots to visualise the differences in immune cell infiltration between the disease and control groups. We also used the dist function to calculate the distances between samples and the hclust function to build a clustering model of the samples, which was visualised with ggplot2. The cleaned data were statistically analysed to obtain the proportion of each immune cell subpopulation in each sample and to compare the distribution of the 22 immune cell subpopulations in temporal lobe tissue between patients with T2DM combined with cognitive impairment and aging and normal controls; the percentage data were visualised as stacked bar charts using the ggplot2 package. Finally, the correlation between each DE-FRG and the immune cells was analysed individually using the Spearman method, and the results were visualised with ggplot2 as lollipop plots, correlation heatmaps, and correlation scatter plots.
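For readers unfamiliar with the Spearman method used in the last step, the sketch below shows one way to compute it: convert both variables to ranks (averaging ranks over ties) and take the Pearson correlation of the ranks. This is a self-contained Python illustration rather than the R code used in the study, and the gene expression values and immune cell fractions are invented.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-sample values: expression of one gene and the estimated
# fraction of one immune cell type in the same five samples.
gene_expression = [2.1, 3.4, 1.0, 5.6, 4.2]
cell_fraction = [0.05, 0.08, 0.02, 0.20, 0.08]

rho = spearman(gene_expression, cell_fraction)
print(round(rho, 3))
```

Because it works on ranks, the coefficient depends only on the ordering of the samples, which makes it a reasonable choice for cell fractions that are bounded and often skewed. A significance test for rho is not included in this sketch.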
Construction of mRNA-miRNA regulatory networks and prediction of key miRNAs

mRNA-miRNA interactions were predicted using the miRWalk database^45 (http://mirwalk.umm.uni-heidelberg.de/; October 2023 update^46), together with the miRDB database (http://www.mirdb.org), the miRTarBase database (http://miRTarBase.mbc.nctu.edu.tw/), and the TargetScan database (http://www.TargetScan.org). Candidate miRNAs were taken from the intersection of the four databases^47,48, from which we screened the important miRNAs and mRNAs and visualised them with Cytoscape [3.9.1].

Results

Screening for co-expression of DEGs in type 2 diabetes combined with cognitive impairment and aging

The datasets GSE161355 and GSE122063 were downloaded from the GEO database and normalised separately. Principal Component Analysis (PCA) was then performed and the clustering was displayed as scatter plots; both datasets showed clear clustering. In the GSE161355 dataset, PC1 accounted for 34.3% and PC2 for 23.5% (Fig. 1A). In the GSE122063 dataset, PC1 accounted for 19.8% and PC2 for 6.3%, with significant differences between subgroups (Fig. 1B). Using |log2(FC)| > 1 and p-value < 0.05 as the screening threshold, the volcano plot showed 933 differentially expressed genes in the GSE161355 dataset, of which 526 were up-regulated and 407 were down-regulated (Fig. 1C). With the same threshold, 7986 differentially expressed genes were identified in the GSE122063 dataset, of which 3387 were up-regulated and 4599 were down-regulated (Fig. 1D). The heatmaps show the top 20 up-regulated and top 20 down-regulated genes in each of the two datasets (Fig. 1E and F).
Finally, the differential genes in GSE161355 and GSE122063 that satisfied the screening conditions were intersected, and the Venn diagram showed that 217 common DEGs were obtained (Fig. 1G).

Figure 1. Associated genes and Venn diagrams. (A-B) Principal components analysis (PCA) of GSE161355 and GSE122063. (C-D) Volcano plots of differentially expressed mRNAs, |logFC| > 1, adj. P < 0.05. (C) Differentially expressed genes of GSE161355. (D) Differentially expressed genes of GSE122063. Up-regulated genes are shown in red, down-regulated genes are shown in blue, and genes with no significant difference are shown in grey. (E-F) Heatmaps of the 2 datasets. (E) An expression heat map of the top 40 DEGs in the GSE161355 dataset. (F) An expression heat map of the top 40 DEGs in the GSE122063 dataset. The up-regulated genes are indicated as red dots; the down-regulated genes are indicated as blue dots; genes without significant differences are indicated as white dots. (G) Venn diagram showing the number of shared differentially co-expressed genes identified between the 2 groups.

Results of GSEA enrichment analysis

We analysed the genes in the expression profiles at the overall level with GSEA using the Molecular Signatures Database, and explored the potential functional pathways of the DEGs by enrichment analysis using the clusterProfiler R package. In total, 476 gene sets were significantly enriched at a threshold of false discovery rate (FDR) < 0.25 and p.adjust < 0.05 (Fig. 2A-D).
The results of the GSEA enrichment analysis showed that the DEGs were mainly enriched in the PI3K-PLC-TRK Signaling Pathway; Interleukin-1 Family Signaling Pathway; Glycolysis in Senescence Pathway; Robo Receptor Signaling Pathway; TP53 Regulation of Metabolic Genes Pathway; Nervous System Development Pathway; Notch Signaling Pathway; MAPK6 and MAPK4 Signaling Pathway; Apoptosis Pathway; and TGF-β Signaling Pathway, among others.

Figure 2. GSEA enrichment analysis: mountain range map.

Construction of PPI network and identification of hub genes

To better understand the interactions between the above co-expressed DEGs, we used STRING to construct a PPI network of the 217 DEGs (Fig. 3A), imported the results into Cytoscape v3.9.1, and used the CytoHubba plugin to identify the top 15 hub genes based on their MCC values: IL1B, CD44, TP53, HSP90AA1, ITGAM, DNAJB1, DNAJA1, BCL6, KITLG, HSPB8, CRYAB, SELL, TGFBR1, NLRP3, and FCGR2A (Fig. 3B). Next, we investigated the interrelationships between the 15 key genes, represented as 15 nodes, where the scale lines on each node show the strength of the relationship between that gene and the other genes, and the width of each chord shows the strength of the correlation between two genes (Fig. 3C).

Figure 3. (A) Protein-protein interaction network. The nodes represent proteins, and the edges represent the interactions between proteins. The color of the nodes, ranging from dark to light, represents a decrease in degree values. (B) Hub gene identification. (C) Chord diagram showing relationships between significant correlations.

Multiple machine learning algorithms to screen for genes associated with T2DM combined with cognitive impairment and aging

We screened the hub genes to identify diagnostic biomarkers of T2DM combined with cognitive impairment and aging using LASSO, LDA and SVM-RFE.
Twelve significant variables were obtained by LASSO: BCL6, CD44, CRYAB, DNAJA1, DNAJB1, FCGR2C, HSP90AA1, IL1B, KITLG, NLRP3, TGFBR1, and TP53 (Fig. 4A and B). Using LDA, Kappa and Accuracy both reached their maximum values, 0.839 and 0.968 respectively, when the feature subset contained 13 features, suggesting high cross-validation consistency and model prediction performance at this point. The thirteen features were ITGAM, BCL6, DNAJB1, IL1B, HSPB8, CRYAB, TP53, CD44, SELL, DNAJA1, HSP90AA1, FCGR2A, and KITLG (Fig. 4D and E). For SVM-RFE, we plotted the generalisation error against the number of selected features, where for each feature count the final classifier was trained on those features and applied to the test set; the results were averaged over 10-fold cross-validation to ensure stability. Notably, the CV error had already dropped to 0 at n = 5, and adding further features did not improve performance. To retain as many valuable features as possible, we chose n = 11 as the optimal SVM-RFE subset: ITGAM, HSPB8, TP53, CRYAB, HSP90AA1, FCGR2A, DNAJB1, TGFBR1, IL1B, BCL6, and SELL (Fig. 4F). The results of the 3 machine learning methods were intersected to obtain 6 overlapping hub genes: BCL6, TP53, HSP90AA1, CRYAB, IL1B and DNAJB1 (Fig. 4C).

Figure 4. Screening hub DEGs. (A) LASSO regression of 15 hub genes. (B) Cross-validation of parameter selection in LASSO regression. (C) Venn diagram of LASSO, SVM-RFE and LDA results. (D) Repeated cross-validation Kappa values corresponding to different numbers of optimal features for the LDA-based RFE algorithm. (E) Repeated cross-validation Accuracy corresponding to different numbers of optimal features for the LDA-based RFE algorithm.
(F) 10-fold cross-validation generalisation error corresponding to different numbers of optimal features for the SVM-RFE algorithm.

Expression of target genes associated with T2DM combined with cognitive impairment and aging and evaluation of their diagnostic performance

We used the dataset GSE161355 as a training set to examine the expression of the six hub genes. The expression of BCL6, TP53, HSP90AA1, CRYAB, IL1B and DNAJB1 was higher in the T2DM combined with cognitive impairment and aging group than in the control group, and the difference was statistically significant (p < 0.05) (Fig. 5A-F) (supplementary material: Table 1). We then used ROC curves to assess the diagnostic performance of these six target genes for T2DM combined with cognitive impairment and aging. The area under the ROC curve took values between 0.5 and 1, and the closer the AUC was to 1, the better the diagnostic performance. In the dataset GSE161355, TP53 and IL1B had an AUC of 0.9, indicating high predictive accuracy, while BCL6, HSP90AA1, CRYAB, and DNAJB1 had AUCs > 0.8, indicating moderate predictive accuracy (Fig. 5G-L) (supplementary material: Table 2). Subsequently, we performed external validation using an independent dataset, GSE5281, which consisted of 16 brain tissue samples from Alzheimer’s disease (AD) patients and 12 brain tissue samples from non-AD patients, to create the validation ROC curves. In this independent external dataset, BCL6, TP53, HSP90AA1, CRYAB, IL1B, and DNAJB1 were likewise diagnostic (AUC > 0.6), which was consistent with the predicted results (Fig. 6A-F) (supplementary material: Table 3). Finally, we used two datasets to generate the receiver operating characteristic (ROC) curves for the combined hub genes.
The ROC curves demonstrated good discriminative ability, with an AUC of 0.959 (95% CI: 0.914–1.000) in the GSE122063 dataset and an AUC of 0.859 (95% CI: 0.722–0.997) in the GSE5281 dataset (Supplementary Fig. 2A and 2B).

Figure 5. Comparison and ROC curves of hub gene expression in T2DM combined with cognitive impairment and aging versus healthy samples. (A-F) Comparison of the expression of the hub genes in T2DM with cognitive impairment and aging versus healthy samples. BCL6 (A, G), TP53 (B, H), HSP90AA1 (C, I), CRYAB (D, J), IL1B (E, K), DNAJB1 (F, L). (*p < 0.05; **p < 0.01). AUC > 0.8000 for the 6 hub genes (BCL6, TP53, HSP90AA1, CRYAB, IL1B, DNAJB1). Abbreviations: AUC, area under the curve; TPR, true positive rate; FPR, false positive rate.

Figure 6. The ROC curves for this model in the validation set. Note: the 6 hub genes (BCL6, TP53, HSP90AA1, CRYAB, IL1B, DNAJB1) have AUC > 0.6000.

Screening of genes for ferroptosis associated with type 2 diabetes combined with cognitive impairment and aging

From the FerrDb database (http://www.zhounan.org/ferrdb/current/), we observed the DE-FRG regulators in diabetes mellitus (Fig. 7A and B). The UpSet plot showed TGFBR1, CD44, TP53 and IL1B as diagnostic biomarkers associated with both ferroptosis and type 2 diabetes mellitus combined with cognitive impairment and aging (Fig. 7C). Using correlation analysis, the correlations between TGFBR1, CD44, TP53 and IL1B in the CNS were examined at the RNA level and the protein level, respectively (Fig. 7D-G).
According to the GO analysis, these four genes were mainly involved in biological processes including regulation of apoptotic signalling pathway; leukocyte cell-cell adhesion; leukocyte activation involved in immune response; T cell differentiation; cell growth; T cell activation involved in immune response; regulation of intrinsic apoptotic signalling pathway by p53 class mediator; and positive regulation of MAPK cascade (Fig. [165]8A-C; Table [166]1). According to KEGG enrichment analysis, IL1B was mainly involved in Th17 cell differentiation, the NOD-like receptor signalling pathway and Hematopoietic cell lineage, among others. IL1B and TP53 were jointly involved in the MAPK signalling pathway, Fluid shear stress and atherosclerosis, Lipid and atherosclerosis, and Pancreatic cancer, among others (Fig. [167]8A-C; Table [168]1). In the GSEA enrichment analysis, IL1B was mainly involved in Overview of Proinflammatory and Profibrotic Mediators; Cytokine-Cytokine Receptor Interaction Pathway; BioCarta IL-1 Receptor Pathway; Toll-Like Receptor Signaling Pathway; Signal Transduction through IL-1 Receptor, etc. (Fig. [169]8D-H). TP53 was mainly involved in Notch Signaling Pathway; Programmed Cell Death Pathway; TP53 Regulation of Metabolic Genes Pathway; Signaling by Interleukins Pathway; TGF-Beta Signaling Pathway; MYD88-Independent TLR4 Cascade; Glycolysis in Senescence Pathway; p53 Hypoxia Pathway; Regulation of TP53 Activity through Acetylation Pathway; Wnt Signaling Pathway; Toll-Like Receptor 9 (TLR9) Cascade, etc. (Fig. [170]8I-S).
Figure 7. (A-B) DE-FRG regulators in diabetes. (C) UpSet plot showing the number of genes shared between the top 15 hub genes and Ferroptosis-related genes. (D-G) The correlation between RNA levels and protein levels of TGFBR1, CD44, TP53 and IL1B.
Figure 8. Enrichment analysis. (A) GO/KEGG categories and pathways. (B)
The heat map visualises the biological processes and pathways assessed by GO terms and KEGG pathways. (C) Clustering tree: enrichment clustering results for TGFBR1, CD44, TP53 and IL1B. (D-S) GSEA enrichment analysis of IL1B and TP53.
Table 1. Results of GO biological process and KEGG analysis.
Ontology | ID | Description | GeneRatio | pvalue | p.adjust | qvalue
BP | GO:0006986 | response to unfolded protein | 4/15 | 3.67e-06 | 0.0028 | 0.0011
BP | GO:0035966 | response to topologically incorrect protein | 4/15 | 6.41e-06 | 0.0028 | 0.0011
BP | GO:2001233 | regulation of apoptotic signalling pathway | 5/15 | 7.34e-06 | 0.0028 | 0.0011
CC | GO:0030667 | secretory granule membrane | 4/15 | 7.49e-05 | 0.0063 | 0.0044
CC | GO:0005657 | replication fork | 2/15 | 0.0009 | 0.0396 | 0.0278
CC | GO:0043025 | neuronal cell body | 3/15 | 0.0054 | 0.0939 | 0.0659
MF | GO:0051082 | unfolded protein binding | 4/15 | 2.29e-06 | 0.0002 | 9.89e-05
MF | GO:0051087 | chaperone binding | 3/15 | 8.03e-05 | 0.0041 | 0.0017
MF | GO:0031072 | heat shock protein binding | 3/15 | 0.0001 | 0.0043 | 0.0018
KEGG | hsa04640 | Hematopoietic cell lineage | 4/14 | 1.86e-05 | 0.0022 | 0.0014
KEGG | hsa04141 | Protein processing in endoplasmic reticulum | 4/14 | 0.0002 | 0.0080 | 0.0052
KEGG | hsa05133 | Pertussis | 3/14 | 0.0003 | 0.0080 | 0.0052
Analysis and assessment of the association of T2DM combined with cognitive impairment and aging-associated Ferroptosis genes with immune-infiltrating cells
The CIBERSORT algorithm was used to estimate the fraction of immune cells in samples of T2DM combined with cognitive impairment and senescence versus control samples, and to infer the composition of immune cells in each tissue. The 22 immune cell subtypes comprised B cells (B cells naïve, B cells memory); Plasma cells; T cells (T cells CD8, T cells CD4 naive, T cells CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), T cells gamma delta); NK cells (NK cells resting, NK cells activated); myeloid cells (Monocytes, Macrophages M0, Macrophages M1, Macrophages M2, Dendritic cells resting, Dendritic cells activated, Mast cells resting, Mast cells activated); and granulocytes (Eosinophils, Neutrophils). We found that the degree of infiltration of many immune cells differed between the two groups. Violin plots and box plots of immune cell infiltration showed that T cells CD4 memory activated, NK cells resting, NK cells activated and Dendritic cells activated were all present at higher levels in the T2DM combined with cognitive impairment and senescence group than in the control group (p < 0.05), while Monocytes were significantly lower (p < 0.01) (Fig. [176]9A). We also found that, within the T2DM combined with cognitive impairment group, the proportion of immune cells differed between samples (Fig. [177]9B). The Spearman correlation coefficient between the hub genes and the infiltration level of each immune cell type was then calculated. A positive coefficient indicates a positive correlation between the two variables and a negative coefficient indicates a negative correlation; an absolute value of 0.5–0.8 represents a moderate correlation and 0.8–1 a strong correlation; P < 0.05 was considered statistically significant. In patients with T2DM combined with cognitive impairment and aging, the infiltration levels of Monocytes (R = 0.886, p < 0.05) and NK cells activated (R = 0.820, p < 0.05) were strongly and positively correlated with the expression of IL1B (Fig.
[178]9C and E-G) (supplementary material: Tables [179]4 and [180]6); the infiltration level of T cells follicular helper (R = -0.943, p < 0.05) was negatively and significantly correlated with TP53 expression (Fig. [181]9D, E and H) (supplementary material: Tables [182]5 and [183]7).
Figure 9. Assessment of immune cell infiltration. (A) Violin plot showing the difference in immune cell infiltration between the two groups calculated by the CIBERSORT algorithm. (B) Stacked bar chart showing the proportion of infiltrated immune cells calculated by the CIBERSORT algorithm. (C) Lollipop plot showing the correlation between IL1B and immune cells. (D) Lollipop plot showing the correlation between TP53 and immune cells. (E) Heatmap of the correlation between CD44, IL1B, TGFBR1, TP53 and immune cells in T2DM combined with cognitive impairment and aging samples (*p < 0.05). (F) Correlation analysis of IL1B with Monocytes in type 2 diabetes mellitus combined with cognitive impairment and aging. (G) Correlation analysis between IL1B and NK cells activated in type 2 diabetes mellitus combined with cognitive impairment and aging. (H) Correlation analysis between TP53 and T cells follicular helper in type 2 diabetes mellitus combined with cognitive impairment and aging.
Results of network construction of mRNA and miRNAs
To further evaluate the potential of miRNAs as markers of T2DM combined with cognitive impairment and aging, and to screen for important miRNAs and mRNAs, we used the miRWalk and miRDB databases to predict miRNAs targeting the six hub genes (BCL6, TP53, HSP90AA1, CRYAB, IL1B and DNAJB1). A total of 223 target miRNAs and 748 mRNA-miRNA pairs were identified (Fig. [186]10A). The mRNA-miRNA networks were visualised with Cytoscape. The pairs were then filtered sequentially by overlap with the miRTarBase database (Fig.
[187]10B) and the TargetScan database; finally, one mRNA, TP53, was retained, with the corresponding miRNA hsa-miR-150a-5p (Fig. [188]10C).
Figure 10. mRNA-miRNA regulatory network.
Discussion
In this study, we performed differential expression analyses of [191]GSE161355 and [192]GSE122063 separately. [193]GSE161355 included 6 temporal lobe tissue samples from T2DM combined with cognitive impairment and senescence and 5 control samples. [194]GSE122063 included 28 AD temporal lobe tissue samples and 22 control samples. We identified 217 common DEGs. GSEA showed that these 217 DEGs were enriched in biological processes such as iron uptake and transport, immune response and inflammation, interleukin 1 family signalling, glycolysis in aging, Robo receptor-mediated signalling, neurodevelopment, apoptosis and regulation of autophagy. PPI network analysis of the 217 DEGs was performed and 15 hub genes were screened. Next, we identified six diagnostic biomarkers for T2DM combined with cognitive impairment and aging, namely BCL6, TP53, HSP90AA1, CRYAB, IL1B and DNAJB1, using the LASSO regression model, SVM-RFE and LDA. To validate the accuracy of the diagnostic model, we used ROC curve analysis. The AUC values in the training set were all > 0.8, and the AUC values in the validation set, [195]GSE5281, were > 0.6, suggesting that the diagnostic model for T2DM combined with cognitive impairment and aging has good predictive performance. From the FerrDb database, four Ferroptosis-related factors were screened for T2DM combined with cognitive impairment and aging: TGFBR1, CD44, TP53 and IL1B, which were involved in the following five biological processes (BPs): (1) Leukocyte activation involved in immune response: during an immune response, leukocytes need to be activated in order to perform functions such as phagocytosis of pathogens or production of antibodies.
(2) T cell differentiation: T cells are part of the adaptive immune system and differentiate from an immature state into mature cells that are able to recognise specific antigens. (3) Cell activation involved in immune response: during an immune response, multiple cell types (not just leukocytes) need to be activated to respond to pathogens. (4) T cell activation involved in immune response: T cells are a major component of the adaptive immune system and need to be activated to recognise and respond to specific antigens. (5) Leukocyte cell-cell adhesion: leukocytes (e.g. lymphocytes and neutrophils) interact with each other through intercellular adhesion molecules, which are essential for their migration to sites of infection or inflammation. In summary, TGFBR1, CD44, TP53 and IL1B are involved in the biological processes of immunity and inflammation, so we further refined the analysis of immune infiltration and identified the biomarkers that link Ferroptosis and immune infiltration: TP53 and IL1B. The TP53 protein is a tumour suppressor that can be activated by a variety of cellular stresses, including DNA damage, hypoxia and metabolic impairment. It plays an important role in regulating the homeostasis of apoptosis-related genes in neuronal cells and is central to the regulation of apoptosis. Aging of adipocytes and pancreatic β-cells is strongly correlated with TP53, which severely affects insulin secretory function, insulin resistance and glucose homeostasis. TP53 activation signalling leads to senescence or apoptosis, which contributes to the development of diabetes mellitus and mild cognitive impairment^[196]49. Increased p53 in microglia has been reported to decrease synaptic protein levels and promote neurodegeneration^[197]50. As the disease progresses^[198]51,[199]52, patients with type 2 diabetes will eventually develop symptoms of cognitive impairment, which remains a less frequently addressed complication of diabetes.
Interleukin-1β (IL-1β or IL-1B), also known as catabolin, is a member of the interleukin-1 family of cytokines. IL-1β is an important mediator of the inflammatory response and is involved in a wide range of cellular activities, including cell proliferation, differentiation and apoptosis. One study found^[200]53 that the combination of memantine and donepezil can treat cognitive impairment by modulating the TNF, ACHE, BAX, IL1B and CASP3 genes. Diabetes also induced a neuroinflammatory state, signified by IL-1β augmentation, in the hippocampus of poorly controlled STZ-diabetic rats, as a result of the direct involvement of GSK3β in the production of proinflammatory cytokines^[201]54. Reducing IL-1β levels in the hippocampus of STZ-induced diabetic rats improves their inflammatory state; this is attributed to reduced binding of Ang II to its AT1 receptor, a binding that otherwise triggers specific cytokines and chemokines that cause T cells to accumulate at the site of inflammation^[202]55. Meanwhile, IL1B and TP53 are also Ferroptosis- and immune infiltration-related genes, which are mainly enriched in regulation of apoptotic signalling pathway, cell activation involved in immune response, T cell differentiation, regulation of intrinsic apoptotic signalling pathway by p53 class mediator, positive regulation of MAPK cascade and other biological processes.
In T2DM combined with cognitive impairment and aging, the infiltration levels of Monocytes (R = 0.886, p < 0.05) and NK cells activated (R = 0.820, p < 0.05) were positively correlated with IL1B expression, whereas the infiltration level of T cells follicular helper (R = -0.943, p < 0.05) was negatively correlated with TP53 expression. IL1B was mainly enriched in Th17 cell differentiation and the NOD-like receptor signalling pathway. IL1B and TP53 are jointly involved in three pathways: (1) Lipid and atherosclerosis, (2) MAPK signalling pathway and (3) Fluid shear stress and atherosclerosis. In the lipid and atherosclerosis pathway, the inflammatory response is the dominant mechanism, and lipid deposition is an important link throughout the process. Abnormalities in the MAPK signalling pathway have been shown to be associated with many human diseases, including Alzheimer's disease (AD), Parkinson's disease (PD), amyotrophic lateral sclerosis (ALS) and various types of cancer. For example, sustained activation of the JNK or p38 signalling pathways has been shown to play a role in mediating neuronal apoptosis in AD, PD and ALS, whereas the ERK signalling pathway plays a key role in several steps of tumourigenesis, including cancer cell proliferation, migration and invasion^[203]56. This is consistent with our results. MicroRNAs are promising biomarkers and potential non-invasive diagnostic tools because they are stable in blood. It is well known that miRNAs and transcription factors are key factors in the regulation of gene expression. Our study identified miR-150 as an important microRNA involved in diabetes mellitus combined with cognitive impairment and aging.
MicroRNAs (miRNAs/miRs) are a group of small non-coding RNAs that have a wide range of biological functions, such as the regulation of metabolic homeostasis, in a variety of human diseases^[204]57. Tetramethylpyrazine ameliorates isoflurane-induced cognitive impairment in rats by inhibiting neuroinflammation via miR-150. miR-150 has been suggested to be an inhibitor of neuroinflammation^[205]58, and it can ameliorate neuropathic pain by targeting AKT3, which is a major regulator of the inflammatory response^[206]59. It has been shown^[207]60 that overexpression of miR-150 in rats improved cognitive function and reduced neuroinflammation, while knockdown of miR-150 abrogated the protective effect of tetramethylpyrazine (TMP) against isoflurane-induced cognitive impairment and neuroinflammation. miR-150 expression has also been found to decline with overall Alzheimer's disease pathology in both men and women^[208]61. Our study has limitations: prospective clinical cohorts and more in-depth molecular biology experiments need to be designed and conducted to further validate the mechanism of action of the two candidate genes and miR-150 in the occurrence and development of type 2 diabetes mellitus combined with cognitive impairment and aging. The clinical significance of this study is that, because type 2 diabetes mellitus combined with cognitive impairment and aging is not yet curable at the current level of medical development, actively preventing it before it occurs is particularly important for improving the quality of life of more patients. Finding the right therapeutic targets and providing individualised treatment plans can reduce the economic pressure on patients, families and society.
Conclusions
In summary, the candidate genes TP53 and IL1B, screened by bioinformatics analysis, have the potential to influence the course of type 2 diabetes mellitus combined with cognitive impairment and aging through the Lipid and atherosclerosis, MAPK signalling, and Fluid shear stress and atherosclerosis pathways. They may play important roles in the course and outcome of the disease, and the results of this study provide meaningful clues and directions for clinical prognosis and treatment.
Electronic supplementary material
Below is the link to the electronic supplementary material.
[209]Supplementary Material 1 (1.3MB, tif)
[210]Supplementary Material 2 (236.3KB, tiff)
[211]Supplementary Material 3 (41.2KB, docx)
[212]Supplementary Material 4 (17.7KB, doc)
[213]Supplementary Material 5 (17.7KB, doc)
Acknowledgements