Abstract Objective This article aims to identify genetic features associated with immune cell infiltration in cerebral ischemia-reperfusion injury (CIRI) development through bioinformatics, with the goal of discovering diagnostic biomarkers and potential therapeutic targets. Methods We obtained two datasets from the Gene Expression Omnibus (GEO) database to identify immune-related differentially expressed genes (IRDEGs). These genes' functions were analyzed via Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Tools such as CIBERSORT and ssGSEA assessed immune cell infiltration. The Starbase and miRDB databases predicted miRNAs interacting with hub genes, and Cytoscape software mapped mRNA-miRNA interaction networks. The ENCORI database was employed to predict RNA binding proteins interacting with hub genes. Key genes were identified using a random forest algorithm and constructing a Support Vector Machine (SVM) model. LASSO regression analysis constructed a diagnostic model for hub genes to determine their diagnostic value, and PCR analysis validated their expression in cerebral ischemia-reperfusion. Results We identified 10 IRDEGs (C1qa, Ccl4, Cd74, Cd8a, Cxcl10, Gmfg, Grp, Lgals3bp, Timp1, Vim). The random forest algorithm, and SVM model intersection revealed three key genes (Ccl4, Gmfg, C1qa) as diagnostic biomarkers for CIRI. LASSO regression analysis, further refined this to two key genes (Ccl4 and C1qa), With ROC curve, analysis confirming their diagnostic efficacy (C1qa AUC = 0.75, Ccl4 AUC = 0.939). PCR analysis corroborated these findings. Conclusions Our study elucidates immune and metabolic response mechanisms in CIRI, identifying two immune-related genes as key biomarkers and potential therapeutic targets in response to cerebral ischemia-reperfusion injury. Keywords: Cerebral ischemia-reperfusion injury, Bioinformatics, Immune-related differentially expressed genes, Diagnotic, Biomarkers Abbreviations AUC Area Under the Curve BH Benjamini-Hochberg method CCL4 Chemokine (C–C motif) Ligand 4 CD4 Cluster of Differentiation 4 CD40 Cluster of Differentiation 40 CIRI Circular RNA Identifier DAB Diaminobenzidine ELK3 ETS-domain protein (SRF accessory protein 2) EMT Epithelial-Mesenchymal Transition ENCORI ENCode Overlapping RNA Isoforms FDR False Discovery Rate GO Gene Ontology GSEA Gene Set Enrichment Analysis GSVA Gene Set Variation Analysis KEGG Kyoto Encyclopedia of Genes and Genomes MDSC Myeloid-Derived Suppressor Cells MHC Major Histocompatibility Complex PBS Phosphate Buffered Saline PCR Polymerase Chain Reaction ROC Receiver Operating Characteristic RT Reverse Transcription SVM Support Vector Machine SYBR SYBR Green I TGFB Transforming Growth Factor Beta THP-1 Human Monocytic Cell Line TNFA Tumor Necrosis Factor Alpha TTC Tetrazolium Chloride 1. Introduction Ischemic stroke is a prevalent vascular disease in the central nervous system and a leading cause of mortality [[37]1,[38]2]. Following an ischemic stroke, cells within the central infarct zone suffer irreversible neuronal damage due to inadequate energy supply. Surrounding this core area, the ischemic penumbra, consists of hypoperfused blood zones where neurons, although dormant or semi-dormant, maintain their structural integrity, and possess the potential for functional recovery [[39][3], [40][4], [41][5]]. Currently, the reconstruction of blood flow in ischemic regions is deemed the most effective treatment strategy for ischemic stroke [[42]6]. However, the re-establishment of blood and oxygen supply during treatment often aggravates tissue damage, a phenomenon known as cerebral ischemia-reperfusion injury (CIRI) [[43]7,[44]8]. CIRI is marked by suboptimal oxygen supply and blood flow restoration, leading to brain dysfunction such as neuroinflammation, neuronal damage, and brain edema. This complex pathological process involves excessive inflammatory responses, uncontrolled oxidative stress, DNA fragmentation, and mitochondrial dysfunction, which significantly influence the development of CIRI [[45][9], [46][10], [47][11]]. Post-reperfusion reactive oxygen species contribute to oxidative stress, intensify deleterious inflammatory reactions, and further aggravate tissue damage [[48]12,[49]13]. Immune cells are pivotal in CIRI development. At the onset of cerebral ischemia, both protective and harmful inflammatory signaling molecules are released, rapidly mobilizing immune cells to the injury site, with neutrophils swiftly infiltrating the ischemic brain area [[50]14]. In vivo microscopy reveals that neutrophils can penetrate the brain within minutes post-injury, peaking 24–48 h after ischemia, and promoting inflammation [[51]15], Post-reperfusion, the activation and infiltration of inflammatory cells trigger a cascade of reactions, mutually enhancing the synthesis and secretion of adhesion molecules [[52]16]. Recent studies suggest that limiting immune cell influx, curbing inflammatory signal release, and modifying bone marrow cells towards an anti-inflammatory phenotype can mitigate CIRI to some degree [[53][17], [54][18], [55][19]]. Despite advancements in understanding CIRI pathogenesis, its high mortality rate continues to pose a significant health threat. Rapid identification and diagnosis of CIRI remain major challenges. Bioinformatics offers new avenues for exploring disease mechanisms and identifying potential diagnostic biomarkers, precise treatments, and prognostic evaluations. Thus, this article aims to identify genetic characteristics associated with immune cell infiltration in CIRI development using bioinformatics methods. This could uncover diagnostic biomarkers and potential therapeutic targets for CIRI, offering new perspectives for early diagnosis and improved treatment strategies. 2. Materials and methods 2.1. Data acquisition and processing The datasets [56]GSE211253 and [57]GSE201258 [[58]20], related to cerebral ischemia reperfusion injury (CIRI), were downloaded from the Gene Expression Omnibus (GEO, accessible at: [59]https://www. ncbi.nlm.nih.gov/geo/) using the GEOquery package [[60]21]. The [61]GSE211253 dataset comprises a total of 11 samples, including 3 control samples and 8 CIRI samples, based on the [62]GPL25947 data platform. Similarly, the [63]GSE201258 dataset includes 9 samples, consisting of 3 control samples and 6 CIRI samples, based on the [64]GPL25916 data platform. All samples were incorporated into this study. By using ‘Immunomodulation’ as the search keyword, a total of 707 Immunomodulation-related genes (IRGs) were compiled from the GeneCards database [[65]22], along with an additional 2 IRGs identified from the literature. These IRGs, sourced from three different origins, were subsequently transformed into rat gene union sets using the “homologene” package in R software (version 4.2.1). Consequently, a final tally of 672 IRGs was incorporated into the subsequent analysis ([66]Fig. 1) The specific gene names are detailed in [67]Table 1. Fig. 1. [68]Fig. 1 [69]Open in a new tab Technology roadmap. Table 1. GEO Dataset Information list. [70]GSE211253 [71]GSE201258 Platform [72]GPL25947 [73]GPL25916 Species Rattus norvegicus Rattus norvegicus Tissue the Lung Meridian of Hand-Taiyin Cerebral ischemia-reperfusion in rat cortex Samples in Normal group 3 3 Samples in PCOS group 8 6 Reference None PMID, 36,479,246 [74]Open in a new tab CIRI, cerebral ischemia reperfusion injury. 2.2. Identification of immunomodulation-related differentially expressed genes Initially, batch processing was conducted on the datasets [75]GSE211253 and [76]GSE201258 to generate the Combined-Datasets, utilizing the “sva” package [[77]23] in R software (version 4.2.1). The Combined-Datasets comprise a total of 20 samples, including 6 control and 14 CIRI samples. Subsequently, principal component analysis (PCA) [[78]24] was applied to the dataset both pre- and post-batch processing. PCA, a method for reducing data dimensionality, extracts feature vectors from high-dimensional data, converting them into lower-dimensional, forms and visualizing these characteristics via two-dimensional or three-dimensional plots. Differential analysis of the expression spectrum data from the Combined-Datasets was then performed using the “limma” package in R software (version 4.2.1). Differentially expressed genes (DEGs) were identified based on criteria of |logFC| >1 and a p-value<0.05. Genes with a logFC >1 and a p-value<0.05were categorized as upregulated, while those with a logFC<1 and a p-value<0.05 were considered downregulated. Overlapping genes between DEGs and IRGs were identified as immunomodulation-related differentially expressed genes (IRDEGs) through Venn diagram analysis. To display the results of the differential analysis, a volcano plot and a differential ranking mapwere created using the “ggplot2” package in R software (version 4.2.1). Additionally, a heatmap related to DEGs was generated using the R package “pheatmap”. 2.3. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of IRDEGs The “clusterProfiler” package within the R environment [[79]25] was utilized for conducting GO and KEGG annotation analyses of IRDEGs (Ischemia-Reperfusion Damage-Related Expression Genes). The selection criteria for entries were set at a p.adjust <0.05 and a false discovery rate (FDR) of <0.25, both of which were deemed statistically significant. The method used for p.adjust correction was the Benjamini-Hochberg (BH) procedure. Subsequently, the Pathview package in the R environment [[80]26] was employed to graphically represent the results of the KEGG pathway enrichment analysis. 2.4. Gene Set Enrichment Analysis (GSEA) Initially, the DEGs in the Combined-Datasets were sorted based on the logFC. Subsequently, an enrichment analysis of these DEGs was conducted using the “clusterProfiler” package (R environment). The parameters set for the GSEA included a seed of 2022, a total of 10,000 calculations, a minimum gene set size of 10 genes, and a maximum of 500 genes per set. The P-value correction method employed was the BH procedure. The gene set “c2.all.v2022.1.Hs.symbols.gmt” was sourced from the MSigDB database [[81]13,[82]26]. The criteria for significant enrichment were determined as an adjusted p.adjust of <0.05 and a FDR of <0.25. 2.5. Gene set variation analysis The gene set “h.all.v7.4.symbols.gmt” was obtained from the MSigDB database to conduct GSVA on all samples in the Combined-Datasets. This analysis calculates the functional differences in enrichment pathways between the CIRI group and the Normal group. The criterion for significant enrichment was set at p < 0.05. Additionally, chromosomal mapping of IRDEGs was performed using the “RCircos” package within the R environment [[83]27]. 2.6. Support Vector Machine (SVM) and random forest analysis Initially, an SVM model was developed utilizing IRDEGs with the SVM (disease ontology) algorithm [[84]28]. IRDEGs were selected based on their ability to achieve the highest accuracy and the lowest error rate. The “randomForest” package within the R environment [[85]29] was employed to construct a model. This model was based on the expression levels of IRDEGs in the expression matrix of the Combined-Datasets. Parameters set for the model included set.seed (234) and ntree = 1000. [MATH: I(X=xi)=log2p(xi) :MATH] The genes identified through both SVM and Random Forest analyses were considered hub genes. The initial step involved conducting friend analysis on these hub genes. Following this, the “glmnet” package (R environment) [[86]30] was used to perform LASSO (Least Absolute Shrinkage and Selection Operator) regression on key genes. Parameters for this regression were seed = 2022 and family = “binary” [[87]31], with the process running for 1000 cycles. Subsequent to this, the results of the LASSO regression were visualized. Building upon the outcomes of the LASSO analysis, a nomogram was constructed using the “rms” package (R environment) [[88]32]. 2.7. Immune infiltration analysis To analyze immune infiltration, the gene expression matrix data of the Combined-Datasets was uploaded to CIBERSORTx ([89]https://cibersortx.stanford.edu/). This data was then integrated with the LM22 feature gene matrix. Samples with a p-value <0.05 were filtered out to obtain the immune cell infiltration matrix. The output data were further refined by selecting only those with immune cell enrichment scores greater than zero, resulting in the detailed immune cell infiltration matrix. Subsequently, correlation heatmaps were employed to illustrate the associations between immune cells and hub genes. The ssGSEA algorithm [[90]33] was used to quantify the relative abundance of 28 types of immune cell infiltration in the Combined-Datasets. Additionally, a correlation analysis was performed to assess the relationship between the degree of immune cell infiltration and the expression levels of hub genes in the sample. 2.8. Animals 22 adult male Sprague-Dawley (SD) rats, aged 6–8 weeks and weighing 240–300 g, were obtained from the Laboratory Animal Center of Kunming Medical University, Kunming, China. These animals were maintained under an environment with a temperature of 22 ° C-25 °C, relative humidity of 48 ± 2%, and a 12-h light/dark cycle, with unrestricted access to food and water. The rats were randomly divided into two groups: the sham group and the CIRI group. The sham group only had the neck incised and the common carotid artery exposed. The CIRI group used thread occlusion method to construct a middle cerebral artery occlusion/reperfusion model (MCAO/R). After 2 h of ischemia and 24 h of reperfusion, the brain tissue was euthanized and taken for subsequent experiments. All animal research conducted for this study received approval from the Animal Ethics Committee at Kunming Medical University (approval number: Kmmu20220786). 2.9. 2, 3, 5-Triphenyltetrazolium chloride (TTC) staining 24 h post-reperfusion, the rats were deeply anesthetized and decapitated, followed by the swift extraction of the entire brain tissue. The brain tissue was then frozen at −80 °C for 30 min and subsequently thawed at room temperature. The brain was sliced into approximately six sections, each 2 mm thick, and placed in a container with 20 mL of 2% TTC solution (Solarbio, Beijing, China). The slices were stained at 37 °C for 20 min, after which the staining solution was discarded. Next, 4% paraformaldehyde solution was gently added to the container, and the samples were stored in a dark place. After 24 h, the slices were photographed using Image J software (version 1.43), and the percentage of the infarct area was calculated. 2.10. Nissl (cresyl violet) staining assay 24 h post-reperfusion, rats were anesthetized using pentobarbital, then euthanized through PBS perfusion and subsequently fixed with a 4% paraformaldehyde (PFA) solution. Following a 48-h fixation in 4% PFA, the brain tissues underwent paraffin embedding and were sectioned coronally. The brain slices were then dewaxed and incubated in Cresyl Violet Stain solution (SolarBio, Beijing, China) for 56 min. This was followed by a 2-min treatment with Ness differentiation solution (SolarBio) and a 20-min xylene treatment. Finally, the slices were rinsed with distilled water, allowed to dry naturally, and then examined under a microscope. 2.11. Immunohistochemistry Paraffin-embedded sections were first deparaffinized, followed by antigen retrieval. To block endogenous peroxidase activity, the sections were treated with a 3% hydrogen peroxide solution. Subsequently, the sections were incubated with serum to prevent non-specific binding. The primary antibody, Gmfg (Proteintech, Wuhan, China), was then applied, followed by an HRP-labeled secondary antibody (Proteintech, Wuhan, China). Color development was achieved using DAB (diaminobenzidine) coloring reagent (Servicebio, Wuhan, China). Finally, the nuclei were counterstained with hematoxylin (Servicebio, Wuhan, China), and the results were examined under a white light microscope. 2.12. Histopathology Brain tissues were perfused and fixed in 4% paraformaldehyde for 24 h. Subsequently, these tissues were dehydrated using a gradient concentration of ethanol solutions and embedded in paraffin. The tissues were then sectioned and stained with hematoxylin and eosin (H&E), followed by analysis using standard histological staining methods. 2.13. Quantitative real-time polymerase chain reaction (qRT-PCR) For the isolation of total RNA from brain tissue, TRIzol reagent (Takara) was utilized. The Prime Script RT kit (Takara) facilitated the construction of first-strand cDNA from 1 μg of total RNA. This cDNA then served as a template for the PCR reaction. Amplifications for qRT-PCR were performed using a Bio-Rad CFX-96 Real-Time System (Bio-Rad, USA) with SYBR Premix Ex TaqII (Takara). Specific PCR primer pairs are detailed in [91]Table 5. β-Actin was employed as the internal control. Table 5. Specific PCR primer pairs. PCR Primer Primer Sequence (5′-3′) R-β-Actin-F TGCTATGTTGCCCTAGACTTCG R-β-Actin-R GTTGGCATAGAGGTCTTTACGG R-GMFG-F GCTGCAAGCCTGAACAACAG R-GMFG-R TCCTTAAGCCAGGTCTCGGT R-CCL4-F TACGTGTCTGCCTTCTCTCTCCT R-CCL4-R GCAAAGGCTGCTGGTCTCATAG R–C1QA-F CACGGAGGCAGGAACATCAT R–C1QA-R GACATCTTCAGCCACTGTCCATA R-CD74-F CAAACCTGTGAGCCCGATG R-CD74-R TTAAGGTGCTTCAGATTCTCCGG R-CD8A-F GAGTGGCTCAGTGGAGGGAAT R-CD8A-R GTTCCTGTGGCAGCAGATGAGA R-CXCL10-F GAGCCAAAGAAGGTCAAAAAGA R- CXCL10-R ACTGGGTAAAGGGAGGTGGA R-GRP -F ATGGCCTGAAGGAGCAACTGA R-GRP -R CCTTGAGAACCTGGAGCAGAGAG R-LGALS3BP -F TCCAGGGACTCAAGGTGCAAA R- LGALS3BP -R CAGCATGATTGGTCCCTTTCCT R-TIMP1-F CCTGGTTCCCTGGCATAATC R- TIMP1-R GATCGCTCTGGTAGCCCTTC R-VIM-F GCAAAGCAGGAGTCAAACGA R- VIM -R CTTCCTTCATGTTCTGGATCTCATC [92]Open in a new tab PCR, polymerase chain reaction. 2.14. Statistical analysis In this study, the R software (Version 4.1.2) was utilized for all data processing and analyses. For comparing two groups of continuous variables, independent Student's t-tests were applied to evaluate the statistical significance of variables with normal distribution. In contrast, differences among variables with non-normal distribution were analyzed using the Mann-Whitney U test (Wilcoxon rank-sum test). When comparing three or more groups, the Kruskal-Wallis test method was employed. For categorical variables, the statistical significance between two groups was determined using either the chi-square test or Fisher's exact test. Spearman correlation analysis was conducted to calculate the correlation coefficients between different molecules. Unless otherwise specified, all statistical P-values are two-sided, with p < 0.05 considered indicative of statistical significance. 3. Results 3.1. Identification of immunomodulation-related differentially expressed genes The Combined-Datasets were categorized into two groups: the CIRI group and the Control group. A differential analysis was conducted on these datasets to discern DEGs between the two groups. The findings revealed a total of 68 DEGs in the Combined-Datasets, comprising 50 upregulated genes (logFC >1 and p-value <0.05) and 18 downregulated genes (logFC <1 and p-value <0.05). A volcano plot ([93]Fig. 2A) and a differential ranking map ([94]Fig. 2B) were created based on the results of this differential analysis. The intersection of DEGs and IRGs yielded 10 IRDEGs (C1qa, Ccl4, Cd74, Cd8a, Cxcl10, Gmfg, Grp, Lgals3bp, Timp1, Vim), which are illustrated in a Venn diagram ([95]Fig. 2C). Subsequently, the expression disparities of IRDEGs between the different groups in the Combined-Datasets were examined. Group comparison maps ([96]Fig. 2D) and complex value heatmaps ([97]Fig. 2E) were generated to depict these differences. Finally, a correlation heatmap was constructed to illustrate the interrelations among IRDEGs ([98]Fig. 2F). Fig. 2. [99]Fig. 2 [100]Open in a new tab Identification of immunomodulation-related differentially expressed genes. A. Volcano Plot showing the DEGs between the CIRI group vs. normal group in the Combined-Datasets B. the differential ranking map showing the DEGs between the CIRI group vs. normal group in the Combined-Datasets, the marker genes are the IRDEGs obtained Subsequently. C. The Venn diagram of DEGs and IRGs D. Group comparison plot of IRDEGs in the CIRI and Normal groups in the Combined-Datasets E. Heatmap showing the IRDEGs in the CIRI group vs. normal group in the Combined-Datasets F. Correlation heat map between the hub gene. ns, P ≥ 0.05,*, P < 0.05, **, P < 0.01,***, P < 0.001. 3.2. GO and KEGG enrichment analysis of IRDEGs To explore the biological processes (BPs), molecular functions (MFs), cellular components (CCs), and biological pathways of IRDEGs, we conducted GO and KEGG enrichment analyses. The results are displayed in a histogram ([101]Fig. 3A). The relationship between IRDEGs and the results of the GO and KEGG enrichment analyses is illustrated in a network diagram ([102]Fig. 3B). In this diagram, the lines connecting the nodes represent annotations for the corresponding molecule and entry, with larger nodes indicating entries that contain more molecules. Subsequently, based on the enrichment analysis, we calculated the Z-score for each gene using the logFC values of IRDEGs from the differential analysis of the Combined-Datasets. The results of the GO and KEGG enrichment analyses, combined with the joint logFC, are presented in a circular diagram ([103]Fig. 3C) and a chord diagram ([104]Fig. 3D). According to [105]Fig. 4A and B, the results indicated that IRDEGs were predominantly enriched in BPs such as positive regulation of leukocyte chemotaxis (GO:0002690), neutrophil chemotaxis (GO:0030,593), and regulation of leukocyte chemotaxis (GO:0002688). In the CC category, enrichments were observed in the secretory granule lumen (GO:0034,774), cytoplasmic vesicle lumen (GO:0060,205), and vesicle lumen (GO:0031,983). For MFs, enrichments were noted in receptor ligand activity (GO:0048,018), signaling receptor activator activity (GO:0030,546), and MHC protein complex binding (GO:0023,023) ([106]Table 3). The KEGG pathway enrichment analysis revealed that IRDEGs were concentrated in three biological pathways: Cytosolic DNA-sensing pathway (hsa04623), Antigen processing and presentation (hsa04612), and Viral protein interaction with cytokine and cytokine receptor (hsa04061) ([107]Table 3). Fig. 3. [108]Fig. 3 [109]Open in a new tab Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of the IRDEGs. A. The histogram of GO and KEGG analysis results of IRDEGs. B. The network diagram of GO and KEGG analysis results of IRDEGs. C. The cycle diagram of the GO and KEGG enrichment analysis results of the joint logFC. D. The chord diagram of the GO and KEGG enrichment analysis results of the joint logFC. Fig. 4. [110]Fig. 4 [111]Open in a new tab Gene Set Enrichment Analysis (GSEA). A. the Ridgeline plot of GSEA enrichment analysis about the four main biological functions in the Combined-Datasets. B-E. The DEGs in the Combined-Datasets are significantly enriched in HARRIS HYPOXIA ([112]Fig 4B), WP OXIDATIVE STRESS RESPONSE([113]Fig 4C), GRAESSMANN APOPTOSIS BY SERUM DEPRIVATION UP([114]Fig 4D), and HOLLERN EMT BREAST TUMOR UP([115]Fig 4E). Table 3. GO and KEGG enrichment analysis results of IRDEGs. Ontology ID Description GeneRatio BgRatio pvalue p.adjust BP GO, 0002690 positive regulation of leukocyte chemotaxis 3/10 94/18,800 1.42e-05 0.0032 BP GO, 0030,593 neutrophil chemotaxis 3/10 106/18,800 2.03e-05 0.0032 BP GO, 0002688 regulation of leukocyte chemotaxis 3/10 124/18,800 3.25e-05 0.0032 BP GO, 0071,621 granulocyte chemotaxis 3/10 128/18,800 3.57e-05 0.0032 BP GO, 1,990,266 neutrophil migration 3/10 128/18,800 3.57e-05 0.0032 CC GO, 0034,774 secretory granule lumen 4/10 322/19,594 1.39e-05 0.0003 CC GO, 0060,205 cytoplasmic vesicle lumen 4/10 325/19,594 1.44e-05 0.0003 CC GO, 0031,983 vesicle lumen 4/10 327/19,594 1.48e-05 0.0003 CC GO, 0062,023 collagen-containing extracellular matrix 3/10 429/19,594 0.0011 0.0164 CC GO, 0009897 external side of plasma membrane 3/10 455/19,594 0.0013 0.0164 MF GO, 0048,018 receptor ligand activity 5/10 489/18,410 2.92e-06 7.84e-05 MF GO, 0030,546 signaling receptor activator activity 5/10 496/18,410 3.13e-06 7.84e-05 MF GO, 0023,023 MHC protein complex binding 2/10 36/18,410 0.0002 0.0023 MF GO, 0042,287 MHC protein binding 2/10 42/18,410 0.0002 0.0023 MF GO, 0005125 cytokine activity 3/10 235/18,410 0.0002 0.0023 KEGG hsa04623 Cytosolic DNA-sensing pathway 2/8 63/8164 0.0016 0.0342 KEGG hsa04612 Antigen processing and presentation 2/8 78/8164 0.0024 0.0342 KEGG hsa04061 Viral protein interaction with cytokine and cytokine receptor 2/8 100/8164 0.0040 0.0342 KEGG hsa04620 Toll-like receptor signaling pathway 2/8 104/8164 0.0043 0.0342 KEGG hsa04062 Chemokine signaling pathway 2/8 192/8164 0.0140 0.0825 [116]Open in a new tab GO, Gene Ontology. BP, biological process. CC, cellular component. MF, molecular function. KEGG, Kyoto Encyclopedia of Genes and Genomes. IRDEGs, Immunomodulation-related differentially expressed genes. 3.3. Gene Set Enrichment Analysis (GSEA) To assess the impact of gene expression levels on CIRI, GSEA was employed. This analysis elucidated the association between gene expression and various biological processes, cellular components, and molecular functions in the Combined-Datasets. A Ridgeline plot ([117]Fig. 4A) was constructed based on the enrichment outcomes. The findings indicated significant enrichment of genes in the Combined-Datasets in pathways such as HARRIS HYPOXIA ([118]Fig. 4B; [119]Table 4), WP OXIDATIVE STRESS RESPONSE ([120]Fig. 4C; [121]Table 4), GRAESSMANN APOPTOSIS BY SERUM DEPRIVATION UP ([122]Fig. 4D; [123]Table 4), and HOLLERN EMT BREAST TUMOR UP ([124]Fig. 4E; [125]Table 4). Table 4. GSEA enrichment analysis results. ID setSize enrichmentScore NES pvalue p.adjust qvalue DEMAGALHAES_AGING_UP 42 0.8589484 2.757689 1e-10 1.27e-08 9.7e-09 WIELAND_UP_BY_HBV_INFECTION 65 0.7914372 2.744049 1e-10 1.27e-08 9.7e-09 ICHIBA_GRAFT_VERSUS_HOST_DISEASE_D7_UP 81 0.7576160 2.709433 1e-10 1.27e-08 9.7e-09 VERHAAK_GLIOBLASTOMA_MESENCHYMAL 176 0.6700233 2.620163 1e-10 1.27e-08 9.7e-09 MCLACHLAN_DENTAL_CARIES_UP 151 0.6764151 2.593763 1e-10 1.27e-08 9.7e-09 WP_TYROBP_CAUSAL_NETWORK_IN_MICROGLIA 47 0.7912463 2.586037 1e-10 1.27e-08 9.7e-09 ICHIBA_GRAFT_VERSUS_HOST_DISEASE_35D_UP 102 0.7016996 2.567111 1e-10 1.27e-08 9.7e-09 NADLER_OBESITY_UP 53 0.7619047 2.548136 1e-10 1.27e-08 9.7e-09 MARKEY_RB1_ACUTE_LOF_DN 182 0.6487667 2.542022 1e-10 1.27e-08 9.7e-09 POOLA_INVASIVE_BREAST_CANCER_UP 177 0.6492288 2.540559 1e-10 1.27e-08 9.7e-09 [126]Open in a new tab GSEA, Gene Set Enrichment Analysis. 3.4. Support Vector Machine (SVM) analysis and RandomForest (RF) analysis Initially, RandomForest (RF) was utilized to examine the expression levels of IRDEGs ([127]Fig. 5A). The analysis focused on results with IncNodePurity >0.1, and these findings are depicted in [128]Fig. 5B. Utilizing the random forest algorithm, we identified four potential diagnostic markers for CIRI from the 10 IRDEGs: Ccl4, Gmfg, C1qa, and Lgals3bp. Subsequently, an SVM model was constructed to determine the number of genes corresponding to the lowest error rate ([129]Fig. 5C) and highest accuracy rate ([130]Fig. 5D). The SVM model exhibited the highest accuracy with eight genes (C1qa, Ccl4, Cd8a, Cxcl10, Gmfg, Grp, Timp1, Vim). Ultimately, we cross-referenced the genes identified by both algorithms ([131]Fig. 5E), resulting in three intersecting genes (Ccl4, Gmfg, C1qa), which were designated as hub genes ([132]Table 2). Additionally, to assess the functional importance of these hub genes, we conducted a functional similarity analysis and presented the results in a box diagram ([133]Fig. 5F). Fig. 5. [134]Fig. 5 [135]Open in a new tab Support Vector Machine (SVM) analysis and Random Forest (RF) analysis. A. Model training error diagram of random forest algorithm. B. The random forest model displays IRDEGs (In descending order of IncNodePurity). C. The number of genes with the lowest error rate obtained by the SVM algorithm. D. The number of genes with the highest accuracy rate obtained by the SVM algorithm. E. Intersection of SVM algorithm and Random Forest algorithm in Venn diagram. F. The box diagram of hub genes' Friends functional similarity analysis. Table 2. List of hub genes of differential expression analysis. Gene Symbol Description logFC P.Value adj.P.Val Ccl4 C–C motif chemokine ligand 4 1.427405 6.05E-07 0.008262 C1qa complement C1q A chain 1.244132 2.69E-05 0.133159 Gmfg glia maturation factor gamma 1.001491 0.000378 0.153079 [136]Open in a new tab 3.5. Construction of LASSO model and diagnostic value of hub genes To evaluate the diagnostic potential of hub genes (Ccl4, Gmfg, C1qa), a LASSO regression analysis was employed to construct a diagnostic model focusing on these genes ([137]Fig. 6A). The LASSO diagnostic model, as indicated in the visuals, predominantly incorporated two critical genes (Ccl4 and C1qa). Nomogram analysis was applied to these genes to assess the model's diagnostic capacity, resulting in the creation of a nomogram ([138]Fig. 6B) and a group comparison chart ([139]Fig. 6F) for the LASSO model. Concurrently, column charts ([140]Fig. 6C) and forest plots ([141]Fig. 6D) were generated for these two key genes. Additionally, the nomogram of the LASSO model underwent prognostic calibration analysis, leading to the production of a calibration curve ([142]Fig. 6E). The diagnostic value of the LASSO model and the two key genes within the CIRI group was further scrutinized by drawing the receiver operating characteristic curve (ROC). Analysis of the ROC curve indicated that gene C1qa displayed moderate diagnostic accuracy in the CIRI group (AUC = 0.75, [143]Fig. 6H), while gene Ccl4 exhibited high diagnostic precision (AUC = 0.939, [144]Fig. 6I). Lastly, the ROC curve representing the risk score for the CIRI group, as derived from the LASSO model, showed an AUC of 1 ([145]Fig. 6G), suggesting excellent diagnostic accuracy. Fig. 6. [146]Fig. 6 [147]Open in a new tab Construction of LASSO model and diagnostic value of hub genes. A. LASSO regression prognostic model diagram of hub genes. B. The nomogram of the LASSO model risk score. C. The nomogram of key genes in the LASSO model. D. The forest plot of key genes in the LASSO model. E. the calibration curve of LASSO model risk scoring model. The main function of the calibration curve is to fit and evaluate the prediction accuracy of the LASSO model. The fitting curve in the graph has a high degree of overlap with the prediction curve, indicating that the diagnostic effect of the LASSO model is well. F. The group comparison diagram of LASSO model risk scoring model. G. the diagnostic ROC curve of risk score for LASSO model. H. The diagnostic ROC curve of gene C1qa. I. The diagnostic ROC curve of gene Ccl4. The closer the AUC is to 1, the better the diagnostic effect. AUC has lower accuracy between 0.5 and 0.7, has certain accuracy between 0.7 and 0.9, and has higher accuracy above 0.9. 3.6. GSEA and GSVA enrichment analysis based on the LASSO model Utilizing the LASSO model, the samples in the Combined-Datasets were segregated into high and low-risk groups, determined by the median risk score after excluding normal samples. A volcano plot ([148]Fig. 7A) and differential ranking maps ([149]Fig. 7B) were constructed, highlighting key hub genes (Ccl4, Gmfg, C1qa). To assess the impact of DEGs' expression levels on CIRI occurrence in these groups, GSEA enrichment analysis was performed. A Ridgeline plot, based on the enrichment outcomes, was illustrated ([150]Fig. 7C). The analysis revealed significant enrichment of genes in the Combined-Datasets in pathways such as GROSS HYPOXIA VIA ELK3 UP ([151]Fig. 7D), WEIGEL OXIDATIVE STRESS BY HNE AND TBH ([152]Fig. 7E), HOLLMANN APOPTOSIS VIA CD40 DN ([153]Fig. 7F), FOROUTAN PRODRANK TGFB EMT UP ([154]Fig. 7G), among others. To explore variations in hallmark gene sets between the high-risk and low-risk groups, GSVA analysis was applied to all gene expressions in CIRI samples. The results indicated that genes in the Combined-Datasets were significantly enriched in 11 pathways, including hallmark TNFA Signaling VIA NFKB, hallmark HYPOXIA, hallmark ADIPOGENESIS, and hallmark Composition. These 11 hallmark gene sets exhibited notable differences between the high-risk and low-risk groups as per the LASSO model (p-value <0.05, [155]Fig. 7H). Fig. 7. [156]Fig. 7 [157]Open in a new tab GSEA and GSVA enrichment analysis based on LASSO model. A. Based on the LASSO model, the Volcano plot of DEGs in high-risk and low-risk groups in the CIRI samples. B. Based on the LASSO model, the differential ranking maps of DEGs in high-risk and low-risk groups in the CIRI samples. C. The Ridgeline plot base on the results of GSEA enrichment results. D-G. The DEGs of the CIRI samples in the Combined-datasets were significantly enriched in pathways such as GROSS HYPOXIA VIA ELK3 UP ([158]Fig 7D), WEIGEL OXIDATIVE STRESS BY HNE AND TBH ([159]Fig 7E), HOLLMANN APOPTOSIS VIA CD40 DN ([160]Fig 7F), FOROUTAN PRODRANK TGFB EMT UP ([161]Fig 7G). H. The Heatmap of functional scores in GSVA analysis of CIRI samples in Combined-Datasets. 3.7. Immune cell infiltration analysis using the logistic model (CIBERSORTx) This analysis, focused on immune cell infiltration, led to the creation of a stacked bar chart ([162]Fig. 8A) depicting the distribution of immune cells in CIRI samples, as well as a correlation heatmap ([163]Fig. 8B) illustrating the relationships between hub genes (Ccl4, Gmfg, C1qa) and immune cells. Scatter plots were generated to represent the correlations between specific genes and immune cells. These include the correlation between the gene C1qa and Mast cells resetting (p < 0.001, r = 0.793, [164]Fig. 8C), the gene Gmfg and Mast cells resetting (p = 0.001, r = 0.771, [165]Fig. 8D), the gene Gmfg and Mast cells activated (p = 0.015, r = −0.633, [166]Fig. 8E), and the gene C1qa and T cells CD4 memory resting (p = 0.019, r = −0.626, [167]Fig. 8F). Fig. 8. [168]Fig. 8 [169]Open in a new tab Immune Cell Infiltration Analysis based on logistic model (CIBERSORTx). A. Histogram of stacking of CIBERSORTx algorithm immune cells across samples in the merged dataset. B. Heatmap of correlation of infiltration abundance of CIBERSORTx algorithm immune cells in the merged dataset. C. Scatterplot of correlation of gene C1qa with immune cells Mast cells resting. D. Scatterplot of correlation of gene Gmfg with immune cells Mast cells resting. E. correlation scatter plot of gene Gmfg with immune cells Mast cells activated. F. correlation scatter plot of gene C1qa with immune cells T cells CD4 memory resting. A correlation coefficient with an absolute value of 0.8 or more is considered a strong correlation, an absolute value of 0.5–0.8 is considered a moderate correlation, an ab-solute value of 0.3–0.5 is considered a fair correlation, and an absolute value of 0.3 or less is considered a weak or no correlation. 3.8. Immune cell infiltration analysis using the LASSO model The ssGSEA algorithm was employed to evaluate the correlation between the infiltration of 28 immune cell types and the high-risk and low-risk group data derived from the LASSO model. Based on the outcomes of the immune cell infiltration analysis, a correlation heatmap was created to depict the relationships between the immune cells and three hub genes (Ccl4, Gmfg, C1qa) ([170]Fig. 9A). Concurrently, using the data from the correlation heatmap ([171]Fig. 9A), we identified combinations of genes and immune cells that demonstrated significant positive and negative correlations for each top two. Scatter plots were generated to illustrate the correlation between the gene Gmfg and the immune cell Type 1 T helper cell (p < 0.001, r = 0.908, [172]Fig. 9B), between the gene C1qa and the immune cell MDSC (p < 0.001, r = 0.859, [173]Fig. 9C), between the gene Gmfg and the immune cell CD56dim natural killer cell (p = 0.003, r = −0.758, [174]Fig. 9D), and between the gene Gmfg and the immune cell Neutrophil (p = 0.026, r = −0.600, [175]Fig. 9E). Fig. 9. [176]Fig. 9 [177]Open in a new tab Immune Cell Infiltration Analysis based on LASSO model. A. Heatmap of correlation of infiltration abundance of ssGSEA algorithm immune cells in the merged dataset. B. Scatterplot of correlation of gene Gmfg with immune cells Type 1 T helper cell. C. Scatterplot of correlation of gene C1qa with immune cells MDSC. D. Scatterplot of correlation of gene Gmfg with immune cells CD56dim natural killer Scatterplot of correlation between gene Gmfg and immune cells CD56dim natural killer cells. E. Scatterplot of correlation between gene Gmfg and immune cells Neutrophil. 3.9. Histopathology After 24 h of ischemia/reperfusion, TTC staining revealed that the cerebral infarction volume in the CIRI group was significantly larger compared to the sham group (p < 0.001, [178]Fig. 10A and B). HE staining indicated that in the sham group, neurons were neatly and tightly arranged, with nuclei being circular or elliptical. The nucleolar structure was distinctly visible, and there were no necrotic neurons observed. In contrast, the CIRI group exhibited disordered, swollen, and shrunken neurons in the cerebral cortex. The nuclei were condensed and deeply stained, the nucleoli were less prominent, and there was a significant increase in necrotic cells ([179]Fig. 10C). Additionally, Nissl staining results showed a significant decrease in the number of Nissl bodies in the CIRI group compared to the sham group ([180]Fig. 10D). Fig. 10. [181]Fig. 10 [182]Open in a new tab A. Comparison of TTC staining of rat brain sections in sham and CIRI groups. B. Percentage of cerebral infarction to the whole brain. C. Comparison of HE staining and cellular morphology of rat cerebral cortical tissues in sham and CIRI groups, with light microscope magnification of 40×, 100x, and 400x. D. Nysted staining of rat cerebral cortex in sham and CIRI groups, with light microscope magnification of 40×, 100-fold and 400-fold. n (n = 3 rats per group). **** equals P < 0.001. 3.10. PCR and immunohistochemical verification of key gene expression profiles The expression of 10 key genes in the CIRI group (n = 5) and the sham group (n = 5) was compared using real-time quantitative PCR. The results indicated that the expression of these genes (C1qa, Ccl4, Cd74, Cd8a, Cxcl10, Gmfg, Grp, Lgals3bp, Timp1, Vim) was upregulated compared to the sham group. Notably, the upregulation of genes C1qa, Ccl4, Cd8a, Gmfg, Grp, Lgals3bp, and Timp1 was statistically significant ([183]Fig. 11A). This gene expression pattern was consistent with the trends observed in the microarray analysis. Furthermore, immunohistochemical staining demonstrated an upregulation of Gmfg in the cortical area of cerebral ischemia/reperfusion in rats of the CIRI group compared to the sham group ([184]Fig. 11B). Fig. 11. [185]Fig. 11 [186]Open in a new tab A. Comparison of qRT-PCR results of key genes in Sham group and CIRI group n (n = 5 rats per group). B: Comparison of immunohistochemical staining of Gmfg in the cortical area of cerebral ischemia/reperfnusio in rats of Sham group and CIRI group n (n = 3 rats per group). *P < 0.05, **P < 0.01. 4. Discussion To date, stroke remains a leading cause of disability and death worldwide, imposing significant economic and social burdens [[187]34]. Ischemic stroke, constituting approximately 85% of all stroke cases [[188]35], is characterized by inadequate blood supply to the brain tissue. Timely vascular reconstruction is crucial for minimizing ischemic damage and restoring brain function, thus standing as the standard treatment for ischemic stroke. However, reperfusion therapy can also induce additional brain tissue damage, known as CIRI [[189]36]. Increasing research indicates that CIRI is a multifaceted pathological cascade influenced by various factors. These include the excessive release of inflammatory mediators post-tissue ischemia, uncontrolled oxidative stress, mitochondrial damage, and compromised endothelial cell function [[190]37,[191]38]. In recent years, the role of immunity in CIRI has gained increasing recognition as research advances, with immune-related brain-protective agents showing promise in both basic and clinical studies [[192]39]. Research indicates that alterations in gene expression can initiate inflammatory responses and programmed cell death in ischemic brain tissues. These gene expression changes, triggered by ischemic injury, may provoke an inflammatory cascade that can persist for hours to days, extending damage across the ischemic region [[193][40], [194][41], [195][42]]. However, it is now well-established that these events may also play a role in the recovery from ischemia-reperfusion injury [[196]43,[197]44]. Therefore, analyzing immune-related differentially expressed genes and immune cell infiltration in CIRI is crucial. This approach aims to identify key inflammatory molecules as early as possible and construct a precise network of these molecules to develop new therapeutic targets. In this study, we identified IRDEGs associated with CIRI by intersecting IRGs and DEGs. Subsequent functional enrichment analyses, including GO, KEGG, and GSEA, revealed that these IRDEGs were enriched in various biological processes and pathways. These included positive regulation of leukocyte and neutrophil chemotaxis, regulation of leukocyte chemotaxis, secretory granule lumen, cytoplasmic vesicle lumen, vesicle lumen, receptor ligand activity, signaling receptor activator activity, MHC protein complex binding, Cytosolic DNA-sensing pathway, Antigen processing and presentation, and Viral protein interaction with cytokine and cytokine receptor. The enrichment analysis indicated a primary involvement of these IRDEGs in inflammation and immune metabolic responses. We then pinpointed three hub genes (Ccl4, Gmfg, C1qa) using SVM and RandomForest analyses. To assess the diagnostic potential of these hub genes, a model was constructed using LASSO regression analysis. From this model, two key genes (Ccl4, C1qa) were identified as potential early diagnostic markers for CIRI, evidenced by ROC curves yielding AUC values > 0.7. Further analysis of the correlation between these hub genes and immune cell infiltration demonstrated a significant association with differential expression, predominantly correlated with high levels of resting mast cell infiltration and low levels of activated mast cell and resting CD4^+ T cell infiltration. We hypothesize that the upregulation of the hub genes Ccl4, Gmfg, and C1qa may play a crucial role in enhancing innate immunity while suppressing adaptive immunity. Supporting this hypothesis, our PCR results indicated that the expression levels of these three hub genes (Ccl4, Gmfg, C1qa) were consistent with our analytical findings, providing substantial evidence to corroborate our analysis. C–C chemokine ligand (CCL) 4, a ligand for C–C chemokine receptor type 5, is associated with spherosclerosis [[198]23]. CCL4 is secreted by various blood vessels and blood cells, including activated leukocytes, lymphocytes, vascular endothelial cells, and pulmonary vascular smooth muscle cells [[199]25], and acts as a chemoattractant for the CD4^+CD25^+ T cell population. This suggests a potential role for adaptive immunity in atherosclerotic vascular inflammation [[200]26]. CCL4 is known to induce reactive oxygen species and facilitate adhesion between THP-1 cells (human monocytes) and human endothelial cells in vitro [[201]45]. Additionally, CCL4 expression has been observed in infarcted mouse myocardium [[202]46], and clinically, elevated circulating CCL4 levels are noted in patients with atherosclerosis [[203]47]. Within atherosclerotic plaques, CCL4 is detectable in T lymphocytes, smooth muscle cells, and macrophages [[204]27,[205]28], and is further upregulated in vulnerable plaques [[206]28]. In conclusion, these findings suggest a significant involvement of CCL4 in atherosclerosis. CCL4 demonstrates complex in vivo effects, playing a vital role in immune responses to infection and inflammation. It can activate acute neutrophil inflammation and stimulate the synthesis and release of inflammatory cytokines such as IL-1, IL-6, and TNF-α by fibroblasts and macrophages [[207]48]. Therefore, the influence of CCL4 on the vulnerability and progression of atherosclerosis may be linked to its role in activating macrophages and endothelial cells. Glial cell maturation factor-γ (Gmfg) is a small 17 kDa protein initially identified as an inducer of brain cell differentiation [[208]49,[209]50]. Subsequent research has revealed that Gmfg plays a role in regulating the actin cytoskeleton, as it belongs to the actin depolymerization factor homology (ADF-H) family [[210]51,[211]52]. All members of the ADF-H family are involved in remodeling the actin cytoskeleton by binding to actin and/or actin-associated proteins (Arps) [[212]53], a process crucial for immune system function, angiogenesis, and cell motility [[213]54]. Predominantly expressed in inflammatory cells, Gmfg is linked with neutrophil and T cell migration. It is a significant regulator of cellular iron metabolism and macrophage phenotype, potentially making it a novel therapeutic target for modulating macrophage function in immune and metabolic disorders [[214]55]. The role of these key genes in CIRI has been scarcely explored. However, our analysis suggests they may contribute to the treatment, early diagnosis, and prognostic evaluation of CIRI. Our research also has limitations. Firstly, it relies on multiple public datasets, which may have inconsistencies in disease progression. Secondly, the study's small sample size and short duration limit its scope. Future studies will aim to increase the sample size, extend the duration, and incorporate more patient data for comprehensive analysis. 5. Conclusions This study investigates the immune and metabolic response mechanisms associated with CIRI. It identifies two immune-related genes as key factors, signifying potential biomarkers and therapeutic targets pertinent to the immune response in cerebral ischemia-reperfusion injury. Data availability The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author. Funding This research was funded by THE NATIONAL NATURAL SCIENCE FOUNDATION OF CHINA, grant number 81960250, THE PROGRAM OF SCIENCE AND TECHNOLOGY AGENCY OF YUNNAN PROVINCE, grant number 202101AB0701142 and YUNNAN PROVINCIAL SCIENCE AND TECHNOLOGY DEPARTMENT-KUNMING MEDICAL UNIVERSITY JOINT SPECIAL PROJECT, grant number 202301AY070001-193. CRediT authorship contribution statement Yuan Yang: Writing – original draft, Software, Formal analysis, Conceptualization. Yushan Duan: Writing – review & editing, Writing – original draft, Methodology, Data curation. Huan Jiang: Software, Methodology. Junjie Li: Formal analysis, Data curation. Wenya Bai: Software, Methodology. Qi Zhang: Software, Methodology. Junming Li: Software, Data curation. Jianlin Shao: Visualization, Supervision, Funding acquisition, Conceptualization. Declaration of competing interest No competing financial interests exist. References