Abstract

   This study employs machine learning and single-cell transcriptome
   sequencing (scRNA-seq) analysis to unearth novel biomarkers and
   delineate the immune characteristics of ischemic stroke (IS), thereby
   contributing fresh insights into IS treatment strategies. Our research
   leverages gene expression data sourced from the GEO database. We
   undertake weighted gene co-expression network analysis (WGCNA) to
   filter pertinent genes and subsequently employ machine learning
   algorithms for the identification of feature genes. Concurrently, we
   rigorously execute quality control measures, dimensionality reduction
   techniques, and cell annotation on the scRNA-seq data to pinpoint
   differentially expressed genes (DEGs). The identification of core
   genes, denoted as Hub genes, among the feature genes and DEGs, is
   achieved through meticulous overlapping analysis. We illuminate the
   immune characteristics of these Hub genes using a suite of analytical
   tools, encompassing CIBERSORT, MCPcounter, and pseudotemporal analysis,
   all based on immune cell annotations and single-cell transcriptome
   data. Subsequently, we use the CMap database to predict
   potential therapeutic drugs and scrutinize their associations with the
   identified Hub genes. Our findings unveil robust linkages between three
   pivotal Hub genes—namely, RNF13, VASP, and CD163—and specific immune
   cell types such as T cells and neutrophils. These Hub genes
   predominantly manifest in macrophages and microglial cells within the
   scRNA-seq immune cell population, exhibiting variances across different
   stages of cellular differentiation. In conclusion, this study unearths
   highly pertinent biomarkers for IS diagnosis and elucidates IS-induced
   immune infiltration characteristics, thus providing a firm foundation
   for a comprehensive exploration of potential immune mechanisms and the
   identification of novel therapeutic targets for IS.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-024-77495-3.

   Keywords: Ischemic stroke, Immune features, Machine learning
   algorithms, Molecular docking, Pseudo-time analysis, Single-cell
   transcriptomic sequencing

   Subject terms: Functional clustering, Gene ontology, Genome
   informatics, Machine learning, Virtual drug screening

Introduction

   Based on extensive epidemiological investigations, stroke has emerged
   as the second leading global cause of mortality and disability. It is
   categorized into two main types: ischemic stroke and hemorrhagic
   stroke, with ischemic stroke (IS) accounting for more than 80% of
   reported cases^[32]1,[33]2. IS arises from localized cerebral ischemia
   caused by either intracranial arterial thrombosis or embolism,
   resulting in subsequent neurological dysfunction. Owing to its elevated
   prevalence, disability incidence, recurrence frequency, and fatality
   rate, IS exerts a notable burden on both families and public health
   systems^[34]3,[35]4. While thrombolysis and thrombectomy serve as
   effective clinical interventions for IS, their success hinges on
   precise diagnosis and timely thrombolysis within a critical 4.5-hour
   window to significantly diminish disability rates. Unfortunately, a
   significant proportion of IS patients are unable to access prompt and
   efficacious care, leading to compromised limb functionality and a
   marked deterioration in their overall quality of life^[36]5.
   Consequently, delving into the underlying pathogenesis of IS and
   exploring novel therapeutic approaches assume paramount importance in
   enhancing patient outcomes.

   As genomic sequencing technologies continue to evolve and the field of
   bioinformatics makes strides, an expanding array of disease mechanisms
   is becoming comprehensible. This comprehension is facilitated through
   the amalgamation of high-throughput sequencing-driven multi-omics
   analyses, which encompass data from genomics, transcriptomics,
   proteomics, and metabolomics. This integrative approach allows for the
   systematic depiction of the pathological and physiological processes
   underpinning diseases. Weighted Gene Co-Expression Network Analysis
   (WGCNA) represents a sophisticated bioinformatics approach aimed at
   delineating sets of genes characterized by intricate patterns of
   coordinated variability. Through an evaluation of the
   interconnectedness among gene sets and their relationships with
   phenotypes, WGCNA discerns gene sets exhibiting the highest degree of
   correlation with disease. This method enjoys extensive utilization
   across diverse research domains, spanning disease investigations and
   gene-trait association analyses, among other applications. Machine
   learning, a subfield of artificial intelligence, fits algorithms to
   datasets, automatically infers rules from the data, and then uses the
   resulting models to make predictions on new data.
   Presently, machine learning has garnered substantial usage within
   clinical contexts^[37]6,[38]7.

   Single-cell RNA sequencing (scRNA-seq) is a technique that enables
   high-throughput transcriptome profiling at the resolution of
   individual cells.
   Unlike bulk RNA sequencing, which provides a representation of average
   gene expression across cell populations, scRNA-seq delves into gene
   structures and expression profiles at the singular cell level, thereby
   capturing the intricate tapestry of cellular heterogeneity. This
   pivotal capability holds the key to uncovering emerging cell types,
   pinpointing cells pivotal in disease progression, comprehending the
   intricacies of cellular diversity, and unveiling facets of biological
   variation. Consequently, scRNA-seq plays an instrumental role in
   furthering our comprehension of cell biology and the underpinnings of
   disease etiology^[39]8,[40]9. Currently, the integration of WGCNA and
   scRNA-seq analyses of sequencing data, along with the implementation of
   machine learning for biomarker selection and prognostic gene
   identification, has garnered extensive usage in diagnosing and studying
   the underlying mechanisms of diverse diseases.

   To identify differentially expressed genes (DEGs) among IS patients and
   explore potential biomarkers and immune infiltration associated with IS
   progression, this study applied WGCNA to IS-related datasets from the
   Gene Expression Omnibus (GEO) database.
   Additionally, the Elastic Net, Lasso Regression (Lasso), Ridge
   regression, and Random Forest (RF) algorithms were assessed and
   validated to select the machine learning algorithm with the highest
   diagnostic value for identifying feature genes. Subsequently, scRNA-seq
   data was analyzed to identify differentially expressed genes within
   distinct cell populations. The differentially expressed genes were
   overlapped and intersected with the feature genes to obtain Hub genes.
   Immune infiltration correlation analysis, scRNA-seq immune cell cluster
   expression analysis, and trajectory analysis were then conducted on the
   selected Hub genes. Furthermore, potential drugs were predicted and
   molecular docking was performed to provide a basis for the selection of
   IS immune-related diagnostic markers and treatment strategies. More
   detailed descriptions of the experimental design and methods are
   provided in Supplementary Material 1.

Materials and methods

Data set acquisition

   Retrieved from the publicly accessible NCBI GEO database
   ([41]https://www.ncbi.nlm.nih.gov/geo/), the [42]GSE16561 dataset,
   comprising 24 healthy control subjects and 39 ischemic stroke patients,
   underwent comprehensive analysis. For validation purposes, the
   [43]GSE22255 dataset, featuring 20 healthy control subjects and 20
   ischemic stroke patients, was employed to corroborate the machine
   learning algorithms. Moreover, a distinct single-cell dataset
   ([44]GSE157278) was retrieved from the GEO database, encompassing
   samples from 3 Maco model mice and 3 sham-operated mice.

WGCNA

   Employing the R software package “WGCNA”, we constructed a
   co-expression network of genes derived from samples of both IS patients
   and healthy control subjects. The analysis used a scale-free network
   framework built on gene expression data^[45]10. Modules highly
   correlated with IS were computed for [46]GSE16561, with genes
   displaying expression levels below 0.5 identified and excluded.
   Outliers were detected with a clustering tree approach. The “β
   soft-thresholding” function was then applied to determine the optimal
   soft-thresholding parameter β. The weighted adjacency was computed and
   subsequently transformed into a topological overlap matrix (TOM). The
   selection of pertinent modules was based on the discerned expression
   disparities between the IS and control groups.
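
   The adjacency and TOM computation described above can be illustrated
   with a minimal Python sketch. It is not the WGCNA package code: it
   assumes an unsigned network (absolute Pearson correlation raised to the
   power β) and the standard topological overlap formula, and it runs on
   made-up expression values. The function names and toy data are
   illustrative assumptions; β = 16 is the soft threshold reported in the
   Results.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(expr, beta):
    """Unsigned weighted adjacency a_ij = |cor(i, j)|^beta with a zero diagonal."""
    genes = list(expr)
    n = len(genes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[genes[i]], expr[genes[j]])) ** beta
            adj[i][j] = adj[j][i] = a
    return genes, adj

def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity; the diagonal is zero
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Hypothetical expression values (5 samples per gene), for illustration only.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],  # closely tracks geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to both
}
genes, adj = adjacency(toy_expr, beta=16)
tom = topological_overlap(adj)
print(genes)
print([[round(v, 3) for v in row] for row in tom])
```

   In this toy example the tightly co-expressed pair (geneA, geneB)
   receives a much higher topological overlap than either gene has with
   geneC, which is the property that module detection relies on.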

Co-expression genes set enrichment analysis

   The functional gene sets, encompassing c2.cp.reactome, c5.go.bp,
   c5.go.cc, and c5.go.mf, were retrieved from the MSigDB database
   ([47]https://www.gsea-msigdb.org/gsea/msigdb/index.jsp). To perform
   gene set enrichment analysis on the co-expression genes, we employed
   the R package clusterProfiler. The threshold for enrichment
   significance was defined as P < 0.05, coupled with |NES|>1.

Selection of machine learning algorithms and identification of feature genes

   We opted for widely employed machine learning algorithms, specifically
   Elastic Net, Lasso, Ridge regression, and Random Forest^[48]11,[49]12,
   to conduct our analysis. The expression profiles of [50]GSE16561 were
   meticulously examined through the application of these four algorithms.
   To gauge their diagnostic accuracy, each algorithm’s performance was
   evaluated by constructing AUC-ROC curves across all four methods. The
   machine learning algorithm displaying the highest AUC value was
   subsequently subjected to a normalized confusion matrix analysis to
   assess its diagnostic efficacy. Ultimately, we validated the precision
   of the algorithms using the [51]GSE22255 validation dataset,
   culminating in the identification of the specific machine learning
   algorithm tailored for the recognition of feature genes.
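
   The two evaluation quantities named above can be sketched in plain
   Python. This illustrates how such metrics are computed in general and
   is not the code used in this study: the AUC is obtained as the
   probability that a randomly chosen stroke sample scores higher than a
   randomly chosen control, and the confusion matrix is normalized by row
   (true class). The labels, scores, and 0.5 decision threshold are
   hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    Assumes labels are 0 (control) or 1 (stroke) and both classes are present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def normalized_confusion(labels, predictions):
    """Row-normalized 2x2 matrix: rows are true classes, columns are predictions."""
    counts = [[0, 0], [0, 0]]
    for y, p in zip(labels, predictions):
        counts[y][p] += 1
    return [[c / sum(row) for c in row] for row in counts]

# Hypothetical classifier output for six samples, for illustration only.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30, 0.90]
predictions = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)
matrix = normalized_confusion(labels, predictions)
print(round(auc, 3), matrix)
```

   An AUC of 1 corresponds to every stroke sample scoring above every
   control sample; in the toy data one stroke sample scores below two
   controls, so the AUC is 7/9.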

Single-cell sequencing data analysis

   The [52]GSE174574 single-cell transcriptome sequencing dataset,
   encompassing blood samples from ischemic stroke cases, was downloaded.
   This dataset comprises three samples from the middle cerebral artery
   occlusion (Maco) model of ischemic stroke in mice, along with three
   sham-operated control samples. The R package Seurat was used to analyze
   the single-cell sequencing data. Genes detected in fewer than three
   cells and cells expressing fewer than 200 genes were filtered out, and
   cells expressing more than 2500 genes or with a mitochondrial gene
   fraction above 10% were excluded^[53]13. Principal component analysis
   was used to reduce the dimensionality of the data, and the harmony
   package was used for sample integration. Cells were clustered with the
   FindClusters function at a resolution of 0.20. t-distributed stochastic
   neighbor embedding (t-SNE) was then applied to visualize the clustering
   outcomes^[54]14. Marker genes for each cluster were identified with the
   FindAllMarkers function. The SingleR package was used to annotate
   distinct cell subtypes and determine the proportions of the different
   cell types within the single-cell transcriptome sequencing dataset.
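
   The cell-level quality control thresholds stated above (at least 200
   and at most 2500 detected genes, at most 10% mitochondrial counts) can
   be sketched in plain Python. This is an illustration, not the Seurat
   workflow used in this study. The function names, the toy cells, and
   the assumption that mouse mitochondrial gene symbols start with "mt-"
   are introduced here for the example.

```python
def qc_metrics(counts):
    """Return (number of detected genes, percent mitochondrial counts) for one cell.

    counts: dict mapping gene symbol -> read/UMI count.
    """
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Assumption: mitochondrial genes are recognized by the "mt-" prefix.
    mito = sum(c for g, c in counts.items() if g.lower().startswith("mt-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return n_genes, pct_mito

def passes_qc(counts, min_genes=200, max_genes=2500, max_pct_mito=10.0):
    """Keep cells within the gene-count window and below the mitochondrial cutoff."""
    n_genes, pct_mito = qc_metrics(counts)
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito

def make_cell(n_genes, mito_count):
    """Hypothetical helper: n_genes genes with one count each plus mitochondrial reads."""
    cell = {f"Gene{i}": 1 for i in range(n_genes)}
    if mito_count:
        cell["mt-Nd1"] = mito_count
    return cell

cells = {
    "good_cell": make_cell(300, 10),
    "too_few_genes": make_cell(50, 0),
    "too_many_genes": make_cell(3000, 0),
    "high_mito": make_cell(300, 100),
}
kept = [name for name, counts in cells.items() if passes_qc(counts)]
print(kept)
```

   Only the first toy cell passes all three filters; the others fail on
   gene count or mitochondrial fraction.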

DEGs enrichment analysis and hub genes

   Using the FindAllMarkers function of the Seurat package,
   differentially expressed genes were identified among distinct cell
   subtypes with a non-parametric Wilcoxon rank sum test, using the
   criteria |log2FC| > 1 and P < 0.05 to define differential
   expression^[55]15. Subsequent to this, the DEGs underwent Gene Ontology
   (GO) functional enrichment analysis and KEGG^[56]16–[57]18 pathway
   enrichment analysis utilizing the DAVID website
   ([58]https://david.ncifcrf.gov/). Furthermore, an intersection was
   performed between the DEGs and the feature genes, culminating in the
   identification of Hub genes.
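
   The DEG criteria and the intersection step can be sketched as follows.
   This is a minimal illustration under stated assumptions, not the
   study's code: the gene names and statistics are placeholders, and gene
   symbols are matched by upper-casing them, which is a naive stand-in for
   mouse-to-human symbol matching that the text does not describe.

```python
def select_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return genes with |log2FC| > lfc_cutoff and P < p_cutoff.

    stats: dict mapping gene symbol -> (log2 fold change, P value).
    """
    return {
        gene for gene, (log2fc, p) in stats.items()
        if abs(log2fc) > lfc_cutoff and p < p_cutoff
    }

def hub_genes(degs, feature_genes):
    """Genes present in both lists; symbols are upper-cased before matching."""
    return {g.upper() for g in degs} & {g.upper() for g in feature_genes}

# Placeholder statistics, for illustration only.
toy_stats = {
    "GeneA": (1.4, 0.001),
    "GeneB": (-1.2, 0.010),
    "GeneC": (0.2, 0.500),  # small effect, not significant
    "GeneD": (1.5, 0.200),  # large effect, not significant
}
toy_features = {"GENEA", "GENEB", "GENEX"}  # placeholder feature genes

degs = select_degs(toy_stats)
hubs = hub_genes(degs, toy_features)
print(sorted(degs), sorted(hubs))
```

   Both up- and downregulated genes pass the absolute fold-change filter,
   and only genes found in both lists are retained as Hub genes.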

Immune infiltration and immune correlation analysis of hub genes

   Using the software packages “CIBERSORT.R” and “MCP-counter.R”, we
   estimated the abundance of distinct immune cell types from the
   expression profiles^[59]19,[60]20. Subsequently, an
   investigation into the correlation between Hub genes and immune cells
   was carried out. The “Corrplot” package was utilized to generate a
   heatmap visualizing the associations between Hub genes and immune cell
   activities.
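
   The correlation step can be illustrated with a short Python sketch.
   The text does not state which correlation coefficient was used; the
   sketch assumes Spearman rank correlation between a Hub gene's
   expression and an estimated immune cell score across samples. All
   values and names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical values across five samples, for illustration only.
hub_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
neutrophil_score = [0.10, 0.12, 0.20, 0.18, 0.30]

rho = spearman(hub_expression, neutrophil_score)
print(round(rho, 3))
```

   Repeating this for every Hub gene and immune cell type yields the
   matrix of coefficients that a correlation heatmap displays.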

Hub gene single-cell transcriptome expression analysis

   By employing the SingleR package, we performed annotation of immune
   cells within individual clusters from the Maco single-cell
   transcriptome analysis. Subsequently, we analyzed the expression
   patterns of Hub genes across diverse immune cell populations using the
   FeaturePlot and VlnPlot functions^[61]21.

Immune cell development trajectories and pseudo-time analysis

   The R package Monocle 3 is widely employed to establish single-cell
   trajectories that elucidate the progression of cellular development
   processes ([62]https://cole-trapnell-lab.github.io/monocle3).
   Leveraging the capabilities of the Monocle package, a developmental
   pseudo-time analysis was executed on Macrophages and Microglia cells
   sourced from the single-cell sequencing data of Maco mice. These cells
   were first clustered and then subjected to dimensionality reduction
   and UMAP visualization^[63]22.
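
   The general idea of pseudo-time as a distance from a chosen root cell
   can be sketched in a few lines of Python. This is a deliberate
   simplification and not the Monocle 3 algorithm: it assumes cells are
   points in a low-dimensional embedding, connects each cell to its
   nearest neighbors, and defines pseudo-time as the shortest-path
   distance from the root. The coordinates, the choice of k, and the
   function names are hypothetical.

```python
import heapq
import math

def knn_graph(points, k):
    """Undirected graph linking each point to its k nearest neighbors."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph symmetric
    return graph

def pseudotime(graph, root):
    """Shortest-path distance from the root to every cell (Dijkstra's algorithm)."""
    dist = {node: math.inf for node in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical 2D embedding of five cells, for illustration only.
embedding = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 1.0)]
graph = knn_graph(embedding, k=2)
pt = pseudotime(graph, root=0)
print({cell: round(t, 3) for cell, t in pt.items()})
```

   Cells far from the root along the graph receive larger pseudo-time
   values, and gene expression can then be examined as a function of that
   ordering.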

Results

WGCNA and selection of relevant modules

   Based on the findings of our analysis, a soft threshold of 16 (R^2 =
   0.86) was chosen to establish robust connectivity relationships (Fig.
   [64]1A, B). A representative subset comprising 400 genes was then
   randomly selected for the construction of a gene co-expression network
   heatmap, facilitating the visualization of intra-modular gene
   connectivity. Employing the framework of clustering analysis,
   predicated on gene expression levels and their interrelations within
   each module, we successfully delineated the existence of seven distinct
   co-expression modules (Fig. [65]1C, D). To deepen our understanding of
   the interrelation between these modules and IS, the correlation
   coefficients for each module were meticulously computed. This
   analytical endeavor unveiled a noteworthy positive correlation between
   the yellow module (cor = 0.65, P = 8.6e-10), the blue module
   (cor = 0.49, P = 1.7e-12), and IS (Fig. [66]1E). Upon amalgamating the
   gene content of these two modules, a comprehensive compilation yielded
   a total of 328 co-expression genes, forming the foundation for
   subsequent analytical pursuits.

Fig. 1.

   Analysis of WGCNA Modules. (A) Depiction of the scale-free topology
   under the condition of a soft threshold set to 16. (B) Examination of
   the soft thresholding process. (C) Investigation into intra-modular
   gene connectivity. (D) Heatmap representation showcasing the
   correlation between module feature genes and IS. (E) Scrutiny of module
   membership and the significance of individual genes within the modules.

Enrichment analysis of co-expressed genes

   Biological process (BP) analysis revealed a notable enrichment of
   co-expressed genes in processes including protein modification,
   developmental growth, and protein aggregation. Cellular component (CC)
   analysis showed predominant enrichment in the microtubule cytoskeleton
   and microtubules. Molecular function (MF) analysis indicated that
   co-expressed genes are principally enriched in activities such as
   peptidase activity and phospholipid binding (Fig. [69]2A–C). Furthermore,
   the Reactome pathway analysis underscores the enrichment of potential
   genes within signaling pathways associated with cell cycle regulation,
   vesicle-mediated transport, and the cellular response to stimuli (Fig.
   [70]2D).

Fig. 2.

   Gene enrichment analysis. (A) Biological processes. (B) Cellular
   components. (C) Molecular functions. (D) Reactome pathway analysis.

Performance evaluation of machine learning algorithms

   Utilizing the Elastic Net, Lasso, Ridge, and RF algorithms, we
   conducted an in-depth analysis of the expression profiles from
   [73]GSE16561. The initial phase of analysis encompassed the
   construction of Receiver Operating Characteristic (ROC) curves for all
   four algorithms. Notably, the outcomes revealed markedly elevated Area
   Under the Curve (AUC) values for Elastic Net, Lasso, and RF (AUC: 1),
   whereas Ridge Regression exhibited an AUC of 0.98 (Fig. [74]3A). As a
   result, we selected the Elastic Net, Lasso, and RF models for
   subsequent extensive investigation.

Fig. 3.

   Analysis of machine learning algorithms. (A) ROC curves for four
   algorithms. (B–D) Normalized confusion matrix for Elastic Net, Lasso
   and RF.

   Subsequent to the initial analysis, we generated normalized confusion
   matrices for the Elastic Net, Lasso, and RF algorithms. This enabled a
   meticulous comparison of the algorithmic performance in pinpointing
   feature genes. The normalized confusion matrix outcomes demonstrated
   the remarkable precision achieved by all three algorithms–Elastic Net,
   Lasso, and RF–in effectively distinguishing between control and stroke
   group genes within the expression profiles (Fig. [77]3B–D).

Independent dataset validation of machine learning algorithms

   Leveraging the gene expression profiles sourced from [78]GSE22255 as an
   autonomous validation dataset, we constructed Receiver Operating
   Characteristic (ROC) curves to assess the efficacy of the Elastic Net,
   Lasso, and RF algorithms. The results illuminated AUC values of 1 for
   both Lasso and RF, while Elastic Net displayed a commendable AUC value
   of 0.968 (Fig. [79]4). This outcome underscores the substantial
   discriminatory potential inherent within the three machine learning
   algorithms, reaffirming their proficiency in precisely discerning
   feature genes.

Fig. 4.

   Analysis of machine learning algorithm validation set.

Machine learning identification of feature genes

   Employing the Lasso, Elastic Net, and RF algorithms for feature gene
   identification, the Lasso regression algorithm ascertained 11 feature
   genes through meticulous parameter selection using the lambda.min
   criterion (Fig. [82]5A). The Elastic Net algorithm meticulously
   identified 18 promising feature genes, following a rigorous selection
   methodology (Fig. [83]5B). Utilizing gene importance scoring within the
   RF algorithm yielded the discovery of 20 distinctive feature genes
   (Fig. [84]5C). Ultimately, the integration of all three machine
   learning algorithms culminated in the comprehensive identification of
   27 feature genes collectively associated with IS.
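
   How a Lasso penalty selects feature genes can be illustrated with a
   small coordinate-descent sketch. It is a simplification under stated
   assumptions and not the study's implementation: it fits a linear
   rather than logistic Lasso, without an intercept, on centered toy
   data, and with a fixed penalty instead of one chosen by
   cross-validation as with lambda.min. Predictors whose coefficients are
   not shrunk to exactly zero are the ones the model selects. All names
   and values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    columns: list of predictor columns, each a list of n centered values.
    """
    n = len(y)
    coef = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Partial residual: remove the fit of every predictor except j.
            resid = [
                y[i] - sum(coef[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(xj[i] * resid[i] for i in range(n)) / n
            scale = sum(v * v for v in xj) / n
            coef[j] = soft_threshold(rho, lam) / scale
    return coef

# Toy data, for illustration only: y equals geneA + 0.1 * geneB.
gene_names = ["geneA", "geneB"]
columns = [
    [-2.0, -1.0, 0.0, 1.0, 2.0],
    [1.0, -1.0, 0.0, -1.0, 1.0],
]
y = [-1.9, -1.1, 0.0, 0.9, 2.1]

coef = lasso_coordinate_descent(columns, y, lam=0.2)
selected = [g for g, b in zip(gene_names, coef) if b != 0.0]
print(coef, selected)
```

   With this penalty the weak predictor geneB is shrunk to exactly zero
   and only geneA is retained, which mirrors how a Lasso model reduces a
   candidate list to a smaller set of feature genes.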

Fig. 5.

   Machine learning selection of potential biomarkers for IS. (A) Feature
   gene identification using Lasso regression analysis. (B) Feature gene
   identification based on variable importance ranking using Elastic Net
   regression algorithm. (C) Feature gene ranking according to importance
   scores using the RF algorithm.

Dimensionality reduction, clustering, and annotation of cellular subtypes in
scRNA-seq data

   We conducted an integrated analysis of single-cell transcriptomic
   sequencing data obtained from Maco mice. Upon subjecting the raw
   sequencing data to rigorous quality control procedures, we proceeded
   with data normalization, batch correction, dimensionality reduction,
   and clustering. Subsequent to the meticulous preprocessing steps, which
   included the elimination of low-quality cells from the scRNA-seq data,
   we discerned the top 10 genes of significant importance (Fig. [87]6A).
   Through a comparative assessment of the collection of highly variable
   genes within each cluster and others, we elucidated specific marker
   genes unique to individual cell subgroups. The manifestation of these
   cluster-specific marker genes is visually depicted in a heatmap
   presentation (Fig. [88]6B). Utilizing the t-Distributed Stochastic
   Neighbor Embedding (tSNE) clustering approach, we conducted cellular
   clustering based on the corresponding principal component data,
   employing clustering parameters with Dims (dimensions) set at 30 and
   Resolution at 0.5. Subsequently, cells were stratified into 13
   clusters, designated as 0 to 12 (Fig. [89]6C). Through the
   identification of marker genes, we performed cell type annotation for
   the 13 clusters and subsequently organized them into 8 distinct
   cellular subtypes (Fig. [90]6D). Remarkably, noteworthy variations in
   the distribution proportions of each cellular cluster were observed
   between the sham-operated and Maco groups. Among the eight distinct
   cellular subtype groups, Microglia, Monocytes, and Astrocytes displayed
   heightened proportions within the Maco group (Fig. [91]6E).

Fig. 6.

   Cellular types and distribution in maco mouse scRNA-seq data. (A)
   Variance plot displaying a total of 11,990 genes across all cells, with
   the red dots indicating 2000 highly variable genes. (B) Heatmap
   illustrating the expression of cell markers within cellular subtypes.
   (C) t-SNE clustering visualization analysis of cells, resulting in the
   identification of 13 distinct cell clusters. (D) Cellular subtypes
   annotated through marker gene characterization. Enumeration and
   distribution of cell types. (E) Distribution of cellular types between
   the Maco and sham-operated groups.

DEGs enrichment analysis and hub genes

   Through the analysis of cellular clusters in the Maco scRNA-seq
   dataset, a comprehensive identification of 1764 differentially
   expressed genes was achieved (Fig. [94]7A). Our GO enrichment analysis
   unveiled a predominant association of these differentially expressed
   genes with functionalities encompassing inflammatory cell migration and
   immunoglobulin binding (Fig. [95]7B). The KEGG analysis further
   illustrated the engagement of these genes in governing pathways such as
   Leukocyte transendothelial migration and Oxidative phosphorylation,
   thereby underlining a significant nexus between immune response,
   inflammatory infiltration, and IS (Fig. [96]7C). Subsequently, we
   conducted a meticulous overlap analysis between the differentially
   expressed genes obtained from the single-cell transcriptome and the
   feature genes identified through machine learning. This intersection
   brought to light three pivotal Hub genes: RNF13, VASP, and CD163.

Fig. 7.

   Differential gene expression and enrichment analysis in maco mouse
   scRNA-seq. (A) Distribution of differentially expressed genes within
   distinct cell clusters, with red indicating upregulated genes and blue
   indicating downregulated genes. (B) GO analysis. (C) KEGG analysis.

Immune infiltration correlation of hub genes

   Immune infiltration in the [99]GSE16561 dataset was analyzed with the
   CIBERSORT and MCP-counter algorithms. The CIBERSORT analysis revealed
   notable increases in the proportions of Macrophages M0, Macrophages
   M2, resting Mast cells, Neutrophils, and gamma delta T cells in the
   stroke group compared with the control group (Fig. [100]8A). It also
   revealed noteworthy correlations between the three Hub genes (RNF13,
   VASP, and CD163) and specific immune cells, including Plasma cells,
   resting Mast cells, Neutrophils, and CD8-positive T cells (Fig.
   [101]8B–D).

Fig. 8.

   Immunological analysis using CIBERSORT. (A) Differential distribution
   of immune cells between control and stroke samples in the expression
   profiles. (B–D) Correlation analysis between Hub genes and
   immune-infiltrating cells.

   The results obtained through the MCP-counter algorithm revealed a
   pronounced increase in the proportions of Monocytic lineage,
   Endothelial cells, and Neutrophils infiltration within the stroke group
   compared to the control group (Fig. [104]9A). Notably, the three
   identified Hub genes—RNF13, VASP, and CD163—showed significant
   correlations with pivotal immune cell types, encompassing T cells,
   Neutrophils, B lineage cells, and Endothelial cells (Fig. [105]9B–D).

Fig. 9.

   Immunological analysis using MCP-counter. (A) Differential distribution
   of immune cells between control and stroke samples in the expression
   profiles. (B–D) Correlation analysis between Hub genes and
   immune-infiltrating cells.

Expression levels of hub genes in scRNA-seq immune cell subpopulations

   By annotating cellular subtypes within the 13 clusters using marker
   genes, we established a classification into five distinct cell type
   groups, broadly summarized as T cells, monocytes, B cells, NK cells,
   and platelets (Fig. [108]10A). Notably distinct proportions of cellular
   distribution were evident between the sham-operated and Maco groups.
   Within these five cell type groups, Macrophages and Microglia displayed
   elevated proportions in the Maco group (Fig. [109]10B). An analysis of
   Hub gene expression in immune cells unveiled that Rnf13 was primarily
   expressed within Endothelial cells and Microglia, Cd163 expression
   stood out in Macrophages and Microglia, and Vasp was detected in
   Macrophages and Stromal cells (Fig. [110]10C, D). The collective
   expression of these three Hub genes in Macrophages and Microglia
   underscores their intricate involvement in the immunomodulatory
   processes that underlie the pathogenesis of ischemic stroke.

Fig. 10.

   Expression Distribution of Hub Genes in scRNA-seq Immune Cell
   Subpopulations. (A) Enumeration and distribution of immune cell types.
   (B) Distribution of immune cell types in the Maco and sham-operated
   groups. (C) Violin plots illustrating the expression of Hub genes
   within immune cell clusters. (D) Expression distribution of Hub genes
   across different immune cell types.

Pseudo-time analysis of immune cell development trajectories

   The analysis of immune cell expression through scRNA-seq has unveiled
   the expression of all three Hub genes in both Macrophages and
   Microglia. Moreover, Macrophages and Microglia actively participate in
   immune responses and inflammatory reactions as the immune system
   progresses. To elucidate the intricate interplay between different
   developmental stages of cell trajectories and the dynamic expression
   patterns of Hub genes, we engaged Monocle to conduct developmental
   trajectory analysis and pseudo-time assessment. Initially, a
   comprehensive exploration involving dimensionality reduction and
   visualization analysis was performed utilizing UMAP. This analysis
   aimed to discern variations in cell distribution between Macrophages
   and Microglia in instances of sham surgery and Maco samples (Fig.
   [113]11A). Subsequently, leveraging the outcomes of pseudo-time
   analysis, we formulated cell trajectories. These trajectories unveiled
   two distinctive paths within Macrophages and Microglia: one spanning
   from Macrophages to Microglia, and the other signifying internal
   differentiation within the Microglia population (Fig. [114]11B, C).

Fig. 11.

   Pseudo-time series analysis. (A) UMAP-derived clustering outcomes. (B)
   UMAP visualization depicting trajectories of individual cells. Cells
   are sequenced in a pseudo-temporal arrangement, with colors
   transitioning from purple to yellow to signify varying degrees of cell
   differentiation. (C) Trajectories of differentiation and orientations
   within Macrophages and Microglia subgroups. Numeric labels denote the
   order of differentiation. (D) Alterations in expression of Hub genes
   during pseudo-time analysis. (E) Analysis of cell-gene module
   correlations, with gene module colors denoting their associations with
   cellular development.

   The pseudo-time analysis of Hub gene expression patterns showed that
   Cd163 expression remains relatively steady, while Rnf13 and Vasp
   expression declines as pseudo-time advances (Fig.
   [117]11D). Furthermore, a comprehensive heatmap analysis targeting gene
   modules linked with cell developmental trajectories was conducted. This
   analysis offered insights into the underlying regulatory mechanisms
   orchestrating the behaviors of Macrophages and Microglia (Fig.
   [118]11E). Consequently, through the integration of cell trajectory
   analysis and the dynamic expression patterns of Hub genes across
   various pseudo-time points, our work provides a substantial reference
   for attaining a deeper comprehension of the regulatory functions
   executed by immune cells in the context of the IS.

Discussion

   IS is a complex disorder influenced by multiple pathophysiological
   mechanisms. The occlusion of cerebral arteries leads to compromised
   brain oxygen delivery, resulting in inadequate glucose and energy
   provision, excitotoxicity, oxidative stress, and inflammatory
   reactions, ultimately causing cerebral parenchymal necrosis^[119]23.
   Currently, the clinical diagnosis of IS lacks highly specific and
   sensitive early markers, and treatment primarily relies on thrombolysis
   with tissue-type plasminogen activator. However, the narrow therapeutic
   window and risk of hemorrhage limit its effectiveness^[120]24.
   Therefore, identifying IS biomarkers through advanced high-throughput
   sequencing, single-cell transcriptomics, and exploring correlations
   between Hub genes and the post-IS immune microenvironment are crucial.

   WGCNA is a method used to explore gene correlations and their
   relationships with external sample traits. This approach helps identify
   correlations between gene clusters and clinical traits, as well as
   connections between genes and co-expression modules^[121]25. In our
   research, WGCNA identified two modules associated with IS. A gene
   enrichment analysis of the 328 genes within these modules revealed
   significant involvement in pathways regulating the cell cycle,
   vesicle-mediated transport, and cellular response to stimuli. Changes
   in the inflammatory response and immune microenvironment play key roles
   in the initiation and progression of IS.

   The Elastic Net, Lasso, and RF machine learning models are known for
   their efficiency, robust outcomes, and reliability, making them
   prominent in medical research^[122]26–[123]28. In this study, we used
   these models to analyze expression profiles and constructed ROC curves
   to evaluate their precision. Elastic Net, Lasso, and RF achieved an AUC
   value of 1, indicating high accuracy. Visual examination of the
   normalized confusion matrices further demonstrated their strong
   classification performance, accurately identifying feature genes. ROC
   validation on an independent set confirmed the diagnostic utility and
   accuracy of these algorithms in identifying potential biomarkers for
   IS, underscoring their substantial diagnostic and predictive
   capabilities.
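
   To make the AUC figure concrete, the following minimal sketch computes
   AUC as the probability that a randomly chosen case receives a higher
   score than a randomly chosen control. The labels and scores are
   hypothetical, and the function is a generic rank-based formulation,
   not the code used in this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical classifier scores; 1 = IS sample, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
separated = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # every case outscores every control
overlapping = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # one case falls below one control

auc_separated = roc_auc(labels, separated)
auc_overlapping = roc_auc(labels, overlapping)
```

   An AUC of 1 therefore means that every case was ranked above every
   control in the evaluated set; a single misordered case-control pair
   is enough to lower it.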

   Single-cell transcriptomic sequencing provides non-targeted
   quantification of transcripts at the individual cell level, offering
   precise gene expression resolution and insights into cellular
   identities and functions. Analysis of Maco’s single-cell transcriptomic
   data revealed that, excluding endothelial and glial cells, other cell
   types were significantly associated with inflammatory responses.
   Consistent with WGCNA’s co-expressed gene enrichment analysis,
   functional and pathway enrichment assessments of single-cell
   differentially expressed genes highlighted the involvement of immune
   and inflammatory processes in IS progression. Overlap analysis between
   machine learning-identified feature genes and scRNA-seq DEGs identified
   RNF13, VASP, and CD163 as pivotal Hub genes for further investigation.
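
   The overlap step can be sketched as a set intersection. This sketch
   assumes that a Hub gene must be selected by all three models and also
   appear among the scRNA-seq DEGs; the text does not state whether the
   feature genes were combined by intersection or union. All PLACEHOLDER
   entries are invented for illustration, and only the three Hub gene
   names come from the text.

```python
def hub_genes(feature_sets, degs):
    """Genes selected by every feature-selection model and also present among the DEGs."""
    shared = set.intersection(*(set(s) for s in feature_sets.values()))
    return sorted(shared & set(degs))

# Placeholder gene lists; only the three Hub gene names come from the text.
feature_sets = {
    "elastic_net": ["RNF13", "VASP", "CD163", "PLACEHOLDER_1"],
    "lasso": ["RNF13", "VASP", "CD163", "PLACEHOLDER_2"],
    "random_forest": ["RNF13", "VASP", "CD163", "PLACEHOLDER_1", "PLACEHOLDER_3"],
}
scrna_degs = ["CD163", "VASP", "RNF13", "PLACEHOLDER_3", "PLACEHOLDER_4"]

hubs = hub_genes(feature_sets, scrna_degs)
```

   Requiring agreement across all models is the stricter of the two
   choices; a union of feature genes would admit more candidates before
   the DEG filter is applied.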

   Ring finger protein 13 (RNF13) functions as an E3 ubiquitin ligase and
   is involved in various cellular processes^[124]29. Studies have shown
   that increased RNF13 expression induces neurite outgrowth in PC12 cells
   and is upregulated following neurite induction in B35 neuroblastoma
   cells, indicating its role in neurite outgrowth signaling^[125]30.
   Animal studies reveal that RNF13 knockout in mice disrupts
   ubiquitination of SNARE-associated proteins, reducing synaptic vesicles
   and altering neurotransmitter transmission. In a Parkinson’s disease
   model, silencing RNF13 expression decreased apoptosis-related signaling
   proteins, suggesting that targeting RNF13 may mitigate motor
   impairments and protect against neural damage by regulating
   apoptosis^[126]31–[127]33.

   Vasodilator-stimulated phosphoprotein (VASP), along with Mena and EVL,
   forms the ENA/VASP protein family and is mainly activated downstream of
   the prostacyclin receptor IP1. VASP is crucial in cellular processes
   such as deformation, migration, adhesion, and proliferation^[128]34.
   The disruption of cerebral blood flow leads to cell death in ischemic
   lesions, causing neuroinflammation and secondary tissue damage. Immune
   infiltration analysis indicates that VASP plays a role in the
   pathological progression of ischemic injury by influencing interactions
   among immune cells. VASP is involved in actin reorganization in
   macrophages and neutrophils, contributing to immune responses^[129]35.
   After IS, increased blood-brain barrier permeability correlates with
   elevated VASP phosphorylation, which is linked to higher expression of
   vascular endothelial growth factors and hypoxia-inducible factors under
   hypoxic conditions. This leads to cerebral edema and worsens ischemic
   stroke damage^[130]36.

   Cluster of Differentiation 163 (CD163), a member of the scavenger
   receptor cysteine-rich superfamily, plays a key role in pathogen
   eradication, lipid transport, homeostasis, and immune
   responses^[131]37. Predominantly found on monocytes and macrophages,
   CD163 facilitates M2-type macrophages to secrete anti-inflammatory
   cytokines, demonstrating strong anti-inflammatory effects. However, in
   inflammatory conditions, CD163 can detach, enhancing neurotoxicity by
   modulating scavenger receptor functions^[132]38. CD163 is crucial in
   the immune response to IS-induced injury. Sequencing studies by
   O’Connell et al. showed significant upregulation of CD163 in peripheral
   blood shortly after ischemic stroke onset, impacting stroke prognosis
   and autoimmune complications^[133]39,[134]40. Consistent with our
   findings, Pedragosa et al. demonstrated that IS triggers reprogramming
   of CD163 macrophage gene expression, affecting leukocyte chemotaxis and
   blood-brain barrier integrity, leading to acute-phase neurological
   dysfunction and edema^[135]41. In summary, RNF13, VASP, and CD163 are
   key contributors to IS pathology, significantly influencing its onset,
   progression, and treatment through their roles in inflammatory
   responses and immune cell regulation.

   We used CIBERSORT and MCP-counter to analyze immune cell infiltration
   in expression profiles, revealing increased T lymphocyte infiltration
   in IS and its association with the Hub genes. The balance between CD4+
   and CD8+ T lymphocytes
   closely correlates with post-stroke inflammatory response, neurological
   impairments, and changes in cellular immune function^[136]42–[137]44.
   Neutrophils, the first blood-derived immune cells to reach ischemic
   brain tissue, disrupt the blood-brain barrier, form thrombi, and
   trigger inflammation. Monocytes quickly accumulate at the injury site,
   differentiating into macrophages and dendritic cells, further
   intensifying inflammation and blood-brain barrier
   damage^[138]45,[139]46. Using scRNA-seq, we found co-expression of the
   three Hub genes among macrophages, microglia, and oligodendrocytes,
   indicating a strong link with immune microenvironment changes due to
   ischemic hypoxia in IS. Ischemic stroke activates molecular signaling
   related to stress and cell death, engaging the adaptive immune system.
   Microglia, key mediators of brain immune responses, and infiltrating
   macrophages play crucial roles in modulating intracerebral immune
   reactions and shaping the post-stroke immune
   microenvironment^[140]47,[141]48.

   Potential drug prediction and molecular docking identified compounds
   interacting with key IS-related genes. Using the CMAP dataset, we
   identified five candidates: LY-225910, cinnarizine, K-858, histamine,
   and benzthiazide. Our focus was on cinnarizine, a calcium channel
   blocker used to treat vestibular disorders^[142]49. It inhibits
   abnormal calcium ion influx, protects cellular integrity, and enhances
   vascular function, making it potentially effective for stroke and
   related conditions^[143]50. Molecular docking showed strong binding of
   cinnarizine to RNF13, VASP, and CD163, suggesting it may aid recovery
   from ischemic injuries. These results align with existing literature,
   providing a foundation for further research. Clinical validation is
   needed (detailed methods and results are in Supplementary Materials
   2–3).

   In summary, this study integrated machine learning algorithms and
   scRNA-seq analysis to identify Hub genes implicated in ischemic stroke.
   We investigated the relevance of these genes in immune infiltration and
   their distribution within immune cell populations, tracing the
   differentiation trajectories of pertinent cells. Molecular docking
   supported the potential efficacy of the identified drugs.

   These findings have significant implications for ischemic stroke
   research and treatment, highlighting the role of the immune
   microenvironment and inflammatory responses. However, the study has
   limitations. The analysis relied on GEO database data, which has
   constraints in accuracy and sample size, necessitating further
   validation with additional databases or experiments. Additionally,
   conclusions from immune cell expression analysis, cell trajectory
   exploration, and drug prediction should be corroborated by relevant
   literature and followed by in vivo and in vitro validations.

Conclusion

   In summary, our study identified RNF13, VASP, and CD163 as potential
   diagnostic biomarkers for IS that are closely linked to immune
   responses. Single-cell transcriptomic analysis showed markedly
   elevated expression of these markers in immune cell subsets such as
   macrophages and microglia, with expression also changing along cell
   differentiation trajectories. Drug prediction and molecular docking
   indicated strong binding between cinnarizine and these markers,
   positioning it as a candidate for IS therapy. These findings provide a
   framework for targeting immune cell regulation and associated pathways
   in the treatment of IS. Further study of these Hub genes may open new
   avenues for clinical intervention and mechanistic research in IS.

Electronic supplementary material

   Below is the link to the electronic supplementary material.
   [144]Supplementary Material 1 (11.8MB, docx)
   [145]Supplementary Material 2 (11.8MB, tif)
   [146]Supplementary Material 3 (44.9MB, tif)
   [147]Supplementary Material 4 (14KB, docx)

Author contributions

   YW Z: dataset analysis and original manuscript writing. XY M and XH M:
   gene list collection and data visualization. HY L and Q T: study
   design, supervision, and funding support. All authors reviewed the
   manuscript.

Funding

   This study was supported by the Scientific Research Project of
   Heilongjiang Administration of Traditional Chinese Medicine
   (ZHY2022-164) and the Heilongjiang Postdoctoral Fund Project
   (LBH-[148]Z20201).

Data availability

   The datasets analyzed during the current study are available in the GEO
   repository, [149]GSE16561:
   [150]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16561;
   [151]GSE22255:
   [152]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22255;
   [153]GSE157278:
   [154]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157278.

Declarations

Competing interests

   The authors declare no competing interests.

Footnotes

   Publisher’s note

   Springer Nature remains neutral with regard to jurisdictional claims in
   published maps and institutional affiliations.

Contributor Information

   Hongyu Li, Email: lihongyu-1991@126.com.

   Qiang Tang, Email: tangqiang1963@163.com.

References