Abstract

Objective

   The study aimed to explore the miRNA and mRNA biomarkers in post-stroke
   depression (PSD) and to develop a miRNA–mRNA regulatory network to
   reveal its potential pathogenesis.

Methods

   The transcriptomic expression profile was obtained from the GEO
   database using the accession numbers [29]GSE117064 (miRNAs, stroke vs.
   control) and [30]GSE76826 [mRNAs, late-onset major depressive disorder
   (MDD) vs. control]. Differentially expressed miRNAs (DE-miRNAs) were
   identified in blood samples collected from stroke patients vs. control
   using the Linear Models for Microarray Data (LIMMA) package, while the
   weighted correlation network analysis (WGCNA) revealed co-expressed
   gene modules correlated with the subject group. The intersection
   between DE-miRNAs and miRNAs identified by WGCNA was defined as
   stroke-related miRNAs, whose target mRNAs were stroke-related genes
   with the prediction based on three databases (miRDB, miRTarBase, and
   TargetScan). Using the [31]GSE76826 dataset, the differentially
   expressed genes (DEGs) were identified. Overlapped DEGs between
   stroke-related genes and DEGs in late-onset MDD were retrieved, and
   these were potential mRNA biomarkers in PSD. With the overlapped DEGs,
   three machine-learning methods were employed to identify gene
   signatures for PSD, which were established with the intersection of
   gene sets identified by each algorithm. Based on the gene signatures,
   the upstream miRNAs were predicted, and a miRNA–mRNA network was
   constructed.

Results

   Using the [32]GSE117064 dataset, we retrieved a total of 667 DE-miRNAs,
   which included 420 upregulated and 247 downregulated ones. Meanwhile,
   WGCNA identified two modules (blue and brown) that were significantly
   correlated with the subject group. A total of 117 stroke-related miRNAs
   were identified with the intersection of DE-miRNAs and WGCNA-related
   ones. Based on the miRNA-mRNA databases, we identified a list of 2,387
   stroke-related genes, among which 99 DEGs in MDD were also embedded.
   Based on the 99 overlapped DEGs, we identified three gene signatures
   (SPATA2, ZNF208, and YTHDC1) using three machine-learning classifiers.
   Predictions of the three mRNAs highlight four miRNAs as follows:
   miR-6883-5p, miR-6873-3p, miR-4776-3p, and miR-6738-3p. Subsequently, a
   miRNA–mRNA network was developed.

Conclusion

   The study highlighted gene signatures for PSD with three genes (SPATA2,
   ZNF208, and YTHDC1) and four upstream miRNAs (miR-6883-5p, miR-6873-3p,
   miR-4776-3p, and miR-6738-3p). These biomarkers could further our
   understanding of the pathogenesis of PSD.

   Keywords: post-stroke depression, gene signatures, miRNA–mRNA network,
   machine-learning, diagnostics-clinical characteristic

Graphical Abstract

   [33]graphic file with name fneur-14-1096911-g0001.jpg

1. Introduction

   Post-stroke depression (PSD) is a common complication after a stroke,
   affecting up to one-third of stroke survivors ([34]1). It compromises
   individual functional recovery, impairs the quality of life, and
   increases the burden on the healthcare system ([35]2). It was observed
   that depressive symptoms were negatively correlated with functional
   recovery ([36]3) and related to higher mortality ([37]4). However, the
   pathogenesis of PSD remains elusive to date.

   MicroRNAs (miRNAs) are a group of small non-coding RNAs with
   downregulative activity on post-transcriptional gene expression by
   binding to the 3′ untranslated regions of target mRNAs ([38]5, [39]6).
   Profiles of miRNA expression could be as useful as mRNA in diagnosis
   and prognosis ([40]7). Of note, miRNAs can be secreted into body
   fluids, including peripheral blood and urine, which can be
   non-invasively accessed for detection ([41]8). As such, serum miRNAs
   have been widely studied in various diseases, including stroke as
   potential biomarkers ([42]9–[43]12). With the development of RNA
   sequencing technology, data sharing, and machine-learning methods, the
   identification of feature serum miRNAs and the construction of related
   signatures in a large number of subjects have become practical.
   According to previous studies, serum miRNA-based signatures could
   effectively predict the risk of strokes in healthy individuals ([44]13)
   and clinical outcomes ([45]14, [46]15) in patients with neurological
   tumors. Moreover, miRNAs are functionally involved in numerous
   biological processes, including cellular metabolism ([47]16),
   cell-cycle regulation ([48]17), and immune response ([49]18).
   Therefore, we hypothesized that miRNAs and their target mRNAs could be
   potential biomarkers implicated in the pathogenesis of PSD.

   With the advancement of the machine-learning methods, the
   identification of relevant biomarkers becomes practical in the big data
   era. The least absolute shrinkage and selection operator (LASSO), a
   regression analysis algorithm, uses regularization to improve
   prediction accuracy ([50]19). The support vector machine (SVM) is a
   supervised machine-learning technique widely utilized for both
   classification and regression ([51]20). Random forest (RF) is
   considered the most accepted group classification technique because of
   having excellent features such as variable importance measure and
   out-of-bag error ([52]21). In this study, we aimed to explore the miRNA
   as well as mRNA biomarkers in PSD and to construct a miRNA–mRNA
   regulatory network to reveal its potential pathogenesis using a
   machine-learning approach.

2. Materials and methods

2.1. Data source

   The miRNAs expression profile was obtained from the GEO database
   ([53]https://www.ncbi.nlm.nih.gov/geo/) using the accession number
   [54]GSE117064, while mRNA expressions of blood samples from patients
   with late-onset MDD and controls were embedded in [55]GSE76826. A total
   of 1,785 serum samples were incorporated into the dataset, which
   consists of 173 samples of patients with stroke and 1,612 controls.
   Extraction, detection, and data processing of serum miRNAs were
   provided in the previous report ([56]13). Stroke was diagnosed based on
   physical and neurological examinations supplemented with brain imaging
   data, including computed tomography and/or magnetic resonance imaging,
   while healthy controls were defined as having no history of stroke and
   negative on medical checkup in a clinic ([57]13). In [58]GSE76826,
   there were 12 blood samples of late-onset MDD and 10 samples of
   controls. Late-onset MDD was defined according to the DSM-IV diagnosis
   and age of ≥ 50 years ([59]22).

2.2. Differential analysis, WGCNA, and identification of stroke-related genes

   The linear models for microarray data (LIMMA) package ([60]23) in R
   software was applied to extract differentially expressed miRNAs
   (DE-miRNAs) between stroke and control samples. The p-value was
   adjusted with the false discovery rate (FDR) ([61]24). An FDR of <0.05
   and |FC| of > 1 were set as the threshold for DE-miRNAs. The
   visualization of differential analysis was presented with a heatmap and
   a volcano plot. Using [62]GSE117064, the weighted gene correlation
   network analysis (WGCNA) ([63]25, [64]26) was performed to construct a
   co-expression network to identify hub miRNA modules using the “WGCNA”
   package. Filtered miRNAs were employed to construct a scale-free
   network by calculating the connection strength between miRNAs. We
   assessed the correlation among miRNA modules as well as their
   correlations to the clinical group (stroke vs. control). Subsequently,
   DE-miRNAs incorporated in the stroke-related modules were identified as
   stroke-related miRNAs, and their target genes were predicted using
   miRDB ([65]https://mirdb.org/), miRTarBase
   ([66]https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2022/php/ind
   ex.php), and TargetScan ([67]https://www.targetscan.org/vert_80/).

2.3. Identification of overlapped DEGs in post-stroke depression

   Overlapped DEGs for post-stroke depression were defined as aberrantly
   expressed mRNAs in the blood sample collected from stroke patients as
   well as late-onset MDD. In other words, these overlapped DEGs were
   implicated in two diseases simultaneously. Differentially expressed
   mRNAs (DE-mRNAs) for late-onset MDD were defined in a similar manner
   with an FDR of <0.05 and |FC| of > 0.5. Among these DE-mRNAs,
   stroke-related genes were selected to identify overlapped DEGs for
   post-stroke depression. With these overlapped DEGs, the clusterProfiler
   R package ([68]27) was utilized to perform the gene ontology (GO) terms
   and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway
   enrichment analysis on the predicted target genes. Three categories
   were included in the GO enrichment analysis, i.e., biological process
   (BP), cellular component (CC), and molecular function (MF).

2.4. Gene signatures and a miRNA–mRNA regulatory network for PSD

   The least absolute shrinkage and selection operator (LASSO), random
   forest (RF), and support vector machine (SVM) algorithms were utilized
   to build gene signatures for PSD using the overlapped DEGs. Gene
   signatures were established with the intersection of gene sets
   identified by each algorithm. The receiver operating characteristic
   (ROC) curves were mapped for the identified gene signatures, where the
   area under curves (AUCs) was calculated as an indicator of
   classification. The AUCs of > 0.8 were considered excellent
   classification, while AUCs of > 0.7 were considered acceptable. By
   matching miRNA–mRNA pairs in multiple databases (miRDB, miRTarBase, and
   TargetScan), the upstream miRNAs were predicted using the gene
   signatures for PSD. Subsequently, a potential miRNA–mRNA regulatory
   network for PSD was established.

3. Results

3.1. Differential analysis, WGCNA, and identification of stroke-related genes

   The overall analysis of the study was presented in the [69]Graphical
   abstract. Using the LIMMA package, we retrieved a total of 667
   DE-miRNAs, which included 420 upregulated and 247 downregulated ones.
   The feature DE-miRNAs with the absolute value of logFC of >2 were
   marked in green in the volcano plot ([70]Figure 1A). Meanwhile, a
   heatmap showing the top 100 DE-miRNAs is presented in [71]Figure 1B.
   The details of these DE-miRNAs can be found in [72]Supplementary
   material 1.

Figure 1.

   [73]Figure 1
   [74]Open in a new tab

   Differential analysis of miRNA expression profiles in stroke vs.
   control: (A) a volcano plot and (B) a heatmap with top 100 DE-miRNAs.

   We performed the hierarchical clustering of the samples ([75]Figure
   2A), and the soft-thresholding power was set at 5 with the cutoff score
   of Scale-free R^2 being 0.9 ([76]Figure 2B). The clustering dendrograms
   of the sample identified five modules ([77]Figure 2C) and their
   correlations with clinical groups were presented in the heatmap plot
   ([78]Figure 2D). Subsequently, we selected the blue and brown modules
   with the highest correlation coefficient for downstream analysis. The
   scatter plots in [79]Figures 2E, [80]F show a significant correlation
   between gene significance and module memberships in the aforementioned
   modules, while the details can be accessed in [81]Supplementary
   material 2.

Figure 2.

   [82]Figure 2
   [83]Open in a new tab

   WGCNA identified two stroke-related miRNA modules: (A) a sample
   clustering tree; (B) soft threshold determination; (C) cluster
   dendrogram; (D) a module-trait correlation heatmap; (E) a scatter plot
   presenting correlation of module membership vs. gene significance in
   the blue module; and (F) a scatter plot presenting correlation of
   module membership vs. gene significance in the brown module.

   Stroke-related miRNAs were identified with the intersection of
   DE-miRNAs and modules of interest found in WGCNA, leading to the
   identification of 117 miRNAs ([84]Figure 3A). A total of 2,387 target
   mRNAs were predicted with these stroke-related miRNAs according to the
   databases. Thus, these mRNAs were identified as stroke-related genes.

Figure 3.

   [85]Figure 3
   [86]Open in a new tab

   Identification of potential DEGs in PSD and enrichment analysis: (A)
   identification of stroke-related miRNAs; (B) identification of
   overlapped DEGs in PSD; (C) GO enrichment analysis; and (D) KEGG
   pathway analysis.

3.2. Identification of overlapped DEGs in post-stroke depression

   Using mRNA expression profiles in [87]GSE76826, we identified 641 DEGs
   with 10 samples from late-onset MDD and 12 controls. To find potential
   biomarkers in PSD, we intersected stroke-related genes with DEGs
   observed in MDD patients. As a result, 99 DEGs were identified
   ([88]Figure 3B). The list of the 99 DEGs can be accessed in
   [89]Supplementary material 3.

   Thereafter, using clusterProfiler in R, GO functional and KEGG
   enrichment analyses were performed on the 99 DEGs to further understand
   their biological functions. As represented in [90]Figure 3C, the
   biological process (BP) was significantly enriched in T cell
   differentiation in the thymus, mononuclear cell differentiation,
   cellular response to glucose starvation, regulation of striated muscle
   cell differentiation, regulation of lymphocyte apoptotic process; the
   cellular component (CC) was particularly enriched in the cytoplasmic
   side of the plasma membrane, cytoplasmic side of the membrane, nuclear
   speck, podosome, heterotrimeric G-protein complex; and molecular
   function (MF) was mainly enriched in DNA-binding transcription
   activator activity, RNA polymerase II-specific, DNA-binding
   transcription activator activity, cytokine receptor binding,
   platelet-derived growth factor receptor binding, and mitogen-activated
   protein (MAP) kinase activity.

   Furthermore, the KEGG pathway analysis of the 99 DEGs is shown in
   [91]Figure 3D. Among all the pathways enriched, the top five most
   significant pathways were as follows: sphingolipid signaling pathway,
   EGFR tyrosine kinase inhibitor resistance, human cytomegalovirus
   infection, FoxO signaling pathway, and MAPK signaling pathway. Among
   them, MAPK signaling pathways have been associated with the
   pathophysiology of PSD in several studies ([92]28–[93]30).

   The PPI network of the 99 DEGs was constructed by the STRING online
   database with high confidence of >0.4 applied. The disconnected nodes
   (genes) were removed from the PPI network ([94]Figure 4A). The network
   was then presented using the cytoHubba tool in Cytoscape ([95]Figure
   4B); the top 10 hub genes were as follows: TP53, MAPK14, VEGFA, GPR29,
   CD40LG, SMAD3, GNAQ, PTEN, IL7R, and IL6R.

Figure 4.

   [96]Figure 4
   [97]Open in a new tab

   Protein–protein interaction (PPI) network using the overlapped DEGs.
   (A) PPI network based on the STRING database; (B) representation of the
   network using Cytoscape.

3.3. Selection of feature mRNA by machine-learning algorithms and
construction of a miRNA–mRNA network

   For a further selection of the mRNA features for post-stroke
   depression, we used the LASSO algorithm to identify a set of 13 mRNAs
   ([98]Figure 5A), the SVM algorithm to select a set of 10 mRNAs
   ([99]Figures 5B, [100]C), and the RF algorithm to select a set of 23
   mRNAs ([101]Figure 5D). Specifically, a total of 99 genes were selected
   by the SVM algorithm to construct the classification model using a
   10-fold cross-validation. When 10 genes were selected, the accuracy of
   the model was the highest ([102]Figures 5B, [103]C). The RF algorithm
   screened out a total of 99 genes, and the top 23 genes with positive
   values of importance were selected ([104]Supplementary material 4).

Figure 5.

   [105]Figure 5
   [106]Open in a new tab

   Machine-learning methods identified three hub genes in PSD: (A) 10-fold
   cross-validation for tuning parameter selection in the LASSO model,
   where LASSO identified 13 mRNAs; (B) accuracy of the SVM algorithm; (C)
   error of the estimate generation for the SVM algorithm; (D)
   relationship between model error rate and number of trees for the RF
   algorithm; and (E) feature selection with the intersection of results
   from LASSO, SVM, and RF algorithms.

   RF: The RF algorithm screened out a total of 99 genes, and the top 23
   genes with positive values of importance were selected (datasheet
   attached). After combining the mRNAs screened out via the LASSO, SVM,
   and RF algorithms, three diagnostic mRNAs (SPATA2, ZNF208, and YTHDC1)
   were identified for post-stroke depression ([107]Figure 5E). These
   genes were also validated using the ROC curves with AUCs > 0.89
   ([108]Figures 6A–[109]C). The prediction of these genes revealed a
   panel of four miRNAs, with which the genes form a miRNA–mRNA regulatory
   network, as presented in [110]Figure 6D.

Figure 6.

   [111]Figure 6
   [112]Open in a new tab

   ROC validation of three hub genes (A) SPATA2, (B) YTHDC1, (C) ZNF208
   and construction of a miRNA–mRNA network (D).

4. Discussion

   In the present study, we identified stroke-related genes through
   differential analysis, WGCNA, as well as target prediction via miRNAs.
   Subsequently, 99 overlapped DEGs were identified in late-onset MDD to
   reveal potential biomarkers in PSD. Enrichment analysis revealed that
   these genes were implicated in pathways related to sphingolipid
   signaling, EGFR tyrosine kinase inhibitor resistance, human
   cytomegalovirus infection, FoxO signaling, and MAPK signaling.
   Furthermore, three machine-learning algorithms were employed to explore
   gene signatures for PSD, which was validated with the ROC curves. At
   last, a miRNA–mRNA network was constructed.

   Our study highlighted gene signatures for PSD with three genes: SPATA2,
   ZNF208, and YTHDC1. However, there were no reports on their role in
   PSD. SPATA2 enables signaling receptor complex adaptor activity and
   ubiquitin-specific protease binding activity ([113]31). SPATA2 is
   involved in several processes, including protein deubiquitination,
   necroptotic process, and tumor necrosis factor-mediated signaling
   pathway ([114]32, [115]33). The knockdown of SPATA2 leads to the
   activation of P38MAPK and NLRP3 inflammasome and the enhancement of
   NF-κB signaling, indicating that SPATA2 plays a protective role against
   brain inflammation induced by ischemia/reperfusion injury ([116]34).
   ZNF208 polymorphisms were observed to be associated with ischemic
   stroke in a Chinese Han population ([117]35); however, no other reports
   concerning its role in stroke or depression were reported. An
   increasing number of studies have shown that YTHDC1, an important
   N6-methyladenosine (m6A) reader, plays a key role in multiple
   biological functions as well as in disease progression. It was observed
   that YTHDC1 could be protective against ischemic stroke by enhancing
   Akt phosphorylation via destabilizing PTEN mRNA ([118]36). Therefore,
   it could be a potential therapeutic target for ischemic stroke. With
   the gene signatures for PSD, we predicted a list of four miRNAs
   (miR-6883-5p, miR-6873-3p, miR-4776-3p, and miR-6738-3p) based on the
   databases. Subsequently, a miRNA–mRNA network was developed, and it
   could shed light on the pathogenesis of PSD.

   Emerging studies have investigated the role of miRNA–mRNA networks in
   the pathogenesis and progression of diseases, such as HBV-related
   hepatocellular carcinoma ([119]37), stroke due to atrial fibrillation
   ([120]38), and MDD in ovarian cancer patients ([121]39). These studies
   revealed potential mechanisms by which the existing risk factor or
   disease contributes to the development of specific complications. In
   the case of PSD, datasets were exploited to find overlapped DEGs in
   stroke and late-onset MDD. Using three machine-learning classifiers, we
   further selected three feature genes and four upstream miRNAs, which
   could be potential targets for PSD treatment. To the best of our
   knowledge, the present study was the first to depict a miRNA–mRNA
   network for PSD; further investigations with a focus on the biological
   functions of these miRNAs and mRNAs are necessary. Our study suffered
   from a lack of wet lab validation, which may be the major limitation.
   However, the role of the miRNAs and mRNAs on PSD was first reported in
   our study, which could be of interest to further studies.

5. Conclusion

   Our study highlighted gene signatures for PSD with three genes: SPATA2,
   ZNF208, and YTHDC1; their upstream miRNAs were predicted as follows:
   miR-6883-5p, miR-6873-3p, miR-4776-3p, and miR-6738-3p. The miRNA–mRNA
   network was constructed, and these biomarkers could further our
   understanding of the pathogenesis of PSD.

Data availability statement

   The original contributions presented in the study are included in the
   article/[122]Supplementary material, further inquiries can be directed
   to the corresponding authors.

Ethics statement

   Ethical review and approval was not required for the study on human
   participants in accordance with the local legislation and institutional
   requirements. Written informed consent from the patients/participants
   or patients/participants' legal guardian/next of kin was not required
   to participate in this study in accordance with the national
   legislation and the institutional requirements.

Author contributions

   HQ, YM, and YS: data curation, formal analysis, roles/writing—original
   draft, and writing—review and editing. LS: data curation. YM and YS:
   conceptualization, design, and methodology. All authors contributed to
   the article and approved the submitted version.

Funding Statement

   The study was funded by the National Key R&D Program of China
   (2022YFC2009700 and 2022YFC2009701) and Basic Science (Natural Science)
   Research Project of Higher Education Institutions in Jiangsu Province
   (No. 23KJB320001).

Conflict of interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Publisher's note

   All claims expressed in this article are solely those of the authors
   and do not necessarily represent those of their affiliated
   organizations, or those of the publisher, the editors and the
   reviewers. Any product that may be evaluated in this article, or claim
   that may be made by its manufacturer, is not guaranteed or endorsed by
   the publisher.

Supplementary material

   The Supplementary Material for this article can be found online at:
   [123]https://www.frontiersin.org/articles/10.3389/fneur.2023.1096911/fu
   ll#supplementary-material
   Supplementary material 1

   DE-miRNA list retrieved from [124]GSE117064.
   [125]Click here for additional data file.^ (54.7KB, XLS)
   Supplementary material 2

   Correlations between gene significance and module membership.
   [126]Click here for additional data file.^ (227KB, XLS)
   Supplementary material 3

   List of 99 DEGs implicated in PSD.
   [127]Click here for additional data file.^ (638B, XLS)
   Supplementary material 4

   Gene list generated by the RF algorithm.
   [128]Click here for additional data file.^ (1.3KB, XLS)

References