Abstract
Analyses of long non-coding RNAs (lncRNAs) and microRNAs (miRNAs)
implicated in myocardial infarction (MI) have increased our
understanding of gene regulatory mechanisms in MI. However, it is not
known how their expression fluctuates over the different stages of MI
progression. In this study, we used time-series gene expression data to
examine global lncRNA and miRNA expression patterns during the acute
phase of MI and at three different time points thereafter. We observed
that the largest expression peak for mRNAs, lncRNAs, and miRNAs
occurred during the acute phase of MI and involved mainly
protein-coding, rather than non-coding RNAs. Functional analysis
indicated that the lncRNAs and miRNAs most sensitive to MI and most
unstable during MI progression were usually related to fewer biological
functions. Additionally, we developed a novel computational method for
identifying dysregulated competing endogenous lncRNA-miRNA-mRNA
triplets (LmiRM-CTs) during MI onset and progression. As a result, a
new panel of candidate diagnostic biomarkers defined by seven lncRNAs
was suggested to have high classification performance for patients with
or without MI, and a new panel of prognostic biomarkers defined by two
lncRNAs evidenced high discriminatory capability for MI patients who
developed heart failure from those who did not.
Keywords: non-coding RNA, myocardial infarction, competing endogenous
RNA, expression profile
INTRODUCTION
Myocardial infarction (MI) is one of the most severe coronary artery
diseases, and a leading cause of morbidity and mortality in developed
and developing countries [[37]1]. Although currently available
biomarkers such as cardiac troponins T and I and creatine kinase-MB are
valuable aids in the diagnosis of MI, novel biomarkers may
substantially increase early diagnosis accuracy to improve treatment
strategies and patient outcomes. Additionally, since the 5-year
mortality rate of the patients who developed heart failure (HF) after
MI is as high as 50% [[38]2], identifying early-stage prognostic
biomarkers associated with post-MI HF is also very important. Although
for some specific conditions the molecular mechanisms underlying MI
have been defined, the dynamic modulation of gene expression,
especially of non-coding RNAs (ncRNAs), during MI progression has not
been fully investigated at a system level.
The proportion of the human genome encoding protein-coding genes is
only ~2%, and estimations based on current sequencing methods suggest
that most of the human transcriptome is composed of ncRNAs [[39]3].
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are two important
ncRNA classes. Many lncRNAs share miRNA binding sites with other coding
and non-coding transcripts, and thus regulate the miRNA pool and
influence posttranscriptional control by acting as competing endogenous
RNAs (ceRNAs) [[40]4, [41]5]. Moreover, mounting evidence indicates
that lncRNAs and miRNAs can exert transcriptional and epigenetic
regulation, and influence the course of various diseases including MI
[[42]6–[43]9]. The potential of these ncRNAs as diagnostic and
prognostic biomarkers in MI is suggested by studies demonstrating that
dysregulated lncRNA and miRNA expression is closely associated with MI
initiation and progression [[44]10–[45]12]. For example, inhibition of
lncRNA-TUG1 was recently found to ameliorate myocardial injury and
protect against acute MI by upregulation of miR-142-3p and subsequent
suppression of HMGB1 and Rac1 expression [[46]8].
Time-series gene expression data provide more valuable information than
steady-state expression data for deciphering molecular mechanisms
mediating biological processes and disease progression [[47]13–[48]17].
For example, dynamic gene regulatory networks for human myeloid
differentiation were constructed using time-series RNA-seq and ATAC-seq
data, from which a role for PU.1 and other key transcriptional
regulators in maintaining and driving regulatory circuits specified
during human myeloid differentiation was identified [[49]18]. Whereas
the dynamic characteristics of lncRNAs and circRNAs in cardiac
differentiation have been recently explored [[50]19] and our own work
recently addressed the temporal changes in the expression, biological
function, and regulatory interactions among miRNAs, TFs, and target
genes during MI [[51]20], the expression profiles of lncRNAs and miRNAs
during MI progression remain unexplored.
To characterize dynamic patterns of lncRNA and miRNA expression and
their potential functions in MI progression, we systematically analyzed
lncRNA and miRNA expression profiles at four time points after MI.
Additionally, we propose a novel algorithm for identifying dysregulated
competing endogenous lncRNA-miRNA-mRNA triplets (LmiRM-CTs) during MI
progression, with which we identified new panels of lncRNA biomarkers
for MI diagnosis and prognosis.
RESULTS
Dynamic expression and functional characteristics of lncRNAs and miRNAs
during MI progression
We investigated the dynamic expression of lncRNAs and miRNAs in MI
patients at four time points: on admission (day 1 of MI), at discharge
(4-6 days after MI), 1 month after MI, and 6 months after MI. To this
end, we first analyzed the global expression distribution of mRNAs,
lncRNAs, and miRNAs in MI patients. As shown in [52]Figure 1A, the
average expression level of miRNAs and lncRNAs was significantly lower
than that of mRNAs (p<2.2e-16, Kolmogorov-Smirnov test), and miRNAs
displayed the lowest expression levels. This was in accordance with
current knowledge attesting lower expression of lncRNAs compared to
protein-coding genes in different tissues [[53]21]. Subsequently, we
identified significantly differentially expressed (SDE) genes between
adjacent MI stages, and calculated the percentage of SDE mRNAs,
lncRNAs, and miRNAs relative to their total numbers in expression
profiles ([54]Figure 1B). Results showed that the largest percentage of
SDE transcripts corresponded to the acute phase of MI (day 1), and
among those, mRNAs were the most abundant. This indicated that the
largest transcriptome change in MI occurs during the acute phase, and
involves mainly differential expression of protein-coding, rather than
ncRNA-coding, genes. In addition, we observed that most SDE transcripts
(65.0% of mRNAs, 72.2% of lncRNAs, and 66.7% of miRNAs) were
upregulated in the acute phase of MI ([55]Figure 1C). Interestingly,
while most SDE mRNAs were downregulated in the successive MI stages,
the expression of SDE lncRNAs and miRNAs changed in parallel, as most
of them were downregulated 4-6 days after MI, upregulated 1 month
post-MI, and downregulated again 6 months after MI.
Figure 1.
[56]Figure 1
[57]Open in a new tab
Global gene expression distribution in MI and analysis of SDE genes
during MI progression. (A) Expression distribution of mRNAs, lncRNAs,
and miRNAs in MI patients. (B) Percentage of SDE mRNAs, lncRNAs, and
miRNAs estimated by comparing expression data between adjacent stages
during MI progression. S0, control; S1, day 1 of MI; S2, 4-6 days after
MI; S3, 1 month after MI; S4, 6 months after MI. (C) Percentage of
upregulated and downregulated SDE genes during MI progression. (D)
Relative expression levels of three lncRNAs that showed differential
expression at each MI stage.
SDE lncRNAs and miRNAs were further explored to identify the most
responsive ones at each stage. As a result, three lncRNAs (AF131216.7,
LINC00052, and RP11-739B23.1), but no miRNAs, were retrieved. As shown
in [58]Figure 1D, the expression trend for these lncRNAs was consistent
during MI progression. Whereas LINC00052 has been recently associated
with certain cancers [[59]7, [60]22, [61]23], no disease associations
have been detected, to the best of our knowledge, for AF131216.7 and
RP11-739B23.1.
To further study the dynamic changes in lncRNAs and miRNAs expression
during MI progression, we classified stage-specific SDE lncRNAs and
miRNAs into 6 and 3 clusters, respectively, using Mfuzz [[62]24]. As
shown in [63]Figure 2A, lncRNAs in Clusters 1, 2,4, and 5 were all
rapidly upregulated reaching a plateau on day 1 after MI, but they had
different expression patterns in the following stages. Among all
clusters, Cluster 5 had the most dramatic expression changes at each
stage. Stage-specific variations were also noted for SDE miRNAs
([64]Figure 2B). Cluster 1 was downregulated, while Clusters 2 and 3
were upregulated on day 1 after MI, and for these three clusters
differing expression patterns ensued in subsequent stages. These
results demonstrate that changes in the expression of lncRNAs and
miRNAs during MI progression were not linear, but often evidenced
drastic transitions at different time points.
Figure 2.
[65]Figure 2
[66]Open in a new tab
Expression patterns of SDE lncRNAs and miRNAs during MI progression.
(A) SDE lncRNA clusters. (B) SDE miRNA clusters. The analysis was
performed on R using the Mfuzz package.
Next, we examined the potential biological functions of the SDE lncRNAs
and miRNAs included in the different clusters, and identified KEGG
subpathways significantly enriched with these transcripts with basis on
their co-expressed mRNAs ([67]Supplementary Table 1). Subpathways
associated with MI are shown in [68]Figure 3. For lncRNAs ([69]Figure
3A), Clusters 1 and 2 shared several common subpathways. Some of these,
closely related to MI, such as PI3K-Akt signaling pathway, MAPK
signaling pathway, Chemokine signaling pathway, T cell receptor
signaling pathway, and apoptosis, were also shared with Clusters 3, 4,
and 6. Simultaneously, we found that the lncRNAs and miRNAs with the
sharpest expression changes during MI progression participated in fewer
biological pathways. Thus, the lncRNAs in Cluster 5 enriched the fewest
subpathways, while no subpathway was found to be significantly enriched
with the miRNAs in Cluster 2 ([70]Figure 3B). These results suggested
that the lncRNAs and miRNAs most sensitive to environmental changes and
thus unstable during MI progression might be of lesser biological
relevance.
Figure 3.
[71]Figure 3
[72]Open in a new tab
Subpathway enrichment analysis of MI-related lncRNA/miRNA clusters. (A)
MI-related KEGG subpathways enriched for different lncRNA clusters. (B)
MI-related KEGG subpathways enriched for different miRNA clusters.
Identification of dysregulated LmiRM-CTs and validation of their roles in MI
We developed a novel computational method for identifying dysregulated
LmiRM-CTs in MI. This was done by integrating sample-matched expression
profiles from 73 MI patients and 46 control samples in a Gene
Expression Omnibus (GEO) dataset, and experimental verification of
regulatory interactions among mRNAs, lncRNAs, and miRNAs (see Materials
and Methods). As a result, 1,173 dysregulated LmiRM-CTs comprising 517
mRNAs, 49 lncRNAs, and 35 miRNAs were obtained ([73]Supplementary Table
2). We validated the roles of these dysregulated LmiRM-CTs in MI from
several perspectives and compared our method with the traditional one.
The latter considered a LmiRM-CT as dysregulated when the mRNA, miRNA,
and lncRNA in the LmiRM-CT satisfied the following criteria [[74]25]:
(1) they were all SDE in MI samples compared with controls; (2) the
mRNA shared a significant number of miRNA binding sites with its paired
lncRNA (hypergeometric test); (3) negative correlations existed within
miRNA-mRNA and miRNA-lncRNA pairs, and the mRNA-lncRNA interaction was
positively correlated in controls, but not in MI samples. Thus, the
traditional method yielded 941 dysregulated LmiRM-CTs, which included
427 mRNAs, 46 lncRNAs, and 32 miRNAs ([75]Supplementary Table 2), but
did not provide scores for these LmiRM-CTs.
We first analyzed the distribution of SDE and MI-related mRNAs, miRNAs,
and lncRNAs comprising dysregulated LmiRM-CTs. We found that the
proportion of both SDE and MI-related transcripts in dysregulated
LmiRM-CTs was significantly higher than in candidate LmiRM-CTs
(hypergeometric test, p<0.001 and p<0.05, respectively). SDE and
MI-related transcripts in the top 5%, 10%, 15%, 20%, 30%, 40%, and 50%,
and in the full (100%) dysregulated LmiRM-CT set were also examined. As
demonstrated in [76]Figure 4A and [77]4B, and in [78]Supplementary
Table 3, the top-ranked dysregulated LmiRM-CTs had more SDE and
MI-associated transcripts, while dysregulated LmiRM-CTs identified by
the traditional method represented a lower rate of transcripts related
to MI (5.74%) compared to our method (6.82%). [79]Figure 4B shows that
the top 15% dysregulated LmiRM-CTs exhibited the largest percentage of
MI-related transcripts. Since ceRNAs analysis is still a developing
field, newer studies on MI-associated ncRNAs aided by tools to increase
transcript profiling accuracy might help unmask additional LmiRM-CTs of
clinical relevance.
Figure 4.
[80]Figure 4
[81]Open in a new tab
Ranked distribution of SDE and MI-related transcripts. Distribution of
SDE transcripts (A) and MI-related transcripts (B) is shown for the top
5%, 10%, 15%, 20%, 30%, 40%, 50%, and 100% of dysregulated LmiRM-CTs.
(C) Significantly enriched MI-related subpathways specifically detected
by our method.
We next investigated the putative biological functions of the
dysregulated LmiRM-CTs. Using SubpathwayMiner [[82]26], 72 and 74
significant KEGG subpathways were revealed using our method and the
traditional one, respectively (p<0.05, [83]Supplementary Table 4). We
found that several MI-related pathways, such as the signal
transduction-related PI3K-Akt and MAPK pathway, the
inflammation-related chemokine signaling pathway, and the
immune-related B cell and T cell receptor signaling pathway were shared
by both methods. However, some pathways with critical roles in MI
initiation and progression, as those involving atherogenesis [[84]27],
inflammation-related leukocyte transendothelial migration, ventricular
remodeling-related TGF-β signaling [[85]28], blood pressure-related
vascular smooth muscle contraction [[86]29], and angiogenesis-related
VEGF signaling, were significantly enriched on our method, but not on
the traditional one ([87]Figure 4C and [88]Supplementary Table 4).
These observations support the strength of our approach in identifying
LmiRM-CTs dysregulated in MI.
Progression-related dysregulated LmiRM-CTs analysis reveals diagnostic and
prognostic biomarkers of MI
To identify potential diagnostic and prognostic biomarkers of MI, we
applied our method to identify progression-related dysregulated
LmiRM-CTs at the four indicated MI stages ([89]Supplementary Table 5).
Thus, 3,126 LmiRM-CTs (940 mRNAs, 59 miRNAs, and 77 lncRNAs) for day 1
of MI, 459 LmiRM-CTs (279 mRNAs, 36 miRNAs, and 55 lncRNAs) for days
4-6 after MI, 641 LmiRM-CTs (367 mRNAs, 45 miRNAs, and 58 lncRNAs) for
month 1 after MI, and 749 LmiRM-CTs (427 mRNAs, 43 miRNAs, and 57
lncRNAs) for month 6 after MI were detected. The acute phase (day 1) of
MI exhibited the largest number of dysregulated LmiRM-CTs, indicating
that the most pronounced changes in gene expression and regulatory
interactions occurred at this stage.
We next focused on the 20 lncRNAs and 15 miRNAs specific for the MI
acute phase ([90]Supplementary Figure 1). A Random Forest supervised
classification algorithm was applied and 10 lncRNAs and 7 miRNAs mostly
related to MI occurrence were selected (see Materials and Methods).
There were 2^10-1=1023 and 2^7-1=127 combinations of these lncRNAs and
miRNAs, respectively. Classification accuracies for all the
combinations were computed using the support vector machine (SVM)
classification model, and the optimal biomarkers with the highest
classification accuracy were identified. Consequently, two biomarker
panels defined respectively by 7lncRNAs ([91]AC016747.3, MIR4697HG,
RMRP, RP11-2C24.4, RP11-802E16.3, RP4-785G19.5, and TBC1D3P1-DHX40P1)
and 4 miRNAs (hsa-mir-144, hsa-mir-200b, hsa-mir-211, and hsa-mir-29a)
were defined in the discovery cohort. In the training set, 5-fold
cross-validation accuracies of 0.824 and 0.782 and AUC values of 0.859
and 0.823 were obtained, respectively, for the 7 lncRNAs and 4 miRNAs
signatures ([92]Figure 5A and [93]Supplementary Figure 2A).
Furthermore, we examined these signatures in an independent test set
([94]GSE62646) which included 28 MI patients and 14 control samples. In
this dataset, accuracies of 0.980 and 0.667 and AUC values of 0.814 and
0.700 were obtained for the 7 lncRNAs and 4 miRNAs signatures,
respectively ([95]Figure 5B and [96]Supplementary Figure 2B). We next
performed hierarchical clustering analysis using expression data of
these two panel biomarkers and 2 major sample clusters were found
([97]Figure 5C and [98]5D, and [99]Supplementary Figure 2C and
[100]2D). For the 7 lncRNAs, the rates of MI patients in the predicted
MI group were 72.4% (71/98) and 93.3% (28/30) in the training and test
sets, respectively, whereas the corresponding rates in the predicted
control group were 90.5% (19/21) and 100% (12/12), respectively. The
classification results of the 4 miRNAs were not better than those of
the 7 lncRNAs ([101]Supplementary Figure 2C and [102]2D).
Figure 5.
[103]Figure 5
[104]Open in a new tab
Classification performance of diagnostic lncRNA biomarkers for MI.
Performance evaluation of the 7 diagnostic lncRNA biomarkers in the
training (A) and test (B) sets using 5-fold cross-validation.
Hierarchical clustering heat map of the expression profiles of 7
lncRNAs in (C) the training set (119 samples) and (D) the test set (42
samples).
Additionally, we tested whether early changes in gene expression could
predict disease prognosis and distinguish patients who developed HF
after MI from those who did not. Therefore, 20 lncRNAs and 15 miRNAs
that were specifically dysregulated in the acute phase of MI were
examined. In the same way, 10 lncRNAs and 7 miRNAs mostly related to MI
prognosis were selected. It is worth noting that they were different
from those associated with MI diagnosis. Finally, two panel biomarkers
defined by 2 lncRNAs ([105]AC084018.1 and LOC100128288) and 2 miRNAs
(hsa-mir-211 and hsa-mir-214) were defined, showing respectively
accuracies of 1.000 and 0.385 and AUC values of 0.857 and 0.857 using
5-fold cross-validation ([106]Figure 6A and [107]Supplementary Figure
3A). Hierarchical clustering heat maps of the two biomarker panels are
shown in [108]Figure 6B and [109]Supplementary Figure 3B. For the 2
lncRNAs, HF patient rate in the predicted HF group was 63.6% (7/11),
whereas the corresponding rate in the predicted non-HF group was 100%
(2/2). The classification results of the 2 miRNAs were no better than
those of the 2 lncRNAs. The above results suggest that the lncRNA
biomarkers we identified had higher classification efficiency than the
miRNA biomarkers for both MI diagnosis and prognosis, and that the
lncRNA signatures reliably distinguished MI patients from controls and
MI patients who developed HF from those who did not. Additionally, we
found that most MI diagnostic lncRNA and miRNA biomarkers were
upregulated in MI patients, while all the MI prognostic lncRNA and
miRNA biomarkers were downregulated in MI patients who developed HF
([110]Figure 7 and [111]Supplementary Figure 4).
Figure 6.
[112]Figure 6
[113]Open in a new tab
Classification performance of prognostic lncRNA biomarkers for MI. (A)
Performance evaluation by 5-fold cross-validation of the 2 prognostic
lncRNA biomarkers in the training set. (B) Hierarchical clustering heat
map of 13 samples based on expression profiles of the 2 lncRNAs in the
training set.
Figure 7.
[114]Figure 7
[115]Open in a new tab
Relative expression of lncRNA biomarkers for MI diagnosis and
prognosis. (A) Relative expression of 7 lncRNAs in MI and control
samples. (B) Relative expression of 2 lncRNAs in HF and non-HF samples.
DISCUSSION
The current study explored the global dynamic expression and tentative
functions of lncRNAs and miRNAs during MI progression by systematically
analyzing MI-related time-series gene expression data. Using a novel
computational method integrating sample-matched mRNA, lncRNA, and miRNA
expression profiles, significantly dysregulated ceRNA triplets were
identified at different stages of MI progression.
We obtained lncRNA and miRNA expression profile data for dynamic
expression analysis through microarray re-annotation. Evidence shows
that about 10% to 30% of probes in microarrays designed for
protein-coding genes actually map to ncRNAs [[116]30], thus acquisition
of lncRNA expression information through re-annotation is a widely used
approach in transcriptomics studies [[117]31, [118]32]. According to a
previously described pipeline [[119]33], we extracted expression data
of lncRNAs and miRNAs directly from existing expression profiles, and
thus reduced the error. As a result, 1,282 lncRNAs and 260 miRNAs were
acquired. Currently, there are 1,913 miRNAs in the miRBase (release 22)
database and 15,779 lncRNAs in GENCODE (release 28). The ratio of
miRNAs (13.6%) we obtained from re-annotation was larger than that of
lncRNAs (8.12%).
We first retrieved candidate LmiRM-CTs from control samples (patients
with stable coronary artery disease and without a history of myocardial
infarction) and assumed that their dysfunction was associated with the
occurrence and development of MI. The extent of dysfunction of a
LmiRM-CT was assessed by integrating differential expression of the
corresponding transcripts and differential linked co-expression between
two transcripts in two biological states (consecutive MI stages).
Therefore, the method not only reflects expression changes of a single
transcript, but also reveals concurrent changes between two
transcripts. Compared with the traditional method, our method detected
a larger number of dysregulated LmiRM-CTs, and expanded the number of
MI-related biological pathways enriched by these ceRNA triplets.
Because our method was based on sample-matched mRNA, miRNA, and lncRNA
expression profiles with experimentally verified interactions within
individual miRNA-mRNA and miRNA-lncRNA duplexes, superior analysis
stringency was achieved. Interestingly, SVM classification analysis
suggested that two separate lncRNA signatures reliably distinguished MI
patients from controls and MI patients who developed HF from those who
did not. Information for these lncRNAs is scarce or absent, especially
on relation to MI. Among the dysregulated lncRNAs, RMRP was recently
found to be upregulated by hypoxia in cardiomyocytes, and to aggravate
myocardial ischemia-reperfusion injury by sponging miR-206 to target
ATG3 expression [[120]34]. In addition, upregulation of RMRP was also
reported to promote the activation of cardiac fibroblasts by regulating
miR-613 [[121]7].
In conclusion, the present study provided a system level exploration of
the landscape of dysregulated lncRNAs and miRNAs, and their
interactions with differentially expressed mRNAs, at different time
points over MI progression. Using a novel computational approach, we
present two new lncRNA biomarker panels to aid MI diagnosis and predict
HF in MI patients.
Worth noting, several limitations stand out in the present work and
merit further experimental validation. These include
underrepresentation of the actual number of miRNAs and lncRNAs
extractable from expression profiles, incomplete or biased
characterization of cell- tissue- or disease-specific regulatory
interactions between miRNA-mRNA and miRNA-lncRNA related to MI, and the
relatively small sampling size of the GEO dataset examined in this
study.
MATERIALS AND METHODS
RNA expression profiling
The mRNA expression profile data of [122]GSE59867 (based on Affymetrix
Human Gene 1.0 ST Array) was downloaded from the GEO database
([123]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse59867)
[[124]35]. The dataset included 111 patients with ST-segment elevation
MI and a control group comprised of 46 patients with stable coronary
artery disease and without a history of MI. The corresponding mRNA
expression profiles were obtained from peripheral blood mononuclear
cells at four time points: on admission (MI day 1), at discharge (4-6
days after MI), 1 month after MI, and 6 months after MI. We retained
samples that included expression profiles from all four time points,
and 73 case samples and 46 controls were obtained. Additionally, among
the 73 patients 13 had follow-up data, including 7 HF and 6 non-HF
cases.
LncRNA expression profiles were obtained by applying a lncRNA
classification pipeline based on transcript clusters in the Affymetrix
1.0 ST Arrays [[125]33]. First, we downloaded the Annotation file
(HuGene-1_0-st-v1 Transcript Cluster Annotations, CSV, Release 36) and
then mapped gene names to associated Affymetrix 1.0 ST transcript
cluster IDs. Each transcript cluster was assigned to an mRNA transcript
associated with an Ensembl gene ID and/or RefSeq transcript ID. Second,
for transcript clusters with Ensembl gene IDs, we retained those
annotated as “lincRNA”, “antisense_RNA”, “processed_transcript”,
“sense_intronic”, “TEC”, “3prime_overlapping_ncRNA”,
“bidirectional_promoter_lncRNA”, “sense_overlapping” and “non_coding”
in the GENCODE project
([126]https://www.gencodegenes.org/human/release_28.html) [[127]3]. For
transcript clusters with RefSeq transcript IDs, those labeled as “NR_”
(non-coding RNA) were retained. Finally, duplicated transcripts were
removed and lncRNA expression profiles were determined from 1,655
transcript clusters and 1,282 lncRNAs. MiRNA expression profiles were
obtained using the same pipeline as for lncRNAs. Transcript clusters
with an mRNA-assignment that included an miRNA and/or Refseq transcript
ID were retained. We unified miRNA names using miRBase (release 22)
database, and miRNA expression profiles comprising 252 transcript
clusters and 260 miRNAs were acquired.
Expression profile analyses
For multiple probes corresponding to the same gene within the mRNA,
lncRNA, and miRNA expression profiles, their median value was taken as
expression value for such gene. We then retained protein-coding genes
in mRNA expression profiles. Finally, 18,451 mRNAs, 1,282 lncRNAs, and
260 miRNAs were retained for further analysis. SDE genes were screened
by comparing expression data between two adjacent stages, and
differential analysis was performed using an empirical Bayesian method
implemented in R “limma” package [[128]36]. Genes with p<0.05 were
selected as SDE genes.
Expression and functional analysis of lncRNAs and miRNAs
LncRNAs and miRNAs were grouped according to their dynamic expression
patterns. The R package “Mfuzz” [[129]24] was used to detect
lncRNA/miRNA clusters with consistent expression trends during MI
progression. To investigate their biological functions, co-expressed
mRNAs were extracted based on a Pearson correlation coefficient
(PCC)>0.7 or <-0.7 and p<0.01. KEGG sub-pathway enrichment analysis for
co-expressed mRNAs was implemented using the R “SubpathwayMiner”
package [[130]26]. Significantly enriched sub-pathways were identified
based on p<0.05. If multiple significantly enriched sub-pathways
corresponded to an entire pathway, the sub-pathway with the lowest p
value was retained.
Regulatory interactions between miRNA-mRNA and miRNA-lncRNA duplexes
Experimentally verified miRNA-mRNA regulatory relationships were
collected from TarBase (version 6.0) [[131]37], miRTarBase (version
7.0) [[132]38], and miRecords (version 4) [[133]39] databases, and
391,694 non-redundant miRNA-mRNA interactions were obtained.
Experimentally confirmed miRNA-lncRNA interactions were retrieved from
starBase v2.0 [[134]40] and DIANA-LncBase v2.0 [[135]41] databases, and
64,716 non-redundant miRNA-lncRNA relationships were retained.
Candidate LmiRM-CTs
We assumed that LmiRM-CTs existed in control samples, and their
dysfunction would lead to initiation and progression of cardiac
diseases. Based on the ceRNA hypothesis [[136]4, [137]5], a candidate
LmiRM-CT in control samples was identified if it met all of the
following criteria: (1) the mRNA and the lncRNA shared a significant
number of miRNAs as determined by a hypergeometric test (p<0.05); (2)
the PCC of the mRNA (lncRNA) and the miRNA was negatively correlated
(p<0.05), and the PCC of the lncRNA and the mRNA was positively
correlated (p<0.05). To increase the reliability of the results, we
retained the top correlated interaction pairs for further analysis
[[138]42, [139]43]. Interaction pairs with PCCs above the threshold of
the 90^th percentile of the corresponding overall correlation
distribution were retained. By integrating expression profile data in
the control group and confirmed miRNA-mRNA and miRNA-lncRNA
interactions, 7,468 candidate LmiRM-CTs comprising 120 lncRNAs, 97
miRNAs, and 1,718 mRNAs were retained.
Identification of dysregulated LmiRM-CTs in MI
Dysregulated LmiRM-CTs in MI were identified by considering the
dysregulation extent of all genes (nodes) and their
regulatory/competing relationships (edges) in two different biological
states (i.e. case and control samples) using-sample matched expression
profiles. First, each node score was calculated through the following
formulas according to the extent of differential expression [[140]44,
[141]45]:
[MATH:
Scorenode=
φ−1(1−2×(1−φ(Dnod<
/mi>e))) :MATH]
(1)
[MATH:
Dnode=(−log<
mn>10p) ⋅|
log2FC| :MATH]
(2)
where φ^−1 is the inverse normal cumulative distribution function, p is
the p-value reflecting the significance of differential expression
determined by the R ‘limma’ package, and FC is the corresponding fold
expression change. Second, each edge score was computed according to
the change ingene co-expression between two different biological states
[[142]45–[143]47] using equations (3) and (4):
[MATH:
Scoreedge=
φ−1(1−2×(1−φ(|ξ|))) :MATH]
(3)
[MATH: ξ=F(rsta<
/mi>te2)⋅(−log<
mn>10psta
te2)−F(rsta<
/mi>te1)⋅(−log<
mn>10psta
te1)1nstate2−3+
1nsta<
/mi>te1−3 :MATH]
(4)
[MATH: F(r)=12(ln1+
r1−r) :MATH]
(5)
Here, r[state][2] and r[state][1] are the PCCs of gene expression in
state[2] and state[1] samples (i.e. case and control samples),
respectively, p[state][2] and p[state][1] are their respective
p-values, and n[state][2] and n[state][1] are the corresponding number
of samples. F is the Fisher transformation function, applied to improve
the power of identifying differentially rewired genes [[144]48].
Finally, the score of candidate LmiRM-CTs was computed by integrating
the node and the edge scores:
[MATH: Score=ω∑node∈
LmiRM−C<
mi>TScorenoden
node
+(1−ω)∑edge∈
LmiRM−C<
mi>TScoreedge
mrow>nedg
e
:MATH]
(6)
where n[node] and n[edge] are the number of nodes and edges in the
LmiRM-CT. Here, it is considered that regulatory relationships exist
for both miRNA-lncRNA and miRNA-mRNA duplexes, and competing
relationships exist between lncRNA-mRNA pairs. Therefore, a value of 3
is assigned to both n[node] and n[edge]. The weight parameter
[MATH: ω (0≤ω≤1) :MATH]
is used to control the contribution of the node and edge scores. Here,
both scores were considered equally weighted, and ω was defined as 0.5.
We performed permutation analysis to evaluate the significance of a
given LmiRM-CT. An arbitrary LmiRM-CT was generated by randomly
selecting a lncRNA, an miRNA, and an mRNA, and its score calculated
through the above equations. This process was repeated 10,000 times,
and the empirical p-value was defined as the proportion of randomly
obtained scores larger than the observed score:
[MATH:
p−value
=(Number of
mtext> Scorerandom>Score
)/10000
:MATH]
(7)
LmiRM-CTs with p<0.05 were selected as dysregulated LmiRM-CTs.
Collection of mRNAs, lncRNAs, and miRNAs related to MI
MI-related mRNAs were collected from DisGeNET (V5.0) [[145]49], a
comprehensive human gene-disease association database that integrates
many current, widely used gene-disease databases such as OMIM
[[146]50], the Genetic Association Database (GAD) [[147]51], the
Comparative Toxico genomics Database (CTD) [[148]52], the Mouse Genome
Database (MGD) [[149]53], PubMed, and Uniprot [[150]54]. We removed
repeated gene-disease entries, and 990 non-redundant MI-related mRNAs
were acquired.
MI-related lncRNAs were collected by performing a comprehensive
literature review. Relevant articles were compiled from a
experimentally confirmed human lncRNA-disease association database,
LncRNADisease (version 2.0) [[151]6] using the search phase “myocardial
infarction”, and from PubMed using the search phrase “myocardial
infarction AND (lncRNA OR long non-coding RNA)”. Each article was
manually searched for lncRNAs with aberrant expression in MI. Finally,
37 unique lncRNAs were obtained.
MI-related miRNAs were collected from a manually curated and
experimentally confirmed human miRNA-disease association database, HMDD
(version 3.1) [[152]55]. After removing redundant miRNA-disease
relations and unifying miRNA names according to the miRBase database
(release 22) [[153]56], 101 miRNAs were selected.
Identification of candidate diagnostic and prognostic biomarkers for MI
Candidate diagnostic and prognostic biomarkers for MI were identified
by applying a classification model based on SVM. This process was
performed using the R ‘e1071’ package, and the performance was
estimated by classification accuracy and the area under the receiver
operating characteristic curve (AUC) based on 5-fold cross-validation.
AUC values range from 0 to 1, with 0.5 indicating random performance
and 1.0 implying perfect predictive performance.
LncRNAs and miRNAs highly related to MI diagnosis and prognosis were
selected using a Random Forest supervised classification algorithm
[[154]57]. At each step, an importance score was computed for each
lncRNA/miRNA using the out-of-bag samples via permutation test, and the
lowest scoring thirds of the lncRNAs/miRNAs were removed. We then
reserved certain lncRNAs/miRNAs considering a balance between
classification accuracy and the number of lncRNAs/miRNAs. Finally,
classification accuracy for all combinations of the remaining
lncRNAs/miRNAs was assessed using SVM, and the optimal lncRNA/miRNA
biomarkers were obtained.
Supplementary Material
Supplementary Figures
[155]aging-12-102667-s005..pdf^ (1.1MB, pdf)
Supplementary Table 1
[156]aging-12-102667-s002..xlsx^ (33.5KB, xlsx)
Supplementary Table 2
[157]aging-12-102667-s006..xlsx^ (1.2MB, xlsx)
Supplementary Table 3
[158]aging-12-102667-s003..pdf^ (66.3KB, pdf)
Supplementary Table 4
[159]aging-12-102667-s001..xlsx^ (18.4KB, xlsx)
Supplementary Table 5
[160]aging-12-102667-s004..xlsx^ (151.9KB, xlsx)
Footnotes
AUTHOR CONTRIBUTIONS: H.S., G.Z., and Y.L. designed the research; H.S.,
H.S., and J.L. performed the research and analyzed the results; Z.B.
and X.L. analyzed the data. H.S., H.S., J.L. and J.W. wrote the paper.
All authors read and approved the final manuscript.
CONFLICTS OF INTEREST: The authors declare that they have no conflicts
of interest to disclose.
FUNDING: This work was supported by the National Natural Science
Foundation of China (Grants Nos. 31500675, 61602135, 81770480), the
Innovation Special Fund of Harbin Science and Technology Bureau of
Heilongjiang Province (2017RAQXJ203), the Postdoctoral Foundation of
Heilongjiang Province (LBH-Q17133), and the scientific research project
of Heilongjiang Province Health and Family Planning Commission (Grant
No. 2016-131). The funders had no role in study design, data collection
and analysis, interpretation, and writing of the report.
REFERENCES