Abstract
The activity of miRNA varies across different cell populations and
systems, as part of the mechanisms that distinguish cell types and
roles in living organisms and in human health and disease. Typically,
miRNA regulation drives changes in the composition and levels of
protein-coding RNA and of lncRNA, with targets being down-regulated
when miRNAs are active. The term “miRNA activity" is used to refer to
this transcriptional effect of miRNAs. This study introduces
miTEA-HiRes, a method designed to facilitate the evaluation of miRNA
activity at high resolution. The method applies to single-cell
transcriptomics, type-specific single-cell populations, and spatial
transcriptomics data. By comparing different conditions, differential
miRNA activity is inferred. For instance, miTEA-HiRes analysis of
peripheral blood mononuclear cells comparing Multiple Sclerosis
patients to control groups revealed differential activity of miR-20a-5p
and others, consistent with the literature on miRNA underexpression in
Multiple Sclerosis. We also show miR-519a-3p differential activity in
specific cell populations.
Subject terms: Statistical methods, Software, Computational models
__________________________________________________________________
This study introduces miTEA-HiRes, a method designed to facilitate the
evaluation of miRNA activity at high resolution. The method applies to
single cell transcriptomics, type-specific single cell populations, and
spatial transcriptomics data.
Introduction
MicroRNAs (miRNAs) are small RNAs (sRNAs) that operate by
post-transcriptionally repressing their target genes, either by
inducing messenger RNA (mRNA) deadenylation and degradation or by
inhibiting translation^[30]1,[31]2. Many studies have shown that
overexpression of miRNAs reduces the expression level of their
targets^[32]2–[33]6 and their protein levels^[34]3,[35]4,[36]7, while
miRNAs knockdown elevates the expression levels of their
targets^[37]2,[38]5,[39]6. miRNAs have a vital effect on many
developmental and disease processes, making them potential theraputic
targets in cancer^[40]8–[41]12, cardiovascular disease^[42]13–[43]15,
neurodegenerative diseases^[44]16, diabetes mellitus^[45]17, achute
ischemmic disease^[46]18, inflammatory disease^[47]19,
depression^[48]20 and more^[49]21–[50]25.
In the last twenty years, there has been significant progress in
single-cell RNA-sequencing, resulting in data and insight for different
cell types and across systems. This advancement has facilitated a
thorough investigation of cell diversity in developmental and disease
processes, relying almost exclusively on mRNAs and long non-coding RNAs
(lncRNAs)^[51]26,[52]27. The detection of miRNAs in single cells,
however, is one step behind. Early attempts relied on
PCR^[53]28–[54]32, nanomotors^[55]33, atomic force microscopy^[56]34,
isothermal amplification methods^[57]35–[58]37, or FISH^[59]38, all of
which are limited in the number of measured miRNAs. Small RNA single
cell sequencing was first attempted by ref. ^[60]39. Then, other
methods emerged for the simultaneous sequencing of sRNAs and
mRNAs^[61]40–[62]43. However, these are very limited in the number of
cells profiled. The concept of adding polyA tails to nonpolyadenylated
RNA strands, which was first introduced by ref. ^[63]44 for sequencing
of small (<150 bp) DNAs or RNAs, has later matured into the methods now
known as “totalRNA single cell sequencing". This approach detects RNA
strands of all lengths by adding polyA tails to nonpolyadenylated RNA,
either before or after fragementation^[64]45–[65]47.
In the past decade, high-throughput spatial transcriptomics methods
were developed^[66]48. Through spatially resolved gene expression, we
can now not only identify biological processes in a tissue, but also
pinpoint these processes to an exact location of the
tissue^[67]48–[68]52. Traditionally, these methods focused entirely on
mRNAs and lncRNAs. However, very recently the STRS (spatial total
RNA-sequencing) method was introduced^[69]53. STRS employs the same
polyadenylation concept (mentioned earlier for single cell) to slices
of tissue, resulting in a gene expression spatial map for genes of all
lengths.
The efforts to measure miRNA expression in single-cell and spatially
are important and would probably soon mature and become widespread.
However, they do not immediately solve the open question of expression
versus activity of miRNAs in a natural setting. The relationship
between miRNA expression and its level of activity, that is- the extent
of target repression, has been extensively
researched^[70]1,[71]7,[72]42,[73]54. Reference ^[74]7 found that
regulation by miRNAs is non-linear: miRNA can behave both as a switch
of target expression and as a fine-tuner. Reference ^[75]12 report on
studying both miRNA and mRNA profiles in a cohort of breast cancer
patients. The analysis leads to identifying miR29’s role in regulating
the extra-cellular matrix, as well as other findings. Reference ^[76]42
found that in a few examples, activity depends on expression levels of
both the miRNA and its targets, that is: miRNAs that were overall
expressed had a negative correlation with their overall expressed
targets. We also know that the relationship between a miRNA and its
targets may be more complex than a simple negative regulation, and
includes feedback loops, feed-forward loops and miRNA
’sponging’^[77]54. The recent emergence of totalRNA single-cell and
spatial methods enables us to initiate an exploration of miRNA to
target relationships at a higher resolution. However, current
transcriptomics technology also allows for inference and insight, as we
show in this work.
Several methods have been suggested to facilitate the study of miRNA
activity. Our group has formerly developed miTEA, a method to predict
miRNA activity by examining the mutual statistical enrichment in two
ranked lists - the list of expressed genes and the list of targets for
any given (pivot) miRNA (the output of a target prediction algorithm or
experiment)^[78]55. MIENTURNET^[79]56, DIANA-mirExTra2.0^[80]57 and
enrichMiR^[81]58 offer a miRNA target enrichment analysis for a set of
genes based on the hypergeometric test. DIANA-mirExTra2.0 and enrichMiR
also accept differential expression analysis results as input, and
enrichMiR also offers other statistical approaches. However, all of
these methods and tools focus on activity prediction for a
two-condition experiment, and do not support the more complex
architecture of single cell RNA sequencing experiments.^[82]59
developed miReact, which is a method aimed at evaluating miRNA activity
in single cell RNAseq datasets based on motif enrichment. They describe
sound statistics for this context and provide a friendly software
package. Their approach, however, does not utilize database
(experimentally validated) information and does not address spatial
transcriptomics. Also, it does not use the activity scores obtained for
subsets of the data to find significant differences in miRNA activity
between cell types or conditions. Olgun et al.^[83]60 developed
miRSCAPE, a computational tool to infer miRNA expression in single cell
data. They hypothesize that measuring miRNA expression can be performed
more accurately, exploiting the complex direct as well as indirect
regulatory links between miRNA and the expression of other mRNAs.
miRSCAPE demonstrates impressive results. However, the accuracy of the
results relies on finding and pre-training a model using a bulk dataset
of the same cell type and condition as the inspected single-cell
dataset, in order to capture the relevant regulatory links. This makes
miRSCAPE results highly variable, depending on the availability and
quality of such bulk dataset, affecting the generalizability,
reproducibility, and usability of the tool. Finally, all of the above
mentioned tools do not address spatial RNA sequencing.
In this work, we present miTEA-HiRes: a statistical approach to
interpret the activity of miRNAs based on the expression pattern of
their targets in single-cell RNAseq and spatial transcriptomics. The
method creates the list of targets by filtering the miRTarBase
database^[84]61. It then uses the multiple-sample setting to perform
normalization of gene expression values, and finally applies the
minimum HyperGeometric (mHG) test^[85]62,[86]63 to compute the level of
activity in each sample. We use this approach for two main purposes.
First, to create, for the first time to the best of our knowledge,
miRNA activity maps for scRNAseq and spatial datasets, enabling the
exploration of miRNA heterogeneity (variability) across different cell
types or across the geography of a sample. Second, to find
differentially active miRNAs between cells in different conditions, for
example, disease vs. normal. We evaluate our method using bulk data,
and single-cell miRNA induction experiments. We investigate the
relationship between miRNA expression and activity, as emerges from
total RNA single-cell sequencing.
Our method and usage instructions are available at
[87]https://github.com/EfiHerbst31/miTEA-HiRes.
Results
The statistical method
miTEA-HiRes relies on a precomputed miRNA-target interaction set, which
was curated from the miRTarBase database as in ref. ^[88]63: a
miRNA-target interaction is included in the interaction set if it is
functional, and has at least one strong experimental support or two
weak ones. Statistical characteristics of the final list of
miRNA-target interactions can be found in Supplementary Fig. [89]1.
The miTEA-HiRes analysis has two steps. At the first step, it computes
the activity p-values, representing the level of activity of every
evaluated miRNA in each of the cells or spots. At the second step,
miTEA-HiRes combines the activity p-values for each miRNA to yield
activity scores, indicating its biological significance. miTEA-HiRes
also produces activity maps and histograms when appropriate.
First step: computing activity p-values
As a preprocessing step ([90]1, top row), miTEA-HiRes first performs
normalization of total reads per spot (for spatial datasets) or cell
(for single cell datasets). Next, it performs a Z-score transformation
per gene. We recommend preprocessing data to account for batch
effects^[91]64 if relevant, and to remove cells with very few reads
(<1000) before using it as input for miTEA-HiRes, as these cells may
disproportionately impact z-scores. Then, for each spot or cell,
miTEA-HiRes ranks the genes in ascending order of their Z-scores (see
Methods for the mathematical definitions). Our main assumption is that
if a certain miRNA is active, then its targets would be down-regulated,
so they would appear at the top of this ranked list (as depicted in
Supplementary Fig. [92]3). Enriched occurrence is evaluated using the
minimum-HyperGeometric (mHG) test^[93]55,[94]62,[95]65–[96]67, a
statistical procedure assessing the enrichment level of a set of
elements at the top of a list. The mHG test is performed for each
evaluated miRNA, for all spots\cells. The reported p-values of the mHG
test are then used as the activity p-values associated with every pair
of miRNA and cell\spot (Fig. [97]1, middle row). The activity matrix,
containing the activity p-values, is provided as output for users to
facilitate further user-defined analysis, in addition to the analysis
performed by miTEA-HiRes pipeline. For example - an analysis based on
clustering according to activity profiles can be performed. See Methods
and pseudo-code (Supplementary Algorithm [98]1) for a more detailed
description of this step.
Fig. 1. miTEA-HiRes pipeline.
[99]Fig. 1
[100]Open in a new tab
Top row: normalization and standardization. Middle row: computation of
activity p-values (shown for spatial data. The same process applies to
single-cell data). Bottom row: computation of aggregated activity
scores (for all data types); Generation of spatial activity maps for
spatial data, or activity maps on a UMAP layout for single-cell data;
Statistical comparison between activity p-values of different cell
populations for single-cell data in comparative activity mode. See text
and Methods for more details.
Second step: computation of aggregated activity scores
For spatial data, miTEA-HiRes computes the aggregated activity score of
each miRNA as the percentage of active spots (defined by thresholding
the activity p-values) and outputs a list of miRNAs sorted by this
value. miTEA-HiRes also provides an activity map for each miRNA over
the spatial coordinates (Fig. [101]1 bottom row and Fig. [102]2).
Fig. 2. miTEA-HiRes spatial activity maps.
[103]Fig. 2
[104]Open in a new tab
Activity maps of two selected miRNAs per tissue are presented in the
two rightmost columns. Colors of activity maps represent -log10 of
activity p-values. Corresponding H&E slides and GE clustering, provided
by Visium, appear in the two leftmost columns. Datasets presented (from
top to bottom): mouse brain (2264 spots), human breast cancer (4898),
human skin melanoma (3458), human lung cancer (3858), human ovarian
cancer (4674) and human cerebellum (4992).
For single cell data, miTEA-HiRes has two working modes. The total
activity mode treats the dataset as a single population of cells, so it
does not require any cell annotations. In this mode, the aggregated
activity scores are defined as the corrected average activity p-values
(Methods). miTEA-HiRes then reports the miRNAs sorted by their activity
scores, as well as an activity map for each miRNA over a UMAP layout
(Fig. [105]1 bottom row).
The second mode is the comparative activity mode. In this mode,
miTEA-HiRes performs a two-sided Wilcoxon rank-sum (WRS) test to
evaluate the difference in activity p-values between compared
populations or conditions (for example: cell types, disease vs.
control). miTEA-HiRes then defines the aggregated activity scores by
correcting the WRS p-values using a false-discovery rate (FDR)
correction. For this mode, the terms “activity score" and “corrected
WRS p-value" are used interchangeably. miTEA-HiRes reports the miRNAs
that have significant activity p-values in at least one of the compared
populations and manifest significant differential activity between the
populations (Methods). miTEA-HiRes also reports a histogram of the
activity p-value distributions in the two populations (see
Table [106]1, Fig. [107]1 bottom row). A difference in miRNA activity
between two conditions reflects a difference in the expression values
of its targets (See Fig. [108]3 and Supplementary Fig. [109]11).
Table 1.
miTEA-HiRes analysis of MS single cell dataset of PBMCs
total activity mode comparative activity mode
miRNA rank activity score rank activity score Histogram of activity
p-values Literature
miR-106b-5p* 1 1.46e−16 1 3.03e−170 graphic file with name
42003_2025_7454_Taba_HTML.gif Increased MS-related disability is
associated with down-regulation of miR-106b-5p and miR-19b-3p in RRMS
patients^[110]84; belongs to the miR-106b/25 cluster^[111]78
miR-93-5p* 2 3.92e−15 2 3.19e-204 graphic file with name
42003_2025_7454_Tabb_HTML.gif Regulates MS risk genes^[112]81; Found to
be overexpressed in MS and in RRMS^[113]81,[114]82; belongs to the
miR-106b/25 cluster^[115]78
miR-17-5p* 3 6.50e−15 3 3.14e−165 graphic file with name
42003_2025_7454_Tabc_HTML.gif Inhibits T Cell activation genes, and
underexpressed in MS whole blood^[116]134; belongs to the miR-17/92
cluster^[117]79
miR-20a-5p 4 1.23e−14 5 1.58e−199 graphic file with name
42003_2025_7454_Tabd_HTML.gif Inhibits T Cell activation genes, and
underexpressed in MS whole blood^[118]134; belongs to the miR-17/92
cluster^[119]79
miR-16-5p* 5 1.90e−13 4 9.22e−140 graphic file with name
42003_2025_7454_Tabe_HTML.gif Underexpressed in PBMCs, CD4+ T cells and
B cells of RRMS patients, comparing to healthy subjects^[120]135
miR-519d-3p 6 1.35e−12 6 4.50e−158 graphic file with name
42003_2025_7454_Tabf_HTML.gif No literature found
miR-19b-3p* 7 3.44e−12 7 4.50e−183 graphic file with name
42003_2025_7454_Tabg_HTML.gif Increased MS-related disability is
associated with downregulation of miR-106b-5p and miR-19b-3p in
relapsing-remitting MS patients^[121]84; belongs to the miR-106a/363
cluster^[122]80
miR-20b-5p 8 3.49e−12 8 7.84e−152 graphic file with name
42003_2025_7454_Tabh_HTML.gif Belongs to the miR-106a/363
cluster^[123]80, whose member miR-19b-3p is reported to be linked with
MS^[124]84. The miR-106a/363 cluster is highly homologous to the
miR-17/92 cluster, they are subsumed in miRBase as one family of miRNAs
with similar target genes and functions^[125]80,[126]136,[127]137. The
miR-17/92 cluster includes hsa-miR-17-5p and hsa-miR-20a-5p, which are
known to be underexpressed in MS whole blood^[128]134.
miR-106a-5p* 9 8.80e−11 9 2.46e−162 graphic file with name
42003_2025_7454_Tabi_HTML.gif Belongs to the miR-106a/363
cluster^[129]80, see cell above.
miR-19a-3p* 10 9.68e−10 >10 - - Belongs to the miR-17/92
cluster^[130]79
miR-155-5p >10 - 10 3.07e−113 graphic file with name
42003_2025_7454_Tabj_HTML.gif A crucial regulator of inflammation,
modulates the autoimmune response in MS and affects the function of the
brain-blood barrier in MS patients^[131]138; up-regulated in PBMCs of
RRMS patients in the remission phase compared with healthy
controls^[132]83
[133]Open in a new tab
Top 10 most overall active miRNAs as computed by miTEA-HiRes in total
activity mode, and top 10 differentially active miRNAs as computed by
miTEA-HiRes in comparative activity mode: Multiple Sclerosis (MS) vs.
control. In the histograms the x-axes indicate -log10(activity
p-values). Histogram legend: Blue: MS, orange: control. miRNAs marked
with * were ranked at the top 10 of the total activity mode for the MS
single cell cerebrospinal fluid (CSF) dataset (Supplementary
Table [134]1). Analysis in comparative activity mode resulted in
identical top 10 lists of differentially active miRNAs for peripheral
blood mononuclear cells (PBMCs) and CSF datasets (Supplementary
Table [135]1). In comparative activity mode, miRNAs are ranked as
described in Methods.
Fig. 3. miRNA differential activity between MS and control groups at
different resolutions.
[136]Fig. 3
[137]Open in a new tab
miRNA differential activity between Multiple Sclerosis (MS) and control
at all-cells level (left most pair of boxes), compared to cell type
specific level. a miR-519a-3p analysis on peripheral blood mononuclear
cells (PBMCs); b miR-519a-3p in cerebrospinal fluid (CSF) cells; c
miR-651-3p in PBMCs; d miR-651-3p in CSF cells. Two sided WRS test
p-value legend: ns: p > 0.05; * : 0.01 < p ≤ 0.05; ** : 1e−3 < p ≤
0.01; *** : 1e−4 < p ≤ 1e−3. **** : p ≤ 1e−4. Cell-type keys: B1 and
B2: B cell subsets; CD4: CD4+ T cells; CD8a: activated CD8+ T cells;
CD8n: non-activated CD8+ T cells; Gran: granulocites; Mono: monocytes;
NK1: natural killer cells; Tdg: γ δ T cells; Tregs: regulatory CD4+ T
cells^[138]77. Group sizes can be found in Supplementary Table [139]4.
miTEA-HiRes thus takes single cell or spatial gene expression data
(with or without cell annotations) as input, and outputs a list of
miRNAs ranked by their potential biological interest, combining overall
activity and differential activity, when relevant. miTEA-HiRes is
available at Github: [140]https://github.com/EfiHerbst31/miTEA-HiRes.
miTEA-HiRes provides evidence of tissue-specific and localized patterns of
miRNA activity, utilizing spatial transcriptomics
We downloaded spatial transcriptomics data of a mouse brain^[141]68,
human breast cancer^[142]69, human skin melanoma^[143]70, human lung
cancer^[144]71, human ovarian cancer^[145]72 and human
cerebellum^[146]73 from the Visium website^[147]74. Fig. [148]2 shows
the histology slides (H&E slides) along with gene expression (GE)
clustering provided on the Visium website, and specific noteworthy
miRNA activity maps exemplifying miTEA-HiRes’s output.
Some activity maps clearly resemble the H&E or the GE clustering-
indicating miRNA activity in specific tissue types. For instance,
miR-16-5p’s activity throughout the primary human breast cancer tissue
is similar to its H&E slide (Fig. [149]2, second row). miR-16-5p is
known to suppress breast cancer proliferation by targeting
ANLN^[150]75, which is under-expressed in this dataset compared to
other datasets (Supplementary Fig. [151]2). Another example is
let-7b-5p, with an activity pattern that is similar to the GE
clustering in the lung cancer slide (Fig. [152]2, fourth row). In the
human cerebellum, the relatively understudied miR-1228-5p and miR-8064
seem clearly more active in specific regions that are also visible in
the H&E slide and GE clustering ([153]2, bottom row).
Other activity maps may offer a more unique insight- information about
the tissue that cannot be inferred from the H&E or the GE clustering.
For example, the activity of miR-29c-3p in the breast cancer sample
(Fig. [154]2, second row) does not precisely correspond, at the
qualitative visual inspection level, to either the clustering patterns
or the characteristics observed in the H&E slide (Supplementary
Fig. [155]10). In order to investigate whether the activity pattern
might be affected by the sparse nature of spatial gene expression
datasets, we experimented with downsampling the counts matrix (see
Supplementary Fig. [156]12). miR-29c-3p is known to be a tumor
suppressor miRNA^[157]76. Also, previous bulk analysis study showed
that miR-29c is over-expressed in luminal A breast cancer compared to
basal breast cancer, and that its targets tend to have lower expression
values in breast cancer samples with elevated miR-29c expression,
indicating that it is not only expressed but also active^[158]12.
MiR-29 is known to play a role in extra-cellular matrix
formation^[159]12 and in cardiovascular disease pathogenesis in
mice^[160]14.
We found that some miRNAs are highly active only in certain tissue
regions (for example, miR-424-5p in the ovarian cancer sample, [161]2,
fifth row), while other miRNAs have a more dispersed activity pattern
(miR-200c-3p, also in the ovarian cancer sample, Fig. [162]2, fifth
row). Another interesting observation is that some miRNAs exhibit
activity patterns that are disjoint from those of other miRNAs. For
example, in the lung cancer tissue (Fig. [163]2, fourth row), the
well-studied let-7b-5p and the less understood miR-6804-3p demonstrate
non-overlapping patterns of activity. To quantify this observation, we
compared the top 10% most active spots in let-7b-5p with those of
miR-6804-3p and found zero overlap (hypergeometric p-value for
under-enrichment 2e−19). When considering the top 20%, there was an
overlap of only 8 spots (hypergeometric p-value for under-enrichment
8e−71).
It is important to note that not all miRNAs exhibit high activity,
which is expected. In the evaluated samples, the fraction of miRNAs
that were found to be active (-log10(p-value) > 5) in at least 20% of
the spots was between 0% (in the mouse brain and human brain) and 10%
(in the human breast cancer).
miTEA-HiRes identifies disease-associated miRNAs at different resolutions in
single-cell PBMCs and CSF datasets of Multiple Sclerosis patients
miTEA-HiRes was applied to a single-cell RNAseq dataset of peripheral
blood mononuclear cells (PBMCs) obtained from five patients with
Multiple Sclerosis (MS) and five control cases^[164]77(Table [165]1).
miTEA-HiRes was also applied to a dataset of RNAseq of cerebrospinal
fluid samples (CSF) collected from MS patients and a control group
(Supplementary Table [166]1). The PBMCs dataset consisted of 42,969
cells and the CSF dataset consisted of 22,357 cells. The breakdown into
cell type-specific groups can be found in Supplementary Table [167]4.
Comparing miRNA activity in MS to control
To reduce computation time, 10,000 cells were randomly selected from
the PBMCs dataset (5000 from each group of MS and control). We then
conducted the analysis using two modes: the total activity mode and the
comparative activity mode - with the MS cell population vs. control.
We present and discuss miTEA-HiRes’s results by observing the 10 miRNAs
ranked highest for overall activity (in total activity mode, without
cell annotations), and the 10 miRNAs ranked highest for differential
activity between MS and control (in comparative activity mode) (Table
[168]1). For convenience, we refer to the first list as the total
activity list and to the second list as the comparative activity list.
The following miRNAs, which appear in both lists, have known
connections to MS: miR-106b-5p and miR-93-5p (which belong to cluster
106b/25^[169]78), miR-17-5p and miR-20a-5p (which belong to the
miR-17/92 cluster^[170]79), miR-19b-3p (which belongs to
miR-106a/363^[171]80), and miR-16-5p. miR-155-5p, which only appears in
the comparative activity list, also has a known connection to MS.
Literature suggests that miR-106b-5p, miR-17-5p, miR-20a-5p, miR-16-5p,
and miR-19b-3p are underexpressed or underregulated in MS patients
compared to a control group (Table [172]1). This underexpression
pattern fits the pattern of miRNA inactivity in MS cells depicted in
miTEA-HiRes’s histograms (see Table [173]1). For two other miRNAs:
miR-93-5p and miR-155-5p, whose miTEA-HiRes histograms also show an
under-activity pattern in MS, literature suggests that they are
over-expressed in MS (but not necessarily over-active) compared to
healthy subjects^[174]81–[175]83. We suggest that these miRNAs might
have a complex role in MS that should be further investigated. The
relationship between miRNA activity and expression is further explored
in the validation section.
miTEA-HiRes also detected some miRNAs which are not yet reported to be
related to MS. Some of them, however, belong to the same or similar
clusters as known miRNAs related to MS. For example: hsa-miR-20b-5p
(total activity list # 8) and miR-106a-5p (total activity list # 9),
which belong to the miR-106a/363 cluster (Supplementary
Table [176]1,^[177]80,[178]84).
Some of the highly ranked miRNAs are not well studied for their
functionality, such as miR-519d-3p (Table [179]1). Following these
results, miR-519d-3p could potentially be investigated for its relation
to MS.
We found that nine miRNAs appeared in both lists. Also, all miRNAs in
the total activity list were found to be differentially active between
MS and control in comparative activity mode (corrected WRS p-values
< 9e−140). We further touch on this point in the discussion.
Cell type specific miRNA differential activity
We downloaded the PBMCs dataset with cell types. In order to improve
statistical significance when analyzing cell subsets, we considered the
entire dataset for this section, without sampling. We then applied
miTEA-HiRes to the data in comparative activity mode, to compare the
activity of miRNAs between MS and control within each cell type
(Table [180]2, Supplementary Table [181]2), and also to compare miRNAs
activity between cell types within the MS or control groups
(Table [182]3, Supplementary Table [183]3). Full Results can be found
at Zenodo 10.5281/zenodo.10720979.
Table 2.
Comparison between MS and control within a specific cell-type
Cell type and origin miR-19b-3p miR-519a-3p miR-3609 miR-651-3p
non-activated CD8+ T cells, PBMCs 1.65e−26 ↓ 1.52e−13 ↓ 5.52e−11 ↓
1.20e−06 ↓
activated CD8+ T cells, PBMCs 1.02e−76 ↓ 4.89e−21 ↓ 9.63e−66 ↓ 8.49e−16
↓
CD4+ T cells, PBMCs 3.36e−105 ↓ 1.28e−40 ↓ 8.27e−75 ↓ 4.01e−38 ↓
regulatory CD4+ T cells, PBMCs 1.51e−07 ↓ 2.76e−02 2.39e−13 ↓ 9.20e−05
↓
natural killer cells, PBMCs 3.77e−25 ↓ 6.74e−16 ↓ 6.17e−24 ↓ 5.29e−02
B1 (B cell subset), PBMCs 9.46e−13 ↓ 2.61e−14 ↓ 1.70e−05 ↓ 8.91e−01
B2 (B cell subset), PBMCs 7.92e−10 ↓ 1.47e−08 ↓ 3.05e−05 ↓ 7.06e−02
granulocytes, PBMCs 1.62e−05 ↓ 8.24e−01 3.76e−06 ↓ 3.67e−13 ↑
activated CD8+ T cells, CSF 5.72e−14 ↓ 6.75e−08 ↓ 2.12e−03 ↓ 2.07e−19 ↓
CD4+ T cells, CSF 4.68e−21 ↓ 2.14e−01 2.84e−01 1.45e−84 ↓
regulatory CD4+ T cells, CSF 5.03e−02 6.22e−01 4.48e−01 4.45e−10 ↓
[184]Open in a new tab
For each cell type, we compared the Multiple Sclerosis (MS) cell
population to the control population in comparative activity mode. The
table lists corrected WRS p-values for miR-19b-3p, miR-519a-3p,
miR-3609 and miR-651-3p, for each comparison. ↓, ↑ represent
reduced\elevated activity in MS, respectively. Activity was reduced in
MS compared to control in all significant instances (p-values < 0.01)
except one: miR-651-3p in granulocytes, where activity in MS was
elevated compared to control. Group sizes can be found in Supplementary
Table [185]4. More results can be found in Supplementary Table [186]2.
Full results can be found at: 10.5281/zenodo.10720979.
Table 3.
Comparison between CD4+ T cells and activated CD8+ T cells in the MS
PBMCs and CSF
CD4+ T cells vs. activated CD8+ T cells miR-19b-3p miR-519a-3p miR-3609
miR-651-3p
PBMCs control 3.38e−01 6.99e−04 ↑ 3.66e−02 4.89e−05 ↑
PBMCs MS 3.73e−17 ↓ 4.48e−02 5.90e−09 ↓ 2.10e−06 ↑
CSF control 1.27e−01 1.05e−10 ↑ 1.31e−04 ↑ 1.76e−01
CSF MS 7.87e−03 ↓ 9.94e−01 2.68e−01 6.84e−06 ↑
[187]Open in a new tab
miRNAs are the same as shown in Table [188]2. Numbers represent the
corrected WRS p-values associated with this differential activity
comparison. More results can be found in Supplementary Table [189]3. ↓,
↑ represent reduced\elevated activity in activated CD8+ T cells
compared to CD4+ T cells, respectively. Group sizes can be found in
Supplementary Table [190]4.
Comparison between MS and control groups within specific cell types
We compared miRNA activity p-values between MS and control within the
same cell type, in the comparative activity mode. Table [191]2 shows
the comparison results for four selected miRNAs. Full results can be
found at Zenodo 10.5281/zenodo.10720979. This comparison allows us to
further explore the origin of differential activity seen in
non-cell-type specific analyses (Fig. [192]3, Table [193]1,
Supplementary Table [194]1). For example, miR-519a-3p was found to have
differential activity between PBMCs obtained from MS patients and
control (Table [195]1, corrected WRS p-value: 4.50e − 158). However,
following the cell type-specific comparison, we can determine that this
difference is subtle or absent in granulocytes and regulatory CD4+ T
cells, but very prominent in other lymphocytes (Fig. [196]3, Table
[197]2). On the other hand, mir-519a-3p was not found to be
differentially active between CSF cells obtained from MS patients and
control, and cell-type specific comparison reveals differential
activity in activated CD8 cells (Fig. [198]3). A noteworthy observation
is that both miR-519a-3p and miR-3609 are significantly differentially
active between MS and control in PBMCs CD4+ cells, but not in CSF CD4+
cells (Table [199]2). This is an interesting finding considering the
role these cells play in MS^[200]85. We note that we could not find
literature elucidating the roles of miR-519a-3p and miR-3609 in MS.
Another example is miR-651-3p, which was also found to have
differential activity between MS and control when considering all PBMCs
(though it was not ranked at the top 10; corrected WRS p-value:
3.62e−97) and also when considering all CSF cells (corrected WRS
p-value: 5.96e−96). By observing the cell-type specific comparisons
(Fig. [201]3 and Table [202]2), we can conclude that this difference
originates from T cells (both in PBMCs and CSF), but is much less
prominent in natural killer cells. Interestingly, granulocytes in PBMCs
demonstrate a significant opposite activity pattern with the MS group
more active than the control group (Fig. [203]3).
The variation in miRNA activity values between conditions stems from
differences in the expression of their target genes. For example,
miR-519-3p shows reduced activity in B-cells from MS patients’ PBMCs
compared to controls (Fig. [204]3), suggesting that its targets may be
less down-regulated (more expressed) in MS compared to controls. To
test this, we conducted the comparisons as in Fig. [205]3, but
compared target expression between cell groups (rather than activity)
(see Supplementary Fig. [206]11). In all cases where activity
differences were significant, we observed a corresponding significant
change in target expression (zero false positives). In 7 out of 44
comparisons, differences in target expression were significant but no
significant changes in activity were captured.
Comparison between cell types within MS or control cell populations
Table [207]3 describes one example for comparison between different
cell types: CD4+ T cells (CD4) and activated CD8+ T cells (CD8a).
Notably, miR-19b-3p is significantly more active in CD4 cells compared
to CD8a cells derived from PBMCs of MS patients, and also (though less
significantly) in CSF derived from MS patients.
The case is different for miR-519a-3p, which has a significant
difference in activity levels between CD4 and CD8 cells in both control
PBMCs and CSF, but this difference, in both sample populations, is lost
in MS patients.
More results can be found in Supplementary Table [208]3. Full results
can be found at Zenodo 10.5281/zenodo.10720979.
We note that while the presented WRS p-values are corrected for the
number of evaluated miRNAs, they are not adjusted for multiple pairwise
comparisons, for instance, between all pairs of cell types. Users can
correct their p-values for multiple comparisons by multiplying by the
number of comparisons. For the MS dataset, even if all relevant
pairwise comparisons are performed (~100), this correction would not
affect conclusions regarding significant patterns (Tables [209]2 and
[210]3).
miTEA-HiRes identifies unique differentially active miRNAs with potential
association to breast cancer metastasis
We analyzed single-cell RNAseq data from three breast cancer cell lines
(MDA-MB-231, SUM149, SUM159) and from patient-derived cells
(GUM36)^[211]86. As described in ref. ^[212]86, the samples were
enriched for migratory breast cancer cells, and the migratory cell
population was microfluidically separated from static cells (Fig.
[213]4). The authors aimed to identify genes associated with cell
migration and clinical outcomes in breast cancer, which could
potentially serve as prognostic biomarkers. In line with this research
direction, we sought to explore whether differentially active miRNAs
could further improve our understanding of the process and possibly
also serve as prognostic biomarkers in breast cancer and specifically
as markers of metastasis.
Fig. 4. miRNA activity layouts for migratory and static breast cancer cells.
[214]Fig. 4
[215]Open in a new tab
a UMAP layout of the data. Colors represent cell-line and status. “WT_"
refers to static cells, “M_" refers to migratory cells. b–e Activity
maps in a UMAP layout, as produced by miTEA-HiRes in total activity
mode. Colors depict -log10(activity p-values). f Comparative activity
map in a UMAP layout, as produced by miTEA-HiRes in comparative
activity mode. Left: static population, right: migratory population.
Colors depict -log10(activity p-values). g–s Histograms of activity
p-values divided by populations, as produced by miTEA-HiRes in
comparative activity mode. In the histograms the x-axes indicate
-log10(activity p-values). “WT_" refers to static cells, “M_" refers to
migratory cells. More results are available in Supplementary
Fig. [216]4. Full results are available at 10.5281/zenodo.10720979.
To enhance the statistical power and leverage the similarities between
breast cancer samples, we merged all cell lines and patient-derived
cells into two groups: 1182 static cells and 1992 migratory cells. We
then used the miTEA-HiRes analysis pipeline to identify both overall
active miRNAs in breast cancer (total activity mode), as well as miRNAs
uniquely active (or uniquely inactive) in migratory cells (comparative
activity mode). Full results are available at [217]Zenodo.
Upon applying miTEA-HiRes in total activity mode, we found that the top
10 active miRNAs are all known oncogenic miRNAs: miR-16-5p^[218]11,
miR-17-5p^[219]11, miR-8485^[220]87, miR-124-3p^[221]88,
miR-20a-5p^[222]11, miR-93-5p^[223]12, miR-19a-3p^[224]11,
miR-106b-5p^[225]89, miR-155-5p^[226]11 and miR-190a-3p^[227]90. As
seen in the activity maps of miR-16-5p, miR-17-5p, miR-8485 and
miR-124-3p (Fig. [228]4), these miRNAs are universally active (activity
p-values for all cells < 1e−8, representing activity in all cells);
also, their activity patterns are similar to one another (see
Discussion for more comments on these findings). Next, we turned to
comparative activity mode to find miRNAs that are differentially active
between the static and the migratory populations. Out of the top 10
miRNAs in total activity mode, only two of them: miR-124-3p and
miR-8485 were also found to be differentially active between the static
and the migratory populations (comparative activity mode, corrected WRS
p-values: 6e−09 and 5e−03 respectively). Their histograms of activity
values are shown in Fig. [229]4. miR-124-3p is known to promote
proliferation^[230]88 and metastasis^[231]91 of triple-negative breast
cancer. Since the three breast cancer cell lines are from
triple-negative origin, it is likely that miR-124-3p has a similar role
here. Not much is known about miR-8485 in the context of breast cancer.
However, it was found to promote proliferation and migration in
mesenchymal stem cells derived from oral carcinoma^[232]87. The other
eight miRNAs in the list of top 10 miRNAs in total activity mode had no
significant differential activity between migratory and static
populations (comparative activity mode, corrected WRS p-values
> 0.01).
Our analysis yielded additional miRNAs observed to be significantly
differentially active between migratory and static cells. We found five
miRNAs that were ranked at the top 30 in comparative activity mode (See
Methods for details) and also had corrected WRS p-value < 1e−15:
miR-122-5p (1e−20, Fig. [233]4 and Supplementary Fig. [234]4e), miR-665
(3e−16, Fig. [235]4 and Supplementary Fig. [236]4f), miR-125b-5p
(4e−19, Supplementary Fig. [237]4a and [238]4c), miR-125a-5p (5e−17,
Supplementary Fig. [239]4b and [240]4d) and miR-4257 (Fig. [241]4). All
of which were more active in migratory cells compared to static cells.
miR-122-5p has been previously identified as a promoter of aggression
and epithelial-mesenchymal transition in triple-negative breast
cancer^[242]92. miR-665 is associated with metastasis and poor survival
in breast cancer patients^[243]93. The differential and overall
increased activity of miR-4257 is an interesting finding. While there
is no specific indication of its involvement in breast cancer
metastasis, the literature indicates inflammation-associated miR-4257
as a promoter of malignancy in colorectal cancer^[244]94. An additional
study found upregulated exosomal miR-4257 in non-small cell lung cancer
patients with recurrence compared with those without
recurrence^[245]95. Considering these findings, miR-4257 may be further
investigated to determine its role in breast cancer progression. As to
miR-125b-5p and miR-125a-5p, there is contradictory evidence for these
two miRNAs, suggesting that they inhibit the metastatic potential of
breast cancer cells^[246]96,[247]97. Further investigation is required
in a dedicated study to explore this inconsistency, as there could be a
more complex mechanism explaining this relationship between expression
and activity in this case.
Other miRNAs were not ranked at the top 30 of comparative activity mode
(Methods), however they show highly significant differential activity
(corrected WRS p-value < 1e−40) and substantial activity in at least
one of the populations (activity p-values below 1e − 4 among at least
third of cells in at least one population). These include miR-6721-5p
(corrected WRS p-value: 3e−83, Fig. [248]4 and Supplementary
Fig. [249]4g), miR-486-3p (8e−74, Fig. [250]4 and Supplementary
Fig. [251]4h), miR-1207-5p (1e−60, Fig. [252]4 and Supplementary
Fig. [253]4i), miR-552-3p (2e−50, Fig. [254]4 and Supplementary
Fig. [255]4j), miR-4763-3p (2e−48, Fig. [256]4 and Supplementary
Fig. [257]4k) and miR-6783-5p (2e−46, Fig. [258]4 and Supplementary
Fig. [259]4l). These miRNAs are more active in migratory cells when
compared to static cells. Both miR-486-3p and its opposite strand
miR-486-5p are known to have central roles in several types of cancer,
including functions relating to metastasis and cell
migration^[260]11,[261]98 (though we could not find any evidence
specific to breast cancer). Regarding miR-1207-5p, the literature
suggests that it promotes breast cancer cell growth by targeting
STAT6^[262]99. miR-552 may also contribute to cell proliferation and
migration^[263]100. The other microRNAs in this list have not been
extensively studied in the context of cancer metastasis.
Two examples of miRNAs that are less active in migratory cells are
miR-892c-5p (8e−29, Fig. [264]4 and Supplementary Fig. [265]4m) and
miR-506-5p (2e−32, Fig. [266]4 and Supplementary Fig. [267]4n). For
both of these miRNAs, their activity maps reveal that the difference in
activity levels is most prominently observed in the primary tissue,
wherein the static population has elevated activity comparing to the
migratory population. The specific role of these miRNAs in relation to
metastasis in breast cancer remains unknown. However, it is worth
noting that miR-506-5p has been reported to be sponged by FOXD2-AS1,
thereby having a role in glioma metastasis^[268]101. This finding could
potentially explain the reduced activity observed in migratory cells
compared to static cells.
Some miRNAs have very similar activity maps. One set is the pair
miR-125a-5p (Supplementary Fig. [269]4d) and miR-125b-5p (Supplementary
Fig. [270]4c), which belong to the miR-10 family^[271]102. In this
case, the similarity of their activity maps can be explained by the
fact that their mature sequence similarity is high (0.682^[272]102) and
that their target lists greatly overlap (hypergeometric p-value
5.42e−62). Another pair of miRNAs, miR-892c-5p (Supplementary
Fig. [273]4m) and miR-506-5p (Supplementary Fig. [274]4n), also have
somewhat similar activity maps. They do not belong to the same
family^[275]102, however they too are similar in sequence (mature
sequence similarity 0.762^[276]102) and in target lists (hypergeometric
p-value 1.63e−29). Some similarity also exists between miR-486-3p
(Supplementary Fig. [277]4h), miR-1207-5p (Supplementary Fig. [278]4i),
miR-6721-5p (Supplementary Fig. [279]4g) and miR-4763-3p (Supplementary
Fig. [280]4k). They also have somewhat similar mature sequences (≥0.238
for all pairwise comparisons^[281]102) and similar target lists
(hypergeometric p-value < 1.8e−9 for all pairwise comparisons).
miR-16-5p, miR-17-5p, miR-8485, and miR-124-3p (Fig. [282]4) present an
interesting case. While the ranges of activity p-values differ from one
miRNA to another, their overall pattern is almost identical. These
miRNAs have no significant overlap of their target lists
(hypergeometric p-value > 0.26 for all pairwise comparisons). Also,
the mature sequence similarity between them is low (≤0.273 for all
pairwise comparisons of miR-16-5p, miR-17-5p, and miR-124-3p; data of
the sequence similarity of miR-8485 were missing from miRPathDB). To
investigate whether the low expression targets of these miRNAs are
involved in similar biological processes, we conducted pathway
enrichment analysis using the canonical pathways gene set collection of
the Human Molecular Signatures Database (MSigDB)^[283]103–[284]105. Our
analysis revealed 10 pathways enriched (hypergeometric p-value ≤0.01)
among the low expression targets of miR-16-5p, miR-17-5p and miR-124-3p
(Supplementary Fig. [285]5), almost all of them related to cancer.
Further research is required to elucidate their specific functions, as
well as the potential involvement of miR-8485.
To allow users to easily determine whether the target lists of two
miRNAs greatly overlap, we computed the Jaccard index for every pair of
miRNA target lists. The resulting table is available at Zenodo
10.5281/zenodo.10720979.
Validation and comparison to other methods
Validation on bulk data from the NCI Genomic Data Commons (GDC) database and
comparison to miREACT
We investigate the relationship between miTEA-HiRes activity p-values
and measured expression by using data from the GDC portal^[286]106,
which includes cancer tumors with matched bulk RNAseq and bulk miRNAseq
samples. Similarly, we compare miReact^[287]59 activity scores to
measured expression.
We established a list of 11,927 cases in the GDC portal that have both
bulk RNAseq sample and a miRNAseq sample, from a solid tissue primary
tumor (Methods under GDC data curation). The dataset included tumors in
140 tissue types. We ran miTEA-HiRes and miReact on the RNAseq counts
to predict miRNA activity p-values (Methods under Evaluating agreement
between activity and expression in the GDC dataset, for miTEA- HiRes
and miReact). For each sample, we sorted the miRNAs by descending
expression, and used the mHG test to evaluate whether active miRNAs are
over-represented at the top of the list (Fig. [288]5). Similarly, we
computed the over-representation of non-active miRNAs at the bottom of
the list (Fig. [289]5). Briefly, active vs. non-active is determined by
p-value < 0.05, p-value > 0.1, respectively, in both tools. As seen in
Fig. [290]5, in both cases, the agreement between expression and
activity was better for miTEA-HiRes than for miReact.
Fig. 5. miTEA-HiRes validation and comparison to miReact, using the GDC bulk
dataset.
[291]Fig. 5
[292]Open in a new tab
Agreement between expression and activity, per sample, reflected by the
enrichment of: a active miRNAs among the most expressed miRNAs; b
non-active miRNAs among the least expressed miRNAs; miTEA-HiRes
activity p-values grouped by cancer tissue for: c miR-101-3p; d
miR-22-3p; e miR-10b-3p. Full results are available at Zenodo
10.5281/zenodo.10720979.
We also present anecdotal examples of miRNAs with known functions in
specific cancers (Methods under Activity and expression across cancer
tissues of origin). Their expression levels are, indeed, reflected in
their activity scores. For example, miR-101-3p has elevated activity,
as determined by miTEA-HiRes, in the liver and the adrenal gland,
compared to other cancer types (Fig. [293]5). Literature suggests it
has a significant role in hepatocellular cancer^[294]107–[295]109, and
also in adrenocortical carcinoma^[296]110. Similarly, miR-22-3p has a
role in heptocellular cancer^[297]111,[298]112 and in adrenocortical
carcinoma^[299]113, as reflected in its activity levels (Fig. [300]5).
And finally, miR-10b-3p has a role in adrecortical carcinoma^[301]110,
as reflected in its activity levels, as inferred by miTEA-HiRes (Fig.
[302]5). The expression data of these three miRNAs is depicted in
Supplementary Fig. [303]6. Full results are available at Zenodo
10.5281/zenodo.10720979.
Validation using two single-cell total-RNA datasets showcases significant
relationship between miRNA expression and activity
We further investigate the relationship between miRNA activity and
expression by analyzing total-RNA sequencing data obtained using the
Smart-seq-total method^[304]45. This method allows for the simultaneous
measurement of single-cell mRNA and miRNA expression. The datasets
consist of RNAome data from human primary fibroblasts, from HEK293T and
MCF7 cells (which were treated as a single multi-cell type dataset
consisting of 633 cells), as well as from induced murine embryonic stem
cells differentiated into embryoid bodies (2167 cells)^[305]45.
We quantified the association between expression and activity by first
sorting the miRNAs in each dataset by expression, then measuring the
enrichment of active (non-active) miRNAs, as assessed by miTEA-HiRes
(Methods under Evaluating the relationship between single-cell miRNA
expression and activity), at the top (bottom) of the list. We utilized
the mHG test to determine the level of enrichment. This process was
performed for both human and mouse datasets (see Methods and enrichment
plots in Supplementary Fig. [306]7). For the human dataset we found a
significant enrichment of active miRNAs at the top of the list (mHG
p-value = 3.76e−05). We also found a significant enrichment of
non-active miRNAs at the bottom of the list (p-value = 0.0055). For
the mouse dataset, significant enrichment (<0.05) was detected on both
sides as well; enrichment of active miRNAs at the top of the list
(p-value = 0.0301) and non-active miRNAs at the bottom of the list
(p-value = 0.0114).
We also tested the agreement between single-cell expression and
activity in a single-cell resolution, for specific miRNAs that were
found to be both overall active and expressed. Supplementary
Fig. [307]8 shows example results from this analysis.
We were hoping to explore the relationship between the expression and
activity of miRNAs using additional total-RNA datasets generated by
VASA-seq^[308]46 and STORM-seq^[309]47 technologies. However, we found
that these technologies report very low expression values of miRNAs,
probably due to low miRNA capture efficiency.
Validation using scRNA-seq from miRNA induction experiment
Inspired by miRSCAPE^[310]60 validation, we sought to find whether
miTEA-HiRes is able to accurately capture the differential expression
of miR-199a-5p and miR-199a-3p, induced in i199 and i199-KTN1 cell
lines (derived from HEK293 cell line, by using a plasmid containing the
pri-miRNA)^[311]114. Towards this, we obtained the count matrices of
3280 i199 cells, and 3143 i199-KTN1 cells, including induced and
uninduced annotations directly from the author^[312]114. We applied
miTEA-HiRes on the two datasets, and found that for both cell lines,
both miR-199a-5p and miR-199a-3p were significantly more active in the
induced cells compared to the uninduced ones (see Methods under
Evaluating differential activity in single-cell miRNA induction
experiment). Two-sided WRS p-values for induced vs. uninduced cells are
as follows. For induced i199 (695 cells) vs. uninduced (1561 cells), we
get p-value = 7e−15 for miR-199a-3p and p-value =1e−10 for miR-199a-5p.
For induced i199-KTN1 (611 cells) vs. uninduced (1075 cells), we get
p-value = 0.001 for miR-199a-3p and p-value = 0.007 for miR-199a-5p. As
reported by miRSCAPE^[313]60 authors, miRSCAPE results were accurately
aligned with the induction of both miRNAs while miReact could not
consistently detect the upregulation of miR-199 in induced cells.
Results are available at Zenodo 10.5281/zenodo.10720979.
Discussion
We developed miTEA-HiRes, an algorithmic approach, implemented as a
Python package (including a pip-installable version), to predict miRNA
activity in single cell RNA-seq and spatial transcriptomics datasets.
miTEA-HiRes relies on experimentally validated miRNA-target
interactions and on the mHG statistics. We used miTEA-HiRes to predict
miRNA activity in six spatial datasets by Visium (Fig. [314]2), single
cell datasets of PBMCs and CSF from MS patients and a control group
(Fig. [315]3, Tables [316]1–[317]3, Supplementary
Tables [318]1–[319]3), as well as in a single cell dataset of
stationary and metastatic cells of breast cancer cell lines and a
primary tumor samples (Fig. [320]4 and Supplementary Fig. [321]4). We
also evaluated the relationship between expression and activity of
miRNAs using totalRNA data (Supplementary Figs. [322]7 and [323]8).
Finally we compare our method to miREACT using bulk GDC data (Fig.
[324]5), and to both miREACT and miRSCAPE using miRNA induction
experiment in single-cell data.
By employing miTEA-HiRes on spatial transcriptomics data, it is
possible to obtain information about the spatial distribution of miRNA
activity, information that cannot be directly obtained from standard
measurement. The inferred information then allows us to determine, for
example, whether the miRNA activity is concentrated in one location or
spread across the tissue, whether it coincides with specific tissue
regions identified by H&E staining, by standard clustering, or by
adjacent IHC, and if it overlaps with the activity patterns of other
miRNAs within the same sample. For example, in the lung cancer sample,
let-7b-5p and miR-6804-3p were found to have significantly
non-overlapping activity maps which suggests a synthetic-lethality
behavior for this pair of miRNAs. This finding provides insight into
miR-6804-3p’s function by drawing on the established functions of
let-7b-5p. Another example is the activity of the tumor suppressor
miR-16-5p, which is located in a well-defined area of the breast cancer
sample, an area that has a distinct H&E staining pattern and that is
marked at the level of full gene expression clustering. This activity
pattern stands in contrast with the activity of another tumor
suppressor, miR-29c-3p, that is far more spread out across the same
tissue (Fig. [325]2).
For single cell data, miTEA-HiRes has two optional modes: total
activity mode, which reports on miRNAs that are overall active, and
comparative activity mode, which compares between populations of cells.
We found that in some scenarios, miRNAs that are generally more active
in a sample (reported by the total activity mode) greatly overlap with
miRNAs that are differentially active between the sample main
populations (reported by the comparative activity mode). An example
observation is found analyzing MS and control PBMCs and CSF cells. In
this analysis, we see a significant share of miRNAs that were found to
be overall active and to also be differentially active between the MS
and control populations. In other scenarios, such as the stationary and
metastatic breast cancer cells, distinct sets of miRNAs were reported
by the different modes, that is- miRNAs that were found to be highly
active in all cells, did not have a significant overlap with the ones
that were found to be differentially active between the stationary and
metastatic populations. The variation of this aspect between the two
datasets could potentially be attributed to the unique nature of these
pathologies and the varying roles that miRNAs play in their formative
and developmental processes.
We found that in the breast cancer single cell dataset, some sets of
miRNAs had very similar activity maps. The activity map similarity of
most of these sets can be easily explained by the fact that their
mature sequence similarity is high or by the (related) fact that their
target gene lists greatly overlap. However, the set miR-16-5p,
miR-17-5p, miR-8485 and miR-124-3p (Fig. [326]4) cannot be so simply
explained. This, therefore, is an insight about a possible similarity
of functions between these miRNAs (or their targets), or perhaps a
biological process that involves all of them. A future extension of
miTEA-HiRes may include this downstream analysis step of spotting these
very interesting sets of possibly cooperating miRNAs. Currently, users
can use the Jaccard index similarity table available at Zenodo
10.5281/zenodo.10720979 to evaluate the overlap between miRNA target
lists.
In some cases, miTEA-HiRes found that miRNAs were universally active,
that is, active in all (or almost all) cells or spots. Examples of this
phenomenon are given by mir-106b-5p in the MS PBMCs data (Supplementary
Fig. [327]9), and four miRNAs in the breast cancer dataset mentioned
above (miR-16-5p, miR-17-5p, miR-8485, and miR-124-3p; Fig. [328]4).
The phenomenon may be biologically explained by the fact that some
miRNAs have multiple isomiRs: miRNA variants that originate from the
same loci, but are somewhat different due to various processes such as
RNA editing, SNPs, or imprecise cleavage^[329]115,[330]116. While a
miRNA may have a large number of targets, each specific isomiR is more
inclined to target a subset of these targets. If different isomiRs are
present at different regions of the tissue or at different cell
populations, the subset of targets being down-regulated would change
between cells\spots. However- the miRNA would be considered universally
active, from our statistical perspective. Mathematically, this finding
can be counter-intuitive due to the normalization step. However, the
same logic applies here: different subsets of the target list are
down-regulated in different subsets of the cells\spots. To address this
mathematical phenomenon we present a synthetic count matrix (see
Supplementary Fig. [331]9).
An important aspect of miTEA-HiRes is its ability to facilitate the
comparison of miRNA activity between subpopulations of the data,
allowing researchers to derive more precise conclusions. For example,
miR-651-3p was found to be significantly differentially active between
MS and control in both PBMCs and CSF cells. The activity scores
indicated a decreased activity in MS compared to the control group
overall. However, only when comparing each cell type separately, we
found that this pattern predominantly manifests in T cells.
Surprisingly, in PBMCs, granulocytes exhibit an opposite pattern, with
miR-651-3p being significantly more active in MS compared to the
control group (Fig. [332]3, Table [333]2). As demonstrated by comparing
Fig. [334]3 and Supplementary Fig. [335]11, significant differences in
miRNA activity between conditions consistently reflect changes in the
expression levels of their targets. The ability to compare
subpopulations also effectively addresses another concern. When
analyzing the two primary populations within the data, their
composition can influence the outcomes. For instance, if a specific
cell type is more prevalent in the first population and has higher
activity of a particular miRNA, comparing the populations as a whole
would simply indicate that the first group is more active for this
miRNA. However, conducting comparisons among cell types within each
group, as well as between the groups, would provide a better
understanding of the findings.
The connection between the expression level of a miRNA and its activity
level is an open question, not yet well addressed due to scarce
resources that offered measurement of both. We explored this connection
using a total RNA mouse and human datasets. The datasets include both
the expression levels of mRNAs and lncRNAs (miRNA targets, which can be
used by miTEA-HiRes to predict activity levels), and the expression
levels of the miRNAs themselves. In both datasets, we found a
significant enrichment of active miRNAs among highly expressed miRNAs
(Supplementary Fig. [336]7), and also a significant enrichment of
non-active miRNAs among low expression miRNAs. This finding is even
more remarkable given that these datasets include the expression of the
3p and 5p strands of each miRNA combined, while they appear separately
in the miRNA-target database thus yielding possibly different
miTEA-HiRes results.
We also examined the relationship between expression and activity in
MS, by scanning the literature for what is known about the expression
of miRNAs that were found to be differentially active. The ten miRNAs
that were found to be the most differentially active in MS PBMCs were
all found to be less active in MS compared to healthy control. For five
of them, activity and expression had a similar pattern, that is: these
miRNAs are known to be down-regulated or under-expressed in MS compared
to healthy control, or in severe MS compared to mild MS. Two of them
are known to be over-expressed in certain stages of MS compared to
healthy control, indicating a more complex working mechanism that may
lead to disagreement between activity and expression. For the remaining
three miRNAs, we could not find literature regarding their expression
in MS.
While miRNA expression and activity are obviously related, they are not
unanimous due to the complexity of cell biology. Our findings shed
light on the interrelation of miRNA expression and activity. Through
the integration of miRNA expression measurement and activity inference
using our approach and code package, researchers can gain deeper
insight into cellular level regulatory pathways, leading to a better
understanding of spatial and cell-level biology.
In summary, miTEA-HiRes provides a means to capture the signal of miRNA
activity, which usually goes undetected in gene expression datasets.
Our study demonstrated that miTEA-HiRes successfully identified
numerous functional miRNAs in specific samples and sample types.
Through comparison with other methods, we emphasize the strengths of
miTEA-HiRes. Most notably miTEA-HiRes provides robust quality of
results while relying on the data itself without the need to collect
and pre-train on large problem-specific datasets. miTEA-HiRes also
offers ease of usage, with a friendly end-to-end pip-installable Python
package. Additionally, we provide several leads that may be further
investigated. miTEA-HiRes can thus help define miRNA roles in
biological processes and point out miRNAs that should be further
explored. Also, future work could utilize the general framework of our
technique in uncovering the activity of transcription factors,
providing a broader and more comprehensive understanding of active
biological processes.
Methods
Statistical assessment of miRNA activity
The computation of activity p-values by miTEA-HiRes (including data
preprocessing) is described as a pseudo-code in Supplementary
Algorithm [337]1. miTEA-HiRes accepts as input both spatial
transciptomics and single-cell RNA-sequencing raw count matrices
containing gene counts per spot or cell. Bulk data may be analyzed as
single cell data. The steps and components of the analysis are further
described herein.
miRNA-target interactions (MTIs)
MTIs for humans and mice were acquired from the experimentally
validated miRNA-target interactions database miRTarBase 8.0^[338]117.
Non-functional MTIs were excluded, and functional MTIs with a strong
experimental evidence or at least two weak experimental evidences were
preserved, similar to the approach used in a previous study^[339]63.
The filtered MTIs are provided within the miTEA-HiRes package for 706
mouse miRNAs and 2571 human miRNAs.
Preprocessing
For single-cell data, if there are multiple files, they are merged into
a single matrix and a random sample of 10,000 cells is taken to improve
performance and minimally sacrificing statistical power. After data
cleansing (i.e. removing duplicated genes, removing genes with zero
counts, and removing cells with identical gene expression profiles),
miTEA-HiRes normalizes the count matrix so that the total number of
reads in each cell or spot is 10,000 (Fig. [340]1, top middle panel).
To capture deviations in gene expression relative to its own dynamic
expression across samples, each gene’s values are transformed into
z-scores (Fig. [341]1, top right panel). That is, for a dataset with S
cells or spots, each count c(g, s) per gene\transcript g and cell\spot
s is transformed as follows:
[MATH:
z(g,s)=c(<
mi>g,s)−a(g)
σ(g) :MATH]
Where
[MATH:
a(g)=
1S∑sc(g,s) :MATH]
and
[MATH:
σ(g)=
1S
∑s<
mrow>(c(g,s)−a(g))2 :MATH]
Automatic data preprocessing is available in the miTEA-HiRes package.
Activity p-values
As depicted in Fig. [342]1 (middle row), for each spot or cell s,
miTEA-HiRes ranks the genes\transcripts g by their z-score transformed
counts, with the smallest value z(g, s) at the top and the largest
value z(g, s) at the bottom. Next, using the set of MTIs, miTEA-HiRes
converts this ranked gene list into a binary ordered list for each
miRNA of interest, where target genes are assigned a value of 1 and all
other genes are assigned a value of 0 (as depicted in Supplementary
Fig. [343]3). miTEA-HiRes then employs the mHG (minimum hypergeometric)
test (see next section) to assess statistical enrichment in this ranked
binary list. For this purpose, we are using the XL-mHG Python
package^[344]118,[345]119. The mHG p-value is then used by miTEA-HiRes
to represent the miRNA activity level for the specific combination of
miRNA and cell or spot being analyzed. In case no targets are
identified, the p-value generated by the mHG test is exactly one.
mHG (minimum hypergeometric) test
The nonparametric mHG test was developed by ref. ^[346]65 and is only
briefly described here for completeness. The test aims to evaluate the
level of enrichment present at the top of a ranked binary list.
According to the null hypothesis, the given list is randomly and
uniformly selected from a pool of all lists containing N entries, with
K of them being 1s. The alternative hypothesis suggests that the 1s
have a tendency to prominently appear at the top of the list.
Let
[MATH:
X~Hyper<
mi>geometric(N,K<
mo>,n) :MATH]
and let λ be the observed binary list of length N, containing K 1s and
(N−K) 0s. The mHG statistic is computed as:
[MATH:
mHG(λ)=min1≤n≤NProb(X
mi>≥bn(λ))
mo> :MATH]
Where
[MATH:
bn=∑i=1
nλi. :MATH]
The p-value of mHG is then computed as the fraction of permutations
[MATH: λ~
:MATH]
, of the vector λ, for which
[MATH: mHG(λ~)≤mHG
(λ) :MATH]
. That is, when using Λ to denote all such permuted patterns, we
consider:
[MATH: p-valuemHG(
λ)=<
mfenced close="∣" open="∣">λ~∈Λ∣mHG(λ~)≤mHG
(λ)<
/mfenced>NK
mrow> :MATH]
1
Activity scores
According to the data type and mode of operation (spatial, or single
cell: total activity or comparative activity modes), miTEA-HiRes ranks
miRNAs to highlight the most notable ones, and assigns an aggregated
activity score for each one of them (details follow). Relevant miRNA
activity plots are produced, with activity p-values (per cell/spot)
transformed to -log10(p-value) to emphasize significant results (see
Fig. [347]1, bottom row). The ranked list of miRNAs, as well as links
to the corresponding plots, are provided by the miTEA-HiRes package.
For spatial transcriptomics
The aggregated activity score for each miRNA is the fraction of spots
with observed significant activity (p-value < 1e−5). miRNAs are sorted
based on their activity scores, and spatial miRNA activity maps are
generated as part of the output.
For single-cell RNA-seq in total activity mode
miTEA-HiRes first averages the activity p-values across all cells for
each miRNA. Then, these average p-values are FDR-corrected to obtain
the final activity score for every miRNA. Also, miTEA-HiRes produces
activity maps on a UMAP layout. In order to treat the activity score as
an exact p-value, a more conservative merging approach may be
applied^[348]120–[349]124.
For single-cell RNA-seq in comparative activity mode
The miRNA activity score represents the level of differential activity
between the two populations and is defined by the two-sided WRS
p-value, subjected to FDR correction. We recommend using the
comparative activity mode to compare populations of at least 100 cells.
The ranking mechanism of miRNAs was designed to highlight miRNAs that
are not only differentially active but also have consistently high
activity p-values in at least one population. At the top of the ranked
list are miRNAs that demonstrate an average activity p-value that is
equal to or greater than the 0.97 quantile in at least one population
and have activity scores smaller than 1e−8. Following these top miRNAs,
are miRNAs that manifest an average activity p-value that is equal to
or greater than the 0.9 quantile in at least one population and have
activity score smaller than 1e−8. All the remaining miRNAs are sorted
based on their activity scores.
Finally, relevant UMAPs and histograms are produced to demonstrate the
different activity patterns across the two populations.
UMAP projection
Before computing the UMAP coordinates, miTEA-HiRes first performs
commonly used preprocessing steps on the raw count matrix: cells with
less than 200 genes are excluded, genes expressed in less than 3 cells
are excluded, cells with total number of counts below the bottom 2% and
above the top 2% are excluded to remove outliers, cells are normalized
to 10,000 total counts, and all counts are transformed to log(count).
miTEA-HiRes then follows the Scanpy recommended pipeline^[350]125 for
generating UMAP, including the extraction of highly variable genes, PCA
(principal component analysis), computation of the neighborhood graph
of cells, and finally computation of UMAP coordinates.
miTEA-HiRes Python package
miTEA-HiRes is a user friendly python package, available in
[351]https://github.com/EfiHerbst31/miTEA-HiResGithub and installable
using pip. It can be executed with merely a path to the data and
dataset name, but also has multiple options for more refined analysis.
Basic execution includes end-to-end miRNA activity map computation.
Output of the basic execution includes an activity matrix, an html list
of ranked miRNAs, along with their activity scores and links to plots
of top 10 miRNAs. Executions can be done via command-line or by
importing the library and using its functions. Data type (i.e.
single-cell or spatial transcriptomics), and data extension (supported:
txt, tsv, pkl and mtx; files may be zipped) are automatically inferred
from the data location. If preprocessing is required, multiple files
are merged, 10,000 cells are sampled by default and data cleaning is
performed (processed data is saved as .pkl file for reproducibility).
Supported species are homo sapiens and mus musculus. In order to run in
single-cell comparative activity mode, populations strings contained
within cell names are required, which are assumed to be unique across
the cells. miTEA-HiRes is computation intensive (for example, the
running time of a spatial dataset with 2264 spots and ~6000 genes, on
a 16 cores machine with 706 examined microRNAs, is 22 minutes), hence
it is recommended to run miTEA-HiRes on a multicore machine (by default
all available CPUs are utilized). By default all miRNAs available in
the MTI database are analyzed. By default top 10 most interesting
miRNAs are plotted, however this can be changed. More fine-tuning can
be done using the possibility to filter spots with a small number of
reads, and with the possibility to change the threshold below which a
spot is considered to be active. Debugging mode allows for detailed
output. Detailed information is provided in the package
[352]https://github.com/EfiHerbst31/miTEA-HiResreadme document.
Packages utilized within miTEA-HiRes: Anndata^[353]126, Scanpy^[354]127
and XL-mHG^[355]119.
Statistics and reproducibility
Analyzing spatial transcriptomics data sets
Spatial transcriptomics datasets were analyzed using miTEA-HiRes
without any preprocessing, using the default parameters.
Analyzing Multiple Sclerosis single-cell data sets
Both PBMC and CSF datasets were analyzed using miTEA-HiRes without any
preprocessing, using default parameters (sampling 10K cells). The MS
dataset with cell types was analyzed without sampling, to improve
statistical significance when analyzing cell subsets.
Analyzing breast cancer single-cell data sets
To enhance the statistical power and leverage the similarities between
breast cancer samples, we merged all cell lines and patient-derived
cells into two groups: 1182 static cells and 1992 migratory cells. We
then used the miTEA-HiRes analysis pipeline.
GDC data curation
First, we used the GDC portal^[356]106 to create two cohorts- one of
cases with RNAseq data and the other of cases with miRNAseq data. We
then used Python code to intersect these two cohorts, and loaded the
intersected cohort back to the GDC portal. We used the portal to
identify the RNAseq samples and the miRNAseq samples that followed
these criteria: primary tumor, solid tissue, open access, tsv or txt.
We then used Python to download the relevant samples. If more than one
RNAseq sample \miRNAseq sample was available for the same case, we
chose the one with the most detected genes \miRNAs. Finally, we
produced one gene expression count matrix of 11,927 samples and 57,287
genes from the RNAseq samples, and one miRNA expression (reads per
million) count matrix of 11,927 samples and 1810 miRNAs from the
miRNAseq samples. We then found that in the second matrix, for some
miRNAs, we had the expression values of the precursors of the miRNA
(marked with “-1","-2"... at the end) rather than the mature miRNA. In
these cases, we summed the expression values of the precursors to get
an assessment of the expression of the mature miRNA. After performing
this step, and removing miRNAs that were not expressed at all, we had
1662 miRNAs in the miRNA expression matrix.
Evaluating agreement between activity and expression in the GDC dataset, for
miTEA-HiRes and miReact
We ran miTEA-HiRes on the gene expression count matrix using the total
activity mode (Methods). We also ran miReact^[357]59 in R on the same
gene expression count matrix, as described in miReact
tutorial^[358]128. From both methods we received an activity p-values
matrix as output, with a prediction of the activity of each miRNA (2571
miRNAs for miTEA-HiRes, 2578 miRNAs for miReact) in each of the 11,927
samples. We transformed the miTEA-HiRes activity p-values matrix to
-log10(p-values) (the miReact activity p-values were already -log10
transformed). Each of the miRNAs that appeared in the activity matrices
of both miTEA-HiRes and miReact, also appeared as one of the 1662
miRNAs in the miRNA expression matrix. For strand-specific miRNAs, no
strand was specified for the miRNA that appeared in the miRNA
expression matrix.
For both miTEA-HiRes and miReact, we determined activity as follows. A
miRNA was considered active in a certain sample if its activity p-value
was less than 0.05. If a strand-specific miRNA was considered active,
we considered the no-strand version of the same miRNA (the one that
appeared in the miRNA expression matrix) as active. For each sample, we
ordered the miRNAs based on their descending expression. We then used
the mHG test to evaluate whether active miRNAs were enriched at the top
of the list. The p-values of the mHG tests performed for all samples
are summarized in Fig. [359]5.
Similarly, we considered miRNAs with an activity p-value of more than
0.1 as non-active. If a strand-specific miRNA was found to be
non-active, and its opposite strand was not found to be active, then we
considered the no-strand version of the same miRNA (the one that
appeared in the miRNA expression matrix) to be non-active. For each
sample, we ordered the miRNAs based on their ascending expression. We
then used the mHG test to evaluate whether non-active miRNAs were
enriched at the top of the list. The p-values of the mHG tests
performed for all samples are summarized in Fig. [360]5.
Activity and expression across cancer tissues of origin
We downloaded the clinical data for the final set of cases, chosen from
the GDC portal as explained above. Five cases had multiple entries
(with inconsistencies in the relevant column) and were excluded. On the
“tissue_or_organ_of_origin" column, we found that 11,896 cases had 140
tissues of origin, while for 26 cases the tissue of origin was not
reported. We used a Python function to group the origins into 35
combined tissues. For example, “Fallopian tube", “Cervix uteri" and
“Endometrium" were classified as “Female Reproductive System", while
“Cortex of adrenal gland" and “Medulla of adrenal gland" were
classified as “Adrenal Gland". Then, for miR-101-3p, miR-22-3p, and
miR-10b-3p, we produced the box plots of miTEA-HiRes activity values
across 14 combined tissues that had at least 200 samples each (Fig.
[361]5). Similarly, we produced the expression box plots for the same
miRNAs (Supplementary Fig. [362]6).
Evaluating the relationship between single-cell miRNA expression and activity
We downloaded total-RNA sequencing datasets obtained using the
Smart-seq-total method^[363]45. We then merged the datasets of human
primary fibroblasts, HEK293T, and MCF7 cells into a single human
dataset (633 cells), and the datasets of all stages of murine embryonic
stem cells differentiation into a single mouse dataset (2167 cells). To
explore the enrichment level of active miRNAs at the top of the ranked
list of overexpressed miRNAs, we performed the following processing
pipeline. To obtain the list of most expressed miRNAs, we summed the
raw counts for each miRNA over all the cells. We retained only those
miRNAs that were also available in miTEA-HiRes database. We ranked this
list in descending order, with the most expressed miRNAs positioned at
the top. A miRNA was considered active if its activity score was less
than 0.05 (average activity p-value over all cells). miTEA-HiRes’ MTI
database distinguishes between miRNAs with 3p and 5p strands, while the
total-RNA dataset displays only one value for the two strands, which
represents the summed expression of them. To account for this, we
considered a miRNA to be active if at least one of the strands was
active. To utilize the mHG test, we converted the ranked list of
overexpressed miRNAs into 1’s (representing active miRNAs) and 0’s.
To explore the enrichment level of non-active miRNAs at the top of the
ranked list of underexpressed miRNAs, we performed the following
processing pipeline. To obtain the list of underexpressed miRNAs, we
used the reverse-order of the overexpressed miRNAs list obtained
previously. A miRNA was considered to be non-active if its activity
score was greater than 0.1 (average activity p-value over all cells).
If a miRNA was found to be non-active, but its other strand (3p or 5p)
was previously considered as active, it was excluded from the
non-active miRNAs list to ensure that only truly non-active miRNAs were
included. To utilize the mHG test, we converted the ranked list of
underexpressed miRNAs into 1’s (representing non-active miRNA) and 0’s
otherwise.
For the human dataset, enrichment of active miRNAs at the top of the
list of expressed miRNAs was calculated as follows:
p[mHG](N = 1462, B = 740, n* = 138, b(n*) = 97) = 3.76e−05. Enrichment
of non-active miRNAs at the bottom of the list:
p[mHG](N = 1462, B = 526, n* = 1367, b(n*) = 508) = 0.0055. For the
mouse dataset, enrichment of active miRNAs at the top of the list:
p[mHG](N = 492, B = 47, n* = 234, b(n*) = 32) = 0.0301. Enrichment of
non-active miRNAs at the bottom of the list:
p[mHG](N = 492, B = 413, n* = 265, b(n*) = 236) = 0.0114). N refers to
the number of shared miRNAs between the dataset and miTEA-HiRes
database, B is the number of active (non-active) miRNAs, n* is the
cutoff selected by the mHG test, b(n*) is the number of miRNAs both
highly expressed and active (lowly expressed and non-active) at the
cutoff.
Evaluating differential activity in single-cell miRNA induction experiment
We obtained the count matrices of 3280 distinct i199 cells, and 3143
i199-KTN1 cells, including induced and uninduced annotations directly
from the author^[364]114. Following miRSCAPE methods^[365]60, we
annotated the cells according to GeGFP-00001 expression levels
(’induced’ if the expression level was within the top 20%, and
’uninduced’ if the expression level was zero). After cell annotation,
we removed the genes related to miRNA induction experiment from the
data. As a preprocessing step, we filtered out cells having gene
detection rate amongst the top 2%. Also, as we noted that the overall
gene detection rate was lower in the i199-KTN1 cells compared to the
i199 cells, we filtered out cells having less than 5000 detected genes.
As a result, we were left with the following cell type composition:
i199 had 695 induced cells, 1561 uninduced cells, and 2814 cells in
total. i199-KTN1 had 611 induced cells, 1075 uninduced cells, and 2264
cells in total. We then applied miTEA-HiRes in comparative activity
mode, focusing on miR-199a-5p and miR-199a-3p. miTEA-HiRes indicated
correctly the excess activity of both miRNAs in the induced cell group
in both cell types, as measured using WRS p-values. Results are
available at Zenodo 10.5281/zenodo.10720979.
Reporting summary
Further information on research design is available in the [366]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[367]Transparent Peer Review file^ (282.9KB, pdf)
[368]Supplementary Information^ (4.6MB, pdf)
[369]Reporting Summary^ (1.9MB, pdf)
Acknowledgements