Abstract The activity of miRNA varies across different cell populations and systems, as part of the mechanisms that distinguish cell types and roles in living organisms and in human health and disease. Typically, miRNA regulation drives changes in the composition and levels of protein-coding RNA and of lncRNA, with targets being down-regulated when miRNAs are active. The term “miRNA activity" is used to refer to this transcriptional effect of miRNAs. This study introduces miTEA-HiRes, a method designed to facilitate the evaluation of miRNA activity at high resolution. The method applies to single-cell transcriptomics, type-specific single-cell populations, and spatial transcriptomics data. By comparing different conditions, differential miRNA activity is inferred. For instance, miTEA-HiRes analysis of peripheral blood mononuclear cells comparing Multiple Sclerosis patients to control groups revealed differential activity of miR-20a-5p and others, consistent with the literature on miRNA underexpression in Multiple Sclerosis. We also show miR-519a-3p differential activity in specific cell populations. Subject terms: Statistical methods, Software, Computational models __________________________________________________________________ This study introduces miTEA-HiRes, a method designed to facilitate the evaluation of miRNA activity at high resolution. The method applies to single cell transcriptomics, type-specific single cell populations, and spatial transcriptomics data. Introduction MicroRNAs (miRNAs) are small RNAs (sRNAs) that operate by post-transcriptionally repressing their target genes, either by inducing messenger RNA (mRNA) deadenylation and degradation or by inhibiting translation^[30]1,[31]2. Many studies have shown that overexpression of miRNAs reduces the expression level of their targets^[32]2–[33]6 and their protein levels^[34]3,[35]4,[36]7, while miRNAs knockdown elevates the expression levels of their targets^[37]2,[38]5,[39]6. miRNAs have a vital effect on many developmental and disease processes, making them potential theraputic targets in cancer^[40]8–[41]12, cardiovascular disease^[42]13–[43]15, neurodegenerative diseases^[44]16, diabetes mellitus^[45]17, achute ischemmic disease^[46]18, inflammatory disease^[47]19, depression^[48]20 and more^[49]21–[50]25. In the last twenty years, there has been significant progress in single-cell RNA-sequencing, resulting in data and insight for different cell types and across systems. This advancement has facilitated a thorough investigation of cell diversity in developmental and disease processes, relying almost exclusively on mRNAs and long non-coding RNAs (lncRNAs)^[51]26,[52]27. The detection of miRNAs in single cells, however, is one step behind. Early attempts relied on PCR^[53]28–[54]32, nanomotors^[55]33, atomic force microscopy^[56]34, isothermal amplification methods^[57]35–[58]37, or FISH^[59]38, all of which are limited in the number of measured miRNAs. Small RNA single cell sequencing was first attempted by ref. ^[60]39. Then, other methods emerged for the simultaneous sequencing of sRNAs and mRNAs^[61]40–[62]43. However, these are very limited in the number of cells profiled. The concept of adding polyA tails to nonpolyadenylated RNA strands, which was first introduced by ref. ^[63]44 for sequencing of small (<150 bp) DNAs or RNAs, has later matured into the methods now known as “totalRNA single cell sequencing". This approach detects RNA strands of all lengths by adding polyA tails to nonpolyadenylated RNA, either before or after fragementation^[64]45–[65]47. In the past decade, high-throughput spatial transcriptomics methods were developed^[66]48. Through spatially resolved gene expression, we can now not only identify biological processes in a tissue, but also pinpoint these processes to an exact location of the tissue^[67]48–[68]52. Traditionally, these methods focused entirely on mRNAs and lncRNAs. However, very recently the STRS (spatial total RNA-sequencing) method was introduced^[69]53. STRS employs the same polyadenylation concept (mentioned earlier for single cell) to slices of tissue, resulting in a gene expression spatial map for genes of all lengths. The efforts to measure miRNA expression in single-cell and spatially are important and would probably soon mature and become widespread. However, they do not immediately solve the open question of expression versus activity of miRNAs in a natural setting. The relationship between miRNA expression and its level of activity, that is- the extent of target repression, has been extensively researched^[70]1,[71]7,[72]42,[73]54. Reference ^[74]7 found that regulation by miRNAs is non-linear: miRNA can behave both as a switch of target expression and as a fine-tuner. Reference ^[75]12 report on studying both miRNA and mRNA profiles in a cohort of breast cancer patients. The analysis leads to identifying miR29’s role in regulating the extra-cellular matrix, as well as other findings. Reference ^[76]42 found that in a few examples, activity depends on expression levels of both the miRNA and its targets, that is: miRNAs that were overall expressed had a negative correlation with their overall expressed targets. We also know that the relationship between a miRNA and its targets may be more complex than a simple negative regulation, and includes feedback loops, feed-forward loops and miRNA ’sponging’^[77]54. The recent emergence of totalRNA single-cell and spatial methods enables us to initiate an exploration of miRNA to target relationships at a higher resolution. However, current transcriptomics technology also allows for inference and insight, as we show in this work. Several methods have been suggested to facilitate the study of miRNA activity. Our group has formerly developed miTEA, a method to predict miRNA activity by examining the mutual statistical enrichment in two ranked lists - the list of expressed genes and the list of targets for any given (pivot) miRNA (the output of a target prediction algorithm or experiment)^[78]55. MIENTURNET^[79]56, DIANA-mirExTra2.0^[80]57 and enrichMiR^[81]58 offer a miRNA target enrichment analysis for a set of genes based on the hypergeometric test. DIANA-mirExTra2.0 and enrichMiR also accept differential expression analysis results as input, and enrichMiR also offers other statistical approaches. However, all of these methods and tools focus on activity prediction for a two-condition experiment, and do not support the more complex architecture of single cell RNA sequencing experiments.^[82]59 developed miReact, which is a method aimed at evaluating miRNA activity in single cell RNAseq datasets based on motif enrichment. They describe sound statistics for this context and provide a friendly software package. Their approach, however, does not utilize database (experimentally validated) information and does not address spatial transcriptomics. Also, it does not use the activity scores obtained for subsets of the data to find significant differences in miRNA activity between cell types or conditions. Olgun et al.^[83]60 developed miRSCAPE, a computational tool to infer miRNA expression in single cell data. They hypothesize that measuring miRNA expression can be performed more accurately, exploiting the complex direct as well as indirect regulatory links between miRNA and the expression of other mRNAs. miRSCAPE demonstrates impressive results. However, the accuracy of the results relies on finding and pre-training a model using a bulk dataset of the same cell type and condition as the inspected single-cell dataset, in order to capture the relevant regulatory links. This makes miRSCAPE results highly variable, depending on the availability and quality of such bulk dataset, affecting the generalizability, reproducibility, and usability of the tool. Finally, all of the above mentioned tools do not address spatial RNA sequencing. In this work, we present miTEA-HiRes: a statistical approach to interpret the activity of miRNAs based on the expression pattern of their targets in single-cell RNAseq and spatial transcriptomics. The method creates the list of targets by filtering the miRTarBase database^[84]61. It then uses the multiple-sample setting to perform normalization of gene expression values, and finally applies the minimum HyperGeometric (mHG) test^[85]62,[86]63 to compute the level of activity in each sample. We use this approach for two main purposes. First, to create, for the first time to the best of our knowledge, miRNA activity maps for scRNAseq and spatial datasets, enabling the exploration of miRNA heterogeneity (variability) across different cell types or across the geography of a sample. Second, to find differentially active miRNAs between cells in different conditions, for example, disease vs. normal. We evaluate our method using bulk data, and single-cell miRNA induction experiments. We investigate the relationship between miRNA expression and activity, as emerges from total RNA single-cell sequencing. Our method and usage instructions are available at [87]https://github.com/EfiHerbst31/miTEA-HiRes. Results The statistical method miTEA-HiRes relies on a precomputed miRNA-target interaction set, which was curated from the miRTarBase database as in ref. ^[88]63: a miRNA-target interaction is included in the interaction set if it is functional, and has at least one strong experimental support or two weak ones. Statistical characteristics of the final list of miRNA-target interactions can be found in Supplementary Fig. [89]1. The miTEA-HiRes analysis has two steps. At the first step, it computes the activity p-values, representing the level of activity of every evaluated miRNA in each of the cells or spots. At the second step, miTEA-HiRes combines the activity p-values for each miRNA to yield activity scores, indicating its biological significance. miTEA-HiRes also produces activity maps and histograms when appropriate. First step: computing activity p-values As a preprocessing step ([90]1, top row), miTEA-HiRes first performs normalization of total reads per spot (for spatial datasets) or cell (for single cell datasets). Next, it performs a Z-score transformation per gene. We recommend preprocessing data to account for batch effects^[91]64 if relevant, and to remove cells with very few reads (<1000) before using it as input for miTEA-HiRes, as these cells may disproportionately impact z-scores. Then, for each spot or cell, miTEA-HiRes ranks the genes in ascending order of their Z-scores (see Methods for the mathematical definitions). Our main assumption is that if a certain miRNA is active, then its targets would be down-regulated, so they would appear at the top of this ranked list (as depicted in Supplementary Fig. [92]3). Enriched occurrence is evaluated using the minimum-HyperGeometric (mHG) test^[93]55,[94]62,[95]65–[96]67, a statistical procedure assessing the enrichment level of a set of elements at the top of a list. The mHG test is performed for each evaluated miRNA, for all spots\cells. The reported p-values of the mHG test are then used as the activity p-values associated with every pair of miRNA and cell\spot (Fig. [97]1, middle row). The activity matrix, containing the activity p-values, is provided as output for users to facilitate further user-defined analysis, in addition to the analysis performed by miTEA-HiRes pipeline. For example - an analysis based on clustering according to activity profiles can be performed. See Methods and pseudo-code (Supplementary Algorithm [98]1) for a more detailed description of this step. Fig. 1. miTEA-HiRes pipeline. [99]Fig. 1 [100]Open in a new tab Top row: normalization and standardization. Middle row: computation of activity p-values (shown for spatial data. The same process applies to single-cell data). Bottom row: computation of aggregated activity scores (for all data types); Generation of spatial activity maps for spatial data, or activity maps on a UMAP layout for single-cell data; Statistical comparison between activity p-values of different cell populations for single-cell data in comparative activity mode. See text and Methods for more details. Second step: computation of aggregated activity scores For spatial data, miTEA-HiRes computes the aggregated activity score of each miRNA as the percentage of active spots (defined by thresholding the activity p-values) and outputs a list of miRNAs sorted by this value. miTEA-HiRes also provides an activity map for each miRNA over the spatial coordinates (Fig. [101]1 bottom row and Fig. [102]2). Fig. 2. miTEA-HiRes spatial activity maps. [103]Fig. 2 [104]Open in a new tab Activity maps of two selected miRNAs per tissue are presented in the two rightmost columns. Colors of activity maps represent -log10 of activity p-values. Corresponding H&E slides and GE clustering, provided by Visium, appear in the two leftmost columns. Datasets presented (from top to bottom): mouse brain (2264 spots), human breast cancer (4898), human skin melanoma (3458), human lung cancer (3858), human ovarian cancer (4674) and human cerebellum (4992). For single cell data, miTEA-HiRes has two working modes. The total activity mode treats the dataset as a single population of cells, so it does not require any cell annotations. In this mode, the aggregated activity scores are defined as the corrected average activity p-values (Methods). miTEA-HiRes then reports the miRNAs sorted by their activity scores, as well as an activity map for each miRNA over a UMAP layout (Fig. [105]1 bottom row). The second mode is the comparative activity mode. In this mode, miTEA-HiRes performs a two-sided Wilcoxon rank-sum (WRS) test to evaluate the difference in activity p-values between compared populations or conditions (for example: cell types, disease vs. control). miTEA-HiRes then defines the aggregated activity scores by correcting the WRS p-values using a false-discovery rate (FDR) correction. For this mode, the terms “activity score" and “corrected WRS p-value" are used interchangeably. miTEA-HiRes reports the miRNAs that have significant activity p-values in at least one of the compared populations and manifest significant differential activity between the populations (Methods). miTEA-HiRes also reports a histogram of the activity p-value distributions in the two populations (see Table [106]1, Fig. [107]1 bottom row). A difference in miRNA activity between two conditions reflects a difference in the expression values of its targets (See Fig. [108]3 and Supplementary Fig. [109]11). Table 1. miTEA-HiRes analysis of MS single cell dataset of PBMCs total activity mode comparative activity mode miRNA rank activity score rank activity score Histogram of activity p-values Literature miR-106b-5p* 1 1.46e−16 1 3.03e−170 graphic file with name 42003_2025_7454_Taba_HTML.gif Increased MS-related disability is associated with down-regulation of miR-106b-5p and miR-19b-3p in RRMS patients^[110]84; belongs to the miR-106b/25 cluster^[111]78 miR-93-5p* 2 3.92e−15 2 3.19e-204 graphic file with name 42003_2025_7454_Tabb_HTML.gif Regulates MS risk genes^[112]81; Found to be overexpressed in MS and in RRMS^[113]81,[114]82; belongs to the miR-106b/25 cluster^[115]78 miR-17-5p* 3 6.50e−15 3 3.14e−165 graphic file with name 42003_2025_7454_Tabc_HTML.gif Inhibits T Cell activation genes, and underexpressed in MS whole blood^[116]134; belongs to the miR-17/92 cluster^[117]79 miR-20a-5p 4 1.23e−14 5 1.58e−199 graphic file with name 42003_2025_7454_Tabd_HTML.gif Inhibits T Cell activation genes, and underexpressed in MS whole blood^[118]134; belongs to the miR-17/92 cluster^[119]79 miR-16-5p* 5 1.90e−13 4 9.22e−140 graphic file with name 42003_2025_7454_Tabe_HTML.gif Underexpressed in PBMCs, CD4+ T cells and B cells of RRMS patients, comparing to healthy subjects^[120]135 miR-519d-3p 6 1.35e−12 6 4.50e−158 graphic file with name 42003_2025_7454_Tabf_HTML.gif No literature found miR-19b-3p* 7 3.44e−12 7 4.50e−183 graphic file with name 42003_2025_7454_Tabg_HTML.gif Increased MS-related disability is associated with downregulation of miR-106b-5p and miR-19b-3p in relapsing-remitting MS patients^[121]84; belongs to the miR-106a/363 cluster^[122]80 miR-20b-5p 8 3.49e−12 8 7.84e−152 graphic file with name 42003_2025_7454_Tabh_HTML.gif Belongs to the miR-106a/363 cluster^[123]80, whose member miR-19b-3p is reported to be linked with MS^[124]84. The miR-106a/363 cluster is highly homologous to the miR-17/92 cluster, they are subsumed in miRBase as one family of miRNAs with similar target genes and functions^[125]80,[126]136,[127]137. The miR-17/92 cluster includes hsa-miR-17-5p and hsa-miR-20a-5p, which are known to be underexpressed in MS whole blood^[128]134. miR-106a-5p* 9 8.80e−11 9 2.46e−162 graphic file with name 42003_2025_7454_Tabi_HTML.gif Belongs to the miR-106a/363 cluster^[129]80, see cell above. miR-19a-3p* 10 9.68e−10 >10 - - Belongs to the miR-17/92 cluster^[130]79 miR-155-5p >10 - 10 3.07e−113 graphic file with name 42003_2025_7454_Tabj_HTML.gif A crucial regulator of inflammation, modulates the autoimmune response in MS and affects the function of the brain-blood barrier in MS patients^[131]138; up-regulated in PBMCs of RRMS patients in the remission phase compared with healthy controls^[132]83 [133]Open in a new tab Top 10 most overall active miRNAs as computed by miTEA-HiRes in total activity mode, and top 10 differentially active miRNAs as computed by miTEA-HiRes in comparative activity mode: Multiple Sclerosis (MS) vs. control. In the histograms the x-axes indicate -log10(activity p-values). Histogram legend: Blue: MS, orange: control. miRNAs marked with * were ranked at the top 10 of the total activity mode for the MS single cell cerebrospinal fluid (CSF) dataset (Supplementary Table [134]1). Analysis in comparative activity mode resulted in identical top 10 lists of differentially active miRNAs for peripheral blood mononuclear cells (PBMCs) and CSF datasets (Supplementary Table [135]1). In comparative activity mode, miRNAs are ranked as described in Methods. Fig. 3. miRNA differential activity between MS and control groups at different resolutions. [136]Fig. 3 [137]Open in a new tab miRNA differential activity between Multiple Sclerosis (MS) and control at all-cells level (left most pair of boxes), compared to cell type specific level. a miR-519a-3p analysis on peripheral blood mononuclear cells (PBMCs); b miR-519a-3p in cerebrospinal fluid (CSF) cells; c miR-651-3p in PBMCs; d miR-651-3p in CSF cells. Two sided WRS test p-value legend: ns: p > 0.05; * : 0.01 < p ≤ 0.05; ** : 1e−3 < p ≤ 0.01; *** : 1e−4 < p ≤ 1e−3. **** : p ≤ 1e−4. Cell-type keys: B1 and B2: B cell subsets; CD4: CD4+ T cells; CD8a: activated CD8+ T cells; CD8n: non-activated CD8+ T cells; Gran: granulocites; Mono: monocytes; NK1: natural killer cells; Tdg: γ δ T cells; Tregs: regulatory CD4+ T cells^[138]77. Group sizes can be found in Supplementary Table [139]4. miTEA-HiRes thus takes single cell or spatial gene expression data (with or without cell annotations) as input, and outputs a list of miRNAs ranked by their potential biological interest, combining overall activity and differential activity, when relevant. miTEA-HiRes is available at Github: [140]https://github.com/EfiHerbst31/miTEA-HiRes. miTEA-HiRes provides evidence of tissue-specific and localized patterns of miRNA activity, utilizing spatial transcriptomics We downloaded spatial transcriptomics data of a mouse brain^[141]68, human breast cancer^[142]69, human skin melanoma^[143]70, human lung cancer^[144]71, human ovarian cancer^[145]72 and human cerebellum^[146]73 from the Visium website^[147]74. Fig. [148]2 shows the histology slides (H&E slides) along with gene expression (GE) clustering provided on the Visium website, and specific noteworthy miRNA activity maps exemplifying miTEA-HiRes’s output. Some activity maps clearly resemble the H&E or the GE clustering- indicating miRNA activity in specific tissue types. For instance, miR-16-5p’s activity throughout the primary human breast cancer tissue is similar to its H&E slide (Fig. [149]2, second row). miR-16-5p is known to suppress breast cancer proliferation by targeting ANLN^[150]75, which is under-expressed in this dataset compared to other datasets (Supplementary Fig. [151]2). Another example is let-7b-5p, with an activity pattern that is similar to the GE clustering in the lung cancer slide (Fig. [152]2, fourth row). In the human cerebellum, the relatively understudied miR-1228-5p and miR-8064 seem clearly more active in specific regions that are also visible in the H&E slide and GE clustering ([153]2, bottom row). Other activity maps may offer a more unique insight- information about the tissue that cannot be inferred from the H&E or the GE clustering. For example, the activity of miR-29c-3p in the breast cancer sample (Fig. [154]2, second row) does not precisely correspond, at the qualitative visual inspection level, to either the clustering patterns or the characteristics observed in the H&E slide (Supplementary Fig. [155]10). In order to investigate whether the activity pattern might be affected by the sparse nature of spatial gene expression datasets, we experimented with downsampling the counts matrix (see Supplementary Fig. [156]12). miR-29c-3p is known to be a tumor suppressor miRNA^[157]76. Also, previous bulk analysis study showed that miR-29c is over-expressed in luminal A breast cancer compared to basal breast cancer, and that its targets tend to have lower expression values in breast cancer samples with elevated miR-29c expression, indicating that it is not only expressed but also active^[158]12. MiR-29 is known to play a role in extra-cellular matrix formation^[159]12 and in cardiovascular disease pathogenesis in mice^[160]14. We found that some miRNAs are highly active only in certain tissue regions (for example, miR-424-5p in the ovarian cancer sample, [161]2, fifth row), while other miRNAs have a more dispersed activity pattern (miR-200c-3p, also in the ovarian cancer sample, Fig. [162]2, fifth row). Another interesting observation is that some miRNAs exhibit activity patterns that are disjoint from those of other miRNAs. For example, in the lung cancer tissue (Fig. [163]2, fourth row), the well-studied let-7b-5p and the less understood miR-6804-3p demonstrate non-overlapping patterns of activity. To quantify this observation, we compared the top 10% most active spots in let-7b-5p with those of miR-6804-3p and found zero overlap (hypergeometric p-value for under-enrichment 2e−19). When considering the top 20%, there was an overlap of only 8 spots (hypergeometric p-value for under-enrichment 8e−71). It is important to note that not all miRNAs exhibit high activity, which is expected. In the evaluated samples, the fraction of miRNAs that were found to be active (-log10(p-value) > 5) in at least 20% of the spots was between 0% (in the mouse brain and human brain) and 10% (in the human breast cancer). miTEA-HiRes identifies disease-associated miRNAs at different resolutions in single-cell PBMCs and CSF datasets of Multiple Sclerosis patients miTEA-HiRes was applied to a single-cell RNAseq dataset of peripheral blood mononuclear cells (PBMCs) obtained from five patients with Multiple Sclerosis (MS) and five control cases^[164]77(Table [165]1). miTEA-HiRes was also applied to a dataset of RNAseq of cerebrospinal fluid samples (CSF) collected from MS patients and a control group (Supplementary Table [166]1). The PBMCs dataset consisted of 42,969 cells and the CSF dataset consisted of 22,357 cells. The breakdown into cell type-specific groups can be found in Supplementary Table [167]4. Comparing miRNA activity in MS to control To reduce computation time, 10,000 cells were randomly selected from the PBMCs dataset (5000 from each group of MS and control). We then conducted the analysis using two modes: the total activity mode and the comparative activity mode - with the MS cell population vs. control. We present and discuss miTEA-HiRes’s results by observing the 10 miRNAs ranked highest for overall activity (in total activity mode, without cell annotations), and the 10 miRNAs ranked highest for differential activity between MS and control (in comparative activity mode) (Table [168]1). For convenience, we refer to the first list as the total activity list and to the second list as the comparative activity list. The following miRNAs, which appear in both lists, have known connections to MS: miR-106b-5p and miR-93-5p (which belong to cluster 106b/25^[169]78), miR-17-5p and miR-20a-5p (which belong to the miR-17/92 cluster^[170]79), miR-19b-3p (which belongs to miR-106a/363^[171]80), and miR-16-5p. miR-155-5p, which only appears in the comparative activity list, also has a known connection to MS. Literature suggests that miR-106b-5p, miR-17-5p, miR-20a-5p, miR-16-5p, and miR-19b-3p are underexpressed or underregulated in MS patients compared to a control group (Table [172]1). This underexpression pattern fits the pattern of miRNA inactivity in MS cells depicted in miTEA-HiRes’s histograms (see Table [173]1). For two other miRNAs: miR-93-5p and miR-155-5p, whose miTEA-HiRes histograms also show an under-activity pattern in MS, literature suggests that they are over-expressed in MS (but not necessarily over-active) compared to healthy subjects^[174]81–[175]83. We suggest that these miRNAs might have a complex role in MS that should be further investigated. The relationship between miRNA activity and expression is further explored in the validation section. miTEA-HiRes also detected some miRNAs which are not yet reported to be related to MS. Some of them, however, belong to the same or similar clusters as known miRNAs related to MS. For example: hsa-miR-20b-5p (total activity list # 8) and miR-106a-5p (total activity list # 9), which belong to the miR-106a/363 cluster (Supplementary Table [176]1,^[177]80,[178]84). Some of the highly ranked miRNAs are not well studied for their functionality, such as miR-519d-3p (Table [179]1). Following these results, miR-519d-3p could potentially be investigated for its relation to MS. We found that nine miRNAs appeared in both lists. Also, all miRNAs in the total activity list were found to be differentially active between MS and control in comparative activity mode (corrected WRS p-values < 9e−140). We further touch on this point in the discussion. Cell type specific miRNA differential activity We downloaded the PBMCs dataset with cell types. In order to improve statistical significance when analyzing cell subsets, we considered the entire dataset for this section, without sampling. We then applied miTEA-HiRes to the data in comparative activity mode, to compare the activity of miRNAs between MS and control within each cell type (Table [180]2, Supplementary Table [181]2), and also to compare miRNAs activity between cell types within the MS or control groups (Table [182]3, Supplementary Table [183]3). Full Results can be found at Zenodo 10.5281/zenodo.10720979. Table 2. Comparison between MS and control within a specific cell-type Cell type and origin miR-19b-3p miR-519a-3p miR-3609 miR-651-3p non-activated CD8+ T cells, PBMCs 1.65e−26 ↓ 1.52e−13 ↓ 5.52e−11 ↓ 1.20e−06 ↓ activated CD8+ T cells, PBMCs 1.02e−76 ↓ 4.89e−21 ↓ 9.63e−66 ↓ 8.49e−16 ↓ CD4+ T cells, PBMCs 3.36e−105 ↓ 1.28e−40 ↓ 8.27e−75 ↓ 4.01e−38 ↓ regulatory CD4+ T cells, PBMCs 1.51e−07 ↓ 2.76e−02 2.39e−13 ↓ 9.20e−05 ↓ natural killer cells, PBMCs 3.77e−25 ↓ 6.74e−16 ↓ 6.17e−24 ↓ 5.29e−02 B1 (B cell subset), PBMCs 9.46e−13 ↓ 2.61e−14 ↓ 1.70e−05 ↓ 8.91e−01 B2 (B cell subset), PBMCs 7.92e−10 ↓ 1.47e−08 ↓ 3.05e−05 ↓ 7.06e−02 granulocytes, PBMCs 1.62e−05 ↓ 8.24e−01 3.76e−06 ↓ 3.67e−13 ↑ activated CD8+ T cells, CSF 5.72e−14 ↓ 6.75e−08 ↓ 2.12e−03 ↓ 2.07e−19 ↓ CD4+ T cells, CSF 4.68e−21 ↓ 2.14e−01 2.84e−01 1.45e−84 ↓ regulatory CD4+ T cells, CSF 5.03e−02 6.22e−01 4.48e−01 4.45e−10 ↓ [184]Open in a new tab For each cell type, we compared the Multiple Sclerosis (MS) cell population to the control population in comparative activity mode. The table lists corrected WRS p-values for miR-19b-3p, miR-519a-3p, miR-3609 and miR-651-3p, for each comparison. ↓, ↑ represent reduced\elevated activity in MS, respectively. Activity was reduced in MS compared to control in all significant instances (p-values < 0.01) except one: miR-651-3p in granulocytes, where activity in MS was elevated compared to control. Group sizes can be found in Supplementary Table [185]4. More results can be found in Supplementary Table [186]2. Full results can be found at: 10.5281/zenodo.10720979. Table 3. Comparison between CD4+ T cells and activated CD8+ T cells in the MS PBMCs and CSF CD4+ T cells vs. activated CD8+ T cells miR-19b-3p miR-519a-3p miR-3609 miR-651-3p PBMCs control 3.38e−01 6.99e−04 ↑ 3.66e−02 4.89e−05 ↑ PBMCs MS 3.73e−17 ↓ 4.48e−02 5.90e−09 ↓ 2.10e−06 ↑ CSF control 1.27e−01 1.05e−10 ↑ 1.31e−04 ↑ 1.76e−01 CSF MS 7.87e−03 ↓ 9.94e−01 2.68e−01 6.84e−06 ↑ [187]Open in a new tab miRNAs are the same as shown in Table [188]2. Numbers represent the corrected WRS p-values associated with this differential activity comparison. More results can be found in Supplementary Table [189]3. ↓, ↑ represent reduced\elevated activity in activated CD8+ T cells compared to CD4+ T cells, respectively. Group sizes can be found in Supplementary Table [190]4. Comparison between MS and control groups within specific cell types We compared miRNA activity p-values between MS and control within the same cell type, in the comparative activity mode. Table [191]2 shows the comparison results for four selected miRNAs. Full results can be found at Zenodo 10.5281/zenodo.10720979. This comparison allows us to further explore the origin of differential activity seen in non-cell-type specific analyses (Fig. [192]3, Table [193]1, Supplementary Table [194]1). For example, miR-519a-3p was found to have differential activity between PBMCs obtained from MS patients and control (Table [195]1, corrected WRS p-value: 4.50e − 158). However, following the cell type-specific comparison, we can determine that this difference is subtle or absent in granulocytes and regulatory CD4+ T cells, but very prominent in other lymphocytes (Fig. [196]3, Table [197]2). On the other hand, mir-519a-3p was not found to be differentially active between CSF cells obtained from MS patients and control, and cell-type specific comparison reveals differential activity in activated CD8 cells (Fig. [198]3). A noteworthy observation is that both miR-519a-3p and miR-3609 are significantly differentially active between MS and control in PBMCs CD4+ cells, but not in CSF CD4+ cells (Table [199]2). This is an interesting finding considering the role these cells play in MS^[200]85. We note that we could not find literature elucidating the roles of miR-519a-3p and miR-3609 in MS. Another example is miR-651-3p, which was also found to have differential activity between MS and control when considering all PBMCs (though it was not ranked at the top 10; corrected WRS p-value: 3.62e−97) and also when considering all CSF cells (corrected WRS p-value: 5.96e−96). By observing the cell-type specific comparisons (Fig. [201]3 and Table [202]2), we can conclude that this difference originates from T cells (both in PBMCs and CSF), but is much less prominent in natural killer cells. Interestingly, granulocytes in PBMCs demonstrate a significant opposite activity pattern with the MS group more active than the control group (Fig. [203]3). The variation in miRNA activity values between conditions stems from differences in the expression of their target genes. For example, miR-519-3p shows reduced activity in B-cells from MS patients’ PBMCs compared to controls (Fig. [204]3), suggesting that its targets may be less down-regulated (more expressed) in MS compared to controls. To test this, we conducted the comparisons as in Fig. [205]3, but compared target expression between cell groups (rather than activity) (see Supplementary Fig. [206]11). In all cases where activity differences were significant, we observed a corresponding significant change in target expression (zero false positives). In 7 out of 44 comparisons, differences in target expression were significant but no significant changes in activity were captured. Comparison between cell types within MS or control cell populations Table [207]3 describes one example for comparison between different cell types: CD4+ T cells (CD4) and activated CD8+ T cells (CD8a). Notably, miR-19b-3p is significantly more active in CD4 cells compared to CD8a cells derived from PBMCs of MS patients, and also (though less significantly) in CSF derived from MS patients. The case is different for miR-519a-3p, which has a significant difference in activity levels between CD4 and CD8 cells in both control PBMCs and CSF, but this difference, in both sample populations, is lost in MS patients. More results can be found in Supplementary Table [208]3. Full results can be found at Zenodo 10.5281/zenodo.10720979. We note that while the presented WRS p-values are corrected for the number of evaluated miRNAs, they are not adjusted for multiple pairwise comparisons, for instance, between all pairs of cell types. Users can correct their p-values for multiple comparisons by multiplying by the number of comparisons. For the MS dataset, even if all relevant pairwise comparisons are performed (~100), this correction would not affect conclusions regarding significant patterns (Tables [209]2 and [210]3). miTEA-HiRes identifies unique differentially active miRNAs with potential association to breast cancer metastasis We analyzed single-cell RNAseq data from three breast cancer cell lines (MDA-MB-231, SUM149, SUM159) and from patient-derived cells (GUM36)^[211]86. As described in ref. ^[212]86, the samples were enriched for migratory breast cancer cells, and the migratory cell population was microfluidically separated from static cells (Fig. [213]4). The authors aimed to identify genes associated with cell migration and clinical outcomes in breast cancer, which could potentially serve as prognostic biomarkers. In line with this research direction, we sought to explore whether differentially active miRNAs could further improve our understanding of the process and possibly also serve as prognostic biomarkers in breast cancer and specifically as markers of metastasis. Fig. 4. miRNA activity layouts for migratory and static breast cancer cells. [214]Fig. 4 [215]Open in a new tab a UMAP layout of the data. Colors represent cell-line and status. “WT_" refers to static cells, “M_" refers to migratory cells. b–e Activity maps in a UMAP layout, as produced by miTEA-HiRes in total activity mode. Colors depict -log10(activity p-values). f Comparative activity map in a UMAP layout, as produced by miTEA-HiRes in comparative activity mode. Left: static population, right: migratory population. Colors depict -log10(activity p-values). g–s Histograms of activity p-values divided by populations, as produced by miTEA-HiRes in comparative activity mode. In the histograms the x-axes indicate -log10(activity p-values). “WT_" refers to static cells, “M_" refers to migratory cells. More results are available in Supplementary Fig. [216]4. Full results are available at 10.5281/zenodo.10720979. To enhance the statistical power and leverage the similarities between breast cancer samples, we merged all cell lines and patient-derived cells into two groups: 1182 static cells and 1992 migratory cells. We then used the miTEA-HiRes analysis pipeline to identify both overall active miRNAs in breast cancer (total activity mode), as well as miRNAs uniquely active (or uniquely inactive) in migratory cells (comparative activity mode). Full results are available at [217]Zenodo. Upon applying miTEA-HiRes in total activity mode, we found that the top 10 active miRNAs are all known oncogenic miRNAs: miR-16-5p^[218]11, miR-17-5p^[219]11, miR-8485^[220]87, miR-124-3p^[221]88, miR-20a-5p^[222]11, miR-93-5p^[223]12, miR-19a-3p^[224]11, miR-106b-5p^[225]89, miR-155-5p^[226]11 and miR-190a-3p^[227]90. As seen in the activity maps of miR-16-5p, miR-17-5p, miR-8485 and miR-124-3p (Fig. [228]4), these miRNAs are universally active (activity p-values for all cells < 1e−8, representing activity in all cells); also, their activity patterns are similar to one another (see Discussion for more comments on these findings). Next, we turned to comparative activity mode to find miRNAs that are differentially active between the static and the migratory populations. Out of the top 10 miRNAs in total activity mode, only two of them: miR-124-3p and miR-8485 were also found to be differentially active between the static and the migratory populations (comparative activity mode, corrected WRS p-values: 6e−09 and 5e−03 respectively). Their histograms of activity values are shown in Fig. [229]4. miR-124-3p is known to promote proliferation^[230]88 and metastasis^[231]91 of triple-negative breast cancer. Since the three breast cancer cell lines are from triple-negative origin, it is likely that miR-124-3p has a similar role here. Not much is known about miR-8485 in the context of breast cancer. However, it was found to promote proliferation and migration in mesenchymal stem cells derived from oral carcinoma^[232]87. The other eight miRNAs in the list of top 10 miRNAs in total activity mode had no significant differential activity between migratory and static populations (comparative activity mode, corrected WRS p-values > 0.01). Our analysis yielded additional miRNAs observed to be significantly differentially active between migratory and static cells. We found five miRNAs that were ranked at the top 30 in comparative activity mode (See Methods for details) and also had corrected WRS p-value < 1e−15: miR-122-5p (1e−20, Fig. [233]4 and Supplementary Fig. [234]4e), miR-665 (3e−16, Fig. [235]4 and Supplementary Fig. [236]4f), miR-125b-5p (4e−19, Supplementary Fig. [237]4a and [238]4c), miR-125a-5p (5e−17, Supplementary Fig. [239]4b and [240]4d) and miR-4257 (Fig. [241]4). All of which were more active in migratory cells compared to static cells. miR-122-5p has been previously identified as a promoter of aggression and epithelial-mesenchymal transition in triple-negative breast cancer^[242]92. miR-665 is associated with metastasis and poor survival in breast cancer patients^[243]93. The differential and overall increased activity of miR-4257 is an interesting finding. While there is no specific indication of its involvement in breast cancer metastasis, the literature indicates inflammation-associated miR-4257 as a promoter of malignancy in colorectal cancer^[244]94. An additional study found upregulated exosomal miR-4257 in non-small cell lung cancer patients with recurrence compared with those without recurrence^[245]95. Considering these findings, miR-4257 may be further investigated to determine its role in breast cancer progression. As to miR-125b-5p and miR-125a-5p, there is contradictory evidence for these two miRNAs, suggesting that they inhibit the metastatic potential of breast cancer cells^[246]96,[247]97. Further investigation is required in a dedicated study to explore this inconsistency, as there could be a more complex mechanism explaining this relationship between expression and activity in this case. Other miRNAs were not ranked at the top 30 of comparative activity mode (Methods), however they show highly significant differential activity (corrected WRS p-value < 1e−40) and substantial activity in at least one of the populations (activity p-values below 1e − 4 among at least third of cells in at least one population). These include miR-6721-5p (corrected WRS p-value: 3e−83, Fig. [248]4 and Supplementary Fig. [249]4g), miR-486-3p (8e−74, Fig. [250]4 and Supplementary Fig. [251]4h), miR-1207-5p (1e−60, Fig. [252]4 and Supplementary Fig. [253]4i), miR-552-3p (2e−50, Fig. [254]4 and Supplementary Fig. [255]4j), miR-4763-3p (2e−48, Fig. [256]4 and Supplementary Fig. [257]4k) and miR-6783-5p (2e−46, Fig. [258]4 and Supplementary Fig. [259]4l). These miRNAs are more active in migratory cells when compared to static cells. Both miR-486-3p and its opposite strand miR-486-5p are known to have central roles in several types of cancer, including functions relating to metastasis and cell migration^[260]11,[261]98 (though we could not find any evidence specific to breast cancer). Regarding miR-1207-5p, the literature suggests that it promotes breast cancer cell growth by targeting STAT6^[262]99. miR-552 may also contribute to cell proliferation and migration^[263]100. The other microRNAs in this list have not been extensively studied in the context of cancer metastasis. Two examples of miRNAs that are less active in migratory cells are miR-892c-5p (8e−29, Fig. [264]4 and Supplementary Fig. [265]4m) and miR-506-5p (2e−32, Fig. [266]4 and Supplementary Fig. [267]4n). For both of these miRNAs, their activity maps reveal that the difference in activity levels is most prominently observed in the primary tissue, wherein the static population has elevated activity comparing to the migratory population. The specific role of these miRNAs in relation to metastasis in breast cancer remains unknown. However, it is worth noting that miR-506-5p has been reported to be sponged by FOXD2-AS1, thereby having a role in glioma metastasis^[268]101. This finding could potentially explain the reduced activity observed in migratory cells compared to static cells. Some miRNAs have very similar activity maps. One set is the pair miR-125a-5p (Supplementary Fig. [269]4d) and miR-125b-5p (Supplementary Fig. [270]4c), which belong to the miR-10 family^[271]102. In this case, the similarity of their activity maps can be explained by the fact that their mature sequence similarity is high (0.682^[272]102) and that their target lists greatly overlap (hypergeometric p-value 5.42e−62). Another pair of miRNAs, miR-892c-5p (Supplementary Fig. [273]4m) and miR-506-5p (Supplementary Fig. [274]4n), also have somewhat similar activity maps. They do not belong to the same family^[275]102, however they too are similar in sequence (mature sequence similarity 0.762^[276]102) and in target lists (hypergeometric p-value 1.63e−29). Some similarity also exists between miR-486-3p (Supplementary Fig. [277]4h), miR-1207-5p (Supplementary Fig. [278]4i), miR-6721-5p (Supplementary Fig. [279]4g) and miR-4763-3p (Supplementary Fig. [280]4k). They also have somewhat similar mature sequences (≥0.238 for all pairwise comparisons^[281]102) and similar target lists (hypergeometric p-value < 1.8e−9 for all pairwise comparisons). miR-16-5p, miR-17-5p, miR-8485, and miR-124-3p (Fig. [282]4) present an interesting case. While the ranges of activity p-values differ from one miRNA to another, their overall pattern is almost identical. These miRNAs have no significant overlap of their target lists (hypergeometric p-value > 0.26 for all pairwise comparisons). Also, the mature sequence similarity between them is low (≤0.273 for all pairwise comparisons of miR-16-5p, miR-17-5p, and miR-124-3p; data of the sequence similarity of miR-8485 were missing from miRPathDB). To investigate whether the low expression targets of these miRNAs are involved in similar biological processes, we conducted pathway enrichment analysis using the canonical pathways gene set collection of the Human Molecular Signatures Database (MSigDB)^[283]103–[284]105. Our analysis revealed 10 pathways enriched (hypergeometric p-value ≤0.01) among the low expression targets of miR-16-5p, miR-17-5p and miR-124-3p (Supplementary Fig. [285]5), almost all of them related to cancer. Further research is required to elucidate their specific functions, as well as the potential involvement of miR-8485. To allow users to easily determine whether the target lists of two miRNAs greatly overlap, we computed the Jaccard index for every pair of miRNA target lists. The resulting table is available at Zenodo 10.5281/zenodo.10720979. Validation and comparison to other methods Validation on bulk data from the NCI Genomic Data Commons (GDC) database and comparison to miREACT We investigate the relationship between miTEA-HiRes activity p-values and measured expression by using data from the GDC portal^[286]106, which includes cancer tumors with matched bulk RNAseq and bulk miRNAseq samples. Similarly, we compare miReact^[287]59 activity scores to measured expression. We established a list of 11,927 cases in the GDC portal that have both bulk RNAseq sample and a miRNAseq sample, from a solid tissue primary tumor (Methods under GDC data curation). The dataset included tumors in 140 tissue types. We ran miTEA-HiRes and miReact on the RNAseq counts to predict miRNA activity p-values (Methods under Evaluating agreement between activity and expression in the GDC dataset, for miTEA- HiRes and miReact). For each sample, we sorted the miRNAs by descending expression, and used the mHG test to evaluate whether active miRNAs are over-represented at the top of the list (Fig. [288]5). Similarly, we computed the over-representation of non-active miRNAs at the bottom of the list (Fig. [289]5). Briefly, active vs. non-active is determined by p-value < 0.05, p-value > 0.1, respectively, in both tools. As seen in Fig. [290]5, in both cases, the agreement between expression and activity was better for miTEA-HiRes than for miReact. Fig. 5. miTEA-HiRes validation and comparison to miReact, using the GDC bulk dataset. [291]Fig. 5 [292]Open in a new tab Agreement between expression and activity, per sample, reflected by the enrichment of: a active miRNAs among the most expressed miRNAs; b non-active miRNAs among the least expressed miRNAs; miTEA-HiRes activity p-values grouped by cancer tissue for: c miR-101-3p; d miR-22-3p; e miR-10b-3p. Full results are available at Zenodo 10.5281/zenodo.10720979. We also present anecdotal examples of miRNAs with known functions in specific cancers (Methods under Activity and expression across cancer tissues of origin). Their expression levels are, indeed, reflected in their activity scores. For example, miR-101-3p has elevated activity, as determined by miTEA-HiRes, in the liver and the adrenal gland, compared to other cancer types (Fig. [293]5). Literature suggests it has a significant role in hepatocellular cancer^[294]107–[295]109, and also in adrenocortical carcinoma^[296]110. Similarly, miR-22-3p has a role in heptocellular cancer^[297]111,[298]112 and in adrenocortical carcinoma^[299]113, as reflected in its activity levels (Fig. [300]5). And finally, miR-10b-3p has a role in adrecortical carcinoma^[301]110, as reflected in its activity levels, as inferred by miTEA-HiRes (Fig. [302]5). The expression data of these three miRNAs is depicted in Supplementary Fig. [303]6. Full results are available at Zenodo 10.5281/zenodo.10720979. Validation using two single-cell total-RNA datasets showcases significant relationship between miRNA expression and activity We further investigate the relationship between miRNA activity and expression by analyzing total-RNA sequencing data obtained using the Smart-seq-total method^[304]45. This method allows for the simultaneous measurement of single-cell mRNA and miRNA expression. The datasets consist of RNAome data from human primary fibroblasts, from HEK293T and MCF7 cells (which were treated as a single multi-cell type dataset consisting of 633 cells), as well as from induced murine embryonic stem cells differentiated into embryoid bodies (2167 cells)^[305]45. We quantified the association between expression and activity by first sorting the miRNAs in each dataset by expression, then measuring the enrichment of active (non-active) miRNAs, as assessed by miTEA-HiRes (Methods under Evaluating the relationship between single-cell miRNA expression and activity), at the top (bottom) of the list. We utilized the mHG test to determine the level of enrichment. This process was performed for both human and mouse datasets (see Methods and enrichment plots in Supplementary Fig. [306]7). For the human dataset we found a significant enrichment of active miRNAs at the top of the list (mHG p-value = 3.76e−05). We also found a significant enrichment of non-active miRNAs at the bottom of the list (p-value = 0.0055). For the mouse dataset, significant enrichment (<0.05) was detected on both sides as well; enrichment of active miRNAs at the top of the list (p-value = 0.0301) and non-active miRNAs at the bottom of the list (p-value = 0.0114). We also tested the agreement between single-cell expression and activity in a single-cell resolution, for specific miRNAs that were found to be both overall active and expressed. Supplementary Fig. [307]8 shows example results from this analysis. We were hoping to explore the relationship between the expression and activity of miRNAs using additional total-RNA datasets generated by VASA-seq^[308]46 and STORM-seq^[309]47 technologies. However, we found that these technologies report very low expression values of miRNAs, probably due to low miRNA capture efficiency. Validation using scRNA-seq from miRNA induction experiment Inspired by miRSCAPE^[310]60 validation, we sought to find whether miTEA-HiRes is able to accurately capture the differential expression of miR-199a-5p and miR-199a-3p, induced in i199 and i199-KTN1 cell lines (derived from HEK293 cell line, by using a plasmid containing the pri-miRNA)^[311]114. Towards this, we obtained the count matrices of 3280 i199 cells, and 3143 i199-KTN1 cells, including induced and uninduced annotations directly from the author^[312]114. We applied miTEA-HiRes on the two datasets, and found that for both cell lines, both miR-199a-5p and miR-199a-3p were significantly more active in the induced cells compared to the uninduced ones (see Methods under Evaluating differential activity in single-cell miRNA induction experiment). Two-sided WRS p-values for induced vs. uninduced cells are as follows. For induced i199 (695 cells) vs. uninduced (1561 cells), we get p-value = 7e−15 for miR-199a-3p and p-value =1e−10 for miR-199a-5p. For induced i199-KTN1 (611 cells) vs. uninduced (1075 cells), we get p-value = 0.001 for miR-199a-3p and p-value = 0.007 for miR-199a-5p. As reported by miRSCAPE^[313]60 authors, miRSCAPE results were accurately aligned with the induction of both miRNAs while miReact could not consistently detect the upregulation of miR-199 in induced cells. Results are available at Zenodo 10.5281/zenodo.10720979. Discussion We developed miTEA-HiRes, an algorithmic approach, implemented as a Python package (including a pip-installable version), to predict miRNA activity in single cell RNA-seq and spatial transcriptomics datasets. miTEA-HiRes relies on experimentally validated miRNA-target interactions and on the mHG statistics. We used miTEA-HiRes to predict miRNA activity in six spatial datasets by Visium (Fig. [314]2), single cell datasets of PBMCs and CSF from MS patients and a control group (Fig. [315]3, Tables [316]1–[317]3, Supplementary Tables [318]1–[319]3), as well as in a single cell dataset of stationary and metastatic cells of breast cancer cell lines and a primary tumor samples (Fig. [320]4 and Supplementary Fig. [321]4). We also evaluated the relationship between expression and activity of miRNAs using totalRNA data (Supplementary Figs. [322]7 and [323]8). Finally we compare our method to miREACT using bulk GDC data (Fig. [324]5), and to both miREACT and miRSCAPE using miRNA induction experiment in single-cell data. By employing miTEA-HiRes on spatial transcriptomics data, it is possible to obtain information about the spatial distribution of miRNA activity, information that cannot be directly obtained from standard measurement. The inferred information then allows us to determine, for example, whether the miRNA activity is concentrated in one location or spread across the tissue, whether it coincides with specific tissue regions identified by H&E staining, by standard clustering, or by adjacent IHC, and if it overlaps with the activity patterns of other miRNAs within the same sample. For example, in the lung cancer sample, let-7b-5p and miR-6804-3p were found to have significantly non-overlapping activity maps which suggests a synthetic-lethality behavior for this pair of miRNAs. This finding provides insight into miR-6804-3p’s function by drawing on the established functions of let-7b-5p. Another example is the activity of the tumor suppressor miR-16-5p, which is located in a well-defined area of the breast cancer sample, an area that has a distinct H&E staining pattern and that is marked at the level of full gene expression clustering. This activity pattern stands in contrast with the activity of another tumor suppressor, miR-29c-3p, that is far more spread out across the same tissue (Fig. [325]2). For single cell data, miTEA-HiRes has two optional modes: total activity mode, which reports on miRNAs that are overall active, and comparative activity mode, which compares between populations of cells. We found that in some scenarios, miRNAs that are generally more active in a sample (reported by the total activity mode) greatly overlap with miRNAs that are differentially active between the sample main populations (reported by the comparative activity mode). An example observation is found analyzing MS and control PBMCs and CSF cells. In this analysis, we see a significant share of miRNAs that were found to be overall active and to also be differentially active between the MS and control populations. In other scenarios, such as the stationary and metastatic breast cancer cells, distinct sets of miRNAs were reported by the different modes, that is- miRNAs that were found to be highly active in all cells, did not have a significant overlap with the ones that were found to be differentially active between the stationary and metastatic populations. The variation of this aspect between the two datasets could potentially be attributed to the unique nature of these pathologies and the varying roles that miRNAs play in their formative and developmental processes. We found that in the breast cancer single cell dataset, some sets of miRNAs had very similar activity maps. The activity map similarity of most of these sets can be easily explained by the fact that their mature sequence similarity is high or by the (related) fact that their target gene lists greatly overlap. However, the set miR-16-5p, miR-17-5p, miR-8485 and miR-124-3p (Fig. [326]4) cannot be so simply explained. This, therefore, is an insight about a possible similarity of functions between these miRNAs (or their targets), or perhaps a biological process that involves all of them. A future extension of miTEA-HiRes may include this downstream analysis step of spotting these very interesting sets of possibly cooperating miRNAs. Currently, users can use the Jaccard index similarity table available at Zenodo 10.5281/zenodo.10720979 to evaluate the overlap between miRNA target lists. In some cases, miTEA-HiRes found that miRNAs were universally active, that is, active in all (or almost all) cells or spots. Examples of this phenomenon are given by mir-106b-5p in the MS PBMCs data (Supplementary Fig. [327]9), and four miRNAs in the breast cancer dataset mentioned above (miR-16-5p, miR-17-5p, miR-8485, and miR-124-3p; Fig. [328]4). The phenomenon may be biologically explained by the fact that some miRNAs have multiple isomiRs: miRNA variants that originate from the same loci, but are somewhat different due to various processes such as RNA editing, SNPs, or imprecise cleavage^[329]115,[330]116. While a miRNA may have a large number of targets, each specific isomiR is more inclined to target a subset of these targets. If different isomiRs are present at different regions of the tissue or at different cell populations, the subset of targets being down-regulated would change between cells\spots. However- the miRNA would be considered universally active, from our statistical perspective. Mathematically, this finding can be counter-intuitive due to the normalization step. However, the same logic applies here: different subsets of the target list are down-regulated in different subsets of the cells\spots. To address this mathematical phenomenon we present a synthetic count matrix (see Supplementary Fig. [331]9). An important aspect of miTEA-HiRes is its ability to facilitate the comparison of miRNA activity between subpopulations of the data, allowing researchers to derive more precise conclusions. For example, miR-651-3p was found to be significantly differentially active between MS and control in both PBMCs and CSF cells. The activity scores indicated a decreased activity in MS compared to the control group overall. However, only when comparing each cell type separately, we found that this pattern predominantly manifests in T cells. Surprisingly, in PBMCs, granulocytes exhibit an opposite pattern, with miR-651-3p being significantly more active in MS compared to the control group (Fig. [332]3, Table [333]2). As demonstrated by comparing Fig. [334]3 and Supplementary Fig. [335]11, significant differences in miRNA activity between conditions consistently reflect changes in the expression levels of their targets. The ability to compare subpopulations also effectively addresses another concern. When analyzing the two primary populations within the data, their composition can influence the outcomes. For instance, if a specific cell type is more prevalent in the first population and has higher activity of a particular miRNA, comparing the populations as a whole would simply indicate that the first group is more active for this miRNA. However, conducting comparisons among cell types within each group, as well as between the groups, would provide a better understanding of the findings. The connection between the expression level of a miRNA and its activity level is an open question, not yet well addressed due to scarce resources that offered measurement of both. We explored this connection using a total RNA mouse and human datasets. The datasets include both the expression levels of mRNAs and lncRNAs (miRNA targets, which can be used by miTEA-HiRes to predict activity levels), and the expression levels of the miRNAs themselves. In both datasets, we found a significant enrichment of active miRNAs among highly expressed miRNAs (Supplementary Fig. [336]7), and also a significant enrichment of non-active miRNAs among low expression miRNAs. This finding is even more remarkable given that these datasets include the expression of the 3p and 5p strands of each miRNA combined, while they appear separately in the miRNA-target database thus yielding possibly different miTEA-HiRes results. We also examined the relationship between expression and activity in MS, by scanning the literature for what is known about the expression of miRNAs that were found to be differentially active. The ten miRNAs that were found to be the most differentially active in MS PBMCs were all found to be less active in MS compared to healthy control. For five of them, activity and expression had a similar pattern, that is: these miRNAs are known to be down-regulated or under-expressed in MS compared to healthy control, or in severe MS compared to mild MS. Two of them are known to be over-expressed in certain stages of MS compared to healthy control, indicating a more complex working mechanism that may lead to disagreement between activity and expression. For the remaining three miRNAs, we could not find literature regarding their expression in MS. While miRNA expression and activity are obviously related, they are not unanimous due to the complexity of cell biology. Our findings shed light on the interrelation of miRNA expression and activity. Through the integration of miRNA expression measurement and activity inference using our approach and code package, researchers can gain deeper insight into cellular level regulatory pathways, leading to a better understanding of spatial and cell-level biology. In summary, miTEA-HiRes provides a means to capture the signal of miRNA activity, which usually goes undetected in gene expression datasets. Our study demonstrated that miTEA-HiRes successfully identified numerous functional miRNAs in specific samples and sample types. Through comparison with other methods, we emphasize the strengths of miTEA-HiRes. Most notably miTEA-HiRes provides robust quality of results while relying on the data itself without the need to collect and pre-train on large problem-specific datasets. miTEA-HiRes also offers ease of usage, with a friendly end-to-end pip-installable Python package. Additionally, we provide several leads that may be further investigated. miTEA-HiRes can thus help define miRNA roles in biological processes and point out miRNAs that should be further explored. Also, future work could utilize the general framework of our technique in uncovering the activity of transcription factors, providing a broader and more comprehensive understanding of active biological processes. Methods Statistical assessment of miRNA activity The computation of activity p-values by miTEA-HiRes (including data preprocessing) is described as a pseudo-code in Supplementary Algorithm [337]1. miTEA-HiRes accepts as input both spatial transciptomics and single-cell RNA-sequencing raw count matrices containing gene counts per spot or cell. Bulk data may be analyzed as single cell data. The steps and components of the analysis are further described herein. miRNA-target interactions (MTIs) MTIs for humans and mice were acquired from the experimentally validated miRNA-target interactions database miRTarBase 8.0^[338]117. Non-functional MTIs were excluded, and functional MTIs with a strong experimental evidence or at least two weak experimental evidences were preserved, similar to the approach used in a previous study^[339]63. The filtered MTIs are provided within the miTEA-HiRes package for 706 mouse miRNAs and 2571 human miRNAs. Preprocessing For single-cell data, if there are multiple files, they are merged into a single matrix and a random sample of 10,000 cells is taken to improve performance and minimally sacrificing statistical power. After data cleansing (i.e. removing duplicated genes, removing genes with zero counts, and removing cells with identical gene expression profiles), miTEA-HiRes normalizes the count matrix so that the total number of reads in each cell or spot is 10,000 (Fig. [340]1, top middle panel). To capture deviations in gene expression relative to its own dynamic expression across samples, each gene’s values are transformed into z-scores (Fig. [341]1, top right panel). That is, for a dataset with S cells or spots, each count c(g, s) per gene\transcript g and cell\spot s is transformed as follows: [MATH: z(g,s)=c(< mi>g,s)a(g) σ(g) :MATH] Where [MATH: a(g)= 1Ssc(g,s) :MATH] and [MATH: σ(g)= 1S s< mrow>(c(g,s)a(g))2 :MATH] Automatic data preprocessing is available in the miTEA-HiRes package. Activity p-values As depicted in Fig. [342]1 (middle row), for each spot or cell s, miTEA-HiRes ranks the genes\transcripts g by their z-score transformed counts, with the smallest value z(g, s) at the top and the largest value z(g, s) at the bottom. Next, using the set of MTIs, miTEA-HiRes converts this ranked gene list into a binary ordered list for each miRNA of interest, where target genes are assigned a value of 1 and all other genes are assigned a value of 0 (as depicted in Supplementary Fig. [343]3). miTEA-HiRes then employs the mHG (minimum hypergeometric) test (see next section) to assess statistical enrichment in this ranked binary list. For this purpose, we are using the XL-mHG Python package^[344]118,[345]119. The mHG p-value is then used by miTEA-HiRes to represent the miRNA activity level for the specific combination of miRNA and cell or spot being analyzed. In case no targets are identified, the p-value generated by the mHG test is exactly one. mHG (minimum hypergeometric) test The nonparametric mHG test was developed by ref. ^[346]65 and is only briefly described here for completeness. The test aims to evaluate the level of enrichment present at the top of a ranked binary list. According to the null hypothesis, the given list is randomly and uniformly selected from a pool of all lists containing N entries, with K of them being 1s. The alternative hypothesis suggests that the 1s have a tendency to prominently appear at the top of the list. Let [MATH: X~Hyper< mi>geometric(N,K< mo>,n) :MATH] and let λ be the observed binary list of length N, containing K 1s and (N−K) 0s. The mHG statistic is computed as: [MATH: mHG(λ)=min1nNProb(Xbn(λ)) :MATH] Where [MATH: bn=i=1 nλi. :MATH] The p-value of mHG is then computed as the fraction of permutations [MATH: λ~ :MATH] , of the vector λ, for which [MATH: mHG(λ~)mHG (λ) :MATH] . That is, when using Λ to denote all such permuted patterns, we consider: [MATH: p-valuemHG( λ)=< mfenced close="∣" open="∣">λ~ΛmHG(λ~)mHG (λ)< /mfenced>NK :MATH] 1 Activity scores According to the data type and mode of operation (spatial, or single cell: total activity or comparative activity modes), miTEA-HiRes ranks miRNAs to highlight the most notable ones, and assigns an aggregated activity score for each one of them (details follow). Relevant miRNA activity plots are produced, with activity p-values (per cell/spot) transformed to -log10(p-value) to emphasize significant results (see Fig. [347]1, bottom row). The ranked list of miRNAs, as well as links to the corresponding plots, are provided by the miTEA-HiRes package. For spatial transcriptomics The aggregated activity score for each miRNA is the fraction of spots with observed significant activity (p-value < 1e−5). miRNAs are sorted based on their activity scores, and spatial miRNA activity maps are generated as part of the output. For single-cell RNA-seq in total activity mode miTEA-HiRes first averages the activity p-values across all cells for each miRNA. Then, these average p-values are FDR-corrected to obtain the final activity score for every miRNA. Also, miTEA-HiRes produces activity maps on a UMAP layout. In order to treat the activity score as an exact p-value, a more conservative merging approach may be applied^[348]120–[349]124. For single-cell RNA-seq in comparative activity mode The miRNA activity score represents the level of differential activity between the two populations and is defined by the two-sided WRS p-value, subjected to FDR correction. We recommend using the comparative activity mode to compare populations of at least 100 cells. The ranking mechanism of miRNAs was designed to highlight miRNAs that are not only differentially active but also have consistently high activity p-values in at least one population. At the top of the ranked list are miRNAs that demonstrate an average activity p-value that is equal to or greater than the 0.97 quantile in at least one population and have activity scores smaller than 1e−8. Following these top miRNAs, are miRNAs that manifest an average activity p-value that is equal to or greater than the 0.9 quantile in at least one population and have activity score smaller than 1e−8. All the remaining miRNAs are sorted based on their activity scores. Finally, relevant UMAPs and histograms are produced to demonstrate the different activity patterns across the two populations. UMAP projection Before computing the UMAP coordinates, miTEA-HiRes first performs commonly used preprocessing steps on the raw count matrix: cells with less than 200 genes are excluded, genes expressed in less than 3 cells are excluded, cells with total number of counts below the bottom 2% and above the top 2% are excluded to remove outliers, cells are normalized to 10,000 total counts, and all counts are transformed to log(count). miTEA-HiRes then follows the Scanpy recommended pipeline^[350]125 for generating UMAP, including the extraction of highly variable genes, PCA (principal component analysis), computation of the neighborhood graph of cells, and finally computation of UMAP coordinates. miTEA-HiRes Python package miTEA-HiRes is a user friendly python package, available in [351]https://github.com/EfiHerbst31/miTEA-HiResGithub and installable using pip. It can be executed with merely a path to the data and dataset name, but also has multiple options for more refined analysis. Basic execution includes end-to-end miRNA activity map computation. Output of the basic execution includes an activity matrix, an html list of ranked miRNAs, along with their activity scores and links to plots of top 10 miRNAs. Executions can be done via command-line or by importing the library and using its functions. Data type (i.e. single-cell or spatial transcriptomics), and data extension (supported: txt, tsv, pkl and mtx; files may be zipped) are automatically inferred from the data location. If preprocessing is required, multiple files are merged, 10,000 cells are sampled by default and data cleaning is performed (processed data is saved as .pkl file for reproducibility). Supported species are homo sapiens and mus musculus. In order to run in single-cell comparative activity mode, populations strings contained within cell names are required, which are assumed to be unique across the cells. miTEA-HiRes is computation intensive (for example, the running time of a spatial dataset with 2264 spots and ~6000 genes, on a 16 cores machine with 706 examined microRNAs, is 22 minutes), hence it is recommended to run miTEA-HiRes on a multicore machine (by default all available CPUs are utilized). By default all miRNAs available in the MTI database are analyzed. By default top 10 most interesting miRNAs are plotted, however this can be changed. More fine-tuning can be done using the possibility to filter spots with a small number of reads, and with the possibility to change the threshold below which a spot is considered to be active. Debugging mode allows for detailed output. Detailed information is provided in the package [352]https://github.com/EfiHerbst31/miTEA-HiResreadme document. Packages utilized within miTEA-HiRes: Anndata^[353]126, Scanpy^[354]127 and XL-mHG^[355]119. Statistics and reproducibility Analyzing spatial transcriptomics data sets Spatial transcriptomics datasets were analyzed using miTEA-HiRes without any preprocessing, using the default parameters. Analyzing Multiple Sclerosis single-cell data sets Both PBMC and CSF datasets were analyzed using miTEA-HiRes without any preprocessing, using default parameters (sampling 10K cells). The MS dataset with cell types was analyzed without sampling, to improve statistical significance when analyzing cell subsets. Analyzing breast cancer single-cell data sets To enhance the statistical power and leverage the similarities between breast cancer samples, we merged all cell lines and patient-derived cells into two groups: 1182 static cells and 1992 migratory cells. We then used the miTEA-HiRes analysis pipeline. GDC data curation First, we used the GDC portal^[356]106 to create two cohorts- one of cases with RNAseq data and the other of cases with miRNAseq data. We then used Python code to intersect these two cohorts, and loaded the intersected cohort back to the GDC portal. We used the portal to identify the RNAseq samples and the miRNAseq samples that followed these criteria: primary tumor, solid tissue, open access, tsv or txt. We then used Python to download the relevant samples. If more than one RNAseq sample \miRNAseq sample was available for the same case, we chose the one with the most detected genes \miRNAs. Finally, we produced one gene expression count matrix of 11,927 samples and 57,287 genes from the RNAseq samples, and one miRNA expression (reads per million) count matrix of 11,927 samples and 1810 miRNAs from the miRNAseq samples. We then found that in the second matrix, for some miRNAs, we had the expression values of the precursors of the miRNA (marked with “-1","-2"... at the end) rather than the mature miRNA. In these cases, we summed the expression values of the precursors to get an assessment of the expression of the mature miRNA. After performing this step, and removing miRNAs that were not expressed at all, we had 1662 miRNAs in the miRNA expression matrix. Evaluating agreement between activity and expression in the GDC dataset, for miTEA-HiRes and miReact We ran miTEA-HiRes on the gene expression count matrix using the total activity mode (Methods). We also ran miReact^[357]59 in R on the same gene expression count matrix, as described in miReact tutorial^[358]128. From both methods we received an activity p-values matrix as output, with a prediction of the activity of each miRNA (2571 miRNAs for miTEA-HiRes, 2578 miRNAs for miReact) in each of the 11,927 samples. We transformed the miTEA-HiRes activity p-values matrix to -log10(p-values) (the miReact activity p-values were already -log10 transformed). Each of the miRNAs that appeared in the activity matrices of both miTEA-HiRes and miReact, also appeared as one of the 1662 miRNAs in the miRNA expression matrix. For strand-specific miRNAs, no strand was specified for the miRNA that appeared in the miRNA expression matrix. For both miTEA-HiRes and miReact, we determined activity as follows. A miRNA was considered active in a certain sample if its activity p-value was less than 0.05. If a strand-specific miRNA was considered active, we considered the no-strand version of the same miRNA (the one that appeared in the miRNA expression matrix) as active. For each sample, we ordered the miRNAs based on their descending expression. We then used the mHG test to evaluate whether active miRNAs were enriched at the top of the list. The p-values of the mHG tests performed for all samples are summarized in Fig. [359]5. Similarly, we considered miRNAs with an activity p-value of more than 0.1 as non-active. If a strand-specific miRNA was found to be non-active, and its opposite strand was not found to be active, then we considered the no-strand version of the same miRNA (the one that appeared in the miRNA expression matrix) to be non-active. For each sample, we ordered the miRNAs based on their ascending expression. We then used the mHG test to evaluate whether non-active miRNAs were enriched at the top of the list. The p-values of the mHG tests performed for all samples are summarized in Fig. [360]5. Activity and expression across cancer tissues of origin We downloaded the clinical data for the final set of cases, chosen from the GDC portal as explained above. Five cases had multiple entries (with inconsistencies in the relevant column) and were excluded. On the “tissue_or_organ_of_origin" column, we found that 11,896 cases had 140 tissues of origin, while for 26 cases the tissue of origin was not reported. We used a Python function to group the origins into 35 combined tissues. For example, “Fallopian tube", “Cervix uteri" and “Endometrium" were classified as “Female Reproductive System", while “Cortex of adrenal gland" and “Medulla of adrenal gland" were classified as “Adrenal Gland". Then, for miR-101-3p, miR-22-3p, and miR-10b-3p, we produced the box plots of miTEA-HiRes activity values across 14 combined tissues that had at least 200 samples each (Fig. [361]5). Similarly, we produced the expression box plots for the same miRNAs (Supplementary Fig. [362]6). Evaluating the relationship between single-cell miRNA expression and activity We downloaded total-RNA sequencing datasets obtained using the Smart-seq-total method^[363]45. We then merged the datasets of human primary fibroblasts, HEK293T, and MCF7 cells into a single human dataset (633 cells), and the datasets of all stages of murine embryonic stem cells differentiation into a single mouse dataset (2167 cells). To explore the enrichment level of active miRNAs at the top of the ranked list of overexpressed miRNAs, we performed the following processing pipeline. To obtain the list of most expressed miRNAs, we summed the raw counts for each miRNA over all the cells. We retained only those miRNAs that were also available in miTEA-HiRes database. We ranked this list in descending order, with the most expressed miRNAs positioned at the top. A miRNA was considered active if its activity score was less than 0.05 (average activity p-value over all cells). miTEA-HiRes’ MTI database distinguishes between miRNAs with 3p and 5p strands, while the total-RNA dataset displays only one value for the two strands, which represents the summed expression of them. To account for this, we considered a miRNA to be active if at least one of the strands was active. To utilize the mHG test, we converted the ranked list of overexpressed miRNAs into 1’s (representing active miRNAs) and 0’s. To explore the enrichment level of non-active miRNAs at the top of the ranked list of underexpressed miRNAs, we performed the following processing pipeline. To obtain the list of underexpressed miRNAs, we used the reverse-order of the overexpressed miRNAs list obtained previously. A miRNA was considered to be non-active if its activity score was greater than 0.1 (average activity p-value over all cells). If a miRNA was found to be non-active, but its other strand (3p or 5p) was previously considered as active, it was excluded from the non-active miRNAs list to ensure that only truly non-active miRNAs were included. To utilize the mHG test, we converted the ranked list of underexpressed miRNAs into 1’s (representing non-active miRNA) and 0’s otherwise. For the human dataset, enrichment of active miRNAs at the top of the list of expressed miRNAs was calculated as follows: p[mHG](N = 1462, B = 740, n* = 138, b(n*) = 97) = 3.76e−05. Enrichment of non-active miRNAs at the bottom of the list: p[mHG](N = 1462, B = 526, n* = 1367, b(n*) = 508) = 0.0055. For the mouse dataset, enrichment of active miRNAs at the top of the list: p[mHG](N = 492, B = 47, n* = 234, b(n*) = 32) = 0.0301. Enrichment of non-active miRNAs at the bottom of the list: p[mHG](N = 492, B = 413, n* = 265, b(n*) = 236) = 0.0114). N refers to the number of shared miRNAs between the dataset and miTEA-HiRes database, B is the number of active (non-active) miRNAs, n* is the cutoff selected by the mHG test, b(n*) is the number of miRNAs both highly expressed and active (lowly expressed and non-active) at the cutoff. Evaluating differential activity in single-cell miRNA induction experiment We obtained the count matrices of 3280 distinct i199 cells, and 3143 i199-KTN1 cells, including induced and uninduced annotations directly from the author^[364]114. Following miRSCAPE methods^[365]60, we annotated the cells according to GeGFP-00001 expression levels (’induced’ if the expression level was within the top 20%, and ’uninduced’ if the expression level was zero). After cell annotation, we removed the genes related to miRNA induction experiment from the data. As a preprocessing step, we filtered out cells having gene detection rate amongst the top 2%. Also, as we noted that the overall gene detection rate was lower in the i199-KTN1 cells compared to the i199 cells, we filtered out cells having less than 5000 detected genes. As a result, we were left with the following cell type composition: i199 had 695 induced cells, 1561 uninduced cells, and 2814 cells in total. i199-KTN1 had 611 induced cells, 1075 uninduced cells, and 2264 cells in total. We then applied miTEA-HiRes in comparative activity mode, focusing on miR-199a-5p and miR-199a-3p. miTEA-HiRes indicated correctly the excess activity of both miRNAs in the induced cell group in both cell types, as measured using WRS p-values. Results are available at Zenodo 10.5281/zenodo.10720979. Reporting summary Further information on research design is available in the [366]Nature Portfolio Reporting Summary linked to this article. Supplementary information [367]Transparent Peer Review file^ (282.9KB, pdf) [368]Supplementary Information^ (4.6MB, pdf) [369]Reporting Summary^ (1.9MB, pdf) Acknowledgements