Graphical abstract

   graphic file with name fx1.jpg
   [63]Open in a new tab

Highlights

     * •
       Integration and novel clustering improve the resolution of bone
       marrow microenvironment
     * •
       Novel functional states in endothelium and differentiation stages
       in the mesenchyme
     * •
       Inference of conserved features in human from those identified in
       mouse
     __________________________________________________________________

   Biological sciences; Stem cells research; Omics; Transcriptomics

Introduction

   Uncovering pathogenetic mechanisms requires identifying the
   corresponding major groups of genes in the disease-relevant tissues
   ([64]Gomez-Cabrero et al., 2014; [65]Wolkenhauer et al., 2013). To this
   end, collective efforts such as the Human Single-Cell Atlas have been
   launched, aiming at providing a single-cell map of human tissues and
   organs. A core case is the hematopoietic system, where single-cell RNA
   sequencing (scRNA-seq) has allowed to refine our understanding of
   hematopoiesis in mouse and human ([66]Giladi et al., 2018;
   [67]Nestorowa et al., 2016; [68]Rodriguez-Fraticelli et al., 2018).
   Furthermore, these studies have challenged the classical view of
   hematopoiesis differentiation as a compendium of discrete cellular
   states with decreased differentiation potential toward a more dynamic
   view in which hematopoietic stem and progenitor cells (HSPC) gradually
   pass through a continuum of differentiation states ([69]Karamitros
   et al., 2017; [70]Laurenti and Göttgens, 2018; [71]Velten et al., 2017;
   [72]Weinreb et al., 2020). Moreover, recent studies using scRNA-seq
   technologies have shed light on the organization of the hematopoietic
   regulatory microenvironment in the mouse ([73]Baccin et al., 2020;
   [74]Baryawno et al., 2019; [75]Kanazawa et al., 2021; [76]Matsushita
   et al., 2020; [77]Tikhonova et al., 2019; [78]Wolock et al., 2019;
   [79]Zhong et al., 2020). These studies have resolved some of the
   controversies regarding the overlap of stromal populations previously
   described and the description of certain discrete stromal cells as
   professional, hematopoietic cytokine-producing populations ([80]Baccin
   et al., 2020). Moreover, further combination with in situ technologies
   helped to delineate the relationship between specific stromal cell
   types in the murine bone marrow (BM) ([81]Baccin et al., 2020). This
   data have provided a wider and more dynamic picture of hematopoiesis
   and their regulatory microenvironment, allowing for a provocative
   hypothesis to rise, such as whether their specific association with
   given niches controls transcriptional states in hematopoietic stem
   cells and whether these states are reversible upon occupying
   alternative niches.

   Nevertheless, these studies are limited by the number of cells
   sequenced, potentially hampering our ability to resolve the full
   spectrum of cellular states and differentiation stages that define the
   stromal BM microenvironment. More recently, efforts to integrate
   multiple datasets generated in different labs have been successfully
   attempted ([82]Dolgalev and Tikhonova, 2021); however, they have failed
   to provide additional information compared to that derived from the
   individual studies. Furthermore, knowledge on the conservation of the
   cellular composition in the human BM stroma is in its infancy due to
   the difficulty of obtaining high-quality samples with sufficient
   stromal cell numbers from healthy individuals. This leaves us with two
   outstanding challenges; how to piece together such different fragments
   toward a comprehensive molecular atlas and to what extent such an atlas
   in mice is conserved in the human bone marrow.

   Here, we integrate three scRNA-seq datasets (two publicly available
   ([83]Baryawno et al., 2019; [84]Tikhonova et al., 2019) and one
   in-house) separately targeting two well-defined populations
   (endothelial and mesenchymal cells). The integration of distinct
   datasets required developing tailored bioinformatics pipelines to
   ensure the robust identification of cell types and stages. We identify
   14 endothelial subclusters and 11 subpopulations defining different
   stages of differentiation in the mesenchyme. Our analysis provides the
   most comprehensive atlas of the cellular composition in the mouse bone
   marrow. Last, we asked to what extent such an atlas could provide
   insight into the less accessible human BM microenvironment. To this
   end, we made the first pilot study, profiling the human BM using
   scRNA-seq, which was integrated with our mouse BM atlas. This analysis
   demonstrated substantial conservation between species.

Results

Data integration and high-resolution clustering strategy

   [85]Figure 1 provides a graphical summary of the experimental design
   and the analysis flow. We integrated selected subsets of cells from
   three distinct mouse datasets: two recently published (“[86]Tikhonova
   et al. (2019), 6626 cells, and “[87]Baryawno et al. (2019), 38443
   cells) and an independent dataset (“In-house” dataset, 13,402). These
   datasets differ in the procedures for isolation of cells within the BM
   microenvironment. This includes unbiased isolation of cells lacking
   hematopoietic markers (“Baryawno” ([88]Baryawno et al., 2019) and
   “In-house”) ([89]Figure S1) versus targeted isolation of populations of
   interest as in the study by [90]Tikhonova et al. (2019). Furthermore,
   not every cell type identified in one study is present in the other
   datasets.

Figure 1.

   [91]Figure 1
   [92]Open in a new tab

   Overview of the paper

   Graphical brief description of the paper. See also [93]Figure S1.

   We decided to focus our analysis on those bona fide niche populations
   such as endothelial (EC) and mesenchymal (MSC) cells due to their
   presence in the three studies and their relevance in controlling
   hematopoietic stem cell (HSC) maintenance. For the integrative
   analysis, we used the Tikhonova study as a reference to facilitate the
   integration, considering that their cells were isolated based on
   fluorescence reporter expression driven by cell-specific gene
   promoters: VE-Cad for endothelial cells and LEPR for mesenchymal cells.
   Therefore, to identify and label the cells of interest, we integrated
   separately “In house + Tikhonova” and “Baryawno + Tikhonova”
   ([94]Figure 2A) using the cell labels from the Tikhonova study. As a
   result, and after quality filters (see [95]STAR Methods), we labeled in
   each dataset endothelial cells (N = 9587) and mesenchymal cells (N =
   5291) that were used for integration of the three datasets for each
   cell type.

Figure 2.

   [96]Figure 2
   [97]Open in a new tab

   Data integration and high-resolution clustering strategy

   (A) Integrated analysis of the bone marrow niche datasets (two publicly
   available, [98]Tikhonova et al., 2019 and [99]Baryawno et al., 2019 and
   one generated in our lab, in-house) separately for two well-defined
   population (endothelial and mesenchymal cells). [100]Tikhonova et al.,
   2019 dataset is used as a reference considering their separated cell
   profiling strategy for COL2.3+, LEPR+, and VE-Cad+ populations. In the
   top row, a UMAP projection is depicted for each single-cell RNA
   dataset. In the left-lower (“In-house + Tikhonova”) and in the
   right-lower (“Baryawno + Tikhonova”), datasets are integrated to
   identify endothelial and mesenchymal populations.

   (B) Clustering strategy: the analysis of endothelial cells as an
   example. An upper limit to the cluster is set for the clustering (left
   panel) using Louvain high-resolution clustering. Then, an iterative
   divide-and-conquer strategy identifies the optimal level of clusters at
   different levels: Level 1 (second panel from the left), Level 2 (third
   panel), and Level 3 (fourth panel).

   (C–E) The robustness analysis for sub-clustering B3 (from Level 2 to
   Level 3). Specifically: (C) subclusters identified, (D) the fraction of
   assignments to its original cluster using a random-forest +
   bootstrapping strategy and (E) summary of the results (D) per cluster,
   #correct indicates the times a cluster is a dominant cluster (see
   [101]STAR Methods) for a cell within it in all pairwise comparisons
   (see [102]Figure S2 for the sub-clustering analysis of A1 and A2). See
   also [103]Figures S2 and [104]S3.

   Following the integration, endothelial and mesenchymal groups were used
   to identify cellular subtypes and stage-specific cells ([105]Figure 2,
   [106]S2, and [107]S3). However, current state-of-the-art clustering
   methodologies, including Louvain clustering ([108]Blondel et al., 2008;
   [109]Traag et al., 2019), could not discriminate robustly among
   different cell subtypes ([110]Figure 2B left panel and
   [111]Figure S3A), in part because there is a high degree of
   cell-to-cell similarity when considering cells of the same origin
   ([112]Tasic et al., 2016). To enable robust sub-clustering, we
   customized an existing bootstrapping-based approach. In brief, first, a
   divide-and-conquer strategy is applied, where the first level of robust
   clusters is identified (see [113]STAR Methods). Next, we proceeded with
   another round of clustering, yielding the second level of robust
   subclusters. As a termination criterion, no sub-clustering was
   considered if a cluster was found to have no sub-divisions in the
   Louvain high-resolution clustering ([114]Blondel et al., 2008;
   [115]Macosko et al., 2015) ([116]Figure 2B left panel and
   [117]Figure S3A). As a result, the cells are grouped into clusters;
   then, in the second step, we applied a bootstrapping-based methodology
   adapted from the Bosiljka study ([118]Tasic et al., 2016) (see
   [119]STAR Methods and [120]Figure S2A) to quantify the robustness of
   each cluster. We formulated two evaluation metrics: for each cell, we
   computed “how many times it has been correctly assigned to the cluster
   proposed” (e.g. recall per cell in [121]Figure S2D), and for each
   cluster, we quantified “how many times a cluster remains dominant
   (#Correct) in all comparisons for cells within it” (e.g. #correct in
   [122]Figure S2E); see [123]STAR Methods for a detailed explanation. If
   a non-robust cluster was identified, the cells of such cluster were
   then assigned to the neighboring clusters repeating the
   random-forest-based strategy. For instance, in the analysis of ECs,
   three levels of sub-clustering were conducted ([124]Figure 2B middle
   and right panels). In Level 1, two clusters were identified (A and B),
   and considering that possible subclusters were identified in the
   Louvain high-resolution clustering ([125]Figure 2B, (second panel from
   the left), they were sub-clustered resulting in Level 2. After Level 2,
   only three subclusters were further investigated (A1, A2, and B3) in
   Level 3 ([126]Figure 2B, third panel). Level 3 was identified as the
   final level, and a robustness analysis was conducted for the Level 3
   clusters ([127]Figure 2B fourth panel, C-E and [128]Figures S2C-S2H).
   Non-robust clusters were eliminated and their cells were reassigned to
   neighboring clusters (see [129]STAR Methods). Similarly, we applied the
   same robust clustering to the mesenchymal stromal cells
   ([130]Figures S3A-S3H). By using this approach, we were able to
   describe 14 subclusters in the endothelium ([131]Figure 2B fourth panel
   and [132]Figure 3A) and 11 in the mesenchyme ([133]Figure S3A fourth
   panel and [134]Figure 3C). We observed that both “cell proportions of
   subclusters in each dataset” and “the cell proportions of the dataset
   of origin within each subcluster” varied between datasets and
   subclusters ([135]Figures 3B and 3D, [136]S2B, and [137]S3B). While
   those biases could be caused for instance by the different selection
   protocols, we aimed to study the extent of the bias: (a) we did not
   identify any association with cell stage ([138]Figure 3B and 3D);
   (b) only clusters B2 and D3 did not have cells from the three datasets.
   And finally, (c) to quantify the possible biases derived from datasets,
   we conducted an entropy analysis to evaluate dataset distribution in
   the clusters; we observed large levels of entropy—varying between
   clusters—and the averages per cluster are shown in [139]Table S1
   ([140]Figure S5).

Figure 3.

   [141]Figure 3
   [142]Open in a new tab

   Quality control and added value of the clustering analysis

   (A) Representation of the final endothelial clusters.

   (B) Left panel: proportion of cells per dataset in each cluster in the
   final Endothelial clustering analysis. Right panel: proportion of cells
   per cell cycle stage using Seurat in each cluster in the final
   Endothelial clustering analysis.

   (C and D) Similar as (A and B) for MSC.

   (E and F) The added value of the integrated approach. Upper panel:
   every cell depicts the % of markers identified per cluster using only
   one dataset when compared with the markers identified in the total
   dataset. Middle panel: every cell depicts the % of False Negatives.
   Bottom panel: every cell depicts the % of False Positives when
   comparing the analysis conducted within each dataset with the
   integrated analysis (considered as the correct result).

   (E and F) are respectively associated with endothelial and mesenchymal
   cells.

   (G and H) Robustness of the cluster characterization using only cells
   from a single dataset but maintaining the same cluster structure.

   (G and H) are respectively associated with endothelial and mesenchymal
   cells. See also [143]Figure S4.

   By integrating the three datasets, distinct cellular states in the
   microenvironment and the description of the gene markers defining those
   subtypes were identified. The rationale being that despite the partial
   overlap observed between the datasets, a larger number of cells would
   generally contribute to a deeper characterization of the
   subpopulations. To directly address this and using the markers derived
   from the integrative analysis, we conducted a series of analyses to
   assess whether the integration provided additional insights compared to
   each dataset.

   First, we investigated what percentage of final integrated-based
   markers was recovered by each dataset by itself ([144]Figures 3E and 3F
   upper panel), what percentage of false negatives ([145]Figures 3E and
   3F middle panel) and false positives ([146]Figures 3E and 3F lower
   panel). For some of the subclusters, such as B3.4, A2.1, and A2.6
   (endothelium) or C2.1 and C3 (mesenchyme), over 50% of the markers
   could not be detected by each dataset separately. In a second analysis,
   we quantified the robustness of the defined clusters using data from
   each study separately ([147]Figures 3G and 3H). Only the Baryawno
   dataset allows for the robust identification of all the subclusters
   except for B3.4 in the endothelium. However, performing the same
   clustering strategy using only the Baryawno’s dataset in the
   endothelium could not identify all the subclusters with the same level
   of resolution compared to those observed with the integrated dataset
   ([148]Figure S4). We observed that the marker analysis using several
   differential expression tools provided similar results (data not
   shown), demonstrating that our approach is valid even for small size
   clusters and unbalanced groups.

   Next, we used an independent dataset to validate the performance of our
   integration effort. To that end, we applied our clusters’ signature to
   derive automatic annotations on an independent dataset ([149]Baccin
   et al., 2020) using SingleR ([150]Figure S6). This allowed us to
   discriminate in the independent dataset most of the cellular states we
   described based on the integration despite the lower number of cells
   sequenced ([151]Figure S6). These data further validate our integrative
   approach.

   Taken together, these data demonstrate that our customized approach for
   the integration of multiple datasets allows for a robust deconvolution
   of cell states when there is a high degree of similarity between cells
   of the same origin. Furthermore, transcriptional cellular stages
   inferred from the integration could be applied to further characterize
   independent single-cell datasets.

Deep characterization of the BM endothelial cell compartment

   Next, we aimed to characterize each of those 14 endothelial subclusters
   ([152]Figure 3A) based on the identified markers ([153]Table S2). Using
   the expression of those molecular markers, we could discriminate
   between arteries and sinusoids ([154]Figure 4A and [155]Table S3) in
   agreement with previous reports ([156]Hooper et al., 2009; [157]Itkin
   et al., 2016; [158]Rafii et al., 2016). Arterial clusters showed high
   expression of specific arterial genes such as Ly6a, Ly6c1, Igfbp3, and
   Vim ([159]Figure 4B upper panels). At the same time, sinusoidal cells
   were defined by their characteristic signature expressing Adamts5,
   Stab2, Il6st, and Ubd ([160]Figure 4B lower panels). Importantly,
   besides these already known markers, differential expression analysis
   of the integrative datasets revealed some novel genes specific of each
   endothelial population, such as Igfbp7 and Ppia for arteries and Cd164
   or Blrv for sinusoids ([161]Figure 4C). The expression of these genes
   would be consistent with the role Igfbp7 and Blvrb in the maintenance
   of endothelial vasculature homeostasis ([162]Klóska et al., 2019;
   [163]Tamura et al., 2009).

Figure 4.

   [164]Figure 4
   [165]Open in a new tab

   Deep characterization of the endothelial cell compartment in the BM

   (A) UMAP representation of arteries (red) and sinusoids (blue) within
   the endothelial cell population. The right-bottom subpanel depicts the
   final endothelial clusters identified.

   (B) Violin plot of gene expression for known markers of arteries and
   sinusoids cell subtypes.

   (C) Violin plot of gene expression for new marker candidates separating
   arteries and sinusoids cell subtypes.

   (D) Dot plot of the top 5 markers for each endothelial subcluster. Dot
   size corresponds to the proportion of cells within the group expressing
   each gene, and dot color corresponds to the average expression level.

   (E) Selected set of gene sets derived from the gene set analysis
   conducted with top 50 markers per cluster (see [166]STAR Methods).

   (F) Final clustering of the endothelial cell population and the
   labeling assigned based on marker genes and gene set analysis. See also
   [167]Figure S4 and [168]Tables S2, [169]S3, and [170]S4.

   Beyond characterizing arteries and sinusoids, we annotated their
   respective cell states using cluster’s markers based on genes
   ([171]Table S2, [172]Figure 4D) and gene sets (derived from gene set
   enrichment analysis, see [173]STAR Methods for a detailed version of
   the annotation; [174]Table S4, [175]Figure 4E). Our final annotation
   described in [176]Figure 4F was based in conserved Gene Ontology terms,
   and supported, with no contradictory gene sets, from Reactome and KEGG
   pathways analysis ([177]Table S4/data not shown). Gene sets related to
   vasculature development and remodeling were identified within the
   endothelial subclusters, confirming the identity of this cell
   population ([178]Table S4). We uncovered two subclusters (A1.1 and
   A2.5, respectively) within the arteries and sinusoids groups, which
   were enriched in gene sets involved in wounding. This finding is in
   agreement with the role of EC in pro- and antithrombotic processes
   ([179]Yau et al., 2015). Gene sets involved in extracellular matrix
   assembly, cell adhesion, and migration processes were specifically
   enriched in arteries, in line with the importance of these biological
   processes for vascular morphogenesis ([180]Davis and Senger, 2005). In
   relation to the structural support provided by arterial cells,
   subcluster B3.2 (actin, endocytosis) implicated in matrix remodeling
   was defined by the expression of RhoC, Apln, and Anxa2. Other arterial
   subclusters such as ROS and Immune (A1.2 and B3.1, respectively)
   include highly expressed gene sets involved in the regulation of
   reactive oxygen species metabolic process and cytokine-mediated
   signaling pathway. These findings are in line with the role of ECs in
   maintaining the redox balance and leukocyte regulation ([181]Testa
   et al., 2016; [182]Zhao et al., 2012). Sinusoidal-endothelial
   subclusters such as A2.1 and A2.6 showed enrichment in GO terms related
   to artery development and endothelial proliferation processes, two
   critical steps within the process of angiogenesis ([183]Naito et al.,
   2021). Furthermore, the sinusoidal subcluster A2.7 expressed gene sets
   involved in ion transport and signaling-related signatures. This is in
   concordance with the need of EC to constantly sense and adapt to
   alterations in response to microenvironmental cues ([184]March et al.,
   2009; [185]Quillon et al., 2015). Of note, ion channels play a role in
   EC functions controlled by intracellular Ca^2+ signals, such as the
   production and release of many vasoactive factors such as nitric oxide.
   In addition, these channels are involved in the regulation of the
   traffic of macromolecules, controlling intercellular permeability, EC
   proliferation, and angiogenesis ([186]Nilius and Droogmans, 2001).
   Importantly, several markers that were found only with the dataset
   integration correspond to genes within the GO categories used to label
   the clusters, hence, revealing their important role in defining the
   function of these cell states. For example, in subcluster B2, genes
   such as Gja1, Tgfb3, and Ablim2 are involved in regulating cell
   junctions and cytoskeletal organization ([187]Barrientos et al., 2007;
   [188]Okamoto and Suzuki, 2017).

   Taken together, these results suggest a previously unrecognized level
   of specialization of the bone marrow endothelial cells. Furthermore,
   the specificity of the distinct functional states in the EC suggest
   that the endothelial compartment is a more dynamic and flexible tissue
   with a richer intrinsic repertoire than previously anticipated.
   However, functional validation is required to confirm the putative role
   of each EC intermediate cell state described here.

Deep characterization of the BM mesenchymal cell compartment

   Applying the same robust clustering to mesenchymal stromal cells, we
   identified 11 subclusters and proceeded with the annotation
   ([189]Figure 3C and [190]S3, and [191]Tables S5, [192]S6, and [193]S7).
   Based on the expression of canonical markers, we first discriminated
   clusters between early mesenchymal (MSC), and cells already committed
   to the osteolineage (OLN-primed) ([194]Figure 5A and [195]Table S6).
   The high expression of Cxcl12, LepR, Adipoq, or Vcam1, among others,
   confirmed the identity of the early MSC group ([196]Figure 5B left
   column); whereas Bglap, Cd200, Alpl, or Col1a1 expression revealed the
   presence of osteolineage committed cells within the mesenchymal
   compartment ([197]Figure 5B right column). Furthermore, we identified a
   number of previously unrecognized, differentially expressed genes
   between the MSC and OLN-primed clusters such as Sbds and Itgb1 for MSCs
   and Enpp1 and Vkorc1, for OLN-primed cell type population
   ([198]Figure 5C). Itgb1, highly expressed in MSC is implicated in human
   chondrogenic differentiation of mesenchymal cells ([199]Hamidouche
   et al., 2009). Among OLN-primed specific markers, Enpp1 and Vkorc1 have
   been shown to regulate bone development by regulating bone
   calcification ([200]Hajjawi et al., 2014; [201]Mackenzie et al., 2012;
   [202]Price, 1985; [203]Spohn et al., 2009).

Figure 5.

   [204]Figure 5
   [205]Open in a new tab

   Deep characterization of the mesenchymal cell compartment in the BM

   (A) UMAP representation of mesenchymal (red) and osteolineage-primed
   (OLN-primed) (blue) within the mesenchymal compartment. The right-upper
   subpanel depicts the final mesenchymal clusters identified.

   (B) Violin plot of gene expression for known markers of mesenchymal
   (red) and osteolineage-primed (blue) cells.

   (C) Violin plot of gene expression for new marker candidates separating
   mesenchymal (red) and osteolineage primed (blue) cells.

   (D) Dot plot of the top 5 markers for each mesenchymal subcluster. Dot
   size corresponds to the proportion of cells within the group expressing
   each gene, and dot color corresponds to the average expression level.

   (E) Selected set of gene sets derived from the gene set analysis
   conducted with top 50 markers per cluster (see [206]STAR Methods). (∗)
   transmembrane receptor protein serine/threonine kinase signaling
   pathway.

   (F) Final clustering of the mesenchymal cell population and the
   annotation based on marker genes and gene set analysis. See also
   [207]Figure S4 and [208]Tables S5, [209]S6 and [210]S7.

   Besides MSCs and OLN-primed MSCs, we identified additional
   subpopulations. Through the marker identification and gene set analysis
   of the 11 subclusters ([211]Figures 5D and 5E and [212]Tables S5,
   [213]S6, and [214]S7) we were able to characterize and label each of
   the clusters as shown in [215]Figure 5F. GO terms such as adipogenesis,
   assembly and organization, immune response, cell migration, or muscle
   differentiation were enriched in the C2.3, C2.1, C1, C3, C4.2, and C4.1
   subsets respectively, confirming the identity of this MSC cell group.
   Furthermore, terms related to extracellular matrix, chondrocyte
   differentiation, and bone development, including bone formation,
   ossification, or epithelial migration, among others, were identified in
   the OLN-primed subclusters (C5, C6, and D, respectively), verifying the
   identity of these more mature cells (OLN-primed cells) within the
   mesenchymal stromal cells.

   Taken together, these results demonstrate that the newly identified
   mesenchymal subpopulations could not be properly characterized without
   the multi-dataset integration and a novel clustering approach. Further,
   our data provide evidence of the heterogeneity of the mesenchymal
   compartment in the BM.

Composition of the human endothelial and mesenchymal BM microenvironment

   While our data reveal a previously unrecognized heterogeneity in the
   murine BM endothelial and mesenchymal compartments, information about
   the composition of the human microenvironment and how much of this
   heterogeneity is observed in humans remains unanswered. To address this
   issue, we performed scRNA-seq analysis in prospectively isolated EC
   (TO-PRO-3^-, CD45^−, CD235^-, Lin^−, CD31^+, and CD9^+([216]Barreiro
   et al., 2005; [217]Kenswil et al., 2018)) and MSC-OLN (TO-PRO-3^-,
   CD45^−, CD235^-,Lin^−,CD271^+([218]Ghazanfari et al., 2016;
   [219]Hashemi et al., 1991; [220]Quirici et al., 2002), and CD146^+/−)
   ([221]Figure S7A) from iliac crest BM aspirates from four healthy young
   adults (20–30 years of age) ([222]Figure 6A). As described in
   [223]Figure 6B, we added additional filtering steps in the
   bioinformatic analysis to identify the two populations of interest, EC
   and MSC. As an additional quality control, we estimated the
   contribution of each human sample to EC and MSC subsets and cell cycle
   status ([224]Figures S7B and S7C). The EC (907 and 658 cells, clusters
   1 and 6, respectively) ([225]Figures 6C and [226]S7B middle panel)
   identity was confirmed based on the expression of canonical endothelial
   markers such as PECAM1 (coding for CD31), CD9, ICAM2, VLC, and ITGB1
   ([227]Figure 6D and [228]Table S8). In addition, examining functional
   pathways in clusters 1 and 6 revealed enrichment in GO terms associated
   with blood coagulation and hemostasis ([229]Table S9). The MSC identity
   (249 cells, cluster 11, [230]Figures 6C and [231]S7B middle panel) was
   confirmed by the expression of the mesenchymal specific genes (CXCL12
   and LEPR) and the OLN-primed specific genes ANGPT1, COL1A1, and VCAM1,
   among others ([232]Figure 6E and [233]Table S8). Furthermore,
   enrichment in functions associated with extracellular matrix
   organization and response to the mechanical stimulus was demonstrated
   in osteolineage cells ([234]Table S9). In summary, the generated human
   data suggest that single-cell RNA sequencing from iliac crest aspirates
   can aid in describing the complexity of the human BM microenvironment.
   Nevertheless, the limited number of EC and MSC, as well as the presence
   of contaminating populations did not allow a fine-grained clustering as
   the one performed in the mouse data.

Figure 6.

   [235]Figure 6
   [236]Open in a new tab

   Composition of the human endothelial and mesenchymal BM
   microenvironment

   (A) Experimental design for the human BMM characterization.

   (B) Scheme of customized bioinformatics pipeline filtering the cells
   with a large number of Immunoglobulin genes.

   (C) UMAP visualization of color-coded clustering of the human BM
   microenvironment after filtering cells.

   (D and E) Expression of representative markers for endothelial
   population (D) and mesenchymal-osteolineage cells (E) using an UMAP
   representation. See also [237]Figure S5 and [238]Tables S8 and [239]S9.

Substantial conservation of the EC and MSC population in the human BM
microenvironment

   Based on the limitations of the human data, we next investigated to
   what extent the knowledge uncovered in mouse could be applied to
   identify subpopulations/cell states in the human BM microenvironment.
   As a first step, we used single-cell mouse data to annotate the human
   cells using SingleR ([240]Aran et al., 2019), separately for EC
   ([241]Figures S8A and S8B) and MSC ([242]Figures S8C and S8D). We
   observed that MVG identified in mouse allowed us to separate the cells
   into clusters for both human EC and MSC ([243]Figures S6A–S6D). As a
   result, this analysis suggests that part of the biological mechanisms
   defining the BM microenvironment may be shared between species.

   Therefore, we decided to investigate the enrichment of conserved
   features (genes) between mouse and human; therefore, an enrichment
   score (ES) was computed for each cluster for EC and MSC separately (see
   [244]STAR Methods) ([245]Figure 7A and [246]Tables S10 and [247]S11).
   In the case of endothelial cells, the enrichment score was up to 2-fold
   ([248]Figure 7A): wounding (A1.1) with 2.15-ES, the junction (B2)
   2.26-ES, arteriogenesis (A2.1) 2.02-ES, and signaling (A2.7) 2.5-ES).
   Importantly, for some of the subclusters as sinusoidal signaling (A2.7)
   and the arterial of angiogenesis (B3.3), these shared genes are
   critical for defining each of those EC functional states
   ([249]Table S10). Both human and mouse ECs express DDIT4, JUN, CITED2,
   GADD45G, DUSP1, FOS, and CLDN5, which are part of a wide array of
   transcription factors, growth factors, and signaling pathways that have
   been described to regulate the maintenance of vascular homeostasis
   under physiological conditions ([250]Echavarria and Hussain, 2013;
   [251]Escudero-Esparza et al., 2012; [252]Jia et al., 2016). Similarly,
   ECs subclusters involved in angiogenesis in both species shared the
   expression of RGCC, GATA2, KLF2, and CAV2 genes, which are known to be
   implicated in angiogenic-related processes ([253]Lee et al., 2006;
   [254]Linnemann et al., 2011).

Figure 7.

   [255]Figure 7
   [256]Open in a new tab

   Conservation analysis of the EC and MSC population in the human BM
   microenvironment

   (A and B) Quantification of the conservation for EC (A) and MSC-OLN
   (B) cells for each cluster. The enrichment of those genes that are
   cluster markers in mouse and observed in most variable genes (MVG) of
   EC and MSC human cells, respectively. The right column shows, among the
   genes identified in human, those that are part of the gene sets used to
   label the cluster. See also [257]Figure S6 and [258]Tables S10 and
   [259]S11.

   (C) Heatmap of the cytokines, secreted molecules, and growth factors in
   EC and MSC.

   In the case of mesenchymal cells, we identified >3.5-ES for three
   subclusters, such as RNA-Myogenesis (C4.1), Factors-Immune (C4.2), and
   ossification (D2). Importantly, some of the shared genes correspond to
   genes that allowed the subcluster labeling through GO categories
   ([260]Figure 7B and [261]Table S11). Genes such as CXCL12, APOE, or
   IGFBP3 are associated with cell migration and lipid transport pathways
   among others ([262]Amable et al., 2014; [263]Robert et al., 2020), and
   characterized the murine adipogenesis subcluster (C2.3). Other genes,
   such as IGFBP5, are involved in actin filament assembly and
   organization ([264]Sureshbabu et al., 2012) and defined the cell
   adhesion subcluster (C2.1). IFIT3, MIDN, and ILR1 belong to pathways
   associated with interferon regulation or autoimmunity ([265]Guo et al.,
   2017; [266]Kim et al., 2020) and were identified in the immune response
   subclusters (C3 and C4.2). Additionally, the expression of COL5A1 and
   CADM1 genes, previously related to collagen fibril organization and
   bone mineralization processes ([267]Kahai et al., 2004; [268]Nakamura
   et al., 2017), defined the bone formation subcluster (D1). Moreover,
   genes such as SPP1 or CLEC11A, which are related to osteoblasts
   function and mineralization ([269]Huang et al., 2007; [270]Shen et al.,
   2021; [271]Yue et al., 2016), defined the mouse OLN-primed MSCs
   subcluster associated with ossification (D2) and are also expressed in
   human MSC-OLN cluster. Altogether, this data indicate the conservation
   of the osteogenic microenvironment between both species. To further
   explore the conservation of the BM microenvironment between both
   species, we examined the expression of known niche factors that
   regulate hematopoiesis. Despite the limited number of human cells, we
   were able to identify common niche-derived factors specific to the
   endothelium (SPARC, TGFB1, SELP, and EGFL7 among others) and the
   mesenchyme (including CXCL12, RARRES2, KITLG, and GAS6)
   ([272]Figure 7C). These cytokines, secreted molecules and growth
   factors, represent key regulators of HSPCs shared among species.

   Together, our analysis suggests that deep characterization of cellular
   states in mice can be used to infer conserved features in the human BM
   microenvironment despite a low level of conservation in the actual
   transcriptional profile of EC and higher in the case of MSC.
   Importantly, our data reveal a substantial degree of conservation
   regarding the complexity and heterogeneity of the EC and MSC
   compartment in the BM between mouse and human. This suggests that the
   layers of microenvironmental regulation of hematopoiesis and the
   identified plasticity in mice may also be shared between species.

Discussion

   Our study dissects the intrinsic organization and the heterogeneity
   within the endothelial (EC) and mesenchymal cell populations (MSC)
   governing the BM microenvironment. This was accomplished through
   customized bioinformatics integration of multiple datasets along with
   the inclusion of over 50.000 murine bone marrow stromal cells. We were
   able to identify new subsets of MSC and EC but, but more importantly,
   to define new molecular markers for the identification of highly
   specialized subpopulations of cells in the BM microenvironment. Pathway
   enrichment analysis unveiled multiple, potentially transient cell
   states defined by differential gene expression and the enrichment of
   specific functional characteristics. Importantly, EC subsets were
   characterized by enrichment in pathways known to be essential for
   endothelial homeostasis maintenance, demonstrating a high degree of
   specialization in the endothelium. Similarly, multiple transient cell
   states in the MSC compartment were defined and characterized by their
   differentiation capacity. Importantly, our deep deconvolution of the
   heterogeneous mesenchymal and endothelial compartments became feasible
   only by integrating multiple datasets. Of note, our analysis showcases
   that a research paradigm aiming for the generation of a detailed
   comprehensive molecular atlas of an organism requires both multi-omic
   data and computational integration. Here, we have relied on what is
   referred to as unpaired unimodal (scRNA-seq) data. Clearly, a natural
   next step is to use and further develop new computational tools that
   permit the integration ([273]Argelaguet et al., 2021) of unpaired
   multi-omics datasets such as scRNA-seq, scATAC-seq, and other data
   modalities. Recent technological developments enable several multiple
   omics recorded from the same cell, i.e. paired data, which leverages
   our ability to dissect and molecularly characterize the intrinsic
   organization of the bone marrow niche environment. Advances in
   computational biology have started to develop such tools ([274]Ashhurst
   et al., 2021; [275]Hao et al., 2021; [276]Martinez-de-Morentin et al.,
   2021; [277]Wu et al., 2021). Moreover, some validation experiments,
   such as the use of fluorescent reporters, targeting niche-associated
   genes, and functional studies will allow us to confirm the identified
   new molecular markers based on differential gene expression and also
   the related annotated pathways.

   While our study did not directly address the influence between stromal
   cells in the hematopoietic stem cell niche and the HSCs, the deep
   resolution of our study allows for some inferences to be made. Of note,
   we detected the expression of vascular endothelial growth factor-C
   (Vegf-c) in mouse endothelial and mesenchymal cells ([278]Tables S2,
   [279]S5, [280]S6, [281]S8, and [282]S11). Vegf-c has recently been
   implicated in the maintenance of the perivascular niche and the
   recovery of hematopoiesis upon injury ([283]Fang et al., 2020). Vegf-c
   is specifically expressed in the endothelial B1 and mesenchymal C1
   subpopulations, suggesting an important role of these specialized
   endothelial and mesenchymal cells in the preservation of the integrity
   of the perivascular niche. In addition, Apelin+ (Apln) endothelial
   cells have been recently implicated in HSC maintenance and regeneration
   upon injury ([284]Chen et al., 2019a). Importantly, two endothelial
   subclusters (B3.2-actin endocytosis and A1.1-wounding arteries)
   demonstrated expression of Apln, suggesting that these EC states
   represent specific sources of hematopoietic support and vascular
   regeneration upon injury. Osteolectin+ LepR–+mechanosensitive
   peri-arteriolar mesenchymal cells with osteogenic potential—are
   implicated in lymphoid, but not HSC maintenance ([285]Shen et al.,
   2021). Importantly, osteolectin expression defines murine cluster D2
   (ossification) and shows conservation in human MSCs, suggesting the
   preservation of a specialized lymphoid niche between species.

   Detailed characterization of the human BM niche has not yet been
   addressed. Approaches undertaken in a mouse system cannot readily be
   transferred to the human system. Furthermore, differences in sample
   processing can also impact the results. In that sense, our results,
   despite the low number of cells, may represent the first dataset that
   includes scRNAseq from the human endothelial and mesenchymal BM
   microenvironment. While we were able to identify mesenchymal and
   endothelial cells based on canonical markers shared with mice
   ([286]Aoki et al., 2021; [287]Baryawno et al., 2019; [288]Kalucka
   et al., 2020; [289]Leimkühler et al., 2021; [290]Matthews et al., 2020;
   [291]Severe et al., 2019; [292]Stumpf et al., 2020; [293]Tikhonova
   et al., 2019; [294]Wang et al., 2020; [295]Xie et al., 2021), our human
   scRNAseq did not possess enough resolution to elucidate the
   heterogeneity of the human BM stroma to the same level as with the
   mouse data. Based on the extensive knowledge generated in the mouse, we
   therefore focused on characterizing how much of the information and
   targets from the mouse can be of interest in human characterization.
   This analysis allowed us to identify the expression of the human
   orthologs to the murine cluster-defining genes with different degrees
   of enrichment in the endothelium and mesenchyme. Some of these shared
   genes in mice and human stromal cells corresponded to the GO-defining
   genes of the different clusters identified in the mouse. Further,
   analysis of niche factors produced by ECs or MSCs and known to regulate
   hematopoiesis in mice were found conserved in the human samples. These
   findings suggest a meaningful degree of conservation regarding the
   cellular states that define the stromal microenvironment in mouse and
   human. Although additional studies and improved processing of human
   samples will be required for deep characterization of the human BM
   microenvironment, these preliminary results validate our integrative
   cross-species approach.

   As an example of the added value, the current study identifies
   candidates of relevance in the study of BM-related diseases. Sbds, a
   ribosome maturation protein associated with the Shwachman-Diamond
   syndrome, represents a previously unrecognized marker of immature MSCs
   based on the dataset integration. Sbds deficiency has been implicated
   in ossification defects and metabolic changes in HSPCs
   ([296]Raaijmakers et al., 2010; [297]Zambetti et al., 2016),
   potentially contributing to myelodysplasia and AML onset in patients
   with Shwachman-Diamond syndrome. Future studies will help to improve
   the understanding of these new candidates and the pathogenesis
   associated. On a broader note, deep molecular analysis of the BM
   microenvironment sets the stage for computational disease modeling
   ([298]Tegnér et al., 2009) from a systems medicine perspective.

   Taken together, our study provides a deeper understanding of the
   composition and specialization of the BM microenvironment and point
   toward a substantial degree of conservation between species. Moreover,
   we demonstrate the usefulness of the multi-dataset integration and the
   customized clustering approach used in our study to improve the
   resolution of complex tissues and organs. This approach promises to aid
   in the construction of cell atlases by reducing the resources
   associated with sequencing that a single lab will need to invest in
   order to obtain meaningful depth in single-cell analysis while reducing
   the biases that may arise from data generated from a single laboratory
   or platform.

   Future studies integrating genome, transcriptome, epigenome, proteome,
   and anatomical positioning together with functional assays to correlate
   descriptive phenotypes with functional data will help fully resolve the
   composition, regulation, and connectivity in the BM microenvironment in
   health and disease.

Limitations of the study

   This study provides the most comprehensive description to date of the
   murine HSC-regulatory microenvironment and suggests a higher level of
   specialization of the cellular circuits than previously anticipated.
   Moreover, this knowledge generated in the mouse allows the inference in
   human demonstrating substantial conservation between species. However,
   these studies are limited by the number of cells sequenced, the
   difficulty of obtaining high-quality samples with sufficient stromal
   cell numbers from healthy individuals, as well as the presence of
   contaminating populations in human data. The integrative analysis has
   limitations to account for the different.

Abbreviations

     * HSC Hematopoietic stem cell
     * HSPC Hematopoietic stem and progenitor cells
     * scRNA-seq Single-cell RNA sequencing
     * BM Bone marrow
     * EC Endothelial cells
     * MSC Mesenchymal cells
     * OLN-primed Osteolineage
     * MVG Most variable genes

STAR★Methods

Key resources table

   REAGENT or RESOURCE SOURCE IDENTIFIER
   Antibodies
     __________________________________________________________________

   Biotin anti-mouse lineage cocktail antibody (Mac1, CD3, Gr1, B220 and
   TER-119) BIOLEGEND-DEDHAM Cat#133307
   APC/Cyanine7 Streptavidin BIOLEGEND-DEDHAM Cat#405208
   Brilliant Violet 510 anti-mouse CD45 antibody (clone 30F11)
   BIOLEGEND-DEDHAM Cat#103137
   APC anti-mouse CD45 antibody (clone 30F11) BIOLEGEND-DEDHAM Cat#103112
   APC anti-mouse TER-119 antibody BIOLEGEND-DEDHAM Cat#116212
   Brilliant Violet 510 anti-human CD3 antibody BIOLEGEND-DEDHAM
   Cat#317332
   Brilliant Violet 510 anti-human CD19 antibody BIOLEGEND-DEDHAM
   Cat#363020
   V500 anti-human CD45 antibody (Clone HI30) BD horizon Cat#560777
   Brilliant Violet 510 anti-human CD64 antibody BIOLEGEND-DEDHAM
   Cat#305028
   BV421 anti-human CD45 antibody BIOLEGEND-DEDHAM Cat#304032
   BV421 anti-human CD235 antibody BIOLEGEND-DEDHAM Cat#349108
   FITC anti-human CD31 antibody BIOLEGEND-DEDHAM Cat#303104
   APC/FIRE 750 anti-human CD9 antibody BIOLEGEND-DEDHAM Cat#312113
   PE 750 anti-human CD146 antibody BIOLEGEND-DEDHAM Cat#361005
   PERCP Cy5.5 anti-human CD271 antibody BIOLEGEND-DEDHAM Cat#345112
     __________________________________________________________________

   Biological samples
     __________________________________________________________________

   Bone marrow aspirates healthy young donors N/A
     __________________________________________________________________

   Chemicals, peptides, and recombinant proteins
     __________________________________________________________________

   STREPTAVIDIN MICROBEADS MILTENYI BIOTEC, S.L. Cat#130-048-101
   Vybrant DyeCycle Orange Stain ThermoFisher Scientific Cat#V35005
   ANNEXIN V FITC PHARMINGEN Cat#556419
   7AAD BD Bioscience Cat#559925
   TO-PRO-3 IODIDE (642/661) THERMO FISHER SCIENTIFIC S.L. Cat#T3605
     __________________________________________________________________

   Critical commercial assays
     __________________________________________________________________

   Miltenyi LD columns MILTENYI BIOTEC, S.L. Cat#130-042-901
     __________________________________________________________________

   Deposited data
     __________________________________________________________________

   Raw and analyzed data This paper Single Cell Portal: SCP1747
     __________________________________________________________________

   Experimental models: Organisms/strains
     __________________________________________________________________

   Female C57BL/6J mice, CD45.2 Jackson Laboratory Cat#000664
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   FlowJo (version 10.7.1) software N/A N/A
   R (version 4.0.3, 3.6.3) R core [299]https://www.r-project.org/
   Seurat (version 4.0.0, 3.2.3) [300]Stuart et al. (2019)
   [301]https://github.com/satijalab/seurat/
   SCTransform [302]Hafemeister and Satija (2019)
   [303]https://github.com/satijalab/sctransform
   IKAP [304]Chen et al. (2019b) [305]https://github.com/GenomicsNX/IKAP
   CellRanger (version 6.0.1) 10× Genomics [306]www.10xgenomics.com/
   SingleR (version 1.4.1) [307]Aran et al. (2019)
   [308]https://github.com/dviraran/SingleR
   ClusterProfiler (version 3.18.1) [309]Yu et al. (2012)
   [310]https://guangchuangyu.github.io/software/clusterProfiler/
   [311]Open in a new tab

Resource availability

Lead contact

   Further information and requests for resources and reagents should be
   directed to and will be fulfilled by the lead contact, Dr. David
   Gomez-Cabrero ([312]david.gomezcabrero@kaust.edu.sa).

Materials availability

   Requested materials are available from the [313]lead contact.

Experimental model and subject details

   For in vivo animal studies, six females C57BL/6J mice (CD45.2) at age
   20 weeks were used for Single-Cell RNA-seq experiments. Animals studies
   were carried out in accordance with the European Communities Council
   Directive (2010/63/UE) and with the approval of the Ethical Committee
   for Animal Testing of the University of Navarra. The number of mice per
   experiment was calculated to adhere to the “Replace, refine and reduce”
   rule for animal use in experimental procedures.

   For human studies, a total volume of approximately 60 mL bone marrow
   (BM) was obtained by aspiration from the posterior iliac crest from
   four healthy young adults (20–30 years of age) after written informed
   consent was achieved. The human sample collection and research
   conducted in this study were approved by the Research Ethics Committee
   of the University of Navarra (project 2017.218). Personal data was kept
   confidential in accordance with the Organic law 3/2018, on personal
   data protection and Spanish law 14/2007 on Biomedical research. All
   collection samples are codified, and only the authorized personnel were
   able to correlate the patient’s identity with the codes.

Method details

Isolation and FACS sorting of murine bone marrow microenvironment cells

   Six mice were euthanized via CO[2] asphyxiation. Bones from humerus,
   radius, iliac crests, femurs and tibia were harvested in PBS 1X
   containing 2% FBS and 2 mM EDTA (modPBS). All steps were performed on
   ice to preserve cell viability and RNA integrity. Muscles and soft
   tissue were thoroughly removed from the bones and bone marrow (BM)
   cells were obtained by crushing in modPBS. Cells were then filtered
   through a 70 μm cell strainer and red blood lysed with ACK buffer
   (NH[4]Cl 150 mM, KHCO[3] 10 mM, and Na[2]EDTA 0.1 mM) for 10 min at
   room temperature (RT) with rotation. Remaining calcified bone fragments
   were collected on a 50 mL conical tube and digested with the
   appropriate volume of PBS with 0.3% collagenase I and dispase (5 U/mL)
   during 15 min at 37°C and shaking at 200 rpm. FBS representing 10% of
   the digestion volume was added to stop the collagenase digestion. After
   digestion, the calcified and crushed fractions were filtered through a
   70 μm filter into a collection tube and pooled into one sample. Cells
   were subsequently stained for 20 min on ice first in the appropriate
   volume of modified PBS 1X (3 mL/mouse) with 160 ul/mouse of
   biotinylated lineage cocktail (Mac1, CD3, Gr1, B220 and TER119)
   followed by incubation with streptavidin magnetics microbeads
   (100 μL/mouse). Negative selection was performed using Miltenyi LD
   columns according to manufacturers’ protocol. After selection, the
   sample was stained with the following combination of conjugated
   antibodies at a concentration of 1/200: APC-Cy7 labeled streptavidin,
   BV510 labeled anti-CD45, APC labeled anti-CD45 and APC labeled
   anti-TER-119. Samples were then stained with 0.05 μM of Vybrant dye
   orange (VDO) at 37°C for 30 min to label living cells. Annexin V was
   also added, in combination with 7AAD to discard apoptotic and dead
   cells from the sample respectively. For annexin V staining, cells were
   stained with 1 μL/mouse of Annexin V-FITC on an appropriate volume of
   1X Annexin V binding buffer in the dark for 15 min at RT. Samples were
   then resuspended in 1X Annexin V buffer and 5 μL of 7AAD dye (up to
   1 × 10^6 cells) were added. BM non-hematopoietic cells were FACS sorted
   using BD FACSAria II sorter collected in PBS 1X supplemented with 0.05%
   UltraPure BSA and cell viability was assessed using Nexcelom
   Cellometer. Data were analyzed by FlowJo (version 10.7.1) software.

Isolation and FACS sorting of human bone marrow endothelial and
mesenchymal-osteolineage cells

   All sample processing steps were performed on ice to preserve cell
   viability and RNA integrity. A total volume of approximately 60 mL of
   bone marrow was obtained by aspiration from the posterior iliac crest.
   Red blood cells were lysed twice with 45 mL of ACK buffer per 5 mL of
   human sample during 15 min at RT with rotation. Sample was then
   filtered through a 70 μm cell strainer, centrifuged, and stained for
   30 min on ice with the following combination of conjugated antibodies
   at a concentration of 1/100 except anti-Lin (3ul/test- test
   25 × 10^6cells): BV510 labeled anti-Lin (including CD3, CD10, CD19,
   CD45 and CD64), BV421 labeled anti-CD235, BV421 labeled anti- CD45,
   FITC labeled anti-CD31, APC-Cy7 labeled anti-CD9, PE labeled anti-CD146
   and PerCP- Cy5.5 labeled anti-CD271. Dead cells and debris were firstly
   excluded by FSC, SSC and adding 10 μL of TO-PRO-3. BM niche populations
   were prospectively isolated based on the following immunophenotype:
   ECs: TO-PRO-3^-/Lin^−/CD45^-/CD235^-/CD9^+/CD31^+ and MSCs:
   TO-PRO-3^-/Lin^−/CD45^-/CD235^-/CD31^-/CD271^+/CD146^+/−. FACS sorting
   was performed on a BD FACSAria II sorter, sorted BM niche cells were
   collected in PBS 1× and 0.05% Ultra-Pure BSA and cell viability was
   determined using Nexcelom Cellometer. Data were analyzed by FlowJo
   (version 10.7.1) software.

Profiling by single-cell RNA-sequencing (scRNA-seq)

   scRNA-seq was performed using the Single Cell 3′ Reagent Kits v3.1 (10X
   Genomics) according to the manufacturer’s instructions. For human
   samples, endothelial and mesenchymal cells were pooled before scRNA-seq
   was performed. Approximately 15,000 cells were loaded at a
   concentration of 1,000 cells/μL on a Chromium Controller instrument
   (10X Genomics) to generate single-cell gel bead-in-emulsions (GEMs). In
   this step, each cell was encapsulated with primers containing a fixed
   Illumina Read 1 sequence, followed by a cell-identifying 16 bp 10X
   barcode, a 10 bp Unique Molecular Identifier (UMI) and a poly-dT
   sequence. A subsequent reverse transcription yielded full-length,
   barcoded cDNA. This cDNA was then released from the GEMs, PCR-amplified
   and purified with magnetic beads (SPRIselect, Beckman Coulter).
   Enzymatic Fragmentation and Size Selection was used to optimize cDNA
   size prior to library construction. Illumina adaptor sequences were
   added, and the resulting library was amplified via end repair, A-
   tailing, adaptor ligation and PCR. Libraries quality control and
   quantification was performed using Qubit 3.0 Fluorometer (Life
   Technologies) and Agilent’s 4200 TapeStation System (Agilent),
   respectively. Sequencing was performed in a NextSeq500 (Illumina) (Read
   1: 26 cycles, i7 Index: 8 cycles, Read 2: 49 cycles) at an average
   depth of 60,000 reads/cell in mice and 30,000 reads/cell in human.

Single-cell RNA-seq analysis of mouse samples

   See extended details and code in the following Github:
   [314]https://github.com/TranslationalBioinformaticsUnit/BMN_characteriz
   ation.

Sample selection

   Sample [315]GSM3674224, [316]GSM3674225, [317]GSM3674226,
   [318]GSM3674227, [319]GSM3674228, [320]GSM3674229 from [321]GSE128423
   by Baryawno, sample [322]GSM2915575, [323]GSM2915576, [324]GSM2915577
   from [325]GSE108891 by Tikhonova and one in-house mouse bone marrow
   niche sample was included in this analysis.

Filtering

   The single cell analysis of mice samples analysis was performed using R
   (version 4.0.3 for human, 3.6.3 for mouse) and Seurat (version 4.0.0
   for human, 3.2.3 for mouse)([326]Stuart et al., 2019).Three bone marrow
   niche samples were filtered individually based on the 10^th and 90^th
   quantile of number of features and counts. Cells with more than 5%
   mitochondrial genes were also removed. Each dataset was normalized
   using SCTransform function ([327]Hafemeister and Satija, 2019) from
   Seurat package separately.

Pairwise integration and selection of the target population

   In-house dataset and Baryawno were integrated with Tikhonova separately
   using IntegrateData function from Seurat (version 3.2.3.). Using as a
   reference the annotation from Tikhonova dataset, cells that aligned
   with LEPR+ cells and VE-Cad+ cells were annotated as MSC and EC
   respectively. MSC-like cells and EC-like cells from different datasets
   were normalized again and integrated using Seurat without further
   filtering.

Clustering

   After filtering and quality control, a divide-and-conquer strategy was
   applied to the clustering of mouse ECs and MSCs separately. Firstly,
   following integration, dimension reduction with principal component
   analysis (PCA), data visualization with Uniform Manifold Approximation
   and Projection (UMAP), computation of K-nearest neighbors and an
   initial Louvain clustering using resolution of 1 were performed as a
   reference of high-resolution limit. Secondly, IKAP ([328]Chen et al.,
   2019b) was applied to each integrated dataset as “level 1” clustering
   ([329]Figure 2B). Each cluster from level 1 was then compare with the
   high-resolution reference. The cluster from level 1 was further divided
   using IKAP to level 2 if the cluster were far from cluster limit. The
   process would be repeated until at least one cluster reach cluster
   limit.

Cluster evaluation

   To evaluate the stability of these clusters, a bootstrapping strategy
   was adopted ([330]Tasic et al., 2016). The script was rewritten for our
   pipeline, and we further defined two metrics: (a) recall per cell and
   (b) #Correct (details below) for each cluster to obtain better
   demonstration and quantification of the cluster robustness.

   The strategy was conducted in a pairwise manner with basic steps as
   follow:
     * 1.
       Select two clusters, randomly split the clusters to five equal
       groups and use one group of cells (20%) as testing dataset.
     * 2.
       Identify the set of differentially expressed genes (DEGs) between
       the 2 clusters using the Wilcoxon Rank Sum test ([331]Wilcoxon,
       1945).
     * 3.
       Train a random forest classifier with 20 genes selecting the top 10
       DEGs from each cluster based on average log2 fold change.
     * 4.
       Applied the classifier to the 20% testing dataset.
     * 5.
       Repeat step1-4 for five times for different groups such that each
       cell in these two clusters was classified once.
     * 6.
       Repeat step1-5 nine more times.
     * 7.
       Repeat step1-6 for all cluster pairs

   There are three types of results that can be summarized from this
   bootstrapping strategy:
     * 1)
       Dominant cluster identification: A dominant cluster is the cluster
       to which the cell is assigned for more than half the runs.
     * 2)
       Number of Correct Dominance Assignment (#Correct): The sum of times
       the dominant cluster matches the original cluster (true positive)
       for a cell across all cluster pair comparisons ([332]Figures S2E
       and S2H).
     * 3)
       Recall Per Cell: The proportion of correct assignment (positive
       result) to its original cluster in all runs from all comparisons
       ([333]Figures S2D and S2G).

   Clusters where more than 50% of the cells has been “incorrectly”
   assigned robustly at least once to another dominant cluster will be
   considered unstable ([334]Figures S2F and S2H, cluster A2.2). The cells
   of such cluster will be assigned to other clusters (see dominant plots,
   [335]Figure S9 as an example) using – to that end - a random forest
   classifier as described before.

   To evaluate the mixing of three dataset in each cluster after the
   integration, Shannon entropy was calculated. For each cell
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   in a cluster, the probability of them coming from any of the datasets
   (d = 1,2,3) is denoted as
   [MATH: <mrow><msubsup><mi>p</mi><mi>j</mi><mi>d</mi></msubsup></mrow>
   :MATH]
   , the entropy for this cell was calculated as
   [MATH: <mrow><msub><mi>H</mi><mi>j</mi></msub><mo linebreak="goodbreak"
   linebreakstyle="after">=</mo><mo linebreak="goodbreak"
   linebreakstyle="after">−</mo><munderover><mrow><mo>∑</mo></mrow><mrow><
   mi>d</mi><mo>=</mo><mn>1</mn></mrow><mn>3</mn></munderover><msubsup><mi
   >p</mi><mi>j</mi><mi>d</mi></msubsup><mi>l</mi><mi>o</mi><mi>g</mi><msu
   bsup><mi>p</mi><mi>j</mi><mi>d</mi></msubsup></mrow> :MATH]
   .

Gene set analysis

   After clustering, DEGs of each cluster were identified using a Wilcoxon
   Rank Sum test. For each cluster, an over-representation analysis of GO
   gene set enrichment analysis was conducted using the top 30, 50 and 100
   DEGs based on the average log transformed fold changes using
   clusterProfiler (version 3.18.1)([336]Yu et al., 2012). Additionally,
   to further unveil the specialization of those subclusters, gene sets
   were also computed within the sub-divisions of each cluster at the last
   clustering level with the top 30, 50, 100 DEGs. Final annotation of
   each cluster was manually assigned based on conserved gene terms within
   all the analysis.

Added value analysis

Added value 1

   Comparing the DEGs defined by individual dataset and integrated
   dataset. The individual datasets were normalized with SCTransform and
   DEGs were identified within top 3000 most variable genes using a
   Wilcoxon Rank Sum test. For integrated dataset, DEGs were identified
   within top 3000 most variable genes from integrated assay using a
   Wilcoxon Rank Sum test. False negative and false positive rate were
   calculated by comparing the DEGs identified by integrated dataset and
   individual dataset ([337]Figures 3E and 3F).

Added value 2

   Cluster stability evaluation for individual dataset. To understand if
   the clusters identified from three datasets can remain stable within a
   single dataset or not, the bootstrapping strategy was applied to each
   dataset with the annotation identified by the integrated dataset.

Added value 3

   Comparing cluster identified by a single dataset. To further understand
   if the clusters can be identified by one dataset only, Baryawno dataset
   was used as an example considering its large cell populations. The same
   pipeline from normalization to bootstrapping was applied to this
   dataset and the clusters identified from this single dataset was
   compared with the clusters identified by three datasets using Jaccard
   index.

Added value 4

   Application of cluster identity in a separate dataset ([338]Baccin
   et al., 2019). The clusters identified were used as reference to
   annotate ECs and MSCs from Baccin data using singleR (version
   1.4.1)([339]Aran et al., 2019).Normalized counts in “RNA” assay from
   Seurat object were used for this analysis.

Single-cell RNA-seq analysis of human samples

   Preprocessing of sequencing data: Preprocessing of single-cell RNA-seq
   data for each in house sample were conducted by CellRanger count from
   Cell Ranger (version 6.0.1) using reference genome GRCh38.

Sample filtering

   The single cell analysis of human analysis was performed as described
   in before except for human cells with more than 10% mitochondrial genes
   were also removed. Because the exploratory data analysis revealed
   potential contamination of B cells, we applied an additional filter:
   cells with more than 10% reads mapped to immunoglobin genes were
   excluded from downstream analysis.

Integration

   After filtering, each sample was normalized using SCTransform and
   integrated using 3000 most variable features using Seurat. Following
   integration, dimension reduction with PCA, data visualization with
   UMAP, computation of K-nearest neighbors with 20 dimensions and
   clustering using resolution of 0.4 were performed.

Select EC and MSC (without further integration)

   Clusters were annotated based on biological insights on markers.
   Cluster 11 were identified as mesenchymal and cluster 1, 6 were
   identified as endothelial. During the exploratory analysis, human EC
   cells were subclustered at resolution 0.4 and one of the clusters
   identified was further filtered for the downstream analysis because the
   cells in the cluster were not expressing EC marker genes. Several
   outliers from human MSC cells were also removed.

Compare human MVGs with mouse DEGs

   3000 MVGs for human EC and MSC were identified using “RNA” assay and
   these genes were scaled in “integrated” assay resulting in 932 human EC
   specific MVGs and 976 human MSC specific MVGs. The MVGs from human EC
   or MSC were compared with DEGs from each mouse cluster. The enrichment
   score for a given cluster i was defined as the ratio between “the
   number of genes shared between human MVGs and mouse DEGs from cluster
   i” and “the expected number of genes”, where the later was computed as
   follows:
   [MATH: <mrow><mfrac><mrow><mi>M</mi><mi>V</mi><msub><mi>G</mi><mi
   mathvariant="italic">human</mi></msub><mo>∩</mo><mi>M</mi><mi>V</mi><ms
   ub><mi>G</mi><mi mathvariant="italic">mouse</mi></msub><mspace
   width="0.25em"></mspace></mrow><mrow><mi>M</mi><mi>V</mi><msub><mi>G</m
   i><mi mathvariant="italic">mouse</mi></msub></mrow></mfrac><mo
   linebreak="goodbreak">×</mo><mspace
   width="0.25em"></mspace><msub><mi>l</mi><mi>i</mi></msub></mrow> :MATH]

   Where
   [MATH: <mrow><msub><mi>l</mi><mi>i</mi></msub></mrow> :MATH]
   is the number of DEGs from mouse cluster i.

SingleR analysis between mouse and human

   To annotate human cells using mouse clusters as reference the singleR
   tool (version 1.4.1)([340]Aran et al., 2019) was utilized for ECs and
   MSCs separately. Cell type specific MVGs with expression values in
   “integrated” assay from Seurat object were used for this analysis.

Quantification and statistic analysis

   Statistical analysis in this study (t-test and comparing proportions)
   were conducted using R.

Ethics approval

   All animal experiments were performed in accordance with national and
   institutional guidelines and procedures were approved by the Ethical
   Committee for Animal Testing of the University of Navarra.
   Acknowledging the principles of 3Rs (Replacement, Reduction and
   Refinement), all mice used in this study were from mice that were
   euthanized by cervical dislocation as parts of other ongoing ethically
   approved experiments.

   The human sample collection and research conducted in this study were
   approved by the Research Ethics Committee of the University of Navarra.
   All the protocols used in this study were in strict compliance with the
   legal and ethical regulations.

Acknowledgments