Graphical abstract
graphic file with name fx1.jpg
[63]Open in a new tab
Highlights
* •
Integration and novel clustering improve the resolution of bone
marrow microenvironment
* •
Novel functional states in endothelium and differentiation stages
in the mesenchyme
* •
Inference of conserved features in human from those identified in
mouse
__________________________________________________________________
Biological sciences; Stem cells research; Omics; Transcriptomics
Introduction
Uncovering pathogenetic mechanisms requires identifying the
corresponding major groups of genes in the disease-relevant tissues
([64]Gomez-Cabrero et al., 2014; [65]Wolkenhauer et al., 2013). To this
end, collective efforts such as the Human Single-Cell Atlas have been
launched, aiming at providing a single-cell map of human tissues and
organs. A core case is the hematopoietic system, where single-cell RNA
sequencing (scRNA-seq) has allowed to refine our understanding of
hematopoiesis in mouse and human ([66]Giladi et al., 2018;
[67]Nestorowa et al., 2016; [68]Rodriguez-Fraticelli et al., 2018).
Furthermore, these studies have challenged the classical view of
hematopoiesis differentiation as a compendium of discrete cellular
states with decreased differentiation potential toward a more dynamic
view in which hematopoietic stem and progenitor cells (HSPC) gradually
pass through a continuum of differentiation states ([69]Karamitros
et al., 2017; [70]Laurenti and Göttgens, 2018; [71]Velten et al., 2017;
[72]Weinreb et al., 2020). Moreover, recent studies using scRNA-seq
technologies have shed light on the organization of the hematopoietic
regulatory microenvironment in the mouse ([73]Baccin et al., 2020;
[74]Baryawno et al., 2019; [75]Kanazawa et al., 2021; [76]Matsushita
et al., 2020; [77]Tikhonova et al., 2019; [78]Wolock et al., 2019;
[79]Zhong et al., 2020). These studies have resolved some of the
controversies regarding the overlap of stromal populations previously
described and the description of certain discrete stromal cells as
professional, hematopoietic cytokine-producing populations ([80]Baccin
et al., 2020). Moreover, further combination with in situ technologies
helped to delineate the relationship between specific stromal cell
types in the murine bone marrow (BM) ([81]Baccin et al., 2020). This
data have provided a wider and more dynamic picture of hematopoiesis
and their regulatory microenvironment, allowing for a provocative
hypothesis to rise, such as whether their specific association with
given niches controls transcriptional states in hematopoietic stem
cells and whether these states are reversible upon occupying
alternative niches.
Nevertheless, these studies are limited by the number of cells
sequenced, potentially hampering our ability to resolve the full
spectrum of cellular states and differentiation stages that define the
stromal BM microenvironment. More recently, efforts to integrate
multiple datasets generated in different labs have been successfully
attempted ([82]Dolgalev and Tikhonova, 2021); however, they have failed
to provide additional information compared to that derived from the
individual studies. Furthermore, knowledge on the conservation of the
cellular composition in the human BM stroma is in its infancy due to
the difficulty of obtaining high-quality samples with sufficient
stromal cell numbers from healthy individuals. This leaves us with two
outstanding challenges; how to piece together such different fragments
toward a comprehensive molecular atlas and to what extent such an atlas
in mice is conserved in the human bone marrow.
Here, we integrate three scRNA-seq datasets (two publicly available
([83]Baryawno et al., 2019; [84]Tikhonova et al., 2019) and one
in-house) separately targeting two well-defined populations
(endothelial and mesenchymal cells). The integration of distinct
datasets required developing tailored bioinformatics pipelines to
ensure the robust identification of cell types and stages. We identify
14 endothelial subclusters and 11 subpopulations defining different
stages of differentiation in the mesenchyme. Our analysis provides the
most comprehensive atlas of the cellular composition in the mouse bone
marrow. Last, we asked to what extent such an atlas could provide
insight into the less accessible human BM microenvironment. To this
end, we made the first pilot study, profiling the human BM using
scRNA-seq, which was integrated with our mouse BM atlas. This analysis
demonstrated substantial conservation between species.
Results
Data integration and high-resolution clustering strategy
[85]Figure 1 provides a graphical summary of the experimental design
and the analysis flow. We integrated selected subsets of cells from
three distinct mouse datasets: two recently published (“[86]Tikhonova
et al. (2019), 6626 cells, and “[87]Baryawno et al. (2019), 38443
cells) and an independent dataset (“In-house” dataset, 13,402). These
datasets differ in the procedures for isolation of cells within the BM
microenvironment. This includes unbiased isolation of cells lacking
hematopoietic markers (“Baryawno” ([88]Baryawno et al., 2019) and
“In-house”) ([89]Figure S1) versus targeted isolation of populations of
interest as in the study by [90]Tikhonova et al. (2019). Furthermore,
not every cell type identified in one study is present in the other
datasets.
Figure 1.
[91]Figure 1
[92]Open in a new tab
Overview of the paper
Graphical brief description of the paper. See also [93]Figure S1.
We decided to focus our analysis on those bona fide niche populations
such as endothelial (EC) and mesenchymal (MSC) cells due to their
presence in the three studies and their relevance in controlling
hematopoietic stem cell (HSC) maintenance. For the integrative
analysis, we used the Tikhonova study as a reference to facilitate the
integration, considering that their cells were isolated based on
fluorescence reporter expression driven by cell-specific gene
promoters: VE-Cad for endothelial cells and LEPR for mesenchymal cells.
Therefore, to identify and label the cells of interest, we integrated
separately “In house + Tikhonova” and “Baryawno + Tikhonova”
([94]Figure 2A) using the cell labels from the Tikhonova study. As a
result, and after quality filters (see [95]STAR Methods), we labeled in
each dataset endothelial cells (N = 9587) and mesenchymal cells (N =
5291) that were used for integration of the three datasets for each
cell type.
Figure 2.
[96]Figure 2
[97]Open in a new tab
Data integration and high-resolution clustering strategy
(A) Integrated analysis of the bone marrow niche datasets (two publicly
available, [98]Tikhonova et al., 2019 and [99]Baryawno et al., 2019 and
one generated in our lab, in-house) separately for two well-defined
population (endothelial and mesenchymal cells). [100]Tikhonova et al.,
2019 dataset is used as a reference considering their separated cell
profiling strategy for COL2.3+, LEPR+, and VE-Cad+ populations. In the
top row, a UMAP projection is depicted for each single-cell RNA
dataset. In the left-lower (“In-house + Tikhonova”) and in the
right-lower (“Baryawno + Tikhonova”), datasets are integrated to
identify endothelial and mesenchymal populations.
(B) Clustering strategy: the analysis of endothelial cells as an
example. An upper limit to the cluster is set for the clustering (left
panel) using Louvain high-resolution clustering. Then, an iterative
divide-and-conquer strategy identifies the optimal level of clusters at
different levels: Level 1 (second panel from the left), Level 2 (third
panel), and Level 3 (fourth panel).
(C–E) The robustness analysis for sub-clustering B3 (from Level 2 to
Level 3). Specifically: (C) subclusters identified, (D) the fraction of
assignments to its original cluster using a random-forest +
bootstrapping strategy and (E) summary of the results (D) per cluster,
#correct indicates the times a cluster is a dominant cluster (see
[101]STAR Methods) for a cell within it in all pairwise comparisons
(see [102]Figure S2 for the sub-clustering analysis of A1 and A2). See
also [103]Figures S2 and [104]S3.
Following the integration, endothelial and mesenchymal groups were used
to identify cellular subtypes and stage-specific cells ([105]Figure 2,
[106]S2, and [107]S3). However, current state-of-the-art clustering
methodologies, including Louvain clustering ([108]Blondel et al., 2008;
[109]Traag et al., 2019), could not discriminate robustly among
different cell subtypes ([110]Figure 2B left panel and
[111]Figure S3A), in part because there is a high degree of
cell-to-cell similarity when considering cells of the same origin
([112]Tasic et al., 2016). To enable robust sub-clustering, we
customized an existing bootstrapping-based approach. In brief, first, a
divide-and-conquer strategy is applied, where the first level of robust
clusters is identified (see [113]STAR Methods). Next, we proceeded with
another round of clustering, yielding the second level of robust
subclusters. As a termination criterion, no sub-clustering was
considered if a cluster was found to have no sub-divisions in the
Louvain high-resolution clustering ([114]Blondel et al., 2008;
[115]Macosko et al., 2015) ([116]Figure 2B left panel and
[117]Figure S3A). As a result, the cells are grouped into clusters;
then, in the second step, we applied a bootstrapping-based methodology
adapted from the Bosiljka study ([118]Tasic et al., 2016) (see
[119]STAR Methods and [120]Figure S2A) to quantify the robustness of
each cluster. We formulated two evaluation metrics: for each cell, we
computed “how many times it has been correctly assigned to the cluster
proposed” (e.g. recall per cell in [121]Figure S2D), and for each
cluster, we quantified “how many times a cluster remains dominant
(#Correct) in all comparisons for cells within it” (e.g. #correct in
[122]Figure S2E); see [123]STAR Methods for a detailed explanation. If
a non-robust cluster was identified, the cells of such cluster were
then assigned to the neighboring clusters repeating the
random-forest-based strategy. For instance, in the analysis of ECs,
three levels of sub-clustering were conducted ([124]Figure 2B middle
and right panels). In Level 1, two clusters were identified (A and B),
and considering that possible subclusters were identified in the
Louvain high-resolution clustering ([125]Figure 2B, (second panel from
the left), they were sub-clustered resulting in Level 2. After Level 2,
only three subclusters were further investigated (A1, A2, and B3) in
Level 3 ([126]Figure 2B, third panel). Level 3 was identified as the
final level, and a robustness analysis was conducted for the Level 3
clusters ([127]Figure 2B fourth panel, C-E and [128]Figures S2C-S2H).
Non-robust clusters were eliminated and their cells were reassigned to
neighboring clusters (see [129]STAR Methods). Similarly, we applied the
same robust clustering to the mesenchymal stromal cells
([130]Figures S3A-S3H). By using this approach, we were able to
describe 14 subclusters in the endothelium ([131]Figure 2B fourth panel
and [132]Figure 3A) and 11 in the mesenchyme ([133]Figure S3A fourth
panel and [134]Figure 3C). We observed that both “cell proportions of
subclusters in each dataset” and “the cell proportions of the dataset
of origin within each subcluster” varied between datasets and
subclusters ([135]Figures 3B and 3D, [136]S2B, and [137]S3B). While
those biases could be caused for instance by the different selection
protocols, we aimed to study the extent of the bias: (a) we did not
identify any association with cell stage ([138]Figure 3B and 3D);
(b) only clusters B2 and D3 did not have cells from the three datasets.
And finally, (c) to quantify the possible biases derived from datasets,
we conducted an entropy analysis to evaluate dataset distribution in
the clusters; we observed large levels of entropy—varying between
clusters—and the averages per cluster are shown in [139]Table S1
([140]Figure S5).
Figure 3.
[141]Figure 3
[142]Open in a new tab
Quality control and added value of the clustering analysis
(A) Representation of the final endothelial clusters.
(B) Left panel: proportion of cells per dataset in each cluster in the
final Endothelial clustering analysis. Right panel: proportion of cells
per cell cycle stage using Seurat in each cluster in the final
Endothelial clustering analysis.
(C and D) Similar as (A and B) for MSC.
(E and F) The added value of the integrated approach. Upper panel:
every cell depicts the % of markers identified per cluster using only
one dataset when compared with the markers identified in the total
dataset. Middle panel: every cell depicts the % of False Negatives.
Bottom panel: every cell depicts the % of False Positives when
comparing the analysis conducted within each dataset with the
integrated analysis (considered as the correct result).
(E and F) are respectively associated with endothelial and mesenchymal
cells.
(G and H) Robustness of the cluster characterization using only cells
from a single dataset but maintaining the same cluster structure.
(G and H) are respectively associated with endothelial and mesenchymal
cells. See also [143]Figure S4.
By integrating the three datasets, distinct cellular states in the
microenvironment and the description of the gene markers defining those
subtypes were identified. The rationale being that despite the partial
overlap observed between the datasets, a larger number of cells would
generally contribute to a deeper characterization of the
subpopulations. To directly address this and using the markers derived
from the integrative analysis, we conducted a series of analyses to
assess whether the integration provided additional insights compared to
each dataset.
First, we investigated what percentage of final integrated-based
markers was recovered by each dataset by itself ([144]Figures 3E and 3F
upper panel), what percentage of false negatives ([145]Figures 3E and
3F middle panel) and false positives ([146]Figures 3E and 3F lower
panel). For some of the subclusters, such as B3.4, A2.1, and A2.6
(endothelium) or C2.1 and C3 (mesenchyme), over 50% of the markers
could not be detected by each dataset separately. In a second analysis,
we quantified the robustness of the defined clusters using data from
each study separately ([147]Figures 3G and 3H). Only the Baryawno
dataset allows for the robust identification of all the subclusters
except for B3.4 in the endothelium. However, performing the same
clustering strategy using only the Baryawno’s dataset in the
endothelium could not identify all the subclusters with the same level
of resolution compared to those observed with the integrated dataset
([148]Figure S4). We observed that the marker analysis using several
differential expression tools provided similar results (data not
shown), demonstrating that our approach is valid even for small size
clusters and unbalanced groups.
Next, we used an independent dataset to validate the performance of our
integration effort. To that end, we applied our clusters’ signature to
derive automatic annotations on an independent dataset ([149]Baccin
et al., 2020) using SingleR ([150]Figure S6). This allowed us to
discriminate in the independent dataset most of the cellular states we
described based on the integration despite the lower number of cells
sequenced ([151]Figure S6). These data further validate our integrative
approach.
Taken together, these data demonstrate that our customized approach for
the integration of multiple datasets allows for a robust deconvolution
of cell states when there is a high degree of similarity between cells
of the same origin. Furthermore, transcriptional cellular stages
inferred from the integration could be applied to further characterize
independent single-cell datasets.
Deep characterization of the BM endothelial cell compartment
Next, we aimed to characterize each of those 14 endothelial subclusters
([152]Figure 3A) based on the identified markers ([153]Table S2). Using
the expression of those molecular markers, we could discriminate
between arteries and sinusoids ([154]Figure 4A and [155]Table S3) in
agreement with previous reports ([156]Hooper et al., 2009; [157]Itkin
et al., 2016; [158]Rafii et al., 2016). Arterial clusters showed high
expression of specific arterial genes such as Ly6a, Ly6c1, Igfbp3, and
Vim ([159]Figure 4B upper panels). At the same time, sinusoidal cells
were defined by their characteristic signature expressing Adamts5,
Stab2, Il6st, and Ubd ([160]Figure 4B lower panels). Importantly,
besides these already known markers, differential expression analysis
of the integrative datasets revealed some novel genes specific of each
endothelial population, such as Igfbp7 and Ppia for arteries and Cd164
or Blrv for sinusoids ([161]Figure 4C). The expression of these genes
would be consistent with the role Igfbp7 and Blvrb in the maintenance
of endothelial vasculature homeostasis ([162]Klóska et al., 2019;
[163]Tamura et al., 2009).
Figure 4.
[164]Figure 4
[165]Open in a new tab
Deep characterization of the endothelial cell compartment in the BM
(A) UMAP representation of arteries (red) and sinusoids (blue) within
the endothelial cell population. The right-bottom subpanel depicts the
final endothelial clusters identified.
(B) Violin plot of gene expression for known markers of arteries and
sinusoids cell subtypes.
(C) Violin plot of gene expression for new marker candidates separating
arteries and sinusoids cell subtypes.
(D) Dot plot of the top 5 markers for each endothelial subcluster. Dot
size corresponds to the proportion of cells within the group expressing
each gene, and dot color corresponds to the average expression level.
(E) Selected set of gene sets derived from the gene set analysis
conducted with top 50 markers per cluster (see [166]STAR Methods).
(F) Final clustering of the endothelial cell population and the
labeling assigned based on marker genes and gene set analysis. See also
[167]Figure S4 and [168]Tables S2, [169]S3, and [170]S4.
Beyond characterizing arteries and sinusoids, we annotated their
respective cell states using cluster’s markers based on genes
([171]Table S2, [172]Figure 4D) and gene sets (derived from gene set
enrichment analysis, see [173]STAR Methods for a detailed version of
the annotation; [174]Table S4, [175]Figure 4E). Our final annotation
described in [176]Figure 4F was based in conserved Gene Ontology terms,
and supported, with no contradictory gene sets, from Reactome and KEGG
pathways analysis ([177]Table S4/data not shown). Gene sets related to
vasculature development and remodeling were identified within the
endothelial subclusters, confirming the identity of this cell
population ([178]Table S4). We uncovered two subclusters (A1.1 and
A2.5, respectively) within the arteries and sinusoids groups, which
were enriched in gene sets involved in wounding. This finding is in
agreement with the role of EC in pro- and antithrombotic processes
([179]Yau et al., 2015). Gene sets involved in extracellular matrix
assembly, cell adhesion, and migration processes were specifically
enriched in arteries, in line with the importance of these biological
processes for vascular morphogenesis ([180]Davis and Senger, 2005). In
relation to the structural support provided by arterial cells,
subcluster B3.2 (actin, endocytosis) implicated in matrix remodeling
was defined by the expression of RhoC, Apln, and Anxa2. Other arterial
subclusters such as ROS and Immune (A1.2 and B3.1, respectively)
include highly expressed gene sets involved in the regulation of
reactive oxygen species metabolic process and cytokine-mediated
signaling pathway. These findings are in line with the role of ECs in
maintaining the redox balance and leukocyte regulation ([181]Testa
et al., 2016; [182]Zhao et al., 2012). Sinusoidal-endothelial
subclusters such as A2.1 and A2.6 showed enrichment in GO terms related
to artery development and endothelial proliferation processes, two
critical steps within the process of angiogenesis ([183]Naito et al.,
2021). Furthermore, the sinusoidal subcluster A2.7 expressed gene sets
involved in ion transport and signaling-related signatures. This is in
concordance with the need of EC to constantly sense and adapt to
alterations in response to microenvironmental cues ([184]March et al.,
2009; [185]Quillon et al., 2015). Of note, ion channels play a role in
EC functions controlled by intracellular Ca^2+ signals, such as the
production and release of many vasoactive factors such as nitric oxide.
In addition, these channels are involved in the regulation of the
traffic of macromolecules, controlling intercellular permeability, EC
proliferation, and angiogenesis ([186]Nilius and Droogmans, 2001).
Importantly, several markers that were found only with the dataset
integration correspond to genes within the GO categories used to label
the clusters, hence, revealing their important role in defining the
function of these cell states. For example, in subcluster B2, genes
such as Gja1, Tgfb3, and Ablim2 are involved in regulating cell
junctions and cytoskeletal organization ([187]Barrientos et al., 2007;
[188]Okamoto and Suzuki, 2017).
Taken together, these results suggest a previously unrecognized level
of specialization of the bone marrow endothelial cells. Furthermore,
the specificity of the distinct functional states in the EC suggest
that the endothelial compartment is a more dynamic and flexible tissue
with a richer intrinsic repertoire than previously anticipated.
However, functional validation is required to confirm the putative role
of each EC intermediate cell state described here.
Deep characterization of the BM mesenchymal cell compartment
Applying the same robust clustering to mesenchymal stromal cells, we
identified 11 subclusters and proceeded with the annotation
([189]Figure 3C and [190]S3, and [191]Tables S5, [192]S6, and [193]S7).
Based on the expression of canonical markers, we first discriminated
clusters between early mesenchymal (MSC), and cells already committed
to the osteolineage (OLN-primed) ([194]Figure 5A and [195]Table S6).
The high expression of Cxcl12, LepR, Adipoq, or Vcam1, among others,
confirmed the identity of the early MSC group ([196]Figure 5B left
column); whereas Bglap, Cd200, Alpl, or Col1a1 expression revealed the
presence of osteolineage committed cells within the mesenchymal
compartment ([197]Figure 5B right column). Furthermore, we identified a
number of previously unrecognized, differentially expressed genes
between the MSC and OLN-primed clusters such as Sbds and Itgb1 for MSCs
and Enpp1 and Vkorc1, for OLN-primed cell type population
([198]Figure 5C). Itgb1, highly expressed in MSC is implicated in human
chondrogenic differentiation of mesenchymal cells ([199]Hamidouche
et al., 2009). Among OLN-primed specific markers, Enpp1 and Vkorc1 have
been shown to regulate bone development by regulating bone
calcification ([200]Hajjawi et al., 2014; [201]Mackenzie et al., 2012;
[202]Price, 1985; [203]Spohn et al., 2009).
Figure 5.
[204]Figure 5
[205]Open in a new tab
Deep characterization of the mesenchymal cell compartment in the BM
(A) UMAP representation of mesenchymal (red) and osteolineage-primed
(OLN-primed) (blue) within the mesenchymal compartment. The right-upper
subpanel depicts the final mesenchymal clusters identified.
(B) Violin plot of gene expression for known markers of mesenchymal
(red) and osteolineage-primed (blue) cells.
(C) Violin plot of gene expression for new marker candidates separating
mesenchymal (red) and osteolineage primed (blue) cells.
(D) Dot plot of the top 5 markers for each mesenchymal subcluster. Dot
size corresponds to the proportion of cells within the group expressing
each gene, and dot color corresponds to the average expression level.
(E) Selected set of gene sets derived from the gene set analysis
conducted with top 50 markers per cluster (see [206]STAR Methods). (∗)
transmembrane receptor protein serine/threonine kinase signaling
pathway.
(F) Final clustering of the mesenchymal cell population and the
annotation based on marker genes and gene set analysis. See also
[207]Figure S4 and [208]Tables S5, [209]S6 and [210]S7.
Besides MSCs and OLN-primed MSCs, we identified additional
subpopulations. Through the marker identification and gene set analysis
of the 11 subclusters ([211]Figures 5D and 5E and [212]Tables S5,
[213]S6, and [214]S7) we were able to characterize and label each of
the clusters as shown in [215]Figure 5F. GO terms such as adipogenesis,
assembly and organization, immune response, cell migration, or muscle
differentiation were enriched in the C2.3, C2.1, C1, C3, C4.2, and C4.1
subsets respectively, confirming the identity of this MSC cell group.
Furthermore, terms related to extracellular matrix, chondrocyte
differentiation, and bone development, including bone formation,
ossification, or epithelial migration, among others, were identified in
the OLN-primed subclusters (C5, C6, and D, respectively), verifying the
identity of these more mature cells (OLN-primed cells) within the
mesenchymal stromal cells.
Taken together, these results demonstrate that the newly identified
mesenchymal subpopulations could not be properly characterized without
the multi-dataset integration and a novel clustering approach. Further,
our data provide evidence of the heterogeneity of the mesenchymal
compartment in the BM.
Composition of the human endothelial and mesenchymal BM microenvironment
While our data reveal a previously unrecognized heterogeneity in the
murine BM endothelial and mesenchymal compartments, information about
the composition of the human microenvironment and how much of this
heterogeneity is observed in humans remains unanswered. To address this
issue, we performed scRNA-seq analysis in prospectively isolated EC
(TO-PRO-3^-, CD45^−, CD235^-, Lin^−, CD31^+, and CD9^+([216]Barreiro
et al., 2005; [217]Kenswil et al., 2018)) and MSC-OLN (TO-PRO-3^-,
CD45^−, CD235^-,Lin^−,CD271^+([218]Ghazanfari et al., 2016;
[219]Hashemi et al., 1991; [220]Quirici et al., 2002), and CD146^+/−)
([221]Figure S7A) from iliac crest BM aspirates from four healthy young
adults (20–30 years of age) ([222]Figure 6A). As described in
[223]Figure 6B, we added additional filtering steps in the
bioinformatic analysis to identify the two populations of interest, EC
and MSC. As an additional quality control, we estimated the
contribution of each human sample to EC and MSC subsets and cell cycle
status ([224]Figures S7B and S7C). The EC (907 and 658 cells, clusters
1 and 6, respectively) ([225]Figures 6C and [226]S7B middle panel)
identity was confirmed based on the expression of canonical endothelial
markers such as PECAM1 (coding for CD31), CD9, ICAM2, VLC, and ITGB1
([227]Figure 6D and [228]Table S8). In addition, examining functional
pathways in clusters 1 and 6 revealed enrichment in GO terms associated
with blood coagulation and hemostasis ([229]Table S9). The MSC identity
(249 cells, cluster 11, [230]Figures 6C and [231]S7B middle panel) was
confirmed by the expression of the mesenchymal specific genes (CXCL12
and LEPR) and the OLN-primed specific genes ANGPT1, COL1A1, and VCAM1,
among others ([232]Figure 6E and [233]Table S8). Furthermore,
enrichment in functions associated with extracellular matrix
organization and response to the mechanical stimulus was demonstrated
in osteolineage cells ([234]Table S9). In summary, the generated human
data suggest that single-cell RNA sequencing from iliac crest aspirates
can aid in describing the complexity of the human BM microenvironment.
Nevertheless, the limited number of EC and MSC, as well as the presence
of contaminating populations did not allow a fine-grained clustering as
the one performed in the mouse data.
Figure 6.
[235]Figure 6
[236]Open in a new tab
Composition of the human endothelial and mesenchymal BM
microenvironment
(A) Experimental design for the human BMM characterization.
(B) Scheme of customized bioinformatics pipeline filtering the cells
with a large number of Immunoglobulin genes.
(C) UMAP visualization of color-coded clustering of the human BM
microenvironment after filtering cells.
(D and E) Expression of representative markers for endothelial
population (D) and mesenchymal-osteolineage cells (E) using an UMAP
representation. See also [237]Figure S5 and [238]Tables S8 and [239]S9.
Substantial conservation of the EC and MSC population in the human BM
microenvironment
Based on the limitations of the human data, we next investigated to
what extent the knowledge uncovered in mouse could be applied to
identify subpopulations/cell states in the human BM microenvironment.
As a first step, we used single-cell mouse data to annotate the human
cells using SingleR ([240]Aran et al., 2019), separately for EC
([241]Figures S8A and S8B) and MSC ([242]Figures S8C and S8D). We
observed that MVG identified in mouse allowed us to separate the cells
into clusters for both human EC and MSC ([243]Figures S6A–S6D). As a
result, this analysis suggests that part of the biological mechanisms
defining the BM microenvironment may be shared between species.
Therefore, we decided to investigate the enrichment of conserved
features (genes) between mouse and human; therefore, an enrichment
score (ES) was computed for each cluster for EC and MSC separately (see
[244]STAR Methods) ([245]Figure 7A and [246]Tables S10 and [247]S11).
In the case of endothelial cells, the enrichment score was up to 2-fold
([248]Figure 7A): wounding (A1.1) with 2.15-ES, the junction (B2)
2.26-ES, arteriogenesis (A2.1) 2.02-ES, and signaling (A2.7) 2.5-ES).
Importantly, for some of the subclusters as sinusoidal signaling (A2.7)
and the arterial of angiogenesis (B3.3), these shared genes are
critical for defining each of those EC functional states
([249]Table S10). Both human and mouse ECs express DDIT4, JUN, CITED2,
GADD45G, DUSP1, FOS, and CLDN5, which are part of a wide array of
transcription factors, growth factors, and signaling pathways that have
been described to regulate the maintenance of vascular homeostasis
under physiological conditions ([250]Echavarria and Hussain, 2013;
[251]Escudero-Esparza et al., 2012; [252]Jia et al., 2016). Similarly,
ECs subclusters involved in angiogenesis in both species shared the
expression of RGCC, GATA2, KLF2, and CAV2 genes, which are known to be
implicated in angiogenic-related processes ([253]Lee et al., 2006;
[254]Linnemann et al., 2011).
Figure 7.
[255]Figure 7
[256]Open in a new tab
Conservation analysis of the EC and MSC population in the human BM
microenvironment
(A and B) Quantification of the conservation for EC (A) and MSC-OLN
(B) cells for each cluster. The enrichment of those genes that are
cluster markers in mouse and observed in most variable genes (MVG) of
EC and MSC human cells, respectively. The right column shows, among the
genes identified in human, those that are part of the gene sets used to
label the cluster. See also [257]Figure S6 and [258]Tables S10 and
[259]S11.
(C) Heatmap of the cytokines, secreted molecules, and growth factors in
EC and MSC.
In the case of mesenchymal cells, we identified >3.5-ES for three
subclusters, such as RNA-Myogenesis (C4.1), Factors-Immune (C4.2), and
ossification (D2). Importantly, some of the shared genes correspond to
genes that allowed the subcluster labeling through GO categories
([260]Figure 7B and [261]Table S11). Genes such as CXCL12, APOE, or
IGFBP3 are associated with cell migration and lipid transport pathways
among others ([262]Amable et al., 2014; [263]Robert et al., 2020), and
characterized the murine adipogenesis subcluster (C2.3). Other genes,
such as IGFBP5, are involved in actin filament assembly and
organization ([264]Sureshbabu et al., 2012) and defined the cell
adhesion subcluster (C2.1). IFIT3, MIDN, and ILR1 belong to pathways
associated with interferon regulation or autoimmunity ([265]Guo et al.,
2017; [266]Kim et al., 2020) and were identified in the immune response
subclusters (C3 and C4.2). Additionally, the expression of COL5A1 and
CADM1 genes, previously related to collagen fibril organization and
bone mineralization processes ([267]Kahai et al., 2004; [268]Nakamura
et al., 2017), defined the bone formation subcluster (D1). Moreover,
genes such as SPP1 or CLEC11A, which are related to osteoblasts
function and mineralization ([269]Huang et al., 2007; [270]Shen et al.,
2021; [271]Yue et al., 2016), defined the mouse OLN-primed MSCs
subcluster associated with ossification (D2) and are also expressed in
human MSC-OLN cluster. Altogether, this data indicate the conservation
of the osteogenic microenvironment between both species. To further
explore the conservation of the BM microenvironment between both
species, we examined the expression of known niche factors that
regulate hematopoiesis. Despite the limited number of human cells, we
were able to identify common niche-derived factors specific to the
endothelium (SPARC, TGFB1, SELP, and EGFL7 among others) and the
mesenchyme (including CXCL12, RARRES2, KITLG, and GAS6)
([272]Figure 7C). These cytokines, secreted molecules and growth
factors, represent key regulators of HSPCs shared among species.
Together, our analysis suggests that deep characterization of cellular
states in mice can be used to infer conserved features in the human BM
microenvironment despite a low level of conservation in the actual
transcriptional profile of EC and higher in the case of MSC.
Importantly, our data reveal a substantial degree of conservation
regarding the complexity and heterogeneity of the EC and MSC
compartment in the BM between mouse and human. This suggests that the
layers of microenvironmental regulation of hematopoiesis and the
identified plasticity in mice may also be shared between species.
Discussion
Our study dissects the intrinsic organization and the heterogeneity
within the endothelial (EC) and mesenchymal cell populations (MSC)
governing the BM microenvironment. This was accomplished through
customized bioinformatics integration of multiple datasets along with
the inclusion of over 50.000 murine bone marrow stromal cells. We were
able to identify new subsets of MSC and EC but, but more importantly,
to define new molecular markers for the identification of highly
specialized subpopulations of cells in the BM microenvironment. Pathway
enrichment analysis unveiled multiple, potentially transient cell
states defined by differential gene expression and the enrichment of
specific functional characteristics. Importantly, EC subsets were
characterized by enrichment in pathways known to be essential for
endothelial homeostasis maintenance, demonstrating a high degree of
specialization in the endothelium. Similarly, multiple transient cell
states in the MSC compartment were defined and characterized by their
differentiation capacity. Importantly, our deep deconvolution of the
heterogeneous mesenchymal and endothelial compartments became feasible
only by integrating multiple datasets. Of note, our analysis showcases
that a research paradigm aiming for the generation of a detailed
comprehensive molecular atlas of an organism requires both multi-omic
data and computational integration. Here, we have relied on what is
referred to as unpaired unimodal (scRNA-seq) data. Clearly, a natural
next step is to use and further develop new computational tools that
permit the integration ([273]Argelaguet et al., 2021) of unpaired
multi-omics datasets such as scRNA-seq, scATAC-seq, and other data
modalities. Recent technological developments enable several multiple
omics recorded from the same cell, i.e. paired data, which leverages
our ability to dissect and molecularly characterize the intrinsic
organization of the bone marrow niche environment. Advances in
computational biology have started to develop such tools ([274]Ashhurst
et al., 2021; [275]Hao et al., 2021; [276]Martinez-de-Morentin et al.,
2021; [277]Wu et al., 2021). Moreover, some validation experiments,
such as the use of fluorescent reporters, targeting niche-associated
genes, and functional studies will allow us to confirm the identified
new molecular markers based on differential gene expression and also
the related annotated pathways.
While our study did not directly address the influence between stromal
cells in the hematopoietic stem cell niche and the HSCs, the deep
resolution of our study allows for some inferences to be made. Of note,
we detected the expression of vascular endothelial growth factor-C
(Vegf-c) in mouse endothelial and mesenchymal cells ([278]Tables S2,
[279]S5, [280]S6, [281]S8, and [282]S11). Vegf-c has recently been
implicated in the maintenance of the perivascular niche and the
recovery of hematopoiesis upon injury ([283]Fang et al., 2020). Vegf-c
is specifically expressed in the endothelial B1 and mesenchymal C1
subpopulations, suggesting an important role of these specialized
endothelial and mesenchymal cells in the preservation of the integrity
of the perivascular niche. In addition, Apelin+ (Apln) endothelial
cells have been recently implicated in HSC maintenance and regeneration
upon injury ([284]Chen et al., 2019a). Importantly, two endothelial
subclusters (B3.2-actin endocytosis and A1.1-wounding arteries)
demonstrated expression of Apln, suggesting that these EC states
represent specific sources of hematopoietic support and vascular
regeneration upon injury. Osteolectin+ LepR–+mechanosensitive
peri-arteriolar mesenchymal cells with osteogenic potential—are
implicated in lymphoid, but not HSC maintenance ([285]Shen et al.,
2021). Importantly, osteolectin expression defines murine cluster D2
(ossification) and shows conservation in human MSCs, suggesting the
preservation of a specialized lymphoid niche between species.
Detailed characterization of the human BM niche has not yet been
addressed. Approaches undertaken in a mouse system cannot readily be
transferred to the human system. Furthermore, differences in sample
processing can also impact the results. In that sense, our results,
despite the low number of cells, may represent the first dataset that
includes scRNAseq from the human endothelial and mesenchymal BM
microenvironment. While we were able to identify mesenchymal and
endothelial cells based on canonical markers shared with mice
([286]Aoki et al., 2021; [287]Baryawno et al., 2019; [288]Kalucka
et al., 2020; [289]Leimkühler et al., 2021; [290]Matthews et al., 2020;
[291]Severe et al., 2019; [292]Stumpf et al., 2020; [293]Tikhonova
et al., 2019; [294]Wang et al., 2020; [295]Xie et al., 2021), our human
scRNAseq did not possess enough resolution to elucidate the
heterogeneity of the human BM stroma to the same level as with the
mouse data. Based on the extensive knowledge generated in the mouse, we
therefore focused on characterizing how much of the information and
targets from the mouse can be of interest in human characterization.
This analysis allowed us to identify the expression of the human
orthologs to the murine cluster-defining genes with different degrees
of enrichment in the endothelium and mesenchyme. Some of these shared
genes in mice and human stromal cells corresponded to the GO-defining
genes of the different clusters identified in the mouse. Further,
analysis of niche factors produced by ECs or MSCs and known to regulate
hematopoiesis in mice were found conserved in the human samples. These
findings suggest a meaningful degree of conservation regarding the
cellular states that define the stromal microenvironment in mouse and
human. Although additional studies and improved processing of human
samples will be required for deep characterization of the human BM
microenvironment, these preliminary results validate our integrative
cross-species approach.
As an example of the added value, the current study identifies
candidates of relevance in the study of BM-related diseases. Sbds, a
ribosome maturation protein associated with the Shwachman-Diamond
syndrome, represents a previously unrecognized marker of immature MSCs
based on the dataset integration. Sbds deficiency has been implicated
in ossification defects and metabolic changes in HSPCs
([296]Raaijmakers et al., 2010; [297]Zambetti et al., 2016),
potentially contributing to myelodysplasia and AML onset in patients
with Shwachman-Diamond syndrome. Future studies will help to improve
the understanding of these new candidates and the pathogenesis
associated. On a broader note, deep molecular analysis of the BM
microenvironment sets the stage for computational disease modeling
([298]Tegnér et al., 2009) from a systems medicine perspective.
Taken together, our study provides a deeper understanding of the
composition and specialization of the BM microenvironment and point
toward a substantial degree of conservation between species. Moreover,
we demonstrate the usefulness of the multi-dataset integration and the
customized clustering approach used in our study to improve the
resolution of complex tissues and organs. This approach promises to aid
in the construction of cell atlases by reducing the resources
associated with sequencing that a single lab will need to invest in
order to obtain meaningful depth in single-cell analysis while reducing
the biases that may arise from data generated from a single laboratory
or platform.
Future studies integrating genome, transcriptome, epigenome, proteome,
and anatomical positioning together with functional assays to correlate
descriptive phenotypes with functional data will help fully resolve the
composition, regulation, and connectivity in the BM microenvironment in
health and disease.
Limitations of the study
This study provides the most comprehensive description to date of the
murine HSC-regulatory microenvironment and suggests a higher level of
specialization of the cellular circuits than previously anticipated.
Moreover, this knowledge generated in the mouse allows the inference in
human demonstrating substantial conservation between species. However,
these studies are limited by the number of cells sequenced, the
difficulty of obtaining high-quality samples with sufficient stromal
cell numbers from healthy individuals, as well as the presence of
contaminating populations in human data. The integrative analysis has
limitations to account for the different.
Abbreviations
* HSC Hematopoietic stem cell
* HSPC Hematopoietic stem and progenitor cells
* scRNA-seq Single-cell RNA sequencing
* BM Bone marrow
* EC Endothelial cells
* MSC Mesenchymal cells
* OLN-primed Osteolineage
* MVG Most variable genes
STAR★Methods
Key resources table
REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies
__________________________________________________________________
Biotin anti-mouse lineage cocktail antibody (Mac1, CD3, Gr1, B220 and
TER-119) BIOLEGEND-DEDHAM Cat#133307
APC/Cyanine7 Streptavidin BIOLEGEND-DEDHAM Cat#405208
Brilliant Violet 510 anti-mouse CD45 antibody (clone 30F11)
BIOLEGEND-DEDHAM Cat#103137
APC anti-mouse CD45 antibody (clone 30F11) BIOLEGEND-DEDHAM Cat#103112
APC anti-mouse TER-119 antibody BIOLEGEND-DEDHAM Cat#116212
Brilliant Violet 510 anti-human CD3 antibody BIOLEGEND-DEDHAM
Cat#317332
Brilliant Violet 510 anti-human CD19 antibody BIOLEGEND-DEDHAM
Cat#363020
V500 anti-human CD45 antibody (Clone HI30) BD horizon Cat#560777
Brilliant Violet 510 anti-human CD64 antibody BIOLEGEND-DEDHAM
Cat#305028
BV421 anti-human CD45 antibody BIOLEGEND-DEDHAM Cat#304032
BV421 anti-human CD235 antibody BIOLEGEND-DEDHAM Cat#349108
FITC anti-human CD31 antibody BIOLEGEND-DEDHAM Cat#303104
APC/FIRE 750 anti-human CD9 antibody BIOLEGEND-DEDHAM Cat#312113
PE 750 anti-human CD146 antibody BIOLEGEND-DEDHAM Cat#361005
PERCP Cy5.5 anti-human CD271 antibody BIOLEGEND-DEDHAM Cat#345112
__________________________________________________________________
Biological samples
__________________________________________________________________
Bone marrow aspirates healthy young donors N/A
__________________________________________________________________
Chemicals, peptides, and recombinant proteins
__________________________________________________________________
STREPTAVIDIN MICROBEADS MILTENYI BIOTEC, S.L. Cat#130-048-101
Vybrant DyeCycle Orange Stain ThermoFisher Scientific Cat#V35005
ANNEXIN V FITC PHARMINGEN Cat#556419
7AAD BD Bioscience Cat#559925
TO-PRO-3 IODIDE (642/661) THERMO FISHER SCIENTIFIC S.L. Cat#T3605
__________________________________________________________________
Critical commercial assays
__________________________________________________________________
Miltenyi LD columns MILTENYI BIOTEC, S.L. Cat#130-042-901
__________________________________________________________________
Deposited data
__________________________________________________________________
Raw and analyzed data This paper Single Cell Portal: SCP1747
__________________________________________________________________
Experimental models: Organisms/strains
__________________________________________________________________
Female C57BL/6J mice, CD45.2 Jackson Laboratory Cat#000664
__________________________________________________________________
Software and algorithms
__________________________________________________________________
FlowJo (version 10.7.1) software N/A N/A
R (version 4.0.3, 3.6.3) R core [299]https://www.r-project.org/
Seurat (version 4.0.0, 3.2.3) [300]Stuart et al. (2019)
[301]https://github.com/satijalab/seurat/
SCTransform [302]Hafemeister and Satija (2019)
[303]https://github.com/satijalab/sctransform
IKAP [304]Chen et al. (2019b) [305]https://github.com/GenomicsNX/IKAP
CellRanger (version 6.0.1) 10× Genomics [306]www.10xgenomics.com/
SingleR (version 1.4.1) [307]Aran et al. (2019)
[308]https://github.com/dviraran/SingleR
ClusterProfiler (version 3.18.1) [309]Yu et al. (2012)
[310]https://guangchuangyu.github.io/software/clusterProfiler/
[311]Open in a new tab
Resource availability
Lead contact
Further information and requests for resources and reagents should be
directed to and will be fulfilled by the lead contact, Dr. David
Gomez-Cabrero ([312]david.gomezcabrero@kaust.edu.sa).
Materials availability
Requested materials are available from the [313]lead contact.
Experimental model and subject details
For in vivo animal studies, six females C57BL/6J mice (CD45.2) at age
20 weeks were used for Single-Cell RNA-seq experiments. Animals studies
were carried out in accordance with the European Communities Council
Directive (2010/63/UE) and with the approval of the Ethical Committee
for Animal Testing of the University of Navarra. The number of mice per
experiment was calculated to adhere to the “Replace, refine and reduce”
rule for animal use in experimental procedures.
For human studies, a total volume of approximately 60 mL bone marrow
(BM) was obtained by aspiration from the posterior iliac crest from
four healthy young adults (20–30 years of age) after written informed
consent was achieved. The human sample collection and research
conducted in this study were approved by the Research Ethics Committee
of the University of Navarra (project 2017.218). Personal data was kept
confidential in accordance with the Organic law 3/2018, on personal
data protection and Spanish law 14/2007 on Biomedical research. All
collection samples are codified, and only the authorized personnel were
able to correlate the patient’s identity with the codes.
Method details
Isolation and FACS sorting of murine bone marrow microenvironment cells
Six mice were euthanized via CO[2] asphyxiation. Bones from humerus,
radius, iliac crests, femurs and tibia were harvested in PBS 1X
containing 2% FBS and 2 mM EDTA (modPBS). All steps were performed on
ice to preserve cell viability and RNA integrity. Muscles and soft
tissue were thoroughly removed from the bones and bone marrow (BM)
cells were obtained by crushing in modPBS. Cells were then filtered
through a 70 μm cell strainer and red blood lysed with ACK buffer
(NH[4]Cl 150 mM, KHCO[3] 10 mM, and Na[2]EDTA 0.1 mM) for 10 min at
room temperature (RT) with rotation. Remaining calcified bone fragments
were collected on a 50 mL conical tube and digested with the
appropriate volume of PBS with 0.3% collagenase I and dispase (5 U/mL)
during 15 min at 37°C and shaking at 200 rpm. FBS representing 10% of
the digestion volume was added to stop the collagenase digestion. After
digestion, the calcified and crushed fractions were filtered through a
70 μm filter into a collection tube and pooled into one sample. Cells
were subsequently stained for 20 min on ice first in the appropriate
volume of modified PBS 1X (3 mL/mouse) with 160 ul/mouse of
biotinylated lineage cocktail (Mac1, CD3, Gr1, B220 and TER119)
followed by incubation with streptavidin magnetics microbeads
(100 μL/mouse). Negative selection was performed using Miltenyi LD
columns according to manufacturers’ protocol. After selection, the
sample was stained with the following combination of conjugated
antibodies at a concentration of 1/200: APC-Cy7 labeled streptavidin,
BV510 labeled anti-CD45, APC labeled anti-CD45 and APC labeled
anti-TER-119. Samples were then stained with 0.05 μM of Vybrant dye
orange (VDO) at 37°C for 30 min to label living cells. Annexin V was
also added, in combination with 7AAD to discard apoptotic and dead
cells from the sample respectively. For annexin V staining, cells were
stained with 1 μL/mouse of Annexin V-FITC on an appropriate volume of
1X Annexin V binding buffer in the dark for 15 min at RT. Samples were
then resuspended in 1X Annexin V buffer and 5 μL of 7AAD dye (up to
1 × 10^6 cells) were added. BM non-hematopoietic cells were FACS sorted
using BD FACSAria II sorter collected in PBS 1X supplemented with 0.05%
UltraPure BSA and cell viability was assessed using Nexcelom
Cellometer. Data were analyzed by FlowJo (version 10.7.1) software.
Isolation and FACS sorting of human bone marrow endothelial and
mesenchymal-osteolineage cells
All sample processing steps were performed on ice to preserve cell
viability and RNA integrity. A total volume of approximately 60 mL of
bone marrow was obtained by aspiration from the posterior iliac crest.
Red blood cells were lysed twice with 45 mL of ACK buffer per 5 mL of
human sample during 15 min at RT with rotation. Sample was then
filtered through a 70 μm cell strainer, centrifuged, and stained for
30 min on ice with the following combination of conjugated antibodies
at a concentration of 1/100 except anti-Lin (3ul/test- test
25 × 10^6cells): BV510 labeled anti-Lin (including CD3, CD10, CD19,
CD45 and CD64), BV421 labeled anti-CD235, BV421 labeled anti- CD45,
FITC labeled anti-CD31, APC-Cy7 labeled anti-CD9, PE labeled anti-CD146
and PerCP- Cy5.5 labeled anti-CD271. Dead cells and debris were firstly
excluded by FSC, SSC and adding 10 μL of TO-PRO-3. BM niche populations
were prospectively isolated based on the following immunophenotype:
ECs: TO-PRO-3^-/Lin^−/CD45^-/CD235^-/CD9^+/CD31^+ and MSCs:
TO-PRO-3^-/Lin^−/CD45^-/CD235^-/CD31^-/CD271^+/CD146^+/−. FACS sorting
was performed on a BD FACSAria II sorter, sorted BM niche cells were
collected in PBS 1× and 0.05% Ultra-Pure BSA and cell viability was
determined using Nexcelom Cellometer. Data were analyzed by FlowJo
(version 10.7.1) software.
Profiling by single-cell RNA-sequencing (scRNA-seq)
scRNA-seq was performed using the Single Cell 3′ Reagent Kits v3.1 (10X
Genomics) according to the manufacturer’s instructions. For human
samples, endothelial and mesenchymal cells were pooled before scRNA-seq
was performed. Approximately 15,000 cells were loaded at a
concentration of 1,000 cells/μL on a Chromium Controller instrument
(10X Genomics) to generate single-cell gel bead-in-emulsions (GEMs). In
this step, each cell was encapsulated with primers containing a fixed
Illumina Read 1 sequence, followed by a cell-identifying 16 bp 10X
barcode, a 10 bp Unique Molecular Identifier (UMI) and a poly-dT
sequence. A subsequent reverse transcription yielded full-length,
barcoded cDNA. This cDNA was then released from the GEMs, PCR-amplified
and purified with magnetic beads (SPRIselect, Beckman Coulter).
Enzymatic Fragmentation and Size Selection was used to optimize cDNA
size prior to library construction. Illumina adaptor sequences were
added, and the resulting library was amplified via end repair, A-
tailing, adaptor ligation and PCR. Libraries quality control and
quantification was performed using Qubit 3.0 Fluorometer (Life
Technologies) and Agilent’s 4200 TapeStation System (Agilent),
respectively. Sequencing was performed in a NextSeq500 (Illumina) (Read
1: 26 cycles, i7 Index: 8 cycles, Read 2: 49 cycles) at an average
depth of 60,000 reads/cell in mice and 30,000 reads/cell in human.
Single-cell RNA-seq analysis of mouse samples
See extended details and code in the following Github:
[314]https://github.com/TranslationalBioinformaticsUnit/BMN_characteriz
ation.
Sample selection
Sample [315]GSM3674224, [316]GSM3674225, [317]GSM3674226,
[318]GSM3674227, [319]GSM3674228, [320]GSM3674229 from [321]GSE128423
by Baryawno, sample [322]GSM2915575, [323]GSM2915576, [324]GSM2915577
from [325]GSE108891 by Tikhonova and one in-house mouse bone marrow
niche sample was included in this analysis.
Filtering
The single cell analysis of mice samples analysis was performed using R
(version 4.0.3 for human, 3.6.3 for mouse) and Seurat (version 4.0.0
for human, 3.2.3 for mouse)([326]Stuart et al., 2019).Three bone marrow
niche samples were filtered individually based on the 10^th and 90^th
quantile of number of features and counts. Cells with more than 5%
mitochondrial genes were also removed. Each dataset was normalized
using SCTransform function ([327]Hafemeister and Satija, 2019) from
Seurat package separately.
Pairwise integration and selection of the target population
In-house dataset and Baryawno were integrated with Tikhonova separately
using IntegrateData function from Seurat (version 3.2.3.). Using as a
reference the annotation from Tikhonova dataset, cells that aligned
with LEPR+ cells and VE-Cad+ cells were annotated as MSC and EC
respectively. MSC-like cells and EC-like cells from different datasets
were normalized again and integrated using Seurat without further
filtering.
Clustering
After filtering and quality control, a divide-and-conquer strategy was
applied to the clustering of mouse ECs and MSCs separately. Firstly,
following integration, dimension reduction with principal component
analysis (PCA), data visualization with Uniform Manifold Approximation
and Projection (UMAP), computation of K-nearest neighbors and an
initial Louvain clustering using resolution of 1 were performed as a
reference of high-resolution limit. Secondly, IKAP ([328]Chen et al.,
2019b) was applied to each integrated dataset as “level 1” clustering
([329]Figure 2B). Each cluster from level 1 was then compare with the
high-resolution reference. The cluster from level 1 was further divided
using IKAP to level 2 if the cluster were far from cluster limit. The
process would be repeated until at least one cluster reach cluster
limit.
Cluster evaluation
To evaluate the stability of these clusters, a bootstrapping strategy
was adopted ([330]Tasic et al., 2016). The script was rewritten for our
pipeline, and we further defined two metrics: (a) recall per cell and
(b) #Correct (details below) for each cluster to obtain better
demonstration and quantification of the cluster robustness.
The strategy was conducted in a pairwise manner with basic steps as
follow:
* 1.
Select two clusters, randomly split the clusters to five equal
groups and use one group of cells (20%) as testing dataset.
* 2.
Identify the set of differentially expressed genes (DEGs) between
the 2 clusters using the Wilcoxon Rank Sum test ([331]Wilcoxon,
1945).
* 3.
Train a random forest classifier with 20 genes selecting the top 10
DEGs from each cluster based on average log2 fold change.
* 4.
Applied the classifier to the 20% testing dataset.
* 5.
Repeat step1-4 for five times for different groups such that each
cell in these two clusters was classified once.
* 6.
Repeat step1-5 nine more times.
* 7.
Repeat step1-6 for all cluster pairs
There are three types of results that can be summarized from this
bootstrapping strategy:
* 1)
Dominant cluster identification: A dominant cluster is the cluster
to which the cell is assigned for more than half the runs.
* 2)
Number of Correct Dominance Assignment (#Correct): The sum of times
the dominant cluster matches the original cluster (true positive)
for a cell across all cluster pair comparisons ([332]Figures S2E
and S2H).
* 3)
Recall Per Cell: The proportion of correct assignment (positive
result) to its original cluster in all runs from all comparisons
([333]Figures S2D and S2G).
Clusters where more than 50% of the cells has been “incorrectly”
assigned robustly at least once to another dominant cluster will be
considered unstable ([334]Figures S2F and S2H, cluster A2.2). The cells
of such cluster will be assigned to other clusters (see dominant plots,
[335]Figure S9 as an example) using – to that end - a random forest
classifier as described before.
To evaluate the mixing of three dataset in each cluster after the
integration, Shannon entropy was calculated. For each cell
[MATH: j :MATH]
in a cluster, the probability of them coming from any of the datasets
(d = 1,2,3) is denoted as
[MATH: pjd
:MATH]
, the entropy for this cell was calculated as
[MATH: Hj=−∑<
mi>d=13pjdlogpjd :MATH]
.
Gene set analysis
After clustering, DEGs of each cluster were identified using a Wilcoxon
Rank Sum test. For each cluster, an over-representation analysis of GO
gene set enrichment analysis was conducted using the top 30, 50 and 100
DEGs based on the average log transformed fold changes using
clusterProfiler (version 3.18.1)([336]Yu et al., 2012). Additionally,
to further unveil the specialization of those subclusters, gene sets
were also computed within the sub-divisions of each cluster at the last
clustering level with the top 30, 50, 100 DEGs. Final annotation of
each cluster was manually assigned based on conserved gene terms within
all the analysis.
Added value analysis
Added value 1
Comparing the DEGs defined by individual dataset and integrated
dataset. The individual datasets were normalized with SCTransform and
DEGs were identified within top 3000 most variable genes using a
Wilcoxon Rank Sum test. For integrated dataset, DEGs were identified
within top 3000 most variable genes from integrated assay using a
Wilcoxon Rank Sum test. False negative and false positive rate were
calculated by comparing the DEGs identified by integrated dataset and
individual dataset ([337]Figures 3E and 3F).
Added value 2
Cluster stability evaluation for individual dataset. To understand if
the clusters identified from three datasets can remain stable within a
single dataset or not, the bootstrapping strategy was applied to each
dataset with the annotation identified by the integrated dataset.
Added value 3
Comparing cluster identified by a single dataset. To further understand
if the clusters can be identified by one dataset only, Baryawno dataset
was used as an example considering its large cell populations. The same
pipeline from normalization to bootstrapping was applied to this
dataset and the clusters identified from this single dataset was
compared with the clusters identified by three datasets using Jaccard
index.
Added value 4
Application of cluster identity in a separate dataset ([338]Baccin
et al., 2019). The clusters identified were used as reference to
annotate ECs and MSCs from Baccin data using singleR (version
1.4.1)([339]Aran et al., 2019).Normalized counts in “RNA” assay from
Seurat object were used for this analysis.
Single-cell RNA-seq analysis of human samples
Preprocessing of sequencing data: Preprocessing of single-cell RNA-seq
data for each in house sample were conducted by CellRanger count from
Cell Ranger (version 6.0.1) using reference genome GRCh38.
Sample filtering
The single cell analysis of human analysis was performed as described
in before except for human cells with more than 10% mitochondrial genes
were also removed. Because the exploratory data analysis revealed
potential contamination of B cells, we applied an additional filter:
cells with more than 10% reads mapped to immunoglobin genes were
excluded from downstream analysis.
Integration
After filtering, each sample was normalized using SCTransform and
integrated using 3000 most variable features using Seurat. Following
integration, dimension reduction with PCA, data visualization with
UMAP, computation of K-nearest neighbors with 20 dimensions and
clustering using resolution of 0.4 were performed.
Select EC and MSC (without further integration)
Clusters were annotated based on biological insights on markers.
Cluster 11 were identified as mesenchymal and cluster 1, 6 were
identified as endothelial. During the exploratory analysis, human EC
cells were subclustered at resolution 0.4 and one of the clusters
identified was further filtered for the downstream analysis because the
cells in the cluster were not expressing EC marker genes. Several
outliers from human MSC cells were also removed.
Compare human MVGs with mouse DEGs
3000 MVGs for human EC and MSC were identified using “RNA” assay and
these genes were scaled in “integrated” assay resulting in 932 human EC
specific MVGs and 976 human MSC specific MVGs. The MVGs from human EC
or MSC were compared with DEGs from each mouse cluster. The enrichment
score for a given cluster i was defined as the ratio between “the
number of genes shared between human MVGs and mouse DEGs from cluster
i” and “the expected number of genes”, where the later was computed as
follows:
[MATH: MVGhuman∩MVGmouseMVGmouse×li :MATH]
Where
[MATH: li :MATH]
is the number of DEGs from mouse cluster i.
SingleR analysis between mouse and human
To annotate human cells using mouse clusters as reference the singleR
tool (version 1.4.1)([340]Aran et al., 2019) was utilized for ECs and
MSCs separately. Cell type specific MVGs with expression values in
“integrated” assay from Seurat object were used for this analysis.
Quantification and statistic analysis
Statistical analysis in this study (t-test and comparing proportions)
were conducted using R.
Ethics approval
All animal experiments were performed in accordance with national and
institutional guidelines and procedures were approved by the Ethical
Committee for Animal Testing of the University of Navarra.
Acknowledging the principles of 3Rs (Replacement, Reduction and
Refinement), all mice used in this study were from mice that were
euthanized by cervical dislocation as parts of other ongoing ethically
approved experiments.
The human sample collection and research conducted in this study were
approved by the Research Ethics Committee of the University of Navarra.
All the protocols used in this study were in strict compliance with the
legal and ethical regulations.
Acknowledgments