Abstract Hematopoietic Stem/Progenitor cells (HSPCs) are endowed with the role of maintaining a diverse pool of blood cells throughout the human life. Despite recent efforts, the nature of the early cell fate decisions remains contentious. Using single-cell RNA-Seq, we show that existing approaches to stratify bone marrow CD34+ cells reveal a hierarchically-structured transcriptional landscape of hematopoietic differentiation. Still, this landscape misses important early fate decisions. We here provide a broader transcriptional profiling of bone marrow lineage negative hematopoietic progenitors that recovers a key missing branchpoint into basophils and expands our understanding of the underlying structure of early adult human haematopoiesis. We also show that this map has strong similarities in topology and gene expression to that found in mouse. Finally, we identify the sialomucin CD164, as a reliable marker for the earliest branches of HSPCs specification and we showed how its use can foster the design of alternative transplantation cell products. Subject terms: Computational biology and bioinformatics, Haematopoietic stem cells, Stem-cell differentiation __________________________________________________________________ Human Hematopoietic stem and progenitor cells (HSPCs) are commonly defined by CD34 expression. Here, the authors map single-cell RNA states both inside and outside the CD34 compartment, uncovering previously unappreciated branchpoints and validating CD164 as an efficient marker for early HSPCs. Introduction In humans, there have been conflicting proposals for the hierarchical relationships linking different hematopoietic progenitors^[40]1–[41]7. In the conventional depiction of human haematopoiesis, supported by lineage-tracing studies in the mouse^[42]8, the earliest branching splits lymphoid vs myelo/erythroid fate commitment. Conversely, in a recent challenge of the classical view, it has been suggested that multipotent progenitors could undergo a very early fate decision towards the megakaryocyte lineage followed by a single step-wise transition to either erythroid, myeloid, or lymphoid commitment^[43]9. The advent of single-cell RNA sequencing (scRNA-Seq) has not only created an opportunity to improve our understanding of the nature of human haematopoiesis through the study of transcriptional single-cell states^[44]10–[45]12, but also generated conflicting observations. Initial use of this technology in humans led to an alternative view that early haematopoiesis is composed by a continuum of low-primed undifferentiated haematopoietic stem and progenitor cells (CLOUD-HSPCs) from which unilineage-restricted cells emerge^[46]10. Recently, scRNA-Seq data combined with assays of chromatin accessibility supported instead the notion of a structured hierarchy, revealing a variegated hematopoietic landscape^[47]13, the existence of lineage-biased stem cells in mice^[48]14,[49]15 and of different stages of human lymphoid commitment in humans^[50]16,[51]17. Human HSPCs are commonly identified by expression of the antigen CD34^[52]18. CD34+ cells are heterogeneous, and there are ongoing efforts to classify their substructure by immunophenotyping and according to their differentiation and in vivo survival potential^[53]5. The CD34+ cell population structure is unresolved, with recent studies showing that the current immunophenotypically defined CD34+ subsets could be more heterogeneous than previously thought^[54]9,[55]19. A possible reason for the lack of resolution is that enrichment methods for CD34+ cells may bias the representation of cell states during early hematopoietic commitment, as the CD34 marker is downregulated at different rates along commitment to different cell fates^[56]20,[57]21. In this regard, one should note that previous single-cell studies on human hematopoiesis focused exclusively on the whole CD34+ population (comprising both Lin− and Lin+ cells)^[58]11, or on in silico modeling of the fate commitment of the CD34+ fraction containing the least differentiated HSPCs^[59]10. We here aim at providing insights on the population structure of early hematopoietic commitment, by profiling human HSPCs with high-throughput scRNA-Seq^[60]22,[61]23. Differently from previous works, in the present study we not only isolate the immature cells expressing the CD34 antigen, but we also extend our analysis to the whole bone marrow (BM) fraction lacking the main markers of terminal differentiation (Lineage negative, Lin− cells). The resulting scRNA maps provide a comprehensive transcriptional snapshot of the early human hematopoietic cell fates, shedding light on the origin of the basophil branching and on a previously unappreciated surface marker for fractionating the HSPCs cell product. Results Generating a high-resolution scRNA map of CD34+ progenitors To establish a reference data set and to address the heterogeneity and fate potential of the known CD34+ subsets, our first investigations aimed at mapping at high-resolution the single-cell transcriptional states of cells commonly defined as human HSPCs (Fig. [62]1a). To this goal, we separated CD34+ cells purified by magnetic beads selection into seven subpopulations^[63]5, marking cells of differing fate potential (Fig. [64]1b) and tagged and sequenced the transcriptome of 6011 single cells (Supplementary Fig. [65]1a and Supplementary Table [66]1). We then used the scRNA-Seq data to infer the structure of cell states in high-dimensional gene expression space (Fig. [67]1c). We applied a visualization method previously developed for mouse hematopoietic progenitors^[68]24, whereby each cell represents a graph node, with graph edges linking nearest neighbor cells. The scRNA-Seq graph, visualized using SPRING force-directed layout^[69]25, shows a hierarchical, tree-like continuum of states, with branches that terminate at cells expressing recognizable transcriptional signatures of lineage commitment before the expression of final maturation markers (Fig. [70]1c, d) (megakaryocytes (Meg), erythroid cells (E), granulocytes (G), dendritic cells (DC), lymphoid cells (Ly1-2)). The structure of the single-cell data broadly partitions based on immunophenotypic subpopulations, but, significantly and in line with recent suggestions^[71]9, we observed that previously defined HSPC subpopulations hide substantial transcriptional heterogeneity (Supplementary Fig. [72]2a). Fig. 1. [73]Fig. 1 [74]Open in a new tab Experimental workflow and transcriptional map for human HSPCs. a Schematic for experimental design and workflow of data analysis. Two experiments have been performed on two separate healthy donors to generate two single-cell transcriptome maps (MNC, mononuclear cells; PBA, population balance analysis; GEA, Gene expression analysis). b Gating strategy used for the FACS sorting of seven HSPC subsets from magnetic beads purified CD34+ cells of a healthy donor BM (HSC, hematopoietic stem cells; MPP, multipotent progenitors; MLP, multi-lymphoid progenitors; Pre-B/NK, Pre-B lymphocytes/natural killer cells; MEP, megakaryocyte-erythroid progenitors; CMP, common myeloid progenitors; GMP, granulocyte–monocyte progenitors). c SPRING plot of the seven HSPCs single-cell transcriptomes. Each point is one cell. Labels at the edges represent the transcriptional states associated to early lineage commitment (Meg, megakaryocytes; E, erythroid cells; G, granulocytes; DC, dendritic cells; Ly1/Ly2, lymphoid B, T, NK cells). Color legend as in b. d Representative gene expression maps of lineage defining genes (PLEK, Meg; HBB, E; MPO, G; SPIB, DC; CD79A, and DNTT, Ly1/2). e Classification of individual cells into homogenous transcriptional groups numbered from 1 to 11, based on inferred principal trajectories (Supplementary Fig. [75]3a for details). f Predicted hierarchy based on two steps PBA. g Heatmap showing the expression average in groups shown in e for statistically significant genes coding for CD markers (likelihood ratio test [LRT] adjusted p value < 0.05). h Gene expression maps of CD34 and CD164 Our scRNA-Seq map of CD34+ subpopulations suggests that HSPCs do not undergo a single-step transition from CLOUD-HSPCs to unilineage states. Instead, they form a structured hierarchy (Fig. [76]1c). The earliest fate split separates erythroid–megakaryocyte progenitors from lymphoid–myeloid progenitors (LMPs), which separate further into lymphoid, DC and granulocytic progenitors. This hierarchy is highlighted by both inferred transcriptional trajectories (Fig. [77]1e and Supplementary Fig. [78]3a) and formal high-dimensional analysis of graph structure using the population balance analysis (PBA) algorithm^[79]24 (Fig. [80]1f)^[81]24. We conclude that human HSPCs are more organized than recently hypothesized and show more structure than appreciated by classical immunophenotyping. Extending the scRNA profiling to all BM progenitors In the 1980 s, the wide adoption of monoclonal antibodies for immunophenotyping revealed that the CD34 antigen is an effective marker to isolate immature HSPCs from humans^[82]18. Since then, efforts have been made to define the hierarchical structure of HSPCs purified from immunomagnetic-selected CD34+ cells, under the assumption that this cell population effectively captures all early fate choices. Although our above analysis supports such efforts, we reasoned that a focus on CD34+ cells purified with magnetic beads enrichment might provide an incomplete view of the earliest branching events in haematopoiesis. We noted, for example, that branches towards basophils/eosinophils/mast cells and monocytes commitments were missing in our initial scRNA-Seq analysis of CD34 cells, despite these appearing as early events in mouse haematopoiesis^[83]24. In addition, many cells negative for mature lineage markers in human BM are CD34^low/− and could account for additional transitional states at which CD34 expression is rapidly downregulated, thus greatly reducing their probability of capture. Therefore, to generate a complete landscape of early haematopoiesis, we extended our analysis to encompass human CD34^low and CD34− cells. To this aim, we collected from a second healthy donor four fractions of BM Lin− cells, covering different degrees of maturation (Fig. [84]2a). The graded fluorescence-activated cell (FACS) sorting used in this analysis corrects for expansion of cells as they differentiate, allowing examination of early states alongside later ones that comprise the vast majority of Lin− progenitors. In fractionating the cells by maturity, we made use of a cell surface marker, CD164, that we identified from the initial data set as expressed by cells that are multipotent until just beyond the first E/Meg–LMP branchpoint (Fig. [85]1g, h). This fractionation strategy allowed us to preserve resolution of the single-cell events of the more-primitive compartments, whereas at the same time maintaining a full representation of the late cell fate branching (Fig. [86]2b; Supplementary Figs. [87]1b, [88]2b). Fig. 2. [89]Fig. 2 [90]Open in a new tab Human Lin− compartment investigation by means of CD34/CD164 fractionating. a Gating strategy for the FACS sorting of four subsets inside the Lin—fraction of a healthy donor BM, according to CD34 and CD164 expression (left panels). Relative contribution of CD71+ progenitors is shown in the right panels. b SPRING plot of the four Lin−CD34/CD164 subsets single-cell transcriptomes. Each point is one cell. Labels at the edges represent the transcriptional states associated to early lineage commitment (P, early progenitor cells; Meg, megakaryocytes; E, erythroid cells; BaP, basophil progenitors; N, neutrophils; M, monocytes; DC, dendritic cells; Ly-T/B/NK, lymphoid T/B/NK cells). Color legend as in a. Gene expression maps are available in Supplementary Fig. [91]4. c Predicted hierarchy based on two steps PBA. d Classification of individual cells into homogenous transcriptional groups numbered from 1 to 15, based on inferred principal trajectories. Solid lines show results based on final converged iteration (Supplementary Fig. [92]3b for details). Dashed lines added manually to highlight a potential additional trajectory not present in final iteration and inferred by visual inspection (DC-M). e Gene dynamics associated to branching and fate decisions. Plots on the left, branching and groups; mirror heatmaps, expression of statistically significant genes differentially expressed along each branch pseudotime (LRT adjusted p value < 0.05). Plots on the right, a selection of three transcription factors differentially expressed along each branch (LRT adjusted p value < 0.05). f Projection of the transcriptional states of the seven HSPCs onto the Lin−CD34/CD164 map As predicted, the transcriptional map of the Lin− fraction, derived from the high-throughput clustering of 15,401 single cells (Fig. [93]2b; Supplementary Fig. [94]4 and Supplementary Table [95]1), revealed important early features that were missing from the analysis of the immune-selected CD34+ population. Using the same graph-based technique as for CD34+ cells, we could now identify later cell fate decisions. Monocyte progenitors seem to emerge from a common neutrophil/monocyte precursor later in the myeloid commitment and after the branching decision towards DC progenitors, with a possible contribution from DC progenitors as recently shown in the mouse^[96]24,[97]26. These data also suggest that the identity of the remaining CD34− Lin− cells consists mostly of late neutrophil progenitors, and of a continuum of differentiating states towards erythroid commitment. Our results could be formalized, on a computational basis, by both high dimensions (using PBA algorithm, Fig. [98]2c) and inferred transcriptional trajectories (Fig. [99]2d and Supplementary Fig. [100]3b) and were confirmed upon analyzing the data with an independent method (Diffusion Maps^[101]27, Supplementary Fig. [102]5, [103]6) that does not rely on a limited amount of k-nearest neighbors (kNN) for data-embedding calculation. To generate a resource for further studies, we investigated the association between gene expression dynamics and cells progression along the estimated differentiation paths. We identified putative transcriptional switches occurring during early hematopoietic cell fate choices and genes exhibiting significant variations during lineage commitments (Fig. [104]2e; Supplementary Fig. [105]7 and Supplementary Information for the complete lists). This analysis contains valuable information for in vitro reprogramming efforts and for investigations into the origin of blood cell differentiation disorders and cancers. To understand how the enrichment of CD34+ subsets could limit our view of early haematopoiesis, we projected the CD34+ HSPCs subpopulations onto the Lin− state map (Fig. [106]2f). The analysis confirmed that large portions of the Lin− map are strongly under-represented upon the magnetic pulldown of the CD34+ population (namely the ones identifying basophils, monocytes progenitors, and the stages of late erythroid differentiation). This supports the concept that the Lin− population structure provides a more complete view of key cell fate decisions along human hematopoietic commitment and suggests that, for a complete classification of HSPCs, analyses should be performed on FACS-sorted CD34+, CD34^low and CD34− compartments. Finally, with this projection we could appreciate the heterogeneous nature of the currently defined HSPC subsets, showing that they can be further fractionated into distinct and more homogenous transcriptional states (Supplementary Fig. [107]8). Exploring the origin of the human basophilic branch The most-notable result emerging from the exploration of our BM Lin− map was the identification of a branch toward cells carrying a transcriptional profile of early basophils specification. Strikingly, this class of basophils progenitors (BaP) was found to associate with erythroid and megakaryocyte fates and not with granulocytes precursors. Our data, generated on adult human BM, align with and expand on preliminary observations done in human cord blood CD34+ cells^[108]11,[109]28, and in murine haematopoiesis^[110]29. To elaborate on this observation, we computationally projected the Basophil branch of our BM Lin− map onto the Lin− HSPC map to identify which, among the HSPC single cell states, had the highest scRNA similarity to this branch. The topological origin of the early basophil cell specification in the HSPC map was in striking accordance with what observed in the BM Lin− map and the highest level of similarity was detected with respect to the CD135− progenitors with known megakaryo–erythroid potential (MEP) (Fig. [111]3a). Building on these results, we thus designed and conducted a series of in vitro differentiation assays starting from FACS sorting Lin−CD34+ cells into CD135+ (FLT3+) (by definition containing common myeloid progenitors (CMP) and granulocyte–monocyte progenitors; GMP) and CD135−(FLT3−) (by definition containing MEP) cells (Fig. [112]3b–d and Supplementary Fig. [113]9c). These two groups of cells were separately put in culture in myeloid-, megakaryocytes (MK)-, and basophil-differentiating conditions under the hypothesis that if basophils are generated by CMP or GMP (as suggested by the classical model of haematopoiesis) the CD135+ fraction should be the one capable of differentiating into basophils after culture. As reported in Fig. [114]3, the Lin−CD34+4+CD135− and Lin−CD34+CD135+ populations had, as expected, specific growth preferences toward MK (the former) and myeloid (the