Abstract

Background

   Classic dendritic cells (cDCs) play a central role in the immune system
   by processing and presenting antigens to activate T cells, and consist
   of two major subsets: CD141^+ cDC (cDC1) and CD1c^+ cDC (cDC2). A
   population of migratory precursor cells, the pre-cDCs, is the immediate
   precursors to both cDC subsets. Previous studies showed that there were
   two pre-committed pre-cDC subpopulations. However, the key molecular
   drivers of pre-commitment in human pre-cDCs were not investigated.

Results

   To identify the key molecular drivers for pre-commitment in human
   pre-cDCs, we performed single cell RNA sequencing (RNA-Seq) of two cDC
   subsets and pre-cDCs, and bulk RNA-Seq of pre-cDCs and cDCs from human
   peripheral blood. We found that pre-DC subpopulations cannot be
   separated by either variable genes within pre-cDCs or differentially
   expressed genes between cDC1 and cDC2. In contrast, they were separated
   by 16 transcription factors that are themselves differentially
   expressed or have regulated targets enriched in the differentially
   expressed genes between bulk cDC1 and cDC2, with one subpopulation
   close to cDC1 and the other close to cDC2. More importantly, these two
   pre-cDC sub-populations are correlated with ratio of IRF8 to IRF4
   expression level more than their individual expression level. We also
   verified these findings using three recently published datasets.

Conclusions

   In this study, we demonstrate that single cell transcriptome profiling
   can reveal pre-cDCs differentiation map, and our results suggest the
   concept that combinatorial dose of transcription factors determines
   cell differentiation fate.

Electronic supplementary material

   The online version of this article (10.1186/s12860-019-0199-y) contains
   supplementary material, which is available to authorized users.

   Keywords: Dendritic cell, Cell differentiation, Transcriptional factor,
   Single cell RNA-Seq

Background

   The hematopoietic system is one of the best model systems for the study
   of lineage differentiation and diversification. One type of
   hematopoietic cells, dendritic cells (DCs), plays a central regulatory
   role in the immune system. It detects pathogen-associated molecular
   pattern (PAMP) signal or danger-associated molecular pattern (DAMP)
   signal, captures antigens from self or invaders, processes and presents
   them to activate T cells, and sustains the memory adaptive immune
   response and Tregs, thereby bridging innate and adaptive immune
   response [[39]1]. The functional complexity of DCs is reflected in
   their diversity. Conventional dendritic cells (cDCs) in humans that
   activate T cells are comprised of two subsets with distinct phenotypes
   and functions, namely, CD141^+ cDCs (cDC1) and CD1c^+ cDCs (cDC2).
   Specifically, human cDC1 and their equivalent in mouse, CD8^+ cDCs,
   specialize in activating CD8 T cell response [[40]2], whereas human
   cDC2 and their murine counterpart CD11b^+ cDCs, specialize in
   activating CD4 T cell response [[41]3]. Both cDC1 and cDC2 descend from
   migratory dendritic cell precursor cells, or pre-cDCs, which arise from
   common dendritic cell progenitor (CDP) in the bone marrow and egress to
   the periphery where they terminally differentiate into cDCs [[42]4].
   Several phenotypically distinct pre-cDC cell populations have been
   identified in human peripheral blood with potential to produce cDC1 and
   cDC2 [[43]5–[44]8].

   Single cell RNA sequencing (scRNA-Seq) studies not only led to the
   identification of a new pre-cDC population in human blood [[45]7], but
   also revealed that seemingly homogeneous pre-cDCs in mouse and human
   were heterogeneous and led to the discovery that pre-cDCs have two
   subpopulations pre-committed to cDC1 and cDC2, respectively [[46]5,
   [47]6, [48]9]. However, the identification and characterization of the
   pre-cDC subpopulations in these studies were not based on scRNA-Seq
   data, but mainly from prospective isolation of cells, followed by bulk
   RNA-Seq. It remains unclear whether we can use scRNA-Seq data directly
   to separate and identify pre-committed subpopulations. Additionally,
   the transcriptional program that drives pre-cDC pre-commitment is not
   well characterized.

   A number of important transcriptional factors have been shown critical
   in cDC development in general. cDC1 dependent transcriptional factors
   include IRF8 [[49]10], ID2 [[50]11], BATF3 [[51]10, [52]12], NFIL3
   [[53]13]. cDC2 dependent transcriptional factors include IRF4 [[54]14,
   [55]15], KLF4 [[56]16], ZEB2 [[57]17]. There are some transcription
   factors implicated in both cDC1 and cDC2 development, such as
   Ets-family transcriptional factor PU.1 (SPI1), ZBTB46, E2–2 (TCF4),
   STATs, IKZF1 (IKAROS) and Notch RBP-J [[58]18]. However, the recipes of
   essential transcription factors for cDC1 and cDC2 subset commitment in
   human pre-cDCs have not been investigated. Our recent studies indicate
   that the lineage program is established early in hematopoietic stem
   cells (HSCs) and multipotent progenitors (MPPs) and transmitted to
   progeny and strengthened during cell division, and suggest that the
   lineage program is correlated with and orchestrated by combinatorial
   dose of multiple transcription factors [[59]19]. How to identify such
   lineage program recipes as a commonality shared between distinct
   development stages has not yet been established.

   Here we report a single cell transcriptomics study of pre-cDCs to
   investigate core transcriptional program underlying cDC1 and cDC2
   lineages. We used the Fluidigm C1 platform to sequence mixed pre-cDCs,
   cDC1, and cDC2 cells, and developed a computational approach to find
   master regulator transcription factors that drive the pre-cDC
   differentiation process. We ask 1) whether single-cell transcriptome
   profiling can distinguish two cDC subsets and their immediate precursor
   pre-cDCs; 2) whether pre-cDCs are composed of distinct committed
   subpopulations that resemble terminally differentiated cDC cells; and
   3) which transcription factors drive the terminal differentiation of
   pre-cDCs. To answer these questions, we implemented a workflow as shown
   in Additional file [60]1: Figure S1 Specifically, we used
   multidimensional scaling (MDS) to infer cell types based on scRNA-Seq
   data and remove outliers. We then identified genes with highly variable
   expression in pre-cDC population and showed that they can be used to
   separate cDC1 and cDC2, indicating the heterogeneity of pre-cDCs is
   associated with cDC commitment. Finally, we identified transcription
   factors that are differentially expressed between bulk cDC1 and cDC2
   (indicated in red and blue) or have targets enriched among
   differentially expressed genes (indicated in olive), and showed that
   these are potential master regulators that underline pre-cDC
   heterogeneity and drive differentiation.

Results

Single cell global transcriptome can identify human pre-cDCs, cDC1s and cDC2s
from mixture independent of surface marker

   To determine whether single cell transcriptome can be compared across
   different experiments, and can distinguish pre-cDC, cDC1 and cDC2 from
   cell mixture without prior knowledge of cell identity, we isolated
   pre-cDC, cDC1 and cDC2 from peripheral blood as in Fig. [61]1a, and
   mixed them with different combinations in three batches and submitted
   each batch for single cell RNA-Seq using the Fluidigm C1 system. The
   pre-DCs here representing 0.001% of CD45^+ PBMC are
   Lin^−CD45RA^+CD115^−CD116^+CD117^+Flt3^+, distinct from pDC in cell
   surface expression of CD303 (Fig. [62]1a) and transcriptome (Additional
   file [63]1: Figure S2A) and lack pDC potential [[64]4]. They are
   phenotypically different from Lin^−CD33^+CD45RA^+CD123^+ pre-DCs from
   the work of See et al. [[65]6] and CD100^hiCD34^intFLT3^− Villani et
   al. [[66]7] (Additional file [67]1: Figure S2B).

Fig. 1.

   [68]Fig. 1
   [69]Open in a new tab

   Experiment design and sample summary. a Gating strategy for isolating
   pre-cDCs and cDC1s and cDC2s. b Sequencing batch experiments details

   To test whether scRNA-Seq data can separate DCs and pre-DCs, we mixed
   cDC1, cDC2 and pre-DCs prior to sequencing. Specifically, in our
   experiment design, batch 1 contained cDC1:cDC2 in a ratio of 1:1, batch
   2 contained only pre-cDC, and batch 3 contained pre-cDC: cDC1: cDC2 in
   a ratio of 2:1:1. The cells that passed a depth filter of 100,000 reads
   mapped to gene region comprise 59, 50 and 42 cells from batch 1, batch
   2 and batch 3, respectively (more details in Fig. [70]1b and quality
   control data in Additional file [71]1: Figure S3A-B). With these cells,
   we first calculated the pair-wise transcriptome distance using the
   global transcriptome of each cell (see [72]Methods), and generated a
   2-dimensionsional MDS plot of the distance matrix to visualize the
   transcriptomic similarities between the cells (Additional file [73]1:
   Figure S4A). The resulted plot revealed two groups of cells that are
   densely clustered, and a third group of cells sparsely scattered after
   performing k-means clustering on the MDS plot. These sparsely
   scattering cells in group 3 have significantly lower read depths than
   the other cells (Additional file [74]1: Figure S4B), and are thus
   removed from the downstream analysis as technical outliers. We further
   filtered the cells by mitochondrial reads percent < 30% and number of
   expressed genes > 1000, leading to 135 single cells. These single cells
   express most of the top 100 housekeeping genes [[75]20] and express few
   of the cell cycle genes (Reactome, [76]http://www.reactome.org),
   indicating that they are alive and not undergoing cell cycle
   (Additional file [77]1: Figure S4C). Among the 135 single cells with
   good RNA-Seq data quality, 52/54 cells from batch 1 (only containing
   cDC1 and cDC2) are clustered in cluster1, suggesting its identity of
   cDC cluster; 44/46 cells from batch 2 (only containing pre-cDCs) are
   clustered in cluster2 suggesting its identity of pre-cDCs cluster.
   Importantly, 19 and 15 cells in batch 3 are distributed in and blended
   with cluster1 (cDCs) and cluster2 (pre-cDCs), respectively
   (Fig. [78]2a), consistent with its cellular composition
   (cDC1:cDC2:pre-cDC, 1:1:2). This indicates that the batch effect is
   negligible so that cells can be compared cross experiments. We conclude
   that global transcriptome can separate cDCs from pre-cDCs without known
   identity.

Fig. 2.

   [79]Fig. 2
   [80]Open in a new tab

   Cell type identification of the single cells from mixture with MDS
   based on global transcriptome. a Multidimensional scaling (MDS) plot
   for the cells that passed quality control, batch1 (yellow, cDCs),
   batch2 (brown, pre-cDCs) and batch3 (black, mixture of cDCs and
   pre-cDCs), where the input distance was adjusted for dropout rate with
   SCDE. The cells can be separated into 2 clusters by k-means clustering:
   cluster1 for cDCs and cluster 2 for pre-cDCs. b The clustering pattern
   of cluster1 identified in panel a with MDS. Each cell is colored with
   binary expression of CD141(red), CD1c (blue), both (purple) or none
   (black). c Heatmap of the top 20 differentially expressed genes derived
   by SCDE by comparing the upper and lower groups of cDCs in panel b. The
   top dendrogram was generated with hierarchical clustering. The two
   column color bars represent batch and cell type information,
   respectively. Genes surround by black box are known key genes for cDC
   differentiation. d Transcriptional signature of pre-cDC, cDC1 and cDC2
   population. Three hundred eighty genes shown in rows were
   differentially expressed in at least two populations of pre-cDC, cDC1
   and cDC2 computed with ANOVA (false-discovery rate (FDR) < 0.05) and
   can be classified into 4 clusters(C1–4) indicated by the row slide
   color. Bulk expression of the three cell types for these genes after
   averaging over 3 biological replicates is also shown on the right.
   Right margin also shows the predicted upstream regulators and enriched
   pathways for each gene cluster. Single cells in the columns were
   ordered by the hierarchical clustering result based on the 380 genes

   We then ask whether single cell global transcriptome could distinguish
   cDC1 and cDC2. Using MDS, we further separated the 78 cells in group 1,
   i.e., putative cDCs, into two distinct clusters, which are correlated
   with cDC1 and cDC2, respectively, based on the expression of CD141 and
   CD1c transcripts (Fig. [81]2b). Using SCDE [[82]21], we identified top
   20 differentially expressed genes ranked by absolute value of Z-score
   between the two visible clusters of cDCs (Fig. [83]2c). The upper group
   of cells in Fig. [84]2b highly express CLEC9A and XCR1, indicating its
   identify of cDC1, whereas the lower group of cells in Fig. [85]1b
   highly express CD1C and ZEB2, indicating its identity of cDC2 (Fig.
   [86]2c). cDC cells from both batch 1 and batch 3 are distributed in two
   clusters (Fig. [87]2c). Additionally, signature analysis indicates the
   high consistency of cell type identity in the two clusters of single
   cells (Additional file [88]1: Figure S5). These results indicate that
   global transcriptome can separate cDC1 from cDC2 without known
   identity.

   To further verify the pre-cDC, cDC1 and cDC2 populations inferred from
   their global transcriptome with MDS, we identified 380 differentially
   expressed genes between at least one pair of cell populations in mean
   expression with one-way analysis of variance (ANOVA) under the
   criteria: absolute value of log[2] fold change > 1 and false-discovery
   rate (FDR) < 0.05. Hierarchical clustering of these differentially
   expressed genes on all the good single cells revealed 4 major clusters
   of genes (Fig. [89]2d, Additional file [90]2: Table S1). We used
   ConsensusPathDB-human(CPDB) [[91]22], an aggregation tool that
   integrate pathways from different sources for human, to infer pathways
   enriched within each gene cluster. Cluster 1 genes were highly
   expressed in pre-cDCs and low in cDCs, including key genes like SOX4
   and TCF4, enriched for development pathways like EGF-EGFR1 signaling
   pathway and mitotic pathway. Cluster 2 genes were highly expressed in
   cDC1, including CLEC9A, XCR1 and transcriptional factor IRF8, enriched
   for integrin signaling pathway and IFN-gamma pathway. Cluster 3 genes
   were highly expressed in cDC2s, including the surface marker CD1c and
   transcription factor CEBPB and ZEB2, enriched for IL1/4/5 signaling
   pathway and Toll-like receptor signaling pathway. Cluster 4 genes were
   highly expressed in both cDC1s and cDC2s, including VDR, CEBPD and
   HLA-D* genes, enriched for downstream TCR signaling, MHC class II
   antigen presentation and PD-1 signaling. Importantly, the expression
   pattern of these 380 genes was highly consistent between single cells
   and bulk populations (Fig. [92]2d), which confirms our inferred
   identities of the single cells from mixture and the good quality of our
   single cell data.

Genes highly variable in pre-cDCs are associated with cDC specification

   Recent studies have shown that pre-cDCs in mouse [[93]9] and human
   [[94]5] contain two distinct populations pre-committed to cDC1 and
   cDC2. We asked whether global transcriptome was able to separate
   pre-cDCs, using MDS. However, although global transcriptome could
   separate pre-DCs from cDCs, it was unable to separate pre-cDCs into two
   obvious subpopulations (Fig. [95]3a). We then set out to identify a
   subset of key genes that are associated with cDC specification in
   pre-cDCs. We hypothesize that if individual pre-cDCs are pre-committed
   to one of the two cDC lineages, the key genes should demonstrate more
   expression variability across individual pre-cDCs than background
   noise. We first estimated background noise with spike-ins, and then
   identified 842 variable genes that are 1) significantly more variable
   than background model (FDR < 0.1) and 2) exhibiting detectable mRNA
   (aligned reads number > 0) in at least 30 single cells (Fig. [96]3b and
   Additional file [97]2: Table S2). With MDS, expression of these 842
   variable genes identified and separated pre-cDC, cDC1 and cDC2 cells
   (Fig. [98]3c), with separation of cDC1 and cDC2 better than that using
   global transcriptome (Fig. [99]3a). By contrast, both a random
   selection of 842 genes that have high mean expression in pre-cDCs (Fig.
   [100]3d, Additional file [101]2: Table S3), and cell cycle genes (Fig.
   [102]3e, Additional file [103]2: Table S4) failed to separate cDC1 and
   cDC2. To strength the conclusion from random selected genes, we
   randomly selected 842 genes with high expression for 3000 rounds. We
   used rand index to measure the similarity of the clustering result
   based on the randomly selected genes and the inferred identities of DCs
   from Fig. [104]2. Figure [105]3f shows that clustering result from
   variable genes is much better than that from randomly selected genes,
   indicating the best distinguishing ability of variable genes in
   pre-cDCs (Fig. [106]3f). This suggests that the key genes associating
   with cDC specification are already expressed and demonstrate high
   variability among individual pre-cDCs. Of the 842 variable genes,
   transcription factors include ZEB2, IRF4 and NFIL3 known for cDC
   differentiation and CTCF and RBPJ known for chromatin state remodeling
   (Fig. [107]3b). CPDB pathway analysis indicated enrichment in TGF-beta
   signaling pathway(Wikipathways), integrin (INOH) and interferon
   signaling(Reactome), and predicted SPI1, TCF7, VDR, CIITA and IRF8 as
   upstream regulators (Enrichr) which highly connect with dendritic cell
   development [[108]23, [109]24] (Additional file [110]2: Table S2).

Fig. 3.

   [111]Fig. 3
   [112]Open in a new tab

   Highly variable genes in pre-cDCs population can separate DC subsets. a
   MDS plot of all the single cells based on global transcriptome. b
   Squared coefficient of variation (CV^2) was plotted against the mean of
   normalized read counts for each gene of 50 pre-cDCs in Batch 2. The
   solid blue curve denotes the fitted variance-mean dependence with ERCC
   spike-ins. The genes marked in red show higher expression variability
   than background/technical noise (measured with spike-ins, blue) by
   testing the null hypothesis that the coefficient of biological
   variation is less than 50% with FDR < 0.1. c MDS plot of all the single
   cells with the 842 biologically variable genes. d MDS plot of all the
   single cells with randomly 842 sampled genes without replacement from
   genes that have mean expression > 100 in panel A. e MDS plot of all the
   single cells with 496 cell cycle genes downloaded from Reactome
   ([113]http://www.reactome.org). f Histogram of rand index between the
   clustering result based on 842 sampled genes and the inferred cDC
   identities in Fig. [114]2. Larger rand index indicates higher
   consistence between two clustering results. Genes of high expression
   were randomly selected for 3000 rounds. The red dot indicates the rand
   index between the DC clusters based on the 842 variable genes and the
   inferred cDC identities

Differentially expressed TFs between DC subsets can reveal pre-cDC
differentiation trajectory

   Since global transcriptome and variable genes within pre-DC populations
   fail to reveal the heterogeneity of pre-cDCs, we hypothesize that genes
   representing the lineage determinant of two cDC subsets should be able
   to identify two pre-cDC subpopulations that demonstrate consistency
   with cDC1 and cDC2, respectively. We first derived 234 differentially
   expressed genes (BH p.adj < 0.05, absolute value of log2 fold change
   > 1) between bulk cDC1 and cDC2 as DC signature gene set. As expected,
   this gene list successfully separated the single cDC1s from single
   cDC2s, however, it failed to reveal subpopulations in single pre-cDC
   cells shown in the t-Distributed Stochastic Neighbor Embedding (t-SNE)
   map (Fig. [115]4a), indicating that signature genes of differentiated
   cDCs are not yet fully manifested in pre-cDCs.

Fig. 4.

   [116]Fig. 4
   [117]Open in a new tab

   Pre-cDC subpopulation pattern in human is implicated by differentially
   expressed TFs between cDC1 and cDC2. a t-SNE plot of all the single
   cells with the full set of differentially expressed genes between bulk
   cDC1 and cDC2. b Flowchart of how to identify the MR TFs. c Regulatory
   network of the 13 out of 16 MR TFs based on the database of
   transcription factor and target relationship in human hematopoietic
   lineages from Neph et al. [[118]25] complemented by String database
   [[119]26]. d t-SNE plot of all the single cells with MR TFs that are
   differentially expressed between bulk cDC1 and cDC2. The cells were
   clustered into two groups C1 and C2 by k-means clustering. e Expression
   level of IRF8, IRF4 and IRF8/IRF4 expression ratio for each single
   cell. The three rows represent IRF8, IRF4 and IRF8/IRF4, respectively.
   The columns represent pre-cDC, cDC2 and cDC1, respectively. For
   example, the IRF8 expression level from low to high in pre-cDCs is
   represented from blue, orange to red, and the other cDC single cells
   are colored in gray. f Violin plot of IRF8 expression, IRF4 expression
   and IRF8/IRF4 expression ratio in single cell populations: pre-cDC1,
   pre-cDC2, cDC1 and cDC2. P values indicated between pre-DC1 and pre-DC2
   and between cDC1 and cDC2 are from Wilcoxon Rank Sum test

   Given that the differentially expressed genes between cDC1 and cDC2
   cannot identify subpopulations of pre-cDCs, we then asked whether the
   TFs that regulate the differentially expressed genes and drive the
   differentiation in the pre-cDCs can identify the pre-cDC
   subpopulations. Using a procedure as shown in Fig. [120]4b, we inferred
   a list of 16 candidate master regulator (“MR”) TFs: ATF3, CEBPB, CEBPD,
   EHF, ELK3, FOSL2, FOXO3, HOXA7, IRF4, IRF8, JARID2, RARG, TCF7L2, THRB,
   ZEB2 and ZNF217 that either: (a) are significantly differentially
   expressed between bulk cDC1 and cDC2 (BH p.adj < 0.05, absolute value
   of log2 fold change > 1, Additional file [121]2: Table S5); or (b)
   regulate the differentially expressed genes and exhibit differential
   expression between bulk cDC1 and cDC2 with marginal significance
   (p-value < 0.05, absolute value of log2 fold change > 1, Additional
   file [122]2: Table S5). Thirteen out of the 16 MR TFs form a network of
   genetic regulation and protein interaction (Fig. [123]4c), when
   analyzed using String database [[124]26] and transcription factor and
   target database in human hematopoietic lineages from Neph et al.
   [[125]25]. Expression level of these MR TFs demonstrated reasonable
   variability among pre-cDC, cDC1 and cDC2 in bulk and single cells
   (Additional file [126]1: Figure S6A-B). Using these MR TFs as input to
   t-SNE, pre-cDCs were separated into two distinct groups (Fig. [127]4d)
   that closely clustered with cDC1 cells and cDC2 cells, respectively.
   Here, dim1 is about the lineage divergence of pre-cDCs and dim2 is
   about the increasing commitment to cDC1/cDC2, and IRF4 and IRF8 have
   the highest correlation with both dim1 and dim2. We refer the pre-cDC
   subpopulation closer to the cDC1 cells as pre-cDC1, and the other
   pre-cDC subpopulation closer to the cDC2 single cells as pre-cDC2 (Fig.
   [128]4d). In contrast, neither global transcriptome, highly variable
   genes in pre-cDCs, or 234 differentially expressed genes between DC
   subsets was able to reveal subpopulation of pre-cDCs by tSNE
   visualization (Additional file [129]1: Figure S6C). To verify the
   separation power of the MR TFs, we performed a pseudo time trajectory
   analysis with Monocle2 [[130]27]. In the resulting trajectory map,
   pre-DCs demonstrated differentiation branches with one branch committed
   to cDC1s, the other branch committed to cDC2s and a third group of
   likely less committed pre-cDCs at the bottom (Additional file [131]1:
   Figure S7A). For the sake of comparing this trajectory with t-SNE
   result, we flipped the two branches as shown in Additional file [132]1:
   Figure S7B. The pre-cDCs close to cDC1, cDC2 and those less committed
   at the bottom were encoded with boxes, triangles and asterisks,
   respectively. If we map this shape encoding into t-SNE plot, we can see
   most of the asterisks locate at the bottom, boxes locate at right and
   triangles locate at left (Additional file [133]1: Figure S7C), which
   means Monocle2, a very different analysis method confirms our finding
   from t-SNE. In contrast, Monocle2 analysis using variable genes failed
   to reveal trajectory of pre-DCs (Additional file [134]1: Figure S7D).
   This indicates that the candidate MR TFs, but not differentially
   expressed genes, variable genes or global transcriptome, are effective
   in separating the pre-cDC subpopulations.

   Among the 16 MR TFs, IRF8 and IRF4 were known to be specific to cDC1s
   and cDC2s, respectively [[135]10, [136]14, [137]15]. Indeed, our single
   RNAseq data indicates that expression of IRF8 was high in most cDC1s
   and low in most cDC2s (Fig. [138]4e) whereas expression of IRF4 was low
   in most cDC1s and high in most cDC2s (Fig. [139]4e). However, although
   single pre-cDC1 cells have relatively higher expression of IRF8 and
   relatively lower expression of IRF4 than those in pre-cDC2
   subpopulation (Fig. [140]4e), neither IRF8 nor IRF4 expression is
   mutually exclusive between two pre-cDC subpopulations. In particular,
   IRF8 has a bimodal expression pattern in pre-cDCs (lower in pre-cDC2
   cells and higher in pre-cDC1 cells) as opposed to its unimodal
   expression (higher in cDC1 and lower in cDC2) in cDCs as shown in Fig.
   [141]4f. Several single pre-cDC2 cells express IRF8, which seemingly
   contradicts their fates to cDC2. On the other hand, we observe that
   IRF8/IRF4 ratio is unimodal in pre-DC1 and pre-DC2, i.e., pre-cDC2
   cells exhibit lower IRF8/IRF4 ratio compared to that of pre-cDC1. In
   addition, IRF8/IRF4 ratio is consistently higher in pre-cDC1s than that
   of pre-cDC2s (P = 4.8e-6, Wilcoxon rank sum test, Fig. [142]4f).
   Therefore, the expression ratio of IRF8/IRF4, not the individual TF
   expression, correlates with pre-cDC’s commitment to respective cDC
   subset.

   To confirm the ability of MR TFs in defining pre-committed pre-cDC
   populations, we then applied our method to examine three recently
   published scRNA-Seq datasets of peripheral blood cDC1, cDC2 and
   pre-cDCs from Breton et al. [[143]5], Villani et al. [[144]7] and See
   et al. [[145]6]. (Additional file [146]1: Figure S8). For the dataset
   from Breton et al., these TFs can also separate cDC1 and cDC2, with
   pre-cDCs also showing pre-commitment pattern to either type of cDCs
   from with cluster C1 and C2 containing cDC2s and cDC2 pre-committed
   pre-cDCs and cluster C3 containing cDC1 and cDC1 pre-committed
   pre-cDCS. Additionally, the expression ratio of IRF8/IRF4 correlates
   with pre-cDC’s commitment to respective cDC subset rather than
   expression level of IRF8 or IRF4 alone (Additional file [147]1: Figure
   S8A). CD172a(SIRPA) is a surface marker that can distinguish the two
   pre-cDC subsets, with higher expression indicating pre-commitment to
   cDC2 and lower expression indicating pre-commitment to cDC1 [[148]5].
   Consistently, the pre-cDCs in cluster C1 and cluster C2(i.e., pre-DC2)
   contain more cells expressing CD172a than that of cluster 3(i.e.,
   pre-DC1). Similar observation was identified for the dataset from
   Villani et al. and See et al. [[149]6, [150]7]) (Additional file
   [151]1: Figure S8B-C). This supports that our list of MR TFs can
   separate single pre-cDCs into pre-committed pre-cDC1 and pre-cDC2.

Transcriptional signature of the two pre-cDC populations

   To further characterize the pre-cDC1s and pre-cDC2s, we used SCDE
   [[152]21] to derive the differentially expressed genes between the two
   subpopulations of single cells (p-value < 0.05, Additional file [153]2:
   Table S6 and Fig. [154]5a) and then performed pathway enrichment
   analysis with CPBD [[155]22]. Genes upregulated in the pre-cDC1
   subpopulation were enriched in RXR and RAR heterodimerization with
   other nuclear receptor pathway. Genes upregulated in the pre-cDC2
   subpopulation were enriched in Interleukin-1 related pathways (Fig.
   [156]5b).

Fig. 5.

   [157]Fig. 5
   [158]Open in a new tab

   Characterization of the pre-cDC subpopulations. a Heatmap showing genes
   that are differentially expressed between the two subpopulations of
   pre-cDCs. b Enriched pathways with genes upregulated in each pre-cDC
   subset

Discussion

   Previous studies [[159]5, [160]6, [161]9] showed human pre-cDC cells
   are heterogeneous and contain pre-committed subpopulations that
   correspond to the two major cDC subtypes. However, the key molecular
   drivers of pre-commitment in human pre-cDCs were not investigated. To
   address this question, we performed single cell and bulk RNA-Seq of two
   cDC subsets and pre-cDCs from human peripheral blood and developed a
   new approach that could identify the underlying molecular recipe for
   pre-cDCs pre-commitment to terminal cDCs. Specifically, we performed
   single cell RNA-sequencing analysis by mixing different cell types in
   the experimental design to demonstrate that global gene expression
   pattern can distinguish pre-cDCs from cDCs, and separate the two
   conventional cDC subsets (cDC1 and cDC2) independent of surface
   markers. We found that neither highly variable genes within pre-cDCs,
   nor differentially expressed genes between cDC1 and cDC2 could separate
   the pre-DC subsets. We reasoned that it is possible that the
   transcriptional program driving the pre-commitment has not been fully
   manifested in pre-cDC. Instead, we identified 16 candidate master
   regulators by searching for transcriptional factors: (a) that are
   differentially expressed between two cDC cell types, or (b) whose
   targets are enriched among differentially expressed genes between two
   cDC cell types. We showed that these candidate master regulators were
   able to separate pre-cDCs into two subpopulations, resembling cDC1s and
   cDC2s, respectively. Furthermore, these two pre-cDC subpopulations are
   more correlated with the ratio of IRF8 to IRF4 expression than their
   individual expression level. This suggests combinatorial dose of
   transcription factors determines fate decision. Finally, we confirmed
   these findings using two recently published scRNA-Seq datasets [[162]5,
   [163]7].

   Using clustering analysis of single cells transcriptome data, we can
   identify pre-cDC, cDC1 and cDC2 from cell mixture across different
   experiment batches and conclude 1) global transcriptome of pre-cDCs are
   distinct from their immediate progeny cDCs and 2) cDCs single cells
   themselves are separable on the whole single cell transcriptome without
   prior knowledge of cell identity independent of cell surface markers.
   However, although the global transcriptome separated pre-cDC, cDC1 and
   cDC2, it could not separate the two pre-cDC subpopulations. This is
   consistent with recent studies that were trying to identify a critical
   subset of genes instead of the global transcriptome to reveal cell
   population heterogeneity [[164]5, [165]6, [166]9, [167]28].

   In the signature gene set analysis for the three populations, single
   pre-cDCs demonstrate more precursor features (signature genes were
   enriched for EGF-EGFR1 signaling and mitotic pathway) than cDCs and cDC
   population shows more mature characteristics in terms of immune
   function. cDC1 signature genes highlight its role in IFN-gamma
   functional pathway; cDC1s not only induce Th1 cells to produce high
   concentrations of interferon-gamma [[168]29], but also respond to
   IFN-gamma, thereby serving as key regulator or perpetuator of Th1
   response in vivo. cDC2 signature genes are enriched for pathways such
   as TGF beta signaling and toll-like receptor signaling which also
   corresponds to previous finding [[169]30, [170]31].

   The finding that biologically variable genes in the pre-cDC population
   can separate the two cDCs subsets implies that although pre-cDCs are
   functionally immature in comparison to cDCs, the biologically variable
   genes in pre-cDC population include components that can distinguish
   transcriptional program of cDC1 and cDC2, indicating the
   transcriptional program towards the differentiation into cDC1 or cDC2
   is already initiated in pre-cDCs. However, using the full set of
   differentially expressed genes between cDC1 and cDC2, we were not able
   to separate the pre-committed pre-cDC subpopulations. We hypothesize
   that the commitment transcriptional program may not be fully
   implemented, therefore, we should examine whether the master regulators
   drive the differentiation process.

   With a list of inferred candidate MR TFs, we were able to identify two
   pre-cDC subpopulations with each being close to one of the cDC types.
   Pre-cDC2(close to cDC2) cells have upregulated pathways related with
   Interleukin-1 which is consistent with the cDC2 specific signature gene
   set enriched pathway analysis (IL1/4/5 signaling pathways were
   enriched); and pre-cDC1 (close to cDC1) cells have upregulated pathways
   related with RXR and RAR pathway, which was previously shown to promote
   DC antigen presentation, differentiation and survival [[171]32]. RXR
   and RAR are also well known to directly bind to AP-1, which is an
   essential protein to form a complex with IRF8 and BATF3 [[172]33,
   [173]34]. Our findings suggest a new role of retinoic acid in cDC1
   development. The pathway analysis also suggests that pre-cDC2 cells are
   more transcriptionally mature and express more cytokines than those
   (i.e., pre-cDC1) that pre-commit to cDC1 which may still need to
   undergo more development and differentiation steps towards to cDC1
   stage.

   Of these candidate MR TFs, functions of IRF8 have been studied in
   dendritic cells. Mutations in IRF8 could result in a complete lack of
   circulating dendritic cells or selective depletion of
   CD1c + circulating dendritic cells(cDC2s) [[174]35]. Maintaining of
   IRF8 high expression level is required for cDC1 identity [[175]10]. It
   is also found that IRF8 is strictly required for the survival of cDC1s
   [[176]36]. Interestingly, CEBPB forms a negative feedback loop with
   IRF8 when specifying monocyte-derived DC and pDC chromatin states
   [[177]37]. Consistently, they also show opposite expression direction
   in our human cDC1s and cDC2s (Fig. [178]4c). CEBPD belongs to the same
   family with CEBPB and can regulate IRF4 and IRF8 (Fig. [179]4c), which
   may also play a role in cDCs differentiation.

   The t-SNE analysis with MR TFs indicates the combinatorial dose of
   multiple TFs is a likely to orchestrate the lineage program. This is
   consistent with the dosage-dependent functions of TFs including IRF4
   and IRF8. In particular, both IRF4 and IRF8, due to their low affinity
   to interferon responsive element, must be recruited by other TFs to the
   DNA, and their function critically depends on forming protein-protein
   complexes [[180]34]. Our result is consistent with a model in which
   competition between the two transcriptional factors IRF8 and IRF4
   contributes most to the fate choices to cDC1 and cDC2 lineages. This is
   in line with the antagonism between IRF4 and IRF8 in the activated B
   cell differentiation into plasmablasts or undergoing affinity
   maturation in germinal centers [[181]38]. Future experiments, such as
   ATAC-Seq [[182]39] for profiling chromatin states, should test the
   effect of IRF4 and IRF8 dose changes on pre-DC1 and pre-DC2 lineage
   determination.

   In our experimental design, instead of sequencing each single cell type
   separately in each batch, we mixed two cDC subsets in the first batch
   and pre-cDCs and two cDC subsets in the third batch, respectively. This
   strategy can help bypass batch effect as a potential confounding factor
   when comparing different populations. With the rapid development of
   single cell sequencing techniques, sequencing large scale of cells
   together is becoming feasible: the Fluidigm C1 mRNA Seq HT IFC can
   capture up to 800 cells in a run [[183]40], microfluidic system can
   sequence scalable number of cells at low cost [[184]41] and Drop-seq
   can sequence thousands of cells each time [[185]42].

Conclusions

   In this study, we performed single cell RNA-Seq to study the
   differentiation process of pre-cDCs. We found that pre-cDCs
   subpopulations can be identified based on expression of a small number
   of core transcriptional factors. Additionally, the combinatorial dose
   of IRF4 and IRF8 had a stronger association with the fate decision than
   their individual expression level. This study suggests the concept that
   combinatorial dose of transcription factors determines cell
   differentiation fate.

Methods

Single cell capture and library preparation for RNA sequencing

Single cell isolation

   Human peripheral blood samples were purchased from New York Blood
   Center (New York) and processed 24-48 h post-collection. Fresh
   mononuclear cells were isolated from cord blood or peripheral blood by
   density centrifugation using Ficoll-Hypaque (Amersham Pharmacia
   Biotech, Piscataway, NJ).

   For isolation of cDC1, cDC2, and rare pre-cDCs from peripheral blood,
   an enrichment step was also performed prior to FACS sorting. In brief,
   mononuclear cells were incubated with antibodies against CD135 (4G8,
   PE, BD) and CD117 (A3C6E2, Biotin, Biolegend) for 40 min at 4 °C. After
   washing, antibody against PE (PE001, Biotin, Biolegend) was added and
   incubated for another 10 min at 4 °C. Following wash, CD117+ and CD135+
   cells were positively selected using anti-biotin MicroBeads and LS MACS
   magnetic columns (Miltenyi). For sorting pre-cDCs, enriched cells were
   stained for CD14 (TuK4, Qdot-655, Invitrogen), CD3 (OKT3, Brilliant
   Violet (BV) 650, Biolegend), CD19 (HIB19, BV650, Biolegend), CD56
   (HCD56, BV650, Biolegend), CD66b (G10F5, PerCP-Cy5.5, Biolegend), CD303
   (201A, PerCP-Cy5.5, Biolegend), CD1c (L161, APC-Cy7, Biolegend), CD141
   (M80, PE-Cy7, Biolegend), CD34 (581, AlexaFluor700, Biolegend), CD117
   (104D2, BV421, Biolegend), CD135 (4G8, PE, BD), CD45RA (HI100, BV510,
   Biolegend), CD116 (4H1, FITC, Biolegend) and CD115 (9-4D2-1E4, APC,
   Biolegend) for 40 min on ice. Pre-cDCs were isolated as
   Lin(CD3/19/56/14)-
   Granulocyte(CD66b)-pDC(CD303)-cDC(CD1c/CD141)-CD34-CD117 + CD135 + SSC
   lo CD116 + CD115-CD45RA+ cells. cDC1 were isolated as
   Lin(CD3/19/56/14)- Granulocyte(CD66b)-pDC(CD303)-CD141+ cells. cDC2
   were isolated as Lin(CD3/19/56/14)-
   Granulocyte(CD66b)-pDC(CD303)-CD1c + cells.

   Collected cells were washed and resuspended in 0.1% BSA (Fisher), 2 mM
   EDTA, PBS. Purified populations were then mixed at specific ratios for
   different sequencing batches. For batch 1, cDC1 and cDC2 were mixed at
   a 1:1 ratio; for batch 2, pre-cDCs were not mixed; for batch 3,
   pre-cDCs, cDC1 and cDC2 were mixed at a 2:1:1 ratio, respectively. All
   mixes had a final cellular concentration of 200 total cells/ul and 5ul
   of sample mix was loaded on Fluidigm C1.

Library preparation for RNA sequencing

Fluidigm C1

   Fluidigm C1 was run according to manufacturer’s instructions, including
   using the LIVE/DEAD stain (L-3224, LifeTechnologies). They were imaged
   using a fluorescent microscope. However, ERCC ExFold RNA-Spike-Ins
   (4,456,739, Ambion) were used in place of the RNA spikes as
   recommended. For all the three experiments, the overall spike-in
   dilution was 1: 2, 000, 000.

Sequencing

   cDNA was quantified using the High Sensitivity Qubit kit ([186]Q32854,
   LifeTechnologies) and EnVision plate reader. All experiments were
   performed using an Illumina NextSeq500 with 75 bp single-end reads and
   ~ 400 M reads/run.

Samples and library preparation for bulk RNA sequencing

   Blood samples were obtained in the same way with that of single cells.
   For isolation of cDC1, cDC2, and rare pre-cDCs from peripheral blood,
   the process is the same with that of single cells.

   Bulk sequencing libraries were created using the SMARTer Ultra Low RNA
   Kit (634,935 and 639,207, Clontech) and sequenced with an Illumina
   NextSeq500 with 75 bp single-end reads and ~ 400 M reads/run).

Computational analysis

   Most of the computational analysis was performed in the R programming
   environment, unless stated otherwise. Multi-dimensional scaling was
   performed with cmdscale() function with eig = TRUE and k = 2. Heatmap
   was done with heatmap.2() in gplots package. Principal components
   analysis was performed with the function prcomp(), with centering,
   scaling and cor options on. Identification of the upstream regulators
   of a gene set was done with Enrichr [[187]43] using the ChEA database
   (version ChEA 2016). Pathway analysis was done with ConsensusPathDB
   [[188]22], which is a meta-database that integrates 4593 pathways from
   different sources. Signature analysis to assess the purity of cDC
   clustering were performed in two ways: 1) the connectivity map score
   [[189]44], which measures the closeness of each single cell to a set of
   cell type signature genes, calculated with the connectivity_score()
   function of gmap package [[190]45]; 2) weighted summation score:
   [MATH:
   <munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n<
   /mi></munderover><msub><mi>w</mi><mi>i</mi></msub><msub><mi>x</mi><mi>i
   </mi></msub> :MATH]
   , where the weight w[i] for each gene i is the fold change between cDC1
   VS. cDC2 bulk samples and the expression value for each gene i is
   binary: 0 if the gene is not expressed; 1 if the gene is expressed.
   Rand index, which measures the agreement between two clustering
   results, was calculated with adjustedRand() function of clues package
   [[191]46].

Single-cell RNA sequencing data analysis

   For all three batches of cells, we mapped single-end 75 bp reads to the
   human reference genome (Ensembl GRCh37) and the ERCC sequences using
   STAR [[192]47] (version 2.3.0e) with default parameters. We used
   featureCounts [[193]48] to compute the number of reads mapped to each
   gene or spike-in with options “-s 0 -t exon -g gene_name” . Only
   uniquely mapped reads were considered here in downstream analysis. We
   excluded cells that have fewer than 100,000 reads from the downstream
   analysis. We used the SCDE package [[194]21] to fit individual error
   models for each cell and calculated an adjusted distance measure
   between each pair of individual cells that accounted for the
   probability that genes were not observed because of amplification
   failures or other reasons, i.e., dropout issue. We then used this
   adjusted distance measure as input to a multi-dimensional scaling (MDS)
   analysis with 2 dimensions for visualization and clustering of
   individual cells.

Bulk RNA sequencing data analysis

   We sequenced 5 pre-cDCs samples, 4 cDC1 samples and 4 cDC2 samples.
   Stranded single-end 75 bp sequenced with NextSeq reads were mapped to
   the human reference genome (Ensembl GRCh37) using STAR [[195]47]
   (version 2.3.0e) with default parameters. We used RNA-SeQC [[196]49]
   and RSeQC [[197]50] to do quality control for the aligned reads.
   Samples that have high intergenic rate and reverse strand rate of
   mapped reads were removed. Finally, we have 3 good samples for each
   type of cells. featureCounts [[198]48] was also used to obtain the
   number of uniquely mapped reads in sense direction to each gene with
   options “-s 1 -t exon -g gene_name”. The raw data of both single cell
   samples and bulk samples was uploaded to NCBI GEO (accession number
   [199]GSE81682).

Inference of single cell populations signature gene set

   The read counts were first normalized with DESeq2 [[200]51] to remove
   the sequencing depth difference and then log2 transformed with pseudo
   count 1 added. The list of genes that are differentially expressed
   between pre-cDCs, cDC1 and cDC2 was identified with one-way analysis of
   variance aov() function in R. TukeyHSD() function was used to do Tukey
   HSD post-hoc test and Benjanmini-Hochberg multiple testing correction.
   Four hundred sixty-seven genes were identified with absolute value of
   log[2] fold change larger than one and adjusted p-value < 0.05.
   Unsupervised clustering of the 467 genes and 135 single cells were done
   with Ward agglomeration method and distance was defined as 1- Spearman
   correlation. The predicted regulator TFs here were selected by: a)
   p.adj < 0.05; b); log2FC > 2; c) combined score > 0; d) expressed in at
   least 10 cells in the corresponding cell population.

Highly variable genes in pre-cDCs inference

   To determine genes with high biological variability in pre-cDCs, we
   first used the method introduced by Brennecke et al. [[201]52] to fit
   the dependence between the mean of normalized read counts and the
   squared coefficient of variation (CV^2) with
   [MATH: <msup><mi
   mathvariant="italic">CV</mi><mn>2</mn></msup><mo>=</mo><mfrac><msub><mi
   >a</mi><mn>1</mn></msub><mi>μ</mi></mfrac><mo>+</mo><msub><mi>α</mi><mn
   >0</mn></msub><mspace width="0.25em"></mspace> :MATH]
   on 92 spike-ins for pre-cDCs in batch 2 only, where μ is the mean of
   normalized read counts, a[1] and α[0] are coefficients obtained from
   the fit. Then, for each gene, we use the fitted technical estimate and
   test against the null hypothesis that the biological coefficient of
   variation is less than 50% (at 10% FDR). This leads to 1389 variable
   genes. Among these genes, 842 genes were detected in at least in 10
   cells and were selected as biological variable genes to perform MDS for
   all the single cells.

Pre-cDCs pre-committed subpopulation visualization

   To identify the differentiation map from pre-cDCs to cDCs, we first
   inferred a list of candidate master regulator [[202]15] TFs that are
   significantly differentially expressed between bulk cDC1 and cDC2 (BH
   p.adj < 0.05, two-fold change) or have regulated targets enriched in
   the differentially expressed genes and exhibit marginal differential
   expression between bulk cDC1 and cDC2 (pvalue < 0.05, two-fold change).
   Then we define the distance between cell x and y as 1- r(x, y), where r
   represents Spearman’s rank correlation coefficient between x and y
   based on the expression level of MR TFs. At last, we use this distance
   matrix as input to t-Distributed Stochastic Neighbor Embedding (t-SNE)
   [[203]53] implemented in R for visualization of the single cells with
   parameters perplexity = 20 and theta = 0.1.

Additional files

   [204]Additional file 1^ (4.9MB, pptx)

   Figure S1. Flow chart of the analysis. Figure S2. Comparison of 3
   studies: See et al., 2017, Villani et al., 2017 and Ma et al., 2018.
   Figure S3. Quality control of single cell sequencing data. (A)
   Sequencing saturation analysis for the 3 batches. (B) Bar plot of
   metrics to assess sequencing quality for all the single cells. Figure
   S4. Outlier analysis. (A) Multidimensional scaling (MDS) plot indicates
   that group 3 is outliers (dots outside of the dashed ovals). (B)
   Boxplot of mapped reads number for good single cells (groups1 and 2)
   and outliers (group 3). (C) Histogram of the percentage of
   mitochondrial reads, genes detected out of 100 housekeeping genesand
   496 cell cycle genes. (D) Heatmap of cell-specific markers for pre-cDC
   and cDCs. Figure S5. Assessment of the purity of the two DC clusters.
   Cmap score for each single cell using DC signature genes from Villani
   et al. (A) and signature genes from our bulk RNA-Seq data (B). (C)
   histogram of weighted sum score with the signature genes from our bulk
   RNA-Seq data.Figure S6. More details about MR TFs between bulk cDC1 and
   cDC2 that potentially drive the pre-commitment of pre-DCs. (A-B)
   Heatmap of MR TFs in bulk data (A) and single cell data (B). (C) t-SNE
   plot of all the single cells with global transcriptome, biological
   variable genes in pre-cDCs, DE genes between bulk cDC1 and cDC2 and the
   MR TFs, with pre-committed pre-cDC subsets marked. (D) Violin plot of
   the expression for the housing keep gene GABARAP. Figure S7. Trajectory
   analysis with Monocle2. Figure S8. Test our hypothesis on three
   published data sets. Test our hypothesis on the dataset of Breton et
   al., [[205]5](A), Villani et al., [[206]7](B) and the dataset in Fig.
   [207]3 of See et al., [208]6(C). (PPTX 5054 kb)
   [209]Additional file 2:^ (288.7KB, zip)

   Table S1. The list of 380 genes that are differentially expressed
   between at least one pair of cell populations in mean expression and
   the gene clustering result. Table S2. A) The list of biological
   variable genes in pre-cDC of batch 2, B) enriched pathways of the
   variable genes and C) upstream regulators of the variable genes. Table
   S3. The list of random selected genes to generate MDS plot in Fig.
   [210]3d. Table S4. The list of cell cycle genes from reactome to
   generate MDS plot in Fig. [211]3e. Table S5. Summary of master
   regulator transcriptional factors. Their expression level comparison
   was shown in groups bulk cDC2 VS. cDC1, single cell cDC2 VS. cDC1 and
   single cell pre-DC2 VS. pre-DC1. For the TFs that have targets enriched
   in the differentially expressed genes between cDC1 and cDC2, the
   evidence from ChEA database (version 2016) was followed. Table S6. The
   list of differentially expressed genes between two pre-cDC
   subpopulations. (ZIP 288 kb)

Acknowledgements