Abstract
Severe immune aplastic anemia is a fatal disease due to the destruction
of marrow hematopoietic cells by cytotoxic lymphocytes, serving as a
paradigm for marrow failure syndromes and autoimmune diseases. To
better understand its pathophysiology, we apply advanced single cell
methodologies, including mass cytometry, single-cell RNA, and TCR/BCR
sequencing, to patient samples from a clinical trial of
immunosuppression and growth factor stimulation. We observe opposing
changes in the abundance of myeloid cells and T cells, with T cell
clonal expansion dominated by effector memory cells. Therapy reduces
and suppresses cytotoxic T cells, but new T cell clones emerge
hindering robust hematopoietic recovery. Enhanced cell-cell
interactions including between hematopoietic cells and immune cells, in
particular evolving IFNG and IFNGR, are noted in patients and are
suppressed post-therapy. Hematologic recovery occurs with increases in
the progenitor rather than stem cells. Genetic predispositions linked
to immune activation genes enhances cytotoxic T cell activity and
crosstalk with target cells.
Subject terms: Haematological diseases, Autoimmune diseases,
Autoimmunity, Haematopoiesis, Gene regulation in immune cells
__________________________________________________________________
The transcriptional phenotype of immune cells associated with severe
aplastic anaemia (SAA) may change post immunotherapy. Here the authors
analyse single cell transcriptomics of hematopoietic and immune cells
from SAA patients and assess how these phenotypes change after
treatment showing alterations in myeloid cells and TCR clonal abundance
correlate with robustness of hematopoietic response.
Introduction
Aplastic anemia (AA) was recognized well over a century ago, by Paul
Ehrlich, and named by Louis Henri Vaquez. Considered rare in Western
countries, AA is far more common in Asia^[54]1–[55]3. Severe aplastic
anemia (SAA) presenting in the community is fatal if untreated, and
effective therapies, stem cell transplant and immunosuppression, have
reversed the dire prognosis—a success story in twentieth-century
hematology^[56]4.
The disease AA can be separated from both much less frequent congenital
syndromes, which include bone marrow (BM) hypocellularity and frequent
iatrogenic suppression of blood cell production due to chemo- and
radiotherapy marrow injury. Immune AA has clinical similarity to
immune-mediated hematologic diseases: large granular lymphocytosis,
paroxysmal nocturnal hemoglobinuria, hypoplastic myelodysplastic
syndrome, and single lineage syndromes like pure red cell
aplasia^[57]5. Additionally, immune AA shares pathophysiologic features
with other human autoimmune diseases, such as multiple sclerosis,
uveitis, and type I diabetes. The study of AA is advantaged by the
ready availability of blood and marrow cells, and techniques to
identify, separate, grow, and manipulate them in vitro.
The best evidence for an immune pathophysiology in AA derives from the
clinical response to immunosuppressive treatments^[58]6; dependence of
blood counts on continuation of such therapies^[59]7,[60]8; and the
occurrence of AA as a rare complication of transfusion of reactive
lymphocytes^[61]9 as well as a toxicity of cytotoxic lymphocyte
activation in cancer immunotherapies^[62]10–[63]12. Murine models
support specific immune mechanisms of marrow destruction and drug
mechanisms of action^[64]13–[65]18.
Many tissue culture experiments also provide evidence of immune
pathophysiology. Examples include colony culture and
co-culture^[66]19,[67]20, cell surface phenotyping by flow
cytometry^[68]21–[69]23, bulk analysis of RNA transcriptomes^[70]24,
and detection of cytokines in the circulation and within
cells^[71]25–[72]28. However, most of these methods require extensive
and non-physiological manipulations and conditions, limitations of
starting cell number, and unpredictable cell loss. Most in vitro
experiments have been conducted with a necessarily limited focus of
interest, due to the paucity of marrow cells in AA and the rarity of
the disease. Reproducibility of results among laboratories is not
insignificant and is likely due to differences in technique.
Remarkably, the pathophysiology of AA continues to be ascribed to an
“immune hypothesis.”
Advanced single cell methodologies—single cell DNA genomics, RNA
sequencing, and proteomics, as well as high-resolution time-of-flight
cytometry (CyTOF)—combined with computational algorithms for the large
amount of generated information, afford an extraordinary opportunity
for dramatic visualization and deep analysis of the affected marrow
compartment, with minimal laboratory manipulation, comprehensive
inclusion of all retrievable cells, and multiplexing of orthogonal
datasets. We and others have reported the successful application of
these methods to human hematologic, genetic, and immune
diseases^[73]29–[74]35, in which modern methods both confirm earlier
conclusions from simpler approaches and provide startling new insights
into pathophysiology.
In this study, we apply advanced single-cell methodologies to primary
BM samples from SAA patients, in the context of in vivo perturbation:
before and after current first-line therapy. We provide a comprehensive
transcriptome atlas of target hematopoietic cells and effector immune
cells in these patients, and investigate the dynamic changes of cell
populations, gene programs, T cell clonality, and association with
response to treatment. Our findings have implications not only for BM
failure syndromes but for other human autoimmune diseases with
analogous pathophysiology.
Results
Patient cohort
The current work is unique in that a prospective clinical protocol
([75]NCT01623167) was the basis for sample collection and specific
experiments. Fresh BM samples were obtained from 20 out of 137 persons
with SAA who were treated with hATG, cyclosporine and eltrombopag in
combination, a standard immunosuppression regimen (IST, Supplementary
Fig. [76]1). The median age (36.5 with range 11–73), sex distribution,
and severity of blood counts were typical of immune AA. At the primary
efficacy endpoint (6 months after IST), 15 patients had achieved
hematopoietic response (R), 4 patients had withdrawn from the protocol,
and 1 was a nonresponder (NR). With a median follow-up time of 24
months, 5 responders had a relapse requiring re-initiation of IST and 4
had developed secondary myeloid malignancy (Fig. [77]1b, Supplementary
Fig. [78]1d, Supplementary Data [79]1). Immunoprofiling of
hematopoietic stem and progenitor cells (HSPCs), cytometry by
time-of-flight (CyTOF), and single-cell RNA sequencing (scRNA-seq), and
single-cell T cell receptor/B cell receptor sequencing (scTCR/BCR-seq)
were performed for 93 BM samples and sorted cell populations of
patients, and 16 total or sorted samples of 4 healthy donors
(Fig. [80]1a).
Fig. 1. Resolving hematopoietic and immune cell complexity using CyTOF and
scRNA-seq.
[81]Fig. 1
[82]Open in a new tab
a Experimental workflow. BMMNC samples from patients and healthy
donors (HD) were analyzed by CyTOF and flow cytometry to profile
hematopoietic and immune cells and subpopulations. BMMNCs and
Lineage^−CD34^+ cells were subjected to scRNA-seq, followed by data
analysis including single-cell transcriptome profiling (gene expression
and cell–cell interaction) and scTCR/BCR profiling. Created in
BioRender. Wu, Z. (2025) [83]https://BioRender.com/t6tfzun. b
Characterization of the patient cohort. c UMAP projection of BMMNCs at
all time points from patients and healthy donors in CyTOF. d Log fold
changes and log P values (with the two-sided unpaired Mann–Whitney
test) of cell abundance between pre-treatment samples of SAA patients
and controls (top) and post-treatment samples (bottom). Red dots
represent lymphoid cells, and blue dots represent myeloid cells. e UMAP
embedding of CyTOF data as shown in Fig. 1c, colored by differential
abundance. Red indicates clusters with patients’ cells (pre-treatment
samples) constituting >95% of cells, and blue clusters with healthy
donors’ cells >95%. f Graphic of Nhoods identified by Milo. Nodes are
Nhoods, colored by log2 fold-changes (log2FC) between pre-treatment
samples of SAA patients (n = 20) and healthy donors (n = 4).
Nondifferential abundance Nhoods (FDR ≥ 0.1) are colored white, and
sizes correspond to the number of cells in a Nhood. Graph edges depict
the number of cells shared between adjacent Nhoods. g beeswarm plot
showing adjusted log2FC of cell populations abundance between
pre-treatment samples of SAA patients and healthy donors in Nhoods. h
A UMAP plot of single-cell gene expression of BMMNCs samples at
different time points of patients (n = 45) and healthy donors (n = 4).
Cells are colored by type (HSPC, ProB as B cell progenitors,
erythroblast, neutrophil lineage, monocytic lineage, CD8^+ T cell,
CD4^+ T cell, NK cell, B cell and plasma cell, dendritic cell, stromal
cell, and platelet). i Frequency (% in BMMNCs) of CD8^+ T, CD4^+ T, NK,
erythroblast, neutrophil, and monocyte in pre- (n = 20), post-treatment
(n = 14) samples of SAA patients and healthy donors (n = 4). P values
with the two-sided unpaired and paired Mann–Whitney test are shown.
Broad, low-resolution assessment of the bone marrow by single-cell sequencing
and cytometry
CyTOF showed marked differences between AA samples at baseline relative
to healthy donors. Unsupervised cell clusters were annotated as
granulocytic lineage, monocytic lineage, NK cells, T and B lymphocytes
and abundance analysis using Milo^[84]36 was performed (Fig. [85]1c);
as expected, proportions of CD8^+ and CD4^+ T cells were
overrepresented while myeloid cells were lower in all patients’ samples
(Fig. [86]1e–g). Subpopulation analyses within cell types, for example
CD8^+ T cells, revealed a less uniform pattern of abundance
(Supplementary Fig. [87]2). After IST, percentages of CD8^+ T, CD4^+ T,
NK, and B cells decreased and myeloid cells increased; however,
subgroup clustering within each cell type again revealed more
complicated and heterogenous responses (Supplementary Fig. [88]3).
These results were confirmed using a different cell-type assignment
methodology and analysis (Supplementary Data [89]2, Supplementary
Figs. [90]4–[91]9, Fig. [92]1d).
Using scRNA-seq, we explored functional heterogeneity, transcriptome
programs, TCR/BCR usage, and their dynamics after treatment in paired
samples from SAA. BM mononuclear cells (BMMNCs) clusters were annotated
to major cell types, including HSPCs and differentiated cells of all
lineages (Fig. [93]1h), and expression of well-established marker genes
was plotted in Supplementary Fig. [94]9e. Similar to CyTOF, there were
increased T lymphocytes and decreased myeloid cells before treatment
compared with healthy donors, and decreased lymphocytes and recovery of
myeloid cells after treatment (Fig. [95]1i).
High-resolution visualization of target and effector cell populations and
imputed function by single-cell methodologies
scRNA-seq and CyTOF can discriminate among subpopulations and allow
inferences of function from the genes and gene pathways expressed (or
suppressed), corresponding cell membrane and intracellular proteins. As
examples, the stem and progenitor cells and lineage-committed precursor
cells can be denominated, and naïve, memory, effector, and regulatory
cell subsets, as well as their state of activation and exhaustion,
clonality, and dynamics, can be distinguished, as discussed in detail
below.
Loss of early hematopoietic stem and progenitor cells and recovery of
late-stage cells after treatment
Overall, lineage^−CD34^+ HSPCs were much reduced in pre-treatment
samples and recovered after treatment. However, for more primitive
CD34^+CD38^− stem and multipotent progenitors, there was a notable lack
of recovery at 6 months, while CD34^+CD38^+ differentiated
lineage-specified progenitor cells (including granulocytic and
monocytic progenitors, GMPs; common myeloid progenitors/progenitors of
erythroid-megakaryocytes, CMPs/MEPs; and lymphoid progenitors, LymPs)
dramatically increased (Fig. [96]2a, b).
Fig. 2. Loss of early-stage and recovery of late-stage HSPCs after treatment.
[97]Fig. 2
[98]Open in a new tab
a Representative plots of phenotypes of HSPCs in healthy donors (HD)
and patients by flow cytometry. b Proportions of progenitor populations
were compared between pre- (n = 20) and post-treatment samples (n = 16)
and HD (n = 4). P values with the two-sided unpaired and paired
Mann–Whitney test. c A UMAP plot of single-cell gene expression in
HSPCs of all patients and HD, colored by cell types as HSC, MEP, GMP,
Tprog, and Bprog. d A heatmap of representative differentially
expressed genes (grouped by functional pathways) of HSPCs between
patients (n = 20) and HD (n = 4) on the top panel, between post-
(n = 16) and pre-treatment samples (n = 20) at the bottom. Values are
presented as log2FC. e A principal component analysis plot of samples
based on cell cluster abundance measured by CyTOF. Pre- and
post-treatment samples and samples from HD were clustered separately. f
A plot showing fraction of variance (gene expression of BMMNCs)
explained by diseases (PT/HD), disease severity (SAA/VSAA), and
treatment (Post/Pre-immunosuppression). g Dynamic changes of lineage
priming of HSCs to LymP, GMP, and MEP, along with pseudotime
differentiation. X-axis, pseudotime ordering from HSCs to
lineage-restricted progenitors estimated by Palantir. Y-axis,
Log(lineage signature gene expression) in pre-treatment (dark orange),
post-treatment samples (orange), and HD (gray). Data are presented as
mean with 1.96SE. P values with the Analysis of Covariance (ANCOVA). h
Dynamic changes of expression levels of transcription factors along
pseudotime differentiation. Y-axis, expression of transcription factors
in pre-treatment (dark orange), post-treatment samples (orange), and
healthy donors (gray). Data are presented as mean with 1.96SE. P values
with ANCOVA. i Dynamic changes of lineage primed HSC fraction in
pre-treatment (dark orange), post-treatment samples (orange), and HD
(gray) along differentiation. Y-axis, Log(cell numbers of HSCs). For
each lineage, cell fractions of early (pseudotime 1–4 on x-axis) and
late stages (pseudotime 5–10) were compared: a ratio in
pre-treatment/HD, and a ratio in post-treatment/pre-treatment. P values
with the two-sided unpaired t-test: HSC to GMP, 0.016 and 0.007; HSC to
MEP, 0.929 and 0.96; HSC to LymP, 0.004 and 0.029.
From scRNA-seq data, sorted CD34^+ HSPCs were clustered to include
hematopoietic stem cells and multipotent progenitor cells (HSCs), MEPs,
GMPs, and LymPs (Fig. [99]2c), validated by expression of
lineage-specific genes, indicating differentiation from HSCs to ME, GM,
and Lym lineages (Supplementary Fig. [100]10). CD34^+ HSPCs from SAA
patients had higher expression of inflammatory pathways, including
hallmark IFN-γ and IFN-α, which were downregulated post-treatment.
Conversely, DNA damage repair and cell cycling genes were downregulated
at baseline and showed upregulation after therapy, accompanied by
increased expression of genes in cell stress pathways (unfolded protein
response, Fig. [101]2d).
Both CyTOF and scRNA-seq data showed incomplete hematopoietic recovery
and continued immune dysregulation after treatment (Fig. [102]2e).
There were dramatic differences in gene expression in BM between SAA
patients and healthy donors, but, the variation was less apparent
between pre- and post-treatment samples (Fig. [103]2f). Differentiation
potentials of HSCs to MEP, GMP, and LymP trajectories in pre-treatment,
post-treatment, and healthy samples were comparable, appropriately
governed by key transcriptional factors (Fig. [104]2g, h), supporting
the inference that destruction of target cells occur after lineage
specification and differentiation rather than biased cell fate
determination.
On the assumption that eltrombopag stimulated late-stage HSCs rather
than stem cells^[105]37, dynamic changes of HSC numbers distributed
along pseudotime differentiation were examined. For pre-treatment
samples, there were less early HSCs for multilineage differentiation,
suggesting that immune targeting of early stem cells resulted in BM
failure. Late-stage HSCs contributed to trilineage differentiation
post-treatment (Fig. [106]2i).
CD8^+ T cells and CD4^+ T cell subpopulations expanded and activated in SAA
CD8^+ T cells from scRNA-seq data of total BMMNCs, plotted along
pseudotime (Slingshot^[107]38) (Fig. [108]3a), showed progressive
differentiation from naïve T cells to central memory and then to
effector T cells, validating the Velocity analysis^[109]39
(Fig. [110]3b). Frequencies of effector memory CD8^+ T cells were
higher at baseline and decreased after treatment, with relative
increases in naïve and central memory CD8^+ T cells (Fig. [111]3c). The
genetic programs of CD8^+ T cells showed progressive gain in effector
memory cell markers and loss of naïve cell markers, and increased
cytotoxicity gene transcription with differentiation, a pattern more
prominent in patients than in healthy individuals (Fig. [112]3d). AA
samples exhibited higher CD8^+ effector memory potential and increased
cell cytotoxicity compared to healthy donors (Fig. [113]3e).
Fig. 3. CD8^+ T cells expanded and exhibited cytotoxicity in SAA.
[114]Fig. 3
[115]Open in a new tab
a In a UMAP only showing CD8^+ T cells, subtypes including naïve,
central memory (CM), and effector memory (EM) were indicated.
Pseudotime based on Slingshot was calculated and colored accordingly. b
RNA velocity analysis of CD8^+ T cell phenotypes independently
confirmed the differentiation trajectory. c Dot plots showing frequency
of naïve, CM, and EM subpopulations of CD8^+ T cells in pre- (n = 20),
post-treatment (n = 18) samples of SAA patients and healthy donors
(n = 4). P values with the two-sided unpaired and paired Mann–Whitney
test are shown. d Plots showing expression dynamics of marker or
functional genes of CD8^+ T cells in pre- and post-treatment samples in
SAA patients and healthy donors, along pseudotime. e Plots of frequency
of effector memory T cells and cytotoxicity gene expression of CD8^+ T
cells in pre- and post-treatment samples of SAA patients and healthy
donors.
We integrated CyTOF data from pre- and post-treatment samples and
healthy donors, and constructed density plots of CD8^+ and CD4^+ T
cells based on Binaryclust data^[116]40; naïve CD8^+ T (cluster 4) and
naïve CD4^+ T (cluster 18) cells were dominant in healthy, nearly
absent in pre-treatment SAA samples and recovered after treatment.
CD8^+ effector T cells (cluster 5) and central memory cells (cluster
15), as well as CD4^+ central memory T cells (cluster 16), were
prominent in baseline samples and decreased after treatment; a pattern
also observed with scRNA-seq (Fig. [117]4a, b).
Fig. 4. CD8^+ T cells and CD4^+ T cell subpopulations expanded and activated
in SAA.
[118]Fig. 4
[119]Open in a new tab
a UMAP plots showing overlay of CD8^+ T cell density in healthy
controls (n = 4), SAA patients pre- (n = 20) and post-treatment
(n = 18), as well as the individual conditions. Dot plots showing
abundance (% in CD45^+ BMMNCs) of CD45RA^+CD8^+CD127^+CD197^+ naïve T
cells (cluster 4), CD45RA^+CD8^+CD57^+ effector T cells (cluster 5),
and CD45RA^+CD197^+CD184^+CD127^+ central memory T cells (CM, cluster
15) in pre- and post-treatment samples of SAA patients and healthy
donors identified by Binary Clust. P values with the two-sided unpaired
and paired Mann–Whitney test are shown. b UMAP plots showing overlay of
CD4^+ T cell density in healthy donors (n = 4), SAA patients pre-
(n = 20) and post-treatment (n = 18) as well as individual conditions.
Dot plots showing abundance (% in CD45^+ BMMNCs) of
CD45RA^+ CD4^+CD197^+CD38^+ naïve T cells (Cluster 18),
CD45RA^+CD4^+CD197^+CD38^+CD127^+CD27^+CD184^+ central memory T cells
(CM, cluster 16),
CD4^+CD25^+CD127^+CD45RO^+CD150^+FOXP3^+CD95^+CD194^+CD39^+ Treg-B
(cluster 8), CD4^+CD25^+CD127^-CD45RA^+Foxp3^+CD197^+ Treg-A (cluster
6), and nonA nonB-Treg (cluster 11) in pre- (n = 20), post-treatment
(n = 18) samples of SAA patients and healthy donors (n = 4) identified
by BinaryClust. P values based on the two-sided unpaired and paired
Mann–Whitney tests. c Ratios of Treg-A and Treg-B abundance versus
CD8^+ effector T cells (cluster 5 and cluster 2) in pre- (n = 20),
post-treatment (n = 18) samples of SAA patients and healthy donors
(n = 4). P values based on the two-sided unpaired and paired
Mann–Whitney tests. d In a UMAP only showing CD8^+ T cells, CD57^+ T
cells, and expression of cytotoxicity genes were indicated. e In a UMAP
only showing CD4^+ T cells, Treg-A and Treg-B were shown. f In a UMAP
only showing CD4^+ T cells, subtypes including naïve, central memory
(CM), and effector memory (EM) were indicated. A bar chart showing
percentages of Treg-B and non-Treg-B CD4^+ T cells as naïve, CM, and
EM. g Gene Set Enrichment Analysis (GESA) enriched plots of
differentially expressed genes for Hallmark interferon gamma response
and Hallmark interferon alpha response in Treg-B compared with
non-Treg-B CD4^+ T cells in pre-treatment samples of SAA patients. GSEA
is based on the one-sided Kolmogorov–Smirnov test. h A heatmap showing
expression of representative differentially expressed genes grouped by
their functional pathways in IFN-γ and IFN-α signaling between Treg-B
and non-Treg-B CD4^+ T cells pre- and post-treatment in SAA patients.
Values are presented as log2FC.
CD4^+ regulatory T cells (Tregs) abundance was similar in SAA samples
before and after treatment and also similar to controls (Supplementary
Fig. [120]5). Both Treg-A (Cluster 6: CD4^+CD25^+FOXP3^+CD95^−) and
Treg-B (Cluster 8: CD4^+CD25^+FOXP3^+CD95^+)^[121]41 decreased after
treatment (Fig. [122]4b, Supplementary Fig. [123]5). We calculated a
ratio of Treg versus CD8^+ effector cells; the ratio of Treg-B (but not
Treg-A) to CD8^+ effector cells (cluster 2 or 5) was increased after
treatment relative to pre-treatment (Fig. [124]4c), consistent with
earlier reports of the potential modulatory benefit of Treg-B in
AA^[125]41,[126]42. A cluster of nonA or nonB Tregs (Cluster 11) in
CD4^+ T cells was lower in pre-treatment than in healthy and increased
after treatment (Fig. [127]4b).
To explore transcriptome features of T cells that were identified by
CyTOF, we annotated CD8^+ and CD4^+ T cell subpopulations in BMMNCs
using scRNA-seq. Effector memory CD8^+ T cells highly expressed CD57
and showed a high cytotoxicity score (Fig. [128]4d). Among CD4^+ T
cells, Treg-B were transcriptionally defined in a confined cell
population while Treg-A were more diverse (Fig. [129]4e). CD4^+Treg-B
were more enriched in the effector memory population (Fig. [130]4f),
and there was increased IFN-γ and IFN-α signaling in Treg-B
pre-treatment compared to healthy donors, which decreased after
treatment, indicating an activated phenotype (Fig. [131]4g, h). The
Fas/FasL pathway score was lower in Treg-B from SAA samples compared to
healthy, likely suggesting resistance to highly inflammatory
environment of BM in SAA (Supplementary Fig. [132]5f).
NK cells, annotated as CD56^bright, CD56^early dim, CD56^dim, and
adaptive NK cells, were similar in patients’ pre-treatment samples as
in healthy donors, but CD56^bright NK cells increased and CD56^early
dim NK cells decreased after treatment, also several CD57^+ subclusters
of NK cells were higher in pre-treatment samples than in healthy
donors, and decreased after treatment (Supplementary Figs. [133]8h and
[134]13d), suggesting a “switch” of activated NK cells to naïve NK
cells.
Effector memory CD8^+ T cells were clonally expanded in SAA, with TCR usage
highly individual-private
T cells were clonally expanded in AA samples. Abundancy of clonally
expanded T cells (>2 cells with an identical TCR) was nearly double in
pre-treatment samples as compared to healthy (average 25% vs 10%).
Post-therapy, clonal expansion was not attenuated, and remained at 23%
of average and with an equivalent Gini index of TCR clonality
(Fig. [135]5a, b, Supplementary Fig. [136]14a–c). No significant
differences were noted in BCR clonality (Supplementary
Fig. [137]14d–f). Expanded TCR clones compared with all other T cells
had higher IFN-γ and cytotoxicity gene expression. Phenotypically,
clonally expanded T cells were mainly of the effector memory population
(Fig. [138]5c). These results were consistent using different cut-offs
in defining T cell clonal expansion (Supplementary Fig. [139]15a–c).
Fig. 5. Effector memory CD8^+ T cells were clonally expanded in SAA.
[140]Fig. 5
[141]Open in a new tab
a Frequency of clonal expanded T cells in pre- (n = 20), post-treatment
(n = 16) samples, and HD (n = 4). P values with the two-sided unpaired
and paired Mann–Whitney test. Clone sizes were projected to a UMAP of T
cells in pre-treatment samples, colored based on clone sizes. b Gini
index of TCR clone sizes in pre- (n = 20), post-treatment (n = 16)
samples, and HD (n = 4). P values with the two-sided unpaired and
paired Mann–Whitney test. c CD4^+ T and CD8^+ T cells expressing the
highest (top 10%) cytotoxicity score and IFN-γ signaling score are
highlighted in red, and all the rest in gray. Expression of T
cytotoxicity genes and TNF-α via NFκB signaling genes was compared
between clonally and non-clonally expanded T cells. P values with the
two-sided unpaired and paired Mann–Whitney test. Bar chart showing
percentage of clonally expanded and nonexpanded T cells as naïve, CM,
and EM. d Expression dynamics of gene scores in averaging increased,
decreased, and stable clones after treatment. P values were generated
using the two-way ANOVA with factors pre/post and
increase/decrease/unchanged clones.
TCR usage in SAA was individual-private rather than disease-specific.
There was very little sharing of top TCR clones among patients; similar
findings were observed among healthy donors (Supplementary
Fig. [142]15). TCR sequences in SAA patients were examined for
inference of potential antigens, for which TCR data from the current
cohort (20 patients and 4 healthy donors) were merged with TCR data
from our previously published AA cohort (12 patients, and 9 healthy
donors)^[143]43. Four hundred and fifty-five CDR3 sequences had a
higher frequency in SAA than in the generation probability. TCR
sequences in SAA patients were annotated to be related with autoimmune,
infectious and neoplasm diseases, higher percentages than in healthy
donors, and they were related to common pathogens including SARS-COV2,
human herpes virus, and influenza virus; but these pathogen-specific
TCR sequences present with a similar frequence in patients and healthy
donors (Supplementary Figs. [144]16 and [145]17). GLIPH2 was used to
explore common TCR clusters and differential abundance in patients and
controls (Supplementary Data [146]3); 24 TCR groups were significantly
more frequent among SAA patients. Each TCR cluster consisted of TCRs
from multiple patients (P < 0.05, Supplementary Fig. [147]18),
indicating more homogenous TCR usage in SAA than in the general
population, but potential antigens (if any) inducing an aberrant immune
response in SAA could not be identified due to limitations of study
cohorts (patient number and heterogenous HLA background) and existing
limited and biased virus databases.
T cell clonal dynamics and hematopoietic recovery after immune suppression
T cell clonal size dynamics after treatment was categorized into 3
groups: increased (>2 fold, treatment-resistant), unchanged (0.5–2
fold, treatment-insensitive), and decreased (<0.5 fold,
treatment-sensitive). IFN-γ signaling in T cells was uniformly
decreased post-treatment in all three groups; a more dramatic decrease
in cytotoxicity and increase in cell apoptosis were observed in
unchanged and decreased clones compared to increased clones
(Fig. [148]5d). For statistical determination, we also defined clone
size dynamics using an alternative approach, edgeR^[149]44,[150]45, and
consistent results were obtained in categorizing clone size dynamics
and transcriptomic changes (Supplementary Fig. [151]15d, e). These
results suggest suppression of T cell functions generally with IST,
regardless of clone size changes post-treatment. Transcriptional score
of cytotoxicity and apoptosis associated with a treatment-resistant
subset of T cell clones.
Among 16 patients who were evaluable at 6 months, 9 of them had clone
size increased and 7 decreased (Fig. [152]6a, b, Supplementary
Fig. [153]20). A clone size change was inversely correlated with
diversity of TCR usage (using Gini index) and with robustness of blood
count recovery (Supplementary Fig. [154]21a–c). New clones, not present
at baseline, predominantly contributed to clone size increase after
treatment; pre-existing clone size decreased in all but two cases
(Fig. [155]6c, d). Acquisition of larger new clones negatively
correlated with hematopoietic recovery (Fig. [156]6e, Supplementary
Fig. [157]21d). We focused on dominant clones that diminished after
treatment (treatment-sensitive) and new clones arising only after
treatment (treatment-resistant). Compared with other clones,
treatment-sensitive clones at baseline and treatment-resistant clones
that arose after treatment had higher IFN (IFN-γ and IFN-α) genes but
lower cell cycling gene expression at baseline, indicating activation
and exhaustion. TCR sequences of pre-existing clones (in baseline
samples) and new clones (in post-treatment samples) overlapped, with
higher similarity compared to TCR sequences identified in healthy
donors. These results are concordant with clonal TCR usage in SAA
patients, suggesting that pre-existing and new clones may target shared
(unknown) antigens. Samples with higher TCR similarity in existing
clones showed higher similarities in new clones and also between new
and pre-existing clones (Fig. [158]6f).
Fig. 6. T cell clonal expansion dynamics associated with hematopoietic
recovery.
[159]Fig. 6
[160]Open in a new tab
a Patients were grouped: clone sizes increased (red, n = 7) and
decreased (blue, n = 9). Data are presented as mean values ± SEM. P
values with the two-sided unpaired Mann–Whitney test were shown. b
Clinical characteristics of patients with clone sizes increased (n = 7)
and decreased (n = 9) post-treatment. P values with the two-sided
unpaired Mann–Whitney test are shown. c New clone sizes in two groups
of patients who had clone sizes increased (n = 7) and decreased (n = 9)
were compared. P values with the two-sided unpaired Mann–Whitney test
are shown. Correlation of new clone sizes with total clone size changes
after treatment. P values and slopes with the Pearson correlation test
are shown. d Clone sizes (frequency %) of pre-existing clones in paired
samples of patients (n = 16) and HD (n = 4). P values with the
two-sided unpaired and paired Mann–Whitney test are shown. e
Correlation of pre-existing clone sizes with new clone sizes (left) and
new clone sizes with therapeutic scores (right). P values and slopes
with the Pearson correlation test are shown. f TCR sequence similarity
is plotted for T cells in HD, pre-existing clones in pre-treatment
samples (pre), new clones in post-treatment samples (post), and between
pre-existing and new clones in post-treatment samples (post). Those who
had higher similarities of pre-existing clones are highlighted in red
across sample groups. P values with the two-sided unpaired Mann–Whitney
test are shown. g Expression of IFN-γ signaling scores was plotted for
all patients (n = 20), for patients who had clone sizes increased
(n = 7) and decreased (n = 9), and HD (n = 4), respectively. P values
with the two-sided unpaired and paired Mann–Whitney test are shown.
Functionally, IFN-γ scores were decreased more dramatically in
individuals with a decreased clone size after treatment compared to
those with an increased clone size (Fig. [161]6g). Higher baseline
IFN-γ, cell activation, TNF, and exhaustion scores correlated with
better blood counts at 6 months, and decreased scores post-treatment
positively correlated with more robust hematologic recovery
(Supplementary Fig. [162]21e, f). Hematopoietic recovery appeared
dependent on effective suppression of pre-existing T cell clones in
size and cytotoxicity, and conversely emergence of new clones
post-treatment was associated with less robust blood count improvement.
Intercellular interactions in patient and healthy marrows
Intercellular relationships can be imputed from expression of
ligand-receptor pairs across cell populations. Analysis using
CellPhoneDB^[163]46 revealed interactions among many cell types, but
cross-talks were most pronounced among immune cell populations, in both
patients and controls. Cell–cell interactions imputed using this
computational approach were enhanced in SAA patients compared to
controls (Fig. [164]7a), suggesting upregulation and cooperation of
innate and acquired immune compartments in disease. Most enhanced
cell–cell interactions at baseline were reduced after IST
(Fig. [165]7b). We plotted overrepresented interactions of HSPCs with
other cell types in SAA and compared with controls (Fig. [166]7c,
Supplementary Fig. [167]22). IFNG secreted by CD8^+ T cells was imputed
to interact mainly with IFNGR on neutrophils and other cell types,
including HSPCs, while IFNG secreted by other cell types interacted
moderately with IFNGR on CD8^+ T cells. Overall, there was
an abnormally increased IFNG/IFNGR interaction between CD8^+ T cells
and other cells in SAA (Fig. [168]7d). CD8^+ T and NK cells were the
major sources of IFNG interacting with the IFNGR on HSPCs; there were
increased IFNG/IFNGR interactions between HSPC and other cell types in
SAA compared to healthy donors (Fig. [169]7e). These aberrant cellular
interactions were reduced after treatment. Ligand-receptor pairs
between HSPC and pre-existing clones, imputed computationally, were
plotted in pre-treatment samples, and between HSPC and residual
pre-existing as well as new clones in post-treatment samples. New T
cell clones exhibited similar ligand-receptor interactions with HSPC as
observed with pre-existing clones, suggesting functional similarity and
similar antigen recognition (Fig. [170]7f). There were more
ligand-receptor pair interactions between early HSCs and CD8^+ T cells
than between late HSCs with CD8^+ T cells (Fig. [171]7g).
Fig. 7. Enhanced cell–cell interactions in SAA and attenuation by therapy.
[172]Fig. 7
[173]Open in a new tab
a Ligand-receptor pairs among cell types in BMMNCs were estimated by
CellPhoneDB^[174]89. Color legends for cell types are the same as in
Fig. [175]2a. The Thickness of lines connecting cell types indicates
the total number of ligand-receptor pairs between two cell types
estimated by CellPhoneDB. In general, there were more ligand-receptor
interactions between cell types in BMMNCs of SAA (left) than in those
of healthy donors (right). CD8T CD8^+ T cell, CD4T CD4^+ T cell, NK
natural killer cell, CD34 CD34^+ cell, ProB proB cell, B_Plasma B
cell_Plasma cell, Ery erythroblast, Neut neutrophil, Mono monocyte, DC
dendritic cell. b Heatmaps showing LogFC of cell–cell interaction
scores estimated by CellPhoneDB across cell types in BMMNCs
post-treatment as compared to pre-treatment. c Ligand-receptor pairs
that were overrepresented in SAA patients than in healthy donors across
cell types in the BMMNCs are presented, with HSPCs as sender (left) and
receiver (right). Significance indicates if the ligand-receptor pair is
overrepresented in pre-treatment samples compared with controls. d
Cell–cell interactions of IFNG/IFNGR. Heatmaps depicting the
interactions of IFNG and IFNGR1/2 between CD8^+ T cells (left, as
sender and right, as receiver). A sum of IFNG/IFNGR interactions in
CD8^+ T cells with all other cell types in the BM is shown at the
bottom for healthy donors, pre-, and post-treatment samples. e
A heatmap depicting interactions of IFNG and IFNGR1/2 between various
cell types in the BM (as sender) and HSPC (as receiver). Sum of
IFNG/IFNGR interactions in HSPCs with all other cell types in the BM is
shown at the bottom for healthy donors, pre-, and post-treatment
samples. f Heatmaps showing ligand-receptor pairs between HSPCs and
pre-existing T cell clones in pre-treatment samples (left), and between
HSPCs and residual pre-existing T cell clones, and HSPCs and new T cell
clones (right). g Heatmaps showing numbers of ligand-receptor pairs
between CD8^+ T cells and early-stage HSCs, late-stage HSCs, HSCs, and
progenitors.
Malignant clonal evolution at single-cell resolution
Development of secondary myeloid neoplasm is the most serious long-term
complication in SAA patients treated with IST. In the current series,
UPN10 suffered high-risk evolution to monosomy 7 at 6 months after
treatment, and there was a large proportion of monosomy 7 (7-) cells in
post-treatment CD34^+ HSPCs detected by scRNA-seq (but not observed in
total BMMNCs, Supplementary Fig. [176]23a, b, d–f). Gene expression was
compared between 7- and diploid cells; genes involved in immune
response pathways (including TNF-a signaling via NF-κB signaling,
inflammatory response, and interferon gamma and alpha response) were
downregulated in 7- cells, consistent with our original
report^[177]30. There was a significant increase of cell cycling gene
expression in 7- cells at disease evolution (Supplementary
Fig. [178]23c). We did not identify aneuploid cells in pre-treatment
sample from this patient (Supplementary Fig. [179]23g); other patients
with chromosomal abnormalities were not included in the analysis due to
timing of sampling, a clone size, or a small fraction of cells with
aberrant chromosomes (Supplementary Data [180]1).
Constitutional risk factors in an immune-mediated disease: germline variants
related to specific cell types
To evaluate for genome-wide heritable genetic factors in AA and
disease-associated cell populations in BM, we integrated the current
scRNA-seq data with polygenic signals from GWAS. A gene-level
association analysis was performed using MAGMA^[181]47. GWAS-based
pathway enrichment analysis revealed that top 100 genes associated with
AA (Fig. [182]8a, b) were enriched in immune response pathways,
including IFN-γ and IFN-α (Fig. [183]8c). We then examined the most
involved cell types by computing a disease score (scDRS)^[184]48 for
each single cell in our current scRNA-seq dataset, based on the
disease-specific genes identified by MAGMA. Disease scores were similar
across BMMNC cell types in controls, but highest for CD8^+ T and NK
cells in SAA patients (Fig. [185]8d, Fisher test, P = 0.04). For cell
populations examined using RolyPoly^[186]49, CD8^+ T cells were the
most relevant (Fig. [187]8e). We next examined if these top
disease-associated genes exhibit cell-type-specific expression. S1PR5
was preferentially expressed in NK and CD8^+ T cells, CCDN2 in CD4^+ T,
CD8^+ T, and NK cells, and DGKA in CD4^+ T cells. DERL3 had
preferential expression in B and plasma cells; expression of FBP1,
IRF5, VAMP3, and NFKB1 was largely limited in myeloid cells
(Fig. [188]8f, g). Further analysis was thus concentrated on a few
subpopulations: S1PR5^+ NK and CD8^+ T cells, and IRF5^+ and NFKB1^+
myeloid cells.
Fig. 8. Genetic risk and cell-type-specific disease-related variants
contribute to disease phenotype.
[189]Fig. 8
[190]Open in a new tab
a A Manhattan plot of GWAS analysis highlighting risk genetic loci for
AA, based on GCST9001879 data, in which P values were calculated by the
generalized linear mixed model with adjusting for age, sex, and top 20
principal components. b A Circus plot showing top risk genes of AA of
MAGMA analysis. The outer ring demonstrates 22 autosomal chromosomes.
In the inner ring, a circular symbol represents a specific risk gene. c
GESA enriched plots of risk genes identified in GWAS analysis were
enriched in Hallmark interferon gamma response. GSEA based on the
one-sided Kolmogorov–Smirnov test. d UMAP embedding BMMNCs from healthy
donors (left) and SAA patients (right), colored by scDRS disease scores
calculated from GWAS summary statistics. e A dot plot showing results
of combination of scRNA-seq data and GWAS summary statistics on AA
based on RolyPoly analysis among patients. Y-axis shows mean negative
log-transformation P value (-Log2P), x-axis shows cell types. P values
(one-sided) were calculated by estimating the association of each cell
type with a phenotype. f A heatmap showing percentages of cells
expressing selected risk genes in multiple cell types. g Same UMAP
projection of BMMNCs as in Fig. [191]2a. Expression of S1PR5 and IRF5
is plotted, showing cells (in red) with top 10% expression levels of
S1PR5 and IRF5. h Frequency of S1PR5^+CD8^+ T cells and S1PR5^+NK cells
were compared between pre-treatment (n = 20) and post-treatment samples
(n = 16) of SAA patients and healthy donors (n = 4). P values with the
two-sided unpaired and paired Mann–Whitney test are shown. i Immune
activation scores in S1PR5^+ and S1PR5^− CD8^+ T cells, S1PR5^+ and
S1PR5^− NK cells, and IRF5^+ and IRF5^− myeloid cells. P values with
the two-sided unpaired and paired Mann–Whitney test are shown. j Top
downregulated gene pathways in NFκB^+ compared with NFκB^− myeloid
cells. k Cell interaction scores calculated by CellPhoneDB in S1PR5^+
and S1PR5^−CD8^+ T cells, NK cells, IRF5^+ and IRF5^− myeloid cells,
and NFκB^+ and NFκB^− myeloid cells.
The frequency of S1PR5^+ NK cells and S1PR5^+CD8^+ T cells was higher
in pre-treatment samples than in healthy donors, and significantly
decreased after treatment (Fig. [192]8h). S1PR5^+ NK cells,
S1PR5^+CD8^+ T cells, and NFKB1^+ myeloid cells had higher expression
of many immune activation genes in the corresponding cell population,
compared with negative cells (Fig. [193]8I, j). There were increased
interactions of S1PR5^+ NK cells and S1PR5^+CD8^+ T cells with other
cells, compared to interactions of negative cells, and also increased
interactions of IRF5^+ myeloid cells and NFKB1^+ myeloid cells with
other cells compared to interactions with negative cells
(Fig. [194]8k). These data, uncontrived in silico, implicate genetic
backgrounds as contributors to disease phenotype: cell-type specific
expression S1PR5^+CD8^+ T to dysregulated immune activation, and
IRF5/NFKB1^+ myeloid cells to exaggerated inflammation when interacting
with lymphocytes.
Discussion
Single-cell methodologies, especially those based on nucleotide
sequencing, are now appreciated as much more than a technical
improvement, but rather revolutionary in the depth, breadth, and
complexity of the results that they generate^[195]50,[196]51.
Single-cell methods have marked advantages: unbiased as to data
collection, minimal in vitro perturbation, and easily shared (and
mined) data files. Single-cell techniques are particularly well suited
to examining limited cell numbers, as in the marrow failure syndromes,
and to hematologic diseases in general, for which blood and marrow
cells are readily obtained in the
clinic^[197]29–[198]33,[199]52,[200]53; even from patients with SAA. In
the current work, we prospectively incorporated single-cell
methodologies into a clinical research protocol, in order to compare BM
before and after a defined therapeutic regimen that leads to improved
blood counts and sustained remissions in this disease. Single-cell
methods have been criticized for their reliance on “black box”
algorithms and analytical pipelines, and the issue of “validation,” as
initially required for the measurement of RNA abundance or avoidance of
DNA errors, and later for the requirement of “functional” confirmation,
despite the absence of satisfactory statistical standards or
privileging of less accurate historical assays. We sought to avoid such
problems by performing simultaneous orthogonal testing for internal
validation of results, as well as comparison with similar, if more
restricted, published datasets^[201]43.
Immune AA is broadly understood as the destruction of hematopoietic
stem cells by T cells. By conventional histology, the appearance of the
affected organ, the marrow, is dramatic but bland, with replacement of
a hematopoietic tissue with fat. Histochemistry is unrevealing, with
variable residual “stromal” elements and lymphocytes. Methods reliant
on cell separation are also fraught due to the paucity of targets
(HSPCs absent or severely decreased) and effectors (CD4 and CD8+
lymphocyte populations within mononuclear cell sedimented fractions).
In striking contrast, single-cell methods using freshly isolated and
minimally manipulated total BM cells allowed comprehensive
visualization of heterogeneous populations—confirming suspected
features of stem cell loss and immune pathophysiology, revealing
unexpected complexity of interactions among cell populations, and
allowing inferences concerning specific cell function. Of importance,
comparison of marrow on presentation of disease and after effective
therapy indicated surprising features of incomplete stem cell
repopulation, dynamic effector T cell oligoclonality, and a complex
network of hematopoietic and immune cell relationships.
Some of our results, satisfyingly, confirm and expand suspected and
accepted components of the immune pathophysiology of AA. HSPCs as
targets have been difficult to study quantitatively because of their
scarcity and poor correlation with blood counts. We unexpectedly
observed that clinical hematologic recovery was not accompanied by an
increase in stem cells in the BM, but among the heterogeneous
population of early multipotential and lineage-committed progenitor
cells. In retrospect, this result is less surprising and likely
explanatory: AA is a relapsing syndrome, especially early after
immunosuppression^[202]8,[203]54; eltrombopag appears to preferentially
expand progenitors^[204]37; and attempts to measure and harvest CD34
cells post-hematologic recovery have failed. The status of stem and
progenitor cells at more distant time periods after treatment and after
years of stable near-normal blood counts, as well as the
characteristics of the hematopoietic compartment in the setting of
relapse, should be examined in future studies. Malignant clonal
evolution and the development of myeloid neoplasia in immune AA
were not the focus of the current work and remains difficult to study
systematically. Nevertheless, we detected monosomy 7 cells, acquired at
6 months in a single patient, to exhibit upregulation of cell cycling
genes, not seen in similar patients who received immunosuppression
alone^[205]30, possibly indicating stimulation of aneuploid cells by
eltrombopag, a growth factor that acts on stem cells. Pharmacologic
doses of a thrombopoietin mimetic may elicit proliferation of deficient
monosomy 7 cells that are unresponsive to physiologic thrombopoietin
concentrations.
For the immune system, single-cell methods provide a global view of
manifold components and a striking visualization of the effector cells
and associated regulatory elements. Cytotoxic CD8^+ T cells have long
been suspected to be the main proximal effectors of tissue
destruction^[206]26,[207]55: this population is now dramatically
evident at low resolution, in all patients at presentation, as
dominant, and the apparent target of treatment. Single-cell methods
disclose considerable heterogeneity of tissue transcriptomic programs
and cell surface phenotypes within the CD8^+ T cell compartment; some
subsets are strikingly abnormal in number and inferred function, and
correlate with clinical status. Subsets of effector memory cells can be
inferred to interact with hematopoietic target populations, and their
transcriptomes show upregulation of IFN and other cytokine signaling
pathways. Features of oligoclonality, previously reported at
comparatively low resolution^[208]43,[209]56, are also revealed in
detail: TCR oligoclones were limited in number but common to all
patients examined; they showed private, not disease-specific
specificities, although there were shared TCRs among patients compared
to healthy controls. The inciting antigenic events in immune marrow
failure syndromes are not known; they have been postulated to derive
from antigenic mimicry with epitopes of frequently encountered
pathogens^[210]57,[211]58. Cross-reactivity of TCRs from AA cases with
herpesvirus antigens has been recently reported^[212]59. Although not
the main focus of the present work, addressing viral reactivity from
single RNA sequences is limited by patient numbers (a large enough
cohort with a heterogenous HLA background is required) and the existing
virus databases (limited in sequence coverage and biased). CD8^+
effector T cell oligoclonality appeared dynamic over time, with early
clones decreasing or disappearing with treatment, while new clones also
targeting stem and progenitors arose after cytotoxic lymphocyte
reduction, and during and despite suppressive immune therapy. Selection
for TCR and clones may be based on resistance to such treatments as
well as through “antigen spread,” uncovering of novel epitopes
expressed on damaged and regenerating marrow cells.
The behavior of specific CD8^+ cell subsets correlated with clinical
responsiveness, especially the robustness of hematologic recovery. Our
results support previous observations in a pediatric very SAA (VSAA)
cohort (n = 10) that the highly skewed TCR repertoire in the BM is
associated with response to immunosuppression^[213]60, which we
investigated further in a larger adult cohort and with clonal dynamics
before and after treatment. An initial larger total T cell clone size,
lower TCR diversity, and higher clonality pre-treatment associated with
a decreased clone size after treatment; conversely, patients with
a smaller total T cell clone size, higher TCR diversity, and lower Gini
index showed an increased clone size after treatment. Clonotype
dynamics were associated with clinical course in a few cases who were
assessed longitudinally^[214]43,[215]56; not being explorative, these
studies only identified and tracked one or a few top pre-existing
clones, without examination of global clonal dynamics patterns or new
clones. Indeed, pre-existing clones almost uniformly decreased with
immunosuppression, and new clones almost exclusively contributed to
clonal expansion after treatment. Clonal dynamics inferred from serial
samples and single-cell TCR usage suggest that IST is effective in
suppressing pre-existing clones, but newly arising clones may limit
blood count recovery. These results allow re-interpretation of clinical
observations, such as the paradoxical action of more intensive immune
regimens, which are generally inferior to milder conventional therapy
in treatment-naïve cases yet can salvage patients refractory to initial
therapy, often leading to sustained hematologic
recovery^[216]61–[217]64. Monitoring T cell clonal dynamics could guide
intensification strategies for patients showing inadequate recovery.
At single-cell resolution, there was heterogeneity across and within
other immune cell types. While Treg numbers were not reduced compared
to matched healthy donors and were not changed after therapy, frequency
of Treg-B (specified by higher IL-2 receptor, FAS and CD45RO
expression) was higher in pre-treatment samples was surprising^[218]23,
but most likely due to BM rather than blood as the source of samples;
regulatory T cells may relocate to the site of inflammation.
Additionally, as our cohort included many patients with VSAA, in which
Tregs would be exposed to higher levels of inflammation, the most
functional and proliferative Tregs expressing CD95 (FAS) would be
sensitive to FASL-induced apoptosis and adopt a “protective” phenotype
by downregulating FAS and their functional suppressive
capabilities^[219]41,[220]42.
In general, our analysis of immune cells in AA BM is consistent with
previously published work using scRNA-seq, describing dominant CD8^+
cytotoxic T cells, oligoclones based on TCR, and pathophysiologic
effector memory cells^[221]50,[222]51,[223]65–[224]67. However, we
could not identify T cells with clear viral targets, as mentioned, and
in addition, our results and a comparable patient dataset differ in the
identification of an NKT cell population distinct to AA. We would note
that NK cells have generally been reported to be decreased in number
and activity in AA. Only enriched NK cells were identified by CyTOF due
to the limitation of the antibody panel, and their number was not
significant in the current series, but a few CD57^+ cytotoxic NK
clusters were overrepresented in disease and were suppressed after
treatment. Ongoing collaborative work seeks to further resolve this
result.
That our previous view of the pathophysiology of AA has been too simple
is supported by at least two new analyses in our study. First,
utilizing a computational method for imputing cell–cell interactions
from ligand-receptor pairs, we visualized a highly complex network of
relationships that cross traditional boundaries of lineage,
differentiation, and compartment. These suggestive individual
relationships invite experimental verification but may already inform
translational reports and animal models. Some undoubtedly represent the
complexity of a pathological immune response. Yet even in healthy
donors, interactions were inferred between mature and immature cells,
between myeloid and non-myeloid cell types, and among immune system
cells—many intensified in diseased marrow, and potential targets of
novel therapeutic interventions. Interactions between CD8^+ T cells and
HSPCs are among the strongest: IFNGR on HSPC interactions with IFNG
from CD8^+ T and NK cells are the strongest. Second, when we linked our
RNA sequencing results to existing GWAS datasets, we discovered
multiple immune-related genes that potentially suggest a germline
background to an acquired immune disease. Polymorphisms may indicate
constitutional genes as genetic susceptibility factors and disease
modifiers, based not simply on association but on plausible causal
relationships. Such relations may include vulnerability to unregulated
responses to antigen stimulation, activation of myeloid cell targets in
response to T cell attack, and exaggerated specific immune pathway
responses, especially for IFN-γ signaling and production. As one
example, increased cell–cell interactions between S1PR5/IRF5^+
cytotoxic cells and other T cell and major marrow cell types accompany
enhanced crosstalk, in a genetic background immune activation,
particularly of the IFN-γ pathway.
Among the obvious limitations of a sample size, clinical follow-up,
drop-out RNA sequences, and antibody selection for cytometry, we note
more specific difficulties. First, there were too few non-responders
after highly effective therapy to allow an analysis of categorized
response. Second, computational imputation of cell–cell interactions
through ligand and receptor gene expression has been widely used, and
algorithms now incorporated curated interacting proteins from
well-established databases and consider multi-subunit structures for
accuracy, but datasets are by far from complete and accurate; testing
by individual functional assays was not feasible, nor is there a
statistic to validate the validation procedure itself. Nevertheless and
important observations from scRNA-seq were consistent with CyTOF and
conventional flow cytometry, with historical cell co-culture
observations decades ago^[225]19,[226]20, and with other published
single-cell data in human marrow failure
syndromes^[227]50,[228]51,[229]64–[230]67. Third, inference from GWAS
data was made with publicly available datasets, and linking germline
background with transcriptome data was indirect. Although costly,
simultaneous GWAS and single-cell multi-omics (genomics,
transcriptomics, and phenotyping) in individual patients could provide
more comprehensive and direct information as to the cell- and cell
population-specific roles of genetic background in modulating
disease^[231]68,[232]69. Fourth, although in theory there is a large
pool of TCR sequences in the population, profiling of a “complete” T
cell clonotype in the current and many other studies is inevitably
limited by a patient cohort and individual heterogeneity, availability
of samples, disease and pathological or biological conditions, cost,
and sequencing techniques. Last, direct and more detailed comparison of
our results with published work using CyTOF and/or scRNA-seq was not
undertaken, due to heterogeneity of patient cohorts (disease severity
and treatment regimens) and specimen types (peripheral blood, BM, and
isolated cell populations), and variations in algorithms for cell
identification from study to study. Integration and comprehensive
analysis of currently available datasets of AA patients and, more
broadly, datasets in other autoimmune diseases are desirable in future
mining expeditions. “Incompleteness” is not a fault of single-cell
methodologies but a powerful feature of these novel, developing
approaches.
The vast amounts of data from single-cell experiments and the
complexity of the analytic programs limit any single descriptive
report, but one publication from a single laboratory is only the
beginning of global analyses and extension of reported results.
Multiple sclerosis, uveitis, diabetes mellitus, and inflammatory bowel
disease share pathophysiologic features with immune AA and respond to
immunosuppressive therapies. Our approach offers a guide to laboratory
investigations of these and other diseases of altered immunity and
tissue destruction.
Methods
Patient cohort
We conducted a prospective phase 2 study under the protocol
([233]www.clinicaltrials.gov [234]NCT01623167) approved by the
Institutional Review Boards of the National Heart, Lung, and Blood
Institute, in accordance with the Declaration of Helsinki. A total of
139 persons were enrolled: their SAA had not been definitively treated
with ATG-based IST, and they lacked a suitable matched sibling marrow
donor or were not candidates for hematopoietic stem cell
transplantation, due to patients’ choices, advanced ages, or
socioeconomic factors. Patients, 2 years of age or older, were enrolled
from December 2014 to May 2022 and treated with hATG, CSA, and
eltrombopag combination (hATG from day 1 for 4 days, therapeutic dosing
of CSA from day 1 for 6 months, and eltrombopag starting from day 1 to
6 months). All patients met clinical criteria for SAA. For persons who
met the criteria for very SAA (VSAA; absolute neutrophil count
(ANC) ≤ 200 × 109/l), treatment was initiated based on morphologic
confirmation while cytogenetic studies were pending. Age- and
sex-matched healthy donors were enrolled as controls under protocol
[235]NCT00442195 in NHLBI: UPN111: 62-year-old female; UPN112,
27-year-old female; UPN113, 28-year-old female; UPN114, 23-year-old
male.
BM processing
BM specimens were obtained from patients and healthy donors after
written informed consent under the corresponding protocols, and
processed within 6 h after collection. BMMNCs were isolated from each
person by density centrifugation using LSM Lymphocyte Separation Medium
(Cat# 50494X, MP Biomedicals). Briefly, BM was diluted twofold using
phosphate-buffered saline (PBS) (Cat# 17-516Q, Lonza), layered on top
of 1 volume LSM Lymphocyte Separation Medium in a 50-ml Falcon tube,
and spun down at 1140 × g for 25 min at room temperature with the brake
off. A BMMNC layer was isolated and washed with PBS after red blood
cell lysing with ACK lysing buffer (Cat# 118-156-101, Quality
Biological). BMMNCs were resuspended in the IMDM (Cat# 12440053, Thermo
Fisher Scientific) + 2% fetal bovine serum (Cat# 12306C, Sigma-Aldrich)
before fluorescence-activated cell sorting (FACS) to enrich
lineage^−CD34^+ HSPCs. BMMNCs were stained in 50 μl volume with
monoclonal antibodies for 30 min on ice: 5 μl anti-human lineage
cocktail (CD3, CD14, CD16, CD19, CD20 and CD56; clones UCHT1, HCD14,
3G8, HIB19, 2H7, and HCD56, respectively, Cat# 348805, Biolegend) in
Pacific Blue; 5 μl anti-CD34 Ab (clone 581, Cat# 555822, BD
Biosciences) in PE and 5 μl anti-CD38Ab (clone HIT2, Cat# 555462, BD
Biosciences) in APC. Cells were sorted using the FACSAria Fusion Flow
Cytometer (BD Biosciences). Aliquots of BMMNCs were subjected to
multi-color flow cytometry to profile HSP subpopulations. BMMNCs and
purified lineage^−CD34^+ cells were subjected to scRNA-seq.
Flow cytometry profiling of HSPCs
BMMNCs were stained in a 50 μl volume with antibody mixtures on ice for
30 min in RPMI 1640 (Cat# 11875093, Life Technologies). Samples were
subsequently acquired using the BD LSR Fortessa cytometer (BD
Biosciences), followed by post-acquisition analysis using Flowjo
software (v.7.6.4; Flowjo LLC, BD Biosciences). Antibodies used for
flow cytometry analyses were: 5 μl anti-human lineage cocktail (CD3,
CD14, CD16, CD19, CD20, and CD56; clones UCHT1, HCD14, 3G8, HIB19, 2H7,
and HCD56, respectively, Cat# 348805, Biolegend) in Pacific Blue; 5 μl
anti-human CD34 in PE (clone 581, Cat# 550761, BD Biosciences), 5 μl
anti-human CD38 in APC (clone HIT2, Cat# 555462, BD Biosciences), 5 μl
anti-CD90 in FITC (clone 5E10, Cat# 328108, Biolegend), 5 μl anti-human
CD10 in BV605 (clone HI10A, Cat# 562978, BD Biosciences), 5 μl
anti-human CD135 in PE/Cy7 (clone BV10A4H2, Cat# 313314, Biolegend),
and 5 μl anti-human CD45RA in BV510 (clone HI100, Cat# 304142,
Biolegend). Cell populations were defined as reported^[236]36: HSC,
Lineage^−CD34^+CD38^−; CMP/MEP, Lineage^−CD34^+CD38^+CD10^−CD45RA^−;
GMP, Lineage^−CD34^+CD38^+CD10^−CD45RA^+; LymP, Lineage^−CD34^+ CD38^+C
D10^+.
CyTOF
A panel of antibodies based on surface markers and transcription
factors (Supplementary Data [237]2) was designed to identify different
cell populations for CyTOF analysis. Most meta-isotope-labeled
antibodies were purchased from Fluidim. Some unlabeled antibodies were
purchased from BioLegend and eBioscience, and were tagged with rare
metal isotopes in King’s College London, UK. BMMNCs (10 × 10^6)/patient
were stained with surface antibody mixtures on ice for 30 min in
staining buffer. Subsequently, intracellular staining for transcription
factors was performed after fixation and permeabilization according to
the manufacturer’s instructions (eBioscience). The CyTOF-2 mass
cytometer (Fluidigm) was used for data acquisition. Acquired data were
normalized based on normalization beads (Ce 140, Eu151, Eu153, Ho165,
and Lu175)^[238]70. Automated clustering was performed on a subset of
800,000 cells sampled from all individuals.
Data processing, scale transformation, automated clustering, and distance
computations
Data were initially processed and analyzed using Cytobank^[239]71.
CD45^+ cells were gated to eliminate debris, doublets, and dead cells.
The stand-alone analysis tool cyt was also used for performing t-SNE
dimensionality reduction and merging distinct FCS files^[240]72. We
analyzed mass cytometry complex data using viSNE^[241]72 to visually
identify delineated subpopulations^[242]73 in combination with
SPADE^[243]74 and heat maps to distinguish different types of cells.
Identifying cell populations with unsupervised machine learning
We employed a data analysis pipeline inspired by the T-REX
algorithm^[244]75 to uncover cell clusters associated with treatment
response in BM samples. This involved analyzing data from 33 patients
before and after treatment, and 4 healthy controls. We used a custom R
workflow integrating dimensionality reduction (UMAP), nearest neighbor
search (KNN), and marker enrichment modeling (MEM). The analysis
scripts are publicly available online
([245]https://github.com/cytolab/T-REX). UMAP was used to compress
high-dimensional data while preserving relevant relationships. Default
parameters from the uwot package were employed. KNN identified nearest
neighbors for each cell using the low-dimensional data. Optimization
based on tetramer enrichment determined the optimal k value (number of
neighbors) to be 100. Based on the KNN graph, a change in cell
abundance between two samples was calculated.
Regions were analyzed using DBSCAN for clustering and MEM for detailed
phenotype characterization. Regions demonstrating significant changes
(<5% and >95%) were highlighted as orange and blue, respectively.
[246]MEM was applied to assess feature enrichment within KNN regions
surrounding each cell. Instead of requiring a reference control, a
statistical null reference was employed, representing a median IQR of
all analyzed features. Enrichment values were mapped from 0 to +10.
Differential abundance analysis
We used Milo (v1.2.0) to test for the differential abundance of cells
within the inferred neighborhoods, between two conditions (that is, SAA
patients versus healthy controls or post-treatment versus pre-treatment
samples). We first used the buildGraph function to construct a KNN
graph with k = 100. Next, we used the make_Nhoods function to assign
cells to neighborhoods based on their connectivity over the KNN graph.
The neighborhoods were projected to UMAP for visualization. To test for
differential abundance, Milo fit a Non-Binomial GLM to the counts of
each neighborhood, accounting for different numbers of cells across
samples using the Trimmed Mean of M values normalization. The spatial
FDR and log2foldchange of the number of cells between two conditions in
each neighborhood were used for visualization. The beewarm plot of the
distribution of log2foldchange across neighborhoods was used to present
the differential abundance of cell populations.
BinaryClust
CyTOF analysis was further performed, separately, in a semi-supervised
way using an in-house pipeline, BinaryClust
([247]https://github.com/desmchoy/BinaryClust)^[248]40. Briefly, FCS
files were imported using the R package flowCore. Data were arcsinh
transformed with a cofactor of 5, and key channels of CyTOF data had an
expected behavior of being log normal with zero inflation (confirmed
with the function examine_data). By applying 2-means clustering for
each of the key channels, it was possible to binary classify each
marker into positive and negative populations to generate a
“classification matrix” for all the cell subpopulations. BinaryClust
was applied to each sample separately in a parallel manner. Cell
abundances and median expression for each marker were then tallied for
each sample. Differential expression and abundances between conditions
were then computed in a pairwise manner. Statistical significances were
determined by the Mann–Whitney U tests with the Benjamini–Hochberg
corrections. Summary heatmaps of differential abundances and expression
results were plotted using the R package ComplexHeatmap.
Density plot
Processed CyTOF data from BinaryClust has been exported to Python and
used for further analyses with the “scanpy” package (version
1.9.6)^[249]76. The neighborhood graph for the UMAP has been calculated
using the “neighbors” function with the parameters of 30 neighbors and
a cosine distance metric. The density has been calculated with the
“embedding_density” function.
Definition of a predicative index and a therapeutic index
Serial sampling of acutely ill patients is difficult, but rigorously
processed tissues, collected before therapy and after response, allows
for strong inferences as to pathophysiology and drug mechanisms of
action. As there was only a single non-responding patient, direct
comparisons between R and NR as categorical variables could not be
performed. Instead, we generated predictive and therapeutic indices by
conversion of blood cell counts to scores as continuous variables. This
method yielded correlations of baseline parameters and hematologic
response, a predictive index, and dynamic changes in parameters to
correlate with hematopoietic recovery at 6 months, a therapeutic index
(Supplementary Figs. [250]11 and [251]12)^[252]77.
To circumvent the limitation of sample sizes and the fact that only one
non-responder at a primary end point, we designed two indices:
[MATH: Pi :MATH]
and a
[MATH: Ti :MATH]
to leverage the single-cell and clinical data in a continuous axis, and
thus could detect consistent patterns among immune cell compositions
and their connections with blood counts after treatment in all
patients. We calculated a blood count score for each sample, combining
white blood cell counts, hemoglobin level, and platelet counts to
represent trilineage hematopoietic recovery. Hematopoietic score
changes were calculated by changes of blood count scores between post-
and pre-treatment samples. The
[MATH: Pi :MATH]
was to calculate the correlation of baseline parameters with blood
count scores at 6 months after treatment; the Ti was to calculate the
correlation of changes of parameters (cellular proportion, dynamic gene
expression, TCR clonality, and others) with changes of blood count
scores between post- and pre-treatment samples^[253]78.
We converted ANC, platelet count, and absolute reticulocyte count (ARC)
at baseline and post-treatment to blood count scores on the same scale
based on blood count distributions, by calculating their distance[times
of standard deviation] to the median of those from two large Caucasia
cohorts^[254]78,[255]79. First, as expected, ANC, platelet, and ARC
scores were positively correlated with ANC, platelet count, and ARC; we
defined an average of these three counts as a “hematopoietic score”,
and a change of this average score post-treatment as a “hematopoietic
recovery score.” We checked validity of this approach by correlating
ANC, platelet count, ARC, and the average hematopoietic scores with
disease severity and response to treatment, and found these scores
representing residual hematopoiesis and recovery, were largely
consistent with responses (complete respnse (CR), partial
response (PR), and NR; and CR/strong PR and NR/weak PR) (Supplementary
Figs. [256]11 and [257]12). In both pre- and post-treatment samples, we
identified a myeloid cell abundance largely positively correlated with
the hematopoietic score while a lymphoid cell abundance largely
negatively correlated with the hematopoietic score (Supplementary
Fig. [258]13a). Baseline myeloid cell abundance predicted a better
response at 6 months and baseline lymphoid cell abundance negatively
strongly correlated with hematopoietic recovery (Supplementary
Fig. [259]13b). As expected, increased myeloid and B cell abundance and
decreased CD8^+ T and CD4^+ T cell abundance following therapy
correlated with recovery (Supplementary Fig. [260]13c).
scRNA-seq library and scTCR/BCR-seq library preparation and sequencing
scRNA-seq coupled with scTCR/BCR-seq analysis was performed with the
10x Genomics System using the 10x Genomics Single Cell Immune Profiling
Solution v 1.1 (the Chromium Single Cell 5′ Reagent Kit v1 and v1.1,
Cat# 1000165, 10x Genomics), following the manufacturer’s protocols
([261]www.10xgenomics.com)^[262]80. Briefly, BMMNs and FACS-sorted BM
lineage^−CD34^+ cells were washed with 1X PBS with 0.04% (wt/vol)
bovine serum albumin. Cell concentration and viability were determined
using the Countess II automatic cell counter (Thermo Fisher Scientific)
and the trypan blue staining method. Cell loading and capturing were
done on the Chromium Controller. Following reverse transcription and
cell barcoding in droplets, emulsions were broken, and cDNA was
purified using Dynabeads MyOne SILANE, followed by PCR amplification.
Amplified cDNA was then used for 5′ gene expression library
construction and TCR/BCR enrichment. For gene expression library
construction, the amplified cDNA was fragmented, end-repaired, and
double-sided size-selected with SPRIselect beads. For TCR/BCR library
construction, TCR/BCR transcripts were enriched from amplified cDNA by
PCR, and were sequenced on the Illumina NovaSeq system using read
lengths of 26-bp read 1, 8 bp i7 index, 98-bp read 2. The scTCR/BCR
libraries were sequenced using read lengths of 150-bp read 1, 8 bp i7
index, 150-bp read 2.
scRNA-seq data preprocessing and quality control
The sequencing data was analyzed with the cellranger pipeline
([263]https://support.10xgenomics.com/single-cell-gene-expression/softw
are/pipelines/latest/what-is-cell-ranger, v7.0.1) to process scRNA-seq
raw data in order to align reads to the genome, and to generate
gene–cell expression matrices. Specifically, sequencing reads were
aligned to the hg38 reference genome by STAR with annotation of
ENSEMBL. Uniquely aligned reads were used to quantify gene expression
levels for all genes with Unique Molecular Identifiers (UMIs). We
filtered and removed low-quality cells from further analyses if the
number of genes detected was fewer than 300 (low quality, potential
fragments) or more than 6,000 (potential doublets). We also excluded
those cells with a high percentage of mitochondrial gene reads
(>10%)^[264]81, and the remaining single cells were subjected to
subsequent data analysis.
TCR reads were aligned to the GRCh38 reference genome, and consensus
TCR annotation was performed using the cellranger vdj program (10x
Genomics, version 7.0.1). TCR libraries were sequenced to a final 1662
mean read pairs/cell. TCR annotation was performed using the 10x
cellranger vdj pipeline as described at
[265]https://support.10xgenomics.com/single-cell-vdj/software/pipelines
/latest/using/vdj. Barcodes with a higher number of UMI counts than a
threshold derived from the simulated backgrounds were considered as
cell-associated barcodes. V(D)J read filtering and assembly were
implemented as in a previous study^[266]82. In summary, cellranger
firstly trimmed known adapters and primer sequences from the 5′ and 3′
ends of reads and then filtered away reads lacking at least one 15-bp
exact match against at least one reference segment (TCR, TRA, and TRB
gene annotations in ENSEMBL version 87). Next, cellranger performed de
novo assembly for each barcode by building a De Bruijn graph of reads
independently. The assembler output contig sequences which were
assigned at least one UMI. Finally, each assembled contig was aligned
against all of the germline segment reference sequences of the V, D, J,
C, and 5′ UTR regions. Only the productive contigs were kept. Most cell
barcodes contained two matching productive contigs, comprising either a
TCRA or a TCRB, though it was biological possibility that fewer
productive contigs (low sensitivity) or >2 productive contigs (some
cells do contain more than one TCRB or TCRA chain) were associated with
one cell barcode^[267]83. Similarly, BCR reads were also processed
using the cellranger vdj program with the IMGT database of the GRCh38
genome as reference. Only productive contigs of BCR were kept for
analysis. Sequencing metrics are shown in Supplementary Data [268]4.
Downstream analyses were mainly performed using the R software^[269]84
package Seurat ([270]http://satijalab.org/seurat/, v4.1.2) on BMMNCs
and lineage^−CD34^+ cells separately^[271]85. Raw reads in each cell
were first scaled by a library size to 10000 and then log-transformed.
To improve downstream dimensionality reduction and clustering, we
performed a data integration with the robust principal component
analysis (RPCA) algorithm, instead of canonical correlation analysis in
Seurat, due to memory issues brought by data of millions of cells. RPCA
is less memory-intensive at the cost of being more conservative with
integration. The 2000 highly variable genes were used for PCA of
high-dimensional data. Top 30 principal components were selected for
unsupervised clustering of cells with a graph-based clustering
approach, and further dimensional reduction with UMAP.
Quantification and statistical analysis
After alignment with RPCA, cells from different participants were well
mixed and separated by cell-type categories. Clusters were identified
by the FindClusters function in Seurat^[272]86 and shown in UMAP plots.
Accordingly, marker genes in each cluster were identified using the
Wilcoxon Rank Sum test implemented in the Seurat v.4.1.2 package.
Cell-type assignment
For lineage^−CD34^+ cells, an HSPC type was assigned to each cluster
based on significance in overlapping between HSPCs and cluster-specific
genes (Fisher’s exact test)^[273]30,[274]87. Top 250 overexpressed
genes in each HSPC population were downloaded from
[275]http://www.jdstemcellresearch.ca/node/32 and were denominated as
cell-type-specific signature genes. Subsequently, the one-tailed
Fisher’s exact test was utilized to assert enrichment of HSPC signature
genes in the cluster marker gene list for each cluster^[276]88, and a
top-associated cell type was assigned to each cluster. Cell types of BM
cells were assigned with the same strategies using the Human Cell Atlas
as reference^[277]89. The cell-type annotations were refined with known
marker genes.
Differential abundance analysis
Similar to the data analysis of CyTOF, differences in cell abundances
between patients’ baseline samples versus healthy donors, and between
baseline samples versus samples after treatment were tested by the
differential abundance testing with the miloR package
([278]https://bioconductor.org/packages/release/bioc/html/miloR.html).
Specifically, the Seurat objects were converted to SingleCellExperiment
objects. The PCAs and UMAPs of Seurat objects were also assigned to the
new objects. Each neighborhood was assigned a cell-type label based on
the majority voting of cells belonging to that neighborhood. A
neighborhood was labeled as “Mixed” if the most abundant label was
present in <75% of cells within that neighborhood^[279]90.
Reconstruction of hematopoiesis trajectories using scRNA-seq data and dynamic
gene expression
We used Monocle 2, a widely used trajectory-detection algorithm for
pseudotime ordering. Monocle 2 orders cells along pseudotime that
recapitulate known marker trends in the differentiation. Tracking gene
expression, pathway activity, or cell-type composition changes along
pseudotime enables the determination of the differentiation change for
each of the terminal fates.
Projection of patients’ cells to the map of normal hematopoiesis
To characterize early hematopoiesis in SAA patients, individual cells
were projected onto the map of normal hematopoietic differentiation
based on cell-by-cell comparison of patterns of global gene expression
and localization to the most similar healthy donor cells with the
function of transfer_cell_labels() and fix missing_cell_labels()
functions in Monocle 2. This strategy was used for mapping patients’
cells on UMAP plots and assigning pseudotime estimation for cells in
individual patients.
Comparison of lineage gene with area under the receiver operating
characteristic curve (AUC) scores
AUCell package in Bioconductor was used to calculate AUC scores of
lineage-specific gene expression (HSCs, MEPs, GMPs, and LymPs) of
single cells in individual patients, and average AUC scores of specific
lineages of all cells were plotted.
Differential expression of genes and generation of heatmaps
Differentially expressed genes were defined with the FindMarkers
function in Seurat, by comparing gene expression in one cell subset
with expression in all others. Heatmaps and network visualization were
generated with ggplot2 and heatmap2 in the R package.
GSEA
[280]GSEA is based on fold-changes of all detected genes. To create
gene sets for a genome with custom annotations, we associated our genes
with known hallmark gene sets and manually created gene sets.
Fgsea^[281]91 was used for GSEA and to plot the running normalized
enrichment scores along the ranked gene list.
Definition of pathway activity scores
We downloaded gene lists of HALLMARK_INFLAMMATORY_RESPONSE,
HALLMARK_HALLMARK_TNFA_SIGNALING_VIA_NFKB, and
HALLMARK_INTERFERON_GAMMA_RESPONSE from MSigDB of GSEA, and their
activity scores (expression levels) in different samples or cell
populations were calculated with the AddModuleScore() function built in
Seurat ([282]http://satijalab.org/seurat/). The activity scores were
normalized with healthy donors included in individual studies, and the
double-sided t-test was used to assess the difference between samples
or cell populations. We defined a cell activation score, a cell
cytotoxicity score, and an exhaustion score based on a set of cell
cytotoxicity and exhaustion-related genes identified. The cell
activation score was calculated with expression of the gene set
GO:0001775. The cytotoxicity score was calculated by the
AddModuleScore() function with the expression of GZMA, GZMH, GZMM,
PRF1, and GNLY. The exhaustion score was calculated by AddModuleScore()
function with the expression of CXCL13, HAVCR2, PDCD1, TIGIT, LAG3,
CTLA4, LAYN, RBPJ, VCAM1, GZMB, TOX, and MYO7A.
Analysis of sources of transcriptomic variation
Gene expression is impacted by a number of biological factors.
Decomposing gene expression into the percentage attributable to
multiple biological sources of variation helps to understand their
contribution magnitude. The expression variance for each gene was
partitioned into the variance attributable to each variable using a
linear mixed model implemented in the variancePartition package in
Bioconductor. In this study, categorical variables (disease status,
treatment, and disease severity) were modeled as the contributors, and
the results were visualized using the ggtern package, an extension of
ggplot2.
Ligand-receptor analysis
CellPhoneDB (v2.0.0)^[283]46 was used to identify the putative
cell–cell interactions of SAA patients and healthy donors, and between
different cell populations. The code of Ktplots
([284]https://github.com/zktuong/ktplots) was modified to visualize
ligand-receptor interacting strength across cell populations from
CellPhoneDB results. We also used the R package NicheNet^[285]92 to
predict ligand-receptor interactions that might drive gene expression
changes in our cell types of interest. We combined all HSPCs,
monocytes, and neutrophils for this analysis. All default parameters
were used with the exception of setting lower cutoff thresholds of 0.3
and 0.6 for “prepare_ligand_target_visualization”.
Reconstruction of T cell differentiation trajectories using scRNA-seq data
and dynamic gene or gene programs expression
In order to estimate T cell differentiation, trajectory inference was
performed with the R package Slingshot. The analyses were performed for
CD8^+ and CD4^+ T cells separately. For each analysis, the UMAP matrix
was fed into Slingshot, and a naive T cell population was manually
designated as the root of all inferred trajectories, considering naive
T cells differentiate to other T cells. The pseudotime variable was
inferred by fitting simultaneous principal curves for further analysis.
Estimation of RNA velocity
Spliced and unspliced transcripts were quantified by Velocyto function
with sorted BAM files of cellranger. T cell clusters were evaluated by
RNA velocity analysis using scVelo (V0.2.4), separately for healthy
controls and patients. Velocity-derived counts were processed,
filtered, and normalized before velocity estimation on the basis of top
2000 highly variable genes with at least 20 UMIs for both spliced and
unspliced transcripts across all cells. The moments facilitated the RNA
velocity estimation implemented in I function scv.tl.velocity with a
mode set to “dynamical”. The estimated velocities were used to
construct a velocity graph representing the transition probabilities
among cells by the function scv.tl.velocity_graph. Finally, the
velocity graph was used to embed the RNA velocities into the UMAP by
the function scv.pl.velocity_embedding_stream.
Diversity index calculation
In this study, several methods represent the number of clones
(identical TCR/BCR chains) present (richness) and their relative
frequency (evenness). The Shannon entropy weighs both of these aspects
of diversity equally, and it is an intuitive measure whereby the
maximum value is determined by the total size of the repertoire.
Entropy values decrease with increasing inequality of frequency as a
result of clonal expansion. The Shannon entropy in a population of N
clones with nucleotide frequency
[MATH: pi
:MATH]
is defined by Eq. ([286]1):
[MATH: HP=−
∑i=1
npilog2pi
:MATH]
1
The Gini coefficient is a number aimed at measuring the inequality in a
distribution. It is most often used in economics to measure a country’s
wealth distribution and has been widely used in diversity assessment of
TCRs/BCRs^[287]93. The Gini index and Shannon entropy for clonality and
diversity analyses are calculated with the R package of tCR
([288]https://imminfo.github.io/tcr/).
Definition of T cell clonal expansion and clonal dynamics
Following a widely used criteria and considering controversial
definitions of “a clone” (a single cell versus a few cells), we defined
T cells as “clonally expanded” when there were >2 T cells with
an identical TCR, “not-expanded” when there were 1 or 2 T cells with
an identical TCR, “highly clonally expanded” when there were ≥10 T
cells with an identical TCR. We repeated some analyses based on
different cut-offs of defining clonally expanded clones, to test the
validity of our results and conclusion; consistent results were
obtained and shown in Supplementary Fig. [289]14a–c. For clonal dynamic
pattern definition, to ensure sufficient numbers of cells for
differential gene expression analysis, we first filtered to retain
putative T cell clones with at least 20 cells. Then we categorized T
cell clonal size dynamics after treatment into three groups: increased
(>2 fold, treatment-resistant), unchanged (0.5–2 fold,
treatment-insensitive), and decreased (<0.5 fold, treatment-sensitive).
For statistical determination, we also analyzed clonal size dynamics
using edgeR^[290]44,[291]45, a package wide-used in gene differential
expression: increased (P value < 0.05 and Log Fold change > 1),
decreased (P value > 0.05 and Log Fold change < −1), unchanged (P
value ≥ 0.05, or P value < 0.05 but −1 ≤ Log Fold change ≤ 1). These
two methods showed largely consistent in defining clone size dynamics
and subsequent transcriptomic analysis (Supplementary Fig. [292]14d, e)
Identification of TCR motifs with shard antigen specificity using GLIPH2
GLIPH2^[293]94 was applied to T cells of SAA patients to identify
clusters of TCRs that recognized the same epitope based on CDR3β amino
acid sequence similarities, with default parameters. CDR3β amino acid
sequences of top 1000 most abundant CDRs were used to identify
significant motif lists and associated TCR convergence groups
(Supplementary Data [294]3). WebLogo
([295]https://weblogo.berkeley.edu) was used to generate the sequence
logos of the motifs.
Detection of chromosome abnormalities
We used the infercnv package in Bioconductor to distinguish aneuploid
and diploid cells, with similar results. Our in-house method used
Chromosome Relative Expression to identify and visualize chromosomal
gain and loss. The differentially expressed genes between aneuploid and
diploid cells were identified with the findMarkers function in Seurat.
GWAS data processing and analysis
We downloaded GWAS summary statistics of AA from the GWAS Catalog at
[296]EBI, and used the qqman R package to make a Manhattan plot. Then
we used a 10-kb window around the gene body to map SNPs to genes, and
H-MAGMA (v1.07) to compute gene-level association P values and z-scores
from the GWAS data. We selected top 100 genes based on MAGMA P values
as genetic putative disease genes.
Pathway enrichment analysis
We applied the built-in functions of H-MAGMA, using the results from
GWAS summary statistics as its input, to examine genome-wide enriched
biological pathways for AA. We calculated competitive P values by
examining results that the combined effect of genes within a pathway
was significantly greater than the combined effect of all other genes,
and 10,000 permutations were used to adjust competitive P values.
Specific procedures for the analysis of GWAS data at gene and pathway level
(1) We downloaded the GWAS statistics results for AA from GWAS Catalog
([297]https://www.ebi.ac.uk/gwas/downloads/summary-statistics), and
H-MAGMA and related sources files from GitHub
([298]https://github.com/thewonlab/H-MAGMA); (2) we ran H-MAGMA with
command: magma --bfile../../MAGMA/g1000_eur --pval use=hm_rsid, p_value
ncol=N --gene-annot to test the joint association of all markers and
local linkage disequilibrium to assign each gene a z-score; (3) we
created a rank file with the H-MAGMA z-score of genes; and (4) we ran
GSEA with the rank files for function analysis.
After getting the z-scores (as to putative disease) of all genes, we
integrated with scRNA-seq data to identify disease-relevant cell types
and to calculate disease scores of individual cells by gene expression,
as described below.
Integration of GWAS data and scRNA-seq data
H-MAGMA was used to perform gene-based analysis of the GWAS data to
integrate scRNA-seq data.
Inference of relevant cell types with GWAS data
To prioritize trait-relevant cell types by looking for enrichment of
the GWAS signal within functional regions, we used RolyRoly^[299]49,
which was designed to calculate effects of SNPs near protein-coding
genes on cell types contributing to complex traits. RolyPoly treats a
variance of each gene as the linear combination of each cell type and
estimates the related coefficients. Then variances of cell-type effects
were estimated using block bootstrap, and used t-statistics to test
significance. Utilizing GWAS summary statistics for all SNPs, Rolypoly
performed joint analysis with gene expression of a variety of cell
types, to define prioritized trait-relevant cell types.
On the single-cell level, we used scDRS (v1.0.1) to quantify the
aggregate expression of putative disease genes derived from GWAS
summary statistics using H-MAGMA (each putative disease gene is
weighted by its GWAS H-MAGMA z-score and inversely weighted by its
gene-specific technical noise level in the single-cell data) in each
cell of scRNA-seq data to generate cell-specific raw disease
scores^[300]48. We created 1000 matched control gene sets (matching the
gene set size, mean expression, and expression variance of the putative
disease genes) and calculated 1000 sets of cell-specific raw control
scores. Then, we normalized the raw disease scores and raw control
scores for each cell, producing normalized disease and normalized
control scores. To compute the scores described above, we used the
function scdrs compute-score (--n_ctrl = 1000, --cov-file = age, sex,
the number of genes per cell, and disease severity). Cells with high
scores were highlighted in UMAP in Fig. [301]6d.
Quantification and statistical analysis
No statistical method was used to predetermine a sample size. No data
were excluded from the analyses. We did not use any study design that
required randomization or blinding. Statistical analyses were performed
as described in the figure legends. Pearson correlation was used to
estimate correlations among immune cell subsets. Survival was measured
using the Kaplan–Meier method. Statistical significance was determined
by the Multivariate Cox regression, the Kruskal–Wallis test,
the Wilcoxon test, the one-way ANOVA test, and the Limma moderated
t-test with the Benjamini–Hochberg method for multiple comparisons
correction. A ROC curve and AUC values were generated using Prism
(v.9.5.1; GraphPad software).
Reporting summary
Further information on research design is available in the [302]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[303]Supplementary Information^ (3.3MB, pdf)
[304]41467_2025_60213_MOESM2_ESM.pdf^ (82.7KB, pdf)
Description of Additional Supplementary Files
[305]Supplementary Data 1^ (11.7KB, xlsx)
[306]Supplementary Data 2^ (14.7KB, xlsx)
[307]Supplementary Data 3^ (2MB, xlsx)
[308]Supplementary Data 4^ (48.6KB, xlsx)
[309]Reporting Summary^ (1MB, pdf)
[310]Transparent Peer Review file^ (1.4MB, pdf)
Source data
[311]Source Data^ (156.2KB, xlsx)
Acknowledgements