Abstract

   Tuberculosis, caused by Mycobacterium tuberculosis, remains an enduring
   global health challenge due to the limited efficacy of existing
   treatments. Although much research has focused on immune failure, the
   role of host macrophage biology in controlling the disease remains
   underappreciated. Here we show, through multi-modal single-cell RNA
   sequencing in a murine model, that different alveolar macrophage
   subsets play distinct roles in either advancing or controlling the
   disease. Initially, alveolar macrophages that are negative for the CD38
   marker are the main infected population. As the infection progresses,
   CD38^+ monocyte-derived and tissue-resident alveolar macrophages emerge
   as significant controllers of bacterial growth. These macrophages
   display a unique chromatin organization pre-infection, indicative of
   epigenetic priming for pro-inflammatory responses. Moreover, intranasal
   BCG immunization increases the numbers of CD38^+ macrophages, enhancing
   their capability to restrict Mycobacterium tuberculosis growth. Our
   findings highlight the dynamic roles of alveolar macrophages in
   tuberculosis and open pathways for improved vaccines and therapies.

   Subject terms: Tuberculosis, Bacteriology, Bacterial infection,
   Alveolar macrophages
     __________________________________________________________________

   Mycobacterium tuberculosis infection remains a global health crisis.
   Here researchers characterise the alveolar macrophage subsets in a
   murine model of BCG vaccination in mice and link increases of CD38+
   macrophages with restriction of bacterial growth.

Introduction

   Tuberculosis (TB) remains a major global health issue, as reported in
   WHO’s Global Tuberculosis Report 2022^[31]1. Mycobacterium tuberculosis
   (Mtb), the etiological agent of TB, primarily infects the lungs and has
   evolved to survive the host immune response. Within the lungs, alveolar
   macrophages (AMs) play a critical role in protecting the airway
   surfaces upon early infection^[32]2–[33]5, with recruited
   monocyte-derived macrophages (IMs)  increasing in number as the
   infection progresses^[34]6,[35]7. Therefore, understanding the specific
   role of AMs during Mtb infection is key to uncovering disease
   mechanisms and guiding the development of effective vaccines.

   In recent years, our understanding of the interactions between AMs and
   Mtb has grown considerably. Bulk transcriptomic studies, alongside dual
   RNA-seq, have demonstrated that AMs can facilitate Mtb replication and
   dissemination in the lung^[36]6,[37]8–[38]10. Recent advances in
   single-cell transcriptomics have uncovered the presence of heterogenous
   populations within the AM and IM lineages, each exhibiting varying
   responses to Mtb infection^[39]9. Moreover, innovative vaccination
   approaches, including subcutaneous, intravenous (iv) and pulmonary
   immunization using live Bacillus Calmette–Guérin (BCG), have been found
   to activate AMs and innate immune cells, and provide lasting protection
   against Mtb^[40]11–[41]17. However, despite these advancements, the
   specific AM populations and properties underlying this protective
   effect remain unidentified.

   To date, vaccine development against TB has met with modest
   success^[42]18–[43]21 and it is seriously impaired by our lack of
   reliable biomarkers that are predictive of vaccine efficacy and
   outcome^[44]22–[45]26. The identification and phenotypic
   characterization of those alveolar macrophage subsets best equipped to
   mediate protection against Mtb would represent a significant advance in
   defining the immune response that we wish to drive with new
   immunotherapeutic interventions.

   Here, we show that by leveraging our established multi-modal
   single-cell RNA sequencing (scRNA-seq) protocol^[46]9,[47]27, we can
   identify the specific AM subsets that provide protection against Mtb.
   Employing the murine TB model, we evaluate macrophage phenotypes at 2-,
   3-, 4-, and 6-weeks post-Mtb challenge to understand their roles in the
   early stages of infection and after BCG intranasal immunization. Our
   analysis reveals the most pronounced phenotypic changes occur within
   the tissue-resident AM (TR-AMs) populations and the recruited
   monocyte-derived AMs (moAMs). A key discovery is the identification of
   CD38, an extracellular NADase linked to macrophage activation,
   inflammation, and infection^[48]28–[49]31, as a potential marker for
   protective responses against Mtb. We observe distinct phenotypes for
   SiglecF^+ CD38^+ TR-AMs, SiglecF^- CD38^+ moAMs, and SiglecF^+ CD38^−
   TR-AMs throughout the disease process. Initially, CD38^− TR-AMs,
   displaying a muted pro-inflammatory response, are the main infected
   cell type. As infection progresses, CD38^− TR-AM numbers decrease,
   while CD38^+ TR-AMs and the newly recruited moAMs rise in prevalence.
   In the later stages of infection, these CD38^+ AM subsets emerge as the
   dominant infected host AM populations, demonstrating an increased
   capacity to restrict Mtb’s growth.

   Single-cell Assay for Transposase-Accessible Chromatin using sequencing
   (ScATAC-seq) analysis reveals that CD38^+ TR-AMs have a unique
   chromatin structure distinct from CD38^− TR-AMs. These cells are
   already present in the lungs of naïve mice prior to infection,
   indicating an inherent epigenetic predisposition modulating their
   respective responses to Mtb infection. Additionally, intranasal
   immunization with live BCG results in an increase in CD38^+ macrophages
   in the lungs. This increase, resulting from both an influx of CD38^+
   monocyte-derived macrophages and polarization of existing reactive
   pro-inflammatory CD38^+ TR-AMs, offers significant protection against
   Mtb challenge.

Results

Ontogeny and intra-lineage diversity are key determinants in macrophage
responses to Mtb infection

   To better understand macrophage behavior during Mtb infection, we
   conducted an extensive multi-modal scRNA-seq analysis of macrophage
   phenotypes at 2-, 3-, 4-, 6-weeks post-Mtb challenge, which spans the
   transition from innate to adaptive immune
   responses^[50]2,[51]10,[52]32–[53]36. Our aim was to determine how
   distinct macrophage populations respond post-Mtb infection, with the
   goal of identifying the subsets that are responsible for controller
   responses and those that favor Mtb replication and expansion.
   Consistent with our previous work^[54]9, we infected mice with the
   hspx’::gfp/smyc’::mCherry Mtb reporter strain. This strain expresses
   mCherry constitutively and GFP in response to host mediated
   immune-related stress^[55]35. For each infected host cell, we were able
   to assess the fitness status of intracellular Mtb, quantify the surface
   marker expression and profile their transcriptome^[56]9. After QC, our
   dataset comprised of 49,600 cells, the significant majority of which
   were identified as macrophages (Fig. [57]1A and Supplementary
   Fig. [58]1A).

Fig. 1. scRNA-seq Timepoint Analysis Reveals Distinct Macrophage Origins and
Monocyte-to-Macrophage Differentiation Post-Mtb Challenge.

   [59]Fig. 1
   [60]Open in a new tab

   A UMAP visualization representing scRNA-seq data from 49,600 cells
   collected at 2-, 3-, 4-, and 6-weeks post-Mtb challenge. Cells are
   clustered based on their transcriptional profiles in an unbiased
   manner. B UMAP visualizations of cells stained for SiglecF and Cd11c
   protein markers. C Violin plot representations of protein expression
   levels for SiglecF and CD11c in different AM subsets. Statistical
   significance for each marker was assessed using the Kruskal-Wallis
   test, followed by Dunn’s post-hoc tests with adjustments for multiple
   comparisons using the Bonferroni method. The results indicate highly
   significant differences among the groups, with the Kruskal-Wallis test
   yielding p values below the threshold of numerical resolution
   (χ² = 6066.47, df = 5, p < 0.0001 for SiglecF; χ² = 4598, df = 5,
   p < 0.0001 for Cd11c). Levels of significance in the plot are denoted
   as follows: *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001. D
   Visualization of scWGCNA-derived gene modules expression levels mapped
   onto UMAP embeddings. Left Panel: Expression pattern of genes in Module
   5 (110 genes). Right Panel: Expression pattern of genes in Module 6 (28
   genes). E DotPlot illustrating the expression pattern of
   tissue-resident and monocyte specific markers across macrophage cell
   clusters. Dot size and color intensity indicate percentage and average
   expression level, respectively. F Umap plots showing expression levels
   (in log-normalized counts) for the Mafb (top) and Ly6c2 genes (bottom)
   across all cells. Source data are provided as a Source Data file.

   Recent studies, including ours, have shown that both resident AMs and
   recruited IMs, are ontogenically diverse^[61]6,[62]8,[63]37,[64]38 and
   comprised of phenotypically-distinct
   subpopulations^[65]9,[66]39–[67]42. These subpopulations exhibit
   markedly different inflammatory responses to Mtb infection^[68]9. In
   the current study, we extend these findings, further characterizing the
   discrete AM and IMs subpopulations based on their origin and
   inflammatory profiles.

   AMs are often identified by their expression of surface markers SiglecF
   and CD11c^[69]6,[70]43,[71]44, and in our dataset we could identify AM
   subpopulations by SiglecF and CD11c staining (Fig. [72]1B).
   Interestingly, we also noticed a population of cells that cluster with
   AMs by UMAP analysis, but do not stain for either SiglecF or CD11c
   (Fig. [73]1A, [74]C). To determine the origin of these cells, we
   focused on the early infection stages (Naïve and 2 weeks
   post-infection). We performed unbiased single cell weighted gene
   correlation network analysis (scWGCNA) on the resulting 13,125 cells to
   identify gene co-expression modules that define this population^[75]45.
   The analysis identified two gene expression modules that mirrored the
   differential staining of the SiglecF and CD11c antibodies
   (Fig. [76]1D). Module 6 consists of a 28-gene co-expression set and
   includes genes commonly used to define tissue resident AMs—such as Cd9,
   Mrc1 (CD206), Lpl, Trf, and Chil3 (Ym1)
   (Fig. [77]1E)^[78]8,[79]9,[80]46,[81]47. In contrast, Module 5,
   represented by a larger 110-gene set and expressed by the SiglecF/CD11c
   double negative population, is enriched in genes associated with the
   monocytic lineage (Fig. [82]1E) and monocyte-to-macrophage
   differentiation (Supplementary Fig. [83]1B). These AM-like cells
   express MafB—a driver of monocyte-to-macrophage differentiation—and
   Ly6c2, a marker for cells of monocyte origins
   (Fig. [84]1F)^[85]48–[86]52. We therefore designated this population as
   moAMs (monocyte-derived AMs). In the lungs of naïve mice moAMs are
   absent, implying these monocytes migrate there and differentiate to
   perform AM-related functions after Mtb infection (Fig. [87]2A).
   Trajectory and pseudotime analyses corroborate this hypothesis,
   revealing that moAMs, along with the classical NOS2^+ and NOS2^- IMs,
   represent distinct cell fates (gray leaves) that originate from a
   population of infiltrating monocytes (root 2—white circle—and branching
   point 11—black circle -), characterized by a unique transcriptional
   signature associated with leukocyte adhesion and extravasation
   (Fig. [88]2B, [89]C)^[90]53–[91]64. Uninfected bystander moAMs express
   genes associated with macrophage functions including phagocytosis,
   lipid metabolism, and RNA/protein synthesis, indicative of roles beyond
   mere homeostasis (Fig. [92]2D)^[93]65–[94]67. This perspective is
   supported by the contrasting gene expression in Mtb-infected moAMs,
   which pivot to a pro-inflammatory state, as evidenced by the
   upregulation of genes such as Acsl1, Hmox1, Saa3, Ptgs2, Slc7a11,
   Clec4e among others (Fig. [95]2D)^[96]9. This transition is marked by
   elevated Nos2 expression and association with hspx’::GFP-high bacteria
   (Fig. [97]2E). Such findings align with prior research which described
   the role of interferon-γ in deactivating enhancer regions bound by the
   transcription factor Maf. This mechanism has been shown to suppress
   expression of M2 genes and increase activation of monocyte-derived
   macrophages^[98]68. This moAM cohort expresses CD38 (Fig. [99]2F),
   hence their designation as CD38^+ moAMs.

Fig. 2. Identification of moAMs and temporal analysis of pro-inflammatory
genes in AM and IM subsets.

   [100]Fig. 2
   [101]Open in a new tab

   A UMAP Visualization of cell clusters from Naïve and 2-weeks
   Post-infection timepoint. The NOS2^+ IMs, NOS2^- IMs and moAMs clusters
   are highlighted. B Pseudotime analysis of macrophages on the UMAP
   projection. Key features of the trajectory, including leaves (gray) and
   branch points (black), are annotated directly on the plot. C DotPlot
   showing the expression levels of key extravasation and adhesion markers
   across various macrophage clusters. Dot size and color intensity
   indicate percentage and average expression level, respectively. D
   Heatmap of differential gene expression for selected genes between
   bystander and infected moAMs. Each row represents a gene, and each
   column represents a cell. N = 1048 cells for moAMs Bystander and
   n = 1537 cells for moAMs Infected. E (Left Panel): Violin plot showing
   the aggregate expression level of Nos2 based on GFP status. Expression
   levels are visualized using violin plots overlaid with jitter plots to
   show individual data points and mean values represented by solid white
   points. Statistical significance was assessed using a two-sided
   Wilcoxon test (p < 2.22e^−16). Effect size measured as Cliff’s Delta is
   also displayed directly on the plot. (Right Panel): UMAPs of infected
   cells. In the top plot, cells are color-coded based on their Nos2
   expression level, while in the bottom plot, they are color-coded based
   on their GFP status. F UMAP of CD38 staining levels across cell
   subsets. G Volcano plot showing differential gene expression (DGE) in
   NOS2^+ and NOS2^− IMs, with key genes of interest labeled. DGE analysis
   was performed between NOS2^+ IMs and NOS2^− IMs using a two-sided
   Wilcoxon rank-sum test. P values were adjusted for multiple comparisons
   using the Bonferroni correction. Genes with an average log[2] fold
   change >0.58 and adjusted p < 0.05 were classified as significantly
   upregulated in NOS2+ IMs, while those with a log2 fold change <−0.58
   and adjusted p < 0.05 were classified as significantly downregulated.
   Exact p values and log[2] fold changes for all genes are provided in
   Supplementary Data [102]1. Source data are provided as a Source Data
   file.

   Next, we focused on TR-AMs and identified two subpopulations based on
   their CD38 expression. The CD38^+ population exhibits increased Nos2
   expression and is associated with stressed (hspx’::GFP high) bacteria
   (Fig. [103]2E, [104]F). Notably, these TR-AMs are dual positive for
   SiglecF and CD11c (Fig. [105]1B). They also express the TR-AM gene
   signature from Module 6 (Fig. [106]1D), leading us to annotate them as
   CD38^+ TR-AMs. Importantly, trajectory and pseudotime analyses
   establish the CD38^+ TR-AMs as a distinct lineage from moAMs, as
   evidenced by their origin from root 1 (white circle) (Fig. [107]2B).
   Additionally, we also identified CD38^− subpopulations of TR-AMs,
   associated with hspx’::GFP-low Mtb, indicating bacteria experiencing
   minimal host-derived stress (Fig. [108]2E, [109]F).

   In examining the IM subpopulations, we observed a similar level of
   heterogeneity. While both IM populations appear to originate from
   infiltrating monocytes (Fig. [110]2B), the Nos2^+ IMs are CD38^+,
   whereas the Nos2^- IM lack CD38 expression (Fig. [111]2F). Looking at
   the transcriptional profiles of these populations revealed that CD38^+
   IMs, akin to CD38^+ moAMs and TR-AMs, display a gene expression profile
   consistent with classic pro-inflammatory responses, which were
   previously linked to effective TB control^[112]6,[113]9 (Fig. [114]2G)
   (Supplementary Data [115]1).

   Conversely, the IM subpopulation that is CD38^− not only lack Nos2
   expression, but is also associated with hspx’::GFP-low Mtb
   (Fig. [116]2E, [117]F). Relative to NOS2^+ IMs, these cells show
   increased expression of genes linked to anti-inflammatory responses.
   These include transcripts for complement proteins C1q, Ccl8, Ms4a7
   among others, which have been previously characterized
   (Fig. [118]2G)^[119]9,[120]69–[121]73. Furthermore, this subset still
   shows a partial expression of extravasation and adhesion markers shared
   with infiltrating monocytes (Fig. [122]2C). This suggests that they
   represent recently arrived interstitial monocytes that have just been
   infected and have not yet undergone immune activation (Suppl.
   Data [123]1). In summary, our analysis identified distinct
   subpopulations of AMs and IMs with varying ontogeny and inflammatory
   profiles. Importantly, we discovered a population of monocyte-derived
   AMs capable of transitioning to a pro-inflammatory state upon Mtb
   infection, and two subpopulations of TR-AMs based on CD38 expression,
   which are associated with different bacterial phenotypes following Mtb
   infection.

Pro-inflammatory immune responses are associated with rising CD38 expression
in macrophages over time

   To understand the temporal evolution of macrophage immune responses
   following Mtb infection, we examined the expression patterns of
   pro-inflammatory gene signatures and associated markers over time. From
   our past studies, there was a discernible correlation between Nos2
   expression in host macrophages and bacterial stress
   (Fig. [124]2E)^[125]9. Building on this, we tracked the temporal trends
   of Nos2 expression within the infected macrophage populations. As the
   infection progresses, we observed an increase in nitric oxide
   production, reflected by both an increase in the level of Nos2
   expression per cell (intensity) and the number of cells expressing Nos2
   (prevalence), as shown in Fig. [126]3A. Flow cytometry data confirmed
   this trend; the median fluorescence intensity (MFI) of the GFP signal
   from hspx’::GFP-infected cells increased over time, indicating
   heightened bacterial stress as the infection proceeded (Fig. [127]3B).
   Importantly, infecting Nos2-KO mice with the hspx’::GFP Mtb reporter
   strain resulted in minimal GFP induction by Mtb at both 2 and 4 weeks
   post-infection (Fig. [128]3C). The limited induction of GFP expression
   by Mtb in Nos2-KO mice was also accompanied by an increase in bacterial
   load (Fig. [129]3D). Finally, flow cytometry analysis at 2 wpi of Rag1
   and IFN-γ KO mice revealed no induction of hspx’::GFP (Supplementary
   Fig. [130]2A), confirming that production of nitric oxide by
   pro-inflammatory macrophages is critically dependent on the adaptive
   immune response.

Fig. 3. Pro-inflammatory immune responses are associated with rising CD38
expression over time.

   [131]Fig. 3
   [132]Open in a new tab

   A Violin plot displaying the distribution of Nos2 gene expression
   levels in infected macrophages, split by timepoints. B Overlaid flow
   cytometry histograms (on the left) and a bar chart (on the right)
   display the median fluorescence intensity (MFI) of the Mtb GFP signal
   at various timepoints during the infection. C Flow cytometry analysis
   of the Mtb smyc’::mCherry/hspx’::GFP reporter strain at 2 and 4 wpi in
   NOS2^-/- mice. D Scatter plot comparing log[10]-transformed CFU counts
   between Control and NOS2^−/− mice. Data points represent individual
   observations with means indicated by horizontal bars. A t-test was used
   to assess the statistical significance of the difference in means
   between the two groups. Summary statistics including the mean, standard
   deviation (SD), and sample size (n) for each group are displayed in the
   plot. The t-test is two-sided, assuming equal variance, with the
   resulting p value annotated on the plot to indicate the statistical
   significance of the observed differences. The y-axis is labeled with
   log10 CFU Count to reflect the log transformation. E CD38 Protein
   Expression in macrophages, stratified by infection status and
   timepoint. Data are presented as violin plots with an overlaid box
   plot. The center line in the box plot represents the median expression
   level of CD38, the bounds of the box represent the interquartile range
   (IQR) from the 25th to the 75th percentile, and the whiskers extend to
   1.5 times the IQR from the quartiles. Statistical significance of
   pairwise comparisons between infection states at each timepoint was
   assessed using two-sided t-tests. P values were adjusted for multiple
   comparisons utilizing the Benjamini-Hochberg method; annotations on the
   plot reflect these adjusted p values. The effect size measured as
   Cliff’s Delta is also displayed directly on each plot, providing a
   standardized measure of the magnitude of differences. F Pathways
   upregulated at late (top) and early (bottom) timepoints. Generalized
   additive models (GAMs) were fitted to assess temporal changes in
   pathway expression levels. An analysis of variance (ANOVA) was
   performed on these models to derive p values, which were subsequently
   adjusted for multiple comparisons using the Benjamini-Hochberg method.
   The red line represents the fitted GAM-predicted trend over time for
   each pathway, with specific macrophage clusters labeled as appropriate.
   Exact adjusted p values are provided. Each cell cluster is represented
   by distinct colored points. Average pathway expression levels are on
   the y-axis, with inferred time from early to late on the x-axis. G
   Scatter plot showing two-dimensional coordinates of single cells from a
   force-directed layout embedding for infected and bystander groups. Each
   point represents a cell, with its position on the plot determined based
   on its expression similarities to other cells. Colors indicate the time
   point of data collection as per the right-side color bar. Source data
   are provided as a Source Data file.

   Consistent with these data, our surface marker analysis mirrored the
   trends observed in Nos2 RNA levels. Specifically, CD38 protein levels –
   a marker associated with inflammation^[133]29,[134]31 – increase in
   infected macrophages during the latter stages (Fig. [135]3E). Parallel
   trends were also seen with the marker CD11b (Suppl Fig. [136]2B). To
   gain deeper insights into the molecular mechanisms and biological
   processes driving these phenotypes, we performed a time-dependent
   pathway analysis^[137]74. This analysis identified 68 pathways that
   exhibited variations in expression as the infection progressed
   (p.adj < 0.05) (Suppl. Data [138]2). Pathways showing increased
   representation over time were predominantly associated with
   inflammatory and anti-bacterial responses and were overly represented
   in CD38^+ macrophages (Fig. [139]3F and Supplementary Fig. [140]2C).
   Conversely, pathways that dominated early stages of infection were
   mostly associated with CD38^− macrophages, and aligned with processes
   linked to M2 polarization, tissue homeostasis, and bacterial survival,
   such as negative regulation of protein kinase c signaling^[141]75,
   suppression of apoptosis and cell proliferation among others
   (Fig. [142]3F and Supplementary Fig. [143]2D). Confirming these
   observations, our analysis revealed an increased recovery of Mtb mRNA
   reads from CD38^+ macrophage populations over time, indicative of
   active bacterial degradation within these cells. Specifically, we noted
   low levels of Mtb reads at 2 weeks post-infection, which significantly
   increased from 3 to 6 weeks, as illustrated in Supplementary
   Fig. [144]2E.

   Intriguingly, when focusing on bystander macrophages, our analysis
   revealed no change in their CD38 staining levels, irrespective of the
   timepoint examined, contrasting starkly with their infected
   counterparts (Fig. [145]3E). To validate this, we leveraged a
   force-directed layout to conduct unbiased graph-based clustering of
   both cell types^[146]76. While infected cells clustered according to
   duration of infection, bystander cells grouped based on their ontogeny
   without temporal distinctions (Fig. [147]3G). This pronounced
   difference emphasizes that the inflammatory responses we observe aren’t
   merely a result of a broad immune activation of host macrophages over
   time; rather, they are finely tuned and specifically driven by active
   Mtb infection.

   In conclusion, our findings highlight a dynamic shift in the phenotype
   and relative proportion of infected macrophage populations over the
   course of infection. This transition emphasizes the role of CD38^+
   macrophages, which appear to be closely associated with the control of
   infection.

The relative proportion of Mtb-infected cells shifts from AMs to IMs over
time

   To quantify the changes in the proportion of infected macrophage
   populations over time, we employed a recently developed statistical
   framework designed for differential abundance testing (DAB)^[148]77. We
   used this approach to assess whether changes in abundance of both
   infected and bystander cells occur over the course of infection. Our
   analysis identified variations in the abundance of 400 neighbors across
   different timepoints in infected AMs and IMs subpopulations (with
   FDR < 0.1). This contrasted with bystander populations that exhibited
   minimal variation (Fig. [149]4A).

Fig. 4. Integrative analysis of differential abundance in MTB infection.

   [150]Fig. 4
   [151]Open in a new tab

   A Neighborhood graph projected on the UMAP plot displaying cell cluster
   relationships by differential abundance in infected and bystander
   populations. Neighbors with significant Log[2] Fold Change (LFC)
   depletion rates are in red, showing cells reduced during infection,
   while late-infection enriched neighbors are in blue (FDR < 0.1). B
   Horizontal scatter plot showing changes in macrophage subset abundance
   over time, with color-coded data points for statistical significance
   and change in direction: Blue for early infection increase (FDR < 0.1),
   Orange for late infection increase (FDR < 0.1), and Light Gray for no
   significant change. UMAP visualizations of macrophage subpopulations in
   infected (C) and bystander (D) cells across different timepoints of
   infection. The percentages of AMs and IMs within the total macrophage
   population at each timepoint are labeled on the plot. Source data are
   provided as a Source Data file.

   Mtb-infected neighbors belonging to the CD38^− TR-AMs subpopulations
   showed a marked decline in numbers late in infection. Conversely,
   neighbors within the Nos2^+ IMs showed a significant increase in
   abundance during the same later timepoints (Fig. [152]4B). Observations
   at the broader population level further highlighted this trend: at 2
   weeks post-infection (wpi), AMs constituted 80% of all the infected
   macrophages, but by 3wpi this percentage was reduced to ~39%, and by
   the 4th and 6th wpi, it dropped to around 15% (Fig. [153]4C). Overall,
   our data highlight a substantial shift in abundance from TR-AMs to IMs
   among the infected macrophages as the infection advances, as also
   confirmed by flow cytometry analysis (Supplementary Fig. [154]3). This
   trend sharply contrast the patterns seen in bystander AMs and IMs,
   whose proportions remain consistent across the analyzed timepoints
   (Fig. [155]4D). The stability in the proportion of bystander
   macrophages in the Mtb-infected lung from the 2 weeks post-infection
   (wpi) suggests that, within our infection model, monocyte-derived cells
   have already been recruited to the lung by this time but remain largely
   uninfected (Fig. [156]4C, D), similar to a previous report^[157]78.

   This premise is further supported by examining the distinct timelines
   of infection rates between CD38^+ IMs and moAMs. Despite pseudotime
   analysis indicating a shared origin for both populations from the early
   monocyte cluster (as illustrated in Fig. [158]2B), CD38^+ IMs only
   became the more abundant infected group by the 3rd week post-infection
   (Fig. [159]4C). In contrast, a substantial proportion of CD38^+ moAMs
   were infected at a much earlier stage, by 2wpi (Fig. [160]4B, [161]C).
   Recent studies provide context to these observations. The initial delay
   in the infection of monocyte-derived IM may be attributed to the
   early-stage preference of the infection for the lung’s alveolar space
   during the onset of TB. As the disease progresses, leading to the
   translocation of infected AMs from the alveolar to the interstitial
   lung space^[162]2,[163]6,[164]8,[165]10, the infection rate of CD38^+
   Nos2^+ IMs increases, leading to the overall shift in the abundance of
   infected macrophages from AMs to IMs as described above.

   In summary, while the total bystander IM and AM populations in the lung
   remain constant from 2wpi onwards, our data indicates a decline in the
   number of newly infected TR-AMs as the infection shifts from the
   alveolar to the interstitial lung space. In the established infection,
   our data demonstrate that monocyte-derived IMs are the dominant host
   cell population accounting for the majority of new infection events.

The early infection timepoints are dominated by CD38^− TR-AMs that are
relatively nonresponsive to tuberculosis infection

   Work by Rothchild et al. highlighted that murine AMs exhibit a robust
   anti-inflammatory response during the early stages of TB infection, up
   to 10 dpi. This early-phase response is postulated to be pivotal to
   their role in the initial infection
   process^[166]2,[167]6,[168]8,[169]10. Building on this previous work,
   our focus on the AM populations revealed that CD38^− AM subsets
   dominate this early infection landscape. Specifically, they account for
   ~70% of the Mtb-infected host cells at 14 dpi (Fig. [170]5A). To gain a
   more nuanced understanding of the CD38^− AMs and the impact of their
   changing numbers, we employed differential abundance testing (DAB) to
   group previously identified neighbors based on their log-fold change
   (LFC) depletion rates, as an alternative approach to the traditional
   cluster-based categorization (Fig. [171]5B)^[172]77. Through this
   approach, we identified two neighbor groups (group 2 and 5) exhibiting
   significant depletion rates in late infection (>5 LFC). Examining the
   cell types associated with these high LFC depletion rate groups
   (FDR < 0.1), we found that 98% of cells from group 2 belonged to the
   CD38^− TR-AMs subsets (spanning CD38^− TR-AM_1, CD38^− TR-AM_2, CD38^−
   TR-AM_3). For group 5, 57% were associated with the CD38^− TR-AMs and
   33% with CD38^+ moAMs (Fig. [173]5C). The over-representation of CD38^−
   TR-AMs in groups marked by pronounced depletion rates, identified
   through our DAB approach and independently confirmed by compositional
   analysis through scCODA^[174]79 (Supplementary File [175]1), suggests
   an innate susceptibility of these cells to Mtb infection. This
   vulnerability becomes more obvious when compared with the unchanging
   relative abundance of CD38^− TR-AMs in bystander populations across
   different infection timepoints (Fig. [176]4A, [177]D). Coupled with the
   observed decline in the overall count of infected AMs as the infection
   unfolds (Fig. [178]4C), it becomes evident that very few CD38^− TR-AMs
   become infected in the later stages. This underscores the idea that the
   CD38^− TR-AM clusters, dominant at 14 dpi, are poorly equipped to
   survive Mtb infection. This hypothesis is supported by the absence of
   Nos2 mapping reads across all infection timepoints in CD38^− TR-AMs
   (Fig. [179]5D), excepting cells bordering the MoAM in the CD38^−
   TR-AM_1 cluster (likely signifying a common scRNA-seq occurrence of
   cells being misattributed among neighboring clusters).

Fig. 5. Analysis of CD38^+ and CD38^− AMs post Mtb challenge.

   [180]Fig. 5
   [181]Open in a new tab

   A Pie charts of infected AM subtypes at 2 and 4 Weeks Post-Infection. B
   (Left) UMAP visualization showing neighborhood (nhoods) clustering
   based on differential abundance, where each point represents a
   neighborhood group positioned to reflect original data structures. The
   coloring denotes groups with similar LFC depletion rates. (Right)
   Horizontal scatter plot showing changes in the LFC depletion rate of
   NhoodGroup(s). C Bar plots of cell type distribution in Neighborhood
   Groups 2 and 5. D Ridge plot of Nos2 gene expression across AM
   clusters, segmented by timepoint of infection. E Volcano plot of
   Differential Gene Expression (DGE) in CD38^+/- AMs, with x-axis for
   log[2] fold changes and y-axis for −log[10] adjusted p values,
   highlighting top DGE genes for both groups. DGE analysis was performed
   using a two-sided Wilcoxon rank-sum test. P values were adjusted for
   multiple comparisons using the Bonferroni correction. Genes with an
   average log[2] fold change >0.58 and adjusted p value < 0.05 were
   classified as significantly upregulated in CD38^+ AMs, while those with
   a log2 fold change <−0.58 and adjusted p < 0.05 were classified as
   significantly downregulated. Exact p values and log[2] fold changes for
   all genes are provided in Supplementary Data [182]3. F Horizontal bar
   chart illustrating Gene Ontology (GO) Processes Enriched in CD38^− AMs.
   Pathway enrichment analysis was conducted using the G::Profiler and the
   two-sided g:SCS method, as described^[183]117. The bar plot displays
   the negative log[10]-transformed p values of significantly enriched
   pathways. G Bar chart illustrating the temporal distribution of
   infected AMs by CD38 staining, with each bar representing the
   percentage of the specific cell type within the total infected AMs at
   each timepoint. H Horizontal bar chart illustrating the Gene Set
   Enrichment Analysis for CD38^+ AMs. Pathway enrichment analysis was
   conducted using the G::Profiler and the two-sided g:SCS method, as
   described^[184]117. g:SCS threshold is a value pre-calculated for query
   list sizes up to 1000 genes. Given a fixed input query size, g:SCS
   analytically approximates a threshold t corresponding to the 5% upper
   quantile of randomly generated queries of that size. All actual p
   values resulting from the query are transformed to corrected p values
   by multiplying these to the ratio of the approximate threshold t and
   the initial experiment-wide threshold a = 0.05. The bar plot displays
   the negative log[10]-transformed p values of significantly enriched
   pathways. Source data are provided as a Source Data file.

   DGE analysis of the CD38^− AMs further supports this hypothesis. We
   found CD38^− AMs to be associated with known markers of AM populations
   (Chil3, Lpl, Marco, Mrc1, Trf)^[185]8,[186]9 (Fig. [187]5E). Examining
   the transcriptional profile of these populations we observed an
   upregulation of genes involved with lipid metabolism such as Fabp4,
   Mgll, and Lpl^[188]8,[189]80. Additionally, we noted increased
   expression in genes linked to the electron transport chain (like
   mt-Nd1, mt-Nd2, mt-Nd4, mt-Cytb, and mt-Atp6) which are involved in
   cellular energy metabolism. This metabolic transcriptional signature
   was further complemented by the upregulation of genes connected to
   fatty acid uptake and transport (e.g., Dbi)^[190]81, lipid droplet
   formation (e.g., Cidec, Plin2)^[191]82,[192]83, and protection against
   oxidative stress (e.g., Gpx1, Gpx4)^[193]84 (Supplementary
   Data [194]3). Collectively, these data indicate a shift towards fatty
   acid oxidation (FAO) metabolism. Importantly, the upregulation of genes
   involved in metabolic and homeostatic functions contrasts with a
   diminished expression of pro-inflammatory genes (Fig. [195]5E)
   (Supplementary Data [196]3). Moreover, Gene Ontology (GO) functional
   enrichment analysis of the transcriptional profile of CD38^− AMs also
   supports these observations. The functional profile of the CD38^−
   TR-AMs aligns with alveolar macrophage characteristics that have been
   shown to promote Mtb growth^[197]6. These include increased oxidative
   phosphorylation and fatty acid metabolism (Fig. [198]5F). In summary,
   our data suggests that the broad range of anti-inflammatory responses
   often associated with AM population are mediated through CD38^− TR-AMs,
   which are highly susceptible and sensitive to Mtb infection during the
   early phases of TB.

CD38^+ AMs are key contributors in controlling tuberculosis infection

   We re-clustered the AM subsets based on their surface expression of
   CD38 protein. Consistent with the previous observations, CD38^+ cells
   only constituted 12% of the total infected AMs at 2 weeks, but this
   proportion increased to 89% and 93% at the 4 and 6-week timepoints,
   respectively (Fig. [199]5G). Intriguingly, DAB testing revealed
   unchanging abundance of infected CD38^+ AMs across timepoints
   (Fig. [200]4A), and moAMs and CD38^+ TR-AMs constitute the majority of
   the infected AMs in the later stages of infection (Fig. [201]4B, C).
   This stability in numbers, even amidst a pronounced decline in the
   overall infected AMs (Fig. [202]4C), implies that these cellular
   subsets might be inherently more resilient to Mtb infection. This
   interpretation aligns with their observed Nos2 expression patterns.
   Both CD38^+ TR-AMs and moAMs exhibit a bimodal Nos2 expression, with
   increasing cell counts and Nos2 expression intensity, suggesting their
   antimicrobial capacity increases over time (Fig. [203]5D). The elevated
   CD38 expression and pro-inflammatory pathways in AMs in latter stages,
   as described earlier, are predominantly represented by these two CD38^+
   populations.

   Trajectory and pseudotime analyses provide additional insights into
   these populations. While moAMs, CD38^− TR-AM_1, and CD38^− TR_AM_2
   appear as mature endpoints (gray circles 12, 18, 17, 7, 5, and root 3,
   white circle), CD38^+ TR-AMs seem to arise from enhanced polarization
   of the CD38^− TR-AM_3 subset (root 1, white circle and branching point
   12, 17, black circles, Fig. [204]2B). Thus, we hypothesize that the
   decreased presence of the CD38^− TR-AM_3 subset in subsequent infection
   stages is likely a consequence of these cells transitioning to CD38^+
   TR-AMs, as highlighted in the subsequent paragraphs. Finally,
   enrichment analysis of the genes upregulated by CD38^+ AMs reveals
   up-regulation of the same pathways that are associated with control of
   Mtb infection (Fig. [205]5H). These findings suggest that the divergent
   behavior of CD38^+ and CD38^− AMs in response to Mtb infection is
   programmed into the cell subsets prior to infection, with the CD38^+
   AMs playing a potentially crucial role in managing Mtb growth and
   spread.

Intrapulmonary BCG vaccination amplifies the CD38^+ macrophage population
resulting in enhanced control of Mtb infection

   Recent investigations, including those by Mata et al.^[206]14, have
   shown that intrapulmonary BCG vaccination prior to Mtb infection
   induces protective responses in lung resident macrophages. To extend
   this observation, we assessed how pulmonary BCG administration
   modulates the responses of CD38^+/- macrophages to Mtb infection. Mice
   were vaccinated intranasally with live BCG and after a 2-month period,
   were infected with the reporter smyc’::mCherry/hspx’::GFP Mtb Erdman
   for a duration of 2 weeks. Subsequently, we integrated scRNA-seq
   datasets from these vaccinated mice with our existing timepoint
   analysis data.

   Analysis of infected macrophage populations from both BCG-vaccinated
   (n = 3931 cells) and unvaccinated (n = 2278 cells) mouse lungs, 2 weeks
   post-Mtb infection, revealed marked differences. In the BCG-vaccinated
   cohort, the monocyte-derived macrophages, associated with
   hspx’::GFP-high Mtb (Figs. [207]1F and [208]2E), constituted the vast
   majority of infected macrophages, at 74.43%. This is in stark contrast
   to the unvaccinated cohort, where such cells accounted for just 43.19%
   (Fig. [209]6A). Flow cytometry data from the sorted populations
   reinforced these findings. Macrophages from the BCG-treated group
   mounted a more effective response, limiting Mtb replication more
   efficiently. This is underscored by a marked reduction of the mCherry
   signal in infected cells of BCG-vaccinated mice, which was ~2 log[10]
   times lower than that in infected cells from unvaccinated mice.
   Moreover, there was a substantial difference in the proportion of
   hspx’::GFP stressed bacteria between the two cohorts: GFP-high cells
   represented 22.6% of the total infected cells in the unvaccinated group
   compared to 77.78% in their BCG-vaccinated counterparts (Fig. [210]6B).
   The distribution of hspx’::GFP^+ cells between the two groups, as
   revealed in our scRNA-seq dataset (Supplementary Fig. [211]4B), aligned
   with our flow cytometry data. Complementing these findings, we observed
   a marked decrease in the overall percentage of infected cells in BCG
   treated mice (0.45%) compared to unvaccinated mice (2.08%)
   (Fig. [212]6B), as also confirmed by independent flow cytometry
   analysis (Supplementary Fig. [213]4A). This was also supported by an
   overall reduction in bacterial burden in BCG-treated mice, as assessed
   by CFU counts (Fig. [214]6C). These differences in cellular responses
   between BCG-treated and unvaccinated mice highlight significant
   alterations in their macrophage populations. BCG-treated mouse lungs
   were dominated by highly pro-inflammatory monocytes, and there was a
   marked reduction in infected CD38^− TR-AMs compared to unvaccinated
   controls (Fig. [215]6D,[216]E). The reduced proportion of infected
   CD38^− TR-AMs in BCG-treated mice is likely the result of increased
   recruitment of pro-inflammatory monocytes, in addition to the
   transition of the CD38^− TR-AMs_3 into their CD38^+ counterparts
   (Fig. [217]6E).

Fig. 6. Comparative analysis of immune responses, cellular distribution, and
Mycobacterium tuberculosis infection dynamics in BCG-vaccinated vs.
unvaccinated mice.

   [218]Fig. 6
   [219]Open in a new tab

   A Bar chart comparing the proportions of Monocyte-Derived and
   Tissue-resident cells between BCG-vaccinated and unvaccinated mice at 2
   weeks post-infection. N = 3931 cells (5 mice) for the BCG vaccinated
   group and n = 2278 cells (5 mice) for the Unvaccinated Group. B Flow
   cytometry analysis of the distribution of the Mtb mCherry and GFP
   signals from infected cells in BCG-vaccinated and control mice. C
   Scatter plot showing log[10]-transformed CFU counts in BCG-vaccinated
   and unvaccinated mice at 2wpi. Data points for individual observations
   are shown with mean values marked by horizontal bars. Statistical
   significance was evaluated using a two-sided t-test, assuming equal
   variances between groups. The plot displays summary statistics—mean,
   standard deviation (SD), and sample size (n)—for each group, with the
   annotated p value highlighting significant differences in means. The
   y-axis is labeled with log[10] CFU Count to reflect the log
   transformation. D Bar chart displaying the distribution of infected
   TR-AM subtypes by vaccination status. Each bar’s height corresponds to
   the percentage of the respective TR-AM subtype within the total
   infected macrophage cell population, with values labeled on each bar. E
   UMAP visualization of cell clusters in BCG vaccinated and control
   samples, 2 weeks post-infection. N = 3931 cells for the BCG vaccinated
   group and n = 2278 cells for the Unvaccinated Group. Both groups
   include cells pooled from 5 different animals. F Dot plot
   representation of genes associated with lysosomal function and
   phagocytosis/autophagy pathways, between BCG-vaccinated and control
   mice. Dot size and color intensity indicate percentage and average
   expression level, respectively. Source data are provided as a Source
   Data file.

   Using scWGCNA, we identified a unique co-expression module of 463 genes
   exclusive to the macrophage populations of BCG-vaccinated mice
   (Supplementary Fig. [220]4C) (Supplementary Data [221]4). This module
   was highly enriched in pro-inflammatory genes, including Nos2 and Cd38
   (Supplementary Fig. [222]4D). Further analysis revealed significant
   presence of genes involved in small GTPase signal transduction,
   encompassing Rab GTPases, guanine nucleotide exchange factors (GEFs)
   and GTPase activators (GAPs) (Supplementary Fig. [223]4E). Rab GTPases
   are known for their roles in endosomal trafficking, while GAPs and GEFs
   are vital for membrane transport, phagocytosis, and controlling of the
   actin cytoskeleton^[224]85–[225]87. Their increased expression suggests
   modifications in intracellular trafficking within the BCG-treated
   macrophages. Altered Mtb intracellular trafficking and increased
   lysosomal fusion can limit bacterial growth^[226]88–[227]90 and amplify
   killing mechanisms due to immune activation and increased
   autophagy^[228]91–[229]95. Our data confirms the previous findings as
   we observe that increased intracellular trafficking is tightly linked
   to increased expression of genes involved in lysosomal and autophagy
   functions in BCG-treated macrophages (Fig. [230]6F). Additionally,
   genes facilitating macrophage migration and tissue invasion saw
   heightened expression in macrophages from BCG-treated lungs
   (Supplementary Data [231]4). This observation aligns with the gene-set
   enrichment analysis of the 463 co-expressed genes from the scWGCNA
   module. Here, cell migration, motility, programmed cell death, and
   autophagy emerged as the pathways enriched in BCG-treated macrophages,
   in both GO and KEGG analysis (Supplementary Fig. [232]5A)
   (Supplementary Data [233]5).

   The increased expression of pro-inflammatory gene signatures involved
   in control of Mtb infection in BCG-treated macrophages aligns with
   their observed phenotype. A hallmark feature across BCG-treated mice
   was the increased expression of CD38 (Supplementary Fig. [234]5B). This
   immune control phenotype was still evident at 4 weeks post-infection,
   corroborating Mata et al.’s findings (Supplementary Fig. [235]5C).
   Intriguingly, we observed discrepancies between the two studies
   regarding the origin of infected macrophages after BCG vaccination. In
   our study, the majority of infected macrophages in BCG-treated mice
   were monocyte-derived, while previous data indicated Mtb was mostly
   confined to AMs after BCG vaccination^[236]14. We believe the disparity
   was due to weak CD64 staining on pro-inflammatory monocyte-derived
   cells (Supplementary Fig. [237]5D). To test this hypothesis, we
   re-infected BCG-treated and control mice with a high infection dose
   (5 × 10^3 CFU), similarly to Mata et al.^[238]14. The published CD64
   flow cytometry gating approach will identify most infected macrophages
   as AMs in BCG-treated mice^[239]14 (Supplementary Fig. [240]5E).
   However, when gating only on SiglecF, expressed uniquely by AMs
   (Fig. [241]1B), the results indicate the majority of infected cells are
   monocyte-derived (Supplementary Fig. [242]5F), as in our scRNA-seq
   dataset. Regardless of the gating method, we found a consistent
   increase in the percentage of infected AMs in BCG-vaccinated mice
   compared to the unvaccinated counterparts, underscoring their improved
   ability to restrict Mtb growth, both at 2wpi (Supplementary
   Fig. [243]5F) and at 4wpi (Supplementary Fig. [244]5G)^[245]14. This
   aligns with our scRNA-seq results and highlights the fundamental role
   of the increased polarization of the CD38^− TR-AMs_3 transitioning to
   CD38^+ TR-AMs for enhanced Mtb control in BCG-treated AMs
   (Fig. [246]6E). In conclusion, our data suggests that the heightened
   defense against Mtb reinfection observed after intranasal BCG
   vaccination is due to both the increased presence of CD38^+
   monocyte-derived macrophages and the activation of resident CD38^+
   TR-AMs.

Pre-existing differences in the chromatin organization of CD38^+ vs CD38^−
AMs are linked to differential responses to Mtb infection

   Our scRNA-seq analysis revealed distinct differences in AM populations
   between naïve and Mtb-infected mice. Specifically, we observed that
   CD38^+ moAMs are recruited to the lung in response to Mtb infection
   (Fig. [247]2A). In contrast, CD38^+ TR-AMs were already present prior
   to infection, predominantly exhibiting a less active CD38^− phenotype,
   which we previously defined as CD38^− TR-AM_3 (Fig. [248]2A, [249]B).

   To investigate the chromatin landscape and potential epigenetic
   regulation of AM subsets prior to infection, we performed scATAC-seq on
   CD45^+ cells isolated from the lungs of naïve mice. Unbiased clustering
   based on differential chromatin accessibility identified 10 distinct
   clusters (Fig. [250]7A). Integrating this scATAC-seq dataset with our
   timepoint-specific scRNA-seq data, which includes naïve, bystander, and
   infected cells, revealed that variations in chromatin organization
   before infection align closely with the diverse transcriptional
   phenotypes observed during TB infection. Using gene scores, we inferred
   the potential gene expression profiles for each cell in the scATAC-seq
   sample, based on the accessibility of regulatory elements adjacent to
   each gene. We then performed data integration with the scRNA-seq
   dataset, as previously described^[251]96 (Fig. [252]7B).

Fig. 7. Pre-existing differences in the chromatin organization of TR-AMs are
linked to differential responses to Mtb infection.

   [253]Fig. 7
   [254]Open in a new tab

   A Unbiased UMAP Clustering of Chromatin Accessibility Profiles. Cells
   are spatially organized and colored based on similarities in their
   chromatin accessibility patterns. Associated cluster names are shown on
   the plot. B Visualization of unconstrained integration of single-cell
   ATAC-seq and scRNA-seq datasets, with each point being an individual
   cell spatially organized by chromatin accessibility profile and colored
   according to its predicted cell type, based on cell types in the
   scRNA-seq dataset. C scATAC Cluster Compositions Post-scRNA
   Integration. This bar plot depicts the percentage compositions of each
   scATAC cluster based on the predicted cell labels obtained after
   integration with the scRNA dataset. The x-axis represents different
   scATAC clusters, while the y-axis displays the percentage of total
   cells for each scATAC-seq cluster associated with each predicted
   scRNA-seq cell label. Each bar is colored distinctly for the three
   TR-AMs conditions: CD38^− Anti-inflammatory, CD38^+ Pro-inflammatory,
   Mki67^+ TR-AMs, and Other. The actual percentage values are overlaid on
   the bars. D Scatter plot depicting transcription factor binding sites
   (TFBS) enrichment scores in cluster C7. Transcription factor motif
   enrichment analysis was performed utilizing a hypergeometric test to
   assess the overrepresentation of transcription factor motifs within
   differentially accessible chromatin regions, and p values were adjusted
   for multiple comparisons using the Benjamini-Hochberg procedure. The
   plot visualizes the top transcription factors whose motifs are
   significantly enriched in C7, ranked by their negative
   log[10]-transformed adjusted p values (mlog[10]Padj). Each point
   represents a transcription factor motif, with color intensity
   reflecting the magnitude of enrichment. The top 15 most significant
   motifs are labeled to highlight transcription factors particularly
   relevant to the regulatory landscape of C7. E UMAP plots displaying
   log-normalized count expression levels for the transcription factors
   (TFs) Smarcc1, Bach1, and Rela in the infected cell population of the
   scRNA-seq dataset. Source data are provided as a Source Data file.

   Given that our scATAC-seq sample comprised only cells from naïve mice,
   we anticipated that the inferred gene expression profiles from the
   scATAC-seq dataset would predominantly align with those of naïve cells
   from our scRNA-seq dataset. Surprisingly, we found that the inferred
   gene expression of cells from cluster C7 mirrored that of the
   pro-inflammatory CD38^+ subsets from our scRNA-seq data (Fig. [255]7B,
   [256]C), which by transcriptional profiling are not present in naïve
   mice (Fig. [257]2A). In contrast, clusters C6 and C8 aligned with
   CD38^− TR-AMs, while cluster C5 correlated with Mki67^+ AMs
   (Fig. [258]7B, [259]C). To validate that cluster C7 represented CD38^+
   TR-AMs and not monocyte-derived AMs, we probed for open chromatin
   within the promoter regions of monocyte markers Mafb and Ly6a, whose
   expression is restricted to monocyte-derived macrophages in our
   scRNA-seq dataset, as noted earlier^[260]9 and (Fig. [261]1F). We found
   high levels of open chromatin for these markers only in
   monocyte-derived cells, but not in cluster C7 (Supplementary
   Fig. [262]6B).

   To further understand why the inferred gene expression of cluster C7
   aligns with pro-inflammatory populations in scRNA-seq, we first
   assessed transcription factor dynamics, performing marker peak and
   motif enrichment analysis (FDR < 0.1, log[2]FC > 0.5) to identify
   transcription factor binding sites (TFBS) that are enriched across the
   different scATAC-seq clusters. Cluster C7 exhibited highly significant
   enrichment for binding sites of transcription factors such as Smarcc1,
   Bach1, and Rela, known to drive pro-inflammatory gene activation in
   macrophages^[263]97–[264]99 (Fig. [265]7D). We further validated the
   expression of these TFs in our scRNA-seq dataset and found them to be
   uniquely expressed by CD38^+ pro-inflammatory macrophages following Mtb
   infection (Fig. [266]7E).

   Finally, our analysis of open chromatin peaks within the regulatory
   regions (±5k from TSS) of pro-inflammatory genes that define CD38^+
   AMs, such as Nos2, Cd38, Slc7a11, Ccl5, Ptgs2, Il1b, and Cxcl2,
   revealed higher open chromatin in cells from cluster C7 compared to
   clusters C6 and C8, further validating that CD38^+ TR-AMs are
   pre-primed for a pro-inflammatory response (Supplementary
   Fig. [267]6C).

   Overall, our analysis demonstrates that CD38^+ TR-AMs are present in
   naïve mouse lungs before infection, but are transcriptionally quiescent
   and have an epigenetic profile markedly distinct from their CD38^−
   TR-AM counterparts. These CD38^+ TR-AMs have increased chromatin
   accessibility in promoter regions of pro-inflammatory genes, with
   significant enrichment of transcription factor binding sites that have
   the potential to drive pro-inflammatory macrophage activity and control
   Mtb growth. Our findings support the hypothesis that the response of
   different TR-AM subsets to Mtb infection is largely predetermined by
   their intrinsic chromatin organization prior to infection.

Discussion

   Our understanding of immune protection against TB largely comes from
   studying immune failure^[268]26,[269]100. Experimental infections in
   immune deficient mouse strains have informed us which pathways, when
   compromised, increase susceptibility to infection and disease^[270]101.
   Similarly, several mutations in the human population have been linked
   to increased incidence or severity of disease^[271]102,[272]103.
   However, if disease outcome is determined by the biology of the host
   macrophages, and if different macrophage populations are responsible
   for control or promotion of bacterial growth, focusing solely on immune
   failure offers a limited view of disease control. This focus has
   resulted in our reliance on Interferon-γ Release Assays (IGRA),
   Mycobacterial Growth Inhibition Assays (MGIA) and similar biological
   readouts for vaccine development, which have proven to be
   non-predictive of immune protection^[273]22,[274]23.

   Our current study uses a mouse challenge model with fluorescent fitness
   reporter bacteria to define and characterize different macrophage
   populations in the infected mouse lung. We’ve previously found that
   recruited pro-inflammatory monocyte-derived IMs effectively control
   Mtb^[275]6,[276]8,[277]9, while AMs exhibit diverse phenotypes. We
   posit that the variability among these macrophage subsets significantly
   influences disease progression in TB.

   In early infection, Mtb primarily resides in CD38^− AMs, which show a
   muted response, as noted by Rothchild et al.^[278]10. As infection
   progresses, the AM landscape changes significantly with the recruitment
   of moAMs and activation of a subset of TR-AMs, leading to increased
   bacterial control. These macrophage subsets transition the lung
   environment from immune homeostasis to a pro-inflammatory state,
   effectively curtailing Mtb growth. The shifts in macrophage phenotypes
   are rooted in their intrinsic epigenetic programming, as revealed by
   ScATAC-seq analyses. CD38^+ TR-AMs, are already present in naïve lungs,
   and exhibit chromatin landscapes predisposed for a pro-inflammatory
   response, indicating epigenetic priming as a key factor in their
   infection response. The diverse chromatin organization of the different
   TR-AMs subsets before infection suggests the potential for manipulating
   their epigenetic programs, opening new avenues to enhance macrophage
   function in TB.

   Furthermore, examination of post-infection phenotypic changes in AM
   subsets, particularly after intrapulmonary BCG administration, provides
   insights into potential strategies for reprogramming these macrophages
   to our advantage. Our focus was not on promoting BCG as a long-term
   vaccination strategy, but rather on understanding how transient changes
   in macrophage function post-BCG administration can inform potential
   therapeutic targets. We observed transcriptional shifts resulting in
   the polarization of CD38^− TR-AM_3 towards the CD38^+ TR-AM phenotype
   and increased inflammatory activation of monocyte-derived macrophages
   in BCG-treated mice. These transcriptional responses are associated
   with augmented expression of pathways related to intracellular
   trafficking and lysosomal/autophagic functions, suggesting that
   intranasal BCG administration triggers the pre-activation of genetic
   programs inherent to the epigenetic profile of pro-inflammatory TR-AM
   subsets. This promotes a phenotype more effective in restricting Mtb
   replication. Additionally, BCG also increases the recruitment of
   monocyte-derived macrophages, resulting in fewer unresponsive CD38^−
   TR-AMs being infected with Mtb, further strengthening the lung myeloid
   populations’ ability to counter the Mtb challenge. These results extend
   the recent reports that BCG reprograms lung AMs to better control
   Mtb^[279]14,[280]17. While our findings, conducted in SPF mice, do not
   fully capture the complex genetic and environmental influences found in
   human populations, the changes in macrophage phenotype observed during
   infection and post-BCG vaccination offer a clear path towards improved
   TB control. We propose that the dynamic nature of these macrophage
   subpopulations plays a major role in the early events following Mtb
   infection and throughout the course of the disease. Through
   understanding the underlying molecular mechanisms and the broader
   immune context driving the distinct responses of CD38^+ and CD38^− AMs
   to Mtb infection, we could guide the screening of therapeutics aimed at
   improving macrophage control of Mtb early in infection. However,
   factors such as genetic diversity, nutritional status, co-infections,
   and early-life BCG vaccination can significantly alter immune
   responses; therefore, these results will require validation under more
   complex conditions in humans.

   The significant shifts in macrophage phenotypes we’ve observed,
   especially in the context of BCG vaccination and immune function,
   emphasize the functional resolution of the analytical tools employed in
   this current study. The scWGCNA method, a focal point of this and
   ongoing studies, has enabled generation of gene expression modules that
   functionally categorize various macrophage subpopulations, in both
   mouse and NHP infections. These modules, alongside newly identified CD
   markers, facilitate integration with skin and lung challenge approaches
   for more accurate phenotype identification of tissue-resident and
   monocyte-derived macrophages in relation to disease or vaccination
   status. We believe this represents a viable avenue to the development
   of predictive biomarkers of immune protection.

Methods

Mtb and BCG strains

   The parental strain employed for all experiments was Mycobacterium
   tuberculosis Erdman (ATCC 35801). Fluorescent reporter strains
   including smyc′::mCherry, smyc′::mCherry/hspx’::GFP, and hsp60′::GFP
   have been reported^[281]35,[282]104–[283]106. Both the M. tuberculosis
   strain and BCG (Pasteur) were cultivated at 37 °C until they reached
   the mid-log phase in MiddleBrook 7H9 broth enriched with 10% OADC
   (Becton, Dickinson, and Company), 0.2% glycerol, and 0.05% tyloxapol
   (Sigma-Aldrich). For the selection of fluorescent strains, Hygromycin B
   (50 mg/ml) was utilized. For mouse infections, bacterial aliquots were
   prepared in 10% glycerol, titrated, and preserved at −80 °C, following
   the protocol detailed in ref. ^[284]27.

Mice

   C57BL/6J WT, B6.129P2-Nos2tm1Lau/J (NOS2^−/−), B6.129S7-Ifngtm1Ts/J
   (IFNγ^−/−), B6.129S7-Rag1tm1Mom/J (Rag1^-/-) female and male mice were
   purchased from The Jackson Laboratory. The mice used in this study were
   6–8 wk old. All mice were maintained in a specific pathogen–free animal
   biosafety level 3 facility at Cornell University. Animal care was in
   accordance with the guidelines of the Association for Assessment and
   Accreditation of Laboratory Animal Care. All animal procedures were
   approved by the Institutional Animal Care and Use Committee of Cornell
   University.

Mice infection and lung cells isolation

   For Mtb infections, mice were anesthetized and intranasally inoculated
   with 1.5 × 10^3 CFUs of the Erdman strains (smyc′::mCherry,
   hspx′::GFP/smyc′::mCherry, or hsp60′::GFP) resuspended in 30 μl of PBS
   containing 0.05% Tween 80. The inoculum dose was verified by plating
   various dilutions of the bacterial stocks used for infection on 7H10
   agar plates supplemented with OADC Enrichment and glycerol. These
   plates were incubated at 37 °C, and after 3 weeks, colonies were
   counted. At 2, 3, 4, and 6 weeks post-infection (w.p.i.), mice were
   euthanized. Lungs were aseptically removed and immersed in PBS
   containing 5% FBS and Collagenase IV (250 U/mL). To preserve the gene
   expression profiles of both the host and bacteria, samples were
   immediately processed using a GentleMACS tissue dissociator (Miltenyi
   Biotec) and maintained on ice. The dissociated lung material was
   subsequently strained through a 70-µM mesh, and red blood cells were
   lysed using ammonium-chloride-potassium (ACK) lysis buffer
   (Lonza)^[285]27.

   For BCG vaccinations, mice received an intranasal dose of 2 × 10^6 CFU
   of BCG (Pasteur) bacilli in 30 μL of PBS containing 0.05% Tween 80.
   Post-infection, these mice were housed in pathogen-free cages at a
   biosafety level 2 facility at Cornell University, in preparation for
   re-infection with M. tuberculosis Erdman. After a period of 2 months
   (60 days) from the initial vaccination, these mice underwent a
   secondary intranasal challenge. This challenge involved ~1.5 × 10^3 or
   5 × 10^3 CFU of either hspx′::GFP/smyc′::mCherry or smyc′::mCherry Mtb
   Erdman respectively, with the bacteria also resuspended in 30 μL of PBS
   containing 0.05% Tween 80. At specified intervals post-infection,
   specifically at 2 and 4 weeks, the lungs of these mice were aseptically
   removed and immersed in a solution of PBS containing 5% FBS and
   250 U/mL of Collagenase IV. The harvested lung tissues were then
   processed for subsequent scRNA-seq or flow cytometry analyses.

Sorting of the murine lung suspensions for scRNA-seq analysis

Infected populations

   To generate single cell suspensions for cell sorting, mice
   (n = 5/timepoint) were euthanized according to approved protocols. Lung
   lobes were aseptically removed and placed in GentleMacs C tubes
   (Miltenyi Biotec) containing 4 mL of dissociation solution composed of
   PBS, 5% FBS, and 250 U/mL Collagenase IV. The tissues were then
   dissociated using the GentleMacs dissociator (Miltenyi Biotec) for
   30 min. Resulting suspensions were strained through a 70 µm cell
   strainer into a 50 mL Falcon tube, followed by centrifugation at
   400 × g for 5 min at 4 °C. After discarding the supernatant, red blood
   cells were lysed using 1 mL of ACK lysis buffer for 5 min at room
   temperature. Subsequently, cells were stabilized by adding 10 mL of
   staining solution (PBS, 5% FBS), centrifuged under the same conditions,
   resuspended in sorting buffer (PBS, 1% FBS, 5 mM EDTA, and 25 mM
   Hepes), filtered through a 40-µM strainer, and sorted. Throughout the
   sorting, samples were kept at 4 °C and directly collected into Cell
   Staining Buffer (BioLegend). Mice infected with either smyc′::mCherry
   or hsp60′::GFP were used as a control to define the sorting gates for
   the hspx′::GFP/smyc′::mCherry-infected cells. For the BCG analysis, we
   followed the same protocol sorting n = 5 mice.

Bystander populations

   Single-cell suspensions from mice infected with smyc′::mCherry
   (n = 3/timepoint) were incubated for 20 min with fluorophore-bound CD45
   antibodies (104; BD). After two PBS washes, samples were resuspended in
   the sorting buffer, filtered via a 40-µM strainer, and sorted. During
   sorting, samples were consistently held at 4 °C and collected directly
   into Cell Staining Buffer (BioLegend).

scRNA-seq libraries preparation and sequencing

Sample preparation and staining

   Sorted cells were centrifuged at 500 g for 5 min and then resuspended
   in 50 µl of cell staining buffer containing 0.25 µg of TruStain FcX
   PLUS (BioLegend), followed by a 10 min incubation at 4 °C. An ADT plus
   HTO antibody cocktail mix (50 µl) was then added to the samples, and
   the cells were further incubated for 30 min at 4 °C. After two washes
   in cell staining buffer, differentially tagged samples (e.g.,
   hspx′::GFPhigh/hspx′::GFPlow) were combined and resuspended in 1×
   Dulbecco’s PBS. The samples were then fixed by slowly adding ice-cold
   methanol to a final concentration of 90% (vol/vol) and stored at −20 °C
   overnight.

Sample rehydration and mRNA library preparation

   Post-fixation, the samples were brought out of the BSL3 facility,
   equilibrated on ice for 15 min, and washed twice with rehydration
   buffer (1× Dulbecco’s PBS with 1.0% BSA [Thermo Fisher Scientific] and
   0.5 U/µl RNase Inhibitor [Sigma-Aldrich]). The cell count was
   determined prior to loading onto the 10× chip. For mRNA library
   preparation, we adapted the 10× protocol ([286]CG000206 Rev D), making
   a minor alteration in step 2.2. Specifically, we incorporated 1 μl of
   ADT and HTO additive primers (0.2 μM stock) following the method
   described by Stoeckius et al.^[287]107. HTO and ADT libraries were
   prepared according to BioLegend’s standard protocols.

Library generation and sequencing

   The mRNA, HTO, and ADT libraries underwent quality control assessment
   using an Agilent Fragment Analyzer. Their concentrations were
   determined using the QX200 digital PCR system from Bio-Rad. Libraries
   were pooled in the same sequencing run at specific ratios: 90% mRNA, 5%
   ADT, and 5% HTO. Sequencing was performed on the NextSeq2000 (Illumina)
   using the 50-bp P3 NextSeq kit. The cycle distribution was: read 1 (28
   cycles), i7 index (8 cycles), and read 2 (52 cycles). Sequencing depth
   exceeded 50,000 reads/cell.

scATAC-seq nuclei isolation, library preparation and sequencing

   Murine naïve lung sorting was performed as described in refs.
   ^[288]9,[289]27. For nuclei isolation, sorted cells were centrifuged at
   300 rcf for 5 min at 4 °C and then resuspended in 150uL of PBS
   supplemented with 0.04% BSA. A 100 µL aliquot of this cell suspension
   was transferred to a 0.2 mL flat-cap tube, centrifuged again under the
   same conditions, and subsequently resuspended in 50uL of Lysis Buffer
   (containing 10 mM Tris-HCL, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween 20, 0.1%
   Nonidet P40 Substitute, 0.01% Digitonin, and 1% BSA). This was
   incubated on ice for 4 min. Post-incubation, 50 µL of Wash Buffer
   (comprising 10 mM Tris-HCL, 10 mM NaCl, 3 mM MgCl2, 1% BSA, and 0.1%
   Tween 20) was added to the lysed cells. The nuclei suspension was then
   centrifuged at 500 rcf for 5 min at 4 °C. The nuclei were washed with
   45 µL of a 1:20 dilution of the Nuclei Buffer (10x Genomics,
   PN-2000153/2000207) and centrifuged again using the same conditions.
   Finally, the isolated nuclei were resuspended in a volume of Diluted
   Nuclei Buffer to obtain a concentration ranging from 3080 to 7700
   nuclei/µL. This was used as input for the 10X protocol (CG 000209 Rev
   D) targeting a recovery of 10,000 nuclei.

   The transposition reaction and library construction were performed
   following the protocol from 10X ([290]CG000209 REV D). Sequencing was
   conducted on a NextSeq 500 with the parameters: Read 1N: 50 cycles, i7
   index: 8 cycles, i5 index: 16 cycles, and Read 2N: 50 cycles.

Antibodies used for scRNA-seq

   For our scRNA-seq timepoint experiments, we used a range of TotalSeq
   (BioLegend) murine antibodies in our antibody cocktail mix, each at a
   concentration of 0.5 μg/sample. These antibodies included SiglecF
   (custom-made, clone S17007L), CD64 (cat. # 139325), Ly6G (cat. #
   127655), CD11c (cat. # 117355), CD14 (cat. # 123333), Ly6G-Ly6C (cat. #
   108459), CD63 (cat. # 143915), F4/80 (cat. # 123153), CD38 (cat. #
   102733), TLR4 (cat. # 117614), CD11b (cat. # 101265), CD16/32 (cat. #
   101343), CD86 (cat. # 105047), CD1d (cat. # 123529), CD3 (cat. #
   100251), CD4 (cat. # 100569), and CD8a (cat. # 100773). In addition,
   for hashing purposes, we used BioLegend’s Hashtag 1 murine (cat. #
   155801), Hashtag 2 murine (cat. # 155803) antibodies.

Flow cytometry analysis

   Lung cell suspensions were counted and incubated for 30 min in the dark
   at room temperature with fluorophore-conjugated antibodies, washed
   twice with PBS 1×, and fixed in 4% paraformaldehyde. Antibody panels
   and Fluorescence Minus One controls were generated as appropriate. For
   this study, we used fluorochrome-conjugated mAbs specific to mouse
   SiglecF (E50-2440; Becton Dickinson), CD64 (X54-5/7.1; BioLegend),
   MerTK (2B10C42; BioLegend), CD38 (90; Biolegend) and CD45 (104; Becton
   Dickinson), along with the following reporter strains: smyc′::mCherry
   (mCherry), hsp60′::GFP (GFP), and hspx′::GFP/smyc′::mCherry. Cells were
   analyzed with a Symphony A3 (BD Biosciences). Data were analyzed using
   FlowJo software (version 10.9; BD).

Quantification of bacterial loads

   At 2 weeks post-infection (for BCG-vaccinated) and at 4 weeks
   post-infection (for B6.129P2-Nos2tm1Lau/J (NOS2^−/−)) mice were
   sacrificed and the lung lobes homogenized in PBS containing 0.05%
   tyloxapol (Sigma-Aldrich). Bacterial loads were determined by plating
   serial dilutions of the homogenates on 7H10 agar. Plates were incubated
   at 37 °C and colonies enumerated 3–4 weeks after.

Data analysis

Data acquisition and QC

   Sequencing data derived from each run underwent processing using
   distinct software tailored to the library type. mRNA libraries were
   processed using the CellRanger software (v. 6.0) from 10X Genomics and
   the mouse mm10 genome (GRCm38). The ADT and HTO libraries were
   processed using CITE-Seq-Count (v. 1.4.3), available at
   [291]https://hoohm.github.io/CITE-seq-Count/. This processing yielded
   raw count matrices for both mRNA and proteins. Downstream data analysis
   was conducted in Seurat (v. 4.3), following methods outlined by Stuart
   et al.^[292]108. Cells with less than 200 unique genes or with over 30%
   of the total reads belonging to mitochondria were filtered out. The
   HTOs multiplexed samples were then demultiplexed using the
   MULTIseqDemux() function in Seurat^[293]109. This step allowed for the
   removal of doublets and empty droplets and ensured the correct
   assignment of identities to each sample.

Single-cell RNA sequencing data preprocessing

   All scRNA-seq datasets from individual timepoints were preprocessed
   using Seurat’s regularized negative binomial regression, where both the
   number of counts and the percentage of mitochondrial reads were
   regressed out, as per Hafemeister and Satija^[294]110. For analysis of
   the macrophage populations in the uninfected naïve lungs we used the
   previously generated datasets from ref. ^[295]9. Metadata columns,
   namely Timepoint, Batch, Infection Status, and Vaccination Status, were
   added to each dataset prior to subsequent steps.

Data merging and integration

   Datasets were combined into a unified Seurat object. The raw counts
   contained in the RNA slot of the merged object were used as an input
   for Harmony integration^[296]111. Raw counts underwent
   log-normalization, followed by the identification of the top 3000
   variable genes. These genes were scaled and centered. PCA was then
   performed on these values. Integration in Harmony incorporated Batch,
   Infection Status, Timepoint, and Vaccination Status as covariates.

Cluster detection and annotation

   Using the Harmony-aligned embeddings, graph-based cluster detection was
   achieved utilizing principal components (PCs) that had been marked
   statistically significant by the jackstraw method^[297]112. For
   community detection, we employed the Louvain algorithm^[298]113. Cell
   types for each identified cluster were annotated using both
   reference-based^[299]114 and canonical marker genes.

Trajectory and pseudotime analysis

   This analysis was carried out in Monocle (v3.0)^[300]115. The
   integrated Seurat object was converted using the SeuratWrappers package
   in R (v. 0.2.0). For unbiased trajectory and pseudotime analysis of the
   macrophage populations, all cells classified as macrophages were
   assigned to the same partition, and trajectory/pseudotime analysis was
   conducted^[301]116. White circles indicate the root origins of the
   trajectory, gray circles indicate destination fates and black circles
   indicate branching points.

Generation of a force layout embedding (FLE)

   To visualize infected and bystander cells in a force layout embedding
   (FLE) we used the methods described in Waddington-OT^[302]76. Briefly,
   using the Pegasus library (pegasus), a PCA was performed on the
   expression data using the function pg.pca(). This was followed by the
   determination of nearest neighbors using pg.neighbors(). Subsequently,
   a diffusion map was generated through the function pg.diffmap() to
   visualize the transition between cellular states. To visualize the
   data, the FLE algorithm implemented in Pegasus was employed by invoking
   pg.fle(). The resulting 2D coordinates were then extracted for further
   visualization. A scatter plot was generated using matplotlib to
   represent cells in the 2D space derived from the FLE calculations.
   Cells were color-coded based on their respective timepoints (weeks).

Time-dependent pathway analysis

   For our time-dependent pathway analysis, we employed the Tempora
   package^[303]74. First, the integrated Seurat object was subset to only
   contain macrophage clusters. Subsequently, the ImportSeuratObject()
   function was employed to prepare the data for Tempora, where clusters
   and timepoints were explicitly defined. Given the necessity of pathway
   enrichment analysis, gene set files were fetched from the BaderLab
   online resource. We specifically selected gene set files that
   incorporated all pathways, excluding those inferred from electronic
   annotations (IEA). The downloaded gene matrix transposed (GMT) file was
   then leveraged for pathway enrichment calculations using the GSVA
   method with CalculatePWProfiles(). To construct the cellular
   trajectory, the BuildTrajectory() function was applied, with the number
   of PCs set to 11 and a statistical significance threshold of <0.05. The
   trajectory visualization was achieved using the PlotTrajectory()
   function, and subsequently, generalized additive models (GAMs) were
   employed to identify significant time-varying pathways through the
   IdentifyVaryingPWs() function. The temporal dynamics of these pathways
   were illustrated using a custom ggplot function, ggplotVaryingPWs() for
   improved visualizations.

scWGCNA co-expression analysis

   scWGCNA is an analytical pipeline used extensively by developmental
   biologists, that support the iterative, unbiased assembly of gene
   expression modules that define cell populations of interest. To perform
   weighted gene co-expression network analysis on our scRNA-seq datasets,
   we employed the scWGCNA package^[304]45. Pseudocells were computed
   using the calculate.pseudocells() function from the scWGCNA package.
   This method aggregates single cells into pseudocells to reduce the
   complexity of the dataset. A fraction of 0.2 of the cells were used as
   seeds, and for each seed, 10 nearest neighbors were aggregated based on
   the Harmony dimensional reduction. For each analysis, the selected
   single cell data was normalized using the LogNormalize() method with a
   scaling factor of 10,000. Variable features were identified using the
   Variance Stabilizing Transformation (VST) method, and the top 3000
   features were retained.

   Subsequently, pseudocells (pcells) and the variable genes (var.genes)
   identified in the previous step were used in the scWGNA() function to
   identify modules of co-expressed genes. Membership tables were
   inspected to understand the genes belonging to each module, and average
   expressions of each co-expression module per cell were analyzed.
   Eigengenes for the co-expression modules were computed using the
   scW.eigen() function. This function collapses the expression profile of
   each module into a single representative profile known as the module
   eigengene. The average expression of each module was then visualized on
   a UMAP plot using a customized scW.p.expression() function for improved
   visualization.

Differential abundance testing with Milo

   We used the neighborhood-based statistical framework Mylo^[305]77 to
   test for changes in the abundance of infected and bystander macrophage
   populations across timepoints. To determine the best parameters for
   running the model both the neighborhood cell distribution and the
   distribution of uncorrected P values were assessed. A k-nearest
   neighbors (k-NN) graph was constructed on the data using the
   buildGraph() function, with k set to 10, and using Harmony as the
   dimensionality reduction method. The neighborhoods were then defined on
   this graph using the makeNhoods() function, using the same k and d
   parameters as before. Subsequently, cells were counted within these
   neighborhoods based on their originating samples using the countCells()
   function. The neighborhoods were tested for differential abundance
   using the testNhoods() function. This test took into account a design
   matrix (consisting of a sample identifier and a variable for the
   timepoint) and the Harmony dimensions. Results were then sorted by the
   spatial False Discovery Rate (SpatialFDR). The neighborhood graph was
   further constructed using the buildNhoodGraph() function.
   Visualizations of the neighborhood graph highlighting the differential
   abundance results were then generated. Following this, neighborhoods
   were annotated based on cell identity, and a histogram was plotted to
   visualize the fraction of identified cells. Neighborhoods with an
   identity fraction less than 0.4 were labeled as Mixed. Bee swarm plots
   were generated to further visualize the data based on these identities.
   Finally, neighborhoods were grouped using the groupNhoods() function,
   with a max.lfc.delta of 2. The resultant grouped neighborhoods were
   visualized on a UMAP plot and further explored through bee swarm plots
   based on their LFC differences.

scCODA

   We used scCODA to investigate changes in cell composition across
   different timepoints during Mtb infection. The cell count data was
   reshaped to match the format required for scCODA. This involved mapping
   the original timepoints to new categories (e.g., 2 Weeks and 3 Weeks to
   Early Timepoint, 4 Weeks and 6 Weeks to Late Timepoint) and summing
   counts when necessary to consolidate the data into these new
   categories. The reshaped data was loaded into a pandas DataFrame, where
   each column represented a different timepoint and rows represented
   individual cell types. This DataFrame was then converted to an AnnData
   object, which is the data structure required by scCODA for
   compositional data analysis. The analysis focused on comparing the
   composition of macrophage subsets at these defined timepoints to
   identify significant changes in their proportions that could be linked
   to infection progression. The compositional analysis model was created
   using scCODA’s CompositionalAnalysis class, specifying Timepoint as the
   covariate and automatic as the reference cell type. The model was then
   run to perform Hamiltonian Monte Carlo (HMC) sampling and generating
   posterior distributions for the compositional changes between the
   specified groups. Results from the scCODA model were summarized to
   determine which cell types showed significant changes in proportion
   relative to the reference group across the study’s timepoints.

Pathway enrichment analysis

   Pathway enrichment analysis was performed using G::profiler^[306]117.
   For each analysis, we created an ordered by fold change (FC) (for DGE)
   or membership value (for scWGCNA modules) list of genes as a query,
   selecting only those genes where adjusted P value (p-adj) < 0.05. The
   analysis was performed using the g:SCS method for multiple testing
   correction, the gene ontology (GO), KEGG and Reactome databases as a
   data source, and the default settings for the other parameters in
   G::profiler. Only pathways enriched with p < 0.05 were considered
   statistically significant. Manual exploration of the gene lists for
   each analysis has also been performed to identify relevant themes for
   genes whose function is described in the literature (e.g., Small GTPase
   signal transduction). For this purpose, we only considered genes whose
   FC was absolute >1.5 and p-adj < 0.05, or membership value for scWGCNA
   modules > 0.4.

scATAC-seq chromatin accessibility analysis

   We utilized the ArchR software^[307]96 to perform integrative
   single-cell chromatin accessibility analysis. For the analysis we used
   the precompiled version of the mm10 genome in ArchR. Quality filtering
   parameters for data pre-processing included filtering out cells with
   TSS enrichment scores below 4 and less than 1000 unique fragments. As
   an additional step for quality control, we performed doublet
   identification and removal. Doublet scores were added to the Arrow
   files using the addDoubletScores() function with parameters
   knnMethod = UMAP, k = 10 and LSIMethod = 1. Post doublet
   identification, cells suspected to be doublets were filtered using
   filterDoublets(), reducing the initial nuclei count from 9371 to 8493.
   Subsequently, dimensionality reduction has been performed using the
   Iterative Latent Semantic Indexing (LSI) on the insertion count matrix
   with the addIterative LSI() function in ArchR for 4 iterations.
   Clustering was performed on the Iterative LSI dimensions using Seurat
   with a resolution of 0.7. UMAP embeddings were added with the addUMAP()
   function using parameters nNeighbors = 30, minDist = 0.5 and metric =
   cosine. The UMAP was plotted and color-coded by clusters.

   To identify marker genes for each cluster, gene expression for marker
   genes was estimated from chromatin accessibility data by using gene
   scores. A gene score is considered as a prediction of how highly
   expressed a gene will be based on the accessibility of regulatory
   elements in the vicinity of the gene^[308]96. To create the gene
   scores, we used distance-weighted accessibility models as defined in
   ArchR and implemented in the function addGeneScoreMatrix(). Marker
   genes were then identified for each cluster using the Gene Score Matrix
   with a Wilcoxon test method. The markers for each cluster were filtered
   using an FDR ≤ 0.05 and Log[2]FC ≥ 0.58. Imputed weights were added
   using the function addImputeWeights() to visualize the marker genes
   e.g., Mafb and Ly6a on the UMAP. To visualize the local chromatin
   accessibility around specific genes on a per cluster basis, the
   plotBrowserTrack function was employed, considering 5000 base pairs
   both upstream and downstream of the start of the genes of interest.

Integration of scRNA-seq and scATAC-seq data

   Single-cell timepoint RNA-seq data (scRNA-seq) was utilized for
   integration with ATAC-seq data. Unconstrained integration, a completely
   agnostic approach that takes all of the cells in the scATAC-seq
   experiment and attempt to align them to any of the cells in the
   scRNA-seq experiment, was used. The addGeneIntegrationMatrix() function
   was used to generate a gene integration matrix, which was named
   GeneIntegrationMatrix. For visualization, a palette was derived from
   the scRNA-seq data’s cell type categories. An embedding plot was
   generated using the plotEmbedding() function, colored by predicted cell
   groups to label and visualize the scATAC-seq clusters with the cell
   types predicted from our scRNA-seq dataset.

Transcription Factor (TF) Analysis

   For TF analysis, the following steps were performed: (1) A reproducible
   peak set was determined using the addReproduciblePeakSet() function,
   with MACS2 as the peak caller^[309]118. The getMarkerFeatures()
   function was employed to identify marker peaks unique to an individual
   cluster or a small group of clusters in an unsupervised fashion, using
   the above calculated peak matrix and accounting for biases such as TSS
   Enrichment and fragment counts. Significant marker peaks were
   determined with criteria set at FDR ≤ 0.05 & Log[2]FC ≥ 0.58. (2) To
   assess whether marker peaks or differential peaks were enriched for
   binding sites of specific transcription factors, we performed motif and
   feature enrichment analysis. We annotated the ArchR project with motif
   information using the addMotifAnnotations() function, employing the
   cisbp motif set. Analysis of enrichment of motifs within marker peaks
   was performed with the peakAnnoEnrichment() function, using the
   following criteria FDR < = 0.1 & Log[2]FC ≥ 0.5. The enriched
   transcription factor binding sites (TFBS) for each scATAC-seq cluster
   were then visualized in a ranked scatter plot.

Statistical analysis

scRNA-seq

   Differential expression analysis was performed using the nonparametric
   Wilcoxon rank-sum test as implemented in Seurat^[310]108. Only genes
   with FDR < 0.05 between two comparisons were considered statistically
   significant. Unless otherwise specified, the Wilcoxon rank-sum test
   followed by FDR correction has also been used to compare the
   distribution of a specific gene expression among two groups of cells in
   different plots and visualizations. For plots where the distribution of
   values (eg: CD38 protein level) have been compared, for each timepoint,
   across two conditions (eg: Bystander vs Infected) pairwise t-tests has
   been performed to determine which pairs of groups are different from
   each other. The p values from the pairwise t-tests are adjusted for
   multiple comparisons using the Benjamini-Hochberg (BH) method. The
   Kruskal-Wallis test has been used to analyze the differences among
   group medians in a sample (eg: if the medians of protein expression
   levels differ significantly across clusters). The Dunn’s test has been
   used following the Kruskal-Wallis test to determine which specific
   groups’ distributions differ from each other and the Bonferroni method
   has been used to adjust p values for multiple comparisons. In addition,
   effect sizes for differences in expression between groups were
   quantified using Cliff’s Delta, which measures the probability that a
   randomly selected value from one group will be greater than a randomly
   selected value from the other group. Cliff’s Delta values range from −1
   to 1, where 0 indicates no effect, and values closer to −1 or 1
   indicate stronger effects. The Cliff’s Delta was computed using the
   cliff.delta() function from the effsize R package. Data integration and
   batch effect removals were performed with Harmony^[311]111. ADT and HTO
   data were normalized using a centered log ratio transformation,
   implemented in the function NormalizeData() with
   normalization.method = CLR, in Seurat. RNA counts were log-normalized
   and scaled before PCA and data integration with Harmony. Data
   visualizations were generated on the log-normalized counts for the
   feature plots, scatter plots, and violin plots. Heatmaps and dot plot
   charts were generated on the scaled expression data, as per default in
   Seurat.

scATAC-seq

   Marker genes distinguishing the different cell clusters were identified
   using the getMarkerFeatures() function on the GeneScoreMatrix. The
   Wilcoxon rank-sum test was utilized to assess the significance of
   predicted differential gene expression between cell clusters. Biases
   such as TSS Enrichment and fragment counts (log[10](nFrags)) were
   accounted for during the analysis. Genes were considered as markers
   based on criteria set at a FDR of ≤0.05 and a Log[2] Fold Change
   (Log[2]FC) of ≥0.58. The FDR was controlled using the BH procedure. To
   identify marker peaks associated with different cell clusters, a
   Wilcoxon rank-sum test was employed on the PeakMatrix, accounting for
   biases such as TSS (Transcription Start Site) Enrichment and fragment
   counts. Marker peaks were considered significant based on criteria set
   at a FDR of ≤0.05 and a Log[2]FC of ≥0.58. The FDR was controlled using
   the BH procedure to reduce the chances of false positives arising from
   multiple testing. Transcription factor (TF) motif enrichment within the
   identified marker peaks was assessed using the peakAnnoEnrichment()
   function. This function performs hypergeometric enrichment of a given
   peak annotation within the defined marker peaks.

Flow cytometry analysis

   MFI of the GFP signal for the hspx-high and hspx-low populations at
   different timepoints was calculated using the software FlowJo (v.
   10.9).

CFU data

   In this study, CFUs were quantified and analyzed to assess differences
   in bacterial burden between different conditions/treatments. Log[10]
   transformations were applied to each CFU count prior to further
   analysis. To ensure the appropriateness of parametric tests, the data
   were first checked for normality using the Shapiro-Wilk test, a method
   suited for small sample sizes. This test evaluates the hypothesis that
   a sample comes from a normally distributed population, which is a
   critical assumption for the application of a two-sample t-test.
   Further, to assess the equality of variances between the treatment
   groups, Levene’s test was performed. Upon confirming that the
   assumptions of normality and homogeneity of variances were satisfied, a
   two-sample t-test was conducted to determine if there were
   statistically significant differences in the mean log-transformed CFU
   counts between the groups. The results of the t-test, including the p
   values, were used to infer the statistical significance of the
   differences observed. The CFU plots presented in the results section
   are annotated with summary statistics including the mean and standard
   deviation of each group, along with the sample size (n), and the p
   value from the t-test.

Reporting summary

   Further information on research design is available in the [312]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [313]Supplementary Information^ (52.1MB, pdf)
   [314]Peer Review File^ (1.1MB, pdf)
   [315]41467_2024_52846_MOESM3_ESM.pdf^ (88.5KB, pdf)

   Description of Additional Supplementary Files
   [316]Supplementary Data 1^ (1.2MB, xlsx)
   [317]Supplementary Data 2^ (14.6KB, xlsx)
   [318]Supplementary Data 3^ (999.8KB, xlsx)
   [319]Supplementary Data 4^ (27.8KB, xlsx)
   [320]Supplementary Data 5^ (109.9KB, xlsx)
   [321]Reporting Summary^ (1.3MB, pdf)

Acknowledgements