Abstract

Microsporidia are single-celled intracellular parasites that cause opportunistic diseases in humans. Encephalitozoon intestinalis is a prevalent human-infecting species that invades the small intestine. Macrophages are potential reservoirs of infection, and dissemination to other organ systems is also observed. The macrophage response to infection and the developmental trajectory of the parasite are not well studied. Here we use single-cell RNA sequencing to investigate transcriptional changes in both the parasite and the host during E. intestinalis infection of human macrophages in vitro. The parasite undergoes large transcriptional changes throughout its life cycle, providing a blueprint for parasite development. While a small population of infected macrophages mounts a response, most remain transcriptionally unchanged, suggesting that the majority of parasites may avoid host detection. The stealthy microsporidian lifestyle likely allows these parasites to harness macrophages for replication. Together, our data provide insights into the host response of primary human macrophages and into the E. intestinalis developmental program.

Subject terms: Parasite development, Parasite host response, Data processing, Fungal biology, Monocytes and macrophages

Microsporidia such as Encephalitozoon intestinalis are single-celled intracellular parasites that cause opportunistic infections and disease in humans, which can involve infection of macrophages. Here the authors infect human macrophages with E. intestinalis in vitro and use single-cell transcriptomics to compare the consequences of cellular infection with bystander effects on macrophages, and provide insights into the E. intestinalis developmental program.

Introduction

Microsporidia are obligate intracellular parasites closely related to fungi^1.
Over 1500 species have been reported, infecting a wide range of invertebrate and vertebrate hosts, including humans^2. Microsporidia exist in the environment as dormant spores surrounded by a thick spore coat. In humans, transmission of microsporidia occurs primarily through ingestion of contaminated food or water. Consequently, the initial site of infection is frequently the gastrointestinal tract^3. Microsporidia infection often results in self-limiting diarrheal disease in otherwise healthy individuals. However, in immunocompromised patients, infection can be fatal^4, and disseminated disease has been reported to cause encephalitis^5, tracheobronchitis^6, nephritis^7, keratoconjunctivitis^8, and myositis^9.

Encephalitozoon intestinalis is one of the most common microsporidian species found to infect humans. The initial site of infection is usually the small intestine^10, where E. intestinalis invades intestinal epithelial cells. In some cases the infection can spread systemically^11, and E. intestinalis is capable of infecting a wide range of cell types. E. intestinalis has been shown to infect macrophages, both in culture^12,13 and in patients^14. Macrophages may be important for parasite replication and spread within the infected tissue, and may serve as a reservoir for the pathogen. Consistent with this, microsporidia have developed mechanisms to evade killing by macrophages. Studies have shown that Encephalitozoon spp. are able to establish infection within macrophages by preventing phagosome acidification and inhibiting fusion with lysosomes^15. Furthermore, Encephalitozoon spp. were also observed to suppress apoptosis in human macrophages, which prevents killing of the parasites^16. To understand how E.
intestinalis manipulates human macrophages to promote replication, it is necessary to understand the transcriptional dynamics of the macrophage response to infection, as well as the developmental program of the parasite. Previous work using bulk RNA sequencing has provided insights into transcription in microsporidia and host responses to infection^17–20. For example, bulk RNA-seq of E. intestinalis-infected Caco-2 cells, a human colon cancer cell line, revealed that infection leads to mitochondrial stress and affects cell signaling networks related to energy, metabolism, and membrane trafficking^21. While bulk RNA sequencing can provide insights into the host–parasite interface across the whole population, it does not capture the complexity and heterogeneity of infection dynamics at the cellular level; moreover, low infection rates and asynchronous replication make it challenging to resolve temporal changes in the parasite transcriptional program. Here we investigate how asynchronously replicating E. intestinalis parasites propagate within macrophages and evade detection by performing single-cell RNA sequencing (scRNA-seq) of primary human macrophages infected with E. intestinalis across the parasite life cycle. From these data we analyzed both the parasite and host transcriptomes. Our results provide a transcriptional blueprint for parasite development, facilitating the identification of molecular markers for each stage of the parasite life cycle. In addition, our results show that most infected macrophages fail to respond to invading parasites and are transcriptionally indistinguishable from uninfected cells, suggesting that E. intestinalis generally avoids detection, enabling successful parasite replication.

Results

E. intestinalis infection dynamics in primary human macrophages

To identify suitable time points for scRNA-seq analysis, we isolated monocyte-derived macrophages from human donors and investigated the timeline of the E.
intestinalis life cycle in these cells. We infected macrophages with purified E. intestinalis spores and monitored infection by fluorescence microscopy from 3 h post infection (hpi) to 72 hpi (Fig. 1a–e). At each time point, infected cultures were co-stained for DNA (DRAQ5) and chitin (calcofluor white), allowing identification of mature spores (DNA puncta surrounded by a chitin coat) as well as actively replicating parasites (DNA puncta inside macrophages that lack a chitin coat). At 3 and 12 hpi, we were unable to identify any replicating parasites; however, many macrophages contained ~1–5 mature E. intestinalis spores at these early time points (Fig. 1a, b). As productive infection typically yields dozens of spores and takes ~48 h, we infer that macrophages containing ≤5 mature spores and no replicating parasites most likely phagocytosed spores from the inoculum (Fig. 1a, b). By 24 hpi, we observed clusters of replicating parasites in ~5% of macrophages (Fig. 1c, f), earlier than previously reported^22. Consistent with previous infection models for E. intestinalis^23, replicating parasites are clustered, likely within the parasitophorous vacuole (Fig. 1c). By 48–72 hpi, ~20–30% of macrophages contain newly replicated parasites (Fig. 1d–f). Many of these parasites are actively replicating, while others have matured into sporoblast or mature spore stages, as evidenced by the acquisition of a chitin coat. Together, our data suggest that replicating parasites are most abundant at 24–48 hpi, with spore maturation occurring at ~48–72 hpi.

Fig. 1. E. intestinalis infection kinetics in primary human macrophages. a–e Representative confocal microscopy images of primary human macrophages infected with E. intestinalis at 3 (a), 12 (b), 24 (c), 48 (d) and 72 (e) hours post infection (hpi).
A single mature spore positive for chitin (orange) indicates phagocytosis of the spore by the macrophage (arrowheads). A cluster of DNA-only positive foci (magenta) represents parasites actively proliferating inside the macrophage (c) (arrow). A mixture of parasites positive for both DNA and chitin and parasites positive for DNA only represents macrophages with developing spores (arrow) (d, e). Micrographs are representative of at least 4 biological replicates at each time point. f Quantification of macrophages with active infection, corresponding to representative images in (c–e). Mean ± SD are from eight biological replicates for 24 hpi, six for 48 hpi, and four for 72 hpi; n = 100 cells per experiment. Source data are provided in the Source Data file.

scRNA-seq analysis of human macrophages infected with E. intestinalis

To investigate transcriptional changes in both the host and the parasite during infection, we performed scRNA-seq of E. intestinalis-infected macrophages. After filtering and quality control (see Methods), we analyzed a total of 39,881 cells from four healthy human donors, sampling several infection time points alongside uninfected controls (Fig. 2a; Supplementary Fig. 1; Supplementary Table 1). We initially analyzed 8,097 macrophages from Donor 1 and 10,062 macrophages from Donor 2. Analysis of Donors 1 and 2 revealed donor-to-donor variation, which prompted us to carry out further experiments on two additional donors, Donor 3 and Donor 4, with 10,254 and 11,468 macrophages, respectively.

Fig. 2. Transcriptional analysis of macrophages infected with E. intestinalis. a Schematic overview of the experimental workflow for scRNA-seq. Primary human PBMCs were collected from 4 healthy donors, differentiated into macrophages, and infected with E. intestinalis for 3, 12, 24, 48, or 72 h. Uninfected cells served as controls.
Cells were then prepared and taken through the scRNA-seq workflow. b–e UMAP plots of the integrated dataset, projecting cells from Donors 1–4. f UMAP plot of the integrated cells from Donors 1–4 colored by uninfected (gray; Population A) vs infected (blue; Population B). A cell is scored as infected if >2% of total transcripts in the cell were derived from E. intestinalis. g Quantification of the percentage of infected cells in each cluster. h Dot plot of the top 5 genes expressed in each cluster. The x axis shows the clusters and the y axis shows the genes. The color indicates the average expression level across the cells in each cluster and the size indicates the percentage of cells in each cluster expressing that gene. Source data are provided in the Source Data file.

We first analyzed the combined parasite and macrophage transcriptomes across all four donors and all time points. Uniform manifold approximation and projection (UMAP) and unsupervised clustering of these cells based on highly variable genes revealed 12 clusters, T0–T11, where the “T” prefix denotes clustering based on the Total transcriptome of both the host and the parasite (Fig. 2b–e). The clusters partition into two distinct cell populations: Population A, which is present in both uninfected controls and infected samples, and Population B, which is observed almost exclusively in samples infected with E. intestinalis (Fig. 2b–e; Supplementary Fig. 2). To distinguish infected from uninfected macrophages, we scored a cell as “infected” if >2% of its total transcripts were derived from E. intestinalis (see Methods). We then assessed how infected cells were distributed between the two populations and found that Population A, which includes clusters T0, T1, T2, T3, T8, and T10, consists primarily of uninfected cells (Fig. 2f). In contrast, >90% of the cells in Population B, which includes clusters T4, T5, T6, and T9, are infected (Fig. 2f, g).
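The >2% rule used above to call a cell “infected”, and the per-cluster infection percentage shown in Fig. 2g, can be expressed in a few lines. The sketch below is illustrative only: it assumes that per-cell transcript counts (for example UMIs) have already been split into parasite-mapped and host-mapped totals, and the function names `score_infected` and `percent_infected_by_cluster` are hypothetical rather than taken from the authors' pipeline, which is described in their Methods.

```python
import numpy as np

def score_infected(parasite_counts, host_counts, threshold=0.02):
    """Flag cells in which parasite transcripts exceed `threshold` of all transcripts.

    parasite_counts, host_counts: per-cell totals mapped to the E. intestinalis and
    human references (assumed inputs). The strict '>' mirrors the ">2%" rule.
    """
    parasite = np.asarray(parasite_counts, dtype=float)
    total = parasite + np.asarray(host_counts, dtype=float)
    # Cells with zero total counts are treated as uninfected instead of dividing by 0.
    frac = np.divide(parasite, total, out=np.zeros_like(total), where=total > 0)
    return frac > threshold

def percent_infected_by_cluster(clusters, infected):
    """Percentage of infected cells within each cluster (cf. Fig. 2g)."""
    clusters = np.asarray(clusters)
    infected = np.asarray(infected, dtype=bool)
    return {str(c): float(100.0 * infected[clusters == c].mean())
            for c in np.unique(clusters)}

# Toy counts for four hypothetical cells.
infected = score_infected([0, 30, 1, 500], [1000, 970, 999, 500])
print(infected.tolist())  # [False, True, False, True]
print(percent_infected_by_cluster(["T0", "T4", "T0", "T4"], infected))  # {'T0': 0.0, 'T4': 100.0}
```

A fixed fraction threshold is simple to apply, but cells near the cutoff (for example exactly 2%) fall on the uninfected side, so the exact choice of count type and threshold matters at the margin.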
Time-point analysis reveals a similar overall infection trajectory in Population B among all four donors (Supplementary Fig. 2). As early as 3 hpi, infected cells begin to populate cluster T5, and by 12 hpi they expand to also fill cluster T4. By 24 hpi, infected cells are distributed across all of the clusters in Population B, and no additional clusters arise at later time points (Supplementary Fig. 2). While samples from all donors share a similar overall architecture and infection trajectory, the distribution of cells between clusters differs. For example, Donors 2 and 4 have many more cells in cluster T3 than Donors 1 and 3 (Fig. 2b–e; Supplementary Fig. 2). Cluster T3 was largely absent from uninfected controls but can be detected as early as 3 hpi, expands substantially by 12 and 24 hpi, and then appears to decline at 48 and 72 hpi (Supplementary Fig. 2; Supplementary Data 1). Interestingly, cells in cluster T3 are generally not infected, suggesting that these cells may be responding to the presence of E. intestinalis in the culture. In addition, clusters T7 and T9 are specific to Donors 3 and 4, which further suggests variation between donors, such as genetic variation or recent pathogen exposure. Alternatively, technical variation between experiments may account for some differences (see Methods). The human genes upregulated in clusters T7 and T9 are involved in cell cycle regulation and cell division (e.g. H2AFZ, PCLAF, MKI67, and STMN1). While clusters T7 and T9 have similar macrophage expression profiles, cells in cluster T7 are not infected by E. intestinalis, whereas cells in cluster T9 are (Fig. 2g, h). This suggests that, upon infection with E. intestinalis, cells in cluster T7 may give rise to those in cluster T9. To investigate the gene expression signatures that define each cluster, we identified differentially expressed genes in each cluster.
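The results below report that no human transcript in clusters T4, T5, or T6 exceeded a 2-fold change. As a simplified sketch of what a cluster-versus-rest fold change means, the code divides a gene's mean expression in one cluster by its mean in all other cells. It assumes a normalized, non-log expression matrix and an arbitrary pseudocount; the function names are hypothetical, and the statistical test, normalization, and thresholds actually used by the authors are described in their Methods and are not reproduced here.

```python
import numpy as np

def cluster_vs_rest_fold_change(expr, clusters, target, pseudocount=0.01):
    """Per-gene fold change of mean expression in `target` versus all other cells.

    expr: cells x genes array of normalized (non-log) expression values.
    A small pseudocount avoids division by zero for genes absent outside `target`.
    """
    expr = np.asarray(expr, dtype=float)
    in_target = np.asarray(clusters) == target
    mean_in = expr[in_target].mean(axis=0)
    mean_out = expr[~in_target].mean(axis=0)
    return (mean_in + pseudocount) / (mean_out + pseudocount)

def genes_above(fold_changes, gene_names, min_fc=2.0):
    """Genes whose fold change exceeds `min_fc` (the text uses a >2-fold cutoff)."""
    return [g for g, fc in zip(gene_names, fold_changes) if fc > min_fc]

# Toy example: two hypothetical genes, four cells in two clusters.
genes = ["parasite_gene", "host_gene"]
expr = [[4.0, 1.0], [4.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
clusters = ["T4", "T4", "T0", "T0"]
fc = cluster_vs_rest_fold_change(expr, clusters, "T4")
print(genes_above(fc, genes))  # ['parasite_gene']
```

In an infected cluster, parasite genes can dominate such a list simply because they are nearly absent elsewhere, which is consistent with the observation that parasite transcripts drive the T-cluster signatures.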
In clusters T4, T5, T6, and T9, which primarily contain infected macrophages, the most differentially expressed genes are from E. intestinalis. In clusters T4, T5, and T6 we could not detect any human transcripts with a fold change >2, and in cluster T9, differentially expressed human genes are in the minority. Thus, cell clustering appears to be driven primarily by whether a cell is infected, while less extreme transcriptional variation in the host and parasite may be obscured and not well separated. Therefore, to gain insights into the developmental program of the parasite, as well as the host response to infection, we analyzed the transcriptional patterns of the parasite and host separately.

Analysis of E. intestinalis gene expression dynamics

To study the dynamics of microsporidian gene expression during parasite development, we analyzed our scRNA-seq datasets focusing only on E. intestinalis transcripts. Whereas “single cells” in our experiment correspond to single macrophages, each infected macrophage contains one or more parasite cells, and a given macrophage may contain a mixture of parasite stages within one or more parasitophorous vacuoles^24,25. Thus, the parasite mRNA within each macrophage represents the average transcriptional profile of a somewhat asynchronous population of parasites. Despite these limitations, clustering using only parasite transcripts reveals 5 distinct clusters (Fig. 3a), potentially indicating distinct developmental stages of the E. intestinalis life cycle. We denote these clusters with a “P” prefix (P0–P4), since they are based solely on Parasite transcription. Two lines of evidence suggest that parasite development begins in cluster P0 and proceeds in numerical order to cluster P4.
First, the percentage of microsporidian reads varies between clusters, with relatively low levels in clusters P0 and P1 and relatively high levels in clusters P2, P3, and P4 (Fig. 3b). This trend suggests that parasite burden increases from P0/P1 to P2/P3/P4, and that these groups correspond to earlier and later stages of the parasite life cycle, respectively. Second, time-point analysis supports this overall trajectory. At 3 hpi, infected cells are found almost exclusively in cluster P0 (Fig. 3c), suggesting that this cluster represents the initial transcriptional state immediately after host cell entry. By 12 hpi, infected cells are evenly distributed between clusters P0, P1, and P2. As cells in cluster P2 have higher levels of microsporidian mRNA than those in cluster P1 (Fig. 3b), parasites in cluster P2 have likely progressed somewhat further through the life cycle. By 24 hpi, infected cells are distributed across all five clusters (Fig. 3c), suggesting that clusters P3 and P4 represent later stages of development. Although mature spores were not observed by fluorescence microscopy until 48 hpi, the distribution of cells across clusters P0–P4 was very similar at 24, 48, and 72 hpi, with no additional populations detected at later time points. This absence of a distinct mature-spore cluster could be due either to the resistance of hardy spores to detergent-based lysis or to the low RNA levels in dormant spores. Consequently, the latest stage we observe, cluster P4, may represent parasites that are not fully mature.

Fig. 3. Dynamics of E. intestinalis gene expression during parasite development. a UMAP plot of parasite-only transcripts from Donors 1–4. Each cluster corresponds to a stage of parasite development. b UMAP projection of the percentage of parasite reads in a cell (left), and the percentage of parasite transcripts quantified per cluster (right).
c UMAP plot separated by time point, revealing the parasite developmental trajectory. d Violin plots representing the expression of the top differentially expressed genes across clusters. e Heatmap of differentially expressed genes in each cluster over the parasite life cycle. The color indicates relative gene expression, with red indicating increased expression. The panel on the right depicts known protein products and their roles during development. The bolded protein products correspond to the genes shown in (d). Source data are provided in the Source Data file.

Differential gene expression analysis of E. intestinalis during development

Examination of each parasite cluster showed that very few genes were differentially expressed in clusters P0–P2, the earliest stages of infection that we detect (Fig. 3e; Supplementary Data 2). The genes expressed in these clusters include several molecular chaperones and housekeeping genes, such as ClpB (Eint_111310), Hsp70 (Eint_030430), and an Hsp90-like protein (Eint_021060), as well as transcription initiation factors and protein synthesis machinery (Fig. 3e; Supplementary Data 3 and 4). These housekeeping genes remain constitutively expressed throughout development, which may explain the low number of differentially expressed genes among these early developmental stages (Supplementary Data 3). In cluster P2, we observed a slight increase in the expression of the long-chain fatty acid CoA ligase, or synthetase (ACSL; Eint_100850) (Fig. 3d, e; Supplementary Data 2 and 3). ACSL plays an important role in the activation of fatty acids by catalyzing the formation of fatty acyl-CoA^26, which is required for fatty acid degradation and also serves as a precursor for phospholipid and triglyceride biosynthesis^27. Hence, ACSL may plausibly contribute to membrane biogenesis, which is important for the development of the endomembrane system and the rapid growth of E.
intestinalis. Taken together, clusters P0 and P1 may represent preparatory phases, since apart from housekeeping genes no structural proteins appear to be expressed at this time, and P2 possibly marks the beginning of the proliferative stage. Cluster P3 marks a shift away from fundamental cellular metabolism and proliferation towards more specialized functions required for spore production (Fig. 3e). Transcriptional changes reach up to 6-fold in P3, compared with the more modest changes in clusters P0, P1, and P2, which reach a maximum of 2-fold (Fig. 3e; Supplementary Data 2 and 4). The two most upregulated genes in cluster P3 are Endospore Proteins 1 and 2 (EnP1, Eint_010720; and EnP2, Eint_011200)^28, along with several other proteins thought to localize to the spore coat, such as Spore Wall Protein 1 (SWP1; Eint_101630), and proteins involved in the synthesis and remodeling of cell wall polysaccharides, such as a putative Chitooligosaccharide Deacetylase (Eint_110360) and Chitin Synthase (Eint_011330) (Fig. 3e; Supplementary Fig. 3; Supplementary Data 4). Several septins are also upregulated in cluster P3 (Eint_111850, Eint_011310, and Eint_090780). Septins often create boundaries between cellular compartments, such as between the cell body and the cilium^29 or between the parent cell and the daughter bud in Saccharomyces cerevisiae^29,30, but their functions in microsporidia remain poorly characterized. Several transporter and aquaporin genes are also upregulated (Eint_111770, Eint_081300, and Eint_070680), which may be important for siphoning key nutrients from the host and for osmoregulation during parasite maturation (Fig. 3e; Supplementary Fig. 3; Supplementary Data 2 and 4). Finally, several transcription factors are upregulated, which may be involved in the transition from P2 to P3 or from P3 to P4.
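The two lines of evidence used earlier in this section to order clusters P0–P4, the per-cluster parasite read fraction (Fig. 3b) and the cluster composition at each time point (Fig. 3c), amount to simple per-cluster tabulations. The sketch below computes both from a list of per-cell records; the records are hypothetical toy values, not the study's data, and the function names are our own.

```python
from collections import Counter, defaultdict
from statistics import median

# Each record: (parasite cluster, % of the cell's transcripts from the parasite, hpi).
# Toy values for illustration only; they are not the study's data.
cells = [
    ("P0", 3.0, 3), ("P0", 4.0, 3),
    ("P0", 3.5, 12), ("P1", 5.0, 12), ("P2", 20.0, 12),
    ("P3", 30.0, 24), ("P4", 40.0, 24),
]

def median_parasite_pct(records):
    """Median parasite-read percentage per cluster (cf. Fig. 3b)."""
    groups = defaultdict(list)
    for cluster, pct, _ in records:
        groups[cluster].append(pct)
    return {c: median(v) for c, v in sorted(groups.items())}

def composition_by_timepoint(records):
    """Fraction of infected cells falling in each cluster at each time point (cf. Fig. 3c)."""
    counts = defaultdict(Counter)
    for cluster, _, hpi in records:
        counts[hpi][cluster] += 1
    return {
        hpi: {c: n / sum(ctr.values()) for c, n in sorted(ctr.items())}
        for hpi, ctr in sorted(counts.items())
    }

print(median_parasite_pct(cells))
print(composition_by_timepoint(cells))
```

With these toy records, P0 has the lowest median parasite fraction and is the only cluster present at 3 hpi, which is the pattern the text uses to place P0 first.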
At the latest stage of development, cluster P4, we observed the largest changes in gene expression, up to 41-fold, including upregulation of many known components of the polar tube, a harpoon-like invasion apparatus assembled in maturing microsporidian spores (Fig. 3e; Supplementary Fig. 3; Supplementary Data 2). Polar Tube Protein 1 (PTP1; Eint_060150) is the most highly upregulated gene in cluster P4 (41-fold) and the most differentially expressed gene across all 5 parasite clusters (Supplementary Data 2 and 4). Also upregulated in cluster P4 are the known Encephalitozoon polar tube proteins PTP2 (Eint_060140), PTP3 (Eint_111330), and PTP4 (Eint_071050), and the proposed ortholog of N. bombycis PTP6 (Eint_081680) (Supplementary Fig. 3; Supplementary Data 4). Thus, the genes upregulated in cluster P4 include several known to encode components of the polar tube, and may also include additional genes important for polar tube assembly and spore development, which occurs during the mid-to-late sporoblast stage^25.

Clusters P3 and P4 are enriched in secreted proteins involved in spore maturation

The structures of most PTPs are poorly predicted by AlphaFold2, and the PTPs show minimal similarity to known protein domains. However, PTP4 and PTP6 both contain a single Ricin B-type lectin (RBL) domain^31,32, and other RBL proteins have also been linked to the polar tube (PT)^33. Previous work has suggested that RBL domains form a small beta-trefoil fold involved in carbohydrate binding^34, and that PTP4 in particular may play a role in attachment to host cells during invasion^31. Hypothesizing that other RBL proteins may also play a role in PT function, and would therefore be transcriptionally co-regulated, we searched the E. intestinalis genome for genes encoding RBL domains and assessed whether they were also upregulated in cluster P4. We identified a total of 14 putative RBL proteins in E.
intestinalis (see Methods), including PTP4, PTP6, and a protein that has been referred to as “PTP5” in unpublished reports and reviews^35 (Supplementary Fig. 4). Of these 14 putative RBL proteins, 12 are potential PTP4/PTP6 paralogs that appear to encode a single RBL domain with a potential N-terminal signal peptide directing them to the secretory pathway (Fig. 4a); the remaining 2 are architecturally distinct, appear to consist of an RBL domain fused to an integral membrane protein, and are annotated as putative Dolichyl-phosphate-mannose protein O-mannosyltransferases (Fig. 4a). Twelve of the 14 putative RBL genes, including 10 of the 12 PTP4/PTP6 paralogs, are upregulated in cluster P4 (Fig. 4b). The remaining 2 PTP4/PTP6 paralogs were not upregulated in any cluster and are expressed at low levels. Taken together, there is a clear, coordinated upregulation of PTP4/PTP6-like RBL proteins in cluster P4, which may indicate a more general role for this protein family in polar tube assembly and/or spore maturation. In addition to PTPs and RBL proteins, cluster P4 shares some of the signatures upregulated in cluster P3, including septins and several genes involved in the synthesis of cell surface polysaccharides.

Fig. 4. Cluster P4 is enriched in proteins containing signal peptides and Ricin-B domains. a Schematics of representative proteins containing a Ricin-B domain and an N-terminal signal peptide (top) and proteins containing a Ricin-B domain fused to an integral membrane protein (bottom). b Heatmap depicting the expression of genes in cluster P4 containing a Ricin-B domain and/or a signal peptide. c Schematic of the assay we developed to test secretion. Constructs containing the first 40 amino acids of each protein from (b) fused to the N-terminus of GFP were transfected into Expi293 cells, alongside control constructs.
Supernatants and whole-cell lysates were collected and analyzed by western blot, including lysis controls. d Western blot of supernatants (top) and whole-cell controls (bottom) from panel (c), probing GFP and GAPDH. Three biological replicates were performed. e Representative immunofluorescence microscopy images of germinated E. intestinalis spores stained for PTP2 (magenta; known polar tube protein) and Eint_070340 (green; upregulated in cluster P4 and predicted to be secreted). Colocalization analysis reveals staining of the polar tube by both PTP2 and Eint_070340; areas of colocalization are indicated in white. PTP2 and Eint_070340 have an average correlation coefficient of 0.3228. f Quantification of the percentage of polar tubes (PT) that are fully stained, partially stained, or not stained by anti-PTP2 and anti-Eint_070340. Mean ± SD are from three biological replicates; n = 100 spores per experiment. Source data are provided in the Source Data file.

Many of the known proteins upregulated in clusters P3 and P4 are targeted to the secretory pathway, including the PTPs^36 and SWP1^37. Numerous other proteins of unknown function are also upregulated in clusters P3 and P4, leading us to hypothesize that some of these might also play roles in cell wall or PT assembly or function and might therefore also be targeted to the secretory pathway. We therefore assessed whether genes upregulated in clusters P3 and P4 were enriched for secreted proteins relative to earlier developmental stages and to the E. intestinalis proteome as a whole. We used SignalP-6.0 with a relaxed cutoff to compensate for the insensitive detection of signal peptides in highly divergent microsporidian proteins (see Methods). Using this approach, we predict that ~5.6% of E. intestinalis proteins may be secreted (108 of 1934 proteins) (Supplementary Data 5; Supplementary Fig. 5). Relative to the E.
intestinalis proteome at large, clusters P0, P1, and P2 are not enriched for secreted proteins (1.7%, 3.7%, and 6.8%, respectively). The frequency of proteins with predicted signal peptides rises to 9.6% in cluster P3 and 24% in cluster P4. The genes upregulated in cluster P4 encode nearly half of all predicted secreted proteins in the E. intestinalis proteome. Of the 47 putative secreted proteins upregulated in cluster P4, 15 are PTPs or RBL proteins (Fig. 4b). To test whether these 15 proteins carry functional secretion signals, we implemented a protein secretion assay in a heterologous mammalian expression system. We designed constructs in which the N-terminal 40 residues of each protein, including the putative signal peptide, were fused to the N-terminus of GFP. We then transfected each construct into Expi293 cells and assessed secretion of the GFP fusion proteins into the culture supernatants 24 h after transfection (Fig. 4c). Western blotting against GFP revealed that the N-termini of all 15 proteins are capable of mediating GFP secretion from mammalian cells (Fig. 4d), suggesting that they likely encode bona fide signal peptides that target the native proteins to the secretory pathway in E. intestinalis. In addition to these 15 PTP and RBL proteins, 4 of the 47 proteins that are upregulated in P4 and predicted to be secreted are proteases, which may be involved in processing and maturation of secreted proteins. Most of the remaining 28 secreted proteins have structures that are poorly predicted by AlphaFold2 and lack clear similarity to proteins of known structure or function. However, the strong upregulation of many of these secreted proteins, along with known PTPs, suggests possible roles in polar tube biogenesis or spore maturation. To further explore the potential connection between the PT and genes upregulated in P4, we focused our attention on Eint_070340.
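The secretion-enrichment figures quoted in this section can be checked with simple arithmetic before turning to Eint_070340. The sketch below uses only counts and percentages reported in the text; the “enrichment” ratio (a cluster's signal-peptide frequency divided by the proteome-wide frequency) is our illustrative summary, not a statistic reported by the authors.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole

# Counts reported in the text.
proteome_total = 1934     # E. intestinalis proteins screened with SignalP-6.0
proteome_secreted = 108   # predicted to carry a signal peptide
p4_secreted = 47          # secreted proteins among genes upregulated in P4

baseline = percent(proteome_secreted, proteome_total)   # ~5.6% of the proteome
share_in_p4 = percent(p4_secreted, proteome_secreted)   # "nearly half"

# Reported frequencies (%) of predicted signal peptides among upregulated genes.
cluster_freq = {"P0": 1.7, "P1": 3.7, "P2": 6.8, "P3": 9.6, "P4": 24.0}

# Illustrative ratio of each cluster's frequency to the proteome-wide baseline.
enrichment = {c: f / baseline for c, f in cluster_freq.items()}

print(f"baseline {baseline:.1f}%, P4 share {share_in_p4:.1f}%")  # baseline 5.6%, P4 share 43.5%
for c, r in enrichment.items():
    print(f"{c}: {r:.2f}x")
```

A ratio modestly above 1 (as for P2, about 1.2) is not by itself evidence of enrichment; a formal test would also account for how many genes each cluster contributes.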
Eint_070340 is a gene of unknown function that is the third most upregulated gene in cluster P4 and is predicted to be secreted. To assess the localization of this protein in E. intestinalis, we performed immunofluorescence microscopy using an antibody raised against Eint_070340, along with an antibody against PTP2, a known polar tube protein expected to localize along the polar tube. Our data reveal that Eint_070340 localizes to the polar tube, suggesting that it may play a role in PT assembly or function (Fig. 4e). Interestingly, while PTP2 is distributed along the entire PT, Eint_070340 most often localizes to only part of the PT and is excluded from the region proximal to the spore body. Our anti-Eint_070340 antibody also stains sporoplasms released from germinated spores, as well as a small fraction of spore coats. While the association of Eint_070340 with the PT is clear, future studies will be needed to understand the role of this protein in the PT and possibly in other regions.

Analysis of host single-cell transcriptomes

To understand how macrophages respond to E. intestinalis infection, we separately analyzed only host transcripts. Cell clustering revealed 9 distinct clusters (H0–H8) (Fig. 5a; Supplementary Fig. 6a), with the “H” prefix denoting clustering based solely on the Host transcriptome. Comparison of the expression pattern of each cluster to the Human Primary Cell Atlas (HPCA)^38,39 indicates that all of the cell clusters correspond to macrophages, monocytes, or dendritic cells (Supplementary Fig. 6b). Clusters H0, H1, H2, H4, and H7 were observed in both uninfected and infected samples, while clusters H3, H5, H6, and H8 are expanded in infected samples compared to uninfected controls. Clusters H6 and H8 contain very few cells and are not considered further (Supplementary Fig. 6a).

Fig. 5. Transcriptional analysis of the host response to E. intestinalis infection. a UMAP plots of host-only transcripts projecting uninfected cells (left) and all cells (right) from Donors 1–4. b UMAP plot colored by uninfected (gray) or infected (blue) cells. A cell is scored as infected if >2% of total transcripts in the cell were derived from E. intestinalis. c Quantification of the percentage of infected cells in each cluster. d Dot plot of the top 10 genes expressed in each cluster. The x axis shows the genes and the y axis shows the clusters. The color indicates the average expression level across the cells in each cluster and the size indicates the percentage of cells in each cluster expressing that gene. e, f Violin plots showing expression levels of top genes from clusters H5 (e) and H3 (f). g, h Bar plots showing −log10(p-value) from enrichment analysis of biological pathways in clusters H5 (g) and H3 (h) using the Molecular Signatures Database (MSigDB) hallmark gene sets. P-values were calculated using the Enrichr Fisher's exact test with multiple hypothesis correction by the Benjamini–Hochberg (BH) approach. i Quantification of the percentage of parasite developmental stages found within the host-only clusters. Source data are provided in the Source Data file.

We next examined the distribution of infected cells across the macrophage clusters. The only cluster enriched for infected cells was H5, in which 83.9% of cells were infected. Surprisingly, however, cluster H5 contains only ~13% of all infected cells, while the majority of infected cells (87%) are distributed across the other eight clusters in approximately similar proportions (Fig. 5b), with infection rates ranging from 8.9% to 31.6%. Most infected cells are interspersed among uninfected cells, suggesting that in most cases, host transcription does not change substantially upon infection with E. intestinalis. Thus, most infected macrophages apparently fail to detect or respond to the invading parasites.
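The contrast drawn above between the infection rate of cluster H5 (83.9% of its cells infected) and its share of all infected cells (~13%) comes from dividing by different totals. The sketch below makes the two denominators explicit; the function name and the cluster counts are hypothetical, chosen only to reproduce the qualitative pattern, not the study's numbers.

```python
def infection_summary(counts):
    """counts maps cluster -> (n_infected, n_cells).

    Returns cluster -> (infection rate within the cluster, share of all infected
    cells), both as percentages. The two use different denominators, which is why
    a cluster can be highly infected yet hold only a small fraction of infected cells.
    """
    all_infected = sum(n_inf for n_inf, _ in counts.values())
    return {
        cluster: (100.0 * n_inf / n_cells, 100.0 * n_inf / all_infected)
        for cluster, (n_inf, n_cells) in counts.items()
    }

# Hypothetical counts chosen only to mirror the pattern described for H5.
toy = {"H5": (84, 100), "H0": (200, 1000), "H1": (300, 1500)}
for cluster, (rate, share) in infection_summary(toy).items():
    print(f"{cluster}: {rate:.1f}% infected, {share:.1f}% of infected cells")
```

Here the small, highly infected cluster holds well under a fifth of infected cells, because the larger clusters with modest infection rates contribute more infected cells in absolute terms.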
In contrast, cells in cluster H5 are transcriptionally distinct from uninfected cells and may be responding to infection. Cluster H5 appears as early as 3 hpi and persists through later timepoints. To better understand the nature of the host response in cluster H5, we identified differentially expressed genes (Fig. 5d). We observed upregulation of cell signaling genes (such as INHBA, IER3, and CIR1) and of several long non-coding RNAs, some of known function (MALAT1 and NEAT1) and some of unknown function (LINC02244 and LINC01705) (Fig. 5e). MALAT1 and NEAT1 have both been shown to promote inflammatory responses in macrophages during viral infection, and NEAT1 has been implicated in inflammasome activation leading to programmed cell death^40–42. Pathway analysis revealed that these differentially expressed genes are involved in TNF signaling via NF-κB, hypoxia, and inflammatory responses (Fig. 5g). The parasites found in cluster H5 are primarily at the latest developmental stages, while early parasite stages are poorly represented (Fig. 5i). Taken together, these characteristics suggest that macrophages in cluster H5 have sensed the infection and may be responding by triggering pyroptosis to limit parasite replication and spread. Alternatively, the parasites may be manipulating the host cell to facilitate parasite egress via cell lysis. Cluster H3 is also expanded in infected samples compared to controls. Unlike H5, however, H3 is not enriched for infected cells (infection rate ~20.1%). Comparison of the H3 transcriptional profile to the Human Primary Cell Atlas revealed a strong signature characteristic of monocyte-derived macrophages treated with interferon-alpha (IFNα) (Supplementary Fig. 6b). Pathway analysis of genes differentially expressed in cluster H3 also indicated a strong signature related to stimulation with IFNα and IFNγ (Fig. 5h).
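The pathway p-values in Fig. 5g, h come from Enrichr's Fisher's exact test with Benjamini-Hochberg (BH) correction. The sketch below is a rough illustration of what such a test computes. It evaluates a one-sided Fisher's exact test, equivalent to the hypergeometric upper tail, for the overlap between a query gene list and a gene set, and then applies BH adjustment across gene sets. This is a minimal sketch under stated assumptions, not Enrichr's implementation. The gene-universe size, gene-set names, set sizes, and overlaps are hypothetical, and Enrichr's own background gene set and details may differ.

```python
from math import comb, log10


def fisher_upper_tail(overlap, query_size, set_size, universe):
    """One-sided Fisher's exact test p-value, P(X >= overlap).

    Under the null, X (the number of query genes that fall in the gene set)
    follows a hypergeometric distribution: `query_size` genes drawn without
    replacement from `universe` genes, of which `set_size` belong to the set.
    """
    denominator = comb(universe, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, query_size - k)
        for k in range(overlap, min(query_size, set_size) + 1)
    )
    return tail / denominator  # exact integer ratio, rounded once to float


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example (not the study's gene lists): 40 upregulated genes
# tested against two toy gene sets drawn from a 20,000-gene universe.
QUERY_SIZE, UNIVERSE = 40, 20_000
toy_sets = {"SET_A": (200, 6), "SET_B": (150, 1)}  # name -> (set size, overlap)

raw = {name: fisher_upper_tail(k, QUERY_SIZE, size, UNIVERSE)
       for name, (size, k) in toy_sets.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name in sorted(raw, key=raw.get):
    print(f"{name}: p = {raw[name]:.2e}, BH-adjusted = {adjusted[name]:.2e}, "
          f"-log10(p) = {-log10(raw[name]):.1f}")
```

The printed −log10(p) value corresponds to the quantity plotted in the bar charts, and ranking terms by p-value matches the ranking described in the Methods.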
Macrophages typically produce type I IFNs, such as IFNα, in response to viral infection, and type I IFN signaling can induce apoptosis of virus-infected cells^43,44. Therefore, cells in cluster H3 may be responding to other cells in the population that have sensed E. intestinalis and are secreting IFNs. Consistent with this IFNα-responsive signature, cluster H3 shows upregulation of interferon-inducible genes such as ISG15, MX1, RSAD2, IFIT2, and IFIT3 (Fig. 5d, f). The interferon response is important for the control of microsporidia infection in animal models^45. In macrophages in vitro, however, it does not appear to be protective, because parasite development appears unaffected in cluster H3 (Fig. 5i).

Discussion

Our scRNA-seq data provide a transcriptional atlas of E. intestinalis development, enabling the identification of molecular markers for each stage of parasite development. During the early stages of E. intestinalis development (P0–P1), we observe expression of housekeeping genes as well as transcription initiation factors. This suggests that clusters P0–P1 represent parasites just prior to replication and rapid proliferation, which may correspond to the sporoplasm or meront stages (Fig. 6a). In cluster P2, we observe expression of ASCL as well as protein synthesis machinery. These genes are likely important for the rapid growth of E. intestinalis during the proliferative meront to early sporont stages. Following the proliferative stage, a chitooligosaccharide deacetylase-like protein was upregulated (Fig. 3e). This protein is involved in the deacetylation of chitin to form chitosan, an important component of the fungal cell wall^46. In E. cuniculi, chitin deacetylase has been shown to localize to the plasma membrane during the meront-to-sporont transition^47. This suggests that spore wall development begins in cluster P3.
This interpretation is further supported by the upregulation of genes involved in spore wall formation, including SWP1 and EnP1. In E. intestinalis, SWP1 is expressed during the transition from the proliferative stage to the sporont stage, when a thin layer of exospore is formed, suggesting that P3 corresponds to the sporont stage. In the final detected stage of development, the genes encoding the known PTPs are differentially expressed, along with genes encoding proteins with RBL domains and signal peptides. The upregulation of many of these putatively secreted proteins together with the known PTPs suggests that this cluster corresponds to PT biogenesis. Thus, cluster P4 may correspond to the mid-to-late sporoblast stage, when PT assembly occurs (Fig. 6a). An earlier bulk RNA-seq analysis of E. cuniculi-infected RK-13 cells also reported a general shift away from proliferative gene expression and toward spore production later in infection. That shift included modest upregulation of PTP1, PTP2, and a gene involved in spore wall formation^20.

Fig. 6. Host-parasite transcriptional atlas. a Developmental program of E. intestinalis. Molecular markers for each parasite stage are shown in red. b Schematic of the host response to E. intestinalis infection. E. intestinalis infection goes undetected in the majority of cells, and the parasites are able to complete their full life cycle (top). A subset of host cells is able to respond to infection (cluster H5), particularly when the parasites mature into late-stage spores (middle). These cells are undergoing cell death to limit infection and spread. Bystander cells are responding to interferons secreted by infected cells and establishing an antiviral cell state to try to prevent infection (bottom).

Transcriptional responses to microsporidia infection are diverse across hosts.
Previous bulk RNA-seq studies found that silkworms infected with Nosema bombycis upregulate antimicrobial peptides^48 and that Caenorhabditis elegans mounts an antiviral response when infected with Nematocida species^19,48. In mammals, cytokines are induced upon infection, and host cell energy production is broadly rewired^17,21. Here, we investigated single-cell transcriptomic changes during E. intestinalis infection in primary human macrophages from four healthy donors. Our results indicate that the majority of macrophages are unable to detect invading E. intestinalis parasites, as shown by the even distribution of infected and uninfected macrophages across clusters (Fig. 5b; Fig. 6b). We observed all parasite developmental stages in most macrophage clusters. This suggests that E. intestinalis can progress through its entire life cycle without altering the transcriptional profile of host macrophages, thereby avoiding host detection. This result agrees with previous studies in which naive human monocyte-derived macrophages were unable to reduce parasite burden^49. However, monocyte-derived macrophages pre-treated with LPS or interferon-γ were able to reduce parasite burden. This suggests that naive macrophages require external signals, often from T cells, to inhibit parasite replication. The absence of such signals may mimic the environment in immunocompromised individuals such as AIDS patients. In these patients, reduced T cell numbers provide fewer external signals to infected macrophages, which may allow E. intestinalis to harness macrophages for replication. Interestingly, Salmonella enterica has recently been shown to manipulate macrophage migration, mediating spread of the pathogen from intestinal epithelial cells into the bloodstream^50,51.
Similar hypotheses have been proposed for microsporidia^22, though definitive evidence to support or refute them remains to be gathered. While the majority of cells are unable to detect infection, one cluster, H5, is induced upon infection. Cluster H5 contains a high proportion of infected cells and is transcriptionally distinct from other clusters (Fig. 5b). This suggests that cells in this cluster detect infection and may mount an inflammatory response against the parasites. Interestingly, the parasites in this cluster primarily represent the latest stages of development (Fig. 5i). Mature parasites may be easier to detect, leading the host cell to undergo pyroptosis to limit infection (Fig. 6b). Like H5, cluster H3 is induced by E. intestinalis. However, the rate of infection in H3 is much lower than in H5. Cells in H3 may therefore be responding to interferons secreted by neighboring infected cells rather than responding directly to infection (Fig. 6b). Overall, our work begins to uncover the molecular mechanisms of parasite development that correspond to the developmental stages observed in previous EM studies^25. It also begins to reveal the heterogeneity of host cell responses and donor-to-donor variation.

Methods

Isolation of primary human monocyte-derived macrophages

LeukoPaks were obtained from the New York Blood Center from anonymous human blood donors who provided informed consent. The NYULH Institutional Review Board exempts de-identified samples from ethics approval requirements. Primary human peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll gradient separation as previously described^52. CD14+ monocytes were then isolated from the PBMC fraction using the EasySep Human CD14 Positive Selection Kit II (STEMCELL Technologies 17858) according to the manufacturer’s protocol.
Briefly, the cells were resuspended at a concentration of 1 × 10^8 cells/mL in 1X PBS (Gibco) containing 2% heat-inactivated fetal bovine serum (FBS) and 1 mM ethylenediaminetetraacetic acid (EDTA). The EasySep Human CD14 Positive Selection Cocktail II was added (100 µL/mL of sample), and the mixture was incubated for 10 min at room temperature. Next, the EasySep Dextran RapidSpheres were added (100 µL/mL of sample) and incubated for 3 min at room temperature. The mixture was topped up to the recommended volume with 1X PBS containing 2% FBS and 1 mM EDTA. The sample was placed in an EasySep magnet at room temperature for 3 min, and the supernatant was discarded. This step was repeated for a total of three washes. The isolated CD14+ cells were plated at 1 × 10^6 cells/mL in RPMI (Cytiva) supplemented with 10% FBS, 10 mM HEPES, 100 U/mL penicillin, 100 μg/mL streptomycin, and 50 ng/mL granulocyte-macrophage colony-stimulating factor (GM-CSF) (Partner Therapeutics). The cells were incubated at 37 °C with 5% CO2. The medium was replenished with GM-CSF on day 2. Cells were harvested and plated for infection on day 4. Donors 1 and 2 were isolated on two separate days, whereas Donors 3 and 4 were isolated in parallel on the same day. Consequently, Donors 3 and 4 cluster similarly in the UMAP and differ from Donors 1 and 2. This pattern may reflect biological differences between donors. It could also result from batch effects during cell isolation and culture, which may make cells processed on the same day transcriptionally more similar to each other than to cells processed on different days.

Propagation and purification of E. intestinalis

All mammalian and parasite lines used in this work were tested monthly for mycoplasma contamination (MycoStrip, InvivoGen), and all tests were negative. E. intestinalis spores (ATCC 50506) were propagated in Vero cells (ATCC CCL-81).
Vero cells were grown in 75 cm^2 tissue culture flasks in Dulbecco’s Modified Eagle Medium (DMEM) (Gibco) supplemented with 10% heat-inactivated FBS at 37 °C with 5% CO2. At 70–80% confluence, the medium was switched to DMEM supplemented with 3% FBS, and E. intestinalis spores were added. Infected cells were allowed to grow for 8–12 days, and the medium was changed every two days. To purify spores, the infected cells were detached from the flasks using a cell scraper and transferred to a 15 mL conical tube, followed by centrifugation at 1300 × g for 10 min at 25 °C. Cells were resuspended in 5 mL 1X PBS and mechanically disrupted using a 27-gauge needle. The released spores were purified using a 2-step Percoll gradient^53. Equal volumes (5 mL) of spore suspension and 100% Percoll were added to a 15 mL conical tube, vortexed, and centrifuged at 1800 × g for 30 min at 25 °C. The spore pellets were washed three times with 1X PBS. The spores were further purified by ultracentrifugation through a Percoll step gradient (25%, 50%, 75%, 100%). The spores were layered over the gradient and centrifuged at 8600 rpm for 30 min at 25 °C (TH-641 Swinging Bucket Rotor, ThermoFisher Scientific). Bands containing various impurities remained in the supernatant and gradient fractions and were removed by vacuum aspiration. The pellet at the bottom of the tube, containing mature spores, was washed with 1X PBS and ultracentrifuged again at 8600 rpm for 30 min at 25 °C. The spore pellet was resuspended in 1X PBS, transferred to a 1.5 mL microcentrifuge tube, and centrifuged at 3000 × g for 3 min at 25 °C. This wash was repeated twice. The final pellet was resuspended in 1X PBS and stored at 4 °C.

E. intestinalis infection experiments

For each infection experiment used to prepare samples for scRNA-seq, two identical sets of E. intestinalis-infected macrophage cultures were prepared.
One set of cultures was processed for scRNA-seq, and the second set was prepared for fluorescence microscopy to assess the infection level of the sample. Differentiated macrophages (2 × 10^5) were seeded in 6-well plates (CELLTREAT). Wells for scRNA-seq were left empty, and wells for microscopy contained five round glass coverslips each (Fisher Scientific). The wells were supplemented with RPMI 1640 medium, 10% FBS, and 50 ng/mL GM-CSF. The cells were allowed to rest for 24 h at 37 °C with 5% CO2. GM-CSF was added to the cells 2.5 h prior to infection with E. intestinalis. To infect the cells, purified, mature E. intestinalis spores were added to each well at a multiplicity of infection (MOI) of 20 (4 × 10^6 spores). Spores were allowed to infect the cells for 24 h. The exceptions were the 3 hpi and 12 hpi timepoints, which were incubated for 3 h and 12 h, respectively. For samples analyzed at 24, 48, and 72 hpi, spores were removed at 24 hpi by washing the wells three times with fresh medium, and infection was continued to the respective timepoints. The infection timepoints collected for each donor are listed in Fig. 2a and Supplementary Table 1. Fluorescence microscopy was used to assess microsporidian infection and replication in macrophages prior to scRNA-seq. The medium was removed from the wells, and the cells were fixed with 4% paraformaldehyde at 37 °C for 15 min and then washed three times with 1X PBS. The cells were stained with 1X phalloidin-iFluor488 (Abcam, USA) at room temperature for 20 min. After washing with 1X PBS, the cells were stained at room temperature with 5 μM DRAQ5 (Novus Biologicals) for 10 min and then with 0.5 μg/mL calcofluor white (Sigma Aldrich) for 10 min. For mounting, 50% glycerol was added to poly-lysine-coated slides (Fisher Scientific), and the coverslips were placed on the slides. The coverslips were sealed with clear nail polish to prevent evaporation.
The slides were visualized using a Nikon CSU-W1 spinning disk laser confocal microscope.

Generating single-cell suspensions and cell hashing

To prepare single-cell suspensions for scRNA-seq, infected macrophages were detached from the 6-well plate by trypsinization. Briefly, the medium was removed from each well, and the cells were incubated with 0.25% trypsin (Corning) at 37 °C with 5% CO2 for 15 min. RPMI 1640 medium supplemented with 10% FBS was added to stop the trypsinization reaction. Some macrophages still adhered to the well at this step, so the medium was pipetted against the bottom of the well to further dissociate the cells. The cells were collected by centrifugation at 1200 × g for 5 min at 4 °C. To combine cells from several infection timepoints in one scRNA-seq library, the cells from each infection timepoint were labeled with a different hashing antibody^54. Macrophages were resuspended in 100 μL of staining buffer (2% BSA and 0.01% Tween 20 in 1X PBS) supplemented with 10 μL of Fc blocking reagent (BioLegend, USA) and incubated for 10 min at 4 °C. Then 0.5 µg of one of the TotalSeq-B anti-human hashtag antibodies 5–10 (BioLegend, USA) was added to the cell suspension from each infection timepoint and incubated for 20 min at 4 °C. The cells were washed three times with 1 mL of staining buffer. Finally, the cell pellets were resuspended in 150 μL of staining buffer. Cell density and viability were determined using a hemocytometer after mixing the cells with trypan blue dye (Invitrogen, USA). Cell viability at each infection timepoint ranged from 91% to 100%.

Library preparation and 10X scRNA-seq

After cell hashing, cells from different infection timepoints were pooled and adjusted to ~500 cells/μL prior to scRNA-seq library preparation. Library preparation was carried out using Chromium Single Cell 3′ Reagent Kits (v2 chemistry; 10x Genomics, USA) according to the manufacturer’s protocol.
We recovered a total of 10,100 cells from Donor 1, 12,279 cells from Donor 2, and 30,355 cells combined from Donors 3 and 4 for scRNA-seq library preparation. After library completion, paired-end sequencing was performed on an Illumina NovaSeq 6000 system (Illumina, USA). Both library preparation and sequencing were carried out at the Genome Technology Center, NYU Langone Health.

Processing of the raw sequencing reads

A combined reference genome was generated using the mkref function of Cell Ranger version 5.0.1 (10X Genomics, USA). It contained the human genome (GRCh38; GCF_000001405.39; nuclear and mitochondrial genomes)^55 and the E. intestinalis genome (ATCC 50506; GCF_000146465.1)^56,57. No quality control was performed prior to running Cell Ranger. Raw sequencing reads were mapped to the combined reference genome, and gene expression matrices were generated using the Cell Ranger count function with default parameters^58. These matrices provide, for each cell, the number of genes detected (nFeature) and the total number of RNA molecules (nCount), counted as unique molecular identifiers (UMIs).

scRNA-seq data processing

Initial data processing

The gene expression matrices from Cell Ranger count were imported into and processed with Seurat version 4.0^59. The number of cells obtained from each donor prior to filtering is shown in Supplementary Table 3. Cells from different infection timepoints were demultiplexed based on their hashing antibodies using the MULTIseqDemux function. Doublets (cells containing more than one type of hashing antibody) and negative cells (cells lacking any hashing antibody) were removed from all datasets. Low-quality cells were further filtered based on nFeature, nCount, and the percentage of mitochondrial transcripts (percent.mt) (Supplementary Fig. 1). The filtering criteria differed between donors.
For Donor 1, the criteria were nFeature: 500–6000 genes, nCount: 2000–40,000 UMIs, and percent.mt <20. For Donor 2, the criteria were nFeature: 500–6000 genes, nCount: 2200–35,000 UMIs, and percent.mt <20. For Donor 3, the criteria were nFeature: 400–4000 genes, nCount: 1000–20,000 UMIs, and percent.mt <20. For Donor 4, the criteria were nFeature: 400–3800 genes, nCount: 1000–15,000 UMIs, and percent.mt <20. The number of cells that passed quality control is shown in Supplementary Table 1. The datasets for Donor 1, Donor 2, and Donors 3 + 4 combined were each normalized using log normalization (NormalizeData, normalization.method = “LogNormalize”, scale.factor = 10,000). Variable features were then identified (FindVariableFeatures, selection.method = “vst”, nfeatures = 3000). Integration anchors were identified prior to combining the data from different donors (FindIntegrationAnchors). These anchors were used to integrate the datasets from the 4 donors (IntegrateData), and the integrated data were scaled (ScaleData). Principal component analysis was then performed (RunPCA). PC1 to PC41 were used for dimensionality reduction by Uniform Manifold Approximation and Projection (UMAP) (RunUMAP, dims = 1:41). For cell clustering, FindNeighbors (dims = 1:41) and FindClusters (resolution = 0.4) were run. The DimPlot function was used to generate the UMAP plots in Fig. 2b–f, and ggplot was used to make Fig. 2g. To identify which macrophages were infected with E. intestinalis, the percentage of E. intestinalis transcripts detected in each cell was calculated (PercentageFeatureSet, pattern = “^Eint-”). Macrophages with >2% microsporidian transcripts were classified as “infected”. Varying this cutoff between 0.25% and 5% microsporidian transcripts did not drastically change the percentage of cells scored as infected in either control or infected samples (Supplementary Fig. 7).
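The per-cell filtering, normalization, and infection call described above can be sketched in plain Python for a single cell, with counts held in a dictionary. This is not the Seurat implementation. The function names, the dictionary layout, and the treatment of the stated ranges as inclusive bounds are assumptions. The mitochondrial prefix `MT-` is a hypothetical default, because the actual feature names in the combined reference are not given. The log-normalization step follows Seurat's documented LogNormalize definition: each count is divided by the cell's total, multiplied by the scale factor (10,000), and transformed with the natural log of (1 + x). The `Eint-` prefix mirrors the PercentageFeatureSet pattern given in the text.

```python
import math

# Per-donor thresholds from the Methods: (min, max) for nFeature and nCount and
# an upper bound for percent.mt. Treating the ranges as inclusive is an assumption.
QC_THRESHOLDS = {
    "donor1": {"nFeature": (500, 6000), "nCount": (2000, 40000), "max_percent_mt": 20},
    "donor2": {"nFeature": (500, 6000), "nCount": (2200, 35000), "max_percent_mt": 20},
    "donor3": {"nFeature": (400, 4000), "nCount": (1000, 20000), "max_percent_mt": 20},
    "donor4": {"nFeature": (400, 3800), "nCount": (1000, 15000), "max_percent_mt": 20},
}


def percent_with_prefix(counts, prefix):
    """Percentage of a cell's UMIs coming from features whose names start with prefix."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    matched = sum(n for feature, n in counts.items() if feature.startswith(prefix))
    return 100.0 * matched / total


def passes_qc(counts, donor, mt_prefix="MT-"):
    """Apply the donor-specific nFeature, nCount and percent.mt filters to one cell.

    mt_prefix is a hypothetical default; actual feature names in a combined
    human + E. intestinalis reference may carry a different prefix.
    """
    t = QC_THRESHOLDS[donor]
    n_feature = sum(1 for n in counts.values() if n > 0)  # genes detected
    n_count = sum(counts.values())                        # total UMIs
    lo_f, hi_f = t["nFeature"]
    lo_c, hi_c = t["nCount"]
    return (lo_f <= n_feature <= hi_f
            and lo_c <= n_count <= hi_c
            and percent_with_prefix(counts, mt_prefix) < t["max_percent_mt"])


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize: natural log of (1 + count / total * scale_factor); assumes total > 0."""
    total = sum(counts.values())
    return {feature: math.log1p(n / total * scale_factor) for feature, n in counts.items()}


def is_infected(counts, cutoff_percent=2.0):
    """Score a cell as infected if more than cutoff_percent of its UMIs map to Eint- features."""
    return percent_with_prefix(counts, "Eint-") > cutoff_percent


# Hypothetical toy cell (not real data): 600 host genes with 4 UMIs each,
# one mitochondrial gene, and one parasite gene.
toy_cell = {f"HOSTGENE{i}": 4 for i in range(600)}
toy_cell["MT-CO1"] = 200
toy_cell["Eint-GENE1"] = 100   # 100 / 2,700 UMIs, about 3.7%

print(passes_qc(toy_cell, "donor1"), is_infected(toy_cell))
```

The toy cell passes the Donor 1 filters (602 genes, 2,700 UMIs, about 7.4% mitochondrial) and is scored as infected at the 2% cutoff. It would not be scored as infected at a 5% cutoff. This shows how the choice of threshold interacts with a cell's parasite transcript fraction.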
The number of infected cells found in each donor is shown in Supplementary Table 2. A small fraction of the control cells were classified as infected. We hypothesize that these are doublets that our pipeline cannot detect. Our pipeline readily excludes doublets in which a droplet contains two hashing oligos (~11%), as well as cells lacking a hashing oligo (~10%). Other doublets occur with moderate frequency and are difficult to detect: those involving two cells with the same oligo, or one labeled and one unlabeled cell. Uninfected control cells were pooled with infected cells from the same donor after cell hashing. We therefore suspect that rare doublets formed in which a control cell carrying a hashing oligo and an infected cell lacking a hashing oligo were captured in the same droplet. Based on its hashtag, such a droplet would appear to be a control cell, but it would contain mRNA from both host cells as well as from the parasite.

scRNA-seq data processing of the E. intestinalis transcripts

To investigate E. intestinalis transcriptional profiles at different developmental stages, E. intestinalis transcripts were separated from the human transcripts using the subset function prior to normalization. Normalization used the same parameters as in the initial scRNA-seq data processing. Only infected cells identified in the initial data processing were used in this analysis. Integration anchors were identified (FindIntegrationAnchors), and the data from the 4 donors were integrated (IntegrateData) and scaled (ScaleData). RunPCA was performed, followed by RunUMAP using PC1 to PC44 (RunUMAP, dims = 1:44). To cluster the infected cells, FindNeighbors (dims = 1:44) and FindClusters (resolution = 0.35) were run. The FeaturePlot function was used to generate Fig. 3b, e.
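The undetectable-doublet scenario proposed above for the apparently infected control cells can be illustrated with a toy droplet classifier. This sketch is not MULTIseqDemux, which thresholds hashtag counts statistically. It simply assumes that each cell carries either exactly one known hashtag or none. It then calls a droplet Negative, Singlet, or Doublet from the set of distinct hashtags present. The `Cell` class, the hashtag labels (`HTO5`, `HTO6`), and the hashtag-to-sample mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    hashtag: Optional[str]  # hashtag antibody on this cell, or None if unlabeled
    infected: bool          # True if the cell carries E. intestinalis transcripts


def call_droplet(cells):
    """Call a droplet from the distinct hashtags it contains.

    Returns ("Negative", None), ("Doublet", None) or ("Singlet", hashtag).
    """
    tags = {c.hashtag for c in cells if c.hashtag is not None}
    if not tags:
        return "Negative", None
    if len(tags) > 1:
        return "Doublet", None
    return "Singlet", next(iter(tags))


def has_parasite_reads(cells):
    return any(c.infected for c in cells)


# Hypothetical hashtag-to-sample mapping.
SAMPLE_OF = {"HTO5": "uninfected control", "HTO6": "3 hpi"}

droplets = {
    "two different hashtags": [Cell("HTO5", False), Cell("HTO6", True)],
    "control + unlabeled infected cell": [Cell("HTO5", False), Cell(None, True)],
    "two cells with the same hashtag": [Cell("HTO6", True), Cell("HTO6", True)],
}
for name, cells in droplets.items():
    call, tag = call_droplet(cells)
    print(f"{name}: {call} ({SAMPLE_OF.get(tag, 'n/a')}), "
          f"parasite reads: {has_parasite_reads(cells)}")
```

Only the first droplet is flagged as a doublet. The second droplet is called an uninfected-control singlet even though it carries parasite reads, which matches the proposed origin of the apparently infected control cells. The third droplet is a same-hashtag doublet that hashtag information alone cannot reveal.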
scRNA-seq data processing of the human transcripts

To process the human transcripts separately, low-quality cells were filtered using the same parameters as described in the initial data processing section. The data from the 4 donors were then integrated and scaled. Downstream analyses followed the same pipeline used for the E. intestinalis transcripts, with two exceptions. RunUMAP used PC1 to PC43 (RunUMAP, dims = 1:43), and the clustering resolution was 0.20 (FindClusters, resolution = 0.20).

Marker gene detection and differential gene expression analyses

To identify signature genes for each cell cluster, the FindAllMarkers function was run with default parameters and a min.pct of 0.25 (FindAllMarkers, min.pct = 0.25). The Wilcoxon rank-sum test was used to identify differentially expressed genes between clusters. The average log2 fold change (avg_log2FC) was also used to rank highly expressed genes in each cluster. DotPlot was used to generate Fig. 2h and Fig. 5d, and DoHeatmap was used for Fig. 3e. VlnPlot was used to visualize gene expression levels in Figs. 3d and 5e, f.

Gene ontology enrichment analysis

To identify the pathways in which genes highly expressed in clusters H5 and H3 are involved, pathway enrichment analysis was performed using Enrichr (https://maayanlab.cloud/Enrichr/)^60. Briefly, upregulated genes with a fold change >1.5 were selected and submitted to Enrichr. Fig. 5g, h were obtained using the ‘Molecular Signatures Hallmark 2020’ library. The terms were ranked according to their p-values.

Automatic cell type annotation

To perform automatic cell type recognition from the scRNA-seq data, the SingleR package was used^39.
SingleR compares the gene expression pattern of each cell cluster to normalized expression values from the Human Primary Cell Atlas (HPCA)^38. The ‘label.fine’ annotations were used to obtain more detailed categories of immune cell types (SingleR, ref = Human.primary.cells, assay.type.test = 1, labels = Human.primary.cells$label.fine). Each cell in the datasets was assigned to one of the cell types. The percentage of cells of each type in each cluster was then visualized using ggplot in R.

Prediction of Ricin B domain-containing proteins

To identify proteins with ricin B domains in E. intestinalis, we used FoldSeek^61 to search the AlphaFold Database^62 of predicted protein structures on March 30, 2023. We initiated the search using the predicted structure of Eint_081460 as a probe and restricted the search to the suborder Apansporoblastina, which includes E. intestinalis. This search yielded 196 hits from a number of different species. Each of these sequences was used to initiate a BLASTp search with default parameters, restricted to E. intestinalis ATCC 50506. The proteins identified from all of these searches were combined into a single list, and duplicates were removed, resulting in 14 unique E. intestinalis proteins. We re-predicted the structures of these 14 proteins with ColabFold^63, using the AlphaFold2_advanced notebook with default parameters. These predictions strongly suggested the presence of ricin B domains in 12 of the 14 proteins, based on manual examination of the predicted folds and on the consistency of the structures across the top 5 ranked predictions. For the 13th protein (Eint_070940), the predictions were more variable, with only one of the top five predictions resembling a ricin B domain. This initial prediction included the N-terminal signal peptide and a predicted disordered region at the C-terminus, which can sometimes interfere with AlphaFold2 predictions of neighboring regions.
After these regions were deleted, the remaining sequence (residues 16–156) was predicted to adopt a ricin B domain fold, with good agreement across the top 5 predictions. For the 14th and final protein (Eint_071030), the predictions had lower pLDDT scores and were more variable, suggesting a less reliable prediction. However, the best-scoring predictions resembled an RBL fold, with similar topology and connectivity, and could be superposed on other RBL domains. These predictions deviated primarily in the prediction of the first and final β-strands. Similar results were obtained with the Eint_071030 ortholog from E. cuniculi (ECU07_1070), which shares ~70% sequence identity. Thus, E. intestinalis likely encodes 14 ricin B domain-containing proteins in its genome (Supplementary Fig. 4).

Prediction of secreted proteins

The sequences of all annotated proteins from the E. intestinalis ATCC 50506 genome (1,934 sequences in total) were downloaded from UniProt on April 6, 2023. Signal peptides were predicted using SignalP-6.0^64 with default parameters. Only a relatively small fraction of the E. intestinalis proteome was predicted to be secreted: 3.4% of the proteome (66 proteins) was predicted to have a signal peptide with a probability of 0.5 or higher. In contrast, a similar analysis of the Plasmodium falciparum proteome predicted that 8.8% of proteins had a signal peptide with a probability of 0.5 or higher. Several proteins expected to be secreted from E. intestinalis had much lower signal peptide probabilities. These included PTP3 (probability = 0.34) and several ricin B domain-containing proteins (e.g., Eint_081460, probability = 0.16). Microsporidia are evolutionarily divergent, and protein secretion has primarily been studied in model organisms. We therefore hypothesized that prediction tools may still detect many signal peptides in E. intestinalis proteins. E. intestinalis signal peptides may, however, receive lower scores because of lineage-specific differences in signal peptide length, composition, cleavage site preferences, etc. Indeed,