Abstract ObjectivesThe upper respiratory tract flora may influence host immunity and modulate susceptibility to viral respiratory infections. This study aimed to investigate the associations between upper respiratory tract flora and immune cells in severe ILI, identify specific microbial taxa and immune response pathways contributing to disease severity, and elucidate how flora influences ILI progression by modulating immune cell functions.MethodsHeritability of GWAS summary data was estimated using LDSC (v1.0.1). Gene-level genetic associations were analyzed with MAGMA. scRNA-seq data were integrated with genetic association data using scDRS. FUSION was used to construct cell type-specific expression quantitative trait locus models based on genotypes and scRNA-seq data from the onek1k project, which were combined with flora abundance-related GWAS data for a transcriptome-wide association study.ResultsFrom the LDSC analysis, data from 1195 severe ILI-associated GWASs with upper respiratory flora(h2 > 0.1) were included in subsequent analysis. TWAS identified 19 significant association pairs (P[adj ]< 0.05), and 1226 differentially expressed genes between mild and severe ILI patients (P[adj] < 0.05 and | log[2]FC|>0.25). Functional enrichment analyses using GO, KEGG, and Reactome databases revealed that immune cells,such as CD4 + T effector memory cells, cDCs, NK cells, were enriched in multiple biological processes or pathways.ConclusionsThis study identified associations between severe ILI-related upper respiratory tract flora and cell type-specific gene expression, potentially explaining how differential flora influences ILI progression. CD16 + monocytes exhibited the most differentially expressed genes, followed by proliferating cells and cDCs, highlighting the significant role of immune cell-enriched pathways in ILI progression. 1. Introduction Influenza-like illness (ILI) is a global public health problem that kills hundreds of thousands of people each year [[38]1–[39]3]. Acute pulmonary inflammation and multiple organ failure associated with severe ILI are the leading causes of death after ILI infection [[40]4–[41]6]. Vaccination is the most effective way to prevent influenza. However, vaccine effectiveness has been low to moderate in recent years [[42]7,[43]8], and vaccine coverage remains low, particularly in low- and middle-income countries where pathogens are susceptible to mutation and protection is suboptimal [[44]9]. Consequently, ILI infection remains a significant public health problem with enormous economic and social burdens globally. The study of ILI-organism interactions and mechanisms is important for infection control. As more research supports the critical role of the flora in regulating host immunity [[45]10–[46]12], exploring whether these effects extend to influenza risk may provide new perspectives and approaches for refining prevention strategies. Animal models and human studies have shown that nasopharyngeal respiratory surfaces are colonized by large commensal flora [[47]11–[48]13]. They can alter susceptibility to viral respiratory infections, including influenza, by modulating host immunity. For example, flora can potentially inhibit the overgrowth of pathogens and prevent their transmission to the lungs, thereby serving as a “protective barrier” in maintaining respiratory health.[[49]14] In recent years, it has been suggested that there is a relationship between the diversity of the upper respiratory tract flora and respiratory infections [[50]15]. For example, one study reported that host susceptibility to ILI was closely related to the abundance of Streptococcus and Bacillus salivarius [[51]16]. The abundances of Haemophilus, Prevotella, Clostridium and Neisseria in the oral cavity and nasopharynx were associated with the length of hospital stay in infants hospitalized with respiratory syncytial virus infection in one study [[52]17]. A recent study of patients with novel coronavirus pneumonia (COVID-19) revealed significant differences in the diversity of the upper respiratory tract flora between COVID-19 patients and controls, with Corynebacterium, Lactobacillus, and other related flora consistently and significantly lower in COVID-19 patient samples, whereas Neisseria was consistently and significantly greater [[53]18]. The epithelial cells of the upper and lower respiratory tracts are the primary target cells for influenza virus infection and replication. These epithelial cells are surrounded by a complex flora that may directly or indirectly interact with influenza viruses to mediate infection risk. Symbiotic strains can prevent infection by modulating the host’s innate and adaptive immune responses [[54]10,[55]11]. However, relatively few studies have been conducted on the relationship between the upper respiratory flora and the progression of ILI, and its mechanism in the progression of ILI has not been clarified. In-depth investigation and elucidation of the interactions and regulatory mechanisms between ILI and the host upper respiratory flora, lung injury, and systemic immune responses may provide new intervention strategies for ILI and help prevent and ameliorate infection-induced lung injury. The antigen-specific immune response is an important means for the body to recognize and kill pathogens and has a significant impact on infection control and clinical resolution. Excessive immune killing can cause severe histopathological damage, leading to worsening of symptoms and even death [[56]19–[57]21]. In the intervening years, with the development of biotechnology, the understanding of immune cell function has deepened, and the role of different immune subtypes in the process of pathogen infection has been gradually recognized. For example, CD8 + effector memory T cells generated after atypical pneumonia can provide durable protection against coronaviruses. In addition, during respiratory syncytial virus infection, this subtype of T cells is associated with the development of pathological immune damage [[58]22,[59]23]. In COVID-19 patients, tissue-resident Th17 cells coordinate the killing effects of lung macrophages and cytotoxic CD8 + T cells and may be associated with a protective role against lung injury [[60]24]. On the basis of the above findings, this study hypothesized that the upper respiratory tract flora may influence the course of ILI infection by modulating the recognition and killing effects of immune cells on pathogens after an organism is infected with ILI. To this end, We employed the method of Genome-Wide Association Study (GWAS), a well-established and efficient approach widely used to identify genetic loci associated with common diseases or traits [[61]25]. The method aims to explore the relationship between specific traits or diseases and single nucleotide polymorphisms (SNPs), which represent variations in a single nucleotide within the genome. Typically, GWAS analyses rely on high-throughput genotyping technologies, such as SNP arrays, and incorporate statistical methods (e.g., logistic regression, mixed-effects models) to assess the association between each SNP and the trait of interest. By comparing genotype differences between individuals, GWAS can uncover genetic variations related to specific traits or disease risks. In recent years, the scale of GWAS has expanded to include over 10 million SNPs [[62]26]. Since these SNPs are typically inherited in linkage disequilibrium blocks, certain representative or tag SNPs can capture the majority of variations within each block. Furthermore, with the increasing demand for explaining the genetic basis of more diseases or traits, as well as the substantial increase in sample sizes [[63]27], the statistical power and reliability of the results have been significantly enhanced [[64]28].Through Transcriptome-Wide Association Studies (TWAS), we integrated GWAS data with expression quantitative trait loci (eQTL) data [[65]29]. TWAS not only focuses on the direct association between genotypes (such as SNPs) and traits, but also considers how gene expression levels in specific tissues or cell types influence the occurrence of traits or diseases. Although thousands of genomic loci related to complex traits have been identified through GWAS, most of the trait-associated variants are located in non-coding regions, the functionality of which remains difficult to interpret [[66]30]. Moreover, identifying variants with small or moderate effects typically requires large sample sizes to achieve adequate statistical power. By aggregating multiple eQTLs to capture gene regulatory effects and directly measuring these effects at the transcriptome level, TWAS effectively overcomes these challenges, enabling the establishment of more biologically interpretable associations between genes and complex traits or diseases [[67]31]. We first calculated the heritability of the upper respiratory tract flora data obtained. Second, PBMC samples from patients with severe influenza were categorized into two groups, ILI mild and ILI severe, on the basis of symptoms, to identify the types of upper respiratory tract flora associated with these conditions. Third, after performing quality control on the data from these two groups, we conducted horizontal genetic association analysis via MAGMA software to integrate the flora SNP signals at the gene level. The scDRS (single-cell disease relevance score) was subsequently used to calculate genetic signals associated with the flora to identify the relevant cell types. Fourth, cell type-specific eQTL models were constructed for each identified microbiome-associated cell type to identify specific genes. Finally, pathway enrichment analysis was conducted on the identified gene sets via R software. 2. Materials and methods 2.1 Collection of flora data All adult Chinese individuals in this cohort were recruited for the multi-omics study, This cohort consists of 2,984 healthy individuals from the Chinese population. From 2017 to 2018, approximately 3,932 oral samples were newly collected from this cohort for whole metagenome sequencing, including 2,017 tongue dorsum samples and 1,915 saliva samples [[68]32]. For the saliva sample, a double concentration of stabilizing reagent kit was used and 2ml saliva was collected. The samples from Yunnan Province were self-collected using a commercial kit (Catalog No. 401103, Zeesan, Xiamen, China) [[69]33]. 2.2 Data acquisition and processing for genome-wide association studies * (1) Data Acquisition for Genome-Wide Association Study of Upper Respiratory Tract Flora: GWAS data of upper respiratory tract flora were downloaded from the China National Gene Bank database (CNGBdb, project data No. CNP0001664) [[70]32]. A total of 1,566 dorsal tongue flora and 1,547 salivary flora GWAS summary datasets were included in this project (genome version GRCh38). The full GWAS summary data were downloaded for this study. Notably, owing to some of the differences in the definitions of some of the floras between this study and the project, this study referred to the LPSN database ([71]https://lpsn.dsmz.de/) for the GWAS summary data that had the same or most similar definitions to each of the respiratory tract flora [[72]34]. * (2) Data quality control: The following QC processes were also performed on the downloaded GWAS summary data related to upper respiratory tract flora: i) SNPs located on autosomes were retained; ii) SNPs detected in less than 70% of the samples were excluded; iii) SNPs with alleles other than “A”, “C”, “G”, or “T” were excluded; iv) SNPs with anomalies in the variance of the summary statistic (≤ 0); v) SNPs with abnormal standard error (SE ≤ 0) were excluded; and vi) SNPs with allele frequency (MAF) < 0.001. * (3) Heritability calculation: The heritability (h2) of the included GWAS summary data was estimated via LDSC software (v.1.0.1), with 504 East Asian (EAS) populations from the 1000 Genomes Project Phase III as the reference panel [[73]35,[74]36]. For GWAS data related to the upper respiratory tract flora, only those with heritability estimates (h2) greater than 0.1 were included in subsequent analyses. 2.3 Data acquisition and processing for single-cell sequencing * (1) Data acquisition for single-cell sequencing: In this study, two scRNA-seq datasets containing peripheral blood mononuclear cell (PBMC) samples from patients with severe ILI were acquired to identify cell types associated with the upper respiratory tract flora. The original study of the first scRNA-seq dataset ([75]GSE149689) included a total of 20 PBMC samples from asymptomatic, mildly symptomatic, and severely symptomatic COVID-19 patients; patients with severe influenza; and healthy controls [[76]37]. We selected 16 samples totaling 53,387 cells from patients hospitalized with COVID-19 and severe influenza for subsequent analysis. A second source study of scRNA-seq data ([77]GSE164948) included 20 PBMC samples from patients hospitalized with COVID-19, influenza, and CAP infections with other pathogens, as well as healthy controls [[78]38]. Similarly, 16 samples from hospitalized patients with a total of 13,999 cells were selected for subsequent analysis. Five samples from asymptomatic and mildly symptomatic COVID-19 patients were defined as having mild ILI, and the remaining 27 samples from patients with severe COVID-19, critically hospitalized patients and hospitalized patients with influenza or CAP were defined as having severe ILI. * (2)Data quality control:First, i) scDblFinder (v.1.9.4) was used to identify and remove potential doubles [[79]39]; ii) cells expressing fewer than 200 genes, with a total UMI count ≥20,000, or expressing a percentage of mitochondrial genes ≥15% of the total UMI count were excluded; and iii) genes expressed in fewer than 3 cells were excluded. Finally, 51,243 cells and 23,442 genes were retained for subsequent analyses. * (3) Sample merging, dimensionality reduction and clustering: i) Seurat (v.4.4.0) was first employed to perform SCTransform normalization on each sample, in which a regression was performed on the percentage of mitochondrial gene expression in each cell [[80]40,[81]41]; ii) all samples were merged, 3,000 highly variable genes were selected, and linear dimensionality reduction was performed via principal component analysis; iii) batch correction of PCs was conducted on a per-sample basis via Harmony (v.1.2.0) [[82]42]; iv) the top 28 Harmony-corrected PCs that cumulatively explained up to 90% of the variance were selected to construct a K-nearest neighbor graph structure and were analysed by clustering via Louvain’s algorithm; and v) uniform manifold approximation and projection (UMAP) with the same number of PCs for visualization. * (4) Cell type annotation: To ensure the accuracy of cell type annotation, we performed cell type annotation on the subpopulations obtained by clustering on the basis of widely used marker genes as well as those used in the two original studies of the data ([83]Table 1). A total of 51,243 cells were finally divided into 15 subtypes. Of these, two subtypes, platelets (PLTs, n = 3,636) and red blood cells (RBCs, n = 829), were not the cell types. Table 1. Cell types defined marker gene. Cell Subtype Cell Population Marker Gene T Cell 15715 CD2、CD3E CD4 + naive T cell 5821 CD3E、CD4、CCR7、SELL CD4 + Effector T Cell 360 CD4、GNLY、NKG7 CD4 + Effector Memory T Cell 716 CD4、CTLA4、GPR183 CD8 + Naive T Cell 760 CD3E、CD8A、CD8B、CCR7、SELL CD8 + Effector T Cell 5022 CD8A、CD8B、GNLY、NKG7 CD8 + Effector Memory T Cell 3036 CD8A、CD8B、GZMK、GPR183 NKT 1382 CD3G、KLRF1、KLRD1 NK 6378 KLRF1、KLRD1 B Cell 6231 CD79A、MS4A1 Monocyte 16029 CD14、CD163、CD68 CD14 Mono 14377 CD14、S100A12 CD16 Mono 1652 CD14、FCGR3A cDC 272 FCER1A、CD1C Proliferating Cell 771 MKI67、TYMS PLT 3636 PF4、PPBP RBC 829 HBD、HBM [84]Open in a new tab * (5) Differential expression analysis: The differentially expressed genes (DEGs) between the severe and mild ILI groups in each cell type were analysed via the Wilcoxon rank sum test. Among them, genes whose |log[2](fold change)|(|log[2]FC|) > 0.25 and BH-adjusted P < 0.05 were defined as differentially expressed genes. 2.4 Analysis of flora-associated cell types The enrichment of flora-related genetic signals in 29,149 cells from the severe ILI samples was calculated via the single-cell disease relevance score (scDRS, v.1.0.3) to identify each flora-associated cell type [[85]43]. In brief, given the processed scRNA-seq data and gene‒disease association results output from MAGMA, scDRS first constructs a putative disease gene set G with each g in G weighted by its GWAS MAGMA z score ( [MATH: ωg :MATH] ) and then quantifies the aggregate expression of the putative disease gene set in each cell to generate cell‒specific raw disease scores S[c]: [MATH: Sc=gG< mrow>ωgσ< mrow>tech, g1 XcggGωgσ tech, g1 :MATH] where [MATH: σtech, g :MATH] indicates the estimated gene-specific technical noise of gene g and where [MATH: Xcg :MATH] is the normalized expression of cell c on gene g. scDRS also uses Monte Carlo (MC) to sample matched control gene sets and generate a set of raw control scores. scDRS normalizes these raw scores and computes cell-level P values on the basis of the empirical distribution of the pooled normalized control scores for each cell c: [MATH: Pc= 1+C=1 ncellΣb=1BΠ(ScScb< /mi>ctrl)1+ncellB :MATH] where B is the number of MC samples. In this study, we draw the top 1,000 genes from MAGMA ranked by the z score to generate the disease gene set and set the MC sampling times to 1000. 2.5 Identification of flora-associated cell type-specific genes For the identified cell types associated with each flora, the cell type-specific genes associated with each flora were identified by constructing a cell type-specific gene expression quantitative trait locus (eQTL) model and applying the analytical framework of FUSION software to perform TWAS analysis [[86]29]. Only cis-eQTL relationships (cis-eQTLs) were considered in this study. 2.5.1 Cell type-specific gene expression quantitative trait genome models. * (1) Genotyping Data Processing: The converted genotyping data were subjected to a series of QCs with reference to the original study [[87]44], and genotypes were populated via minimac4 (v.4.1.6) software. After excluding SNPs with a fill score < 0.7 or an MAF < 0.01, a final set of 5,807,466 SNPs from 1034 individuals was obtained [[88]35]. * (2) Single-cell sequencing data processing: The QC steps applied to the scRNA-seq data included the following: i) exclusion of genes whose expression was detected in fewer than 3 cells; ii) exclusion of 18,959 cells from 15 subjects that were not included in the genotype resampling; and iii) considering the heterogeneity of populations, experimental conditions, and cell type definitions, a total of 883,855 cells from 19 subtypes were reclassified into 14 subtypes on the basis of the similarity of gene expression in the flora-associated cell type analysis section. ([89]Table 2). Table 2. onek1k Summary of single-cell data quality control. onek1k Cell Subtype Cell Count Target Cell Subtype Target Cell Count Before Filtering After Filtering CD4 Naive 259012 255130 CD4 + Naive T Cell 255130 CD4 CTL 17993 17849 CD4 + Effector T Cell 17849 CD4 TEM 31261 30879 CD4 + Effector Memory T Cell 30879 CD8 Naive 52538 51487 CD8 + Naive T Cell 51487 CD8 TCM 16409 16175 CD8 + Effector T Cell 16175 CD8 TEM 161051 159634 CD8 + Effector Memory T Cell 159634 NK 163202 160465 NKT 160465 NK_CD56bright 7006 6922 NK 6922 B naive 65702 64321 B memory 123261 B intermediate 29889 29466 B memory B memory 30234 29474 B memory CD14 Mono 36130 35560 CD14 + Monocyte 35560 CD16 Mono 15743 15502 CD16 + Monocyte 15502 cDC1 114 114 cDC 4515 cDC2 4456 4401 cDC CD4 Proliferating 773 762 Proliferating Cell 2770 CD8 Proliferating 305 304 Proliferating Cell NK Proliferating 1731 1704 Proliferating Cell [90]Open in a new tab * (3) Model construction: Construct the cis-eQTL model via FUSION.compute_weights pipeline from FUSION [[91]29]. The cis-eQTL model construction process consists of i) extracting SNPs located in the genetic cis-regulatory regions of the target genes (cis-SNPs) from the genotypic data of the study subjects via PLINK (v.1.9.0) software [[92]45]; ii) a kinship matrix was constructed on the basis of the cis-SNPs of the target gene, and its heritability was calculated via GCTA (v.1.93.2 beta) in combination with the individual expression levels of the gene (cis-h2) [[93]46]; iii) for genes with significant heritability, each cis-eQTL method was fitted in the form of cross-validation to estimate the effect of fitting the model via each method; and iv) each cis-eQTL method was fitted via all samples to obtain the final result of each cis-SNP gene expression weight. 2.5.2 Identification of flora-associated cell type-specific genes. On the basis of the constructed cell type-specific cis-eQTL model, genomic data from the EAS population at 1000 Genomes Project were used as a reference panel, and TWAS analysis was performed on each flora-associated GWAS summary dataset via the FUSION framework to identify the cell type-specific expressed genes associated with each flora. Considering that the eQTL model was constructed from the European(EUR)population and that the flora-associated GWAS summary data were from the EAS population, this part of the study focused only on nominally significant genes, i.e., those with uncorrected P < 0.05. 2.6 Functional enrichment analysis For a given set of genes, pathway enrichment analysis was performed via the clusterProfiler (v.4.2.2) and ReactomePA (v.1.38.0) software packages on the basis of the Gene Ontology (GO), KEGG, and Reactome pathway databases [[94]47,[95]48]. 3. Results 3.1 Analysis of GWAS data related to the flora of the upper respiratory tract and identification of associated cell types A total of 1,566 GWAS summary statistics of the dorsal tongue flora and 1,547 GWAS summary statistics of the salivary flora were included in this part of the study. After QC, 5,345,095 SNPs were obtained. On the basis of the results of LDSC, 1195 GWASs of upper respiratory tract flora with h2 > 0.1 were included in the subsequent analyses, of which 489 were salivary flora-associated GWASs and 706 were dorsal tongue flora-associated GWASs. On the basis of MAGMA genetic association analysis, we identified 27 genes that were significantly associated with at least one flora. The top five genes, ranked in ascending order of P value, are SLC2A9 (P = 1.37E-09), SLC2A9 (P = 3.82E-08), MDGA2 (P = 4.59E-08), NUDT8 (P = 1.01E-07), and BBOX1 (P = 1.40E-07) ([96]Table 3 and An additional file shows this in more detail [see S1 Fig]). Notably, the SLC2A9 gene was significantly associated with two flora in the GWAS, both of which belong to the genus Oribacterium within the phylum Firmicutes in the tongue dorsum microbiome. Table 3. Genetic association analysis based on MAGMA gene level. Gene Chromosome N[SNPs] Ns[ample] P Sample Bacterial community SLC2A9 4 808 1996 1.37E-09 Tongue k__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Oribacterium; s__unclassified_mgs_1215 SLC2A9 4 808 1991 3.82E-08 Tongue k__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Oribacterium; s__unclassified_mgs_489 MDGA2 14 1636 2000 4.59E-08 Tongue k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Aerococcaceae; g__Granulicatella; s__unclassified_mgs_2134 NUDT8 11 64 1898 1.01E-07 Saliva k__Bacteria; p__Firmicutes; c__Clostridia; o__TANB77; f__CAG-508; g__CAG-793; s__unclassified_mgs_3234 BBOX1 11 205 1985 1.40E-07 Tongue k__Bacteria; p__Actinobacteriota; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Pauljensenia; s__unclassified_mgs_3571 NDUFV1 11 50 1898 1.57E-07 Saliva k__Bacteria; p__Firmicutes c__Clostridia; o__TANB77; f__CAG-508; g__CAG-79; | s__unclassified_mgs_3234 GJB4 1 78 1998 3.08E-07 Tongue k__Bacteria; p__Spirochaetota; c__Spirochaetia; o__Treponematales; f__Treponemataceae; g__Treponema_D; s__unclassified_mgs_1124 SPDYE18 7 838 1900 3.20E-07 Saliva k__Bacteria; p__Firmicutes; c__Negativicutes; o__Veillonellales; f__Veillonellaceae; g__F0422; s__unclassified_mgs_1482 KANSL3 2 44 1866 3.35E-07 Saliva k__Bacteria; p__Patescibacteria; c__Saccharimonadia; o__Saccharimonadales; f__Saccharimonadaceae; g__unclassified_mgs_2626 EDEM1 3 86 1987 3.45E-07 Tongue k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Aerococcaceae; g__Granulicatella; s__unclassified_mgs_3116 SPRTN 1 86 1996 3.53E-07 Tongue k__Bacteria; p__Fusobacteriota; c__Fusobacteriia; o__Fusobacteriales; f__Leptotrichiaceae; g__unclassified_mgs_2348 TBX10 11 80 1898 3.65E-07 Saliva k__Bacteria; p__Firmicutes; c__Clostridia; o__TANB77; f__CAG-508; g__CAG-793; s__unclassified_mgs_3234 EGLN1 1 225 1996 5.04E-07 Tongue k__Bacteria; p__Fusobacteriota; c__Fusobacteriia; o__Fusobacteriales; f__Leptotrichiaceae; g__unclassified_mgs_2348 ATP8A1 4 554 1904 5.13E-07 Saliva k__Bacteria; p__Firmicutes; c__Negativicutes; o__Selenomonadales; f__Selenomonadaceae; g__Selenomonas; s__unclassified_mgs_2794 ZNF277 7 198 1905 5.27E-07 Saliva k__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Prevotella; s__Prevotella_oulorum_mgs_3240 GJB5 1 74 1998 5.42E-07 Tongue k__Bacteria; p__Spirochaetota; c__Spirochaetia; o__Treponematales; f__Treponemataceae; g__Treponema_D; s__unclassified_mgs_1124 MCHR2 6 96 1983 7.05E-07 Tongue k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__unclassified_mgs_449 LRIF1 1 49 1894 7.97E-07 Saliva k__Bacteria; p__Fusobacteriota; c__Fusobacteriia; o__Fusobacteriales; f__Fusobacteriaceae; g__Fusobacterium; s__unclassified_mgs_2180 EDEM1 3 86 1976 8.33E-07 Tongue k__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Porphyromonadaceae; g__Porphyromonas; s__unclassified_mgs_2117 ANG 14 70 1902 9.27E-07 Saliva k__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Oribacterium; s__unclassified_mgs_388 IFNE 9 52 1894 1.08E-06 Saliva k__Bacteria; p__Actinobacteriota; c__Actinobacteria; o__Actinomycetales; f__Actinomycetaceae; g__Pauljensenia; s__unclassified_mgs_3546 RNASE4 14 81 1902 1.08E-06 Saliva k__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Oribacterium; s__unclassified_mgs_388 EXOC8 1 27 1996 1.15E-06 Tongue k__Bacteria; p__Fusobacteriota; c__Fusobacteriia; o__Fusobacteriales; f__Leptotrichiaceae; g__unclassified_mgs_2348 FZD3 8 248 1905 1.31E-06 Saliva k__Bacteria; p__Campylobacterota; c__Campylobacteria; o__Campylobacterales; f__Campylobacteraceae; g__Campylobacter_A; s__Campylobacter_A_showae_A_mgs_1682 RAI14 5 540 1995 1.44E-06 Tongue k__Bacteria; p__Firmicutes; c__Bacilli; o__Erysipelotrichales; f__Erysipelotrichaceae; g__Solobacterium; s__unclassified_mgs_3577 KCNK9 8 237 1991 1.45E-06 Tongue k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__unclassified_mgs_3318 MUC7 4 212 1899 2.02E-06 Saliva k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus; s__unclassified_mgs_1910 [97]Open in a new tab For the included ILI-associated single-cell data, a total of 51,243 PBMCs were obtained after QC for subsequent analysis. Downscaling, fractionation and cell type annotation yielded a total of 15 immune cell subtypes. ([98]Fig 1, An additional file shows this in more detail [see S2-S3 Figs]). A total of 4,465 cells from two subtypes, PLTs (n = 3,636) and RBCs (n = 829), were further excluded. A total of 29,149 cells from 13 cell types from severe ILI samples were ultimately included in the flora-associated cell type analysis. Fig 1. UMAP plot of downscaled scRNA-seq data from severe ILI colored by cell type. [99]Fig 1 [100]Open in a new tab Description of data: NK: natural killer cells; cDC: conventional dendritic cells; Prolif: proliferation cells; PLT: platelets; RBC: red blood cells. According to the results of the scDRS analysis, a total of nine taxa associated with ILI at the “phylum” level were identified. The phylum Firmicutes was the most abundant phylum with 6,305 instances, representing 40.59% of the total, followed by Bacteroidota with 1,846 instances (11.88%), Actinobacteriota with 1,820 instances (11.72%), Proteobacteria with 1,287 instances (8.28%), and Fusobacteriota with 975 instances (6.26%). In total, 205 species were found to be statistically significant, with an adjusted p value (P[adj]) < 0.05. ([101]Table 4).Thirteen immune cell types associated with severe ILI were associated with at least one flora, whereas 205 flora were associated with at least one immune cell type. (An additional file shows this in more detail [see [102]S4 Fig]). On the basis of the proportion of significant flora nominally significantly associated cells, CD16 + monocytes were associated with metagenomics-defined Campylobacter_A_concisus_F_mgs_1384 in saliva (29.49%, P[adj] = 0.013), unclassified_mgs_803 in the genus Prevotella salivarius (28.9%, P[adj] = 0.013), unclassified_mgs_1198 in the genus Haemophilus salivarius (23.43%, P[adj] = 0.006), and Lachnoanaerobaculum_sp000296385_mgs_3102 in the genus Lachnoanaerobaculum (27.74%, P[adj] = 0.006). The abundance of these flora was significantly correlated with that of other cell types. In this study of the upper respiratory tract flora associated with ILI, Firmicutes was the taxon with the highest proportion at the “phylum” level of categorization, which is consistent with other similar studies [[103]49,[104]50]. Thus, in the study of Firmicutes, 13 immune cell types were associated with at least one flora, and 91 flora were associated with at least one immune cell type.([105]Fig 2) In addition to the significant correlation between Lachnoanaerobaculum_sp000296385_mgs_3102 and the abundance of CD16 + monocytes in the genus Lachnoanaerobaculum described above, this strain was also significantly correlated with the abundance of cDCs (24.48%, P[adj] = 0.006). CD4 + T effector cells (27.75%, P[adj] = 0.004), CD8 + T effector cells (21.54%, P[adj] = 0.004), and NK cells (23.43%, P[adj] = 0.004) were significantly associated with the abundance of unclassified_mgs_1428 in the genus Catonella from the tongue dorsum. Table 4. Based on the “phylum” classification level, the statistical analysis of severe influenza-like illness (ILI) upper respiratory flora. phylum Total (%) Saliva (%) Tongue (%) P_sig (%) Firmicutes 6305(40.59) 2223(35.27) 4082(64.73) 91(1.44) Bacteroidota 1846(11.88) 975(52.82) 871(47.18) 13(0.70) Actinobacteriota 1820(11.72) 858(47.14) 962(52.86) 19(1.04) Proteobacteria 1287(8.28) 507(39.40) 780(60.60) 16(1.24) Fusobacteriota 975(6.26) 416(42.67) 559(57.33) 10(1.03) Others 3289(21.17) 1378(41.90) 1911(58.10) 56(1.00) [106]Open in a new tab Fig 2. Heatmap of cell type associations with the Firmicutes “phylum” based on scDRS analysis. [107]Fig 2 [108]Open in a new tab Description of data: Colors represent the proportion of cells significantly associated with the flora in each cell type, with asterisks indicating significance. *: BH-corrected P < 0.05. 3.2 Identification of cell type-specific genes associated with the upper respiratory tract flora For the 205 flora identified by scDRS with P[adj] < 0.05 and their associated 13 cell types, cell-specific eQTL models were further constructed and analysed by TWAS to identify severe ILI-associated cell type-specific genes associated with the flora. A total of 19 significantly associated pairs (P[adj] < 0.05) that were closely associated with severe ILI and involved 9 types of immune cells were identified via TWAS analysis. In particular, CD14 + monocyte cells, CD16 + monocyte cells, and CD4 + T effector memory cells had greater numbers of significant associations. Among the 8947 nominally significant association pairs (uncorrected P < 0.05), 3854 salivary flora association pairs and 5093 dorsal tongue flora association pairs were identified ([109]Fig 3 and An additional file shows this in more detail [see S1 File]). To further explore their potential roles in ILI, we conducted differential expression analysis on cell type-specific genes associated with particular flora and reported that 1226 genes presented significantly different expression between patients with mild and severe ILI (P[adj] < 0.05 and | log[2]FC|>0.25) (An additional file shows this in more detail [see S2 File]). The differential expression analysis results for the 9 cell types significantly correlated with the ILI-associated pairs revealed that TMEM41B, ERAP2, and CCR1 in CD14 + monocyte cells were associated with the metagenomic definition unclassified_mgs_2621 within the genus Granulicatella salivarius (P[adj max] < 0.001, log[2]FC[max] = -0.321). Compared with other immune cells, CD16 + monocyte cells presented the greatest number of DEGs and were significantly associated with 26 types of flora, followed by proliferating cells and cDCs. CD16 + monocyte cells were significantly associated with genes such as L1TD1, MTHFR, and ARL6IP5 related to unclassified_mgs_1057 in the genus Streptococcus from the tongue dorsum (P[adj] [max] = 0.021, log[2]FC[max] = -0.257). In CD4 + T effector memory cells, the expression levels of genes such as CCT3, SEPT2, and EFTUD2, which are associated with Aggregatibacter actinomycetemcomitans_mgs_301 within the genus Aggregatibacter (P[adj max] = 0.003, log[2]FCmax = -0.362), were significantly lower in the severe ILI group. Conversely, the expression levels of the gene SKAP2 associated with Neisseria_sicca_A_mgs_986 in the genus Neisseria from saliva (P[adj max] < 0.001, log[2]FC[max] = 0.473) in CD14 + monocyte cells and the expression levels of genes such as CFD and PSME2 (P[adj max] < 0.001, log[2]FC[max] = 27.591) associated with unclassified_mgs_1057 in the genus Streptococcus from the tongue dorsum and the expression levels of genes such as TMEM70 and XBP1 (P[adj max] < 0.001, log[2]FC[max] = 3.703) associated with unclassified_mgs_1956 in the genus Prevotella from the tongue dorsum in CD16 + monocyte cells were significantly greater in the severe ILI group. Fig 3. Manhattan plot of cell-specific gene expression and flora associations based on TWAS analysis. [110]Fig 3 [111]Open in a new tab Description of data: Red dashed line represents nominal significance level (P = 0.05). 3.3 Functional enrichment analysis of cell type-specific DEGs associated with the flora of the upper respiratory tract Functional enrichment analysis based on the GO, KEGG, and Reactome pathway databases was conducted for the identified flora-associated, cell type-specific genes that were significantly differentially expressed between patients with mild and severe ILI. The enrichment results indicated that in CD14 + monocyte cells from ILI patients, differential genes associated with the metagenomic definition unclassified_mgs_2621 (saliva_pheno.2184) in the genus Granulicatella were significantly enriched in processes such as protein carrier chaperone activity, aminopeptidase activity, and intramembrane lipid transporter activity (Enrichment ratio[min] = 0.167, FDR[max] = 0.011), whereas differential genes associated with Neisseria_sicca_A_mgs_986 (saliva_pheno.307) in the genus Neisseria were significantly enriched in biological processes such as the regulation of blood vessel remodelling, negative regulation of extracellular matrix organization, and leucine zipper domain binding (Enrichment ratio[min] = 0.2, FDR[max] = 0.024). The DEGs associated with unclassified_mgs_1939 (tongue_pheno.2645) of the genus Campylobacter_A from the tongue dorsum in CD4 + T effector memory cells were significantly enriched in the processes of the cytoplasmic vesicle lumen, vesicle lumen, and secretory granule lumen (Enrichment ratio[min] = 0.667, FDR[max] < 0.001). Additionally, DEGs associated with Aggregatibacter_actinomycetemcomitans_mgs_301 (tongue_pheno.485) of the genus Aggregatibacter from the tongue dorsum were significantly enriched in the biological processes of the regulation of establishment of protein localization to telomeres, the regulation of protein localization to the Cajal body, and the positive regulation of protein localization to the Cajal body (Enrichment ratio[min] = 0.250, FDR[max] < 0.001). Differential genes in cDC associated with unclassified_mgs_1417 (saliva_pheno.1286) from the genus saliva CAG-793 were enriched in processes such as mitochondrial protein-containing complexes, oxidoreductase complexes, and the mitochondrial matrix (Enrichment ratio[min] = 0.375, FDR[max] < 0.001). Additionally, differential genes associated with Eubacterium_B_infirmum_mgs_800 (tongue_pheno.71) from the dorsal tongue Eubacterium were significantly enriched in biological processes related to hydrolase activity, hydrolysis of O-glycosyl compounds, and other pathways involving the degradation of glycosyl bonds and sugars. In contrast, NK cells were significantly enriched for biological processes associated with differential gene enrichment, with both saliva Prevotella_loescheii_mgs_2547 (saliva_pheno.1016) and tongue dorsum Catonella (tongue_pheno.29), and Prevotella_loescheii_mgs_2547 mgs_2547 were enriched for biological processes such as regulation of leukocyte-mediated cytotoxicity and regulation of cell killing (Enrichment ratio[min] = 0.500, FDR[max] = 0.007). The genus Catonella was enriched for tubulin binding, Polycomb repressive complex and other processes (Enrichment ratio[min] = 0.400, FDR[max] = 0.015). The unclassified_mgs_1397 (saliva_pheno.2783)-related DEGs in the genus Saliva Campylobacter_A were enriched mainly in biological processes such as glutamate receptor binding, spindle pole and dopaminergic synapse; pathways related to neurodegeneration; multiple diseases; and other biological pathways (Enrichment ratio[min] = 0.333, FDR[max] = 0.041). Differential genes associated with saliva Eikenella 2545 unclassified_mgs_2545 (saliva_pheno.2390) were enriched in proliferating cells for motor proteins, microtubule motor activity, vasopressin-regulated water reabsorption and other biological processes (Enrichment ratio[min] = 0.500, FDR[max] = 0.005). Differential genes associated with saliva unclassified_mgs_2225 (saliva_pheno.2774) and tongue dorsal Campylobacter_A_showae_A_mgs_3545 (tongue_pheno.760) in the genus Campylobacter_A were associated with CD16 + monocyte cells and CD8 + T effector cells enriched in the following biological processes: ficolin-1-rich granules, ficolin-1-rich granule lumen (Enrichment ratio[min] = 0. 214, FDR[max] = 0.002), the response to starvation, the cellular response to nutrient levels, and the cellular response to starvation (Enrichment ratio[min] = 0.600, FDR[max] < 0.001).([112]Figs 4 and [113]5, [114]Table 5, and An additional file shows this in more detail [see S3 File]). Fig 4. Bubble plot of functional enrichment analysis for salivary flora-associated cell type-specific differentially expressed genes. [115]Fig 4 [116]Open in a new tab Description of data: Results of the top 5, arranged in ascending order by q-value; saliva_pheno.1016: Prevotella_loescheii_mgs_2547 within the genus Prevotella in the salivary flora; saliva_pheno.1286: unclassified_mgs_1417 within the genus CAG-793 in the salivary flora; saliva_pheno.2390: unclassified_mgs_2545 within the genus Eikenella in the salivary flora; saliva_pheno.2774: unclassified_mgs_2225 within the genus Campylobacter_A in the salivary flora; saliva_pheno.2783: unclassified_mgs_1397 within the genus Campylobacter_A in the salivary flora. Fig 5. Bubble plot of functional enrichment analysis for dorsal tongue flora-associated cell type-specific differentially expressed genes. [117]Fig 5 [118]Open in a new tab Description of data: Results of the top 5, arranged in ascending order by q-value; tongue_pheno.2645: unclassified_mgs_1939 within the genus Campylobacter_A in the dorsal tongue flora; tongue_pheno.29: the genus Catonella in the dorsal tongue flora; tongue_pheno.485: Aggregatibacter_actinomycetemcomitans_mgs_301 within the genus Aggregatibacter in the dorsal tongue flora; tongue_pheno.71: Eubacterium_B_infirmum_mgs_800 within the genus Eubacterium_B in the dorsal tongue flora; tongue_pheno.760: Campylobacter_A_showae_A_mgs_3545 within the genus Campylobacter_A in the dorsal tongue flora. Table 5. The functional enrichment analysis of cell type-specific expression genes related to flora community. Bacterial community Cell Type Data Base Pathway/Biological Process Enrichment Ratio FDR aminopeptidase activity 0.167 0.010 intramembrane lipid transporter activity 0.167 0.010 calcium-dependent phospholipid binding 0.167 0.011 chemokine binding 0.167 0.008 saliva:k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Burkholderiales; f__Neisseriaceae; g__Neisseria; s__Neisseria_sicca_A_mgs_986 CD14 + Monocyte GO regulation of blood vessel remodeling 0.2 0.023 negative regulation of extracellular matrix organization 0.2 0.023 negative regulation of collagen metabolic process 0.2 0.023 regulation of extracellular matrix disassembly 0.2 0.023 cotranslational protein targeting to membrane 0.2 0.024 glycoprotein catabolic process 0.2 0.024 leucine zipper domain binding 0.2 0.011 LRR domain binding 0.2 0.011 Bacterial community Cell Type Data Base Pathway/Biological Process Enrichment Ratio FDR tongue:k__Bacteria; p__Campylobacterota; c__Campylobacteria; o__Campylobacterales; f__Campylobacteraceae; g__Campylobacter_A; s__unclassified_mgs_1939 CD4 + T effector memory GO secretory granule lumen 0.7 5.95E-06 cytoplasmic vesicle lumen 0.7 5.95E-06 vesicle lumen 0.7 5.95E-06 KEGG Prostate cancer 1 0.002 Fluid shear stress and atherosclerosis 1 0.002 saliva:k__Bacteria;p__Campylobacterota; c__Campylobacteria; o__Campylobacterales; f__Campylobacteraceae; g__Campylobacter_A; s__unclassified_mgs_1397 B GO glutamate receptor binding 0.375 2.17E-05 spindle pole 0.375 0.001 KEGG Pathways of neurodegeneration - multiple diseases 0.5 0.041 Dopaminergic synapse 0.333 0.041 Oocyte meiosis 0.333 0.041 Fluid shear stress and atherosclerosis 0.333 0.041 Adrenergic signaling in cardiomyocytes 0.333 0.041 Cellular senescence 0.333 0.041 Autophagy - animal 0.333 0.041 Bacterial community Cell Type Data Base Pathway/Biological Process Enrichment Ratio FDR saliva:k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Burkholderiales; f__Neisseriaceae; g__Eikenella; s__unclassified_mgs_2545 Proliferating cell GO plus-end-directed microtubule motor activity 0.5 7.29E-05 microtubule motor activity 0.5 < 0.001 KEGG Motor proteins 1 0.001 Vasopressin-regulated water reabsorption 0.5 0.001 saliva:k__Bacteria; p__Firmicutes; c__Clostridia; o__TANB77; f__CAG-508; g__CAG-793; s__unclassified_mgs_1417 cDC GO mitochondrial protein-containing complex 0.5 9.60E-05 oxidoreductase complex 0.375 < 0.001 mitochondrial matrix 0.5 < 0.001 mitochondrial inner membrane 0.5 < 0.001 inner mitochondrial membrane protein complex 0.375 < 0.001 Bacterial community Cell Type Data Base Pathway/Biological Process Enrichment Ratio FDR tongue:k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacterales; f__Pasteurellaceae; g__Aggregatibacter; s__Aggregatibacter_actinomycetemcomitans_mgs_301 CD4 + T effector memory GO positive regulation of establishment of protein localization to telomere 0.25 < 0.001 regulation of establishment of protein localization to telomere 0.25 < 0.001 regulation of protein localization to Cajal body 0.25 < 0.001 positive regulation of protein localization to Cajal body 0.25 < 0.001 regulation of establishment of protein localization to chromosome 0.25 < 0.001 protein localization to nuclear body 0.25 < 0.001 positive regulation of protein localization to chromosome, telomeric region 0.25 < 0.001 protein localization to Cajal body 0.25 < 0.001 saliva:k__Bacteria;p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Prevotella; s__Prevotella_loescheii_mgs_2547 NK GO regulation of leukocyte mediated cytotoxicity 0.5 < 0.001 regulation of cell killing 0.5 < 0.001 KEGG Natural killer cell mediated cytotoxicity 0.6 0.001 Antigen processing and presentation 0.4 0.007 Bacterial community Cell Type Data Base Pathway/Biological Process Enrichment Ratio FDR tongue:k__Bacteria; p__Campylobacterota; c__Campylobacteria; o__Campylobacterales; f__Campylobacteraceae; g__Campylobacter_A; s__Campylobacter_A_showae_A_mgs_3545 CD8 + T effector GO cellular response to starvation 0.6 < 0.001 response to starvation 0.6 < 0.001 cellular response to nutrient levels 0.6 < 0.001 saliva:k__Bacteria; p__Campylobacterota; c__Campylobacteria; o__Campylobacterales; f__Campylobacteraceae; g__Campylobacter_A; s__unclassified_mgs_2225 CD16 + Monocyte GO ficolin-1-rich granule 0.288 < 0.001 ficolin-1-rich granule lumen 0.214 0.002 tongue:k__Bacteria;; p__Firmicutes; c__Clostridia; o__Peptostreptococcales; f__Anaerovoracaceae; g__Eubacterium_B; s__Eubacterium_B_infirmum_mgs_800 cDC GO hydrolase activity, hydrolyzing O-glycosyl compounds 0.273 < 0.001 hydrolase activity, acting on glycosyl bonds 0.273 0.001 KEGG Other glycan degradation 0.222 0.003 Bacterial community Cell Type Data Base Pathway/Biological Process Enrichment Ratio FDR tongue:k__Bacteria; p__Firmicutes; c__Clostridia; o__Lachnospirales; f__Lachnospiraceae; g__Catonella NK GO tubulin binding 0.6 < 0.001 protease binding 0.4 0.001 KEGG Polycomb repressive complex 0.5 0.015 Platelet activation 0.5 0.015 [119]Open in a new tab saliva_pheno.2184: The stran identified as unclassified_mgs_2621 within the genus Granulicatella, isolated from the saliva, was defined through metagenomic analysis; saliva_pheno.307: The stran identified as unclassified_mgs_986 within the genus Neisseria, isolated from the saliva, was defined through metagenomic analysis; tongue_pheno.2645:The stran identified as unclassified_mgs_1939 within the genus Campylobacter_A, isolated from the dorsal surface of the tongue, was defined through metagenomic analysis; saliva_pheno.2783:The stran identified as unclassified_mgs_1397 within the genus Campylobacter_A, isolated from the saliva, was defined through metagenomic analysis; saliva_pheno.2390:The stran identified as unclassified_mgs_2545 within the genus Eikenella, isolated from the saliva, was defined through metagenomic analysis; saliva_pheno.1286:The strain unclassified_mgs_1417, defined by 16S rRNA gene sequencing, within the genus associated with saliva CAG-793; tongue_pheno.485:Within the genus Aggregatibacter, Aggregatibacter_actinomycetemcomitans_mgs_301 is a rod-shaped stran isolated from the dorsal surface of the tongue; saliva_pheno.1016: Within the genus Prevotella, Prevotella_loescheii_mgs_2547 is a strain isolated from saliva; tongue_pheno.760:Within the genus Campylobacter_A, Campylobacter_A_showae_A_mgs_3545 is a strain isolated from the dorsal surface of the tongue;saliva_pheno.2774:The stran identified as unclassified_mgs_2225 within the genus Campylobacter_A, isolated from the saliva, was defined through metagenomic analysis; tongue_pheno.71:Within the genus Eubacterium_B, Eubacterium_B_infirmum_mgs_800 is a strain isolated from the dorsal surface of the tongue; tongue_pheno.29: Within the genus Catonella, isolated from the dorsal surface of the tongue. 4. Discussion This study integrated upper respiratory tract flora-related GWAS data and severe ILI-related transcriptomic data to explore the potential associations between upper respiratory tract flora and immune cells in severe ILI and to investigate the possible mechanisms influencing severe ILI from the perspectives of metabolism and the immune response. The study employed a stratified design to integrate multi-source heterogeneous data: the upper respiratory tract flora GWAS data were derived from a homogeneous cohort of healthy Han Chinese individuals, including 2017 tongue dorsum samples and 1915 saliva samples, while transcriptomic data related to severe ILI were integrated from independent cohorts of Southeast Asian and European populations. Previous studies have shown that host genetic factors exhibit long-term stability in the relative abundance of heritable flora taxa, with minimal disruption by environmental factors [[120]51]. Therefore, this study effectively reduced the interference of host heterogeneity and environmental confounding factors in the flora-immune association analysis by including a cohort with consistent genetic background, strict health status screening, and standardized sampling sites. Further research has shown that the diversity of human immune genes is driven by human contact with pathogens [[121]52,[122]53], and there is a significant correlation between genetic variability and global pathogen diversity [[123]54]. Some of these correlations may be confounded by variables such as climate, diet, or lifestyle. Thus, incorporating samples with diverse genetic backgrounds can reveal the heterogeneity of immune responses across different populations. The primary objective of this study is to investigate the common immunological mechanisms underlying influenza-like illness (ILI) and their associations with the upper respiratory flora and host immune response, rather than to elucidate pathogen-specific mechanisms. Moreover, the early symptoms of COVID-19 substantially overlap with those of ILI, and in severe cases, both conditions manifest with upper respiratory inflammation and immune dysregulation. Therefore, we regard COVID-19, in essence, as a form of ILI, and it is classified as part of ILI in public health surveillance. To encompass a broader range of ILI and a wider spectrum of ILI samples, we included patients with ILI caused by various pathogens. This design not only reveals the common features of upper respiratory immune responses and alterations in the flora but also represents the broad clinical phenotype of ILI. Through flora-associated cell type analysis, we identified a total of 205 flora associated with 13 immune cell subtypes, with each immune cell subtype identifying 1–4 flora. Notably, statistically significant associations were identified between 205 species of flora and at least one immune cell types, which may reveal the potential regulatory roles of flora in modulating immune responses.In the gastrointestinal tract, the characteristic feature of acute mucosal infections is dysbiosis, accompanied by significant changes in flora and bacterial dominance. This dysbiotic state may enhance pathogen invasiveness and local inflammatory responses, thereby exacerbating inflammation and tissue damage [[124]55].Previous studies have shown that interactions within the flora, mediated by metabolites that modulate innate immune responses, regulate synergistic mechanisms that drive the stability of flora. Therefore, the correlation between multiple flora and at least one immune cell type may help explain why the upper respiratory tract flora does not rely on a single or few “key flora,” but rather exerts its effects through a complex regulatory network formed by the overall composition of the community, thereby reflecting the collective function of the flora [[125]56].The surface components of the Granulicatella flora can activate CD14 + monocyte cells via the Toll-like receptor, which recognizes pathogens through pathogen-associated molecular patterns. ligation of Toll-like receptor - pathogen-associated molecular patterns initiates Toll-like receptor signalling, which is critical for the innate immune cascade that induces cytokine or chemokine secretion and cell recruitment for pathogen clearance [[126]57]. In addition, these CD14 + monocytes eliminate bacteria via phagocytosis and initiate further immune responses by differentiating into macrophages or dendritic cells [[127]57]. This flora influences the phenotype of ILI by modulating the response of the body’s immune cells. Another study revealed that NK cells play a critical role in antiviral and antitumour immunity. The function of NK cells is regulated by their surface CD16 receptors, a process that can be influenced by different bacterial populations. In HIV-1-infected individuals, the number of CD16 + cells increases [[128]58], and the interaction between Campylobacterota flora and NK cells may influence the host immune response by modulating NK cell activity and quantity. This study revealed that these associations may be partially related to their roles in phagocytosis and antiviral activities in CD14 + monocyte cells, NK cells, and CD16 + monocyte cells, as well as in regulating NK cell activity. Campylobacter_A_showae_A_mgs_3545 from the genus Campylobacter_A on the dorsal tongue and unclassified_mgs_2621 from the genus Granulicatella in saliva Infection with the two flora can result in local and systemic immune responses, including the activation and differentiation of CD4 + naive T cells, which, upon encountering Campylobacter_A_showae antigens displayed by antigen-presenting cells, further differentiate into helper T cells (Th1, Th2, Th17) and regulatory T cells that regulate the immune response. As previous studies have shown, the genus Streptococcus is a common flora in the oral cavity and stimulates naive CD8 + T cells through antigen-presenting cells, leading to their activation and differentiation into cytotoxic T lymphocytes, which, in the local environment of the oral cavity, can directly kill infected epithelial cells and control the spread of bacterial infection and coinfection with viruses and can directly kill infected epithelial cells and control the spread of bacterial infection and coinfection with viruses, thereby reducing the bacterial load, which peaks after 7 days of ILI infection [[129]59,[130]60]. In this setting, other associated immune cells show reduced phagocytosis and increased levels of oxidative free radicals, associated with the suppression of proinflammatory cytokine secretion. In the present study, the Streptococcus genus flora was hypothesized to effectively activate CD8 + naive T cells via an antigen presentation mechanism, prompting them to differentiate into highly efficient CD8 + T cells, which actively participate in the viral clearance process. The extent of ILI infection was significantly controlled, effectively preventing its progression to more severe stages. Furthermore, these CD8 + T cells further differentiate into CD8 + T effector memory cells, which may provide long-lasting protection for future immune defense, enhancing the body’s ability to fight against similar pathogens. Studies have shown that the Firmicutes phylum not only plays a critical role in maintaining intestinal homeostasis but also affects CD16 + monocyte function through its metabolites. These metabolites can enhance the anti-inflammatory and immunomodulatory functions of monocytes by regulating their phenotype, thereby providing protection against infections and chronic inflammatory diseases [[131]61]. Among all significantly identified flora in this study, seven phyla were found to be correlated with NK cells, CD16 + monocytes, CD14 + monocyte cells, and the metabolites of the flora may affect the regulation of the phenotype of mononuclear immune cells and their anti-inflammatory and immunomodulatory functions. In patients with severe ILI, changes in the abundance of the Firmicutes phylum in the blood may affect the expression of contact-dependent genes in CD16 + monocyte cells by regulating the expression of inflammation-related proteins such as CX3CR1-CX3CL1 and IL-1β, thereby influencing the body’s immune recovery capacity [[132]62]. The abundance of the Firmicutes and Campylobacterota phyla was particularly notable In addition, certain strains of the Prevotella genus have complex relationships with human health, with some strains being associated with inflammation, metabolic diseases, and immune responses [[133]63]. Prevotella flora stimulate epithelial cells to produce IL-8, IL-6, and CCL20, thereby promoting mucosal Th17 immune responses and neutrophil recruitment. Prevotella genus-mediated mucosal inflammation leads to the systemic dissemination of inflammatory mediators, bacteria, and bacterial products, which in turn may influence systemic disease outcomes [[134]64]. The associations found in the present study between these strains and various responses and inflammatory factors may be related to their effects on the activity of specific B cells, CD4 + T effector cells, and NK cells, as well as the expression of specific genes or proteins. Further research is needed to confirm these findings. In the present study, certain genes associated with the genera Streptococcus CD14 + monocyte cells, NK cells and CD16 + monocyte cells also exhibited significant differential expression between the severe and mild ILI groups. Functional enrichment analysis of these DEGs revealed that in the genus Streptococcus, the DEGs associated with the dorsum of the tongue metagenome, unclassified_mgs_1057, were significantly enriched in pathways related to nuclear androgen receptor binding and transcription coactivator activity. These findings suggest that the genus Streptococcus tongue dorsalis may influence the progression of ILI by initiating the transcription of specific genes in the nucleus of CD16 + monocyte cells that interact with the cellular regulatory RNA polymerase II complex. Similarly, certain genes associated with Eubacterium in cDCs were similarly and significantly differentially expressed between the severe and mild ILI groups. Further functional enrichment analysis of these differentially expressed genes revealed that those associated with Eubacterium B infirmum mgs 800 on the dorsum of the tongue were significantly enriched in processes related to the azurophil granule membrane, lysosomal lumen, hydrolase activity, glycosidase activity, and transcription initiation factor binding. This finding suggests a mechanism by which the strain may influence the course of ILI disease by regulating enzyme activity, activating transcription factors, and regulating gene expression in combination with specific DNA sequences. The present study also revealed that the dorsal tongue unclassified_mgs_1939 of the genus Campylobacter_A was associated with CD4 + T effector memory in patients with mild ILI with differential gene expression. TWAS and pathway enrichment analysis revealed that these DEGs were enriched in biological processes such as the secretory granule lumen, prostate cancer, fluid shear stress and atherosclerosis. These findings suggest that the effect of the dorsal tongue metagenomics definition unclassified_mgs_1939 on the body’s infection-associated immune response may be related to its effect on regulatory signalling in CD4 + T effector memory cells or the production of immunosuppressive molecules in CD4 + T effector memory cells. However, “enrichment” does not necessarily imply that these genes exhibit higher expression levels in severe patients; rather, it reflects a stronger association or activation trend of these genes within specific immune responses or signaling pathways. Regarding the differential regulation of immune cell gene expression by the same associated flora, the upper respiratory tract flora can directly regulate immune cell gene expression through its metabolites, thereby leading to differences in gene expression within the same immune cell type. Severe ILI patients typically exhibit a more pronounced immune response, this immune state is closely associated with the interaction between the host and the upper respiratory tract flora. The same flora may exert distinct biological effects under different immune conditions, ultimately contributing to differences in immune cell gene expression. The main limitations of the present study are as follows. First, the integrative analysis for identifying flora-associated cell types was based on the assumption that interindividual differences in the abundance of upper respiratory tract flora could be attributed to genetic variation to some extent. Although the study acquired all the data of upper respiratory tract flora-associated genome-wide association studies from the National Gene Bank of China database and only the GWAS summary data with h2 > 0.1 were included in the integration analysis, the GWAS revealed only the association between genetic variation and phenotype rather than direct causality, while the mechanism of the effect of fewer heritable flora on ILI remained unclear. Second, the relatively small sample size (n = 2984) of the GWAS pooled data included in this study may be one of the reasons for the unsatisfactory signals of the integrated analysis and the TWAS analysis. In addition, in the identification of flora-associated cell type-specific expressed genes, the eQTL model used was constructed from the EUR population, and although genomic data from different races were used as reference panels in the model construction (EUR) and TWAS analysis (EAS), differences in genetic structure between races may still be an important reason why fewer significant genes were identified in this part of the study than in similar studies. Finally, this part of the study investigated only the associations between flora and immune cells from a transcriptional and metabolic perspective, and additional epigenetic and protein studies may be considered to gain a more comprehensive understanding of the underlying mechanisms involved in “upper respiratory flora-immunity-ILI progression”. Moreover, the conclusions of the bioinformatics analysis should be further verified by experimental and population studies. In summary, a total of 205 flora associated with 13 cell types were analysed in this study, and cell-specific eQTL models were further constructed and analysed via TWAS. On the basis of this analysis, 1226 flora-associated cell type-specific genes with significantly different expression levels between patients with mild and severe ILI were identified (P[adj] < 0.05 and | log[2]FC|>0.25). Finally, by analysing the genes specifically expressed in cell types associated with the significantly differentially expressed flora, pathways or biological processes related to ILI progression were enriched. Supporting information S1 Supporting information This zip file contains supplementary files, figures to the study. (ZIP) [135]pone.0322864.s001.zip^ (18.2MB, zip) Abbreviations ILI influenza like illness Treg regulatory T cells COVID-19 novel coronavirus pneumonia QC quality control KO Kyoto Encyclopedia of Genes and Genomes BH Benjamini & Hochberg FDR false discovery rate COPD chronic obstructive pulmonary disease GWAS genome-wide association studies CNGBdb China National Gene Bank database WGS whole genome sequencing SNPs single nucleotide polymorphisms MAF minor allele frequency EAS East Asian EUR European scRNA-seq single-cell RNA sequencing GEO Gene Expression Omnibus PBMC peripheral blood mononuclear cell UMAP uniform manifold approximation and projection PLT platelet RBC red blood cell MC Monte Carlo LD linkage disequilibrium eQTL expression quantitative trait loci TWAS transcriptome-wide association analysis GO Gene Ontology FC fold change NK natural killer cells cDC conventional dendritic cells Prolif proliferation cells Data Availability All relevant data are within the manuscript and its [136]Supporting Information files Funding Statement This study was supported by the National Natural Science Foundation of China (grant number 82273741) (Awarded to P.H.), the Jurong Municipal Social Development Program (grant number ZA42205) (Awarded to Y.F.W.), the Jurong Municipal Social Development Program (grant number ZA42312) (Awarded to H.B.C.), and the Open Project of Jiangsu Provincial Key Laboratory of Laboratory Medicine (JSKLM-Y-2024-018) (Awarded to Y.F.W.). References