Abstract

   Genome-wide association study analysis has revealed the significant
   influence of genetic factors in the progression of COVID-19. However,
   the impact of these genetic variants on gene expression in various cell
   types remains largely unexplored. Here, we profiled immune cells from
   42 COVID-19 cases (involving five stages) and 11 healthy individuals,
   and performed a single-cell RNA-seq assay with >200 000 cells to
   investigate cell type-specific and stage-specific genetic effects of
   genetic variants. Single-cell expression quantitative trait
   loci analysis of eight distinct cell types showed that the expression
   of 2593 genes is significantly associated with common genetic
   polymorphisms, and that the majority of these genes show their most
   prominent effects in specific cell types. Furthermore, we also
   discovered new gene associations for COVID-19-risk variants identified
   from genome-wide association studies and highlighted the monocytes in
   which their effects are most prominent. We found that 57 genes were
   regulated by variants associated with COVID-19, with significant
   enrichment for dynamic effects. In summary, our results highlight the
   importance of studying context-specific genetic regulation of gene
   expression and provide insights into the mechanisms underlying genetic
   susceptibility to COVID-19.

Introduction

   Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2 infection,
   consists of a wide range of pathologies, including no symptoms, mild,
   and severe respiratory failure, and has caused serious consequences
   around the world [[44]1–3]. Although the aberrant activity of
   inflammatory cytokines in patients with COVID-19 indicates the
   functional changes of immune cells, the host immune genetic factors
   that determine susceptibility to COVID-19 are not fully revealed.
   Genome-wide association studies (GWAS) have identified a number of
   genomic variants associated with COVID-19 traits [[45]4–6]. The
   majority of these variants are located in gene regulatory elements such
   as promoter and enhancer regions [[46]7]. In addition, several
   COVID-19-related causal genes are identified by conducting
   colocalization and transcriptome-wide association study (TWAS)
   analysis, integrating datasets of GWAS and expression quantitative
   trait loci (eQTL) [[47]8, [48]9]. However, the immune-specific genetic
   regulation of causal genes in COVID-19 has not been fully revealed.

   Recent studies have shown that eQTLs play crucial roles in progression
   of diseases [[49]10, [50]11]. Notably, traditional eQTL studies with
   bulk tissues have largely masked the cell type-specific mediation of
   genetic variants on gene expression, which has been revealed to play an
   important role in COVID-19 pathogenesis [[51]12]. Single-cell RNA
   sequencing (scRNA-seq) is a comprehensive technology to profile the
   composition and cell type-specific transcriptional states of tissues or
   cell lines. And existing scRNA-seq analysis has revealed that monocytes
   are critical in cause of COVID-19 [[52]13, [53]14]. By integrating
   scRNA-seq and population genetic information, it is possible to
   evaluate cell type-specific effects of genetic variants on gene
   expression. Nonetheless, a comprehensive census of circulating immune
   cells in COVID-19 with different severity remains incomplete, and
   identifying cell type-specific and context-mediating genetic regulation
   remains challenging.

   Given the dynamic characteristics and the heterogeneity of circulating
   immune cells, in this study, we provide a comprehensive census to
   reveal cellular changes across four severity levels of COVID-19, and
   mapped genetic regulation of gene expression on eight immune cell
   types. In addition, we reconstructed cell differentiation trajectories
   for monocytes and identified dynamic eQTL effects across subtypes of
   monocytes at different COVID-19 severity. Furthermore, we identified 57
   genes with colocalizing eQTL and GWAS loci for COVID-19. Notably, our
   results showed that part of the colocalizing genes involved in protein
   interaction network were enriched in interferon-mediated signaling
   pathways. In summary, our data suggest that dysregulation of gene
   expression of immune cells could mediate COVID-19 and emphasize the
   importance of cell type-specific gene expression regulation by genetic
   variants.

Materials and methods

Data collection

   scRNA-Seq data of 53 raw SRR files (42 COVID-19 cases, 11 healthy
   controls) from peripheral blood mononuclear cells (PBMCs) were
   collected from the Gene Expression Omnibus (GEO) database (accession
   number: [54]GSE165080) [[55]15]. The COVID-19 cases were divided into
   four stages, including asymptomatic (5 cases), mild (13 cases), severe
   (12 cases), and convalescent (12 cases). GWAS summary data of COVID-19
   patients were obtained from the COVID-19 Host Genetics Initiative
   database ([56]https://www.covid19hg.org/), which comprises 18 152
   COVID-19 samples and 1145 546 controls. The bulk data of
   three coronaviruses were collected from the GEO database (accession
   numbers: [57]GSE155986, [58]GSE193169, and [59]GSE198398) [[60]16–18].

scRNA-seq data analysis

Quality controls

   First, raw scRNA-seq data were processed using the Cell Ranger software
   (v5.0.0). The human genome (GRCh38) was used as a reference for
   alignment. The Ensembl (v43) was used as a reference for gene
   annotation (website). Then, results from RNA quantification in Cell
   Ranger were analyzed using Seurat (v4.0.5) package [[61]19],
   implementing the following rigorous quality control measures: (i) cell
   filtering—removal of cells with <200 detected genes, >25 000 unique
   molecular identifiers (UMIs; upper threshold determined as mean +3SD of
   UMI distribution), or >10% mitochondrial gene content and (ii) gene
   filtering—exclusion of genes detected in fewer than three cells. As a
   result, a total of 222 456 cells and 22 167 genes passed quality
   filters.

Gene enrichment analysis

   We performed gene ontology and pathway enrichment analysis using the
   clusterProfiler R package (version 4.0.5). For each cell type, the top
   20 marker genes were selected based on absolute log[2] fold-change
   values from differential expression analysis, with statistical
   significance determined by Student’s t-test. These gene sets were
   analyzed among GO terms.

Cell–cell communication analysis

   To systematically investigate cell–cell communication dynamics in
   COVID-19, we employed the CellChat package (version 1.6.1) to analyze
   intercellular signaling networks using single-cell transcriptomic data.
   Our analysis incorporated the curated ligand–receptor interactions from
   CellChatDB as reference, through which we first identified
   overexpressed genes and interacting pairs within specific cell types.
   We then computed communication probabilities for all potential
   ligand–receptor interactions and performed comparative analysis at both
   the pathway level (evaluating 36 distinct signaling pathways) and
   the system level between COVID-19 patients and healthy controls. Our
   analysis revealed that 32 of the 36 examined pathways were active in
   both cohorts, though with quantitative differences in signaling
   strength. In addition, we discovered four pathway-specific signatures:
   MIF and PARs pathways emerged as COVID-19-exclusive features, whereas
   APRIL and EPHB signaling were uniquely maintained in healthy controls.

Ordering of cells in a pseudotime trajectory

   To perform trajectory inference, we imported normalized gene expression
   profiles for a total of 24 238 monocytes, including 15 000 classical
   monocytes (cMs) and 9283 nonclassical monocytes (ncMs), into R
   (v3.6.1). The profiles were analyzed using the monocle (v2.22.0)
   package [[62]20]. In brief, the monocle object was constructed using
   the newCellDataSet function. Then, the size factor and dispersion
   between cells were estimated using the estimateSizeFactors and
   estimateDispersions functions. Next, the highly variable genes related
   to the pseudotime were calculated using differentialGeneTest function.
   Finally, the differential genes were embedded into the monocle object
   using setOrderingFilter function, and the project was mapped into the
   low-dimensional space by reduceDimension function.

Genetic variant detection and quality control

   Genetic variants were identified through the following analytical
   pipeline: First, raw sequencing data (SRR files) underwent quality
   assessment and adapter trimming using fastp (version 0.23.2), followed
   by alignment to the human reference genome (GRCh38) using STAR (version
   2.7.10). Variant calling was then performed using the GATK Best
   Practices workflow for RNA-seq short variant discovery (GATK version
   4.2.6.1), with parameters optimized for single-cell data. To ensure
   variant quality, we applied stringent filtering criteria: (i) Quality
   by Depth (QD) > 2; (ii) Mapping Quality (MQ) > 40; (iii) Fisher Strand
   (FS) < 60; (iv) Strand Odds Ratio (SOR) < 3; (v) MQRankSum > −12.5;
   (vi) ReadPosRankSum > −8; (vii) Minor Allele Frequency (MAF) > 5%; and
   (viii) Hardy–Weinberg Equilibrium P-value >5 × 10^−5. Finally, variants
   were annotated using snpEff (version 5.1d) against the dbSNP reference
   database. This rigorous pipeline yielded 153 527 high-quality autosomal
   SNPs for subsequent eQTL analysis.

Cis-eQTL mapping analysis

   We performed cis-eQTL analysis using a rigorous linear modeling
   following the rigorous pipeline: First, we calculated mean normalized
   expression values for each gene across cell types and samples. These
   expression profiles were then analyzed using the matrixEQTL package
   (v2.3.0) with a linear model that incorporated age and sex as
   covariates to control for potential confounding effects. The analysis
   tested for associations between genotype and gene expression within a 1
   Mb window centered on each gene’s transcription start site [[63]21]. We
   established statistical significance at a nominal P-value threshold of
   0.05, with all analyses adjusted for multiple testing using the
   Benjamini–Hochberg procedure (FDR < 0.05). This covariate-adjusted
   approach ensured that identified eQTLs represented genuine genetic
   regulatory relationships independent of demographic variables.

Modeling of context-specific eQTL effects

   We conducted mapping of stage-specific and pseudotime-dependent
   cis-eQTLs for monocytes. Initially, we categorized monocytes into
   five distinct stages, comprising four COVID-19 stages (asymptomatic,
   mild, severe, and convalescent) as well as healthy controls.
   Subsequently, we calculated the average expression of each gene per
   individual within each monocyte stage and employed the previously
   established method to map stage-specific eQTLs. Similarly, for the
   identification of dynamic eQTL effects, we divided the
   pseudotime trajectory of monocytes into nine bins, calculated the
   average gene expression per individual within each bin, and
   subsequently mapped eQTLs for each bin.

Integration of eQTLs with GWAS signals

   To identify COVID-19 causal genes, first, we performed the coloc
   (v5.1.0) analysis [[64]22], which is based on Bayesian genetic
   colocalization. We intersected candidate loci by using the GWAS data of
   COVID-19 and eQTLs for each cell type. In brief, we defined a 500-kb
   window centered on the lead eQTL variant, and performed colocalization
   testing using all variants within the window that were present in both
   the eQTL and the GWAS summary statistics. This approach allowed us to
   assess the likelihood of colocalization between the genetic variants
   influencing gene expression and those associated with COVID-19
   susceptibility. Then, we conducted the TWAS to impute
   cell type-specific genetic expression levels from genotypes, and
   identify potentially novel associated genes. The FUSION software was
   used to perform TWAS analysis for each transcriptome reference panel
   [[65]23]. The 1000 Genomes project was used as an LD reference file.

Results

Single-cell transcriptomes of immune cells in COVID-19

   First, scRNA-seq analysis was performed to profile >220 000 PBMCs from
   53 unique samples (Fig. [66]1A). The 53 samples corresponded to 11
   healthy controls and 42 COVID-19 cases, including 5 asymptomatic cases,
   13 mild cases, 12 severe cases, and 12 convalescent cases. After
   quality control, a total of 222 456 high-quality cells detecting 22 167
   genes were ultimately obtained. Subsequently, all the cells were
   derived into eight major cell types, including CD4^+ T cells, CD8^+ T
   cells, B cells, monocytes, conventional and plasmacytoid dendritic
   cells (cDC and pDC), and platelets, according to classical markers
   (Fig. [67]1B). In addition, processing batches had no observable
   effects on the distribution of cells based on the uniform manifold
   approximation and projection (UMAP) visualization ([68]Supplementary
   Fig. S1A). Moreover, significant differences were observed in immune
   cell distribution across COVID-19 severity groups (Fig. [69]1C).
   Monocytes and platelets were markedly enriched in COVID-19 patients
   compared to healthy controls. Notably, monocytes exhibited the highest
   abundance in severe cases during disease progression. Furthermore,
   STARTRAC-dist analysis revealed distinct clustering patterns of
   cellular subsets among severity groups ([70]Supplementary Fig. S1B),
   with monocytes demonstrating progressive enrichment correlating with
   disease severity. Even for asymptomatic and convalescent patients, the
   proportion of monocytes remained higher than normal controls,
   indicating the abnormal activation of monocytes during SARS-CoV-2
   infection. This finding was also consistent with the previous studies
   [[71]13, [72]24]. Next, functional enrichment analysis was conducted to
   describe the active pathways of upregulated genes in various cell types
   for COVID-19. For COVID-19-enriched cells, monocytes exhibited elevated
   expression of genes, such as IL1B, CCL3, and CXCL8 (Fig. [73]1D). These
   upregulated genes were enriched in pathways such as “response to
   inflammatory” and “regulation of viral transcription”
   ([74]Supplementary Fig. S1C), suggesting that monocytes played a
   critical role in virus defense and lymphocyte activation.

Figure 1.

   [75]Figure 1.
   [76]Open in a new tab

   A single-cell transcriptional map of immune cells in COVID-19. (A) The
   schematic for single-cell analysis of the study design. (B) The UMAP
   visualization of scRNA-seq data for immune cells of 42 COVID-19 cases
   and 11 healthy controls. (C) Violin plot showing the expression levels
   of selected marker genes in each of the indicated cell types. (D) The
   proportion of different cluster groups present in COVID-19 patients and
   healthy controls (left) and different stages of COVID-19 patients
   (right). (E) The differential gene expression analysis showing
   upregulated and downregulated genes across all eight cell types.

   Next, to explore the potential intercellular relationships and
   transcriptomic changes, the cellular interaction differences among
   cells between COVID-19-positive and -negative samples were investigated
   by CellChat [[77]25]. The results showed that monocytes exhibited
   distinct interaction potentials with both CD4^+ and CD8^+ T cells
   ([78]Supplementary Fig. S2A), emphasizing the effect of monocytes in T
   lymphocyte activation and migration for COVID-19 patients. In addition,
   biologically specific signal pathways were conducted to explain the
   immune mechanisms of SARS-CoV-2 infection. The results revealed that
   MIF and PARs signal pathways were enriched in COVID-19 patients, while
   APRIL and EPHB signal pathways were enriched in healthy controls (Fig.
   [79]2A). Moreover, the MIF signaling network showed that T cells (both
   CD4 and CD8) contribute to monocyte-dominated MIF signal production in
   COVID-19 (Fig. [80]2B). Notably, MIF interactions among monocytes and T
   cells were more likely autocrine, while others were paracrine signaling
   (Fig. [81]2C). Furthermore, MIF signaling is dominated by MIF ligand
   and its multimeric CD74/CD44 receptor (Fig. [82]2D), which had been
   shown to be associated with the progression of COVID-19 [[83]26,
   [84]27]. Finally, the interaction strength analysis confirmed that
   monocytes had more incoming and outgoing strength in COVID-19 patients
   compared to healthy controls ([85]Supplementary Fig. S2B).
   Collectively, these findings highlighted that monocytes were increased
   in COVID-19 patients, and closely associated with the COVID-19
   progression stage.

Figure 2.

   [86]Figure 2.
   [87]Open in a new tab

   Recognition of immunomodulatory signaling pathways in COVID-19.
   (A) Overall information flow of communication network between COVID-19
   patients and healthy controls. (B) The hierarchical diagram showing the
   autocrine (left) and paracrine (right) signals of MIF signaling
   transduction. Solid circles and hollow circles represent the source and
   the target points, respectively. The size of the circle is proportional
   to the number of cells in each cell type, and the edge width represents
   the communication probability. The edge color is consistent with the
   signal source. (C) The heatmap showing the relative importance of each
   cell type calculated based on the four network center metrics of the
   MIF signal network. (D) The relative contribution of each ligand
   receptor to the overall communication network of the MIF signaling
   pathway.

Identification of immune cell type-specific cis-eQTLs of COVID-19

   To investigate genetic regulation of gene expression in COVID-19 immune
   cells, we first identified genetic variants by calling SNPs from raw
   single-cell RNA-seq reads, followed by re-annotation and rigorous
   quality control (detailed in the “Genetic variant detection and quality
   control” of “Materials and methods” section). Using these
   high-confidence variants, we then performed comprehensive eQTL analysis
   across eight major immune cell types, testing associations between
   genotype and gene expression levels (Fig. [88]3A). In total, we
   identified 4549 eQTLs for 2593 autosomal genes in all cell types
   ([89]Supplementary Table S1). For each individual cell type, we
   detected 356–1509 eQTLs with statistical significance, of which
   281–1026 eQTLs were only detected in individual cell types (Fig.
   [90]3B). In our study, we identified a total of 4284 cell type-specific
   eQTLs. To systematically evaluate our findings, we performed
   multi-level replication analyses across different eQTL resources.
   Compared to bulk tissue data from the Genotype-Tissue Expression (GTEx)
   database [[91]28]. We observed moderate eQTL replication (6.6%–43.2%)
   and eGene overlap (2.4%–23.3%) rates across eight cell types, with
   94.1%–100% consistency in allelic effect direction for replicated
   variants ([92]Supplementary Table S2). One plausible explanation for
   this observation is that the eQTLs identified in GTEx are primarily
   based on bulk tissue analysis, which tends to mask cell type-specific
   effects due to the averaging of gene expression across multiple cell
   types within the tissue [[93]29]. Our analysis demonstrated that while
   bulk analyses obscure cell type-specific regulatory variants, the core
   genetic regulation of target genes remains detectable. In addition, our
   findings showed strong concordance with established single-cell eQTL
   resources. The DICE database comparison revealed replication rates of
   16.1%–28.1% across major immune cell types, including monocytes, B
   cells, T cells, and NK cells, with 96.3%–99.1% consistency in effect
   directions ([94]Supplementary Table S3 and [95]Supplementary Fig. S3A).
   Similar robust agreement emerged when analyzing [[96]11] PBMC sc-eQTL
   data. The CD4^+T cells demonstrated the highest replication rate
   (20.8%) with perfect effect direction consistency
   (100%) ([97]Supplementary Fig. S3B). Although pDCs showed the minimum
   number of overlapping eQTLs (n = 9), this limited set exhibited
   remarkable effect size correlation (Pearson’s r = 0.89, P< .001),
   demonstrating exceptionally precise replication when present. While our
   comparative analysis with the [[98]10] dataset revealed a limited
   overlap in identified eQTLs, we observed high directional consistency
   in shared regulatory variants across matched cell types
   ([99]Supplementary Fig. S3C). The reduced overlap likely stems from
   both biological and methodological factors. Biologically, differences
   between our COVID-19 patient cohort and their healthy donor samples may
   reflect disease-specific alterations in regulatory landscapes.
   Methodologically, the use of Spearman rank correlation in their
   analysis versus the linear models employed in our study represents a
   key technical distinction, as these approaches capture different
   aspects of genotype-expression relationships. Together, these factors
   help explain the observed variation in eQTL detection while maintaining
   the consistency of shared regulatory effects. The reduced overlap
   likely stems from both biological and methodological factors.
   Biologically, differences between our COVID-19 patient cohort and their
   healthy donor samples may reflect disease-specific alterations in
   regulatory landscapes. Methodologically, the use of Spearman rank
   correlation in their analysis versus the linear models employed in our
   study and most other sceQTL analyses represents a key technical
   distinction, as these approaches capture different aspects of
   genotype-expression relationships. Together, these factors help explain
   the observed variation in eQTL detection while maintaining the
   consistency of shared regulatory effects. Importantly, the high
   consistency in both regulatory direction and effect magnitude for
   overlapping variants provides robust validation of our methodological
   approach and confirms the reliability of our findings despite the
   distinct sample characteristics. These results suggest that while
   disease state influences the overall eQTL profile, core regulatory
   mechanisms remain detectable across different physiological conditions.

Figure 3.

   [100]Figure 3.
   [101]Open in a new tab

   Identification of sc-eQTLs in immune cells of COVID-19 patients.
   (A) The flowchart for identification of cis-eQTL from COVID-19
   scRNA-seq data. (B) The upset plot showing the number and composition
   of significant cis eQTLs for eight cell types. (C) Scatter plot showing
   the correlation between number of cells per donor and number of
   detected eGenes in each cell type. (D) Example of monocyte-specific
   eQTL. The SNP rs11516630 has a significant association with ADK
   expression just in the monocytes. adj. association P-value *P < .05,
   **P < .001, and ***P < .0001.

   We next investigated whether the complexity of genetic regulation
   varies across immune cell populations. Interestingly, we observed
   variations in the number of these cell type-specific eQTLs across
   different cell types, suggesting a potential relationship between the
   number of eQTLs and cellular heterogeneity [[102]10]. In addition, the
   proportion of these cell type-specific eQTLs to significant eQTLs also
   exhibited cellular heterogeneity among different cell types. For
   instance, we observed that the proportions of independent eQTLs in NK
   cells and pDCs were remarkably high, reaching up to 82.2% and 85.9%,
   respectively. However, in the case of CD8 T cells, the proportion of
   independent eQTLs was comparatively lower, at only 47.7%. The variation
   in the proportion of independent eQTLs emphasizes the importance of
   considering cell-specific regulatory mechanisms when studying gene
   expression in different immune cell populations. Based on this
   characteristic, we defined these independent eQTLs as
   cell type-specific eQTLs. Additionally, we also found that monocytes
   had a higher number of total cis-eQTLs, significant cis-eQTLs, and
   cell type-specific eQTLs. Notably, monocytes also displayed elevated
   numbers of cell counts (Fig. [103]3C), suggesting that the statistical
   power to detect eQTLs in cell types is associated with the number of
   cells, consistent with previous research [[104]30]. These results
   suggest that while disease state influences the overall eQTL profile,
   core regulatory mechanisms remain detectable across different
   physiological conditions. Notably, this persistence was particularly
   evident in monocytes. This monocyte-specific pattern suggests that
   certain genetic regulatory programs may operate in a cell-autonomous
   manner during infection. For example, the eGene ADK, despite
   having higher baseline expression in B cells, exhibited
   monocyte-specific regulation. Further supporting this
   cell type-dependent mechanism, we found that the homozygous C allele of
   rs11516630 significantly reduced ADK expression exclusively in
   monocytes (Fig. [105]3D), with no comparable effect observed in other
   cell types. In summary, the single-cell resolution of our analysis
   revealed an important dimension of cellular heterogeneity.

Context-dependent eQTL effects on monocytes in COVID-19

   Given the importance of monocytes in immunomodulatory functions, we
   further mapped cis-eQTL of monocytes in 5 stages of disease severity to
   explore the regulation of gene expression of genetic variants during
   COVID-19 progression. We observed notable variations in the quantity of
   genes with significant cis-eQTL effects (eGenes) across different
   stages of COVID-19 severity (Fig. [106]4A). Moreover, the highest
   number of eGenes was observed in the severe stage of COVID-19,
   suggesting that gene expression in severe disease states is more
   susceptible to genetic variation. For example, the eGenes S100A8 and
   S100A12 showed the most significant eQTLs in severe stage (Fig.
   [107]4B). S100A8 encodes a calcium-binding protein that belongs to the
   S100 family of proteins. Clinical studies have provided evidence that
   elevated levels of S100A8 are strongly associated with critical cases
   of COVID-19, and can effectively differentiate between mild and severe
   disease states [[108]29, [109]31, [110]32]. Moreover, S100A8 is
   believed to play a crucial role in regulating cytokine storms, which
   have emerged as a defining characteristic of severe and potentially
   fatal COVID-19 cases [[111]33]. Whereas S100A12, a member of the S100
   family of proteins, plays a crucial role in immune responses by
   interacting with receptors such as TLR4 and RAGE, and its close
   association with COVID-19 has been reported [[112]34]. Another
   identified severity-induced eGene was STAT3, which encodes for a
   transcription factor that belongs to the STAT protein family. Although
   STAT3 was expressed in most stages of COVID-19, the most significant
   genotype-dependent effect was observed in the severe stage
   ([113]Supplementary Fig. S4). This suggests that for many
   monocyte-related genes, genetic regulation is only evident during
   specific stages of COVID-19 progression, particularly during the severe
   stage.

Figure 4.

   [114]Figure 4.
   [115]Open in a new tab

   Context-specific eQTLs of monocytes in COVID-19 patients. (A) Upset
   plot showing the number and composition of significant cis eQTLs for
   eight cell types. (B) Example of monocyte-specific eQTL. The SNP
   rs11516630 has a significant association with ADK expression just in
   the monocytes. adj. association P-value *P < .05, **P < .001, and
   ***P < .0001. (C) The branched point plot showing the pseudotime
   development trajectories of cMs and ncMs (left) and nine bins (right).
   (D) Example genes that significantly change as a function of monocyte
   development pseudotime. Each dot corresponds to a cell, and colors
   represent different stages. (E) Example of dynamic eQTLs. The average
   expression of the gene within each pseudotime window was stratified by
   genotype.

   Building upon established evidence of context-specific eQTLs, we
   systematically examined the dynamic effects of genetic variation on
   gene expression across monocyte pseudotime [[116]35, [117]36]. Using an
   inferred trajectory model, we stratified all monocytes into nine
   consecutive pseudotime bins (Fig. [118]4C). When overlaying the
   expression of classical markers on the trajectory, we observed a
   continuous transition across the trajectory from cMs (B1) to ncMs (B9)
   (Fig. [119]4D). For example, CD14 is highly expressed in cMs and was
   found to be downregulated at late stages of the transition to cMs.
   Conversely, the expression of FCGR3A increased as the cells
   transitioned to cMs. In total, we identified 273 genes that exhibited
   changes in expression levels as a function of pseudotime. We found that
   dynamically regulated genes in quantiles were enriched in various
   pathways, such as response to zinc or copper ion (B1) and positive
   regulation of phagocytosis or lymphocyte migration (B4). By identifying
   dynamic gene expression, we gain deeper insights into regulatory
   processes and molecular changes underlying cellular transitions from
   cMs to ncMs.

   Next, we characterized dynamic changes in eQTL effects along the
   transition trajectory from cMs to ncMs, identifying patterns of
   variable regulatory activity across pseudotime. Of the 3204 cis-eQTLs
   identified in cMs to ncMs, 47 were expressed in at least five
   pseudotime bins and detected for dynamic effects. Many of the genes
   with dynamic eQTL effects play an important role in the positive
   regulation of phagocytosis for monocytes. For example, ERAP2 plays a
   crucial role in antigen processing and presentation, which is essential
   for phagocyte-mediated immune responses. As a key aminopeptidase in the
   endoplasmic reticulum, ERAP2 trims antigenic peptides for MHC class I
   loading, thereby influencing phagocytic cell functions [[120]37,
   [121]38]. The genetic variant rs2548524-A exhibited significant
   modulatory effects on ERAP2 expression throughout monocyte
   differentiation, with pseudotime-dependent variation in effect sizes.
   Notably, the strongest suppressive effect occurred in B7 (β = −0.25,
   FDR = 0.005), coinciding with the transitional stage between cMs and
   ncMs (Fig. [122]4E). This stage-specific repression suggests rs2548524
   may regulate ERAP2-mediated antigen processing during critical phases
   of phagocyte functional maturation. The similar trend is identified for
   genetic variants that influence the expression of the FCER1G, a subunit
   gamma of IgE receptor involved in immunoglobulin-mediated immune
   response [[123]39]. Our results found that seven genetic variants
   showed significant allelic effects in at least a single bin. Among
   these variants, the T allele at rs2070901 was associated with increased
   expression of FCER1G, suggesting that the T allele is likely to promote
   the immune response process. In addition, rs2070901 has shown
   high interaction patterns with FCER1G [[124]40, [125]41], indicating
   that the interaction strength between rs2070901^C/C and FCER1G may be
   enhanced at the end of transition from cMs to ncMs. In summary, our
   analysis of pseudotime eQTLs suggested that the significant genetic
   regulation of immune function-related genes is only evident during
   specific stages of cellular transitions from cMs to ncMs.

Identification of cell type-specific COVID-associated risk genes

   To identify potential disease-risk genes, we performed Bayesian genetic
   colocalization analyses using immune cell type-specific eQTLs to map
   loci associated with COVID-19 (Fig. [126]5A). As a result, we
   identified 19 unique colocalizations (PP4 > 0.95), corresponding to 19
   GWAS loci for 21 SNP-gene pairs in seven immune cell types. Among these
   genes, 16 (84.2%) were protein-coding genes ([127]Supplementary Table
   S4). Further, we performed the TWAS analysis and identified 43 genes,
   38 of which are protein-coding genes, associated with COVID-19
   ([128]Supplementary Table S5). In total, we identified a set of 57 risk
   genes associated with COVID-19, with 5 genes overlapping between the
   two approaches (Fig. [129]5B). Importantly, despite having the same
   lead genetic variants, we discovered that the overlapping risk genes
   exhibited specific genetic effects across different cell types. For
   example, IL10RB, which has been identified as a prime candidate gene
   target for COVID-19 host susceptibility [[130]42]. IL10RB is highly
   expressed only in monocytes, but the genetic effects of IL10RB
   expression exhibited considerable variations across different cell
   types ([131]Supplementary Fig. S5A). Our analyses highlighted the
   advantage of using cell type-specific cis-eQTLs to annotate to identify
   novel associated genes with COVID-19.

Figure 5.

   [132]Figure 5.
   [133]Open in a new tab

   Colocalization of immune-specific eQTLs with GWAS associations for
   COVID-19. (A) Identification of COVID-19 pathogenic genes using two
   methods. (B) The Venn diagram of pathogenic genes for Coloc and TWAS
   (right). (C) Sangi bubble plot of COVID-19 pathogenic gene enrichment
   analysis. (D) Interaction network diagram of genetic genes/loci
   determined by Coloc and TWAS. The dashed lines represent gene–gene
   interactions based on the STRING database. (E) Venn diagram of COVID-19
   with HCoV-229E, HCoV-OC43, and MERS-CoV risk genes.

   Next, we investigated whether the identified risk loci and genes
   associated with COVID-19 had an impact on specific cellular functions.
   Our findings revealed that the risk genes were enriched in
   interferon-mediated signaling pathways, which play crucial roles in
   immune response and the development of COVID-19, particularly in
   response to viral infection (Fig. [134]5C). Further, for 57 risk genes
   and their corresponding genetic variants, we conducted an interaction
   network to identify crucial risk gene modules. We found that part of
   the genes clustered into connected modules based on the interaction
   pairs in STRING database (Fig. [135]5D) [[136]43]. In addition, we
   observed that neighboring genes within the modules were regulated by
   shared risk variants of COVID-19. For example, we found a set of
   interconnected genes, six of which were regulated by rs3176767, a
   potential risk variant for regulating ICAM1 and ICAM5 in rheumatoid
   arthritis [[137]44]. Notably, the expression of these COVID-risk genes
   was downregulated by rs3176767, suggesting that this variant may
   decrease COVID-19 risk ([138]Supplementary Fig. S5B). Our results
   indicated that immune disease loci colocalize with genes that may be
   involved in the COVID-19 process and that genes with similar functions
   tend to be regulated by disease risk variants.

   To provide insights into COVID-19 risk genes, we further explored the
   role of these genes in other coronaviruses, including HCoV-229E,
   HCoV-OC43, and MERS-CoV (Fig. [139]5E). We found that 57 COVID-19 risk
   genes partially overlapped with significant differential genes
   associated with other coronaviruses. Notably, several of the
   overlapping genes, such as ICAM1, ICAM5, NGFR, RPS6KL1, and SAMD14, are
   involved in synapses between neurons and CXCR chemokine receptor
   binding, suggesting that these risk genes might collectively influence
   the progression of coronaviruses by modulating specific biological
   pathways. Considering the nonuniformity of data types available for
   COVID-19 and other coronaviruses, we produced the pseudobulk expression
   profiles for COVID-19 and obtained a total of 715 significant
   differential genes (log[2]FC > 2, P-value <.05). Moreover, it was
   observed that out of the 48 differential genes identified in COVID-19,
   significant differences were also found in other three coronaviruses.
   Significantly, CXCL1 showed notable differences between case and
   control in all four coronaviruses, indicating its potential
   significance and involvement in the pathogenesis of these viral
   infections. In summary, our analysis expands the list of candidate risk
   genes for COVID-19 and provides a new reference for targeted therapies
   for COVID-19.

Discussion

   The immune dysregulation associated with COVID-19 can lead to severe
   adverse reactions and infections, causing multiorgan dysfunction and
   even death [[140]45]. Through systematic single-cell profiling of
   214 526 immune cells across five distinct disease stages, our study
   delineates the cellular and molecular landscape of COVID-19
   pathogenesis, identifying eight immune cell populations. Our
   high-resolution analysis not only characterizes COVID-19-associated
   transcriptional alterations but also reveals cell type-specific
   enrichment of critical disease pathways. By conducting cell
   communication analysis, we observed a significant increase in both the
   quantity and the intensity of communication among monocytes. Monocytes
   emerged as the primary cellular source for several pathways, including
   the COVID-19-specific MIF and PARs pathways, as well as the
   healthy-specific APRIL and EPHB pathways. This observation highlights
   the crucial role of monocytes in the development of COVID-19.
   Therefore, our results provide valuable resources for investigating the
   immune dysregulation mechanisms associated with COVID-19.

   Clinical studies have indicated that susceptibility to COVID-19 can
   exhibit significant interindividual and population-specific variation
   [[141]46]. GWAS conducted in COVID-19 have identified multiple genetic
   risk variants that are associated with disease susceptibility. However,
   it has posed a challenge to identify causal genes and susceptible
   immune cell types of COVID-19, partly due to the lack of studies that
   systematically investigate the cell type-specific effects of genetic
   variants on gene expression. Based on the aforementioned background, we
   set out to map genetic effects of eight cell types on gene expression
   and identified >6000 significant cell type-specific cis-eQTLs of
   COVID-19 patients. Our results indicated that there is a high degree of
   cell type-specific genetic regulation on genes that is not captured
   with bulk RNA-seq, such as GETx. In addition, we identify examples of
   how genetic variants contribute to key immune-related eGenes. Given the
   essentiality of exploring the effects of genetic variants in
   biological contexts, it is crucial to fully uncover the
   context-specific genes and pathways regulated by disease-risk variants.
   Therefore, we further captured stage-specific and dynamic eQTLs for
   monocytes of COVID-19 patients. The findings revealed that a number of
   context-specific eQTLs were not identified through the initial cis-eQTL
   analysis. Moreover, to gain further insights into the relationship
   between eQTLs and COVID-19 risk loci, we performed TWAS and
   colocalization analysis, providing complementary information regarding
   the association between eQTLs and COVID-19 risk loci. Through the
   integration of cell type-specific eQTLs with COVID-19 risk loci
   identified via GWAS, we successfully identified a total of 57 causal
   genes at these loci. Additionally, we also determined the specific cell
   types in which these genes exert their pathogenetic effects.

   While this study provides valuable insights into the genetic regulation
   of immune responses in COVID-19, several limitations should be
   acknowledged. First, our variant annotation approach utilized genetic
   variants called from the RNA reads rather than dedicated genotyping
   arrays or whole-genome sequencing (WGS), which may have limited the
   comprehensive detection of genetic variants. While this RNA-based
   approach provides a practical method for leveraging existing
   transcriptomic data, it has important technical considerations compared
   to genotype data or WGS. The expression-dependent nature of RNA-seq
   data results in nonuniform genomic coverage, potentially missing
   variants in intergenic regions or lowly expressed genes that may harbor
   regulatory elements. Compared to standard SNP arrays typically used in
   eQTL studies, our approach had reduced power to detect variants in
   nontranscribed regions. Second, our analyses incorporated key
   covariates such as age and sex, but other potential confounders were
   not included due to data availability constraints. While age and sex
   are major demographic factors, unaccounted variables may introduce
   residual confounding, affecting the interpretation of genetic
   associations. Future studies with more extensive phenotypic data could
   further refine these analyses. Finally, sample size limitations in
   subgroup analyses may have reduced statistical power, increasing the
   risk of effect size biases. Larger cohorts would improve the robustness
   of cell type- and context-specific eQTL detection, particularly for
   rare cell states or modest genetic effects. Additionally, multi-omics
   integration future work could elucidate mechanistic links between
   genetic variants and immune dysregulation in COVID-19. Despite these
   limitations, our findings establish a foundation for understanding
   context-dependent genetic regulation in infectious diseases. Future
   studies with expanded genotyping, deeper phenotyping, and larger sample
   sizes will further advance precision medicine approaches for COVID-19
   and related pathologies.

Conclusion

   Collectively, our work combines genetic information with scRNA-seq
   profiles to elucidate the association underlying interindividual
   variants and gene expression in COVID-19. By integrating these
   approaches, we are able to identify key drivers responsible for the
   observed differences in immune responses among individuals. Our results
   showed how genetic variants influence the expression of genes involved
   in critical immune signaling pathways in a cell type-specific manner.
   Gaining a comprehensive understanding of the genetic regulation of the
   immune system holds significant implications for the treatment of both
   infections and cancer. Such knowledge can contribute to the development
   of targeted therapies and interventions that aim to modulate immune
   responses effectively and improve patient outcomes in disease contexts.

Supplementary Material

   lqaf130_Supplemental_File
   [142]lqaf130_supplemental_file.xlsx^ (549.2KB, xlsx)

Acknowledgements