Abstract
Patients with end-stage kidney disease (ESKD) are at high risk of
severe COVID-19. Here, we perform longitudinal blood sampling of ESKD
haemodialysis patients with COVID-19, collecting samples pre-infection,
serially during infection, and after clinical recovery. Using plasma
proteomics, and RNA-sequencing and flow cytometry of immune cells, we
identify transcriptomic and proteomic signatures of COVID-19 severity,
and find distinct temporal molecular profiles in patients with severe
disease. Supervised learning reveals that the plasma proteome is a
superior indicator of clinical severity than the PBMC transcriptome. We
show that a decreasing trajectory of plasma LRRC15, a proposed
co-receptor for SARS-CoV-2, is associated with a more severe clinical
course. We observe that two months after the acute infection, patients
still display dysregulated gene expression related to vascular,
platelet and coagulation pathways, including PF4 (platelet factor 4),
which may explain the prolonged thrombotic risk following COVID-19.
Subject terms: Biomarkers, SARS-CoV-2, Gene expression analysis,
Proteomic analysis
__________________________________________________________________
End-stage kidney disease confers a high risk for severe COVID-19
infection. Using an at-risk group (end-stage kidney disease patients
with COVID-19), authors using RNA-sequencing of immune cells and plasma
proteomic profiling to investigate the host response to viral
infection.
Introduction
COVID-19, caused by the SARS-CoV-2 virus, is a highly heterogeneous
disease. In most individuals, it is a mild, self-limiting illness, but
some individuals develop severe disease, typically manifesting as
respiratory failure with marked systemic inflammation and
immunopathology. Multiple studies have described
immunological^[56]1,[57]2, transcriptomic^[58]3–[59]11, and
proteomic^[60]12–[61]20 correlates of severe disease. The importance of
an aberrant host immune response in tissue injury in severe COVID-19 is
supported by the efficacy of anti-inflammatory treatments. These
include glucocorticoids^[62]21, monoclonal antibodies blocking the
interleukin-6 receptor^[63]22,[64]23, and the Janus kinase (JAK)
inhibitor baricitinib^[65]24. A wide range of additional therapies
directed at specific elements of the inflammatory response has been
developed for immuno-inflammatory diseases and present potential
repurposing opportunities for the treatment of severe COVID-19.
Understanding the molecular basis for severe COVID-19 is important for
the rational selection of such therapies.
Risk factors for severe COVID-19 include age, male sex, and the
presence of comorbidities such as chronic kidney disease (CKD). In CKD,
the risk of severe COVID-19 is proportional to the degree of renal
impairment^[66]25. End-stage kidney disease (ESKD) confers particularly
high risk, with a population-based study estimating a hazard ratio for
death of 3.69^[67]25 and a European registry study reporting 23.9%
28-day mortality in dialysis patients with COVID-19^[68]26. In part,
this is because ESKD patients are enriched for other risk factors for
severe COVID-19, including cardiometabolic disease. However, even after
adjustment for these, ESKD remains independently associated with the
risk of severe COVID-19. In addition, ESKD patients display impaired
vaccine responses^[69]27,[70]28, and those on in-centre haemodialysis
cannot shield effectively during lockdowns as they need to access
dialysis facilities regularly.
Here, we investigated the host response to SARS-CoV-2 in ESKD patients
on haemodialysis since study of such an at-risk group should enhance
the probability of identifying severity signals and might also point to
either an exaggerated or even distinct immunological response to the
virus. Moreover, ESKD patients receiving haemodialysis present an
opportunity for serial blood sampling of both outpatients and
inpatients with COVID-19, since patients must attend medical facilities
for regular dialysis regardless of COVID-19 severity. This enabled us
to perform longitudinal analysis and avoid the selection bias that
affects studies limited solely to hospitalised patients.
The host response to SARS-CoV-2 is orchestrated by a complex network of
cells and mediators, including circulating proteins such as cytokines
and soluble receptors. Soluble proteins play key roles in multiple
biological processes, including signalling, host defence and repair,
and are potential biomarkers and therapeutic targets. We therefore
hypothesised that a comprehensive analysis of both circulating proteins
and immune cells should yield valuable and complementary insights into
the pathobiology of COVID-19. To this end, we used the aptamer-based
SomaScan platform that provides broad coverage of the plasma proteome
(6323 proteins), combined with RNA-sequencing and flow cytometry of
peripheral blood mononuclear cells (PBMCs). We integrated these data to
provide a comprehensive view of the COVID-19 multi-omic landscape,
enabling us to link transcriptomic and cellular changes with
circulating proteins. Supervised learning identified plasma levels of
the LRRC15 protein, a recently proposed alternative receptor for
SARS-CoV-2, as a marker of disease severity. By comparing pre-infection
samples to samples collected from the same individuals during COVID-19
and after clinical recovery, we revealed persistent upregulation of
gene expression signatures related to vascular and clotting pathways
several months after infection. These findings elucidate the biological
underpinnings of the prolonged pro-thrombotic state associated with
COVID-19.
Results
Features of patient cohorts
We recruited two cohorts of ESKD patients on haemodialysis presenting
with COVID-19 (Fig. [71]1a). The Wave 1 cohort consisted of 53 patients
recruited during the initial phase of the COVID-19 pandemic (April-May
2020) (Supplementary Table [72]1). Serial blood sampling was carried
out where feasible (Fig. [73]1b), given the pressure on hospital
services and the effects of national lockdown. We assessed disease
severity using a WHO four-level ordinal score, categorising it into
mild, moderate, severe, and critical. Of the 53 patients, 25 had a peak
illness severity score of severe or critical (hereafter
severe/critical) and 28 mild or moderate (mild/moderate). Nine died.
The majority of patients were of non-European ancestry. Further
clinical and demographic details are provided in Supplementary
Table [74]1. We also contemporaneously recruited 59 non-infected
haemodialysis patients to provide a control group, selected to mirror
the age, sex and ethnicity distribution of the COVID-19 cases
(Supplementary Fig. [75]1a–c).
Fig. 1. Study design and cohort summary.
[76]Fig. 1
[77]Open in a new tab
a Graphical summary of the patient cohorts, sampling, and major
analyses. Wave 1 patients were recruited in Spring 2020. 17 of the
COVID-19 negative ESKD patients recruited as a controls in Wave 1 were
recruited again as COVID-19 positive cases in Wave 2 (2021). For 12/13
survivors in Wave 2, we obtained a convalescent sample approximately 2
months following recovery. Thus, for Wave 2, we had paired
pre-infection, acute infection and post-infection samples from the same
individuals. b, c For each cohort, the timing of the serial blood
sampling is shown by triangles and the temporal COVID-19 severity by
coloured bars. Three patients were hospitalised prior to COVID-19
diagnosis in the Wave 1 cohort. Three of the four patients in the Wave
2 cohort with fatal outcomes died >30 days from first positive swab.
The Wave 2 cohort consisted of 17 ESKD patients with COVID-19, infected
during the resurgence of cases in January-March 2021 (Supplementary
Table [78]2). All had been recruited as part of the COVID-19 negative
control group during Wave 1, thereby providing a pre-infection sample
collected 8–9 months earlier. For the Wave 2 cohort, we systematically
acquired serial samples for all patients at regular intervals (every
2–3 days over the course of the acute illness) (Fig. [79]1c). 9
patients had a peak illness severity of severe/critical (of whom 4
died), and 8 mild/moderate. For 12 of these patients, we acquired
convalescent samples approximately two months following infection.
The effect of COVID-19 on the PBMC transcriptome and plasma proteome in ESKD
patients
We performed transcriptomic profiling using RNA-seq of PBMCs. Principal
components analysis (PCA) revealed a clear effect of COVID-19 in both
Wave 1 (COVID-19 positive and negative patient samples) and Wave 2
(pre-infection and subsequent COVID-19 positive samples from the same
individuals) (Fig. [80]2a). In the Wave 1 cohort, differential gene
expression analysis between COVID-19 positive (n = 179 samples from 51
patients) and negative samples (n = 55) (using linear mixed models
(LMM) to account for repeated samples from the same individuals)
identified 3026 significantly up-regulated and 3329 down-regulated
genes (1% false discovery rate, FDR) (Supplementary Data [81]1a).
Sensitivity analyses exploring the effects of including additional
clinical covariates (underlying cause of ESKD, diabetes, and time since
first commencing haemodialysis) in the model did not materially impact
the results ([82]Supplementary Material, Supplementary Fig. [83]2). For
the Wave 2 cohort, where we compared COVID-19 positive samples (n = 90
samples from 17 individuals) with pre-infection samples from these same
individuals, we identified 2871 up-regulated and 3325 down-regulated
genes (1% FDR, LMM) (Supplementary Data [84]1a). These findings
demonstrate widespread transcriptomic changes associated with COVID-19.
3468 genes were significantly differentially expressed (1% FDR) in both
the Wave 1 and 2 cohorts. However, this approach of intersecting lists
of significant features based on a hard statistical threshold will
underestimate commonality between two datasets^[85]29. To provide a
measure of consistency that is not dependent on the significance
threshold, we compared the estimated effect size (log2 fold change)
between COVID-19 positive and negative samples for each gene in the
Wave 1 and Wave 2 cohorts. These were highly concordant (Pearson’s r
0.80) (Supplementary Fig. [86]3a), despite differences in the prevalent
SARS-CoV-2 variant and developments in medical management (8 of 17
patients in the Wave 2 cohort received glucocorticoids). To identify
the genes that were consistently differentially expressed across both
cohorts, we used robust rank aggregation (RRA) (Supplementary
Data [87]1a, Supplementary Fig. [88]4).
Fig. 2. Signatures of COVID-19 in ESKD.
[89]Fig. 2
[90]Open in a new tab
a PCA of the PBMC transcriptome (left) and plasma proteome (right).
Each point represents a sample and is coloured by COVID-19 status. b
Paired violin plots showing intra-individual comparisons of
pre-infection and most severe sample (Wave 2 cohort; n = 34 samples
from 17 individuals) during COVID-19 for selected genes. Grey lines
link each individual’s pre-infection and infection samples; these
samples are represented by points and coloured by COVID-19 status.
Shaded areas indicate kernel density estimates. For boxplots, centre =
median, upper bound = upper quartile, lower bound = lower quartile. All
genes shown were significantly differentially expressed (1% FDR) in
both cohorts. Genes are grouped by membership to pathways that were
significantly enriched (1% FDR) in GSVA. c The 30 protein pathway
enrichment terms with the greatest RRA scores (indicating consistent
dysregulation in both the Wave 1 and Wave 2 proteomic datasets),
ordered by effect size. All pathway terms shown were significantly
enriched in the individual cohort analyses (1% FDR). Red= up-regulated
in COVID-19 positive versus negative; blue= down-regulated. d As for b,
but displaying selected plasma proteins (significant at 1% FDR) (n = 32
samples from 16 individuals).
To gain insight into the biological pathways underlying these changes,
we used Gene Set Variation Analysis (GSVA)^[91]30 to compare COVID-19
positive and negative ESKD samples (Supplementary Data [92]1[93]b).
Enriched pathways included those related to cell cycle (e.g.,
‘Polo-like kinase mediated events’, which are involved in the cellular
response to DNA damage) and host defence (e.g., ‘Complement cascade’,
‘Fc-gamma receptor-dependent phagocytosis’, and ‘Parasite infection’)
(Supplementary Fig. [94]5). This analysis also highlighted
leucocyte-endothelial interactions (‘Cell surface interactions at the
vascular wall’, which included SELL and CEACAM-1, -3, -6 and -8 genes).
Examples of marked changes in gene expression between pre-infection and
acute infection in the Wave 2 cohort included components of
‘Immunoregulatory interactions between a lymphoid and a non-lymphoid
cell’ pathway term (e.g., SIGLEC1, SIGLEC9, SELL, all increased) and
‘Development and heterogeneity of the ILC family’ (e.g., IFNG, GATA3,
RORA, all decreased) (Fig. [95]2b).
We next assessed the circulating proteome, measuring 6323 proteins
using the SomaScan platform (Supplementary Data [96]1c). PCA showed
clear differences between COVID-19 positive and negative samples
(Fig. [97]2a). We identified 1273 differentially abundant proteins
between COVID-19 positive and negative samples in Wave 1 (86 samples
from 37 COVID-19 positive ESKD patients versus 53 non-infected ESKD
patient samples, LMM) (Supplementary Data [98]1[99]d, Supplementary
Fig. [100]6). In Wave 2, comparison of COVID-19 positive samples
(n = 102 samples from 17 patients) with pre-infection samples from the
same individuals identified 5265 differentially abundant proteins. The
effect sizes were generally concordant between the cohorts (Pearson’s r
0.57) (Supplementary Fig. [101]3b), and 730 proteins were significantly
differentially abundant (1% FDR) in both the Wave 1 and 2 datasets. As
for our transcriptomic analysis, we used RRA to rank the consistency of
differentially abundant proteins across the cohorts (Supplementary
Data [102]1[103]d).
Enrichment analysis revealed upregulation of pathways, including
‘DDX58/IFIH1 mediated induction of interferon-alpha/beta’, ‘Wilk et
al., 2021 IFN module’^[104]31, ‘Host-pathogen interaction of human
coronaviruses interferon induction’ and ‘SARS-CoV-2 innate immunity
evasion and cell-specific immune response’, reflecting host anti-viral
responses and providing validation of our analysis (Fig. [105]2c,
Supplementary Data [106]1[107]e). Highly up-regulated proteins within
these pathways included STAT1; DDX58 and ISG15, both crucial to the
IFN-mediated antiviral response in COVID-19^[108]32; IFITM3, which is
up-regulated in lung epithelial cells during early SARS-CoV-2
infection^[109]33; and the chemokines CXCL11, CXCL1, CXCL6, CXCL5 and
CXCL10. Another significantly up-regulated pathway was
‘Senescence-associated secretory phenotype’, which included
up-regulated ubiquitin-conjugating enzymes (UBE2S, UBE2E1), histones
(H2BC21, H2BU1) and STAT3 (Fig. [110]2d). Down-regulated pathways
included ‘Integrin cell surface interactions’ and ‘Collagen
biosynthesis and modifying enzymes’ which contained collagen proteins
(e.g., COL11A2, COL13A1, COL15A1) and related enzymes (e.g., P4HB,
PCOLCE) (Fig. [111]2d).
Transcriptomic and proteomic changes associated with COVID-19 severity
In both cohorts, the PCA of the PBMC transcriptomics revealed
differences according to both severity at time of sampling and overall
clinical course (defined by peak severity score) (Fig. [112]3a). There
was a gradient of severity reflected in the molecular phenotype. We
next assessed molecular features associated with severity at time of
blood sampling, encoded as an ordinal variable. We identified 3522
genes that were significantly associated with contemporaneous severity
in the Wave 1 cohort and 657 genes in the Wave 2 cohort (LMM, 1% FDR,
Supplementary Data [113]1f, Supplementary Fig. [114]7). 363 genes were
significantly associated in both cohorts. We then applied GSVA to
identify pathways and used RRA to combine results from each cohort
(Supplementary Data [115]1g).
Fig. 3. Association of the PBMC transcriptome and COVID-19 severity.
[116]Fig. 3
[117]Open in a new tab
a PCA of the PBMC transcriptome. Each point represents a sample and is
coloured by contemporaneous COVID-19 WHO severity (left) and overall
clinical course (right). b The 30 GSVA transcriptomic pathway
enrichment terms with the greatest RRA scores. All were significantly
enriched in both Wave 1 and 2 cohorts (1% FDR). Terms are ordered and
coloured by their effect size. Red=up-regulated in more severe
COVID-19; blue=down-regulated. c Violin plots show gene expression
values (Wave 1 cohort; n = 179 samples from 51 COVID-19 positive ESKD
patients and 55 samples from COVID-19 negative ESKD patients)
stratified by COVID-19 status and severity (at time of sample) for
selected genes. Shaded areas indicate kernel density estimates. For
boxplots, centre=median, upper bound=upper quartile (UQ), lower
bound=lower quartile (LQ), upper whisker=largest value at most 1.5 *
IQR (inter-quartile range) from the UQ, lower whisker=smallest value at
most 1.5 * IQR from the LQ, points=samples outside of the range of the
whiskers. All genes shown were significantly associated (1% FDR) with
severity in both the Wave 1 and 2 cohorts. Genes are grouped by
membership to pathways that were significantly enriched (1% FDR) in
GSVA.
The up-regulated transcriptomic pathways in more severe disease
included those involved in oxidative stress (‘Glutathione metabolism’,
‘Detoxification of reactive oxygen species’), ‘Transcriptional
regulation of granulopoiesis’, pathways containing numerous
histone-encoding genes (‘HDACs deacetylate histones’, ‘Diseases of
programmed cell death’, ‘RHO GTPases activate PKNs’) and ‘Complement
and coagulation cascades’ (Fig. [118]3b, c, Supplementary
Data [119]1g). Down-regulated pathway terms included ‘TCRA pathway’,
‘Pathogenesis of SARS-CoV-2 mediated by nsp9-nsp10 complex’, ‘TP53
activity’, and ‘PD1 signalling’, suggesting T cell activation in more
severe COVID-19 (Fig. [120]3b, c, Supplementary Data [121]1g).
PCA of the proteomic data revealed differences according to clinical
severity (Supplementary Fig. [122]8a). We found 148 and 1625 proteins
associated (LMM, 1% FDR) with disease severity in the Wave 1 (86
COVID-19 positive samples) and Wave 2 (102 COVID-19 positive samples)
datasets, respectively (Supplementary Data [123]1[124]h, Supplementary
Fig. [125]9). 98 proteins were associated with severity in both
datasets. Pathway analysis identified 15 severity-associated pathway
terms that reached statistical significance (1% FDR) in both cohorts
(Supplementary Fig. [126]8b, Supplementary Data [127]1[128]I). Among
the most upregulated pathways in more severe disease were ‘HDACs
deacetylate histones’, pathways related to transcriptional regulation
(e.g., ‘mRNA splicing minor pathway’, ‘Spliceosome’, ‘RNA polymerase II
transcription termination’, ‘Processing of capped intron-containing
pre- mRNA’) and ‘RUNX1 regulates genes involved in megakaryocyte
differentiation and platelet function’, while the most down-regulated
pathways included ‘PD-1 signalling’ and ‘T-cell receptor and
costimulatory signalling’. Example proteins from these pathways are
shown in Supplementary Fig. [129]8c.
Severe COVID-19 is associated with dynamic multi-omic modular trajectories
We next examined the temporal trajectories of the transcriptome and the
proteome during COVID-19 by explicitly modelling molecular profiles
with respect to time following symptom onset (Methods). To aid
biological interpretation, we first applied a dimension reduction
strategy using weighted gene correlation network analysis
(WGCNA)^[130]34. WGCNA identified 23 modules of co-expressed genes
(which we denote with the prefix t) (Supplementary Data [131]1j), and
12 proteomic modules (denoted with p) (Supplementary Data [132]1k).
Longitudinal modelling revealed 8 transcriptomic and 5 proteomic
modules with significantly (5% FDR) different temporal patterns in
patients with mild/moderate versus severe/critical disease (LMM time x
clinical course (TxCC) interaction—Methods) (Supplementary
Tables [133]3-[134]4). Typically, the modules displayed a flat temporal
profile in mild/moderate COVID-19, whereas there was a dynamic profile
in severe/critical disease (Fig. [135]4a, Supplementary Fig. [136]10).
Some modules rose with time in severe/critical patients (e.g., tB, tL,
p9 and p12), while others dropped (e.g., tC, tP, tI, p7). Examples of
individual genes from module tB exhibiting this behaviour include MMP9,
ORM1, LRRN1 (Fig. [137]4b).
Fig. 4. Longitudinal profiles of transcriptomic modules.
[138]Fig. 4
[139]Open in a new tab
a The longitudinal profiles of significant (TxCC, LMM, FDR < 0.05) gene
modules, stratified by clinical course. Lines represent estimated
marginal means and shaded areas represent their 95% confidence
intervals. b Modelled longitudinal profiles of the three genes within
module B with the most significant TxCC interaction effects (LMM).
Left: lines represent estimated marginal means and shaded areas
represent their 95% confidence intervals. Right: individual-level data
(n = 169 samples from 40 individuals). c Heatmap displaying
associations (LMM) between transcriptomic and proteomic modules
(right). Red=positive correlation, blue=negative correlation.
Significant associations (5% FDR) are represented by an asterisk.
We identified significant associations between modules, with
transcriptomic and proteomic modules clustered into larger positively
or negatively correlated groupings (Fig. [140]4c). The inter-modular
associations appeared to strongly reflect association with COVID-19
severity at time of sampling (Supplementary Tables [141]3-[142]4),
implying that this is a strong underlying factor in the -omics data.
Consistent with this, integrated analysis of the transcriptomic and
proteomic datasets using MEFISTO^[143]35 revealed a single factor that
had a significantly different trajectory in severe/critical versus
mild/moderate disease (p = 5.4 × 10^-12, LMM TxCC) (Supplementary
Fig. [144]11).
We characterised the modules by pathway analysis (Fig. [145]4a,
Supplementary Tables [146]3-[147]4, Supplementary Data [148]1l,
Supplementary Data [149]1m). We also investigated whether disease
trajectory-associated transcriptomic modules might reflect a shift in
cell-type proportions, estimated using the CIBERSORTx algorithm
(Methods) (Supplementary Fig. [150]12, Supplementary Data [151]1n). The
severity-associated modules tB and tJ were both strongly positively
associated with myeloid cell proportions, particularly neutrophils, and
negatively associated with lymphocyte subsets (Supplementary
Fig. [152]12). The presence of a neutrophilic gene signature in the
PBMC preparations may indicate the presence of low-density
granulocytes. Consistent with this, hub genes in Module tB (including
TECPR2, CSF3R, STX3; Fig. [153]4a) are associated with granulocytes and
autophagy, and pathway analysis of the module genes revealed enrichment
for pathways including ‘Neutrophil degranulation’ and ‘ROS and RNS
production in phagocytes’ (including genes encoding the key cytosolic
components of the phagocyte NADPH oxidase such as NCF1, NCF2 and NCF4).
Module tB also contains genes encoding calcium-binding proteins (e.g.,
S100A6, S100A9, S100A11, S100A12) that play important roles in
regulating inflammatory pathways^[154]36, as well as integrins (e.g.,
ITGA1, ITGAM, ITGB4, ITGAX, ITGAD), adhesion molecules (e.g., CEACAM1,
CEACAM3, CEACAM4, ICAM3), OSM (encoding Oncostatin M) and CSF1
(encoding M-CSF). The tL module, which also displayed a rising
trajectory in worse disease, was strongly positively associated with
imputed plasma cell proportion (Supplementary Fig. [155]12) and many of
its members encoded immunoglobulins. The severity-associated proteomic
modules that strongly correlated with transcriptomic modules tB, tJ and
tL were p8 and p9 (both enriched for pathways related to RNA splicing),
and p12 (significantly enriched for the pathway ‘HDACs deacetylate
histones’) (Supplementary Table [156]4). The latter is consistent with
our earlier observations that a histone pathway signature was
prominently associated with COVID-19 severity in both the RNA-seq
(Fig. [157]3c, Supplementary Fig. [158]7) and plasma proteomic data
(Supplementary Fig. [159]8c).
In contrast to tB, tJ, and tL, the other transcriptomic modules (tP,
tC, tF, tI, tN) all displayed a decreasing trajectory in patients with
worse disease (Fig. [160]4a). These transcriptomic modules tended to be
positively associated with imputed lymphocyte subset proportions and
negatively associated with imputed myeloid proportions, implying that
higher lymphocyte-related gene signatures and lower myeloid-related
ones is a favourable prognostic sign (Supplementary Fig. [161]12). Our
findings are consistent with studies in non-ESKD COVID-19 cohorts that
show that a reduction in lymphoid cell proportion and an increase in
myeloid cell proportion are associated with more severe disease
(e.g.,^[162]8,[163]37).
Flow cytometry identifies markers of enhanced interferon signalling early in
severe disease
To understand whether transcriptional signatures in PBMCs reflected
changes in blood cell proportions, we performed flow cytometry on a
subset of PBMC samples from the Wave 2 cohort. We found no major
difference in the overall proportions of myeloid or lymphoid cells
within the PBMC fraction between pre-infection and COVID-19 positive
samples, except for a reduction in the proportion of type 2 dendritic
cells (Supplementary Fig. [164]13). Similarly, there was little
difference in the distribution of cells between mild/moderate and
severe/critical patients. We observed some severity-related differences
within cell subsets. Within lymphoid cells, we noted higher expression
of the activation marker CD69 on CD4+ T cells at day 7 in
severe/critical disease compared to either pre-infection or
mild/moderate disease (Supplementary Fig. [165]14a). At day 14, there
was an increase in CD38^hi plasmablasts in severe/critical disease
compared to pre-infection or mild/moderate samples (Supplementary
Fig. [166]14b). We also found that in severe/critical patients, there
was a progressive drop in the proportion of non-classical monocytes
over the first 14 days of the illness that was more marked than in
mild/moderate patients (Supplementary Fig. [167]15a). This is
consistent with previous studies in non-ESKD cohorts showing an
association between a decrease in non-classical monocytes and more
severe COVID-19 (e.g.,^[168]8,[169]31,[170]38,[171]39.). In
severe/critical patients there was a greater proportion of intermediate
and non-classical monocyte subsets expressing CD38 compared both to
pre-infection samples and to mild/moderate patients (Supplementary
Fig. [172]15b), likely reflecting enhanced activation^[173]30. In
classical monocytes there was a similar, but non-significant, trend. We
found higher expression of proliferation-associated Ki67 on classical
monocytes in COVID-19 versus pre-infection samples in both
mild/moderate and severe/critical patients (Supplementary
Fig. [174]15c). In our transcriptomic data we identified increased
SIGLEC1 gene expression in COVID-19 (Fig. [175]2b). SIGLEC-1 is
exclusively expressed by CD14 + monocytes at the protein level.
SIGLEC-1 expression measured by flow cytometry correlated with GSVA
enrichment score of type I IFN signatures (Supplementary
Fig. [176]15d). We observed SIGLEC1 expression increased at greater
intensity as early as day 0-3 post infection in severe/critical versus
mild/moderate patients, suggesting stronger and a more immediate type I
IFN response in severe COVID-19 (Supplementary Fig. [177]15e).
Longitudinal cytokine/chemokine analysis reveals distinct temporal profiles
that distinguish disease severity
Many plasma proteins associated with severe COVID-19 are canonically
intra-cellular proteins. Their elevation in severe COVID-19 may
therefore be a readout of increased cell turnover, death, stress, and
viral hijacking of host cellular machinery. Consequently, we performed
a more focussed analysis examining proteins whose primary biological
role is to act extra-cellularly (e.g., cytokines, chemokines, growth
factors and their receptors). These classes of proteins are important
therapeutic targets in inflammatory diseases^[178]40. Accordingly, we
modelled the temporal profiles of 232 proteins that fell within the
KEGG pathway ‘Cytokine-cytokine receptor interaction’. Fifty proteins
had significantly different profiles in patients with a severe/critical
clinical course versus those with mild/moderate ones (TxCC interaction
effect, 5% FDR; Supplementary Data [179]1o). Proteins exhibited
distinct patterns of divergence between severe/critical and
mild/moderate disease over time (Fig. [180]5a). Some (e.g., IL1β, IL6,
IL15RA, CCL2) showed a relatively stable temporal profile in
mild/moderate patients but rising trajectories in severe/critical
patients (Fig. [181]5b). Others (e.g., CCL15, TNFSF13B (BAFF), PDGFRB,
EDAR, IFNA10, IFNA13, IFNA16, IFNE, and IFNL3) were elevated early in
the disease course and decreased over time, but displayed more marked
initial elevations in severe/critical patients (Fig. [182]5c). Yet
other proteins displayed temporal profiles in mild/moderate patients
that were inverted compared to severe/critical. For example, CD40LG,
TNFSF10 (TRAIL) and IL11 were reduced in the severe/critical versus the
mild/moderate group at early timepoints but increased in
severe/critical patients later (Fig. [183]5d). Conversely, leptin,
INHBA (inhibin A), and CCL22 were initially higher in severe/critical
than mild/moderate patients but with the reverse pattern later on
(Fig. [184]5e). These data illustrate the dynamic nature of the soluble
protein response and how this varies according to disease severity,
highlighting the limitation of studies that use a single snapshot.
Comparison of our data to another study utilising longitudinal
proteomic profiling (Filbin et al.^[185]12) in COVID-19 in a more
general patient population revealed similar findings (Supplementary
Material), suggesting that the effects we observed are generally not
specific to ESKD. One exception was EPOR (erythropoietin receptor). In
our ESKD patient data, this exhibited a significantly different
temporal profile in severe/critical versus mild/moderate disease
(p = 1.7 × 10^-8, LMM time x disease course interaction). In contrast,
the temporal profile was not significantly different among differing
disease severity strata in the data of Filbin et al. (p = 0.47, LMM)
(Supplementary Material, Supplementary Fig. [186]16). Erythropoietin
(EPO) is a hormone produced by the kidney that promotes red cell
formation. In ESKD there is loss of EPO production by the kidney, and
consequently patients require exogenous administration of recombinant
EPO. The contrasting EPOR longitudinal profiles between severe and mild
patients in our cohort are likely to reflect the changes in
erythropoietin responsiveness that accompany critical illness and
sepsis in ESKD patients.
Fig. 5. Dynamic temporal changes in circulating cytokines and receptors vary
between severe and mild COVID-19.
[187]Fig. 5
[188]Open in a new tab
a Heatmap displaying proteins with a significantly different temporal
profile in mild vs severe disease (TxCC, LMM, FDR < 0.05). Colour
indicates LMM estimated marginal means over time, stratified by patient
group (n = 169 samples from 40 individuals). Proteins are clustered
based on the temporal profile of the discordance between mild/moderate
and severe/critical disease. Proteins are annotated using gene symbols,
with alternative common protein identifiers in parentheses. b–e
Examples of proteins with differing patterns of discordance over time
in severe/critical versus mild/moderate patients (TxCC, LMM,
FDR < 0.05). Lines represent estimated marginal means and shaded areas
represent their 95% confidence intervals. The raw proteomic profiles
display protein abundance for each individual (n = 169 samples from 40
individuals).
Do immune cell transcriptomic and plasma proteomic signatures of COVID-19
differ between ESKD patients and non ESKD patients?
We next sought to investigate whether the plasma proteomic and PBMC
transcriptomic signatures that we observed in our data were specific to
ESKD patients, or were like those identified in other non-ESKD patient
cohorts. To this end, we compared our RNA-seq results to the COvid-19
Multi-omics Blood ATlas (COMBAT) Consortium study^[189]7. We
re-analysed their RNA-seq data so that the analytical approach was as
similar as possible to that used in the present study (see
Supplementary Material). For the differential gene expression analysis
between COVID-19 positive and negative samples, we found a high degree
of concordance between our study and the COMBAT study (Pearson r 0.7
for comparison of estimated effect sizes; Supplementary Fig. [190]17),
despite the use of whole blood in the COMBAT study versus PBMC in our
study. We observed similar consistency in the pathway-level GSVA
analysis (Supplementary Fig. [191]17). Our results for the association
of gene expression with COVID-19 severity were also generally
consistent with the COMBAT study (r 0.6 for gene-level analysis;
Supplementary Fig. [192]18; see Supplementary Material for more
detail). These findings suggest that similar immune cell transcriptomic
patterns occur in COVID-19, irrespective of whether the patient
has ESKD or not.
We then compared our plasma proteomic data to those of Filbin et
al.^[193]12, which also used the SomaScan proteomic platform and
collected samples at multiple timepoints during acute COVID-19. While
there was still some correlation of effect estimates, this was
generally lower than with the cross-study transcriptomic comparisons
(Supplementary Material; Supplementary Figs. [194]19 and [195]20). This
may indicate a greater impact of ESKD on the circulating proteome in
COVID-19 than the immune cell transcriptome, although we cannot exclude
the possibility that technical factors account for the differences.
In summary, these comparisons reveal that our results are generally
similar to those identified in cohorts of COVID-19 patients without
ESKD, albeit less so for plasma proteomics than for RNA-seq. Despite
the lack of clear systematic differences in the -omic signatures in
ESKD patients and other cohorts, manual review of our significant
results did reveal specific instances of some biologically plausible
examples of ESKD-specific effects, such as the example of EPOR
described above.
Plasma LRRC15 as a marker of COVID-19 severity
We next investigated whether clinical severity could be inferred from
the transcriptomic and/or proteomic data and which had the better
predictive performance. For each COVID-19 patient, we selected the
first sample at the patient’s peak severity score so that there was one
sample per patient. To predict COVID-19 severity at time of sampling,
we employed two supervised learning methods, lasso and random forests.
We applied these separately on (i) the plasma proteomic data; (ii) the
PBMC transcriptomic data; and (iii) the combination of both (the
multi-omic data). For this analysis, we combined the COVID-19 cases
from both cohorts. Area under the curve (AUC) was estimated using Monte
Carlo cross-validation (Methods). As an additional analysis, we trained
models on the Wave 1 cohort and tested on the Wave 2 cohort
(Supplementary Material).
The proteomic-based models outperformed the transcriptome-based ones,
with non-overlapping 95% confidence intervals (Fig. [196]6a,
Supplementary Fig. [197]21a). The lasso model generated on the proteome
had an estimated AUC of 0.93 (versus 0.86 for the transcriptome). The
random forests model generated on the proteome had an AUC of 0.88
(versus 0.83 for the transcriptome). The models based on the proteome
alone also had greater predictive performance than those trained on the
multi-omic data, although the confidence intervals for the AUC
estimates overlapped (Fig. [198]6a, Supplementary Fig. [199]21a).
Fig. 6. Supervised learning to predict COVID-19 severity from molecular
features.
[200]Fig. 6
[201]Open in a new tab
a Point estimates (mean) and 95% confidence intervals of the area under
the receiver operating characteristic curve (AUC-ROC) for predicting
COVID-19 severity (from 200 cross-validation resamples of 51
independent samples) using lasso regression. Both = supervised learning
on the combined proteomic and transcriptomic data. b Important proteins
(left) and genes (right) for the lasso model. Feature importance is
scaled between 0 and 1, where 1 represents the most important feature.
c The profile of LRRC15 plasma protein concentration over time,
stratified by severity of the patients’ overall clinical course
(n = 169 samples from 40 individuals). Left: lines represent estimated
LMM marginal means and shaded areas represent their 95% confidence
intervals. Right: raw data for each individual.
We next examined the supervised learning models to identify the most
important biomarkers of severe/critical disease (Methods)
(Fig. [202]6b, Supplementary Fig. [203]21b, Supplementary
Data [204]1[205]p-r). Although only a minority of the input features to
the multi-omic model were proteins (34%; 6323/18,548), proteins made up
the majority of the top 15 most important predictors (10/15 for lasso
and 9/15 for random forests). This, and our finding that the plasma
proteome was a superior classifier of severity than the PBMC
transcriptome, highlights that plasma proteins provide a valuable
read-out of the pathophysiological processes in severe COVID-19.
Both lasso and random forests identified plasma LRRC15 protein levels
as the most important biomarker of COVID-19 severity. Interestingly,
this protein was recently identified by three pre-prints as a receptor
for SARS-COV-2^[206]41–[207]43. We next examined LRRC15’s longitudinal
trajectory over the course of COVID-19 infection, finding that it
displayed a different temporal profile dependent on the disease course
(p = 6.5 × 10^-8, TxCC interaction, LMM). The concentration was stable
in most individuals with mild/moderate COVID-19 (Fig. [208]6c), whereas
it decreased over time in severe/critical patients. Thus, a snapshot
level of LRRC15 and its dynamic profile over time can convey
information on the current clinical state of the patient and the
overall course of the disease, respectively. Of note, data mining of
external studies revealed similar findings in two other
studies^[209]12,[210]19 that measured LRRC15 in non-ESKD COVID-19
patients (Supplementary Material, Supplementary Fig. [211]22).
Persistent deranged platelet and coagulation pathways in convalescence
For 12 of the 17 patients in the Wave 2 cohort, we obtained a sample
after clinical recovery at approximately two months following the acute
infection. PCA analysis of the PBMC transcriptome showed that while
pre-COVID-19 and convalescent samples appeared more similar than
samples taken during COVID-19, there were differences between the
convalescent samples and their pre-infection counterparts
(Fig. [212]7a), indicating that they have not fully returned to
baseline. Comparison of the convalescent samples to their paired
pre-COVID-19 samples revealed 25 significantly differentially expressed
genes (1% FDR), of which 24 were up-regulated post-COVID-19
(Fig. [213]7b, Table [214]1, Supplementary Data [215]1s). Up-regulated
clotting-related genes included PF4 (encoding platelet factor 4) and
the related gene PF4V1 (platelet factor 4 variant 1). Of note, these
genes are located in the same genomic region on chromosome 4, along
with the chemokine CXCL5, which was also significantly up-regulated.
Another nearby gene, PPBP (encoding Pro-Platelet Basic Protein, also
known as CXCL7), was also up-regulated in convalescent samples,
although it did not quite reach significance at 1% FDR (nominal
P = 3.3 × 10^-5, Benjamini-Hochberg adjusted P = 0.016). The
upregulation of these neighbouring genes suggests they are influenced
by a shared genomic regulatory element. Overrepresentation analysis of
the 25 differentially expressed genes revealed significant enrichment
of terms including ‘Platelet activation, signalling and aggregation’,
‘Formation of fibrin clot/clotting cascade’, ‘Chemokine signalling
pathway’, ‘SARS-CoV-2 innate immunity evasion and cell-specific immune
response’ and ‘Smooth muscle contraction’ (Fig. [216]7c, Supplementary
Data [217]1t). These data suggest persistent activation of abnormal
processes for a considerable time after clinical recovery. In
particular, they implicate the vascular and clotting systems, which may
have implications for long-term risk of thrombosis.
Fig. 7. Persistent dysregulation of immune cell gene expression two months
following COVID-19.
[218]Fig. 7
[219]Open in a new tab
a PCA of the Wave 2 PBMC transcriptomic data, including pre-infection,
infection and recovery samples (taken 2 months after the acute
illness). Each point represents a sample. Arrows link recovery samples
to the pre-infection sample from the same individual. b Paired violin
plots for differentially expressed genes in recovery versus
pre-infection samples (n = 24 samples from 12 individuals). Grey lines
link each individual’s pre- infection sample to their recovery sample;
these samples are represented by points. Shaded areas indicate kernel
density estimates. For boxplots, centre=median, upper bound=upper
quartile, lower bound=lower quartile. c All significantly enriched (5%
FDR) pathway terms for the differentially expressed genes in recovery
versus pre-infection samples. Overrepresentation testing was performed
for each gene set with a one-sided Fisher’s exact test.
Table 1.
Genes that do not return to baseline 2 months after recovery from
COVID-19
Gene ID Gene name Estimate (recovery - pre-infection) P-value
BH-adjusted P-value
ENSG00000236304 lncRNA 1.84 5.37E-08 9.20E-04
FSTL1 Follistatin Like 1 1.17 1.60E-07 1.37E-03
PTGS1 Prostaglandin-Endoperoxide Synthase 1 0.64 3.69E-07 2.11E-03
SPOCD1 SPOC Domain Containing 1 1.31 6.41E-07 2.53E-03
CXCL5 C-X-C Motif Chemokine Ligand 5 1.84 7.39E-07 2.53E-03
ALOX12 Arachidonate 12-Lipoxygenase, 12 S Type 1.33 1.24E-06 3.53E-03
PF4 Platelet Factor 4 1.53 1.71E-06 3.91E-03
ESAM Endothelial Cell Adhesion Molecule 1.18 1.83E-06 3.91E-03
MT-RNR1 Mitochondrially Encoded 12 S RRNA 1.00 2.61E-06 4.13E-03
MMD Monocyte To Macrophage Differentiation Associated 0.77 2.80E-06
4.13E-03
ENSG00000240093 lncRNA -0.71 2.82E-06 4.13E-03
MTURN Maturin, Neural Progenitor Differentiation Regulator Homolog 0.54
3.11E-06 4.13E-03
GNG11 G Protein Subunit Gamma 11 1.53 3.13E-06 4.13E-03
CAVIN2 Caveolae Associated Protein 2 1.15 4.51E-06 5.52E-03
DOK6 Docking Protein 6 1.59 5.00E-06 5.71E-03
LINC00989 lncRNA 1.25 5.76E-06 5.81E-03
SPARC Secreted Protein Acidic And Cysteine Rich 1.65 6.07E-06 5.81E-03
PF4V1 Platelet Factor 4 Variant 1 1.60 6.11E-06 5.81E-03
ABLIM3 Actin Binding LIM Protein Family Member 3 1.41 6.88E-06 6.07E-03
MFAP3L Microfibril Associated Protein 3 Like 0.74 7.09E-06 6.07E-03
CALD1 Caldesmon 1 1.78 7.79E-06 6.23E-03
ITGB5 Integrin Subunit Beta 5 1.10 8.00E-06 6.23E-03
LINC01750 lncRNA 2.00 1.13E-05 8.22E-03
PCSK6 Proprotein Convertase Subtilisin/Kexin Type 6 1.22 1.15E-05
8.22E-03
PVALB Parvalbumin 2.04 1.46E-05 9.97E-03
[220]Open in a new tab
Genes that are significantly differentially expressed (1% FDR, LMM) in
recovery versus pre-infection samples (n = 24 samples from 12
patients). Estimate represents the estimated difference in log CPM
(counts per million). BH=Benjamini-Hochberg. A complete table including
all genes tested can be found in Supplementary Data [221]1s.
Discussion
Here we performed serial blood sampling and longitudinal multi-omic
analysis of ESKD haemodialysis patients with COVID-19, enabling insight
into the pathogenesis of COVID-19 through examination of the temporal
evolution of molecular and cellular changes. ESKD patients are an
important group to study as they are at elevated risk of severe or
fatal disease^[222]25,[223]44. Despite the remarkable success of
vaccination programmes at the population level, ESKD patients display
impaired vaccine responses^[224]27,[225]28. In addition, the majority
of patients in our study were of non-white ethnicity, which is also a
risk factor for severe disease^[226]25.
Most studies of circulating proteins in COVID-19, including our
previous work, have used Olink immunoassay technology^[227]12–[228]14
or mass spectrometry^[229]15,[230]16. The broadest Olink assay system,
used in the study of Filbin et al.^[231]12, measures 1472 proteins,
while mass spectrometry is generally limited to reliable detection of
fewer than 1000 plasma proteins and lacks sensitivity for low abundance
proteins. A small number of studies have employed the aptamer-based
SomaScan v4 platform, that measures 4665 unique
proteins^[232]12,[233]17–[234]20. Here, we used the SomaScan v4.1,
which measures 6323 unique proteins, and complemented this with RNA-seq
and flow cytometry. Our study is strengthened by data from two cohorts
from different waves of the pandemic, and the comparison of samples
from before, during and after COVID-19 from the same individuals.
Plasma proteomics identified several pathways upregulated in COVID-19
related to host defence against viruses, including those previously
described in SARS-CoV-2. Our PBMC transcriptomic analysis identified
numerous pathways that are up-regulated in COVID-19. Many have been
identified in previous studies of COVID-19 in other populations without
ESKD, indicating the presence of common patterns of COVID-19-related
immunological abnormalities. Examples include type 1 interferon
signalling, the complement cascade, and genes reflecting
leukocyte-vascular interactions. Other up-regulated pathways included
‘Polo-like kinase mediated events’ and ‘Golgi-cisternae peri-centriolar
stack re-organisation’. Both are likely to reflect the extensive cell
division of immunocytes that occurs in COVID-19. For instance, the
pericentriolar stacks of Golgi cisternae undergo extensive
fragmentation and reorganization in mitosis. Similarly, polo-like
kinase is crucial for facilitating the G2/M transition. These findings
are consistent with the up-regulation of APC-Cdc20 mediated degradation
of Nek2A and other APC-Cdc20 related processes that we observed in the
proteomic data; Cdc20 is a protein that is key to the process of cell
division.
Transcriptomic and proteomic associations with severe COVID-19
converged on some unifying themes, including enrichment of pathways
related to histones, interferon response, granulopoiesis, clotting, TCR
activation and cell cycle processes. For example, up-regulation of
histone-encoding genes and elevated plasma histone protein levels were
both markers of COVID-19 severity. The increased expression of
histone-encoding transcripts may indicate increased immune cell
proliferation. In each cell cycle, sufficient histones are needed to
package the newly replicated daughter DNA strands, requiring tight
coupling of histone synthesis to the cell cycle^[235]45. Excess
histones within cells can trigger chromatin aggregation and block
transcription^[236]46. Thus, in severe COVID-19, viral hijacking of
cellular machinery may contribute to cellular damage through decoupling
of DNA synthesis and histone transcription. The preponderance of plasma
histone proteins in severe disease is likely to reflect the higher
levels of cell damage and death. The presence of histone proteins in
plasma, however, may represent more than just a marker of disease.
Histones are constituents of neutrophil extracellular traps (NETs)
which contribute to tissue injury in severe COVID-19. In addition,
histones constitute powerful damage associated molecular patterns
(DAMPs) and can perpetuate inflammation via ligation of toll-like
receptors and direct damage to epithelial and endothelial
cells^[237]47. Upregulation of pathways related to control of
transcription and translation was another feature of severe COVID-19
(Supplementary Fig. [238]8b), perhaps reflecting subversion of normal
cell biology by SARS-CoV-2. In keeping with this, studies of cells
infected with SARS-CoV-2 revealed alteration of processes including
translation, splicing and nucleic acid metabolism^[239]48,[240]49.
Modular analysis highlighted a rising neutrophilic gene signature as
the illness progressed in severe/critical patients, with enrichment of
reactive oxygen and nitrogen species pathways. This suggests prolonged
activation of neutrophils and their key effector pathways including NET
formation. This neutrophilic gene signature likely indicates the
presence of low-density granulocytes within the PBMC fraction. Data
from other infections suggest that phagocyte NADPH oxidase-derived
reactive oxygen species can be detrimental in acute viral infection;
mice lacking components of the NADPH oxidase have reduced disease
severity and inflammation in response to influenza and lymphocytic
choriomeningitis virus infection^[241]50–[242]52.
Cytokines and their receptors play a major role in the pathogenesis of
inflammatory diseases and are important targets of existing
drugs^[243]40. Longitudinal examination of plasma cytokines/chemokines
revealed divergence temporal trajectories between disease severity
strata, manifesting in several patterns (Fig. [244]5). For example, in
patients with a severe/critical disease course, IL11 was reduced early
on but increased later relative to more indolent disease
(Fig. [245]5d). IL11 is known to cause progressive
fibrosis^[246]53,[247]54, and the marked increases late in
severe/critical disease may have implications for the development of
pulmonary sequelae. Leptin, INHBA (inhibin A), and CCL22 showed the
opposite pattern (Fig. [248]5e). Leptin has roles in both cell
metabolism and immunity with many immune cells responding to leptin
directly via the leptin receptor, resulting in a pro-inflammatory
phenotype^[249]55. It is produced by adipocytes, so its elevation early
in severe/critical disease may be a read-out of higher body mass index,
which is a risk factor for severe COVID-19, or increased cell
metabolism/turnover. Its fall over time in severe/critical patients may
reflect weight loss and cell death. Whether leptin is also directly
influencing risk of severe disease through its immunological effects is
unclear. Inhibin-A progressively increased over time in mild/moderate
patients but fell in severe/critical patients. Inhibin-A negatively
regulates dendritic cell maturation and promotes a tolerogenic
phenotype^[250]56. Failure to upregulate it later in the disease course
may therefore contribute to deleterious inflammation. Similarly, CCL22
plays an important role in switching off inflammation. CCL22 promotes
dendritic cell-regulatory T cell interactions and CCL22 deficiency is
associated with excessive pathogenic inflammation in mice^[251]57.
Proteins in the type 1 interferon (IFN) pathway were higher in
severe/critical than mild/moderate patients early in disease
(Fig. [252]5c), suggesting a paradoxical role of this pathway in
COVID-19. While inherited or acquired deficiencies of IFN proteins
predispose to risk of severe COVID-19^[253]58,[254]59, our data suggest
that the picture may be more complex. Thus, IFNs may act as a
double-edged sword, with harm to the host from both insufficient
responses (leading to failure to control the virus) and from excessive
responses (resulting in immunopathology). While we cannot exclude the
possibility that increased IFNs is a consequence rather than a cause of
severe disease, their elevation very early in disease suggests this is
less likely. Another consideration is that the greater IFN response in
severe disease might reflect higher viral burden.
Using two distinct supervised learning methods, we observed that the
plasma proteome better captures disease severity than the PBMC
transcriptome. When supervised learning algorithms were trained on both
the proteomic and transcriptomic data simultaneously, plasma proteins
dominate the list of important biomarkers. There are several reasons
why this might be the case. Plasma is under strong homoeostasis:
derangement is a marker of loss of physiological control. Plasma
proteins may provide important read-outs of both pathogenesis and
tissue injury by reflecting the activity of cell types other than
PBMCs, such as neutrophils, endothelium and hepatocytes (a major source
of coagulation and complement proteins). In apparent contrast to our
findings, a study by Lee et al.^[255]9 involving immune cell
transcriptomics and plasma metabolomics using mass spectrometry (MS)
reported that the combination of transcriptome and metabolome provided
superior classification of severity. However, it is likely that the
difference between the study of Lee et al. and our findings relate to
what was measured (i.e., MS-based metabolomics versus measurement of
6,323 proteins).
Our integrated multi-omics analysis with MEFISTO revealed a single
factor that had a significantly different trajectory in severe/critical
versus mild/moderate disease (Supplementary Fig. [256]11a). This
parallels the findings of Su et al.^[257]8. While the specific methods
used differ, both our MEFISTO analysis and the integrative network
analysis by Su et al. identified a single factor in the data that was
highly related to COVID-19 severity and pro-inflammatory cytokines.
Comparison to other transcriptomic and plasma proteomic studies in
non-ESKD patients revealed broadly similar findings. Such inter-study
comparisons have inherent limitations as it is not possible to
distinguish whether study-specific findings are biological or are due
to differences in study design, statistical power, assay platforms and
other sources of non-biological variation. With these caveats in mind,
our transcriptomic findings were remarkably similar to those of the
COMBAT study^[258]7 (despite PBMC being measured in our study versus
whole blood in their study). There was lower concordance of our plasma
proteomic results to those of Filbin et al.^[259]12 than for the
transcriptomic comparison. This could reflect differences in study
design (unlike our controls, those of Filbin et al. presented with
acute respiratory distress) or technical differences (the study by
Filbin et al. used an earlier version of the SomaScan platform) but
could also be biological as it is known that circulating proteins are
affected by renal impairment^[260]60–[261]63.
One finding that may be specific to ESKD is the dynamic temporal
profile of the erythropoietin receptor (EPOR) in severe/critical
COVID-19 versus a more stable profile in mild/moderate disease. This is
likely to reflect the changes in erythropoietin responsiveness that
accompany critical illness and sepsis in ESKD patients. Cytokines
affect the EPO- mediated signalling pathway^[262]64 and inhibit the
expression and regulation of specific transcription factors involved in
the control of erythrocyte differentiation. For instance, high
concentrations of TNF-α or IFN-gamma cause the need for higher amounts
of EPO to restore the formation of erythrocyte colony forming
units^[263]65. In haemodialysis patients, inflammation decreases the
response to erythropoiesis stimulating agents, changing iron regulation
through hepcidin upregulation and facilitating haemolysis^[264]66. This
EPO-hyporesponsiveness (assessed in terms of haemoglobin) in the
setting of acute inflammation has been empirically demonstrated in a
large multi-national study^[265]67. Furthermore, patients with ESKD
rely on regular exogenous administration of erythropoiesis-stimulating
agents and this can be disrupted when patients are admitted to
hospital. Dysregulation of the erythropoietin pathway in severe
COVID-19 in ESKD patients may also be relevant to immune function since
erythropoietin is known to effect both innate and adaptive
immunity^[266]68.
A notable finding was the identification of plasma levels of LRRC15 as
a marker of COVID-19 severity (Fig. [267]6b). Longitudinal profiling
revealed that LRRC15 levels remain stable in those with a mild/moderate
clinical course but decrease over time in severe/critical illness
(Fig. [268]6c). Data mining of previous studies revealed similar
findings in two non-ESKD cohorts^[269]12,[270]19. Three recent
pre-prints using a variety of cell lines and approaches have identified
LRRC15 as a SARS-CoV-2 co-receptor^[271]41–[272]43. Using arrayed
transmembrane protein and pooled genome-wide CRISPR activation screens,
Shilts and colleagues demonstrated that the SARS-CoV-2 spike protein
interacts with LRRC15^[273]41. Both screens identified the interaction
and the CRISPRa screen identified LRRC15 and the established SARS-CoV-2
binding partner, ACE2, as the two most prominent interactors. This work
also showed that ACE2 and LRRC15 bind the C-terminal domain of the
spike protein, which contains the receptor binding domain, suggesting
that the two proteins may compete for spike protein binding. Loo et
al.^[274]43 performed a CRISPRa screen on HEK293T cells, also
identifying LRRC15 and ACE2 as the highest confidence SARS-Cov2
receptors. They propose that LRRC15 plays an actively inhibitory role,
binding SARS-CoV-2 but not allowing entry to the cell. They further
hypothesise that it does so in trans through its high expression on
fibroblasts rather than alveolar cells. Song and colleagues^[275]42
also used a CRISPRa approach to identify proteins that could bind the
SARS-CoV-2 spike protein to the A375 melanoma cell line. The screen
identified ACE2 and LRRC15, and further showed that the interaction
took place with the receptor binding domain of the spike protein.
Expression of LRRC15 on a HeLa cell line that expresses ACE2 inhibited
the entry of a SARS-CoV-2 spike pseudovirus. This paper also notes that
LRRC15 is expressed on different cells from those that express ACE2 and
proposes that LRRC15 inhibits virally entry in trans, acting as a decoy
and binding virions that cannot then enter cells via ACE2. Our data
provide in vivo human evidence to suggest LRRC15 may be important in
the host response to SARS-CoV-2, and are consistent with a model in
which a failure to up-regulate LRRC15 increases risk of severe COVID-19
disease because of the lack of a receptor that inhibits its entry to
cells.
A strength of our study was the availability of baseline pre-infection
samples for the Wave 2 cohort, as well as samples taken two months
after the acute COVID-19 episode. Leveraging this, we demonstrate that
there is chronic activation of gene expression related to vascular,
platelet and coagulation pathways for a prolonged period after clinical
resolution of disease. The elevated risk of thrombotic events during
acute COVID-19 is well-documented. In a large study encompassing both
hospitalised and non-hospitalised patients^[276]69, the risk of
pulmonary embolism (PE) and deep vein thrombosis (DVT) were 27-fold and
17-fold increased, respectively, in the seven days following diagnosis.
These risk ratios are much higher than those previously associated with
upper respiratory tract infections, suggesting unique features specific
to SARS-CoV-2 infection. The risk of arterial thrombosis was also
significantly increased, although smaller in magnitude than the risk of
venous thromboembolism (VTE). The pathophysiology underlying COVID-19
associated coagulopathy is complex and involves the convergence of
several pathways^[277]70. Invasion of ACE2-expressing epithelial cells
by SARS-CoV-2 results in down-regulation of ACE2 and increased
angiotensin II levels. This in turn leads to increased expression of
PAI1 which impairs breakdown of fibrin and promotes increased vascular
tone, via smooth muscle contraction. Endothelial cell activation,
complement activation, NETosis, hypoxia and cytokine/chemokine
secretion all promote coagulopathy through increases in tissue factor
and concomitant fibrin formation. Our data suggest that these pathways
remain dysregulated months after acute infection has resolved
(Fig. [278]7, Table [279]1). This is important given emerging evidence
indicating that the risk of thrombo-embolism extends beyond the acute
phase. Ho et al. showed that risk of a PE was 3.5-fold higher even in
the time window 28 to 56 days after diagnosis of COVID-19^[280]69. A
recent population-wide registry study revealed that following COVID-19
the risk of DVT and PE was significantly elevated for 70 and 110 days,
respectively^[281]71. Although VTE risk was greatest for those with
severe disease, even patients with mild disease had elevated VTE risk.
Our data provide a molecular basis that begins to explain this risk.
Among the genes up-regulated in convalescent samples compared to
pre-infection was platelet factor 4 (PF4). PF4 is expressed in
platelets and leucocytes. It is released from the alpha granules of
activated platelets, contributing to platelet aggregation. The
prolonged up-regulation of PF4 after COVID-19 is therefore likely to
contribute to a prothrombotic state. Of note, autoantibodies to PF4 are
the pathogenic entity in both vaccine-induced thrombotic
thrombocytopenia (VITT)^[282]72,[283]73 and heparin-induced
thrombocytopenia (HIT). PF4 becomes an autoantigen when it forms
complexes with adenoviral vaccine components or heparin respectively,
unmasking epitopes to which autoantibodies bind^[284]74. It will
therefore be interesting for future studies to investigate whether
autoantibodies to PF4 might contribute to post-COVID-19 thrombosis in
some patients. Whether the molecular abnormalities found in our study
also apply to more general patient populations without background ESKD
needs to be determined. Ongoing studies focusing on the sequelae of
COVID-19 are well placed to address this.
Our study has several limitations. It was a single centre study and so
lacked a truly independent external validation cohort. ESKD patients
have considerable multi-morbidity and deranged physiology, and our
findings may not all be generalisable to other patient populations. We
lacked a comparator group of ESKD patients with another viral infection
to delineate COVID-19 specific features. We studied peripheral blood;
while this can provide valuable information, it does not always reflect
processes at the site of tissue injury. We performed bulk RNA-seq on
PBMCs. Thus, transcriptomic signatures may reflect both changes in gene
expression and also alteration in the distribution of cell types within
PBMCs. We mitigated this issue through use of deconvolution methods and
flow cytometry, but future studies using single cell RNA-seq and
CITE-seq will provide further granularity. We did not have measurements
of viral load which would have aided interpretation of the magnitude of
host responses (e.g., interferon signalling). Finally, the convalescent
samples were taken relatively soon after clinical recovery: it will be
important for future studies to establish how long molecular
abnormalities persist.
In summary, we demonstrate dynamic transcriptomic, proteomic and
cellular signatures that vary both with time and COVID-19 severity. We
show that in patients with a severe clinical course there is increased
type 1 interferon signalling early in the illness, with increases in
pro-inflammatory cytokines later in disease. We identify plasma levels
of the proposed alternative SARS-CoV-2 receptor, LRRC15, as a marker of
COVID-19 severity. Finally, we show that immune cells display
dysregulated gene expression two months following COVID-19, with
upregulation of clotting-related genes. This may contribute to the
prolonged thrombotic risk post-COVID-19.
Methods
Patient cohorts and ethical approval
All participants were recruited from the Imperial College Renal and
Transplant Centre and its satellite dialysis units, London, United
Kingdom, and provided written informed consent prior to participation.
Study ethics were reviewed by the UK National Health Service (NHS)
Health Research Authority (HRA) and Health and Care Research Wales
(HCRW) Research Ethics Committee (reference 20/WA/0123: The impact of
COVID-19 on patients with renal disease and immunosuppressed patients).
Ethical approval was given. Study volunteers provided informed consent
and did not receive financial or other compensation for participating
in the study.
We recruited two cohorts of ESKD patients with COVID-19 (Fig. [285]1a).
All patients were receiving haemodialysis prior to acquiring COVID-19.
The first cohort (Wave 1) were recruited during the initial phase of
the COVID-19 pandemic (April–May 2020). Blood samples were taken from
53 patients with COVID-19 (Supplementary Table [286]1). Serial blood
sampling was carried out where feasible (Fig. [287]1b), given the
pressure on hospital services and the effects of national lockdown. We
also contemporaneously recruited 59 non-infected haemodialysis patients
to provide a control group, selected to mirror the age, sex and
ethnicity distribution of the COVID-19 cases (Supplementary
Fig. [288]1a–c).
The Wave 2 cohort consisted of 17 ESKD patients with COVID-19 infected
during the resurgence of cases in January–March 2021 (Supplementary
Table [289]2). These 17 individuals had all been recruited as part of
the COVID-19 negative control group during Wave 1, and so a
pre-infection sample collected in April/May 2020 (8–9 months preceding
infection) was also available. For the Wave 2 cohort, we systematically
acquired serial samples for all patients at regular intervals (every
2–3 days over the course of the acute illness) (Fig. [290]1c).
Additionally, for 12 of these 17 patients, we acquired convalescent
samples at approximately 2 months post the acute COVID-19 episode
(range 41-55 days from the initial sample). Convalescent samples were
unavailable for four patients who died and for one patient due to
logistical difficulties in sample collection.
To minimise variation related to the timing of dialysis, blood samples
were taken prior to commencing a haemodialysis session.
Clinical severity scoring
We assessed disease severity using a four-level ordinal score,
categorising into mild, moderate, severe, and critical, based on the
WHO clinical management of COVID-19: Interim guidance 27 May 2020. Mild
was defined as COVID-19 symptoms but no evidence of pneumonia and no
hypoxia. Moderate was defined as symptoms of pneumonia or hypoxia with
oxygen saturation (SaO[2]) greater than 92% on air, or an oxygen
requirement no greater than 4 L/min. Severe was defined as SaO2 less
than 92% on air, or respiratory rate more than 30 per minute, or oxygen
requirement more than 4 L/min. Critical was defined as organ
dysfunction or shock or need for high dependency or intensive care
support (i.e., the need for non-invasive ventilation or intubation). We
recorded disease severity scores throughout the illness, such that
samples from the same individual could have differing severity scores
according to the temporal evolution of the disease. We defined the
overall clinical course for each patient as the peak severity score
that occurred during the patient’s illness. Different downstream
analyses utilise either the severity at the time of sample (i.e., the
sample-level severity) or the overall clinical course (i.e., the
patient-level severity), as described in the relevant sections below.
PBMC collection protocol
Peripheral blood mononuclear cells (PBMCs) were obtained by density
gradient centrifugation using Lymphoprep (STEMCELL Technologies,
Canada). Approximately 20 ml of blood were diluted 1× with phosphate
buffered saline (PBS) with addition of 2% FBS and layered on top of
15 ml of Lymphoprep solution. The samples were then centrifuged at
800 g for 20 min at room temperature without break. PBMCs were
collected from the interface and washed twice with PBS/2%FBS. In total
2 million PBMCs were centrifuged down to form a pellet and resuspended
in 350 µl RLT buffer + 1% β-Mercaptoethanol (from Qiagen RNAeasy kit)
for RNA extraction. Remaining PBMCs were cryopreserved in 1 ml freezing
medium (FBS 10% DMSO) and stored in –80 °C freezer.
Plasma collection
5 ml of blood was collected in EDTA tubes and centrifuged at 1000 × g
for 15 min. Plasma was extracted and frozen at –80 °C.
RNA-seq of PBMCs
RNA extraction and sequencing were done at GENEWIZ facilities (Leipzig,
Germany). Total RNA was extracted from using RNeasy Mini kits (Qiagen)
as per the manufacturer’s instructions, with an additional purification
step by on-column DNase treatment using the RNase-free DNase Kit
(Qiagen) to remove any genomic DNA. Total RNA quality and concentration
was analysed using Agilent Tapestation (Agilent Tech Inc.). Samples
with RIN values ≥6.0 and ≥100 ng of total RNA were used to generate
RNA-seq libraries. RNA-seq libraries were made using NEBnext ultra II
RNA directional kit per the manufacturer’s instruction. Poly-A RNA was
purified using poly-T oligo-attached magnetic beads followed by
haemoglobin mRNA depletion using QIAseq FastSelect Globin Kit to remove
potential contaminating RNA from red blood cell. Then, first and second
cDNA strand synthesis was performed. Next, cDNA 3′ ends were adenylated
and adapters ligated followed by library amplification. The libraries
were size selected using AMPure XP Beads (Beckman Coulter), purified
and their quality was checked using a short sequencing run on MiSeq
Nano. Samples were randomized to avoid confounding of batch effects
with clinical status and multiplexed libraries were run on 29 lanes of
the Illumina HiSeq platform to generate approximately 30 million x
150 bp paired-end reads per sample.
Initial quality control and alignment was performed using the nf-core
RNA-seq v3.2 pipeline^[291]75 based on nextflow^[292]76, a workflow
management system. FastQC^[293]77 was used to evaluate and merge paired
reads prior to adapter trimming using Trimgalore^[294]78. We used
STAR^[295]79 to align reads to GRCh38 and htseq-count^[296]80 to
generate a counts matrix.
For the Wave 1 cohort, transcriptomic data were available for 179
samples from 51 COVID-19 positive ESKD patients (median 3 samples per
patient, range 1–8) (Supplementary Fig. [297]1d), plus 55 non-infected
ESKD patient samples. For the Wave 2 cohort (17 patients),
transcriptomic data were available for 90 samples collected during
acute COVID-19 infection (median of 6 samples per patient, range 3–7),
plus 17 pre-infection samples and 12 convalescent samples.
Prior to further analysis, genes with insufficient counts were removed
using edgeR’s filterByExpr function^[298]81; for differential
expression analyses, the ‘group’ argument was set to the main group of
interest. For all analyses, gene expression was TMM normalised^[299]82,
converted to counts per million (CPM) and log-transformed. We primarily
used ENSEMBL identifiers^[300]83, however for plots we report the HGNC
gene ID^[301]84 where available. For analyses that considered multiple
proteins simultaneously (PCA, WGCNA, MEFISTO, supervised learning), we
additionally: i) removed genes with low variance (33% of genes with the
lowest maximum absolute deviation) using the M3C package^[302]85; ii)
centred and scaled the data.
Plasma proteomics
We performed proteomics on EDTA plasma samples using the aptamer-based
SomaScan platform (Somalogic, Boulder, Colorado, USA). The SomaScan
v4.1 assay contains 7288 modified-aptamers (Somamers) that target human
proteins. Since more than one aptamer may target the same protein,
these 7288 aptamers map to 6347 unique proteins. 48 Somamers were
removed due to QC failure, so the final dataset contains 7240 Somamers
representing 6323 unique proteins. We annotated these proteins using
the Human Protein Atlas^[303]86; 4980 proteins were labelled as
intracellular, 1586 were annotated as membrane proteins and 1160 as
secreted (Supplementary Figure [304]23A). Many proteins were labelled
as both intracellular and as membrane or secreted, reflecting the
biology of protein storage and extra-cellular secretion/excretion
(Supplementary Figure [305]23B).
We report proteins by their corresponding HGNC gene ID^[306]84, which
provides a more standardised nomenclature compared to protein names and
allows direct comparison with the transcriptomic data.
Where multiple Somamers related to the same protein, we retained these
Somamers for univariate analyses such as differential abundance
analyses. However, for analyses that considered multiple proteins
simultaneously (PCA, WGCNA, MEFISTO, supervised learning), we selected
one Somamer at random to represent each protein. One COVID-19 positive
sample in the wave 2 cohort failed QC and was excluded from the
analyses. The expression values for each Somamer were inverse-rank
normalised prior to downstream analyses.
For the Wave 1 cohort, proteomic data were available for 86 samples
from 37 COVID-19 positive ESKD patients (median 3 samples per patient,
range 1-3), plus 53 non-infected ESKD patients. For the Wave 2 cohort
(n = 17 patients), following QC, proteomic data were available for 102
samples collected serially during acute COVID-19 infection (median of 6
samples per patient, range 5–7) and 16 pre-infection samples. For one
patient, a pre-infection plasma sample was unavailable.
Statistics and reproducibility
No statistical method was used to predetermine sample size. Analysts
were not blinded to COVID-19 status or severity.
Differential expression analyses: COVID-19 positive versus negative
We compared COVID-19 positive and negative patients using linear mixed
models (LMM), which account for serial samples from the same
individual^[307]87. Age, sex, and ethnicity were included as
covariates. A random intercept term was used to estimate the
variability between individuals in the study and thus account for
repeated measures. We performed differential expression analyses for
the transcriptomic data and the proteomic data. The regression model
for these analyses in Wilkinson-style notation was:
E ~ covid_status + sex + age + ethnicity + (1 | individual)
Where, E represents expression (gene or protein, depending on the data
type being analysed) and covid_status was a categorical variable
(COVID-19 positive or negative).
For differential expression of proteins, we applied LMM using the
lmerTest package^[308]88. Differential gene expression analysis was
performed using the same model formula, applied using the differential
expression for repeated measures (dream) pipeline^[309]89 in the
variancePartition package^[310]90. For all data types, we fitted LMM
using restricted maximum likelihood (REML) and calculated P-values
using a type 3 F-test, in conjunction with Satterthwaite’s method for
estimating the degrees of freedom for fixed effects^[311]88. Multiple
testing correction was performed using the Benjamini-Hochberg method
and a 1% FDR used for the significance threshold.
The Wave 1 cohort was analysed separately to the Wave 2 court. For Wave
1, we compared samples from COVID-19 positive ESKD patients to COVID-19
negative ESKD patients. For Wave 2, we compared samples from COVID-19
positive ESKD patients to samples from these patients taken
approximately 8 months prior to infection.
When reporting the number of differentially expressed proteins in the
text we refer to the number of unique proteins rather than the number
of significant Somamers.
Testing transcriptomic and proteomic features for association with COVID-19
severity
We performed a within-cases analysis, testing for the association of
gene expression with COVID-19 severity at time of sampling. We used the
four-level WHO severity rating (mild, moderate, severe, critical),
which could vary between samples from the same individual reflecting
the clinical status at the time the same was taken. We again used a
linear mixed model to account for samples from the same individual. The
regression model was:
[MATH: E~covid_severity+sex+age+ethnicity+(
1∣individual)
:MATH]
The covid_severity variable represents severity at the time of the
sample and was encoded using orthogonal polynomial contrasts to account
for ordinal nature of severity levels.
COVID-19 positive samples from the Wave 1 cohort were analysed
separately to those from the Wave 2 cohort.
The same approach was used for the proteomics data.
Gene set variation analysis
To identify pathways that were up- or down-regulated in COVID-19
positive versus negative samples, we applied gene set variation
analysis (GSVA)^[312]30. To define gene sets, we used the MSigDb C2
canonical pathways^[313]91; we discarded sets with less than ten genes.
We additionally included a gene set for the peripheral immune response
defined for patients with severe COVID-19^[314]31 and a set of type 1
interferons active in patients with systemic lupus erythematosus
(SLE)^[315]92. After reduction of genes into gene sets, we then
performed testing for dysregulated pathways using the same linear mixed
modelling approach as for the differential gene and protein expression
analyses. P-values were adjusted by Benjamini-Hochberg, with a
significance threshold of 1% FDR.
To dissect out the key molecules underpinning enriched pathways, we
examined the genes that comprise these pathway terms and identified
which of these featured most prominently in the differential gene
expression analysis.
We repeated this procedure for testing of association of pathways with
severity at the time of sample using the 4-level ordinal score.
We then applied the same approach to the proteomics data for the
COVID-19 positive versus negative analysis, and for testing
associations with COVID-19 severity at the time of sample.
Robust rank aggregation
The Wave 1 and Wave 2 cohorts were analysed separately for both the
differential expression analyses between COVID-19 positive and negative
samples and for the within-cases severity analyses. To identify the
associations that were most consistent between the Wave 1 and Wave 2
cohorts, for each analysis, we integrated the P-values for each cohort
using robust rank aggregation (RRA)^[316]93. This method identifies
features that are ranked higher than expected across multiple lists.
RRA generates a significance score analogous to a P-value; we -log10
transform these values such that a larger score indicates more
consistent associations between the Wave 1 cohort and the Wave 2
cohort. RRA was applied to the results of the transcriptomic, proteomic
and GSVA analyses comparing COVID-19 positive versus negative samples
from Wave 1 and Wave 2. Similarly, it was applied to the analyses
testing for association of molecular features with COVID-19 severity at
the time of sampling.
Modelling modular longitudinal trajectories
We examined the temporal trajectories of the transcriptome following
infection, by explicitly modelling molecular markers with respect to
time following COVID-19 symptom onset. We used a two-step approach.
Step 1. To aid biological interpretation, we first applied a dimension
reduction strategy using weighted gene correlation network analysis
(WGCNA)^[317]34 to identify modules of correlated molecular features.
For this analysis, we combined samples from the Wave 1 and Wave 2
cohorts. Additionally, since our goal was to perform longitudinal
analysis, we only selected patients who had been sampled at least three
times prior to 21 days following COVID-19 symptom onset. The default
implementation of WGCNA is not designed for use with non-independent
samples^[318]94, so we modified the analysis pipeline by generating a
correlation matrix using a repeated measures correlation metric
(rmcorr) that is appropriate for repeated measures^[319]95. We used
WGCNA’s pickSoftThreshold from similarity function to pick the minimum
soft-thresholding power that satisfied the minimum scale free topology
fitting index (R^2 > 0.85) and maximum mean connectivity (100). We
subsequently defined signed adjacency and topological overlap matrices
before applying average-linkage hierarchical clustering. We cut this
tree with a hybrid dynamic tree cutting algorithm, with the parameters
deepSplit = 4 and minClusterSize = 30^[320]96. Finally, we defined
eigengenes for each module and merged those with a distance less than
0.25. The eigen-genes provide a numerical representation for each
module of co-expressed genes.
We used the same approach to analyse the proteomic data.
Step 2. To examine the trajectory of each module over time, we fitted a
linear mixed model with time from symptom onset as an independent
variable and the eigengene (or eigenprotein in the case of proteomic
modules) as the dependent variable. Time was defined for each sample as
time from first symptoms; where date of first symptoms was not
available, we instead used date of first positive swab. Samples that
were taken more than 21 days from each individual’s baseline date were
excluded. We used R’s bs function to fit a polynomial spline of degree
two to model the expression of modules with respect to time from
baseline^[321]97. To test whether modules displayed different temporal
patterns according to the overall clinical course of COVID-19 (defined
as a binary variable indicating whether the peak WHO severity score was
mild/moderate or severe/critical), we included clinical course as a
covariate in the model, and an interaction term between time from
symptom onset and clinical course (TxCC).
The regression model used is displayed using Wilkinson-style notation
below.
[MATH: eigenexpression~clinical_course*time+sex+age+ethnicity+wave+(1∣individual)
:MATH]
We extracted the P-values for the TxCC term in this model and applied
Benjamini–Hochberg adjustment, using 5% FDR as the significance
threshold. A significant interaction effect for the TxCC term indicates
that the module has a different temporal profile in mild/moderate
versus severe/critical disease.
Additional WGCNA module annotation and association testing
To better understand the biological information reflected in the
transcriptomic and proteomic modules, we further characterised them
through a multi-pronged analytical strategy. We tested association of
eigen-genes and eigen-proteins with other variables. First, we tested
for the association of the modules with WHO severity at the time of the
sample using the LMM approach described above in subsection Testing
transcriptomic and proteomic features for association with COVID-19
severity, i.e.:
[MATH: E~covid_severity+sex+age+ethnicity+wave+(1∣individual)
:MATH]
Second, since PBMCs represent a mixed population of immune cells, we
investigated whether disease trajectory-associated transcriptomic
modules might reflect shift in cell type proportions. To this end, we
applied CIBERSORTx, a computational algorithm to impute immune cell
fractions from RNA-seq data (see subsection Cell fraction imputation
below). We then tested for correlations between these imputed immune
cell proportions and module eigengenes using LMM:
[MATH: eigenexpression~cell_fraction+sex+age+ethnicity+wave+(1∣individual)
:MATH]
Both these models included an additional fixed effect (‘wave’) to
reflect the cohort.
Third, we performed pathway enrichment analysis on the modules using
the R package clusterProfiler’s ‘enricher’ function^[322]98. Gene sets
were defined using MSigDB C2 canonical pathways^[323]91.
Lastly, to understand the relationship between the transcriptomic and
proteomic modules, we performed correlation analysis using LMMs.
5% FDR was used for statistical significance for these analyses.
Cell fraction imputation
We used CIBERSORTx^[324]99 to impute cell fractions from the normalised
bulk RNA-seq dataset. The program was run with default parameters We
inferred the cell fractions of 22 immune cell types in the isolated
PBMCs of each sample using the LM22 signature matrix file^[325]100.
Multi-omic longitudinal factor analysis with MEFISTO
MEFISTO^[326]101 is an extension of Multi-Omics Factor Analysis (MOFA)
that can exploit temporal relationships between samples to find factors
that change over time (from baseline). We used this method to find
joint factors of variation in the transcriptomic and proteomic
datasets. For the MEFISTO analysis, we used the same set of samples as
in the network analysis and applied the same pre-processing steps to
the data (see Methods—network analysis). Additionally, we removed genes
with the lowest maximum absolute deviation^[327]85 such that the number
of genes retained were equal to the number of unique proteins measured
(6,323) to avoid imbalance numbers of features between the
transcriptomic and proteomic data which can impact the MEFISTO
algorithm. Using the ‘slow’ convergence criterion, MEFISTO identified 8
factors that had a minimal variance explained of 1% in at least one
data modality.
We then applied the longitudinal model described earlier to test for an
interaction effect between time from first symptoms and clinical
course, with a latent factor identified by MEFISTO as the dependent
variable. The regression model used is displayed using Wilkinson-style
notation below:
[MATH: latent_factor~clinical_course*time+sex+age+ethnicity+wave+(1∣individual)
:MATH]
Longitudinal modelling of cytokines and cytokine receptors
We modelled the temporal profiles of 232 plasma proteins that fell
within the KEGG pathway ‘Cytokine-cytokine receptor interaction’. As
for the longitudinal analyses described earlier, we used a linear mixed
model with a time x clinical course interaction term.
[MATH: P~clinical_course*time+sex+age+ethnicity+wave+(1∣individual) :MATH]
P values for the time x clinical course interaction were extracted and
adjusted for multiple testing with the Benjamini-Hochberg procedure,
with significance threshold of 5% FDR.
Supervised learning
The goal of this analysis was to predict clinical severity from the
molecular features (transcriptomic, proteomic or both). We performed
supervised learning using the R caret framework^[328]102; caret uses
the randomForest package to fit random forest models and
glmnet^[329]103 to fit lasso models. For this analysis, we only
included samples on which both transcriptomics and proteomics had been
performed. We then selected the earliest sample for each individual at
which they had reached their peak COVID-19 WHO severity score, so that
there was one sample per patient. We then categorised the clinical
severity score corresponding to each sample into a binary variable such
that patients with a WHO severity score of mild or moderate were
considered mild/moderate and those with a WHO score of severe or
critical were considered severe/critical. This resulted in n = 37
mild/moderate samples and n = 14 severe/critical samples.
We trained models using Monte Carlo cross-validation for: (i) the
plasma proteomic data alone (6323 features); (ii) the PBMC RNA-seq data
alone (12,225 features); and (iii) the combined proteomic and RNA-seq
datasets. The first step in this training process was to create 200
random partitions of the data, such that 80% of the data was used to
train the model in each resample and 20% was retained as a validation
set. In each resample, we calculated the area under the curve (AUC) of
the receiver operating characteristic (ROC) curve. We then calculated
confidence intervals for the 200 AUC-ROC values generated for each
model and feature type.
The random forest model’s parameters were kept constant at 500 trees
and the mtry value (number of proteins randomly sampled as candidates
at each node) was calculated as the square root of the number of
features. After cross-validation, we fitted a final random forest model
using the entirety of the dataset. We extracted important features from
this model using the R randomForestExplainer package, based on the
accuracy decrease metric (the average decrease in prediction accuracy
upon swapping out a feature). For the lasso model, the lambda value
that maximised the mean AUC-ROC during cross-validation was selected.
We recorded the features selected by the lasso model in each data
resample; feature importance was subsequently defined as the number of
models in which each feature had a non-zero coefficient. The feature
importance metrics from both models were scaled by dividing their
values by the maximum value, such that the most important feature has
an importance metric of 1.
Differential gene expression analysis: pre-infection versus recovery samples
For the 12 individuals in the Wave 2 cohort for whom we collected a
convalescent sample (approximately 2 months post-infection; range 41-55
days from the initial sample), we performed a differential gene
expression analysis comparing these samples to the paired pre-infection
samples using LMM, implemented with the R dream package^[330]89. Age,
sex and ethnicity were included as covariates and a random intercept
term used to account for the paired nature of the samples. Statistical
significance was defined as 1% FDR. To identify enriched pathways in
the list of differentially expressed genes, we performed
overrepresentation analyses using the same approach as described above
for annotating the WGCNA modules.
Flow cytometry
Flow cytometry analysis was performed on a subset of the Wave 2 PBMC
samples. We examined samples taken during acute COVID-19 from 17
patients (of whom 9 patients had a mild/moderate clinical course and 8
patients with severe/critical course), and pre-infection samples from
15 of these same patients. 12 samples with low cell number recovery
(less than 10,000 PBMCs) were excluded from the analysis.
Cryopreserved PBMCs were thawed in humidified 37 °C, 5% CO[2] incubator
and resuspended in thawing medium (RPMI, 20% FBS). PBMCs were washed
twice with PBS and stained with Zombie Yellow LIVE/DEAD (Biolegend)
following the manfacturer’s protocol to exclude dead cells. Then, PBMCs
were washed twice with FACS buffer (1% BSA, 0.09% Azide, 1 mM EDTA),
and Fc receptors were blocked with Human TruStain Fc Receptor Blocking
Solution (Biolegend). Then, surface staining were performed using the
selected fluorochrome-conjugated monoclonal antibodies detailed in
Supplementary Table [331]5 for 20 min at 4 °C. Following incubation,
cells were fixed and permeabilized using the eBioscience™
Foxp3/Transcription Factor Staining Buffer Set (Invitrogen) for
intracellular staining. Cells were incubated with selected antibodies
or isotype controls for 30 min at 4 °C and resuspended in FACs buffer
for analysis. Aurora Spectral Flow Cytometry (Cytek®) and FlowJo
software, version 10 (Tree Star Inc. Ashland, OR, USA) were used for
analysis of all samples. The gating strategy used for flow cytometry is
shown in Supplementary Figs. [332]27-[333]28. Prior to gating cell
population of interest, cell debris was removed based on FCS/SSC and
only live cell (BV570 Zombie Yellow - negative) populations were
analysed.
Flow cytometry statistical analysis
Flow cytometry statistical analysis was performed with GraphPad Prism
(v9). To evaluate decomposition performance by CIBERSORTx analysis,
cell proportion estimates were compared to cell percentages from Flow
Cytometry analysis using Pearson’s correlation analysis (n = 68
samples). We were unable to examine for the presence of LDGs using our
flow cytometry data since this was performed on cryopreserved PBMCs and
LDGs do not survive the freeze-thaw process (whereas we performed
transcriptomics on RNA extracted from fresh PBMCs). We observed
significant correlation of estimated cell proportions from CIBERSORTx
analysis compared to proportions measured by flow cytometry for all
other cell types (Pearson r > 0.4045, two-tailed p-value < 0.0001).
For severity analysis, one sample per patient was selected at a time
that coincided with the expected spike in the inflammatory response
(nearest sample to day 7 after symptom onset; no more than ±72 h).
Patients were classified according to the overall peak illness severity
into two groups (mild/moderate = 9, severe/critical = 8). Change of
cell proportion across time were accessed by grouping samples into 4
days interval post COVID-19-positive test. One-way ANOVA was used to
calculate significant differences between multiple groups with Dunnet’s
correction for multiple-way comparisons. Significance is based upon
p-value < 0.05.
Reporting summary
Further information on research design is available in the [334]Nature
Portfolio Reporting Summary linked to this article.
Supplementary Information
[335]Supplementary Information^ (9.1MB, pdf)
[336]Peer Review File^ (3.5MB, pdf)
[337]41467_2022_35454_MOESM3_ESM.pdf^ (60.3KB, pdf)
Description of Additional Supplementary Files
[338]Supplementary Data^ (9.7MB, xlsx)
[339]Reporting Summary^ (8.3MB, pdf)
Acknowledgements