Graphical abstract
Highlights
* Sequential multi-omics profiling of plasma during acute infection
  and convalescence
* Inflammation, platelet degranulation, and metabolic perturbations
  at convalescence
* Three distinct disease phenotypes based on unsupervised clustering
  of omics profiles
* A panel of 20 cytokines and metabolites predicted adverse outcomes
  after discharge
__________________________________________________________________
Wang et al. conduct a comprehensive multi-omics analysis to identify
pathways differentially altered during acute SARS-CoV-2 infection and
convalescence. This study provides clues into the heterogeneity of the
post-acute COVID-19 symptoms and unveils potential therapeutic targets
for long COVID.
Introduction
An estimated 10%–30% of individuals convalescing from SARS-CoV-2
infection continue to experience post-acute sequelae of COVID-19 (PASC)
or long COVID, characterized by fatigue, sleep disturbance, confusion,
and dyspnea, alongside many other debilitating symptoms resulting in
significant impairments in their quality of life.^1,2,3^
The cellular receptor of SARS-CoV-2, angiotensin-converting enzyme 2
(ACE2), is ubiquitously expressed, thus facilitating multisystemic
manifestations of acute COVID-19 and long COVID.^4,5^
Epidemiological studies have found a greatly elevated risk of new
pulmonary, cardiovascular, gastrointestinal, metabolic, psychiatric,
and nervous system diagnoses at 6 months post infection, associated
with higher hospitalization rates and worse prognoses.^6,7,8^ Although
female sex, pre-existing comorbidities, and severity of the acute
infection have been proposed as risk factors for PASC, the underlying
cause of such heterogeneity in disease sequelae is not yet
understood.^2,9^
Several pathophysiological mechanisms for PASC have been proposed,
including the presence of viral reservoirs, persistent inflammation,
induced autoimmunity, tissue injury, endothelial dysfunction, or
impaired energy metabolism.^10,11,12^ Additionally, a
triad of cytokines (interleukin-1 beta [IL-1β], interleukin-6 [IL-6],
and tumor necrosis factor [TNF]) correlated with ongoing PASC 8 months
post infection.^13^ Similarly, patients with PASC had sustained
inflammatory responses, reflected in elevated type I (IFN-β) and
type III (IFN-λ1) interferon levels, accompanied by persistent
activation of monocytes and plasmacytoid dendritic cells.^14^
Nevertheless, given the complex overlapping pathophysiology between
acute and long COVID phases, it remains a significant challenge to
delineate specific molecular features underpinning PASC development to
guide the discovery of prognostic biomarkers and therapeutic targets.
In this study, we leveraged a systems-based multi-omics approach to
extensively characterize and contrast the plasma cytokines, proteome,
and metabolome of 117 individuals during acute infection and at the
6-month convalescence phase compared to non-infected healthy controls.
Additionally, we performed unsupervised clustering analyses for
unbiased disease phenotyping, together with machine learning on the
integrated multi-omics and clinical data, to identify predictive
biomarkers associated with adverse outcomes following acute infection.
Importantly, our study revealed several pathways that could be explored
as therapeutic targets to minimize the impact of long COVID.
Results
Characteristics of participants
We examined 117 individuals prospectively enrolled from designated
COVID-19 wards (n = 97) and intensive care units (n = 20) between
October 15, 2020, and June 29, 2021, with repeat blood sampling
performed after a median of 6.3 (interquartile range [IQR]: 6.0–7.1)
months (Figure 1A). At repeat sampling, patients’ PASC symptomatology,
health-related quality-of-life scores, and clinical outcomes were
captured using self-reported questionnaires and a detailed review of
their electronic medical records (Table S1). The most frequently
reported PASC symptoms were fatigue (66 individuals; 56.4%), general
weakness (49 individuals; 41.9%), shortness of breath (SOB, 47
individuals; 40.2%), cognitive impairment (39 individuals; 33.3%), and
mood disturbance (39 individuals; 33.3%) (Figure 1B).
For comparison purposes, we classified individuals into three PASC
severity groups according to symptom burden: recovered (no PASC
symptoms, n = 30), mild (1–3 symptoms, n = 32), and severe (>3
symptoms, n = 55) (Table 1; Table S2). The overall cohort had a median
age of 62 (IQR: 53–73) years and included 66 men (56.4%), with a high
prevalence of diabetes (49 individuals; 41.9%) and hypertension (62
individuals; 53.0%). Clinical characteristics were similar among the
PASC severity groups except for smoking: current or previous smokers
were overrepresented in the severe category (31 individuals; 56.4%,
p = 0.005). By contrast, the total SF-12 score (113 [IQR: 109–115] vs.
96 [IQR: 87–106] vs. 80 [IQR: 69–90] for the recovered, mild, and
severe groups, respectively, p < 0.001) and the EuroQol visual analog
scale (EQ-VAS; 90 [IQR: 85–92] vs. 83 [IQR: 75–90] vs. 63 [IQR:
50–76], p < 0.001) declined progressively with PASC severity
(Figure 1C). Furthermore, a greater proportion of individuals who
experienced an adverse outcome (all-cause mortality or
re-hospitalization) following discharge from acute infection were in
the severe PASC category (p = 0.03, Figure 1D; Table S2). Thus, as
patients transition from acute infection to convalescence, many retain
a persistently high symptom burden and risk of adverse outcomes.
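The severity grouping above is a simple threshold rule on the number of self-reported PASC symptoms. The sketch below is a minimal Python illustration, assuming each participant's symptoms are stored as a set of strings; the participant records are hypothetical, and the percentage helper only reproduces the one-decimal rounding used in the text and Table 1.

```python
from collections import Counter


def pasc_severity(symptoms: set) -> str:
    """Assign a PASC severity group from the number of self-reported symptoms.

    recovered: no symptoms; mild: 1-3 symptoms; severe: more than 3 symptoms.
    """
    n = len(symptoms)
    if n == 0:
        return "recovered"
    if n <= 3:
        return "mild"
    return "severe"


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in Table 1."""
    return round(100 * count / total, 1)


# Hypothetical participants, for illustration only.
participants = {
    "P01": set(),
    "P02": {"fatigue", "insomnia"},
    "P03": {"fatigue", "general weakness", "SOB", "mood disturbance"},
}
groups = Counter(pasc_severity(s) for s in participants.values())
print(dict(groups))  # {'recovered': 1, 'mild': 1, 'severe': 1}

# 66 of 117 participants reported fatigue.
print(percent(66, 117))  # 56.4
```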
Figure 1.
Post-acute sequelae of COVID-19 (PASC) symptomatology and
health-related quality of life of participants
(A) Overview of study design and analysis. Figure created using
BioRender.com.
(B) PASC symptom prevalence at convalescence; bars represent
self-reported symptoms in percentages.
(C) Total SF-12 score and EuroQol visual analog scale (EQ-VAS) among
PASC severity groups during convalescence. Asterisks indicate
statistical significance by Mann-Whitney U test with Benjamini-Hochberg
correction between the severity groups for each quality-of-life measure
as follows: ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.
(D) Association of PASC severity with adverse clinical outcomes
(composite of all-cause mortality and re-hospitalization) following
discharge from acute infection. Log rank p = 0.03.
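The Figure 1C legend reports pairwise Mann-Whitney U tests with Benjamini-Hochberg (BH) correction. The BH step-up adjustment itself is short; below is a minimal pure-Python sketch. The raw p values are hypothetical, and the Mann-Whitney tests that would produce them are assumed to have been run separately (for example with `scipy.stats.mannwhitneyu`). The asterisk mapping mirrors the legend thresholds, applied here to adjusted values, with "ns" as an illustrative label for non-significant results.

```python
def benjamini_hochberg(pvalues: list) -> list:
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    # Indices of the raw p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Step up from the largest p value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stars(q: float) -> str:
    """Map an adjusted p value to the asterisk notation used in the legends."""
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if q < cutoff:
            return label
    return "ns"


# Hypothetical raw p values from pairwise Mann-Whitney U tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(q, 4) for q in adjusted])  # [0.04, 0.0533, 0.0533, 0.2]
print([stars(q) for q in adjusted])     # ['*', 'ns', 'ns', 'ns']
```

Walking from the largest p value downward and carrying the running minimum is what makes the adjusted values monotone in the raw ranking.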
Table 1.
Baseline clinical characteristics
Characteristic Entire cohort (n = 117) Recovered (n = 30) Mild (n = 32) Severe (n = 55) Healthy controls (n = 28)
Demographics
__________________________________________________________________
Age (years) 62 (53–73) 64 (53–75) 62 (44–73) 62 (57–73) 55 (52–59)
Male 66 (56.4) 19 (63.3) 20 (62.5) 27 (49.1) 16 (57.1)
BMI (kg/m²) 28.2 (24.3–34.1) 27.2 (24.5–32.4) 26.6 (24.3–30.6) 29.9 (23.9–37.4) –
__________________________________________________________________
Race or ethnic group
__________________________________________________________________
White 81 (69.2) 19 (63.3) 19 (59.4) 43 (78.2) –
Asian 19 (16.2) 4 (13.3) 7 (21.9) 8 (14.5) –
Hispanic 6 (5.1) 4 (13.3) 1 (3.1) 1 (1.8) –
Mixed or unknown 11 (9.4) 3 (10.0) 5 (15.6) 3 (5.5) –
Current or previous smoker 48 (41.0) 8 (26.7) 9 (28.1) 31 (56.4) –
__________________________________________________________________
Presentation
__________________________________________________________________
Fever 47 (40.2) 13 (43.3) 15 (46.9) 19 (34.5) –
Myalgia 34 (29.1) 7 (23.3) 12 (37.5) 15 (27.3) –
Cough 74 (63.2) 22 (73.3) 20 (62.5) 32 (58.2) –
Dyspnea 81 (69.2) 23 (76.7) 22 (68.8) 36 (65.5) –
Diarrhea/nausea 45 (38.5) 10 (33.3) 13 (40.6) 22 (40.0) –
Abnormal CXR 87 (74.4) 22 (73.3) 23 (71.9) 42 (76.4) –
__________________________________________________________________
Medical history
__________________________________________________________________
Diabetes 49 (41.9) 11 (36.7) 10 (31.3) 28 (50.9) –
Hypertension 62 (53.0) 16 (53.3) 14 (43.8) 32 (58.2) –
COPD 20 (17.1) 3 (10.0) 6 (18.8) 11 (20.0) –
CKD 16 (13.7) 2 (6.7) 3 (9.4) 11 (20.0) –
CVD 17 (14.5) 3 (10.0) 4 (12.5) 10 (18.2) –
Cancer 15 (12.8) 3 (10.0) 1 (3.1) 11 (20.0) –
__________________________________________________________________
Management
__________________________________________________________________
Supplemental O₂ 98 (83.8) 27 (90.0) 22 (68.8) 49 (89.1) –
Intubation 20 (17.1) 5 (16.7) 3 (9.4) 12 (21.8) –
Dexamethasone 102 (87.2) 28 (93.3) 26 (81.3) 48 (87.3) –
Antibiotic 88 (75.2) 23 (76.7) 23 (71.9) 42 (76.4) –
Tocilizumab 17 (14.5) 3 (10.0) 6 (18.8) 8 (14.5) –
Remdesivir 6 (5.1) 1 (3.3) 1 (3.1) 5 (9.1) –
SARS-CoV-2 vaccine 6 (5.1) 1 (3.3) 3 (9.4) 2 (3.6) –
__________________________________________________________________
SARS-CoV-2 strain
__________________________________________________________________
Original 85 (72.6) 22 (73.3) 24 (75.0) 39 (70.9) –
B.1.1.7 25 (21.4) 6 (20.0) 6 (18.8) 13 (23.6) –
P.1 5 (4.3) 2 (6.7) 1 (3.1) 2 (3.6) –
B.1.351 1 (0.9) – 1 (3.1) – –
B.1.617.2 1 (0.9) – – 1 (1.8) –
__________________________________________________________________
Outcomes
__________________________________________________________________
Adverse outcome 36 (30.8) 5 (16.7) 8 (25.0) 23 (41.8) –
Data are n (%) or median (IQR). Abbreviations: BMI, body mass index;
CXR, chest X-ray; COPD, chronic obstructive pulmonary disease; CKD,
chronic kidney disease; CVD, cardiovascular disease (includes previous
history of myocardial infarction, coronary artery disease, heart
failure, atrial and ventricular arrhythmia).
Temporal changes in cytokines, proteome, and metabolome between acute infection and convalescence
Cytokine profiling, proteomics, and metabolomics analyses were
performed to identify temporal changes in plasma molecular features
between acute infection and convalescence. In total, 47 cytokines, 274
proteins, and 635 metabolites were measured (Table S2). Principal
component analysis (PCA) of all samples showed that molecular profiles
at convalescence differed from those in the acute infectious phase, and
that both differed from those of age- and gender-matched healthy
controls (Figure 2A). Between acute infection and healthy controls, 231
molecules were significantly altered, comprising 24 cytokines, 63
proteins, and 144 metabolites (Figure 2B; Table S3). Moreover, we
detected 157 differentially expressed molecules (DEMs) between the
acute infection and convalescence phases, composed of eight cytokines,
34 proteins, and 115 metabolites (Figure 2C; Table S3). Finally, 219
DEMs were identified between convalescence and healthy controls, which
included nine cytokines, 31 proteins, and 169 metabolites (Figure 2D;
Table S3).
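The PCA described here (and in the Figure 2A legend) operates on log10-transformed concentrations. The sketch below shows one way to compute component scores and the fraction of variance explained with NumPy's SVD. It assumes log10 transformation followed by column centering only; whether the original analysis also scaled each molecule is not stated, and the data are simulated.

```python
import numpy as np


def pca_variance_explained(X: np.ndarray, n_components: int = 2):
    """PCA via SVD on log10-transformed, column-centered data.

    X: samples x molecules matrix of strictly positive concentrations.
    Returns (component scores, fraction of total variance per component).
    """
    logged = np.log10(X)
    centered = logged - logged.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to
    # the variance captured by each principal component.
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]


# Simulated data: 3 groups x 10 samples, 50 molecules with shifted means.
rng = np.random.default_rng(0)
groups = [rng.lognormal(mean=m, sigma=0.3, size=(10, 50)) for m in (0.0, 0.5, 1.0)]
X = np.vstack(groups)
scores, explained = pca_variance_explained(X)
print(scores.shape, explained.round(3))
```

Because the simulated groups differ in the same direction for every molecule, PC1 captures most of the variance and separates the first and last groups.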
Figure 2.
Differentially expressed molecules (DEMs) associated with acute and
convalescence phase
(A) Principal component analysis (PCA) of log10-transformed
proteomics, metabolomics, and cytokine data; principal components 1
and 2 capture 30.2% and 7.5% of the variance between participants,
respectively.
(B) Volcano plot comparing the DEMs between acute infection and healthy
controls. The horizontal dashed line indicates the adjusted p value
cutoff (0.05), and the two vertical dashed lines indicate the
fold-change cutoff (1.5). Orange, purple, and green dots indicate
differentially expressed cytokines, metabolites, and proteins,
respectively.
(C) Volcano plot comparing the DEMs between convalescence and acute
infection.
(D) Volcano plot comparing the DEMs between convalescence and healthy
control.
(E) Heatmap of the top 100 molecules with the most significant p
values (ANOVA on log10-transformed data) comparing healthy controls
with the acute and convalescence phases.
(F) Box and whisker plots illustrating the levels of thrombospondin-1,
glutamine, serotonin, and sCD40L. Asterisks indicate statistical
significance by Mann-Whitney U test with Benjamini-Hochberg correction
between groups for each molecule as follows: ∗p < 0.05, ∗∗p < 0.01,
∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.
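The volcano plots in Figure 2B–2D call a molecule differentially expressed when it passes both an adjusted p value cutoff (0.05) and a fold-change cutoff (1.5). A minimal Python sketch of that rule follows. It assumes the fold-change cutoff is symmetric (fold change ≥ 1.5 counts as up, ≤ 1/1.5 as down), which the legend implies with its two vertical lines but does not spell out; the `Molecule` record and all values are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    kind: str           # "cytokine", "protein", or "metabolite"
    fold_change: float  # mean in comparison group / mean in reference group
    adj_p: float        # Benjamini-Hochberg-adjusted p value


def classify(m: Molecule, fc_cutoff: float = 1.5, alpha: float = 0.05) -> str:
    """Return 'up', 'down', or 'ns' using the volcano-plot cutoffs."""
    if m.adj_p >= alpha:
        return "ns"
    log2_fc = math.log2(m.fold_change)
    threshold = math.log2(fc_cutoff)
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "ns"


def count_dems(molecules: list) -> Counter:
    """Count differentially expressed molecules (up or down) by molecule type."""
    return Counter(m.kind for m in molecules if classify(m) != "ns")


# Hypothetical molecules, for illustration only.
panel = [
    Molecule("mol_a", "protein", 2.0, 0.010),     # up
    Molecule("mol_b", "metabolite", 0.5, 0.001),  # down
    Molecule("mol_c", "cytokine", 1.2, 0.001),    # fold change too small
    Molecule("mol_d", "cytokine", 3.0, 0.200),    # not significant
]
print([classify(m) for m in panel])  # ['up', 'down', 'ns', 'ns']
print(count_dems(panel))
```

Tallying by molecule type is how per-comparison breakdowns such as "24 cytokines, 63 proteins, and 144 metabolites" would be produced from a list of calls.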
The top 100 molecules with the most significant p values are shown in
a heatmap illustrating the trajectory of their change from healthy
controls to the acute and convalescence phases (Figure 2E). A
substantial proportion of molecules returned to plasma levels
comparable to those of healthy controls during convalescence. However,
some molecules remained altered and may be implicated in PASC
development. Specifically, levels of thrombospondin-1, an important
activator of TGF-β (a central mediator of wound healing, angiogenesis,
and tissue fibrosis), increased progressively from healthy controls to
the acute and convalescence phases (Figure 2F).^15^ In contrast,
glutamine levels remained depressed in the acute and convalescence
phases compared to healthy controls, a metabolic characteristic of
COVID-19 that is associated with greater disease severity.^16^
Moreover, serotonin levels remained elevated in the acute and
convalescence phases, with even higher levels in convalescence,
possibly related to persistent activation of platelet degranulation.
Likewise, soluble CD40 ligand (sCD40L) levels were elevated in acute
infection and continued to rise during convalescence (Figure 2F), which
may reflect ongoing virus-mediated inflammation, as robust T cell
responses have been demonstrated several months after SARS-CoV-2
infection.^17,18^ Together, these results reveal persistent
alterations in multiple pathways during convalescence, with a dominant
pattern of metabolic changes.
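The heatmap ranks molecules by one-way ANOVA p values across healthy controls, acute infection, and convalescence. A minimal NumPy sketch of that ranking follows, on hypothetical data. It assumes every molecule has the same group sizes (no missing values); under that assumption the degrees of freedom are identical for all molecules, so ranking by the F statistic gives the same order as ranking by p. If p values are needed, `scipy.stats.f_oneway` returns both.

```python
import numpy as np


def anova_f(groups: list) -> float:
    """One-way ANOVA F statistic for a single molecule measured in several groups."""
    values = np.concatenate(groups)
    grand_mean = values.mean()
    k, n = len(groups), values.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))


def top_molecules(data: dict, top_n: int = 100) -> list:
    """Rank molecules by F on log10-transformed data and keep the top_n.

    With identical group sizes for every molecule the degrees of freedom are
    fixed, so a larger F always means a smaller p value.
    """
    f_stats = {name: anova_f([np.log10(g) for g in groups]) for name, groups in data.items()}
    return sorted(f_stats, key=f_stats.get, reverse=True)[:top_n]


# Hypothetical concentrations: healthy controls, acute, convalescence.
data = {
    "mol_flat": [np.array([10.0, 11.0, 12.0])] * 3,
    "mol_shifted": [np.array([10.0, 11.0, 12.0]),
                    np.array([100.0, 110.0, 120.0]),
                    np.array([1000.0, 1100.0, 1200.0])],
}
print(top_molecules(data, top_n=1))  # ['mol_shifted']
```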
Differential pathways altered during acute infection and convalescence
Integrated canonical pathway analyses of the DEMs between acute
infection and healthy controls identified pathways associated with
stimulation of immune cells and activation of IL-1, IL-6, TNF, and
toll-like receptor 3 signaling (Figures 3A and 3B; Table S4).
Notably, proteins involved in platelet degranulation, the acute phase
response, and complement system cascades strongly correlated with
immune cell counts and laboratory disease markers during acute
infection (Figure S1; Table S5). Levels of P-selectin,
thrombospondin-1, fibronectin, and coagulation factor XIII were
positively correlated with platelet and lymphocyte counts, supporting
a close interplay between platelets and adaptive immunity in acute
SARS-CoV-2 infection.^19^ Additionally, C-reactive protein and
lipopolysaccharide-binding protein were positively correlated with
high-sensitivity troponin, a potential signature of
inflammation-induced cardiac injury. Moreover, metabolic pathways
altered during acute infection included arginine biosynthesis,
glutamate metabolism, and sphingolipid metabolism, in line with
findings from prior multi-omics studies (Figure 3C;
Table S6).^20,21,22^
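The text does not name the correlation method used for the protein-versus-clinical-marker analysis. Assuming a rank-based (Spearman) correlation, which is a common choice for skewed laboratory values, the computation reduces to a Pearson correlation of ranks with ties averaged. Below is a pure-Python sketch with hypothetical paired values; it is an illustration of the technique, not the authors' pipeline.

```python
def average_ranks(values: list) -> list:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: list, y: list) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired values: platelet count vs. a plasma protein level.
platelets = [150, 210, 180, 320, 260, 240]
protein = [1.1, 1.9, 1.4, 3.5, 2.2, 2.6]
print(round(spearman(platelets, protein), 3))  # 0.943
```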
Figure 3.
Pathways dysregulated during acute infection and convalescence
(A) Enriched Gene Ontology (GO) terms of differentially expressed
proteins and cytokines on Metascape for acute COVID-19 compared to
healthy controls, colored based on p values.
(B) Top regulatory effects of molecules and functions in acute COVID-19
based on Ingenuity Pathway Analysis (IPA).
(C) Pathways associated with metabolic alterations in acute COVID-19
compared to healthy controls. Pathway impact is the summed importance,
based on pathway topology, of the altered metabolites in each pathway;
−log(P) values are test statistics from quantitative pathway enrichment
analysis based on concentration differences between groups. Notable
pathways lie above the dashed lines (impact > 0.2 and −log(P) > 20).
(D) Enriched GO terms of differentially expressed proteins and
cytokines on Metascape for convalescence phase compared to healthy
controls, colored based on p values.
(E) Top regulatory effects of molecules and functions during
convalescence based on Ingenuity Pathway Analysis (IPA).
(F) Pathways associated with metabolic alterations during convalescence
compared to healthy controls.
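The Figure 3C and 3F legends describe pathway impact as the summed topological importance of altered metabolites, with notable pathways above impact 0.2 and −log(P) 20. The sketch below assumes each metabolite's importance is a precomputed centrality score and that impact is that sum divided by the pathway's total importance, as in MetaboAnalyst-style topology analysis. The centrality measure and the normalization are assumptions, and all pathways and values are hypothetical.

```python
def pathway_impact(importance: dict, altered: set) -> float:
    """Share of a pathway's total topological importance carried by altered metabolites."""
    total = sum(importance.values())
    return sum(v for met, v in importance.items() if met in altered) / total


def notable_pathways(results: dict, impact_cutoff: float = 0.2,
                     neglogp_cutoff: float = 20.0) -> list:
    """Pathways lying above both dashed lines: impact > 0.2 and -log(P) > 20."""
    return [name for name, (impact, neglogp) in results.items()
            if impact > impact_cutoff and neglogp > neglogp_cutoff]


# Hypothetical pathway with four metabolites and centrality-based importance.
importance = {"m1": 0.4, "m2": 0.3, "m3": 0.2, "m4": 0.1}
print(round(pathway_impact(importance, {"m1", "m3"}), 2))  # 0.6

# Hypothetical (impact, -log(P)) pairs for three pathways.
results = {"pathway_A": (0.60, 25.0), "pathway_B": (0.10, 30.0), "pathway_C": (0.50, 10.0)}
print(notable_pathways(results))  # ['pathway_A']
```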
To elucidate the pathogenesis and identify therapeutic targets for long
COVID, we next investigated the molecular changes present 6 months
after acute infection. Compared with healthy individuals, convalescent
patients showed pronounced and persistent immune activation
characterized by acute phase response, IL-1, TNF, and IL-6 pathways
resembling the acute infectious phase (Figures 3D and 3E; Table S4).
In addition, we found dysregulation during convalescence of several key
metabolites central to glucose metabolism, including 2-oxoglutaric
acid, ornithine, spermidine, and allantoin, which are implicated in the
activation of cellular pathways such as sirtuin 6 and
glucose-6-phosphate dehydrogenase (Figure S2A). As during acute
infection, we found persistent dysregulation of platelet degranulation
and blood coagulation during convalescence. Additionally, the omics
profile at convalescence was associated with activation of cell
migration and growth factor signaling pathways, as evidenced by
elevated levels of platelet endothelial cell adhesion molecule 1 (also
known as CD31), thrombospondin-1, fibroblast growth factor 2, and
vascular endothelial growth factor-A (Figure S2B).
Although the molecular signature during convalescence bears
similarities to acute infection, acute phase response, IL-1, and IL-6
signaling were markedly downregulated and liver X receptor signaling
was upregulated, suggesting a shift toward resolution and repair
(Figure S3; Table S4). Moreover, arginine biosynthesis, cysteine and
methionine metabolism, and the TCA cycle were differentially affected
between the acute and convalescence phases (Table S6). TCA
cycle-related metabolites, such as pyruvate, malate, cis-aconitate,
and 2-oxoglutaric acid, were further elevated at convalescence compared
to acute infection. Strikingly, perturbation of these pathways was
observed even in individuals (n = 30) reporting no PASC symptoms and a
return of function to pre-COVID-19 states, suggesting persistent
underlying pathological processes despite symptom resolution
(Figure S4; Table S3). As such, the ongoing dysregulation of
inflammatory, cellular signaling, and metabolic pathways during
convalescence could increase future complications and healthcare
utilization even in seemingly recovered individuals.
Association of proteome and metabolome signatures with clinical parameters
Identifying molecules associated with self-reported symptoms and
quality-of-life indices can provide insights into PASC pathogenesis.
Therefore, we performed logistic regression analysis between DEMs at
convalescence and self-reported PASC symptoms, adjusting for clinical
parameters previously described to be associated with long COVID
(Table S5).^23^ Molecules with three or more significant associations
(p < 0.05) are displayed in a heatmap (Figure 4A). A set of
triglycerides was negatively associated with nausea and fatigue but
positively associated with tachycardia (Figure 4A). Plasma cystatin C
and neutrophil gelatinase-associated lipocalin, markers of renal
function, were positively associated with SOB, fatigue, nausea, and
adverse outcomes, suggesting potential kidney involvement in certain
individuals with PASC.^24^ Moreover, the gut-derived metabolite valeric
acid was inversely associated with nausea, fatigue, muscle aches, and
SOB. Reduced fecal concentrations of short-chain fatty acids such as
valeric acid have been observed beyond 30 days after disease resolution
in patients with severe COVID-19 and were hypothesized to reflect
prolonged SARS-CoV-2-mediated disruption of the gut microbiome.^25^
Notable metabolites whose levels negatively correlated with the SF-12
and EQ-VAS scores were 4-hydroxyproline and 2-hydroxyisobutyric acid
(Figure 4B). Of particular interest was taurine, whose levels were
negatively associated with nausea, mood disturbance, cognitive
impairment, SOB, general weakness, and adverse outcomes. Furthermore,
taurine and serotonin levels were positively correlated with
quality-of-life scores (SF-12 and EQ-VAS), consistent with the
reported ability of these molecules to induce positive emotions and
elevate mood (Figures 4A and 4B).^26,27^ Downregulation of
glycerophospholipids, sphingolipids, phosphatidylcholines, and fatty
acids has been demonstrated during acute SARS-CoV-2 infection,
particularly in severe cases.^20,28^ When we examined the changes in
molecules between the convalescence and acute phases, recovery of lipid
levels was associated with a lower PASC symptom burden, including
nausea, fatigue, and general weakness, alongside higher quality-of-life
scores (Figure S5). Thus, various molecular features involved in
neurological, immunological, gastrointestinal, and metabolic processes
are associated with symptoms and quality of life in long COVID.
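The screening step above fits one covariate-adjusted logistic regression per molecule and symptom, then keeps molecules with at least three significant associations. The sketch below fits such a model by Newton-Raphson in NumPy and returns a two-sided Wald p value for the molecule term, then applies the "three or more hits" filter. It is an illustration on simulated data with a single hypothetical covariate; the actual covariate set (Figure 4 legend) and software are not reproduced, and the fit does not handle complete separation.

```python
import math

import numpy as np


def logistic_wald(y, x, covariates, n_iter: int = 25):
    """Fit logit P(y = 1) = b0 + b1 * x + covariates @ g by Newton-Raphson.

    Returns (b1, two-sided Wald p value for b1). Assumes no complete
    separation; a production analysis would also check convergence.
    """
    X = np.column_stack([np.ones(len(x)), x, covariates])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = prob * (1.0 - prob)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - prob))
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    weights = prob * (1.0 - prob)
    cov = np.linalg.inv(X.T @ (X * weights[:, None]))
    z = beta[1] / math.sqrt(cov[1, 1])
    # Two-sided normal p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return float(beta[1]), math.erfc(abs(z) / math.sqrt(2.0))


def molecules_to_plot(pvals: dict, alpha: float = 0.05, min_hits: int = 3) -> list:
    """Keep molecules with at least min_hits outcomes significant at alpha."""
    return [mol for mol, by_outcome in pvals.items()
            if sum(p < alpha for p in by_outcome.values()) >= min_hits]


# Simulated data: one molecule and one covariate (e.g., standardized age).
rng = np.random.default_rng(42)
n = 400
x = rng.normal(size=n)
age = rng.normal(size=n)
true_logit = -0.5 + 1.0 * x + 0.5 * age
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
b1, p_value = logistic_wald(y, x, age.reshape(-1, 1))
print(f"b1 = {b1:.2f}, p = {p_value:.1e}")

# Hypothetical p values per molecule and symptom.
pvals = {
    "mol_a": {"fatigue": 0.01, "nausea": 0.03, "SOB": 0.04, "tachycardia": 0.30},
    "mol_b": {"fatigue": 0.01, "nausea": 0.20, "SOB": 0.60, "tachycardia": 0.02},
}
print(molecules_to_plot(pvals))  # ['mol_a']
```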
Figure 4.
Association between molecular features with PASC symptoms and
health-related quality of life during convalescence
(A) Heatmap of associations between PASC symptoms or adverse outcome
and differentially expressed molecules (DEMs) from the multi-omics
profile between convalescence and healthy controls, adjusted for age,
gender, diabetes, acute COVID-19 treatment (dexamethasone,
antibiotics, tocilizumab, remdesivir), WHO Ordinal Scale, and
vaccination status. Only molecules with at least three significant
associations (p < 0.05) are shown.
(B) Heatmap based on significant associations between health-related
quality-of-life indices (total SF-12 score and EQ-VAS scale) and DEMs
from multi-omics profile between convalescence and healthy controls
adjusted for age, gender, diabetes, acute COVID-19 treatment
(dexamethasone, antibiotics, tocilizumab, remdesivir), WHO Ordinal
Scale, and vaccination status.
Unsupervised clustering identifies three distinct phenotypes during convalescence
We next performed unsupervised clustering of individuals based on the
changes in concentrations of molecules (cytokines, proteins, and
metabolites) between the acute and convalescence phases. We first used
PCA for dimensionality reduction before applying the k-means algorithm
(Figure S6A). However, the PCA-reduced variables gave poor clustering
performance, as assessed by the silhouette coefficient, so we instead
used a non-linear approach based on an autoencoder (Figure S6B).
Applying k-means to the autoencoder-derived representation yielded
three phenotypically distinct clusters based on their inherent
molecular similarities (Table S7). Consistent with our PCA plot, the
clusters identified from molecular features did not align with the
PASC severity categories (based on the number of symptoms), as
individuals from each severity group were evenly distributed among
clusters A to C (Figure S6C). The largest cluster, cluster A (n = 57,
48.7%), was characterized by the absence of significant deviation in
the molecular profile and had the fewest established PASC risk
factors. In comparison, cluster B was characterized by a predominant
triglyceride and organic acid signature (Figure 5A), whereas cluster C
exhibited a more heterogeneous composition of cytokines, proteins, and
metabolites (Figure 5B). Moreover, compared to cluster B, cluster C had
a higher proportion of women and more frequently reported symptoms such
as insomnia, palpitations, SOB, general weakness, and fatigue
(Figure 5C). Interestingly, the top network identified from cluster C
molecules was enriched in the HIF-1α pathway, which regulates the
cellular response to hypoxia and metabolic adaptations and exhibits sex
differences in activation,^29^ providing a possible molecular basis for
the sex differences observed in PASC epidemiology (Figure 5D;
Table S4). Importantly, increased plasma levels of the gut
microbiota-derived metabolites trimethylamine N-oxide (TMAO) and
phenylacetylglutamine in cluster C were associated with worsening
symptoms and adverse outcomes, consistent with the association of
persistent microbial dysbiosis and cardiovascular disease in patients
with long COVID (Figure 5E).^30,31,32^ As such, our unbiased
unsupervised clustering approach highlights three clinically distinct
groups of patients with unique biomarkers and long COVID
symptomatology.
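The clustering step above combines k-means with the silhouette coefficient as a measure of cluster quality. The NumPy sketch below implements both and picks the number of clusters with the highest mean silhouette. It does not reproduce the autoencoder: the input is assumed to be an already-learned low-dimensional embedding, simulated here as three well-separated blobs. The candidate range of k, the random initialization, and the number of restarts are illustrative assumptions.

```python
import numpy as np


def _assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with random initial centers; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = _assign(X, centers)
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = _assign(X, centers)
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(X, labels):
    """Mean silhouette coefficient (requires at least two clusters)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # The zero self-distance adds nothing, so dividing by size - 1
        # averages over the other members of the cluster.
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def choose_k(X, candidates=(2, 3, 4, 5)):
    """Pick the number of clusters with the highest mean silhouette."""
    scores = {k: silhouette(X, kmeans(X, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical 2-D embedding (stand-in for an autoencoder's latent space).
rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 10.0]])
Z = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in true_centers])
best_k, scores = choose_k(Z)
print(best_k, {k: round(s, 2) for k, s in scores.items()})
```

A higher mean silhouette means points sit closer to their own cluster than to the nearest other cluster, which is the criterion the text uses to judge PCA-based versus autoencoder-based clustering.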
Figure 5.
Unsupervised clustering based on temporal changes in molecular
signatures between acute infection and convalescence
(A) Molecular features with ≥65% deviation in cluster B.
(B) Molecular features with ≥65% deviation in cluster C.
(C) Differences in clinical characteristics and PASC symptoms among
identified disease clusters. p values were calculated using the
Chi-squared test with Yates' continuity correction.
(D) Top network derived from the molecular signatures in cluster C
using Ingenuity Pathway Analysis (Qiagen).
(E) Trimethylamine N-oxide (TMAO) and phenylacetylglutamine levels
stratified by clinical outcomes (with event and event-free) and PASC
severity (recovered, mild, and severe) compared to healthy controls.
Asterisks indicate statistical significance by Mann-Whitney U test with
Benjamini-Hochberg correction between groups for each molecule as
follows: ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.
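The Figure 5C legend uses chi-squared tests with Yates' continuity correction, which applies to 2 × 2 tables (one degree of freedom), for example women vs. men in cluster B vs. cluster C. A minimal pure-Python sketch using the closed-form 2 × 2 statistic follows; the counts are hypothetical, and whether every comparison in the figure was a 2 × 2 table is not stated.

```python
import math


def chi2_yates_2x2(a: int, b: int, c: int, d: int):
    """Chi-squared test with Yates' correction for the table [[a, b], [c, d]].

    Returns (statistic, p value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates: shrink |ad - bc| by n / 2 (not below zero) before squaring.
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    stat = numerator / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows are clusters, columns are women / men.
stat, p = chi2_yates_2x2(10, 20, 30, 40)
print(round(stat, 3), round(p, 2))
```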
Biomarker signatures associated with adverse clinical outcomes during convalescence
Over a median follow-up of 17.4 (IQR: 14.3–18.8) months after
discharge, 36 individuals (30.8%) reached the composite outcome of
all-cause mortality or re-hospitalization. To uncover molecules
associated with adverse outcomes during convalescence, we sought a
minimal panel of molecules, drawn from the multi-omics profile at
repeat sampling, that could stratify outcomes. Machine learning models
were developed from the multiplexed cytokine, protein, and metabolite
data using linear classifiers, evaluated by 5-fold cross-validation,
to predict the incidence of adverse events following discharge from
acute infection (Figure 6A). Individuals were randomly split into a
training cohort (90%) for variable selection and model development,
with the remaining 10% used as the validation cohort. From the training
cohort, a minimal panel of 20 variables, comprising seven cytokines and
13 metabolites, was selected, reaching an area under the curve (AUC) of
0.96 (Figure 6B; Table S8). Compared with the combined multi-omics
dataset, this minimal panel performed better in accuracy (0.83 vs.
0.75), recall (1.00 vs. 0.40), and F1 score (0.83 vs. 0.57) but showed
reduced precision (0.71 vs. 1.00, Figure 6C). When we tested the model
on the validation cohort, all individuals with an event were correctly
predicted by the panel (Figure 6D). However, two event-free survivors,
women aged 69 and 77 years, were falsely predicted to experience an
event. Network analysis showed that the minimal prediction panel
featured downregulation of the metabolites spermidine and taurine,
accompanied by a reduction of protective cytokines (including
interleukin-22 [IL-22] and colony-stimulating factor 3 [CSF3]) and
upregulation of pro-inflammatory cytokines (including interleukin-15
[IL-15]), with a concomitant increase in interleukin-10 (IL-10)
(Figure 6E). Interestingly, interleukin-27 (IL-27), a multifunctional
cytokine, was the most significant biomarker in the minimal prediction
panel for determining adverse outcomes.
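The reported accuracy, precision, recall, and F1 values all follow from a confusion matrix. The text does not report the matrices, so the counts below are an inference, not data from the study: assuming a 12-person validation cohort (about 10% of 117), the panel result is consistent with 5 true positives, 0 false negatives (all events detected), 2 false positives (the two misclassified survivors), and 5 true negatives; the combined-omics result is consistent with 2 TP, 3 FN, 0 FP, and 7 TN. The sketch simply shows how the metrics are computed from such counts.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    def metrics(self) -> dict:
        """Accuracy, precision, recall, and F1 for the positive (event) class."""
        total = self.tp + self.fp + self.tn + self.fn
        precision = self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0
        recall = self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return {"accuracy": (self.tp + self.tn) / total, "precision": precision,
                "recall": recall, "f1": f1}


# Inferred counts (see lead-in); not reported by the study.
panel = Confusion(tp=5, fp=2, tn=5, fn=0)
combined = Confusion(tp=2, fp=0, tn=7, fn=3)
for name, cm in [("minimal panel", panel), ("combined omics", combined)]:
    print(name, {k: round(v, 2) for k, v in cm.metrics().items()})
# minimal panel {'accuracy': 0.83, 'precision': 0.71, 'recall': 1.0, 'f1': 0.83}
# combined omics {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.4, 'f1': 0.57}
```

The trade-off in the text is visible here: the panel catches every event (recall 1.00) at the cost of two false alarms (precision 0.71), whereas the combined model raises no false alarms but misses three of five events.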
Figure 6.
Predictive biomarkers for adverse outcomes during convalescence
(A) Receiver operating characteristic curves of prediction models
trained on each individual omics dataset, the combined omics data
(cytokines, proteomics, and metabolomics), and the minimal panel, using
the molecular profile at convalescence.
(B) The minimal panel consisted of seven cytokines and 13 metabolites
selected with sequential feature extraction based on molecular profile
at convalescence.
(C) Classifier performance metrics on the testing set using each
individual omics dataset, combined omics, and the minimal panel.
(D) Prediction score plot demonstrates the minimal panel’s efficacy in
classifying the testing set.
(E) Network of molecules included in the minimal panel based on
Ingenuity Pathway Analysis (Qiagen).
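Figure 6A summarizes each model by its ROC curve and AUC. A minimal pure-Python sketch of both follows. The AUC is computed as the probability that a randomly chosen event scores higher than a randomly chosen non-event (ties count one half), which equals the trapezoidal area under the ROC curve. The scores and labels are hypothetical, and this is a generic illustration rather than the study's evaluation code.

```python
def roc_auc(scores: list, labels: list) -> float:
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


def roc_points(scores: list, labels: list) -> list:
    """(FPR, TPR) pairs as the threshold sweeps down through every distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical prediction scores (1 = adverse event, 0 = event-free).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
area = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print(round(roc_auc(scores, labels), 3), round(area, 3))  # 0.889 0.889
```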
Discussion
To elucidate the molecular shifts between acute SARS-CoV-2 infection
and long COVID, we performed a sequential assessment of the plasma
proteome, metabolome, and cytokines in 117 individuals during
hospitalization for acute infection and at 6-month follow-up. We used
machine learning algorithms to generate insights into PASC phenotypes
based on changes in multi-omics signatures and developed a minimal
panel of molecules associated with long-term clinical outcomes.
Most participants were enrolled during the second and third waves of
the pandemic in Canada, when the dominant circulating SARS-CoV-2
strains were the wild-type strain and the B.1.1.7 variant.^33^ Even at
repeat collection, a median of 6.3 (IQR: 6.0–7.1) months later, PASC
remained a significant disease burden in our cohort, with only 30 of
117 participants (25.6%) reporting full resolution of symptoms.
Self-reported symptoms included fatigue (56.4%), general weakness
(41.9%), SOB (40.2%), cognitive impairment (33.3%), and mood
disturbance (33.3%), consistent with other findings at 6 months post
infection among hospitalized patients.^34,35^ Moreover, health-related
quality of life, assessed using the SF-12 score and EQ-VAS, declined
progressively with PASC severity based on the number of self-reported
symptoms. Similarly, lower quality-of-life scores 6 months post
infection have been linked to mobility issues, pain or discomfort,
fatigue, and ICU admission.^35,36^ Of particular concern is the
persistently altered molecular signature in individuals reporting
complete resolution of symptoms from the acute infectious phase, as
this may confer increased susceptibility to PASC following reinfection
with SARS-CoV-2 variants or other insults.^37^
Consistent with the PASC symptom burden in our cohort, we observed
persistent dysregulation of various biological pathways known to be
implicated in SARS-CoV-2 pathogenesis approximately 6 months post
infection. Substantial immune activation and stress responses were
observed in convalescence, with upregulation of cytokines such as
IL-1β, IL-6, CXCL1, IL-7, IL-8, and IL-18, in agreement with the
alterations in innate and adaptive immune cell populations seen in
long COVID.^38,39^ Moreover, cytokines and proteins involved in
platelet degranulation and abnormal blood coagulation were further
elevated during convalescence relative to acute infection. Consistent
with this, plasma from patients with PASC was characterized by an
abundance of microthrombi with increased clotting cascade proteins and
enhanced resistance to fibrinolysis.^40^ Functional stimulation of
platelets from COVID-19 survivors confirmed a hyperreactive platelet
state with increased granule secretion.^41^ These findings suggest
that the hypercoagulable state observed during acute infection
persists into convalescence, and that tailored antithrombotic agents
may merit evaluation in managing long COVID complications.
Moreover, our data revealed a dynamic cellular state during
convalescence characterized by cell activation, migration,
proliferation, signaling, and interaction. Network analysis identified
activation of the epidermal growth factor (EGF) signaling pathway, which
can be induced by inflammation and cellular stress, as central to these
cellular processes. Indeed, upregulation of EGF signaling
leads to local TGF-β activation that facilitates barrier restoration in
damaged vascular cells and pericyte differentiation into
collagen-producing myofibroblasts.[162]^42 However, pulmonary fibrosis
following SARS-CoV-2 infection has been linked to aberrant EGF
activation in a subset of individuals, associated with reduced forced
vital capacity and diffusing capacity.[163]^43^,[164]^44
Global metabolomic analyses uncovered three predominant pathways
dysregulated between acute infection and convalescence, including
arginine biosynthesis, cysteine and methionine metabolism, and the TCA
cycle. Mitochondrial dysfunction and the inability to respond to
increasing energy demands in peripheral monocytes and endothelial cells
occur in acute SARS-CoV-2 infection.[165]^45^,[166]^46 Consistent with
the hyperinflammatory response and cytokine storm, patients who
required intubation during acute infection displayed elevated energy
expenditure and hypermetabolic phenotypes.[167]^47 Thus, the persistent
elevation of TCA cycle metabolites may reflect increased energy
production to compensate for mitochondrial dysfunction and enhanced
metabolic requirements from chronic inflammation and tissue repair, a
feature that may distinguish PASC from chronic fatigue syndrome, which is
characterized by a concerted hypometabolic state.[168]^48 These results
are consistent with plasma and exosome analyses of acute infection, in
which cellular metabolic pathways were also
markedly altered.[169]^28 Compared to the acute phase, there was a
significant reduction in several amino acids of the methionine pathway
(such as L-methionine, L-cystathionine, and alpha-aminobutyric acid)
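during convalescence. Methionine is an essential amino acid that
supplies cysteine for glutathione synthesis through the transsulfuration
pathway (which also yields cystathionine and, downstream,
alpha-aminobutyric acid); glutathione in turn alleviates oxidative
stress and mediates crucial antioxidant effects.[170]^49 As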
TCA cycle activity and mitochondrial bioenergetics directly affect
cellular energy availability, dysregulation in these processes may
contribute to the high prevalence of fatigue and general weakness seen
in long COVID.[171]^50 Similar oxidative stress and
perturbations in carbohydrate metabolism have been implicated in the
development of neurodegenerative disorders such as Alzheimer’s
disease.[172]^51 Taken together with the prolonged detection of
SARS-CoV-2 in the brain several months post infection, we speculate
that oxidative damage may play a crucial role in mediating PASC-related
brain fog, memory loss, mood disturbance, and signatures of advanced
aging.[173]^52^,[174]^53
Utilizing the convalescence multi-omics profile from a training cohort
of 105 individuals, we developed a minimal panel of seven cytokines and
13 metabolites that demonstrated good predictive value, reaching an AUC
of 0.96 with 83% accuracy. The superior ability of cytokines and
metabolites to classify outcomes during convalescence further supports
a pathological role of dysregulated
inflammatory and metabolic responses in long COVID. Notably, cytokines
present in this panel were primarily related to the activation of IL-27
signaling, as reflected by the stimulation of IL-15 and IL-10 and the
inhibition of G-CSF, MCP-3 (CCL7), and IL-22. In severe SARS-CoV-2
infection, T cell apoptosis and exhaustion were associated with
overexpression of exhaustion markers such as PD-1 and TIM-3 on
peripheral CD8^+ T cells.[175]^54 Similar upregulation of PD-1 and
TIM-3 expression was seen in individuals with PASC symptoms 8 months
post infection.[176]^14^,[177]^55 Interleukin-27 is a pleiotropic
cytokine that induces the NFIL3 axis, leading to upregulation of TIM-3,
PD-L1, and IL-10 expression, which can directly promote T cell
exhaustion, resulting in an impaired ability to eliminate chronic viral
infections effectively.[178]^56^,[179]^57 Therefore, targeting the
upstream IL-27 signaling pathway to alleviate or reverse CD8^+ T cell
exhaustion represents a plausible strategy to mitigate the adverse
outcomes of long COVID. Additionally, IL-27 signaling has been
implicated in metabolic reprogramming, specifically through the
upregulation of UCP1, PPARα, and PGC-1α, resulting in increased energy
expenditure and stimulation of thermogenesis.[180]^58 Other molecules
in the panel involved in energy metabolism include 2-aminoadipic acid
(an established predictor of diabetes), taurine, and
acylcarnitines.[181]^59^,[182]^60^,[183]^61^,[184]^62 Spermidine is
known for its antioxidant and anti-inflammatory effects and its ability to
promote nitric oxide production to improve mitochondrial function and
biogenesis, whereas asymmetric dimethylarginine elicits opposite
effects.[185]^63^,[186]^64 Suppression of the taurine pathway during
convalescence was associated with worse health-related quality of life
and adverse outcomes. Interestingly, taurine has been shown to
alleviate oxidative stress and promote beneficial metabolic effects
while protecting the cardiovascular system.[187]^65^,[188]^66 Across
various animal models, taurine administration improved strength,
depressive behavior, memory, and other hallmarks of aging through
attenuating cellular senescence, mitochondrial dysfunction, DNA damage,
and chronic inflammation.[189]^67 However, the longitudinal safety and
efficacy of taurine supplementation to alleviate PASC symptoms in
humans remain to be determined. Collectively, these data suggest that
persistently altered cellular bioenergetics and mitochondrial
dysfunction may constitute a significant risk factor for developing PASC
that could be targeted to improve clinical outcomes.
Given the lack of proven effective therapies for long COVID, our
results point toward several potential avenues that may be explored in
future studies. Firstly, persistent immune activation can impair wound
healing and contribute to neuroinflammation. Therefore,
anti-inflammatory strategies such as monoclonal antibody blockade of
IL-6, TNF, and IL-1 receptors or short-term corticosteroids could be
explored as they have been for acute SARS-CoV-2 infection. Secondly,
individuals at higher risk of thromboembolic disorders with long COVID
may benefit from anticoagulation, given the observed abnormalities in
platelet degranulation and coagulation processes. Thirdly, global
metabolomic analyses revealed specific alterations in methionine
metabolism and the TCA cycle, suggesting a potential role of
antioxidants and treatment strategies to support mitochondrial function
and energy production. Fourthly, taurine supplementation may
alleviate long COVID burden, given the strong and consistent
correlation between taurine levels and both PASC symptoms and
quality of life. Lastly, the observed dysregulation in
microbiota-derived metabolites such as TMAO and phenylacetylglutamine,
together with reports of gut dysbiosis in long COVID, points to the gut
microbiome as an attractive therapeutic target.[190]^32
Limitations of the study
Our study leveraged a comprehensive systems-based approach to study
PASC, but some limitations remain. Since our study represents a
relatively severe disease cohort requiring hospitalization prior to
mass vaccination, our findings need to be validated in patients who
recovered at home and in previously vaccinated individuals. Indeed,
emerging evidence suggests that vaccination against SARS-CoV-2 may
protect against PASC symptoms in previously infected individuals. This
may be related to their ability to stimulate anti-spike protein
antibody production and T cell activation to promote viral clearance
and resolve chronic inflammation.[191]^68 Additionally, Omicron
variants, which emerged after our enrollment period, are substantially
more transmissible but less severe and pathogenic.[192]^69^,[193]^70 However, given
the increased number of individuals infected with the Omicron variant,
more people are experiencing long COVID globally.[194]^71 Moreover,
despite protocolized morning blood collections to minimize diurnal
variations, the metabolic profile and associated relationship with
self-reported PASC symptoms could be confounded by participants’
fasting status. Future studies in large prospective cohorts are
warranted to validate the biomarkers and molecular pathways implicated
in long COVID pathophysiology and to evaluate the efficacy of several
identified therapeutic targets for consideration in clinical trials.
STAR★Methods
Key resources table
REAGENT or RESOURCE | SOURCE | IDENTIFIER
Biological samples
Human blood plasma | Canadian Biosample Repository | https://biosample.ca/
Deposited data
Raw proteomics data | PeptideAtlas: PASS03810 | https://peptideatlas.org/
Raw metabolomics data | MetaboLights: MTBLS7337 | https://www.ebi.ac.uk/metabolights/
Software and algorithms
MetaboAnalyst 5.0 | Pang et al.[198]^72 | https://www.metaboanalyst.ca/home.xhtml
OriginLab | OriginLab Corporation | https://www.originlab.com/
R (v4.2.3) | The R Project for Statistical Computing | https://www.r-project.org/
Python (v3.8.16) | Jupyter notebooks | https://www.python.org/doc/versions/
Skyline (v21.2.0.536) | University of Washington | http://www.skyline.ms
Ingenuity Pathway Analysis (IPA) | QIAGEN | https://qiagen.pathfactory.com/
Metascape | Zhou et al.[205]^73 | http://metascape.org/
Resource availability
Lead contact
Further information and requests for resources and reagents should be
directed to and will be fulfilled by the lead contact, Gavin Y. Oudit
(gavin.oudit@ualberta.ca).
Materials availability
This study did not generate new unique reagents.
Experimental model and subject details
Study participants
The COVID-19 Surveillance Collaboration (CoCollab) Study prospectively
enrolled consecutive patients newly admitted to hospital wards
designated for COVID-19 and intensive care units at the University of
Alberta Hospital (Edmonton, Canada) between October 15, 2020, and June
29, 2021 ([208]Figure S7). All enrolled patients were ≥18 years of age
with a laboratory-confirmed COVID-19 diagnosis based on a positive
SARS-CoV-2 real-time polymerase chain reaction (PCR) assay from
nasopharyngeal swabs or lower respiratory samples. Comparisons were made
with age- and gender-matched
healthy controls (n = 28) enrolled during the same period. Our study
was conducted in accordance with the ethical principles of the
Declaration of Helsinki with approval from the University of Alberta
Health Research Ethics Board (Pro00100319 and Pro00100207). Written
informed consent was obtained from all participants.
Method details
Plasma collection and storage
Venous blood was drawn in the morning by trained phlebotomists and
transported to the Canadian Biosample Repository located at the
University of Alberta within 1 h for immediate processing. Samples were
collected in tubes containing ethylenediaminetetraacetic acid (EDTA) and
centrifuged at 1500 × g for
10 min at room temperature. Plasma was subsequently aliquoted for
storage at −80°C. Baseline samples during acute COVID-19 were collected
immediately after hospital admission, while pre-scheduled follow-up
blood samples at 6 months were collected either at the patient’s
place of residence by the study team or at the Kaye Edmonton Clinic
(Edmonton, Canada).
Clinical outcomes and quality-of-life assessment
Detailed clinical characteristics, including demographics, vital signs,
presenting symptoms, comorbidities, and medications, were collected
through individual review of electronic medical records. The incidence of
all-cause mortality and hospital readmission after discharge
from the acute COVID-19 hospitalization (median follow-up of 17.4 [IQR:
14.3–18.8] months) was obtained from each individual’s electronic medical
records up to June 30, 2022. A review of symptoms was performed using a
questionnaire during follow-up blood sampling that encompassed general
systemic (fatigue, general weakness, chills, night sweats, runny nose,
muscle ache), cardiopulmonary (shortness of breath, chronic cough,
palpitation, tachycardia), neurological (cognitive impairment such as
confusion, memory loss or brain fog, insomnia, changes in smell and
taste, headache, and mood disturbance), and gastrointestinal (nausea,
abdominal pain, diarrhea) domains. Other self-reported symptoms
associated with PASC not included in the questionnaire were also
recorded. Additionally, the validated SF-12 health questionnaire and
EuroQol visual analogue scale (EQ-VAS) were used to assess the
biopsychosocial health of convalescing patients.[209]^74^,[210]^75 The
SF-12 health questionnaire examines the physical and mental health of
patients across eight domains (physical functioning, role-physical,
bodily pain, general health, vitality, social functioning,
role-emotional, and mental health), while the EQ-VAS captures individuals’
self-rated health status on a scale from 0 (the worst health
state imaginable) to 100 (the best health state imaginable).
Multiplexed cytokine analysis
Luminex xMAP technology was used for the multiplexed quantification of
48 human cytokines, chemokines, and growth factors. The multiplexing
analysis was performed using the Luminex 200 system (Luminex, Austin,
TX, USA) by Eve Technologies Corp. (Calgary, Alberta). Forty-eight
markers were simultaneously measured in the samples using Eve
Technologies' Human Cytokine 48-Plex Discovery Assay (MilliporeSigma,
Burlington, Massachusetts, USA). The assay was run according to the
manufacturer’s protocol. The 48-plex consisted of sCD40L, EGF, Eotaxin,
FGF-2, FLT-3 Ligand, Fractalkine, G-CSF, GM-CSF, GROα, IFN-α2, IFN-γ,
IL-1α, IL-1β, IL-1RA, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9,
IL-10, IL-12(p40), IL-12(p70), IL-13, IL-15, IL-17A, IL-17E/IL-25,
IL-17F, IL-18, IL-22, IL-27, IP-10, MCP-1, MCP-3, M-CSF, MDC,
MIG/CXCL9, MIP-1α, MIP-1β, PDGF-AA, PDGF-AB/BB, RANTES, TGFα, TNF-α,
TNF-β, and VEGF-A. Assay sensitivities of these markers range from 0.14
to 55.8 pg/mL for the 48-plex. Individual analyte sensitivity values
are available in the MILLIPLEX MAP protocol.
Targeted plasma proteomics by LC-MS
Plasma samples were analyzed using a multiple reaction monitoring
(MRM)-based methodology with stable isotope-labelled standards (SIS) to
quantify 274 human proteins. In this “bottom-up” approach, carefully
chosen peptides served as surrogates for their parent proteins. In
total, 274 peptides were included in the assay, which employs the
gold-standard technique for LC-MRM-MS: each peptide has its own external
calibration curve built from synthetic light and SIS standard peptides. In
addition, three levels of quality control (QC) samples were monitored
for each peptide. The points of each calibration curve and the QCs were
assessed for accuracy between 75% and 125% at the lowest calibration
curve point and QC and between 80% and 120% for the remaining points
and QCs. The samples were analyzed in one batch.
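The accuracy windows above can be expressed as a simple acceptance check. This is an illustrative sketch only: it assumes accuracy is computed as measured/nominal × 100 and that the window bounds are inclusive, and the function names are hypothetical rather than part of any vendor software.

```python
def accuracy_pct(measured: float, nominal: float) -> float:
    """Back-calculated concentration as a percentage of its nominal value."""
    return 100.0 * measured / nominal


def within_acceptance(measured: float, nominal: float, lowest_level: bool) -> bool:
    """Apply the stated windows: 75-125% at the lowest calibrator/QC level,
    80-120% at all other levels (bounds assumed inclusive)."""
    low, high = (75.0, 125.0) if lowest_level else (80.0, 120.0)
    return low <= accuracy_pct(measured, nominal) <= high


# Hypothetical readings: (measured, nominal, is_lowest_level)
points = [(0.78, 1.0, True), (0.78, 1.0, False), (11.5, 10.0, False)]
for measured, nominal, lowest in points:
    print(measured, nominal, within_acceptance(measured, nominal, lowest))
```

The same 78% reading passes at the lowest level but fails elsewhere, reflecting the wider tolerance allowed near the LLOQ.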
Peptides were synthesized using FMOC chemistry with ^13C/^15N-labeled
amino acids for SIS peptides, purified through reversed-phase HPLC with
subsequent assessment by MALDI-TOF-MS, and characterized via amino acid
analysis (AAA) and capillary zone electrophoresis (CZE). All other
chemicals and reagents used were of the highest analytical quality
available and were obtained from commercial vendors. Tryptic peptides
were selected to serve as molecular surrogates for the target proteins
according to a series of peptide selection rules (for detailed
criteria, see Kuzyk et al.[211]^76) and previous detectability in
plasma samples. To compensate for matrix-induced suppression or
variability in LC-MS performance, ^13C/^15N-labelled peptides were
used as internal standards. During sample preparation, 10 μL of raw
plasma was sequentially subjected to 9 M urea, 20 mM dithiothreitol,
and 0.5 M iodoacetamide. All steps were carried out in Tris buffer at
pH 8.0. Denaturation and reduction occurred simultaneously at 37°C for
30 min, with alkylation occurring thereafter in the dark at room
temperature for 30 min. Proteolysis was initiated by adding
TPCK-treated trypsin (70 μL at 1 mg/mL; Worthington) at a 10:1
substrate:enzyme ratio. After overnight incubation at 37°C,
proteolysis was quenched with formic acid (FA) at a final concentration
of 1.0%. The SIS peptide mixture was then spiked into the samples. All
samples were then concentrated by solid-phase extraction (Oasis HLB,
2 mg sorbent; Waters). After solid-phase extraction, the concentrated
eluate was dried using a vacuum concentrator and rehydrated in 0.1% FA
to a final protein concentration of 1 μg/μL for LC-MRM/MS analysis. A
surrogate matrix for use with standard and QC samples was prepared from
a digest of 10 mg/mL Bovine Serum Albumin (BSA) in PBS buffer, using
the same methodology as the plasma samples described above. The
standard curves were generated using a natural isotopic abundance (NAT)
peptide for each analyte. A dilution series of the NAT peptides in BSA
tryptic digest was prepared from a high concentration of 1000× the
lower limit of quantitation (LLOQ) over seven dilutions to the lowest
point of the curve, which was also the LLOQ for the assay. The QC
samples were prepared from the same NAT mix and diluted in BSA digest.
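As a quick arithmetic check of the stated 10:1 substrate:enzyme ratio, the sketch below assumes a typical total plasma protein concentration of roughly 60–80 mg/mL (an assumption; per-sample protein content is not reported here). With 10 μL of plasma and 70 μL of 1 mg/mL trypsin (70 μg), the ratio falls between about 9:1 and 11:1.

```python
def substrate_to_enzyme_ratio(plasma_ul: float, protein_mg_per_ml: float,
                              trypsin_ul: float, trypsin_mg_per_ml: float) -> float:
    """Mass ratio of plasma protein to trypsin.

    1 uL x 1 mg/mL = 1 ug, so both masses come out in micrograms.
    """
    substrate_ug = plasma_ul * protein_mg_per_ml
    enzyme_ug = trypsin_ul * trypsin_mg_per_ml
    return substrate_ug / enzyme_ug


# 10 uL plasma and 70 uL of 1 mg/mL trypsin, as described above; the plasma
# protein concentrations are assumed typical values, not measured ones.
for protein in (60.0, 70.0, 80.0):
    print(protein, round(substrate_to_enzyme_ratio(10.0, protein, 70.0, 1.0), 2))
```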
Injections of 10 μL of the plasma tryptic digests were separated with a
Zorbax Eclipse Plus RP-UHPLC column (2.1 × 150 mm, 1.8 μm particle
diameter; Agilent) that was contained within a 1290 Infinity system
(Agilent). Peptide separations were achieved at 0.4 mL/min over a
60 min run via a multi-step LC gradient (2–80% mobile phase B; mobile
phase compositions: A was 0.1% FA in H₂O, while B was 0.1% FA in
acetonitrile). The column was maintained at 50°C. A 4-min post-gradient
column re-equilibration step was used after each sample analysis. The
LC system was interfaced to a triple quadrupole mass spectrometer
(Agilent 6495C) via a standard-flow AJS ESI source, operated in the
positive ion mode. The general MRM acquisition parameters employed were
as follows: 3.5 kV capillary voltage, 300 V nozzle voltage, 11 L/min
sheath gas flow at a temperature of 250°C, 15 L/min drying gas flow at
a temperature of 150°C, 30 psi nebulizer gas pressure, 5 V cell
accelerator potential, and unit mass resolution in the first and third
quadrupole mass analyzers. The high energy dynode (HED) multiplier was
set to −20 kV for improved ion detection efficiency and signal-to-noise
ratios. Specific LC-MS acquisition parameters were employed for optimal
peptide ionization/fragmentation and scheduled MRM. Note that these
peptide-specific parameters had previously been optimized empirically by
direct infusion of the purified SIS peptides. In the quantitative
analysis, the targets (1 transition/peptide) were monitored over 700 ms
cycles and 1.5 min detection windows.
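In scheduled MRM, each transition is acquired only inside its retention-time window, so the 700 ms cycle is shared among whichever transitions overlap at a given moment. If the NAT and SIS form of each of the 274 peptides are each monitored with one transition (an assumption), up to 548 transitions compete for the cycle wherever their windows overlap. The sketch below estimates the busiest point and the resulting per-transition dwell time; it ignores inter-scan delays, and the retention times in the example are hypothetical.

```python
def scheduled_mrm_dwell(retention_times_min, window_min=1.5, cycle_ms=700.0):
    """Return (max concurrent transitions, dwell time in ms at that peak).

    Each transition is monitored only within +/- window_min/2 of its expected
    retention time; the busiest point in the run sets the shortest dwell time.
    Inter-scan delays are ignored (a simplifying assumption).
    """
    half = window_min / 2.0
    windows = [(rt - half, rt + half) for rt in retention_times_min]
    # The maximum overlap of closed intervals always occurs at some window start.
    max_concurrent = max(
        sum(1 for lo, hi in windows if lo <= start <= hi) for start, _ in windows
    )
    return max_concurrent, cycle_ms / max_concurrent


# Hypothetical retention times (min) for four transitions.
n_concurrent, dwell_ms = scheduled_mrm_dwell([10.0, 10.5, 11.0, 30.0])
print(n_concurrent, round(dwell_ms, 1))
```

Three windows overlap around 10.25–10.75 min, so each of those transitions receives roughly a third of the cycle.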
The MRM data were visualized and examined with Skyline Quantitative
Analysis software (version 21.2.0.536, University of Washington). This
involved peak inspection to ensure accurate selection, integration, and
uniformity (in terms of peak shape and retention time) of the SIS and
NAT peptides. After defining a small number of criteria (i.e., 1/x²
regression weighting, <25% deviation from nominal accuracy at the QC-A
level, and <20% for QCs B and C), a standard curve was used to calculate
the peptide concentration (fmol/μL of plasma) in each sample through
weighted linear regression.
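The 1/x² weighted regression used for the standard curves can be sketched in closed form. This is a minimal illustration, not Skyline's implementation: it assumes the response is the NAT/SIS peak-area ratio, fits ratio = slope × concentration + intercept with weights 1/x², and back-calculates unknowns by inverting the line. The calibrator values are hypothetical.

```python
import numpy as np


def fit_weighted_calibration(conc, ratio):
    """Weighted least-squares line with 1/x^2 weights; returns (slope, intercept)."""
    x = np.asarray(conc, dtype=float)
    y = np.asarray(ratio, dtype=float)
    w = 1.0 / x**2                       # emphasizes low-concentration calibrators
    x_bar = np.sum(w * x) / np.sum(w)    # weighted means
    y_bar = np.sum(w * y) / np.sum(w)
    slope = np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return slope, intercept


def back_calculate(ratio, slope, intercept):
    """Invert the calibration line to obtain concentration from a measured ratio."""
    return (np.asarray(ratio, dtype=float) - intercept) / slope


# Hypothetical seven-point curve spanning 1x to 1000x the LLOQ (arbitrary units).
conc = np.array([1, 3, 10, 30, 100, 300, 1000], dtype=float)
ratio = 0.02 * conc + 0.001
slope, intercept = fit_weighted_calibration(conc, ratio)
print(slope, intercept, back_calculate(0.201, slope, intercept))
```

Weighting by 1/x² keeps the highest calibrators from dominating the fit, which matters when a curve spans three orders of magnitude.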
Targeted plasma metabolomics by LC-MS
A custom-made targeted quantitative metabolomics approach was applied
to analyze the samples using a combination of direct injection mass
spectrometry (DI-MS) and LC-MS/MS. This custom LC-MS assay can be used
for the targeted identification and quantification of up to 636
endogenous metabolites, including amino acids and amino acid
derivatives, biogenic amines, ceramides, cholesterol esters,
diacylglycerols, acylcarnitines, glycerophospholipids, sphingomyelins,
triacylglycerols, organic acids, and nucleotides/nucleosides. The method
uses chemical derivatization (for organic acids and biogenic amines),
analyte extraction and LC separation (or direct injection for lipids
and acylcarnitines), combined with selective mass-spectrometric
detection using multiple reaction monitoring (MRM) pairs to identify
and quantify metabolites. Isotope-labelled internal standards (ISTDs)
and isotope-labelled chemical derivatization standards were used for
accurate metabolite quantification. Stock solutions of each standard and
ISTD used in the assay were prepared by dissolving accurately weighed
solids in double-distilled water (ddH₂O). Calibration curve standards
were obtained by mixing and diluting the corresponding stock solutions
with ddH₂O. For amino acids, biogenic amines, carbohydrates,
acylcarnitines and their derivatives, and all lipids and their
derivatives, stock solutions of isotope-labelled compounds were prepared
in the same way, and a working ISTD mixture in ddH₂O was made by
combining all of these isotope-labelled stock solutions. For
organic acids, stock solutions of isotope-labelled compounds were
prepared by dissolving the accurately weighed solids in 75% aqueous
methanol, and a working ISTD mixture in 75% aqueous methanol was made by
mixing and diluting all the isotope-labelled stock solutions.
The assay uses a 96-deep-well plate with a filter plate attached via
sealing tape and a set of reagents and solvents to prepare the plate
assay. The first 14 wells of the 96-well plate are used for a blank
sample, three zero samples, seven standard-containing or calibration
samples and three quality control (QC) samples. To extract and measure
all metabolites except organic acids, plasma samples were first thawed
on ice, then vortexed and centrifuged at 13,000 × g. A total of 10 μL
of each plasma sample was loaded onto the center of
the filter on the upper 96-well plate and dried in a stream of
nitrogen. Subsequently, phenyl-isothiocyanate (PITC) was added for the
derivatization of all amine-containing metabolites. After incubation,
the filter spots were dried again using an evaporator. The metabolites
were then extracted by adding an ammonium acetate/methanol mixture
(5 mM ammonium acetate dissolved in 300 μL methanol). Then, the
extracts were centrifuged into the lower 96-deep-well plate and diluted
with the MS running solvent before injection into the LC-MS system.
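The plate arithmetic described above (1 blank + 3 zero samples + 7 calibrators + 3 QCs = 14 reserved wells, leaving 82 of 96 wells for study samples) can be sketched as a simple well map. The row-major fill order (A1 to A12, then B1 onward) and the role labels are assumptions for illustration; the assay's actual well layout is not specified here.

```python
ROWS = "ABCDEFGH"
RESERVED = ["blank"] + ["zero"] * 3 + ["calibrator"] * 7 + ["QC"] * 3  # 14 wells


def plate_layout(n_samples: int) -> dict:
    """Map well IDs to roles: 14 reserved wells first, then study samples."""
    wells = [f"{row}{col}" for row in ROWS for col in range(1, 13)]  # A1..H12
    capacity = len(wells) - len(RESERVED)
    if n_samples > capacity:
        raise ValueError(f"at most {capacity} samples fit on one plate")
    roles = RESERVED + ["sample"] * n_samples
    return dict(zip(wells, roles))  # wells beyond the last sample stay unassigned


layout = plate_layout(82)
print(layout["A1"], layout["A12"], layout["B3"], len(layout))
```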
For organic acid analysis, 150 μL of ice-cold methanol and 10 μL of the
isotopically labeled ISTD mixture were added to 50 μL of each plasma
sample for overnight protein precipitation (at 20°C). After the
precipitation step was complete, each sample was centrifuged at 13,000 × g
for 20 min at 4°C; 50 μL of each supernatant was loaded onto the
center of a selected well of the 96-deep well-plate, followed by the
addition of 3-nitrophenylhydrazine (3-NPH), which serves as an
organic acid-specific derivatization reagent. After incubation for 2 h,
10 mg of butylated hydroxytoluene (BHT), which is used as a stabilizer,
and 50 μL water were added before LC-MS injection.
Mass spectrometric analysis was performed on an ABSciex 5500 Qtrap
tandem mass spectrometry instrument (Applied Biosystems/MDS Analytical
Technologies, Foster City, CA) equipped with an Agilent 1290 series
UHPLC system (Agilent Technologies, Palo Alto, CA) with an Agilent
Zorbax C18 column. The samples were delivered to the mass spectrometer
by an LC method followed by a direct injection (DI) method. Data
analysis was done using Analyst 1.6.2 (Applied Biosystems/MDS
Analytical Technologies, Foster City, CA).
Quantification and statistical analysis
Data processing and statistical analysis
For proteomic, metabolomic, and cytokine profiling, features (i.e.,
molecules) with >50% missing values were removed from the dataset. The
remaining values below the lower limit of detection (LLOD) were imputed
using the minimum value of each feature as previously described
([212]Table S9).[213]^20^,[214]^22 Subsequently, the data were
normalized by applying a logarithmic (base 10) transformation.
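A minimal NumPy sketch of these preprocessing steps follows. It assumes that missing and below-LLOD values are both stored as NaN and that the 50% threshold is applied per feature across all samples; the function name `preprocess` is hypothetical and does not reproduce MetaboAnalyst's own routines.

```python
import numpy as np


def preprocess(X, max_missing=0.5):
    """Filter, impute, and log10-transform a samples x features matrix.

    Missing or below-LLOD entries are encoded as NaN (an assumption). Features
    with more than `max_missing` NaNs are dropped; remaining NaNs are replaced
    by that feature's minimum observed value; all values must be positive.
    """
    X = np.asarray(X, dtype=float)
    keep = np.isnan(X).mean(axis=0) <= max_missing
    kept = X[:, keep]
    col_min = np.nanmin(kept, axis=0)
    imputed = np.where(np.isnan(kept), col_min, kept)  # broadcast per column
    return np.log10(imputed), keep


X = np.array([[1.0, np.nan, 10.0],
              [np.nan, np.nan, 100.0],
              [4.0, 5.0, np.nan]])
Z, keep = preprocess(X)
print(keep)
print(Z)
```

Here the second feature (two of three values missing) is dropped, and each remaining gap is filled with its feature's minimum before the log10 transform.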
Principal component analysis (PCA) was performed using the corrected
features (i.e., molecules) with MetaboAnalyst 5.0
([215]www.metaboanalyst.ca).[216]^72 Mann-Whitney U and Kruskal-Wallis
tests were used for non-normally distributed data, while Student’s t
test and ANOVA were used for parametric comparisons. p values were
adjusted for multiple testing using the Benjamini and Hochberg false
discovery rate (FDR) correction. Analyses of differentially expressed
cytokines, proteins, and metabolites using significance filtering
criteria (p value <0.05 and |fold-change| >1.5) were performed using
MetaboAnalyst 5.0. Binary logistic regression was used to assess the
association between self-reported PASC symptoms and differentially
expressed molecules (DEMs) between
convalescence and healthy controls, as well as changes between
convalescence and acute COVID-19, adjusting for age, gender, diabetes,
acute COVID-19 management (dexamethasone, antibiotics, tocilizumab,
remdesivir), WHO Ordinal Scale, and vaccination status prior to
follow-up blood collection. Additionally, association with
health-related quality-of-life scores (SF-12 and EQ-VAS) was assessed
using multiple linear regression analysis, adjusting for the same
clinical variables. Correlations between molecules and clinical
laboratory values in acute COVID-19 were assessed using the Spearman
correlation coefficient. Box and bar plots were made using Origin,
Version 2022b (OriginLab Corporation, Northampton, MA, USA).
Statistical analyses were performed using R 4.2.3 (Vienna, Austria).
Unsupervised clustering and machine learning panel analyses were
performed in Python 3.8.16.
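The Benjamini–Hochberg step and the combined significance filter can be sketched in NumPy. The text does not state whether the 0.05 threshold was applied to raw or FDR-adjusted p values, so applying it to the adjusted values here is an assumption, as is reading |fold-change| > 1.5 as a ratio above 1.5 or below 1/1.5. The p values and fold changes in the example are made up.

```python
import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p value downward, then cap at 1.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q


def significant(fold_change, q, alpha=0.05, fc_cutoff=1.5):
    """Flag features with q < alpha and fold change > cutoff or < 1/cutoff."""
    fc = np.asarray(fold_change, dtype=float)
    changed = (fc > fc_cutoff) | (fc < 1.0 / fc_cutoff)
    return (np.asarray(q) < alpha) & changed


q = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(q)
print(significant([2.0, 1.2, 0.5, 0.6], q))
```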
Pathway analysis
Differentially expressed cytokines, proteins, and metabolites were
analyzed using the Ingenuity pathway analysis (IPA) system (Qiagen) to
identify the most relevant pathways with significance levels based on
the right-tailed Fisher’s exact test.[217]^77 Networks between
molecules were generated using an algorithm that assigns scores based
on a hypergeometric distribution for each network. Gene Ontology
(GO) term enrichment for cytokines and proteins was performed using the
Metascape platform.[218]^73 Additionally, metabolomic pathways were
enriched using the Pathway Analysis module on MetaboAnalyst 5.0.
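IPA's scoring is proprietary, but the right-tailed Fisher's exact test it is described as using can be illustrated with SciPy. The counts below are hypothetical, and reporting −log10(p) follows a common convention for pathway plots rather than a documented detail of this study. The comparison with `hypergeom.sf` shows that the right-tailed test is the upper tail of a hypergeometric distribution, the same distribution the network scoring above is described as using.

```python
import math

from scipy.stats import fisher_exact, hypergeom


def enrichment_p(k, n, K, N):
    """Right-tailed Fisher's exact test for over-representation.

    k: input molecules annotated to the pathway
    n: input molecules in total (e.g., differentially expressed molecules)
    K: background molecules annotated to the pathway
    N: background molecules in total
    """
    table = [[k, n - k], [K - k, N - K - n + k]]
    _, p = fisher_exact(table, alternative="greater")
    return p


p_fisher = enrichment_p(k=8, n=30, K=40, N=500)
# The same tail probability from the hypergeometric distribution: P(X >= k).
p_hyper = hypergeom.sf(8 - 1, 500, 40, 30)
print(p_fisher, p_hyper, -math.log10(p_fisher))
```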
Unsupervised clustering
Clustering was performed to capture similarities between patients at
the molecular level, independently of self-reported PASC severity.
The k-means algorithm, which partitions samples so as to minimize the
variance within each cluster, was used for unsupervised clustering.
The algorithm generates k cluster centroids and assigns each sample to
the cluster with the nearest centroid. This can
be written as an optimization problem:
$$\underset{S}{\arg\min}\sum_{i=1}^{k}\sum_{x\in S_i}\lVert x-\mu_i\rVert_2^2$$
Here, $S_i$ represents the set of data points in cluster $i$, and $\mu_i$
is the centroid of cluster $i$, computed as the mean of the elements in
the $i$th cluster. During each iteration of the algorithm, the elements
in $S_i$ are updated, which in turn changes the value of $\mu_i$,
until convergence is reached. Our dataset consists of a greater number
of variables than samples, which presents a challenge for clustering
since the samples are widely scattered in the high-dimensional variable
space. Therefore, a dimensionality-reduction approach was first used to
decrease the number of variables before clustering the data. We
performed PCA at a variance cut-off of 95%, which resulted in 81
principal components for the differential measurements between
convalescence and acute phases. Since the reduction in the number of
variables was insufficient, a non-linear dimensionality reduction was
performed using an autoencoder. Autoencoders (AEs) are a class of
artificial neural networks in which the architecture forces the data
through a low-dimensional bottleneck layer, yielding a lower-dimensional
projection of the data. Since the activation of each layer in the AE is
non-linear, this projection is a non-linear combination of the original
variables. The autoencoder consisted of three encoding layers with 100,
70, and 50 neurons, respectively, a bottleneck layer of 30 neurons, and
three decoding layers, with all layers using sigmoid activation
functions on their outputs. Through this approach, the dimensionality of
the data was reduced to 30 features for the acute and convalescent
samples. The relative significance of each of the original variables in
the encoded dimensions can be determined by performing a saliency
analysis. Subsequently, the k-means algorithm was applied to the encoded
data with k = 3 clusters.
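The clustering pipeline described above (PCA retaining 95% of the variance, a sigmoid autoencoder with a 30-unit bottleneck, then k-means with k = 3) can be approximated with scikit-learn. This is a sketch under stated assumptions, not the study's code: scikit-learn's `MLPRegressor` stands in for the autoencoder, the decoder widths (50, 70, 100) mirror the encoder because the text gives no decoder sizes, its output layer is linear rather than sigmoid, and the encoded features are read out with a manual forward pass through the first four weight matrices. The helper names `encode` and `cluster_patients`, and the random input data, are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

ENCODER = (100, 70, 50, 30)        # three encoding layers plus the bottleneck
DECODER = (50, 70, 100)            # assumed mirror image of the encoder


def encode(ae: MLPRegressor, X: np.ndarray) -> np.ndarray:
    """Forward pass through the encoder half of a fitted MLPRegressor."""
    h = X
    for W, b in zip(ae.coefs_[:len(ENCODER)], ae.intercepts_[:len(ENCODER)]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))   # logistic (sigmoid) activation
    return h


def cluster_patients(delta: np.ndarray, k: int = 3, seed: int = 0):
    """delta: samples x molecules matrix of convalescent-minus-acute changes."""
    pcs = PCA(n_components=0.95, svd_solver="full").fit_transform(delta)
    ae = MLPRegressor(hidden_layer_sizes=ENCODER + DECODER, activation="logistic",
                      max_iter=500, random_state=seed)
    ae.fit(pcs, pcs)                             # train to reconstruct its input
    codes = encode(ae, pcs)                      # 30-dimensional representation
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(codes)
    return labels, codes


rng = np.random.default_rng(0)
labels, codes = cluster_patients(rng.normal(size=(40, 150)))
print(codes.shape, np.bincount(labels))
```

Reading the bottleneck activations directly from `coefs_` and `intercepts_` avoids a deep-learning dependency; a dedicated framework would be needed to reproduce a sigmoid output layer exactly.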
Machine learning and predictive model
The goal of feature selection is to determine a minimal set of
features (independent variables) that contains the most information
needed to predict the response variable. This is a supervised learning
problem; in the context of this study, it was used to determine the
minimal number of biomolecules required to predict adverse clinical
outcomes. This is a binary classification problem in which the response
variable $y \in \{0, 1\}$.
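A linear classifier was used to classify the response. The mathematical
formulation of this problem is given by:
$$\hat{Y} = XW + w_0$$
$$\hat{Y}=\begin{bmatrix}\hat{y}_1\\ \hat{y}_2\\ \vdots\\ \hat{y}_N\end{bmatrix}_{N\times 1},\quad X=\begin{bmatrix}\log(a_{11}) & \cdots & \log(a_{1M})\\ \vdots & \ddots & \vdots\\ \log(a_{N1}) & \cdots & \log(a_{NM})\end{bmatrix}_{N\times M},\quad W=\begin{bmatrix}w_1\\ w_2\\ \vdots\\ w_M\end{bmatrix}_{M\times 1}$$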
where $W$ is the weight matrix that linearly combines the data matrix
$X$, consisting of $N$ samples with $M$ features or variables, and
$a_{ij}$ is the measured level of feature $j$ in sample $i$. The weight
matrix is ‘learned’ from the data by minimising the loss function given
by:
$$\mathrm{Loss}=\frac{1}{N}\sum_{i=1}^{N}-\left[y_i\log\left(P(\hat{y}_i)\right)+\left(1-y_i\right)\log\left(1-P(\hat{y}_i)\right)\right]$$
where $X_i$ is the $i$th row of $X$ and
$$P(\hat{y}_i)=\frac{1}{1+e^{-\left(X_iW+w_0\right)}}$$
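The probability and loss equations above translate directly into NumPy; the sketch evaluates the predicted probability and the mean cross-entropy loss for a given weight vector. The abundances in the example are made up, and natural logarithms are assumed both for log(a_ij) and in the loss.

```python
import numpy as np


def predict_proba(X: np.ndarray, W: np.ndarray, w0: float) -> np.ndarray:
    """P(y_hat_i) = 1 / (1 + exp(-(X_i W + w0))) for every row X_i of X."""
    return 1.0 / (1.0 + np.exp(-(X @ W + w0)))


def cross_entropy_loss(y: np.ndarray, p: np.ndarray, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; p is clipped away from 0 and 1 to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


# With all-zero weights every prediction is 0.5, so the loss equals log(2).
X = np.log(np.array([[2.0, 5.0], [3.0, 1.0], [4.0, 8.0]]))   # X = log(a_ij)
y = np.array([1.0, 0.0, 1.0])
loss = cross_entropy_loss(y, predict_proba(X, np.zeros(2), 0.0))
print(loss)
```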
The absolute value of each weight $w_j$ indicates the contribution of
feature $j$ to the classification. A 5-fold cross-validation was
performed to check for overfitting.
This classification problem was repeated on datasets containing all
biomolecules or each molecule type (i.e., cytokines, proteins,
metabolites). Receiver operating characteristic (ROC) curves were
generated for each dataset to assess its ability to predict clinical
outcomes. Subsequently, a minimal feature set was determined using a
greedy search algorithm known as sequential (forward) feature selection.
In this method, features are introduced to the classifier one at a time,
and the resulting improvement in predictive capability is monitored.
This procedure is repeated until the selected combination of features
provides the best classification performance.
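A sketch of the sequential (forward) selection loop follows. It assumes scikit-learn's `LogisticRegression` as the linear classifier, stratified 5-fold cross-validation, and mean ROC AUC as the performance criterion; the text does not state which metric drove the search. Note that scikit-learn applies L2 regularization by default, unlike the unpenalized loss written above. The function `forward_select` and the synthetic data are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def forward_select(X, y, max_features=20, n_splits=5, seed=0):
    """Greedily add the column of X that most improves cross-validated ROC AUC.

    Stops when no remaining column improves the score or max_features is reached.
    Returns the selected column indices (in order of selection) and the final AUC.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = LogisticRegression(max_iter=1000)
    selected, best_auc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        trials = [
            (cross_val_score(model, X[:, selected + [j]], y, cv=cv,
                             scoring="roc_auc").mean(), j)
            for j in remaining
        ]
        auc, j = max(trials)
        if auc <= best_auc:
            break                     # no candidate improves the current panel
        selected.append(j)
        remaining.remove(j)
        best_auc = auc
    return selected, best_auc


rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10))                 # 10 synthetic "molecules"
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # outcome driven by columns 0 and 1
panel, auc = forward_select(X, y, max_features=5)
print(panel, round(auc, 3))
```

Because the folds are fixed by `random_state`, every candidate panel is scored on the same splits, so differences in AUC reflect the features rather than the resampling.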
Acknowledgments