Graphical abstract (figure fx1.jpg)

Highlights

• Sequential multi-omics profiling of plasma during acute infection and convalescence
• Inflammation, platelet degranulation, and metabolic perturbations at convalescence
• Three distinct disease phenotypes based on unsupervised clustering of omics profile
• A panel of 20 cytokines and metabolites predicted adverse outcomes after discharge

Wang et al. conduct a comprehensive multi-omics analysis to identify pathways differentially altered during acute SARS-CoV-2 infection and convalescence. This study provides clues into the heterogeneity of the post-acute COVID-19 symptoms and unveils potential therapeutic targets for long COVID.

Introduction

An estimated 10%–30% of individuals convalescing from SARS-CoV-2 infection continue to experience post-acute sequelae of COVID-19 (PASC) or long COVID, characterized by fatigue, sleep disturbance, confusion, and dyspnea, alongside many other debilitating symptoms resulting in significant impairments in their quality of life.[48]^1^,[49]^2^,[50]^3 The cellular receptor of SARS-CoV-2, angiotensin-converting enzyme 2 (ACE2), is ubiquitously expressed, thus facilitating multisystemic manifestations of acute COVID-19 and long COVID.[51]^4^,[52]^5 Epidemiological studies have found that the risk of developing new diagnoses of pulmonary, cardiovascular, gastrointestinal, metabolic, psychiatric, and nervous system disorders was greatly elevated, associated with higher hospitalization rates and worse prognoses at 6 months post infection.[53]^6^,[54]^7^,[55]^8 Although female sex, pre-existing comorbidities, and severity of the acute infection have been proposed as risk factors for PASC, the underlying cause behind such heterogeneity in disease sequelae is not yet understood.[56]^2^,[57]^9 Several pathophysiological mechanisms for PASC have been proposed, including the presence of
viral reservoirs, persistent inflammation, induced autoimmunity, tissue injury, endothelial dysfunction, or impaired energy metabolism.[58]^10^,[59]^11^,[60]^12 Additionally, a triad of cytokines (interleukin-1 beta [IL-1β], interleukin-6 [IL-6], and tumor necrosis factor [TNF]) correlated with ongoing PASC 8 months post infection.[61]^13 Similarly, patients with PASC had sustained inflammatory responses reflected through elevation in type I (IFN-β) and type III (IFN-λ1) interferon levels accompanied by persistent activation of monocytes and plasmacytoid dendritic cells.[62]^14 Nevertheless, given the complex overlapping pathophysiology between acute and long COVID phases, it remains a significant challenge to delineate specific molecular features underpinning PASC development to guide the discovery of prognostic biomarkers and therapeutic targets.

In this study, we leveraged a systems-based multi-omics approach to extensively characterize and contrast the plasma cytokines, proteome, and metabolome of 117 individuals during acute infection and at the 6-month convalescence phase compared to non-infected healthy controls. Additionally, we performed unsupervised clustering analyses for unbiased disease phenotyping accompanied by machine learning of integrated clinical data to identify predictive biomarkers associated with adverse outcomes following acute infection. Importantly, our study revealed several therapeutic pathways that could be explored to minimize the impact of long COVID.

Results

Characteristics of participants

We examined 117 individuals prospectively enrolled from designated COVID-19 wards (n = 97) and intensive care units (n = 20) between October 15, 2020, and June 29, 2021, with repeat blood sampling performed over a median duration of 6.3 (interquartile range [IQR]: 6.0–7.1) months ([63]Figure 1A).
During the repeat sampling, patients’ PASC symptomology, health-related quality-of-life scores, and clinical outcomes were captured using self-reported questionnaires and a detailed review of their electronic medical records ([64]Table S1). The most frequently reported PASC symptoms were fatigue (66 individuals; 56.4%), general weakness (49 individuals; 41.9%), shortness of breath (SOB, 47 individuals; 40.2%), cognitive impairment (39 individuals; 33.3%), and mood disturbance (39 individuals, 33.3%, [65]Figure 1B). For comparison purposes, we classified individuals into three PASC severity groups based on their symptom burden from recovered (no PASC symptoms, n = 30), mild (≦3 symptoms, n = 32), to severe (>3 symptoms, n = 55) categories ([66]Table 1; [67]Table S2). The overall cohort was characterized by a median age of 62 (IQR: 53–73) years, with 66 men (56.4%) and a high prevalence of diabetes (49 individuals; 41.9%) and hypertension (62 individuals; 53.0%). Clinical characteristics were similar among the PASC severity groups except for smoking, where current or previous smokers were overrepresented in the severe category (31 individuals; 56.4%, p = 0.005). In contrast, the SF-12 score (113 [IQR: 109–115] vs. 96 [IQR: 87–106] vs. 80 [IQR: 69–90] for recovered, mild, and severe groups, respectively, p < 0.001) and EuroQol visual analog scale (90 [IQR: 85–92] vs. 83 [IQR: 75–90] vs. 63 [IQR: 50–76], p < 0.001) tracked closely with PASC severity ([68]Figure 1C). Furthermore, a greater proportion of individuals who experienced an adverse outcome (all-cause mortality or re-hospitalization) following discharge from acute infection were in the severe PASC category (p = 0.03, [69]Figure 1D; [70]Table S2). Therefore, our results reveal that as patients transition from acute infection to convalescence, they retain a persistently high symptom burden and risk for adverse outcomes. Figure 1. 
Post-acute sequelae of COVID-19 (PASC) symptomatology and health-related quality of life of participants
(A) Overview of study design and analysis. Figure was created using [73]BioRender.com.
(B) PASC symptom prevalence at convalescence; bars represent self-reported symptoms in percentages.
(C) Total SF-12 score and EuroQol visual analog scale (EQ-VAS) among PASC severity groups during convalescence. Asterisks indicate statistical significance by Mann-Whitney U test with Benjamini-Hochberg correction between the severity groups for each quality-of-life measure as follows: ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.
(D) Association of PASC severity with adverse clinical outcomes (composite of all-cause mortality and re-hospitalization) following discharge from acute infection. Log rank p = 0.03.

Table 1. Baseline clinical characteristics

Characteristic | Entire cohort (n = 117) | Recovered (n = 30) | Mild (n = 32) | Severe (n = 55) | Healthy controls (n = 28)
Demographics
Age (years) | 62 (53–73) | 64 (53–75) | 62 (44–73) | 62 (57–73) | 55 (52–59)
Male | 66 (56.4) | 19 (63.3) | 20 (62.5) | 27 (49.1) | 16 (57.1)
BMI (kg/m^2) | 28.2 (24.3–34.1) | 27.2 (24.5–32.4) | 26.6 (24.3–30.6) | 29.9 (23.9–37.4) | –
Race or ethnic group
White | 81 (69.2) | 19 (63.3) | 19 (59.4) | 43 (78.2) | –
Asian | 19 (16.2) | 4 (13.3) | 7 (21.9) | 8 (14.5) | –
Hispanic | 6 (5.1) | 4 (13.3) | 1 (3.1) | 1 (1.8) | –
Mixed or unknown | 11 (9.4) | 3 (10.0) | 5 (15.6) | 3 (5.5) | –
Current or previous smoker | 48 (41.0) | 8 (26.7) | 9 (28.1) | 31 (56.4) | –
Presentation
Fever | 47 (40.2) | 13 (43.3) | 15 (46.9) | 19 (34.5) | –
Myalgia | 34 (29.1) | 7 (23.3) | 12 (37.5) | 15 (27.3) | –
Cough | 74 (63.2) | 22 (73.3) | 20 (62.5) | 32 (58.2) | –
Dyspnea | 81 (69.2) | 23 (76.7) | 22 (68.8) | 36 (65.5) | –
Diarrhea/nausea | 45 (38.5) | 10 (33.3) | 13 (40.6) | 22 (40.0) | –
Abnormal CXR | 87 (74.4) | 22 (73.3) | 23 (71.9) | 42 (76.4) | –
Medical history
Diabetes | 49 (41.9) | 11 (36.7) | 10 (31.3) | 28 (50.9) | –
Hypertension | 62 (53.0) | 16 (53.3) | 14 (43.8) | 32 (58.2) | –
COPD | 20 (17.1) | 3 (10.0) | 6 (18.8) | 11 (20.0) | –
CKD | 16 (13.7) | 2 (6.7) | 3 (9.4) | 11 (20.0) | –
CVD | 17 (14.5) | 3 (10.0) | 4 (12.5) | 10 (18.2) | –
Cancer | 15 (12.8) | 3 (10.0) | 1 (3.1) | 11 (20.0) | –
Management
Supplemental O2 | 98 (83.8) | 27 (90.0) | 22 (68.8) | 49 (89.1) | –
Intubation | 20 (17.1) | 5 (16.7) | 3 (9.4) | 12 (21.8) | –
Dexamethasone | 102 (87.2) | 28 (93.3) | 26 (81.3) | 48 (87.3) | –
Antibiotic | 88 (75.2) | 23 (76.7) | 23 (71.9) | 42 (76.4) | –
Tocilizumab | 17 (14.5) | 3 (10.0) | 6 (18.8) | 8 (14.5) | –
Remdesivir | 6 (5.1) | 1 (3.3) | 1 (3.1) | 5 (9.1) | –
SARS-CoV-2 vaccine | 6 (5.1) | 1 (3.3) | 3 (9.4) | 2 (3.6) | –
SARS-CoV-2 strain
Original | 85 (72.6) | 22 (73.3) | 24 (75.0) | 39 (70.9) | –
B.1.1.7 | 25 (21.4) | 6 (20.0) | 6 (18.8) | 13 (23.6) | –
P.1 | 5 (4.3) | 2 (6.7) | 1 (3.1) | 2 (3.6) | –
B.1.351 | 1 (0.9) | – | 1 (3.1) | – | –
B.1.617.2 | 1 (0.9) | – | – | 1 (1.8) | –
Outcomes
Adverse outcome | 36 (30.8) | 5 (16.7) | 8 (25.0) | 23 (41.8) | –

Abbreviations: BMI, body mass index; CXR, chest X-ray; COPD, chronic obstructive pulmonary disease; CKD, chronic kidney disease; CVD, cardiovascular disease (includes previous history of myocardial infarction, coronary artery disease, heart failure, atrial and ventricular arrhythmia).
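The symptom-count rule behind the three PASC severity groups in Table 1 (no symptoms, up to three symptoms, more than three symptoms) can be written as a short function. The following is a minimal sketch rather than the authors' code; the function names `pasc_severity` and `group_sizes` are hypothetical and introduced only for illustration.

```python
from collections import Counter


def pasc_severity(symptom_count: int) -> str:
    """Map a self-reported PASC symptom count to the severity group
    described in the text: 0 -> recovered, 1-3 -> mild, >3 -> severe."""
    if symptom_count < 0:
        raise ValueError("symptom_count must be non-negative")
    if symptom_count == 0:
        return "recovered"
    if symptom_count <= 3:
        return "mild"
    return "severe"


def group_sizes(symptom_counts):
    """Count how many individuals fall into each severity group."""
    return Counter(pasc_severity(n) for n in symptom_counts)


# Hypothetical symptom counts for six individuals (not study data).
example_counts = [0, 1, 3, 4, 7, 0]
sizes = group_sizes(example_counts)
```

Applied to a full list of per-participant symptom counts, `group_sizes` would return the kind of group totals reported in the column headers of Table 1.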
Temporal changes in cytokines, proteome, and metabolome between acute infection and convalescence

Cytokine profiling, proteomics, and metabolomics analyses were performed to identify the temporal changes in plasma molecular features between acute infection and convalescence. A total of 47 cytokines, 274 proteins, and 635 metabolites were measured ([75]Table S2). Subsequently, principal component analysis (PCA) was performed for all samples, which showed altered molecular profiles of individuals at convalescence compared to the acute infectious phase, both of which differed from age- and gender-matched healthy controls ([76]Figure 2A). In comparing acute infection and healthy controls, 231 molecules were significantly altered, consisting of 24 cytokines, 63 proteins, and 144 metabolites ([77]Figure 2B; [78]Table S3). Moreover, we detected 157 differentially expressed molecules (DEMs) between the acute infection and convalescence phase, composed of eight cytokines, 34 proteins, and 115 metabolites ([79]Figure 2C; [80]Table S3). Finally, 219 DEMs were identified between convalescence and healthy controls, which included nine cytokines, 31 proteins, and 169 metabolites ([81]Figure 2D; [82]Table S3).

Figure 2.
Differentially expressed molecules (DEMs) associated with acute and convalescence phase
(A) Principal component analysis (PCA) utilizing proteomics, metabolomics, and cytokines (log10 transformed) indicates that principal components 1 and 2 capture 30.2% and 7.5% of the variance between participants, respectively.
(B) Volcano plot comparing the DEMs between acute infection and healthy control. The horizontal dashed line indicates the adjusted p value cutoff (0.05), and the two vertical dashed lines indicate the fold-change cutoff (1.5). Orange dots indicate differentially expressed cytokines, purple dots indicate differentially expressed metabolites, and green dots indicate differentially expressed proteins.
(C) Volcano plot comparing the DEMs between convalescence and acute infection.
(D) Volcano plot comparing the DEMs between convalescence and healthy control.
(E) Heatmap showing the top 100 molecules with the most significant p values comparing healthy control with acute and convalescence phases using ANOVA test on log10 transformed data.
(F) Box and whisker plots illustrating the levels of thrombospondin-1, glutamine, serotonin, and sCD40L. Asterisks indicate statistical significance by Mann-Whitney U test with Benjamini-Hochberg correction between groups for each molecule as follows: ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.

The top 100 molecules with the most significant p values are shown in a heatmap illustrating the trajectory of their change in individuals at the acute and convalescence phases relative to healthy controls ([85]Figure 2E). A substantial proportion of molecules returned during convalescence to plasma levels comparable to those of healthy controls. However, some molecules remained altered and may be implicated in PASC development. Specifically, levels of thrombospondin-1, an important activator of TGF-β, which is a central mediator of wound healing, angiogenesis, and tissue fibrosis, increased progressively in samples from healthy controls to acute and convalescence phases ([86]Figure 2F).[87]^15 In contrast, glutamine levels remained depressed in acute and convalescence phases compared to healthy controls, representing a metabolic characteristic of COVID-19 that is associated with greater disease severity.[88]^16 Moreover, serotonin levels remained elevated in acute and convalescence phases, with even higher levels in convalescence, possibly related to persistent activation of platelet degranulation.
Likewise, the levels of soluble CD40 ligand were elevated in acute infection, and its plasma levels continued to rise during convalescence ([89]Figure 2F), which may reflect ongoing viral-mediated inflammation as robust T cell responses have been demonstrated several months post SARS-CoV-2 infection.[90]^17^,[91]^18 Therefore, our results reveal persistent alterations in multiple pathways illustrative of the pathological signatures during convalescence with a dominant pattern of metabolic changes.

Differential pathways altered during acute infection and convalescence

Integrated canonical pathway analyses of the DEMs between acute infection and healthy controls identified pathways associated with stimulation of immune cells and activation of IL-1, IL-6, TNF, and toll-like receptor 3 signaling ([92]Figures 3A and 3B; [93]Table S4). Notably, proteins involved in the platelet degranulation, acute phase response, and complement system cascades strongly correlated with immune cell counts and laboratory disease markers during acute infection ([94]Figure S1; [95]Table S5). Levels of P-selectin, thrombospondin-1, fibronectin, and coagulation factor XIII were positively correlated with platelet and lymphocyte counts, which supports the close interplay between platelets and adaptive immunity in acute SARS-CoV-2 infection.[96]^19 Additionally, C-reactive protein and lipopolysaccharide-binding proteins were positively correlated with high-sensitivity troponin, representing a potential signature of inflammation-induced cardiac injury. Moreover, metabolic pathways altered during acute infection included arginine biosynthesis, glutamate metabolism, and sphingolipid metabolism, in line with findings from prior multi-omics studies ([97]Figure 3C; [98]Table S6).[99]^20^,[100]^21^,[101]^22

Figure 3.
Pathways dysregulated during acute infection and convalescence
(A) Enriched Gene Ontology (GO) terms of differentially expressed proteins and cytokines on Metascape for acute COVID-19 compared to healthy controls, colored based on p values.
(B) Top regulatory effects of molecules and functions in acute COVID-19 based on Ingenuity Pathway Analysis (IPA).
(C) Pathways associated with metabolic alterations in acute COVID-19 compared to healthy controls. Pathway impact indicates the sum of importance of the altered metabolites in the impacted pathway based on pathway topology; the −log(P) are test statistics for quantitative pathway enrichment analysis based on concentration differences between groups. Notable impacted pathways are above the dashed lines (impact >0.2 and −log(P) > 20).
(D) Enriched GO terms of differentially expressed proteins and cytokines on Metascape for convalescence phase compared to healthy controls, colored based on p values.
(E) Top regulatory effects of molecules and functions during convalescence based on Ingenuity Pathway Analysis (IPA).
(F) Pathways associated with metabolic alterations during convalescence compared to healthy controls.

To elucidate the pathogenesis and identify therapeutic targets for long COVID, we next investigated the molecular changes that took place 6 months following acute infection. Our findings illustrate a pronounced and persistent immune activation compared to healthy individuals characterized by acute phase response, IL-1, TNF, and IL-6 pathways resembling the acute infectious phase ([104]Figures 3D and 3E; [105]Table S4). In addition, we found dysregulation of several key metabolites during convalescence central to glucose metabolism, including 2-oxoglutaric acid, ornithine, spermidine, and allantoin, implicated in the activation of cellular pathways such as sirtuin 6 and glucose-6-phosphate dehydrogenase ([106]Figure S2A).
In accordance with findings in patients during acute infection, we found persistent dysregulation in platelet degranulation and blood coagulation during convalescence. Additionally, the omics profile at convalescence was associated with the activation of cell migration and growth factor signaling pathways as evidenced by elevated levels of platelet endothelial cell adhesion molecule 1 (also known as CD31), thrombospondin-1, fibroblast growth factor 2, and vascular endothelial growth factor-A ([107]Figure S2B). Although the molecular signature during convalescence bears similarities to acute infection, there is a pronounced downregulation in the extent of acute phase response, IL-1, and IL-6 signaling, with upregulation of liver X receptor signaling, suggesting a shift toward resolution and repair ([108]Figure S3; [109]Table S4). Moreover, arginine biosynthesis, cysteine and methionine metabolism, and the TCA cycle were differentially affected between acute and convalescence phases ([110]Table S6). Metabolites involved in the TCA cycle, such as pyruvate, malate, cis-aconitate, and 2-oxoglutaric acid, were further elevated at the convalescence phase compared to acute infection. Strikingly, perturbation in these pathways was observed even in individuals (n = 30) reporting no PASC symptoms and return of function to pre-COVID-19 states, suggesting persistent underlying pathological processes despite symptom resolution ([111]Figure S4; [112]Table S3). As such, the ongoing dysregulation of inflammatory, cellular signaling, and metabolic pathways during convalescence may substantially increase the risk of future complications and healthcare utilization even in seemingly recovered individuals.

Association between proteome and metabolome signatures with clinical parameters

Identifying molecules associated with self-reported symptoms and quality-of-life indices can provide insights into PASC pathogenesis.
Therefore, we performed logistic regression analysis of DEMs at convalescence against self-reported PASC symptoms, adjusting for clinical parameters previously described to be associated with long COVID ([113]Table S5).[114]^23 Molecules with three or more significant associations (p < 0.05) were displayed in a heatmap ([115]Figure 4A). We observed a set of triglycerides that displayed a prominent negative association with nausea and fatigue but a positive association with tachycardia ([116]Figure 4A). Plasma cystatin C and neutrophil gelatinase-associated lipocalin, both markers of renal function, were positively associated with SOB, fatigue, nausea, and adverse outcomes, suggesting potential kidney involvement in certain individuals with PASC.[117]^24 Moreover, the gut-derived valeric acid was inversely associated with nausea, fatigue, muscle aches, and SOB. Reduced fecal concentrations of short-chain fatty acids such as valeric acid were observed beyond 30 days after disease resolution in patients with severe COVID-19 and were hypothesized to reflect prolonged SARS-CoV-2-mediated disruption in the gut microbiome.[118]^25 Notable metabolites whose levels negatively correlated with the SF-12 and EQ-VAS scores were 4-hydroxyproline and 2-hydroxyisobutyric acid ([119]Figure 4B). Of particular interest was the metabolite taurine, whose levels were negatively associated with symptoms of nausea, mood disturbance, cognitive impairment, SOB, general weakness, and adverse outcomes.
Furthermore, taurine and serotonin levels were positively correlated with quality-of-life scores (SF-12 and EQ-VAS), which is in agreement with the ability of these molecules to induce positive emotions and elevate mood ([120]Figures 4A and 4B).[121]^26^,[122]^27 Downregulation of glycerophospholipids, sphingolipids, phosphatidylcholines, and fatty acids has been demonstrated during acute SARS-CoV-2 infection, particularly in patients with severe disease.[123]^20^,[124]^28 When we examined the changes in molecules between convalescence and acute phases, recovery in lipid levels was associated with lower PASC symptom burden, including nausea, fatigue, and general weakness, alongside greater quality-of-life scores ([125]Figure S5). Thus, various molecular features involved in neurological, immunological, gastrointestinal, and metabolic processes are associated with symptoms and quality of life in long COVID.

Figure 4.
Association between molecular features with PASC symptoms and health-related quality of life during convalescence
(A) Heatmap of PASC symptoms and adverse outcome associated with differentially expressed molecules (DEMs) from multi-omics profile between convalescence and healthy controls adjusted for age, gender, diabetes, acute COVID-19 treatment (dexamethasone, antibiotics, tocilizumab, remdesivir), WHO Ordinal Scale, and vaccination status with at least three significant associations, p < 0.05.
(B) Heatmap based on significant associations between health-related quality-of-life indices (total SF-12 score and EQ-VAS scale) and DEMs from multi-omics profile between convalescence and healthy controls adjusted for age, gender, diabetes, acute COVID-19 treatment (dexamethasone, antibiotics, tocilizumab, remdesivir), WHO Ordinal Scale, and vaccination status.
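The selection rule for the heatmap in Figure 4A (molecules with at least three significant associations at p < 0.05) can be illustrated with a short filter. This is a sketch under stated assumptions, not the authors' pipeline: the function name `select_for_heatmap`, the molecule labels, and all p values below are hypothetical placeholders, and the adjusted regression that would produce the p values is not shown.

```python
def select_for_heatmap(associations, alpha=0.05, min_hits=3):
    """Keep molecules with at least `min_hits` outcomes whose p value is
    below `alpha`.

    `associations` maps molecule -> {outcome: p_value}. The result maps each
    retained molecule to the sorted list of its significant outcomes.
    """
    selected = {}
    for molecule, p_values in associations.items():
        hits = sorted(o for o, p in p_values.items() if p < alpha)
        if len(hits) >= min_hits:
            selected[molecule] = hits
    return selected


# Placeholder p values for illustration only (not results from the study).
example = {
    "molecule_a": {"nausea": 0.01, "fatigue": 0.03, "SOB": 0.04, "insomnia": 0.60},
    "molecule_b": {"nausea": 0.20, "fatigue": 0.04, "SOB": 0.50, "insomnia": 0.70},
    "molecule_c": {"nausea": 0.049, "fatigue": 0.05, "SOB": 0.001, "insomnia": 0.02},
}
kept = select_for_heatmap(example)
```

The comparison is strict, so a p value of exactly 0.05 does not count as a hit; whether the original analysis treated the boundary the same way is not stated in the text.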
Unsupervised clustering identifies three distinct phenotypes during convalescence

We next performed unsupervised clustering of individuals based on the changes in concentrations of molecules (cytokines, proteins, and metabolites) between acute and convalescence phases. First, we utilized PCA as a dimensionality reduction approach before applying the k-means algorithm ([128]Figure S6A). However, this reduction of variables did not yield adequate clustering efficacy as determined by the silhouette coefficient, so a non-linear approach using an autoencoder was applied instead ([129]Figure S6B). Applying k-means to the autoencoder-derived features yielded three phenotypically distinct clusters based on their inherent molecular similarities ([130]Table S7). Consistent with our PCA plot, the clusters identified based on molecular features did not align with the PASC severity categories (based on the number of symptoms), as individuals from each severity group were evenly distributed among clusters A to C ([131]Figure S6C). The largest group was cluster A (n = 57, 48.7%), which was characterized by the absence of significant deviation in molecular profile and had the fewest established PASC risk factors. In comparison, cluster B was characterized by a predominant triglyceride and organic acid signature ([132]Figure 5A), whereas cluster C exhibited a more heterogeneous composition of cytokines, proteins, and metabolites ([133]Figure 5B). Moreover, compared to cluster B, cluster C had a higher proportion of women and more frequently reported symptoms such as insomnia, palpitation, SOB, general weakness, and fatigue ([134]Figure 5C).
Interestingly, the top network identified based on molecules from cluster C was enriched in the HIF-1α pathway, which regulates the cellular response to hypoxia and metabolic adaptations while exhibiting sex differences in activation,[135]^29 thus providing a possible molecular basis for the gender variations observed in PASC epidemiology ([136]Figure 5D; [137]Table S4). Importantly, increased plasma levels of the gut microbiota-derived metabolites trimethylamine N-oxide (TMAO) and phenylacetylglutamine in cluster C are associated with worsening symptoms and adverse outcomes consistent with the association of persistent microbial dysbiosis and cardiovascular disease in patients with long COVID ([138]Figure 5E).[139]^30^,[140]^31^,[141]^32 As such, our unbiased approach using unsupervised clustering highlights three clinically distinct groups of patients with unique biomarkers and long COVID symptomatology.

Figure 5.
Unsupervised clustering based on temporal changes in molecular signatures between acute infection and convalescence
(A) Molecular features with ≧ 65 percent deviation in cluster B.
(B) Molecular features with ≧ 65 percent deviation in cluster C.
(C) Differences in clinical characteristics and PASC symptoms among identified disease clusters. p values were calculated using the Chi-squared test with Yates' continuity correction.
(D) Top network derived from the molecular signatures in cluster C using Ingenuity Pathway Analysis (Qiagen).
(E) Trimethylamine N-oxide (TMAO) and phenylacetylglutamine levels stratified by clinical outcomes (with event and event-free) and PASC severity (recovered, mild, and severe) compared to healthy controls. Asterisks indicate statistical significance by Mann-Whitney U test with Benjamini-Hochberg correction between groups for each molecule as follows: ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.
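The silhouette coefficient used in this section to judge clustering efficacy can be computed directly from a set of points and their cluster labels. The sketch below is a plain-Python illustration of the standard definition (mean of (b − a) / max(a, b) over all points, with Euclidean distance assumed); it is not the authors' implementation, and the toy coordinates are invented for demonstration.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    For each point, a = mean distance to the other members of its own
    cluster and b = smallest mean distance to the members of any other
    cluster. Points in singleton clusters are scored 0 by convention.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy 2-D data (invented): two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # labels cut across the groups
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate overlapping or misassigned clusters; comparing this score across candidate embeddings is one way to choose between a linear and a non-linear reduction.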
Biomarker signatures associated with adverse clinical outcomes during convalescence

Over a median duration of 17.4 (14.3–18.8) months after discharge, 36 individuals (30.8%) reached the composite outcome of all-cause mortality or re-hospitalization. To uncover molecules associated with adverse outcomes during convalescence, we stratified outcomes with a minimal panel of molecules based on the multi-omics profile at repeat sampling. The machine learning models were developed using multiplexed cytokines, proteins, and metabolites and were validated using linear classifiers through 5-fold validation to predict the incidence of adverse events following discharge from acute infection ([144]Figure 6A). Individuals were randomly split into a training cohort (90%) for variable selection and model development, while the remaining 10% were used as the validation cohort. From the training cohort, a minimum panel of 20 variables, comprising seven cytokines and 13 metabolites, was preferentially selected, reaching an area under the curve (AUC) of 0.96 ([145]Figure 6B; [146]Table S8). This minimum panel performed better in terms of accuracy (0.83 vs. 0.75), recall (1.00 vs. 0.40), and F1 score (0.83 vs. 0.57) compared to the combined multi-omics dataset but showed reduced precision (0.71 vs. 1.00, [147]Figure 6C). When we tested the model on the validation cohort, all those with an event were correctly predicted by the panel ([148]Figure 6D). However, two event-free survivors, both older women (69 and 77 years of age), were falsely predicted to experience an event. Network analysis showed that within our minimal prediction panel, there is a downregulation of spermidine and taurine metabolites, accompanied by a reduction of protective cytokines (including interleukin-22 [IL-22] and colony-stimulating factor 3 [CSF3]) and upregulation of pro-inflammatory cytokines (including interleukin-15 [IL-15]) with a concomitant increase in interleukin-10 (IL-10) ([149]Figure 6E).
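The relationship between the reported accuracy, precision, recall, and F1 score can be checked with a small worked example. The confusion-matrix counts below are not given in the text; they are assumed values for a validation cohort of 12 individuals (about 10% of 117) with five events, chosen because they reproduce the reported metrics after rounding. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# ASSUMED counts, inferred from the reported metrics rather than stated in the text.
# Minimal panel: every event detected, two event-free individuals flagged.
minimal_panel = classification_metrics(tp=5, fp=2, fn=0, tn=5)
# Combined multi-omics model: no false alarms, but three of five events missed.
combined_omics = classification_metrics(tp=2, fp=0, fn=3, tn=7)

for name, metrics in (("minimal panel", minimal_panel), ("combined omics", combined_omics)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

Under these assumed counts, the trade-off described in the text becomes concrete: the minimal panel gains recall by flagging two event-free individuals, which lowers its precision, whereas the combined model avoids false alarms at the cost of missing most events.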
Interestingly, interleukin-27 (IL-27), a multifunctional cytokine, was the most significant biomarker in the minimal prediction panel in determining adverse outcomes.

Figure 6.
Predictive biomarkers for adverse outcomes during convalescence
(A) Receiver operating characteristic curves of prediction models trained on each of the individual omics datasets, combined omics (cytokines, proteomics, and metabolomics), and the minimal panel using molecular profile at convalescence.
(B) The minimal panel consisted of seven cytokines and 13 metabolites selected with sequential feature extraction based on molecular profile at convalescence.
(C) Classifier performance metrics on the testing set using each individual omics dataset, combined omics, and the minimal panel.
(D) Prediction score plot demonstrates the minimal panel’s efficacy in classifying the testing set.
(E) Network of molecules included in the minimal panel based on Ingenuity Pathway Analysis (Qiagen).

Discussion

To elucidate the molecular shifts between acute SARS-CoV-2 infection and long COVID, we performed a sequential assessment of the plasma proteome, metabolome, and cytokines in 117 individuals during hospitalization from acute infection and at 6 months follow-up. We utilized machine learning algorithms to generate insights into PASC phenotypes based on changes in multi-omics signatures and developed a minimum panel of molecules associated with long-term clinical outcomes. Most participants were enrolled during the second and third wave of the pandemic in Canada, with the dominant circulating SARS-CoV-2 strains during these periods being the wild-type and B.1.1.7 variant.[152]^33 Even during repeat collection at 6.3 (IQR: 6.0–7.1) months, PASC remained a significant disease burden in our cohort, with only 30 of 117 participants (25.6%) reporting full resolution of symptoms.
Self-reported symptoms included fatigue (56.4%), general weakness (41.9%), SOB (40.2%), cognitive impairment (33.3%), and mood disturbance (33.3%), which is consistent with other findings at 6 months post infection among hospitalized patients.[153]^34^,[154]^35 Moreover, there was a progressive reduction in health-related quality of life assessed using the SF-12 score and EQ-VAS in accordance with PASC severity based on the number of self-reported symptoms. Accordingly, lower quality-of-life scores 6 months post infection have been linked to individuals having mobility issues, pain or discomfort, fatigue, and ICU admission.[155]^35^,[156]^36 Of particular concern is the persistently altered molecular signature in individuals reporting complete resolution of symptoms from the acute infectious phase, as this may confer an increased susceptibility to PASC following recurrent infection from SARS-CoV-2 variants or other insults.[157]^37 Consistent with the PASC symptom burden in our cohort, we observed persistent dysregulation in various biological pathways known to be implicated in SARS-CoV-2 pathogenesis more than 6 months post infection. Substantial immune activation and stress response were observed in convalescence with upregulation of cytokines such as IL-1β, IL-6, CXCL1, IL-7, IL-8, and IL-18, which is in agreement with the alterations in innate and adaptive immune cell populations seen in long COVID.[158]^38^,[159]^39 Moreover, cytokines and proteins involved in platelet degranulation and abnormal blood coagulation were further elevated during convalescence from acute infection. 
Accordingly, plasma from patients with PASC was characterized by an abundance of microthrombi with increased clotting cascade proteins and enhanced resistance toward fibrinolysis.[160]^40 Functional stimulation of platelets from COVID-19 survivors confirmed the state of hyperreactive platelets and increased granule secretion.[161]^41 These findings suggest that the hypercoagulable state observed during acute infection persists into convalescence, highlighting the potential utility of tailored antithrombotic agents in managing long COVID complications. Moreover, our data revealed a dynamic cellular state during convalescence characterized by cell activation, migration, proliferation, signaling, and interaction. Network analysis identified that activation of the epidermal growth factor (EGF) signaling pathway is central to these cellular processes, which can be induced by inflammation and cellular stress. Indeed, upregulation of EGF signaling leads to local TGF-β activation that facilitates barrier restoration in damaged vascular cells and pericyte differentiation into collagen-producing myofibroblasts.[162]^42 However, pulmonary fibrosis following SARS-CoV-2 infection has been linked to aberrant EGF activation in a subset of individuals, associated with reduced forced vital capacity and diffusing capacity.[163]^43^,[164]^44 Global metabolomic analyses uncovered three predominant pathways dysregulated between acute infection and convalescence, including arginine biosynthesis, cysteine and methionine metabolism, and the TCA cycle. 
Mitochondrial dysfunction and the inability to respond to increasing energy demands in peripheral monocytes and endothelial cells occur in acute SARS-CoV-2 infection.[165]^45^,[166]^46 Consistent with the hyperinflammatory response and cytokine storm, patients who required intubation during acute infection displayed elevated energy expenditure and hypermetabolic phenotypes.[167]^47 Thus, the persistent elevation of TCA cycle metabolites may reflect increased energy production to compensate for mitochondrial dysfunction and enhanced metabolic requirements from chronic inflammation and tissue repair, a feature that distinguishes PASC from chronic fatigue syndrome, which is characterized by a concerted hypometabolic state.[168]^48 These results are consistent with findings from acute infection based on plasma and exosome analysis insofar as cellular metabolic pathways were also markedly altered.[169]^28 Compared to the acute phase, there was a significant reduction in several amino acids of the methionine pathway (such as L-methionine, L-cystathionine, and alpha-aminobutyric acid) during convalescence. 
Methionine is an essential amino acid that participates functionally in synthesizing glutathione to alleviate oxidative stress and mediate crucial antioxidant effects.[170]^49 As TCA cycle activity and mitochondrial bioenergetics directly affect cellular energy availability, dysregulation in these processes may contribute to the high prevalence of fatigue and general weakness seen in long COVID.[171]^50 A similar process of oxidative stress and perturbations in carbohydrate metabolism have been implicated in the development of neurodegenerative disorders such as Alzheimer’s disease.[172]^51 Taken together with the prolonged detection of SARS-CoV-2 in the brain several months post infection, we speculate that oxidative damage may play a crucial role in mediating PASC-related brain fog, memory loss, mood disturbance, and signatures of advanced aging.[173]^52^,[174]^53 Utilizing the convalescence multi-omics profile from a training cohort of 105 individuals, we developed a minimal panel of seven cytokines and 13 metabolites that demonstrated good predictive value, reaching an AUC of 0.96 with 83% accuracy. The superior classification ability using cytokines and metabolites for outcomes in individuals during convalescence further strengthens the pathological role of dysregulated inflammatory and metabolic responses in long COVID. Notably, cytokines present in this panel were primarily related to the activation of IL-27 signaling, as reflected by the stimulation of IL-15 and IL-10 and the inhibition of G-CSF, MCP-3 (CCL7), and IL-22. 
In severe SARS-CoV-2 infection, T cell apoptosis and exhaustion were associated with overexpression of exhaustion markers such as PD-1 and TIM-3 on peripheral CD8^+ T cells.[175]^54 Similar upregulation of PD-1 and TIM-3 expression was seen in individuals with PASC symptoms 8 months post infection.[176]^14^,[177]^55 Interleukin-27 is a pleiotropic cytokine that induces the NFIL3 axis, leading to upregulation of TIM-3, PD-L1, and IL-10 expression, which can directly promote T cell exhaustion, resulting in an impaired ability to eliminate chronic viral infections effectively.[178]^56^,[179]^57 Therefore, targeting the upstream IL-27 signaling pathway to alleviate or reverse CD8^+ T cell exhaustion represents a plausible strategy to mitigate the adverse outcomes of long COVID. Additionally, IL-27 signaling has been implicated in metabolic reprogramming, specifically through the upregulation of UCP1, PPARα, and PGC-1α, resulting in increased energy expenditure and stimulation of thermogenesis.[180]^58 Other molecules in the panel involved in energy metabolism include 2-aminoadipic acid (an established predictor of diabetes), taurine, and acylcarnitines.[181]^59^,[182]^60^,[183]^61^,[184]^62 Spermidine is known for its antioxidant and anti-inflammatory effects and its ability to promote nitric oxide production to improve mitochondrial function and biogenesis, whereas asymmetric dimethylarginine elicits opposite effects.[185]^63^,[186]^64 Suppression of the taurine pathway during convalescence was associated with worse health-related quality of life and adverse outcomes. 
Interestingly, taurine has been shown to alleviate oxidative stress and promote beneficial metabolic effects while protecting the cardiovascular system.[187]^65^,[188]^66 Across various animal models, taurine administration improved strength, depressive behavior, memory, and other hallmarks of aging through attenuating cellular senescence, mitochondrial dysfunction, DNA damage, and chronic inflammation.[189]^67 However, the longitudinal safety and efficacy of taurine supplementation to alleviate PASC symptoms in humans remains to be determined. Collectively, these data indicate that persistently altered cellular bioenergetics and mitochondrial dysfunction constitute a significant risk factor for developing PASC that could be targeted to improve clinical outcomes. Given the lack of proven effective therapies for long COVID, our results point toward several potential avenues that may be explored in future studies. Firstly, persistent immune activation can impair wound healing and contribute to neuroinflammation. Therefore, anti-inflammatory strategies such as monoclonal antibody blockade of IL-6, TNF, and IL-1 receptors or short-term corticosteroids could be explored as they have been for acute SARS-CoV-2 infection. Secondly, individuals at higher risk of thromboembolic disorders with long COVID may benefit from anticoagulation, given the observed abnormalities in platelet degranulation and coagulation processes. Thirdly, global metabolomic analyses revealed specific alterations in methionine metabolism and the TCA cycle, suggesting a potential role of antioxidants and treatment strategies to support mitochondrial function and energy production. Fourthly, taurine supplementation can potentially alleviate long COVID burden based on the strong and consistent correlation between taurine levels with PASC symptoms and quality of life. 
Lastly, the observed dysregulation in microbiota-derived metabolites such as TMAO and phenylacetylglutamine concomitant with findings of gut dysbiosis in long COVID represents an attractive therapeutic target.[190]^32 Limitations of the study Our study leveraged a comprehensive systems-based approach to study PASC, but some limitations still exist. Since our study represents a relatively severe disease cohort requiring hospitalization prior to mass vaccination, our findings should be extended to patients recovering at home and to previously vaccinated individuals. Indeed, emerging evidence suggests that vaccination against SARS-CoV-2 may protect against PASC symptoms in previously infected individuals. This may be related to the ability of vaccines to stimulate anti-spike protein antibody production and T cell activation to promote viral clearance and resolve chronic inflammation.[191]^68 Additionally, the emergence of Omicron variants substantially increased transmissibility with diminished severity and pathogenicity.[192]^69^,[193]^70 However, given the increased number of individuals infected with the Omicron variant, more people are experiencing long COVID globally.[194]^71 Moreover, despite protocolized morning blood collections to minimize diurnal variations, the metabolic profile and its relationship with self-reported PASC symptoms could be confounded by the participant’s fasting status. Future studies in large prospective cohorts are warranted to validate the biomarkers and molecular pathways implicated in long COVID pathophysiology and to evaluate the efficacy of several identified therapeutic targets for consideration in clinical trials. 
STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Biological samples
Human blood plasma | Canadian Biosample Repository | [195]https://biosample.ca/
Deposited data
Raw proteomics data | PeptideAtlas: PASS03810 | [196]https://peptideatlas.org/
Raw metabolomics data | MetaboLights: MTBLS7337 | [197]https://www.ebi.ac.uk/metabolights/
Software and algorithms
MetaboAnalyst 5.0 | Pang et al.[198]^72 | [199]https://www.metaboanalyst.ca/home.xhtml
OriginLab | OriginLab Corporation | [200]https://www.originlab.com/
R (v4.2.3) | The R Project for Statistical Computing | [201]https://www.r-project.org/
Python (v3.8.16) | Jupyter notebooks | [202]https://www.python.org/doc/versions/
Skyline (v21.2.0.536) | University of Washington | [203]http://www.skyline.ms
Ingenuity Pathway Analysis (IPA) | QIAGEN | [204]https://qiagen.pathfactory.com/
Metascape | Zhou et al.[205]^73 | [206]http://metascape.org/

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Gavin Y. Oudit (gavin.oudit@ualberta.ca).

Materials availability

This study did not generate new unique reagents.

Experimental model and subject details

Study participants

The COVID-19 Surveillance Collaboration (CoCollab) Study prospectively enrolled consecutive patients newly admitted to hospital wards designated for COVID-19 and intensive care units at the University of Alberta Hospital (Edmonton, Canada) between October 15, 2020, and June 29, 2021 ([208]Figure S7). 
All enrolled patients were ≥18 years of age with a laboratory-confirmed COVID-19 diagnosis based on a positive SARS-CoV-2 real-time PCR assay from nasopharyngeal swabs or lower respiratory samples. Comparisons were made with age- and gender-matched healthy controls (n = 28) enrolled during the same period. Our study was conducted in accordance with the ethical principles of the Declaration of Helsinki with approval from the University of Alberta Health Research Ethics Board (Pro00100319 and Pro00100207). Written informed consent was obtained from all participants. Method details Plasma collection and storage Venous blood sampling was performed in the morning by trained phlebotomists, and samples were transported to the Canadian Biosample Repository located at the University of Alberta within 1 h for immediate processing. Samples were collected in tubes containing ethylenediaminetetraacetic acid (EDTA) and centrifuged at 1500 × g for 10 min at room temperature. Plasma was subsequently aliquoted for storage at −80°C. Baseline sampling of acute COVID-19 was performed immediately following hospital admission, while pre-scheduled follow-up blood sampling at six months was performed either at the patient’s place of residence by the study team or at the Kaye Edmonton Clinic (Edmonton, Canada). Clinical outcomes and quality-of-life assessment Detailed clinical characteristics, including demographics, vital signs, presenting symptoms, comorbidities, and medications, were collected through individual review of electronic medical records. The incidence of all-cause mortality and hospital readmission after discharge from the acute COVID-19 hospitalization (median follow-up of 17.4 [IQR: 14.3–18.8] months) was obtained from each individual’s electronic medical records until June 30, 2022. 
A review of symptoms was performed using a questionnaire during follow-up blood sampling that encompassed general systemic (fatigue, general weakness, chills, night sweats, runny nose, muscle ache), cardiopulmonary (shortness of breath, chronic cough, palpitation, tachycardia), neurological (cognitive impairment such as confusion, memory loss or brain fog, insomnia, changes in smell and taste, headache, and mood disturbance), and gastrointestinal (nausea, abdominal pain, diarrhea) domains. Other self-reported symptoms associated with PASC not included in the questionnaire were also recorded. Additionally, the validated SF-12 health questionnaire and EuroQol visual analogue scale (EQ-VAS) were used to assess the biopsychosocial health of convalescing patients.[209]^74^,[210]^75 The SF-12 health questionnaire examines the physical and mental health of patients across eight separate domains (physical functioning, physical role, bodily pain, general health, vitality, social functioning, emotions, and mental health), while the EQ-VAS assesses individuals’ self-rated health status using a score between 0 (the worst health state imaginable) and 100 (the best health state imaginable). Multiplexed cytokine analysis Luminex xMAP technology was used for the multiplexed quantification of 48 human cytokines, chemokines, and growth factors. The multiplexing analysis was performed using the Luminex 200 system (Luminex, Austin, TX, USA) by Eve Technologies Corp. (Calgary, Alberta). Forty-eight markers were simultaneously measured in the samples using Eve Technologies' Human Cytokine 48-Plex Discovery Assay (MilliporeSigma, Burlington, Massachusetts, USA). The assay was run according to the manufacturer’s protocol. 
The 48-plex consisted of sCD40L, EGF, Eotaxin, FGF-2, FLT-3 Ligand, Fractalkine, G-CSF, GM-CSF, GROα, IFN-α2, IFN-γ, IL-1α, IL-1β, IL-1RA, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-12(p40), IL-12(p70), IL-13, IL-15, IL-17A, IL-17E/IL-25, IL-17F, IL-18, IL-22, IL-27, IP-10, MCP-1, MCP-3, M-CSF, MDC, MIG/CXCL9, MIP-1α, MIP-1β, PDGF-AA, PDGF-AB/BB, RANTES, TGFα, TNF-α, TNF-β, and VEGF-A. Assay sensitivities of these markers range from 0.14 to 55.8 pg/mL for the 48-plex. Individual analyte sensitivity values are available in the MILLIPLEX MAP protocol. Targeted plasma proteomics by LC-MS Plasma samples were analyzed using a multiple reaction monitoring (MRM)-based methodology with stable isotope-labelled standards (SIS) to quantify 274 human proteins. In this “bottom-up” approach, carefully chosen peptides for each protein were used as protein surrogates. In total, 274 peptides were included in the assay employing the gold standard technique for LC-MRM-MS: each with its own external calibration curve using synthetic light and SIS standard peptides. In addition, three levels of quality control (QC) samples were monitored for each peptide. The points of each calibration curve and the QCs were assessed for accuracy between 75% and 125% at the lowest calibration curve point and QC and between 80% and 120% for the remaining points and QCs. The samples were analyzed in one batch. Peptides were synthesized using FMOC chemistry with ^13C/^15N-labeled amino acids for SIS peptides, purified through reversed phase-HPLC with subsequent assessment by MALDI-TOF-MS, and characterized via amino acid analysis (AAA) and capillary zone electrophoresis (CZE). All other chemicals and reagents used were of the highest analytical quality available and were obtained from commercial vendors. 
Tryptic peptides were selected to serve as molecular surrogates for the target proteins according to a series of peptide selection rules (for detailed criteria, see Kuzyk et al.[211]^76) and previous detectability in plasma samples. To compensate for matrix-induced suppression or variability in LC-MS performance, ^13C/^15N -labelled peptides were used as internal standards. During sample preparation, 10 μL of raw plasma was sequentially subjected to 9 M urea, 20 mM dithiothreitol, and 0.5 M iodoacetamide. All steps were carried out in Tris buffer at pH 8.0. Denaturation and reduction occurred simultaneously at 37°C for 30 min, with alkylation occurring thereafter in the dark at room temperature for 30 min. Proteolysis was initiated by adding TPCK-treated trypsin (70 μL at 1 mg/mL; Worthington) at a 10:1 substrate: enzyme ratio. After overnight incubation at 37°C, proteolysis was quenched with formic acid (FA) at a final concentration of 1.0%. The SIS peptide mixture was then spiked into the samples. All samples were then concentrated by solid-phase extraction (Oasis HLB, 2 mg sorbent; Waters). After solid-phase extraction, the concentrated eluate was dried using a vacuum concentrator and rehydrated in 0.1% FA to a final protein concentration of 1 μg/μL for LC-MRM/MS analysis. A surrogate matrix for use with standard and QC samples was prepared from a digest of 10 mg/mL Bovine Serum Albumin (BSA) in PBS buffer, using the same methodology as the plasma samples described above. The standard curves were generated using a natural isotopic abundance (NAT) peptide for each analyte. A dilution series of the NAT peptides in BSA tryptic digest was prepared from a high concentration of 1000X the lower limit of quantitation (LLOQ) over seven dilutions to the lowest point of the curve, which was also the LLOQ for the assay. The QC samples were prepared from the same NAT mix and diluted in BSA digest. 
Injections of 10 μL of the plasma tryptic digests were separated with a Zorbax Eclipse Plus RP-UHPLC column (2.1 × 150 mm, 1.8 μm particle diameter; Agilent) that was contained within a 1290 Infinity system (Agilent). Peptide separations were achieved at 0.4 mL/min over a 60 min run via a multi-step LC gradient (2–80% mobile phase B; mobile phase compositions: A was 0.1% FA in H2O while B was 0.1% FA in acetonitrile). The column was maintained at 50°C. A 4-min post-gradient column re-equilibration step was used after each sample analysis. The LC system was interfaced to a triple quadrupole mass spectrometer (Agilent 6495C) via a standard-flow AJS ESI source, operated in the positive ion mode. The general MRM acquisition parameters employed were as follows: 3.5 kV capillary voltage, 300 V nozzle voltage, 11 L/min sheath gas flow at a temperature of 250°C, 15 L/min drying gas flow at a temperature of 150°C, 30 psi nebulizer gas pressure, 5 V cell accelerator potential, and unit mass resolution in the first and third quadrupole mass analyzers. The high energy dynode (HED) multiplier was set to −20 kV for improved ion detection efficiency and signal-to-noise ratios. Specific LC-MS acquisition parameters were employed for optimal peptide ionization/fragmentation and scheduled MRM. Note that the peptide-specific parameters had previously been optimized empirically by direct infusion of the purified SIS peptides. In the quantitative analysis, the targets (1 transition/peptide) were monitored over 700 ms cycles and 1.5 min detection windows. The MRM data were visualized and examined with Skyline Quantitative Analysis software (version 21.2.0.536, University of Washington). This involved peak inspection to ensure accurate selection, integration, and uniformity (in terms of peak shape and retention time) of the SIS and NAT peptides. 
After defining a small number of criteria (i.e., 1/x² regression weighting, <25% deviation in accuracy at the QC-A level, <20% for QCs B and C), a standard curve was used to calculate the peptide concentration in fmol/μL of plasma in the samples through linear regression. Targeted plasma metabolomics by LC-MS A custom-made targeted quantitative metabolomics approach was applied to analyze the samples using a combination of direct injection mass spectrometry (DI-MS) and LC-MS/MS. This custom LC-MS assay can be used for the targeted identification and quantification of up to 636 endogenous metabolites, including amino acids and amino acid derivatives, biogenic amines, ceramides, cholesterol esters, diacylglycerols, acylcarnitines, glycerophospholipids, sphingomyelins, triacylglycerols, organic acids, and nucleotides/nucleosides. The method uses chemical derivatization (for organic acids and biogenic amines), analyte extraction, and LC separation (or direct injection for lipids and acylcarnitines), combined with selective mass-spectrometric detection using multiple reaction monitoring (MRM) pairs to identify and quantify metabolites. Isotope-labelled internal standards (ISTDs) and isotope-labelled chemical derivatization standards are used for accurate metabolite quantification. Stock solutions of each standard and ISTD used in the assay are prepared by dissolving accurately weighed solids in double-distilled water (ddH2O). Calibration curve standards are obtained by mixing and diluting the corresponding stock solutions with ddH2O. For amino acids, biogenic amines, carbohydrates, acylcarnitines and derivatives, and all lipids and their derivatives, stock solutions of isotope-labelled compounds were prepared similarly. A working ISTD solution mixture in ddH2O was also made by mixing all the prepared isotope-labelled stock solutions together. 
For organic acids, stock solutions of isotope-labelled compounds were prepared by dissolving the accurately weighed solids in 75% aqueous methanol. A working internal standard solution mixture in 75% aqueous methanol was made by mixing and diluting all the isotope-labelled stock solutions. The assay uses a 96-deep-well plate with a filter plate attached via sealing tape and a set of reagents and solvents to prepare the plate assay. The first 14 wells of the 96-well plate are used for a blank sample, three zero samples, seven standard-containing or calibration samples and three quality control (QC) samples. To extract and measure all metabolites except organic acids, plasma samples were first thawed on ice, then vortexed and centrifuged at 13,000 × g. A total of 10 μL of each plasma sample was loaded onto the center of the filter on the upper 96-well plate and dried in a stream of nitrogen. Subsequently, phenyl-isothiocyanate (PITC) was added for the derivatization of all amine-containing metabolites. After incubation, the filter spots were dried again using an evaporator. The metabolites were then extracted by adding an ammonium acetate/methanol mixture (5 mM ammonium acetate dissolved in 300 μL methanol). Then, the extracts were centrifuged into the lower 96-deep-well plate and diluted with the MS running solvent before injection into the LC-MS system. For organic acid analysis, 150 μL of ice-cold methanol and 10 μL of the isotopically labeled ISTD mixture were added to 50 μL of each plasma sample for overnight protein precipitation (at 20°C). After the precipitation step was complete, each sample was centrifuged at 13,000 × g for 20 min at 4°C; 50 μL of each supernatant was loaded onto the center of a selected well of the 96-deep-well plate, followed by the addition of 3-nitrophenylhydrazine (3-NPH), which serves as an organic-acid-specific derivatization reagent. 
After incubation for 2 h, 10 mg of butylated hydroxytoluene (BHT), which is used as a stabilizer, and 50 μL of water were added before LC-MS injection. Mass spectrometric analysis was performed on an ABSciex 5500 Qtrap tandem mass spectrometry instrument (Applied Biosystems/MDS Analytical Technologies, Foster City, CA) equipped with an Agilent 1290 series UHPLC system (Agilent Technologies, Palo Alto, CA) with an Agilent Zorbax C18 column. The samples were delivered to the mass spectrometer by an LC method followed by a direct injection (DI) method. Data analysis was done using Analyst 1.6.2 (Applied Biosystems/MDS Analytical Technologies, Foster City, CA). Quantification and statistical analysis Data processing and statistical analysis For proteomic, metabolomic, and cytokine profiling, features (i.e., molecules) with >50% missing values were removed from the dataset. The remaining values below the lower limit of detection (LLOD) were imputed using the minimum value of each feature as previously described ([212]Table S9).[213]^20^,[214]^22 Subsequently, the data were normalized by applying a logarithmic (base 10) transformation. Principal component analysis (PCA) was performed using the corrected features (i.e., molecules) with MetaboAnalyst 5.0 ([215]www.metaboanalyst.ca).[216]^72 Mann-Whitney U and Kruskal-Wallis tests were utilized for non-normally distributed data, while Student’s t test and ANOVA were performed for parametric comparisons. p values were adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) correction. Analyses of differentially expressed cytokines, proteins, and metabolites with significance filtering criteria (p value <0.05 and |fold-change| >1.5) were performed using MetaboAnalyst 5.0. 
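The preprocessing and multiple-testing steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline (the study reports using MetaboAnalyst 5.0 and R); the function names `preprocess` and `benjamini_hochberg`, the dictionary layout, and the use of `None` to mark missing or below-LLOD values are assumptions made for illustration only.

```python
import math

def preprocess(table, max_missing=0.5):
    """Filter, impute, and log10-transform a feature table.

    `table` maps a feature name to a list of measurements, with None
    marking a missing or below-LLOD value (an assumed encoding).
    """
    cleaned = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if missing_fraction > max_missing:
            continue  # drop features with >50% missing values
        floor = min(observed)  # minimum observed value of this feature
        cleaned[name] = [math.log10(floor if v is None else v) for v in values]
    return cleaned

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `preprocess({"a": [1.0, None, 100.0, 10.0], "b": [None, None, None, 5.0]})` keeps only feature `a`, because `b` has 75% missing values, and imputes the missing entry of `a` with its minimum before the transformation.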
Binary logistic regression was performed for the association between self-reported PASC symptoms and DEM between convalescence and healthy controls, as well as changes between convalescence and acute COVID-19, adjusting for age, gender, diabetes, acute COVID-19 management (dexamethasone, antibiotics, tocilizumab, remdesivir), WHO Ordinal Scale, and vaccination status prior to follow-up blood collection. Additionally, association with health-related quality-of-life scores (SF-12 and EQ-VAS) was assessed using the multiple linear regression analysis adjusting for the same clinical variables. Correlations between molecules and clinical laboratory values in acute COVID-19 were assessed using the Spearman correlation coefficient. Box and bar plots were made using Origin, Version 2022b (OriginLab Corporation, Northampton, MA, USA). Statistical analyses were performed using R 4.2.3 (Vienna, Austria). Unsupervised clustering and machine learning panel analyses were performed by Python 3.8.16. Pathway analysis Differentially expressed cytokines, proteins, and metabolites were analyzed using the Ingenuity pathway analysis (IPA) system (Qiagen) to identify the most relevant pathways with significance levels based on the right-tailed Fisher’s exact test.[217]^77 Networks between molecules were generated using an algorithm that assigns scores based on a hypergeometric distribution for each network. The Gene Ontology (GO) terms were enriched for cytokines and proteomics using the Metascape platform.[218]^73 Additionally, metabolomic pathways were enriched using the Pathway Analysis module on MetaboAnalyst 5.0. Unsupervised clustering The intent of performing clustering on our data was to capture any significant similarities between patients on a biological scale in lieu of self-reported PASC severities. The k-means algorithm was utilized for unsupervised clustering to minimize the variance within a cluster. 
The algorithm generates $k$ cluster centroids and assigns samples to clusters based on their relative distance from each centroid. This can be written as an optimization problem:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert_2^2$$

Here, $S_i$ represents the set of data points in cluster $i$, and $\mu_i$ is the centroid of cluster $i$, computed as the mean of the elements in the $i$th cluster. During each iteration of the algorithm, the elements in $S_i$ are updated, which in turn changes the value of $\mu_i$ until convergence is reached. Our dataset consists of a greater number of variables than samples, which presents a challenge for clustering since the samples are widely scattered in the high-dimensional variable space. Therefore, a dimensionality-reduction approach is first used to decrease the number of variables before clustering the data. We performed PCA at a variance cut-off of 95%, which resulted in 81 principal components for the differential measurements between convalescence and acute phases. Since the reduction in the number of variables was insufficient, a non-linear dimensionality reduction was performed using an autoencoder. Autoencoders (AE) are a class of artificial neural networks where the network architecture creates a bottleneck by encoding a layer of lower dimensions to generate a lower-dimensional data projection. Since the activation of each layer in the AE is non-linear, the lower-dimensional projection of the data is a non-linear combination of the original variables. The autoencoder consisted of three encoding layers of 100, 70, and 50 neurons each. The bottleneck layer consisted of 30 neurons followed by three decoding layers, with all layers using the sigmoid activation on their outputs. Through this approach, the dimensionality of the data was reduced to 30 features for the acute and convalescent samples. 
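The k-means objective above can be sketched with the standard alternating assignment and update steps (Lloyd's algorithm). This is only an illustration in plain Python, not the code used in the study; the seeded random initialization, the `kmeans` function name, and the toy two-dimensional points are assumptions, whereas the study clustered 30-dimensional autoencoder encodings.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centroids, inertia) for a list of points."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct samples chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean_point(members)
    # Inertia is the value of the objective being minimized.
    inertia = sum(squared_distance(p, centroids[lab])
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia
```

Because k-means can settle in a local minimum, practical runs usually repeat the procedure from several initializations and keep the solution with the lowest inertia.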
The relative significance of each of the original variables in the encoded dimension can be determined by performing a saliency analysis. Subsequently, the k-means algorithm is applied to this encoded data with a cluster size of 3. Machine learning and predictive model The problem of feature extraction is determining a minimal set of features or independent variables that contain the most information needed to predict the response variable. This is a supervised learning problem, and in the context of this study, the minimal number of biomolecules required for predicting adverse clinical outcomes was determined. This is a binary classification problem where the response variable $y \in \{0, 1\}$. A linear classifier was used to classify the response. The mathematical formulation of this problem is given by:

$$\hat{Y} = XW + w_0$$

$$\hat{Y} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_N \end{bmatrix}_{N \times 1}, \quad X = \begin{bmatrix} \log(a_{11}) & \cdots & \log(a_{1M}) \\ \vdots & \ddots & \vdots \\ \log(a_{N1}) & \cdots & \log(a_{NM}) \end{bmatrix}_{N \times M}, \quad W = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_M \end{bmatrix}_{M \times 1}$$

where $W$ is the weight matrix that linearly combines the data matrix $X$ consisting of $N$ samples with $M$ features or variables. The weight matrix is ‘learned’ from the data by minimizing the loss function given by:

$$\mathrm{Loss} = -\frac{1}{N} \sum_{i}^{N} \left[ y_i \log\big(P(\hat{y}_i)\big) + (1 - y_i) \log\big(1 - P(\hat{y}_i)\big) \right]$$

where

$$P(\hat{y}_i) = \frac{1}{1 + e^{-(X_i W + w_0)}}$$

The absolute value of each weight $w_j$ ($j = 1, \dots, M$) indicates the contribution of the corresponding feature to the classification. A 5-fold cross-validation was performed to check for overfitting of data. This classification problem was repeated on datasets containing all biomolecules or each molecule type (i.e., cytokines, proteins, metabolites). The receiver operating characteristic (ROC) curve of classification was generated from each dataset for their ability to predict clinical outcomes. Subsequently, a minimal feature set was determined using a greedy search algorithm known as sequential feature extraction. 
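As a rough illustration of the classifier and the greedy search, the sketch below fits a logistic model by gradient descent on the cross-entropy loss and then adds features one at a time while the score improves. It is a simplified stand-in, not the study's implementation: the learning rate, epoch count, function names, and the use of training accuracy (rather than cross-validated performance) as the selection score are all assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(row, weights, bias):
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)

def cross_entropy(X, y, weights, bias):
    """Mean binary cross-entropy, matching the loss defined above."""
    eps = 1e-12
    total = 0.0
    for row, target in zip(X, y):
        p = min(max(predict_proba(row, weights, bias), eps), 1 - eps)
        total += target * math.log(p) + (1 - target) * math.log(1 - p)
    return -total / len(y)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the cross-entropy loss (assumed optimizer)."""
    n, m = len(X), len(X[0])
    weights, bias = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for row, target in zip(X, y):
            err = predict_proba(row, weights, bias) - target
            grad_b += err
            for j in range(m):
                grad_w[j] += err * row[j]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def accuracy(X, y, weights, bias):
    hits = sum((predict_proba(r, weights, bias) >= 0.5) == bool(t)
               for r, t in zip(X, y))
    return hits / len(y)

def forward_selection(X, y, max_features):
    """Greedily add the feature that most improves the score."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = []
        for j in remaining:
            cols = selected + [j]
            sub = [[row[c] for c in cols] for row in X]
            w, b = fit_logistic(sub, y)
            scored.append((accuracy(sub, y, w, b), j))
        # Prefer the highest score; break ties by the lowest column index.
        score, j = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no candidate improves the classifier any further
        selected.append(j)
        best_score = score
        remaining.remove(j)
    return selected, best_score
```

Scoring each candidate on held-out folds instead of the training data would be closer in spirit to the 5-fold cross-validation described in the text and would reduce the risk of selecting features that only fit noise.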
In this method, a new feature is sequentially introduced to the classifier, and the improvement in its predictive capability is monitored. This routine is followed until a combination of the predetermined features provides the best possible classification performance. Acknowledgments