Abstract

Cardiovascular disease (CVD) remains a leading cause of global morbidity and mortality. Timely diagnosis is important in reducing both short- and long-term health complications. Saliva has emerged as a potential source for biomarker discovery, offering a non-invasive tool for early detection of individuals at elevated risk for CVD, yet large-scale proteomic analysis of saliva for comprehensive biomarker discovery remains limited. In an effort to develop a diagnostic tool using saliva samples, our study aims to assess the salivary and plasma proteomes in subjects at high risk of developing CVD using a large-scale proteomic approach. Leveraging the SOMAscan platform, we analyzed 1,317 proteins in saliva and plasma collected from subjects at high risk of CVD (CVD-HR) and compared the profiles to those of subjects at low risk of CVD (CVD-LR). Our analysis revealed significant differences in the plasma and salivary proteins between the two groups. Pathway enrichment analysis of the differentially detected proteins revealed that immune system activation and extracellular matrix remodeling are the most enriched pathways in the CVD-HR group. Comparing proteomic signatures between plasma and saliva, we found approximately 42 and 17 differentially expressed proteins associated with CVD-HR that were uniquely expressed in plasma and saliva, respectively. Additionally, we identified eight common CVD-risk biomarkers shared between plasma and saliva, which are promising candidates for diagnostic tools to identify individuals at high risk of developing CVD. In conclusion, saliva proteomics holds significant promise for identifying subjects at high risk of developing CVD. Further studies are needed to validate our findings.
Keywords: Proteome, Saliva, Qatar Biobank, CVD, Qatar Genome Project, Machine Learning

Subject terms: Diagnostic markers, Biomarkers, Cardiology, Cardiovascular biology, Proteomics

Introduction

Cardiovascular disease (CVD) is a group of conditions affecting the heart and blood vessels, such as heart failure, hypertension, stroke, coronary heart disease, and atherosclerosis^[36]1. CVD is a leading cause of mortality worldwide, responsible for approximately one-third of all deaths, and is recognized as the primary noncommunicable disease^[37]2. Furthermore, the direct medical cost of CVD in the United States is predicted to increase from $273 billion in 2010 to $818 billion by 2030^[38]3. Various risk factors contribute to the development of CVD, including increased body mass index (BMI), smoking, diabetes, high levels of low-density lipoprotein cholesterol, poor dietary habits^[39]2, and inflammation^[40]4. The prevalence of these risk factors differs among populations^[41]5. Data from the planning and statistical authorities in Qatar indicate that CVD was among the leading causes of mortality in 2020, contributing to 29% of all deaths in the country^[42]6. This is mainly attributed to the prevalent risk factors for CVD in Qatar^[43]7. A recent study showed that one in every five Qatari subjects is either prediabetic or diabetic, and one in three is hypertensive^[44]8. These risk factors are predicted to substantially increase by 2050^[45]9. A recent study examined the expected burden of CVD on diabetes over the next 10 years in Qatar and predicted a direct cost of US$11.40 billion and an indirect cost exceeding US$8.30 billion^[46]10. Despite these concerning statistics, published studies investigating CVD risk among the Qatari and Arab populations at large are still scarce^[47]11. The early identification of individuals who are at high risk of developing CVD is crucial for early and cost-effective intervention^[48]12,[49]13.
Therefore, there is a high demand for diagnostic biomarkers, as they help in the early detection of diseases^[50]14. In our previous study, we assessed the association between the salivary microbiome and CVD risk using a large cohort of Qatar Genome Project (QGP) participants^[51]11. We showed significant differences in the salivary microbiome composition between HR and LR CVD subjects^[52]11. Recent advancements in protein assays have allowed high-throughput proteomic profiling, enabling the rapid discovery of new biomarkers by examining large numbers of proteins involved in various biological pathways^[53]12,[54]15. Among the pioneering high-throughput proteomic platforms used extensively in epidemiological and clinical investigations is the SOMAscan platform^[55]16. This platform employs single-stranded RNA or DNA sequences, termed aptamers, capable of recognizing epitopes on folded proteins^[56]16. With the capacity to analyze approximately 7,000 proteins in a relatively small sample volume, the SOMAscan platform offers exceptional sensitivity and reproducibility^[57]16. Its application has proven instrumental in identifying protein signatures linked to various diseases and potential biomarkers, including those associated with glomerulonephritis^[58]17, cancer^[59]18, Parkinson’s disease^[60]19, asthma^[61]20, and systemic sclerosis^[62]21, among others. In CVD, SOMAscan was used to search for new CVD biomarkers in population-based studies such as the Heart and Soul study (USA)^[63]22, the Framingham Heart Study (USA)^[64]15,[65]23, and the Jackson Heart Study (USA)^[66]24, in addition to other cohorts from Iceland^[67]25 and Italy (InCHIANTI study)^[68]26. These studies have identified new CVD biomarkers^[69]15,[70]22–[71]24 but were mainly conducted in populations of European ancestry^[72]15,[73]22,[74]23,[75]25. It is also worth noting that in CVD, large-scale proteomic profiling is mainly performed in blood, and proteomic data from other body fluids remain lacking^[76]27.
To date, approximately 3,000 human saliva proteins have been identified, encompassing various enzymes, immunoglobulins, glycoproteins, and hormones, which collectively contribute to the maintenance of oral cavity homeostasis^[77]28. Saliva, often regarded as a "mirror of the gut", holds promise for diagnostic applications due to its diverse molecular composition derived from local blood supply, microbes, and cellular constituents^[78]29. Moreover, saliva is an accessible and non-invasive sample that is easy to collect, and as a result, salivary biomarkers can be applied for developing rapid diagnostic tools^[79]27. Despite these advantages, aptamer-based methods in saliva have primarily focused on cardiac markers such as troponins and myoglobin, while comprehensive proteomic analyses for CVD protein signatures remain sparse^[80]27. The present study aims to leverage the SOMAscan proteomic panel to uncover biomarkers associated with CVD risk in both saliva and plasma samples from the Qatari population, thus laying the groundwork for identifying non-invasive salivary biomarkers for CVD risk and enhancing our understanding of proteomic signatures linked to CVD risk in saliva.

Methods

Ethical statement

Approval for the study was obtained from the Institutional Review Board (IRB) of Sidra Medicine under protocol #1510001907, and from Qatar Biobank (QBB) under protocol #E/2018/QBB-RES-ACC-0063/0022. Prior to sample collection, all study participants signed an informed consent, and the experiments were conducted in accordance with the approved guidelines and the Declaration of Helsinki.

Study population and clinical data

Cardiovascular disease (CVD) risk scores were computed to assess the risk of suffering a heart attack over the next 10 years, using Cox proportional-hazards regression, as detailed in our previous report^[81]11.
From the same study cohort, we randomly selected 50 subjects at low risk of developing CVD (CVD score < 10) (CVD-LR) and 50 subjects at high risk (CVD score > 20) (CVD-HR)^[82]11. The study included Qatari participants aged 18–64 years. Participants were excluded if they had recently used antibiotics (within three months) or suffered from chronic diseases (e.g., gastroesophageal reflux disease, Crohn’s disease, thyroid disease or cancer). De-identified samples, along with anthropometric and clinical data for all study subjects, were collected from QBB. In brief, enrolled subjects were advised to fast for at least 8 h before the collection of samples. Matched plasma and saliva samples were collected from the same subjects following the QBB standardized sample collection protocol^[83]30,[84]31. Around 60 mL of blood was collected and used for routine blood tests. The remainder was then aliquoted and stored at −80°C^[85]30,[86]31. For saliva, about 5 mL of unstimulated saliva was collected in a Falcon tube, divided into aliquots of 0.4 mL, and stored at −80°C^[87]32.

SOMAscan proteomics

The salivary and plasma proteomes were characterized using the SOMAscan platform, which uses single-stranded DNA-based protein affinity reagents called SOMAmers (Slow Off-rate Modified Aptamers), as detailed in previous studies^[88]33–[89]35. In essence, each SOMAmer® reagent selectively binds to a specific target protein, with the panel covering approximately 1,317 proteins in total. The SOMAscan assay involves distributing SOMAmers into various sample dilution bins tailored to the analyzed matrix. These distributions and dilution schemes are designed to ensure that analyte concentrations fall within the linear range of the assay for each SOMAmer. In conventional matrices such as plasma and serum, SOMAmers are split into 0.005%, 1%, and 40% dilution bins.
However, for non-traditional matrices like saliva, specific dilution bins are not predefined, and samples are typically analyzed at a single dilution, with all SOMAmers assayed accordingly. To establish the optimal dilution for saliva samples, we conducted SOMAscan assays using pooled and individual saliva samples serially diluted from 40% to 0.3125%. This process identified the optimal saliva dilution as 10% in assay buffer, which resulted in average assay values falling within the mid-range of the dynamic range for each SOMAmer. Throughout the SOMAscan assay, the manufacturer’s cell and tissue protocol instructions were followed. Relative fluorescence unit values obtained from SOMAscan were normalized against the hybridization control to correct for any systematic effects introduced during hybridization. The hybridization control factor was determined by pooling all samples from different plates. Median normalization was applied across all samples within the arrays, ensuring a successful assessment of signal intensity variance based on the hybridization controls.

SOMAscan data analysis

The raw fluorescence data of 1,317 proteins were first normalized via quantile normalization using the ‘normalizeBetweenArrays’ function from the limma package (v3.56.2)^[90]36. UMAP analysis indicated that participants’ BMI and age had a strong confounding effect on the segregation of samples. Consequently, a differential expression analysis was conducted using limma, with age and BMI included in the design matrix. We used the ‘lmFit’ function for multiple linear regression, followed by the ‘eBayes’ function with the parameter ‘robust = TRUE’ to compute moderated t-statistics, F-statistics, and log-odds. P-values were adjusted using the Benjamini–Hochberg method. Proteins showing p-values < 0.05 and at least a 50% fold change (|FC| > 1.5) between low and high CVD risk subjects in either plasma or saliva were selected.
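The selection rule above (Benjamini–Hochberg adjustment followed by a p-value and fold-change cut-off) can be illustrated with a minimal sketch. It is written in plain Python for self-containment and is not the limma-based R pipeline used in the study; the function names, the protein labels, and the input values are hypothetical. The text does not state whether the 0.05 threshold was applied to raw or adjusted p-values, so the sketch exposes this as the `use_adjusted` parameter rather than assuming either.

```python
import math


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_proteins(names, pvalues, log2_fc, alpha=0.05, min_fold=1.5,
                    use_adjusted=True):
    """Keep proteins with p < alpha and |fold change| >= min_fold.

    Fold changes are supplied on the log2 scale, so a 1.5-fold change in
    either direction corresponds to |log2 FC| >= log2(1.5).
    """
    p_used = bh_adjust(pvalues) if use_adjusted else list(pvalues)
    cutoff = math.log2(min_fold)
    return [name for name, p, lfc in zip(names, p_used, log2_fc)
            if p < alpha and abs(lfc) >= cutoff]


if __name__ == "__main__":
    # Hypothetical values for four placeholder proteins.
    names = ["protein_1", "protein_2", "protein_3", "protein_4"]
    pvals = [0.01, 0.04, 0.03, 0.5]
    lfcs = [1.0, 0.2, -0.8, 2.0]
    print(select_proteins(names, pvals, lfcs))
    print(select_proteins(names, pvals, lfcs, use_adjusted=False))
```

Working on the log2 scale treats increases and decreases symmetrically, which is why a single absolute-value cut-off covers both directions of change.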
Additionally, differentially expressed proteins demonstrating a consistent trend of expression change between the two sample types and significant statistical changes in both were selected as the initial biomarker candidates. For visualization purposes, the effects of age and BMI were first regressed out. Samples were then clustered into two groups based on UMAP representations reflecting low and high age/BMI values. The resulting cluster IDs were used to regress out the effects of age and BMI using the ‘removeBatchEffect’ function from the limma package.

Functional enrichment analysis

The enrichGO function from the R/Bioconductor clusterProfiler package (v4.8.3)^[91]37 was used to perform Gene Ontology (GO) and pathway enrichment analysis focusing on Biological Process ontologies. Only GO terms exhibiting an adjusted p-value < 0.05 were included in the analysis. GO enrichment plots were then generated using the ggplot2 package.

Estimation of protein biomarkers importance using machine learning models

The following machine learning models were used to determine the predictive importance of each marker in classifying CVD-HR and CVD-LR subjects: Random Forest (RF)^[92]38, Elastic-net (eNet)^[93]39, partial least squares via mixOmics (pls)^[94]40, XGBoost^[95]41, generalized linear model (GLM), and Radial Basis Function (RBF) kernel SVM^[96]42. We used the tidymodels R package to train the different models^[97]43. Each model was trained using repeated cross-validation (5 folds, 10 repeats). To avoid label imbalance during training, the different cross-validation subsets were generated in a stratified manner. Hyperparameter tuning was done using a grid search algorithm. The RF, eNet, and pls models had the highest performance in both sample types (plasma, saliva) and were selected to calculate the mean importance of each marker across the three models.
The variable importance of each model was scaled to be within [0,1].

Statistical analysis and visualization

The demographic and clinical data of the study cohort were analyzed using GraphPad Prism (10.1.2). Mann–Whitney U tests were used to compare variables, including age, BMI, systolic and diastolic blood pressure, glucose level, HbA1C, lipid profile, total protein, albumin, urea, and creatinine. The Chi-square test was used to compare smoking status and sex between the CVD-HR group and the CVD-LR group. Statistical significance was set at p-values less than 0.05. All statistical analyses were conducted using R version 4.3.1, with the limma package (version 3.56.2)^[98]36. The visualization of the results was carried out using the ggplot2 and ComplexHeatmap R packages^[99]44. Venn diagrams were created using Intervene^[100]45.

Results

Characteristics of the Study Cohort

The baseline demographic and clinical characteristics of the study cohort, comprising a total of 100 individuals, are listed in Table [101]1. Based on cardiovascular risk score, we selected 50 subjects categorized as CVD high-risk (CVD-HR) and 50 individuals classified as CVD low-risk (CVD-LR). Overall, the CVD-HR subjects had an average age of 55.32 ± 6.7 years compared to 43.06 ± 7.6 years in the CVD-LR group. Furthermore, a significantly higher proportion of smokers was observed in the CVD-HR group compared to the CVD-LR group. The CVD-HR group displayed markedly elevated systolic and diastolic blood pressure, glucose levels, HbA1C, lipid profile parameters, urea, and creatinine levels compared to the CVD-LR subjects. Our findings also revealed specific biases in various clinical characteristics, with systolic and diastolic blood pressure demonstrating a positive correlation with CVD-HR. Moreover, BMI and age were identified as contributors to patient segregation, as illustrated in Figure S1.
Therefore, we controlled for age and BMI during the differential expression analysis of plasma and saliva proteomic profiles.

Table 1. Demographic and Clinical Characteristics of the Study Population.

| Characteristic | CVD high risk group (CVD-HR), N = 50 | CVD low risk group (CVD-LR), N = 50 | p value |
|---|---|---|---|
| Age (years) | 55.32 ± 6.7 | 43.06 ± 7.6 | < 0.0001^a*** |
| Male/Female | 36/14 | 25/25 | 0.024^b* |
| BMI | 31.15 ± 5.241 | 29.60 ± 4.638 | 0.096^a |
| Systolic blood pressure | 137.1 ± 17.64 | 113.0 ± 14.95 | < 0.0001^a*** |
| Diastolic blood pressure | 74.68 ± 10.22 | 66.98 ± 10.64 | 0.0002^a*** |
| Smoking, n (%) | 22 (44%) | 12 (24%) | 0.0348^b* |
| Glucose (mmol/L) | 8.938 ± 4.093 | 5.874 ± 2.316 | < 0.0001^a*** |
| HbA1C (%) | 7.738 ± 2.117 | 5.750 ± 1.259 | < 0.0001^a*** |
| HDL-Cholesterol (mmol/L) | 1.496 ± 0.4493 | 1.135 ± 0.3237 | < 0.0001^a*** |
| LDL-Cholesterol (mmol/L) | 3.435 ± 1.230 | 2.899 ± 0.8931 | 0.0049^a** |
| Cholesterol Total (mmol/L) | 5.488 ± 1.243 | 4.972 ± 0.9963 | 0.0101^a* |
| Triglyceride (mmol/L) | 1.874 ± 0.8429 | 1.270 ± 0.6329 | < 0.0001^a*** |
| Total Protein (g/L) | 72.46 ± 4.062 | 72.10 ± 3.655 | 0.59^a |
| Albumin (g/L) | 44.28 ± 3.860 | 44.32 ± 3.560 | 0.66^a |
| Urea (mmol/L) | 5.004 ± 1.533 | 4.438 ± 1.786 | 0.0262^a* |
| Creatinine (µmol/L) | 74.40 ± 16.56 | 69.58 ± 21.78 | 0.0328^a* |

^a Mann–Whitney U test was used, ^b Chi-square test was used. *P-value < 0.05, **P-value < 0.01, ***P-value < 0.001. Data are shown as mean ± SD or n (%).

The plasma and salivary proteomes show differentially expressed proteins in CVD-HR and CVD-LR subjects

Differential expression analysis was performed on plasma samples from 50 CVD-HR subjects in comparison to 50 CVD-LR subjects, encompassing a total of 1,317 proteins detected using SOMAscan. In plasma, a subset of 207 of these proteins exhibited significant differences (p-value < 0.05) between CVD-HR and CVD-LR (Figure S2). Subsequently, proteins displaying both p-values < 0.05 and at least a 50% fold change (|FC| > 1.5) between the CVD-HR and CVD-LR groups were selected for visualization in a heatmap, as depicted in Fig. [103]1a.
A total of 44 plasma proteins (21 increased and 23 decreased) demonstrated significant differential expression with at least a 50% fold change (|FC| > 1.5) between the CVD-HR and CVD-LR groups (Fig. [104]1a).

Fig. 1. Heat maps of differentially expressed proteins in CVD-HR group. Hierarchical clustering heatmaps of proteins that are differentially expressed between the CVD-HR and CVD-LR groups in plasma (a) and saliva (b). The cohort age, BMI, systolic blood pressure (BP), diastolic blood pressure (BP), smoking, and gender are shown. Samples are clustered using the ward.D2 hierarchical clustering method. Key: red = up-regulated, blue = down-regulated. Proteins with p-values < 0.05 and at least a 50% fold change are shown. CVD-HR = 50 subjects, CVD-LR = 50 subjects.

In saliva, a total of 94 proteins exhibited significant differences (p-value < 0.05) when comparing the two groups (Figure S2). Among these, 25 salivary proteins demonstrated significant differential expression with at least a 50% fold change (18 increased and 7 decreased) between the CVD-HR and CVD-LR groups, as illustrated in Fig. [107]1b.

Identification of common CVD-risk biomarkers in plasma and saliva

The differentially expressed proteins in the CVD-HR group were further examined to search for common CVD-risk biomarkers between plasma and saliva. We found eight proteins that showed correlated enrichment in both the plasma and saliva of CVD-HR subjects (Figs. [108]2 and [109]3). These potential biomarkers include Plexin B2 (PLXNB2), LDL receptor-related protein 1B (LRP1B), GDNF Family Receptor Alpha 1 (GFRA1), acid phosphatase 5, tartrate resistant (ACP5), Chemokine (C–C motif) ligand 15 (CCL15), Complement Component 1, R Subcomponent (C1R), proteasome activator subunit 3 (PSME3) and kallikrein 5 (KLK5). PLXNB2, LRP1B, GFRA1, ACP5, C1R, and CCL15 were upregulated in both saliva and plasma of the CVD-HR group compared to the CVD-LR group (Figs. [110]2 and [111]3). On the other hand, PSME3 was the only protein downregulated in the CVD-HR group in both plasma and saliva (Figs. [112]2 and [113]3). Interestingly, KLK5 showed a difference in the direction of change between saliva and plasma, as shown in Figs. [114]2 and [115]3. We next examined whether taking anti-diabetic, antihypertensive, or antilipidemic medications influenced the shared CVD-risk biomarkers (Figure S3). All the shared CVD-risk biomarkers showed significant differential expression in saliva and plasma samples of CVD-HR after correction for treatment, except KLK5, which remained significantly different in saliva but not in plasma samples (Figure S3).

Fig. 2. Venn diagram depicting shared and distinct CVD-risk biomarkers between plasma and saliva in CVD-HR group. CVD-HR Plasma = 50 subjects, CVD-HR Saliva = 50 subjects.

Fig. 3. Expression level of proteins with correlated enrichment in both plasma and saliva of CVD-HR compared to CVD-LR. The Wilcoxon test was used to compare the two groups. p-value < 0.05 was considered significant. CVD-HR = 50 subjects, CVD-LR = 50 subjects.

The prediction performance of CVD-risk biomarkers using machine learning models

To examine the ability of the selected biomarkers to distinguish CVD-HR from CVD-LR and to identify a non-invasive biomarker that can be measured in either saliva or plasma, we ran an unbiased machine learning (ML) analysis. First, we identified the best performing ML models on our data. We compared six ML models: Random Forest (RF), Elastic-net (eNET), mixOmics (pls), XGBoost, generalized linear model (GLM), and SVM (RBF), using either all 1,317 proteins or the 8 selected markers as input, with 70% of samples used for training and the remaining 30% as the testing set (see Methods).
In plasma, the unbiased model trained on the 1,317 proteins (all features) gave better predictive power than the model trained using only the eight shared markers (selected features), as illustrated in Fig. [120]4a. Interestingly, however, the restricted model (selected features) is still accurate and shows an AUC > 0.75, indicating that the selected markers retain strong predictive power.

Fig. 4. Estimation of CVD-risk biomarkers importance using machine learning models. Random Forest (RF), Elastic-net (eNET), mixOmics (pls), generalized linear regression model (GLM), XGBoost, and (RBF) SVM models were used to estimate the predictive power of markers in classifying CVD-HR and CVD-LR groups in plasma (a) and saliva (b). The “all features” model was built using all 1,317 proteins. The “selected features” model was built using the eight common markers. The average importance across the RF, eNet, and pls models was used to rank the markers in plasma (c) and saliva (d). The variable importance of each model was scaled to be within [0,1]. CVD-HR = 50 subjects, CVD-LR = 50 subjects. Measured proteins = 1,317.

In saliva, by contrast, the selected-features model demonstrated better predictive power than the model using all features, as shown in Fig. [123]4b. This suggests that the selected markers have a better predictive potential in saliva, potentially indicating their suitability as non-invasive biomarkers. Moreover, the RF, eNet, and PLS models had the highest performance in both plasma and saliva, with an average AUC > 0.8 in plasma (Fig. [124]4a) and AUC > 0.7 in saliva (Fig. [125]4b). Next, we calculated the mean importance of each marker in the unbiased version of these three models (Table [126]S1). The variable importance of each model was scaled to be within [0,1] before averaging.
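The scaling-then-averaging step can be sketched as follows. This is an illustrative Python example, not the tidymodels code used in the study. The text says only that importances were scaled to [0,1]; min-max scaling is assumed here, and the function names, marker labels, and raw scores are hypothetical.

```python
def min_max_scale(scores):
    """Rescale one model's importance scores to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All markers scored identically; no ranking information.
        return {marker: 0.0 for marker in scores}
    return {marker: (value - lo) / (hi - lo)
            for marker, value in scores.items()}


def mean_scaled_importance(per_model):
    """Average the scaled importance of each marker across models.

    Assumes every model reports a score for the same set of markers.
    """
    scaled = [min_max_scale(scores) for scores in per_model.values()]
    markers = list(scaled[0])
    return {marker: sum(s[marker] for s in scaled) / len(scaled)
            for marker in markers}


if __name__ == "__main__":
    # Hypothetical raw importances on three different scales.
    raw = {
        "rf": {"marker_1": 10.0, "marker_2": 5.0, "marker_3": 0.0},
        "enet": {"marker_1": 0.2, "marker_2": 0.4, "marker_3": 0.0},
        "pls": {"marker_1": 3.0, "marker_2": 1.0, "marker_3": 2.0},
    }
    combined = mean_scaled_importance(raw)
    for marker in sorted(combined, key=combined.get, reverse=True):
        print(marker, round(combined[marker], 3))
```

Scaling before averaging matters because each model reports importance in its own units; without it, the model with the largest raw values would dominate the combined ranking.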
Among the eight common CVD-risk biomarkers, the top three predictive biomarkers in plasma were LRP1B (median importance = 0.876309), PLXNB2 (median importance = 0.352254), and CCL15 (median importance = 0.328339), respectively (Fig. [127]4c and Table S1). Meanwhile, in saliva, the top three predictive biomarkers were C1R (median importance = 0.387032), LRP1B (median importance = 0.375685), and PLXNB2 (median importance = 0.266522) (Fig. [128]4d and Table S1). Across all the differentially expressed proteins in plasma and saliva of the CVD-HR group, plasma LRP1B (median importance = 0.876309) was the strongest CVD-risk predictive biomarker, followed by 14-3-3 protein (YWHAE) (median importance = 0.804179) and saliva Protein S100-A7 (S100A7) (median importance = 0.541921) (Table S1).

Pathway enrichment analysis for the differentially expressed proteins

To gain insight into the pathways involving the differentially expressed proteins, Gene Ontology (GO) gene sets were analyzed with the enrichGO function from the R/Bioconductor clusterProfiler package. The analysis revealed eight pathways enriched in plasma proteins of the CVD-HR group, as shown in Fig. [129]5a. Extracellular matrix organization and extracellular structure organization were the most enriched pathways for the differentially expressed biomarkers in the plasma of the CVD-HR group. Similarly, for the saliva differential proteins, ten pathways were enriched in the CVD-HR group (Fig. [130]5b), including the humoral immune response.

Fig. 5. Pathways enrichment analysis for the differential proteins in the CVD-HR and CVD-LR groups. The 10 most enriched Gene Ontology (GO) terms in plasma (a) and saliva (b) are listed. The circle size represents the ratio of proteins/genes. The enrichment level is indicated by the color bar (blue to red) representing the p-value (0.050 to 0.000). CVD-HR = 50 subjects, CVD-LR = 50 subjects.
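GO over-representation analysis of the kind performed by enrichGO rests on a hypergeometric test. A minimal sketch of that test for a single GO term is shown below. It is a plain-Python illustration, not the clusterProfiler implementation; the function name and the counts in the usage example are hypothetical, and the choice of background set (for example, the measured proteins versus all annotated genes) is not stated in the text and is left to the caller.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_selected, n_overlap):
    """One-sided over-representation p-value for a single GO term.

    n_universe: size of the background protein set
    n_term:     background proteins annotated with the term
    n_selected: differentially expressed proteins
    n_overlap:  differentially expressed proteins annotated with the term

    Returns P(X >= n_overlap) when n_selected proteins are drawn from
    the background without replacement.
    """
    upper = min(n_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 10 background proteins, 4 annotated with the
    # term, 3 differentially expressed, 2 of which carry the annotation.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

In practice this p-value is computed for every term and then adjusted for multiple testing, which is where the adjusted p-value < 0.05 threshold described in the Methods applies.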
Discussion

The pursuit of a reliable CVD biomarker detectable in bodily fluids holds substantial promise for cardiovascular risk assessment, diagnostic accuracy, management strategy guidance, and prognosis prediction^[133]46,[134]47. Nonetheless, there remains an urgent need for non-invasive biomarkers capable of accurately predicting CVD risk^[135]12. Current biomarkers such as troponin, creatine kinase, and myoglobin primarily rely on antibody-based detection methods^[136]27. Despite their high sensitivity and selectivity, antibody-based diagnostics are often costly and subject to batch-to-batch variability^[137]48,[138]49. Aptamer-based platforms have emerged as a compelling alternative to address the limitations inherent in antibody-based detection^[139]27. While aptamer-based technology has identified novel CVD biomarkers from plasma or serum samples^[140]15,[141]22,[142]23,[143]25,[144]26, data pertaining to saliva-based CVD protein signatures, particularly those employing high-throughput proteomic methods like SOMAscan, are currently scarce^[145]14,[146]27. In the present study, we used the aptamer-based SOMAscan platform to analyze plasma and saliva samples obtained from the QGP participants. Through this approach, we assessed 1,317 proteins and identified unique protein signatures associated with increased CVD risk in the Qatari population. Then, using machine learning models, we evaluated the predictive power of the identified CVD-risk biomarkers. These findings hold considerable potential for the development of non-invasive CVD biomarkers and provide insights into the proteomic alterations observed in plasma and saliva in relation to CVD risk. In plasma, a larger number of proteins (207 proteins) showed association with CVD-HR compared to saliva (94 proteins) (Figure S2).
Notably, upon comparing the CVD-HR proteomic signatures between plasma and saliva, distinct proteins linked with CVD risk were identified in each fluid (Figure S2). Predominantly, the differentially expressed CVD markers in both plasma and saliva belonged to the category of inflammatory proteins or were implicated in inflammatory processes (Table S1). Inflammation stands as a recognized risk factor in CVD pathogenesis^[147]4, and inflammatory proteins, such as cytokines, are present in the saliva^[148]50. However, it is plausible that the secretory function of the salivary glands may regulate the levels of inflammatory markers within saliva^[149]50,[150]51. Interestingly, saliva protein markers seem to more prominently reflect local inflammation compared to systemic inflammation, as evidenced by plasma markers^[151]52. This discrepancy may account for the distinct CVD-HR signatures observed in plasma and saliva as previously highlighted^[152]53. In our study, we identified eight candidate CVD-HR protein biomarkers shared between plasma and saliva, capable of distinguishing between CVD-HR and CVD-LR groups (Figs. [153]2 and [154]3). These shared biomarkers include LRP1B, C1R, CCL15, KLK5, GFRA1, PLXNB2, ACP5, and PSME3 (Fig. [155]3). Among these candidates, LRP1B or LDL receptor-related protein 1B emerges as the most promising biomarker, exhibiting the best predictive value (median importance = 0.876309) (Table S1). LRP1B belongs to the LDL receptor family and is prominently expressed in the brain, thyroid, and salivary glands^[156]54. LRP1B exhibits binding affinity to various extracellular proteins implicated in blood coagulation and lipoprotein metabolism, such as fibrinogen and lipoproteins carrying apoE^[157]55. Interestingly, our study underscores a notable elevation in saliva fibrinogen levels and plasma ApoE protein within the CVD-HR group (Table S1). Numerous investigations have elucidated the association of the LRP1B gene with obesity^[158]56. 
Interestingly, the LRP1B gene was reported in some genome-wide association studies (GWAS), with findings linking it to systolic blood pressure, particularly in Chinese and sub-Saharan African populations^[159]57,[160]58. LRP1B harbors an intronic single nucleotide polymorphism (SNP) linked to blood pressure regulation, and has a notable interaction effect with smoking^[161]59. LRP1B is abundantly expressed in the medial layer of coronary arteries, and genetic variations in LRP1B have been linked to the risk of coronary artery aneurysms in Kawasaki disease among Taiwanese cohorts^[162]60. Additionally, LRP1B protein plays a role in Alzheimer’s disease by modulating the cellular trafficking and localization of the amyloid precursor protein^[163]61. Furthermore, a significant increase in LRP1B protein level was reported in the serum of women with systemic sclerosis^[164]21. In the InCHIANTI population study from Italy, LRP1B was inversely associated with cardiovascular health^[165]26. The current study reports the association of LRP1B protein expression in saliva and plasma with CVD-HR (Fig. [166]3). Moreover, our data highlights LRP1B as a potential biomarker for CVD risk, boasting the highest predictive accuracy (Fig. [167]4). C1R or the Complement Component 1, R subcomponent is a proteolytic subunit in the C1 complex^[168]62, an integral initiator of the classical pathway of the complement system^[169]62. The complement system plays a key role in the immune system^[170]19,[171]63. It’s involved in the inflammatory mechanism leading to the development of atherosclerosis^[172]64. Expression levels of complement proteins, including C1R, have been found to be elevated in atherosclerotic plaques^[173]64. Furthermore, local complement system activation can lead to neutrophil chemotaxis towards clot formation sites in acute myocardial infarction, with C1 protein detected within plasma clots^[174]65. 
Activation of C1 has been associated with remote ischemic conditioning in animal models of ischemic stroke^[175]66. C1R was also found to increase in circulating exosomes from ischemic stroke patients^[176]67. Our findings reveal the upregulation of C1R in both plasma and saliva samples from individuals at high risk for CVD (Fig. [177]3). Moreover, employing machine learning models, we observed that C1R exhibited superior predictive capabilities in saliva compared to plasma (Fig. [178]4), thereby positioning it as a potential non-invasive biomarker for CVD risk assessment. Another promising biomarker for predicting high CVD risk identified by the current study is kallikrein 5 or kallikrein-related peptidase 5 (KLK5). KLK5 is a member of the kallikrein-related peptidases (KLKs) family, which comprises highly conserved serine proteases^[179]68. KLKs, along with the complement system and the renin-angiotensin system (RAS) pathway, play crucial roles in cardiovascular disease by initiating vascular inflammation, leading to hypertension and subsequent clot formation^[180]19. KLK5 is mainly expressed in the skin, brain, breast, and testis^[181]69 and plays a key role in skin homeostasis^[182]70. Additionally, KLK5 is involved in the thrombolytic system^[183]71, as it binds and modifies plasminogen, kininogen, and fibrinogen^[184]72. On the other hand, KLK5 can be inhibited by antiplasmin and antithrombin^[185]72. Interestingly, our study observed a significant decrease in KLK5 levels in the plasma of individuals at high CVD risk, whereas a marked increase was noted in saliva samples from subjects with high CVD risk compared to those at low risk (Fig. [186]3 and Table S1). It’s important to note that KLK proteins in plasma differ from tissue KLKs, exhibiting distinct enzymes and releasing different kinins^[187]19. This difference in enzymatic component, activation, and effect might explain the variation in KLK5 level between saliva and plasma.
Additionally, some of the proteins implicated in blood coagulation, such as plasma thrombin and the integrin alpha-IIb:beta-3 complex (platelet receptor), demonstrated decreased levels in plasma, while fibrinogen exhibited a significant increase in saliva samples from the CVD-HR group (Table S1). This distinctive pattern mirrors the observed variation in KLK5 levels between plasma and saliva. Moreover, KLK5 is recognized as a promising early biomarker in cancer^[188]73. In a recent study, KLK5 was found to be associated with T2D in an African American population^[189]74. Here, we suggest the potential use of salivary KLK5 as a noninvasive, early biomarker for predicting high CVD-risk. Another candidate marker of high CVD-risk is chemokine (C–C motif) ligand 15 (CCL15), also known as macrophage inflammatory protein-5 (MIP-5) or leukotactin-1 (Lkn-1)^[190]75. This pro-inflammatory chemokine plays a pivotal role in activating and recruiting leukocytes into the blood vessel wall^[191]75. In a study involving a South African population, CCL15 emerged as a valuable indicator of vascular health, demonstrating a positive association with carotid intima-media thickness (cIMT), an early marker of atherosclerotic changes^[192]75. Plasma CCL15, along with 70 proteins, was part of a protein risk-score that was associated with atherosclerotic cardiovascular disease incidence^[193]25. CCL15 was found to be increased in patients with myocardial infarction with non-obstructive coronary arteries compared with patients with acute myocardial infarction and obstructive coronary arteries^[194]76. In addition, recent evidence from a large-scale study conducted in cohorts from Norway and the USA has highlighted the link between plasma CCL15 and incident heart failure^[195]77. Our data confirm the association of increased levels of CCL15 with high CVD-risk in the plasma and report similar findings in the saliva of Qatari subjects (Fig. [196]3). 
Additional CVD-risk biomarkers identified in our study include GFRA1, PLXNB2, ACP5, and PSME3. A recent investigation among the African American population reported associations between plasma levels of GFRA1 and Plexin B2 with type 2 diabetes^[197]74. GFRA1, formally known as Glial cell line-derived neurotrophic factor Family Receptor Alpha 1, has been implicated in numerous studies investigating modifiable lifestyle risk factors. For example, within the Framingham Heart Study, GFRA1 exhibited a significant association with alcohol consumption. Furthermore, a study involving Saudi women with gestational diabetes mellitus (GDM) linked the GFRA1 gene with this condition^[198]78. PLXNB2, also recognized as Plexin B2, is expressed in human monocytes, macrophages, and foam cells, and has been observed to play a role in monocyte binding to endothelial cells in vitro. Additionally, Plexin B2 has been associated with heightened diabetes risk within the Cardiovascular Health Study population^[199]79,[200]80. ACP5, or tartrate-resistant acid phosphatase type 5, serves as an enzyme involved in bone metabolism and immune system response against bacteria^[201]81,[202]82. ACP5 is primarily expressed by osteoclasts, dendritic cells, and activated macrophages. Serum ACP5 has been suggested as a potential biomarker for detecting bone metastasis in prostate cancer patients^[203]83. In a previous study that addressed the effect of magnesium on cardiovascular disease blood biomarkers, ACP5 level was found to be affected by magnesium supplementation^[204]84. Moreover, serum ACP5 levels have been observed to rise in chronic kidney disease patients with vascular calcification and undergoing hemodialysis^[205]85. Our data reveal a significant elevation in ACP5 levels in the plasma and saliva of individuals at high risk for cardiovascular disease compared to those at low risk. 
PSME3, or proteasome activator complex subunit 3, serves as a pivotal regulator in protein degradation by acting as a regulatory protein for the 20S proteasome^[206]86,[207]87. Within the cell, PSME3 predominantly exists as a homodimer within the nucleus. Its role in macrophages has been noted for its significant contribution to bolstering protection against bacterial infections^[208]88,[209]89. Previous research has indicated an elevated level of PSME3 in pancreatic cancer. However, our findings demonstrate a reduction in PSME3 expression in both plasma and saliva samples from individuals classified in the high-risk group for cardiovascular disease (CVD-HR) (see Figs. [210]2 and [211]3). Notably, PSME3 has been associated with obesity and insulin resistance^[212]90. Additionally, it has implications in cell proliferation and fosters glycolysis in pancreatic cancer^[213]86. After correction for treatment (Figure S3), the levels of most shared CVD-risk biomarkers remained consistent with our previous results (Fig. [214]4c-d), except for plasma KLK5, suggesting that treatment did not influence the validity of these markers, especially in the saliva of the CVD-HR group. Moreover, we conducted a pathway enrichment analysis for the differentially expressed proteins in the CVD-HR group to explore the pathways associated with the identified CVD-risk biomarkers (Fig. [215]5). Our analysis revealed extracellular matrix organization and disassembly as the two shared enriched pathways among the protein biomarkers associated with CVD risk identified in both plasma and saliva samples (Fig. [216]5). This aligns with previous research indicating extracellular matrix organization as a primary mechanism associated with proteins linked to early death in heart failure^[217]91. In saliva, KLK5 was significantly increased in CVD-HR (Fig. [218]3). 
KLK5, with its trypsin-like activity, can digest components of the extracellular matrix such as collagen (I, II, III, and IV), fibronectin, and laminin^[219]72. Our data also report a significant increase in plasma collagen alpha-1(VIII) chain (CO8A1) along with differential expression of matrix metalloproteinases (such as MMP9, MMP3, and MMP12) in the CVD-HR group (Table S1). This might explain the enrichment of extracellular matrix organization and disassembly in CVD-HR (Fig. [220]5). The pathway enrichment analysis also gave an insight into the distinct CVD protein signature in saliva, which is uniquely enriched for pathways related to antibacterial functions (Fig. [221]5b). This can be explained by the significantly higher levels of KLK5 and KLK7 proteins found only in saliva from the CVD-risk group (Table S1). Enzymes in the saliva are involved in key roles, including antimicrobial function^[222]28. KLK5 and KLK7, by their proteolytic activity, mediate the antimicrobial activity of antimicrobial peptides like cathelicidin^[223]92. In addition, other proteins involved in defense against bacteria, such as MRC1 and S100A7, were also significantly increased only in the saliva of the CVD-HR group (Table S1). Interestingly, all the shared CVD-risk biomarkers are linked to oxidized LDL, which builds up very early in atherosclerosis development, suggesting the relevance of the identified CVD-risk biomarkers in reflecting CVD development at an early stage^[224]79,[225]93–[226]99. Looking ahead, longitudinal studies will be needed to follow up on these subjects and observe their progression to CVD. Our study has some limitations. First, the sample size used on the SOMAscan platform for CVD-risk biomarker discovery is relatively small. Since SOMAscan provides a relative quantification instead of an absolute quantification^[227]100, a complementary quantitative proteomic method, such as an immunoassay, is needed to validate the identified biomarkers. 
Second, applying SOMAscan technology to complex samples like plasma and saliva can result in nonspecific protein detection, as a single aptamer may bind to multiple targets. Third, the study analyzed samples from only Qatari subjects, limiting its generalizability to other populations. Further validation of CVD-risk biomarkers in a larger multi-ethnic cohort is needed. In conclusion, this study marks the first attempt to identify a protein signature associated with CVD-risk in saliva samples using a large-scale proteomic approach (SOMAscan) in the Qatari population. Our results unveil the presence of eight potential CVD-risk protein biomarkers with promising diagnostic accuracy, providing a valuable tool for identifying individuals at risk of CVD development. Consequently, both plasma and saliva proteomics are promising avenues for predicting CVD risk. Supplementary Information [228]Supplementary Information 1. (14.5KB, docx) [229]Supplementary Information 2. (1.7MB, pdf) Acknowledgements