Abstract

Background
Atrial fibrillation (AF) is one of the most prevalent causes of cryptogenic stroke. Apart from AF itself, structural and remodelling changes in the atria might also be an underlying cause of cryptogenic stroke. We aimed to discover circulating proteins and reveal pathways altered in AF and atrial cardiomyopathy, measured by left atrial volume index (LAVI) and peak atrial longitudinal strain (PALS), in patients with cryptogenic stroke.

Methods
An aptamer array (including 1310 proteins) was measured in the blood of 20 cryptogenic stroke patients monitored for 28 days with a Holter device, as a case-control study of the Crypto-AF cohort. Protein levels were compared between patients with (n = 10) and without AF (n = 10) after stroke, and the best candidates were tested in 111 patients from the same cohort (44 patients with AF and 67 without AF). In addition, in the first 20 patients, proteins were explored according to PALS and LAVI values.

Results
Forty-six proteins were differentially expressed in AF cases. Of those, four proteins were tested in a larger sample. Only DPP7, which presented lower levels in AF patients, was further validated. Fifty-seven proteins correlated with LAVI, and 270 correlated with PALS. NT-proBNP was common to all the discovery analyses performed. Interestingly, many proteins and pathways were altered in patients with low PALS.

Conclusions
Multiple proteins and pathways related to AF and atrial cardiomyopathy have been revealed. The role of DPP7 as a biomarker for stroke aetiology should be further explored. Moreover, the present study may be considered hypothesis-generating.

Keywords: Atrial fibrillation, Atrial cardiomyopathy, Biomarkers, Cryptogenic stroke, Atrial function

1. Introduction
Atrial fibrillation (AF) is a prevalent cardiac rhythm disorder underlying up to one-third of all ischemic strokes [1].
Paroxysmal AF remains undetected in a high proportion of patients after stroke and is one of the most prevalent causes of cryptogenic stroke [2]. There is also increasing recognition that atrial dysfunction itself is associated with an increased risk of thromboembolism, even in patients without AF [3]. Therefore, atrial substrate or atrial cardiomyopathy has been proposed as an important cause of cryptogenic strokes [4], [5]. Left atrial (LA) enlargement and atrial fibrosis are two structural hallmarks of the atrial substrate [3]. LA size has been associated with cardioembolic stroke and AF detection in patients with embolic stroke of undetermined source (ESUS) [6]. Similarly, peak atrial longitudinal strain (PALS), which measures LA wall deformability and is a surrogate of LA fibrosis, has been associated with AF in cryptogenic stroke patients [7].

Some blood biomarkers (e.g. natriuretic peptides) have been proposed as useful tools to detect paroxysmal AF [8]. In addition, circulating markers might allow noninvasive assessment of atrial cardiomyopathy before AF appears and might guide the selection of patients for more intensive post-stroke monitoring to personalize secondary prevention treatments [2], [5]. The present study aims to discover circulating proteins and reveal pathways altered in AF and atrial cardiomyopathy, measured by left atrial volume index (LAVI) and PALS, in patients with cryptogenic stroke.

2. Methods

2.1. Study population
The study population represented a subpopulation of the Crypto-AF study [9], [10]. Non-lacunar acute ischemic stroke patients over 55 years of age with cryptogenic stroke after standard evaluation were included in the study by four Spanish public Stroke Centers from January 2015 to July 2017. None of the included patients had a prior history of AF. Patients were monitored for 28 days with a wearable Holter device (Nuubo™) following a published protocol [9].
The software algorithm classified every episode of irregular ECG rhythm lasting > 120 s as possible AF. An expert cardiologist blinded to clinical data verified the episodes.

From this cohort, ten consecutive patients with AF detected during the monitoring period and ten controls without AF, matched by sex and age, were selected for the present discovery study. In addition, 111 patients with available blood samples were used for the validation study (44 patients with AF and 67 matched controls).

Blood samples were collected into EDTA and serum separator tubes within 72 h after symptom onset. After centrifugation at 1500 × g and 4 °C for 15 min, plasma and serum aliquots were frozen at −80 °C until further analysis.

Left atrial size was measured by biplane transthoracic echocardiography to obtain the left atrial volume indexed to body surface area (LAVI, ml/m^2) following the latest guidelines [11]. Peak atrial longitudinal strain (PALS) was evaluated with speckle tracking software (GE EchoPAC®) following expert recommendations [12].

Written informed consent was obtained from all participants, and the study was approved (PR (AG)49/2014) by the Ethical Committee of Vall d’Hebrón Hospital, Valladolid Hospital, Virgen Macarena Hospital and Virgen del Rocio Hospital, in line with Helsinki guidelines.

2.2. Aptamer array
Protein levels in plasma were assessed using the SOMAscan® platform (SomaLogic Inc., Boulder, CO, USA), an aptamer-based proteomic assay that allowed the simultaneous measurement and quantification of 1310 proteins [13]. This approach uses SOMAmer® reagents, which are short single-stranded DNA sequences with protein affinity. The platform transforms the proteins present in the biological sample into a corresponding SOMAmer signal, which is then quantified using microarray technology. Three different dilutions (depending on the abundance of each protein) were used.
Normalization and calibration procedures were performed by SomaLogic according to their protocol [14]. All samples passed SomaLogic quality controls. A set of control calibrator samples was used to detect and remove systematic variability between independent assay runs. Seventy-nine proteins were marked as “flags” due to high inter-plate variability and were eliminated from the analysis. Data were reported in relative fluorescent units (RFU) after normalization and calibration.

2.3. ELISA
Serum coiled-coil domain-containing protein 80 (CCDC80) (BosterBio), plasma dipeptidyl peptidase 7 (DPP7) (R&D Systems), bone morphogenetic protein 1 (BMP-1) (Elabscience), and cystatin-D (BosterBio) were determined by ELISA. All assays were performed blinded to clinical information and according to the manufacturer’s instructions. All samples were tested in duplicate, and inter-assay variation was determined using a commercial control (Human Serum, male AB, USA origin from clotted, SIGMA, ref number H16914; Human plasma K2 EDTA, Innovative Research, ref number IPLA-N) tested in duplicate on each plate. When inter-assay variation was > 20%, biomarker levels were standardized by the common control sample. Samples with a coefficient of variation (CV) > 20% between duplicates were eliminated from the analysis.

2.4. Statistics
R software version 3.6.1 and SPSS version 20 were used to conduct the statistical analysis. Categorical variables were expressed as numbers and percentages, and continuous variables as mean ± SD or median (interquartile range), depending on their distribution. Student’s t-test, the Mann–Whitney test or the χ^2 test was used to compare variables between AF cases and controls, depending on the type and distribution of each variable. SOMAscan data were log-transformed because they presented a skewed distribution.
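The duplicate quality-control rule described in Section 2.3 (samples with a CV > 20% between duplicates were eliminated) can be illustrated with a minimal sketch. This is not the authors' code: the function names `duplicate_cv` and `filter_by_cv`, the use of the sample standard deviation, and the averaging of retained duplicates are all assumptions made for illustration.

```python
from statistics import mean, stdev


def duplicate_cv(a: float, b: float) -> float:
    """Coefficient of variation (%) between two duplicate readings.

    Assumption: CV = sample SD / mean * 100. The source does not state
    which SD estimator was used.
    """
    m = mean([a, b])
    if m == 0:
        raise ValueError("CV is undefined when the mean of the duplicates is zero")
    return 100.0 * stdev([a, b]) / m


def filter_by_cv(readings: dict, max_cv: float = 20.0) -> dict:
    """Drop samples whose duplicate CV exceeds max_cv.

    Returns the mean of the two readings for each retained sample
    (averaging retained duplicates is an assumption, not from the source).
    """
    kept = {}
    for sample_id, (a, b) in readings.items():
        if duplicate_cv(a, b) <= max_cv:
            kept[sample_id] = mean([a, b])
    return kept


if __name__ == "__main__":
    # Hypothetical duplicate readings in arbitrary concentration units.
    example = {"s1": (10.0, 12.0), "s2": (10.0, 14.0), "s3": (5.0, 5.0)}
    print(filter_by_cv(example))
```

In this hypothetical example, sample `s2` is dropped because its duplicates differ by more than the 20% CV threshold, while `s1` and `s3` are retained.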
Differential expression analyses were performed using the “limma” package (Bioconductor) version 3.42.2, optimized for omics studies with large amounts of data and few samples [15]. Spearman correlations were calculated between LAVI or PALS and all the analyzed proteins. The R package “VennDiagram” version 1.6.20 was used to visualize the proteins common to the different analyses. All p-values were adjusted using the Benjamini and Hochberg (BH) false discovery rate (FDR). Group matching by sex and age was used to select control samples in the discovery experiment. The validation sample size was estimated based on SOMAscan results (power of 80%, α = 0.05) using Ene 3.0 (GlaxoSmithKline, UK). The addition of DPP7 to a logistic regression model including age, sex, echocardiographic markers (LAVI and PALS), and NT-proBNP was tested using the likelihood ratio test. Odds ratios (OR) for an increment of one unit of concentration were reported. The classification performance of the models was compared using receiver operating characteristic (ROC) curves. The R package “ggeffects” version 1.1.1 was used to plot the average predicted probability of the model when varying the variable of interest.

2.5. Pathway analysis
Pathway enrichment analysis was conducted following a published protocol [16]. Gene Set Enrichment Analysis (GSEA) software was applied to all the SOMAscan proteins, ranked by t-statistic or correlation coefficient, against the Reactome Pathways and Gene Ontology (biological processes) databases. Gene sets with < 15 genes or > 200 genes were excluded. GSEA calculates a normalized enrichment score (NES) for each gene set. Positive and negative NES values represent enrichment of the corresponding gene set at the top (i.e., upregulated) or bottom (i.e., downregulated) of the ranked list, respectively. P-values were computed using 1000 gene set permutations. Then, multiple-testing correction using a false discovery rate (FDR) was applied to obtain the Q-values.
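For reference, the Benjamini and Hochberg adjustment applied to the protein-level p-values in Section 2.4 can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in the study (which relied on R); the function name `benjamini_hochberg` is hypothetical. GSEA derives its Q-values from its own permutation-based procedure, which this sketch does not reproduce.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value with ascending rank r out of m tests, the raw adjusted
    value is p * m / r. Walking from the largest p-value to the smallest
    and keeping a running minimum enforces monotonicity and caps the
    result at 1.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical nominal p-values for four proteins.
    nominal = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(nominal))
```

With many proteins tested and few samples, small nominal p-values can still yield adjusted values above the significance threshold, which is why nominal and adjusted results are reported separately.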
Results were visualized via Cytoscape Enrichment Map with a Jaccard Overlap Combined Coefficient > 0.375. Pathways were considered significant at a Q-value < 0.25.

3. Results
The descriptive characteristics of the 20 patients included in the discovery study are provided in Table 1. The median age was 71.5 years, and 55% were women. Clinical variables were similar between the two groups.

Table 1. Clinical characteristics of the patients included in the discovery experiment and comparison according to atrial fibrillation detection.

| | All (n = 20) | AF (n = 10) | No AF (n = 10) | p-value |
|---|---|---|---|---|
| Sex (% female) | 11 (55%) | 6 (60%) | 5 (50%) | 0.65^& |
| Age (years) | 71.5 (67–80) | 73.5 (69.75–80) | 67.5 (62.25–81.5) | 0.247^$ |
| Hypertension | 14 (70%) | 7 (70%) | 7 (70%) | 1.00^& |
| Diabetes | 5 (25%) | 3 (30%) | 2 (20%) | 1.00^& |
| Vasculopathy | 1 (5%) | 0 (0%) | 1 (10%) | 1.00^& |
| Renal failure | 1 (5.3%) | 1 (11.1%) | 0 (0%) | 0.47^& |
| COPD | 1 (5.3%) | 1 (11.1%) | 0 (0%) | 0.47^& |
| Obesity | 8 (40%) | 5 (50%) | 3 (30%) | 0.65^& |
| Heart disease | 2 (10%) | 2 (20%) | 0 (0%) | 0.47^& |
| Basal NIHSS | 4 (2–7) | 3 (1–7) | 5 (3–7) | 0.29^$ |
| LVEF (%) | 64.74 ± 34.19 | 63 ± 7.84 | 66.30 ± 7.51 | 0.362^# |
| PALS (%) | 25.76 ± 12.98 | 29.89 ± 14.06 | 20.59 ± 10.00 | 0.134^# |
| LAVI (ml/m^2) | 31 (27–37) | 30 (27–34) | 34 (22.5–37.75) | 0.815^$ |
| Number of AF episodes | | 28 (7–42.5) | | |
| Longest AF episode (min) | | 780.48 (121.25–1510.71) | | |

COPD, chronic obstructive pulmonary disease; NIHSS, National Institutes of Health Stroke Scale; LVEF, left ventricular ejection fraction; PALS, peak atrial longitudinal strain; LAVI, left atrial volume index. The term “heart disease” included any cardiopathy that the investigator considered of interest, such as ischemic cardiopathy and hypertensive cardiopathy, among others. The two patients with heart disease in this cohort had a mild mitral and aortic valvulopathy and a hypertensive cardiomyopathy, respectively.
^# Student’s t-test. ^$ Mann–Whitney test. ^& χ^2 test.

3.1. Differential protein expression and altered pathways in AF
Among the tested proteins, 46 were differentially expressed in AF cases at a nominal p-value < 0.05 (22 down-regulated and 24 up-regulated). Although no protein remained significant after multiple comparison correction, NT-proBNP showed the strongest association (p-value = 0.001, logFC = 1.86) and BNP was ranked sixth (p-value = 0.008, logFC = 0.39). Both natriuretic peptides are well-known biomarkers of AF and were already validated in this cohort in a published study [8]. Proteins with p-values between those of NT-proBNP and BNP were selected to evaluate their usefulness in a larger group of patients: CCDC80 (p-value = 0.0013), DPP7 (p-value = 0.0039), BMP-1 (p-value = 0.0051), and Cystatin-D (p-value = 0.0080) (Fig. 1 and Supplemental Table 1).

Fig. 1. Volcano plot of differentially expressed proteins between AF cases and no AF. Black dots above the red line indicate significant proteins, while grey dots below the red line indicate non-significant proteins according to nominal p-value < 0.05. Labeled proteins are those with a nominal p-value < 0.01, or a nominal p-value < 0.05 and |logFC| > 1. Proteins with positive logFC had higher levels in the AF group and vice versa. (For interpretation of the references to colour in this figure legend,