Abstract
Early- and late-onset preeclampsia (EOPE and LOPE) pose serious maternal-fetal risks, yet non-invasive early prediction remains challenging. In a prospective cohort of 9,586 pregnancies, we analyze trimester-specific plasma cell-free RNA (cfRNA) profiles from 42 EOPE and 43 LOPE cases versus 131 normotensive controls. Organ-specific transcriptomic shifts distinguish EOPE from LOPE. Predictive models based on cfRNA signatures identify EOPE an average of 18.0 weeks before clinical onset in the first trimester (T1) (AUC = 0.88), and 8.5 weeks before onset in the second trimester (T2) (AUC = 0.89). LOPE is predicted 14.9 weeks in advance using T2 data (AUC = 0.90), while T1 performance is lower (AUC = 0.68). External validation confirms robust EOPE prediction (AUC = 0.87 at T1; 0.81 at T2) and more modest LOPE performance (AUC = 0.63 at T1; AUC = 0.77 at T2). EOPE models are enriched for decidual transcripts, suggesting early maternal involvement; LOPE models reflect broader tissue contributions. These findings offer a path to early, non-invasive, subtype-specific preeclampsia risk stratification and prevention.
Subject terms: Predictive markers, RNA sequencing, Pre-eclampsia
Early- and late-onset preeclampsia pose serious maternal-fetal risks, yet non-invasive early prediction remains challenging. Here, the authors show that cfRNA signatures reveal distinct decidual and multiorgan signals, enabling accurate, externally validated prediction of both subtypes.
Introduction
Maternal and infant mortality during pregnancy and labor are critical indicators of community and national health^[76]1,[77]2. Most pregnancy complications arise from disorders that develop during the periconceptional phase, particularly during embryonic implantation and early placentation^[78]3.
Preeclampsia, a life-threatening obstetric syndrome, is characterized by new-onset hypertension after 20 weeks of gestation, accompanied by signs of kidney, liver, or brain damage^[79]4. Each year, preeclampsia contributes to 14% of maternal deaths worldwide, leaving a lasting impact on survivors’ health^[80]5. In the United States, it also constitutes a significant public health burden, incurring $1.03 billion in maternal healthcare costs and an additional $1.15 billion in neonatal care within the first year after birth for infants born to affected mothers^[81]6. Preeclampsia is notably heterogeneous, differing in the timing of onset and the severity of symptoms. Early-onset preeclampsia (EOPE) arises before 34 weeks of gestation, necessitating emergency delivery to mitigate risks to maternal and fetal health^[82]7,[83]8. In contrast, late-onset preeclampsia (LOPE) manifests after 34 weeks and can lead to severe maternal organ damage such as kidney, liver, or brain damage^[84]8–[85]11. Therefore, there is an urgent need for straightforward, non-invasive methods for early diagnosis of preeclampsia in the first trimester to implement preventive strategies effectively^[86]12–[87]14. Since the maternal decidua regulates the initial steps of maternal-embryo communication, decidualization resistance (DR), characterized by defective endometrial cell differentiation, results in abnormal placentation, which has been associated with the etiology of major obstetric syndromes, including preeclampsia, even though symptoms may manifest later in gestation^[90]15–[91]19. Recently, we provided an in-depth multi-omics characterization of DR in former EOPE patients, further underscoring the uterine contribution to this pathological condition^[92]20.
Analyzing plasma cell-free RNA (cfRNA) through liquid biopsy (i.e., from a blood sample) has emerged as a promising non-invasive tool for molecular monitoring in pregnancy, offering insights into physiological and pathological events^[93]21,[94]22. However, previous cfRNA studies on preeclampsia prediction have faced limitations such as small EOPE sample size^[95]23, lack of clear subtype distinction^[96]23,[97]24, or sampling at later gestational ages^[98]24,[99]25. Our study builds on these foundations and addresses these gaps by including a large, prospectively collected cohort of EOPE cases with strict first-trimester sampling, clearly differentiating EOPE and LOPE, and employing longitudinal sampling. This comprehensive design has allowed us to develop and validate predictive models with improved early and subtype-specific risk stratification. Specifically, in this case-control study, we prospectively analyzed the cfRNA profiles in pregnant women across the three trimesters of pregnancy, comparing EOPE and LOPE with normotensive controls. This approach facilitated the characterization of the circulating transcriptome by mapping the tissue origins and transcriptional changes associated with EOPE and LOPE, revealing that both subtypes display distinct transcriptional differences compared to controls. Our research identified cfRNA profiles that exhibited robust predictive performance for EOPE in both the first (averaging 18.0 weeks before diagnosis) and second trimesters (averaging 8.5 weeks prior to clinical onset), as well as for LOPE in the second trimester (14.9 weeks prior to clinical onset). Monitoring cfRNA profiles not only aids in predicting the risk of developing preeclampsia but also allows the differentiation of both subtypes of preeclampsia and the evaluation of different organ damage in affected patients, providing insights into their prognosis. 
Results
Clinical study design and participants' baseline characteristics
A total of 9586 pregnant women with singleton pregnancies were enrolled in this prospective and longitudinal case-control study in fourteen tertiary hospitals in Spain (ClinicalTrials.gov Identifier: [100]NCT04990141). Blood samples were collected prospectively across all three trimesters and at the time of EOPE or LOPE diagnosis. Each participant was followed until delivery, ensuring the availability of obstetrical outcomes and the creation of a curated database with comprehensive clinical data. Uncomplicated pregnancies that progressed to term (> 37 weeks) were classified as normotensive controls, while those diagnosed with EOPE or LOPE were categorized according to current established ACOG^[101]4 and FIGO^[102]24 clinical guidelines. Of the 9586 pregnant women enrolled, 7142 were eligible for analysis after excluding participants for selection failure, loss to follow-up, and obstetric complications other than preeclampsia. We included all EOPE cases (n = 42) and randomly selected a subset of LOPE cases (n = 43). The number of LOPE cases was established to match the number of EOPE cases, ensuring that both groups had the same case-to-control ratio of approximately 1:3, which is optimal for model development. Normotensive controls (n = 131) were randomly selected from the 6905 uncomplicated pregnancies and matched to both EOPE and LOPE cases for key epidemiological variables including gestational age at sampling, maternal age, parity, BMI and ethnicity (Supplementary Fig. [103]1a, b, and Supplementary Table [104]1). Then, a subset of 216 participants composed of preeclampsia cases (EOPE and LOPE) and normotensive controls was selected for total cfRNA sequencing to characterize cfRNA profiles throughout the progression of pregnancy (Fig. [105]1). For the development of predictive models, the cohort was randomly stratified into a discovery set (70% of patients) and a validation set (30% of patients).
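The random stratified 70/30 partition described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `stratified_split`, the participant identifiers, the seed, and the choice to stratify on outcome group alone are all assumptions made for illustration; only the group sizes (42, 43, 131) and the 70/30 proportions come from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, discovery_fraction=0.7, seed=0):
    """Split participant IDs into discovery/validation sets per outcome group.

    `labels` maps a participant ID to its group. Stratifying only on the
    outcome group is an assumption; the study may have used other strata.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for participant_id, group in labels.items():
        by_group[group].append(participant_id)

    discovery, validation = [], []
    for group in sorted(by_group):
        ids = sorted(by_group[group])  # fixed order before shuffling
        rng.shuffle(ids)
        cut = round(len(ids) * discovery_fraction)
        discovery.extend(ids[:cut])
        validation.extend(ids[cut:])
    return discovery, validation


# Hypothetical IDs; group sizes are those reported for the sequenced subset.
labels = {f"EOPE_{i}": "EOPE" for i in range(42)}
labels.update({f"LOPE_{i}": "LOPE" for i in range(43)})
labels.update({f"CTRL_{i}": "control" for i in range(131)})

discovery, validation = stratified_split(labels)
print(len(discovery), len(validation))
```

Splitting within each group keeps the case-to-control proportions similar in both sets, which matters when one class (here, each preeclampsia subtype) is much smaller than the control group.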
The discovery set was used to build the predictive model and the validation set to assess its performance in a hold-out group of samples (Supplementary Fig. [106]1c).
Fig. 1. Flowchart of the study. A total of 9586 pregnant participants were recruited. After excluding participants due to selection failure and loss to follow-up, 8991 remained. Within this cohort, 237 (2.6%) individuals were diagnosed with preeclampsia, including 42 EOPE and 195 LOPE cases, while 1849 (20.6%) individuals had other pregnancy-related pathologies, and 6905 (76.8%) participants had no obstetric complications. For cfRNA analysis, we included all 42 EOPE cases, a subset of 43 LOPE cases and 131 normotensive controls, randomly selected from the matched cohort based on gestational age at sample collection, maternal age, parity, ethnicity, and BMI.
From each participant, we collected three peripheral blood samples: between 9 and 14 weeks of gestation (T1), at 18–28 weeks (T2), and at the time of EOPE or LOPE diagnosis or after 28 weeks (T3) (Fig. [109]2). Data on the gestational weeks of blood sample collection are summarized in Supplementary Table [110]2. Due to clinical emergencies necessitating immediate termination of pregnancy, T3 samples could not be collected from fourteen EOPE patients and seven LOPE patients.
Fig. 2. Overview of sample collection, preeclampsia diagnosis, and delivery time points across patient and control groups. Bar graph illustrating the number of samples collected at each gestational week for the EOPE (a), LOPE (c) and control (e) groups. Color represents the time point of sample collection: T1 (9–14 gestational weeks); T2 (18–28 gestational weeks); T3 (at the time of preeclampsia diagnosis or >28 gestational weeks). Density plot showing the relative frequency of preeclampsia diagnosis and delivery across gestational weeks for the EOPE (b), LOPE (d) and control (f) groups.
Maternal characteristics, clinical symptoms, and birth outcomes are summarized in Table [113]1. There were no significant differences in maternal age, parity, ethnicity, BMI or smoking habits between patients and controls (p > 0.05). The natural conception rate was significantly lower in EOPE patients than in controls (p = 0.0003) but did not differ significantly in LOPE (p = 0.105). Aspirin prophylaxis (150 mg) was prescribed to 30 EOPE cases (71.4%), 23 LOPE cases (53.5%), and 10 normotensive controls (7.6%). EOPE was diagnosed at 30.0 ± 3.4 weeks, with severe symptoms in 76.2% of patients; LOPE was diagnosed at 36.5 ± 1.8 weeks, with severe symptoms in 41.9% of patients. Severe symptoms were defined as the presence of severely elevated blood pressure (systolic ≥160 mm Hg or diastolic ≥100 mm Hg), thrombocytopenia, impaired liver function, progressive renal insufficiency, pulmonary edema, or neurological complications such as cerebral or visual disturbances^[114]4.

Table 1. Maternal characteristics and pregnancy outcomes in the selected subset of participants: EOPE (n = 42), LOPE (n = 43), and Controls (n = 131)

Maternal characteristics
| Group | Maternal age (years) | Maternal BMI (kg/m²) | Primiparous (%) | Smoker (%) | Natural conception (%) | Aspirin (%) |
| EOPE | 34.4 (6.3) | 28.0 (4.7) | 59.5 | 9.5 | 80.9 | 71.4 |
| LOPE | 33.7 (4.7) | 26.9 (5.3) | 62.8 | 13.9 | 94.3 | 53.5 |
| Control | 33.68 (3.8) | 26.62 (5.0) | 55 | 7.6 | 96.9 | 7.6 |
| P-value, EOPE vs Control | 0.512^a | 0.052^b | 0.429^c | 0.747^d | <0.001^d | <0.001^c |
| P-value, LOPE vs Control | 0.982^a | 0.816^b | 0.368^c | 0.213^c | 0.105^d | <0.001^c |

Ethnicity / Race
| Group | Caucasian (%) | African American (%) | Hispanic (%) | Asian (%) | Other (%) | P-value |
| EOPE | 69.0 | 9.5 | 19.0 | 0.0 | 2.3 | 0.175^d |
| LOPE | 81.0 | 0.0 | 13.9 | 0.0 | 4.6 | 0.211^d |
| Control | 82.4 | 5.3 | 9.2 | 0.0 | 3.1 | |

Preeclampsia symptoms
| Group | GA at diagnosis (weeks) | SBP (mm Hg) | DBP (mm Hg) | Uteroplacental dysfunction (%) | Proteinuria (%) | Pulmonary edema (%) | HELLP (%) | Eclampsia (%) | Severe (%) |
| EOPE | 30.0 (3.4) | 157.5 (18.9) | 98.9 (15.0) | 69.0 | 80.9 | 0.0 | 14.3 | 2.4 | 76.2 |
| LOPE | 36.5 (1.8) | 152.7 (17.3) | 93.9 (8.2) | 25.6 | 88.4 | 2.3 | 7.0 | 0.0 | 41.9 |
| Control | NA | 114.1 (11.9) | 93.1 (23.4) | NA | NA | NA | NA | NA | NA |
| P-value, EOPE vs Control | | <0.001^b | <0.001^a | | | | | | |
| P-value, LOPE vs Control | | <0.001^b | <0.001^a | | | | | | |

Birth outcomes
| Group | GA at delivery (weeks) | Preterm birth (%) | Cesarean (%) | Male fetus (%) | Fetal weight (g) | SGA (%) | Stillbirth (%) | Mother ICU (%) | Newborn ICU (%) |
| EOPE | 32.1 (3.7) | 87.8 | 69.0 | 52.4 | 1520 (557.2) | 80.9 | 11.9 | 35.2 | 50.0 |
| LOPE | 37.4 (1.6) | 41.9 | 44.2 | 53.5 | 2724 (635.5) | 39.5 | 0.0 | 18.6 | 16.3 |
| Control | 40 (1.1) | 0.0 | 19.8 | 45.0 | 3375 (408.3) | 1.5 | 0.0 | 0.0 | 0.8 |
| P-value, EOPE vs Control | <0.001^b | <0.001^d | <0.001^d | 0.407^c | <0.001^b | <0.001^d | <0.001^d | <0.001^d | <0.001^d |
| P-value, LOPE vs Control | <0.001^b | <0.001^d | <0.001^d | 0.335^c | <0.001^b | <0.001^d | | <0.001^d | <0.001^d |

Statistical comparisons: Exact p-values are provided for comparisons between case and control groups. Depending on the distribution of the data assessed by the Shapiro–Wilk test, either Student’s t-test (^a) or Wilcoxon rank-sum test (^b) was used for continuous variables. Categorical variables were compared using the Chi-squared test (^c) or Fisher’s exact test (^d), as appropriate. All tests were two-sided. No adjustment for multiple comparisons was applied, as each variable was tested independently. Superscript letters next to p-values indicate the test applied.
BMI Body Mass Index, DBP diastolic blood pressure, GA Gestational Age, HELLP Hemolysis, Elevated Liver enzyme levels, and Low Platelet levels, ICU Intensive Care Unit, NA Not Applicable, SBP Systolic blood pressure, SGA Small for Gestational Age.

Birth outcomes for EOPE and LOPE included higher rates of small for gestational age, preterm birth, and cesarean delivery, and lower fetal weight (p < 0.001). Specifically, preterm deliveries occurred in 87.8% of EOPE patients and in 41.9% of LOPE patients, with cesarean sections required in 69.0% and 44.2% of patients, respectively.
In contrast, all deliveries in the control group occurred at term, and only 19.8% involved cesarean sections. Fetal sex did not differ between groups. EOPE patients had significantly higher rates of stillbirth (11.9%) and post-delivery complications (p < 0.001), with 35.2% of mothers and 50.0% of neonates requiring intensive care. In comparison, among patients with LOPE, 18.6% of mothers and 16.3% of newborns required intensive care, whereas no mothers and 0.8% of neonates in the control group needed intensive care.
Profiling the tissue origin and dynamics of cfRNA in EOPE and LOPE through pregnancy
We analyzed a total of 29,871 cfRNA transcripts after applying quality filtering and normalization processes. To determine the tissue origins of the identified transcripts, we compared our cfRNA dataset to the Human Protein Atlas database^[116]26, focusing on transcripts classified as “enriched” or “enhanced” in specific tissues or organs. In this analysis, we examined tissues and organs that are directly involved in the pathophysiology of preeclampsia and contribute to its clinical manifestations. Our experimental protocol detected over 90% of these classified transcripts for each targeted organ or tissue of interest (Fig. [117]3a), indicating a robust coverage of tissue-specific cfRNA signatures in our dataset.
Fig. 3. cfRNA abundance by organ/tissue origin in EOPE and LOPE patients and controls. (a) Number and proportion of cfRNA transcripts from organs/tissues implicated in preeclampsia, relative to the Human Protein Atlas reference. (b) Box plots show cfRNA abundance scores by tissue of origin at each time point, calculated as the sum of log-transformed CPM-TMM normalised counts. Color indicates group. Horizontal lines represent medians; boxes, 25th–75th percentiles; whiskers extend to 1.5× the interquartile range.
Sample sizes for each time point and group are as follows: T1 (EOPE, n = 41; LOPE, n = 43; control, n = 129); T2 (EOPE, n = 40; LOPE, n = 41; control, n = 120); T3 (EOPE, n = 19 vs. control, n = 34; LOPE, n = 24 vs. control, n = 39). P-values were determined by two-sided Wilcoxon rank-sum test. Exact P-values for all comparisons are provided in Supplementary Table [120]3. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.
We then calculated the organ/tissue-specific signature score for patients and controls at three time points during pregnancy (T1, T2 and T3) (Fig. [121]3b and Supplementary Table [122]3). In EOPE patients, a significant increase in cfRNA transcripts from the liver, kidney, and decidua was identified at T2 (p < 0.01), suggesting tissue-specific damage approximately eight weeks before diagnosis. At T3, when clinical symptoms appear, EOPE patients displayed a significantly higher signature score (p < 0.0001) for additional organs including brain, lungs, placenta, and lymphoid tissues, signaling widespread organ injury. In contrast, in LOPE patients, an increase in tissue-specific transcripts suggestive of organ damage was observed only at T3 (p < 0.01), with lower levels of significance than in EOPE.
To decode cfRNA dynamics throughout pregnancy, we performed a differential abundance analysis at each time point, elucidating molecular changes in the circulating transcriptome associated with disease progression and offering insights into underlying mechanisms. At the time of diagnosis (T3), we identified 24,336 transcripts with significantly altered abundance in EOPE patients compared to controls (FDR < 0.05) (Supplementary Fig. [123]2a and Supplementary Data File [124]1). In contrast, LOPE patients exhibited 11,859 differentially abundant transcripts (FDR < 0.05) (Supplementary Fig. [125]2b and Supplementary Data File [126]2).
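The organ/tissue-specific signature score is described as the sum of log-transformed CPM-TMM normalised counts over the transcripts assigned to a tissue. A minimal Python sketch of that calculation follows; the logarithm base (2), the pseudocount of 1, the function name `signature_score`, and the gene lists and count values are assumptions for illustration, since the text does not specify them.

```python
import math


def signature_score(normalised_counts, tissue_transcripts, pseudocount=1.0):
    """Sum of log2(count + pseudocount) over a tissue's transcript set.

    `normalised_counts` maps transcript -> normalised abundance for one
    sample (CPM-TMM in the study). Base-2 logs and the pseudocount are
    assumptions; transcripts absent from the sample contribute zero.
    """
    return sum(
        math.log2(normalised_counts.get(transcript, 0.0) + pseudocount)
        for transcript in tissue_transcripts
    )


# Made-up abundances for one hypothetical sample.
sample = {"ALB": 7.0, "APOA1": 3.0, "UMOD": 1.0}

# Illustrative transcript sets; the study derived its sets from the
# Human Protein Atlas "enriched"/"enhanced" classifications.
liver_set = ["ALB", "APOA1"]
kidney_set = ["UMOD", "NPHS2"]

liver_score = signature_score(sample, liver_set)
kidney_score = signature_score(sample, kidney_set)
print(liver_score, kidney_score)
```

Per-sample scores computed this way could then be compared between groups at each time point, for example with the two-sided Wilcoxon rank-sum test named in the figure legend.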
Notably, 8,127 cfRNAs showed differential abundance at T2 for EOPE patients (FDR < 0.05), whereas no differentially abundant cfRNAs were detected at T1 for either EOPE or LOPE patients, nor at T2 for LOPE. These findings suggest that transcriptomic alterations emerge as EOPE progresses, while the circulating transcriptome in LOPE remains largely unchanged until diagnosis. Gene ontology overrepresentation analysis within the differentially abundant cfRNAs at diagnosis revealed biological processes indicative of fetal and maternal organ-specific damage (FDR < 0.05) (Supplementary Fig. [127]2c and Supplementary Table [128]4). Both EOPE and LOPE patients displayed significant enrichment in key biological processes, including transport across the blood-brain barrier, renal water homeostasis, regulation of blood pressure and cognition, which are hallmark processes of the pathology. Importantly, signatures of fetal tissue damage were identified in both EOPE and LOPE, with a notably greater impact in EOPE patients. Distinct biological processes were associated with either EOPE or LOPE. In EOPE, overrepresentation analysis revealed significantly enriched pathways related to neuronal death, renal filtration, and immune dysfunction, including interleukin-8 production, response to interleukin-4, neutrophil-mediated immunity, and antimicrobial humoral immune response. In contrast, the LOPE cfRNA profile showed signatures linked to heart and brain function (FDR < 0.05), suggesting significant damage to these organs. Thus, cfRNA profile analysis at diagnosis (T3) indicates more extensive transcriptomic alterations in EOPE compared to LOPE, highlighting an exacerbated proinflammatory state as a defining feature. These findings underscore the impacts of the disease on multiple organ systems and suggest that cfRNA profiling may provide valuable insights into the molecular distinctions between preeclampsia subtypes.
Additionally, the identification of distinct biological processes linked to each preeclampsia subtype emphasizes the need for tailored therapeutic approaches targeting specific dysfunctions observed in EOPE and LOPE.
Early prediction of EOPE and LOPE in the first trimester of pregnancy
Given the evidence that cfRNA profiles reflect molecular changes throughout pregnancy, disruptions in these pathways may help identify pregnancies at risk for EOPE or LOPE. Here, we developed a model for EOPE risk assessment based on plasma cfRNA profiles in the first trimester (T1), approximately 18.0 weeks before clinical onset. Our optimal predictive model for EOPE utilized 36 cfRNA transcripts (Supplementary Table [129]5) and was evaluated in a hold-out validation set. The model achieved a sensitivity of 83% and specificity of 90%, with an area under the receiver operating characteristic curve (AUC) of 0.88 (Fig. [130]4a and Supplementary Table [131]6). Most samples were correctly classified, with few misclassifications, supporting the model’s robustness and giving no indication of overfitting (Fig. [132]4b). The relative contributions of individual cfRNA transcripts to the model’s performance are detailed in Fig. [133]4c. We further evaluated the same cfRNA signature in an independent external dataset^[134]23 (Fig. [135]4a,b and Supplementary Table [136]6), confirming consistent performance (sensitivity 78%, specificity 90%; AUC 0.87), despite cross-cohort variability in protocols and data origin.
Fig. 4. Performance and feature importance of first trimester (T1) predictive models for EOPE and LOPE. Receiver operating characteristic (ROC) curves for EOPE (a) and LOPE (d) models across internal validation (validation 1) and external validation^[139]23 (validation 2). The X-axis represents the False Positive Rate; the Y-axis, the True Positive Rate.
Violin plots showing correctly and misclassified patients and controls based on the classifier score obtained from the predictive model for EOPE (b) and LOPE (e). The X-axis shows the real obstetric outcome; the Y-axis, the predicted outcome. Bar plot illustrating each cfRNA's contribution to EOPE (c) and LOPE (f) models. The X-axis shows the feature importance scores, which quantify the relative contribution of each cfRNA to the model’s predictions, with higher scores indicating features that play a more significant role in discriminating between outcomes. cfRNAs associated with DR are marked with an asterisk. AUC, area under the curve.
Further analysis of these 36 transcripts revealed that 17 (47.2%) were identified as markers of DR in women with a history of severe preeclampsia, including CBR3, MMP7, MDK, TRIB1, PAEP^[140]20. The model also incorporates cfRNA transcripts known to be disrupted in preeclamptic placentas, such as RFLNB^[141]27 and CD74^[142]28, as well as others associated with fetal growth restriction, such as CCL4L2^[143]29 and MYL6^[144]30.
Using the same computational approach, we developed a predictive model for LOPE in the first trimester (T1), with predictions averaging 24.9 weeks before clinical onset. However, the model’s performance in the validation set was limited, achieving a sensitivity of 72%, specificity of 64%, and an AUC of 0.68 (Fig. [145]4d and Supplementary Table [146]6). Consistent with these findings, the model showed similarly limited performance when applied to an independent external cfRNA dataset^[147]23 (sensitivity 39%, specificity 58%, AUC 0.63) (Fig. [148]4d, e and Supplementary Table [149]6), underscoring the challenges in early LOPE prediction. Misclassified samples are shown in Fig. [150]4e, and the relative contribution of individual cfRNAs to predictive accuracy is detailed in Fig. [151]4f. While predictive capability was limited, analysis of the selected cfRNAs offers insights into LOPE mechanisms.
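The performance figures reported for these models (sensitivity, specificity, AUC) are standard classifier metrics. The Python sketch below shows how they can be computed from classifier scores; the scores, labels, and the 0.5 decision threshold are invented for illustration and are not study data, and the authors' own software is not described in this section.

```python
def sensitivity_specificity(y_true, y_score, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, y_score):
    """AUC as the probability that a case outscores a control (ties count 0.5).

    This rank-based form equals the area under the ROC curve, whose axes
    are the false positive rate (x) and the true positive rate (y).
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Invented toy data: 1 = preeclampsia case, 0 = normotensive control.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, y_score, threshold=0.5)
auc = roc_auc(y_true, y_score)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises discrimination across all thresholds, which is why a model can report a moderate AUC alongside a low sensitivity at a particular cut-off.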
Further exploration revealed that several of these cfRNAs map to protein-coding genes with known roles in cardiovascular, hepatic, and immune functions, including PRR23D1, SNORD126, CD52, TRDV3. Unlike EOPE, no cfRNA transcripts in this model were associated with decidua, underscoring distinct pathophysiological pathways for EOPE and LOPE. In conclusion, our findings demonstrate the effectiveness of cfRNA signatures in predicting EOPE during the first trimester, while LOPE prediction remains challenging, likely reflecting fundamental differences in pathophysiology between EOPE and LOPE.
Early prediction of EOPE and LOPE in the second trimester of pregnancy
We next investigated the potential for early detection of EOPE and LOPE in the second trimester (T2). The most effective predictive model for EOPE was based on 87 cfRNA transcripts (Supplementary Table [152]5), achieving a sensitivity of 89% and specificity of 86%, with an AUC of 0.89 in the validation set (Fig. [153]5a and Supplementary Table [154]6). Misclassified samples are shown in Fig. [155]5b, and importance scores for each transcript are illustrated in Fig. [156]5c. This model identifies patients at risk for EOPE between 18 and 28 weeks of gestation, approximately 8.5 weeks before clinical onset. When applied to an independent external cfRNA dataset^[157]23, the signature demonstrated good performance (sensitivity 67%, specificity 78%, AUC 0.81), with a moderate reduction likely influenced by the small sample size of EOPE cases (n = 5) (Fig. [158]5a, b and Supplementary Table [159]6).
Fig. 5. Performance and feature importance analysis of second trimester (T2) predictive models for EOPE and LOPE. Receiver operating characteristic (ROC) curves for EOPE (a) and LOPE (d) models across internal validation (validation 1) and external validation^[162]23 (validation 2). The X-axis represents the False Positive Rate; the Y-axis, the True Positive Rate.
Violin plots showing correctly and misclassified patients and controls based on the classifier score obtained from the predictive model for EOPE (b) and LOPE (e). The X-axis shows the real obstetric outcome; the Y-axis, the predicted outcome. Bar plot illustrating each cfRNA's contribution to EOPE (c) and LOPE (f) models. The X-axis shows the feature importance scores, which quantify the relative contribution of each cfRNA to the model’s predictions, with higher scores indicating features that play a more significant role in discriminating between outcomes. cfRNAs associated with DR are marked with an asterisk. AUC, area under the curve.
Further investigation into the tissue-specific origin of these transcripts revealed that 32 (36.8%) are associated with the DR signature previously described in endometrial tissue from women with a history of severe preeclampsia, including CCL20, CXCR4, IGF1, RBP4, SQSTM1, WNT5A^[163]20. The persistence of decidual contributions as EOPE approaches underscores the maternal decidua’s role in its pathophysiology. The model also includes inflammatory mediators such as SQSTM1, IL1B, CCL20, FASLG and TREM1, as well as transcripts encoding T cell receptors (e.g. TRAV21, TRBV27, TRBV5-7). Additionally, it incorporates anti-inflammatory mediators like ALOX5AP, an immunosuppressive gene linked to recurrent miscarriage^[164]31, and IL19. Transcripts such as RBP4, which directly influences blood pressure regulation^[165]32, NRBF2, involved in autophagy and liver protection^[166]33, and WNT5A, a key regulator of placental growth^[167]34, further support the model’s clinical relevance.
The top-performing predictive model for LOPE at T2 included 92 cfRNAs (Supplementary Table [168]5), achieving a sensitivity of 88%, specificity of 92%, and an AUC of 0.90 in the validation cohort (Fig. [169]5d and Supplementary Table [170]6). An analysis of misclassified samples is shown in Fig.
[171]5e, with the contributions of individual cfRNAs to predictive accuracy detailed in Fig. [172]5f. In further validation using an independent external dataset^[173]23, performance was lower (sensitivity 60%, specificity 92%, AUC 0.77) (Fig. [174]5d, e and Supplementary Table [175]6), likely due to differences in sample timing. While all samples were annotated as collected after 23 weeks, some were obtained near symptom onset or during the third trimester, and metadata limitations preclude precise identification of those cases. Pathway enrichment analysis revealed that this model included cfRNAs related to immune function, such as CFHR1 and CFHR3, involved in complement activation; immunoglobulin transcripts (e.g., IGKV3D-20, IGKV3D-11, IGHV5-10-1, IGHV3-69-1); and CXCR5, linked to B-cell migration^[176]35. Additionally, the model incorporates a cfRNA corresponding to HISLA, highly expressed in the liver^[177]36, and LINC01419^[178]37. Notably, most predictive cfRNAs were classified as non-coding RNAs or pseudogenes with no annotated function. In contrast to the EOPE model, this LOPE model includes only two cfRNAs related to DR, HES4 and SPEF1.
Discussion
Previous efforts to develop screening tests for preeclampsia have primarily focused on circulating biomarkers related to placental dysfunction, such as sFLT1 and PlGF^[179]38. These tests have been validated for use starting at 23 weeks of gestation, with their strongest predictive accuracy typically observed within two weeks of symptom onset. Consequently, they are recommended for patients with suspected preeclampsia^[180]39,[181]40. While these tests are particularly useful for short-term prediction, placental dysfunction-based tests are also utilized as early as the first trimester, often combined with maternal epidemiological factors and ultrasound or Doppler parameters. However, they face significant limitations in their effectiveness and application^[182]41–[183]44.
In settings where guidelines from the National Institute for Health and Care Excellence (NICE) and the ACOG are applied, screening primarily relies on pregnancy-related factors and maternal characteristics. While this approach minimizes additional costs, it has low sensitivity (< 41%)^[184]45,[185]46. Predictive models based on cfRNAs from liquid biopsy, grounded in biological plausibility and applicable early in pregnancy, offer potential improvements for the clinical management of preeclampsia^[186]22–[187]25, yet they have not been clinically applied. Building on this foundation, we prospectively collected blood samples from 9586 pregnant women across three gestational trimesters (9–42 weeks). Then, we selected a subset of 216 participants composed of preeclampsia cases and normotensive controls to generate a comprehensive longitudinal dataset of cfRNA profiles related to EOPE or LOPE progression. The performance metrics demonstrate substantial advancements in leveraging cfRNA signatures for early detection of EOPE in both the first and second trimesters, as well as LOPE in the second trimester.
Our study stands out from previous research by addressing several key limitations in the field. First, we report the largest prospective cohort of EOPE cases with first-trimester sampling strictly between 9 and 14 weeks of gestation. This early and consistent inclusion window allowed us to capture cfRNA signatures an average of 18 weeks before clinical onset. Second, our carefully curated dataset enabled a clear distinction between EOPE and LOPE, classified according to established clinical guidelines. Unlike many previous studies that analyze preeclampsia as a single entity or rely on later gestational time points, our approach allows for a more precise molecular characterization of disease subtypes. This also suggests that cfRNA reflects a time-specific pathological status rather than a fixed disease signature.
Third, the longitudinal design with multiple sampling points across gestation provided a dynamic view of disease progression, from preclinical stages to diagnosis. The clinical implications of early risk stratification warrant consideration. While low-dose aspirin remains the primary intervention and is already recommended for many at-risk patients, earlier and more precise identification of individuals at high risk for preeclampsia allows for tailored clinical management strategies^[188]47. A shared approach to surveillance—including frequent blood pressure monitoring, renal and liver function assessment, and fetal growth evaluations—has been recommended to mitigate complications in high-risk pregnancies. Furthermore, structured lifestyle interventions, such as calcium and vitamin D supplementation, aerobic exercise, and improved sleep hygiene, may complement pharmacological strategies. A cfRNA-based screening tool offers potential advantages over existing multimarker approaches, which require strict quality control and trained operators, potentially reducing costs and improving accessibility^[189]48. In addition, our analysis highlighted the tissue-specific origins of the detected cfRNAs, offering further insight into the pathophysiology of both subtypes of preeclampsia. For EOPE patients, early signs of tissue distress were observed in the liver, kidney, and decidua at T2, suggesting that these organs may be affected up to eight weeks before clinical diagnosis. By the time of the clinical onset (T3), cfRNA levels associated with critical organs such as the placenta, heart, brain and lungs showed marked elevation in EOPE, indicating widespread organ involvement likely due to apoptotic processes releasing cfRNA into circulation. In LOPE patients, although cfRNA levels also increased by T3, levels were lower compared to EOPE. 
Furthermore, differential abundance analysis at diagnosis revealed a distinct transcriptomic profile in EOPE, with more pronounced cfRNA changes than in LOPE, reflecting potential differences in severity and inflammatory response between the two preeclampsia subtypes. These distinctions between the subtypes extended to the biological roles of cfRNAs included in the predictive models. In models predicting EOPE, a substantial proportion of cfRNA transcripts were associated with genes involved in decidualization and DR, along with some placental-related transcripts. In contrast, cfRNA transcripts associated with LOPE prediction reflected broader systemic contributions, including placental malfunction. This molecular characterization provides opportunities for the development of targeted interventions. For instance, transcriptomic profiling, such as that presented in this study, could facilitate in silico drug repurposing by identifying dysregulated pathways as potential therapeutic targets^[190]49,[191]50. Experimental approaches using siRNA or mRNA-based interventions to modulate key regulators of preeclampsia pathogenesis are already being explored^[192]51, highlighting the translational potential of cfRNA-driven insights. Furthermore, the progression of cfRNA changes throughout pregnancy could also play a pivotal role in the development of novel therapies. By tracking how cfRNA levels shift in response to disease onset and progression, new therapeutic windows can be explored and more effective treatment targets identified. This approach could pave the way for interventions that not only prevent the disease but also modify its course, thereby improving maternal and fetal outcomes. While external validation is crucial to confirm the diagnostic performance of our models, we have already conducted validation using an independent external dataset, which supported the relevance of the predictive signature across datasets for both EOPE and LOPE.
We acknowledge, however, that the case–control design may limit generalizability compared to case-cohort studies. To address this, a large-scale external validation is currently underway (ClinicalTrials.gov Identifier: [193]NCT06716242), as part of the iPregnostic study, which will assess the performance of these models across a wider range of clinical backgrounds, thereby enhancing their applicability to diverse patient populations. Overall, by analyzing circulating RNAs from a single blood sample at T1 or T2, our approach provides a reliable, standardized diagnostic measure that minimizes subjective interpretation and reduces variability in clinical decision-making. This streamlined strategy simplifies risk stratification, improving both the accuracy and efficiency of preeclampsia screening and facilitating personalized patient monitoring.

Methods

Study design

This prospective, multicenter case-control study was conducted between September 2021 and June 2024 in fourteen hospitals across Spain (ClinicalTrials.gov Identifier: [194]NCT04990141) in compliance with all relevant ethical regulations. Given the incidence rate of preeclampsia, the cohort size was designed to capture a minimum of 30 EOPE patients over the course of the study. Approval was obtained from the following Clinical Research Ethics Committees in Spain: Comité de Ética de la Investigación con medicamentos del Hospital General Universitario de Castellón (Castellón); CEIm - Hospital Universitario y Politécnico La Fe (Valencia); CEIm Hospital General de Alicante (Alicante); CEIm Hospital Virgen de la Arrixaca (Murcia); CEI Hospital Universitario Sta.
Mª del Rosell (Cartagena); CEIm de la Gerencia de Atención Integrada de Albacete (Albacete); CEIm Hospital Puerta del Hierro de Majadahonda (Madrid); Comisión de Investigación del Hospital de Torrejón (Madrid); CEIc Aragón (Zaragoza); CEIm de Euskadi (Bilbao); CEIm Área de Salud Valladolid Oeste (Valladolid); CEI Provincial de Córdoba (Córdoba); CEIm Complejo Hospitalario Universitario de Canarias (Tenerife); and CEIm del Hospital Universitario de Gran Canaria Dr. Negrín (Las Palmas). Written informed consent was collected from all participants prior to blood collection and sample anonymization. A total of 9,586 pregnant women were enrolled based on the following criteria: signed informed consent, age over 18 years, singleton pregnancy, and first blood sample collection within 9–14 gestational weeks. Each participant provided 20 mL of peripheral blood in each of the three trimesters of pregnancy, coinciding with routine clinical follow-up: (T1) 9–14 weeks, (T2) 18–28 weeks, and (T3) > 28 weeks or at the time of preeclampsia diagnosis. Gestational age was confirmed via ultrasound during the first trimester. Clinical data for each participant were recorded in an electronic data capture system. All blood samples were processed to isolate plasma and stored at −80 °C until pregnancy outcomes were available. Preeclampsia patients were diagnosed following ACOG^[195]4 and FIGO^[196]52 guidelines, as per the clinical protocol of each hospital involved. To develop predictive models for EOPE and LOPE, a subset of participants was selected from the cohort. This subset comprised all EOPE patients (n = 42), a randomly selected subset of LOPE patients (n = 43) and, as controls, a subset of normotensive pregnant women with uncomplicated pregnancies (n = 131). Control participants were randomly selected from the 6905 uncomplicated pregnancies and matched to both EOPE and LOPE cases for key clinical variables including gestational age at sampling, maternal age, parity, BMI and ethnicity.
Participants in the control group were selected based on matching gestational age at the time of blood collection, maternal age, and parity, utilizing Euclidean distance for optimal pairing. Patients and controls were randomly stratified following a 70:30 proportion into two sets: discovery and validation. Sample sizes for the EOPE, LOPE, and control groups in each set are detailed in Supplementary Table [197]6. The discovery set was used for feature selection, model training, and optimization, with model performance assessed by leave-one-out cross-validation. For feature selection, a 1:2 case-to-control ratio was used, as it is optimal for identifying distinct patterns between the groups. For model training, the case-to-control ratio was increased to 1:3 to ensure a larger sample size, which supports better learning of the patterns by the model and improves predictive accuracy. The optimal model from this process was then applied to the validation set to assess the predictive performance, yielding metrics based on an unseen sample set. The bioinformatic workflow is detailed in Supplementary Fig. [198]3.

Blood sample processing and storage

Peripheral blood samples (20 mL) were collected in Streck Cell-Free DNA BCT tubes (Illumina, 15073345), stored and shipped at room temperature, and processed within seven days to obtain the plasma fraction. All blood samples were centrifuged for 15 min at 1600 × g and 4 °C. Plasma was transferred to a new collection tube and stored at −80 °C until use.

CfRNA isolation, library preparation, and sequencing

Plasma supernatant samples (n = 548) from the study patients (n = 216) were centrifuged for 10 min at 13,000 × g. Following the manufacturer’s protocol, cfRNA from 2 mL of plasma was isolated using the miRNeasy Serum/Plasma Advanced Kit (Qiagen, 217204). cDNA libraries from total cfRNA samples were prepared using Illumina RNA Prep with Enrichment (L) Tagmentation (Illumina, 20040537), also according to the manufacturer’s protocol.
cDNA libraries were quantified using an Agilent D1000 ScreenTape in a 4200 TapeStation system (Agilent Technologies Inc, 5067-5582). Libraries were normalized to 10 nM and pooled in equal volumes. The pool concentration was quantified by qPCR using the KAPA Library Quantification Kit (Roche, 7960336001) and with an Agilent D1000 ScreenTape in a 4200 TapeStation system (Agilent Technologies Inc, 5067-5582). The mean value was used to establish the pool concentration, and the pool was then sequenced using a NextSeq 500/550 High Output kit with 2.5 cartridges of 150 cycles (Illumina, 20024907).

Sequencing data processing

Raw reads were aligned to the human reference genome (GRCh38 Gencode v38 Primary Assembly) using STAR (2.7.10a). The SAM/BAM files were further processed using SAMtools (v.1.6). Only reads with a mapping quality above 90% were retained (MAPQ score obtained from the alignment). Duplicated reads were removed with Picard MarkDuplicates (v.2.27.4). The mapping and the quantification of the reads were done using featureCounts (v.2.0.1). Read statistics were estimated using FastQC (v.0.11.9) and RSeQC (v.5.0.1) and summarized using MultiQC (v.1.13).

Sample quality filtering

Three key quality parameters related to the sequencing process were estimated for each analyzed sample: RNA degradation, DNA contamination, and rRNA fraction, as previously defined^[199]21,[200]53 (Supplementary Fig. [201]4a-c). Samples were retained for further analysis if they met the established cut-off values for each parameter: RNA degradation (cut-off: 40%), DNA contamination (cut-off ratio: 3), and rRNA fraction (cut-off: 15%). Principal Component Analysis (PCA) was used as an additional quality control measure (Supplementary Fig. [202]4d, e). Samples deviating by more than 3 standard deviations from the mean of the first and second components for each dataset were excluded from the analysis.
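The PCA-based exclusion rule above can be illustrated with a minimal sketch. It assumes a samples × genes matrix of abundance values and reads the rule as flagging any sample whose score on PC1 or PC2 lies more than 3 standard deviations from that component's mean. The function name `pca_outlier_mask` and the synthetic data are hypothetical and do not reproduce the study pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_outlier_mask(X, n_sd=3.0):
    """Return a boolean mask that is True for samples flagged as outliers."""
    # Project samples onto the first two principal components.
    scores = PCA(n_components=2).fit_transform(X)
    # Standardize each component's scores across samples.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    # Flag a sample if it exceeds the cut-off on either component (assumption).
    return (np.abs(z) > n_sd).any(axis=1)

# Synthetic example: 60 samples, 200 genes, one artificially shifted sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
X[0] += 25.0
mask = pca_outlier_mask(X)
X_clean = X[~mask]  # samples retained for downstream analysis
```

Whether the cut-off is applied to each component separately or to both jointly is not stated in the text; the sketch uses the "either component" reading.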
In total, 12 samples were removed: 2 due to a high rRNA fraction and 10 due to PCA-based exclusion.

CfRNA count normalization

CfRNAs were filtered based on their detection value, and only cfRNAs with levels above 0.5 counts per million reads (CPMs) in ≥70% of discovery samples, after removing outlier samples, were kept. Discovery set CPMs were normalized using the “deseq median ratio normalization” with pydeseq2 (v0.4.1). The validation set was then normalized with the same algorithm using size factors from the discovery set, as described in the MLSeq package^[203]54,[204]55. Batch effect and other possible confounding factors were assessed using PCA, hierarchical clustering with Spearman correlation as a distance metric, and variance component analysis. Finally, the normalized counts of each sample of the discovery and validation sets were re-scaled to the 0–1 range with a min-max scaling process.

Differential abundance analysis

CfRNAs differentially abundant between EOPE or LOPE patients and controls at each time point (T1, T2, T3) were identified using the limma-voom method from the Bioconductor package limma (v3.60.5). For the T3 samples, comparisons only included patients whose samples were collected at the time of EOPE or LOPE diagnosis and gestationally matched control samples collected during routine medical appointments. Genes with a False Discovery Rate (FDR) less than or equal to 0.05 were considered statistically significant.

Enrichment analysis

Gene Ontology (GO) analyses were performed to identify biological processes using the enrichGO function from the clusterProfiler R package (v4.2.2). The input consisted of cfRNAs that were differentially abundant between EOPE and controls, as well as LOPE and controls (FDR < 0.05). The p-value adjustment method used was FDR, with a significance threshold set at 0.05 (FDR < 0.05).
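The normalization scheme described under CfRNA count normalization (median-of-ratios size factors referenced to the discovery set, then min-max scaling) can be sketched in plain NumPy. This is a simplified illustration, not the pydeseq2 or MLSeq code. The function names, the per-gene scaling with ranges taken from the discovery set, the clipping of validation values, and the toy counts are all assumptions.

```python
import numpy as np

def fit_reference(counts):
    """Per-gene mean log count over discovery samples (samples x genes)."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    # Median-of-ratios uses only genes with nonzero counts in every sample.
    usable = np.isfinite(log_counts).all(axis=0)
    return log_counts[:, usable].mean(axis=0), usable

def size_factors(counts, log_ref, usable):
    """Median ratio of each sample to the discovery reference."""
    with np.errstate(divide="ignore"):
        log_ratios = np.log(counts[:, usable]) - log_ref
    return np.exp(np.median(log_ratios, axis=1))

def minmax_fit(X):
    """Per-gene minimum and range, guarding against constant genes."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return lo, np.where(hi > lo, hi - lo, 1.0)

# Toy counts: the validation samples are sequenced about twice as deep.
rng = np.random.default_rng(1)
disc = rng.poisson(50, size=(20, 100)) + 1
valid = (rng.poisson(50, size=(8, 100)) + 1) * 2

log_ref, usable = fit_reference(disc)
sf_disc = size_factors(disc, log_ref, usable)
sf_valid = size_factors(valid, log_ref, usable)  # discovery reference reused

disc_norm = disc / sf_disc[:, None]
valid_norm = valid / sf_valid[:, None]

lo, span = minmax_fit(disc_norm)
disc_scaled = (disc_norm - lo) / span
# Clipping keeps validation values inside [0, 1] (assumption).
valid_scaled = np.clip((valid_norm - lo) / span, 0.0, 1.0)
```

Reusing the discovery reference for the validation samples means that no information from the validation set enters the normalization parameters.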
Estimating signature scores for each tissue

Gene sets for each tissue of interest (those directly involved in the pathophysiology of preeclampsia) were derived from the Human Protein Atlas database^[205]26, which includes gene expression data across tissues. We focused specifically on transcripts classified as either “enriched” or “enhanced” within those tissues, and only if they were additionally categorized as “detected in single”, to ensure higher tissue specificity. The signature score in our dataset was calculated by summing the log-transformed, normalized counts of each gene in the set. For the T3 samples, comparisons included only those patients whose samples were collected at the time of EOPE or LOPE diagnosis and gestationally matched control samples collected during routine medical appointments, to avoid potential bias. Differences between groups were assessed using the Wilcoxon rank-sum test.

Data splitting

Our study cohort was divided into discovery and validation sets to develop and evaluate the predictive models, following best practices to prevent overfitting in artificial intelligence. Using stratified sampling based on obstetric outcomes (patient/control groups) and the scikit-learn library (v1.5.1) in Python, 70% of participants were allocated to the discovery set.

Feature selection

For feature selection, a 1:2 case-to-control ratio was used, as it is optimal for identifying distinct patterns between the groups. The Lasso regression model was used to select the most relevant cfRNAs to discriminate between patients and controls. The discovery dataset was fitted with a lasso regression algorithm (v1.5.2, sklearn.linear_model.Lasso) with a penalty term (alpha) of 0.5, using the case condition as the dependent variable and the cfRNA abundance levels as the independent variables. This results in a regression formula that assigns a coefficient to each cfRNA variable, indicating the association between the condition and each variable.
The number of cfRNAs selected was determined by a minimum coefficient threshold, which determined whether a cfRNA was relevant or not. Different minimum coefficient thresholds, ranging from 0 to the maximum coefficient in increments of 0.05, were tested to determine the optimal set of cfRNAs. The F1-score was calculated for each set of cfRNAs using leave-one-out cross-validation, and the set that yielded the highest F1-score was selected. Lasso regression was chosen over other feature selection methods due to the relatively small sample size, which can lead to model overfitting. The penalty term of the model helps to counteract overfitting by shrinking the coefficients of less important features toward zero^[206]56.

Algorithm selection and optimization

For the development of the optimal predictor, the discovery set was used, with cfRNA selection performed using the lasso regression method as previously described. Six different algorithms were tested with Python (v3.10.6): Support Vector Machine (v1.5.2, sklearn.svm.SVC), Elastic Net Linear Regression (v1.5.2, sklearn.linear_model.ElasticNet), Lasso Linear Regression (v1.5.2, sklearn.linear_model.Lasso), Random Forest (v1.5.2, sklearn.ensemble.RandomForestClassifier), XGBoost (v1.7.6, xgboost.XGBClassifier) and TabPFN (v0.1.10, tabpfn.TabPFNClassifier). Each algorithm was trained with the best parameters identified by a grid search with cross-validation. The predictive capacity of each model was evaluated by leave-one-out cross-validation on the discovery samples. The algorithm providing the best F1-score was selected for each group of samples: EOPE in the first trimester (EOPE T1), in the second trimester (EOPE T2), LOPE in the first trimester (LOPE T1) and LOPE in the second trimester (LOPE T2). The resulting chosen algorithms were TabPFN for EOPE (T1 and T2) and Lasso Linear Regression for LOPE (T1 and T2).
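The Lasso-based feature selection with a coefficient-threshold sweep, as described in the Feature selection step, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the study code: `select_features` is a hypothetical helper, logistic regression stands in for the classifier used to score each candidate set (the text does not name one for this step), and the synthetic data use a smaller alpha than the reported 0.5 so that some coefficients stay nonzero.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def select_features(X, y, alpha=0.5, step=0.05):
    """Sweep |coefficient| thresholds; keep the set with the best LOOCV F1."""
    coefs = np.abs(Lasso(alpha=alpha).fit(X, y).coef_)
    best_f1, best_idx = -1.0, np.array([], dtype=int)
    # Thresholds run from 0 up to the largest coefficient in fixed increments.
    for thr in np.arange(0.0, coefs.max() + step, step):
        idx = np.flatnonzero(coefs > thr)
        if idx.size == 0:
            break  # no feature survives this or any higher threshold
        # Stand-in classifier (assumption), scored by leave-one-out predictions.
        pred = cross_val_predict(LogisticRegression(max_iter=1000),
                                 X[:, idx], y, cv=LeaveOneOut())
        score = f1_score(y, pred)
        if score > best_f1:
            best_f1, best_idx = score, idx
    return best_idx, best_f1

# Synthetic discovery set with a 1:2 case-to-control ratio.
rng = np.random.default_rng(2)
y = np.array([1] * 15 + [0] * 30)
X = rng.normal(size=(45, 30))
X[y == 1, :3] += 2.0  # three informative features
idx, f1 = select_features(X, y, alpha=0.01)
```

Ties in F1 are resolved in favor of the lowest threshold, that is, the larger feature set; the text does not specify a tie-breaking rule.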
Predictive model training

For each model (EOPE T1, LOPE T1, EOPE T2, LOPE T2), the ML algorithm showing the highest F1-score, with its best parameters, was trained with the discovery dataset, which was based on a 1:3 case-to-control ratio. To evaluate the predictive capacity with the discovery data, a leave-one-out strategy was performed. The selected algorithm was trained N times, where N is the number of discovery samples. In each iteration, one sample was isolated, and the rest were used to fit the model. The fitted model was used to predict the label of the isolated sample, and the result of the prediction was added to a pool of predicted labels that were used to calculate the discovery leave-one-out metrics. Finally, the algorithm was fitted with all the discovery samples, and the resulting trained model was used to predict the labels in the validation dataset and evaluate the performance on previously unseen samples.

Model validation

We evaluated the predictive performance of each model (EOPE T1, LOPE T1, EOPE T2, LOPE T2) using three approaches: (1) leave-one-out cross-validation on the discovery dataset; (2) predictions on the hold-out validation dataset using the final model (validation 1); and (3) external validation of the predictive cfRNA signature using an independent dataset^[207]23 (Gene Expression Omnibus: [208]GSE192902), which includes cfRNA profiles of EOPE and LOPE collected during pregnancy (n = 190) (validation 2). For external validation, we retuned the model architecture to account for technical differences in cfRNA processing between datasets. Since the external cohort did not distinguish between EOPE and LOPE, we constructed separate balanced case–control subsets for each subtype. All available PE cases were included, and controls were selected using an agnostic downsampling approach based on cfRNA profiles and gestational age (T1 or T2). Within each timepoint, the largest available subset of controls was retained using a reproducible selection criterion.
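The leave-one-out procedure described under Predictive model training can be sketched as an explicit loop that pools the held-out predictions before computing the metrics. This is a minimal illustration, not the study code: `loo_metrics` is a hypothetical helper, logistic regression stands in for the selected algorithms (TabPFN or Lasso), and the 0.5 decision threshold and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

def loo_metrics(model, X, y, threshold=0.5):
    """Fit on N-1 samples, predict the held-out one, pool, then score."""
    n = len(y)
    prob = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # every sample except the i-th
        fitted = clone(model).fit(X[train], y[train])
        prob[i] = fitted.predict_proba(X[i:i + 1])[0, 1]
    pred = (prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": roc_auc_score(y, prob),
        "f1": f1_score(y, pred),
    }

# Synthetic discovery set with a 1:3 case-to-control ratio.
rng = np.random.default_rng(3)
y = np.array([1] * 12 + [0] * 36)
X = rng.normal(size=(48, 5))
X[y == 1] += 1.5

metrics = loo_metrics(LogisticRegression(), X, y)
# Final step: refit on all discovery samples before scoring the hold-out set.
final_model = LogisticRegression().fit(X, y)
```

Pooling the N held-out predictions before computing the metrics, rather than averaging per-fold scores, is what makes AUC and F1 well defined when each fold contains a single sample.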
Model performance was assessed using leave-one-out cross-validation. Performance across the three validations was assessed with key metrics, including accuracy, sensitivity, specificity, AUC, and F1-score.

Reporting summary

Further information on research design is available in the [209]Nature Portfolio Reporting Summary linked to this article.

Supplementary information

[210]Supplementary information (871.2KB, pdf)
[211]41467_2025_64215_MOESM2_ESM.pdf (84.6KB, pdf): Description of Additional Supplementary Files
[212]Supplementary Data File 1 (1.3MB, xlsx)
[213]Supplementary Data File 2 (647.8KB, xlsx)
[214]Reporting Summary (118.4KB, pdf)
[215]Transparent Peer Review file (1.7MB, pdf)

Acknowledgements