Abstract

   Early- and late-onset preeclampsia (EOPE and LOPE) pose serious
   maternal-fetal risks, yet non-invasive early prediction remains
   challenging. In a prospective cohort of 9,586 pregnancies, we analyze
   trimester-specific plasma cell-free RNA (cfRNA) profiles from 42 EOPE
   and 43 LOPE cases versus 131 normotensive controls. Organ-specific
   transcriptomic shifts distinguish EOPE from LOPE. Predictive models
   based on cfRNA signatures identify EOPE up to 18.0 weeks before
   clinical onset in the first trimester (T1) (AUC = 0.88), and 8.5 weeks
   in the second trimester (T2) (AUC = 0.89). LOPE is predicted 14.9 weeks
   in advance using T2 data (AUC = 0.90), while T1 performance is lower
   (AUC = 0.68). External validation confirms robust EOPE prediction
   (AUC = 0.87 at T1; 0.81 at T2) and acceptable LOPE performance
   (AUC = 0.63 at T1; AUC = 0.77 at T2). EOPE models are enriched for
   decidual transcripts, suggesting early maternal involvement; LOPE
   models reflect broader tissue contributions. These findings offer a
   path to early, non-invasive, subtype-specific preeclampsia risk
   stratification and prevention.

   Subject terms: Predictive markers, RNA sequencing, Pre-eclampsia
     __________________________________________________________________

   Early- and late-onset preeclampsia pose serious maternal-fetal risks,
   yet non-invasive early prediction remains challenging. Here, the
   authors show that cfRNA signatures reveal distinct decidual and
   multiorgan signals, enabling accurate, externally validated prediction
   of both subtypes.

Introduction

   Maternal and infant mortality during pregnancy and labor are critical
   indicators of community and national health^[76]1,[77]2. Most pregnancy
   complications arise from disorders that develop during the
   periconceptional phase, particularly during embryonic implantation and
   early placentation^[78]3.

   Preeclampsia—a life-threatening obstetric syndrome—is characterized by
   new-onset hypertension after 20 weeks of gestation, accompanied by
   signs of kidney, liver, or brain damage^[79]4. Each year, preeclampsia
   contributes to 14% of maternal deaths worldwide, leaving a lasting
   impact on survivors’ health^[80]5. It also constitutes a significant
   public health burden, incurring $1.03 billion in maternal healthcare
   costs and an additional $1.15 billion for neonatal care in infants born
   to mothers affected by preeclampsia within the first year after birth
   in the United States^[81]6.

   The heterogeneity of preeclampsia is notable, differentiated by the
   timing of onset and severity of symptoms. Early-onset preeclampsia
   (EOPE) arises before 34 weeks of gestation, necessitating emergency
   delivery to mitigate risks to maternal and fetal health^[82]7,[83]8. In
   contrast, late-onset preeclampsia (LOPE) manifests after 34 weeks and
   can lead to severe maternal organ damage such as kidney, liver, or
   brain damage^[84]8–[85]11. Therefore, there is an urgent need for
   straightforward, non-invasive methods for early diagnosis of
   preeclampsia in the first trimester to implement preventive strategies
   effectively^[86]12–[87]14.

   Since the maternal decidua regulates the initial steps of
   maternal-embryo communication, decidualization resistance
   (DR)—characterized by defective endometrial cell
   differentiation—results in abnormal placentation, which has been
   associated with the etiology of major obstetric syndromes, including
   preeclampsia^[88]15–[89]19, even though symptoms may manifest later in
   gestation^[90]15–[91]19. Recently, we provided an in-depth multi-omics
   characterization of DR in former EOPE patients, further underscoring
   the uterine contribution to this pathological condition^[92]20.

   Analyzing plasma cell-free RNA (cfRNA) through liquid biopsy (i.e.,
   from a blood sample) has emerged as a promising non-invasive tool for
   molecular monitoring in pregnancy, offering insights into physiological
   and pathological events^[93]21,[94]22. However, previous cfRNA studies
   on preeclampsia prediction have faced limitations such as small EOPE
   sample size^[95]23, lack of clear subtype distinction^[96]23,[97]24, or
   sampling at later gestational ages^[98]24,[99]25. Our study builds on
   these foundations and addresses these gaps by including a large,
   prospectively collected cohort of EOPE cases with strict
   first-trimester sampling, clearly differentiating EOPE and LOPE, and
   employing longitudinal sampling. This comprehensive design has allowed
   us to develop and validate predictive models with improved early and
   subtype-specific risk stratification.

   Specifically, in this case-control study, we prospectively analyzed the
   cfRNA profiles in pregnant women across the three trimesters of
   pregnancy, comparing EOPE and LOPE with normotensive controls. This
   approach facilitated the characterization of the circulating
   transcriptome by mapping the tissue origins and transcriptional changes
   associated with EOPE and LOPE, revealing that both subtypes display
   distinct transcriptional differences compared to controls. Our research
   identified cfRNA profiles that exhibited robust predictive performance
   for EOPE in both the first (averaging 18.0 weeks before diagnosis) and
   second trimesters (averaging 8.5 weeks prior to clinical onset), as
   well as for LOPE in the second trimester (14.9 weeks prior to clinical
   onset). Monitoring cfRNA profiles not only aids in predicting the risk
   of developing preeclampsia but also allows the differentiation of both
   subtypes of preeclampsia and the evaluation of different organ damage
   in affected patients, providing insights into their prognosis.

Results

Clinical study design and participants' baseline characteristics

   A total of 9586 pregnant women with singleton pregnancies were enrolled
   in this prospective and longitudinal case-control study in fourteen
   tertiary hospitals in Spain (ClinicalTrials.gov Identifier:
   [100]NCT04990141). Blood samples were collected prospectively across
   all three trimesters and at the time of EOPE or LOPE diagnosis. Each
   participant was followed until delivery, ensuring the availability of
   obstetrical outcome and the creation of a curated database with
   comprehensive clinical data. Uncomplicated pregnancies that progressed
   to term (> 37 weeks) were classified as normotensive controls, while
   those diagnosed with EOPE or LOPE were categorized according to
   current ACOG^[101]4 and FIGO^[102]24 clinical guidelines.

   Of the 9586 pregnant women enrolled, 7142 were eligible for analysis
   after excluding participants for selection failure, loss to follow-up,
   and obstetric complications other than preeclampsia. We included all
   EOPE cases (n = 42) and randomly selected a subset of LOPE cases
   (n = 43). The number of LOPE cases was set to match the number of EOPE
   cases, ensuring that both groups had the same case-to-control ratio of
   approximately 1:3, which is optimal for model development. Normotensive
   controls (n = 131) were randomly selected from the 6,905 uncomplicated
   pregnancies and matched to both EOPE and LOPE cases for key
   epidemiological variables including gestational age at sampling,
   maternal age, parity, BMI and ethnicity (Supplementary Fig. [103]1a, b,
   and Supplementary Table [104]1). Then, a subset of 216 participants
   composed of preeclampsia cases (EOPE and LOPE) and normotensive
   controls was selected for total cfRNA sequencing to characterize cfRNA
   profiles throughout the progression of pregnancy (Fig. [105]1). For the
   development of predictive models, the cohort was randomly stratified
   into a discovery set (70% of patients) and a validation set (30% of
   patients). The discovery set was used to build the predictive model and
   the validation set to assess its performance in a hold-out group of
   samples (Supplementary Fig. [106]1c).
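
   The 70/30 partition described above can be illustrated with a minimal
   sketch. It assumes the split is stratified by outcome group; the
   function name, the random seed, and the rounding rule are hypothetical
   choices for illustration and are not taken from the study's pipeline.

```python
import random
from collections import defaultdict


def stratified_split(labels, discovery_fraction=0.7, seed=0):
    """Split sample indices into discovery and validation sets per label.

    Hypothetical helper: the seed and rounding rule are assumptions.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    discovery, validation = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Keep the same fraction of every group in the discovery set.
        cut = round(len(indices) * discovery_fraction)
        discovery.extend(indices[:cut])
        validation.extend(indices[cut:])
    return sorted(discovery), sorted(validation)


# Group sizes taken from the cohort described in the text.
labels = ["EOPE"] * 42 + ["LOPE"] * 43 + ["control"] * 131
discovery, validation = stratified_split(labels)
print(len(discovery), len(validation))
```

   Splitting within each group keeps the case-to-control ratio similar in
   both sets, which matters when cases are much rarer than controls.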

Fig. 1. Flowchart of the study.

   A total of 9586 pregnant participants were recruited. After excluding
   participants due to selection failure and loss to follow-up, 8991
   remained. Within this cohort, 237 (2.6%) individuals were diagnosed
   with preeclampsia, including 42 EOPE and 195 LOPE cases, while 1849
   (20.6%) individuals had other pregnancy-related pathologies, and 6905
   (76.8%) participants had no obstetric complications. For cfRNA
   analysis, we included all 42 EOPE cases, a subset of 43 LOPE cases and
   131 normotensive controls, randomly selected from the matched cohort
   based on gestational age at sample collection, maternal age, parity,
   ethnicity, and BMI.
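
   As a consistency check on the flowchart figures, the short sketch
   below recomputes the reported percentages and the number of eligible
   participants from the counts given in the text. The variable names are
   illustrative only.

```python
# Counts reported in the Fig. 1 legend and the accompanying text.
remaining = 8991
groups = {
    "preeclampsia": 42 + 195,   # EOPE + LOPE
    "other_pathologies": 1849,
    "no_complications": 6905,
}

total = sum(groups.values())
percentages = {name: round(100 * n / remaining, 1) for name, n in groups.items()}

# Eligible for analysis: preeclampsia cases plus uncomplicated pregnancies.
eligible = groups["preeclampsia"] + groups["no_complications"]

# Controls per case in the sequenced subset (131 controls; 42 EOPE, 43 LOPE).
controls_per_case = {"EOPE": round(131 / 42, 2), "LOPE": round(131 / 43, 2)}

print(total, percentages, eligible, controls_per_case)
```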

   From each participant, we collected three peripheral blood samples:
   between 9 and 14 weeks of gestation (T1), between 18 and 28 weeks (T2),
   and at the time of EOPE or LOPE diagnosis or after 28 weeks (T3)
   (Fig. [109]2). Data on the gestational weeks of blood sample collection
   are summarized in Supplementary Table [110]2. Due to clinical
   emergencies necessitating immediate termination of pregnancy, T3 could
   not be collected from fourteen EOPE patients and seven LOPE patients.

Fig. 2. Overview of sample collection, preeclampsia diagnosis, and delivery
time points across patient and control groups.

   Bar graph illustrating the number of samples collected at each
   gestational week for the EOPE (a), LOPE (c) and control (e) groups.
   Color represents the time point of sample collection: T1 (9–14
   gestational weeks); T2 (18–28 gestational weeks); T3 (at the time of
   preeclampsia diagnosis or >28 gestational weeks). Density plot showing
   the relative frequency of preeclampsia diagnosis and delivery across
   gestational weeks for the EOPE (b), LOPE (d) and control (f) groups.

   Maternal characteristics, clinical symptoms, and birth outcomes are
   summarized in Table [113]1. There were no significant differences in
   maternal age, parity, ethnicity, BMI, or smoking habits between
   patients and controls (p > 0.05). The natural conception rate was
   significantly lower in EOPE patients than in controls (p = 0.0003)
   but did not differ significantly in LOPE (p = 0.105). Aspirin
   prophylaxis (150 mg) was prescribed to 30 EOPE (71.4%), 23 LOPE cases
   (53.5%), and 10 normotensive controls (7.6%). EOPE was diagnosed at
   30.0 ± 3.4 weeks, with severe symptoms in 76.2% of patients; LOPE was
   diagnosed at 36.5 ± 1.8 weeks, with severe symptoms in 41.9% of
   patients. Severe symptoms were defined as the presence of severely
   elevated blood pressure (systolic ≥160 mm Hg or diastolic ≥100 mm Hg),
   thrombocytopenia, impaired liver function, progressive renal
   insufficiency, pulmonary edema, or neurological complications such as
   cerebral or visual disturbances^[114]4.

Table 1.

   Maternal characteristics and pregnancy outcomes in the selected subset
   of participants: EOPE (n = 42), LOPE (n = 43), and Controls (n = 131)
   Maternal characteristics

   Variable                         EOPE          LOPE          Control       P, EOPE vs Control  P, LOPE vs Control
   Maternal age (years)             34.4 (6.3)    33.7 (4.7)    33.68 (3.8)   0.512^a             0.982^a
   Maternal BMI (kg/m²)             28.0 (4.7)    26.9 (5.3)    26.62 (5.0)   0.052^b             0.816^b
   Primiparous (%)                  59.5          62.8          55            0.429^c             0.368^c
   Smoker (%)                       9.5           13.9          7.6           0.747^d             0.213^c
   Natural conception (%)           80.9          94.3          96.9          <0.001^d            0.105^d
   Aspirin (%)                      71.4          53.5          7.6           <0.001^c            <0.001^c

   Ethnicity / Race

   Category                         EOPE          LOPE          Control
   Caucasian (%)                    69.0          81.0          82.4
   African American (%)             9.5           0.0           5.3
   Hispanic (%)                     19.0          13.9          9.2
   Asian (%)                        0.0           0.0           0.0
   Other (%)                        2.3           4.6           3.1
   P-value                          0.175^d       0.211^d

   Preeclampsia symptoms

   Variable                         EOPE          LOPE          Control       P, EOPE vs Control  P, LOPE vs Control
   GA at diagnosis (weeks)          30.0 (3.4)    36.5 (1.8)    NA
   SBP (mm Hg)                      157.5 (18.9)  152.7 (17.3)  114.1 (11.9)  <0.001^b            <0.001^b
   DBP (mm Hg)                      98.9 (15.0)   93.9 (8.2)    93.1 (23.4)   <0.001^a            <0.001^a
   Uteroplacental dysfunction (%)   69.0          25.6          NA
   Proteinuria (%)                  80.9          88.4          NA
   Pulmonary edema (%)              0.0           2.3           NA
   HELLP (%)                        14.3          7.0           NA
   Eclampsia (%)                    2.4           0.0           NA
   Severe (%)                       76.2          41.9          NA

   Birth outcomes

   Variable                         EOPE          LOPE          Control       P, EOPE vs Control  P, LOPE vs Control
   GA at delivery (weeks)           32.1 (3.7)    37.4 (1.6)    40 (1.1)      <0.001^b            <0.001^b
   Preterm birth (%)                87.8          41.9          0.0           <0.001^d            <0.001^d
   Cesarean (%)                     69.0          44.2          19.8          <0.001^d            <0.001^d
   Male fetus (%)                   52.4          53.5          45.0          0.407^c             0.335^c
   Fetal weight (g)                 1520 (557.2)  2724 (635.5)  3375 (408.3)  <0.001^b            <0.001^b
   SGA (%)                          80.9          39.5          1.5           <0.001^d            <0.001^d
   Stillbirth (%)                   11.9          0.0           0.0           <0.001^d
   Mother ICU (%)                   35.2          18.6          0.0           <0.001^d            <0.001^d
   Newborn ICU (%)                  50.0          16.3          0.8           <0.001^d            <0.001^d

   Statistical comparisons: Exact p-values are provided for comparisons
   between case and control groups. Depending on the distribution of the
   data assessed by the Shapiro–Wilk test, either Student’s t-test (ᵃ) or
   Wilcoxon rank-sum test (ᵇ) was used for continuous variables.
   Categorical variables were compared using the Chi-squared test (ᶜ) or
   Fisher’s exact test (ᵈ), as appropriate. All tests were two-sided. No
   adjustment for multiple comparisons was applied, as each variable was
   tested independently. Superscript letters next to p-values indicate the
   test applied.

   BMI Body Mass Index, DBP diastolic blood pressure, GA Gestational Age,
   HELLP Hemolysis, Elevated Liver enzyme levels, and Low Platelet levels,
   ICU Intensive Care Unit, NA Not Applicable, SBP Systolic blood
   pressure, SGA Small for Gestational Age.

   Birth outcomes for EOPE and LOPE included higher rates of small for
   gestational age, preterm birth, cesarean delivery, and lower fetal
   weight (p < 0.001). Specifically, preterm deliveries occurred in 87.8%
   of EOPE patients and in 41.9% of LOPE patients, with cesarean sections
   required in 69.0% and 44.2% of patients, respectively. In contrast, all
   deliveries in the control group occurred at term, and only 19.8%
   involved cesarean sections. Fetal sex did not differ between groups.
   EOPE patients had significantly higher rates of stillbirth (11.9%) and
   post-delivery complications (p < 0.001), with 35.2% of mothers and
   50.0% of neonates requiring intensive care. In comparison, among
   patients with LOPE, 18.6% of mothers and 16.3% of newborns required
   intensive care, whereas no mothers and 0.8% of neonates in the control
   group needed intensive care.

Profiling the tissue origin and dynamics of cfRNA in EOPE and LOPE through
pregnancy

   We analyzed a total of 29,871 cfRNA transcripts after applying quality
   filtering and normalization processes. To determine the tissue origins
   of the identified transcripts, we compared our cfRNA dataset to the
   Human Protein Atlas database^[116]26, focusing on transcripts
   classified as “enriched” or “enhanced” in specific tissues or organs.
   In this analysis, we examined tissues and organs that are directly
   involved in the pathophysiology of preeclampsia and contribute to its
   clinical manifestations. Our experimental protocol detected over 90% of
   these classified transcripts for each targeted organ or tissue of
   interest (Fig. [117]3a), indicating a robust coverage of
   tissue-specific cfRNA signatures in our dataset.

Fig. 3. CfRNA abundance by organ/tissue origin in EOPE, LOPE patients and
controls.

   a Number and proportion of cfRNA transcripts from organs/tissues
   implicated in preeclampsia, relative to Human Protein Atlas reference.
   b Box plots show cfRNA abundance scores by tissue of origin at each
   time point, calculated as the sum of log-transformed CPM-TMM normalised
   counts. Color indicates group. Horizontal lines represent medians;
   boxes, 25th–75th percentiles; whiskers extend to 1.5x interquartile
   range. Sample sizes for each time point and group are as follows: T1
   (EOPE, n = 41; LOPE, n = 43; control, n = 129); T2 (EOPE, n = 40; LOPE,
   n = 41; control, n = 120); T3 (EOPE, n = 19 vs. control, n = 34; LOPE,
   n = 24 vs. control, n = 39). P-values were determined by two-sided
   Wilcoxon rank-sum test. Exact P-values for all comparisons are
   provided in Supplementary Table [120]3. *P < 0.05, **P < 0.01,
   ***P < 0.001, ****P < 0.0001.

   We then calculated the organ/tissue-specific signature score for
   patients and controls at three time points during pregnancy (T1, T2 and
   T3) (Fig. [121]3b and Supplementary Table [122]3). In EOPE patients, a
   significant increase in cfRNA transcripts from the liver, kidney, and
   decidua was identified at T2 (p < 0.01), indicating tissue-specific
   damage approximately eight weeks before diagnosis. At T3, when clinical
   symptoms appear, EOPE patients displayed a significantly higher
   signature score (p < 0.0001) for additional organs including brain,
   lungs, placenta, and lymphoid tissues, signaling widespread organ
   injury. In contrast, in LOPE patients, tissue-specific transcripts
   suggesting organ damage were observed only at T3 (p < 0.01), with lower
   levels of significance than those in EOPE.
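
   The signature score is described in the Fig. 3 legend as the sum of
   log-transformed CPM-TMM normalised counts per tissue. A minimal sketch
   of that calculation follows. The log base, the pseudocount, the
   function name, and the toy transcript values are all assumptions made
   for illustration; the source does not specify them here.

```python
import math


def tissue_signature_score(cpm_by_transcript, tissue_transcripts, pseudocount=1.0):
    """Sum log2(normalised count + pseudocount) over a tissue's marker set.

    Assumption: log2 and a pseudocount of 1 are illustrative choices only.
    """
    return sum(
        math.log2(cpm_by_transcript[t] + pseudocount)
        for t in tissue_transcripts
        if t in cpm_by_transcript  # markers not detected are skipped
    )


# Toy values invented for illustration; not data from the study.
sample_cpm = {"ALB": 7.0, "APOA1": 3.0, "UMOD": 1.0, "GAPDH": 100.0}
liver_markers = ["ALB", "APOA1"]
score = tissue_signature_score(sample_cpm, liver_markers)
print(score)
```

   Scores computed this way for each sample can then be compared between
   groups at each time point, for example with a rank-based test.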

   To decode cfRNA dynamics throughout pregnancy, we performed a
   differential abundance analysis at each time point, elucidating
   molecular changes in the circulating transcriptome associated with
   disease progression and offering insights into underlying mechanisms.
   At the time of diagnosis (T3), we identified 24,336 transcripts with
   significantly altered abundance in EOPE patients compared to controls
   (FDR < 0.05) (Supplementary Fig. [123]2a and Supplementary Data
   File [124]1). In contrast, LOPE patients exhibited 11,859
   differentially abundant transcripts (FDR < 0.05) (Supplementary
   Fig. [125]2b and Supplementary Data File [126]2). Notably, 8,127 cfRNAs
   showed differential abundance in T2 for EOPE patients (FDR < 0.05),
   whereas no differentially abundant cfRNAs were detected in T1 for
   either EOPE or LOPE patients, nor in T2 for LOPE. These findings
   suggest that transcriptomic alterations emerge as EOPE progresses,
   while the LOPE circulating transcriptome remains largely unchanged
   until diagnosis.
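
   The FDR < 0.05 threshold used above implies a multiple-testing
   correction across transcripts. The sketch below shows the
   Benjamini-Hochberg procedure as one common way to obtain such adjusted
   values; whether the study used this exact procedure is an assumption,
   and the p-values are toy numbers.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values invented for illustration; not results from the study.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(toy_p)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```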

   Gene ontology overrepresentation analysis within the differentially
   abundant cfRNAs at diagnosis revealed biological processes indicative
   of fetal and maternal organ-specific damage (FDR < 0.05) (Supplementary
   Fig. [127]2c and Supplementary Table [128]4). Both EOPE and LOPE
   patients displayed significant enrichment in key biological processes,
   including transport across the blood-brain barrier, renal water
   homeostasis, regulation of blood pressure and cognition, which are
   hallmark processes of the pathology. Importantly, signatures of fetal
   tissue damage were identified in both EOPE and LOPE, with a notably
   greater impact in EOPE patients. Distinct biological processes were
   associated with either EOPE or LOPE. In EOPE, overrepresentation
   analysis revealed significantly enriched pathways related to neuronal
   death, renal filtration, and immune dysfunction, including
   interleukin-8 production, response to interleukin-4,
   neutrophil-mediated immunity, and antimicrobial humoral immune
   response. In contrast, the LOPE cfRNA profile showed signatures linked
   to heart and brain function (FDR < 0.05), suggesting significant damage
   to these organs.
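
   Overrepresentation analysis is commonly based on a one-sided
   hypergeometric test, which asks whether a gene ontology term appears
   among the differentially abundant transcripts more often than expected
   by chance. The sketch below illustrates that test under the assumption
   that a hypergeometric model applies; the function name and the toy
   counts are hypothetical and do not come from the study.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """P(X >= overlap) when `selected` items are drawn without replacement
    from `universe` items, of which `annotated` belong to the term."""
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy counts invented for illustration: 20 transcripts tested, 5 annotated
# to the term, 5 differentially abundant, 3 of those annotated.
p_value = hypergeom_enrichment_p(universe=20, annotated=5, selected=5, overlap=3)
print(p_value)
```

   In practice the resulting p-values for all tested terms would also be
   corrected for multiple testing before applying the FDR threshold.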

   Thus, cfRNA profile analysis at diagnosis (T3) indicates more extensive
   transcriptomic alterations in EOPE compared to LOPE, highlighting an
   exacerbated proinflammatory state as a defining feature. These findings
   underscore the impacts of the disease on multiple organ systems and
   suggest that cfRNA profiling may provide valuable insights into the
   molecular distinctions between preeclampsia subtypes. Additionally, the
   identification of distinct biological processes linked to each
   preeclampsia subtype emphasizes the need for tailored therapeutic
   approaches targeting specific dysfunctions observed in EOPE and LOPE.

Early prediction of EOPE and LOPE in the first trimester of pregnancy

   Given the evidence that cfRNA profiles reflect molecular changes
   throughout pregnancy, disruptions in these pathways may help identify
   pregnancies at risk for EOPE or LOPE. Here, we developed a model for
   EOPE risk assessment based on plasma cfRNA profiles in the first
   trimester (T1), approximately 18.0 weeks before clinical onset.

   Our optimal predictive model for EOPE utilized 36 cfRNA transcripts
   (Supplementary Table [129]5) and was evaluated in a hold-out validation
   set. The model achieved a sensitivity of 83% and specificity of 90%,
   with an area under the receiver operating characteristic curve (AUC) of
   0.88 (Fig. [130]4a and Supplementary Table [131]6). Nearly all samples
   were correctly classified, with few misclassifications, supporting the
   model’s robustness and showing no evidence of overfitting
   (Fig. [132]4b). The relative contributions of individual cfRNA
   transcripts to the model’s performance are detailed in Fig. [133]4c. We
   further evaluated the same cfRNA signature in an independent external
   dataset^[134]23 (Fig. [135]4a,b and Supplementary Table [136]6),
   confirming consistent performance (sensitivity 78%, specificity 90%;
   AUC 0.87), despite cross-cohort variability in protocols and data
   origin.
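
   The three metrics reported above can be computed from classifier
   scores as in the following sketch. The decision threshold of 0.5, the
   function names, and the toy labels and scores are assumptions for
   illustration; they are not the study's model outputs.

```python
def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called cases."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control
    (ties count as one half)."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy labels (1 = case, 0 = control) and scores invented for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
sens, spec = sensitivity_specificity(y_true, scores)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)
```

   Sensitivity and specificity depend on the chosen threshold, whereas
   the AUC summarises discrimination across all thresholds.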

Fig. 4. Performance and feature importance of first trimester (T1) predictive
models for EOPE and LOPE.

   Receiver operating characteristic (ROC) curves for EOPE (a) and LOPE
   (d) models across internal validation (validation 1) and external
   validation^[139]23 (validation 2). The X-axis represents the False
   Positive Rate; the Y-axis, the True Positive Rate. Violin plots showing
   correctly classified and misclassified patients and controls based on the
   classifier score obtained from the predictive model for EOPE (b) and
   LOPE (e). The X-axis shows the real obstetric outcome; the Y-axis, the
   predicted outcome. Bar plot illustrating each cfRNA’s contribution to
   EOPE (c) and LOPE (f) models. The X-axis shows the feature importance
   scores, which quantify the relative contribution of each cfRNA to the
   model’s predictions, with higher scores indicating features that play a
   more significant role in discriminating between outcomes. CfRNAs
   associated with DR are marked with an asterisk. AUC, area under the
   curve.

   Further analysis of these 36 transcripts revealed that 17 (47.2%) were
   identified as markers of DR in women with a history of severe
   preeclampsia, including CBR3, MMP7, MDK, TRIB1, PAEP^[140]20. The model
   also incorporates cfRNA transcripts known to be disrupted in
   preeclamptic placentas, such as RFLBN^[141]27, and CD74^[142]28, as
   well as others associated with fetal growth restriction, such as
   CCL4L2^[143]29 and MYL6^[144]30.

   Using the same computational approach, we developed a predictive model
   for LOPE in the first trimester (T1), with predictions averaging 24.9
   weeks before clinical onset. However, the model’s performance in the
   validation set was limited, achieving a sensitivity of 72%, specificity
   of 64%, and an AUC of 0.68 (Fig. [145]4d and Supplementary
   Table [146]6). Consistent with these findings, the model showed
   similarly limited performance when applied to an independent external
   cfRNA dataset^[147]23 (sensitivity 39%, specificity 58%, AUC 0.63)
   (Fig. [148]4d, e and Supplementary Table [149]6), underscoring the
   challenges in early LOPE prediction. Misclassified samples are shown in
   Fig. [150]4e, and the relative contribution of individual cfRNAs to
   predictive accuracy is detailed in Fig. [151]4f. While predictive
   capability was limited, analysis of the selected cfRNAs offers insights
   into LOPE mechanisms.

   Further exploration revealed that several of these cfRNAs map to
   protein-coding genes with known roles in cardiovascular, hepatic, and
   immune functions, including PRR23D1, SnoRD126, CD52, TRDV3. Unlike
   EOPE, no cfRNA transcripts in this model were associated with decidua,
   underscoring distinct pathophysiological pathways for EOPE and LOPE.

   In conclusion, our findings demonstrate the effectiveness of cfRNA
   signatures in predicting EOPE during the first trimester, while LOPE
   prediction remains challenging, likely reflecting fundamental
   differences in pathophysiology between EOPE and LOPE.

Early prediction for EOPE and LOPE in the second trimester of pregnancy

   We next investigated the potential for early detection of EOPE and LOPE
   in the second trimester (T2). The most effective predictive model for
   EOPE was based on 87 cfRNA transcripts (Supplementary Table [152]5),
   achieving a sensitivity of 89% and specificity of 86%, with an AUC of
   0.89 in the validation set (Fig. [153]5a and Supplementary
   Table [154]6). Misclassified samples are shown in Fig. [155]5b, and
   importance scores for each transcript are illustrated in Fig. [156]5c.
   This model reliably identifies patients at risk for EOPE between 18 and
   28 weeks of gestation, approximately 8.5 weeks before clinical onset.
   When applied to an independent external cfRNA dataset^[157]23, the
   signature demonstrated good performance (sensitivity 67%, specificity
   78%, AUC 0.81), with a moderate reduction likely influenced by the
   small sample size of EOPE cases (n = 5) (Fig. [158]5a, b and
   Supplementary Table [159]6).

Fig. 5. Performance and feature importance analysis of second trimester (T2)
predictive models for EOPE and LOPE.

   Receiver operating characteristic (ROC) curves for EOPE (a) and LOPE
   (d) models across internal validation (validation 1) and external
   validation^[162]23 (validation 2). The X-axis represents the False
   Positive Rate; the Y-axis, the True Positive Rate. Violin plots showing
   correctly classified and misclassified patients and controls based on the
   classifier score obtained from the predictive model for EOPE (b) and
   LOPE (e). The X-axis shows the real obstetric outcome; the Y-axis, the
   predicted outcome. Bar plot illustrating each cfRNA’s contribution to
   EOPE (c) and LOPE (f) models. The X-axis shows the feature importance
   scores, which quantify the relative contribution of each cfRNA to the
   model’s predictions, with higher scores indicating features that play a
   more significant role in discriminating between outcomes. cfRNAs
   associated with DR are marked with an asterisk. AUC, area under the
   curve.

   Further investigation into the tissue-specific origin of these
   transcripts revealed that 32 (36.8%) are associated with the DR
   signature previously described in endometrial tissue from women with a
   history of severe preeclampsia, including CCL20, CXCR4, IGF1, RBP4,
   SQSTM1, and WNT5A^[163]20. The persistence of decidual contributions as EOPE
   approaches underscores the maternal decidua’s role in its
   pathophysiology. The model also includes inflammatory mediators such as
   SQSTM1, IL1B, CCL20, FASLG and TREM1, as well as transcripts encoding T
   cell receptors (e.g. TRAV21, TRBV27, TRBV5-7). Additionally, it
   incorporates anti-inflammatory mediators like ALOX5AP, an
   immunosuppressive gene linked to recurrent miscarriage^[164]31, and
   IL19. Transcripts such as RBP4, which directly influences blood
   pressure regulation^[165]32, NRBF2 involved in autophagy and liver
   protection^[166]33, and WNT5A, a key regulator of placental
   growth^[167]34, further support the model’s clinical relevance.

   The top-performing predictive model for LOPE at T2 included 92 cfRNAs
   (Supplementary Table [168]5), achieving a sensitivity of 88%,
   specificity of 92%, and an AUC of 0.90 in the validation cohort
   (Fig. [169]5d and Supplementary Table [170]6). An analysis of
   misclassified samples is shown in Fig. [171]5e, with the contributions
   of individual cfRNAs to predictive accuracy detailed in Fig. [172]5f.
   In further validation using an independent external dataset^[173]23,
   performance was lower (sensitivity 60%, specificity 92%, AUC 0.77)
   (Fig. [174]5d, e and Supplementary Table [175]6), likely due to
   differences in sample timing. While all samples were annotated as
   collected after 23 weeks, some were obtained near symptom onset or
   during the third trimester, and metadata limitations preclude precise
   identification of those cases.

   Pathway enrichment analysis revealed that this model included cfRNAs
   related to immune function, such as CFHR1 and CFHR3, involved in
   complement activation; immunoglobulin transcripts (e.g., IGKV3D-20,
   IGKV3D-11, IGHV5-10-1, IGHV3-69-1); and CXCR5, linked to B-cell
   migration^[176]35. Additionally, the model incorporates a cfRNA
   corresponding to HISLA, highly expressed in the liver^[177]36, and
   LINC01419^[178]37. Notably, most predictive cfRNAs were classified as
   non-coding RNAs or pseudogenes with no annotated function. In contrast
   to the EOPE model, this LOPE model includes only two cfRNAs related to
   DR, HES4 and SPEF1.

Discussion

   Previous efforts to develop screening tests for preeclampsia have
   primarily focused on circulating biomarkers related to placental
   dysfunction, such as sFLT1 and PlGF^[179]38. These tests have been
   validated for use starting at 23 weeks of gestation, with their
   strongest predictive accuracy typically observed within two weeks of
   symptom onset. Consequently, they are recommended for patients with
   suspected preeclampsia^[180]39,[181]40. While these tests are
   particularly useful for short-term prediction, placental
   dysfunction-based tests are also utilized as early as the first
   trimester. They are often combined with maternal epidemiological
   factors and ultrasound or Doppler parameters. However, they face
   significant limitations in their effectiveness and
   application^[182]41–[183]44. In settings where guidelines from the
   National Institute for Health and Care Excellence (NICE) and the ACOG
   are applied, screening primarily relies on pregnancy-related factors
   and maternal characteristics. While this approach minimizes additional
   costs, it has low sensitivity (< 41%)^[184]45,[185]46.

   Predictive models based on cfRNAs from liquid biopsy, grounded in
   biological plausibility and applicable early in pregnancy, offer
   potential improvements for the clinical management of
   preeclampsia^[186]22–[187]25, yet they have not been clinically
   applied. Building on this foundation, we prospectively collected blood
   samples from 9586 pregnant women across three gestational trimesters
   (9–42 weeks). Then, we selected a subset of 216 participants composed
   of preeclampsia cases and normotensive controls to generate a
   comprehensive longitudinal dataset of cfRNA profiles related to EOPE or
   LOPE progression. The performance metrics demonstrate substantial
   advancements in leveraging cfRNA signatures for early detection of EOPE
   in both the first and second trimesters, as well as LOPE in the second
   trimester.

   Our study stands out from previous research by addressing several key
   limitations in the field. First, we report the largest prospective
   cohort of EOPE cases with first-trimester sampling strictly between 9
   and 14 weeks of gestation. This early and consistent inclusion window
   allowed us to capture cfRNA signatures an average of 18 weeks before
   clinical onset. Second, our carefully curated dataset enabled a clear
   distinction between EOPE and LOPE, classified according to established
   clinical guidelines. Unlike many previous studies that analyze
   preeclampsia as a single entity or rely on later gestational time
   points, our approach allows for a more precise molecular
   characterization of disease subtypes. This also suggests that cfRNA
   reflects a time-specific pathological status rather than a fixed
   disease signature. Third, the longitudinal design with multiple
   sampling points across gestation provided a dynamic view of disease
   progression, from preclinical stages to diagnosis.

   The clinical implications of early risk stratification warrant
   consideration. While low-dose aspirin remains the primary intervention
   and is already recommended for many at-risk patients, earlier and more
   precise identification of individuals at high risk for preeclampsia
   allows for tailored clinical management strategies^[188]47. A shared
   approach to surveillance—including frequent blood pressure monitoring,
   renal and liver function assessment, and fetal growth evaluations—has
   been recommended to mitigate complications in high-risk pregnancies.
   Furthermore, structured lifestyle interventions, such as calcium and
   vitamin D supplementation, aerobic exercise, and improved sleep
   hygiene, may complement pharmacological strategies. A cfRNA-based
   screening tool offers potential advantages over existing multimarker
   approaches, which require strict quality control and trained operators,
   potentially reducing costs and improving accessibility^[189]48.

   In addition, our analysis highlighted the tissue-specific origins of
   the detected cfRNAs, offering further insight into the pathophysiology
   of both subtypes of preeclampsia. For EOPE patients, early signs of
   tissue distress were observed in the liver, kidney, and decidua at T2,
   suggesting that these organs may be affected up to eight weeks before
   clinical diagnosis. By the time of the clinical onset (T3), cfRNA
   levels associated with critical organs such as the placenta, heart,
   brain and lungs showed marked elevation in EOPE, indicating widespread
   organ involvement likely due to apoptotic processes releasing cfRNA
   into circulation. In LOPE patients, although cfRNA levels also
   increased by T3, they remained lower than in EOPE. Furthermore,
   differential abundance analysis at diagnosis revealed a distinct
   transcriptomic profile in EOPE, with more pronounced cfRNA changes than
   in LOPE, reflecting potential differences in severity and inflammatory
   response between the two preeclampsia subtypes.

   These distinctions between the subtypes extended to the biological
   roles of cfRNAs included in the predictive models. In models predicting
   EOPE, a substantial proportion of cfRNA transcripts were associated
   with genes involved in decidualization and DR, along with some
   placental-related transcripts. In contrast, cfRNA transcripts
   associated with LOPE prediction reflected broader systemic
   contributions, including placental malfunction. This molecular
   characterization provides opportunities for the development of targeted
   interventions. For instance, transcriptomic profiling, such as that
   presented in this study, could facilitate in silico drug repurposing by
   identifying dysregulated pathways as potential therapeutic
   targets^[190]49,[191]50. Experimental approaches using siRNA or
   mRNA-based interventions to modulate key regulators of preeclampsia
   pathogenesis are already being explored^[192]51, highlighting the
   translational potential of cfRNA-driven insights. Furthermore, the
   progression of cfRNA changes throughout pregnancy could also play a
   pivotal role in the development of novel therapies. By tracking how
   cfRNA levels shift in response to disease onset and progression, new
   therapeutic windows can be explored and more effective treatment
   targets identified. This approach could pave the way for the
   development of interventions that could not only prevent the disease
   but also modify its course, thereby improving maternal and fetal
   outcomes.

   While external validation is crucial to confirm the diagnostic
   performance of our models, we have already conducted validation using
   an independent external dataset, which supported the relevance of the
   predictive signature across datasets for both EOPE and LOPE. We
   acknowledge, however, that the case–control design may limit
   generalizability compared to case-cohort studies. To address this, a
   large-scale external validation is currently underway
   (ClinicalTrials.gov Identifier: [193]NCT06716242), as part of the
   iPregnostic study, which will assess the performance of these models
   across a wider range of clinical backgrounds, thereby enhancing their
   applicability to diverse patient populations.

   Overall, by analyzing circulating RNAs from a single blood sample at T1
   or T2, our approach provides a reliable, standardized diagnostic
   measure that minimizes subjective interpretation and reduces
   variability in clinical decision-making. This streamlined strategy
   simplifies risk stratification, improving both the accuracy and
   efficiency of preeclampsia screening and facilitating personalized
   patient monitoring.

Methods

Study design

   This prospective, multicenter case-control study was conducted between
   September 2021 and June 2024 in fourteen hospitals across Spain
   (ClinicalTrials.gov Identifier: [194]NCT04990141) in compliance with
   all relevant ethical regulations. Given the incidence rate of
   preeclampsia, the cohort size was designed to capture a minimum of 30
   EOPE patients over the course of the study. Approval was obtained
   from the following Clinical Research Ethics Committees in Spain: Comité
   de Ética de la Investigación con medicamentos del Hospital General
   Universitario de Castellón (Castellón); CEIm - Hospital Universitario y
   Politécnico La Fe (Valencia); CEIm Hospital General de Alicante
   (Alicante); CEIm Hospital Virgen de la Arrixaca (Murcia); CEI Hospital
   Universitario Sta. Mª del Rosell (Cartagena); CEIm de la Gerencia de
   Atención Integrada de Albacete (Albacete); CEIm Hospital Puerta del
   Hierro de Majadahonda (Madrid); Comisión de Investigación del Hospital
   de Torrejón (Madrid); CEIc Aragón (Zaragoza); CEIm de Euskadi (Bilbao);
   CEIm Área de Salud Valladolid Oeste (Valladolid); CEI Provincial de
   Córdoba (Córdoba); CEIm Complejo Hospitalario Universitario de Canarias
   (Tenerife); and CEIm del Hospital Universitario de Gran Canaria Dr.
   Negrín (Las Palmas). Written informed consent was collected from all
   participants prior to blood collection and sample anonymization. A
   total of 9,586 pregnant women were enrolled based on the following
   criteria: signed informed consent, age over 18, singleton pregnancy,
   and first blood sample collection within 9–14 gestational weeks. Each
   participant provided 20 mL of peripheral blood in each of the three
   trimesters of pregnancy, coinciding with routine clinical follow-up:
   (T1) 9–14
   weeks, (T2) 18–28 weeks, and (T3) > 28 weeks or at the time of
   preeclampsia diagnosis. Gestational age was confirmed via ultrasound
   during the first trimester. Clinical data for each participant were
   recorded in an electronic data capture system. All blood samples were
   processed to isolate plasma and stored at −80 °C until pregnancy
   outcomes were available. Preeclampsia patients were diagnosed following
   ACOG^[195]4 and FIGO^[196]52 guidelines, as per the clinical protocol
   of each hospital involved.

   To develop predictive models for EOPE and LOPE, a subset of
   participants was selected from the cohort: all EOPE patients (n = 42),
   a randomly selected subset of LOPE patients (n = 43), and a subset of
   normotensive pregnant women with uncomplicated pregnancies, who served
   as controls (n = 131). Control participants were randomly
   selected from the 6905 uncomplicated pregnancies and matched to both
   EOPE and LOPE cases for key clinical variables including gestational
   age at sampling, maternal age, parity, BMI and ethnicity. Participants
   in the control group were selected based on matching gestational age at
   the time of blood collection, maternal age, and parity, utilizing
   Euclidean distance for optimal pairing. Patients and controls were
   randomly stratified following a 70:30 proportion into two sets:
   discovery and validation. Sample sizes for the EOPE, LOPE, and control
   groups in each set are detailed in Supplementary Table [197]6. The
   discovery set was used for feature selection, model training, and
   optimization, with model performance assessed by leave-one-out
   cross-validation. For feature selection, a 1:2 case-to-control ratio
   was used, as it is optimal for identifying distinct patterns between
   the groups. For model training, the case-to-control ratio was increased
   to 1:3 to ensure a larger sample size, which supports better learning
   of the patterns by the model and improves predictive accuracy. The
   optimal model from this process was then applied to the validation set
   to assess predictive performance, yielding metrics based on samples
   not used during training. The bioinformatic workflow is detailed in
   Supplementary Fig. [198]3.
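
   The Euclidean-distance matching of controls to cases described above
   can be sketched as follows. This is a minimal illustration, not the
   study's code: the greedy nearest-neighbour strategy, the z-scoring of
   covariates before computing distances, and the function name
   match_controls are all assumptions, since the text only states that
   Euclidean distance was used for pairing.

```python
import numpy as np

def match_controls(case_features, control_features, n_per_case=1):
    """Greedy nearest-neighbour matching on z-scored covariates.

    Rows are participants; columns are matching covariates (for example
    gestational age at sampling, maternal age, parity). Returns, for each
    case, a list of control row indices. Each control is used at most once.
    """
    cases = np.asarray(case_features, dtype=float)
    controls = np.asarray(control_features, dtype=float)

    # Assumption: standardise covariates on the pooled data so that no
    # single unit (weeks, years, counts) dominates the distance.
    pooled = np.vstack([cases, controls])
    mean, std = pooled.mean(axis=0), pooled.std(axis=0)
    std[std == 0] = 1.0
    cases_z = (cases - mean) / std
    controls_z = (controls - mean) / std

    available = np.ones(len(controls), dtype=bool)
    matches = []
    for case in cases_z:
        dist = np.linalg.norm(controls_z - case, axis=1)
        dist[~available] = np.inf  # already-matched controls are excluded
        order = np.argsort(dist)[:n_per_case]
        chosen = [int(i) for i in order if np.isfinite(dist[i])]
        if chosen:
            available[chosen] = False
        matches.append(chosen)
    return matches
```

   A globally optimal pairing (for example, via an assignment solver)
   could replace the greedy loop if the order in which cases are
   processed should not influence the result.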

Blood sample processing and storage

   Peripheral blood samples (20 mL) were collected in Streck Cell-Free DNA
   BCT tubes (Illumina, 15073345), stored and shipped at room temperature,
   and processed within seven days to obtain the plasma fraction. All
   blood samples were centrifuged for 15 min at 1600 × g and 4 °C. Plasma
   was transferred to a new collection tube and stored at −80 °C until
   use.

CfRNA isolation, library preparation, and sequencing

   Plasma supernatant samples (n = 548) from the study patients (n = 216)
   were centrifuged for 10 min at 13,000 × g. Following the manufacturer’s
   protocol, cfRNA from 2 mL of plasma was isolated using the miRNeasy
   Serum/Plasma Advanced Kit (Qiagen, 217204). According to the
   manufacturer’s protocol, cDNA libraries from total cfRNA samples were
   prepared using Illumina RNA Prep with Enrichment (L) Tagmentation
   (Illumina, 20040537). cDNA libraries were quantified using an Agilent
   D1000 ScreenTape in a 4200 TapeStation system (Agilent Technologies
   Inc, 5067-5582). Libraries were normalized to 10 nM and pooled in equal
   volumes. The pool concentration was quantified by qPCR using the KAPA
   Library Quantification Kit (Roche, 7960336001) and an Agilent D1000
   ScreenTape in a 4200 TapeStation system (Agilent Technologies Inc,
   5067-5582). The mean of the two measurements was used to establish the
   pool concentration, and the pool was then sequenced using a NextSeq
   500/550 High Output kit with 2.5 cartridges of 150 cycles (Illumina,
   20024907).

Sequencing data processing

   Raw reads were aligned to the human reference genome (GRCh38 Gencode
   v38 Primary Assembly) using STAR (2.7.10a). The SAM/BAM files were
   further processed using SAMtools (v.1.6). Only reads with a mapping
   quality greater than 90% were retained (MAPQ score obtained from the
   alignment). Duplicated reads were removed with Picard MarkDuplicates
   (v.2.27.4). Reads were assigned to annotated features and quantified
   using featureCounts (v.2.0.1). Read statistics were
   estimated using FastQC (v.0.11.9) and RSeQC (v.5.0.1) and summarized
   using MultiQC (v.1.13).

Sample quality filtering

   Three key quality parameters related to the sequencing process were
   estimated for each analyzed sample: RNA degradation, DNA contamination,
   and rRNA fraction as previously defined^[199]21,[200]53 (Supplementary
   Fig. [201]4a-c). Samples were retained for further analysis if they met
   the established cut-off values for each parameter: RNA degradation
   (cut-off: 40%), DNA contamination (cut-off ratio: 3), and rRNA fraction
   (cut-off: 15%). Principal Component Analysis (PCA) was used as an
   additional quality control measure (Supplementary Fig. [202]4d, e).
   Samples deviating by more than 3 standard deviations from the mean of
   the first and second components for each dataset were excluded from the
   analysis. In total, 12 samples were removed—2 due to a high rRNA
   fraction and 10 due to PCA-based exclusion.
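
   The PCA-based exclusion rule can be illustrated with a short sketch.
   It assumes that a sample is flagged when it lies more than 3 standard
   deviations from the mean on either the first or the second principal
   component (the text does not say whether both are required), and it
   computes the PCA through a singular value decomposition in NumPy
   rather than through whichever library was actually used. The function
   name pca_outliers is hypothetical.

```python
import numpy as np

def pca_outliers(expr, n_sd=3.0):
    """Flag samples lying more than n_sd SDs from the mean on PC1 or PC2.

    expr: samples x genes matrix of (normalized) abundance values.
    Returns a boolean mask with True for samples to exclude.
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=0)                 # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :2] * s[:2]              # sample coordinates on PC1, PC2
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    # Assumption: exceeding the cut-off on either component is enough.
    return (np.abs(z) > n_sd).any(axis=1)
```

   In use, the mask would be computed separately for each dataset and
   the flagged samples dropped before filtering and normalization.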

CfRNA count normalization

   CfRNAs were filtered based on their detection value: only cfRNAs with
   levels above 0.5 counts per million reads (CPM) in ≥70% of discovery
   samples, after removal of outlier samples, were kept. Discovery set
   CPMs were normalized using the “deseq median ratio normalization”
   with pydeseq2 (v0.4.1). The validation set was then normalized with
   the same algorithm using size factors from the discovery set, as
   described in the MLSeq package^[203]54,[204]55. Batch effect and
   other possible confounding factors were assessed using PCA,
   hierarchical clustering with Spearman correlation as a distance metric,
   and variance component analysis. Finally, the normalized counts of each
   sample of the discovery and validation sets were rescaled to the 0–1
   range by min-max scaling.
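
   The filtering and normalization steps can be sketched in plain NumPy.
   This is an illustration under stated assumptions rather than the
   pydeseq2 or MLSeq implementation: the reference for the
   median-of-ratios size factors is taken to be the per-gene geometric
   mean of the discovery samples, which is then reused for validation
   samples, and the min-max limits are assumed to be learned per gene on
   the discovery set and applied (with clipping) to the validation set.
   All function names are hypothetical.

```python
import numpy as np

def detection_filter(counts, min_cpm=0.5, min_fraction=0.70):
    """Mask of genes above min_cpm in at least min_fraction of samples."""
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    return (cpm > min_cpm).mean(axis=0) >= min_fraction

def fit_log_geomeans(discovery_counts):
    """Per-gene log geometric mean across discovery samples (reference)."""
    with np.errstate(divide="ignore"):
        return np.log(np.asarray(discovery_counts, dtype=float)).mean(axis=0)

def size_factors(counts, log_geomeans):
    """Median-of-ratios size factor per sample against a fixed reference."""
    counts = np.asarray(counts, dtype=float)
    usable = np.isfinite(log_geomeans)     # genes with no zero in reference
    with np.errstate(divide="ignore"):
        log_ratios = np.log(counts[:, usable]) - log_geomeans[usable]
    # Zero counts give -inf ratios; exclude them from the per-sample median.
    log_ratios = np.where(np.isfinite(log_ratios), log_ratios, np.nan)
    return np.exp(np.nanmedian(log_ratios, axis=1))

def min_max_scale(values, lo, hi):
    """Rescale to 0-1 using limits lo/hi (assumed learned on discovery)."""
    values = np.asarray(values, dtype=float)
    span = np.where(hi > lo, hi - lo, 1.0)
    return np.clip((values - lo) / span, 0.0, 1.0)
```

   A typical call sequence would fit the reference on discovery counts,
   divide each sample's counts by its size factor, and then apply
   min_max_scale with the discovery minima and maxima to both sets.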

Differential abundance analysis

   CfRNAs differentially abundant between EOPE or LOPE patients and
   controls at each time point (T1, T2, T3) were identified using the
   limma-voom method from the Bioconductor package limma (v3.60.5). For
   the T3 samples, comparisons only included patients whose samples were
   collected at the time of EOPE or LOPE diagnosis and gestationally
   matched control samples collected during routine medical appointments.
   Genes with a false discovery rate (FDR) less than or equal to 0.05
   were considered statistically significant.

Enrichment analysis

   Gene Ontology (GO) analyses were performed to identify biological
   processes using the enrichGO function from the clusterProfiler R
   package (v4.2.2). The input consisted of cfRNAs that were
   differentially abundant between EOPE and controls, as well as LOPE and
   controls (FDR < 0.05). P-values were adjusted using the FDR method,
   with the significance threshold set at FDR < 0.05.

Estimating signature scores for each tissue

   Gene sets for each tissue of interest (those directly involved in the
   pathophysiology of preeclampsia) were derived from the Human Protein
   Atlas database^[205]26, which includes gene expression data across
   tissues. We focused specifically on transcripts classified as either
   “enriched” or “enhanced” within those tissues, and only if they were
   additionally categorized as “detected in single”, to ensure higher
   tissue specificity. The signature score in our dataset was calculated
   by summing the log-transformed, normalized counts of each gene in the
   set. For the T3 samples, comparisons included only those patients whose
   samples were collected at the time of EOPE or LOPE diagnosis and
   gestationally matched control samples collected during routine medical
   appointments, to avoid potential bias. Differences between groups were
   assessed using the Wilcoxon rank-sum test.
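
   As a minimal sketch of the signature score, the function below sums
   log-transformed normalized counts over the genes of one tissue set.
   The choice of log(1 + x) as the transform is an assumption (the text
   does not specify the base or pseudo-count), and the function name and
   the gene labels used when calling it are placeholders.

```python
import numpy as np

def signature_scores(norm_counts, gene_names, gene_set):
    """Per-sample sum of log-transformed normalized counts for one tissue.

    norm_counts: samples x genes matrix of normalized counts.
    gene_names:  gene identifiers, one per column of norm_counts.
    gene_set:    identifiers belonging to the tissue signature.
    """
    x = np.asarray(norm_counts, dtype=float)
    wanted = set(gene_set)
    idx = [i for i, gene in enumerate(gene_names) if gene in wanted]
    # Assumption: natural log with a pseudo-count of 1 (log1p).
    return np.log1p(x[:, idx]).sum(axis=1)
```

   The resulting per-sample scores for two groups could then be compared
   with a rank-sum test such as scipy.stats.ranksums.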

Data splitting

   Our study cohort was divided into discovery and validation sets to
   develop and evaluate the predictive models, following best practices to
   prevent overfitting in artificial intelligence. Using stratified
   sampling based on obstetric outcomes (patient/control groups) and the
   scikit-learn library (v1.5.1) in Python, 70% of participants were
   allocated to the discovery set.
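
   The stratified 70:30 allocation can be illustrated without any
   dependency beyond NumPy. The study used scikit-learn for this step;
   the sketch below is a stand-in that shuffles each outcome group
   separately and assigns 70% of it to discovery. The seed, the rounding
   rule, and the function name stratified_split are assumptions, so the
   resulting set sizes need not equal those reported in the
   supplementary material.

```python
import numpy as np

def stratified_split(labels, discovery_fraction=0.70, seed=0):
    """Split sample indices into discovery/validation within each group."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    discovery = []
    for group in np.unique(labels):
        members = rng.permutation(np.flatnonzero(labels == group))
        n_disc = int(round(discovery_fraction * len(members)))
        discovery.extend(members[:n_disc].tolist())
    discovery = np.sort(np.array(discovery, dtype=int))
    validation = np.setdiff1d(np.arange(len(labels)), discovery)
    return discovery, validation
```

   Splitting within each group keeps the case-to-control proportions
   similar in both sets, which matters when cases are scarce.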

Feature selection

   For feature selection, a 1:2 case-to-control ratio was used, as it is
   optimal for identifying distinct patterns between the groups. Lasso
   regression was used to select the cfRNAs most relevant for
   discriminating between patients and controls. The discovery dataset
   was fitted with a lasso regression algorithm (v1.5.2,
   sklearn.linear_model.Lasso) with a penalty term (alpha) of 0.5, using
   the case condition as the dependent variable and the cfRNA abundance
   levels as the independent variables. The resulting regression formula
   assigns a coefficient to each cfRNA variable, indicating the strength
   of the association between the condition and that variable. The
   number of cfRNAs selected was determined by a minimum coefficient
   threshold, which defined whether a cfRNA was considered relevant.
   Different minimum coefficient thresholds, ranging from 0 to the
   maximum coefficient in increments of 0.05, were tested to determine
   the optimal set of cfRNAs. The F1-score was calculated for each set of
   cfRNAs using leave-one-out cross-validation, and the set that yielded
   the highest F1-score was selected. Lasso regression was chosen over
   other feature selection methods because the relatively small sample
   size can lead to model overfitting. The penalty term of the model
   helps to counteract overfitting by shrinking the coefficients of less
   important features, setting many of them to zero^[206]56.
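
   The threshold sweep over lasso coefficients can be sketched as below.
   The code takes an already fitted coefficient vector and enumerates
   candidate cfRNA sets for thresholds from 0 to the maximum coefficient
   in steps of 0.05. Two points are assumptions: thresholds are applied
   to absolute coefficient values, and the comparison is strict so that
   coefficients shrunk exactly to zero are never selected. The callback
   score_fn is hypothetical and stands for whatever routine returns the
   leave-one-out F1-score for a given feature subset.

```python
import numpy as np

def candidate_feature_sets(coefficients, step=0.05):
    """Yield (threshold, selected_indices) for each non-empty candidate set."""
    magnitude = np.abs(np.asarray(coefficients, dtype=float))
    n_steps = int(np.floor(magnitude.max() / step + 1e-9))
    for k in range(n_steps + 1):
        threshold = k * step
        selected = np.flatnonzero(magnitude > threshold)
        if selected.size:
            yield threshold, selected

def best_feature_set(coefficients, score_fn, step=0.05):
    """Return (score, threshold, indices) of the best-scoring candidate set."""
    best = None
    for threshold, selected in candidate_feature_sets(coefficients, step):
        score = score_fn(selected)
        if best is None or score > best[0]:
            best = (score, threshold, selected)
    return best
```

   On ties, this sketch keeps the lowest threshold (the largest feature
   set); the tie-breaking rule actually used is not stated in the text.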

Algorithm selection and optimization

   For the development of the optimal predictor, the discovery set was
   used, with cfRNA selection performed using the lasso regression method
   as previously described. Six different algorithms were tested with
   Python (v3.10.6): Support Vector Machine (v1.5.2, sklearn.svm.SVC),
   Elastic Net Linear Regression (v1.5.2, sklearn.linear_model.ElasticNet),
   Lasso Linear Regression (v1.5.2, sklearn.linear_model.Lasso), Random
   Forest (v1.5.2, sklearn.ensemble.RandomForestClassifier), XGBoost
   (v1.7.6, xgboost.XGBClassifier) and TabPFN (v0.1.10,
   tabpfn.TabPFNClassifier). Each algorithm was trained with the best
   parameters identified by a grid search with cross-validation. The
   predictive capacity of each model was evaluated by leave-one-out
   cross-validation on the discovery samples.
   The algorithm providing the best F1-score was selected for each group
   of samples: EOPE in the first trimester (EOPE T1), in the second
   trimester (EOPE T2), LOPE in the first trimester (LOPE T1) and LOPE in
   the second trimester (LOPE T2). The resulting chosen algorithms were
   TabPFN for EOPE (T1 and T2) and Lasso Linear Regression for LOPE (T1
   and T2).

Predictive model training

   For each model (EOPE T1, LOPE T1, EOPE T2, LOPE T2), the ML algorithm
   showing the highest F1-score, with its best parameters, was trained on
   the discovery dataset, which was based on a 1:3 case-to-control ratio.
   To evaluate the predictive capacity with the discovery data, a
   leave-one-out strategy was used: the selected algorithm was trained N
   times, where N is the number of discovery samples. In each iteration,
   one sample was held out, and the rest were used to fit the model. The
   fitted model was used to predict the label of the held-out sample, and
   the prediction was added to a pool of predicted labels that were used
   to calculate the discovery leave-one-out metrics. Finally, the
   algorithm was fitted with all the discovery samples, and the resulting
   trained model was used to predict the labels in the validation dataset
   and evaluate the performance on previously unseen samples.
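
   The leave-one-out procedure with pooled predictions can be written as
   a short generic loop. The nearest-centroid classifier used here is
   only a stand-in so that the example runs without extra dependencies;
   it is not one of the algorithms evaluated in the study, and the
   function names are hypothetical.

```python
import numpy as np

def leave_one_out_predictions(x, y, fit, predict):
    """Hold out each sample once, fit on the rest, and pool the predictions."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    pooled = np.empty_like(y)
    for i in range(len(y)):
        train = np.arange(len(y)) != i      # every sample except sample i
        model = fit(x[train], y[train])
        pooled[i] = predict(model, x[i:i + 1])[0]
    return pooled

# Stand-in classifier (assumption): assign the label of the closest centroid.
def fit_centroids(x, y):
    return {label: x[y == label].mean(axis=0) for label in np.unique(y)}

def predict_centroids(model, x):
    labels = list(model)
    dist = np.stack(
        [np.linalg.norm(x - model[label], axis=1) for label in labels], axis=1
    )
    return np.array(labels)[dist.argmin(axis=1)]
```

   The pooled labels returned by the loop are what the leave-one-out
   metrics are computed from; the final model is then refitted once on
   all discovery samples before being applied to the validation set.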

Model validation

   We evaluated the predictive performance of each model (EOPE T1, LOPE
   T1, EOPE T2, LOPE T2) using three approaches: (1) leave-one-out
   cross-validation on the discovery dataset; (2) predictions on the
   hold-out validation dataset using the final model (validation 1); and
   (3) external validation of the predictive cfRNA signature using an
   independent dataset^[207]23 (Gene Expression Omnibus: [208]GSE192902),
   which includes cfRNA profiles of EOPE and LOPE collected during
   pregnancy (n = 190) (validation 2). For external validation, we retuned
   the model architecture to account for technical differences in cfRNA
   processing between datasets. Since the external cohort did not
   distinguish between EOPE and LOPE, we constructed separate balanced
   case–control subsets for each subtype. All available PE cases were
   included, and controls were selected using an agnostic downsampling
   approach based on cfRNA profiles and gestational age (T1 or T2). Within
   each timepoint, the largest available subset of controls was retained
   using a reproducible selection criterion. Model performance was
   assessed using leave-one-out cross-validation. Performance across the
   three validations was assessed with key metrics, including accuracy,
   sensitivity, specificity, AUC, and F1-score.
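
   For reference, the reported metrics can be computed from pooled
   labels and scores as in the sketch below. It assumes binary labels
   coded as 1 for cases and 0 for controls and computes the AUC as the
   fraction of case-control pairs in which the case receives the higher
   score (ties counted as one half). The function names are
   illustrative.

```python
import numpy as np

def _ratio(numerator, denominator):
    return numerator / denominator if denominator else float("nan")

def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 from binary labels (1 = case)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }

def auc_from_scores(y_true, scores):
    """AUC as the probability that a case outscores a control."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
```

   With few cases per model, each of these estimates carries wide
   uncertainty, so reporting them with confidence intervals is
   advisable.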

Reporting summary

   Further information on research design is available in the [209]Nature
   Portfolio Reporting Summary linked to this article.

Acknowledgements