Highlights

     * Blood gene expression predicts gestational age in normal and
       complicated pregnancies
     * RNA changes preceding preterm prelabor rupture of the membranes are
       shared between cohorts
     * Plasma proteomic profiles from asymptomatic women predict
       spontaneous preterm birth

   Harnessing the wisdom of crowds in a DREAM Challenge, Tarca et al.
   developed methods to predict gestational age and preterm birth from
   longitudinal multi-omics data. The authors show that blood RNAs predict
   ultrasound-based gestational age, and they identify molecular changes
   preceding a diagnosis of spontaneous preterm birth in asymptomatic
   women.

Introduction

   Early identification of patients at risk for obstetrical disease is
   required to improve health outcomes and develop new therapeutic
   interventions. One of the “great obstetrical syndromes,”[58]^1 preterm
   birth, defined as birth before the completion of 37 weeks of gestation,
   is the leading cause of newborn deaths worldwide. In 2010, 14.9 million
   babies were born preterm, accounting for 11.1% of all births across 184
   countries, the highest preterm birth rates occurring in Africa and
   North America.[59]^2 In the United States, the rate of prematurity
   has remained fundamentally unchanged in recent years,[60]^3 and preterm
   birth carries an annual societal economic burden of at least $26.2
   billion.[61]^4 The
   high incidence of preterm birth is concerning, as 29% of all neonatal
   deaths worldwide, ∼1 million deaths in total, can be attributed to
   complications of prematurity.[62]^5 Furthermore, children born
   prematurely are at increased risk for several short- and long-term
   complications that may include motor, cognitive, and behavioral
   impairments.[63]^6^,[64]^7

   Approximately one-third of preterm births are medically indicated for
   maternal (e.g., preeclampsia) or fetal conditions (e.g., growth
   restriction); the other two-thirds are categorized as spontaneous
   preterm births, inclusive of spontaneous preterm labor and delivery
   with intact membranes (sPTD), and preterm prelabor rupture of the
   membranes (PPROM).[65]^8 Preterm birth is a syndrome with multiple
   etiologies,[66]^9 and its complexity makes accurate prediction by a
   single set of biomarkers difficult. While genetic risk factors for
   preterm birth have been reported,[67]^10^,[68]^11 the two most powerful
   predictors of spontaneous preterm birth are a sonographic short cervix
   in the midtrimester and a history of spontaneous preterm birth in a
   prior pregnancy.[69]^12 As for prevention of the syndrome, vaginal
   progesterone administered to asymptomatic women with a short cervix in
   the midtrimester reduces the rate of preterm birth before 33 weeks of
   gestation by 45% and decreases the rate of neonatal complications,
   including neonatal respiratory distress
   syndrome.[70]^13^,[71]^14^,[72]^15

   To compensate for the suboptimal prediction of preterm birth by
   currently used biomarkers, alternative approaches to identify
   biomarkers have been proposed, such as focusing on fetal and
   placenta-specific signatures,[73]^16 with the latter eventually refined
   by single-cell genomics,[74]^16^,[75]^17 and by expanding the types of
   data collected via multi-omics platforms.[76]^10^,[77]^18^,[78]^19
   While molecular profiles have been shown to be strongly modulated by
   advancing gestation in the maternal blood proteome,[79]^20^,[80]^21
   transcriptome,[81]^16^,[82]^22 and vaginal
   microbiome,[83]^23^,[84]^24^,[85]^25 predicting the timing of delivery
   based on such molecular clocks of pregnancy is still
   challenging.[86]^16 A recent meta-analysis[87]^26
   suggests that specific changes in the maternal whole-blood
   transcriptome associated with spontaneous preterm birth are largely
   consistent across studies when both symptomatic and asymptomatic cases
   are involved and when the samples collected at or near the time of
   preterm delivery are also included. However, the accuracy of
   transcriptomic predictive models to make inferences in asymptomatic
   women early in pregnancy has not been evaluated, and aptamer-based
   high-throughput plasma proteomics patterns,[88]^27 shown to be
   comprehensive indicators of health,[89]^28^,[90]^29 were not assessed
   in the context of spontaneous preterm birth. This topic is important,
   since the identification of early biomarkers, along with an associated
   robust assay platform, is necessary to develop treatment strategies
   that reduce the impact of prematurity.

   Therefore, we generated longitudinal whole-blood transcriptomic data at
   exon-level resolution and plasma proteomic data on 216 women and
   leveraged the Dialogue for Reverse Engineering Assessments and Methods
   (DREAM) crowdsourcing framework[91]^30 to engage >500 members of the
   computational biology community and to robustly assess the value of
   maternal blood multi-omics data in two sub-challenges. In sub-challenge
   1, we assessed maternal whole-blood transcriptomic data for prediction
   of gestational age in normal and complicated pregnancies using the last
   menstrual period (LMP) and ultrasound estimate as the gold standard,
   and showed that predictions are robust to disease-related
   perturbations. To avoid potential biases in the gold standard, in a
   post-challenge analysis, we also predicted delivery dates in women with
   spontaneous birth ([92]Figure 1) and found similar prediction
   performance. In sub-challenge 2, we evaluated within- and cross-cohort
   prediction of preterm birth leveraging longitudinal transcriptomic data
   in asymptomatic women generated herein and by Heng et al.[93]^31 in a
   cohort in Calgary, Canada. The separate consideration of both
   spontaneous preterm birth phenotypes (i.e., sPTD and PPROM) allowed us
   to pinpoint that previously reported leukocyte activation-related RNA
   changes preceding preterm birth are shared across the racially diverse
   cohorts for the PPROM phenotype but not for sPTD. Moreover, the
   evaluation of highly reproducible plasma proteomic assays[94]^32 and
   blood multi-omics data to determine the earliest stage in gestation
   when biomarkers have predictive value ([95]Figure 1) also makes this
   study unique and led to the conclusion that changes in plasma
   proteomic profiles can be detected earlier and predict preterm birth
   more accurately than whole-blood transcriptomic profiles. In
   addition to the transcriptomic signatures of gestational age and the
   multi-omics signatures of preterm birth that were identified here, this
   work sets a benchmark for the evaluation of longitudinal omics data in
   pregnancy research. The computational lessons and algorithms for risk
   prediction from longitudinal omics data can be used to develop future
   studies.

Figure 1.

   Study overview

   Whole-blood transcriptomic and/or plasma proteomic profiles were
   generated from 216 women with either normal pregnancy, spontaneous
   preterm birth with intact (sPTD) or ruptured membranes (PPROM), or
   preeclampsia. Sub-challenge 1: transcriptomic data were generated from
   samples collected in normal pregnancies without labor at term (black
   dots) and spontaneous labor at term (gray dots), and those complicated
   by sPTD (red dots), PPROM (orange dots), or preeclampsia (blue dots).
   Participating teams were provided gene expression data to develop
   prediction models for gestational age at blood draw defined by last
   menstrual period (LMP) and ultrasound (gold standard). Participants
   submitted predictions on a blinded test set (see [98]Figure S1 for
   training/test partition). In a post-challenge analysis, the approach of
   the top team (smallest test set root mean square error [RMSE]) was
   applied to predict time to delivery. Sub-challenge 2: participants
   submitted risk prediction algorithms designed to use as input omics
   data at ≥2 time points and patient outcomes (control, sPTD, or PPROM)
   for a subset of them (training set), and return disease risk scores for
   women with blinded outcomes (test set). The algorithms were applied to
   70 training/test pairs of datasets (see [99]Figure 4A and [100]Table
   S5) to assess within- and across-cohort predictions of preterm birth by
   transcriptomics and within-cohort prediction by multi-omics data.
   Predictions were assessed by the area under the receiver operating
   characteristic curve and the area under the precision-recall curve and
   aggregated across datasets and prediction scenarios (see [101]STAR
   Methods).

Results

Prediction of gestational age by maternal whole-blood transcriptomics

   We have generated and shared with the community exon-level gene
   expression data profiled in 703 maternal whole-blood samples collected
   from 133 women enrolled in a longitudinal study at the Center for
   Advanced Obstetrical Care and Research of the Perinatology Research
   Branch, National Institute of Child Health and Human
   Development/National Institutes of Health/Department of Health and
   Human Services (NICHD/NIH/DHHS); the Detroit Medical Center; and the
   Wayne State University School of Medicine. The Human Transcriptome
   Arrays platform was chosen based on its favorable performance compared
   to RNA sequencing (RNA-seq), especially for quantifying short and
   low-abundance genes,[102]^33 and it was previously used for detecting
   gestational age- and parturition-related changes in maternal whole
   blood.[103]^22 The patient population included women with a normal
   pregnancy who delivered at term (≥37 weeks) (controls, N = 49), women
   who delivered before 37 completed weeks of gestation by sPTD (N = 34)
   or PPROM (N = 37), and women who experienced an indicated delivery
   before 34 weeks due to early preeclampsia (N = 13) ([104]Figure 2A).
   After including data from 16 additional normal pregnancies obtained
   from the same population[105]^22 and using the same microarray platform
   (Gene Expression Omnibus: [106]GSE113966, 32 transcriptomes), the
   resulting set of 149 pregnancies (see demographic characteristics in
   [107]Table S1), totaling 735 transcriptomes, was divided randomly into
   training (N = 367) and test (N = 368) sets; the latter set excludes
   publicly available data to avoid the possibility that the models are
   trained with data to be used for testing ([108]Figure S1). All of the
   longitudinal samples from a given patient were assigned together to
   either the training set or the test set; thus, no patient contributed
   samples to both sets (see [109]Figure S1 and [110]Video S1). The
   research community was
   challenged to use data from the training set to develop gene expression
   prediction models for gestational age, as defined by the LMP and
   ultrasound fetal biometry, and to make predictions based only on gene
   expression in the test set. The clinical diagnosis and
   sample-to-patient assignments were not disclosed to the challenge
   participants, while gestational age at the time of sampling was also
   blinded for the test set. Teams were allowed to submit up to 5
   predictions using the test samples, and the best submission (smallest
   root mean square error [RMSE]) was retained for each unique team. We
   received 331 submissions for this sub-challenge from 87 participating
   teams, 37 of which provided the required details on the computational
   methods used and thereby qualified for the final team ranking in this
   sub-challenge ([111]Table S2).
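
   The patient-level partition described above can be illustrated with a
   minimal sketch. It assumes a hypothetical mapping from sample IDs to
   patient IDs and a simple random shuffle; the function name, the IDs,
   and the 50% test fraction are illustrative only, and the sketch does
   not reproduce the additional rule that publicly available data were
   kept out of the test set.

```python
import random

def split_by_patient(sample_to_patient, test_fraction=0.5, seed=0):
    """Assign whole patients (not individual samples) to train or test."""
    patients = sorted(set(sample_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train, test = [], []
    for sample, patient in sample_to_patient.items():
        # Every sample follows its patient, so no patient spans both sets.
        (test if patient in test_patients else train).append(sample)
    return train, test

# Hypothetical IDs: 6 patients with 3 longitudinal samples each.
samples = {f"P{p}_S{s}": f"P{p}" for p in range(6) for s in range(3)}
train, test = split_by_patient(samples)
```

   Splitting on patients rather than samples prevents repeated
   measurements from the same woman from leaking between the two sets.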

Figure 2.

   Prediction of gestational age by whole-blood transcriptomics

   (A) Detroit cohort transcriptomics study design. Each line corresponds
   to 1 patient and each dot to 1 sample. Gestational ages at delivery are
   marked by a triangle. The set includes 703 samples from 133 women:
   controls (N = 49), women who delivered before 37 completed weeks of
   gestation by sPTD (N = 34) or PPROM (N = 37) and women who experienced
   an indicated delivery before 34 weeks due to early preeclampsia (N =
   13).

   (B) Test set prediction of gestational age by the model of the
   top-ranked team (M_GA_Team1). The 368 samples are colored according to
   the phenotypic group of patients. r, Pearson correlation coefficient.
   RMSE: root mean squared error.

   (C) Protein-protein interaction network modules for genes that are part
   of the 249-gene core transcriptome predicting gestational age
   (M_GA_Core). A
   select group of biological processes enriched among these genes are
   shown in the pie charts.

   Video S1. Design of training and test sets in the DREAM Preterm Birth
   Prediction Challenge, related to Figure 2

   The video shows how the transcriptomics and proteomics datasets
   (Figures 2A and S3) were partitioned into training and test sets to
   evaluate prediction of gestational age (sub-challenge 1) and
   spontaneous preterm birth (sub-challenge 2) (see also Table S5).

   Robustness analysis of team rankings (see [115]STAR Methods) suggested
   that the predictions of the top-ranked team (authors B.A.P. and I.C.,
   abbreviated as team 1) were significantly better (Bayes factor >3) than
   those of the second- (author Y.G., abbreviated as team 2) and
   third-ranked teams ([116]Figure S1). Among the top 20 teams, the most
   frequent methods used to select predictor genes included univariate
   gene ranking and meta-gene building via principal-component analysis,
   as well as literature-based gene selection. Common prediction models
   included neural networks, random forest, and regularized regression
   (LASSO and ridge regression), with the latter being used by the
   top-ranked team in this sub-challenge.

   The model generated by team 1 in sub-challenge 1 predicted gestational
   ages at blood draw with an RMSE of 4.5 weeks in the test set (Pearson
   correlation between actual and predicted values, r = 0.83, p < 0.001)
   ([117]Figure 2B). The correlation between predicted and actual
   gestational ages was also significant after accounting for repeated
   observations from the same patients in the test set via linear
   mixed-effects modeling (slope 0.76, likelihood ratio test p < 0.001;
   see [118]STAR Methods). The prediction model of team 1 (M_GA_Team1) was
   based on ridge regression, and the predictors were meta-genes derived
   by principal-component analysis from the expression data of 6,106
   genes. As shown in [119]Figure 2B, the gestational age predictions
   showed little bias in the second trimester (14–28 weeks) samples (mean
   error 0.6 weeks); however, gestational ages of first-trimester samples
   were overestimated (mean error 3.7 weeks), while the third-trimester
   samples were underestimated (mean error −1.96 weeks). This finding can
   be understood, in part, by the larger number of second-trimester
   samples relative to first- and third-trimester samples available for
   training of the model. Of interest, the prediction errors for
   complicated pregnancies were similar to those of normal pregnancies
   (ANOVA, p > 0.1), suggesting that this model of gestational age, in
   general, was robust to obstetrical disease- and parturition-related
   perturbations in gene expression data ([120]Figure S2).
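
   As a rough illustration of the general technique named above (ridge
   regression on principal-component "meta-genes", scored by RMSE and
   Pearson correlation), the following sketch uses NumPy on synthetic
   data. It is not the team 1 implementation: the number of components,
   the penalty alpha, the simulated expression matrix, and all function
   names are assumptions made for the example.

```python
import numpy as np

def fit_pca_ridge(X, y, n_components=5, alpha=1.0):
    """Fit ridge regression on the top principal components of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # Rows of vt are the principal axes of the centered training matrix.
    _, _, vt = np.linalg.svd(X - x_mean, full_matrices=False)
    axes = vt[:n_components].T                  # genes x components
    scores = (X - x_mean) @ axes                # samples x components
    # Closed-form ridge solution on the component scores.
    gram = scores.T @ scores + alpha * np.eye(n_components)
    coef = np.linalg.solve(gram, scores.T @ (y - y_mean))
    return x_mean, y_mean, axes, coef

def predict_pca_ridge(model, X):
    x_mean, y_mean, axes, coef = model
    return (X - x_mean) @ axes @ coef + y_mean

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Synthetic stand-in: 200 "genes" that drift linearly with gestational age.
rng = np.random.default_rng(0)
ga = rng.uniform(8, 40, size=120)               # weeks, hypothetical
loadings = rng.normal(size=200)
X = np.outer(ga, loadings) + rng.normal(scale=5.0, size=(120, 200))
train, test = slice(0, 80), slice(80, 120)

model = fit_pca_ridge(X[train], ga[train])
pred = predict_pca_ridge(model, X[test])
test_rmse = rmse(ga[test], pred)
test_r = float(np.corrcoef(ga[test], pred)[0, 1])
```

   The principal axes and centering constants are estimated on the
   training rows only and then reused for the test rows, so no test
   information enters the fit.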

   To identify a core transcriptome predicting gestational age in normal
   and complicated pregnancies that captures most of the predictive power
   of the full model (M_GA_Team1) that involved >6,000 predictor genes, in
   a post-challenge analysis, we first used linear mixed-effects modeling
   for longitudinal data[121]^34 to prioritize genes and then used the
   prioritized genes as input in a LASSO regression model. The resulting
   249-gene regression model (M_GA_Core) ([122]Figure S2; [123]Table S3)
   had an RMSE of 5.1 weeks (r = 0.80) and involved 2 tightly connected
   modules related to immune response, leukocyte activation, and
   inflammation- and development-related Gene Ontology biological
   processes ([124]Figure 2C; [125]Table S4). We previously reported that
   several member genes of these networks (e.g., MMP8, CEACAM8, and DEFA4)
   were
   most highly modulated in the normal pregnancy group used here,[126]^35
   and others have shown the same to be true at a cell-free RNA level in a
   Danish cohort.[127]^16 In addition, these data are consistent with the
   concept that pregnancy is characterized by a systemic cellular
   inflammatory response.[128]^36^,[129]^37^,[130]^38^,[131]^39^,[132]^40 In
   this study, we also show that these mediators correlate with
   gestational age in both normal and complicated pregnancies, and the
   latter group contributed more than half of the transcriptomes used to
   fit and evaluate the models ([133]Figure S1; [134]Table S1).

Comparison of gene expression models and the clinical standard in predicting
time to delivery (TTD) in women with spontaneous term or preterm birth

   To enable a direct comparison with a previous landmark study of
   pregnancy dating by targeted cell-free RNA profiling,[135]^16 in a
   post-challenge analysis, we used the same methods as described above
   for model M_GA_Team1 (see [136]STAR Methods), except that the response
   was a time variable defined backward from delivery and hence
   independent of LMP and ultrasound estimations [TTD = date at sample −
   date at delivery (weeks)]. As in the study by Ngo et al.,[137]^16 only
   those
   patients with spontaneous term delivery were included in this analysis,
   thus omitting the subset of normal pregnancies that had been truncated
   by elective cesarean delivery. The training set in this analysis
   included 74 transcriptomes from 18 women and the test set included 64
   transcriptomes from 11 women with a spontaneous term delivery based on
   the original data split ([138]Figure S1). As shown in [139]Figure 3A,
   the gene expression model significantly predicted TTD with the same
   accuracy (RMSE, 4.5 weeks, r = 0.86, p < 0.001) as when predicting LMP
   and ultrasound-based gestational age in the full cohort of normal and
   complicated pregnancies. The predicted TTD values were then averaged
   over multiple samples per patient in a given gestational age interval
   to calculate accuracy, defined as predicting delivery within 1 week of
   the actual date and previously reported to be 55.1%.[140]^41 Results
   were also compared to the LMP and ultrasound fetal biometry, with the
   latter predicting delivery at 40 weeks of gestation. The test set
   accuracy of the gene expression model was 45% (5/11) based on
   third-trimester samples, which is comparable to the LMP and ultrasound
   estimate based on first- or second-trimester fetal biometry (55%)
   ([141]Figure 3A, bottom panel). Of note, 45% accuracy was also reported
   by Ngo et al.[142]^16 using cell-free RNA based on second- and
   third-trimester samples.
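
   The accuracy measure used here (delivery predicted within 1 week of
   the actual date) can be sketched as follows. The sketch assumes that
   per-sample errors (TTD observed minus TTD predicted) are averaged
   within each patient before the 1-week threshold is applied; that
   reading of the averaging step, the function name, and the records are
   illustrative assumptions rather than the authors' exact procedure.

```python
from collections import defaultdict
from statistics import mean

def delivery_accuracy(records, tolerance_weeks=1.0):
    """Fraction of patients whose mean TTD error is within the tolerance.

    records: iterable of (patient_id, ttd_observed, ttd_predicted), in weeks.
    """
    errors = defaultdict(list)
    for patient, observed, predicted in records:
        errors[patient].append(observed - predicted)
    hits = sum(abs(mean(e)) <= tolerance_weeks for e in errors.values())
    return hits / len(errors)

# Hypothetical records (TTD is negative before delivery).
records = [
    ("A", -10.0, -10.5), ("A", -6.0, -5.0),   # mean error -0.25: within 1 week
    ("B", -8.0, -11.0),                        # error 3.0: outside
    ("C", -4.0, -3.5),                         # error -0.5: within 1 week
    ("D", -12.0, -9.0), ("D", -7.0, -6.0),    # mean error -2.0: outside
]
accuracy = delivery_accuracy(records)
```

   With these made-up records, 2 of the 4 patients fall within the
   tolerance, giving an accuracy of 0.5.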

Figure 3.

   Prediction of time to delivery (TTD) by whole-blood transcriptomics

   (A) The top panel shows the test set TTD estimates from the M_sTD_TTD
   model plotted against actual values for 64 transcriptomes from 11
   women. The bottom panel shows the distribution of prediction errors
   (TTD observed − TTD predicted). A negative error means that delivery
   occurred sooner than expected/predicted, while positive values indicate
   the opposite. TTD was estimated using RNA measurements from the first-
   (T1), second- (T2), and third- (T3) trimester samples separately. For
   comparison, trimesters are defined as in Ngo et al.[145]^16: T1,
   <12 weeks; T2, 12–24 weeks; and T3, 24–37 weeks of gestation.

   (B) Prediction of TTD in women with spontaneous preterm birth by a gene
   expression model established in women with spontaneous term delivery
   (M_sTD_TTD). Predictions are shown for 355 longitudinal transcriptomes
   from 71 women with preterm prelabor rupture of membranes (PPROM, N =
   37) and spontaneous preterm delivery with intact membranes (sPTD, N =
   34). r, Pearson correlation coefficient. RMSE: root mean squared error.

   When data from all of the pregnancies with spontaneous term delivery
   were used to train a transcriptomic model of time to delivery and the
   model was applied to data from women with spontaneous preterm birth,
   the prediction was found to be statistically significant. However, the
   error increased (RMSE = 5.6 weeks) ([146]Figure 3B) relative to the
   estimate (RMSE = 4.5 weeks) for prediction of TTD in women with
   spontaneous term delivery ([147]Figure 3A). The additional preterm
   parturition-specific
   perturbations in gene expression explain, in part, the added
   uncertainty in prediction estimates of TTD in spontaneous preterm birth
   cases compared to spontaneous term pregnancies. Moreover, as expected,
   the term pregnancy TTD model overestimated the duration of pregnancy of
   women who were destined to experience preterm birth ([148]Figure 3B).
   The overestimation (mean prediction error) was 2.3 weeks compared to
   the 5-week gap between the LMP and ultrasound-based gestational ages at
   delivery in the term (mean = 39 weeks) and preterm (mean = 34 weeks)
   birth groups. The significant correlation of predicted and actual
   delivery dates in spontaneous preterm birth (sPTB) cases suggests that
   the M_sTD_TTD model captured both gene expression changes related to
   immune- and development-related processes establishing the age of
   pregnancy and the effects of the common pathway of
   parturition.[149]^42^,[150]^43 Hence, the prediction model generalized
   to the set of women with spontaneous preterm birth when samples at or
   near delivery were included and genome-wide gene expression data were
   available.

Prediction of preterm birth by maternal blood omics data collected in
asymptomatic women (sub-challenge 2)

   Post-challenge analyses of sub-challenge 1 demonstrated that a
   whole-blood transcriptomic model derived from the data of women with
   spontaneous term delivery (M_sTD_TTD) predicted delivery date in
   spontaneous preterm birth cases based on data collected throughout
   pregnancy, including near or at the time of preterm parturition up to
   37 weeks. With sub-challenge 2 of the DREAM Preterm Birth Prediction
   Challenge, we addressed the more difficult task of predicting preterm
   birth from data collected up to 33 weeks of gestation, while the women
   were asymptomatic. Of importance, the development of interventions to
   prevent preterm birth requires pregnant women at risk to be identified
   as early as possible before the onset of preterm parturition. Moreover,
   to enable future targeted studies of candidate biomarkers, we limited
   the number of molecular predictors in this sub-challenge to at most 50
   per outcome considered (see [151]Figure 4A).

Figure 4.

   Sub-challenge 2 design and results

   (A) Scenarios of spontaneous preterm birth model training and testing
   using multi-omics data. ∗Subjects in the original cohort were randomly
   split into equally sized groups that were balanced with respect to the
   phenotypes. ∗∗One-fifth of patients from the Detroit cohort (balanced
   with respect to the phenotypes) were randomly selected in the training
   set, while the remaining four-fifths were used as the test set.
   ^+Training set subjects were sampled with replacement from the
   original cohort to create different versions of the training set, and
   the trained model was then applied to the original test cohort. Sample
   sizes of training and test sets are shown in [154]Table S5.

   (B) Prediction performance for preterm birth-related outcomes based on
   algorithms submitted by 13 teams. AUROC values were converted into Z
   scores and shown as a heatmap for scenarios/outcome combinations shown
   in (A) that led to a significant prediction. sPTB, spontaneous preterm
   birth.

   We drew from the Detroit cohort longitudinal study ([155]Figure 2A)
   only samples collected at specific gestational age intervals while
   women were asymptomatic (i.e., before an eventual diagnosis of sPTD or
   PPROM). Two scenarios of prediction of preterm birth were devised: (1)
   to include cases and controls with available samples collected at 17–23
   and 27–33 weeks ([156]Figure S3A), and (2) to include patients with
   available samples collected at 3 gestational age intervals (17–22,
   22–27, and 27–33 weeks) ([157]Figure S3B). The selection of the 17–23-
   and 27–33-week intervals enabled cross-study model development and
   testing, with the microarray gene expression study of Heng
   et al.[158]^31 derived from a cohort in Calgary, Canada. Furthermore,
   we also included the profiles of 1,125 maternal plasma proteins
   measured by using an aptamer-based technology[159]^27^,[160]^44 in
   samples collected at 17–23 and 27–33 weeks of gestation from 66 women
   before the diagnosis of preterm birth (62 sPTD and 4 PPROM). These
   samples were profiled in the same experimental batch with samples from
   39 normal pregnancies that we previously described,[161]^21^,[162]^45
   which served here as controls ([163]Figure S3C). The characteristics of
   pregnancies with available proteomics profiles are shown in [164]Table
   S1.

   The prediction algorithms generated by 13 teams that participated in
   sub-challenge 2 were applied by the Challenge organizers to train and
   test models on 70 pairs of training/test datasets generated under 7
   scenarios ([165]Figure 4A; [166]Table S5; [167]Video S1). The scenarios
   differed in terms of omics data type, number of longitudinal
   measurements per patient, the outcome being predicted, and the patient
   cohorts used for training/testing ([168]Figure 4A). In all of the
   scenarios, the number of samples and the gestational age at sampling
   did not differ between the cases and controls ([169]Figure S3).

   To assess the prediction performance in each of the 70 test sets in
   sub-challenge 2, we used both the area under the receiver operating
   characteristic curve (AUROC) as well as the area under the
   precision-recall curve (AUPRC), the latter being especially suited to
   imbalanced datasets (e.g., the proteomics set that features more cases
   than controls) ([170]Figure S3C).
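
   For readers less familiar with the two metrics, the sketch below
   computes them in plain Python on a toy test set with more cases than
   controls. AUROC is computed as the fraction of case/control pairs that
   are ranked correctly, and AUPRC is approximated by average precision,
   one common estimator; whether the challenge used this estimator is not
   stated here, so treat that choice, the function names, and the toy
   labels and scores as assumptions.

```python
def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def average_precision(labels, scores):
    """Mean precision at the rank of each case (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy test set: 4 cases (label 1) and 2 controls (label 0).
labels = [1, 1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
```

   A random classifier is expected to reach an AUROC of 0.5 regardless of
   class balance, whereas its expected average precision equals the case
   prevalence (here 4/6), so the two metrics have different chance
   levels when classes are imbalanced.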

   AUROC and AUPRC metrics were averaged over the 10 test sets of each
   prediction scenario and the result for each team was converted into a Z
   score. Final team rankings were obtained by aggregating the ranks over
   all of the scenarios and outcome combinations that were significant,
   according to at least one team, after multiple testing correction
   ([171]Table S6; see [172]STAR Methods). Robustness analysis of team
   ranks ([173]Figure S3D) determined that the top-ranked team (authors
   B.A.P. and I.C., abbreviated as team 1) outperformed the second-ranked
   team (author Y.G. abbreviated as team 2) and that the second- and
   third-ranked teams outperformed the fifth-ranked team (Bayes factor >
   3). For all of the scenarios ([174]Figure 4A), the models of team 1
   involved data from 50 molecules (RNA or proteins) collected at the last
   available measurement (closest to delivery), while team 2 used data
   collected at the last 2 available time points for 50 molecules selected
   based on overall expression as opposed to correlation with the outcome.
   Among other differences in their approaches, team 1 treated the outcome
   as a binary variable, while team 2 used a continuous variable derived
   from gestational age at delivery (see [175]STAR Methods). Of note, the
   two top-ranked teams were the same in both sub-challenges 1 and 2.
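
   The aggregation steps described above (averaging a metric over the
   test sets of a scenario, converting to Z scores, and combining ranks
   across scenarios) might look like the following sketch. How the Z
   scores were defined is given in the STAR Methods, not here; this
   example assumes that each team is standardized against the other
   teams within a scenario and that ranks are combined by their mean.
   Team names, scenario names, and AUROC values are hypothetical.

```python
from statistics import mean, pstdev

def rank_teams(auc_by_scenario):
    """Rank teams by their mean rank across scenarios.

    auc_by_scenario: {scenario: {team: [AUROC on each test set]}}
    Returns (teams ordered best to worst, {scenario: {team: z score}}).
    """
    ranks, z_scores = {}, {}
    for scenario, teams in auc_by_scenario.items():
        avg = {team: mean(values) for team, values in teams.items()}
        mu, sd = mean(list(avg.values())), pstdev(list(avg.values()))
        # Assumption: z scores standardize each team against the other teams.
        z = {team: (a - mu) / sd for team, a in avg.items()}
        z_scores[scenario] = z
        ordered = sorted(z, key=z.get, reverse=True)
        for rank, team in enumerate(ordered, start=1):
            ranks.setdefault(team, []).append(rank)
    order = sorted(ranks, key=lambda team: mean(ranks[team]))
    return order, z_scores

# Hypothetical AUROC values on two test sets per scenario.
scenarios = {
    "transcriptomics_PPROM": {
        "team1": [0.62, 0.60], "team2": [0.58, 0.56], "team3": [0.50, 0.52]},
    "proteomics_sPTD": {
        "team1": [0.76, 0.74], "team2": [0.70, 0.74], "team3": [0.66, 0.64]},
}
order, z_scores = rank_teams(scenarios)
```

   Aggregating ranks rather than raw metric values keeps a single
   scenario with unusually high or low scores from dominating the final
   ordering.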

   A summary of prediction scenarios and outcome combinations with
   significant predictions based on the approach of at least one team in
   sub-challenge 2 is depicted in [176]Figure 4B. These results suggest
   overall higher prediction accuracy based on proteomics compared to
   transcriptomic data. We next highlight some of the prediction results
   for the top team.

   With the approach of team 1, one transcriptomic profile at 27–33 weeks
   of gestation from asymptomatic women predicted PPROM across the cohorts
   and microarray platforms with an AUROC of ∼0.6, depending on the
   prediction scenario ([177]Figure 5A). Although separate differential
   expression analyses of the data from each cohort and time point failed
   to reach statistical significance after multiple testing correction,
   the consistency across cohorts and time points of gene expression
   changes preceding the diagnosis of PPROM was demonstrated by a
   post-challenge individual patient meta-analysis, which identified 402
   differentially expressed genes after adjusting for cohort and time
   point (moderated t test; q < 0.1) ([178]Figure 5B; [179]Table S7). A
   highly connected protein-protein interaction sub-network corresponding
   to genes significant in this meta-analysis is shown in [180]Figure 5C,
   and it illustrates some of the Gene Ontology biological processes
   significantly enriched in PPROM. These included vesicle-mediated
   transport and leukocyte- (myeloid and lymphocyte) mediated immunity,
   among others ([181]Table S4). Enrichment analysis based on canonical
   pathways and custom gene sets curated in the Molecular Signatures
   database (MSigDB)[182]^46 revealed perturbations associated with PPROM
   in 59 pathways such as interleukin-12 (IL-12), membrane trafficking,
   cytokine signaling in immune system, cellular senescence, and integrin
   cell surface interactions, among others (q < 0.1; see [183]Table S4).
   These data are consistent with the hypothesis that circulating myeloid
   (monocytes and neutrophils) and lymphoid (T cells) cells are especially
   activated in women who experience pregnancy complications such as
   preterm labor[184]^47^,[185]^48^,[186]^49^,[187]^50 and PPROM.[188]^51

Figure 5.

   Prediction of preterm prelabor rupture of the membranes from samples
   collected in asymptomatic women

   (A) Receiver operating characteristic (ROC) curve representing
   prediction of PPROM by 50 genes across the cohorts and microarray
   platforms using the team 1 approach. Sample sizes of test sets used to
   derive the ROC curves are shown in [191]Table S5. AUC, area under the
   curve, given with 95% DeLong confidence intervals.

   (B) Heatmap of 402 genes differentially expressed in PPROM across the
   cohorts and time points. Bars on the left indicate gene inclusion as a
   predictor by the methods of the top 3 teams in sub-challenge 2.

   (C) STRING network constructed from among the 402 genes with
   differential expression in PPROM. Select significantly enriched
   biological processes are highlighted.

   Although participating teams in sub-challenge 2 did not have access to
   the longitudinal preterm birth plasma proteomics when they developed
   prediction algorithms, their algorithms resulted in prediction
   performances that surpassed those obtained by using transcriptomic data
   ([192]Figures 4 and [193]6A; [194]Table S6) when applied to training
   and test sets derived from the plasma proteomics set ([195]Figure S3C).
   Prediction of sPTD by the approach of team 1 involved 50 plasma
   proteins selected by random forest model importance from the panel of
   1,125 available proteins. The test set accuracy was the highest when
   using data collected at 27–33 weeks of gestation (AUROC = 0.76
   [0.72–0.8]). Importantly, even a single proteome profile at
   17–22 weeks of gestation significantly predicted spontaneous preterm
   delivery (AUROC = 0.62 [0.58–0.67]) ([196]Figure 6A), suggesting
   that this approach has value in the early identification of women at
   risk. The addition of four cases with PPROM to those with sPTD did not
   affect the prediction performance of the proteomics models of team 1,
   suggesting that this approach could generalize to both preterm birth
   phenotypes. The increase in plasma protein abundance of PDE11A and
   ITGA2B preceded the diagnosis of both sPTD and PPROM in the Detroit
   cohort at 27–33 weeks of gestation ([197]Figures 6B and 6C). The
   tightly interconnected network of proteins built from differential
   profiles with sPTD in asymptomatic women at 27–33 weeks of gestation
   (q < 0.1; [198]Figure 6B; [199]Table S8) included not only several
   previously known markers of preterm delivery (IL-6, ANGPT1) but also
   MMP7 and ITGA2B, which we previously described as dysregulated in women
   with preeclampsia.[200]^52 Member proteins of this network perturbed
   before a diagnosis of spontaneous preterm delivery are annotated to
   biological processes such as regulation of cell adhesion, response to
   stimulus, and development ([201]Figure 6D). Differentially expressed
   proteins preceding diagnosis with sPTD also included mediators
   annotated to biological processes found by transcriptomic analysis in
   PPROM, such as leukocyte-mediated immunity (AGER, PDPK1, LAG3, HAVCR2,
   IL-6, FCER2, CADM1), neutrophil-mediated immunity (PLAUR, IMPDH2,
   PRDX6, PA2G4, F2, IL-6, PPIE, GDI2), and regulation of vesicle-mediated
   transport (NAPA, PDPK1, MFGE8, ANGPT1, CAMK2A); however, enrichment of
   these biological processes did not reach statistical significance. In
   contrast, the pathway enrichment analysis based on MSigDB identified
   AMB2 neutrophils and cell surface interactions at the vascular wall
   pathways as significantly enriched based on plasma proteomic
   dysregulation preceding diagnosis with sPTD (q < 0.1; [202]Table S4).
   Other top-ranked pathways included nervous system development,
   developmental biology, focal adhesion, VEGFA/VEGFR2 signaling, and
   membrane trafficking pathways (p < 0.05; [203]Table S4), with the
   latter two being in common with those involved in PPROM ([204]Table
   S4).

Figure 6.

   [205]Figure 6

   Prediction of spontaneous preterm delivery by plasma proteomic data

   (A) ROC curves for sPTD and sPTB (which includes sPTD and PPROM) for
   team 1. The ROC curves were obtained from pooled predictions over 10
   test sets, with each test set including 20 controls versus 31 sPTD
   cases and 20 controls versus 33 sPTB cases (see [207]Table S5). AUC:
   area under the curve is given with 95% DeLong confidence intervals.

   (B) Plasma protein abundance for all proteins deemed significant
   according to a moderated t test (q < 0.1); those selected as predictors
   by the top teams in their models are marked on the left side of the
   heatmap.

   (C) Overlap of protein changes with sPTD at 17–22 and 27–33 weeks, and
   with PPROM at 27–33 weeks. See also [208]Table S8.

   (D) Network of proteins among those shown in (B): each protein node is
   annotated to biological processes based on corresponding Gene Ontology.

   Given that differences in the patient characteristics could have
   contributed to the higher prediction performance of spontaneous preterm
   delivery by plasma proteomics as compared to maternal whole-blood
   transcriptomics, the approach of team 1 was also evaluated via
   leave-one-out cross-validation on a subset of 13 controls and 17 sPTD
   cases for which both types of data originated from the same blood draw.
   The prediction performance for spontaneous preterm delivery by plasma
   proteomics remained high (AUROC = 0.86 [0.7–1.0]), while prediction by
   transcriptomic data remained non-significant ([209]Figure 7),
   confirming the superior value of proteomics relative to transcriptomics
   for this endpoint. Of note, for a fixed number of 50 predictors
   allowed, a stacked generalization[210]^53 approach combining
   predictions from individual platform models via a LASSO logistic
   regression led to a higher leave-one-out cross-validation performance
   estimate (AUROC = 0.89 [0.78–1.0]) compared to building a single model
   from the combined transcriptomic and proteomic features
   ([211]Figure 7).
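
   The leave-one-out evaluation described above can be illustrated with
   a minimal sketch: each sample is held out once, a model is fit on the
   remaining samples, and the held-out scores are pooled into a single
   AUROC. The nearest-centroid scorer and the toy data are assumptions
   made for illustration; they stand in for the random forest approach of
   team 1, and the DeLong confidence intervals are not computed here.

```python
from typing import Callable, List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a case outranks a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def centroid_score(train_x: List[List[float]], train_y: List[int],
                   x: List[float]) -> float:
    """Hypothetical stand-in classifier: distance to the control centroid
    minus distance to the case centroid (higher means more case-like)."""
    def centroid(label: int) -> List[float]:
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    return dist(x, centroid(0)) - dist(x, centroid(1))


def loocv_scores(x: List[List[float]], y: List[int],
                 score_fn: Callable = centroid_score) -> List[float]:
    """Hold out each sample once, fit on the rest, and pool the held-out scores."""
    pooled = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pooled.append(score_fn(train_x, train_y, x[i]))
    return pooled


# Toy data (invented for illustration): two features, 3 controls and 3 cases.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
Y = [0, 0, 0, 1, 1, 1]
scores = loocv_scores(X, Y)
print(round(auroc(scores, Y), 2))
```

   Because any feature selection would have to be repeated inside each
   fold to avoid optimistic estimates, a real pipeline would place that
   step inside the scoring function rather than before the loop.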

Figure 7.

   [212]Figure 7

   Comparison of prediction performance of spontaneous preterm delivery
   between platforms

   ROC curve for prediction of sPTD by models obtained with the approach
   of team 1 based on a subset of samples for which data from both
   platforms were available. Leave-one-out cross-validation was used to
   generate the ROC curves from a set of 13 controls and 17 sPTD cases.
   The multi-omics model was obtained by applying the same approach on a
   concatenated set of proteomic and transcriptomic features. The
   multi-omics stacked generalization approach involved combining
   predictions from models based on each platform via LASSO logistic
   regression. AUC: area under the curve is given with 95% DeLong
   confidence intervals.

   To extract further insights from the computational approaches best
   suited to predict preterm birth from longitudinal omics data in
   sub-challenge 2, we investigated which computational aspects explained
   the higher performances of the top two teams. Given that team 1 relied
   only on omics data at the last available time point (T2), we kept all
   of the aspects of this method except for the temporal information
   considered among the following: (1) first point (T1), (2) change in
   expression between T2 and T1 (slope), or (3) a combined approach in
   which slopes for all genes and measurements at T2 compete for inclusion
   in the 50 allowed predictors for a given outcome (PPROM or sPTD) (see
   [214]STAR Methods). As shown in [215]Figure S4, none of these
   approaches would have improved prediction performance relative to the
   baseline approach of team 1, which considered only the data from the
   last time point (T2). We then considered several key aspects of the
   approach of team 2 and subsequently incorporated them in the
   approach of team 1 to determine whether such hybrid approaches could
   translate into higher performances relative to the baseline approach.
   In particular, we modified the approach of team 1: (1) to start
   with only the top half of the most highly abundant features on each
   platform, (2) to convert the binary classification (preterm versus
   term) into a regression of gestational age at delivery, and (3) given
   the selected 50 predictor genes based on the correlation of T2
   expression values with the outcome, to add the expression of those
   genes at the previous time point as independent predictors in the
   random forest model. Of these three scenarios, the last, which expands
   the number of predictors from 50 to 100 without increasing the number
   of molecules, slightly outperformed the overall prediction performance
   of the approach of team 1 across scenarios ([216]Figure S4) and
   led to the consistent prediction of PPROM in all cross-study analyses
   (see improvement in prediction from [217]Figures 5A to [218]S4).
   Interestingly, simply doubling the number of molecules profiled at T2
   that were allowed as predictors in the model (from 50 to 100) led to a
   worse overall prediction performance relative to the approach of team 1
   that used only 50 molecules at T2 ([219]Figure S4). This finding
   suggests that for preterm birth prediction, it is more important to
   measure the right biomarkers at one additional time point than to
   double the number of markers at the most recent time point.
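
   The best-performing hybrid scenario described above (select features
   from T2, then add the T1 values of the same features as separate
   predictors) can be sketched as follows. This is an illustration under
   stated assumptions, not the challenge code: the function names and toy
   matrices are hypothetical, and the downstream random forest is omitted.

```python
from statistics import mean
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def top_features(t2: List[List[float]], outcome: Sequence[float], k: int) -> List[int]:
    """Indices of the k features whose T2 values correlate most with the outcome."""
    n_features = len(t2[0])
    strength = [abs(pearson([row[j] for row in t2], outcome))
                for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -strength[j])[:k]


def t2_plus_t1(t1: List[List[float]], t2: List[List[float]],
               selected: List[int]) -> List[List[float]]:
    """Per sample: T2 values of the selected features, then their T1 values.

    The number of predictors doubles while the number of molecules
    measured stays the same.
    """
    return [[r2[j] for j in selected] + [r1[j] for j in selected]
            for r1, r2 in zip(t1, t2)]


# Toy data (invented): 4 samples x 3 features at two time points.
T1 = [[1.0, 5.0, 2.0], [1.1, 5.2, 2.1], [0.9, 4.9, 2.0], [1.0, 5.1, 1.9]]
T2 = [[1.0, 5.0, 2.0], [1.2, 5.1, 3.0], [2.0, 5.0, 2.1], [2.1, 5.2, 3.1]]
outcome = [0, 0, 1, 1]  # 1 = preterm case, 0 = control

selected = top_features(T2, outcome, k=1)
design = t2_plus_t1(T1, T2, selected)
print(selected, design)
```

   The slope variant mentioned in the text would replace the appended T1
   columns with the difference between the T2 and T1 values.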

Discussion

   In this study, we evaluated maternal blood omics data to predict
   gestational age in normal and complicated pregnancies, as well as the
   risk of preterm birth. Although the main interest here was the
   prediction of spontaneous preterm birth, the correlation of omics data
   with advancing gestation was relevant not only to serve as a positive
   control for the evaluation of omics data but also to possibly provide
   relevant information for the development of more affordable tools to
   date pregnancy. We chose the DREAM collaborative competition
   framework[220]^30 to identify the best computational methods for making
   inferences and to assess them in an unbiased and robust way based on
   longitudinal omics data that we and others have generated. DREAM
   Challenges have been used to establish unbiased performance benchmarks
   across a wide array of prediction
   tasks.[221]^54^,[222]^55^,[223]^56^,[224]^57^,[225]^58 Moreover, the
   results gained from these challenges
   define community standards and advancements in many scientific
   fields.[226]^59^,[227]^60

   Collectively, sub-challenge 1 and the additional post-challenge
   analyses demonstrated that models based on the maternal whole-blood
   transcriptome (1) significantly predict LMP and ultrasound-defined
   gestational age at venipuncture in both normal and complicated
   pregnancies (RMSE = 4.5 weeks) and (2) predict a delivery date within ±1 week
   in women with spontaneous term delivery with an accuracy (45%)
   comparable to the clinical standard (55%). The accuracy of gestational
   age prediction was likely understated based on sub-challenge 1 results
   due to the inclusion of cases with early preeclampsia and spontaneous
   preterm birth at much higher rates than expected in the general
   population. Disease-specific perturbations, especially close to the
   time of delivery in early preeclampsia and spontaneous preterm birth
   cases, are expected to have contributed additional variation to gene
   expression patterns establishing gestational age.

   Of interest, the accuracy of dating gestation in women with spontaneous
   term delivery was similar to the report by Ngo et al.,[228]^16 who used
   cell-free RNA profiling in a Danish cohort, although that study
   involved more frequent (weekly) sampling of fewer genes (about 50
   immune, placental, and fetal liver specific) instead of the genome-wide
   data used here. In the study by Ngo et al.,[229]^16 the TTD
   transcriptomic model derived from samples of women with normal
   pregnancy failed to predict delivery dates on independent cohorts of
   women with preterm birth, while in this study, such a model resulted in
   the significant prediction of delivery dates in cases with spontaneous
   preterm birth (r = 0.75; [230]Figure 3B). A possible explanation, in
   addition to the cohort differences between the training and testing
   sets in the previous study, is that our model of normal pregnancy
   captured not only gene changes establishing the gestational age but
   also those changes involved in the common pathway of labor. While the
   prediction of the delivery date of women with spontaneous preterm birth
   by omics data collected up to <37 weeks, including samples taken when
   women were symptomatic, was demonstrated above without using any data
   from preterm birth cases to establish the model, it was also previously
   shown by others who used data from both cases and
   controls.[231]^10^,[232]^19^,[233]^61^,[234]^62

   In the context of sub-challenge 2, we tackled the issue of predicting
   spontaneous preterm birth from samples collected while women were
   asymptomatic before 33 weeks of gestation. Overall,
   transcriptomic-based prediction performance for PPROM was low (AUROC =
   0.6 at 27–33 weeks of gestation); however, the sub-challenge and
   post-challenge analyses provided evidence of changes in maternal
   whole-blood gene expression that precede a diagnosis of PPROM and are
   shared across gestational age time points, racially diverse cohorts,
   and different microarray platforms. These transcriptomic changes
   involved immune-, inflammation-, and metabolism-related biological
   processes and pathways ([235]Table S4). Plasma protein changes
   preceding a diagnosis with sPTD were larger at 27–33 weeks and led to
   higher prediction performance (AUROC = 0.76). The 90 proteins
   differentially abundant with sPTD ([236]Table S8) were encoded by genes
   annotated to some of the same biological processes found by
   transcriptomic analysis in PPROM, with vascular endothelial growth
   factor A/VEGF receptor 2 (VEGFA/VEGFR2) signaling and membrane
   trafficking pathways being top-ranked pathways based on both proteomics
   changes with sPTD and transcriptomic changes with PPROM. The
   involvement of membrane trafficking within the secretory membrane
   system, which includes the endoplasmic reticulum (ER), is in line with
   previous observations that ER stress is increased after spontaneous
   labor in gestational tissues, where it regulates the expression of
   prolabor mediators.[237]^63^,[238]^64 The involvement of the VEGF
   family of proteins in early placentation and of the abnormalities in
   maternal plasma and placental expression of angiogenic factors was also
   reported in adverse pregnancy outcomes.[239]^65 Moreover, proteins
   annotated to endocrine system development (PDPK1, IL-6), a pathway
   associated with parturition,[240]^66 were increased in the maternal
   plasma before the onset of sPTD. The inflammatory cytokine IL-6, known
   to play a central role during pregnancy and its
   complications,[241]^67^,[242]^68 was increased in the amniotic fluid of
   women having a preterm labor with intra-amniotic infection.[243]^69
   IL-6 and ANGPT4, a member of the angiopoietins family, were highlighted
   as predictors of preterm birth based on proteomics analysis of maternal
   plasma at 8–20 weeks of gestation in a population of women from low-
   and middle-income countries.[244]^70

   Regarding the comparison between omics platforms for the prediction of
   preterm birth, this study demonstrated evidence of superior performance
   by plasma proteomics compared to whole-blood transcriptomics in the
   prediction of spontaneous preterm delivery (AUROC = 0.76 versus 0.6 at
   27–33 weeks) (see [245]Figure 7 for analyses in the same samples). This
   is in line with a recent multi-omics analysis of preterm birth using
   cell-free RNA and plasma proteomics.[246]^70 After we
   and other investigators reported the value of aptamer-based SomaLogic
   assays to predict early[247]^52 and late preeclampsia,[248]^45 in this
   study, we evaluated this platform to predict preterm birth and found
   the SomaLogic assay to be of superior value when compared to
   whole-blood transcriptomics in predicting spontaneous preterm delivery.
   The plasma proteomics signatures of spontaneous preterm birth
   identified here have direct implications for the development of future
   SomaSignal Tests that were demonstrated in other health applications by
   combining reproducible proteomic signals with machine
   learning.[249]^28^,[250]^29

   The use of crowdsourcing to evaluate computational approaches and
   longitudinal multi-omics data to predict preterm birth is a major
   strength of this study. Many independent approaches to solve this
   challenge were implemented by the data science community.
   Notably, the first and second best-performing teams were the same for
   both sub-challenges, which is indicative of the teams’ skill, as
   opposed to chance, a pattern that has been observed in several other
   crowdsourcing initiatives (e.g., sbv
   IMPROVER,[251]^22^,[252]^71^,[253]^72
   CAGI,[254]^73^,[255]^74^,[256]^75^,[257]^76 and
   DREAM[258]^55^,[259]^77^,[260]^78). Another advantage of
   the DREAM Challenge framework is that the model development and the
   prediction assessment are separate; thus, the risk of overstating the
   prediction performance is reduced. As with other similar crowdsourcing
   initiatives, we investigated the key factors that could explain the
   higher prediction performance of the top teams relative to the other
   teams. Given the multitude of differences in prediction pipelines among
   teams, it is challenging to single out individual key components that
   explain prediction performance variability. Therefore, in
   post-challenge analyses of sub-challenge 2, we modified the approach of
   team 1 to include a single new element borrowed from the approach of
   team 2. Based on this strategy, we have identified that the reliance of
   team 1 on the last-available snapshot of molecular activity was a key
   methodological aspect that was superior to the use of all of the
   available time points or the rate of change across points as
   implemented by other lower-ranked teams. This finding is in agreement
   with previous observations that the closer the sampling to the clinical
   diagnosis, the higher the predictive value of the
   biomarkers.[261]^45^,[262]^52^,[263]^79 Once the molecular signature
   was reliably selected based on the last available time point, adding
   the measurements at the second-to-last available time point as
   independent predictors in the model would have further improved the
   prediction of preterm birth ([264]Figure S4).

   This robust evaluation of prediction performance, combined with a
   separate consideration of preterm birth phenotypes (sPTD and PPROM), of
   time points at sampling, and multi-omic platforms, makes this work one
   of the most comprehensive longitudinal omics studies in preterm birth.
   Importantly, this study provides omics data in a majority
   African-American cohort in which the rate of prematurity is higher than
   that observed in other populations[265]^80 and omics data are scarce.
   Data collected in diverse populations are needed since some
   disease-related molecular changes can be cohort specific, as was
   reported for other pregnancy complications such as
   preeclampsia.[266]^81 Finally, the work herein has resulted in
   computational algorithms with associated code made available to the
   community with an open-source license, allowing for reproducible
   research and applications to other similar research questions based on
   longitudinal omics data.

Limitations of the study

   Two possible limitations of the comparison between omics platforms are
   the lower sample size used to analyze the same blood draws and the much
   larger number of transcriptomic than proteomic features, which made the
   “needle in the haystack” problem more difficult for the transcriptomic
   platform. This curse of dimensionality was noted when transcriptomic
   and proteomic features were combined, resulting in a lower performance
   estimate for the multi-omics model obtained with the approach of team
   1 than for proteomics data alone. Although here the remedy to this
   issue was to combine the predictions of each platform into a meta-model
   (stacked generalization) ([267]Figure 7), alternative approaches focus
   on biologically plausible sets of features derived by single-cell
   genomics. This latter category of methods was demonstrated to predict
   preeclampsia[268]^79^,[269]^82 and to distinguish between women with
   spontaneous preterm labor and the gestational age-matched
   controls.[270]^49^,[271]^50 Another limitation of the study is that the
   RNA data collection was limited to genes present on the Human
   Transcriptome Array 2.0 microarray platform as opposed to
   sequencing-based methods that could provide a more comprehensive
   snapshot of the transcriptome.[272]^83

Consortia

   The DREAM Preterm Birth Prediction Challenge Consortium is listed below
   in alphabetical order, and author affiliations are available in
   [273]Table S9.

   Benan Bardak, Madhuchhanda Bhattacharjee, Michael Blair, Huiyuan Chen,
   Feng Cheng, Changje Cho, Junseok Choe, Mohit Choudhary, Yang Dai,
   Ophilia Daniel, Bikram K. Das, Francisco de Abreu e Lima, Anjali Dhall,
   Işıksu Ekşioğlu, Bogdan N. Gavrilovic, Akshay Gupta, Romeharsh Gupta,
   Rohan Gurve, Dániel Györffy, Eric D. Hill, Jinseub Hwang, Yuguang F.
   Ipsen, Rıza Işık, Priyansh Jain, Pratheepa Jeganathan, Sujae Jeong,
   Chan-Seok Jeong, Anshul Jha, JinZhu Jia, Jaewoo Kang, Hyojin Kang,
   Gaurang A. Karwande, Harpreet Kaur, Hannah Kim, Keonwoo Kim, Sunkyu
   Kim, Dohyang Kim, Junseok Kim, Jongtae Kim, Min-Jeong Kim, Amrit
   Koirala, Adriana N. König, Prachi Kothiyal, Vladimir B. Kovacevic,
   Aleksandra V. Kovacevic, Shiu Kumar, Chandrani Kumari, Christoph F.
   Kurz, Taeyong Kwon, Thuc D. Le, Kyeongjun Lee, Hyungyu Lee, Dawoon
   Leem, Shuya Li, Weng Khong Lim, Xinyue Liu, Yunan Luo, Bahattin C.
   Maral, Suyash Mishra, Yeongeun Nam, Leelavati Narlikar, Thin Nguyen,
   Zoran Obradovic, Hyeju Oh, Kousuke Onoue, Hyojung Paik, Wenchu Pan,
   Bogyu Park, Sumeet Patiyal, Jian Peng, Dimitri Perrin, Kaike Ping,
   Alidivinas Prusokas, Augustinas Prusokas, Peng Qiu, Gajendra P.S.
   Raghava, Derek Reiman, Renata Retkute, Nay Min Min Thaw Saw, Neelam
   Sharma, Alok Sharma, Ronesh Sharma, Rahul Siddharthan, Musalula
   Sinkala, Alex Soupir, Marija Stanojevic, Yufeng Su, Alexander M.
   Sutherland, András Szilágyi, Mehmet Tan, Nandor G. Than, Buu Truong,
   Edwin Vans, Fangping Wan, Rohan B.H. Williams, Wendy S.W. Wong, Jeong
   Woong, Li Xiaomei, Dongchan Yang, Sanghoo Yoon, Dakota York, James
   Young, and Wei Zhu.

STAR★Methods

Key resources table

   REAGENT or RESOURCE | SOURCE | IDENTIFIER
     __________________________________________________________________

   Biological samples
   Human whole blood and plasma samples | Perinatology Research Branch, an intramural program of the Eunice Kennedy Shriver NICHD, NIH, DHHS, Wayne State University (Detroit, MI, USA), and the Detroit Medical Center (Detroit, MI, USA) | N/A
     __________________________________________________________________

   Critical commercial assays
   GeneChip WT Pico Reagent Kit | Affymetrix (Thermo Fisher Scientific) | P/N 703262 Rev. 1
   Human Transcriptome Arrays (HTA 2.0) | Affymetrix (Thermo Fisher Scientific) | P/N 902162
   SOMAmer proteomic assays and profiling services (1,125 proteins) | SomaLogic, Inc. | Gene Expression Omnibus: [274]GPL28509
     __________________________________________________________________

   Deposited data
   Raw and preprocessed transcriptomics data | This paper | Gene Expression Omnibus: [275]GSE149440
   Raw and preprocessed proteomics data | This paper | Gene Expression Omnibus: [276]GSE150167
     __________________________________________________________________

   Software and algorithms
   oligo | Carvalho and Irizarry[277]^84 | [278]https://www.bioconductor.org/packages/release/bioc/html/oligo.html
   limma | Smyth[279]^85 | [280]https://www.bioconductor.org/packages/release/bioc/html/limma.html
   lme4 | Bates et al.[281]^34 | [282]https://cran.r-project.org/web/packages/lme4/index.html
   glmnet | Friedman et al.[283]^86 | [284]https://cran.r-project.org/web/packages/glmnet/index.html
   Cytoscape | Otasek et al.[285]^87 | [286]https://cytoscape.org/
   GOstats | Falcon and Gentleman[287]^88 | [288]https://bioconductor.org/packages/release/bioc/html/GOstats.html
   Predictive modeling; Sub-challenge 1, Team 1 | This paper | [289]https://www.synapse.org/#!Synapse:syn20684755
   Predictive modeling; Sub-challenge 2, Team 1 | This paper | [290]https://www.synapse.org/#!Synapse:syn21443858
   MSigDB curated gene sets | Liberzon et al.[291]^46 | [292]http://www.gsea-msigdb.org/gsea/msigdb/collections.jsp#C2
     __________________________________________________________________

   Other
   Calgary cohort transcriptomics data | Heng et al.[293]^31 | GEO: [294]GSE59491
   Resource website for the DREAM Preterm Birth Prediction Challenge, including data, software code, and vignettes | This paper | [295]https://www.synapse.org/pretermbirth

Resource availability

Lead contact

   Further information and requests for resources and reagents should be
   directed to and will be fulfilled by the lead contact, Adi L. Tarca
   ([297]atarca@med.wayne.edu).

Materials availability

   This study did not generate new unique reagents.

Data and code availability

   The accession numbers for the transcriptomic and proteomic data from the
   Detroit cohort described herein are Gene Expression Omnibus
   super-series [298]GSE149440 and [299]GSE150167, respectively. They were
   also submitted to the March of Dimes repository
   ([300]https://www.immport.org/shared/study/SDY1636).

   Analysis scripts for transcriptomic data preprocessing and for building
   prediction models based on the approaches of the participating teams in
   sub-challenges 1 and 2 are available from the Challenge website
   ([301]https://www.synapse.org/pretermbirth). Direct links to method
   write-ups and computer code for prediction of gestational age and
   preterm birth are also available in [302]Tables S2 and [303]S6,
   respectively. Moreover, R code vignettes demonstrating the use of
   participant methods and key post-challenge analyses are provided
   at [304]https://www.synapse.org/pretermbirth.

Experimental model and subject details

Human subjects, clinical specimens, and definitions

   Women who provided blood samples included in the transcriptomic (n =
   149) and proteomic (n = 105) studies described in the [305]Results
   section were enrolled in a prospective longitudinal study at the Center
   for Advanced Obstetrical Care and Research of the Perinatology Research
   Branch, NICHD/NIH/DHHS; the Detroit Medical Center; and the Wayne State
   University School of Medicine. Blood samples were collected at the time
   of prenatal visits, scheduled at four-week intervals from the first or
   early second trimester until delivery, during the following
   gestational-age intervals: 8 to <16 weeks, 16 to <24 weeks, 24 to
   <28 weeks, 28 to <32 weeks, 32 to <37 weeks, and >37 weeks. Collection of
   biological specimens and the ultrasound and clinical data was approved
   by the Institutional Review Boards of Wayne State University (WSU
   IRB#110605MP2F) and NICHD (OH97-CH-N067) under the protocol entitled
   “Biological Markers of Disease in the Prediction of Preterm Delivery,
   Preeclampsia and Intra-Uterine Growth Restriction: A Longitudinal
   Study.” Cases and controls were selected retrospectively and sample
   size was determined based on sample availability and cost of
   experiments.

   The first ultrasound scan during pregnancy was used to establish
   gestational age if this estimate was more than 7 days from the
   LMP-based gestational age. The first ultrasound scan was obtained
   before 14 weeks of gestation for 70% of the women, and 95% of the women
   underwent the first ultrasound before 20 weeks of gestation.
   Preeclampsia was defined as new-onset hypertension that developed after
   20 weeks of gestation (systolic or diastolic blood pressure ≥ 140 mm Hg
   and/or ≥ 90 mm Hg, respectively, measured on at least two occasions, 4
   hours to 1 week apart) and proteinuria (≥300 mg in a 24-hour urine
   collection, or two random urine specimens obtained 4 hours to 1 week
   apart containing ≥ 1+ by dipstick or one dipstick demonstrating ≥ 2+
   protein).[306]^89 Early preeclampsia was defined as preeclampsia
   diagnosed before 34 weeks of gestation, and late preeclampsia was
   defined by diagnosis at or after 34 weeks of gestation.[307]^90 The
   diagnosis of PPROM was determined by a sterile speculum examination
   with documentation of either vaginal pooling or a positive nitrazine or
   ferning test.[308]^91 Spontaneous preterm labor and delivery was
   defined as the spontaneous onset of labor with intact membranes and
   delivery occurring prior to the 37th week of gestation.[309]^92
   Demographic characteristics of the study population are summarized in
   [310]Table S1, and they are available for each individual patient in
   the GEO datasets (see [311]Data and code availability).
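
   The dating rule and the preeclampsia subtype cutoff stated in this
   paragraph can be written as two small functions. This is a minimal
   sketch: the function and parameter names are hypothetical, and it
   assumes that both gestational-age estimates refer to the same calendar
   date and are expressed in days.

```python
def gestational_age_days(lmp_ga_days: float, ultrasound_ga_days: float,
                         max_discrepancy_days: float = 7.0) -> float:
    """Return the ultrasound estimate when it differs from the LMP-based
    estimate by more than max_discrepancy_days; otherwise keep the LMP estimate."""
    if abs(ultrasound_ga_days - lmp_ga_days) > max_discrepancy_days:
        return ultrasound_ga_days
    return lmp_ga_days


def preeclampsia_subtype(ga_weeks_at_diagnosis: float) -> str:
    """'early' if diagnosed before 34 weeks of gestation, 'late' at or after."""
    return "early" if ga_weeks_at_diagnosis < 34 else "late"


# Example: a 10-day discrepancy switches dating to ultrasound; 5 days does not.
print(gestational_age_days(84, 94), gestational_age_days(84, 89))
print(preeclampsia_subtype(33.9), preeclampsia_subtype(34.0))
```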

Method details

Maternal whole blood transcriptomics

   RNA was isolated from PAXgene® Blood RNA collection tubes (BD
   Biosciences, San Jose, CA; Catalog #762165) and hybridized to GeneChip
   Human Transcriptome Arrays (HTA) 2.0 (P/N 902162), as previously
   described.[312]^35 Microarray experiments were carried out at the
   University of Michigan Advanced Genomics Core, a part of the Biomedical
   Research Core Facilities, Office of Research (Ann Arbor, MI, USA).

Maternal plasma proteomics

   Maternal plasma protein abundance was determined by using the SOMAmer
   (Slow Off-rate Modified Aptamer) platform and reagents to profile 1,125
   proteins.[313]^27^,[314]^44 Proteomic profiling services were provided
   by SomaLogic, Inc. (Boulder, CO, USA). As we previously
   described,[315]^21 plasma samples were diluted and then incubated with
   SOMAmer mixes pre-immobilized onto streptavidin-coated beads. The beads
   were washed to remove non-specifically bound proteins and other matrix
   constituents. Proteins that remained bound to their cognate SOMAmer
   reagents were tagged using an NHS-biotin reagent. After the labeling
   reaction, the beads were exposed to an anionic competitor solution that
   prevents non-specific interactions from reforming after disruption.
   Pure cognate-SOMAmer complexes and unbound (free) SOMAmer reagents were
   released from the streptavidin beads using ultraviolet light that
   cleaves the photo-cleavable linker used to quantitate protein. The
   photo-cleavage eluate, which contains all SOMAmer reagents (some bound
   to a biotin-labeled protein and some free), was separated from the
   beads and then incubated with a second streptavidin-coated bead that
   binds the biotin-labeled proteins and the biotin-labeled
   protein-SOMAmer complexes. The free SOMAmer reagents were then removed
   by washing. In the last elution step, protein-bound SOMAmer reagents
   were released from their cognate proteins using denaturing conditions.
   SOMAmer reagents were then quantified by hybridization to custom DNA
   microarrays. The Cyanine-3 signal from the SOMAmer reagent was detected
   on microarrays.[316]^27^,[317]^44

Sub-challenge 1 organization

   For sub-challenge 1, aimed at predicting gestational age at sampling
   from whole blood transcriptomic data in normal and complicated
   pregnancies, a training set and a test set were generated
   ([318]Figure S1; [319]Video S1). Transcriptomic gene expression data
   were made available to participants for both the training and test
   sets. Gestational age was provided for the training set and
   participants were required to submit predicted gestational-age values
   for the test set, which were compared in real time against the gold
   standard; the RMSE was posted to a leaderboard that was live from May
   22, 2019, to August 15, 2019. Up to five submissions per team were
   allowed; submissions were ranked by the RMSE, and the smallest value was
   retained as the entry for each unique team ([320]Table S1). Only the teams
   who described their approach and provided the analysis code were
   retained in the final team rankings.
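
   The scoring rule described above (RMSE against the gold standard, up
   to five submissions per team, smallest value retained) can be sketched
   as follows. The team names and predictions are invented for
   illustration, and how the live leaderboard handled submissions beyond
   the limit is an assumption (here they are ignored).

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed gestational ages."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def leaderboard(submissions: List[Tuple[str, List[float]]],
                gold: Sequence[float],
                max_per_team: int = 5) -> List[Tuple[str, float]]:
    """Keep each team's smallest RMSE and sort teams from best to worst.

    submissions is a list of (team, predictions) in submission order.
    """
    best: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for team, preds in submissions:
        counts[team] = counts.get(team, 0) + 1
        if counts[team] > max_per_team:
            continue  # assumption: entries beyond the allowed number are ignored
        score = rmse(preds, gold)
        if team not in best or score < best[team]:
            best[team] = score
    return sorted(best.items(), key=lambda item: item[1])


# Toy example: gestational ages in weeks for three test samples.
gold = [20.0, 30.0, 35.0]
board = leaderboard(
    [("A", [21.0, 29.0, 36.0]), ("B", [24.0, 30.0, 35.0]), ("A", [20.0, 30.0, 37.0])],
    gold,
)
print(board)
```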

Sub-challenge 2 organization

   In the first phase of sub-challenge 2, participants were invited to
   develop preterm birth prediction algorithms using longitudinal
   transcriptomic data collected from 17 to less than 37 weeks of
   gestation from women with a normal pregnancy and from cases of preterm
   birth (sPTD and PPROM), as illustrated in [321]Figure 2. The
   training set was composed of data from the Calgary cohort and a
   fraction of the Detroit cohort ([322]Figure 2), while the test set
   comprised the remainder of the Detroit cohort. Teams were requested to
   submit a risk value (probability) for all samples when classifying test
   samples as sPTD versus Control, and as PPROM versus Control. The AUROC
   and AUPRC were calculated separately for each prediction task and the
   ranks for each of the resulting four performance measures were
   calculated for each team and aggregated by summation. Two predictions
   per team were allowed and performance results on the test set were
   posted to a live leaderboard from August 15, 2019, to December 5, 2019.
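
   The rank aggregation described above can be sketched as follows. This
   is an illustrative sketch only: the metric values are invented, and the
   handling of ties (average rank) is an assumption, since the text does
   not state how ties were resolved.

```python
def ranks_desc(values):
    """Rank values so that the largest gets rank 1; ties share the average rank
    (tie handling is an assumption, not taken from the source)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def aggregate_ranks(scores):
    """scores maps team -> list of metric values (higher is better).
    Returns team -> sum of per-metric ranks (lower is better)."""
    teams = list(scores)
    n_metrics = len(scores[teams[0]])
    totals = dict.fromkeys(teams, 0.0)
    for m in range(n_metrics):
        metric_ranks = ranks_desc([scores[t][m] for t in teams])
        for team, r in zip(teams, metric_ranks):
            totals[team] += r
    return totals

# Hypothetical values, ordered as:
# AUROC sPTD, AUPRC sPTD, AUROC PPROM, AUPRC PPROM
scores = {
    "A": [0.70, 0.40, 0.65, 0.35],
    "B": [0.68, 0.45, 0.60, 0.30],
    "C": [0.60, 0.30, 0.66, 0.20],
}
totals = aggregate_ranks(scores)
```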

   Because the prediction models developed in the first phase of
   sub-challenge 2 could have captured possible differences between the
   cases and controls in the timing and number of samples, a second
   phase of sub-challenge 2 was organized (December 5, 2019, to
   January 3, 2020), for which teams were asked to provide prediction
   algorithms (computer code) instead of predictions for a given test
   dataset.

Quantification and statistical analysis

Transcriptomics data preprocessing

   Raw intensity data (CEL files) were generated from array images using
   the Affymetrix AGCC software. CEL files from this study and those for
   the Calgary cohort were preprocessed separately for each platform.
   ENTREZID gene level expression summaries were obtained with Robust
   Multi-array Average (RMA)[323]^93 implemented in the oligo
   package[324]^84 using suitable chip definition files from
   [325]http://brainarray.mbni.med.umich.edu. Since samples in the Detroit
   cohort were profiled in several batches, correction for potential batch
   effects was performed using the removeBatchEffect function of the
   limma[326]^85 package in Bioconductor.[327]^94 Cross-study/platform
   analyses were performed on a combined dataset after quantile
   normalizing data across all samples for the set of common genes,
   followed by platform effect-removal.
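
   For readers unfamiliar with quantile normalization, the following is a
   minimal sketch of the general technique, not the code used in this
   study (which relied on R/Bioconductor). The toy values are
   hypothetical, and ties are broken by gene order here, which is a
   simplification.

```python
def quantile_normalize(samples):
    """samples: list of equal-length lists, one per sample, genes in the same order.
    Every sample is mapped onto a shared reference distribution."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_genes)
    ]
    normalized_samples = []
    for s in samples:
        # Gene indices ordered from lowest to highest expression in this sample
        order = sorted(range(n_genes), key=lambda g: s[g])
        normalized = [0.0] * n_genes
        for rank, gene in enumerate(order):
            normalized[gene] = reference[rank]
        normalized_samples.append(normalized)
    return normalized_samples

# Hypothetical expression values: two samples, three genes
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(samples)
```

   After this step, every sample has the same set of values, so remaining
   differences between samples reflect only the rank order of the genes.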

Proteomics data preprocessing

   The protein abundance in relative fluorescence units was obtained by
   scanning the microarrays. A sample-by-sample adjustment in the overall
   signal within a single plate was performed in three steps per
   manufacturer’s protocol, as we previously described.[328]^21^,[329]^45
   Outlier values (larger than 2 × the 98^th percentile of all samples)
   were set to 2 × the 98^th percentile of all samples. Data were
   log2-transformed before applying machine learning and differential
   abundance analyses.
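
   The outlier capping and log transformation can be illustrated as
   follows. This sketch assumes that the percentile is computed per
   protein across samples and uses linear interpolation between order
   statistics; neither detail is specified in the text, and the toy values
   are hypothetical.

```python
import math

def percentile(values, q):
    """q-th percentile (0 to 100), linearly interpolated between order statistics."""
    s = sorted(values)
    position = (len(s) - 1) * q / 100
    lower = math.floor(position)
    upper = min(lower + 1, len(s) - 1)
    return s[lower] + (s[upper] - s[lower]) * (position - lower)

def cap_and_log2(rfu):
    """Cap values above 2 x the 98th percentile at that limit, then take log2.
    rfu: relative fluorescence units of one protein across samples (assumption)."""
    cap = 2 * percentile(rfu, 98)
    return [math.log2(min(value, cap)) for value in rfu]

# Hypothetical protein: 49 typical samples and one extreme value
rfu = [100.0] * 49 + [10000.0]
out = cap_and_log2(rfu)
```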

Sub-challenge 1 robustness analysis of team ranks

   To determine whether differences in gestational-age prediction accuracy
   between the different teams were robust, we simulated the
   challenge by drawing 1,000 bootstrap samples of the test set. RMSE
   values were calculated for each submission (1 to at most 5) for each
   team, and we retained the submission with the smallest RMSE. Team ranks
   were calculated and the Bayes factors were then calculated as the ratio
   between the number of iterations in which the team k performed better
   than the team ranked next (k+1) relative to the number of iterations
   when the reverse was true. A Bayes factor > 3 was considered a
   significant difference in ranking (see [330]Figure S1B).
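
   The bootstrap comparison of two adjacently ranked teams can be
   sketched as below. This is a simplified illustration rather than the
   analysis code: it compares only two teams, iterations with equal RMSE
   are ignored (an assumption), and the toy predictions are hypothetical.

```python
import math
import random

def rmse_on(pred, actual, idx):
    """RMSE restricted to the resampled test-set indices."""
    return math.sqrt(sum((pred[i] - actual[i]) ** 2 for i in idx) / len(idx))

def bayes_factor(subs_k, subs_next, actual, n_boot=1000, seed=0):
    """Ratio of bootstrap iterations won by team k to those won by the next team.
    subs_k and subs_next are lists of prediction vectors (1 to 5 submissions)."""
    rng = random.Random(seed)
    n = len(actual)
    wins_k = wins_next = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        best_k = min(rmse_on(p, actual, idx) for p in subs_k)
        best_next = min(rmse_on(p, actual, idx) for p in subs_next)
        if best_k < best_next:
            wins_k += 1
        elif best_next < best_k:
            wins_next += 1
    return wins_k / wins_next if wins_next else math.inf

# Hypothetical data: team k is off by 1 week everywhere, the next team by 2 weeks
actual = [float(week) for week in range(10, 40)]
team_k = [[a + 1.0 for a in actual]]
team_next = [[a + 2.0 for a in actual]]
bf = bayes_factor(team_k, team_next, actual)
```

   In this toy case team k wins every iteration, so the ratio is
   unbounded; with real submissions the two counts are usually both
   non-zero.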

Sub-challenge 1 top two algorithms

   Team 1: The first-ranked team in this sub-challenge (authors B.A.P. and
   I.C.) used gene-level expression data after filtering out samples
   considered as outliers, followed by the standardization of gene
   expression for each microarray experiment batch separately. Genes were
   ranked by using singular value decomposition, and those genes having
   higher dot products with singular vectors that correspond to large
   singular values across the training samples were assigned a higher
   score.

   In the next step, the ∼6000 top-ranked genes were selected; this number
   was chosen based on cross-validation results on the training set using
   a ridge regression model. Ridge regression[331]^95 models were fitted
   using the Sklearn package in Python (version 3).

   Team 2: The second-ranked team in this sub-challenge (author Y.G.)
   applied quantile normalization to gene level expression data, followed
   by the modeling of the gestational-age values using Gaussian Process
   Regression and Support Vector Regression. Model tuning parameters were
   optimized using a grid search, and predictions by the two approaches
   were weighted equally. Models were fit using Octave.

Sub-challenge 2 team ranking

   The algorithms submitted by participants in the final stage of
   sub-challenge 2 were applied as implemented by the participants without
   any tuning to the 70 pairs of training and test datasets described in
   [332]Figure 4A, [333]Table S5, and [334]Video S1. In each of the 7
   scenarios in [335]Figure 4A, there were 2 outcomes predicted (sPTD
   versus Control; and PPROM versus Control), except for proteomic data
   (scenario DP2), where the feasible comparisons were sPTD versus control
   and sPTB versus control; the sPTB group was defined as the union of
   sPTD and PPROM cases. The AUROC and AUPRC were used to assess
   predictions for each outcome based on predictions on each of the 10
   test sets for each scenario, and then were averaged over the 10 test
   sets. The resulting 28 prediction performance averages (7 scenarios × 2
   outcomes × 2 metrics) for each team were converted into Z-scores by
   subtracting the mean and dividing by the standard deviation of these
   metrics obtained from 1,000 random predictions (random uniform
   posterior probabilities). Further, only the combinations of scenarios
   and outcomes resulting in a significant prediction performance (False
   Discovery Rate-adjusted p value derived from Z-scores, q < 0.05) for at
   least one of the 13 teams were considered for team ranking, resulting
   in 20 performance criteria for each team. Teams were ranked by each of
   the 20 prediction performance criteria, and a final rank was generated
   based on the sum of the ranks over all criteria ([336]Table S6).
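
   The conversion of a performance metric into a Z-score relative to
   random predictions can be sketched as follows. This simplified example
   uses a single test set and the AUROC only, whereas the procedure above
   averages each metric over 10 test sets; the labels and scores are
   hypothetical.

```python
import random
import statistics

def auroc(labels, scores):
    """Probability that a random case scores higher than a random control
    (ties count as one half)."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def z_score(observed, labels, n_random=1000, seed=0):
    """Standardize an observed AUROC against AUROCs of random uniform predictions."""
    rng = random.Random(seed)
    null = [auroc(labels, [rng.random() for _ in labels]) for _ in range(n_random)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)

# Hypothetical test set: 10 cases, 20 controls, perfectly separated scores
labels = [1] * 10 + [0] * 20
scores = [0.9] * 10 + [0.1] * 20
observed = auroc(labels, scores)
z = z_score(observed, labels)
```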

Sub-challenge 2 robustness analysis of team ranks

   To assess the significance of the differences in prediction performance
   of preterm birth among the teams based on omics data, we used the same
   ranking procedure described above in more than 1,000 simulated
   iterations of the sub-challenge. At each iteration, the rankings were
   calculated by using prediction performance results that corresponded to
   a bootstrap sample of the 10 train/test pairs pertaining to each
   scenario and, at the same time, taking a bootstrap sample of the
   prediction criteria (columns in [337]Table S6). Bayes factors were then
   calculated as the ratio between the number of iterations in which the
   team k performed better than the team ranked next (k+1) relative to the
   number of iterations when the reverse was true. A Bayes factor > 3 was
   considered a significant difference among rankings ([338]Figure S3B).

Sub-challenge 2, the top three algorithms

   Team 1: The algorithm of the first-ranked team in this sub-challenge
   (authors B.A.P. and I.C.) starts with standardizing the input omics
   data so that they have a zero mean and a standard deviation of 1 for
   each omics platform (if more than one in an input set, which was the
   case while training and testing across the platforms). A random forest
   classifier with 100 trees was fit to each prediction task (sPTD versus
   Control and PPROM versus Control). The top 50 features, ranked by the
   importance metric derived from the random forest, were selected for
   each task separately and used to fit a final model on the training
   data. Random forest models were fitted using the Sklearn package in
   Python (version 3).

   Team 2: The approach of the second-ranked team in this sub-challenge
   (author Y.G.) first centers the data of each feature around the mean
   for each platform (if more than one) in a given input set. Then, the
   data are quantile normalized so that the distributions of feature
   values are identical across samples. Next, the top 50 features with
   the highest average over all samples are retained, and the feature
   values for the last two available time points for each subject are
   used as predictors (100 predictors) in a Gaussian Process Regression
   (GPR) model, a Bayesian non-parametric regression technique. The two
   GPR parameters were preset to an eye value of 0.75, which represents
   how much noise is assumed in the data, and a sigma of 10, a data
   normalization factor. Models were fitted using Octave.

   Team 3: The approach of the third-ranked team in this sub-challenge
   (author R.K.) starts with the selection of the top 50 features ranked
   by the p value derived from a t test or a Wilcoxon test, depending on
   the normality of the data as determined by a Shapiro test. Then, using
   the selected features, linear, sigmoid, and radial Support Vector
   Machine models are fitted and compared via 5-fold cross-validation,
   and the predictions of the best method are averaged over the five
   trained models. Models were fit using the e1071
   package[339]^96 in R.

Assessing significance of gestational age prediction

   The correlation between gestational ages predicted by the
   transcriptomics model of Team 1 in sub-challenge 1 (M_GA_Team1) and
   actual gestational ages at blood draw was assessed using a naive
   Pearson correlation test, but also via linear mixed-effects modeling to
   account for repeated measurements from patients in the test set. This
   latter analysis involved fitting a linear mixed-effects model in which
   the dependent variable was the transcriptomics predicted gestational
   age and the independent variable was the actual gestational age. The
   patient identifier was included as a random effect in this model. A
   likelihood ratio test implemented in the lme4 package[340]^34 was used
   to determine the significance of the linear relation between actual and
   transcriptomics-predicted gestational ages.

Identification of a core transcriptome predicting gestational age

   To identify a core transcriptome that can predict gestational age in
   normal and complicated pregnancies, linear mixed-effects models with
   splines were applied to prioritize genes that change with gestational
   age while accounting for the possible non-linear relation and for the
   repeated observations from each individual, as we previously
   described.[341]^35 Of note, participating teams could not have used
   such an approach given that sample-to-patient annotations were not
   provided for the training data. Then, the genes that did not change in
   average expression by at least 10% over the 10-40-week span were
   filtered out, and the remaining genes were ranked by p values from the
   linear mixed-effects models. The top 300 genes were then used as input
   in an elastic net regression model (mixing parameter alpha = 0.01) for
   which the shrinkage coefficient (lambda) was determined by
   cross-validation, leading to 249 genes with non-zero coefficients in
   the model ([342]Table S3). Of note, using more than 300 genes as input
   in this model did not further reduce the RMSE on the test set. Models
   were fit using the glmnet package[343]^86 in R.

Differential expression and abundance analyses

   Differences in gene expression or protein abundance between the cases
   and controls were assessed based on linear models implemented in the
   limma package[344]^97 in Bioconductor. When data across time points
   and/or cohorts were combined, these factors were included as fixed
   effects in the linear models. Downstream analyses of the differentially
   expressed genes involved enrichment analysis via a hypergeometric test
   implemented in the GOstats package[345]^88 to determine the
   over-representation of Gene Ontology[346]^98 biological processes among
   the significant genes. Additional enrichment analyses for both
   transcriptomics and proteomics platforms in sub-challenge 2 were based
   on a hypergeometric test with pathway definitions extracted from the C2
   collection of the MSigDB database. The C2 collection in MSigDB includes
   pathways from the Pathway Interaction Database,[347]^99 Kyoto
   Encyclopedia of Genes and Genomes,[348]^100 Reactome database,[349]^101
   and WikiPathways,[350]^102 among other sources. The background list in
   the enrichment analyses featured all genes profiled on the microarray
   platform. For proteomic-based enrichment analyses for sub-challenge 1,
   protein-to-gene annotations from the manufacturer (SomaLogic) were used
   as input in stringApp (version 1.5.0)[351]^103 in Cytoscape
   (version 3.7.2)[352]^87 using the whole genome as background. A false
   discovery rate adjusted q < 0.1 was used throughout enrichment analyses
   to infer significance. Networks of high-confidence protein-protein
   interactions (STRING confidence score > 0.7) were constructed from the
   lists of significant genes/proteins using stringApp in Cytoscape. For
   visualization, the most interconnected sub-networks were displayed and
   nodes were annotated to significantly enriched biological processes.
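
   The hypergeometric over-representation test used in the enrichment
   analyses can be illustrated with the following minimal sketch. It is
   not the GOstats implementation; the function name and the toy counts
   are hypothetical, and multiple-testing adjustment is omitted.

```python
from math import comb

def hypergeom_pvalue(n_background, n_in_set, n_significant, n_overlap):
    """Probability of observing at least n_overlap gene-set members among
    n_significant genes drawn without replacement from the background."""
    total = comb(n_background, n_significant)
    upper = min(n_in_set, n_significant)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 background genes, a 5-gene pathway,
# 5 significant genes, all of which fall in the pathway
p_all = hypergeom_pvalue(20, 5, 5, 5)
# With an overlap of zero, every outcome qualifies
p_none = hypergeom_pvalue(20, 5, 5, 0)
```

   The choice of background matters: a larger background (for example,
   the whole genome rather than the genes profiled on the platform) makes
   a given overlap appear more surprising and lowers the p value.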

Acknowledgments