Abstract Gene regulation is essential to placental function and fetal development. We built a genome-scale transcriptional regulatory network (TRN) of the human placenta using digital genomic footprinting and transcriptomic data. We integrated 475 transcriptomes and 12 DNase hypersensitivity datasets from placental samples to globally and quantitatively map transcription factor (TF)–target gene interactions. In an independent dataset, the TRN model predicted target gene expression with an out-of-sample R^2 greater than 0.25 for 73% of target genes. We performed siRNA knockdowns of four TFs and achieved concordance between the predicted gene targets in our TRN and differences in expression of knockdowns with an accuracy of >0.7 for three of the four TFs. Our final model contained 113,158 interactions across 391 TFs and 7712 target genes and is publicly available. We identified 29 TFs which were significantly enriched as regulators for genes previously associated with preterm birth, and eight of these TFs were decreased in preterm placentas. __________________________________________________________________ A genome-scale network predicts and summarizes gene expression in the placenta, a key tissue for fetal development. INTRODUCTION The placenta is an essential organ with multiple functions including regulating gas, nutrient, and waste transport and providing immunological defense ([74]1). It also performs endocrine functions, produces key hormones, and coordinates the passage of hormonal and paracrine signals between the mother and developing fetus ([75]2). Thus, the placenta influences fetal development, growth, and health ([76]3). This complex organ is composed of various types of trophoblast cells, including extravillous trophoblasts, cytotrophoblasts, and the syncytiotrophoblasts. Syncytiotrophoblasts are a large terminally differentiated, multinucleated cell type that is the primary site of nutrient uptake/transport as well as peptide and steroid hormone synthesis. The placenta is composed of additional cell types such as stromal cells, fibroblasts, endothelial cells, and various immune cells ([77]4). With its many cell types and highly specialized functions, the placenta has a unique transcriptome ([78]5) which includes numerous genes expressed exclusively within this tissue. The placental transcriptome changes throughout gestation ([79]6, [80]7) and in response to cues from the maternal environment ([81]8). In this way, the placental transcriptome reflects the underlying physiological functions of the organ. Characterizing transcriptomic differences between normal and pathological placental tissue can provide insight into the molecular mechanisms involved in placental pathophysiology. Perturbations to the placental transcriptome are linked to pregnancy pathologies including preterm birth ([82]9) and preeclampsia ([83]10). Previous studies of individual genes have revealed several temporally regulated and cell type–specific transcription factors (TFs) that play key roles in discrete developmental stages and functions of the placenta. Throughout pregnancy, TFs including GCM1, GATA2/3, and CREB emerge as unique regulators of specific trophoblast subtypes ([84]11). These TFs regulate key placental processes such as invasion and nutrient transport that may be linked to adverse placental function and prenatal complications. Further, understanding these transcriptional changes in normal and pathological placentas can provide insights into the molecular mechanisms underlying gestational pathophysiology. There is a large and active field of research by the genome science community involving the generation and analysis of transcriptomic data using transcriptional regulatory networks (TRNs) ([85]12–[86]15), which map TFs to the target genes they regulate ([87]16). A TRN is made up of nodes (TFs and their target genes) and edges, which explain which TFs regulate which target genes. We have defined TRN edges based on TF binding within regions of open chromatin and relationships in gene expression between TFs and target genes. TRNs are organized into units known as “regulons,” that are composed of each individual TF and the target genes they regulate. Our team has previously constructed TRNs by integrating information about chromatin accessibility captured through deoxyribonuclease I hypersensitive sites sequencing (DNase-seq) or Assay for transposase-accessible chromatin using sequencing and TF binding using sequence motifs from a compendium of TFs and RNA sequencing (RNA-seq) ([88]17). We have generated a similar network in the human brain ([89]18), which revealed TFs that were regulated by viral abundance and associated with Alzheimer’s disease ([90]19), TFs that drive gene expression changes associated with bipolar disorder and schizophrenia ([91]18), regulatory functions for single-nucleotide polymorphisms (SNPs) associated with genetic risks of these disorders ([92]18), and TF–target gene relationships underlying gene expression changes related to Huntington’s disease ([93]20). Our brain TRN model has demonstrated that quantifying transcriptional networks has the potential to illuminate the etiology of a wide variety of pathological disorders. In the present work, we construct a placental-specific TRN, based on its important role in fetal development and distinct transcriptional regulation compared to other tissues. Other network biology-based approaches have been previously deployed to gain insight into placental function and prenatal disease, such as Weighted gene coexpression network analysis (WGCNA). WGCNA is a network strategy that uses coexpression clustering to identify coregulated gene clusters based on correlated expression ([94]21). A WGCNA of the human placental transcriptome collected at delivery revealed gene clusters involved in core placental functions that were reproducible across trimesters (in mice) ([95]22). An independent WGCNA constructed on a large number of term placenta transcriptomes revealed modules associated with fetal growth restriction (FGR) ([96]23). TRNs are distinct from other network databases, such as the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) protein interaction database, which collects and integrates physical interactions and coexpression data with curated text mining of scientific literature, because this network is not tissue specific and does not include any information about transcriptional regulation ([97]24). The TRN is also distinct from tissue-specific networks such as the HumanBase database, which is a compendium of tissue-specific genome-scale functional networks created by integrating hundreds of publicly available RNA-seq datasets, and interaction networks defined using data-driven Bayesian methodologies ([98]25). Our placental TRN implements constraints of coexpression, TF binding, and chromatin accessibility using highly curated placental datasets, therefore is more conservative. Combining coexpression analyses with chromatin accessibility data is essential for a deeper understanding of the underlying transcriptional mechanisms of regulation. A placenta TRN constructed using DNase-seq data derived from first-trimester placental tissue has been used to model the transformation of embryonic stem cells to trophoblast cells, providing key insights into TFs which regulate genes with regions of open chromatin in this context ([99]26). An accurate model of the placental transcriptome at the end of gestation, enhanced with DNase-seq data, will provide valuable insight into TF–target gene interactions and serve as a tool for placental gene expression analyses. Our goal was to construct the first placental-specific genome scale TRN using DNase-seq data and curated TF binding motif databases paired with RNA-seq data compiled from 475 placental samples. We applied a machine learning approach to evaluate the accuracy of the model at predicting gene expression and performed experimental validation of transcriptional regulation of a core group of TFs in isolated villous cytotrophoblasts. Because previous transcriptomic studies identified fetal sex as a critical biological variable that affects the placental transcriptome ([100]27) and perinatal outcomes ([101]28), we also characterized sex-specific differences in TF–target gene interactions within the placenta. We hypothesized that our network could reveal core TFs that regulated genes whose placental expression was altered in the context of prematurity. To test this hypothesis, we reanalyzed a list of genes whose placental expression was previously associated with preterm birth in an independent analysis ([102]29) to identify the TFs that regulated these genes. We also identified specific TFs that regulated specific placental cell types, based on signatures from recently published single-cell RNA-seq data ([103]30). These examples demonstrate how this approach can be broadly applied to other types placental omics data (e.g., DNA methylation data), providing researchers a different way to interpret lists of candidate genes and identify transcriptional regulators. RESULTS Reconstruction of a transcriptional regulatory network model for the human placenta Our placental TRN was constructed using transcriptomic data generated from 475 placental samples collected in two independent birth cohorts: (i) participants from Magee Womens Research Institute (MWRI) in Pittsburgh, PA and at the University of Tennessee Medical Center in Memphis, TN; and (ii) participants from the Conditions Affecting Cognitive Development and Learning in Early Childhood (CANDLE) study. DNase-seq data were generated in 12 term human placentas collected as part of the Oregon Health and Sciences Placental Biorepository. Covariate information for all placental samples used within the model can be found in [104]Table 1. We excluded samples from pregnancies with medical indications for preterm delivery, as well as multifetal gestations to capture the placental transcriptome of infants without medically complicated deliveries. Overall, the demographics of RNA-seq samples in this dataset was 46% Black, 52% white, and 2% other race groups. There were some intrinsic differences in the demographic and clinical information between the cohorts. The CANDLE cohort predominantly self-identified as Black (56%); whereas the samples from MWRI predominantly self-identified as white (85%). The mean gestational age of the MWRI samples was 39.6 weeks, and the mean gestational age of the CANDLE cohort was 39.2 weeks. A subset (3%) of the samples in the CANDLE cohort were born spontaneously prematurely (<37 weeks), and we have previously published transcriptomic signatures from these spontaneous preterm deliveries ([105]29). These spontaneous preterm samples were included in the model to capture variation in the placental transcriptome related to gestational age. This is important for future studies that may use this model to study a variety of pregnancy pathologies and with samples from varying gestational ages. Table 1. Covariate data from cohorts used for RNA-seq and DNase-seq data. Exact values are not reported for cells with <5 participants. BMI, body mass index. MWRI–RNA-seq CANDLE–RNA-seq OHSU–DNase-seq Males (52) Females (52) Total (104) Males (176) Females (176) Total (371) Males (6) Females (6) Total (12) Delivery method (N) C-section 15 14 29 66 74 140 6 6 12 Vaginal 37 38 75 110 121 231 0 0 0 Maternal race (N) Black 7 4 11 99 110 209 0 <5 <5 White 41 47 88 75 82 157 5 5 10 Other <5 <5 5 <5 <5 5 <5 0 <5 Gestational age (weeks) Range 37.43–41.71 37–41.14 37–41.71 36–41.43 34.29–41.29 34.29–41.43 38.57–39.29 38.29–39.57 38.29–39.57 Median 39.64 39.86 39.86 39.29 39.14 39.29 39.07 39.29 39.14 Birth weight (grams) Range 2755–4801 2830–4348 2755–4801 2407–4685 2097–4410 2097–4685 3280–4167 3220–4465 3220–4465 Median 3520.5 3359.5 3477.5 3420 3250 3311 3545 3657.5 3565 Maternal BMI (kg/m^2) Range 17.33–36.94 18.19–37.57 17.38–37.57 14.2–52.4 14.2–56.8 14.2–56.8 19.3–25 18.3–25.27 18.3–25.27 Median 23.39 23.02 23.03 25.85 25.70 25.80 23.91 23.06 23.82 [106]Open in a new tab Model construction and evaluation in testing dataset To construct our genome scale TRN, genomic footprints derived from placental DNase-seq data ([107]Fig. 1A) were paired with known TF binding motifs ([108]Fig. 1B) to generate putative edges between TFs and target genes ([109]Fig. 1C). Prior work by our group demonstrated that the digital genomic footprinting workflow used in this analysis achieved a Matthews correlation coefficient of 0.44 when compared to chromatin immunoprecipitation sequencing (ChIP-seq) data in matched cell types ([110]17). These results indicated that footprinting data alone did not improve accuracy but did improve robustness of predictions. There is sparse ChIP-seq data for TFs available within placental tissue, which limits our ability to perform this type of calculation ([111]17). Instead, we rely on prediction of target gene expression in an independent holdout dataset to assess model accuracy. Next, we generated a coexpression network of TFs and target genes in our training dataset using these putative edges ([112]Fig. 1D). We then performed a series of Lasso regression models to evaluate how different parameters of the model influenced the overall model size and ability to predict gene expression in our test dataset ([113]Fig. 1E). We constructed six models using Lasso regression with different parameters for our training dataset, which are summarized in table S1. The accuracy of each model was calculated as the R^2 value of the correlation coefficient (R) between the actual and predicted gene expression in the dataset generated from Lasso regression (out of sample R^2 or OOS R^2). The full range of predictions for each individual gene is depicted in fig. S1. Fig. 1. Overview of data and strategy used to construct TRN. [114]Fig. 1. [115]Open in a new tab (A) First, regions of open chromatin, i.e., footprints are identified from placental DNase-seq data. (B) Concurrently, TF binding motifs were curated from publicly available databases. (C) These data were overlayed (C) to identify putative regulators (TFs) of each target gene. (D) We then generated a coexpression network within these putative TF–target gene relationships using RNA-seq data. (E) We evaluated different parameters to prune the model in the training data. Lasso regression was used with selected TFs for each target gene to predict expression, and then the actual versus predicted expression was used to evaluate the model. (F) The final model was tested in our holdout data to generate final accuracy for each gene. (G) Only target genes we could predict with high confidence were included in our final model. TSS, transcriptional start site. The effects of three parameter choices were evaluated: (i) including or excluding enhancers (model 1 versus model 2), (ii) changing the correlation cutoff (model 1 versus models 3 and 4), and (iii) changing the number of TFs (model 1 versus models 5 and 6) (see table S1 and [116]Fig. 1E). We compared these models to the “reference model” (model 1), where we used the median correlation coefficient and included only the top 15 TFs that were correlated with expression of each target gene in our model. Overall, the two parameters that altered the model the most were (i) varying the cutoff for the correlation coefficient and (ii) the number of TFs included in each model (fig. S1). There were minimal differences in the overall model size or accuracy based on varying the maximum number of TFs in the model. Models with a more stringent cutoff contained fewer TFs, target genes, and interactions (models 4 and 5). Model 4 used a higher correlation coefficient cutoff (correlation coefficient > |0.5|) and overall had the highest accuracy of all the models tested (98% of samples had OOS R^2 > 0.25 in test dataset). There was no significant difference in the OOS R^2 values between models constructed with and without enhancers (P = 1, Tukey test; see table S2). We constructed our final model with enhancers to provide complete coverage of potential transcriptional regulators. We elected to construct our final model using a correlation cutoff of 0.25, as this model was among the more comprehensive in terms of interactions and TFs and had the highest accuracy outside of model 4. Thus, only this one model was tested against the final hold out dataset (which was never included in the process of model building and selection), minimizing overfitting from model selection. We also generated a series of “null” models that we compared to our initial model (model 1) to estimate (i) how our TRN performed compared to the background null distribution and (ii) that the addition of TF binding sites and TFs with strong correlations with target genes improved the accuracy of our Lasso regression models. These three null models (described in more detail in table S3) included models in the presence or absence of TF motifs with and without correlation coefficients above the absolute value of the median correlation cutoff. We evaluated the accuracy of these models to predict expression in our test dataset again using Lasso regression, and we compared the results of these Lasso regression models to the results from model 1 (fig. S1). All three of these null models were significantly less accurate at predicting gene expression in the test dataset than any of our actual models (P < 0.05, Tukey test; table S2). The null models including TFs with weak correlations with their target genes (median correlation coefficient), had comparable accuracy to model 7 (OOS R^2, 0.2831) but was still significantly lower than all of the primary models (Tukey test; see table S2 and fig. S1). Computational validation of model accuracy in holdout and independent dataset After model parameterization in the training dataset, we then combined the training and testing dataset (N = 382) and used this as input data to calculate our final accuracy in the holdout dataset (N = 93), again using Lasso regression ([117]Fig. 1F). This step is critical as a guard against overfitting as this holdout dataset was completely separate and not used during the construction of the initial variations in the model and was selected randomly from samples at the beginning of the project (see [118]Fig. 1 and fig. S2). In the holdout data, 88% of TF–target gene correlations were statistically significant after adjusting for multiple comparisons [false discovery rate (FDR) <0.05], and 83% of TF–target gene relationships were highly correlated with a correlation coefficient of >|0.25 | ([119]Fig. 2A). Across all genes, Lasso regression using the TFs identified in our model predicted target gene expression with a mean OOS R^2 of 0.39, with 73.35% of target genes achieving an OOS R^2 > 0.25. We constructed our final TRN using only target genes which we were able to predict with an OOS R^2 > 0.25 in our holdout dataset, removing 2802 genes from our final model. All remaining target gene Lasso regression models were also statistically significant after adjusting for multiple comparisons (FDR adjusted P < 0.05). We separately evaluated the reproducibility of our model in an independent placental transcriptomic dataset from the CANDLE cohort (N = 366) that was generated after model construction had been completed. In this separate dataset, 94.7% of TF–target gene correlations were statistically significant after adjustment for multiple comparisons (FDR <0.05), and 72.34% of TF–target gene relationships were highly correlated with a correlation coefficient of >|0.25 | (see fig. S3). In this independent dataset, Lasso regression using the TFs predicted target gene expression with a mean OOS R^2 of 0.40, with 73.0% of target genes achieving an OOS R^2 of >0.25. Thus, results were comparable in the original holdout dataset and the newer independent dataset. Fig. 2. Accuracy of TRN model to predict target gene expression in holdout dataset. [120]Fig. 2. [121]Open in a new tab (A) Distribution of Pearson correlation coefficients in training + testing dataset (N = 382 samples, pink) and holdout dataset (N = 93, blue). Red lines indicate 0.25, which was the cutoff in the training dataset. (B) Distribution of out of sample accuracy (OOS R^2) for each of the 10,984 genes calculated in our holdout dataset. Light blue box indicates genes with OOS R^2 < 0.25, which were genes not included in our final model. Dark blue genes were high confidence OOS R^2. Model description Our final robust model contained 7712 genes, with a subset of this model including 3213 genes (41.7%) with OOS R^2 prediction accuracy in our holdout dataset >0.5 (calculated from Lasso regression; [122]Fig. 2B). Our final model was composed of 113,158 interactions between 391 TFs and 7712 target genes, including 75,014 positively correlated interactions and 38,144 negatively correlated interactions ([123]Fig. 3A). In our full model, the effect estimates (correlation coefficients) ranged from −0.75 to −0.25 for the negative regulators and 0.25 to 0.93 for the positive regulators. The median regulon size (target genes per TF) was 156 and was heavily tailed with 75 “hub” TFs which regulated >500 genes ([124]Fig. 3B). Network centrality measures are provided in table S4. The eight most central TFs in our model based on eigenvector centrality (measure of influence of node within a network for each TF) were SP1, SMAD4, SP3, CTCF, STAT5B, NR2C2, YBX1, and CLOCK. Our model contained 25 TFs that were placenta-specific or placenta enriched, based on definitions from the Human Protein Atlas ([125]31), including known placental transcriptional regulators such as GCM1, DLX3, DLX4, and DLX5 (see table S5). The largest placental enriched transcriptional regulator was TFAP2A, which regulated 1255 different target genes. Fig. 3. Summary of genome scale TRN. [126]Fig. 3. [127]Open in a new tab (A) Network diagram and (B) network connectivity of 391 TFs based on their target genes. Orange line indicates hub TFs with >500 genes. Top 15 TFs based on degree centrality are shown in green Experimental validation of model accuracy We performed an small interfering RNA (siRNA) knockdown of four TFs (GRHL2, CREB3L2, ESRRG, and GCM1) in primary villous trophoblast cells. These four genes were selected because they were characterized as “placenta-enriched genes” from the Human Protein Atlas ([128]31) and because they regulated >125 target genes in our model. TF knockdown was performed in seven female and six male samples (with the exception of GCM1, which had four male samples, and GRHL2, which had five male samples). Gene expression for predicted target genes was quantified through RNA-seq, and differentially expressed genes (DEGs) (unadjusted P < 0.05) are summarized in fig. S4. We generated model accuracy values by comparing number of instances of “true positives” and “true negatives” out of the total number of target genes based on the directionality of the knockdown (log fold change of the knockdown versus control) with the direction of the correlation coefficient (positive or negative) between TF and target gene ([129]Fig. 4), with the understanding that knockdown of TFs which were positive regulators in the model should result in decreased expression (negative log fold change), i.e., true positives, and knockdowns of TFs that were negative regulators in the model should result in increased expression (positive log fold change), i.e., true negatives. We achieved an accuracy of 0.96 for GRHL2, 0.80 for GCM1, 0.71 for ESRRG, and 0.32 for CREB3L2. Statistical significance was determined using a one-sided Fisher’s exact test (GRHL2, P = 5.65 × 10^−4; CREB3L3, P = 1; ESRRG, P = 7.09 × 10^−4; GCM1, P = 0.08). We also analyzed the predictive accuracy and significance of our model within the experimental data in a sex-stratified analysis. In the female siRNA knockdown samples, we achieved a stronger validation of GRHL2 (accuracy = 1, P = 0.015) and ESRGG (accuracy = 0.88, P = 0.018), and we generally had lower accuracy in male samples (fig. S5). Overall, our experimental knockdowns revealed that the TRN provided accurate insight into TF regulation for three of our four selected TFs. Fig. 4. Experimental validation for assessment of final model accuracy. [130]Fig. 4. [131]Open in a new tab Results of Experimental validation for (A) GRHL2, (B) CREB3L2, (C) ESRRG, and (D) GCM1. X axis represents the log fold change of knockout versus control, and the Y axis represents correlation between TF and target gene. Model accuracy was calculated as the number of positively regulated genes with a negative log fold change (i.e., true positives, shown in red) added to the number of negatively regulated target genes with a positive log fold change (i.e., true negatives, shown in blue) divided by the total number of target genes for each sample. Genes that were not concordant are shaded in gray. The size and shading of each target gene represent the relative rank (1 to 15) of the TF as a regulator in our TRN. Identification of sex-specific differences in transcriptional regulation We examined effect modification by fetal sex in all TF–target gene interactions captured in our model by running separate linear regression models between each target gene (dependent variable) and TF (independent variable) with fetal sex as a statistical interaction term. Interactions were considered statistically significant based on the FDR of the interaction term <0.05. We also calculated the difference in effect estimates in these models by performing sex-stratified linear regression models in males (N = 247) and females (N = 228). We observed 5361 TF–target gene relationships between 254 TFs and 1996 target genes that were significantly different between male and females (FDR adjusted, <0.05; table S6). Six of these TFs and 81 of the target genes were specifically located on the X chromosome, and the rest of the interactions were within autosomal regions. The TFs with the biggest sex-specific differences were ELK3, SP11, ETV5, ETS2, and TWIST1. We compared the difference in estimates between males and females and the direction of the estimates between the TF–target gene relationships. We observed four distinct types of sex-specific interactions, as shown in [132]Fig. 5. We found that 291 positive TF–target gene relationships were stronger in males than in females (quadrant 1), 3681 positive TF–target gene relationships were stronger in females than in males (quadrant 2), 1261 negative TF–target gene relationships were stronger in females than in males (quadrant 3), and 120 negative TF–target gene relationships that were stronger in males than in females. (quadrant 4) The top three biggest differences for each quadrant are depicted in fig. S6. Overall, these statistically significant interaction terms indicate global effect modification by fetal sex and suggest differences in regulation of gene expression between female and male placentas that can be identified in our robust model. Fig. 5. Plot depicting the 5361 sex-specific interactions in the placental TRN, divided into four distinct interaction categories. [133]Fig. 5. [134]Open in a new tab The x axis represents TF–target gene estimate values from linear regression analysis in the whole dataset. The y axis represents the difference in estimates between male and female samples. Examination of TFs enriched as regulators of specific placental cell types Next, we integrated our TRN with recently published placental cell type signatures derived from single-cell RNA-seq data ([135]30) to provide additional insight into how individual TFs may regulate specific placental cell types with different roles in placental function. Campbell et al. ([136]30) identified 19 fetal placental cell types from single-cell RNA-seq data, and we used the cell type signatures derived from Seurat (presented in Campbell et al.’s supplementary table 2) for TF enrichment tests. We identified enriched TFs using overrepresentation analysis via hypergeometric tests, with the regulons used in lieu of a gene ontology (GO) or Kyoto Encyclopedia of Genes and Genomes (KEGG) gene set. Overall, there were 159 TFs that were enriched for these cell type markers in one or more placental cell line (FDR < 0.05, Fisher’s exact test), which are presented visually in [137]Fig. 6. We observed a pattern of distinct TFs regulating different types of immune subpopulations (CD14^+ monocytes, Hoffbauer cells, and granzyme B (GZMB) natural killer cells) and a different subset of TFs regulating trophoblast-derived cell types (cytotrophoblasts and proliferative cytotrophoblasts). Fibroblasts and proliferative cytotrophoblast cell signatures had the highest number of enriched TFs (38 TFs). Fourteen TFs were enriched for both cytotrophoblasts and proliferative cytotrophoblasts. Extravillous trophoblasts (EVTs) and syncytiotrophoblasts were generally enriched for different TFs, with the exception of the TF CEBPA. Syncytiotrophoblasts had the highest number of unique TFs ([138]22), and the placental-specific TF TWIST1 was the most significantly enriched TF in this cell type. EVTs had the second highest number of uniquely enriched TFs ([139]21), and the most enriched TFs for EVTs was the placenta-specific TF ASCL2. Fibroblasts and proliferative cytotrophoblast cell signatures had the highest number of enriched TFs (38 TFs). These results may be related to the underlying proportions of cell types present in our RNA-seq data. Fig. 6. Heatmap of all TFs enriched for one or more placental cell type signature as defined by Campbell et al. ([140]30), based on overrepresentation-based analysis (FDR <0.05; Fisher’s exact test). [141]Fig. 6. [142]Open in a new tab The color represents the proportion of total TF target genes that were cell type signatures, and we report P values adjusted for FDR. NK cells, natural killer cells. Applying the placental TRN to discover transcriptional regulators of genes associated with spontaneous preterm birth Ultimately, the goal of the placental TRN is to gain insight into transcriptional regulation in the context of prenatal exposures or placental pathologies. As a proof of concept, we used the placental TRN to identify key TFs that regulated the group of 961 genes whose placental gene expression was associated with preterm birth, which we established in an independent study of spontaneous preterm birth (sPTB) using clinical definitions and adjustment for confounding variables ([143]29). These 961 genes were regulated by a total of 301 TFs. Twenty-nine TFs were significantly enriched for these preterm birth associated genes (FDR < 0.05, Fisher’s exact test; see table S7). Together, these 29 TFs were responsible for regulation of 47.55% of the differences in expression observed. In addition, 31 of the 961 transcriptomic signature were TFs that regulated downstream genes that were also significantly associated with preterm birth. [144]Table 2 lists the eight TFs that were both decreased in relation to preterm birth [FDR < 0.05 ([145]29)] and were enriched for PTB genes (FDR < 0.05, Fisher’s exact test). A network of these TF–target gene relationships revealed that these eight TFs together positively regulated 304 genes that were also deceased in preterm placentas and negatively regulated 30 genes that were increased in preterm placentas ([146]Fig. 7). Of note, the TFs SPI1 and RUNX1 emerged as upstream regulators of these TFs based on hierarchical network mapping. These data demonstrate that our placental TRN can be used to identify regulatory pathways driving differential gene expression in placentas from preterm births. Table 2. TFs significantly associated with sPTB and enriched for genes associated with sPTB. FC, fold change. Association with sPTB* Enrichment test results TF Log FC FDR adjusted P N target Genes N target genes Associated with sPTB Proportion of PTB target genes FDR^† SPI1 −0.61 8.92 × 10^−04 996 208 0.21 2.23 × 10^−14 KLF9 −0.53 3.60 × 10^−03 888 172 0.19 7.61 × 10^−09 ETS2 −0.29 6.78 × 10^−03 904 160 0.18 1.25 × 10^−05 RUNX1 −0.32 3.84 × 10^−02 650 133 0.20 2.70 × 10^−08 RARA −0.43 4.72 × 10^−04 479 127 0.26 3.48 × 10^−16 JDP2 −0.3 2.59 × 10^−02 434 98 0.23 2.70 × 10^−08 FOXO1 −0.28 4.90 × 10^−02 402 89 0.22 3.96 × 10^−07 IRF9 −0.26 1.26 × 10^−02 170 46 0.27 2.87 × 10^−06 [147]Open in a new tab *Results described in ([148]29). †P values were adjusted using the Benjamini Hochberg method to calculate the FDR. Fig. 7. TRN subnetwork depicting enriched TFs (P < 0.1; Fisher’s exact test) and their corresponding genes that were previously associated with preterm birth (PT). [149]Fig. 7. [150]Open in a new tab Diamonds are TFs. Color of gene indicates directionality of log fold change in association with preterm birth. Placental-specific TFs, defined by the Human Protein Atlas ([151]31), are outlined in green. Color of edge indicates correlation between TF and target gene, and the number represents correlation coefficient. Transcriptional regulation between enriched TFs is highlighted by red dotted edges. Applications of the placental TRN Placental omics data can be used to quantify perturbations that influence gene regulation and physiological activity, providing insight into the underlying etiology related to outcomes or exposures ([152]32). Many researchers studying the placenta have used omics data to generate lists of genes which are associated with prenatal exposures as well as infant or child health outcomes, as summarized in our recent review ([153]33). These genes are often contextualized using pathway analysis approaches within KEGG or GO gene sets. The TF enrichment analysis we have performed using regulons is complementary to using other defined gene sets such as KEGG or GO. TRNs are focused on the question: “What TFs regulate these genes within the placenta?” There are other applications that provide an opportunity to query TF binding, such as Chip-X enrichment analysis ([154]34), which are based on aggregation of ChIP-seq data that are not tissue or cell type specific. Our TRN provides a placental tissue–specific version of this information that can be specifically queried to investigate individual TFs or target genes of interest or to perform an overrepresentation analysis, akin to the analysis described above using genes associated with premature birth. We have presented the full results of the genome-scale model in the format of a searchable web tool, PlacentalTRN, which is available at [155]https://paquettelab.org/placentaltrn. This model has 3 features: (i) a target gene search feature, (ii) a TF search feature, and (iii) an enrichment test. The search tools allow users to search the model for specific genes or specific TFs and evaluate model accuracy and predicted targets on a case-by-case basis. The enrichment test tool allows users to import a list of genes (such as the list of preterm birth genes presented here) and identify TFs which regulate these genes. For those interested in accessing the TRN independently of the web interface, we provide the entire TRN in table S8. Our intention is that other placental biology researchers will be able to easily access the TRN and identify transcriptional regulators of their own gene lists of interest by using this database. DISCUSSION In this study, we generated a placenta-specific TRN using established genomic footprinting techniques ([156]35) and high dimensional RNA-seq data from 475 placental transcriptomes. Our network captures 113,158 interactions between 391 TFs and 7712 target genes. Our model was validated against a separate transcriptomic dataset and follow-up experiments for several key regulators. Our study revealed that the TFs SP1, ZNF148, SP3, SMAD4, and CLOCK were master transcriptional regulators of placental gene expression within the TRN. Our study also identified 25 TFs unique to the placenta, with TFAP2A emerging as the largest regulon, and highlighted a subset of 5361 TF–target gene interactions that displayed sexual dimorphisms between male and female samples, revealing more transcriptional regulation occurring within female placentas. We used our TRN to identify specific TFs that were enriched as regulators of signatures of specific placental cell types. Last, we applied our model to identify 29 TFs that were enriched as regulators of genes whose placental expression was previously associated with preterm birth, including eight TFs that were also decreased in preterm birth, which may indicate clinically modifiable targets. This model captures genomic regulation from placentas collected at the end of pregnancy, which is generally when placental samples are readily available and typically collected for large cohort studies. To facilitate additional studies, we have made our model publicly available as a web accessible resource ([157]https://paquettelab.org/placentaltrn/). The major advancement of this study is the generation of a large and sophisticated TRN model using DNase-seq data within key placental tissues at term and the validation of this model using gene expression data captured within a large and diverse cohort. Our model also complements other models of placental transcription constructed using gene coexpression network analysis approaches but provides individual gene level resolution, which enhances interpretability. An independent gene coexpression network of the human placenta constructed using WGCNA on transcriptomic data revealed five hub modules associated with fetal growth restriction (FGR) ([158]23). One of the core genes of the module associated with both FGR and fetal overgrowth was CREB3, which was a hub TF that regulated 738 target genes within our model. Our model builds upon this by inclusion of information about chromatin accessibility using DNase-seq. DNase-seq data (without coexpression analysis) have been used to construct a gene regulatory model of early placental transcriptional regulation ([159]26). This study mapped genes involved in differentiation of human embryonic stem cells to trophoblast cells in mice and identified 20 key TFs involved in placental differentiation through centrality analysis ([160]26). Many of the reported genes in this study that were linked to placental development are hub TFs in our model, including SMAD4, ETS1, RUNX1, CREB1, SMAD3, PPARA, and JUND. We did not include the subset of established TFs without motifs as defined by Lambert et al. ([161]36). We surmised that there were complex mechanisms by which these TFs act to regulate gene expression (including acting as cofactors). Comparison of the accuracy of null models to our primary model highlights that constraining models based on TF motifs within regions of open chromatin and strong TF–target gene correlations results in higher accuracy of TFs at predicting target gene expression compared to models with weak correlations and/or no binding sites. Our model also captures regulation of key TFs identified in other studies from human placental data collected at term. Another WGCNA-based analysis revealed that a large proportion of the placental transcriptome is organized into modules of coexpressed genes that were involved in core placental functions and were reproducible across trimesters ([162]22). Our model (generated at the end of gestation) is concordant with previously generated network-based analyses earlier in gestation, suggesting that these TFs are essential not only to placental development but also to placental function across gestation. However, there are instances where established TF–target gene interactions were not included in our network. For example, several studies in placental cell lines have demonstrated that TF GCM1 is a positive regulator of the PGF gene (encoding placental growth factor) ([163]37, [164]38), and while we identified binding sites and a positive correlation in our initial model, this relationship was not strong enough to be included in the final model based on our criteria. Thus, it is important to recognize that not all TF–target gene interactions are captured in our network, due to type II error. We note that PGF and GCM1 relationships are also not captured in other network tools such as HumanBase or the STRING database, indicating that this is a common issue with network approaches. We elected to use Lasso regression–based approaches for each target gene individually, as opposed to clustering-based approaches, for enhanced interpretability and decreased chances of capturing indirect and false-positive interactions ([165]39). WGCNA and clustering-based approaches may be more appropriate in other contexts, and researchers can apply our TRN to their own coexpression clusters by using web application to identify regulons that are enriched for their lists of coexpressed genes. Many of the highest-ranking core TFs in our model, including SP1, SP3, SMAD4, and TFAP2A, are essential regulators of placental function. SP1 and SP3 had the highest eigenvector and degree centrality in our model. SP TFs are involved in placental glucocorticoid signaling ([166]40), and placental expression of these TFs has also been linked to preeclampsia ([167]41). SMAD4 had the second highest eigenvector centrality value and is a signaling protein in the transforming growth factor (TGF) family which is activated in response to TGF-β. This signaling pathway plays a crucial role in regulation of trophoblast proliferation, differentiation, and invasion, as was reviewed by Adu-Gyamfi et al. ([168]42). TFAP2A regulated 1255 target genes and was the largest placental-specific TF based on both eigenvector centrality and degree centrality. TFAP2A encodes the activating enhancer binding protein 2-alpha (AP-2α), a TF that is involved in several key processes unique to the placenta ([169]43). TFAP2A expression is detectable throughout the placenta, including extravillous cytotrophoblasts, cytotrophoblasts, and syncytiotrophoblasts. AP-2α promotes the epidermal growth factor–dependent invasion of the human trophoblast ([170]44) and induces human CRH gene expression ([171]45). We were able to recapitulate some of these published findings of TFAP2A regulation. In another study, the TFs specificity protein 1 (SP1) and specificity protein 3 (SP3) regulate expression of TFAP2A in trophoblast cells ([172]46). In our model, SP1 and SP3 were included in the top 15 TFs regulating the expression of TFAP2A. TFAP2A is a positive regulator of CRH in our model, in alignment with work showing that AP-2 induced CRH gene expression ([173]45). Thus, our TRN reveals the significance of this TF for placental function as well as some core TFs that regulate many genes within the placenta. Sex differences between male and female placentas may contribute in part to differences in growth rates of male and female fetuses and differential responses to perturbations in the in utero environment and manifestation of disease later in life ([174]47). In our analysis, we identified 5361 TF–target gene relationships with effect modification based on fetal sex. Many of these TF–target gene differences may be related to hormonal changes, as we observed several differences related to nuclear hormone receptors that are responsive to sex hormones, including ESR1, ESSRA, ESRRG, and PGR. We observed a general trend in our interaction terms where both negative and positive TF–target gene relationships were stronger in female placentas compared to male placentas. An aggregated meta-analysis of microarray data revealed 140 differentially expressed genes in male and females samples, with the majority residing in autosomal regions ([175]27). This meta-analysis also identified several TFs that were enriched for these genes, including NFATC2, AHR, NKX3, HIF1A, and REL ([176]27). A number of these TFs were associated in our analysis with sex differences in expression of target genes, such as the AHR, which was associated with 111 TF–target gene relationships that were significantly different between males and females. On the basis of these findings, we hypothesize that there may be stronger transcriptional control of gene expression in female compared to male placentas. These differences may be driven by sex-specific differential response to stress hormones, environmental chemicals, and nutrition, as reviewed by Rosenfeld ([177]48). This is important because there is variation in pregnancy outcomes related to fetal sex ([178]49), such as an increased rate of preterm birth in male infants ([179]50). Further analysis needs to be performed on these relationships, including factors that modify these relationships and the role of genomic imprinting. We used our TRN to perform an enrichment analysis and identify specific TFs that regulated transcriptomic signatures attributed to 19 specific fetal placental cell types identified by Campbell et al. ([180]30). Our analysis revealed unique TFs that were enriched as regulators of different types of trophoblast cells. This is likely related to the specific regulatory roles of trophoblast TFs that are unique to specific lineages of placental cells as summarized by Knofler et al. ([181]51). For example, ASCL2 was the most significantly enriched TF for extravillous trophoblasts in our model, and this TF is known to be a key regulatory TF unique to extravillous trophoblasts and their progenitors ([182]51). The TF TWIST1 is involved in the process of trophoblast syncytialization ([183]52), and we found this TF to be uniquely enriched in synctiotrophoblasts in our samples. Overall, this enrichment analysis using the regulon models provided additional insight into the regulatory roles of specific TFs within individual placental cell types. To this end, we used the TRN to identify 29 different TFs enriched for genes associated with spontaneous preterm birth identified in an independent study ([184]29), including eight TFs (SPI1, KLF9, ETS2, RUNX1, RARA, JDP2, FOXO1, and IRF9) whose placental gene expression decreased in preterm placentas compared to term placentas. Developmental biology studies have revealed that these TFs play crucial roles in placental development. Both FOXO1 and KLF9 have been linked to the process of decidualization ([185]53, [186]54), which forms the basis of the maternal-fetal interface. RUNX1 and RARA are required for trophoblast differentiation ([187]55, [188]56), and ETS2 is essential for trophoblast cell renewal ([189]57). This body of work suggests that the network level changes we observed, including decreased expression of these TFs and their downstream genes, may reflect foundational differences in the development of the placenta in preterm infants. TFs that were decreased in preterm placentas, including FOXO1, JDP2, and KLF9, were involved in transcriptional regulation of the progesterone receptor (encoded by PGR). Progesterone is essential for the maintenance of pregnancy and timing of parturition ([190]58). Kruppel-like TF 9 (encoded by KLF9) influences the length of parturition through regulation of PGR in myometrial tissue in mice ([191]59), and this relationship was confirmed in our placental TRN as well. Forkhead Box O1 (encoded by FOXO1) is regulated by progesterone in myometrial cells ([192]60), and PGR also regulated the FOXO1 TF in our TRN. This body of work confirms the important regulatory role of progesterone in preterm birth and indicates specific mechanisms of transcriptional regulation. Several of these TFs have also previously been linked to preterm birth in other studies. Both FOXO1 and ETS2 were differentially expressed in preterm placentas compared to term placental control samples in a case-control transcriptomic study ([193]61). Another network biology–based study that used a protein-protein interaction network identified RUNX1 as regulator of genes containing SNPS associated with sPTB ([194]62). The retinoic acid receptor RARA was decreased in sPTB placentas and regulated 127 downstream genes that were also decreased in preterm placentas in our TRN. It is well established that serum levels of retinoic acid are decreased in preterm placentas ([195]63), which has been confirmed in a recent metabolomics study ([196]64). Ultimately, the 29 TFs that regulated sPTB genes could also represent potential clinical targets for preterm birth, as we reveal that downstream gene expression differences related to preterm birth can be regulated by these TFs. For example, our analysis revealed that several TFs involved in progesterone signaling decreased in sPTB, as well as the retinoic acid receptor. There are ongoing clinical studies investigating the use of progesterone ([197]65–[198]67) and retinoic acid ([199]68, [200]69) in the treatment and prevention of preterm birth. TFs are attractive druggable targets in the context of disease and have broadly been used in other diseases including cancer and autoimmune disease ([201]70). Our model of placental transcriptional regulation should be interpreted with limitations in mind. We use TF expression as a proxy for TF activity, which does not account for posttranscriptional or posttranslational modifications. Common to many similar studies, our results may be confounded due temporal and spatial disconnects, cellular heterogeneity, and differences in proliferation ([202]71). Our results may be confounded due to cellular heterogeneity, as TRNs reflect epigenetic regulation dependent on regions of open chromatin and DNA methylation which is intrinsically cell but not tissue type specific. Our results are constructed from bulk RNA-seq of placental tissue which is a composite of different cell types. This can create false-positive relationships and dilute signal strengths between known TFs and target genes. For each target gene, we present only the top 15 TFs that are best predicted to regulate expression. Thus, we do not represent all the TFs with binding sites within that region, contributing to type II error in our models. Our model was constrained to regions of open chromatin identified through DNase-seq. One limitation of this assay is the potential for DNase cleavage bias, which can obscure specific genomic locations and would also lead to type II error in our analysis ([203]72). Our analyses are composed of data solely from placentas at the end of gestation from relatively nonpathological pregnancies without preeclampsia and so may be inappropriate for researchers studying earlier developmental time points and also may not represent perturbed TF–target gene interactions related to those pathologies. Our transcriptome data were generated from placentas of infants delivered either vaginally or through cesarean section, and our DNase-seq data were collected from placentas after cesarean section. The influence of labor and delivery method on DNase-seq data are not well studied, but there are known influences on the placental transcriptome ([204]73). It is essential that studies investigating placental transcriptional differences consider the role of labor type and delivery method as a potential confounding variable before downstream network analysis. We applied the widely used ComBat algorithm ([205]74) to adjust for batch effects within our two datasets, which may remove biological variation in the data, so it is possible that the TF–target gene relationships we present in our models may have type II error. As always, it will be important to further test the model in an entirely separate validation cohort collected from different participants as they become available. In the future, we hope to build upon this model by including temporal resolution of time series changes that occur in the placental transcriptome and to apply to this model to other clinically relevant endpoints such as preeclampsia or gestational diabetes with specially designed cohorts designed to capture transcriptomic signatures in these populations. Our model makes an important contribution to the field of placental biology as the first publicly available placental TRN, as well as one of the first models of transcriptional regulation of any organ with an accessible and searchable database that allows researchers to perform their own analysis. We applied rigorous methods to test the accuracy of our model, including an independent validation dataset which is the gold standard in machine learning ([206]75, [207]76). To our knowledge, our model is composed of the largest placental transcriptomic dataset constructed to date. We also provide an innovative assessment of sex differences in TF–target gene relationships in our model, which further captures the importance of fetal sex as a biological variable in placental omics research. Last, we have tested our TRN using isolated villous cytotrophoblasts collected from placentas of nonpathological women delivering at term, which are more reflective of the term placental transcriptome than commercial placental cell lines which are commonly used for transcriptomic analyses. Our data could potentially be used in the future to provide insight into transcriptional mechanisms related to pregnancy complications and identify pharmacological interventions to target specific TFs that alter placental function. MATERIALS AND METHODS Sample information To construct the TRN, we integrated transcriptomic and chromatin accessibility data from placental tissue generated from three different sources as described below. MWRI sample Tissue was collected from nonpathological placentas from pregnant individuals who delivered at the Magee Womens Hospital in Pittsburgh, PA. Inclusion criteria included singleton pregnancies with signed informed consent and gestational age ≥ 37 weeks. Exclusion criteria included pregnancy complications including FGR, chromosomal abnormalities, preeclampsia, gestational diabetes, chorioamnionitis, antepartum hemorrhage, and a previous history of preterm birth. Placental biopsies (5 mm^3) were sharply dissected within 30 to 40 min after delivery from a region of the placenta midway between the cord insertion and the placental margin between the chorionic and basal plates and free of any macroscopic lesions, as we previously detailed ([208]77). Tissue biopsies were placed in RNAlater for 48 hours in 4°C and then stored in −80°C until processed for RNA-seq. Directional RNA-seq was performed by the Genomics, Epigenomics, and Sequencing Core at the University of Cincinnati. Approximately 50 mg tissue was homogenized in 0.8 ml lysis/binding buffer from the mirVana miRNA Isolation Kit (Thermo Fisher Scientific, Grand Island, NY) by using FastPrep-24 5G homogenizer (MP Biomedicals, Solon, OH). Total RNA extraction was conducted according to the manufacturer’s protocol. RNA quality was examined on an Agilent 2100 Bioanalyzer (Santa Clara, CA). Ribosomal RNA was depleted using the Ribo-Zero Gold kit (Illumina, San Diego, CA) with 1 μg of total RNA as input for the SMARTer Apollo NGS library prep system (Takara, Mountain View, CA). The library for RNA-seq was prepared by using the NEBNext Ultra II Directional RNA Library Prep kit (New England BioLabs, Ipswich, MA), and the sequencing was run on an Illumina HighSeq2000 with paired end reads, 101-bp read length, and at 50 million read depth. CANDLE samples Placental tissue was collected from nonpathological placental tissue from women from Shelby County Tennessee as part of the CANDLE study. RNA-seq data were generated as part of the ECHO-PATHWAYS consortium ([209]78). Inclusion criteria included singleton pregnancies enrolled in CANDLE with signed informed consent. Exclusion criteria included infants categorized as having severe pregnancy complications recorded in the CANDLE medical records, including confirmed clinical chorioamnionitis, preeclampsia, oligohydramnios, placental abruption, infraction or previa, and fetal chromosomal abnormalities. Methods of placental sample collection and RNA-seq processing for this cohort have been previously described ([210]79). Briefly, within 15 min of delivery, a piece of placental villous tissue approximately 2 cm by 0.5 cm by 0.5 cm was dissected from the placental parenchyma. The tissue cubes were stored in 20 ml of RNAlater and refrigerated at 4°C overnight (at least 8 hours but no more than 24 hours). Following this, each cube was transferred to an individual cryovial containing fresh RNAlater and stored at −80°C. The fetal villous tissue was manually dissected and cleared of maternal decidua. RNA was isolated using the AllPrep DNA/RNA/miRNA Universal Kit (Qiagen, Germantown, MD) according to the manufacturer’s recommended protocol. RNA purity was assessed by measuring optical density at 260/230 nm (OD[260/230]) and OD[260/260] ratios with a NanoDrop 8000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA). RNA integrity was determined with a Bioanalyzer 2100 using RNA 6000 Nanochips (Agilent, Santa Clara, CA). cDNA libraries were prepared from 1 μg of total RNA using the TruSeq Stranded mRNA kit (Illumina, San Diego, CA) and the Sciclone NGSx Workstation (PerkinElmer, Waltham, MA). Before cDNA library construction, ribosomal RNA was removed by means of polyadenylation (Poly-A) enrichment. Each library was uniquely barcoded and subsequently amplified using a total of 13 cycles of polymerase chain reaction. Library concentrations were quantified using Qubit fluorometric quantitation (Life Technologies, Carlsbad, CA). Average fragment size and overall quality were evaluated with the DNA1000 assay on an Agilent 2100 Bioanalyzer. Each library was sequenced to approximately a depth of 30 million reads on an Illumina HiSeq sequencer. Oregon Health Sciences Center Placental Repository Twelve placental villi samples were derived from nonpathological pregnancies from individuals recruited into the Oregon Health Sciences Center Placental Repository (OHSU). Inclusion criteria included singleton pregnancies delivered via c-section at OHSU Hospital Labor and Delivery Unit with signed informed consent and gestational age ≥ 37 weeks. Exclusion criteria included maternal body mass index > 25, maternal age < 18 or > 40, any current pregnancy complications, and current smokers. Samples were specifically selected to be equally matched across fetal sexes. Placentas were collected and weighed immediately following cesarean section. Five random samples of tissue (~80 grams) were collected from each placenta and placed in phosphate-buffered saline (PBS) to be transported back to the lab. The chorionic plate and decidua were removed from each randomly isolated placental sample, leaving only villous tissue, which was thoroughly rinsed in PBS to remove excess blood. Primary cytotrophoblasts were isolated from villous tissue using a protocol adapted from Eis et al. ([211]80), using trypsin/DNase digestion followed by density gradient purification. Isolated cytotrophoblast cells (5 to 10 × 10^6 cells/ml) were then frozen in freezing media [10% dimethyl sulfoxide in fetal bovine serum (FBS)] and stored in liquid nitrogen until usage. Placentas were collected into a tissue repository under a protocol approved by the Oregon Health and Sciences University Institutional Review Board with informed consent from the patients. All tissues and clinical data were deidentified before being made available to the investigative team. Quantifying placental gene expression in the CANDLE AND MWRI placentas A flowchart describing sample preprocessing is described in fig. S2B. At the time of TRN construction, 475 CANDLE placental samples had been sequenced. For the RNA-seq samples from MWRI and the CANDLE cohort, transcript abundances were estimated using the quantification program Kallisto ([212]80, [213]81). Initial quality control was performed on each RNA-seq sample using FastQC ([214]82). Length scaled counts were condensed to Ensembl Gene IDs using the Bioconductor TXImport package ([215]83). Genes were then filtered to remove low-expressing genes using the “FilterByExprs” function of EdgeR, which, by default, removes genes with <10 counts [0.5 counts per million (CPM)] in 70% of samples ([216]84). For the dataset from MWRI, this included 17,574 genes, and from the CANDLE samples, this included 16,440 genes. Datasets were then normalized using the trimmed mean of M, and then precision weights were generated for log CPM values using the voom transformation. The datasets were then merged by all genes common across both datasets, generating a final transcriptomic dataset of 14,837 genes in 475 samples. Confounding related to both RNA-seq batch and intrinsic differences between the cohorts was eliminated using the ComBat algorithm from the Bioconductor SVA package ([217]74), which applies an empirical Bayes method to estimate location and scale parameters for each batch, which are then removed to provide a consistent mean and variance for each gene, across all batches. This approach has successfully been used to integrate microarray datasets ([218]85), as well as RNA-seq datasets ([219]86). RNA-seq batch was set as the “batch” variable to be adjusted for (the CANDLE cohort was sequenced in four batches, and the MWRI dataset was sequenced as one batch), and maternal race and gestational age were included as covariate variables in the model matrix, to ensure that differences related to these variables would be retained in the final dataset. Batch effects were evaluated using principal components analysis (see fig. S7). For TRN construction, the data were split into a holdout dataset (20% of samples, N = 93), and 20% of the remaining samples was used for the testing dataset (N = 75), with 80% used for the construction of the initial TRN (N = 307). After model splitting but before model construction, we tested to ensure that there were no statistical differences between the training and holdout datasets in relation to the distribution of covariates including batch/cohort, labor status maternal race, and fetal sex (categorical, tested using chi-square tests), as well as gestational age and birth weight (continuous, t tests). In the final model, all samples were used (N = 475 samples). An additional 366 CANDLE placental transcriptomes were sequenced after model construction and were processed using the same pipeline for use as an independent validation dataset. DNase hypersensitivity data generation DNase-seq data were generated from the samples collected at OHSU. The Altius Institute used an established protocol for tissue processing developed as part of the Encyclopedia of DNA Elements (ENCODE) project ([220]87). Samples processed using the (DNAse hypersensitivity (DHS) -pseudonano assay & nuclei were sterile-filtered (0.25 to 6.4 million nuclei per sample), with minimal cellular debris, and treated with three doses of DNase (40, 60, and 80 U) followed by a stop buffer including ribonuclease A, followed by treatment with proteinase K. The samples had a signal portion of tags (spot) score > 0.5 except for one sample (which had a score of 0.34), which is a measure of enrichment, reflecting the fraction of reads that fall within peak calling regions. A score > 0.4 is indicative of high-quality DNase sequencing data meeting ENCODE quality guidelines. Peak regions of DNase-seq data were identified using F-seq ([221]88), a density estimator for high-throughput sequence tags, using a threshold parameter of 2.5, which was selected based on previously reported performance metrics ([222]72). The peak regions and DNase-seq data were both used to detect genomic “footprints” (i.e., regions of accessible DNA) using Hmm-based IdeNtification of Transcription factor footprints ([223]89), a method based on hidden Markov models. TF motifs and single gene model construction For each gene in our network, we identified putative TFs through integration of the DNase-seq data and TF binding motif information. TFs were defined by the Human Transcription Factor Database which had motifs from these TF databases (667 TFs) ([224]36). TF motifs were identified using three established databases: JASPAR ([225]90), HOmo sapiens COmprehensive MOdel COllection ([226]91), and Swissregulon ([227]92). We first matched these binding sites with the footprints in the placenta (described above) using Find Individual Motif Occurrences, applying the default 10-4 match threshold (version 5.0.5) ([228]93). The motifs were aggregated into position weight matrices within the R package “MotifDB” ([229]94). We assigned motifs to their TFs as well as those in the same DNA binding domain family using the TFClass database, as established in our previous work ([230]17). In our analysis, we did not consider the subset of TFs classified by Lambert et al. ([231]36) with no known motifs. We included both promoter and enhancer regions in our model. The promoter region was defined as ±5 kb from the transcriptional start site, which has been shown to maximize target gene prediction in TRNs constructed with footprinting data ([232]95). The enhancer regions were identified using genehancer (version 4.11). Genehancer is a database that integrates reported enhancers from multiple sources, including ENCODE and Ensembl, and uses a combinatorial likelihood method to define genes based on tissue coexpression correlations, enhancer-targeted TFs, expression quantitative trait loci, and chromatin conformation capture ([233]96). We included all enhancer regions defined by Genehancer as “elite,” indicating that they had more than two evidence sources, and those defined as “placental-specific” based on Genehancer analysis of tissue-specific genes. TRN construction and parameterization Constructing and evaluating model in training data We integrated the data described above to generate a model consisting of nodes and edges. A “putative edge” was identified between a TF and a target gene if a motif for that TF was present in a region of open chromatin (i.e., footprint) in the gene’s promoter or enhancer (see [234]Fig. 1). We generated a coexpression matrix for each putative edge. We ranked each target gene based on the absolute value of the Pearson correlation coefficient. We pruned our model through an initial “baseline model” (model 1), including only TFs with > median absolute value of the overall Pearson correlation coefficient for each target gene and then including only the top 15 TFs based on absolute Pearson correlation coefficient. We selected this initial module size of 15 as an initial cutoff based on similar analyses performed in the construction of a TRN constructed in the context of T helper cell type 17 differentiation ([235]97), and we tested other cutoffs (10 and 20 TFs). All models were constructed using Lasso regression within the “glmnet” R package (version 2.0-16). First, we constructed the model using the “cv.glmnet” function using the default parameters, which performs a 10-fold cross-validation on a grid of 100 values of lambda and identifies a lambda minimum value from the mean squared error of these models. This lambda minimum was used to predict gene expression first in the test dataset (during model construction) and then in the holdout dataset. To evaluate the accuracy of our initial TRN and select model parameters for our full model, we examined how the expression of the TFs in our model predicted gene expression of each target gene in our testing dataset by comparing the overall OOS R^2 values. These values were calculated as the squared estimate of the correlation between the predicted versus actual expression values generated from Lasso regression. We selected the best model to use in the holdout dataset based on these thresholds. Null model We created three null TRNs to evaluate the validity of our model to distinguish between signal and noise, which we constructed in our training and evaluated in testing dataset. The first null model (model 7) was constructed from 15 randomly selected genes with motifs but with correlations that were less than the median correlation coefficient. In parallel to our model constructed above, we generated a network consisting of “null putative edges” defined as interactions between a TF and a target gene if a motif for that TF was not present in a region of open chromatin (i.e., footprint) in the genes promoter or enhancer. We generated a coexpression matrix for each “null putative edge” within our training dataset. Our second null model (model 8) was constructed of 15 randomly selected genes with no motifs but with correlation coefficients > the median value in our true dataset. Our third null model (model 9) was constructed of 15 randomly selected genes with correlation coefficients less than the median value in our true dataset and with no motifs. In some instances, 15 genes were not available that met these criteria, and in this case, we included all the genes available. Final model construction and analysis After we established specific model parameters (i.e., enhancers, numbers of TFs, and correlation coefficient cutoff) in our training and testing dataset, we combined these two datasets and then tested the model in our final holdout dataset using the best parameters as identified from our original models. We did not include target genes with an OOS R^2 accuracy < 0.25. Network connectivity scores were calculated using the R package “igraph” (version 1.2.4.1). To examine effect modification in TF–target gene relationships by fetal sex, we ran linear regression models for each target gene–TF interaction in our TRN, with the target gene as the dependent variable and each one of the TFs as a separate independent variable and with fetal sex as an interaction term. Models were considered statistically significantly different by sex if the FDR adjusted P value of the interaction term was <0.05. We adjusted for the total number of linear regressions run to test all interaction terms (N = 113,158). We ran fetal sex-stratified linear regression models in male samples (N = 247) and female samples (N = 228) to calculate the coefficient estimate and differences in these estimates, which was used to evaluate the directionality of these relationships. Experimental validation We performed knockdowns of selected TFs in primary villous trophoblast cells from placental samples collected through the OHSU placental biorepository. Placenta-specific TFs were identified using the Human Protein atlas ([236]31), and then four TFs were selected on the basis of module size (>125 target genes) and median gene expression (>median reads per million). Separate transfections were performed using the ON-Target SMARTpool siRNAs (Dharmacon) to induce silencing of four genes: CREB2L2 (catalog no. L-019357-02-0005), ESRRG (catalog no. L-003405-00-0005), GCM1 (catalog no. L-011491-00-0005), and GRHL2 (catalog no. L-014515-02-0005) with a nontargeting negative control (catalog no. D-001810-10-05). siRNAs were diluted in nuclease-free water and stored at a 50 μM stock concentration. These transfections were each performed from villous trophoblasts collected from elective c-sections at OHSU as described above. For transfection, cytotrophoblast cells were rapidly thawed in a 37°C water bath and immediately diluted in Iscove’s modified Dulbecco’s medium [25 mM glucose, 4 mM glutamine, and 1 mM pyruvate (Gibco #12440-046) supplemented with 10% FBS (Corning #35-010-CV) and 1% penicillin/streptomycin (Gibco #15140-122)] termed as complete media. Cells were centrifuged at 1000g for 10 min and suspended in fresh complete media. The cells were plated at 2 × 10^6 cells per well concentration in 12-well culture plates and cultured at 37°C, 5% CO[2]. After 24 hours, cells were transiently transfected using the above siRNAs and Lipofectamine (Invitrogen #11668-019) as per the manufacturer’s protocol and incubated for another 24 hours. At the end of 24 hours (total of 48 hours from the start of culture), the transfection media was replaced with complete media, and cells were cultured for another 24 hours. After 72 hours of total culture time (24 hours following previous complete media change), the cells were lysed in 700 μl of Qiazol, and RNA was isolated using the Zymo Direct-zol Mini-prep kit and reagents (Zymo, catalog no. RD2051). RNA integrity was determined with a bioanalyzer, and only RNA samples with an RNA integrity number > 8.6 were used for RNA-seq analysis. cDNA libraries were prepared from 1 μg of total RNA using the Illumina Stranded Total RNA library kit. Each library was sequenced to approximately a depth of 50 million reads on an Illumina NovaSeq 6000 S4 (2 ×100 read length) sequencer by the OHSU sequencing core. We used a similar pipeline to process the RNA-seq data as was used for the human placental data, as described above. Transcript abundances were estimated using Kallisto, and length scaled counts were condensed to Ensembl Gene IDs using the Bioconductor TXImport package ([237]83). Genes were then filtered to remove low-expressing genes using the FilterByExprs function of EdgeR ([238]84). Differentially expressed genes between siRNA knockdowns versus negative scrambled controls were identified using generalized linear models implemented within EdgeR ([239]84), including only target genes that were predicted to be expressed by each TF (i.e., for GCM1, we analyzed 240 genes; for CREB3L2, we analyzed 425 genes; for ESR1, we analyzed only 676 genes; and for GRHL2, we analyzed 283 genes). Dispersion parameters were estimated using the Cox-Reid method, and then differentially expressed genes were identified using the using quasi-likelihood F tests, which is more robust and produces more reliable error rates than a likelihood ratio test when the number of replicates is small ([240]98). For this screen, we considered genes statistically significant using an unadjusted P < 0.05. These significant genes were used to calculate model accuracy and statistical significance. Model accuracy was calculated as the number of positively regulated genes with a negative log fold change (i.e., true positives) added to the number of negatively regulated target genes with a positive log fold change (i.e., true negatives) divided by the total number of target genes for each sample. Model accuracy reflects these ratios. Statistical significance was assessed using a one-sided Fisher’s exact test on the number of genes that were true positives and true negatives versus other genes, and we report P values from this test. TF module enrichment analyses We performed two separate TF enrichment analyses ([241]99) in which we considered the TFs and all the genes they regulate (i.e., regulons) as gene groups (analogous to predefined pathways such as KEGG or GO) and compared if the two gene lists of interest were enriched within specific regulons using one sided Fisher’s exact test. The first analysis involved cell signatures of fetal placental-specific cell types derived from single-cell RNA-seq data of the placenta and identified by Seurat by Campbell et al. ([242]30). Top marker gene expression is reported as a Bonferroni adjusted P of <0.05 for this paper ([243]30) . The second analysis involved differentially expressed genes that were significantly associated with spontaneous preterm birth identified in a combined cohort analysis of RNA-seq data from the CANDLE cohort as well as a different cohort (Global Alliance to Prevent Prematurity and Stillbirth) ([244]29). In the original study, genes were considered statistically significantly associated with preterm birth using an FDR < 0.05. We are using the subset of 961 genes significant from the binary analysis comparing preterm to term. For both analyses, we considered the 7712 target genes captured within our TRN as the background list of genes for overrepresentation analyses, based on the rationale underlying pathway enrichment analysis that nonspecific background lists bias overrepresentation tests ([245]100, [246]101). We only included TFs in our enrichment tests which regulated three or more differentially expressed genes. For both of these exploratory analyses, we considered a TF to be a key regulator if its target genes were overrepresented based on an FDR < 0.05. We used Benjamini and Hochberg’s method to compute the FDR using P values ([247]102) for all analyses. All code is publicly available in the following github repository ([248]https://github.com/AlisonPaquette/PlacentalTRN_ManuscriptCode/). All analyses were performed in R (version 3.6.1), and data were visualized using Cytoscape (version 3.9.1). The TRN was developed using the R Shiny Web Application. Acknowledgments: