Abstract

Background

   Patterns of gene expression of human pregnancy are poorly understood.
   In a trial of vitamin D supplementation in pregnant women, peripheral
   blood transcriptomes were measured longitudinally on 30 women and used
   to characterize gene co-expression networks.

Objective

   Studies suggest that increased maternal Vitamin D levels may reduce the
   risk of asthma in early life, yet the underlying mechanisms have not
   been examined. In this study, we used a network-based approach to
   examine changes in gene expression profiles during the course of normal
   pregnancy and evaluated their association with maternal Vitamin D
   levels.

Design

   The VDAART study is a randomized clinical trial of vitamin D
   supplementation in pregnancy for reduction of pediatric asthma risk.
   The trial enrolled 881 women at 10–18 weeks of gestation. Longitudinal
   gene expression measures were obtained on thirty pregnant women, using
   RNA isolated from peripheral blood samples obtained in the first and
   third trimesters. Differentially expressed genes were identified using
   significance of analysis of microarrays (SAM), and clustered using a
   weighted gene co-expression network analysis (WGCNA). Gene-set
   enrichment was performed to identify major biological pathways.

Results

   Comparison of transcriptional profiles between first and third
   trimesters of pregnancy identified 5839 significantly differentially
   expressed genes (FDR<0.05). Weighted gene co-expression network
   analysis clustered these transcripts into 14 co-expression modules of
   which two showed significant correlation with maternal vitamin D
   levels. Pathway analysis of these two modules revealed genes enriched
   in immune defense pathways and extracellular matrix reorganization as
   well as genes enriched in notch signaling and transcription factor
   networks.

Conclusion

   Our data show that gene expression profiles of healthy pregnant women
   change during the course of pregnancy and suggest that maternal Vitamin
   D levels influence transcriptional profiles. These alterations of the
   maternal transcriptome may contribute to fetal immune imprinting and
   reduce allergic sensitization in early life.

Trial Registration

   clinicaltrials.gov [42]NCT00920621

Introduction

   Pregnancy is associated with many biological changes. While the
   structural and physiological changes are well characterized, the
   associated underlying molecular changes are less understood. The Barker
   hypothesis first posited that maternal dietary influences in pregnancy
   affect fetal development and can increase the risk of chronic disease
   later in life [[43]1]. Within this context, recent studies have
   implicated the prenatal period as a critical timeframe during which
   responsiveness to environmental stimuli may be permanently programmed.
   Numerous epidemiological studies have shown strong associations between
   maternal smoking, pollution, chemical exposure, maternal diet and
   altered risk of chronic disease [[44]2–[45]4]. Yet, the mechanism by
   which environmental exposures influence maternal gene expression and
   whether it is the environmental exposure or the maternal gene
   expression that influences the developing fetus is incompletely
   understood [[46]5–[47]9].

   Fetal growth is thought to be a multifactorial process that is
   modulated by maternal, fetal, placental and environmental factors
   [[48]10]. Studies in mice have demonstrated a link between maternal
   genotype, gestational length and fetal lung maturation [[49]11]. It is
   reasonable to hypothesize that maternal gene expression over the course
   of pregnancy might directly, or indirectly, influence fetal gene
   expression profiles, with potential long-term effects.

   Studies that have investigated gene expression in pregnant mothers have
   largely been restricted to a single time point (i.e. cross-sectional
   studies) or limited to a short time interval over the course of
   pregnancy [[50]12]. In this study, we wished to examine changes in gene
   expression profiles over the course of pregnancy, ranging from the 1^st
   trimester to the 3^rd trimester. Our key concern is to illuminate the
   relationship between maternal levels of Vitamin D (VitD) and variation
   in gene transcription patterns. Our study is based on a nested cohort
   of 30 participants in the Vitamin D Antenatal Asthma Reduction Trial
   (VDAART), a multicenter randomized, controlled trial of vitamin D
   supplementation in pregnancy[[51]13]. This trial evaluates the effect
   of maternal VitD exposure on fetal development and risk of development
   of several diseases such as asthma. Several epidemiological studies
   have documented a worldwide VitD deficiency among pregnant women,
   identifying possible detrimental effects on the developing
   fetus[[52]14, [53]15]. VitD is a pro-hormone that is obtained either
   through diet or synthesized endogenously in the skin after exposure to
   sunlight [[54]16] and plays a major role during pregnancy and fetal
   development [[55]17]. Numerous observational epidemiologic cohort
   studies suggest that higher maternal VitD levels reduce the risk of
   chronic diseases, including asthma in early life [[56]18–[57]20]. Thus,
   while mounting evidence indicates that maternal VitD is necessary for
   normal fetal development and organ function the underlying mechanisms,
   including changes in global maternal gene expression profiles over the
   course of pregnancy, have not been explored [[58]21].

   This study identifies genes associated with maternal VitD levels during
   pregnancy that have the potential to regulate fetal development. Our
   results provide a framework upon which to evaluate antenatal influences
   of maternal origin on the developing fetus and, consequently, the
   predisposition to chronic disease later in life.

Materials and Methods

Study Participants

   We conducted a nested cohort study involving 30 healthy first trimester
   (10–18 weeks of gestation) women aged 18–39 years that were recruited
   as part of a larger multicenter, randomized, double-blind, controlled
   clinical trial from 3 different centers, Boston, San Diego and St.
   Louis [[59]S1 Table]. VDAART has been registered in clinicaltrials.gov
   by ID# [60]NCT00920621. The design, method and results for the primary
   outcome of the trial are published [[61]13, [62]22]. The 30 women were
   chosen randomly and equally from both of the blinded treatment arms
   (placebo arm and treatment arm) corresponding to low dose (400 IU) and
   high dose (4400 IU) Vitamin D, respectively. Women were screened at the
   subjects’ first pre-natal visit and their pre-pregnancy body mass index
   (ppBMI) obtained. The following eligibility criteria were applied: a)
   Maternal personal history of or biological father history of: asthma,
   eczema, allergic rhinitis; b) Gestational age between 10 and 18 weeks
   at the time of randomization; c) Maternal age between 18 and 39 years;
   d) Not a current smoker (defined as not having smoked for at least 1
   month prior to enrollment) and not a user of other nicotine products
   (e.g. nicotine patch for at least 1 month prior to enrollment); e)
   English- or Spanish-speaking; f) Intent to participate for the full 4
   years (through pregnancy and then until the 3rd birthday of the child).
   Exclusion criteria for the study were: a) Gestational age >18 weeks; b)
   Presence of chronic medical conditions: (i) hypertension on
   medications, (ii) diabetes mellitus, (iii) parathyroid disease, (iv)
   uncontrolled thyroid disease, v) kidney stones, and (vi) sarcoidosis;
   c) Intake of vitamin D supplements containing > 2,000 IU/day of vitamin
   D3; d) Multiple gestation pregnancy; e) Pregnancy achieved by assisted
   reproduction techniques (e.g. IUI, IVF); f) Current use of illicit
   drugs (defined as any use in the past 6 months prior to enrollment); g)
   Previously enrolled in VDAART for a prior pregnancy; h) Any major fetal
   anomalies detected prior to delivery; i) Patient Health Questionnaire
   (PHQ-9)39 depression scale ≥ 15 and j) Any condition, in the opinion of
   the Clinical Center Principal Investigators, that would inhibit
   compliance with the study medications or prohibit long-term
   participation in the trial. All mothers gave written consent for their
   own enrollment and that of their offspring. IRB approval was obtained
   from each of the three clinical centers and the Data Coordinating
   Center (Washington University in St. Louis, Boston Medical Center and
   Kaiser Health Care San Diego and Partners Health Care Boston MA
   [[63]13].

Blood collection

   Peripheral blood samples were collected at enrollment (at 10–18 weeks
   of gestation) and during the 3^rd trimester at 32–38 weeks of
   gestation.

Measurement of 25(OH) D (Vitamin D)

   All specimens were analyzed at the Channing Laboratory DCC as
   previously described [[64]13]. The method for quantitative
   determination of 25-hydroxyvitamin D is an FDA approved, direct,
   competitive chemiluminescence immunoassay (CLIA) using the DiaSorin
   LIAISON 25-OH Vitamin D Total assay. This assay is co-specific for
   25-hydroxyvitamin D3 and 25-hydroxyvitamin D2. The assay utilizes a
   specific antibody to 25-hydroxyvitamin D for coating magnetic particles
   (solid phase) and a vitamin D analogue,
   22-carboxy-23,24,25,26,27-pentanorvitamin D3, linked to an isoluminol
   derivative. During the incubation, 25-hydroxyvitamin D is dissociated
   from its binding protein, and competes with the isoluminol labeled
   analogue for binding sites on the antibody. After the incubation, the
   unbound material is removed with a wash cycle. Subsequently, the
   starter reagents are added and a flash chemiluminescent reaction is
   initiated. The light signal is measured by a photomultiplier as
   relative light units (RLU) and is inversely proportional to the
   concentration of 25-hydroxyvitamin D present in calibrators, controls,
   or samples. The inter-and intra-assay Coefficients of Variability for
   this assay are 11.2% and 8.1%, respectively. All assays were performed
   at the Data Coordinating Center blinded for treatment assignment
   [[65]13].

RNA isolation and quantification

   Total RNA was isolated from whole blood using the Paxgene Blood RNA Kit
   (Qiagen®) according to manufacturers protocol. The Ambion Globin Clear
   kit (Ambion®) was used to remove alpha and beta globin mRNA from the
   sample. This is done for whole blood samples in order to increase the
   sensitivity of gene expression assays, improving the detection rate of
   expressed genes. The RNA was quantified using Nanodrop 8000 and checked
   for high integrity before preparation of cDNA (first strand synthesis).
   The quality was assessed using the Agilent 2100 Bioanalyzer, which
   provides a good estimate of RNA concentration and purity of the sample
   as assessed by the RNA Integrity Number (RIN). An RIN ≥ 8 was deemed
   acceptable.

Hybridization to Illumina BeadChip array

   Gene expression was assessed using the Illumina GX–Direct Hybridization
   Assay and Biotinylated–cRNA prepared according to manufactures protocol
   (Illumina, San Diego, CA). Briefly, 200ng total RNA was reverse
   transcribed to synthesize first- and second-strand cDNA followed by a
   single in-vitro transcription that incorporates biotin-labeled
   nucleotides. The quality and quantity of the purified cRNA was assessed
   and samples hybridized onto Illumina HumanHT-12 v4 Expression Bead
   Chip. The chips were washed, blocked and stained with strepdavidin-Cy3
   and scanned using Illumina’s Bead Array Reader. Quality control
   analysis and reports were created using Illumina’s GenomeStudio 2011.1
   Software; Gene Expression Module version 3.2.7; iCheck version 0.5.0.

Processing of image files

   The overall VDAART pilot gene expression data consist of 144 arrays
   from 12 HumanHT-12_V4_0_R2_15002873_B chips. Each chip has one genetic
   control (GC) array and 11 sample arrays. Each array has measures of
   gene expression levels for 47,304 sample gene probes and 882 quality
   control probes. We excluded 13 technically failed arrays, and one
   frankly outlying sample. Statistics on control arrays indicated good
   reproducibility of the protocol, with pairwise R^2 among the 12 GC
   arrays ranging from 0.961 to 0.995, and pairwise R^2 among duplicate
   and triplicate arrays ranging from 0.856 to 0.996. Probes analyzed were
   confined to those annotated to autosomal chromosomes. Background
   adjustment, log2 transformation and quantile normalization were
   performed using the functions lumiB, lumiT, and lumiN in R
   BioConductor's library lumi. Pairs of first/third trimester samples
   were hybridized to arrays on the same chip, blocked together to
   mitigate potential chip effects. The final expression data was further
   filtered to yield a final expression set of 36,116 probes with 60
   samples, representing 30 subjects with samples collected at 10–18 weeks
   (M1) and 30–38 weeks (M2), respectively.

Gene-expression analysis and differentially expressed genes

   Gene expression measurements were analyzed using the publicly available
   open source statistical computing and graphics software tool “R”
   ([66]http://www.r-project.org). In conjunction with this platform,
   several Bioconductor tools for detailed gene expression analysis
   ([67]http://www.bioconductor.org) were also used. Differential
   expression analysis was carried out using the Bioconductor package
   “siggenes” and a two class, paired case significance of analysis of
   microarrays (SAM) performed [[68]23, [69]24]. Based on this approach,
   the relative difference in gene expression for a specific gene is
   defined by the ratio of change in gene expression to the standard
   deviation for that gene [[70]23]. Differentially expressed genes are
   identified through permutation analysis, after setting a threshold
   delta for which the false discovery rate is <0.05. The number of
   permutations used was 500.

Gene Co-expression Network Analysis

   Differentially expressed transcripts identified by SAM analysis were
   further analyzed using weighted gene co-expression network analysis
   (WGCNA)[[71]25]. WGCNA reveals connections between genes by grouping
   transcripts based on topological overlap measure (TOM)[[72]26].
   Briefly, we first determined the mean connectivity (K) for all
   transcripts in the data set and used this to determine the lowest soft
   threshold power for which the scale-free topology index reaches 0.90.
   At a soft-threshold power of 8, transcripts were then clustered into 14
   highly interconnected modules using hierarchical clustering based on
   the topological overlap of their connectivity. Each module was assigned
   a color name as identifier, and the first principal component for the
   module expression matrix is computed and referred to as the module
   eigengene. To identify those modules with greatest clinical relevance,
   Pearson’s correlation coefficient was computed to correlate the
   eigengene for each module to various maternal and clinical traits.
   Illumina Probes IDs were extracted from the modules and matched against
   official gene Entrez identifiers and gene symbols using the
   Bioconductor annotation package “Illumina Human Illumina expression
   annotation data (chip lumiHumanAll) [[73]27].

GeneMANIA Biological Pathway and GO Biological Pathway analysis

   Modules with significant correlation to VitD levels were chosen for
   further characterization to identify relevant biological pathways using
   the GeneMANIA platform ([74]http://genemania.org). To this end, we
   first matched transcripts to their corresponding official human gene
   symbols and uploaded these to the GeneMANIA website. The options
   Genetic, Physical and Predicted interactions were applied and the
   attributes Consolidated Pathways, InterPro and
   Transcriptional-Factor-Targets-2013 selected. The query was run with
   the setting: “equal by network" and with number of related genes set to
   “0”. This ensured that only module genes measured in our study were
   used in the pathway analysis. The analysis consisted of those genes for
   which gene symbols were recognized by the webtool, while any duplicate
   genes and those that did not match any annotated Gene Symbol were
   excluded. A biological network analysis was performed on the modules of
   interest using the “Consolidated Pathway” function as described by
   GeneMANIA [[75]28]. GO biological pathway analysis was performed using
   WebGestalt, a web-based gene set analysis tool kit
   ([76]http://bioinfo.vanderbilt.edu/webgestalt). P-values for
   differential network or gene-set expression were obtained via the
   hypergeometric method, adjusted for multiple comparisons using the
   Benjamini & Hochberg FDR measure.

MetaCore Network

   We used the web accessible MetaCore™ platform from Thompson Reuters to
   perform additional module network analysis to generate a list of key
   transcription factors networks and their targets within modules of
   interest. Transcripts within modules of interest were converted to
   Entrez IDs and uploaded to the MetaCore™ website at
   ([77]https://portal.genego.com) and a transcription factor network
   enrichment analysis performed. The “Transcription Regulation Workflow”
   creates sub-networks that are centered on transcription factors as
   described by MetaCore ™. Each sub-networks contains nodes from module
   of interest that are labeled (seed nodes). Transcription factors
   sub-networks are ranked by a p-value and interpreted in terms of Gene
   Ontology (GO).

Data

   The data discussed in this publication have been deposited in NCBI’s
   Gene Expression Omnibus and are accessible through GEO Series accession
   number [78]GSE86200. Available at:
   [79]http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=ixeziewgpzgthkj
   &acc=GSE86200

Results

Study Participants

   We examined changes in gene expression profiles over the course of
   pregnancy in a cohort of 30 pregnant women all of whom contributed
   whole blood samples in both the first and the third trimesters of
   pregnancy. The cohort was 80% African American and 20% Caucasian, with
   mean age of 25.2 years and mean pre-pregnancy body mass index (ppBMI)
   of 32.61kg/m^2. Subjects were recruited from 3 cities, with 20%
   recruited in Boston, 20% in San Diego and 60% in St. Louis with
   baseline 25(OH)D levels of 47.2 nmol/L and third trimester levels of
   79nmol/L ([80]Table 1). We compared this nested cohort to the remaining
   VDAART study cohort of 846 women and found no significant differences
   in population characteristics ([81]Table 2).

Table 1. Population Characteristics of 30 pregnant women.

   Total no. of Subjects N = 30
   Age at enrollment (yrs) Mean(SD) 25.2 (5.6)
   [Min, Max] [18, 36.8]
   n %
   Race/Ethnicity African American 24 80
   Caucasian (Non-Hispanic) 6 20
   Site of Enrollment Boston 6 20
   San Diego 6 20
   St. Louis 18 60
   Self reported asthma No 19 63
   Yes 11 37
   Number of pregnancies, including VDAART One 12 40
   Two 6 20
   > = 3 12 40
   Treatment arm (400 IU) 15 50
   (4,400 IU) 15 50
   Baseline/Enrollment Vitamin D nmol/l (25 OHD) Mean (SD) 47.2 (25.4)
   [Min, Max] [16, 111.3]
   Third trimester Vitamin D nmol/l (25 OHD) Mean (SD) 79 (37.1)
   [Min, Max] [24.9, 156.7]
   ppBMI (n = 27)[82]^* Mean (SD) 32.6 (8.8)
   [Min, Max] [20.5, 53.4]
   [83]Open in a new tab

   *ppBMI = pre-pregnancy body mass index

Table 2. Population Characteristics of the VDAART women, EXCLUDING the 30
women.

   Total no. of Subjects N = 846
   Age at Enrollment (yrs) Mean (SD) 27.5 (5.5)
   [Min, Max] [18, 39.5]
   n %
   Race/Ethnicity African American 356 42
   Caucasian (Hispanic) 120 14
   Caucasian (Non-Hispanic) 224 26
   Other 146 17
   Site of Enrollment Boston 256 30
   San Diego 294 35
   St. Louis 296 35
   Self reported asthma No 499 59
   Yes 347 41
   Number of pregnancies, including vdaart One 293 35
   Two 222 26
   > = 3 331 39
   Treatment arm 400 IU 425 50
   4,400 IU 421 50
   Baseline Vitamin D nmol/l (25 OHD) (n = 840) Mean (SD) 57.5 (25.2)
   [Min, Max] [10.9, 201.7]
   Third trimester Vitamin D nmol/l (25 OHD) (n = 747) Mean (SD) 82.4
   (36.4)
   [Min, Max] [14.2, 224.9]
   ppBMI (n = 669)[84]^* Mean (SD) 28.1 (7.5)
   [Min, Max] [15.8, 78.9]
   [85]Open in a new tab

   *ppBMI = pre-pregnancy body mass index

Differential Gene Expression during Pregnancy

   To identify differentially expressed transcripts, we used the
   significance analysis of microarrays (SAM) algorithm of Tusher et al.
   [[86]23] to compute moderated t-tests for all transcripts on the
   filtered Illumina expression set. Comparison of expression profiles
   during the third trimester with the first trimester identified a total
   of 5839 differentially expressed transcripts at a False Discovery Rate
   (FDR) <0.05 ([87]Fig 1). Of these, 2473 transcripts were significantly
   increased expression in third relative to first trimester, while 3366
   transcripts had decreased expression in the third, relative to the
   first, trimester. Up and down-regulated transcripts were ranked based
   on the lowest p-value. Top differentially expressed genes ([88]Table 3
   and [89]S1 Table).

Fig 1. SAM plot of maternal gene expression during the course of pregnancy:

   [90]Fig 1
   [91]Open in a new tab

   Differentially expressed gene-probes were identified by plotting the
   Observed test scores (di) against expected scores at a threshold of
   delta = 1.22 (parallel lines) and FDR = 0.05. 2734 genes had d-values
   greater than the expected d-value (up regulated) while 2589 had
   d-values lower than expected (down-regulated). Modified t-test,
   adjusted for multiple testing (BH), adjusted p-value <0.05.

Table 3. Top differentially expressed genes identified by SAM analysis, (FDR
<0.05).

   Up-regulated Genes
   Entrez ID Gene P-value adj. P-value Gene Description
   8451 CUL4A < 0.0001 < 0.0001 cullin 4A
   83666 PARP9 < 0.0001 < 0.0001 poly (ADP-ribose) polymerase family,
   member 9
   4128 MAOA < 0.0001 < 0.0001 monoamine oxidase A
   10935 PRDX3 < 0.0001 < 0.0001 peroxiredoxin 3
   948 CD36 < 0.0001 < 0.0001 CD36 molecule (thrombospondin receptor)
   221895 JAZF1 < 0.0001 < 0.0001 JAZF zinc finger 1
   6423 SFRP2 < 0.0001 < 0.0001 secreted frizzled-related protein 2
   56994 CHPT1 < 0.0001 < 0.0001 choline phosphotransferase 1
   10935 PRDX3 5.54E-08 7.80E-05 peroxiredoxin 3
   7027 TFDP1 5.54E-08 7.80E-05 transcription factor Dp-1
   4928 NUP98 1.11E-07 9.48E-05 nucleoporin 98kDa
   440672 NUDT4P1 1.11E-07 9.48E-05 nudix (nucleoside diphosphate linked
   moiety X)-type motif4
   11171 STRAP 1.11E-07 9.48E-05 serine/threonine kinase receptor
   associated protein
   6772 STAT1 1.11E-07 9.48E-05 signal transducer and activator of
   transcription 1
   140739 UBE2F 1.11E-07 9.48E-05 ubiquitin-conjugating enzyme E2F
   Down-regulated Genes
   Entrez ID Gene P-value adj. P-value Gene Description
   5333 PLCD1 < 0.0001 < 0.0001 phospholipase C, delta 1
   6844 VAMP2 < 0.0001 < 0.0001 vesicle-associated membrane protein 2
   (synaptobrevin 2)
   6689 SPIB < 0.0001 < 0.0001 Spi-B transcription factor (Spi-1/PU.1
   related)
   135 ADORA2A < 0.0001 < 0.0001 adenosine A2a receptor
   2788 GNG7 5.54E-08 7.80E-05 guanine nucleotide binding protein (G
   protein), gamma 7
   3633 INPP5B 5.54E-08 7.80E-05 inositol polyphosphate-5-phosphatase,
   75kDa
   1359 CPA3 5.54E-08 7.80E-05 carboxypeptidase A3 (mast cell)
   84958 SYTL1 1.11E-07 9.48E-05 synaptotagmin-like 1
   26207 PITPNC1 1.11E-07 9.48E-05 phosphatidylinositol transfer protein,
   cytoplasmic 1
   326624 RAB37 1.11E-07 9.48E-05 RAB37, member RAS oncogene family
   128637 TBC1D20 1.11E-07 9.48E-05 TBC1 domain family, member 20
   9619 ABCG1 1.11E-07 9.48E-05 ATP-binding cassette, sub-family G
   9813 EFCAB14 2.22E-07 0.000161771 KIAA0494
   9619 ABCG1 2.77E-07 0.000161771 ATP-binding cassette, sub-family G
   8498 RANBP3 2.77E-07 0.000161771 RAN binding protein 3
   [92]Open in a new tab

Weighted Gene Co-expression Network Analysis

   To enable a comprehensive analysis of the expressed gene list that
   takes into account all significantly differentially expressed
   transcripts, we used a well described (unsupervised) methodology of
   gene correlation network analysis to cluster transcripts into groups of
   highly interconnected modules based on topological overlap mapping,
   Weighted Gene Correlation Network Analysis (WGCNA) [[93]25].

   The weighted gene correlation network analysis clustered 5839
   transcripts into 14 co-expression modules, with the gray module
   representing those transcripts with the lowest connectivity scores
   ([94]Fig 2). Each module varies in size, consisting of different
   numbers of transcripts per module and is represented by a single
   eigengene value (first Principal Component); this “eigengene vector” is
   a theoretical average gene expression value based on actual expression
   levels of all transcripts in the module. To identify modules with
   transcripts of greatest clinical relevance, the eigengene values of
   each module was further correlated to various maternal traits. This led
   to the identification of three modules (color coded as green, yellow
   and salmon), each consisting of 241, 287 and 61 transcripts,
   respectively, that were significantly correlated to maternal
   pre-pregnancy body mass index (ppBMI), race and VitD levels at both
   early (enrollment) and late time points (3^rd trimester). Specifically,
   our data show that the green module is positively correlated with both
   baseline VitD and late VitD levels with correlation coefficients of
   0.35 (p = 0.05) and 0.45 (p = 0.01), respectively. Similarly, the
   yellow module was significantly correlated with late VitD levels, with
   a correlation coefficient of 0.38 (p = 0.04) while the salmon module is
   positively correlated to maternal ppBMI, and negatively correlated to
   both baseline VitD level and mother’s race, with correlation
   coefficients of 0.41 (p = 0.02), -0.41 (p = 0.02) and -0.44 (p = 0.07)
   respectively ([95]Fig 2). After adjusting for the effects of VitD using
   multiple regression analysis, race and ppbmi were no longer
   significantly correlated with the salmon module.

Fig 2. Weighted gene coexpression network analysis and associations with
clinical traits:

   [96]Fig 2
   [97]Open in a new tab

   Weighted Co-expression Network Analysis (WGCNA) was carried out on 5839
   differentially expressed probes identified by SigGenes. 14
   co-expression modules were identified and, and correlated to various
   clinical traits. Gene network, represented by different colored coded
   co-expression modules (y-axis) and their association with various
   clinical traits (x-axis). The intensity of the colors indicates the
   strength of the relationship, as indicated by the scale to the right.
   The range of the scale (+1 to -1) indicates either positive (+1) or
   negative (-1) correlation with a specific clinical trait. Top number in
   each box corresponds to the Pearson’s correlation coefficient between a
   module and a specific trait, while the lower number represents its
   p-value. Traits: ppbmi = pre-pregnancy BMI; gestdays = gestational age;
   basev = Vitamin D levels in 1^st trimester; latev = Vitamin D levels in
   3^rd trimester; mother.race = maternal race (White/African-American);
   Child.gender = infant gender (boy/girl). Pearson’s correlation
   (p<0.05).

Biological Pathway analysis of Green and Salmon gene modules

   Following WGCNA analysis, we wished to explore in more detail the
   nature of the green and the salmon module as these showed significant
   correlation with VitD levels at the 1^st trimester of pregnancy. The
   yellow module was not considered further, since it showed significant
   correlation at 3^rd trimester, but not the first trimester. To gain
   insight into the biological role (significance) of the 241 transcripts
   associated with the green module and 61 transcripts of the salmon
   module, we used the open source, web based GeneMANIA analysis tool to
   perform gene-set enrichment and identify major biological pathways
   using enrichment maps as accessed through the Consolidated Pathways
   option. Of the 61 genes uploaded to GeneMania, 46 (75%) were recognized
   as genes by the webtool, Overall, the consolidated pathway analysis
   revealed an enrichment of 11 of the 46 genes in a variety of biological
   pathways largely involved in immune defense, extracellular matrix
   reorganization and degradation, as well as activation of
   metalloproteinase genes ([98]Fig 3). In addition to the pathway
   enrichment analysis, we performed a functional enrichment analysis
   using Gene Ontology (GO). The salmon module was associated with a
   variety of significant GO terms, such as extracellular matrix
   reorganization, secretory granule as well as heparin and
   glycosaminoglycan binding ([99]S2 Table).

Fig 3. Functional Pathway Enrichment of Genes in Salmon Module.

   [100]Fig 3
   [101]Open in a new tab

   Gene maps constructed based on evidence from genetic (green line),
   physical (red line) and predicted (yellow line) interaction. The
   distance between groups of genes reflects the strength of their
   relationship and groups of genes that are more closely related are
   clustered together. Functional Pathway analysis revealing most enriched
   pathways based on InterPro, Pathway Commons and Transcription Factor
   target databases. The resulting visual map represents known functional
   pathways involving salmon gene nodes. Black circle = nodes, grey
   diamonds = enriched functional pathways

   Similarly, pathway analysis of the Green module resulted in the
   enrichment of several biological pathways ([102]Fig 4). Of the 241
   genes, 184 (76%) had recognizable gene symbols (on GeneMANIA). Of the
   184 annotated genes, 18 were found to be associated with known pathways
   such as glutathione biosynthesis, Notch Signaling pathway and E2F
   transcription factor network, among others. Functional GO enrichment of
   the Green module genes yielded significant GO functional terms before,
   but not after adjusting for multiple testing ([103]S3 Table).

Fig 4. Functional Pathway Enrichment of Genes in Green Module.

   [104]Fig 4
   [105]Open in a new tab

   Functional Pathway analysis of genes in green module revealing most
   enriched pathways based on InterPro, Pathway Commons and Transcription
   Factor target databases. The distance between groups of genes reflects
   the strength of their relationship and groups of genes that are more
   closely related are clustered together. Black circle = nodes, grey
   diamonds = enriched functional pathways

Transcriptional Network Analysis

   To gain additional insight into possible transcription factor networks
   identified by pathway enrichment, we used the MetaCore™ software to
   examine whether key transcription factor networks and their gene
   targets are overrepresented in the green module. To this end, we
   uploaded Entrez IDs of the 241 green module transcripts onto the
   MetaCore platform that converted to 202 annotated, unique, gene
   symbols. Subsequent transcription factor network analysis of these 202
   genes identified 21 different transcription factors that were either
   significantly enriched in the green module or acted on target genes
   within the green module ([106]Table 4 and [107]S4 Table).

Table 4. Enrichment analysis of key transcription factors that act on genes
in green module (FDR < 0.05).

   Network GO processes Total nodes Seed nodes adjusted p-value
   CREB1 G1/S transition of mitotic cell cycle, metallo-and- iron-sulfur
   cluster assembly, 73 73 3.980E-193
   c-Myc modulation by virus of host morphology or physiology or of other
   organism involved in symbiotic interaction 49 48 8.490E-124
   p53 cell cycle process cellular response to glucose starvation,
   negative regulation of cell cycle 20 19 4.240E-48
   ZNF143 single-organism carbohydrate metabolic process, carbohydrate
   metabolic process, nucleotide metabolic process), nucleoside phosphate
   metabolic process, CMP-N-acetylneuraminate biosynthetic process 19 18
   1.550E-45
   GCR-alpha cellular component organization, cellular component
   organization or biogenesis, cellular amino acid biosynthetic process,
   rhythmic process, response to arsenic-containing substance 19 18
   1.550E-45
   Androgen receptor androgen receptor signaling pathway, intracellular
   steroid hormone receptor signaling pathway, positive regulation of
   transcription, DNA-dependent, positive regulation of RNA metabolic
   process, positive regulation of gene expression 16 15 7.080E-38
   SP1 response to arsenic-containing substance, cellular response to
   chemical stimulus, cellular nitrogen compound metabolic process,
   modulation by virus of host morphology or physiology, regulation of
   transcription from RNA polymerase II promoter in response to hypoxia 15
   14 2.490E-35
   ESR1 (nuclear) intracellular receptor signaling pathway, RNA metabolic
   process, intracellular steroid hormone receptor signaling pathway, gene
   expression, transcription from RNA polymerase II promoter 15 14
   2.490E-35
   E2F1 cell cycle process, mitotic cell cycle, negative regulation of
   cellular process, cell cycle, negative regulation of biological process
   14 13 8.650E-33
   [108]Open in a new tab

   Top 10 significantly enriched transcription factor networks of the 202
   annotated genes from the green module. For each network, the GO
   processes are shown along with the number of genes from the green
   network that are enriched within each network and the number of total
   nodes that define the network. Total nodes = total number of objects in
   the network (database); Seed nodes = number of objects in green
   dataset. Hypergeometric test adjusted for multiple comparisons using
   Benjamini & Hochberg (p<0.05).

   The top five sub-networks were represented by, cAMP responsive element
   binding protein 1(CREB1), Myc Proto-Oncogene Protein (c-Myc), tumor
   protein p53, zinc finger protein (ZNF) 143 and glucocorticoid
   receptor-alpha (GCR-alpha). The total nodes for each sub-network were
   calculated and the number of seed nodes (number of genes from the green
   module) identified. Our data show, that the CREB1 transcription factor
   sub-network represented the sub-network with the lowest p-value (p<
   10^−20), contained 73 total nodes of which all 73 nodes (100%) were
   derived from the original data set (green module). In contrast, the
   c-Myc transcription factor sub-network, which was also found to be
   significantly enriched (p<10^−20), was found to have a total of 49
   nodes of which 48 (98%) were derived from the green module data set
   (identified as “seed nodes”). Indeed, our data show that of the 21
   transcription factor sub-networks, only the CREB1 network was fully
   contained within the module, with both the transcription factor (CREB1)
   and a high number of target genes (n = 72) with known interactions
   being present ([109]Fig 5).

Fig 5. Transcription factor enrichment analysis of Green module genes.

   [110]Fig 5
   [111]Open in a new tab

   CREB1 transcription factor network depicting CREB1 in center and known
   interactions among 72 genes demonstrating various functionality within
   the green module. Hypergeometric test adjusted for multiple comparisons
   using Benjamini & Hochberg (p<0.05).

Discussion

   In this study, we have conducted a longitudinal whole genome expression
   analysis over the course of pregnancy ranging from the 1^st to the 3^rd
   trimester of pregnancy in 30 otherwise healthy women. Our study
   consisted of a cohort nested within a larger randomized, controlled
   trial, the Vitamin D Antenatal Asthma Reduction Trial
   (VDAART)[[112]13]. Initial gene expression analysis identified 5839
   differentially expressed transcripts that were significantly altered
   between the 1^st and the 3^rd trimester. Of these, 57% (3366 probes)
   were down regulated in later gestation while 42% (2473 probes) were up
   regulated ([113]Fig 1). The relatively large number of differentially
   expressed transcripts likely reflects the dramatic physiological
   changes associated with pregnancy that ultimately enable the maternal
   support of a developing human fetus [[114]17].

   To derive meaningful biological information from large gene lists, we
   employed an unsupervised network approach based on co-expression
   analysis using WGCNA [[115]25]. This approach resulted in the
   generation of 14 distinct modules which were subsequently correlated to
   specific clinical maternal traits such as pre-pregnancy body mass index
   (BMI), gestational age, VitD levels, maternal race and child gender
   ([116]Fig 2). Correlating gene modules with clinical traits enables
   identification of those traits that likely impart significant influence
   on gene expression profiles during pregnancy. In this regard, our data
   show that 2 of the 14 modules, green and salmon (representing 61, and
   241 probes, respectively), were significantly associated with serum
   vitamin D levels both, during the 1^st and 3^rd trimester of pregnancy.
   The salmon model was also found to be negatively correlated with race
   and positively correlated with pre-pregnancy BMI ([117]Fig 2). While
   these observations suggest a mixed influence over gene expression
   profiles that are driven by all 3 covariates, subsequent multiple
   regression analysis revealed that two of the variables (pre-pregnancy
   BMI and race) behaved in a collinear fashion, as their effects become
   non-significant after adjustment for serum VitD level. As a result, our
   analysis suggests that transcriptional profiles of the salmon and green
   module are significantly associated with maternal vitamin D levels
   during pregnancy.

   To gain insight into the biological significance of the two modules
   (salmon and green), we performed a pathway enrichment analysis. The
   analysis revealed that the salmon module is enriched in genes largely
   involved in antimicrobial processes and immune defense mechanisms. In
   addition, we observed a significant enrichment for genes associated
   with cellular rearrangement as well as collagen and extra cellular
   matrix deposition/reorganization. Interestingly, this module was
   enriched for genes responsive to vitamin D regulation, particularly
   neutrophil associated genes such as defensins, antimicrobial peptide
   cathelicidin (CAMP) and Neutrophil Elastase (ELANE) [[118]29, [119]30].
   In addition, studies have shown that vitamin D induces the expression
   of potent antifibrotic genes, such as matrix metalloproteinase 8 (MMP8)
   while reducing collagen and other profibrotic factors [[120]31], genes
   and pathways that were also represented in this module. The
   significance of these findings within the context of their direct
   relationship to fetal development remains to be determined, however
   within the context of pregnancy, several studies have shown that this
   period is marked by leukocytosis (particularly, neutrophilia) and is
   associated with enhanced immune regulated responses [[121]32].

   Similar pathway analysis of the green module revealed biological
   pathways such as cell cycle regulation, transcription factor networks,
   glutathione regulation and several metabolic pathways. To examine the
   nature of the transcriptional networks more closely, we conducted
   additional (supervised) pathway enrichment analysis and hypothesized
   that pregnancy related transcriptional changes are likely associated
   with the activation of various transcription factors. The resulting
   network analysis identified 21 significant transcriptional networks
   with cyclic AMP Responsive Element Binding Protein 1 (CREB1) ranked as
   the top, most significant transcription factor network (p =
   3.9X10^-193). CREB1 is a major transcription factor that regulates over
   73 genes (of various functions) within the green module ([122]Table 4
   and [123]Fig 5) and is associated with a variety of GO Biological
   processes. Interestingly, CREB1 has been shown to interact with C/EBP
   family of transcription factors [[124]33]; a transcription factor that
   we also identified in our complete data set of 5839 genes ([125]S1
   Table). Interestingly, studies have shown functional cooperation
   between C/EBP beta and the vitamin D receptor (VDR), thus suggesting a
   direct influence of maternal VitD levels on CREB1 expression and the
   downstream regulation of CREB1 responsive genes [[126]34].

   The extent to which expression of maternal transcription factors
   influences fetal development is not well understood. Studies that
   directly link changes in gene expression in maternal peripheral blood
   with changes in fetal gene expression are lacking. However, gene
   expression studies in human fetal lung samples have identified numerous
   regulated genes, including transcription factors during early fetal
   lung development and a number of experimental studies using knock-out
   mouse models have demonstrated a crucial role for a variety of
   transcription factors in regulating fetal lung maturation [[127]35,
   [128]36]. Within this context, CREB1 was shown to be required for
   differentiation of respiratory epithelium during murine lung
   development [[129]36, [130]37]. Although these studies did not
   specifically examine transcription factors expression in peripheral
   blood of pregnant mice, they do support the rational that maternal gene
   expression profiles during pregnancy influences fetal lung maturation.

   Long chain fatty acids are important for fetal and postnatal growth.
   Most notably, they influence the development of the central nervous
   system, promote lactation, contribute to proper barrier formation in
   cellular membranes and have been shown to have important
   immunoregulatory roles [[131]38–[132]40]. Numerous epidemiological
   studies have shown associations between variation in maternal fatty
   acid intake and the development of allergic disease in the offspring
   [[133]40, [134]41]. In this context, we highlight the expression of
   very-long-chain 3-oxoacyl-CoA Synthase 6 (ELOVL6) within the green
   module. Interestingly, recent studies using human and mouse
   developmental time series have investigated the role of VitD regulated
   genes during fetal lung development and identified a number of VitD
   related genes that are differentially regulated during fetal lung
   development, including ELOVL6 suggesting that this gene contributes to
   normal fetal lung development [[135]42]. While the direct impact of
   ELOVL 6 expression in maternal peripheral blood on fetal gene
   expression, fatty acid biosynthesis and normal lung development remains
   to be elucidated, collectively these data suggest that VitD can
   influence the regulation of genes that are involved in a wide variety
   of complex cellular processes important during pregnancy and the
   developing fetal lung.

   In summary, our data suggests that maternal gene expression changes
   during pregnancy and that these changes are related to VitD levels.
   What remains unclear is whether these changes in maternal VitD levels
   impact fetal development directly, as it crosses the maternal-fetal
   interface of the placenta, or whether there is any direct effect of
   maternal gene expression on the fetus and that is independent of the
   changes in VitD levels. Due to ethical concerns, such questions cannot
   be practically addressed in the human setting. Such studies will likely
   require the use of animal models as proof-of concept studies that have
   the ability to simultaneously evaluate maternal gene expression and
   fetal development in animal models and provide a useful platform from
   which to derive conclusive evidence of the relationship between
   maternal and fetal gene expression. Within this context, our study
   provides a framework upon which to evaluate maternal, antenatal
   influences on the developing fetus and, consequently, the
   predisposition to chronic disease later in life.

Supporting Information

   S1 Table. Maternal differentially expressed genes between 1st and 3rd
   trimester of pregnancy.

   (PDF)
   [136]Click here for additional data file.^ (1.5MB, pdf)
   S2 Table. GO Biological Processes for Salmon Module.

   (PDF)
   [137]Click here for additional data file.^ (34.6KB, pdf)
   S3 Table. GO Biological Processes for Green Module.

   (PDF)
   [138]Click here for additional data file.^ (33.9KB, pdf)
   S4 Table. Transcription factor networks targeting genes within the
   green module.

   (PDF)
   [139]Click here for additional data file.^ (87.4KB, pdf)

Acknowledgments