Abstract
In today’s chicken egg industry, maintaining the strength of eggshells
in longer laying cycles is pivotal for improving the persistency of egg
laying. Eggshell development and mineralization underlie a complex
regulatory interplay of various proteins and signaling cascades
involving multiple organ systems. Understanding the regulatory
mechanisms influencing this dynamic trait over time is imperative, yet
scarce. To investigate the temporal changes in the signaling cascades,
we considered eggshell strength at two different time points during the
egg production cycle and studied the genotype–phenotype associations by
employing the Random Forests algorithm on chicken genotypic data. For
the analysis of corresponding genes, we adopted a well established
systems biology approach to delineate gene regulatory pathways and
master regulators underlying this important trait. Our results indicate
that, while some of the master regulators (Slc22a1 and Sox11) and
pathways are common at different laying stages of chicken, others
(e.g., Scn11a, St8sia2, or the TGF-
[MATH: β :MATH]
pathway) represent age-specific functions. Overall, our results
provide: (i) significant insights into age-specific and common
molecular mechanisms underlying the regulation of eggshell strength;
and (ii) new breeding targets to improve the eggshell quality during
the later stages of the chicken production cycle.
Keywords: eggshell strength, chicken, Random Forests, feature
selection, master regulators, over-represented pathways
1. Introduction
Today’s poultry industry is highly invested in the development of
chicken capable of producing more eggs in longer laying cycles [[32]1].
This production goal, however, must go hand in hand with improvement in
sustainability of egg quality, especially eggshell strength (ESS),
during the whole laying period [[33]1,[34]2]. The calcified eggshells
not only provide protection against physical damage but also play a
crucial role for the development of the embryo by allowing gaseous
exchange, abating moisture loss, and supplying calcium for the embryo
bone development [[35]3]. Multiple molecular actors involved in the
homeostasis and transportation of minerals, especially calcium, the
main constituent of the eggshell, have been identified [[36]4,[37]5].
More than 500 eggshell matrix proteins have also been reported
[[38]6,[39]7] implicating a plethora of genes that knit together the
complex protein scaffold and the mineral phase of the eggshell
[[40]5,[41]8]. However, most of these discoveries provide only the
genes expressed in a certain segment of the chicken oviduct, the
principal organ for egg development, and consequently the overall
mechanisms of eggshell development remain illusive. Moreover, similar
to other economically important traits, ESS remains relevant throughout
the productive life and commonly deteriorates with the age of the
chicken [[42]9]. This decline in the eggshell quality remains one of
the major reasons for replacing commercial flocks [[43]1]. Hence,
understanding the genetic basis of ESS at different laying stages is
very important for breeders if they are to extend the laying cycle of
chicken. Therefore, an analysis of this trait at different time points
during the life of the bird can better delineate its genetics and its
molecular mechanisms involved in this dynamic behavior [[44]10]. This
knowledge can then be utilized to design breeding strategies to improve
the eggshell quality during the later stages of the chicken production
cycle.
Until now, a variety of association studies have been conducted to
decipher the genetic architecture of quantitative traits such as ESS,
which led to the identification of a valuable repertoire of genes
controlling a range of traits (see the reviews [[45]11,[46]12,[47]13]).
Finding loci associated with a trait through genome wide association
studies (GWAS) is commonly based on single-SNP based models that test
each SNP for its association with the phenotype, ignoring its
dependency on the neighboring SNPs. This statistical design of GWAS
seems quite straightforward, yet entails several challenges including
those of population stratification, relationships among the samples,
multiple hypothesis testing, and overestimation of SNP effects, among
others, as pointed out in previous studies
[[48]14,[49]15,[50]16,[51]17]. Many approaches such as different
multiple testing correction methods and linear mixed models have been
proposed to overcome these challenges [[52]15,[53]18,[54]19]. However,
the most devastating challenge of GWAS still persisting is the lack of
power to detect the loci having medium to small effect sizes [[55]20].
This inability of GWAS to explain a major proportion of the
heritability has been under intensive discussion.
To overcome these limitations of GWAS, application of Bayesian
frameworks as well as machine learning algorithms have gained
importance in the last decade [[56]21,[57]22,[58]23,[59]24,[60]25].
Their comparative performance has been evaluated for a variety of
traits with different genetic architectures (see the reviews
[[61]13,[62]26,[63]27]). Nevertheless, multiple studies have revealed
that machine learning algorithms surpass currently available well-known
GWAS approaches in identifying genes having small effects on the
phenotype [[64]28,[65]29,[66]30]. In particular, Brieuc et al. pointed
out the efficiency of Random Forests (RF) models for analyzing a large
number of loci simultaneously and identifying promising associations
[[67]28]. Inspired by Brieuc et al.’s study, we applied the RF
algorithm to assess the importance of SNPs that could provide a clue of
their essential roles for ESS and to characterize the differences
observed in this trait at different time points. For the analysis of
the corresponding genes of these SNPs, we adopted a well established
systems biology approach and identified age-specific and common key
regulatory pathways and master regulators. These findings could: (i)
enhance our understanding of the regulatory mechanisms underlying
eggshell strength; and (ii) provide novel targets and hypotheses for
future breeding strategies. To the best of our knowledge, it is the
first study in this field which mainly focuses on the importance of the
age-specific and common key regulatory mechanisms in chicken to reveal
the genetic programs influencing the eggshell strength.
2. Materials and Methods
In this section, we describe the chicken dataset analyzed and the
methods applied. Our analysis follows the structure of [68]Figure 1.
Figure 1.
[69]Figure 1
[70]Open in a new tab
Flowchart of the analysis applied in this study (ESS, Eggshell
strength).
2.1. Chicken Dataset
To explore the genomic background of the changes that incur to the
eggshell strength during the life of laying birds, we analyzed a
genotype dataset that has previously been used to investigate the
accuracy of imputation as well as the prediction of genomic breeding
values in chicken [[71]31,[72]32,[73]33]. The dataset consists of a
purebred commercial brown layer line with 892 animals and 580,000 SNPs
generated using Affymetrix Axiom Chicken Genotyping Array. The
genotypic data do not contain mitochondrial SNPs. The corresponding
phenotypic data consist of eggshell strength (ESS) measured (as the
force in Newton that was required to break the eggshell) for each bird
at two distinct stages of its production cycle. These two stages were
then regarded as Time Point 1 and Time Point 2, respectively. The first
time point for ESS was recorded at the ages of 42, 45, and 48 weeks and
the second time point was recorded at the ages of 64 and 68 weeks.
Averages of the recorded breaking strengths at Time Point 1 (ESS1) and
Time Point 2 (ESS2) were used as phenotypes in the further analysis.
Extensive pedigree data, consisting of, in total, 40,545 individuals
from six generations, were available on these birds which were included
in an animal model for breeding values estimation of the birds. These
breeding values were then de-regressed following Garrick et al.
[[74]34] to obtain the pseudo-phenotypes that were used in the further
analysis. To ensure genotype quality, we filtered the genotyped data
and removed the SNPs: (i) that were unassigned to any chromosome or
present on the sex chromosomes; (ii) with a minor allele frequency <
0.01; (iii) with a genotyping call rate ≤ 97%; (iv) significantly
deviating from Hardy–Weinberg equilibrium (p-value
[MATH:
<1×10−<
/mo>6 :MATH]
); and (v) for animals having a SNP call rate smaller than 95%.
Finally, after filtering, we used 892 animals and 318,513 SNPs for our
analyses.
2.2. Association Analysis Using Random Forests
To identify SNPs potentially associated with eggshell strength, we used
the concept of the Random Forests (RF) algorithm to estimate the
relative importance of each SNP (attribute) regarding its involvement
in the prediction of response variables (de-regressed breeding values).
For this purpose, we employed the Boruta algorithm in our study
[[75]35], which is a specially developed powerful wrapper for the RF
based feature selection approach. The main principle of the Boruta
algorithm is based on the extension of the attributes by adding random
attributes to the dataset which are called shadow attributes and
created by shuffling the original values of each attribute (in our case
SNPs) in the dataset. The enlargement of the attributes results in
apposition of the randomness to the dataset, which leads to the
reduction of the bias of hidden (false) signals arising from random
fluctuations or correlations in the dataset [[76]35,[77]36,[78]37]. To
this end, a RF classifier is applied to the extended dataset, and SNPs
are systematically and iteratively removed whose importance are
significantly smaller than those of the shadow attributes. By repeating
the process of shadow attributes generation and RF algorithm
application, importance is assigned to all SNPs. As a result, the
Boruta algorithm provides a ranked list of SNPs with a decision of
whether the importance of a SNP is confirmed, rejected, or tentative.
It is important to note that a similar idea to the Boruta algorithm is
manually implemented in [[79]22] to assess the importance of SNPs.
2.3. Gene Set Analysis
We extracted the genes corresponding to the SNPs identified by the
Boruta algorithm from Ensembl using BioMart [[80]38] (R-script given in
[81]File S1). Furthermore, we performed a gene set analysis regarding
their molecular functions to obtain functional annotations of these
genes.
2.4. Identification of Master Regulators and Over-Represented Pathways
Following our previous studies [[82]39,[83]40], we performed the
"upstream analysis” and pathway analysis using the geneXplain platform
[[84]41] to gain more insight into the functional relationships of
genes. The algorithm of “upstream analysis” workflow was introduced by
Koschmann et al. [[85]42] and its main goal is to reveal the underlying
key regulators that control the activity of target genes. For this
purpose, the underlying algorithm of “upstream analysis” firstly
constructs molecular pathway networks and then detects convergence
points of these networks, which are called master regulators and are
likely to orchestrate the transcriptional regulation of several genes.
In our analysis, we used the GeneWays database [[86]43] and ran the
standard “upstream analysis” workflow with a maximum radius of 10 steps
upstream to identify the top five master regulators of each gene set
resulted from the previous step of the analysis.
To discover novel biological functions and to reveal the properties of
the genes under study, we performed a pathway enrichment analysis as
the second step of our analysis. To this end, we used the TRANSPATH
pathway database [[87]44], which is a regularly updated signaling
pathway database and contains information about genes, molecules and
reactions for the identification of age-specific and common
over-represented pathways.
3. Results
In this study, we performed the RF approach using the Boruta algorithm
to identify the informative SNPs associated with eggshell strength at
two time points during the laying cycle of commercial brown layer
chicken. For this purpose, the importance of each SNP was separately
assessed for its association with the phenotype of interest. To this
end, we obtained a list of SNPs for each time point whose importance
was confirmed by the Boruta algorithm for the prediction of the
phenotype. Analyzing both time points, we identified 3726 SNPs
associated with eggshell strength at Time Point 1 (ESS1) and 1815 SNPs
associated with eggshell strength at Time Point 2 (ESS2) (the lists of
SNPs are given in [88]Table S2). These SNPs were then mapped to the
genome and the genes harboring at least one of these SNPs were
identified for both traits. In total, we identified 405 genes for ESS1
and 253 genes for ESS2 (the lists of genes and their Gene Ontology (GO)
categories are given in [89]Tables S2 and S3, respectively). A closer
look at these gene lists reveals that 22 % (118 genes) of them are
overlapping (see [90]Figure 2), which depicts the conservation of some
of the underlying mechanisms involved in the synthesis of eggshell
during different stages of the egg production cycle. Our results also
show that a considerably high number of genes that were distinct for
the time points highlight the dynamic nature of this trait.
Figure 2.
[91]Figure 2
[92]Open in a new tab
Venn diagram depicting the number of genes associated with eggshell
strength at Time Point 1 (ESS1), at Time Point 2 (ESS2), and their
overlap.
This section is comprised of three parts. First, to gain a deeper
insight into these gene sets, we performed a gene set analysis and
clustered their functions based on the GO terms. Second, we performed
the “upstream analysis” introduced by Koschmann et al. [[93]42] for the
identification of specific and common master regulators of both time
points. Third, we present the over-represented pathways to further
elucidate the mechanisms that control the ESS at different production
stages of birds.
3.1. Gene Set Analysis
The functional classification of both gene sets indicates that there
are several GO categories that were common for both time points (see
the treemaps depicted in [94]Figure 3 and [95]Figure 4 and the top 15
GO terms in [96]Table 1 and [97]Table 2). In particular, the
transportation of cations across membranes was the most salient
function for the underlying mechanism of ESS at both time points. In
this regard, calcium ions, being the main constituent of the eggshell,
are supplied in large amount to the uterine fluid by transepithelial
transport. In addition, other cations such as sodium, magnesium, and
potassium are exchanged across the uterine endothelium to maintain the
cell homeostasis [[98]4,[99]5]. This transmembrane transport remains
important during the production cycle to ensure the development of an
eggshell. The gene set analysis further reveals that the activities
pertaining to ATPase, GTPase, calmodulin binding, calmodulin-dependent
protein kinase, and Smad binding were specific for ESS1. Meanwhile,
functions related to hormone/vitamin D receptor binding, chaperone
binding, and Wnt-activated receptors were more relevant for ESS2.
Figure 3.
[100]Figure 3
[101]Open in a new tab
Gene Ontology (GO) treemap for genes associated with eggshell strength
at Time Point 1 (ESS1). The boxes are grouped together based on the
upper-hierarchy GO-term which is written in bold letters.
Figure 4.
[102]Figure 4
[103]Open in a new tab
Gene Ontology (GO) treemap for genes associated with eggshell strength
at Time Point 2 (ESS2). The boxes are grouped together based on the
upper-hierarchy GO-term which is written in bold letters.
Table 1.
Top 15 Gene Ontology (GO) molecular function terms based on the
adjusted p-value for the eggshell strength at Time Point 1 (ESS1).
GO Term GO Title Number
of Genes Adjusted
p-Value
GO:0005515 protein binding 281
[MATH:
5.11×10−8 :MATH]
GO:0005488 binding 331
[MATH:
1.97×10−7 :MATH]
GO:0043167 ion binding 155
[MATH:
4.93×10−3 :MATH]
GO:0000146 microfilament motor activity 5
[MATH:
4.93×10−3 :MATH]
GO:0003779 actin binding 20
[MATH:
6.9×10−
3 :MATH]
GO:0032559 adenyl ribonucleotide binding 49
[MATH:
1.47×10−2 :MATH]
GO:0030554 adenyl nucleotide binding 49
[MATH:
1.51×10−2 :MATH]
GO:0044877 macromolecular complex binding 50
[MATH:
1.54×10−2 :MATH]
GO:0004683 calmodulin-dependent protein kinase activity 5
[MATH:
1.54×10−2 :MATH]
GO:0005524 ATP binding 47
[MATH:
2.05×10−2 :MATH]
GO:0042623 ATPase activity, coupled 16
[MATH:
2.24×10−2 :MATH]
GO:0008092 cytoskeletal protein binding 30
[MATH:
3.32×10−2 :MATH]
GO:0043168 anion binding 74
[MATH:
3.93×10−2 :MATH]
GO:0046983 protein dimerization activity 40
[MATH:
4.15×10−2 :MATH]
GO:0017016 Ras GTPase binding 12
[MATH:
4.86×10−2 :MATH]
[104]Open in a new tab
Table 2.
Top 15 Gene Ontology (GO) molecular function terms based on the
adjusted p-value for the eggshell strength at Time Point 2 (ESS2).
GO Term GO Title Number
of Genes Adjusted
p-Value
GO:0005515 protein binding 168
[MATH:
1.30×10−2 :MATH]
GO:0022843 voltage-gated cation channel activity 9
[MATH:
2.09×10−2 :MATH]
GO:0005242 inward rectifier potassium channel activity 4
[MATH:
2.10×10−2 :MATH]
GO:0032549 ribonucleoside binding 40
[MATH:
2.79×10−2 :MATH]
GO:0000166 nucleotide binding 48
[MATH:
2.79×10−2 :MATH]
GO:0005524 ATP binding 34
[MATH:
2.79×10−2 :MATH]
GO:0001883 purine nucleoside binding 39
[MATH:
3.66×10−2 :MATH]
GO:0032559 adenyl ribonucleotide binding 34
[MATH:
3.66×10−2 :MATH]
GO:0005488 binding 199
[MATH:
3.66×10−2 :MATH]
GO:0030554 adenyl nucleotide binding 34
[MATH:
3.66×10−2 :MATH]
GO:0051427 hormone receptor binding 9
[MATH:
3.66×10−2 :MATH]
GO:0015276 ligand-gated ion channel activity 8
[MATH:
3.66×10−2 :MATH]
GO:0017076 purine nucleotide binding 39
[MATH:
3.7×10−
2 :MATH]
GO:0022836 gated channel activity 12
[MATH:
3.83×10−2 :MATH]
GO:0036094 small molecule binding 50
[MATH:
4.64×10−2 :MATH]
[105]Open in a new tab
Among others, the function of ATPase in eggshell formation has been
well investigated in previous studies [[106]5,[107]45]. Along with
maintaining a pH of the uterine fluid during the eggshell formation,
the ATPases also provide the required energy and function as
transmembrane transportation channels for ions [[108]46]. The
calmodulin binding and calmodulin-dependent protein kinase activity is
known to regulate the concentration of calcium in various cells
[[109]47] and so does the vitamin D receptor binding [[110]48]. The
chaperone binding activity of the genes associated with ESS2 is another
distinctive finding of this study. Chaperone proteins have been
reported in the uterine fluid where they perform the folding of the
eggshell matrix proteins into a rigid scaffold upon which
mineralization takes place to produce the fabric of eggshell [[111]5].
These functional classes elucidate the molecular functions that gain
more relevance depending on the age of the birds and demonstrate the
key functions that remain important throughout the laying cycle of the
birds.
3.2. Identification of Master Regulators
Applying the “upstream analysis” integrated in the geneXplain platform
[[112]41], we identified the top five age-specific and common master
regulators. While the master regulators Arx, Sox1, and Scn11a were
specifically found for ESS1, the master regulators St8sia2, Tead2, and
Prox1 were identified for ESS2. Additionally, Slc22a1 and Sox11 were
identified for both time points (see [113]Figure 5 and [114]Figure 6).
Figure 5.
[115]Figure 5
[116]Open in a new tab
Scheme of gene regulatory pathways revealing the top five master
regulators (pink filled boxes) for eggshell strength at Time Point 1
(ESS1) following the “upstream analysis” [[117]42]. The master
regulators written in dark blue and surrounded by dark blue boxes (Arx,
Scn11a and Sox1) were identified specifically for ESS1 while master
regulators written in dark red and surrounded by dark red boxes
(Slc22a1 and Sox11) were identified at both time points (corresponding
networks for eggshell strength at Time Point 2 (ESS2) in [118]Figure
6).
Figure 6.
[119]Figure 6
[120]Open in a new tab
Scheme of gene regulatory pathways revealing the top five master
regulators (pink filled boxes) for eggshell strength at Time Point 2
(ESS2) following the “upstream analysis” [[121]42]. The master
regulators written in dark blue and surrounded by dark blue boxes
(Prox1, St8sia2 and Tead2) were identified specifically for ESS2 while
master regulators written in dark red and surrounded by dark red boxes
(Slc22a1 and Sox11) were identified at both time points (corresponding
networks for eggshell strength at Time Point 1 (ESS1) in [122]Figure
5).
The ESS1 specific master regulator Scn11a is a gene encoding
transmembrane sodium channels which control the voltage-gated sodium
transport especially in the uterus [[123]49,[124]50], the site of
eggshell synthesis in birds. Moreover, the importance of sodium
channels in the transportation of inorganic minerals deposited in the
eggshell is well established [[125]51]. Interestingly, we found the
master regulator Slc22a1 at both time points. It codes for the protein
OCT1, an organic cation transporter for substrates such as putrescine
[[126]52], which plays an important role for eggshell thickness
[[127]53] and calcium transport in the intestine [[128]54].
Furthermore, many other members of the super-family of transport
proteins, Slc (solute carrier proteins), are well known to play an
essential role in the homeostasis of calcium ions in a variety of
tissues [[129]55]. The Slc proteins have also been reported to
transport magnesium ions during the egg calcification process [[130]5].
Another interesting master regulator, Sox11, which encodes a member of
the Sox (SRY-related HMG-box) family of transcription factors, was
found at both time points. Sox11 is known to positively regulate the
process of osteogenesis (the formation of bone) [[131]56]. This
regulator gains relevance given the importance of bone as a labile
reservoir of minerals, especially calcium [[132]4]. In birds, the
calcium homeostasis is achieved by regulating the metabolism of bone
minerals as well as by controlling the absorption and excretion of
calcium in the intestine and in kidneys, respectively [[133]57].
Furthermore, the master regulator Tead2 found for ESS2 is a regulator
of osteogenesis [[134]58] and it is also one of the direct downstream
target genes of Sox11. This might be an indication of different
regulatory mechanisms involved in the osteogenesis or bone remodeling
during the later stages of the laying cycle [[135]56].
The St8sia2, identified as an ESS2 specific master regulator, encodes a
membrane protein which catalyzes the metabolism of sialic acid
[[136]59], a carbohydrate found in the eggshell membranes
[[137]60,[138]61,[139]62]. The eggshell membranes constitute the inner
layer of the eggshell and contribute to its strength. They further
provide the nucleation sites for the initiation of the shell synthesis
[[140]63]. Sialic acid is also part of podocalyxin and secreted
phosphoprotein 1 (SPP1), both of which are glycoproteins found in the
uterus during eggshell calcification [[141]5,[142]64]. Because of its
high negative charge, podocalyxin is presumed to interact with calcium
carbonate during the calcification of the eggshell [[143]64]. The
master regulator Prox1 encodes the protein prospero homeobox 1 that has
also been reported as part of eggshell membranes [[144]65,[145]66].
However, the Prox1 gene is mostly implicated in the regulation of the
development of a variety of organs including liver, pancreas and kidney
[[146]67]. Although the vast majority of the master regulators could be
biologically characterized to be crucial for ESS, the importance and
role of the two master regulators Sox1 and Arx for this trait is
currently biologically unconfirmed and could hence provide novel
targets for future studies.
3.3. Identification of Over-Represented Pathways
To further elucidate and investigate the mechanisms that control the
ESS at different time points, we were interested in identifying
age-specific and common over-represented pathways. Applying the pathway
analysis, we identified eleven and nine significantly over-represented
pathways for ESS1 and ESS2, respectively, and seven of these pathways
are overlapping for both time points (see [147]Figure 7 and [148]Table
3).
Figure 7.
[149]Figure 7
[150]Open in a new tab
Venn diagram of over-represented pathways (p adjusted < 0.001) of
eggshell strength at Time Point 1 (ESS1), at Time Point 2 (ESS2), and
their overlap. Pathways are based on the TRANSPATH pathway database
[[151]44].
Table 3.
Significantly over-represented pathways for both time points (p
adjusted < 0.001) sorted by adjusted p-values (based on the smaller one
of either ESS1 or ESS2). Pathways are based on the TRANSPATH pathway
database [[152]44]. (ESS1/ESS2, eggshell strength at Time Point 1/2).
Pathway Name Adjusted p-Value
for ESS1 / ESS2 Over-Represented
in
E2F —/ Smad4
[MATH:
5.05×10−5 :MATH]
/
[MATH:
7.99×10−4 :MATH]
ESS1, ESS2
Endothelin-1 gene regulation
[MATH:
5.05×10−5 :MATH]
/ - ESS1
G2/M phase (cyclin A:Cdk1)
[MATH:
1.61×10−4 :MATH]
/
[MATH:
1.65×10−4 :MATH]
ESS1, ESS2
SMAD7, SIK1 gene induction
[MATH:
1.61×10−4 :MATH]
/ - ESS1
oxysterol —>apoE
[MATH:
1.61×10−4 :MATH]
/
[MATH:
1.85×10−4 :MATH]
ESS1, ESS2
LXR network
[MATH:
1.61×10−4 :MATH]
/
[MATH:
1.65×10−4 :MATH]
ESS1, ESS2
p73alpha —/ NF-Y - /
[MATH:
1.65×10−4 :MATH]
ESS2
Sox9 —Smad3—>COL2A1
[MATH:
5.43×10−4 :MATH]
/ - ESS1
G1 phase (Cdk6)
[MATH:
7.60×10−4 :MATH]
/
[MATH:
7.93×10−4 :MATH]
ESS1, ESS2
G1 phase (Cdk4)
[MATH:
9.77×10−4 :MATH]
/
[MATH:
7.99×10−4 :MATH]
ESS1, ESS2
p38 pathway
[MATH:
9.77×10−4 :MATH]
/
[MATH:
7.99×10−4 :MATH]
ESS1, ESS2
MIC2 signaling - /
[MATH:
7.99×10−4 :MATH]
ESS2
TGFbeta pathway
[MATH:
9.53×10−4 :MATH]
/ - ESS1
[153]Open in a new tab
Among the pathways shared by both time points, G1 phase (Cdk4), G1
phase (Cdk6), and G2/M phase (cyclinA:Cdk1) involve different members
of the cyclin-dependent kinase (CDK) family which regulate
transcription, mRNA processing, and, more importantly, cell cycle
[[154]68]. In the context of ESS, these pathways may influence the
differentiation efficiency of osteoblasts, osteoclasts, chondrocytes
[[155]69], and uterine epithelium cells, all of which are crucial for
the supply of calcium ions as well as for bone and calcium homeostasis
[[156]70,[157]71]. The p38 pathway is implicated in a variety of
cellular responses including those related to proliferation,
differentiation and apoptosis [[158]72]. Moreover, the role of this
pathway has also been reported in the egg development of Drosophila
melanogaster [[159]73]. The LXR (liver X receptors) network plays a
central role in the transcriptional control of lipid metabolism
[[160]74]. This pathway also mediates the concentrations of oxysterols
and ApoE (Apolipoprotein E) if activated in response to elevated
intra-cellular cholesterol levels [[161]75]. The oxysterols, oxygenated
forms of cholesterol, are intermediates in bile acid and steroid
hormone biosynthetic pathways [[162]76]. Among other steroid hormones,
estrogen is more intimately involved in calcium homeostasis and has
also been implicated in the development of osteoporosis [[163]77].
Moreover, other forms of oxysterols are also involved in calcium
metabolisms [[164]78] and mesenchymal stem cell differentiation
[[165]79]. In addition to the CDKs, the Smad4 proteins, predominantly
present in the nucleus of the cell, mediate the cell cycle due to their
association with the E2F family of transcription factors [[166]80].
These pathways can be upstream regulated by the transforming growth
factor
[MATH: β :MATH]
(TGF-
[MATH: β :MATH]
) [[167]81].
The transforming growth factor-
[MATH: β :MATH]
(TGF-
[MATH: β :MATH]
) signaling pathway can be regarded as the most important pathway
enriched for ESS1. This pathway, among its other functions, is
well-known for its role in bone homeostasis [[168]82]. Furthermore,
some components of this pathway also overlap with other pathways
delineated in our analysis. The Sox9 is a transcription factor that
regulates the expression of the COL2A1 (collagen type II, alpha 1) gene
which contributes to collagen formation [[169]83]. During this process,
Smad3, a member of effector molecules in the signaling pathways of the
TGF-
[MATH: β :MATH]
ligand superfamily is activated [[170]84]. Another pathway that is
based on Smad7, SIK1 gene induction also regulates TGF-
[MATH: β :MATH]
signaling [[171]85]. Owing to this crosstalk with a variety of other
pathways, the TGF-
[MATH: β :MATH]
signaling pathway allows the bone to adapt to dynamic environments
[[172]82].
The Endothelin-1 gene (ET-1) regulation pathway includes the mechanisms
regulating ET-1 gene expression. Among other functions, ET-1 is
involved in osteoblast proliferation and differentiation in bone tissue
as well as in the ovulation process in the uterus [[173]86]. ET-1 gene
regulation is responsive to intracellular calcium and calmodulin
[[174]87]. The MIC2 signaling pathway, which was specifically enriched
for ESS2, has CD99 as the main cell surface protein and has been
implicated in apoptosis, adhesion, differentiation, and protein
trafficking possibly by affecting actin cytoskeleton reorganization
[[175]88,[176]89]. Another ESS2 specific pathway involves the
inactivation of the nuclear factor Y (NF-Y) transcription factor by p73
proteins, a process that represses the promoter of the telomerase
catalytic subunit and induces replicative senescence [[177]90,[178]91].
The activity of NF-Y is further linked to the parathyroid hormone,
which is the main regulator of calcium and phosphorus homeostasis.
Taken together, the pathways show a diversity of complex functional
features in chicken in response to age-dependent changes in eggshell
formation. Some pathways show a direct relevance for ESS while others
seem to be indirectly linked via interactions between pathways and
regulators [[179]92,[180]93].
4. Discussion
To uncover the associations between genetic variants and phenotypes,
genome wide association studies (GWAS) have become the method of choice
[[181]12]. Despite their success in identifying a multitude of genes,
the prediction performance of single-SNP based GWAS strategies is
limited [[182]15,[183]17,[184]94]. Alternatively, multi-marker models
including different Bayesian frameworks were introduced for GWAS. In
these models, all SNPs are fitted simultaneously as random effects
assuming a certain prior distribution of SNP effects [[185]13]. In
practice, these SNP effects are unknown and may not even strictly
follow a certain distribution [[186]25]. Unlike these traditional
statistical models, machine learning methods do not require these prior
assumptions about the genetic architecture of traits and have been
applied in GWAS in humans [[187]30] as well as in livestock
[[188]27,[189]95]. Especially, Romagnoni et al. [[190]30] and Huang et
al. [[191]24] showed that machine learning based algorithms provide
promising prediction power to assess genotype–phenotype associations.
In particular, the Random Forests (RF) algorithm has been successfully
applied for this purpose. These articles encouraged us to utilize RF in
our study since the application of GWAS to identify genetic variants
associated with ESS was futile.
Applying RF, we were able to identify a remarkably high number of genes
related to ESS which is in agreement with the findings of Maan et al.
[[192]6,[193]7], Mikšík et al. [[194]96,[195]97], and Brionne et al.
[[196]5], who pointed out a large number of genes/proteins involved in
ESS due to the complexity of this trait. The large difference in the
number of genes identified for ESS1 and ESS2 reflects the change in the
genetic and environmental components of the phenotypic variance over
age, as has been reported before for complex traits [[197]98,[198]99].
The overlap between the genes for both time points (see [199]Figure 2)
reflects that certain molecular functions remain relevant to eggshell
development during the laying cycle of chicken. Particularly, the
similarity of genes responsible for the transportation of ions is in
line with the findings of Park et al. [[200]100] and Fan et al.
[[201]51] who found that the concentration level of different ions in
blood does not change with the age of chicken. Interestingly, a closer
look at the biological processes of these traits reveals that, while
highly significant GO terms are involved in development for ESS1, the
significant biological processes for ESS2 are rather related to
different metabolic processes ([202]Table S3). The differences in
biological processes at both time points could be associated with the
temporal changes in the signaling cascades influencing dynamic behavior
of eggshell strength over time.
In line with previous studies
[[203]39,[204]42,[205]101,[206]102,[207]103], we applied a systems
biology approach and identified master regulators to investigate and
unravel the transcriptional regulatory machinery of ESS associated
genes. Interestingly, our results show that, similar to the genes,
there are common master regulators (Sox11 and Slc22a1) for both time
points, which are likely to govern various eggshell related processes
during the laying of the birds. In particular, being a member of the
Slc superfamily which is involved in the transmembrane transport, the
Slc22a1 could be essential to eggshell development. For ESS1, the most
promising master regulator Scn11a controls sodium transport in the
uterus [[208]49,[209]50] to maintain a voltage difference as well as
osmolarity across uterine cell membranes to help in the calcium
transportation [[210]51]. In ESS2, the master regulator Tead2 together
with the master regulator Sox11 underline the importance of bone
remodeling during the later stages of the production cycle of the
chicken.
Another fundamental step of our analysis was the identification of the
over-represented pathways. The results of this analysis also reinforce
the findings of gene set analysis as well as the identified master
regulators. Some of the over-represented pathways were conserved at
both time points while others were age-specific. Here, we specifically
highlight the well-characterized TGF-
[MATH: β :MATH]
pathway that interacts with most of the identified pathways in our
analysis to regulate bone homeostasis and thus might play an important
role in ESS [[211]82]. The majority of the remaining pathways,
especially those which are common to both time points, were found to be
related to the cell cycle. The uterine epithelium and bone are the
tissues that actively take part in the development of eggshell, hence
the renewal of the cells of both tissues is crucial for the synthesis
of a strong eggshell [[212]4]. Furthermore, multiple studies suggest
that a declining ability of uterine epithelium cells to transport
calcium is the main reason of the age-related deterioration of
eggshells [[213]51,[214]100]. In particular, the ESS2 specific p73alpha
—/ NF-Y pathway that results in the inactivation of the NF-Y
transcription factor by p73 proteins and consequently causes
replicative senescence of cells [[215]90] may also point towards the
underlying reason for weaker eggshells during the later stages of the
production cycle.
Recently, the use of systems biology based approaches to study the
traits of economic importance is gaining importance in the field of
agriculture [[216]39,[217]102,[218]103,[219]104]. However, one of the
major impediments in the use of this knowledge in practical animal
breeding is to integrate this large amount of information into
traditional genetic evaluation programs [[220]105]. A small group of
master regulators such as those identified in our analysis integrated
into prediction models can possibly be a remedy and might provide novel
breeding targets to improve the economically important trait of ESS.
Additionally, the knowledge about the specific pathways such as TGF-
[MATH: β :MATH]
could provide novel hypotheses for further studies.
5. Conclusions
In this study, we performed a systematic analysis to investigate the
age-specific and common regulatory mechanisms that underlie the dynamic
trait eggshell strength in chicken. For this purpose, we applied a RF
feature selection algorithm to detect the age-dependent
genotype–phenotype associations and then used a well established
systems biology approach to highlight the master regulators and
regulatory pathways that govern the underlying genetic mechanisms of
eggshell development. Our results show that most of the genes
identified for the ESS at both time points are in agreement with
previous studies. Our findings further indicate that some biological
processes related to eggshell development remain conserved across
production stages while others are age-specific and thus changing over
time. To the best of our knowledge, this is the first study revealing
master regulators and over-represented pathways in the context of ESS
and our findings should be further utilized to design novel hypothesis
for future studies.
Acknowledgments