Abstract

   Improved reproductive efficiency could lead to economic benefits for
   the beef industry, once the intensive selection pressure has led to a
   decreased fertility. However, several factors limit our understanding
   of fertility traits, including genetic differences between populations
   and statistical limitations. In the present study, the RNA-sequencing
   data from uterine samples of high-fertile (HF) and sub-fertile (SF)
   animals was integrated using co-expression network meta-analysis,
   weighted gene correlation network analysis, identification of upstream
   regulators, variant calling, and network topology approaches. Using
   this pipeline, top hub-genes harboring fixed variants (HF × SF) were
   identified in differentially co-expressed gene modules (DcoExp). The
   functional prioritization analysis identified the genes with highest
   potential to be key-regulators of the DcoExp modules between HF and SF
   animals. Consequently, 32 functional candidate genes (10 upstream
   regulators and 22 top hub-genes of DcoExp modules) were identified.
   These genes were associated with the regulation of relevant biological
   processes for fertility, such as embryonic development, germ cell
   proliferation, and ovarian hormone regulation. Additionally, 100
   candidate variants (single nucleotide polymorphisms (SNPs) and
   insertions and deletions (INDELs)) were identified within those genes.
   In the long-term, the results obtained here may help to reduce the
   frequency of subfertility in beef herds, reducing the associated
   economic losses caused by this condition.

   Keywords: meta-analysis, RNA-sequencing, gene network, functional
   candidate genes, systems biology, subfertility, beef cattle

1. Introduction

   Intensive selection pressure has led to a decrease in fertility
   efficiency in both beef and dairy cattle populations [[28]1,[29]2].
   Genetic mechanisms such as pleiotropy, genetic hitchhiking, and
   epistasis can be the cause of the genetic correlations, and consequent
   undesirable effects, observed between production and fertility traits
   [[30]3,[31]4,[32]5,[33]6,[34]7]. Poor fertility and reproductive
   inefficiency are among the main causes of the negative impact on the
   profitability of both beef and dairy herds [[35]8,[36]9]. Embryo
   mortality is the major factor affecting fertility and production costs,
   with the majority of pregnancy losses occurring in the first month of
   pregnancy [[37]10,[38]11]. Regarding the comparison between infertility
   and subfertility, sub-fertile animals are more pervasive in the herds
   because of the fact that true infertility has a frequency of up to 5%
   in the herds [[39]12,[40]13,[41]14]. Additionally, sub-fertile animals
   can generate progeny, consequently maintaining the causal alleles for
   this phenotype in the population [[42]15]. On the other hand, the
   causal alleles for the infertile phenotype tend to reduce its allelic
   frequency naturally across time owing to the absence of progeny
   carrying the alleles. Therefore, sub-fertile animals have a higher
   probability to have a higher cost in the livestock industry.

   Reproductive traits are considered complex phenotypes as they present a
   high heterogeneity, high environmental impact, and do not follow a
   Mendelian inheritance pattern [[43]16]. In this sense, investigating
   the genes involved in complex phenotypes is not a trivial task.
   Especially when using high-throughput genetic tools, which usually
   demand a high number of samples and a high accuracy of the phenotype
   evaluated to obtain a consistent and significant result. RNA-sequencing
   technology has allowed the identification of several candidate genes
   and genetic variants associated with fertility traits in cattle in the
   last decade [[44]17,[45]18,[46]19,[47]20,[48]21,[49]22,[50]23,[51]24].
   However, the majority of these studies are focused on the conventional
   gene by gene differential expression analysis. Other than to provide
   significant results to understand the genetic basis of complex traits,
   this approach may result in an underrepresentation of the genetic
   interactions between genes. The use of co-expression gene networks
   accounts for the expression profile across multiple samples, leading to
   the identification of regulatory and functional mechanisms in common
   [[52]25]. The guilt-by-association (GBA) principle is one of the major
   measurements to evaluate the quality of co-expression networks, where
   genes with similar functional activities tend to have a similar
   expression profile, and consequently, a higher co-expression [[53]26].
   Meta-analysis approaches tend to enhance the performance of
   co-expression networks when the GBA principle is evaluated
   [[54]27,[55]28,[56]29,[57]30,[58]31]. The application of network
   meta-analysis using high-throughput expression data is relatively new
   and can help to improve the detection of differentially expressed genes
   (DEGs) and to reduce the impact of differences between studies that can
   be hard to remove, such as bias during the library preparation step,
   which will implicate in spurious differences between groups
   [[59]27,[60]32]. This is reinforced by the stronger correlation
   observed between true log (fold-change) values and the values obtained
   in the network meta-analyses when these values are compared with those
   obtained from meta-analyses performed just merging datasets [[61]27].
   Consequently, the integration of both approaches (co-expression gene
   networks and network meta-analysis) can be a good alternative to
   increase the potential to identify functional candidate genes
   regulating a complex trait.

   In the present work, we performed the integration of network
   meta-analysis and weighted gene correlation network analysis (WGCNA)
   approaches in order to scrutinize the co-expression and the genetic
   basis of high-fertile and sub-fertile phenotypes in beef cattle.
   Additionally, potential functional candidate variants, fixed in one of
   the phenotypic groups, were prospected using the RNA-sequencing data.

2. Materials and Methods

2.1. Ethics Approval and Consent to Participate

   The current study integrates data from previously published studies.
   The respective information about the ethics approval can be found in
   Moraes et al., (2018) [[62]18] and Geary et al. (2016) [[63]20].

2.2. Data Collection

   The RNA-sequencing (RNA-seq) data from the endometrium tissue of high-
   and sub-fertile beef cows (HF and SF, respectively) were retrieved from
   National Center for Biotechnology Information (NCBI) Gene Expression
   Omnibus (GEO) public database from two previously published studies:
   [64]GSE81449 and [65]GSE107891 [[66]18,[67]20]. In these studies, the
   differentially expressed genes between HF and SF beef cows were
   identified. A total of 20 animals (10 animals per group; HF (n = 10)
   and SF (n = 10) from both studies) were analyzed. Briefly, the
   fertility status of those animals was based on the pregnancy outcome
   ratio after up to four rounds of successive high-quality embryo
   transfer protocol of estrus synchronization (PG-6d-CIDR and GnRH),
   where heifers that did not exhibit standing estrus received GnRH
   injection on day 0. As described by Moraes et al., (2018) and Geary et
   al., (2016) [[68]18,[69]20], the pregnancy outcome was detected by
   ultrasound and those animals with a pregnancy success ratio equal to
   100% or 25%–33% were classified as HF and SF, respectively. Additional
   details about the breed composition, synchronization protocol, flushes,
   biopsies, RNA extraction, and sequencing can be found in the original
   manuscripts [[70]18,[71]20].

2.3. RNA-Sequencing Data Alignment and Variant Calling

   The CLC Genomics Workbench 11.0 (CLC bio, Cambridge, MA, US) was used
   to perform quality control (QC), read alignment, transcript
   quantification, and variant calling [[72]33,[73]34,[74]35]. In QC, the
   PHRED score distribution, GC content, nucleotide contribution, and
   duplication levels were evaluated as described by Cánovas et al.,
   (2014) [[75]36]. Sequencing reads were aligned against the bovine
   reference genome ARS-UCD1.2 [[76]37] using the “Map reads to reference”
   algorithm with the following criteria: match score = 1; mismatch cost =
   2, length fraction = 0.5, and similarity fraction = 0.8. Subsequently,
   we quantified transcript expression (total counts) and only those genes
   with a fragments per kilobase of exon model per million reads mapped
   (FPKM) > 0.2 in both conditions (HF and SF) were maintained for the
   next analyses [[77]38,[78]39]. The variant calling was performed using
   the fixed ploidy variant detect algorithm (diploidy genome) on CLC
   Genome Workbench. A required variant probability >90%, a minimum
   coverage of 10, and a minimum count of 2 were set for the variant
   detection [[79]34]. The base quality filter was performed using a
   neighborhood radius = 5, minimum central quality = 20, and minimum
   neighborhood quality = 15 [[80]24]. Genetic variants (single nucleotide
   polymorphism, SNP; and insertion and deletion, INDEL) fixed in one of
   the groups were selected as potential functional variants for further
   analyses.

2.4. Identification of Genes with Expression Determined by the Study and
Outliers

   After filtering those genes with an FPKM > 0.2 in both conditions, the
   raw counts were used to perform a log-likelihood ratio test (LRT) in
   the DESeq2 package in R [[81]40] in order to estimate the impact of
   different studies over the gene expression. Those genes with a
   differential expression significantly affected (adjusted false
   discovery rate (FDR) 5%, p-value < 0.05) by the different studies
   ([82]GSE81449 and [83]GSE107891), not by the conditions (HF and SF),
   were excluded from the analysis. Additionally, the counts for the
   maintained genes were used to perform a clustering analysis in order to
   identify potential outliers among the samples. The Manhattan distance
   among the animals was calculated and used in a multidimensional scaling
   analysis in order to cluster the animals using the first two principal
   components. These analyses were performed using the function cmdscale
   in R. The possible outliers were removed from the next steps. In this
   study, the outliers were classified as those animals that did not
   cluster following the condition (HF and SF).

2.5. Meta-Analysis of Differentially Expressed Genes

   The DEGs were identified using the DESeq2 package in R, where a
   negative binomial generalized linear model was used using as a fixed
   effect the condition (HF and SF) of each animal [[84]40]. Initially,
   this analysis was conducted for each study individually. The threshold
   to define a gene as DE in each dataset was maintained as described
   previously by Geary et al., (2016) [[85]20] ([86]GSE81449; adjusted
   p-value FDR 5% < 0.1 and |log(fold-change (FC))| > 2) and Moraes et
   al., (2018) [[87]18] ([88]GSE107891; adjusted p-value FDR 5% < 0.05 and
   |log(FC)| > 2). Subsequently, the netmeta package in R [[89]41] was
   used to perform a network meta-analysis and calculated the combined
   test-statistics (p-values and log2(FC)) for each gene expressed in both
   datasets. The DEGs in the network meta-analysis approach were
   identified using a threshold composed by adjusted p-value <0.1 and
   |log2(FC)| >2, which is the combination of the less stringent threshold
   from both studies.

2.6. Weighted Correlation Network Analysis

   Once the comparability between the datasets was confirmed, the R
   package WGCNA (Weighted Correlation Network Analysis) [[90]42] was used
   to identify the differentially co-expressed modules (DcoEx) of genes
   for HF and SF groups of animals. Briefly, after QC, a soft-thresholding
   power was chosen based on a criterion of approximate scale-free
   topology. The first soft-thresholding power to reach a scale-free
   topology model fit ≥0.8 was selected for each group. Subsequently, the
   co-expression similarity matrix is raised to this soft-thresholding (8
   for HF and 8 for SF) power in order to obtain the adjacency matrix.
   Consequently, this last matrix was used to calculate the topological
   overlap measure (TOM). The adjacency matrix was calculated using a
   signed hybrid network. The module detection was performed using the
   blockwiseModules function using the dynamic tree cut algorithm [[91]43]
   with a minimum number of genes per module equal to 30 and the maximum
   size of blocks equal to 9000 (this number was selected to fit all the
   genes in a single module, which might increase the module detection
   sensibility). After the module detection by WGCNA package, the R
   package km2gcn [[92]44] was used to reallocate the genes within modules
   using a k-means clustering approach. Finally, the final modules
   detected for each group were compared using the following methodology:
     * -
       For each sample s in (HF and SF);
     * -
       For each module m(s) in s;
     * -
       Apply a Fisher’s exact test under the null hypothesis that there is
       no significant overlapping of m(HF) in SF and m(SF) in HF after a
       Bonferroni multiple test correction.

   At the end of this step, the modules of genes in HF without overlapping
   in SF, and vice-versa, here called DcoExp modules, were selected. The
   interconnectivity among the genes within each DcoExp module was plotted
   using the igraph package [[93]45]. In addition, the hub score of the
   genes within each module was estimated using the principal eigenvector
   of A*t(A), where A is the adjacency matrix of the module. From these
   results, the top 10 genes explaining the majority of the module
   topology were identified.

2.7. Functional Analysis and Annotation of Candidate Genes

   The DcoExp modules with top-hub genes harboring variants fixed in one
   of the conditions (HF and SF) were selected for the functional
   analysis, here called candidate DcoExp modules. The functional analysis
   was conducted using the “Core Analysis” function implemented in the
   ingenuity pathways analysis (IPA—Ingenuity System Inc, Redwood City,
   CA, USA). Genes without an associated gene symbol or without gene
   annotation were subjected to an annotation by homology. The BioMart
   application [[94]46] was used to retrieve the respective associated
   human gene symbol for those genes. Only the non-annotated genes, with a
   percentage of identity with the human homolog higher than 80%, were
   annotated by this approach. The enriched (p-value < 0.05) canonical
   pathways and diseases and functions for each candidate DcoExp modules
   were annotated. Additionally, the significant upstream regulators (FDR
   < 0.05 multiple testing correction) for each candidate DcoExp modules
   were identified. The functional candidate genes were selected among
   those genes within candidate DcoExp modules or in the significant
   upstream regulators that harbor fixed variants and are among those
   genes in which the expression profile was not determined by the study.

   Additionally, a “guilt by association”-based prioritization approach
   was performed using the GUILDify and ToppGene applications on the
   functional candidate genes [[95]47,[96]48]. Initially, GUILDify was
   used to retrieve a “trained-list” of candidate genes associated with
   pre-selected phenotypes. After this step, the “trained list” obtained
   using GUILDify and the list of functional candidate genes are used in
   ToppGene. Briefly, GUILDify uses BIANA knowledge base to identify genes
   associated with selected phenotypes. In the present study, the
   phenotypes used on GUILDify were as follows: “fertility”,
   “fertilization”, “decidualization”, “implantation”, “preimplantation”,
   “endometrium”, and “embryonic development”. BIANA creates a
   species-specific (human, in this study) interaction network for each
   gene identified by GUILDify. Subsequently, a prioritization algorithm
   based on network topology is used to rank the genes. Genes with a GUILD
   score higher than 0.44 (mean + 3 × standard deviation of GUILD score)
   were used to create the “trained” gene list. This list was used on
   ToppGene and a functional annotation-based prioritization was performed
   using a fuzzy-based multivariate approach. ToppGene uses the functional
   information shared among the “trained” and the functional candidate
   gene lists from several sources including the gene ontology (GO) terms
   for the three main categories of molecular function (MF), biological
   process (BP), and cellular component (CC); human and mouse phenotypes;
   metabolic pathways; Pubmed publications; co-expression pattern; and
   diseases. For each functional annotation for each functional candidate
   gene, a p-value was calculated using a random sampling of 5000 genes
   from the whole genome. In the next step, using a statistical
   meta-analysis, the p-values were combined into a final p-value.
   Subsequently, the genes with a significant p-value, after an FDR of 5%
   multiple corrections, were selected as prioritized genes. The complete
   description of the meta-analysis to calculate the final p-value is
   available in Chen et al., (2009) [[97]48]. Briefly, for each gene G, a
   p-value is computed by random sampling genes across the whole genome
   (5000 genes in this study) and a similarity score between G and the
   functional annotation of the trained list using different statistical
   approaches for categorical (e.g., GO and pathways terms) and numeric
   terms (expression profile in different tissues). While a fuzzy approach
   is applied for the categorical terms, a Pearson correlation between the
   expression vectors of the candidate gene and the genes in the trained
   list is computed. Finally, a p-value for each functional annotation of
   G is computed using a derivation from the annotation of the genes
   randomly sampled across the genome using the following formula:
   [MATH:
   <mrow><mrow><mi>p</mi><mrow><mo>(</mo><mrow><msub><mi>S</mi><mi>i</mi><
   /msub></mrow><mo>)</mo></mrow><mo>=</mo><mfrac><mrow><mrow><mo>(</mo><m
   row><mtable><mtr><mtd><mrow><mi>N</mi><mi>u</mi><mi>m</mi><mi>b</mi><mi
   >e</mi><mi>r</mi><mo> </mo><mi>o</mi><mi>f</mi><mo> </mo><mi>r</mi><mi>
   a</mi><mi>n</mi><mi>d</mi><mi>o</mi><mi>m</mi><mo> </mo><mi>s</mi><mi>a
   </mi><mi>m</mi><mi>p</mi><mi>l</mi><mi>e</mi><mi>d</mi><mo> </mo><mi>g<
   /mi><mi>e</mi><mi>n</mi><mi>e</mi><mi>s</mi><mo> </mo><mi>w</mi><mi>i</
   mi><mi>t</mi><mi>h</mi><mo> </mo></mrow></mtd></mtr><mtr><mtd><mrow><mi
   >s</mi><mi>i</mi><mi>m</mi><mi>i</mi><mi>l</mi><mi>a</mi><mi>r</mi><mi>
   i</mi><mi>t</mi><mi>y</mi><mo> </mo><mi>s</mi><mi>c</mi><mi>o</mi><mi>r
   </mi><mi>e</mi><mo> </mo><mi>h</mi><mi>i</mi><mi>g</mi><mi>h</mi><mi>e<
   /mi><mi>r</mi><mo> </mo><mi>t</mi><mi>h</mi><mi>a</mi><mi>n</mi><mo> </
   mo><mi>G</mi><mo> </mo></mrow></mtd></mtr></mtable></mrow><mo>)</mo></m
   row></mrow><mrow><mrow><mo>(</mo><mrow><mtable><mtr><mtd><mrow><mi>N</m
   i><mi>u</mi><mi>m</mi><mi>b</mi><mi>e</mi><mi>r</mi><mo> </mo><mi>o</mi
   ><mi>f</mi><mo> </mo><mi>g</mi><mi>e</mi><mi>n</mi><mi>e</mi><mi>s</mi>
   <mo> </mo><mi>i</mi><mi>n</mi><mo> </mo><mi>t</mi><mi>h</mi><mi>e</mi><
   mo> </mo><mi>r</mi><mi>a</mi><mi>n</mi><mi>d</mi><mi>o</mi><mi>m</mi><m
   o> </mo><mi>s</mi><mi>a</mi><mi>m</mi><mi>p</mi><mi>l</mi><mi>i</mi><mi
   >n</mi><mi>g</mi><mo> </mo></mrow></mtd></mtr><mtr><mtd><mrow><mi>p</mi
   ><mi>r</mi><mi>o</mi><mi>c</mi><mi>e</mi><mi>s</mi><mi>s</mi><mi>e</mi>
   <mi>s</mi><mo> </mo><mi>w</mi><mi>h</mi><mi>i</mi><mi>t</mi><mi>h</mi><
   mo> </mo><mi>f</mi><mi>u</mi><mi>n</mi><mi>c</mi><mi>t</mi><mi>i</mi><m
   i>o</mi><mi>n</mi><mi>a</mi><mi>l</mi><mo> </mo><mi>a</mi><mi>n</mi><mi
   >n</mi><mi>o</mi><mi>t</mi><mi>a</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>
   n</mi></mrow></mtd></mtr></mtable></mrow><mo>)</mo></mrow></mrow></mfra
   c></mrow></mrow> :MATH]
   (1)

   All the p-values obtained for each G are combined using a Fisher’s
   inverse chi-square method, where the p-values are assumed to come from
   an independent test:
   [MATH: <mrow><mrow><mo>−</mo><mn>2</mn><munderover><mstyle
   mathsize="140%"
   displaystyle="true"><mo>∑</mo></mstyle><mrow><mi>i</mi><mo>=</mo><mn>1<
   /mn></mrow><mi>n</mi></munderover><mi>l</mi><mi>o</mi><mi>g</mi><msub><
   mi>p</mi><mi>i</mi></msub><mo>→</mo><msup><mi>X</mi><mn>2</mn></msup><m
   row><mo>(</mo><mrow><mn>2</mn><mi>n</mi></mrow><mo>)</mo></mrow></mrow>
   </mrow> :MATH]
   (2)

   This metanalysis approach followed by multiple testing correction
   substantially reduces the number of false positive functional candidate
   genes.

3. Results

   [98]Figure 1 summarizes the pipeline applied in the present study to
   identify the functional candidate genes, and candidate genetic variants
   harboring those genes, within the differentially co-expressed gene
   modules between HF and SF animals.

Figure 1.

   [99]Figure 1
   [100]Open in a new tab

   Pipeline for identification of functional candidate genes regulating
   differentially co-expressed gene networks between high-fertile and
   sub-fertile beef cows. SF, sub-fertile; HF, high-fertile; INDEL,
   insertion and deletion; SNP, single nucleotide polymorphism; FPKM,
   fragments per kilobase of exon model per million reads mapped; WGCNA,
   weighted gene correlation network analysis; IPA, ingenuity pathways
   analysis; FC, fold-change.

3.1. RNA-Sequencing and Variant Calling Statistics

   The number and percentage of uniquely mapped reads for each sample are
   shown in [101]Table S1. A total of 13,812 genes were expressed in both
   studies ([102]GSE81449 and [103]GSE107891) with an FPKM > 0.2. These
   genes were used for the DEG meta-analysis in order to estimate the
   combined test-statistics (p-value and log2(FC)) for each group (HF and
   SF). Additionally, independently of the DEG meta-analysis, the LRT
   analysis indicated that 2462 genes had the expression determined by the
   study, not by the condition. Consequently, 11,350 genes were used to
   further investigate possible outliers among the samples and initially
   used in the WGCA analysis.

   Regarding the variant calling analysis, the genomic coordinates, type
   (SNP or INDEL), and functional impact for the variants fixed in one of
   the conditions, obtained using Ensembl Variant Effect Predictor (VeP),
   are shown in [104]Table S2. A total of 2254 variants were identified as
   uniquely fixed in the HF group (2113 SNPs and 141 INDELs), while 3117
   variants were uniquely fixed in the SF group (3034 SNPs and 83 INDELs).
   The percentage of each functional class for the fixed variants in each
   group is shown in [105]Figure S1.

3.2. Outlier Detection and Differential Expression Analysis between
High-Fertile and Sub-Fertile Animals

   The outlier detection analysis resulted in the exclusion of one SF
   animal in the [106]GSE81449 dataset (SRR3505358) and one animal from
   each condition (HF: SRX3461001 and SF: SRX3461010) in the
   [107]GSE107891 dataset ([108]Figure S2). The final sample size used in
   the analyses was 17 animals (HF = 9 and SF = 8). The overlapping among
   the DEG genes was calculated for the DEG identified in the
   meta-analysis performed in this study, including the DEG identified
   using the alignment with CLC Bio Genomics against the new bovine
   reference genome ARS-UCD1.2 and the DEG identified originally in the
   previously published studies [[109]18,[110]20]. A very low overlap was
   observed across the differential expression analyses between both
   studies. The network meta-analysis resulted in 14 DEGs ([111]Figure 2a,
   adjusted p-value < 0.1 and |log2(FC)| > 2). Additionally, despite the
   larger number of DEGs, the results obtained with CLC Bio Genomics and
   DESeq2 using as a reference genome for the mapping step the new bovine
   assembly ARS-UCD1.2 showed a small overlapping with the original result
   (maximum overlapping of 11 genes) ([112]Figure 2b). The majority of
   DEGs in the network meta-analysis (using the non-adjusted p-value) are
   shared with the results obtained using the [113]GSE107891 dataset when
   the alignment against the ARS-UCD1.2 reference genome was performed
   using CLC Bio Genomics. The p-values and log2(FC) for all the genes
   evaluated in the network meta-analysis, as well as the DEGs previously
   identified in [114]GSE107891 and [115]GSE81449, are presented in
   [116]Table S3.

Figure 2.

   [117]Figure 2
   [118]Open in a new tab

   Volcano plot for the differentially expressed genes identified in the
   network meta-analysis (non-adjusted p-value < 0.1 and
   |log(fold-change)| > 2) (a) and Venn diagram comparing the results
   obtained in the different datasets of differentially expressed genes
   identified using different bovine reference genomes assemblies (b). In
   (a), the green dots represent differentially expressed genes identified
   using non-adjusted p-value < 0.1 and |log(fold-change)| > 2. The yellow
   dots represent differentially expressed genes identified using a
   |log(fold-change)| > 2. In (b), the differentially expressed genes
   identified in each dataset are represented in red (network
   meta-analysis), yellow ([119]GSE107891 data set using ARS-UCD1.2 bovine
   reference genome), purple ([120]GSE81449 data set using ARS-UCD1.2
   bovine reference genome), dark green ([121]GSE107891 data set using UMD
   3.1 bovine reference genome; Moraes et al. (2018)), and blue
   ([122]GSE81449 data set using UMD 3.1 bovine reference genome; Geary et
   al. (2016)).

3.3. Identification of Candidate Differentially Co-Expressed Gene Modules for
High-Fertile and Sub-Fertile Animals

   The correlation between the HF and SF gene ranked expression values
   (FPKM) obtained from DESeq2 package and ranked connectivity, estimated
   through WGCNA package, was evaluated to estimate the network
   conservation in the combined dataset composed of the samples from
   [123]GSE107891 and [124]GSE81449. A correlation of 0.99 (p < 1 ×
   10^−200) was obtained for the ranked expression values and 0.4 (p-value
   < 1.7 × 10^−191) for the ranked connectivity between the HF and SF
   samples. These results indicate that both datasets are suitable to be
   analyzed together for the identification of co-expressed gene networks
   owing to the strong correlation observed. Thirty-two and 34
   co-expressed gene modules were identified for HF and SF, respectively.
   The identification of DcoEx modules (non-overlapping threshold =
   p-value < 4.59 × 10^−5) resulted in 44 modules, 22 for each condition
   ([125]Table S4). From them, we performed an additional filtering to
   select the modules with at least 1 of the 10 top-hub genes harboring
   fixed functional candidate variants (FCVs) identified in the variant
   calling process (HF = 157 FCV and SF = 214 FCV). The resulting modules,
   called candidate DcoExp modules, were ten and two in the HF and SF
   datasets, respectively ([126]Table 1). The list of enriched biological
   processes, diseases, and functions, as well as upstream regulators
   obtained for each candidate DcoExp module using IPA core module, can be
   found in [127]Table S5. It is important to highlight that, for both
   functional prioritization and canonical metabolic pathway enrichment
   analysis, we used human, mice, and rat annotations owing to the
   database availability and the more complete annotation status for these
   organisms. Consequently, it would be possible to observe small
   differences in the functions performed by some orthologous gene in
   different species. Overall, owing to the close evolutionary
   relationship between these species and cattle, a high level of
   similarities of functions performed by the orthologous genes in those
   species is expected.

Table 1.

   Differentially co-expressed gene modules (DcoExp), and their respective
   top 10 hub-genes between high-fertile (HF) and sub-fertile (SF)
   animals.
   Module Number of Genes Top Hub-Genes
   Cyan HF 204 ANKRD65, TIMM17A, ENSBTAG00000046047, ENSBTAG00000051586,
   RRP1, PWP2, PSMB5, LTBP2, C11H2orf81, ENSBTAG00000050675
   Darkgreen HF 147 SREBF2, DACT2, SDHA, PPFIA4, ENSBTAG00000047824,
   ENSBTAG00000052047, ELF3, ENSBTAG00000051421, NPTN, DHRS4
   Grey60 HF 177 IARS2, ENSBTAG00000053801, ENSBTAG00000033740,
   ENSBTAG00000048975, WRB, ENSBTAG00000052845, FAM214A, EIF2AK3, MPV17,
   MAPKAP1
   Lightgreen HF 136 CEP104, PKP1, PPP1R12B, ENSBTAG00000051541,
   ENSBTAG00000049133, ARF6, NUMB, SLC25A15, EEF1AKMT1, PARP4
   Purple HF 227 TIRAP, PYCR2, FMO2, MIIP, ENSBTAG00000049485,
   ENSBTAG00000054279, ENSBTAG00000052750, EAF1, SDR39U1, TINF2
   Red HF 237 STRADA, ENSBTAG00000054600, MDM4, MARK1, KLHL20, CACYBP,
   ABL2, RABIF, ENSBTAG00000051120, TMEM50B
   Saddlebrown HF 114 NME7, CCDC181, ENSBTAG00000054228, TCTEX1D2, IL20RB,
   ITGB2, ENSBTAG00000023186, F2RL2, OIP5, DUT
   Tan HF 172 ZMAT2, CPOX, IQCG, WDR53, ENSBTAG00000042475, HACL1, INTS14,
   ENSBTAG00000043377, REC8, RPS27L
   Turquoise HF 544 SLC45A3, IKBKE, PIGR, PRRX1, TNFRSF1B, SPSB1, CAMTA1,
   NOL9, TNFRSF25, ARHGEF16
   Green HF 240 CTCF, MIA3, PSEN2, CASZ1, SDF4, COLGALT2,
   ENSBTAG00000054874, ENSBTAG00000051836, ENSBTAG00000051084, GOLGB1
   Lightgreen SF 244 SOX13, TMEM81, COQ8A, ZBTB48, VWA1, EFHD2, PWP2,
   PLEKHO2, NRDE2, MALL
   Paleturqouise SF 130 MTHFR, ENSBTAG00000031572, JDP2, CCDC142, CD8A,
   DNAJC27, ENSBTAG00000053045, ENSBTAG00000048432, CABLES2, NECAB3
   [128]Open in a new tab

   Note: In bold, those genes harboring functional candidate variants are
   highlighted.

3.4. Functional Candidate Genes

   The functional candidate genes, top-hub genes from the candidate DcoExp
   modules, and upstream regulators of the DcoExp modules identified with
   the core analysis from IPA software, harboring FCV, are shown in
   [129]Table 2 and [130]Table 3. This candidate gene list is composed of
   a total of 52 genes: 24 top-hub genes from the candidate DcoExp modules
   harboring FCV, and 28 upstream regulators harboring FCV. Additionally,
   functional candidate genes were using a “guilt by association”-based
   prioritization analysis (ToppGene sofware) using a trained dataset of
   genes related to fertility obtained from GUILDify. The adjusted (FDR)
   overall p-value of the significantly prioritized candidate genes from
   the functional prioritization analysis is shown in [131]Table 2. It is
   important to highlight that the top hub-gene from Cyan module
   (ENSBTAG00000046047) with a fixed candidate variant was not analyzed
   using ToppGene owing to the lack of gene symbol annotation (even after
   the annotation by homology process). The functional prioritization
   resulted in 32 significantly prioritized functional candidate genes.
   The enriched terms for the traied dataset and the complete
   prioritization result for the candidate genes are provided in
   [132]Table S6. The FCV identified within the genomic coordinates of the
   prioritized genes and the corresponding functional consequence is shown
   in [133]Table S7. The relationship between the functional candidate
   genes and the candidate DcoExp modules is shown in [134]Figure 3. The
   turquoise HF module showed the highest number of related prioritized
   functional candidate genes (14), while the pale turquoise SF module
   showed the smallest number of related genes (1). The percentage of each
   functional consequence and the number of fixed variants per each
   prioritized functional candidate gene is shown in [135]Figure 4. In
   total, 100 FCV were identified in the 32 significantly prioritized
   functional candidate genes.

Table 2.

   Prioritization result for top hub-genes, harboring functional candidate
   variants, from differentially co-expressed modules between high-fertile
   (HF) and sub-fertile (SF) animals.
   Gene ID Mapped Candidate Modules Top Hub-Gene Candidate Modules Overall
   Adjusted p-Value (FDR 5%)
   ENSBTAG00000046047 Cyan HF Cyan HF NA *
   PWP2 Cyan HF, Lightgreen SF Cyan HF, Lightgreen SF 0.112
   DACT2 Darkgreen HF Darkgreen HF 0.028 **
   MIA3 Green HF Green HF 0.032 **
   COLGALT2 Green HF Green HF 0.109
   SKA2 Grey60 HF Grey60 HF 0.05
   MAPKAP1 Grey60 HF Grey60 HF 0.018 **
   PPP1R12B Lightgreen HF Lightgreen HF 0.026 **
   SLC25A15 Lightgreen HF Lightgreen HF 0.09
   EEF1AKMT1 Lightgreen HF Lightgreen HF 0.1
   PARP4 Lightgreen HF Lightgreen HF 0.052
   FMO2 Purple HF Purple HF 0.061
   MDM4 Red HF Red HF 0.018 **
   RABIF Red HF Red HF 0.109
   CCDC181 Saddlebrown HF Saddlebrown HF 0.085
   F2RL2 Saddlebrown HF Saddlebrown HF 0.028 **
   IQCG Tan HF Tan HF 0.022 **
   HACL1 Tan HF Tan HF 0.09
   PIGR Turquoise HF Turquoise HF 0.026 **
   ARHGEF16 Turquoise HF Turquoise HF 0.028 **
   NRDE2 Lightgreen SF Lightgreen SF 0.278
   IFT80 Turquoise HF, Paleturquoise SF Paleturquoise SF 0.028 **
   [136]Open in a new tab

   * NA = not applicable as it was not possible to obtain a gene symbol
   for this transcript; ** significant prioritization at a significance
   level of 0.05 after false discovery rate (FDR) correction.

Table 3.

   Prioritization result for the significant upstream regulators,
   harboring functional candidate variants, from differentially
   co-expressed modules between high-fertile (HF) and sub-fertile (SF)
   animals.
   Gene ID Mapped Candidate Modules Regulated Module Target Genes Adjusted
   p-Value (FDR 5%) Upstream Regulation Overall Adjusted p-Value (FDR 5%)
   for Prioritization
   EGFR Darkgreen HF Turquoise HF BIRC5, CCNA2, CXCL5, E2F1, EXOSC5,
   FKBP11, FOXP3, GFAP, HMGB3, HNRNPA1, IGBP1, ITGA6, MYBL2, PDK1, PROM1,
   PSEN1, RANBP1, SEMA7A, SKP2, TUBA4A, TUBB4A, VEGFA 0.003 0.002 *
   EGFR Darkgreen HF Cyan HF CCT5, EIF5A, EPS15, GADD45A, NUTF2, ODC1,
   PPIA, PSMB5, STAT3, TPST1 0.003 0.002 *
   ETV5 - Turquoise HF AQP5, CHSY1, KRT19, KRT7, MYB, RAB27A, TJP3, VEGFA
   0.016 0.009 *
   KLF4 - Turquoise HF CCND2, CRABP2, DUSP5, E2F1, HES1, KRT14, KRT19,
   KRT7, MSX2, PAX2, PROM1, VEGFA, WNT5A 0.032 0.006 *
   TCHP - Turquoise HF VEGFA 0.039 0.028 *
   COX7A2 - Turquoise HF STAR 0.039 0.147
   PIK3C2A - Turquoise HF VEGFA 0.039 0.009 *
   ARID4A - Turquoise HF E2F1, FOXP3 0.039 0.046 *
   ARID4A - Saddlebrown HF HOXB6 0.039 0.046 *
   ARID4A - Grey60 HF HOXB3, HOXB5 0.039 0.046 *
   CUX1 Cyan HF Turquoise HF CCNA2, LTF, RAB36, WNT5A 0.047 0.006 *
   PGR Turquoise HF Turquoise HF AK3, HES1, HPGD, ITGA6, MSX2, NPC1,
   PDGFA, PGR, PPM1H, PRRX1, TAT 0.047 0.006 *
   PGR Turquoise HF Cyan HF LIG1, MAP2K3, STAT3, TSC22D3, UCK2, URB2 0.047
   0.006 *
   IPO9 - Turquoise HF PTK2B 0.047 0.09
   DLG1 Red HF Tan HF KCNJ2 0.045 0.006 *
   AGER - Saddlebrown HF CCL4, TJP1 0.046 0.006 *
   SORT1 - Saddlebrown HF UBE2I 0.047 0.009 *
   HNRNPAB Purple HF Saddlebrown HF TJP1 0.047 0.031 *
   TBX6 Tan HF Red HF HES7 0.049 0.024 *
   HSF1 - Purple HF CCT4, FKBP4, HSF2, HSP90AA1, HSPA8, HSPH1, KNTC1,
   RELA, SPHK2, STIP1 0.003 0.009 *
   HSF1 - Darkgreen HF CSRP2, EFEMP1, INHBB, RPL22 0.003 0.009 *
   NUB1 - Red HF NEDD8 0.046 0.088
   DPH5 - Red HF NFKBIA, RELA 0.046 0.077
   LEPR Darkgreen HF Lightgreen HF ANGPTL4, CDK2, MMP7, PLP1, SOCS2 0.044
   0.006 *
   DYSF Tan HF Lightgreen HF CD48, DNAJB1, FCGR2B 0.044 0.012 *
   API5 Purple HF Lightgreen HF CDK2 0.044 0.077
   BCKDK - Lightgreen HF PLP1 0.047 0.041 *
   DUSP16 Turquoise HF Lightgreen HF VCAM1 0.047 0.014 *
   CHFR Cyan HF Grey60 HF PLK1 0.046 0.032 *
   PROM1 Turquoise HF Green HF DSG2 0.044 0.006 *
   ERN2 Grey60 HF Green HF XBP1 0.046 0.031 *
   UTP3 - Darkgreen HF IGLL1/IGLL5 0.043 0.112
   RDH10 - Darkgreen HF RDH5 0.045 0.032 *
   [137]Open in a new tab

   * Significant prioritization at a significance level of 0.05 after
   false discovery rate correction.

Figure 3.

   [138]Figure 3
   [139]Open in a new tab

   Relationship between the prioritized functional candidate genes
   (left-hand side) and the candidate differentially co-expressed modules
   (DcoExp, right-hand side). The genes in bold are the top hub-genes from
   the DcoExp modules, while the other genes are the significant upstream
   regulators genes. On the right-hand side, the HF modules are those
   identified in the high-fertile animals and the SF modules are those
   identified in the sub-fertile animals.

Figure 4.

   [140]Figure 4
   [141]Open in a new tab

   Percentage of each functional consequence annotated (A) and number of
   fixed variants per each functional candidate gene (B). The red bars in
   B represent genes where the fixed variants were identified in
   sub-fertile animals, while the green bars represent those variants
   fixed in high-fertile animals.

4. Discussion

4.1. Network Meta-Analysis for Identification of Differentially Expressed
Genes between High-Fertile and Sub-Fertile Animals

   The integration of multiple datasets and genetic information from
   different levels has been shown to be a powerful strategy for the
   identification of candidate genes in livestock
   [[142]3,[143]49,[144]50,[145]51]. The network meta-analysis performed
   in the present study reinforced the association between the expression
   profile of some genes with the high- or sub-fertile condition. The 14
   DEGs identified by the network meta-analysis (adjusted p-value < 0.1
   and |log(fold-change| > 2) were not identified as DE in the original
   results of both previous studies ([146]GSE107891 and [147]GSE81449)
   described by [[148]18,[149]20]. This result can be explained by the
   differences in the assemblies, alignment, and quantification
   algorithms, as well as the detection power improvement observed in the
   meta-analysis performed in the present study. However, 11 genes were
   shared with the [150]GSE107891 dataset using the new bovine reference
   genome ARS-UCD1.2 by CLC BIO genomics. In general, a small overlap was
   observed in all of the comparison scenarios ([151]Figure 2b). The
   Simpson’s paradox is a common phenomenon observed in biological
   analysis that can help to address these differences across the
   analyses. Briefly, the Simpson’s paradox occurs when results from
   combined datasets contradict those from the individual analysis. The
   impact of Simpson’s paradox was already discussed in several fields,
   such as gene expression network analysis [[152]52,[153]53,[154]54].
   Despite biological bases for the Simpson’s paradox still being poorly
   understood, there are some points that must be highlighted in the
   present study. First, despite both studies analyzing the transcriptome
   of the same tissue (endometrium), [155]GSE81449 analyzed a dataset
   obtained from endometrial biopsy from day 14 post-estrus, while
   [156]GSE107891 analyzed a dataset from day 17 post-estrus. This
   difference, together with the population effect, the new mapping
   strategy (different software and bovine reference genome), and the
   software used for DE analysis, can affect the transcriptional profile
   of each sample, consequently resulting in a strong study-dependent
   effect (as observed in the LRT analysis). Additionally, as shown in
   [157]Table S3, few genes were identified as DE between HF and SF
   animals in the original results of both studies, with the majority of
   these DEGs composed by non-annotated genes (LOCs) or genes with poorly
   understood biological function. Here, a larger number of DEGs were
   obtained in the individual datasets. However, the network meta-analysis
   resulted in a similar number of DEGs, but used a non-adjusted p-value
   threshold. These results reinforce the difficulty in validating and
   identifying functional candidate genes using the traditional gene by
   gene differential expression analysis when complex traits are analyzed
   without the comparison of extreme groups, such as HF and SF cows. For
   example, Moraes et al. (2018) [[158]18] identified a significantly
   larger number of DEGs when comparing high-fertile versus infertile
   animals. Consequently, new strategies must be applied to better
   understand the genetic differences between HF and SF animals.

4.2. Differentially Co-Expressed Modules and the Identification of Functional
Candidate Genes

   The combination of co-expression gene networks, the identification of
   top hub-genes, the identification of fixed genetic variants in HF or SF
   group of cows, and the functional analysis using the p-values and
   log2(FC) obtained in the network meta-analysis were used to prospect
   for functional candidate genes regulating the differences between
   fertile groups. The integration of different sources of biological
   information is a powerful tool for the identification of functional
   candidate genes, which has already resulted in interesting results in
   livestock species [[159]3,[160]49,[161]50,[162]51,[163]55,[164]56].
   Here, in order to avoid a massive discussion about all the results
   obtained, only the main achievements will be addressed. Thirty-two
   prioritized functional candidate genes (22 upstream regulators and 10
   top hub-genes), related to 11 candidate DcoExp modules, were
   identified.

   The upstream regulators genes of candidate DcoExp modules, harboring
   functional candidate variants, were follows: DLG1 (Tan HF), AGER
   (Saddlebrown HF), SORT1 (Saddlebrown HF), HNRNPAB (Saddlebrown HF),
   TBX6 (Red HF), HSF1 (Purple HF and Darkgreen HF), LEPR (Lightgreen HF),
   DYSF (Lightgreen HF), BCKDK (Lightgreen HF), CHFR (Grey60 HF), ERN2
   (Green HF), and RDH10 (Darkgreen HF). Among these genes, it was
   possible to identify relevant biological processes associated with
   fertility, such as oocyte polarization during maturation (DLG1),
   disorders of the müllerian ducts (TBX6), regulation of affects gonads
   or gonadotrophs (LEPR), and reproductive success (HSF1)
   [[165]57,[166]58,[167]59,[168]60,[169]61,[170]62].

   In the HF group, the turquoise module showed the largest number of
   related prioritized functional candidate genes, 14 in total. This
   module is enriched for relevant biological processes (FDR < 0.05), such
   as regulation of inositol metabolism (3-phosphoinositide degradation,
   superpathway of inositol phosphate compounds, D-myo-inositol
   (1,4,5,6)-tetrakisphosphate biosynthesis, and so on) and
   estrogen-mediated S-phase entry. The inositol metabolism and the cell
   cycle mediated by estrogen activity are very important signaling
   pathways associated with the cellular proliferation in the uterus
   [[171]63]. Genes located within these modules such as DUSP16, DUSP2,
   and PIK3AP1 are directly associated with the regulation of
   phosphatidylinositol activity [[172]64,[173]65].

   The top hub-genes from turquoise HF module, harboring fixed variants in
   HF or SF animals, and prioritized in the functional analysis, were
   ARHGEF16, IFT80, and PIGR. Rho Guanine nucleotide exchange factor 16
   (ARHGEF16) codifies an ELMO1 interacting protein responsible to promote
   the clearance of apoptotic cells in a RhoG-dependent and
   Dock1-independent manner [[174]66]. The relationship between ARHGEF16
   and fertility has yet to be described in the literature, such as by
   Elliott et al. (2010) and Gong et al. (2018). Interestingly, ELMO1
   knockout mice presented multinucleated giant cells, uncleared apoptotic
   germ cells, and decreased sperm output in Sertoli cells owing to the
   phagocytic deficiency [[175]67,[176]68]. However, there is no link
   between Elmo1 and female fertility currently described. The
   intraflagellar transport 80 (IFT80) codifies a intraflagellar transport
   protein responsible for regulating the Jeune asphyxiating thoracic
   dystrophy, osteoblast, and chondrocyte differentiation
   [[177]69,[178]70,[179]71]. Despite the absence of a direct link between
   IFT80 and fertility, the regulation of osteoblast and chondrocyte is
   essential for embryo survival [[180]72,[181]73]. Interestingly, VEGFA,
   another gene located within the turquoise HF module, is crucial for
   chondrocyte survival and bone development in the embryonic stage
   [[182]74]. The polymeric immunoglobulin receptor (PIGR) encodes a
   poly-g receptor in epithelial cells responsible for controlling the
   transcytosis process that can be regulated by steroids, such as
   estrogen [[183]75,[184]76]. In uterine epithelial cells, PIGR is
   responsible for transporting polymeric IgA. In rats, the
   transcriptional levels of PIGR are higher in the estrous when compared
   with proestrus or diestrus [[185]77]. Variations in the immunoglobulin
   diversity and quantity in the uterus are observed during the ovulatory
   process, indicating a key regulatory role of ovarian hormones.
   Consequently, this suggests there is an impact on fertility
   [[186]78,[187]79]. Additionally, proteomic analysis of fertile and
   sub-fertile hens suggested that the levels of PIGR decreased 24 h after
   insemination in the uterine fluid, with the main location in the
   uterovaginal sperm storage tubules (SST), suggesting a response caused
   by the sperm arrival. These results suggest that PIGR responds to sperm
   arrival in both scenarios in sub-fertile hens, which could be a result
   of the higher transport activity of IgA and secretory complex to the
   lumen of SST [[188]80]. It is important to highlight that PIGR was the
   second candidate gene with the highest number of fixed functional
   candidate variants. All the variants were identified as fixed in HF
   animals. Five missense variants were identified as fixed in the PIGR
   gene, where four were previously described (rs41790811, rs41790822,
   rs41790826, and rs41580873) and one is a new variant (c.627A > C or
   p.Ile162Arg). Two of these variants (rs41790822 and rs41790826) have a
   predicted deleterious effect based on the Sorting Intolerant from
   Tolerant (SIFT) score (0.04 and 0.05, respectively). The identification
   of missense variants with predicted deleterious effect in the HF
   animals might corroborate the hypothesis that higher activity levels of
   PIGR are associated with the sub-fertile phenotype in hens, as proposed
   by Riou et al. (2019) [[189]80].

   The significant upstream regulators genes identified by the IPA core
   analysis associated with the turquoise HF module and harboring fixed
   functional candidate variants were PGR, EGFR, PIK3C2A, CUX1, TCHP,
   ETV5, KLF4, and ARID4A. The progesterone receptor (PGR) is crucial for
   the initiation of pregnancy and subsequent preservation of uterus
   health [[190]81]. Consequently, PGR is an interesting marker for
   uterine receptivity during implantation [[191]82]. In this study, three
   fixed SNPs in HF animals were identified in the downstream region of
   PGR. Two of those SNPs were already described (rs208289597 and
   rs208479533) and one is new (g.- 1458T > C). It is important to
   highlight that the PGR was identified as an upstream regulator of the
   turquoise module, however, it is also one of the genes that composes
   this module. Additionally, PGR was also identified as an upstream
   regulator of the cyan HF module. Epidermal growth factor receptor
   (EGFR) is a member of the epidermal growth factor family that plays
   crucial roles in the regulation of female fertility [[192]83]. Among
   its functions, EGFR regulates puberty, oocyte maturation, uterine
   development, embryo implantation, and placental overgrowth [[193]83].
   Phosphatidylinositol-4-phosphate 3-kinase catalytic subunit type 2 α
   (PIK3C2A) is a member of the PI3-kinases acting in cell proliferation,
   oncogenic transformation, cell survival, cell migration, and
   intracellular protein trafficking [[194]84]. The phosphatidylinositol 3
   kinase (PI3K) pathway plays a crucial role in the control of mammalian
   oocyte growth and early follicular development [[195]85]. Cut like
   homeobox 1 (CUX1) codifies a DNA binding protein related to the control
   of cell cycle progression, cell motility, and invasion [[196]86].
   Recently, SNPs on CUX1 were associated with cow and heifer conception
   rate in Holstein cattle [[197]87]. Trichoplein keratin filament binding
   (TCHP) is associated with cytoskeleton remodeling neurofilaments and
   vulvar sarcoma according to MalaCards database (ID: VLV038). However,
   there is no direct evidence linking this gene with fertility status.
   ETS variant 5 (ETV5) is a transcription factor that plays crucial roles
   in male fertility, acting in the spermatogonial stem cell self-renewal
   and maintenance of spermatogonial stem cell niche
   [[198]88,[199]89,[200]90]. Female knockout mice for ETV5 are infertile
   owing to a decreased ovulation and no interest in mating [[201]91].
   Kruppel like factor 4 (KLF4) is a transcription factor required for
   normal development of the barrier function of skin and with the ground
   state of pluripotent stem cells [[202]92]. Regarding female fertility,
   KLF4 mediates the anti-proliferative effects of progesterone during the
   G0/G1 arrest in endometrial epithelial cells [[203]93]. AT-rich
   interaction domain 4A (ARID4A) is a nuclear binding protein that acts
   as a transcriptional coactivator of androgen receptor and
   retinoblastoma-binding protein during the regulation of Sertoli cell
   function, consequently playing a crucial role in male fertility
   [[204]94]. However, disregarding the impact of target therapies for
   ovarian and endometrial cancer, there is no direct link between ARID4A
   and female fertility [[205]95]. Interestingly, dual specificity
   phosphatase 16 (DUSP16) and prominin 1 (PROM1) are genes within the
   turquoise HF modules that act like top hub-genes in other candidate
   DcoExp modules (ligthgreen HF and green HF, respectively). The function
   of DUSP16 was described previously, while PROM1 is a pentaspan
   transmembrane glycoprotein that was already identified as DE during the
   window implantation [[206]96]. It is important to highlight that, even
   without a direct discussion addressed, all these genes harbour fixed
   variants identified in HF or SF animals.

   The other top hub-genes of candidate DcoExp modules in the HF group,
   harboring fixed functional variants, were as follows: IQCG (Tan H),
   MAPKAP1 (Grey60 HF), MDM4 (Red HF), F2RL2 (Saddlebrown HF), MIA3 (Green
   HF), and PPP1R12B (Lightgreen HF). IQ motif containing G (IQCG) was the
   gene with the largest number of fixed variants (14 variants, all mapped
   in the upstream region of the gene and fixed in HF animals). IQ motif
   containing G is a key regulator of ciliary/flagellar motility that
   plays a crucial role in the formation of the sperm flagellum and
   spermiogenesis in mice [[207]97,[208]98]. However, to the best of our
   knowledge, there is no direct link between female fertility and IQCG.
   The MAPK associated protein 1 (MAPKAP1) is a member of FRAP1 complex.
   The disruption of the FRAP1 complex leads to post implantation
   lethality caused by the impaired cell proliferation and hypertrophy of
   embryonic disc and trophoblast [[209]99]. The MAPKAP1 gene is one of
   the top hub-genes of the Grey60 HF module, which is enriched for
   several fertility related biological processes, such as cell cycle
   regulation, PI3K signaling pathway, and triacylglycerol metabolism.
   MDM4 regulator of P53 (MDM4) encodes a protein responsible for
   inhibiting p53 function. Regarding fertility, MDM4 variants are
   associated with the susceptibility of ovarian and endometrial cancer
   [[210]100]. Coagulation factor II thrombin receptor like 2 (F2RL2) is a
   transmembrane G protein-coupled cell surface receptor that is
   differentially expressed (down-regulated) during pregnancy [[211]101].
   Additionally, the Saddlebrown HF module is enriched for
   fertility-related processes such as androgen signaling, sperm motility,
   and estrogen-dependent breast cancer signaling. To the best of our
   knowledge, MIA SH3 domain ER export factor 3 (MIA3) and protein
   phosphatase 1 regulatory subunit 12B (PPP1R12B) do not have a direct
   link with fertility.

   Interestingly, the only candidate DcoExp module identified in the SF
   animals maintained after the functional analysis was the pale turquoise
   SF. The top-hub gene identified in this module harboring functional
   candidate variants was IFT80. This gene was also identified as a
   hub-gene in the turquoise HF module and its association with fertility
   was mentioned above. The fixed variant identified in IFT80 is mapped in
   a splice donor site and it was fixed in the SF animals. The analysis of
   the enriched canonical pathways, diseases, and functions enriched for
   this module highlighted a specialization for cell cycle, embryonic
   development, and cell death and survival. Additionally, an interesting
   overlap between the canonical pathways enriched in the turquoise HF and
   the pale turquoise SF is observed regarding the metabolism of inositol,
   which, as described before, is related to fertility status. The pale
   turquoise SF was also enriched for ketogenesis and ketolysis (FDR <
   0.05), which are relevant processes in animals subjected to a high
   selective pressure for production traits. Within this module,
   3-hydroxybutyrate dehydrogenase 2 (BDH2) is the main gene associated
   with ketone body metabolism [[212]102,[213]103]. Interestingly, a
   polymorphism in BDH2 was associated with days open and services per
   conception in Holstein cows, reinforcing a possible role of this gene
   with fertility traits in cattle [[214]104].

5. Conclusions

   The results obtained here reinforce the increase in detection power for
   functional candidate genes in the context of data integration. The
   approach applied in the present study, combining co-expression gene
   networks, identification of top hub-genes, identification of fixed
   genetic variants in HF or SF cows, and functional analyses, resulted in
   the identification of highly relevant functional gene networks and
   candidate genes associated with fertility status in beef cattle. These
   results contribute to the better understanding of the genetic
   architecture of high- and sub-fertility cows. Additionally, candidate
   functional variants were identified uniquely fixed in one of the
   fertility groups (HF or SF animals) in the functional prioritized
   genes. Consequently, these variants could be used in genetic selection
   programs to validate the contribution of these variants in the
   fertility status. In the long-term, the results obtained here may help
   to reduce the frequency of subfertility in beef herds, helping to
   reduce the economic losses caused by this condition.

Acknowledgments