Abstract

   Boar taint (BT) is an offensive odour or taste observed in pork from a
   proportion of non-castrated male pigs. Surgical castration is effective
   in avoiding BT, but animal welfare issues have created an incentive for
   alternatives such as genomic selection. In order to find candidate
   biomarkers, gene expression profiles were analysed from tissues of
   non-castrated pigs grouped by their genetic merit of BT. Differential
   expression analysis revealed substantial changes with log-transformed
   fold changes of liver and testis from −3.39 to 2.96 and −7.51 to 3.53,
   respectively. Co-expression network analysis revealed one module with a
   correlation of −0.27 in liver and three modules with correlations of
   0.31, −0.44 and −0.49 in testis. Differential expression and
   co-expression analysis revealed candidate biomarkers with varying
   biological functions: phase I (COQ3, COX6C, CYP2J2, CYP2B6, ACOX2) and
   phase II metabolism (GSTO1, GSR, FMO3) of skatole and androstenone in
   liver to steroidgenesis (HSD17B7, HSD17B8, CYP27A1), regulation of
   steroidgenesis (STARD10, CYB5R3) and GnRH signalling (MAPK3, MAP2K2,
   MAP3K2) in testis. Overrepresented pathways included “Ribosome”,
   “Protein export” and “Oxidative phosphorylation” in liver and “Steroid
   hormone biosynthesis” and “Gap junction” in testis. Future work should
   evaluate the biomarkers in large populations to ensure their usefulness
   in genomic selection programs.

Introduction

   Boar taint (BT) is an offensive odour that occurs in cooked or heated
   pork from a proportion of non-castrated male pigs. Two compounds are
   mainly responsible for BT: skatole (3-methyl-indole) and androstenone
   (5α-androst-16-ene-3-one)^[32]1,[33]2. Skatole is a by-product of
   enzymatic breakdown of L-tryptophan by bacterial action in the large
   intestine and subsequently absorbed into the blood^[34]3. Androstenone
   is a steroid produced in the testis where it is transported by blood to
   the salivary gland and functions as a mating pheromone^[35]4. It is
   synthesised from pregnenolone in association with sexual maturity
   together with other testicular steroids, such as testosterone and
   oestrogens^[36]5,[37]6. Due to their lipophilic properties, the
   compounds accumulate in the adipose tissue and concentrations are
   correlated with the level of sexual maturity at the time of
   slaughter^[38]7,[39]8 and/or the intensity of liver
   degradation^[40]9,[41]10. A distinct relationship between androstenone
   and skatole exists: high concentration of androstenone prevents
   breakdown of skatole by inhibiting enzymes responsible for skatole
   metabolism and causes accumulation in adipose tissues^[42]11.
   Furthermore, androstenone itself is an active contributor to BT, and it
   has been proposed that elevated levels of androstenone are primary to
   elevated levels of skatole in the hierarchy of BT development^[43]12.
   Threshold levels for skatole and androstenone concentrations in fat
   considered to cause BT are 0.5 to 1 µg/ g for androstenone and 0.20 to
   0.25 µg/g for skatole^[44]2.

   Boar taint is the main factor preventing the production of
   non-castrated male pigs in many countries^[45]13 and often leads to
   depreciation of the carcass, even though their production is associated
   with a number of significant economic advantages including increased
   efficiency, leaner carcasses and lower faecal and urinary nitrogen
   losses^[46]14. Furthermore, production of non-castrated males is often
   viewed as advantageous in the context of animal welfare due to the
   absence of surgical castration. Indeed, this practice is often
   performed without anaesthesia or analgesia which results in pain and
   stress for the animal^[47]15, but the practice is justified by its
   alleviation of aggression and sexual activity which can result in leg
   fractures and skin damage^[48]16. Surgical castration is also an
   effective technique for avoiding BT, but increased attention to the
   animal welfare issues has led to a voluntary initiative to end the
   practice in the European Community from January 1, 2018^[49]17.

   In order to avoid BT in the production of non-castrated males without
   surgical castration, two overall solutions have been proposed: (i)
   chemicals and/or chemotherapeutics such as immunocastration and (ii)
   optimised animal breeding programs. For the first solution,
   immunocastration by Improvac^® (Pfizer Ltd, NY, USA) has been found to
   be an effective strategy against BT with no detrimental effects on
   carcass and meat quality^[50]18–[51]23 and with environmental benefits
   such as increased or at least similar growth rates and feed conversion
   efficiencies between immunised and surgically castrated pigs^[52]24.
   However, the European Food Safety Authority (EFSA) has anticipated poor
   public acceptance of immunocastration^[53]25,[54]26 due to the possible
   perception of a link with hormone treatment and possible harmful
   residues^[55]24,[56]27.

   For the latter solution, genomic-based breeding programs have been
   proposed. As skatole and androstenone are moderate to highly heritable
   traits^[57]9,[58]11,[59]28–[60]30, reduction of BT would be possible
   through breeding programs. By this strategy, breeding organisations
   would apply genomic selection on individuals with desirable genotypes
   that are predisposed to low genetic merit of BT compounds without the
   need for trait recording^[61]31. Previous work has indeed established
   low and mostly favourable genetic correlations between BT compounds and
   production traits such as the quality of meat without negatively
   affecting important male fertility trait^[62]32. Thus, selection of low
   genetic merit of BT can be deemed an effective and sustainable
   solution. By employing molecular genetic methods, identification of
   male pigs with low genetic merit of BT is possible, before they
   actually develop the condition or have their traits recorded^[63]33.
   However, this requires knowledge on highly predictive key genes
   associated with the trait of interest. In turn, these genes can be
   applied as biomarkers, e.g. Mazzoni, et al.^[64]34 and Salleh, et
   al.^[65]35. Finding biomarkers with strong predictive powers for a
   trait can be greatly enhanced with omics technologies, such as
   transcriptomics^[66]36. By measuring changes in gene expression
   profiles, analysis of differentially expressed (DE) genes associated
   with BT has been performed in a variety of breeds and experimental
   designs (as reviewed by Zadinová, et al.^[67]37). While many studies
   employed microarrays to obtain gene expression profiles, a few employed
   RNA sequencing (RNA-Seq) which provides far more precise measurements
   of transcript expression levels and their isoforms than traditional
   technologies^[68]38. Another powerful analysis within transcriptomics
   is construction of co-expression networks which reveal co-expressed
   (CE) genes with a high connectivity (“hub genes”). Hub genes are often
   regulatory genes, with major impact on genetic networks and may affect
   the trait of interest^[69]39–[70]41. To the authors knowledge, no
   studies have applied a systems biology approach^[71]36 to genetic merit
   of BT. By this approach, findings from DE and CE analyses from RNA-Seq
   data are integrated in order to enhance discriminatory power of highly
   associated genes, useful as candidate biomarkers within animal
   breeding.

   The aims of this study were: (i) to detect important DE genes in gene
   expression profiles of liver and testis in male pigs associated with
   divergent genetic merit of BT, (ii) to build co-expression networks by
   application of the Weighted Gene Co-expression Network Analysis (WGCNA)
   method^[72]42 and extract important hub genes, (iii) to integrate
   results and extract candidate biomarkers associated with genetic merit
   of BT and evaluate their potential within animal breeding (Fig. [73]1).

Figure 1.

   [74]Figure 1
   [75]Open in a new tab

   Illustration of the study design. From a total of 114 Landrace male
   pigs, 48 pigs were grouped as having low, medium and high boar taint
   (BT) according to estimated breeding values (EBVs) of skatole and human
   nose score of their respective sires. Each group consisted of 16 pigs.
   At 100 kg weight, they were slaughtered and two tissues were sampled.
   Of these, liver and testis tissues were subjected to paired-end RNA
   sequencing. These were analysed for differentially expressed (DE) genes
   and co-expressed (CE) genes which were extracted from co-expression
   networks by a range of statistical methods. In turn, this led to the
   discovery of candidate biomarkers for BT and BT related pathways.

Results

Genetic merit of boar taint

   The animal model in this study comprised of commercial Danish Landrace
   male pigs (n = 114). The pigs were produced from sires with known
   genetic merit of BT assessed from estimated breeding values (EBVs) of
   skatole concentrations in fat and human nose scores (HNS). The genetic
   merit of BT was defined as the summarised EBVs calculated as the sum of
   EBVs of skatole and HNS obtained from the respective sire of each pig.
   In brief, EBVs are a measure of the relative genetic performance
   compared with a contemporary group of a trait and used as an industry
   standard in animal breeding. The full concept, definition and methods
   used to calculate EBVs are given in Bourdon and Bourbon^[76]43. Based
   on the distribution of summarised EBVs, a total of 48 pigs (16 low, 16
   medium and 16 high) were selected and used in the analysis,
   representing a low, medium and high genetic merit of BT. The means of
   the summarised EBVs (±standard deviation) for each group were 0.71
   (±0.19), −0.01 (±0.09) and −0.38 (±0.17) for high, medium and low
   genetic merit of BT, respectively. A list of selected animals and their
   EBVs are available in Supplementary file [77]1.

RNA sequencing of liver and testis tissue

   RNA-Seq was performed on liver and testis tissue of the selected pigs.
   Post-mapping quality control revealed that a mean of 33.25 million
   reads throughout the 96 samples had been successfully mapped to the
   reference genome. The number of reads (±SD) aligned to genes in liver
   and testis tissue were 7.76 (±1.61) million and 7.04 million (±2.44),
   respectively. Quality control revealed a distribution of 50.14% exonic
   reads, 41.29% intergenic reads and 8.57% intronic reads in liver and
   47.42% exonic reads, 37.89% intergenic reads and 14.79% intronic reads
   in testis. After filtering of genes with expression levels equal to or
   fewer than five counts, the number of genes included in the
   differential expression and co-expression analyses were 10,545 for
   liver and 12,731 for testis, respectively.

Identification of differentially expressed genes

   Differential expression analysis of liver and testis by contrast of
   high vs low genetic merit of BT from the three-group study design
   revealed 507 and 5,943 differentially expressed (DE) genes
   (FDR < 0.05), respectively. An additional filtering step which compared
   expression profiles for each DE gene throughout the three groups
   revealed that 241 and 2,315 genes were significantly (P < 0.05)
   different across groups in liver and testis, respectively. The
   log-transformed fold change of the DE genes in liver and testis ranged
   from −3.39 to 2.96 and −7.51 to 3.53, showing substantial gene
   expression differences. The top significant DE genes were MORN2,
   AASDHPPT, SPCS3, SEC. 11  C and NDUFC1 in liver and LAMP1, TNPO1,
   TOPORS, GFPT1 and COX4I1 in testis. The top 25 most significant DE
   genes found in liver and testis are presented in Table [78]1 and
   Table [79]2, respectively. Full result tables are included in
   Supplementary file [80]2.

Table 1.

   Top 25 differentially expressed (DE) genes from liver.
   Gene Description logFC^a FDR^b K-W P.
   MORN2 MORN repeat containing 2 −0.83 0.007 0.0043
   AASDHPPT aminoadipate-semialdehyde dehydrogenase-phosphopantetheinyl
   transferase −0.62 0.009 0.011
   SPCS3 signal peptidase complex subunit 3 −0.64 0.009 0.073
   SEC. 11 C SEC11 homolog C, signal peptidase complex subunit −0.73 0.009
   0.024
   NDUFC1 NADH:ubiquinone oxidoreductase subunit C1 −0.76 0.009 0.0079
   ZNF181 zinc finger protein 181 −0.58 0.009 0.0012
   ARL5A ADP ribosylation factor like GTPase 5A −0.66 0.009 0.0023
   NEURL1 neuralized E3 ubiquitin protein ligase 1 −1.30 0.009 0.0053
   EIF1AY eukaryotic translation initiation factor 1A, Y-linked −0.60
   0.009 0.011
   SGPP1 sphingosine-1-phosphate phosphatase 1 −0.74 0.009 0.0044
   PLGRKT plasminogen receptor, C-terminal lysine transmembrane protein
   −0.65 0.009 0.024
   SFR1 SWI5-dependent homologous recombination repair protein 1 −1.28
   0.009 0.0049
   ATP5H ATP synthase, H+ transporting, mitochondrial Fo complex subunit D
   −0.86 0.009 0.019
   KBTBD3 kelch repeat and BTB (POZ) domain containing 3 −0.69 0.009
   0.0068
   DUS4L dihydrouridine synthase 4-like −0.59 0.009 0.02
   MED21 mediator complex subunit 21 −0.62 0.009 0.015
   COMMD8 COMM domain containing 8 −0.74 0.010 0.016
   SNX14 sorting nexin 14 −0.61 0.010 0.058
   ACTR6 ARP6 actin-related protein 6 homolog (yeast) −0.60 0.010 0.056
   MOCS2 molybdenum cofactor synthesis 2 −0.79 0.010 0.039
   PDCD10 programmed cell death 10 −0.62 0.010 0.018
   ZBTB8OS zinc finger and BTB domain containing 8 opposite strand −0.58
   0.011 0.068
   METTL23 methyltransferase like 23 −0.62 0.011 0.014
   NIPSNAP3B nipsnap homolog 3B (C. elegans) −0.49 0.011 0.014
   IKBIP IKBKB interacting protein −1.03 0.011 0.011
   [81]Open in a new tab

   ^aLogFC = the log2 transformed fold change obtained from the limma
   procedure.

   ^bFDR = the Benjamini-Hochberg (FDR) procedure adjusted P value
   obtained from the limma procedure.

   ^cK-W P. = the P value obtained from Kruskal Wallis test of expression
   profile from each gene throughout the three groups.

   Abbreviations: LogFC log transformed fold change, FDR False Discovery
   Rate, K-W Kruskal Wallis.

Table 2.

   Top 25 differentially expressed (DE) genes from testis.
   Gene Description logFC^a FDR^b K-W P.
   LAMP1 lysosomal-associated membrane protein 1 0.73 8.077E-06 0.015
   TNPO1 transportin 1 −0.81 8.077E-06 1.20E-05
   TOPORS topoisomerase I binding, arginine/serine-rich, E3 ubiquitin
   protein ligase −0.56 8.077E-06 7.10E-06
   GFPT1 glutamine–fructose-6-phosphate transaminase 1 −0.82 1.139E-05
   0.00019
   COX4I1 cytochrome c oxidase subunit IV isoform 1 0.77 1.139E-05 0.059
   RLF rearranged L-myc fusion −0.70 1.139E-05 7.70E-05
   C17orf62 chromosome 17 open reading frame 62 0.91 1.139E-05 1.20E-05
   KIAA1109 KIAA1109 −1.24 1.139E-05 0.00063
   RAB5C RAB5C, member RAS oncogene family 0.67 1.139E-05 0.008
   IRGC immunity-related GTPase family, cinema 0.99 1.139E-05 0.023
   LRPAP1 LDL receptor related protein associated protein 1 0.52 1.181E-05
   0.015
   MZT2A mitotic spindle organizing protein 2 A 0.93 1.181E-05 0.0054
   ZNF3 zinc finger protein 3 −1.59 1.181E-05 0.00043
   KANSL1L KAT8 regulatory NSL complex subunit 1 like −1.22 1.182E-05
   0.00056
   EXOC5 exocyst complex component 5 −1.05 1.302E-05 0.00043
   MAF1 MAF1 homolog, negative regulator of RNA polymerase III 0.57
   1.302E-05 3.00E-04
   PARPBP PARP1 binding protein −1.54 1.302E-05 0.00051
   HLTF helicase-like transcription factor −1.04 1.372E-05 0.00021
   ATR ATR serine/threonine kinase −0.62 1.372E-05 5.50E-05
   ZYG11B zyg-11 family member B, cell cycle regulator −1.19 1.372E-05
   0.00077
   HNRNPAB heterogeneous nuclear ribonucleoprotein A/B 0.64 1.372E-05
   0.015
   PFN1 profilin 1 0.62 1.372E-05 0.0052
   EDF1 endothelial differentiation-related factor 1 1.45 1.372E-05 0.029
   SLC25A39 solute carrier family 25 member 39 0.86 1.372E-05 0.056
   CREBRF CREB3 regulatory factor −1.41 1.372E-05 0.003
   [82]Open in a new tab

   ^aLogFC = the log2 transformed fold change obtained from the limma
   procedure.

   ^bFDR = the Benjamini-Hochberg (FDR) procedure adjusted P value
   obtained from the limma procedure.

   ^cK-W P. = the P value obtained from Kruskal Wallis test of expression
   profile from each gene throughout the three groups.

   Abbreviations: LogFC log transformed fold change, FDR False Discovery
   Rate, K-W Kruskal Wallis.

Gene functional enrichment analysis

   Gene functional enrichment analysis revealed a total of 39 and 421
   significantly enriched (Adj. P-val. < 0.05) gene ontology (GO) terms in
   liver and testis, respectively (Supplementary file [83]3). Subsequent
   filtering of redundant GO terms by semantic similarity further reduced
   the amount in order to visually represent the most enriched terms in
   each tissue (Fig. [84]2). In liver, “Peptide metabolic processes” (Adj.
   P-val. = 1.1 × 10^−4) were the most significantly enriched GO terms of
   Biological Process (BP), “Mitochondrial proton-transporting ATP
   synthase complex, coupling factor F(o)” (Adj. P-val. = 0.9 × 10^−7)
   were the most significantly enriched Cellular Component (CC) terms
   and”Transferase activity, transferring one-carbon groups” (Adj.
   P-val. = 1.8 × 10^−2) were the most significantly enriched Molecular
   Function (MF) terms (Fig. [85]2a). In testis, the GO terms “Organelle
   organization” (Adj. P-val. = 1.8 × 10^−50) were the most significantly
   enriched BP, “Intracellular part” (Adj. P-val. = 4.4 × 10^−115) were
   the most significantly enriched CC and finally, “Transmembrane
   signalling receptor activity” (Adj. P-val. = 1.0 × 10^−61) were the
   most significantly enriched MF (Fig. [86]2b). A total of 11 GO terms
   were found to be significantly (Adj. P-val. < 0.05) enriched in both
   tissues which comprised 5 BP and 6 CC terms. According to results
   obtained from each individual tissue, the most significantly enriched
   common GO terms were “Mitochondrion” (Adj. P-val. = 3.2 × 10^−5),
   “Peptide metabolic process” (Adj. P-val. = 1.1 × 10^−4) and “Cellular
   amide metabolic process” (Adj. P-val. = 1.9 × 10^−4) in liver and
   “Catalytic complex” (Adj. P-val. = 1.0 × 10^−25), “Transferase complex”
   (Adj. P-val. = 5.0 × 10^−20) and “Ribonucleoprotein complex biogenesis”
   (Adj. P-val. = 1.9 × 10^−5) in testis.

Figure 2.

   [87]Figure 2
   [88]Open in a new tab

   Bar plot of enriched gene ontology (GO) terms of (a) liver and (b)
   testis. Y-axis indicates the GO term, X-axis indicates the adjusted
   P-value. Abbreviations: BP Biological process, CC cellular component,
   MF molecular function.

   Gene functional enrichment analysis of a fixed number (n = 507) of the
   top DE genes from each tissue revealed the MF GO terms “Transferase
   activity, transferring one-carbon groups”,
   “S-adenosylmethionine-dependent methyltransferase activity”,
   “Cytochrome-C oxidase activity” with child GO term “Hydrogen ion
   transmembrane transporter activity” and “Peptidase activator activity”
   with child GO term “Cysteine-type endopeptidase regulator activity
   involved in apoptotic process” significantly enriched (Adj.
   P-val. < 0.05) in liver. Furthermore, “Translation factor activity, RNA
   binding”, “Helicase activity”, “Ubiquitin-protein transferase
   activity”, “ Lysine-acetylated histone binding”, “Peptide
   N-acetyltransferase activity”, “P-P-bond-hydrolysis-driven
   transmembrane transporter activity” and “NAD + binding” were
   significantly enriched (Adj. P-val. < 0.05) in testis. The MF GO terms
   from both tissues were visualised with the DE genes mapped to their
   corresponding GO terms and the expression levels in the “low BT” and
   “high BT” groups for liver (Fig. [89]3a) and testis (Fig. [90]3b).

Figure 3.

   [91]Figure 3
   [92]Open in a new tab

   Network visualisation of molecular function gene ontology terms found
   in (a) liver and (b) testis obtained by from a fixed amount of the top
   507 differentially expressed genes. Each term is represented by a
   circle whose diameter is negatively correlated with the adjusted P
   value (pV). Each node represents a gene where the expression levels are
   indicated in the circle where the green group (left side) represents
   the expression level in low BT group and the red group (right side)
   represents the expression level in high BT group.

Pathway enrichment analysis

   Pathway analysis revealed six and 20 significantly enriched (Adj.
   P-val. < 0.05) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways
   in liver and testis, respectively (Supplementary file [93]4). The top
   three significantly enriched pathways found were “Ribosome” (Adj.
   P-val. = 5.0 × 10^−4), “Protein export” (Adj. P-val. = 6.4 × 10^−4) and
   “Alzheimer’s disease” (Adj. P-val. = 4.3 × 10^−3) in liver and
   “Neuroactive ligand-receptor interaction” (Adj. P-val. = 1.1 × 10^−9),
   “Cytokine-cytokine receptor interaction” (Adj. P-val. = 1.1 × 10^−6)
   and “Thyroid hormone signalling pathway” (Adj. P-val. = 7.8 × 10^−5) in
   testis. A selection of pathways deemed biologically interesting were
   visualised with their DE genes mapped to their corresponding pathways
   and the expression levels in the “low BT” and “high BT” in liver
   (Fig. [94]4a) and testis (Fig. [95]4b). These pathways were “Ribosome”,
   “Oxidative phosphorylation”, “Gluthathione metabolism”, “Protein
   export”, “Folate biosynthesis” and “Drug metabolism” in liver and
   “Taste transduction”, “GnRH signalling pathway”, “Drug metabolism”,
   “Steroid hormone biosynthesis” and “Gap junction” in testis.

Figure 4.

   [96]Figure 4
   [97]Open in a new tab

   Network visualisation of a selection of the most biologically
   interesting and overrepresented pathways in (a) liver and (b) testis.
   Each pathway is represented by a circle whose diameter is negatively
   correlated with the adjusted P-value (pV) of the given term. Each node
   represents a gene where the expression levels are indicated in the
   circle where the green group (left side) represents the expression
   level in low BT group and the red group (right side) represents the
   expression level in high BT group.

Co-expression network analysis

   In liver, no modules were found to be significantly (P < 0.05)
   associated with genetic merit of BT, but one module (lightgreen)
   containing 593 genes was found to have a strong tendency of association
   with genetic merit of BT (P = 0.06) with a negative correlation
   (module-trait relationship (r) = −0.27). Hence, the module was selected
   for additional analysis. SERPINC1, PRDX3, HSD17B8, GSTZ1 and NIT1 were
   found to be hub genes (Fig. [98]5a). The intramodular connectivity
   defined as the node degree (k) (links per node) of the individual hub
   gene ranged from 23 to 36 and module membership measure ranged from
   0.77 to 0.84. Enrichment analysis of the genes from the lightgreen
   module revealed “Small molecule catabolic process”, “Lipid metabolic
   process” and “Cofactor metabolic process” as the top three
   significantly (adj. P-val. < 0.05) enriched BP GO terms (Fig. [99]5b).
   Due to its biological relevance, GO term “Organic hydroxyl compound
   metabolic process” was selected for further analysis which revealed
   child GO terms of “Cellular hormone metabolic process”, “Hormone
   metabolic process”, “Steroid metabolic process”, “Cholesterol metabolic
   process” and “Lipid homeostasis” (Fig. [100]5c).

Figure 5.

   [101]Figure 5
   [102]Open in a new tab

   Composite visualisation of the lightgreen module in liver. (a) Network
   visualisation of hub genes in the module. Hub genes are indicated by
   large font and circle size; (b) Bar plot of gene ontology (GO) terms
   related to the module. Y-axis indicate the GO term, X-axis indicate the
   adjusted P-value and (c) analysis of genes related to the GO term
   “Organic hydroxy compound metabolic process”. Abbreviations: BP
   Biological process, CC cellular component, MF molecular function.

   In testis, two modules were found to be significantly (P < 0.05)
   associated with genetic merit of BT: darkgreen and tan containing 180
   and 519 genes, respectively. Furthermore, the darkturquoise module was
   found to have a tendency of association with genetic merit of BT
   (P = 0.05). The module darkgreen containing 180 genes was positively
   correlated with the summarised EBVs (r = 0.31), whereas tan had a
   negative correlation (r = −0.49). COPZ2, RAMP2, VDAC1 and SLC25A6 were
   found to be hub genes in the darkgreen module (Fig. [103]6a) with
   intramodular connectivity ranging from 24 to 35 and module membership
   measure ranging from 0.34 to 0.53. The top three most significantly
   (adj. P-val. < 0.05) enriched BP GO terms attributed to darkgreen
   module were “Carboxylic acid catabolic process”, “Small molecule
   catabolic process” and “Sterol molecule metabolic process”
   (Fig. [104]6b). The GO term “Sterol metabolic process” was selected for
   further analysis which revealed further GO annotations of “Cholesterol
   biosynthetic process”, “Coenzyme binding”, “Flavin adenine dinucleotide
   binding”, “Oxidoreductase activity, acting on the CH-CH group of
   donors”, and “Oxidoreductase activity, acting on the CH-CH group of
   donors, NAD or NADP as acceptor” (Fig. [105]6c). Full result tables
   from co-expression analysis are included in Supplementary file [106]5.

Figure 6.

   [107]Figure 6
   [108]Open in a new tab

   Composite visualisation of the darkgreen module in testis. (a) Network
   visualisation of hub genes in the module. Hub genes are indicated by
   large font and circle size; (b) Bar plot of gene ontology (GO) terms
   related to the module. Y-axis indicates the GO term, X-axis indicates
   the adjusted P-value and (c) analysis of genes related to the GO term
   “Steroid metabolic process”. Abbreviations: BP Biological process, CC
   cellular component, MF molecular function.

Candidate biomarker selection

   By combining results from differential expression analysis with
   co-expression analysis and applying a reductionist strategy, a total of
   16 candidate biomarkers were detected as potential biomarkers: COQ3,
   COX6  C, GSTO1, GSR, FMO3, CYP2J2, CYP2B6 and ACOX2 in liver
   (Fig. [109]7a) and HSD17B7, HSD17B8, MAPK3, MAP2K2, MAP3K2, STARD10,
   CYB5R3 and CYP27A1 in testis (Fig. [110]7b). Briefly, the reductionist
   strategy comprised selection of candidate biomarkers from genes that
   were found as significant (FDR < 0.05) in the differential expression
   analysis and significant (P < 0.05) in additional filtering by
   expression profile comparison. Furthermore, the genes were either: (i)
   annotated to relevant GO terms and/or pathways and/or (ii) part of a
   co-expression module associated with genetic merit of BT.

Figure 7.

   [111]Figure 7
   [112]Open in a new tab

   Selected candidate biomarkers with their associated box plot from (a)
   liver and (b) testis. Y-axis indicates normalised expression levels.
   X-axis indicates low, medium and high boar taint (BT) groups. Finally,
   the P-value (p) indicates the result from a separate Kruskal-Wallis
   test on expression levels of the candidate biomarker throughout the
   three BT groups.

Discussion

   In this study, candidate biomarkers associated with divergent genetic
   merit of boar taint (BT) was identified from RNA sequencing (RNA-Seq)
   of liver and testis of Danish Landrace pigs. The genetic merit of BT
   was defined from the estimated breeding values (EBVs) of skatole
   concentrations and human nose score (HNS) obtained from the sires of
   each pig. The HNS method has been shown to be a rapid, sensitive and
   cost-effective system with good correlations with androstenone and
   skatole levels and contains the ability to capture subjective variation
   not obtained by laboratory assays^[113]44. However, the HNS method is
   prone to some limitations that are intrinsically linked to using human
   beings as trained panellists: (i) regional differences in panels
   perception of BT compounds and training of panellists^[114]45; (ii)
   moderate reproducibility due to differences in human sensitivity and
   perception of BT even in experienced panellists^[115]46 and (iii)
   accuracy of panels is related to the number of panellists and their
   individual olfactory acuity^[116]47. The EBVs (the relative genetic
   performance compared with a contemporary group^[117]43) were calculated
   by Genomic BLUP Animal Models^[118]48,[119]49 and is used by breeding
   organisations as a measure of genetic merit. This is related to the
   advantage of EBVs which can be calculated for a large number of animals
   without phenotypic data via their pedigree or genomic relationship with
   animals having phenotypic data by a Best Linear Unbiased Prediction
   (BLUP) framework. Collecting phenotypic data of BT traits on large
   number of animals is also very prohibitive. As this study was concerned
   with animal breeding under industry settings, we integrated EBVs from
   the breeding sires with gene expression profiles of their progeny to
   find candidate biomarkers associated with genetic merit of BT. Hence,
   biomarkers will be able to identify pigs with low genetic merit of BT
   for selection, before they develop BT and have the trait recorded which
   is of high relevance in practical and cost-effective breeding programs.
   The EBVs were adjusted for age (in days) and carcass weight (in kg) at
   slaughter. Thus, systematic differences in sexual maturation between
   pigs were accounted for in order to ensure consistent steroidogenic
   potential in the three groups, as warranted in previous
   research^[120]33. Gene expression profiles were not subjected to
   further correction by any covariates other than technical
   normalisation. Previous work found EBVs as a very reliable phenotype
   indicator^[121]50 and they have been successfully applied in gene
   co-expression analysis^[122]51.

   A shortcoming of the current study is the lack of phenotypic data. Due
   to the commercial nature of the pigs, we did not have access to
   phenotypic measurements. In a previous study by Lervik, et al.^[123]50,
   the correlation between EBVs of androstenone and the androstenone
   phenotype was high. On the other hand, skatole has a slightly lower
   heritability than androstenone^[124]30 and we expect some disparity due
   to lack of correlation between EBVs and phenotype in the individual pig
   at slaughter weight (~100 kg). Thus, the lack of phenotypic data means
   that any disparity between genetic merit and BT phenotype cannot be
   identified or quantified. However, we emphasise that the current study
   identifies candidate biomarkers of genetic merit of BT and the
   candidate biomarkers found in this study cannot be used directly for
   phenotype identification without prior validation.

   Liver metabolism of xenobiotic substances is divided into two phases:
   phase I metabolism which comprises oxidation, reduction and
   hydroxylation steps to make the substrate more soluble in water by
   adding or exposing a functional group and phase II sulfoconjugation
   which adds polar groups for increased hydrophilicity and
   excretion^[125]52,[126]53. Related to the phase II sulfoconjougation,
   transferase activity was found to be highly enriched GO term in liver,
   but not in testis (Fig. [127]2a). Transferases are a large group of
   enzymes involved in transfer of specific functional groups from one
   molecule to another and previous studies have indicated their
   importance in catalysis of phase II sulfoconjugation of 16-androstene
   steroids in liver and testis^[128]52,[129]54. All eleven DE genes
   annotated to “Single-carbon transferase activity” in liver
   (Fig. [130]3a) were downregulated in high BT pigs, indicating a strong
   decrease in phase II metabolism of BT compounds. Furthermore,
   transferase activity was found enriched in both tissues which
   emphasises the importance of transferase-related genes of BT.
   Interestingly, COQ3 encodes an O-methyltransferase that is required for
   two steps in the biosynthetic pathway of coenzyme Q 10 (CoQ10). CoQ10
   is involved in the respiratory chain and participates in aerobic
   cellular respiration which generates energy in the form of ATP^[131]55.
   In mammals, tissues with high energy demands, such as heart, liver and
   kidney, have the highest CoQ10 concentrations^[132]56,[133]57. As the
   BT condition is dependent on the level of liver degradation, it is
   likely that the downregulated expression of COQ3 reflects insufficient
   metabolism. Hence, COQ3 is considered a candidate biomarker for BT in
   liver (Fig. [134]7a).

   Important enriched MF terms in liver included “cytochrome-c oxidase
   activity” which comprised isoforms of COX6 and COX7. These genes encode
   subunits of cytochrome-c oxidase, the terminal enzyme of the
   mitochondrial respiratory chain catalysing the electron transfer from
   reduced cytochrome-c to oxygen^[135]58. Previous studies reported genes
   encoding cytochrome oxidases of the mitochondrial respiratory chain to
   be associated with skatole levels^[136]12,[137]59–[138]62.
   Specifically, the cytochrome subunit family COX4I1 and COX8 C encoding
   cytochrome-c oxidases were involved in regulation of skatole
   metabolism^[139]63,[140]64, which was attributed to their involvement
   in the phase I oxidation reactions in liver. In this study, the
   cytochrome-c oxidases COX6  C, COX7AL, COX7B2 and COX7A1 were all
   downregulated in the high BT pigs which may reflect the situation of
   deficient skatole metabolism in high BT pigs. Another strong indicator
   of the importance of cytochrome-c oxidases in the context of BT is the
   high enrichment of mitochondrial CC GO terms where the enzymes are
   located. Hence, COX6  C is considered a candidate biomarker for BT in
   liver (Fig. [141]7a). Another important MF term found enriched in liver
   is the oxidoreductases, specifically the “CH-CH oxidoreductases” which
   participate in the phase I metabolism of BT compounds and play
   important enzymatic roles, some which have not yet been precisely
   clarified^[142]65. In this study, the “CH-CH oxidoreductases” comprised
   ACAD10, ACADSB, ACOX2, CPOX and DUS4L (Supplementary file [143]3) which
   are commonly found in gene expression studies for BT in liver^[144]65
   and testis^[145]66,[146]67. Interestingly, all five genes were
   downregulated in the high BT pigs (Supplementary file [147]2), which
   may be attributed to the insufficient liver degradation of BT
   compounds. ACOX2 encodes a protein of the acyl-Coenzyme A oxidase
   family which involves degradation of long branched fatty acids and bile
   acid intermediates in peroxisomes^[148]68. The similar ACOX1 was
   previously found to be slightly upregulated in high BT pigs^[149]65.
   The current finding suggests a contrary expression pattern for ACOX2 as
   the gene is indeed downregulated in high BT pigs. ACOX2 is also found
   in the lightgreen CE module in liver where the gene is functionally
   annotated to “Steroid metabolic process” (Fig. [150]5c). Hence, ACOX2
   is considered a candidate biomarker for BT (Fig. [151]7a).

   Analysis of overrepresented pathways in liver revealed a strong
   correlation towards active transcription (“Ribosome”), export of
   proteins (“Protein export”) and oxidation (“Oxidative phosphorylation”)
   (Fig. [152]4a). Analysis of the specific DE genes annotated to “Drug
   metabolism” revealed FMO3 and CYP2B6. Flavin-containing monooxygenases
   (FMOs) are a family of enzymes that converts lipophilic compounds into
   polar metabolites, thus decreasing their activity^[153]69. The FMOs are
   involved in metabolic activation or detoxification of numerous
   xenobiotics such as pesticides, drugs and carcinogens^[154]70. FMOs are
   regulated by sex steroid hormones and FMO3 was found to be repressed by
   testosterone and upregulated by castration^[155]71. Furthermore, FMO3
   was previously speculated to be associated with off flavour in
   pork^[156]72. Consistently, we found FMO3 downregulated in high BT as
   androstenone and testosterone levels are positively correlated^[157]7.
   Hence, FMO3 is considered a candidate biomarker for BT in liver
   (Fig. [158]7a). The cytochrome P450 (CYP) proteins are also
   monooxygenases and similar in function to FMOs; they are involved in
   drug metabolism and synthesis of cholesterol, steroids and other
   lipids^[159]73. In human and pig, 18 families of CYPs have been
   identified with 57 and 54 genes, respectively^[160]73. The CYPs have
   previously been reviewed as candidate genes for BT^[161]37. In the
   current study, ten CYPs were identified as DE genes: CYP2J2, CYP2B6 in
   liver and CYP4V2, CYP1B1, CYP2D7, CYP2R1, CYP51A1, CYP20A1, CYP26B1 and
   CYP27A1 in testis (Supplementary file [162]2). In liver, CYP2B6 was
   upregulated in the high BT group which clearly reflects its implication
   in phase I metabolism. Another member of the CYP2 family was the CYP2J2
   which was found to be upregulated in high BT. Interestingly, previous
   studies showed that the CYP2B was found only to metabolise skatole to a
   low degree and to our knowledge, no data is available for CYP2J (see a
   review by Rasmussen and Zamaratskaia^[163]74). In testis, CYP27A1 was
   found to be strongly upregulated in high BT pigs. In porcine
   hepatocytes, CYP27A1 has been found to be active in 25-hydroxylation of
   the prohormone vitamin D[3] and to carry out the 27-hydroxylation of
   cholesterol-derived intermediates in bile acid biosynthesis^[164]75. In
   this context, it is likely that CYP27A1 is affected by the increased
   steroid production characterised in high BT pigs and to our knowledge,
   we are the first to report association of the gene with BT. Hence,
   CYP2J2, CYP2B6 and CYP27A1 are considered candidate biomarkers for BT
   in liver and testis, respectively (Fig. [165]7).

   GSTO1 and GSR are part of “Glutathione metabolism” and “Drug
   metabolism” pathways and participate in the phase II metabolism. GSTO1
   encodes a member of the glutathione S-transferases (GSTs) that are
   known for their role in catalysis of conjugation reactions in
   endogenous substances, haeme, fatty acids, xenobiotics and products of
   oxidative processes^[166]76. In context of BT, GSTO2 has been
   speculated to be involved in excretion of skatole from the porcine
   body^[167]77. We found GSTO1 to be clearly downregulated in high BT
   pigs (Fig. [168]4a), which could explain the BT condition observed by
   decreased excretion of skatole. The finding is consistent with previous
   gene expression studies in liver^[169]65 and testis^[170]66. Due to its
   implication in relevant pathways and clear expression pattern, GSTO1
   could be considered a candidate biomarker (Fig. [171]7a). GSR encodes a
   glutathione reductase enzyme that catalyses the reduction of
   glutathione disulphide to glutathione, an important antioxidant and
   critical for the cell to resist oxidative stress and to maintain its
   reducing environment^[172]78. The expression of GSR is probably related
   to liver degradation capacity as GSR exhibited a clear pattern of
   downregulation in high BT pigs. Hence, GSR is considered a candidate
   biomarker in liver (Fig. [173]7a). However, the precise relationship
   between GSR and BT compounds should be subject to further research.

   Analysis of DE genes from testis revealed “Transmembrane signalling
   receptor activity” to be a highly enriched MF GO term. This may reflect
   the fact that steroid hormone biosynthesis is a highly active function
   in the testis of high BT pigs, as previous research has found a high
   genetic correlation between androstenone (plasma and fat) and sex
   steroids in Duroc and Landrace pigs^[174]79. Activation and regulation
   of steroid hormone biosynthesis requires a high level of cellular
   communication observed by enrichment of the MF GO terms “Poly(A) RNA
   binding” and “Enzyme binding” which reflects the high pathway activity.
   Furthermore, the CC GO terms such as “Intracellular part” and “Nuclear
   part” revealed that most processes are located in the intracellular
   part or nuclear part where transcription and translation are undertaken
   in the cell (Fig. [175]2b). The BP term “Organelle organization” was
   highly enriched which probably reflects high activity of transcription
   and protein metabolic processes. When a fixed number of top DE genes
   between tissues were subjected to MF GO term enrichment analysis,
   “Translation factor activity, RNA binding” had four upregulated genes
   in high BT pigs which probably is attributed to the high level of
   steroidgenesis. Interestingly, genes associated with the term
   “Ubiquitin-protein transferase activity” were almost exclusively
   downregulated in the high BT pigs. This family of transferases is
   responsible for the transfer of ubiquitin from one protein to another
   which can signal the degradation of the protein by the proteasome. The
   downregulation of the associated genes implies a markedly less
   regulated steroidgenesis which probably results in overproduction of
   steroid hormones in testis. This hypothesis is supported by the
   significant overrepresentation of the pathway “Steroid hormone
   biosynthesis” which comprises the 17 β-hydroxysteroid dehydrogenase
   (HSD) genes HSD17B7 and HSD17B8 (Fig. [176]4b). The 17β-HSD encode
   enzymes that are part of the last catalysis step in the formation of
   androgens and estrogens^[177]80 and the various types provide specific
   cell types with the mechanisms to regulate intracellular androgens
   and/or estrogens^[178]81. The importance of the HSD family in relation
   to BT compounds has been well documented in both RNA-Seq and microarray
   studies and both HSD17B2 and HSD17B4 have previously been suggested as
   biomarkers for reduced BT^[179]37,[180]66,[181]77,[182]82. We also
   found the HSD17B7 and HSD17B8 to be highly upregulated in the high BT
   pigs. HSD17B7 is involved in biosynthesis of cholesterol^[183]83 and
   sex steroids^[184]84 and regulates bioavailability of both androgens
   and estrogens, yet interestingly, previous work has confirmed that the
   HSD17B7 is not the key enzyme responsible for androstenone and
   testosterone metabolism in porcine hepatocyte^[185]85. However, we
   found the HSD17B7 upregulated in testis and not liver, which
   consequently justifies the proposition of it being a candidate
   biomarker in testis (Fig. [186]7b). The HSD17B8 is involved in
   regulation of concentrations of biologically active estrogens and
   androgens^[187]86. Recently, the protein encoded by the HSD17B8 was
   identified as a major interaction partner in gene networks of pigs with
   divergent androstenone levels. The major function of the protein was
   hypothesised to be testosterone inactivation^[188]87. The testicular
   testosterone concentrations were positively correlated with
   androstenone concentrations due to active testosterone synthesis which
   could also imply active androstenone synthesis^[189]87. However, we
   found that HSD17B8 was upregulated in testis from high BT pigs which is
   contrary to the hypothesis that HSD17B8 inactivates testosterone
   synthesis. We suggest that the gene plays an important role in BT and
   is considered a candidate biomarker, but further research should
   clarify its exact role in regulation of BT compounds (Fig. [190]7b).

   The selection of pathways of biological interest included the “Drug
   metabolism”, “Estrogen signalling pathway” and “GnRH signalling
   pathway” (Fig. [191]4b). In “GnRH signalling pathway”, genes had both
   up- or downregulation patterns, but mostly downregulated. MAPK3 and
   MAP2K2 were upregulated whereas MAP3K2 were downregulated in high BT
   pigs, respectively (Fig. [192]4b). The MAPK pathway encodes protein
   kinases which have been shown to modulate numerous male reproductive
   functions such as spermatogenesis, sperm maturation and activation and
   other functions of the germ cells^[193]88. Interestingly, MAPK3, MAP2K2
   and MAP3K2 were also a part of the “Estrogen signalling pathway”
   (Supplementary file [194]4). The “Estrogen signalling pathway” was
   significantly overrepresented, which is probably due to secretion of
   large amounts of estrogen from the testis^[195]89. Estrogen
   concentrations have been found to be correlated with androstenone in
   adipose tissue^[196]90,[197]91. The roles of the MAPK pathway in
   variation of BT has recently been shown in a gene expression study on
   Duroc boars^[198]92. The authors found that upregulation of MAPK
   coincided with pubertal increase in androstenone and testosterone
   production^[199]92 as the upregulation was observed several weeks
   earlier than complex morphological restructuring of testis into
   reproductive tissue. Furthermore, proteins encoded by MAPK may regulate
   the Steroidogenic Acute Regulatory protein (STAR) (a cholesterol
   traffic regulator) and have a complex interaction with steroidgenesis
   (see review by Manna and Stocco^[200]93). Interestingly, STARD10, which
   may play a metabolic role in sperm maturation and lipid
   transport^[201]93, was found upregulated in testis (Supplementary
   file [202]2) and may have other localised functions such as enrichment
   of lipids in milk in the mammary glands^[203]94. Previous works have
   documented the role of a subfamily of STAR in BT in testis, especially
   STARD6 and STAR ^[204]66,[205]92. To our knowledge this is the first
   time a relationship between STARD10 and BT has been reported and the
   association warrants further investigation of the role of STARD10 in
   BT. Due to the complexity of MAPK in BT and the warrant for further
   research of these genes, MAPK3, MAP2K2, MAP3K2 and STARD10 are
   considered candidate biomarkers (Fig. [206]7b).

   Co-expression analysis in liver revealed the lightgreen module to be
   associated and slightly downregulated with the summarised EBVs. Among
   the five hubs genes (HSD17B8, GSTZ1, SERPINC1, PRDX3 and NIT1), HSD17B8
   was also DE in testis and selected as a candidate biomarker.
   Furthermore, HSD17B8 was co-expressed with CYB5A. Expression levels of
   CYB5A has previously been linked to androstenone levels^[207]94. Hence,
   the finding of HSD17B8 as a hub gene strengthens the hypothesis of a
   regulatory role in androstenone synthesis. GSTZ1 was found as a hub
   gene in the lightgreen module and as a DE gene in liver which encodes a
   member of the GSTs^[208]95. Other hub genes included SERPINC1
   (Fig. [209]5a) which encode a gene that inhibits thrombin as well as
   other activated serine proteases of the coagulation system and
   regulates the blood coagulation cascade^[210]96. Members of the SERPIN
   family have previously been associated with skatole metabolism in
   liver^[211]77 and it is very likely that the hub genes are all related
   through various mechanisms in the clearing of skatole and/or
   androstenone from the liver either directly or indirectly through
   regulation or activation of other genes and/or pathways. Interestingly,
   SERPINC1 was co-expressed with HSD17B13 which is not exactly clear as
   17β-HSD catalyses biosynthesis of androgens and estrogens but the
   finding warrants further research into the association of SERPINC1 and
   BT. The darkgreen module was significantly enriched for the MF GO term
   “Steroid metabolic process” which comprised SULT1B1 and CYP7A1.
   Downregulation of CYP7A1 has been speculated to increase androstenone
   levels through reduced fat catabolism^[212]82 which is consistent with
   the current study where the gene was downregulated in high BT pigs.
   Considering the finding of CYP7A1 as co-expressed in liver, additional
   research should be focused on its role in BT. SULT1B1 encodes a member
   of the sulfotransferases which catalyse the sulphate conjugation of
   steroid hormones, neurotransmitters, drugs and xenobiotic
   compounds^[213]97 and previous work established them as key enzymes
   involved in the testicular and hepatic metabolism of
   androstenone^[214]66. The closely related SULT1A1 were found to be
   involved in phase II metabolism of skatole^[215]98 and a mutation in
   the gene is thought to be responsible for skatole accumulation^[216]99
   which has prompted researchers to consider it a candidate biomarker for
   BT^[217]37. Indeed, the sulfotransferases seem important in the
   metabolism of BT compounds as a negative correlation between
   accumulation of androstenone and SULT2A1 in which low activity of
   SULT2A1 was found in animals with high concentrations of
   androstenone^[218]100. Similar results were found in a study performed
   on Duroc, Norwegian Landrace and Yorkshire^[219]67 which is also
   consistent with the findings in the current study where SULT1B1 was
   downregulated in high BT pigs. The finding of SULT1B1 as a co-expressed
   gene reflects a possible regulatory association and warrants further
   research on its precise role in BT. In the current study, only SULT4A1
   was found as a DE gene in testis and upregulated in high BT pigs but it
   did not pass additional filtering (Supplementary file [220]2).

   In testis, the significantly associated darkgreen module contained
   genes significantly enriched for “Carboxylic acid catabolic process”,
   “Small molecule catabolic process” and “Sterol molecule metabolic
   process” (Fig. [221]6b). Further analysis of the “Sterol metabolic
   process” term revealed that several genes overlapped with “Cholesterol
   biosynthetic process” (Fig. [222]6c). Interestingly, the previously
   selected candidate biomarker HSD17B7 was part of the darkgreen module
   and the finding of HSD17B7 as a co-expressed gene may point to some
   kind of regulatory function in BT. However, this is in contrast to
   previous research where HSD17B7 was excluded as a key enzyme
   responsible for androstenone and testosterone metabolism in porcine
   liver cells^[223]85. Another important enriched gene in the module was
   CYB5R3 which encodes a cytochrome b5 reductase^[224]101. The subfamily
   CYB5 has been found particular important in androstenone biosynthesis
   and expression levels and a mutation in CYB5A was speculated to
   decrease androstenone levels^[225]94,[226]102. From its role as a
   reductase, CYB5R3 is a modulator of the activity of CYB5A ^[227]103.
   Interestingly, work on nuclear receptor transactivation found that
   CYB5R1 probably increased levels of 16-androstene steroids but
   ambiguous results were found for CYB5R3 ^[228]104. In the current
   study, CYB5R1 was also found as a DE gene in testis (Supplementary
   file [229]3) but did not pass additional filtering. Due to strong
   upregulation and co-expression of CYB5R3 in high BT pigs, we consider
   the gene as a candidate biomarker in testis (Fig. [230]7b).

   In this study, the candidate biomarkers were genes with functions
   within phase I (COQ3, COX6 C, CYP2J2, CYP2B6, ACOX2) and phase II
   metabolism (GSTO1, GSR, FMO3) of skatole and androstenone in liver to
   steroidgenesis (HSD17B7, HSD17B8, CYP27A1), regulation of
   steroidgenesis (STARD10, CYB5R3) and GnRH signalling (MAPK3, MAP2K2,
   MAP3K2) in testis. For comparison, a recently published review by
   Zadinová, et al.^[231]37 summarised genes currently believed to be
   controlling levels of androstenone (CYP11A1, CYP17A, CYP21A2, CYB5A,
   LHB, HSD17B7), androstenone metabolism (HSD3B, SULT2A1, SULT2B1,
   TEAD3), phase I skatole metabolism (CYP2A6, CYP2E1, CYB5A) and phase II
   skatole metabolism (SULT1A1). Other important genes were FMO3 ^[232]72,
   GSTO2 ^[233]77, COX4I1 ^[234]64, MAPK14 ^[235]79, STARD6 ^[236]92 and
   ACOX1 ^[237]77. Thus, a high level of consistency is evident between
   known gene families involved in BT and the selection of candidate
   biomarkers in this study. Importantly, our study presents two new genes
   (COQ3, GSR) and novel gene members and isoforms of BT which is probably
   attributed to the use of deep sequencing of ~33 mio reads per sample to
   register even small fluctuations in gene expression. Some candidate
   biomarkers probably share similar functions with known genes of BT such
   as the case of HSD3B and HSD17B ^[238]91 or may be involved in
   regulation of known candidate biomarkers such as CYB5R3 that regulates
   CYB5A ^[239]103. Furthermore, co-expression network analysis provided
   valuable insight into the complexity of both DE and non-DE genes and
   their functions in BT such as CYP7A1 and SULT1B1 and enabled discovery
   of hub genes with a possible regulatory association such as HSD17B8 and
   SERPINC1 co-expressed with CYB5A and HSD17B13¸ respectively.

   Future work should focus on the validation of the detected biomarkers
   at the population-wise level as well as in other breeds to develop a
   sensitive and specific industry-targeted test for optimised
   genomic-based animal breeding to reduce BT. Gene expression data should
   be combined with genotypes to find expression quantitative trait loci
   (eQTLs) of BT and/or further utilise the power of the RNA-Seq data with
   additional analysis of splicing events, polymorphisms and different
   isoforms of transcripts^[240]105 of the candidate biomarkers proposed
   in this study.

Methods

   A complete overview of the experimental design is presented in
   Fig. [241]1.

Animal model

   Male Landrace pigs (n = 114) were housed at the commercial testing
   station Bøgildgård in Denmark (N 56°27′52.60′′ W 9°38′62.97′′) operated
   by the Pig Research Centre, SEGES (Copenhagen, Denmark) and raised with
   ad libitum feed and water supply. The pigs were produced from sires
   with known genetic merit of boar taint (BT) assessed from estimated
   breeding values (EBVs) of skatole concentrations and human nose score
   (HNS). To account for both traits, the genetic merit of BT was defined
   as the sum of the EBVs of skatole concentration and HNS obtained from
   the respective sire of each pig and denoted as the summarised EBV
   following equation ([242]1):
   [MATH: <mi mathvariant="italic">Summarised</mi><mspace
   width=".25em"></mspace><mi mathvariant="italic">EB</mi><msub><mrow><mi
   mathvariant="italic">V</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></ms
   ub><mo>=</mo><mi mathvariant="italic">Skatole</mi><mspace
   width=".25em"></mspace><mi mathvariant="italic">EB</mi><msub><mrow><mi
   mathvariant="italic">V</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></ms
   ub><mo>+</mo><mi mathvariant="italic">Human</mi><mspace
   width=".25em"></mspace><mi mathvariant="italic">nose</mi><mspace
   width=".25em"></mspace><mi mathvariant="italic">score</mi><mspace
   width=".25em"></mspace><mi mathvariant="italic">EB</mi><msub><mrow><mi
   mathvariant="italic">V</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></ms
   ub> :MATH]
   1

   where the summarised EBV[i] is the sum of the EBV for skatole
   concentration in fat and the EBV for HNS for the individual pig i
   obtained from its respective sire j. All EBVs were collected from the
   Danish pig breeding database (Pig Research Centre, SEGES, Copenhagen,
   Denmark). The raw values for skatole concentrations were obtained from
   measurements on carcass fat samples by a calorimetric method^[243]106.
   HNS were collected by a standardised method^[244]107. The EBVs (the
   relative genetic performance compared with a contemporary group) were
   calculated by Genomic BLUP Animal Models^[245]48,[246]49. The EBVs were
   corrected for age (days) and weight (kg) in order to account for
   systematic differences in sexual maturation and to obtain full
   steroidogenic potential^[247]33 in the three groups.

   Subsequently, the animals were grouped as low, medium or high genetic
   merit of BT, according to the summarised EBVs. A total of 48 pigs out
   of the 114 pigs were selected for analysis from each of the three
   groups as high, medium and low genetic merit of BT (Supplementary
   file [248]1). The selection of pigs was based on the most extreme
   values within each of the low and high groups, and the closest values
   to the mean in the medium group. To account for any pedigree effects,
   the maximum amount of relatedness allowed between pigs but within sire
   groups were only half-siblings. However, the sires themselves were not
   related as far as we could find in the dataset.

Tissue collection and preparation

   Pigs were slaughtered at a weight of ~100 kg at a commercial
   slaughterhouse (Danish Crown, Herning, Denmark). Slaughter was
   performed by submersion into CO[2] until unconsciousness ensued
   followed by exsanguination. Following slaughter, the liver was
   extracted from the carcass and 150 mg of tissue were retrieved by punch
   biopsy and immediately immersed into 1.5 ml RNAlater (QIAGEN, Hilden,
   Germany) in 2 ml Eppendorf tubes (Eppendorf, Hamburg, Germany). The
   carcasses were kept in a cold room at 4 °C for approximately 1.5 h
   before 150 mg testis tissue were retrieved by punch biopsy and immersed
   into 1.5 ml RNAlater (QIAGEN). All samples (n = 96) were stored at
   −20 °C for 14 days until RNA extraction and sequencing at a commercial
   facility (AROS A/S, Aarhus, Denmark).

RNA extraction and sequencing

   Total RNA was extracted from the 96 samples by RNeasy Mini Kit (QIAGEN)
   following instructions of the manufacturer. Concentration of RNA was
   measured by Nanodrop^® 2000 (ThermoScientific, Massachusetts, USA). The
   quality of the RNA was measured with a Bioanalyzer (Agilent,
   California, USA) to ensure an RNA integrity number (RIN) of at least 6;
   otherwise, the samples were discarded. For samples with an RIN values
   of 7 or above, the sequencing libraries were prepared using the TruSeq
   stranded mRNA (Illumina, San Diego, USA) kit following instructions of
   the manufacturer. Samples with RIN values from 6 to 7 were therefore
   prepared using the TruSeq total stranded RNA (Illumina) kit following
   instructions of the manufacturer. In both protocols, 400 ng of total
   RNA were used as input and fragmentation was performed following
   instructions of the manufacturer, but with an incubation step of 1 min
   at 94 °C protocol to produce libraries that are compatible with
   sequencing on a paired-end 100 base pair flow cell (PE-100bp FC,
   Illumina, San Diego, USA). Libraries were subjected to quality control
   with respect to size profile by test on an Agilent DNA 1000 (Agilent,
   California, USA) and library concentration by KAPA quantitative PCR
   (qPCR) kit and three independent 10^6-fold dilutions of libraries
   following instructions of the manufacturer (Kapa Biosystems,
   Massachusetts, USA). The samples were sequenced with an Illumina HiSeq.
   2500 (Illumina, San Diego, USA) which amounted to a theoretical 40
   million reads per sample.

Quality control and gene counting

   Data from sequencing machine were converted from base calls into
   FastQ-files by CASAVA^®-software (Illumina, San Diego, USA). Quality
   control (QC) of RNA-Seq reads was conducted with FastQC^[249]108. Reads
   were trimmed for known Illumina TruSeq adapter sequences using the
   programme CutAdapt^[250]109. Poor reads were trimmed by
   Trimmomatic^[251]110 using default parameters. The trimmed reads were
   then mapped to the Sus scrofa reference genome (10.2, version 79)
   obtained from Ensembl^[252]111 by the STAR aligner using default
   parameters^[253]112. Post-mapping QC was performed with
   Qualimap^[254]113. The mapped reads were counted to each gene by HTSeq
   count using default parameters^[255]114. All subsequent statistical
   analysis were performed in R version 3.1.0^[256]115. Only genes with a
   mean count of more than five were included in the gene count matrices.
   Due to relatively poor annotation of the Sus scrofa genome, Ensembl
   identifiers were translated into their human orthologous gene symbols
   by the R package biomaRt^[257]116,[258]117, using the Homo sapiens
   reference genome (h38, version 84, Ensembl).

Differential expression analysis

   Normalisation of gene counts was performed by voom
   variance-stabilization function with sample quality weights^[259]118
   implemented in the R package limma. Differentially expressed (DE) genes
   were detected using the R package limma^[260]119. Briefly, limma fits a
   linear model to each gene using the function lmFit following equation
   ([261]2):
   [MATH:
   <msub><mrow><mi>γ</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub><m
   o>=</mo><msub><mrow><mi>β</mi></mrow><mrow><mi>j</mi><mo>,</mo><mi>B</m
   i><mi>T</mi></mrow></msub><mi>B</mi><mi>T</mi><mo>+</mo><msub><mrow><mi
   >ε</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub> :MATH]
   2

   where
   [MATH:
   <msub><mrow><mi>γ</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub>
   :MATH]
   is the measured expression level of gene j for individual pig i,
   [MATH:
   <msub><mrow><mi>β</mi></mrow><mrow><mi>j</mi><mo>,</mo><mi>B</mi><mi>T<
   /mi></mrow></msub> :MATH]
   is the estimated regression coefficient calculated by regression of the
   gene expression value of the i ^th individual pig categorised to its
   genetic merit of BT group (“low”, “medium” or “high”) for the j ^th
   gene and ε [ij] is the error component. By using the function eBayes in
   limma, the moderated t-test statistics for differences in the variance
   of gene expression of a gene across replicates were calculated and
   subjected to multiple testing correction of the P values by the
   Benjamini-Hochberg procedure^[262]120 which resulted in a false
   discovery rate (FDR) and the logarithm-transformed fold change for each
   gene. Genes were categorised as differentially expressed genes when FDR
   was below 0.05 and were found by contrasting “high BT vs low BT” groups
   from a design matrix that included all three groups. Additional
   filtering was performed by comparison of expression profiles of the
   individual DE gene throughout the three groups by a Kruskal-Wallis
   test. Gene functional enrichment analysis was performed with the
   Cytoscape^[263]121 plug-in ClueGO version 2.2.5^[264]122. By providing
   DE genes and genes attributed to co-expression modules with significant
   association to the summarised EBVs, ClueGO performed overrepresentation
   test and visualisation for gene ontology (GO) terms and KEGG pathways
   using default parameters. Test results were subjected to multiple
   testing correction of the P values by Bonferroni correction. For
   semantic filtering of redundant GO terms, the online tool
   REVIGO^[265]123 was used. As the BT condition involves numerous
   enzymatic functions and for the sake of simplicity, the most
   interesting molecular function (MF) GO terms from a fixed number
   (n = 507) of top DE genes from each tissue to make analysis consistent.
   Similar DE analyses based on 3 or more categories and treatment groups
   with similar sample sizes / biological replicates and RNA-Seq data in
   animals have been applied for various conditions including Mazzoni, et
   al.^[266]34 for cattle reproduction and Salleh, et al.^[267]35 for
   cattle feed efficiency.

Co-expression analysis

   In order to identify strongly co-expressed (CE) genes (“hub genes”)
   which may have regulatory functions and/or major impact on the trait,
   we employed a co-expression analysis approach for each tissue which has
   previously been successfully employed by our group^[268]40. Scale-free
   undirected co-expression networks were built using the R package
   WGCNA^[269]42,[270]124 and modules with significant association
   (P < 0.05) to genetic merit of BT were detected and analysed. The
   genetic merit of BT was inputted as the high, medium and low groups
   defined from the summarised EBVs as described in the animal model
   section. The co-expression analyses was performed independently for
   liver and testis by the following procedure: An adjacency matrix
   comprising Pearson’s correlations between expression levels of genes
   was created from the 7,500 most highly expressed genes from the tissue
   in question which satisfied the requirements of both acceptable
   computational time and biological meaningfulness. Subsequently, the
   matrix was raised by a power (β) of 10 in each tissue which was found
   to be an appropriate value by the function pickSoftThreshold() to reach
   a scale-free topology index (R ^2) of at least 0.90. The topological
   overlap measure (TOM), which assesses the degree of shared neighbours
   between pairs of genes, was calculated from the adjacency matrix and
   used to build a gene dendrogram which detects modules and assigns
   colour by the DynamicTreeCut algorithm^[271]125, using a minimum of 30
   genes per module. The module eigengene, the first principal component
   of each module, represents the expression value of each module and was
   used to detect biologically relevant modules. The module-trait
   relationship was calculated as the Pearson’s correlation between the
   module eigengene and the traits of interest. Modules significantly
   (P < 0.05) associated with genetic merit of BT were exported for
   topological analysis and visualisation in the software VisANT^[272]126.
   An arbitrarily chosen subset of 150 gene pairs with the highest TOM
   within the selected modules was selected and hub genes were defined as
   genes with highest intramodular connectivity. The intramodular
   connectivity was assessed by a chosen topological analysis of the
   module: genes with a node degree distribution (k) (links per node) of
   ≥12 were assigned as hub genes for the specific tissue.

Candidate biomarkers

   Selection of candidate biomarkers for each tissue was made by applying
   a reductionist strategy of genes that had significant (FDR < 0.05)
   differential expression and significant (P < 0.05) difference in
   expression profiles by a Kruskal-Wallis test throughout the three
   groups and were either: (i) annotated to relevant GO terms and/or
   pathways and/or (ii) part of a co-expression module related to a
   summarised EBVs of skatole and HNS. Each chosen candidate biomarker was
   visualised on a box plot with the P value of the Kruskal-Wallis test.

Declarations

Ethics approval and consent to participate

   Animal Care and Use Committee approval was not obtained for this study,
   because tissue samples were obtained from a commercial slaughter
   facility.

Availability of data and material

   The datasets generated during and/or analysed during the current study
   are available in NCBI’s Gene Expression Omnibus^[273]127 and are
   accessible through GEO Series accession number [274]GSE93734
   ([275]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93734).

   All data analysed during this study are included in this published
   article [and its supplementary information files].

Electronic supplementary material

   [276]Dataset 1^ (10KB, xls)
   [277]Dataset 2^ (1.8MB, xls)
   [278]Dataset 3^ (1.6MB, xls)
   [279]Dataset 4^ (63KB, xls)
   [280]Dataset 5^ (6.2MB, xls)
   [281]Supplementary information^ (162.8KB, pdf)

Acknowledgements