Abstract

   The regulation of residual feed intake (RFI) in beef cattle involves
   brain-gut mechanisms due to the interaction between neural signals in
   the brain and hunger or satiety in the gut. RNA-Seq data contain an
   extensive resource of untapped SNPs. Therefore, hypothalamic and
   duodenal tissues from ten extreme RFI individuals were collected, and
   transcriptome sequenced in this study. All the alignment data were
   combined according to RFI, and the SNPs in the same group were
   identified. A total of 270,410 SNPs were found in the high RFI group,
   and 255,120 SNPs were found in the low RFI group. Most SNPs were
   detected in the intronic region, followed by the intergenic region, and
   the exon region accounts for 1.11% and 1.38% in the high and low RFI
   groups, respectively. Prediction of high-impact SNPs and annotation of
   the genes in which they are located yielded 83 and 97 genes in the
   high-RFI and low-RFI groups, respectively. GO enrichment analysis of
   these genes revealed multiple NADH/NADPH-related pathways, with ND4,
   ND5, and ND6 significantly enriched as core subunits of NADH
   dehydrogenase (complex I), and is closely related to mitochondrial
   function. KEGG enrichment analysis of ND4, ND5, and ND6 genes was
   enriched in the thermogenic pathway. Multiple genes, such as ATP1A2,
   SLC9A4, and PLA2G5, were reported to be associated with RFI energy
   metabolism in the concurrent enrichment analysis. Protein-protein
   interaction analysis identified multiple potential candidate genes
   related to energy metabolism that were hypothesized to be potentially
   associated with the RFI phenotype. The results of this study will help
   to increase our understanding of identifying SNPs with significant
   genetic effects and their potential biological functions.

   Keywords: RNA-Seq, Single nucleotide polymorphisms, Beef cattle,
   Hypothalamus, Duodenum

Introduction

   Feed cost accounts more than 70% of the total input cost in cattle
   production, making feed utilization a crucial metric for evaluating
   production expenditures ([42]Patience, Rossoni-Serao & Gutierrez,
   2015). Efficient feed utilization can reduce herd maintenance costs by
   9–10%, lower feed intake by 10–12%, and decrease methane emissions by
   15–20% ([43]Moore, Mujibi & Sherman, 2009). Thus, optimizing feed
   utilization and reducing production costs are essential for livestock
   development. Residual feed intake (RFI) represents the disparity
   between the average daily feed intake (ADFI) and the average expected
   feed intake (AEFI) needed to maintain production levels ([44]Koch et
   al., 1963). RFI provides a precise measure of feed utilization
   efficiency in livestock, isolating the effects of animal growth traits
   and rates ([45]Richardson & Herd, 2004).

   RFI is a promising candidate for genetic improvement due to its
   moderate heritability (0.28–0.58) ([46]Moore, Mujibi & Sherman, 2009)
   and significant genetic variability ([47]Archer & Bergh, 2000; [48]Herd
   & Bishop, 2000). Our research found that RFI is related to gut
   microbiota ([49]Zhou et al., 2023), circRNA-miRNA interaction ([50]Zhao
   et al., 2023), and gene expression ([51]Yang et al., 2023, [52]2021,
   [53]2022). These studies support a comprehensive analysis of RFI and
   show that its influencing factors are numerous and complex.

   The hypothalamus and duodenum are critical organs in animal feed
   intake, energy metabolism, and digestion. The hypothalamic arcuate
   nucleus regulates appetite, where neuropeptide Y (NPY) and
   agouti-related peptide (AGRP) promote feeding. Conversely, α-MSH
   (α-melanocyte-stimulating hormone) induces satiety ([54]Perkins et al.,
   2014). The duodenum, a key organ for nutrient absorption, facilitates
   various metabolic functions such as glucose, fat, vitamin B, calcium,
   zinc, and iron ([55]Anand et al., 2021; [56]Cooke & Clark, 1976;
   [57]Reeves & Chaney, 2004). The interplay between the central nervous
   and digestive systems is evident in the microbiota-gut-brain axis
   (MGBA). The nervous system influences gut function through
   neurotransmitters and gut hormones, while gut microbes play crucial
   roles in host nutrient metabolism ([58]The 1000 Genomes Project
   Consortium, 2015; [59]Olivier, 2003). Thus, the close association of
   the hypothalamus and duodenum with feeding efficiency underscores their
   importance in studies investigating RFI in beef cattle.

   Single nucleotide polymorphism (SNP) is a genetic variation resulting
   from a single nucleotide change in the DNA sequence. These variants can
   influence phenotypes and disease susceptibilities ([60]Kim & Misra,
   2007). Due to the low cost and high availability of RNA-seq data,
   coding region variants from RNA-Seq data are widely studied for their
   potential contribution to phenotype ([61]Karczewski et al., 2020).
   Transcriptome data offer gene expression levels that can be utilized to
   investigate cis-regulation based on the expression of genes with SNP
   sites ([62]Jehl et al., 2021). A wealth of research has been dedicated
   to extracting SNPs from transcriptomic data, yielding significant
   advancements across various fields. For instance, transcriptome
   sequencing of cow’s milk has facilitated the discovery of SNPs,
   providing a robust foundation for marker-trait association studies
   ([63]Canovas et al., 2010). In aquaculture, RNA-seq analysis has
   identified SNPs potentially linked to the immune response and the
   growth performance of Penaeus vannamei ([64]Santos, Andrade & Freitas,
   2018). In crop science, the development of genome-wide SNP markers for
   barley has been achieved through reference-based RNA-Seq analysis
   ([65]Tanaka et al., 2019). In animal husbandry, RNA-Seq SNP data has
   revealed potential causal mutations relevant to pig production traits
   and the intricacies of RNA editing ([66]Martinez-Montes et al., 2017).
   SNP analysis of transcriptomic data from 20 human and bovine tissues
   revealed that cis-regulatory elements of gene expression are conserved
   between humans and cattle ([67]Yao et al., 2022). Differential
   expression of an intronic SNP in FABP4 was found to correlate with
   lipid transport and intracellular homeostatic regulation in studies of
   bovine rumen acidosis ([68]Zhao et al., 2017).

   This study characterized the SNPs from the hypothalamus and duodenum
   tissues of the same cattle with high and low residual feed intake based
   on RNA-seq data. The objective was to identify SNPs related to beef RFI
   and conduct subsequent bioinformatics analysis to detect the functional
   SNPs/genes associated with feed utilization performance in beef cattle
   and expand our understanding of the role of genetic variants in RFI
   phenotypes from expressed regions of the genome.

Materials and Methods

Experimental animals and data collection

   Based on our previous study ([69]Yang et al., 2021), 30 Qinchuan bulls
   with similar age (15 ± 1 months) and weight (280.6 ± 30.9 kg) were
   selected from a farm in Ningxia, China. The study subjects were given a
   standardized feeding regimen throughout the experimental period, and
   free access to water and food was ensured. Body weight measurements
   were taken monthly throughout the 81-day experimental period, then
   daily feed intake, average daily gain (ADG), and the midpoint metabolic
   body weight (MMBW0.75) was calculated based on feed intake (FI). To
   classify the cattle into high and low RFI groups, we used multiple
   linear regression of FI on the midpoint MMBW^0.75 and ADG to estimate
   individual RFI ([70]Yang et al., 2021).

RNA extraction and sequencing

   Based on the results of the RFI calculation, five individuals with
   extremely low RFI (LRFI, high efficiency) and high RFI (HRFI, low
   efficiency) phenotypes were selected for slaughter after a 16-h fasting
   period. All experimental procedures involving animals were conducted by
   the Guidelines for Ethical Review of Laboratory Animal Welfare of
   Ningxia University (NXUC20211015). The hypothalamus (including the
   arcuate strong nucleus, parabrachial nucleus, supraoptic nucleus,
   dorsal/ventral medial nucleus, and other brain tissues) and the
   descending duodenum (mucosa, submucosa, and external muscular propria)
   were collected post-slaughter. Our tissue samples include hypothalamus
   and duodenum tissues from five high RFI and five low RFI cattle,
   totaling 20 samples. These tissue samples were washed with PBS to
   remove blood and other impurities, then cut into small pieces to
   increase surface area, and placed into sterile tubes for further
   processing. Total RNA was extracted from 500 mg of tissue samples using
   TRIzol method (TaKaRa Bio, Beijing, China), following the
   manufacturer’s instructions. After extraction, the RNA was further
   purified using column purification to enhance purity and remove
   residual contaminants. DNase treatment was performed to eliminate
   genomic DNA contamination. The quality and integrity of the extracted
   RNA were assessed using 1% agarose gel electrophoresis, Nanodrop, and
   Agilent 2100 to ensure a sample concentration of ≥500 ng/µL, 28S:18S >
   1.0, and RIN ≥ 7. For library construction, the RNA was first reverse
   transcribed into cDNA. Adapter ligation was then performed to
   facilitate the sequencing process, followed by amplification to enrich
   the library. The library’s initial quantification was carried out using
   Qubit 2.0, and the insert size was verified using an Agilent 2100. The
   effective concentration of the library (effective concentration > 2 nM)
   was accurately determined using qRT-PCR. The hypothalamus tissue
   samples underwent whole transcriptome sequencing to capture both coding
   and non-coding RNA species, providing a comprehensive view of the
   transcriptome. The duodenum tissue samples were subjected to standard
   transcriptome sequencing. Finally, pair-end sequencing data (raw data)
   with 150 bp read length were generated using the Illumina HiSeq 4000
   platform. High RFI sequencing data of hypothalamus and duodenum were
   named Q_H1~Q_H5 and S_H1~S_H5, respectively, while low RFI sequencing
   data were named Q_L1~Q_L5 and S_L1~S_L5, respectively.

Quality control, mapping and transcript assembly

   The statistical power of this experimental design, calculated in
   RNASeqPower
   ([71]https://bioconductor.org/packages/release/bioc/html/RNASeqPower.ht
   ml) is 0.86 (The corresponding code can be found in [72]Script S1). The
   quality of raw data was assessed using the fastQC v.0.11.9.
   Subsequently, the Trimmomatic v.0.39 was used to perform quality
   control on the data. This included removing adapter sequences, trimming
   bases with Phred scores below 30 at the beginning and end of reads,
   applying a sliding window approach with a window size of 5 bp to remove
   bases with an average Phred score below 20, and discarding reads
   shorter than 75 bp. The cleaned data were then reevaluated using the
   fastQC software to ensure they met the requirements for subsequent
   analysis. The clean data were aligned to the bovine reference genome
   (ARS-UCD1.2; INSDC Assembly) using the STAR v.2.7.3a with the following
   parameters: --outSAMtype BAM Unsorted SortedByCoordinate,
   -outFilterMismatchNmax 999, --outFilterMismatchNoverReadLmax 0.04,
   --outFilterMultimapNmax 1. The resulting alignment files were further
   processed using the AddOrReplaceReadGroups tool of the PICARD v.2.27.4.
   This added the sample ReadGroups (RG) information to each alignment
   file. Additionally, the MarkDuplicates tool was applied to remove
   duplicate amplifications resulting from the PCR process during library
   construction.

Merging of sample data

   For increasing the number of reads per variant locus, enhancing the
   depth coverage of reads across the entire transcriptome, as well as the
   depth coverage and quality of variant calls ([73]Lam et al., 2020), in
   this study, the data from hypothalamic and duodenal tissues were merged
   into two BAM files based on phenotype (high RFI group and low RFI
   group). This merging process, performed using the “merge” command of
   the samtools v.1.16.1, aimed to balance sequencing depth between
   samples and minimize the impact on SNP analysis results. Both the high
   RFI and low RFI groups in the subsequent analysis referred to the
   combined group data ([74]Fig. 1).

Figure 1. Sample collection and bioinformatics analysis.

   [75]Figure 1
   [76]Open in a new tab

   Hypothalamic and duodenal tissues from extreme RFI individuals were
   collected from 30 Qinchuan cows. After transcriptome sequencing,
   quality control, alignment, and deduplication, the alignment files were
   merged. The data from the two tissues were merged according to the RFI
   group, resulting in two alignment files. The merger greatly increased
   the depth (DP) of the reads, and the average reads DP of the two groups
   was basically consistent. Finally, the BCFtools software was used to
   identify SNPs in the merged data.

SNPs recognition, filtering and annotation

   BCFtools v.1.16 was utilized to execute the variant calling on the
   combined data of the high RFI group and low RFI group respectively,
   enabling identification of SNP sites and generating BCF files
   containing variant information. The “norm” parameter of BCFtools was
   then employed to normalize the variant information, thereby eliminating
   ambiguity caused by varying methods. Subsequently, the low-quality SNPs
   data underwent further filtering to reduce the likelihood of false
   positives and alleviate computational resource requirements for
   subsequent analysis. The software BCFtools and VCFtools v.0.1.16) were
   employed for variant filtering, employing the following criteria: (1)
   Removal of SNPs within a 5 bp range near indels; (2) setting a minimum
   coverage (DP) of 10; (3) enforcing a minimum allele frequency not less
   than 0.2 and a secondary allele depth not less than 2; (4) filtering
   loci with quality scores below 30. Finally, the functional annotation
   of SNPs was performed using the SnpEff v.5.1d with the built-in
   ARS-UCD1.2.105 database. The thresholds of above software are
   referenced from previous study ([77]Lam et al., 2020).

Identifying and annotating high and low RFI group-specific SNPs

   Using the SnpEff software, the VCF files underwent annotation, allowing
   for the identification of SNPs specific to the high and low RFI groups.
   SnpSift v.5.1d was then employed to screen for SNP loci with
   significant functional and modifier-type impacts. This enabled the
   selection of candidate genes associated with these SNP loci.

Gene function enrichment analysis and protein interaction network analysis

   We employ clusterProfiler to conduct GO (Gene Ontology) and KEGG (Kyoto
   Encyclopedia of Genes and Genomes) enrichment analysis. The filter
   parameter is set as p value < 0.05. The GO enrichment can further
   clarify the main biological functions of the genes where the specific
   SNPs are located. The KEGG pathway enrichment can be used to understand
   the signal pathway regulated by the genes. Using the STRING database
   ([78]https://cn.string-db.org/), we perform protein interaction
   analysis on the relevant genes to select core genes that have
   interaction effects.

Statistical analysis

   The data from the experiment were analyzed and visualized using R
   v.4.3.0 ([79]R Core Team, 2023) and Prism v.10.1.1 software.
   Statistical significance between the treatment and control groups was
   assessed using non-parametric tests or t-tests. A p-value of less than
   0.05 was considered indicative of a significant difference. The
   corresponding code scripts can be found in [80]Script S1.

Results

RNA-seq sequencing data quality and comparisons

   From the RNA-seq data of the hypothalamus and duodenum, we obtained a
   total of 1,019 million and 275 million paired sequencing reads,
   respectively. After quality control, all 20 samples in this study had a
   Q30 score (error rate p < 0.001) above 96%. The GC% content was
   approximately 50% ([81]Table S1). The clean data was aligned to the
   bovine reference genome, where the percentage of reads aligning to the
   reference genome was above 91%, and the percentage of reads with a
   unique alignment position ranged from 81.07% to 93.81% ([82]Table S2).
   Analysis of the transcript expression for each sample indicated
   relatively consistent transcript abundances ([83]Fig. 2). These results
   demonstrate the high quality of the obtained data, reducing the impact
   of sequencing errors on subsequent analysis.

Figure 2. Transcript expression (TPM) analysis of each sample.

   [84]Figure 2
   [85]Open in a new tab

   (A) Transcript expression density in each sample. (B) Transcript
   expression in each sample.

SNPs screening and analysis

   SNPs in the high and low RFI groups were identified by combining RFI
   phenotype with RNA-Seq data from hypothalamic and duodenal tissues. The
   numbers of homozygous and heterozygous mutations in the combined
   samples were counted ([86]Table 1), revealing that there were 270,410
   and 255,120 specific SNPs in the high and low RFI groups, respectively.
   Among these SNPs, 11,991 (4.43%) and 14,007 (5.49%) were homozygous
   mutations, with heterozygous mutations far outnumbering homozygous
   mutations. Statistical analysis based on the different types of
   mutations in the SNPs ([87]Fig. 3) showed that the total number of
   transition types (A–G, C–T) was higher than the total number of
   transversion types (A–T, C–G, A–C, G–T). Among the six types of single
   nucleotide variations, A–G and C–T had the highest occurrence rates,
   while the occurrence rates of the other four types of transitions were
   relatively lower. In the high RFI group, A–G accounted for 36.5% and
   C–T accounted for 35.4%. In the low RFI group, A–G accounted for 35.7%
   and C-T accounted for 35.8%. The Ts/Tv ratios in the high and low RFI
   groups were 2.55 and 2.50, respectively. The differences in occurrence
   rates of the six types of single nucleotide variations between the high
   and low RFI groups were small, with a mean occurrence rate of
   transitions being 71.72% and transversions being 28.29%.

Table 1. Statistics on the number of homozygote and heterozygote SNPs in
different RFI groups.

   RFI group SNP statistics Homozygote sites Heterozygote sites
   HRFI      Number of SNPs 11,991           258,419
             Probability    4.43%            95.57%
   LRFI      Number of SNPs 14,007           241,113
             Probability    5.49%            94.51%
   [88]Open in a new tab

Figure 3. High RFI group and low RFI group combined data SNPs type
statistics.

   [89]Figure 3
   [90]Open in a new tab

   (A) Statistics on SNPs types in the high RFI group. (B) Statistics on
   SNPs types in the low RFI group.

Distribution statistics of SNPs

   The study investigated the distribution and variation of SNPs across
   different chromosomes, providing insights into the genetic diversity
   among genes. Analysis of the combined high and low RFI groups revealed
   no significant difference (p > 0.05) in distribution between the two
   groups. Chromosome 1 exhibited the highest number of SNPs, while the
   mitochondria (MT) showed the lowest distribution ([91]Fig. 4A). To
   account for differences in chromosome length, the ratio of SNPs number
   to chromosome length was calculated, revealing that the MT had the
   highest variant rate among both groups, indicating a higher density of
   SNPs per unit length ([92]Fig. 4B). The chromosomal distribution of
   SNPs within the high and low RFI groups is delineated in [93]Figs. 4C
   and [94]4D.

Figure 4. SNPs distribution statistics on chromosomes.

   [95]Figure 4
   [96]Open in a new tab

   (A) Statistics on the number of SNPs on different chromosomes. (B) SNPs
   number and length ratio statistics on different chromosomes. (C)
   Distribution of SNPs on chromosome locations in the high RFI group. (D)
   Distribution of SNPs on chromosome locations in the low RFI group. (E)
   Percentage of the genome comprising each type of feature (top) and the
   proportion of SNPs detected by HRFI group-specific SNPs (middle) and
   LRFI group-specific SNPs (bottom) across these genomic features.

   Comprehensive statistical analysis of SNP loci distribution across the
   genome was conducted for both high and low RFI datasets, emphasizing
   different genomic functional regions, such as downstream, exon,
   intergenic, intron, untranslated region, etc. ([97]Table 2). The
   statistical analysis of the SNP locations in the genome for the high
   RFI and low RFI groups shows ten different distributions (DOWNSTREAM,
   EXON, INTERGENIC, INTRON, SPLICE_SITE_ACCEPTOR, SPLICE_SITE_DONOR,
   SPLICE_REGION, UPSTREAM, UTR_3_PRIME, UTR_5_PRIME) ([98]Table 2). A
   single SNP may be located in multiple transcript regions. The analysis
   found that the INTRON region had the most SNP locations in both groups,
   with 429,995 areas annotated in the HRFI group and 435,881 areas in the
   LRFI group, significantly more than other functional regions. The next
   most abundant functional elements are INTERGENIC and DOWNSTREAM, while
   the remaining functional regions are less common. SPLICE_SITE_ACCEPTOR
   has the fewest functional regions, with 37 in the high RFI group and 73
   in the low RFI group. We hope to find most of the SNPs in the exonic
   regions, but coding regions generally experience higher selective
   pressure compared to non-coding regions ([99]Zhao et al., 2003). The
   annotation of SNPs in the high and low RFI groups accounts for 1.11%
   and 1.38% in the exonic regions, respectively. At the same time, this
   also explains our detection results: the higher distribution of SNPs in
   intron regions is partly due to the fact that unspliced transcripts are
   also detected during sequencing, and partly because intron regions
   constitute 47.51% of the whole genome, which is significantly higher
   than the length of exonic regions ([100]Fig. 4E). SNPs located in
   intergenic regions may be found in new genes or gene portions that have
   not been annotated yet.

Table 2. Genomic functional annotation of SNPs.

                          HRFI group      LRFI group
                         Count  Percent  Count  Percent
   DOWNSTREAM           35,350  6.05%   43,281  7.40%
   EXON                 6,504   1.11%   8,070   1.38%
   INTERGENIC           83,530  14.30%  62,282  10.65%
   INTRON               429,995 73.61%  435,881 74.52%
   SPLICE_SITE_ACCEPTOR 37      0.01%   73      0.01%
   SPLICE_SITE_DONOR    80      0.01%   169     0.03%
   SPLICE_SITE_REGION   309     0.05%   366     0.06%
   UPSTREAM             22,703  3.89%   27,289  4.67%
   UTR_3_PRIME          4,371   0.75%   5,920   1.01%
   UTR_5_PRIME          1,251   0.21%   1,621   0.28%
   [101]Open in a new tab

Influence prediction and amino acid change

   By using the SnpEff software to evaluate the potential effects of SNP
   mutations on codons, it was found that over 98% of the SNPs in both
   groups were classified as modifiers, having minimal effect on genes and
   proteins. However, 159 SNPs in the high RFI group and 293 SNPs in the
   low RFI group were predicted to have a high effect ([102]Figs. 5A,
   [103]5B; [104]Tables S3, [105]S4), warranting further investigation.
   This situation is as we expected, most SNPs are located in intron
   regions and intergenic areas, making it difficult to directly affect
   protein coding. Therefore, high-impact SNPs will be relatively fewer.
   Additionally, analyzing the overall levels of each amino acid can
   provide important insights into evolutionary pressures and adaptation
   mechanisms. This approach helps identify patterns and frequencies of
   amino acid substitutions in different biological contexts, enhancing
   our understanding of how these changes affect protein stability,
   function, and interactions. To analyze the potential impact of SNPs on
   genes and proteins, the effect of intergroup-specific SNPs ([106]Tables
   S5, [107]S6) on codons and subsequent amino acids was assessed. The
   analysis revealed that the amino acids most affected in both the high
   and low RFI groups were alanine-threonine, alanine-valine, and
   isoleucine-valine ([108]Figs. 5C, [109]5D). By identifying SNP loci
   that significantly impacted both the high and low RFI groups and
   mapping them to the corresponding genes using the SNPs annotation
   files, a total of 83 genes were identified in the high RFI group and 97
   genes in the low RFI group. Interestingly, one gene, JSP.1, belonging
   to the MHC class I family, was common in both groups, and played a key
   role in regulating animal health within the immune system ([110]Hewitt,
   2003).

Figure 5. Influence prediction and amino acid change.

   [111]Figure 5
   [112]Open in a new tab

   (A) High RFI group specific SNPs influence prediction statistics. (B)
   Low RFI group specific SNPs influence prediction statistics. (C) The
   amino acid changes caused by SNPs in the high RFI group were reference
   amino acids horizontally and altered amino acids vertically. Red
   background colors indicate that more changes happened (heat-map). (D)
   The amino acid changes caused by SNPs in the low RFI group.

Genes function annotation of high-impact SNP loci

   GO functional annotation and enrichment analysis were conducted for the
   aforementioned genes ([113]Figs. 6A, [114]6B). The results revealed
   that the enriched genes in the high and low RFI groups were primarily
   associated with protein binding and enzyme binding processes. Notably,
   a significant number of genes related to NADH activity were found in
   the low RFI (high feed efficiency) group. These genes were associated
   with oxidoreductase activity, acting on NADH or NADPH; NADH
   dehydrogenase activity; NADH dehydrogenase (ubiquinone) activity; NADH
   dehydrogenase (quinone) activity; and oxidoreductase activity, acting
   on NADH or NADPH, quinone, or similar compounds as acceptors. NADH and
   its phosphorylation product NADPH play pivotal roles as coenzymes in
   various metabolic activities, including cell signaling, protein
   modification, energy metabolism, mitochondrial function, calcium
   homeostasis, antioxidative stress, biosynthesis, and cell death
   ([115]Berger, Ramirez-Hernandez & Ziegler, 2004; [116]Patterson et al.,
   2005; [117]Xiao et al., 2018; [118]Ying, 2006, [119]2007, [120]2008).
   In particular, the enrichment genes ND4, ND5, and ND6 are core subunits
   of the mitochondrial respiratory chain NADH dehydrogenase (complex I).
   They facilitate the transfer of electrons from NADH through the
   respiratory chain, utilizing ubiquinone as an electron acceptor, and
   are crucial for the catalysis and assembly of complex I ([121]UniProt,
   2023).

Figure 6. Gene function annotation of high-impact SNP loci.

   [122]Figure 6
   [123]Open in a new tab

   (A) Gene GO enrichment analysis (p < 0.05) of HRFI group-specific
   high-impact SNP sites, and select the top 10 for each term type based
   on p-value. (C: cellular component; F: molecular function; P:
   biological process). (B) Gene GO enrichment analysis (p < 0.05) of LRFI
   group-specific high-impact SNP sites, and select the top 10 for each
   term type based on p value. (C) Gene KEGG enrichment analysis (p <
   0.05) of HRFI group-specific high-impact SNP sites, and select the top
   20 based on p value. (D) Gene KEGG enrichment analysis (p < 0.05) of
   LRFI group-specific high-impact SNP sites, and select the top 20 based
   on p value.

   The KEGG enrichment analysis outcomes indicated enrichment of pathways
   related to thyroid hormone synthesis, pancreatic secretion, gastric
   acid secretion, cAMP signaling pathway, thermogenesis, parathyroid
   hormone synthesis and secretion, glycerolipid metabolism, TNF signaling
   pathway and beta-alanine metabolism in the high and low RFI groups
   ([124]Figs. 6C, [125]6D). Notably, the thermogenesis pathway exhibited
   enrichment of ND4, ND5, and ND6 genes. Additionally, ATP1A2, SLC9A4,
   and PLA2G5 were identified as genes associated with energy metabolism
   ([126]Lingrel, 1992; [127]Sakuta et al., 2020; [128]Sun et al., 2004).

Protein-protein interaction analysis of high-impact SNP loci

   Protein-protein interaction analysis is a method used to study the
   interactions between proteins, which can be employed to uncover the
   relationships and networks among proteins, consequently explaining
   functional interactions and illustrating the intricate interconnections
   between proteins. Our results revealed distinct patterns of core genes
   and interaction relationships between the two groups. In the high RFI
   group, we identified 29 core genes and 23 interaction relationships,
   while in the low RFI group, we found 42 core genes and 41 interaction
   relationships ([129]Fig. 7). Several genes, such as HSP90AA1, EIF2AK3,
   PAK1, MAP3K7, PGM2L1, DNM1L and CYB5R3, were found to be related to
   energy metabolism, fat deposition and muscle development ([130]Badri et
   al., 2018; [131]Charoensook et al., 2012; [132]Chen et al., 2019;
   [133]Chiang & Jin, 2014; [134]Hogarth et al., 2018; [135]Liu et al.,
   2024; [136]Lopez-Bellon et al., 2022; [137]Zhang, O’Keefe & Jonason,
   2017).

Figure 7. Protein-protein interaction analysis of high-impact SNP loci.

   [138]Figure 7
   [139]Open in a new tab

   (A) Protein-protein interaction analysis of high-impact SNP sites in
   HRFI group, using string database, composed of 29 nodes and 23 edges.
   (B) Protein-protein interaction analysis of high-impact SNP sites in
   LRFI group, using string database, composed of 42 nodes and 41 edges.

Analysis of candidate gene SNP loci

   Based on the results of GO and KEGG analysis, we focused on
   phenotype-related terms. In the high RFI group we screened GO terms:
   positive regulation of metabolic process and multicellular organismal
   development; KEGG terms: thyroid hormone synthesis, gastric acid
   secretion, cAMP signaling pathway, pancreas signaling pathway. In the
   low RFI group we screened GO terms: oxidoreductase activity, acting on
   NADH or NADPH; KEGG terms: thermogenesis, metabolic pathways, TNF
   signaling pathway. Finally, 18 genes were identified in the high RFI
   group and 21 genes in the LRFI group ([140]Tables S3, [141]S4,
   [142]S7). Also combining protein interaction analysis and existing
   studies, we finally focused on 14 genes. In these genes, we combined
   the prediction results of SNP impact, showing high-impact SNP sites in
   the genes. We found that these SNPs are mostly located in the exon
   regions, and most are A-G mutation types. This type of mutation might
   change the coded amino acids, and affect the structure and function of
   the protein ([143]Table 3). However, their specific function and roles
   would be detected in the future studies for clarifying their variant
   effect to phenotype.

Table 3. Location of SNPs from candidate genes.

   Gene Chromosome Site Reference nucleotide Mutated nucleotide Group
   References