Abstract The regulation of residual feed intake (RFI) in beef cattle involves brain-gut mechanisms due to the interaction between neural signals in the brain and hunger or satiety in the gut. RNA-Seq data contain an extensive resource of untapped SNPs. Therefore, hypothalamic and duodenal tissues from ten extreme RFI individuals were collected, and transcriptome sequenced in this study. All the alignment data were combined according to RFI, and the SNPs in the same group were identified. A total of 270,410 SNPs were found in the high RFI group, and 255,120 SNPs were found in the low RFI group. Most SNPs were detected in the intronic region, followed by the intergenic region, and the exon region accounts for 1.11% and 1.38% in the high and low RFI groups, respectively. Prediction of high-impact SNPs and annotation of the genes in which they are located yielded 83 and 97 genes in the high-RFI and low-RFI groups, respectively. GO enrichment analysis of these genes revealed multiple NADH/NADPH-related pathways, with ND4, ND5, and ND6 significantly enriched as core subunits of NADH dehydrogenase (complex I), and is closely related to mitochondrial function. KEGG enrichment analysis of ND4, ND5, and ND6 genes was enriched in the thermogenic pathway. Multiple genes, such as ATP1A2, SLC9A4, and PLA2G5, were reported to be associated with RFI energy metabolism in the concurrent enrichment analysis. Protein-protein interaction analysis identified multiple potential candidate genes related to energy metabolism that were hypothesized to be potentially associated with the RFI phenotype. The results of this study will help to increase our understanding of identifying SNPs with significant genetic effects and their potential biological functions. Keywords: RNA-Seq, Single nucleotide polymorphisms, Beef cattle, Hypothalamus, Duodenum Introduction Feed cost accounts more than 70% of the total input cost in cattle production, making feed utilization a crucial metric for evaluating production expenditures ([42]Patience, Rossoni-Serao & Gutierrez, 2015). Efficient feed utilization can reduce herd maintenance costs by 9–10%, lower feed intake by 10–12%, and decrease methane emissions by 15–20% ([43]Moore, Mujibi & Sherman, 2009). Thus, optimizing feed utilization and reducing production costs are essential for livestock development. Residual feed intake (RFI) represents the disparity between the average daily feed intake (ADFI) and the average expected feed intake (AEFI) needed to maintain production levels ([44]Koch et al., 1963). RFI provides a precise measure of feed utilization efficiency in livestock, isolating the effects of animal growth traits and rates ([45]Richardson & Herd, 2004). RFI is a promising candidate for genetic improvement due to its moderate heritability (0.28–0.58) ([46]Moore, Mujibi & Sherman, 2009) and significant genetic variability ([47]Archer & Bergh, 2000; [48]Herd & Bishop, 2000). Our research found that RFI is related to gut microbiota ([49]Zhou et al., 2023), circRNA-miRNA interaction ([50]Zhao et al., 2023), and gene expression ([51]Yang et al., 2023, [52]2021, [53]2022). These studies support a comprehensive analysis of RFI and show that its influencing factors are numerous and complex. The hypothalamus and duodenum are critical organs in animal feed intake, energy metabolism, and digestion. The hypothalamic arcuate nucleus regulates appetite, where neuropeptide Y (NPY) and agouti-related peptide (AGRP) promote feeding. Conversely, α-MSH (α-melanocyte-stimulating hormone) induces satiety ([54]Perkins et al., 2014). The duodenum, a key organ for nutrient absorption, facilitates various metabolic functions such as glucose, fat, vitamin B, calcium, zinc, and iron ([55]Anand et al., 2021; [56]Cooke & Clark, 1976; [57]Reeves & Chaney, 2004). The interplay between the central nervous and digestive systems is evident in the microbiota-gut-brain axis (MGBA). The nervous system influences gut function through neurotransmitters and gut hormones, while gut microbes play crucial roles in host nutrient metabolism ([58]The 1000 Genomes Project Consortium, 2015; [59]Olivier, 2003). Thus, the close association of the hypothalamus and duodenum with feeding efficiency underscores their importance in studies investigating RFI in beef cattle. Single nucleotide polymorphism (SNP) is a genetic variation resulting from a single nucleotide change in the DNA sequence. These variants can influence phenotypes and disease susceptibilities ([60]Kim & Misra, 2007). Due to the low cost and high availability of RNA-seq data, coding region variants from RNA-Seq data are widely studied for their potential contribution to phenotype ([61]Karczewski et al., 2020). Transcriptome data offer gene expression levels that can be utilized to investigate cis-regulation based on the expression of genes with SNP sites ([62]Jehl et al., 2021). A wealth of research has been dedicated to extracting SNPs from transcriptomic data, yielding significant advancements across various fields. For instance, transcriptome sequencing of cow’s milk has facilitated the discovery of SNPs, providing a robust foundation for marker-trait association studies ([63]Canovas et al., 2010). In aquaculture, RNA-seq analysis has identified SNPs potentially linked to the immune response and the growth performance of Penaeus vannamei ([64]Santos, Andrade & Freitas, 2018). In crop science, the development of genome-wide SNP markers for barley has been achieved through reference-based RNA-Seq analysis ([65]Tanaka et al., 2019). In animal husbandry, RNA-Seq SNP data has revealed potential causal mutations relevant to pig production traits and the intricacies of RNA editing ([66]Martinez-Montes et al., 2017). SNP analysis of transcriptomic data from 20 human and bovine tissues revealed that cis-regulatory elements of gene expression are conserved between humans and cattle ([67]Yao et al., 2022). Differential expression of an intronic SNP in FABP4 was found to correlate with lipid transport and intracellular homeostatic regulation in studies of bovine rumen acidosis ([68]Zhao et al., 2017). This study characterized the SNPs from the hypothalamus and duodenum tissues of the same cattle with high and low residual feed intake based on RNA-seq data. The objective was to identify SNPs related to beef RFI and conduct subsequent bioinformatics analysis to detect the functional SNPs/genes associated with feed utilization performance in beef cattle and expand our understanding of the role of genetic variants in RFI phenotypes from expressed regions of the genome. Materials and Methods Experimental animals and data collection Based on our previous study ([69]Yang et al., 2021), 30 Qinchuan bulls with similar age (15 ± 1 months) and weight (280.6 ± 30.9 kg) were selected from a farm in Ningxia, China. The study subjects were given a standardized feeding regimen throughout the experimental period, and free access to water and food was ensured. Body weight measurements were taken monthly throughout the 81-day experimental period, then daily feed intake, average daily gain (ADG), and the midpoint metabolic body weight (MMBW0.75) was calculated based on feed intake (FI). To classify the cattle into high and low RFI groups, we used multiple linear regression of FI on the midpoint MMBW^0.75 and ADG to estimate individual RFI ([70]Yang et al., 2021). RNA extraction and sequencing Based on the results of the RFI calculation, five individuals with extremely low RFI (LRFI, high efficiency) and high RFI (HRFI, low efficiency) phenotypes were selected for slaughter after a 16-h fasting period. All experimental procedures involving animals were conducted by the Guidelines for Ethical Review of Laboratory Animal Welfare of Ningxia University (NXUC20211015). The hypothalamus (including the arcuate strong nucleus, parabrachial nucleus, supraoptic nucleus, dorsal/ventral medial nucleus, and other brain tissues) and the descending duodenum (mucosa, submucosa, and external muscular propria) were collected post-slaughter. Our tissue samples include hypothalamus and duodenum tissues from five high RFI and five low RFI cattle, totaling 20 samples. These tissue samples were washed with PBS to remove blood and other impurities, then cut into small pieces to increase surface area, and placed into sterile tubes for further processing. Total RNA was extracted from 500 mg of tissue samples using TRIzol method (TaKaRa Bio, Beijing, China), following the manufacturer’s instructions. After extraction, the RNA was further purified using column purification to enhance purity and remove residual contaminants. DNase treatment was performed to eliminate genomic DNA contamination. The quality and integrity of the extracted RNA were assessed using 1% agarose gel electrophoresis, Nanodrop, and Agilent 2100 to ensure a sample concentration of ≥500 ng/µL, 28S:18S > 1.0, and RIN ≥ 7. For library construction, the RNA was first reverse transcribed into cDNA. Adapter ligation was then performed to facilitate the sequencing process, followed by amplification to enrich the library. The library’s initial quantification was carried out using Qubit 2.0, and the insert size was verified using an Agilent 2100. The effective concentration of the library (effective concentration > 2 nM) was accurately determined using qRT-PCR. The hypothalamus tissue samples underwent whole transcriptome sequencing to capture both coding and non-coding RNA species, providing a comprehensive view of the transcriptome. The duodenum tissue samples were subjected to standard transcriptome sequencing. Finally, pair-end sequencing data (raw data) with 150 bp read length were generated using the Illumina HiSeq 4000 platform. High RFI sequencing data of hypothalamus and duodenum were named Q_H1~Q_H5 and S_H1~S_H5, respectively, while low RFI sequencing data were named Q_L1~Q_L5 and S_L1~S_L5, respectively. Quality control, mapping and transcript assembly The statistical power of this experimental design, calculated in RNASeqPower ([71]https://bioconductor.org/packages/release/bioc/html/RNASeqPower.ht ml) is 0.86 (The corresponding code can be found in [72]Script S1). The quality of raw data was assessed using the fastQC v.0.11.9. Subsequently, the Trimmomatic v.0.39 was used to perform quality control on the data. This included removing adapter sequences, trimming bases with Phred scores below 30 at the beginning and end of reads, applying a sliding window approach with a window size of 5 bp to remove bases with an average Phred score below 20, and discarding reads shorter than 75 bp. The cleaned data were then reevaluated using the fastQC software to ensure they met the requirements for subsequent analysis. The clean data were aligned to the bovine reference genome (ARS-UCD1.2; INSDC Assembly) using the STAR v.2.7.3a with the following parameters: --outSAMtype BAM Unsorted SortedByCoordinate, -outFilterMismatchNmax 999, --outFilterMismatchNoverReadLmax 0.04, --outFilterMultimapNmax 1. The resulting alignment files were further processed using the AddOrReplaceReadGroups tool of the PICARD v.2.27.4. This added the sample ReadGroups (RG) information to each alignment file. Additionally, the MarkDuplicates tool was applied to remove duplicate amplifications resulting from the PCR process during library construction. Merging of sample data For increasing the number of reads per variant locus, enhancing the depth coverage of reads across the entire transcriptome, as well as the depth coverage and quality of variant calls ([73]Lam et al., 2020), in this study, the data from hypothalamic and duodenal tissues were merged into two BAM files based on phenotype (high RFI group and low RFI group). This merging process, performed using the “merge” command of the samtools v.1.16.1, aimed to balance sequencing depth between samples and minimize the impact on SNP analysis results. Both the high RFI and low RFI groups in the subsequent analysis referred to the combined group data ([74]Fig. 1). Figure 1. Sample collection and bioinformatics analysis. [75]Figure 1 [76]Open in a new tab Hypothalamic and duodenal tissues from extreme RFI individuals were collected from 30 Qinchuan cows. After transcriptome sequencing, quality control, alignment, and deduplication, the alignment files were merged. The data from the two tissues were merged according to the RFI group, resulting in two alignment files. The merger greatly increased the depth (DP) of the reads, and the average reads DP of the two groups was basically consistent. Finally, the BCFtools software was used to identify SNPs in the merged data. SNPs recognition, filtering and annotation BCFtools v.1.16 was utilized to execute the variant calling on the combined data of the high RFI group and low RFI group respectively, enabling identification of SNP sites and generating BCF files containing variant information. The “norm” parameter of BCFtools was then employed to normalize the variant information, thereby eliminating ambiguity caused by varying methods. Subsequently, the low-quality SNPs data underwent further filtering to reduce the likelihood of false positives and alleviate computational resource requirements for subsequent analysis. The software BCFtools and VCFtools v.0.1.16) were employed for variant filtering, employing the following criteria: (1) Removal of SNPs within a 5 bp range near indels; (2) setting a minimum coverage (DP) of 10; (3) enforcing a minimum allele frequency not less than 0.2 and a secondary allele depth not less than 2; (4) filtering loci with quality scores below 30. Finally, the functional annotation of SNPs was performed using the SnpEff v.5.1d with the built-in ARS-UCD1.2.105 database. The thresholds of above software are referenced from previous study ([77]Lam et al., 2020). Identifying and annotating high and low RFI group-specific SNPs Using the SnpEff software, the VCF files underwent annotation, allowing for the identification of SNPs specific to the high and low RFI groups. SnpSift v.5.1d was then employed to screen for SNP loci with significant functional and modifier-type impacts. This enabled the selection of candidate genes associated with these SNP loci. Gene function enrichment analysis and protein interaction network analysis We employ clusterProfiler to conduct GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis. The filter parameter is set as p value < 0.05. The GO enrichment can further clarify the main biological functions of the genes where the specific SNPs are located. The KEGG pathway enrichment can be used to understand the signal pathway regulated by the genes. Using the STRING database ([78]https://cn.string-db.org/), we perform protein interaction analysis on the relevant genes to select core genes that have interaction effects. Statistical analysis The data from the experiment were analyzed and visualized using R v.4.3.0 ([79]R Core Team, 2023) and Prism v.10.1.1 software. Statistical significance between the treatment and control groups was assessed using non-parametric tests or t-tests. A p-value of less than 0.05 was considered indicative of a significant difference. The corresponding code scripts can be found in [80]Script S1. Results RNA-seq sequencing data quality and comparisons From the RNA-seq data of the hypothalamus and duodenum, we obtained a total of 1,019 million and 275 million paired sequencing reads, respectively. After quality control, all 20 samples in this study had a Q30 score (error rate p < 0.001) above 96%. The GC% content was approximately 50% ([81]Table S1). The clean data was aligned to the bovine reference genome, where the percentage of reads aligning to the reference genome was above 91%, and the percentage of reads with a unique alignment position ranged from 81.07% to 93.81% ([82]Table S2). Analysis of the transcript expression for each sample indicated relatively consistent transcript abundances ([83]Fig. 2). These results demonstrate the high quality of the obtained data, reducing the impact of sequencing errors on subsequent analysis. Figure 2. Transcript expression (TPM) analysis of each sample. [84]Figure 2 [85]Open in a new tab (A) Transcript expression density in each sample. (B) Transcript expression in each sample. SNPs screening and analysis SNPs in the high and low RFI groups were identified by combining RFI phenotype with RNA-Seq data from hypothalamic and duodenal tissues. The numbers of homozygous and heterozygous mutations in the combined samples were counted ([86]Table 1), revealing that there were 270,410 and 255,120 specific SNPs in the high and low RFI groups, respectively. Among these SNPs, 11,991 (4.43%) and 14,007 (5.49%) were homozygous mutations, with heterozygous mutations far outnumbering homozygous mutations. Statistical analysis based on the different types of mutations in the SNPs ([87]Fig. 3) showed that the total number of transition types (A–G, C–T) was higher than the total number of transversion types (A–T, C–G, A–C, G–T). Among the six types of single nucleotide variations, A–G and C–T had the highest occurrence rates, while the occurrence rates of the other four types of transitions were relatively lower. In the high RFI group, A–G accounted for 36.5% and C–T accounted for 35.4%. In the low RFI group, A–G accounted for 35.7% and C-T accounted for 35.8%. The Ts/Tv ratios in the high and low RFI groups were 2.55 and 2.50, respectively. The differences in occurrence rates of the six types of single nucleotide variations between the high and low RFI groups were small, with a mean occurrence rate of transitions being 71.72% and transversions being 28.29%. Table 1. Statistics on the number of homozygote and heterozygote SNPs in different RFI groups. RFI group SNP statistics Homozygote sites Heterozygote sites HRFI Number of SNPs 11,991 258,419 Probability 4.43% 95.57% LRFI Number of SNPs 14,007 241,113 Probability 5.49% 94.51% [88]Open in a new tab Figure 3. High RFI group and low RFI group combined data SNPs type statistics. [89]Figure 3 [90]Open in a new tab (A) Statistics on SNPs types in the high RFI group. (B) Statistics on SNPs types in the low RFI group. Distribution statistics of SNPs The study investigated the distribution and variation of SNPs across different chromosomes, providing insights into the genetic diversity among genes. Analysis of the combined high and low RFI groups revealed no significant difference (p > 0.05) in distribution between the two groups. Chromosome 1 exhibited the highest number of SNPs, while the mitochondria (MT) showed the lowest distribution ([91]Fig. 4A). To account for differences in chromosome length, the ratio of SNPs number to chromosome length was calculated, revealing that the MT had the highest variant rate among both groups, indicating a higher density of SNPs per unit length ([92]Fig. 4B). The chromosomal distribution of SNPs within the high and low RFI groups is delineated in [93]Figs. 4C and [94]4D. Figure 4. SNPs distribution statistics on chromosomes. [95]Figure 4 [96]Open in a new tab (A) Statistics on the number of SNPs on different chromosomes. (B) SNPs number and length ratio statistics on different chromosomes. (C) Distribution of SNPs on chromosome locations in the high RFI group. (D) Distribution of SNPs on chromosome locations in the low RFI group. (E) Percentage of the genome comprising each type of feature (top) and the proportion of SNPs detected by HRFI group-specific SNPs (middle) and LRFI group-specific SNPs (bottom) across these genomic features. Comprehensive statistical analysis of SNP loci distribution across the genome was conducted for both high and low RFI datasets, emphasizing different genomic functional regions, such as downstream, exon, intergenic, intron, untranslated region, etc. ([97]Table 2). The statistical analysis of the SNP locations in the genome for the high RFI and low RFI groups shows ten different distributions (DOWNSTREAM, EXON, INTERGENIC, INTRON, SPLICE_SITE_ACCEPTOR, SPLICE_SITE_DONOR, SPLICE_REGION, UPSTREAM, UTR_3_PRIME, UTR_5_PRIME) ([98]Table 2). A single SNP may be located in multiple transcript regions. The analysis found that the INTRON region had the most SNP locations in both groups, with 429,995 areas annotated in the HRFI group and 435,881 areas in the LRFI group, significantly more than other functional regions. The next most abundant functional elements are INTERGENIC and DOWNSTREAM, while the remaining functional regions are less common. SPLICE_SITE_ACCEPTOR has the fewest functional regions, with 37 in the high RFI group and 73 in the low RFI group. We hope to find most of the SNPs in the exonic regions, but coding regions generally experience higher selective pressure compared to non-coding regions ([99]Zhao et al., 2003). The annotation of SNPs in the high and low RFI groups accounts for 1.11% and 1.38% in the exonic regions, respectively. At the same time, this also explains our detection results: the higher distribution of SNPs in intron regions is partly due to the fact that unspliced transcripts are also detected during sequencing, and partly because intron regions constitute 47.51% of the whole genome, which is significantly higher than the length of exonic regions ([100]Fig. 4E). SNPs located in intergenic regions may be found in new genes or gene portions that have not been annotated yet. Table 2. Genomic functional annotation of SNPs. HRFI group LRFI group Count Percent Count Percent DOWNSTREAM 35,350 6.05% 43,281 7.40% EXON 6,504 1.11% 8,070 1.38% INTERGENIC 83,530 14.30% 62,282 10.65% INTRON 429,995 73.61% 435,881 74.52% SPLICE_SITE_ACCEPTOR 37 0.01% 73 0.01% SPLICE_SITE_DONOR 80 0.01% 169 0.03% SPLICE_SITE_REGION 309 0.05% 366 0.06% UPSTREAM 22,703 3.89% 27,289 4.67% UTR_3_PRIME 4,371 0.75% 5,920 1.01% UTR_5_PRIME 1,251 0.21% 1,621 0.28% [101]Open in a new tab Influence prediction and amino acid change By using the SnpEff software to evaluate the potential effects of SNP mutations on codons, it was found that over 98% of the SNPs in both groups were classified as modifiers, having minimal effect on genes and proteins. However, 159 SNPs in the high RFI group and 293 SNPs in the low RFI group were predicted to have a high effect ([102]Figs. 5A, [103]5B; [104]Tables S3, [105]S4), warranting further investigation. This situation is as we expected, most SNPs are located in intron regions and intergenic areas, making it difficult to directly affect protein coding. Therefore, high-impact SNPs will be relatively fewer. Additionally, analyzing the overall levels of each amino acid can provide important insights into evolutionary pressures and adaptation mechanisms. This approach helps identify patterns and frequencies of amino acid substitutions in different biological contexts, enhancing our understanding of how these changes affect protein stability, function, and interactions. To analyze the potential impact of SNPs on genes and proteins, the effect of intergroup-specific SNPs ([106]Tables S5, [107]S6) on codons and subsequent amino acids was assessed. The analysis revealed that the amino acids most affected in both the high and low RFI groups were alanine-threonine, alanine-valine, and isoleucine-valine ([108]Figs. 5C, [109]5D). By identifying SNP loci that significantly impacted both the high and low RFI groups and mapping them to the corresponding genes using the SNPs annotation files, a total of 83 genes were identified in the high RFI group and 97 genes in the low RFI group. Interestingly, one gene, JSP.1, belonging to the MHC class I family, was common in both groups, and played a key role in regulating animal health within the immune system ([110]Hewitt, 2003). Figure 5. Influence prediction and amino acid change. [111]Figure 5 [112]Open in a new tab (A) High RFI group specific SNPs influence prediction statistics. (B) Low RFI group specific SNPs influence prediction statistics. (C) The amino acid changes caused by SNPs in the high RFI group were reference amino acids horizontally and altered amino acids vertically. Red background colors indicate that more changes happened (heat-map). (D) The amino acid changes caused by SNPs in the low RFI group. Genes function annotation of high-impact SNP loci GO functional annotation and enrichment analysis were conducted for the aforementioned genes ([113]Figs. 6A, [114]6B). The results revealed that the enriched genes in the high and low RFI groups were primarily associated with protein binding and enzyme binding processes. Notably, a significant number of genes related to NADH activity were found in the low RFI (high feed efficiency) group. These genes were associated with oxidoreductase activity, acting on NADH or NADPH; NADH dehydrogenase activity; NADH dehydrogenase (ubiquinone) activity; NADH dehydrogenase (quinone) activity; and oxidoreductase activity, acting on NADH or NADPH, quinone, or similar compounds as acceptors. NADH and its phosphorylation product NADPH play pivotal roles as coenzymes in various metabolic activities, including cell signaling, protein modification, energy metabolism, mitochondrial function, calcium homeostasis, antioxidative stress, biosynthesis, and cell death ([115]Berger, Ramirez-Hernandez & Ziegler, 2004; [116]Patterson et al., 2005; [117]Xiao et al., 2018; [118]Ying, 2006, [119]2007, [120]2008). In particular, the enrichment genes ND4, ND5, and ND6 are core subunits of the mitochondrial respiratory chain NADH dehydrogenase (complex I). They facilitate the transfer of electrons from NADH through the respiratory chain, utilizing ubiquinone as an electron acceptor, and are crucial for the catalysis and assembly of complex I ([121]UniProt, 2023). Figure 6. Gene function annotation of high-impact SNP loci. [122]Figure 6 [123]Open in a new tab (A) Gene GO enrichment analysis (p < 0.05) of HRFI group-specific high-impact SNP sites, and select the top 10 for each term type based on p-value. (C: cellular component; F: molecular function; P: biological process). (B) Gene GO enrichment analysis (p < 0.05) of LRFI group-specific high-impact SNP sites, and select the top 10 for each term type based on p value. (C) Gene KEGG enrichment analysis (p < 0.05) of HRFI group-specific high-impact SNP sites, and select the top 20 based on p value. (D) Gene KEGG enrichment analysis (p < 0.05) of LRFI group-specific high-impact SNP sites, and select the top 20 based on p value. The KEGG enrichment analysis outcomes indicated enrichment of pathways related to thyroid hormone synthesis, pancreatic secretion, gastric acid secretion, cAMP signaling pathway, thermogenesis, parathyroid hormone synthesis and secretion, glycerolipid metabolism, TNF signaling pathway and beta-alanine metabolism in the high and low RFI groups ([124]Figs. 6C, [125]6D). Notably, the thermogenesis pathway exhibited enrichment of ND4, ND5, and ND6 genes. Additionally, ATP1A2, SLC9A4, and PLA2G5 were identified as genes associated with energy metabolism ([126]Lingrel, 1992; [127]Sakuta et al., 2020; [128]Sun et al., 2004). Protein-protein interaction analysis of high-impact SNP loci Protein-protein interaction analysis is a method used to study the interactions between proteins, which can be employed to uncover the relationships and networks among proteins, consequently explaining functional interactions and illustrating the intricate interconnections between proteins. Our results revealed distinct patterns of core genes and interaction relationships between the two groups. In the high RFI group, we identified 29 core genes and 23 interaction relationships, while in the low RFI group, we found 42 core genes and 41 interaction relationships ([129]Fig. 7). Several genes, such as HSP90AA1, EIF2AK3, PAK1, MAP3K7, PGM2L1, DNM1L and CYB5R3, were found to be related to energy metabolism, fat deposition and muscle development ([130]Badri et al., 2018; [131]Charoensook et al., 2012; [132]Chen et al., 2019; [133]Chiang & Jin, 2014; [134]Hogarth et al., 2018; [135]Liu et al., 2024; [136]Lopez-Bellon et al., 2022; [137]Zhang, O’Keefe & Jonason, 2017). Figure 7. Protein-protein interaction analysis of high-impact SNP loci. [138]Figure 7 [139]Open in a new tab (A) Protein-protein interaction analysis of high-impact SNP sites in HRFI group, using string database, composed of 29 nodes and 23 edges. (B) Protein-protein interaction analysis of high-impact SNP sites in LRFI group, using string database, composed of 42 nodes and 41 edges. Analysis of candidate gene SNP loci Based on the results of GO and KEGG analysis, we focused on phenotype-related terms. In the high RFI group we screened GO terms: positive regulation of metabolic process and multicellular organismal development; KEGG terms: thyroid hormone synthesis, gastric acid secretion, cAMP signaling pathway, pancreas signaling pathway. In the low RFI group we screened GO terms: oxidoreductase activity, acting on NADH or NADPH; KEGG terms: thermogenesis, metabolic pathways, TNF signaling pathway. Finally, 18 genes were identified in the high RFI group and 21 genes in the LRFI group ([140]Tables S3, [141]S4, [142]S7). Also combining protein interaction analysis and existing studies, we finally focused on 14 genes. In these genes, we combined the prediction results of SNP impact, showing high-impact SNP sites in the genes. We found that these SNPs are mostly located in the exon regions, and most are A-G mutation types. This type of mutation might change the coded amino acids, and affect the structure and function of the protein ([143]Table 3). However, their specific function and roles would be detected in the future studies for clarifying their variant effect to phenotype. Table 3. Location of SNPs from candidate genes. Gene Chromosome Site Reference nucleotide Mutated nucleotide Group References