Abstract Loin muscle area (LMA) is an indicator of carcass composition and is related to weight gain, animal musculature and meat quality traits. Therefore, integrating multi-omics data to reveal candidate genes affecting LMA has attracted extensive attention. We used the combined analysis method of GWAS and RNA-seq to find the candidate genes that affect the size of LMA. The association of 770K SNPs with the LMA captured four significant SNPs within or near three genes. Additionally, seven overlapping genes regarding LMA were determined via the analysis of differentially expressed genes (DEGs) and weighted gene co-expression network analysis (WGCNA). There is an overlapping gene (CD93) between the results of GWAS and DEGs. Through functional enrichment analysis of the above genes, candidate genes were identified as THBD, CD93, RIMS2, PLP1, SNCA, and NDUFS8, and it was found that they mainly affected the size of LMA by affecting muscle fiber diameter, muscle cell development, differentiation, and function. The findings provide valuable molecular insights into the mechanisms that influence LMA content in beef cattle. Introduction Beef is a source of high-quality protein, beneficial fatty acids, and various trace elements in the human diet, which plays an essential role in maintaining the healthy diet [[52]1]. Improving meat quality traits in beef cattle is a key issue facing the beef cattle breeding industry and a hot research topic for beef cattle breeding scholars. Loin muscle area (LMA) is the surface area of the longissimus dorsi muscle (ribeye) at the 12th to 13th rib interface of beef cattle. Previous studies have estimated the heritability of LMA to range from 0.24 to 0.32 [[53]2–[54]4], indicating that it is a moderately heritable trait that can be improved through selective breeding. The identification of genes that influence LMA in beef cattle is important, especially in the improvement of meat quality and breeding. LMA is closely related to net meat yield and slaughter rate, implying that net meat yield and slaughter rate can be increased by increasing the LMA, which is essential for improving the economic value of beef cattle. Huaxi cattle, is a new specialized beef cattle breed in China, which consists of 24/32 Northern America beef Simmental cattle, 5/32 of dual-purpose Simmental cattle from Germany and Austria, 1/32 Charolais cattle, 1/32 Sanhe cattle, and 1/32 Mongolian cattleSince the beginning of breeding in 1978, Huaxi cattle have shown good performance in growth rate, slaughter rate and net meat rate [[55]5]. Genome-wide association analysis (GWAS) refers to the analysis of the association between the molecular marker data obtained from the scan and phenotypic traits through genome-wide scanning, to identify genes that affect certain phenotypic or quantitative traits. As an effective molecular genetics method, GWAS plays an important role in the identification of functional genes for quantitative traits and the study of genetic variation of important economic traits in livestock and poultry [[56]6]. A series of genetic loci significantly associated with beef carcass traits have been identified using GWAS. For example, CARTPT gene affects LMA by encoding multiple peptides that affect tissue growth and development [[57]7]. The CHKA gene [[58]8] influences backfat thickness (BFT) by influencing lipid biosynthesis. RNA sequencing (RNA-seq) enables the study of gene function and regulatory mechanisms at the global level and the discovery of candidate genes associated with specific functions [[59]9]. The analysis of differentially expressed genes (DEGs) and weighted gene co-expression network analysis (WGCNA) were two mainstream strategies for transcriptomics analysis [[60]10]. The analysis of DEGs is a basic method to compare the differences and similarities of gene expression between two groups of different samples, which can obtain the genes that are significantly up-regulated and down-regulated in one group of samples compared with the other group. Further study of the function of these DEGs will eventually detect the vital genes that affect the target traits. The WGCNA approach aims to find co-expressed gene modules and to explore the associative relationship between the gene network and the phenotype of interest, as well as the core genes in the network. Santos Silva used RNA-seq technology to identify DEGs in the longissimus dorsi muscle of Neolore cattle and predicted that three key genes were related to LMA [[61]11]. Wang H [[62]12] used the WGCNA method to detect six gene modules significantly related to sex between obese boars and sows and finally identified seven DEGs affecting sex. Yang M [[63]13] identified the DBI gene as a candidate gene affecting intramuscular fat in Beijing black pigs using a combined analysis of GWAS and RNA-seq. The FGL1 gene had a significant effect on LMA [[64]14]. GWAS was performed on Simmental beef quality traits to discover 19 candidate SNP loci and 11 candidate genes associated with five meat quality traits such as meat color and LMA, among which KCTD16 and LOC506594 were related to LMA [[65]15]. Although numerous studies on LMA have performed, DEGs affecting LMA in Huaxi cattle still need to be further determined. Therefore, this study aimed to identify candidate genes affecting the size of LMA in Huaxi cattle using the combined analysis of GWAS and RNA-seq. Materials and methods Institutional review board statement The animal study protocol was approved by the Animal Welfare and Ethics Committee of Institute of Animal Science, Chinese Academy of Agricultural Sciences (IAS2020–21; Beijing, China). Samples collection and measurement The 1,507 experimental individuals in this study were from the Chinese Huaxi cattle resource group, which was established by the Bovine Genetic Breeding Team of the Institute of Animal Sciences of the Chinese Academy of Agricultural Sciences. Blood samples were collected for genotyping during routine farm quarantine procedures. All experimental animals were slaughtered in strict accordance with GB/T 27642–2011. Cattle to be slaughtered are electrocuted and then bled for slaughter. At the time of slaughter, Samples of the left longissimus dorsi muscle from all male individuals were collected and placed into 5 mL freezing tubes, which were immediately frozen in liquid nitrogen and subsequently transferred to a liquid nitrogen tank for long-term storage. These samples will be used for transcriptome sequencing. The section of 12 rib Loin muscle of Huaxi cattle was drawn with sulfuric acid paper [[66]11,[67]16]. Subsequently, the drawn LMA is placed on the measuring board, the number of dots in the measuring board is counted, and the LMA is calculated according to the number of dots, where one dot represents an area of 1 cm^2. Bulls with an average body weight of 665 kg and an average age of 25 months were selected and divided into two groups for differential expression analysis: a high loin muscle area group (HLMA, n = 6) and a low loin muscle area group (LLMA, n = 6). Additionally, WGCNA analysis was conducted on bulls (n = 20) with the same average body weight and age. Genotyping and quality control Genomic DNA was extracted from blood samples using a TIANamp Blood DNA Kit (Tiangen Biotech Company Limited, Beijing, China). DNA with 260/280 ratios between 1.8 and 2.0 was further analyzed and genotyped by 770K SNPs Illumina BovineHD BeadChip (Illumina Inc., San Diego, California, USA). Using PLINK (v1.90) for individuals and SNPs quality control. The quality control criteria are as follows: SNPs with deletion rates greater than 0.05, individuals with genotype deletion greater than 10% [[68]17], SNPs with minor allele frequency (<0.05) and Hardy-Weinberg balance test (P < 1e-6) [[69]18], and individuals with heterozygosity levels higher than 3 standard deviations were excluded [[70]19]. Finally, 1,488 samples and 605,671 SNPs were remained for subsequent analysis. Single-trait GWAS analysis In this study, according to the actual situation of resource groups, the significance test was conducted on the variables influencing the original phenotypic data, and the significant variables were used to adjust the phenotypic values accordingly. The model is as follows [[71]10]: [MATH: yij k=μ+s exi +fieldj+yeark+eijk :MATH] (1) [MATH: yij k :MATH] is the phenotypic value; [MATH: μ :MATH] is the population mean; [MATH: sex :MATH] , [MATH: field :MATH] , and [MATH: year :MATH] as fixed factors; [MATH: e :MATH] is the residual after the normal distribution [MATH: N(0,I< mi>σ2) :MATH] , where [MATH: I :MATH] is the identity matrix. Through the significance test, [MATH: sex :MATH] and [MATH: year :MATH] were selected as fixed factors to be incorporated into the linear model, and the phenotypic values were corrected. Subsequently, PLINK (v1.90) software was used to calculate the principal components to correct the population stratification, and then using the twstats method to test the significance of the principal components. The results showed that the P-values of the first three principal components were less than 0.05, thus the first three principal components were selected to be as covariables. This study selected the first three principal components to control for population stratification effects. The Fixed and Random Model Circulating Probability Unification (FarmCPU) method was used to evaluate the population structure, as this model offers higher statistical power and computational efficiency compared to general linear models and mixed linear models. The rMVP package (v1.0.5) was used to explore the association between genomic variations and phenotypes [[72]20]. The significance threshold for GWAS was set at P = 0.05/N, where N is the number of quality-controlled SNPs. SNPs with p-values below this threshold were considered significant. These significant SNPs were then mapped to the bovine reference genome ARS-UCD 1.2, and linked genes within 100Kb upstream and downstream of these SNPs were identified using the BioMart module on the Ensembl website. These genes were further evaluated by combining literature reviews and gene function annotations to determine their potential influence on LMA trait. RNA-seq library preparation Following the manufacturer’s instructions, the TRIzol reagent (Invitrogen, Life Technologies) was used to extract total RNA from LMA of Huaxi The concentration, quality, and integrity of RNA samples were examined by Nano photometer Spectrophotometer (Thermo Fisher Scientific, MA, USA) and RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). To ensure suitability for constructing sequencing libraries, all RNA samples had OD260/OD280 values between 1.8 and 2.0, and an RNA integrity number (RIN) greater than 7.0. The qualified samples were then sent to the Illumina NovaSeq 6000 platform for paired-end RNA sequencing (read length 150 bp). The RNA sequencing was completed by Beijing Novogene Technology Co., Ltd. RNA-seq data preprocessing After the sequencing data were obtained, FastQC (v0.11.4) software was used to detect the quality of original Reads (mass fraction, GC content, length distribution, etc.), and Trimmomatic (v0.39) was employed for quality control including removing low-quality sequences and splitter sequences, resulting high-quality clean reads for further analysis. The index was constructed using Hisat2 (v2.2.1) software and clean reads were compared to the bovine reference genome (ARS-UCD 1.2), and then the SAM files were converted to BAM files by SAMtools (v1.11) software. Finally, the raw count file of gene expression was obtained by using FeaturesCounts (v1.5.2) software. The reading numbers of all samples were combined. Per kilobases per million read (FPKM) values were calculated using R. Identification of differential gene expression The high (H) and low (L) LMA groups each consisted of six Huaxi cattle, with one group having high LMA and the other having low LMA. The raw count of individuals in the two groups was analyzed by DESeq2 (v1.18.0) software. The criteria for DEGs were |log2FoldChange| ≥ 1 and FDR < 0.05. Subsequently, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment [[73]21] and Gene Ontology (GO) annotation [[74]22] were performed for these genes using the clusterProfiler R package (v4.12.0) [[75]23]. GO annotation covers three categories: Biological process (BP), Cellular Component (CC), and Molecular function (MF). The significance threshold of enrichment was set as P < 0.05. Weighted gene co-expression network analysis Weighted gene co-expression network analysis (WGCNA) is used to identify clusters (modules) of highly correlated genes, summarize these modules using either the module eigengene or an intramodular hub gene, and explore relationships between modules, as well as their associations with external sample traits through eigengene network analysis. Additionally, WGCNA allows for the calculation of module membership measures [[76]24]. This experiment used the WGCNA R package (v1.72) to construct a gene co-expression network of all genes and LMA trait in 20 samples. Hub genes in the significance module were defined with Gene significance (|GS|) was greater than 0.2 and the absolute value of Module membership (|MM|) was greater than 0.8. Construction of protein interaction network and further identification of core genes Constructing protein-protein interaction (PPI) networks for Hub genes within each significant module [[77]25] via the STRING database. Then, the intersection and visualization of the top ten significant Hub genes of each significant module based on Maximum Clique Centrality (MCC), Neighborhood Component Centrality (MNC), and Degree were identified by the CytoHubba plug-in of Cytoscape software [[78]26]. Finally, The selected genes were further intersected with DEGs for additional analysis. KEGG pathway enrichment and GO annotation were subsequently conducted on the KOBAS website for both these genes and the genes identified through GWAS. Results GWAS results After quality control of the selected samples, 1,488 samples as well as 605,671 SNPs were subjected to GWAS analysis. The heritability of LMA was calculated as 0.43 using GCTA [[79]27], which is high heritability, indicating that the trait can be improved by genetic methods. The Manhattan plot in [80]Fig 1A showed that four significant SNP loci were screened at the genome-wide level (P = 0.05/605,671); and the QQ plot in [81]Fig 1B indicated that there was a consistent relationship between the actual and theoretical values for most of the loci, which effectively demonstrated the effectiveness of this method in population control [[82]28]. The basic information of these four significant SNP is shown in [83]Table 1. Fig 1. Genome-wide association analysis of LMA trait in the Huaxi cattle population. [84]Fig 1 [85]Open in a new tab (A) Manhattan plot. The red dotted lines indicated the significant threshold (P = 0.05/605,671). (B) QQ plot. Table 1. The annotation of significant SNPs. SNP ID Chr POS REF ALT SE P.adj BovineHD1200013268 12 48368923 A G 0.568644443 6.41E-17 BovineHD1300012261 13 42247767 A G 0.418439684 6.68E-08 BovineHD1400017455 14 62769117 A C 0.91433653 6.31E-11 BovineHD2000018602 20 65011159 A G 0.752846917 4.68E-09 [86]Open in a new tab Based on the BioMart module, these significant SNPs were identified to be located in or close to three coding genes. BovineHD1300012261 on Bos taurus autosomes (BTA) 13 was located in or close to the THBD gene and CD93 gene; BovineHD1400017455 on BTA 14 was located in the RIMS2 gene. The two significant SNPs located on chromosome 12 as well as 20 did not detect reveal any associated genes. Results of transcriptome sequencing data comparison S1 Table in [87]S1 File demonstrated the basic statistics for the LMA in the transcriptome 12 experimental Huaxi cattle. After quality control of the transcriptome sequencing data, an average of 21,848,620 clean reads were obtained for each sample. The comparison results showed that the multiple mapping rate, the unique mapping rate, the total mapping rate, and the unmapped rate of the 12 samples were 2.74% ~ 3.79%, 90.08% ~ 91.83%, and 95.97% ~ 97.12%, and 5.16% ~ 6.69%, respectively (S2 Table in [88]S1 File), which indicated that the sequencing results were better to be used for the subsequent analysis. Identification of DEGs The individuals in high LMA and low LMA groups were screened according to the values of LMA trait. Principal component analysis was performed on the gene expression data of the two groups of individuals after quality control. [89]Fig 2A showed the sample cluster graph based on the first two principal components. PC1 and PC2 explained 40.3% and 8.6% of the gene expression variation, respectively. DEGs were screened with | log2FoldChange |≥1 and FDR < 0.05. Finally, a total of 2,896 DEGs were identified in the comparison group of HLMA and LLMA, including 647 up-regulated genes and 2,249 down-regulated genes ([90]Fig 2B). Heatmap demonstrated that the expression patterns of these differential genes were similar within groups while different between groups ([91]Fig 2C). Fig 2. Identification of DEGs in the high LMA and low LMA groups. [92]Fig 2 [93]Open in a new tab (A) PCA of the samples. Green and red dots indicated samples with high LMA and low LMA, respectively. (B) Volcano plot of DEGs. Red dots represented significantly up-regulated genes; dark-green dots indicated significantly down-regulated genes; black dots indicated genes with no significant effect. (C) Heatmap of DEGs. H: high LMA group; L: low LMA group. Functional enrichment analysis of DEGs To further understand the biological function of DEGs, 2,896 DEGs were enriched by GO and KEGG. The results showed that DEGs were enriched in 268 GO terms, including 203 biological processes (BP), 36 cellular components (CC), and 29 molecular functions (MF). These DEGs were mainly concentrated in cell adhesion (GO:0007155), actin cytoskeleton organization (GO:0030036), and muscle cell differentiation (GO:0042692) ([94]Fig 3A). Furthermore, these DEGs were enriched in 103 KEGG pathways ([95]Fig 3B), among which the pathways related to muscle development included MAPK signaling pathway (bta04010), Wnt signaling pathway (bta04310), and mTOR signaling pathway (bta04150) could be served as important pathways affecting LMA trait. Fig 3. Functional enrichment analysis of DEGs. [96]Fig 3 [97]Open in a new tab (A) BP, CC, and MF showed the top ten terms, respectively; (B) The top 15 KEGG pathways enrichment of significant DEGs. Construction of weighted gene co-expression network Firstly, the hclust function was used to perform hierarchical clustering on 21,297 genes in 20 samples to eliminate outlier samples [[98]24]. As shown in [99]Fig 4A, 19 samples were retained for subsequent analysis. Second, the scale-free co-expression gene network was established with the one-step network construction function, and the soft threshold (β) was calculated using the R function “ pickSoftThreshold “ ([100]Fig 4B). Then, the adjacency matrix was transformed into topological overlap matrix (TOM) and the corresponding dissimilarity (1-TOM) was calculated. Using hierarchical clustering and dynamic tree cutting to detect the modules in the network. Modules with a height value lower than 0.25 were merged, resulting in 19 distinct modules. Correlation analysis between these 19 modules and LMA traits revealed that the sienna3 and lightcyan1 modules were strongly positively correlated with LMA. Fig 4. WGCNA analysis. [101]Fig 4 [102]Open in a new tab (A) Sample clustering tree. (B) Soft threshold estimation based on adjacency matrix. (C) Dynamic cutting tree algorithm to partition gene modules. (D) Modular-trait heatmap of correlation between gene modules and LMA. Each module contains the corresponding correlation coefficient and p-value. Hub genes screening in the sienna3 module and lightcyan1 module Hub genes in the sienna3 module and lightcyan1 module were defined with GS was greater than 0.2 and MM greater than 0.8. The results showed there were 18 Hub genes in the sienna3 module ([103]Fig 5A) and 105 Hub genes in the lightcyan1 module ([104]Fig 5B), respectively. Then, the top 10 genes of hub gene in lightcyan1 module and the top 10 genes of all genes in sienna3 module were selected according to MCC, MNC and Degree calculated by Cytohubba plug-in in Cytoscape software. At the same time, the intersection of genes obtained by the three algorithms was taken ([105]Fig 5C and [106]5D). Subsequently, these initially selected genes were further intersected with DEGs and the overlapping genes including PLP1, SNCA, MRPL36, NDUFS8, NDUFB7, NDUFA11, and MRPL40 were determined for further analyzed ([107]Fig 5E). Fig 5. Intersection gene screening. [108]Fig 5 [109]Open in a new tab (A) Hub gene within the sienna3 module. (B) Hub gene within the lightcyan1 module. (C) Intersection genes obtained by MCC, MNC, and Degree algorithms in the sienna3 module. (D) Intersection genes obtained by MCC, MNC, and Degree algorithms in the lightcyan1 module. (E) The intersection of genes selected by three algorithms and DEGs. Discussion LMA is the cross-sectional area of the longest dorsal muscle of livestock, which is particularly important in breeding because of its strong correlation with livestock meat production performance. In general, a larger LMA indicates a relatively higher pure meat percentage and better meat quality, while cattle with a larger LMA tend to have fuller and more developed muscles, which may also imply that they grow faster. Previous studies combined GWAS with RNA-seq to reveal that HDAC9 is a candidate gene affecting LMA in Beijing black pigs [[110]29]; Two quantitative trait loci (QTL) related to LMA were identified on SSC1 and SSC2 by multi-species genome-wide association analysis. The candidate gene for the QTL on SSC1 is FUBP3 (far upstream element-binding protein 3) [[111]30]. Zhao X et al [[112]31] identified one (DOCK5), three (PID1, PITX2, ELOVL6), and three (CCR1, PARP14, CASR) genes as candidate genes for LMA, loin muscle depth (LMD), and BFT using GWAS. In this study, four significant SNPs were screened at the genome-wide level and annotated to be located in or close to three coding genes. The THBD gene encodes Thrombomodulin, a protein primarily expressed on the surface of endothelial cells. The THBD gene is enriched in GO pathways related to calcium ion binding and blood coagulation.The THBD gene played vital roles in the muscle regulatory networks in the Yorkshire breed [[113]32], which is consistent with the results of this study. CD93, also known as complement component C1q receptor, plays a key role in cell proliferation and migration [[114]33]. The CD93 gene exacerbated cell proliferation, angiogenesis, and immune evasion in osteosarcoma by triggering the PI3K/AKT pathway [[115]34]. The RIMS2 gene encodes the regulatory synaptic membrane exocytosis 2 protein and affects metabolic pathways related to cell differentiation and proliferation. It is also associated with the regulation of the immune system. In this study, seven candidate genes were screened at the transcriptome level, with one overlapping gene with the GWAS results. Pathway enrichment analysis showed proteolipid protein l (PLP1) was mainly enriched in the signaling pathway associated with mental illness, and was also involved in the long-chain fatty acid biosynthetic process (GO:0042759). PLP1 was the main component of myelin, and its mutation will lead to progressive neurodegeneration and eventually death due to severe protein defects [[116]35], thus its dysfunction will directly affect the motor function of muscles. PLP1 was also involved in the regulation of intestinal motility and barrier function [[117]36]. The α-synuclein (SNCA) was involved in Parkinson’s disease (hsa05012), actin binding (GO:0003779) and other related pathways. SNCA may play a role in the compartmentalization of acetylcholine at the neuromuscular junction and in the fine control of skeletal muscle activity [[118]37]. MRPL36 and MRPL40 both belong to mitochondrial ribosomal proteins, which are important components of the structural and functional integrity of the mitochondrial complex [[119]38]. The overexpression of MRPL36 seems to improve the efficiency of mitochondrial translation [[120]39]. Ying Liu et al. found that MRPL40 may be a potential target for the treatment of cryptorchidism and decreased sperm motility and number [[121]40]. Moreover, the downregulation of MRPL40 expression can eliminate the effect of interleukin-8 (IL-8) on promoting the proliferation and migration of rectal cancer [[122]41]. Currently, there was no direct evidence to show that MRPL36 and MRPL40 were involved in the genetic regulation of muscle development. However, as key components of mitochondrial ribosomal proteins, their contribution to mitochondrial function and energy metabolism may indirectly affect muscle development and maintenance. NDUFS8, NDUFB7, and NDUFA11 were three genes related to the mitochondrial respiratory chain, and the proteins encoded by these three genes were all important components of the mitochondrial respiratory chain Complex I (NADH). All three genes were involved in biological processes such as citric acid (TCA) cycling respiratory electron transport (R-HSA-1428517) and respiratory electron transport (R-HSA-611105). The expression level of NDUFS8 in myoblasts decreased significantly with the increase of age, and experimental results showed that the expression of NDUFS8 in myoblasts promoted differentiation, self-renewal, and apoptosis resistance [[123]42]. Knockout of NDUFS8 inhibits cell proliferation, migration, and capillary formation in cultured endothelial cells [[124]43]. NDUFB7 and other genes were involved in regulating lipid synthesis and fatty acid β-oxidation, to maintain normal energy supply under low oxygen conditions [[125]44]. In chickens with woody breast meat, the expression of the NDUFB7 gene, which was associated with mitochondrial and ATP synthesis, tends to decrease. Additionally, other studies have shown that NDUFA11 expression was significantly reduced in the blood of patients with acute myocardial infarction, making it a potential biomarker for diagnosis of myocardial infarction [[126]45]. The genes PPP3R1, FAM129B, and UBE2G1 are key genes influencing the LMA of uncastrated Nellore males [[127]11]. These genes do not overlap with those identified in this study, which is likely due to differences in breeds. Conclusions This study described the genomic variants and candidate genes associated with LMA in Huaxi cattle. Through genome-wide association analysis, the THBD gene and the CD93 gene on BTA 13, and the RIMS2 gene on BTA 14 were detected to be crucial in affecting the LMA trait. Transcriptome analysis showed that three other candidate genes (PLP1, SNCA, and NDUFS8) were also identified as candidate genes, which regulate LMA by participating in actin synthesis and muscle development. Summarily, the candidate genes related to LMA identified in this study were THBD, CD93, RIMS2, PLP1, SNCA, and NDUFS8. This result could provide new insights into the molecular mechanisms regulating LMA in beef cattle. Supporting information S1 File S1 Table. Phenotype statistics of two groups of samples. S2 Table. The annotation of significant SNPs (±100 kb). S3 Table. The information of sequencing reads alignment to Bos taurus reference genome. S4 Table. The significant GO terms and KEGG pathways for differentially expressed genes. S5 Table. The enriched GO terms and KEGG pathways for the hub genes in two significant module. S6 Table. The detailed information of 6 genes associated with loin muscle area identified by GWAS and RNA-seq strategies. (XLSX) [128]pone.0322026.s001.xlsx^ (103.9KB, xlsx) Acknowledgments