Abstract It is critical in sheep farming to accurately estimate ram fertility for maintaining reproductive effectiveness and for production profitability. However, there is currently a lack of reliable biomarkers to estimate semen quality and ram fertility, which is hindering advances in animal science and technology. The objective of this study was to uncover long non-coding RNAs (lncRNAs) in sperm from rams with distinct fertility phenotypes. Mature rams were allocated into two groups: high and low fertility (HF; n = 31; 94.5 ± 2.8%, LF; n = 25; 83.1 ± 5.73%; P = 0.028) according to the pregnancy rates sired by the rams (average pregnancy rate; 89.4 ± 7.2%). Total RNAs were isolated from sperm of the highest- and lowest-fertility rams (n = 4, pregnancy rate; 99.2 ± 1.6%, and 73.6 ± 4.4%, respectively) followed by next-generation sequencing of the transcripts. We uncovered 11,209 lncRNAs from the sperm of rams with HF and LF. In comparison to each other, there were 93 differentially expressed (DE) lncRNAs in sperm from the two distinct fertility phenotypes. Of these, 141 mRNAs were upregulated and 134 were downregulated between HF and LF, respectively. Genes commonly enriched for 9 + 2 motile cilium and sperm flagellum were ABHD2, AK1, CABS1, ROPN1, SEPTIN2, SLIRP, and TEKT3. Moreover, CABS1, CCDC39, CFAP97D1, ROPN1, SLIRP, TEKT3, and TTC12 were commonly enriched in flagellated sperm motility and sperm motility. Differentially expressed mRNAs were enriched in the top 16 KEGG pathways. Targets of the differentially expressed lncRNAs elucidate functions in cis and trans manner using the genetic context of the lncRNA locus, and lncRNA sequences revealed 471 mRNAs targets of 10 lncRNAs. This study illustrates the existence of potential lncRNA biomarkers that can be implemented in analyzing the quality of ram sperm and determining the sperm fertility and is used in breeding soundness exams for precision livestock farming to ensure food security on a global scale. Keywords: ram, sperm, fertility, long non-coding RNAs, gene expression 1 Introduction Male fertility is crucial for animal reproduction as it encompasses the successful fertilization of an egg with a viable spermatozoon, subsequently leading to embryonic and fetal development, ensuring the species' continued existence. Fertility is an economically important trait, and ram flock represents more than 50% of the genetics of every sheep farmer's flock ([27]1). Thus, proper management of rams for maximum performance and longevity is vital to the success and reproductive efficiency of sheep farming ([28]2). Accordingly, advancing ram fertility in the livestock system is imperative to provide animal-based food demands to feed the ever-increasing human population on a global scale. In addition to genetic evaluation and testing, rams have been selected according to breeding soundness exam (BSE), which requires a series of examinations, including physical exam, scrotal measurement, sperm morphology, and motility. Despite massive attempts to evaluate ram fertility using the BSE, the predictability of ram fertility still awaits improvement. Non-compensable factors, such as DNA damage, RNA molecules ([29]3), and protein markers ([30]4, [31]5) in sperm fertility, are attributed to minute sperm abnormalities that cannot be determined using conventional procedures ([32]6). Advanced omics approaches have paved the way for underlying molecular mechanisms related to spermatogenesis, fertilization, and embryogenesis ([33]7, [34]8). Such methods may be used more widely in the future for producing farm animals in combination with the evaluation of semen parameters ([35]9, [36]10), while also estimating accurate sperm fertility markers in farm animals ([37]11, [38]12). Spermatozoa can transmit not just the paternal DNA but also certain RNA molecules and transcription factors, presumably inherited into the oocytes during fertilization ([39]13). Researchers have recently used cutting-edge approaches for discovering small noncoding RNAs with various nucleotide lengths and biogenesis processes, such as PIWI-interacting RNAs (piRNA), microRNAs (miRNA), tRNAs, and long noncoding RNA (lncRNA)-derived short RNAs ([40]14–[41]17). Sperm bearing RNA molecules are implicated in spermatogenesis and embryo development at transcriptional and posttranscriptional levels ([42]18–[43]20). As such, they play roles in regulating spermatogenesis and fertilization by transferring small noncoding RNA (sncRNA) and lncRNA into the oocyte ([44]3, [45]19, [46]21). In addition, paternally derived noncoding RNAs are key regulators of preimplantation embryos ([47]22) since some of them are involved in the control of gene expression in zygotic and early embryonic development ([48]23). Accordingly, sperm noncoding RNAs can enhance the transmission of epigenetic information to the offspring ([49]13) because environmental and metabolic-induced modifications of sperm cells may influence the epigenetic modulation of embryo development by changing the gene expression through noncoding RNAs ([50]24–[51]27). LncRNAs comprise nucleotides located in the cytoplasm and nucleus, which are transcribed by RNA polymerase II and are longer than 200 nucleotides and lack protein-coding capacity. LncRNAs, with their higher and more stable structures, modulate gene expression (cis and trans manner) at several levels, including epigenetic, transcriptional, posttranscriptional, and posttranslational, through their interactions with mRNA, proteins, and other sncRNAs. They have significant functions in biological processes such as modifying the chromatin structure, activating gene expression, inhibiting gene expression, and translating mRNA molecules. LncRNAs have crucial roles in regulating the many biological processes that are highly expressed in bovine and mouse testis and mature sperm ([52]28, [53]29). There is a linkage between noncoding RNA markers and male fertility, as the transcripts can be involved in the prediction of fertility ([54]30, [55]31). The lncRNAs differentially expressed in distinct motility phenotypes of bovines and humans ([56]32) imply that sperm lncRNA may possess functional roles in fertility ([57]33). Using RNA sequencing technologies, sperm bearing RNA was found to be conversed among many species, such as stallions, goats, and boar ([58]34–[59]36). The unique male sperm ncRNAs with consistent fertility phenotypes can be valuable as potent fertility biomarkers. Accordingly, this study aimed to uncover long noncoding RNA profiles from ram sperm possessing distinct fertility phenotypes. 2 Material and methods 2.1 Ram fertility assessment and experimental design The Republic of Turkey's Ministry of Agriculture and Forestry's Institute of Bahri-Dagdaş International Agricultural Research supplied information on the fertility phenotypes of adult rams. Pregnancy rates from natural mating were used to determine the fertility of mature rams (n = 66) at least 3–4 years old during the breeding seasons of 2017–2018–2019. The ewes' estrus was detected using teaser rams, who were not permitted to mate, by covering the prepuce area. The teaser rams were presented to the ewes early in the morning for about 30 min. The estrus was considered to be ewes seeking, standing for teasing, and allowing mount attempts by teaser rams. A handler selected estrous ewes and brought them into an enclosure along with a randomly selected single ram for natural mating. Throughout the breeding season, estrus detection was maintained, and ewes were accepted to mate with the chosen ram at random. Ewes were regarded as pregnant if they did not return to estrus within 35 days of mating. In addition, the number of pregnant and non-pregnant ewes for each ram was confirmed by matching the mating and lambing dates according to the duration of pregnancies. The rams' fertility scores were rated based on their conception rates. The rams were split into two groups based on their fertility levels: the high-fertility (HF) group (n = 31; 94.5 ± 2.8%) and the low-fertility (LF) group (n = 25; 83.1 ± 5.73%). The average pregnancy rate was 89.4 ± 7.2% (n = 66), and the rams were divided into these groups based on their fertility levels. We conducted an a priori power analysis using G^*Power3 (V3.1.9.4) to test the differences between two independent group means using a two-tailed test, an effect size (d = 2.52), and an alpha level of .05. The result showed that a total sample of eight animals with two equal-sized groups of n = 4 was required to achieve a power of .80 for RNA profiling. However, we excluded the rams that did not have 50 mating and were not used, so we profiled four animals for each group out of a total of 56. During the breeding season, each ram served at least 50 ewes in both groups. 2.2 Semen collection The Bahri-Dagdaş Research Center Ethical Committee, Turkey (Number: 22.12.2016/58), approved the animal procedures. We trained the rams to obtain sperm using an artificial vagina (AV) that enabled them to mount on teaser ewes during estrus. Rams were permitted to ejaculate into the AV upon mounting. The first three collections were discarded prior to the collection of research samples, which was followed by semen collection and processing for use in research. Fertility rates that were 1 standard deviation above or below the mean were termed outliers. Four rams with the highest fertility (pregnancy rate; % 99.2 ± 1.6) and four rams with the lowest fertility (pregnancy rate; % 73.6 ± 4.4) were selected for lncRNA sequencing with high confidence. Subsequently, we collected about 2 × 10^9/ml spermatozoa per ejaculate, and then the aliquots from each sample were adjusted to a final concentration of 10^7/ml in straws and frozen at −80°C until lncRNA analysis. 2.3 RNA isolation Prior to RNA isolation, we purified sperm by filtering the semen samples with a 500-mesh sieve to eliminate cell debris. Then, samples were treated with a somatic cell lysis solution (0.3% Triton X-100 and 0.1% SDS in DEPC-treated H[2]O) for 30 min on ice to eradicate somatic cells, followed by microscopic analysis of non-sperm cell contamination. We isolated total RNA from the purified ram sperm (n = 4, for each group) samples using the SanPrep column microRNA miniprep kit (Bio Basic Inc, Canada) with slight modification using the manufacturer's protocols. We added 800 μl of a guanidine–thiocyanate lysis buffer enriched in 20 mM DL-dithiothreitol onto the pellet, and then sperm cells were homogenized by passing the samples through a 26-G needle syringe 20–25 times. After other contaminants were thoroughly removed and total RNA was attached to the membrane, an on-column DNase digestion was carried out to remove any traces of DNA contamination. We evaluated the concentration and integrity of the total RNA samples using a NanoDrop (Colibri Microvolume Spectrometer, Titertek-Berthold, Germany) and a 2100-Bioanalyzer with the RNA 2100 Nano Chip (Applied Biosystems, Carlsbad, CA, USA), respectively. 2.4 Library preparation for lncRNA sequencing Each RNA sample was utilized to prepare 2 g of RNA for the RNA library, and ribosomal RNA was first eliminated using the Ribo-ZeroTM rRNA Removal Kit from Epicenter. Then, using the NEBNext^® UltraTM-Directional RNA Library Prep Kit from Illumina^® (NEB, USA) in compliance with the commercial kit protocol, sequencing libraries were developed utilizing the rRNA-depleted RNA. Reverse transcription was used to create the first strand of the cDNA following fragmentation with an average length of 200 bp. The Agilent Bioanalyzer 2100 equipment was used to evaluate library quality following product purification using the AMPure XP system. As a result, libraries underwent sequencing using the Illumina NovaSeq 6000 (Illumina Inc., San Diego, CA, United States), and Novogene Corporation (Beijing, China) generated 150-bp paired-end reads. 2.5 RNA-Seq read alignment and transcript assembly Initially, rRNA, adapter sequences, empty reads, and low-quality reads were eliminated from the raw data. All trimmed reads were confirmed to satisfy the quality threshold (Q-score; Q20 and Q30) to ensure that there was no bias in the evaluation step toward approaches that favor maximum read. The Phred scale indicating the reliability of base-calling, with Q20 representing a base call accuracy of 99% (or a 1% chance of error) and Q30 representing a base call accuracy of 99.9% (or a 0.1% chance of error) was used as the quality score. The Ovis aries (v4.0) reference genome was indexed with Bowtie v2.0.6, and the processed paired-end reads were mapped to that genome with HISAT2 2.1.0 ([60]37). Each sample's mapped reads were constructed using StringTie (v1.3.1) ([61]38). Finally, Cuffcompare, a program included in Cufflinks (v2.1.1), was used to annotate the assembled transcripts. 2.6 Putative lncRNA identification To classify newly screened lncRNAs with respect to their positional relationship with known mRNAs, putative lncRNAs were identified. To minimize the false-positive rate (FDR), assembled transcripts were classified to retrieve putative lncRNAs, such as lincRNA, antisense lncRNA, intronic lncRNA, and sense-overlapping. (A) Transcripts with a single exon were eliminated. (B) Transcripts with fewer than 200 nucleotides were eliminated. (C) Using Cufflinks v2.1.1, the annotated lncRNAs in the database were used to exclude the transcripts that overlapped with the exon region of the database annotation. (D) All transcripts with modest levels of expression [FPKM thresholds were set for the categorization of transcript expression levels; genes with very low or no expression (FPKM < 0.5), and FPKM = 0.5 was chosen as the cutoff to filter out the average read coverage per transcript which was much higher than the other transcript] were omitted (FPKM of a single exon transcript). Using three methods, namely, Coding-Non-Coding-Index (CNCI) ([62]39), Pfam-scan ([63]40), and coded potential calculator (CPC) ([64]41), all estimated transcripts with coding potential were filtered out, and a set of putative lncRNAs was compiled from those with noncoding potential. Using Cuffcompare, the distinct categories of lncRNAs were obtained. 2.7 Analysis of mRNA and lncRNA expression levels The FPKM value was used to evaluate levels of mRNA and lncRNA expression. Cuffdiff (v2.1.1) was used to determine lncRNA FPKM values. Later, the statistically significant DE genes were quantified by a log[2]-fold change higher or equal to 2 (P-value < 0.05) or P-adjust < 0.05 (applied correction for multiple testing to the P-values to FDR), using Ballgown ([65]42). 2.8 Target gene prediction DE lncRNAs were selected to determine target genes. Pearson's correlation was used to assess potential coexpression between lncRNAs and mRNAs. A Pearson correlation >0.7 and a P-value of 0.05 were used to determine a positive association between an lncRNA and an mRNA. LncRNAs can act as cis regulators by remodeling factors onto local chromatin. We described cis-modulated genes as protein-coding genes that were coexpressed with one dysregulated lncRNA and were within 30 kb upstream or downstream in genomic distance in the same allele. To participate in certain biological processes, key transcription factors (TFs) are regulated in a trans manner by unique lncRNAs. As a result, we matched these lncRNAs' coexpressed mRNAs to mRNAs that were regulatory targets of specific TFs to anticipate that these lncRNAs might be involved in pathways controlled by these TFs. 2.9 Gene ontology terms and KEGG pathway enrichment We assessed DE mRNAs for gene ontology (GO) enrichment analysis using g:Profiler ([66]43). GO terms possessing a corrected P-value < 0.05 were accepted as significantly enriched by DE genes. The statistical enrichment of lncRNA target genes was examined for KEGG pathway functional analysis based on the reactome pathway database by WebGestalt (WEB-based Gene SeT AnaLysis Toolkit) with P < 0.05 and FDR < 5.0% ([67]44). The following parameters were specified for the enrichment analysis. A particular organism was labeled as Ovis aries (sheep). Sequential GO analyses (biological process; BP, cellular component; CC, and molecular function; MF) were conducted. The g:SCS method is used to compute multiple testing corrections for P-values based on GO and pathway enrichment analysis, with the user threshold set at 0.05. 2.10 Statistical analysis Data were analyzed through SPSS software (version 22.0). Statistical plots were generated using GraphPad Prism 9 (GraphPad Software, USA). During the experimental process, four biological replicates were included with the measurement repeated twice. LF vs. HF groups were analyzed using an independent t-test. Significance was accepted at P-value = 0.05. 3 Results 3.1 Overview of sequencing data in ram sperm between LF and HF In the current study, we pooled two cDNA libraries out of the eight total RNAs (LF and HF, each = 4) obtained from low- and high-fertility ram sperm. The total number of raw readings obtained from all cDNA libraries was 91,108,174. Upon filtering the reads, a total of 89,252,823 clean reads were obtained. The Q20 (%) percentages were 96.06 and 95.39 for LF and HF, respectively. The Q30 (%) percentages were shown to be 90.96 and 89.50 for LF and HF, respectively. The percentages of GC content (%) were shown to be LF, 60.89, and HF, 62.37 ([68]Table 1). Furthermore, after aligning the clean reads with the ovine reference genome through the TopHat2 algorithm, we discovered that the total mapped reads or fragments referring to all samples exceeded 75% and were mapped in the reference genome ([69]Table 2). Table 1. Summary of data production. Sample Raw_reads Clean_reads Raw_data (G) Clean_data (G) Error_rate (%) Q20 (%) Q30 (%) GC_ content (%) LF 41,020,716 40,183,273 12.3 12.1 0.03 96.06 90.96 60.89 HF 50,087,458 49,069,550 15 14.7 0.03 95.39 89.5 62.37 [70]Open in a new tab Raw read statistics: the total number of reads for each file is determined, and each set of four consecutive lines represents the information for one read. Clean reads are the same as raw reads, but only the filtered reads are used to calculate the results of any further study; Clean reads (G) the sum of a sequence's length and number, expressed in giga bases; Error rate: the sequencing error rate; Q20, Q30: the proportion of all bases for which the Phred score is 20 or 30 or above, indicating base call accuracy; the proportion of guanine (G) and cytosine (C) in all bases is known as GC content. Table 2. A list of reads that were mapped to the reference genome. Sample name HF LF Total reads 98,139,100 80,366,546 Total mapped 82,508,774 (84.07%) 69,017,041 (85.88%) Multiple mapped 3,526,455 (3.59%) 2,394,038 (2.98%) Uniquely mapped 78,982,319 (80.48%) 66,623,003 (82.90%) Read-1 40,351,963 (41.12%) 33,802,488 (42.06%) Read-2 38,630,356 (39.36%) 32,820,515 (40.84%) Reads map to “+” 39,407,126 (40.15%) 33,255,875 (41.38%) Reads map to “–” 39,575,193 (40.33%) 33,367,128 (41.52%) Non-splice reads 77,745,931 (79.22%) 65,903,933 (82.00%) Splice reads 1,236,388 (1.26%) 719,070 (0.89%) [71]Open in a new tab Total reads: the number of reads after data filtration (clean data); Total mapped represents the total quantity of mappable reads. If a suitable reference genome is available and no contamination occurs during the experimental procedure, the percentage will typically exceed 70%. Numerous mappings: the number of sequences mapped to multiple reference sequence positions; specifically mapped: the number of reads that map to a particular position in the reference sequences. Map to “+”; Reads that map to “–”: the number of reads that are mapped to the minus strand. Quantity of reads mapped to two exons; also known as junction reads. Similarly, non-splice reads are those that are completely mapped to a single exon. The ratio of splice readings to total read length is proportional. 3.2 Chromosome read distribution and known RNA types The distribution intensity of the total mapped reads was split up and computed for both the plus and minus strands within each chromosome ([72]Supplementary Table 1). The distributions of reads in the known RNA types for LF and HF are depicted in [73]Table 3. The corresponding data related to these distributions can be found in [74]Supplementary Table 2. Table 3. A list of reads that were mapped to the reference genome. Classification of mapped reads (LF) Classification of mapped reads (HF) Others (49,390,745 [78.4%]) Others (57,960,150 [77.95%]) Protein_coding (12,785,367 [20.3%]) Protein_coding (15,352,804 [20.65%]) lncRNA (697,609 [1.1%]) lncRNA (855,897 [1.15%] pseudogene (125,827 [0.2%]) pseudogene (137,580 [0.2%]) rRNA (10,395 [0.02%]) rRNA (11,500 [0.02%]) miRNA (5,685 [0.01%]) miRNA (7,407 [0.01%]) misc_RNA (3,588 [0.0%]) misc_RNA (13,778 [0.0%]) ribozyme (2,495 [0.0%]) ribozyme (7,883 [0.0%]) [75]Open in a new tab Distribution of mapped reads in known types of RNAs for low fertility (LF) and high fertility (HF). 3.3 Novel lncRNA identification in ram sperm between LF and HF We applied the lncRNA filtering method to determine the novel lncRNA candidates in ram sperm. The specific filtering technique is outlined in [76]Supplementary Table 3. Using this lncRNA filtering process, we identified a total of 14,352 lncRNAs, of which 11,209 were novel candidates, the data for which are supplied in [77]Supplementary Table 4. We utilized three distinct software packages (CNCI, CPC, and Pfam) to estimate the potential protein-coding ability of the transcripts ([78]Figure 1). Figure 1. Figure 1 [79]Open in a new tab Venn diagram. The numerical values within each circle and the areas of overlap symbolize the total count and the number of noncoding transcripts that are found out by the software. 3.4 Identification and characteristics of lncRNAs between LF and HF in ram sperm The primary characteristics of the lncRNAs' biotype distribution and length, exon intensity, and ORF length were examined as a consequence of the sequence analysis. Using Cufflinks, we divided lncRNAs into three groups: long intervening noncoding RNA (lincRNA), antisense lncRNA, and intronic lncRNA. Of these, 8.3%, 8.5%, 9.8%, 72.6%, and 0.8% of the attained lncRNAs were sense intronic, antisense, sense overlapping, lincRNA, and others, respectively ([80]Figure 2A). Using this classification scheme, we revealed that the majority of the sperm-specific lncRNAs (72.6%) were lincRNAs. The exon lengths of the obtained lncRNAs were between 126 and 20,040 bp, and approximately 37% of the lncRNAs were demonstrated to be intense in 946–1,356 bp length ([81]Figure 2B), with 1,542 bp being the median value. The exon length of the obtained mRNAs ranged between 75 and 26,726 bp, with around 10% of the mRNAs being intense at 835–1,025 bp ([82]Figure 2B). Furthermore, lncRNAs have been shown to be more abundant in the first and fourth exons, whereas mRNAs were found to be more abundant in the first and seventh exons ([83]Figure 2C). The length of the ORF of the lncRNAs was between 54 and 20,037 bp, with a median value of 1,461 bp ([84]Figure 2D). For mRNAs, the ORF length was reported as 3–26,460, and the median value was 1,332. Figure 2. [85]Figure 2 [86]Open in a new tab (A) Distribution of various lncRNA types. (B) Comparison of the lengths of lncRNAs, mRNAs, novel lncRNA–mRNA, and TUCP. (C) A comparison of the exon counts of lncRNAs and mRNAs. (D) A comparison of the ORF lengths of lncRNAs and mRNAs. The expressions associated with transcripts of lncRNA, mRNA, novel lncRNAs, and TUCP were evaluated using Cuffdiff. The most common approach for predicting gene expression levels, FPKM, is based on the effects of gene length on sequence depth and fragments described in the RNA sequence. The FPKM values associated with samples were calculated ([87]Supplementary Table 5). Upon comparing FPKM values between LF and HF, no significant change was observed ([88]Figures 3A, [89]B). Moreover, the expression levels related to novel lncRNAs were the highest among the transcripts. TUCP and novel mRNA transcripts were greater than those of lncRNA and mRNAs, while lncRNA transcripts were similar to those of mRNA ([90]Figures 3C, [91]D). Figure 3. [92]Figure 3 [93]Open in a new tab (A, B) A comparison of gene expressions. Box plot depicting FPKM values. The X and Y axes reflect the corresponding sample name and log[10](FPKM + 1) value, respectively. For each sample, the plot region reflects, from top to bottom, the maximum, upper quartile, median, lower quartile, and minimum statistics. (2) Density distribution of Fragments Per Kilobase of transcript per Million mapped reads. The X and Y axes show, respectively, the value of log[10](FPKM + 1) and the density of genes. (C, D) Violin plot for distinct forms of lncRNA, mRNA, and TUCP transcription. The X-axis of the FPKM violin plot displays the sample names, while the Y-axis displays the log[10](FPKM + 1). Each violin plot possesses five statistical parameters (max value, upper quartile, median, lower quartile, and min value). The breadth of the violin plot indicates gene density. 3.5 LncRNAs and mRNAs that are differentially expressed in LF and HF Cuffdiff was employed to identify differentially expressed lncRNAs, mRNAs, and TUCPs. As a result, in HF and LF ram sperm, 93 lncRNAs and 275 mRNAs were reported to be differentially expressed (DE). We discovered 49 lncRNAs that are significantly upregulated and 44 that are downregulated between LF and HF groups. In addition, we discovered that 141 mRNAs were upregulated, whereas 134 were downregulated. Volcano plots show the up and downregulation highlights ([94]Figures 4A, [95]B; [96]Supplementary Table 6). We also examined the expression patterns of DE lncRNAs and mRNAs using hierarchical clustering analysis, which serves as an additional approach for shedding light on differentially expressed genes by grouping genes with comparable expression patterns. The transcript FPKMs were utilized for hierarchical clustering, with discrete colors indicating the direction of the expression level. The clustering of genes on the left was caused by similar expressions (fold change >2, P < 0.05) and LF and HF ram sperm, whereas the expression change from blue to red indicated them as increasingly increased ([97]Figures 4C, [98]D). Figure 4. [99]Figure 4 [100]Open in a new tab Volcano plot: (A, B) show differentially expressed lncRNAs and mRNAs in sperm between LF and GF rams, respectively. Heat map of clustering of genes; (C, D) show differentially expressed mRNAs and lncRNAs between LF and HF ram sperm (fold change >2, p < 0.05), respectively. Colors indicate the following red; up and blue; down. 3.6 DE mRNA functional annotation and KEGG pathways analysis Following a study at the transcriptome level, 275 mRNAs were selected for enrichment analysis due to their differential expression levels (141 upregulated and 134 downregulated). The top eight GO annotations terms in biological process (BP) were significantly enriched in the DE mRNAs, namely, the biological process (GO:0008150), cellular process (GO:0009987), regulation of cellular process (GO:0050794), organelle organization (GO:0006996), flagellated sperm motility (GO:0030317), sperm motility (GO:0097722), and cilium movement involved in cell motility (GO:0060294). In addition, in cellular component, intracellular anatomical structure (GO:0005622), sperm flagellum (GO:0036126), and motile cilium (GO:0031514) were among the top GO terms and binding (GO:0005488), protein binding (GO:0005515), enzyme binding (GO:0019899), and protein C-terminus binding (GO:0008022) in molecular function (MF; [101]Figure 5A). The detailed definition of the GO terms is presented in [102]Supplementary Figure 1. According to the statistics of the pathway enrichment, the top 16 KEGG pathways, including complex I biogenesis, the citric acid (TCA) cycle, respiratory electron transport, respiratory electron transport, mitochondrial translation, and gene silencing by RNA, had higher concentrations of DE mRNAs. ABHD2, AK1, CABS1, ROPN1, SEPTIN2, SLIRP, and TEKT3 genes were commonly enriched for 9 + 2 motile cilium, sperm flagellum, and motile cilium. Moreover, CABS1, CCDC39, CFAP97D1, ROPN1, SLIRP, TEKT3, and TTC12 were commonly enriched in flagellated sperm motility, sperm motility, and cilium movement involved in cell motility ([103]Supplementary Figure 1). Figure 5. [104]Figure 5 [105]Open in a new tab (A) Functional enrichment analysis of DE mRNAs in LF and HF sperm of ram. Significantly enriched target gene terms are shown. GO keywords are represented by red color codes, molecular functions by orange ones, biological processes by green ones, and cellular components by green ones (CC). (B) Functional enrichment analysis of target genes of cis- or trans-regulated by lncRNAs. Illustrated are normalized enrichment scores for specific sets of target genes. 3.7 Target genes of cis- or trans-regulated by lncRNAs To better understand how lncRNAs act in both cis and trans manner, we predicted their targets. There were 471 mRNAs that were discovered as targets of 10 lncRNAs (five upregulated and five downregulated), using 30 kb as the cutoff ([106]Supplementary Figure 2). According to the GO enrichment study results, 55 significant GO terms were found (corrected P-value 0.05). In MF, the top five GO keywords were nucleoside phosphate binding, protein binding, small-molecule binding, binding, and nucleotide binding ([107]Supplementary Figure 3). In BP, negative regulation of the cellular metabolic process, system development, response to chemical stimuli, organonitrogen compound metabolic process, and multicellular organism development were among the top five GO terms and cytoplasm, cell junction, cytosol, nucleoplasm, and synapse in CC ([108]Figure 5B). 4 Discussion The quality of sperm on a cellular level alone is no longer considered to be a reliable predictor of male fertility in selecting superior male prospects in livestock as a source of frozen semen ([109]12, [110]31). Methods for analyzing the sperm transcriptome, such as measuring messenger RNA levels, have been linked to increased fertility ([111]45). Analysis of the sperm transcriptome has become the primary tool for predicting male fertility potential in the livestock business, due in part to the widespread application of this technology in livestock growth ([112]45, [113]46). Accordingly, this study demonstrated long noncoding RNA profiles in sperm from rams with HF and LF phenotypes to uncover potential lncRNAs associated with fertility. The comparative analysis across multiple species indicates a considerable gap in our understanding of the functional roles of lncRNAs within sperm cells despite their prevalence in male germ cell development. Previous studies on cattle, boar, sheep, mice, and humans have collectively identified substantial numbers of potential lncRNAs in sperm cells ([114]46–[115]48). However, the functional annotation and investigation of these lncRNAs have remained limited. We obtained a total of 91,108,174 cDNA libraries, of which 89,252,823 were clean reads from ram spermatozoa. This provides a robust dataset for comparative analysis, especially when considering the consistency with similar studies conducted, such as on human, turkey, and boar (79.8, 84, and about 65.5 million, respectively) spermatozoa ([116]49, [117]50). Furthermore, the range of unique mapped reads, ranging from 66,623,003 to 78,982,319, indicated a strong alignment of the sequenced data to the reference genome of sheep. The high percentage of over 80% uniquely mapped reads demonstrated a significant mapping consistency comparable to that observed in bull and stallion spermatozoa ([118]32, [119]36). This consistency across species further confirms the reliability and quality of the obtained sequence data for ram spermatozoa, supporting the validity of results. In this study on ram fertility, we identified 11,209 sperm lncRNAs, a subset comprising 93 differentially expressed lncRNAs associated with LF and HF phenotypes in ram sperm. Comparatively, in mouse mature spermatozoa, 4,088 novel lncRNA transcripts were identified out of 20,907 known lncRNA transcripts ([120]29), demonstrating the complexity and diversity of these transcripts in different species' sperm cells. Similarly, in human sperm, 27,472 novel lncRNAs were discovered ([121]29), with 19 differential expressions of lncRNA out of 11,561 lncRNA transcripts in mature bull spermatozoa ([122]32). Analysis of goat spermatozoa revealed 655 lncRNA transcripts relevant to spermatogenesis from sequencing of the cDNA library, of which 1% annotated to lncRNAs was similar to ram spermatozoa with a classification of 1.1% ([123]51). We also showed that the type of lincRNA seemed to be closer between ram and goat spermatozoa ([124]34), both revealing a 72.6 and 76.64% annotation to lncRNAs, respectively, which highlights the relevance of these findings across species and their potential implications for understanding male fertility mechanisms. The GO and KEGG investigations were carried out for associated genes of DE mRNAs and lncRNAs to completely examine the functional roles of mRNAs and lncRNAs in ram sperm fertility. Candidate genes have been identified using this bioinformatics approach, and they were related to male reproductive biology. Our findings demonstrated that 275 mRNA transcripts were enriched for the biological process, cellular component, and molecular function GO characterizations. It has become apparent that several metabolic pathways and regulatory mechanisms have crucial roles in fertility, as related to sperm motility. We demonstrated that ABHD2, AK1, CABS1, ROPN1, SEPTIN2, SLIRP, and TEKT3 genes were commonly enriched for the 9 + 2 motile cilium, sperm flagellum, and motile cilium. Of these, ABHD2, an isolated molecule from sperm tails, is needed to activate sperm ([125]52). AK1 is an enzyme that is often responsible for cellular energy balance. It is detected in the flagella of murine and bovine sperm, which suggests that it is involved in sperm motility and is also demonstrated to be directly associated with bull fertility ([126]53–[127]55). Moreover, CABS1, CCDC39, CFAP97D1, ROPN1, SLIRP, TEKT3, and TTC12 were commonly enriched in flagellated sperm motility, sperm motility, and cilium movement involved in cell motility. As a Ca^2+ storage protein in mature sperm, CABS1 is a crucial factor in the regulation of calcium signaling and has been shown to preserve sperm flagella structure ([128]56, [129]57). ROPN1, implicated in fibrous sheath integrity and sperm motility, is engaged in PKA-dependent signaling for spermatozoa capacitation; therefore, mutation and lack of its expression in murine sperm cells cause impaired fertility ([130]58, [131]59). We showed that differentially abundant protein profiles of sperm from rams with contrasting fertility phenotypes were associated with metabolic energy generation by sperm cells along with the motility signaling pathway ([132]4). Our results are consistent with results from the previous study that mRNA–lncRNA interaction seemed to regulate signaling pathways for functional motility. The primary potential roles of lncRNAs are to control the expression of neighboring protein-coding genes through both cis and trans manners, integrating transcriptional coactivation or repression. Consequently, we conducted an in-depth analysis and determined the mRNAs situated within the 30 kilobase (kb) threshold upstream and downstream regions of the differentially expressed lncRNAs. We then employed GO and KEGG analyses on the target genes to determine the undertaken lncRNAs. We found that lncRNAs with different levels of expression are involved in a number of important biological processes. We demonstrated that the identified lncRNA TCONS_00136350 may regulate the differentially expressed coding gene ADAM metallopeptidase domain 32 (ADAM32). ADAM32 is a member of the disintegrin family of membrane-anchored proteins and is detected on the surface of mature sperm ([133]60). ADAM32 has been demonstrated to be implicated in sperm–egg plasma membrane interaction, and ([134]61) its expression levels are correlated with the quality of human sperm ([135]62). Although ADAM32 is not required for normal male fertility ([136]63), we showed that it was identified in the top 10 differentially abundant proteins in ram sperm ([137]4). We also discovered that differentially expressed lncRNA TCONS_00035618 modulated enolase 1 (ENO1), the protein expression of which was higher in increased fertility of ram sperm ([138]4). ENO1 plays an important role in the process of metabolism of monocarboxylic acids, and it is a component in the pathway that leads to glycolysis and gluconeogenesis (KEGG:00010) associated with bull fertility ([139]64), and its lower expression levels lead to low motility ([140]65). In line with our study and of others, lncRNAs are potentially important for the coactivation/repression of target genes that regulate sperm fertility and motility. 5 Conclusions Even though several potential lncRNAs have already been identified through cutting-edge technology and accessible lncRNA annotation tools, functional annotation of lncRNAs in sperm biology is still in its infancy and holds great promise. In this study, differentially expressed lncRNAs in ram sperm were ascertained along with their associations with low vs. high fertility phenotypes. These findings are important because they help advance both the fundamental science of mammalian male gamete biology and applied science that may provide practical value for potential male fertility markers. Data availability statement The datasets generated and/or analyzed during the current study are available in the EMBL-EBI HYPERLINK [141]http://www.ebi.ac.uk/biostudies/studies/S-BSST1231 accessible at this accession number “S-BSST1231”. The data encompass all raw relevant data. Ethics statement The animal study was approved by Bahri-Dagdas, Research Center Ethical Committee, Turkey (Number: 22.12.2016/58). The study was conducted in accordance with the local legislation and institutional requirements. Author contributions MH: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. AK: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing. EM: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. Funding Statement The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was funded by the Kastamonu University, Scientific Research Coordination Unit with a project number of KÜ-BAP01/2020-63. Animal materials provided were by the Scientific and Technological Research Council of Turkey (TUBITAK) with project no: (117O992) grant to AK. Conflict of interest The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Publisher's note All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher. Supplementary material The Supplementary Material for this article can be found online at: [142]https://www.frontiersin.org/articles/10.3389/fvets.2024.1337939/fu ll#supplementary-material Supplementary Figure 1 Distribution of lncRNA on chromosomes. (A) Distribution of lncRNAs in low fertility (LF) ram group. (B) Distribution of lncRNAs in high fertility (HF) ram group. [143]Image_1.JPEG^ (1.3MB, JPEG) Supplementary Figure 2 Identification pipeline for lncRNAs Identification pipeline for lncRNAs. Each step is documented in detail in the Methods section. [144]Image_2.JPEG^ (1.1MB, JPEG) Supplementary Figure 3 Functional enrichment analysis of target genes of Cis- or Trans-regulated by lncRNAs. Illustrated are normalized enrichment scores for specific sets of target genes. Significantly enriched target gene terms were shown. GO keywords are represented by red color codes, molecular functions by orange ones, biological processes by green ones, and cellular components by green ones (CC). [145]Image_3.PNG^ (13.3MB, PNG) [146]Data_Sheet_1.CSV^ (6.6MB, CSV) [147]Data_Sheet_2.CSV^ (322KB, CSV) [148]Table_1.XLS^ (2.9MB, XLS) [149]Table_2.XLSX^ (4.1MB, XLSX) [150]Table_3.XLSX^ (37.2KB, XLSX) [151]Table_4.XLSX^ (149.8KB, XLSX) References