Abstract Pheromones play a pivotal role in chemical communication across various taxa, with protein-based pheromones being particularly significant in amphibian courtship and reproduction. In this study, we investigate the Emei music frog (Nidirana daunchina), which utilizes both acoustic and chemical signals for communication. Base on a de novo assembled genome of a male Emei music frog, we identify substantial expansion in four pheromone-related gene families associated with chemical communication. Notably, six members of the two-domain three-finger protein (2D-TFP) family, belonging to the sodefrin precursor-like factor (SPF) pheromone system, exhibited high and specific expression in the male post-axillary glands during the breeding season. Structural and evolutionary analyses confirm the presence of the SPF system across amphibians, classifiable into four distinct classes (two within urodeles and two within anurans). We propose a complete regulatory network governing SPF secretion via the hypothalamic-pituitary-testicular-breeding gland axis, and suggest testosterone synthesis as the pivotal pathway. Behavioral experiments further reveal a previously unknown female-attractant role of SPF in anurans. Overall, these findings not only highlight the underestimated diversity and function of pheromones in anurans, but also provide important insights into the evolution of protein-based pheromones in vertebrates. Subject terms: Comparative genomics, Herpetology, Phylogenetics __________________________________________________________________ Multi-omics techniques facilitate a comprehensive understanding of the function and regulatory mechanisms of SPF pheromones in Anurans, shedding lights on the protein pheromones evolutionary and multi-modal communication in vertebrates. Introduction Pheromones are chemical substances produced and released by organisms that trigger specific behavioral or physiological responses in other members of the same species^[42]1. In the animal kingdom, pheromones have been well-documented across a variety of taxa, including insects, mammals, and amphibians^[43]2–[44]4. The diversity and ubiquity of pheromones across different species highlight their fundamental importance in reproductive and social contexts. Among amphibians, protein-based pheromones are especially significant in courtship and reproductive behaviors, where they help to mediate interactions between potential mates and influence reproductive success^[45]5,[46]6. Among the various pheromone systems in amphibians, the sodefrin precursor-like factor (SPF) system is of particular interest due to its widespread presence and functional significance^[47]7. SPFs belong to the two-domain three-finger protein (2D-TFP) family, as stated in reference^[48]8. These proteins can be neatly divided into two distinct groups, namely α-SPF and β-SPF, based on the unique cysteine patterns present within their two domains. SPF pheromones are found in both Caudata (salamanders and newts) and Anura (frogs and toads), with the system originating before the divergence of these two major groups, which occurred over 300 million years ago^[49]9. This ancient origin underscores the evolutionary importance of SPF pheromones in amphibian communication. In Caudata, SPF pheromones have been well-studied, with sodefrin, a decapeptide, being a notable example that specifically attracts female Japanese fire-bellied newts (Cynops pyrrhogaster)^[50]10. The SPF system is also present in male-specific glands across various amphibian species, further emphasizing its broad relevance^[51]7,[52]8. However, research on the SPF pheromone system in anuran amphibians remains relatively scarce. The Emei music frog (Nidirana daunchina) is an excellent model for studying multimodal communication in amphibians. This species is distinguished by its unique vocalizations, reminiscent of piano harmonies, and the presence of a male-specific, sexually dimorphic post-axillary gland during the breeding season. The post-axillary gland plays a critical role in reproduction, as evidenced by histological studies and behavioral experiments that confirm its function in attracting females^[53]11. Despite the extensive research on the bioacoustic communication of the Emei music frog^[54]12,[55]13, the pheromonal signaling mechanisms are still relatively unexplored. To address this gap in knowledge, our study aims to de novo assemble the Emei music frog genome and conduct comprehensive comparative genomic, transcriptomic, and metabolomic analyses. The primary objectives are to identify SPF pheromones and explore the regulatory mechanisms associated with their expression. We also conducted protein recombination and behavioral assays to validate the functions of the identified pheromones. By doing so, we seek to provide a valuable genetic resource for future investigations into pheromone evolution and offer insights into the molecular and evolutionary underpinnings of protein-based pheromones in vertebrates. Results and discussion Genomic features The PacBio sequencing platform was used to generate a high-quality genome of a male Emei music frog, yielding 201.36 Gb of PacBio Sequel II clean data with approximately 31.5-fold coverage. A 6.1 Gb genome was assembled, which approximated the size estimated by k-mer distribution (6.4 Gb, Fig. [56]S1), with a contig N50 size of 4.73 Mb and a scaffold N50 size of 669.5 Mb. The average GC content was 44.4%, similar to that of other anurans (Fig. [57]S2 and Table [58]S1). Based on Hi-C technology and karyotypic analysis of the Emei music frog (2n = 26, Fig. [59]S3), the scaffolds were assigned to 13 pseudochromosomes, consisting of five macrochromosomes and eight microchromosomes, ranging from 222.46 Mb to 930.04 Mb (Fig. [60]S4 and Table [61]S2). Analysis of genome assembly completeness was performed, identifying 91.65% complete vertebrate benchmarking universal single-copy orthologs (BUSCO) genes (Table [62]S3). Regarding BUSCO and N50 scores, the assembly was relatively complete compared with recently published Anuran genome assemblies. In total, 27,860 protein-coding genes were predicted, 98.13% of which were annotated in the NR, GO, KEGG, or other databases (Fig. [63]S5 and Table [64]S4). Repetitive sequences accounted for 72.24% of the genome, consisting of 11.61% tandem repeats and 60.63% transposable elements (TEs) (Table [65]S5). To analyze anuran divergence, the genome of the Emei music frog was compared to seven other anuran species: tropical clawed frog (Xenopus tropicalis), Leishan spiny toad (Leptobrachium leishanense), kio flying frog (Rhacophorus kio), common frog (Rana temporaria), túngara frog (Engystomops pustulosus), Chinese toad (Bufo gargarizans), and common toad (B. bufo), as well as two other vertebrates: humans (Homo sapiens) and green anoles (Anolis carolinensis). Phylogenetic analysis using single-copy orthologs revealed that the Emei music frog belonged to Neobatrachia. The most recent common ancestor (MRCA) for the eight anurans was estimated at 203.53 million years ago (Ma), while the MRCA of all Neobatrachia species was estimated at 160.71 Ma (Fig. [66]1A). A total of 254 gene families (GFs) were significantly expanded in the Emei music frog relative to its MRCA (Fig. [67]1A), many of which were significantly enriched in GO terms related to protein synthesis, including gene functions related to protein glycosylation (GO:0006486), protein hydroxylation (GO:0018126), protein localization to Golgi apparatus (GO:0034067), post-translational protein modification (GO:0043687), protein-lipid complex organization (GO:0071825), protein localization to plasma membrane(GO:0072659), negative regulation of protein localization to plasma membrane (GO:1903077), and protein localization to cell periphery (GO:1990778) (Fig. [68]1C and Table [69]S6). Moreover, four expanded GFs were annotated to SPF pheromone-related terms (Table [70]S7). The Emei music frog genome contained a relatively high percentage of TEs and other interspersed repeats, accounting for 61% of the genome, with long terminal repeat (LTR) retrotransposons accounting for the highest percentage (38%) (Fig. [71]1B and Table [72]S8). The insertion of LTR retrotransposons in anurans peaked around 0–1 Mya, while the genome of the Emei music frog showed multiple amplification peaks (Fig. [73]1D). Fig. 1. Evolutionary history of Emei music frog. [74]Fig. 1 [75]Open in a new tab A Key evolutionary innovations and specialized post-axillary gland in Emei music frog, and GF expansion (red)/contraction (blue) in Emei music frog and other anurans. The bootstrap value of all nodes is 100. B Percentages of TEs in different anuran genomes studied. LTR long terminal repeats, LINE long interspersed element. C Enriched GO terms in expanded GFs. D Insertion time of LTR retrotransposons in anurans. The assembled Emei music frog genome (6.1 Gb) was larger than the genomes of the seven other species analyzed in this study and exhibited a higher proportion of repetitive sequences (61%), in line with previous studies demonstrating that larger genomes typically contain a higher proportion of such sequences^[76]14. Notably, among the species studied, the Emei music frog genome contained the highest percentage of LTR retrotransposons. Analysis of the insertion times of LTR retrotransposons suggested the presence of a large number of recent transposons in the Emei music frog compared to other anurans, a key factor known to affect genome size^[77]15. 2D-TFP gene repertoire in vertebrates As SPF pheromone proteins belong to the 2D-TFP family, we first identified the 2D-TFP gene family in the Emei music frog genome. A total of 30 2D-TFP genes were identified and characterized in the Emei music frog genome, designated Nda2D-TFP1 to Nda2D-TFP30 based on their chromosomal locations (Table [78]S9). The 2D-TFP gene family was also recognized in the common toad, Chinese toad, túngara frog, Leishan spiny toad, kio flying frog, common frog, and tropical clawed frog (Table [79]S10). The lengths of the amino acid sequences of the 30 Emei music frog 2D-TFPs ranged from 163 residues (Nda2D-TFP2) to 269 residues (Nda2D-TFP23), with their molecular weights ranging from 18.73 to 29.21 kDa. Subcellular localization for the Nda2D-TFPs was diverse, with 25 categorized as extracellular, three associated with the plasma membrane and one found in both the cytoplasm and nucleus (Table [80]S9). Phylogenetic analysis can serve as a reference for exploring the functional diversification of 2D-TFPs in amphibians. Here, our phylogenetic analysis included 24 species of amphibians, 12 reptiles (including five birds), 11 mammals, and two fish (see Table [81]S11 for accession numbers). From these 49 species, 289 amino acid sequences were aligned to generate a maximum-likelihood tree (Fig. [82]2A), revealing a wide distribution of 2D-TFPs across vertebrates, with numerous subtypes observed within amphibians. While some sequences are recognized for their SPF roles^[83]7, the functions of many subtypes are yet to be determined. Additionally, our analysis highlighted that the divergence in 2D-TFP functions does not align with the phylogenetic relationships among species. Fig. 2. Evolution of 2D-TFPs in vertebrates. [84]Fig. 2 [85]Open in a new tab A Maximum-likelihood tree, with Japanese puffer (Takifugu rubripes) and zebrafish (Danio rerio) (black) used as the outgroup. B Chromosomal distribution of 2D-TFP genes in anurans. The identified 2D-TFP genes were mapped to three Emei music frog chromosomes, with a significant abundance of gene copies observed on chromosome 11. Similar patterns were also identified in other anuran species, common toad (chromosome 1), Chinese toad (chromosome 2), Leishan spiny toad (chromosome 11), kio flying frog (chromosome 11), common frog (chromosome 10), and tropical clawed frog (chromosome 7) (Fig. [86]2B). Gene duplication is a major driver of evolutionary innovation^[87]16 and plays an important role in the evolution of new protein functions^[88]17. Due to the accumulation of mutations in coding, regulatory, or splicing regions, the functions of duplicate genes tend to diverge over time. Thus, by producing additional copies of the sequence, gene replication provides a source for new functions and may contribute to phenotypic adaptation and the evolution of organisms^[89]18. Furthermore, related studies in metazoans have shown that duplicate genes play an important role in various fields of phenotypic evolution^[90]19,[91]20. Chromosomal mapping revealed the presence of multiple copies of 2D-TFP genes on single chromosomes in anuran amphibians, hinting at a functional acquisition through gene duplication. The maximum-likelihood tree constructed using 2D-TFP sequences from anurans, caudates, and other vertebrates revealed that amphibian 2D-TFPs do not exhibit a monophyletic phylogenetic relationship, with a small number of amphibian 2D-TFPs clustered with mammalian 2D-TFPs. These findings challenge previous studies that suggest amphibian 2D-TFPs are monophyletic and emerged post-divergence from Amniota^[92]10. Contrary to this, other studies posit that during the evolutionary history of vertebrate 2D-TFPs, fish differentiated into an independent clade, while the remaining non-fish vertebrates merged to form a distinct clade^[93]7. Our findings lend support to the latter view. Except for a small number of identified sequences (caudate α-SPF, caudate β-SPF, and anuran SPFs), the phylogenetic analysis highlighted a significant number of sequences with functions yet to be characterized. Despite the vertebrate genome containing numerous duplicates of the coding genes within the 2D-TFP protein class, the functions of these proteins remain poorly understood^[94]7. The high expression levels of several 2D-TFP genes in the reproductive tissues of both sexes in green anoles hint at a close association with animal reproduction^[95]21. However, in amphibians, beyond the established role of SPF genes expressed in male-specific glands, the functions of other 2D-TFP-encoding genes, such as those expressed in frog gastrula embryos^[96]22 and in the eyes and regenerative tissues of axolotls^[97]23, remain to be elucidated. Here, certain anuran 2D-TFPs clustered with caudate α-SPF, indicating the potential existence of an unidentified class of SPF pheromones in anurans. These insights highlight the need for further in-depth research on the complex functions of 2D-TFPs. Expression patterns of 2D-TFP genes in Emei music frog To explore the expression patterns of 2D-TFP genes in the Emei music frog, transcriptomes from the post-axillary gland were sequenced for both sexes and across two developmental stages (Fig. [98]3 and Table [99]S12). Additionally, transcriptomes from the dorsal skin were sequenced to serve as controls. Three independent comparisons were conducted: (1) post-axillary glands of breeding males versus the corresponding area on the skin of breeding females, which lack post-axillary glands; (2) post-axillary glands of breeding males versus the dorsal skin of breeding males; and (3) post-axillary glands of breeding males versus the post-axillary glands of non-breeding males. Fig. 3. Expression patterns of 2D-TFP genes in Emei music frog. [100]Fig. 3 [101]Open in a new tab A Diagram of transcriptome sample collection. B Genes with high expression in post-axillary glands during the breeding season, illustrated with a schematic of the specialized mucus gland based on a paraffin section photo (Fig. [102]S6). C Heatmap of expression patterns of different tissues in male Emei music frog during two breeding seasons.. Based on these comparative analyses, not all 2D-TFP genes in the genome exhibited high expression, with only six genes found to have significantly elevated expression in male post-axillary glands during the breeding season (Fig. [103]3 and Tables [104]S13–[105]15). Subsequent examination of the physicochemical properties of the proteins encoded by these six genes revealed that most of these proteins tended to be hydrophilic, except for Nda11G11450.1 (Table [106]S16). Previous research has shown that SPF pheromones are predominantly expressed in the reproductive glands of amphibians during the breeding season^[107]7,[108]8. In our study, we found that only six out of the 30 2D-TFP sequences exhibited specific and high expression in the post-axillary glands of male Emei music frogs at the breeding stage. The molecular weights of these six sequences ranged from 17.67 to 21.78 kDa, aligning with previously reported molecular weights of SPF pheromones^[109]8. Their isoelectric points varied from 5.63 to 8.79, with a minimal charge in neutral solution, inconsistent with the low reported isoelectric points (4.5–5.3) of SPF pheromones in salamanders^[110]5,[111]22 but similar to the recently discovered caudate pheromone persuasin (7.8–8.5)^[112]24. Except for Nda11G011450.1, all sequences exhibited hydrophilic characteristics, which may facilitate transmission through aquatic environments. Classification and evolution of SPF pheromones in amphibians The maximum-likelihood tree was constructed by integrating the protein sequences encoded by the six identified genes with existing SPF sequences from amphibians (Fig. [113]S7). Results revealed that similar to α-SPF and β-SPF in Caudata, anuran SPFs were categorized into two distinct classes, designated as γ-SPF and σ-SPF. The γ-SPF class contained SPFs from Nyctibatrachus, Boana, and Hyloscirtus species, while σ-SPF comprised the 2D-TFPs from the Emei music frog, SPFs from Plectrohyla species, and SPF isoform 06 from Demerara Falls treefrog (Boana cinerascens) (Fig. [114]S7). The phylogenetic relationship, (α-SPF, (σ-SPF, (β-SPF, γ-SPF))), suggests that α-SPF represents the most ancestral form, followed by σ-SPF. Additionally, β-SPF and γ-SPF formed a sister group. Multiple sequence alignment demonstrated that the number of cysteines in the two domains of γ-SPF was 10/12 at the first domain and eight at the second domain, while σ-SPF had 11 cysteines at the first and nine at the second (Fig. [115]4A). The three-dimensional structures of these sequences were subsequently predicted, and their disulfide bond patterns were analyzed (Table [116]S17), which varied across the four SPF categories, with 7–8 disulfide bonds in α-SPF, nine in β-SPF and γ-SPF, and 9–10 in σ-SPF. Notably, the patterns of the first three disulfide bonds in the first TFP domain were conserved across all SPF types, while the configurations of the fourth and fifth bonds differed between α-SPF and β-SPF, with the primary distinction highlighted in the fourth bond. Selection pressure analysis showed that the d[N]/d[S] values (ω) for all four SPF types were significantly greater than 1 (Table [117]S18), indicating positive selection. Fig. 4. Classification of SPF pheromone systems in amphibians. [118]Fig. 4 [119]Open in a new tab A Cysteine skeleton of four SPF types. B Structural information of four SPF types, including number of cysteines in different domains, d[N]/d[S]rate, number of disulfide bonds, and three-dimensional structure prediction. In the three-dimensional protein structure, red represents the helix, yellow represents the sheet, green represents the loop, and the round ball represents the disulfide bond. The maximum-likelihood tree indicated that the SPFs in caudates and anurans shared a common origin, corroborating earlier evolutionary findings of amphibian 2D-TFPs. Our results showed that SPFs in anurans resembled those in caudates, with both being categorized into two distinct groups. The SPFs from the Emei music frog and Plectrohyla species, along with the SPF isoform 06 identified in the Demerara Falls treefrog, constituted a novel classification within the anuran SPF spectrum. Comparative sequence analysis showed that the Emei music frog SPFs exhibited distinct characteristics compared to the α-SPFs and β-SPFs identified in numerous caudates and certain anurans^[120]8,[121]22. Unlike β-SPFs, which contain 10 cysteines in the first domain and eight in the second domain, or α-SPFs, which contain eight cysteines in the first domain and six in the second domain, Emei music frog SPFs possessed 11 cysteine residues in the first domain and nine in the second. This unique distribution pattern closely mirrors that observed in Demerara Falls treefrog SPF isoform 06 and persuasin^[122]24, partially explaining the deviation of Emei music frog SPFs from typical salamander SPFs in terms of isoelectric point and its phylogenetic clustering with Plectrohyla SPF and Demerara Falls treefrog SPF isoform 06. Regarding disulfide bonds, α-SPFs, positioned at the very phylogenetic base, contained the fewest disulfide bonds, while σ-SPFs contained the most. The evolutionary trajectory of SPF proteins appears to involve an increase in cysteine residues, facilitating the formation of additional disulfide bonds. The defining characteristic of TFPs is their abundance of cysteine residues, which promotes the formation of disulfide bonds within the structural domains. This increase in disulfide bonds enhances both the stability and longevity of the protein^[123]25,[124]26. Physicochemical analyses further revealed that the SPF pheromone is a water-soluble protein, and the increase in disulfide bonds plays a critical role in maintaining its structural integrity. This feature likely enables TFPs to serve as effective sex pheromones, standing out among the myriad proteins secreted by amphibian glands^[125]22,[126]27. In summary, we delineated anuran SPFs into two classes: γ-SPFs (with 10–12 and eight cysteines in the first and second domains, respectively) and σ-SPFs (with 11 and nine cysteines in the first and second domains, respectively). The discovery of these SPF types suggests a greater diversity within anuran SPFs than previously recognized. Future research on anurans could reveal more SPF variants yet to be explored. Regulatory mechanism of SPF To uncover the regulatory mechanisms of SPFs, we utilized weighted gene correlation network analysis (WGCNA) to analyze gene co-expression networks and identify genetic module clusters of highly correlated genes potentially sharing biological activities associated with SPF secretion. Analysis of gene expression correlations revealed a total of 19 modules, with the 19th module comprising genes not assigned to any module (Fig. [127]S8). We then assessed the gene expression patterns of each module based on the module eigengene (ME), which represents the first principal component of the scaled module expression profiles. Analysis revealed a significant correlation between the MEyellow module and male post-axillary gland expression (r = 0.92, P < 0.001, Fig. [128]S9), with the module containing 345 genes, including four SPF genes. Thus, we examined 156 genes located upstream of these four SPF genes and mapped the co-expression network (Fig. [129]S10). Enrichment analysis highlighted involvement in pathways related to pheromone synthesis, pheromone metabolism, steroid synthesis, and metabolism (Fig. [130]S11 and Table [131]S19), suggesting hormonal regulation of SPF secretion. GO enrichment analysis of differentially expressed genes (DEGs) in the post-axillary gland and dorsal skin during the breeding season further supported this hypothesis (Fig. [132]5A). Fig. 5. Pathways associated with SPF secretion and functional verification in Emei music frogs. [133]Fig. 5 [134]Open in a new tab A GO enrichment results of DEGs in post-axillary glands and dorsal skins of Emei music frogs during the breeding season. B Schematic of SPF secretion regulation and behavioral test results of Emei music frogs (n = 10 females). Error bars are the standard errors. C Schematic of testosterone biosynthesis, drawn from KEGG map00140: Steroid hormone biosynthesis. Arrows indicate metabolites, bubbles indicate genes, black indicates no difference, red indicates up-regulated, and blue indicates down-regulated. LH luteinizing hormone, LDL low-density lipoprotein, T testosterone. We then sequenced the post-axillary gland and dorsal skin metabolomes of a male Emei music frog during the breeding season and identified a total of 1754 metabolites (Table [135]S20). The variable importance in projection (VIP) scores derived from the OPLS-DA model serve to assess the strength and interpretive capacity of metabolite expression patterns in categorizing and differentiating sample groups, enabling the identification of biologically significant differential metabolites. Therefore, we used strict OPLS-DA VIP > 1 and P < 0.05 as screening criteria and identified a total of 263 significant differential metabolites (Table [136]S21). Firstly, we conducted KEGG pathway annotations for all metabolites that showed significant differences, thereby obtaining their KEGG Map IDs. By integrating these findings with our transcriptomic data analysis, we narrowed our focus to sterol metabolites. Investigating the map IDs linked to these metabolites, we identified the testosterone biosynthesis pathway as a significant contributor to SPF secretion (Fig. [137]5B, C). Secondary sexual characteristics in amphibians, as observed by Darwin, often manifest during the breeding season, suggesting the involvement of sex steroids in their development^[138]28. Subsequent studies have confirmed the influence of gonadal steroid hormones on seasonal variation in sexual accessory structures and secondary sexual characteristics^[139]29–[140]32. Given that SPF expression was notably high in the sexually dimorphic post-axillary glands during the breeding season, we hypothesized that sex steroids may regulate this expression, as supported by our WGCNA results. Furthermore, metabolomic analysis indicated that Emei music frog SPF secretion was predominantly controlled by the testosterone biosynthesis pathway. Testosterone and 5α-dihydrotestosterone (DHT) are the primary androgens secreted by the testes in amphibians^[141]33–[142]35, with their role in the hypothalamic-pituitary axis shown to influence pheromone secretion in caudate species^[143]36,[144]37. Studies on amphibians such as the African clawed toad (X. laevis)^[145]38, Mexican giant tree frog (Pachymedusa dacnicolor)^[146]39, and rough-skinned newt (Taricha granulosa)^[147]40 have linked reproductive behaviors to elevated plasma androgen levels. Research on the effects of castration and testosterone treatment on the dorsal skin glands of male African clawed toads and peeping frogs (Rana pipiens) found that castration increased specialized mucous glands while almost eliminating breeding glands, while testosterone treatment significantly enhanced the presence of breeding glands and reduced serous mucous glands^[148]41. In our study, we observed a significant increase in both testosterone and DHT levels in the differential metabolites of the post-axillary gland compared to the dorsal skin during the breeding season. Additionally, we found high expression of genes encoding enzymes involved in their synthesis based on transcriptome analysis, providing evidence for the regulation of SPF secretion by sex steroids. Thus, we propose a regulatory mechanism for SPF in which the hypothalamic-pituitary axis releases luteinizing hormone (LH), triggering gonadal testosterone synthesis. This, in turn, promotes the functional activation of reproductive glands, such as the post-axillary gland, ultimately driving the secretion and release of SPFs. Functional verification of SPFs To determine the functions of SPFs, we conducted protein recombination to express the six SPF sequences identified in the Emei music frog, ultimately isolating a protein encoded by the gene Nda11G011470 for use in behavioral experiments (Fig. [149]S12). In the first experiment, female frogs showed a preference for the tank area with the post-axillary gland stimulus (421.20 ± 76.59 s) over the area with the dorsal homogenate (178.80 ± 76.59 s, Z = 2.23, P < 0.05). In the second experiment, female frogs preferred the tank end with recombinant SPF (rSPF) (435.80 ± 71.09 s) over the dorsal homogenate end (164.20 ± 71.09 s, Z = 2.70, P < 0.05). In the third experiment, females showed no preference when choosing between the post-axillary gland homogenate (311.1 ± 90.40 s) and rSPF (288.9 ± 90.40 s, Z = 0.36, P > 0.05, Fig. [150]5B and Table [151]S22). The functional role of the SPF pheromone system in caudate species has been explored in previous studies, resulting in the discovery of the attractive effects of sodefrin on female red-bellied salamanders^[152]42. Although the functional role of SPF pheromones in anuran species remains poorly understood, they are presumed to be related to reproductive behavior based on their reproductive gland secretion^[153]7,[154]43. Our behavioral tests are the first to confirm that post-axillary gland-derived homogenates from the male Emei music frog attract females during the breeding season, corroborating findings in the Yunnan pond frog^[155]11, with the same effect observed with rSPF. When both the male post-axillary gland and rSPF homogenates were presented simultaneously, females exhibited no preference between the two. These findings are consistent with the known role of SPFs in caudate species, implying involvement in female attraction during anuran reproduction. Despite differences in the location and origin of reproductive glands between caudate and anuran species, the secretion and function of SPFs appear conserved, suggesting functional convergent evolution of SPFs across amphibians. Pheromones were previously thought to only be used for communication in anurans lacking acoustic sacs^[156]44–[157]46. While studies have shown that Emei music frog vocalizations are used to attract females and reflect nest quality^[158]12, chemical communication signals have often been neglected. In this study, we revealed that SPF pheromones also play a key role in anuran courtship, likely complementing acoustic signals, with males calling to attract nearby females and females using chemical signals to locate males. Furthermore, our analyses indicated that SPF secretion may be regulated by testosterone. However, given the diversity of pheromone subtypes, their potentially varied functions remain a subject for future research. In this study, we generated a high-quality chromosome-level genome assembly of the Emei music frog (N. daunchina). Comparative genomic analysis with other anurans revealed expanded GFs in Emei music frog involved in protein synthesis, including four GFs associated with pheromones. We detected the presence of the 2D-TFP gene family within the genomes of the Emei music frog and other anurans and found that most 2D-TFP genes were duplicated on a single chromosome, suggesting functional enhancement through extensive replication. Transcriptomic analyses indicated that only six 2D-TFP genes were highly expressed in the post-axillary glands of the Emei music frog during the breeding season. These proteins encoded by these six genes, together with the SPF pheromone of the Plectrohyla species and an SPF subtype of Demerara Falls treefrog, constituted a new SPF class. Furthermore, the transcriptomic and metabolomic analysis highlighted the involvement of the testosterone synthesis pathway in SPF expression. The role of SPFs in Emei music frog reproduction was verified by protein recombination expression and behavioral experiments. Overall, these findings significantly contribute to our understanding of the SPF pheromone system in amphibians, presenting the first network associated with SPF expression and verifying the function of SPF in anurans, thus laying a foundation for future research on multimodal communication in amphibians. Materials and methods Samples used for different sequencing methods and behavioral experiments The Emei music frog samples (Nidirana daunchina) for this research were obtained from the Emei Mountain region in Leshan County, Sichuan Province, China. Genomic sequencing was performed using muscle tissue from a single adult male frog. For transcriptomic sequencing, muscle tissue along with eight other tissue types (dorsal skin, post-axillary gland, nuptial pad, brain, liver, heart, kidney, and lung) was collected from the same individual. The muscle tissue was also used for Hi-C library preparation. For comparative transcriptomic analysis, tissue samples from the dorsal skin, post-axillary gland, brain, and gonads were taken from both male and female frogs during two different breeding seasons (see Table [159]S12). A total of 48 samples were used, with three biological replicates per sample. In the behavioral experiments, we captured three male and 30 female Emei music frogs between July 23 and 25, 2023. Three ethological trials were conducted sequentially, and the frogs were housed in separate plastic containers (15 × 10 × 6 cm) lined with moist tissue paper the night before the trials. After completing their respective trials, the frogs were released back into their natural habitat, except for those involved in stimulus provision. All frogs were sexually mature, with males identified by their advertisement calls, and females by a snout-vent length greater than 45 mm and visibly swollen abdomens indicating gravidity. All the procedures applied for this study were approved by the Institutional Ethics Committee of the Animal Ethical and Welfare Committee of Chengdu Institute of Biology, Chinese Academy of Sciences (permit: CIBDWLL2021014), and all the methods were carried out in accordance with the Code of Practice for the Care and Handling of animal guidelines. This study is reported in compliance with the ARRIVE guidelines. We have complied with all relevant ethical regulations for animal use. Genome sequencing, assembly, and annotation Genomic DNA was extracted from muscle using sodium dodecyl sulfate (SDS). For Illumina sequencing, a short-read (350 bp) library was constructed and sequenced on the Illumina NovaSeq 6000 (Illumina, San Diego, CA, USA). We sequenced five libraries in total. After the removal of sequencing adapters, contaminant reads (mitochondrial, bacterial, and viral sequences), and low-quality reads, we finally obtained 312.15 Gb of clean reads. Genomic DNA was sheared into ~15 kb fragments by Megaruptor®°2. The SMRTbell library was constructed using the SMRTbell Express Template Prep kit 2.0 (Pacific Biosciences), and then the library was sequenced on a PacBio Sequel II platform (PacBio, CA, USA). After filtering out the low-quality reads and sequence adapters, we obtained 201.36 Gb of clean subreads (31.5× coverage) with an N50 value of 14.26k. Hi-C libraries were created from the muscle of the same adult male used for genome sequencing. According to the protocol, nuclear DNA from muscle was cross-linked and enzymatically digested with Hind III, leaving pairs of distally located but physically interacting DNA molecules attached to each other. The sticky ends of the digested fragments were biotinylated and ligated to each other to form chimeric circles. Biotinylated circles, which are chimeras of physically associated DNA molecules from the original cross-linking, were enriched, sheared, and sequenced with the Illumina NovaSeq 6000. A total of 226.5 million clean Hi-C reads pairs (676.21 Gb, 105.7× coverage) were obtained. All whole-genome sequencing data used in this study are listed in Tables [160]S23–[161]S25. For genome assembly, we utilized KMC v.3.0.0^[162]47 to calculate the frequency distribution of k-mers (k = 21 bp) of HiFi sequencing data and employed GCE v.1.0.3^[163]48 to estimate the genome size and heterozygosity. For genome assembly of N. daunchina, we utilized Hifiasm v.0.19.7^[164]49 with optimized parameters “-k 61 -D 10 -N 200”. Subsequently, we employed purge_haplogs^[165]50 to eliminate redundant heterozygous regions in the N. daunchina primary assembly. The resulting de-redundant genomic contigs were then assembled into chromosomes. To achieve the final chromosome-scale assembly, we utilized Hi-C data. Initially, the Hi-C reads were mapped onto the polished assembly, and valid reads approved by HiC-Pro v.2.2^[166]51 were collected. These valid reads were then mapped to the assembly again using Juicer v.1.5^[167]52, and the alignments were used to scaffold contigs with the 3D-DNA v.180922 pipeline^[168]53. Any assembly errors that occurred during the Hi-C scaffolding process were automatically corrected using Juicebox v2.20.00^[169]54. The final chromosome-level genome assembly was evaluated by employing the gene sets of BUSCO v.3.1.0^[170]55, with vertebrate genome mode and related lineage data. Repeat sequences mainly include tandem repeats and interspersed repeats, with the latter primarily being transposable elements (TEs), which are our main focus of study. Due to the relatively low conservation of repeat sequences among different species, it is necessary to construct a specific repeat sequence database when predicting repeat sequences for a particular species. We first employ RepeatModeler2 v2.0.1^[171]56 for de novo prediction, primarily invoking two de novo prediction software tools: RECON v1.0.8^[172]57 and RepeatScout v1.0.6^[173]58. We then use RepeatClassifier, with the aid of the known Dfam database v3.5^[174]59, to categorize the prediction results. Subsequently, we use LTR_retriever v2.9.0^[175]60 specifically for de novo prediction of LTRs, mainly relying on the predictive results from LTRharvest v1.5.10^[176]61 and LTR_FINDER v1.07^[177]62. Following this, we merge the de novo prediction results with the known database and eliminate redundancies to create a species-specific repeat sequence database. Lastly, we employ RepeatMasker v4.1.2^[178]63 to predict the transposable element (TE) sequences in the genome based on the constructed repeat sequence database. The LTR family classification criterion was defined by the 5′LTR sequences of the same family that shared at least 80% identity over at least 80% of their lengths. Tandem repeats are mainly predicted by MIcroSAtellite identification tool MISA v2.1^[179]64 and TandemRepeatFinder v 409(TRF, parameters: 2 7 7 80 10 50 500-d-h)^[180]65. We integrated three approaches, namely, de novo prediction, homology search, and transcript-based assembly, to annotate protein-coding genes in the genome. The de novo gene models were predicted using two ab initio gene-prediction software tools, Augustus v3.1.0^[181]66 and SNAP (2006-07-28)^[182]67. For the homolog-based approach, GeMoMa v1.7^[183]68 software was performed by using reference gene models from the other seven anuran species (Xenopus tropicalis, Leptobrachium leishanense, Rhacophorus kio, Rana temporaria, Engystomops pustulosus, Bufo gargarizans, B. bufo). For the transcript-based prediction, RNA-sequencing data were mapped to the reference genome using Hisat v2.1.0^[184]69 and assembled by Stringtie v2.1.4^[185]70. GeneMarkS-T v5.1^[186]71 was used to predict genes based on the assembled transcripts. Another is to obtain transcripts by Trinity v2.11^[187]72 assembly and then use PASA v2.4.1^[188]73 for gene prediction. Gene models from these different approaches were combined using the EVM software v1.1.1^[189]74 and updated by PASA v2.4.1^[190]73. In total, 27,860 protein-coding genes with an average length of 87,910.65 bp were predicted in the assembled N. daunchina genome. Gene functions were inferred according to the best match of the alignments to the National Center for Biotechnology Information (NCBI) Non-Redundant (NR), EggNOG^[191]75, KOG, TrEMBL^[192]76, InterPro^[193]77 and Swiss-Prot^[194]76 protein databases using diamond blastp (diamond v2.0.4.142) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database^[195]78 with an e-value threshold of 1E-5. The protein domains were annotated Using InterProScan (v5.34-73.0)^[196]79 based on InterPro protein databases. The motifs and domains within gene models were identified by PFAM databases^[197]80. Gene Ontology (GO) IDs for each gene were obtained from TrEMBL, InterPro, and EggNOG. In total, approximately 27,338 (about 98.13%) of the predicted protein-coding genes of N. daunchina could be functionally annotated with known genes, conserved domains, and Gene Ontology terms. Comparative genomic analyses Orthologous gene groups were constructed among 10 vertebrate species, including the green anole, human, tropical clawed frog, Emei music frog, common toad, Chinese toad, túngara frog, common toad, and Kio flying frog (Table [198]S26). Orthologous relationships were identified using Orthofinder v2.4^[199]81 with the diamond-based method, applying an E-value threshold of 0.001. A total of 3601 single-copy genes were extracted from these species and aligned individually. These alignments were combined to generate a supergene matrix used to infer the phylogenetic tree via IQ-TREE v1.6.11^[200]82, incorporating 1000 rapid bootstrap replicates and a maximum-likelihood (ML) tree search in a single run. Then, we used the MCMCTree in PAML v4.9i^[201]83 to estimate the divergence time under a relaxed clock model, with several calibration points referenced (Table [202]S27) to date the divergence time in the unit of Ma. Gene family (GF) analysis was conducted to assess the expansion and contraction of gene families across species. CAFE v4.2^[203]84 was employed to predict GFs expansion and contraction along each branch of the tree, using the birth mortality model. The criterion for defining whether there is a significant expansion or contraction is that both family-wide P-values and Viterbi P-values are less than 0.05. Expanded GFs were then subjected to functional enrichment analysis using Gene Ontology (GO) databases. 2D-TFP gene family identification and phylogenetic analysis To identify the 2D-TFP encoding genes in anurans, we carried out two different approaches. On the one hand, we used the HMMER v3.3.2^[204]85 to search for proteins that conserved the uPAR_Ly6 domain (PF00021) in their sequence, which was previously downloaded from the Pfam database^[205]81. On the other hand, the amino acid sequences from the 2D-TFP described in the vertebrate were downloaded from the NCBI database. These sequences were used as queries to search for 2D-TFPs in the anurans genome using the BLASTP v2.2.30. Finally, redundant proteins, proteins without a two “three-finger” domain, and proteins whose length was out of the range of 160~280 amino acids were discarded. The chromosomal locations of 2D-TFP genes were obtained from general feature format files (GFF3). MapChart v2.32^[206]86 was used to visualize the distribution of these 2D-TFP genes in the genomes. The identified anuran 2D-TFP sequences, along with the vertebrate 2D-TFP sequences collected from the NCBI database (289 in total, see Table [207]S11 for accession numbers), were aligned using Mega7.0^[208]87 for a multiple sequence alignment. The results of the multiple sequence alignment were used to calculate the best evolutionary model in Modelfinder within Phylosuite v1.2.2^[209]88, with “Criterion” set to AICc and “Model for” chosen as IQ tree. The construction of the maximum-likelihood phylogenetic tree was carried out using IQ-TREE within Phylosuite v1.2.2 software^[210]88, selecting the best model calculated by Modelfinder, with “Bootstrap” set to Ultrafast and “Num of bootstrap” to 5000, while also checking “Approximate Bayes test”. The three-dimensional protein structures of 2D-TFPs were identified using AlphaFold^[211]89,[212]90 and visualized in PyMOL^[213]91. Comparative transcriptome sequencing and analysis Samples from the DS, PG, brain, and gonadal tissues were collected from males and females during two distinct breeding seasons, with three replicates for each sample, resulting in a total of 48 samples. Subsequently, the collected samples were sent to Beijing Biomarker Technologies Co., LTD. (Beijing, China) for further library construction and sequencing. The specific procedures can be referred to Li et al., 2019^[214]92[.] The transcript expression levels were quantified as transcripts per million (TPM). The differentially expressed genes (DEGs) were identified in the DESeq2 package^[215]93 by |log2(fold change)| >1 and BH-corrected p < 0.01. We made comparisons between males and females, as well as between different breeding seasons. The potential functions of the DEGs were examined by GO enrichment and elimination of redundancy by ReviGo v1.8.1^[216]94. We used WGCNA v1.63^[217]95 to cluster genes with similar expression patterns across 24 male samples. Initially, a similarity matrix was constructed using TPM values across the samples, which was then transformed into an adjacency matrix by selecting the appropriate weighting coefficient, β. Subsequently, we calculated the topological overlap matrix (TOM) based on this adjacency matrix. The topological overlap measures (1 − TOM) were utilized to perform hierarchical clustering and generate a hierarchical clustering tree. The dynamic tree-cutting method was utilized to identify co-expressed modules with a minimum gene number set at 30 for each module. Then, we computed the module eigengene (ME) for each module and determined Pearson correlations between ME values and sampled traits. To assess statistical significance, Fisher’s asymptotic p-value was calculated using the corPvalueFisher module based on these correlations. A p-value < 0.05 indicated significant associations between modules and traits. Selection pressure analysis Based on the coding sequence (cds) of each SPF pheromone protein (removed the termination codons (TGA, TAA, or TAG)), the average non-synonymous substitution rate (d[N]) and synonymous substitution rate (d[S]) was calculated using the modified Nei and Gojobori method with the Jukes–Cantor correction in MEGA v7.0^[218]87. The standard error (se) was calculated using 500 bootstrap replicates. d[N]/d[S] (ω value) was calculated using the Codon-based Z-test of selection, also with 500 bootstrap replicates^[219]96. Metabolome sequencing and analysis Tissues from the PG and DS of male N. daunchina during the breeding season were immediately snap-frozen in liquid nitrogen following dissection. Subsequently, the samples were sent to Annoroad Gene Technology Co., Ltd. (Beijing, China) for metabolomics sequencing. The detailed procedures can be referred to Tu et al. ^[220]97. For pathway annotation, metabolites were compared against the KEGG database to determine their associated pathways. Enrichment analysis was performed using Fisher’s exact test, considering all metabolites in each pathway as background, and only pathways with p < 0.05 were considered significantly altered. Hierarchical clustering was performed using Cluster 3.0 and Java Treeview^[221]98, with Euclidean distance as the similarity measure and average linkage clustering for grouping. Recombinant protein preparation and behavioral experiments Recombinant SPF (termed rSPF) was expressed as a precursor protein with a cleavable N-terminal 6xHis affinity tag. We chose 2D-TFP sequences that are highly expressed in the post-axillary gland transcriptome of an adult male during the breeding season for expression. The target sequence was obtained by gene synthesis and the synthesized nucleotides were cloned into the mammalian expression vector pcDNA3.4 (Invitrogen, Carlsbad, CA, USA) by use of the Hieff Clone Plus One Step Cloning Kit (YEASEN) according to specification and were optimized for the expression in Expi293. Then, the plasmids pcDNA3.4 were transiently transfected into Expi293 cells (Thermo Fisher, Waltham, MA, USA). The cellular density was adjusted to 3 × 106 cells/mL. Next, the transfection reagent and plasmids were mixed and incubated at room temperature for 20 min. After this time, the mixture was added to the cell culture. After 24 h of incubation, 300 µL of transfection enhancer-1 and 3 mL of transfection enhancer-2 were added and then incubated under shaking conditions. Finally, the supernatants were analyzed after 3 and 5 days post-induction, and the cultures were centrifuged at 4000 × g for 20 min. The supernatant was collected in a Falcon tube and stored at 4 °C until the protein purification process, with 1× protease inhibitors added (SIGMA Cat# S8830-20) to prevent degradation. To perform animal preference tests, we prepared the chemical stimuli. As for chemotaxis detections, the frogs were anesthetized using tricaine methanesulfonate (MS-222, 3 g/ L). Then two bilateral post-axillary glands (weight about 0.07 g) or a dorsal skin patch of the same weight were excised from a euthanized male N. daunchina and wiped away blood gently with wet tissue paper, then suspended in 5 ml cold (4 °C) distilled water and homogenized using a glass tissue homogenizer. Cold distilled water was added to the homogenate to make the final volume 10 ml. After 30 s shaking, the homogenate was divided into 10 aliquots (1 ml each), which were stored at 4 °C and used on the night of the preparation day. Each aliquot contained the equivalent of one-fifth of one post-axillary gland or a tenth of a removed dorsal skin patch. When used as a chemical stimulus, one aliquot was injected into a sponge block (1 cm^2), and then the sponge block was attached to one end of the experimental tank (below water level). Sponge blocks injected with rSPF were also placed in the tank as optional stimuli for tested frogs. There were three sets of behavioral experiments, each composed of 10 trials, were conducted to evaluate the responses of adult females to rSPF. The objectives of the three trials were as follows: (1) Determine whether the post-axillary gland secretes sexual attractants, with females choosing between post-axillary gland and dorsal skin homogenates; (2) Assess whether rSPF functions to attract females, with females selecting between rSPF and dorsal skin homogenates; and (3) Verify whether rSPF replicates the natural odorant, with females selecting between post-axillary gland and rSPF homogenates. All trials were conducted between 20:00 and 02:00 in July 2023, during the active period of the frogs. The trials were performed in a self-made experimental tank: a 1 m polyvinyl chloride rain gutter with a rectangular cross-section (18 cm wide) was sealed at both ends using matched plastic lids (Fig. [222]S13). Markers were painted on the edge of the tank with a black marker pen to subdivide it into 3 areas: from the center point to extend 10 cm bidirectionally was assigned as the waiting zone, whereas the two areas (40 × 18 cm^2) at both ends were referred as the selected zones. A network surveillance camera with an infrared module (PE2200, Panbao Technology, Shenzhen, China) was set up above the tank and manipulated by the software IPClient to record animal behavior. The general procedure of a single trial was described here: Aged tap water was added to the experimental tank to bring the depth to 6 cm, and a tested frog was then placed at the center of the waiting zone with a wire-mesh box covered above to restrain its movement. In the meantime, two different stimuli were introduced to two different ends of the tank based on random side assignments and video recording started. After 10 min acclimation, the wire-mesh box was removed, thus the tested animal was allowed to move freely in the tank. Once the tested frog moved out of the waiting zone, 10 min timing was initiated, after which the frog was collected out from the tank, so as the stimuli, the camera was switched off, then the tank was cleaned thoroughly with detergent and rinsed by aged tape water for several times Recorded behavioral videos were analyzed using the embedded player of IPClient. Timing commenced when the tested frog left the waiting zone, and the total time spent on each stimulus side was calculated, with the midline of the experimental tank serving as the reference for side switching. Two sets of duration data from each preference trial were compared. The normality of the distribution was first assessed using the Shapiro-Wilk test. If the data followed a normal distribution, Levene’s test was used to check for homogeneity of variance. For equal variance, a pooled t-test was used, otherwise, the t’-test was applied. If the data did not meet normality, the Wilcoxon rank-sum test was employed. Statistics and reproducibility The detailed methods of statistical analysis used in this study are described in the corresponding experimental section and [223]Supplementary Information. The differential expression of the genes between different samples was detected using default algorithms implemented in DESeq2. Furthermore, the module associated with SPF pheromone secretion was identified using default algorithms implemented in WGCNA. Transcriptome sequencing was conducted with three independent biological sample replicates, excluding auxiliary annotation. Metabolite identification involved six independent biological samples in duplicate. For the behavioral experiments, ten repetitions were performed for each set of trials, and the stimuli placement was randomized for each trial. Reporting summary Further information on research design is available in the [224]Nature Portfolio Reporting Summary linked to this article. Supplementary information [225]Supplementary Information^ (1.6MB, pdf) [226]42003_2024_7388_MOESM2_ESM.pdf^ (30.7KB, pdf) Description of Additional Supplementary Files [227]Supplementary Data 1^ (5.4MB, xlsx) [228]Supplementary Data 2^ (767.5KB, xlsx) [229]Reporting Summary^ (2.2MB, pdf) [230]Transparent Peer Review file^ (632.6KB, pdf) Acknowledgements