Abstract Whole genome duplication (WGD) is a powerful evolutionary mechanism in plants. Autopolyploids have been comparatively less studied than allopolyploids, with sexual autopolyploidization receiving even less attention. In this work, we studied the transcriptomes of neotetraploids (2n = 4x = 32) obtained by crossing two diploid (2n = 2x = 16) plants of Medicago sativa that produce a significant percentage of either 2n eggs or pollen. Diploid progeny from the same cross allowed us to separate the transcriptional outcomes of hybridization from those of WGD. This material can help to elucidate events at the base of the domestication of cultivated 4x alfalfa, the world’s most important leguminous forage. Three 2x and three 4x progeny plants and 2x parental plants were used for this study. The RNA-seq data revealed that WGD did not dramatically affect the transcription of leaf protein-coding genes. The two parental genotypes did not contribute equally to the progeny transcriptomes, and genome-wide expression level dominance of the male parent was observed. A large majority of the genes whose expression level changed due to WGD presented increased expression, indicating that the 4x state requires the upregulation of approximately 2.66% of the protein-coding genes. Overall, we estimated that 3.63% of the protein-coding genes were transcriptionally affected by WGD and may contribute to the phenotypic novelty of the neotetraploid plants. Pathway analysis suggested that WGD could affect secondary metabolite biosynthesis, which in turn may influence forage quality. We found four times as many transcription factor genes among the polyploidization-affected genes than among those affected only by hybridization. Several of these belong to classes involved in stress response. Small RNA-seq revealed that very few miRNAs were significantly associated with WGD, but they target several hundred genes, and their role in the WGD response may be relevant. Integrated network analysis led to the identification of putative miRNA: mRNA interactions potentially involved in transcriptome reprogramming. Allele-specific expression analysis indicated that parent-of-origin bias was not a significant outcome of WGD, but we found that parentally biased RNA editing may be a significant source of variation in neopolyploids. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-025-06090-z. Keywords: Alfalfa, Expression level dominance, Gene coexpression network, RNA-Seq, Sexual polyploidization, Small RNA, Whole genome duplication Introduction Polyploidy is the presence of more than two complete sets of chromosomes within a nucleus. Polyploids are generally classified into two main categories: autopolyploids, arising from intraspecific whole genome duplication (WGD), and allopolyploids, deriving from the merging of the genomes of two (or more) species. More realistically, merging genomes creates a continuum based on how closely related the original genotypes are, from closely related genotypes creating polyploids containing sets of completely homologous chromosomes to polyploids containing sets of highly divergent homoeologous chromosomes. Alterations in meiosis are considered the main path to polyploidy in natural populations through a process known as sexual polyploidization, which occurs when one or both gametes involved in fertilization are not reduced [[40]1]. In particular, bilateral sexual polyploidization results from the fusion of a 2n egg cell and a 2n sperm nucleus, resulting in 4x offspring, which may initiate the establishment of a neopolyploid lineage [[41]2]. In plants, polyploidy has long been recognized as a major force for evolution and speciation as well as a driver of genomic novelty and plasticity [[42]3–[43]7]. Barker et al. [[44]8] estimated that among 4003 plant species of 47 genera, 11% are allopolyploid, and 13% are autopolyploid; moreover, most, if not all, extant diploid plant species are paleopolyploid, that is, they derive from the diploidization of ancient polyploids [[45]9]. WGD probably favoured the domestication of agricultural plants [[46]10]. About 50% of economically important crop species are polyploid, including alfalfa, banana, cotton, coffee, oilseed rape, potato, sweet potato, sugarcane, tobacco, and wheat [[47]11, [48]12] emphasizing the importance of polyploid research for agriculture. In crop plants, the improvement of economically important traits by artificial polyploidy has long been investigated. There are examples of agricultural success of induced autopolyploids (banana, clover, ryegrass, sugar beet) and allopolyploids (festulolium, triticale). In allopolyploids, combining different homoeoalleles from parental species can lead to new allelic interactions contributing to intergenomic heterozygosity and heterosis, which in turn can improve crop plasticity under environmental fluctuations and stressful conditions [[49]13–[50]15]. In autopolyploids, the possible presence of more than two alleles per locus can allow for higher-order gene interactions [[51]16]. In 4x alfalfa, the positive effects of multiple alleles on biomass yield have been clearly demonstrated [[52]17]. These allelic interactions are considered the basis for “progressive heterosis”, that is, higher vigour of double crosses (AxB) × (CxD) with respect to single crosses AxB or CxD, as demonstrated in potato, alfalfa and maize [[53]18–[54]20]. Consistent with this hypothesis, autopolyploids from somatic WGD (sometimes termed colchiploids because they are most frequently obtained by colchicine treatments) may not be more vigorous in terms of biomass and are sometimes less vigorous than the original diploids [[55]21–[56]23]. For example, in Solanum commersonii and S. bulbocastanum, synthetic autotetraploids showed significant gene expression alterations and morpho-anatomical changes, but these changes did not consistently result in phenotypic superiority over diploids, suggesting variable effects of polyploidy depending on the species and genotype [[57]24]. Recently, a thorough phenotypic characterization of the effects of somatic WGD on Arabidopsis plants at 2x, 4x, 6x, and 8x ploidies showed that, at the 4x level, the growth rate decreased, cell size increased, stem biomass at flowering increased and the cell wall composition was modified [[58]25]. Autotetraploids from sexual polyploidization likely contain more than two alleles per locus if they originate from unrelated diploid parents; thus, they can functionally be considered an intermediate condition between somatic autopolyploids and allopolyploids. Compared with their diploid parents, these tetraploids can display greater biomass and fertility [[59]26, [60]27]. Polyploidization can cause rapid genomic structural changes, such as chromosomal rearrangements, gene loss, and genome fractionation, sometimes resulting in greater loss of genes from one of the subgenomes (subgenome dominance) [[61]28–[62]30]. WGD can also alter the associations among chromosomes in the nucleus [[63]31, [64]32]. WGD has been demonstrated to affect transcription, resulting in deviations from the parental mean (nonadditive expression), expression level dominance, transgressive expression, and allele-specific expression (ASE) bias [[65]30, [66]33–[67]35]. ASE is defined as the differential expression of two alleles of a gene and is commonly observed in both plants and animals [[68]35, [69]36]. It can arise through several different mechanisms: gene loss and silencing, activation of transposable elements, epigenetic modifications, and interactions between the divergent regulatory networks of the parents [[70]35–[71]38]. Among the transcriptional outcomes of WGD, the impact of WGD on the expression of small RNAs (smRNAs), including nonadditive expression, has been reported [[72]39–[73]43]. When studying the effects of polyploidization, an important issue is the need to differentiate the impact of hybridization from those of WGD [[74]30, [75]44, [76]45]. Hybridization, both intraspecific and interspecific, has generally been demonstrated to influence gene expression more than WGD alone [[77]22, [78]23, [79]44, [80]46–[81]48]. In fact, interspecific hybridization combines two genomes with suites of partially diverged cis- and trans-regulatory elements [[82]35, [83]49–[84]52], that can interact and alter the transcriptional balance between parental genomes. In the case of autopolyploids from sexual polyploidization, separating the impacts of hybridization and WGD is extremely relevant. Compared with allopolyploidization, autopolyploidization is expected to cause fewer alterations in gene expression [[85]53]. Limited gene expression imbalance was observed in some autopolyploids relative to their diploid parents [[86]27, [87]46, [88]54–[89]56]. Alfalfa (Medicago sativa L., 2n = 4x = 32) is an important leguminous forage crop worldwide. The origin of cultivated 4x alfalfa can be found in sexual polyploidization events within and between the two wild diploid subspecies, M. sativa ssp. caerulea and ssp. falcata, which gives rise to cultivated M. sativa ssp. x varia [[90]26, [91]57–[92]59]. In previous work by our group, a M. sativa ssp. falcata plant (PG-F9, 2n = 2x = 16) and a ssp. falcata x caerulea hybrid (12P, 2n = 2x = 16) producing significant percentages of 2n eggs and pollen, respectively, were selected and characterized [[93]27, [94]60, [95]61]. Three 2x and three 4x plants (2n = 4x = 32) were randomly selected from among the progeny of the PG-F9 × 12P cross. These six plants are full sibs at different ploidy levels and are well suited for separating the effects of hybridization from those of polyploidization at the phenotypic and molecular levels. Compared with their diploid full siblings, the three neotetraploids presented increased biomass, earlier flowering, higher seed set in crosses with unrelated plants, higher seed weight, and larger leaves with larger cells. Microarray analyses revealed that the expression levels of several hundred genes related to diverse metabolic functions changed in response to polyploidization alone [[96]27]. In this work, we further investigated the transcriptomes of these plant materials via RNA-seq and smRNA-seq. We conducted allele-specific expression analyses to gain further insight into the short-term transcriptional consequences of sexual tetraploidization in M. sativa. We show quantitatively limited but qualitatively relevant nonadditive mRNA expression and almost no significant allele-specific expression bias that can be ascribed specifically to WGD. Small RNA expression changes were also limited but potentially relevant. We conclude that the gene expression shifts induced by sexual polyploidization involving divergent alfalfa genotypes are not large but can impact several pathways involved in adaptation and phenotypic traits. Materials and methods Plant material and experimental conditions Eight genotypes were used in this work [[97]27]: the two 2x parental plants PG-F9 and 12P described above, three 2x progeny plants (S8, S16, S24) and three 4x progeny plants (S29, S48, S60). They were maintained by yearly cloning through stem cuttings. Three plants per genotype (biological replicates) were grown in pots filled with field soil, sand and peat moss 2:1:1 and randomized on benches within a bee-proof cage under natural conditions at the departmental experimental station (S. Andrea d’Agliano, Perugia, Italy). All the plants were clipped at the beginning of flowering. Three-week-old shoots (vegetative growth stage) of the regrowth were sampled by taking the first three fully expanded leaves from several shoots of each of the three cloned plants per genotype. The leaves were then immediately flash-frozen in liquid nitrogen and stored at -80 °C until DNA or RNA extraction. Genomic DNA was extracted using the GenElute Plant Kit (Sigma-Aldrich, St. Louis, MO, USA). RNA isolation was performed using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Acid nucleic purity and concentration were first assayed using the NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). For sequencing, RNA quantification was performed using Bioanalyzer (Agilent) and equal amounts of total RNA per sample were used. RNA-seq RNA-seq (Novaseq6000 2 × 150 bp, ~ 18 million fragments per sample) was performed by Sequentia Biotech (Barcelona, Spain). The raw fastq files were cleaned and trimmed with Cutadapt 4.6 [[98]62], using default parameters for Illumina sequencing. FastQC 0.11.9 was applied to check for quality of both the raw and the clean sequences. To reduce bias for multiple copies of the same exact gene in polyploid genomes, cleaned reads were aligned with STAR 2.7.10b [[99]63] to a nonredundant (collapsed) version of the alfalfa genome sequence of Chen et al. [[100]64], produced by Sequentia as described below. Bam files from the alignment were managed with samtools 1.19 [[101]65] and a count matrix, accounting only for uniquely mapped reads, produced with FeatureCounts, from the subread package [[102]66] with the help of the annotation previously produced. This matrix was then imported into R (R Code Team 2024) [[103]67] and processed with DESeq2 [[104]68] with an in-house built script. The raw matrix was first purged from low-expressed genes, and DESeq2 best practices were applied to retrieve differentially expressed genes. The hierarchical clustering map for differential expression genes is shown in Supplementary Fig. [105]S1A. Genes were considered differentially expressed between genotypes or ploidy levels when their log[2] fold change (FC) was ± 1.0 with Padj < 0.05; P values were adjusted using the Benjamini and Hochberg’s approach for controlling the false discovery rate [[106]69]. The iTAK tool (version 1.6) was used for transcription factor prediction with default parameters ([107]http://itak.feilab.net/cgi-bin/itak/index.cgi) [[108]70]. Pathway enrichment analysis of nonadditively expressed or differentially expressed genes was conducted using the ShinyGO V0.80 online tool [[109]71], with M. truncatula used as the reference gene set. To achieve comprehensive gene functional annotation, M. sativa NCBI non-redundant protein IDs were converted into M. truncatula orthologs using gProfiler ([110]https://biit.cs.ut.ee/gprofiler/convert). KOBAS (v 3.0 version, adjusted P < 0.05) was used to test the statistical enrichment of differentially expressed genes (DEGs) in KEGG pathways [[111]72]. Small RNA (smRNA) sequencing Small RNA libraries were generated and sequenced by Novogene Co., Ltd. Twenty-four sequencing libraries were constructed (eight samples × three biological replicates) using the smRNA Library Prep Kit (Illumina, San Diego, CA, USA) following the manufacturer’s recommendations. Briefly, 3’ and 5’ adaptors were ligated to the 3’ and 5’ ends of smRNA. The first strand of cDNA was subsequently synthesized after hybridization with a reverse transcription primer. The double-stranded cDNA libraries were generated through PCR enrichment and checked with a Qubit fluorometer (Thermo Fischer Scientific, MA, USA) and real-time PCR for quantification and a sensitive Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) for size distribution detection. The libraries with inserts between 18 and 40 bp were pooled and sequenced on the Illumina NovaSeq 6000 platform to generate 50 bp single-end reads. Raw fastq format reads were first processed through custom Perl and Python scripts to remove low-quality reads, adapter sequences or the insert tag, which contained poly A, T, G, or C. At the same time, the Q20, Q30, and GC contents of the raw data were calculated. All downstream analyses were based on clean, high-quality data. The smRNA tags were mapped to the M. sativa reference genome [[112]64] by Bowtie (bowtie-0.12.9 version) [[113]73] without mismatches. Mapped smRNA tags were used to search for known or conserved miRNAs against miRbase (version 20.0). miRDeep2 (mirdeep2_0_0_5 version) [[114]74] and srna-tools-cli ([115]https://github.com/sRNAworkbenchuea/UEA_sRNA_Workbench) were used to identify candidate miRNAs and draw their secondary structures. Then, smRNA tags were mapped to the RepeatMasker (version 4.0.3) [[116]75] and Rfam [[117]76] databases to exclude reads originating from protein-coding genes, repetitive sequences, rRNAs, tRNAs, snRNAs and snoRNAs. The resulting reads were used for novel miRNAs prediction based on the hairpin structure of miRNA precursors using miREvo (miREvo_v1.1 version) [[118]77] and miRDeep2 (mirdeep2_0_0_5 version) [[119]74]. The miRNA target genes were predicted using psRobot (psRobot_v1.2 version) [[120]78]. Small RNA differential expression, gene ontology and KEGG enrichment The expression levels of known and unique miRNAs estimated via transcripts per million were statistically analysed and normalized [[121]79]. The read counts of each gene were used as input data for DESeq2 (1.12.0) to obtain differentially expressed miRNAs (DEMs) [[122]68]. The resulting P values were adjusted using the Benjamini and Hochberg’s approach to control the false discovery rate [[123]69]. The significant genes (Padj ≤ 0.05) with corresponding log[2] fold change values of ± 1.0 were considered differentially expressed. A correlation analysis was performed to demonstrate experimental repeatability and reveal differences in gene expression among samples (Supplementary Fig. [124]S1B). Hierarchical clustering was performed using the read counts of each sample as input data. The GO enrichment analysis of the DEMs’ target genes was implemented using the ShinyGO online tool (8.08 version, Padj ≤ 0.05) [[125]71] with M. truncatula as the reference species. Gene functions were determined using the KEGG pathway database. KOBAS (3.0 version, Padj ≤ 0.05) was used to test the statistical enrichment of the target genes in KEGG pathways [[126]72]. Gene coexpression network analysis The integrated network analysis between miRNA‒target pairs with negative regulatory relationships was implemented using the SWIM tool [[127]80, [128]81]. The read counts from both mRNA- and miRNA-related genes were used as input data to identify statistically significant master regulators between diploids and tetraploids. An adjusted p-value cutoff of 0.05 and a log[2]fold change (log[2]FC) threshold of ± 1 were adopted to construct a gene coexpression network based on Pearson correlation. Allele-specific expression analysis (ASE) A custom reference genome was constructed for allele-specific expression analysis on the basis of a recently published allele-aware chromosome-level genome assembly for cultivated alfalfa [[129]64]. This genomic assembly consists of 32 allelic chromosomes (8 chromosomes with four copies each) generated by the combination of high-fidelity single-molecule sequencing and Hi-C data. Since variant calling and RNA-Seq tools require a linear reference genome, we constructed a unified reference genome that includes all unique genes without redundancy, to avoid losing information due to possible gene content differences between the four haplotypes. To this end, two tools were combined: MashMap ([130]https://github.com/marbl/MashMap) [[131]82], a fast mapper specifically designed to work with long DNA sequences such as genome assemblies, and OrthoFinder ([132]https://github.com/davidemms/OrthoFinde) [[133]83], which identifies orthogroups and orthologous genes. MashMap alignments allowed to identify shared regions between haplotypes and cross-references gene annotations to determine whether genes overlap