Abstract

   Whole genome duplication (WGD) is a powerful evolutionary mechanism in
   plants. Autopolyploids have been comparatively less studied than
   allopolyploids, with sexual autopolyploidization receiving even less
   attention. In this work, we studied the transcriptomes of
   neotetraploids (2n = 4x = 32) obtained by crossing two diploid
   (2n = 2x = 16) plants of Medicago sativa that produce a significant
   percentage of either 2n eggs or pollen. Diploid progeny from the same
   cross allowed us to separate the transcriptional outcomes of
   hybridization from those of WGD. This material can help to elucidate
   events at the base of the domestication of cultivated 4x alfalfa, the
   world’s most important leguminous forage. Three 2x and three 4x progeny
   plants and 2x parental plants were used for this study. The RNA-seq
   data revealed that WGD did not dramatically affect the transcription of
   leaf protein-coding genes. The two parental genotypes did not
   contribute equally to the progeny transcriptomes, and genome-wide
   expression level dominance of the male parent was observed. A large
   majority of the genes whose expression level changed due to WGD
   presented increased expression, indicating that the 4x state requires
   the upregulation of approximately 2.66% of the protein-coding genes.
   Overall, we estimated that 3.63% of the protein-coding genes were
   transcriptionally affected by WGD and may contribute to the phenotypic
   novelty of the neotetraploid plants. Pathway analysis suggested that
   WGD could affect secondary metabolite biosynthesis, which in turn may
   influence forage quality. We found four times as many transcription
   factor genes among the polyploidization-affected genes than among those
   affected only by hybridization. Several of these belong to classes
   involved in stress response. Small RNA-seq revealed that very few
   miRNAs were significantly associated with WGD, but they target several
   hundred genes, and their role in the WGD response may be relevant.
   Integrated network analysis led to the identification of putative
   miRNA: mRNA interactions potentially involved in transcriptome
   reprogramming. Allele-specific expression analysis indicated that
   parent-of-origin bias was not a significant outcome of WGD, but we
   found that parentally biased RNA editing may be a significant source of
   variation in neopolyploids.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12870-025-06090-z.

   Keywords: Alfalfa, Expression level dominance, Gene coexpression
   network, RNA-Seq, Sexual polyploidization, Small RNA, Whole genome
   duplication

Introduction

   Polyploidy is the presence of more than two complete sets of
   chromosomes within a nucleus. Polyploids are generally classified into
   two main categories: autopolyploids, arising from intraspecific whole
   genome duplication (WGD), and allopolyploids, deriving from the merging
   of the genomes of two (or more) species. More realistically, merging
   genomes creates a continuum based on how closely related the original
   genotypes are, from closely related genotypes creating polyploids
   containing sets of completely homologous chromosomes to polyploids
   containing sets of highly divergent homoeologous chromosomes.

   Alterations in meiosis are considered the main path to polyploidy in
   natural populations through a process known as sexual polyploidization,
   which occurs when one or both gametes involved in fertilization are not
   reduced [[40]1]. In particular, bilateral sexual polyploidization
   results from the fusion of a 2n egg cell and a 2n sperm nucleus,
   resulting in 4x offspring, which may initiate the establishment of a
   neopolyploid lineage [[41]2]. In plants, polyploidy has long been
   recognized as a major force for evolution and speciation as well as a
   driver of genomic novelty and plasticity [[42]3–[43]7]. Barker et al.
   [[44]8] estimated that among 4003 plant species of 47 genera, 11% are
   allopolyploid, and 13% are autopolyploid; moreover, most, if not all,
   extant diploid plant species are paleopolyploid, that is, they derive
   from the diploidization of ancient polyploids [[45]9]. WGD probably
   favoured the domestication of agricultural plants [[46]10]. About 50%
   of economically important crop species are polyploid, including
   alfalfa, banana, cotton, coffee, oilseed rape, potato, sweet potato,
   sugarcane, tobacco, and wheat [[47]11, [48]12] emphasizing the
   importance of polyploid research for agriculture. In crop plants, the
   improvement of economically important traits by artificial polyploidy
   has long been investigated. There are examples of agricultural success
   of induced autopolyploids (banana, clover, ryegrass, sugar beet) and
   allopolyploids (festulolium, triticale).

   In allopolyploids, combining different homoeoalleles from parental
   species can lead to new allelic interactions contributing to
   intergenomic heterozygosity and heterosis, which in turn can improve
   crop plasticity under environmental fluctuations and stressful
   conditions [[49]13–[50]15]. In autopolyploids, the possible presence of
   more than two alleles per locus can allow for higher-order gene
   interactions [[51]16]. In 4x alfalfa, the positive effects of multiple
   alleles on biomass yield have been clearly demonstrated [[52]17]. These
   allelic interactions are considered the basis for “progressive
   heterosis”, that is, higher vigour of double crosses (AxB) × (CxD) with
   respect to single crosses AxB or CxD, as demonstrated in potato,
   alfalfa and maize [[53]18–[54]20]. Consistent with this hypothesis,
   autopolyploids from somatic WGD (sometimes termed colchiploids because
   they are most frequently obtained by colchicine treatments) may not be
   more vigorous in terms of biomass and are sometimes less vigorous than
   the original diploids [[55]21–[56]23]. For example, in Solanum
   commersonii and S. bulbocastanum, synthetic autotetraploids showed
   significant gene expression alterations and morpho-anatomical changes,
   but these changes did not consistently result in phenotypic superiority
   over diploids, suggesting variable effects of polyploidy depending on
   the species and genotype [[57]24]. Recently, a thorough phenotypic
   characterization of the effects of somatic WGD on Arabidopsis plants at
   2x, 4x, 6x, and 8x ploidies showed that, at the 4x level, the growth
   rate decreased, cell size increased, stem biomass at flowering
   increased and the cell wall composition was modified [[58]25].
   Autotetraploids from sexual polyploidization likely contain more than
   two alleles per locus if they originate from unrelated diploid parents;
   thus, they can functionally be considered an intermediate condition
   between somatic autopolyploids and allopolyploids. Compared with their
   diploid parents, these tetraploids can display greater biomass and
   fertility [[59]26, [60]27].

   Polyploidization can cause rapid genomic structural changes, such as
   chromosomal rearrangements, gene loss, and genome fractionation,
   sometimes resulting in greater loss of genes from one of the subgenomes
   (subgenome dominance) [[61]28–[62]30]. WGD can also alter the
   associations among chromosomes in the nucleus [[63]31, [64]32]. WGD has
   been demonstrated to affect transcription, resulting in deviations from
   the parental mean (nonadditive expression), expression level dominance,
   transgressive expression, and allele-specific expression (ASE) bias
   [[65]30, [66]33–[67]35]. ASE is defined as the differential expression
   of two alleles of a gene and is commonly observed in both plants and
   animals [[68]35, [69]36]. It can arise through several different
   mechanisms: gene loss and silencing, activation of transposable
   elements, epigenetic modifications, and interactions between the
   divergent regulatory networks of the parents [[70]35–[71]38]. Among the
   transcriptional outcomes of WGD, the impact of WGD on the expression of
   small RNAs (smRNAs), including nonadditive expression, has been
   reported [[72]39–[73]43].

   When studying the effects of polyploidization, an important issue is
   the need to differentiate the impact of hybridization from those of WGD
   [[74]30, [75]44, [76]45]. Hybridization, both intraspecific and
   interspecific, has generally been demonstrated to influence gene
   expression more than WGD alone [[77]22, [78]23, [79]44, [80]46–[81]48].
   In fact, interspecific hybridization combines two genomes with suites
   of partially diverged cis- and trans-regulatory elements [[82]35,
   [83]49–[84]52], that can interact and alter the transcriptional balance
   between parental genomes. In the case of autopolyploids from sexual
   polyploidization, separating the impacts of hybridization and WGD is
   extremely relevant.

   Compared with allopolyploidization, autopolyploidization is expected to
   cause fewer alterations in gene expression [[85]53]. Limited gene
   expression imbalance was observed in some autopolyploids relative to
   their diploid parents [[86]27, [87]46, [88]54–[89]56].

   Alfalfa (Medicago sativa L., 2n = 4x = 32) is an important leguminous
   forage crop worldwide. The origin of cultivated 4x alfalfa can be found
   in sexual polyploidization events within and between the two wild
   diploid subspecies, M. sativa ssp. caerulea and ssp. falcata, which
   gives rise to cultivated M. sativa ssp. x varia [[90]26,
   [91]57–[92]59]. In previous work by our group, a M. sativa ssp. falcata
   plant (PG-F9, 2n = 2x = 16) and a ssp. falcata x caerulea hybrid (12P,
   2n = 2x = 16) producing significant percentages of 2n eggs and pollen,
   respectively, were selected and characterized [[93]27, [94]60, [95]61].
   Three 2x and three 4x plants (2n = 4x = 32) were randomly selected from
   among the progeny of the PG-F9 × 12P cross. These six plants are full
   sibs at different ploidy levels and are well suited for separating the
   effects of hybridization from those of polyploidization at the
   phenotypic and molecular levels. Compared with their diploid full
   siblings, the three neotetraploids presented increased biomass, earlier
   flowering, higher seed set in crosses with unrelated plants, higher
   seed weight, and larger leaves with larger cells. Microarray analyses
   revealed that the expression levels of several hundred genes related to
   diverse metabolic functions changed in response to polyploidization
   alone [[96]27]. In this work, we further investigated the
   transcriptomes of these plant materials via RNA-seq and smRNA-seq. We
   conducted allele-specific expression analyses to gain further insight
   into the short-term transcriptional consequences of sexual
   tetraploidization in M. sativa. We show quantitatively limited but
   qualitatively relevant nonadditive mRNA expression and almost no
   significant allele-specific expression bias that can be ascribed
   specifically to WGD. Small RNA expression changes were also limited but
   potentially relevant. We conclude that the gene expression shifts
   induced by sexual polyploidization involving divergent alfalfa
   genotypes are not large but can impact several pathways involved in
   adaptation and phenotypic traits.

Materials and methods

Plant material and experimental conditions

   Eight genotypes were used in this work [[97]27]: the two 2x parental
   plants PG-F9 and 12P described above, three 2x progeny plants (S8, S16,
   S24) and three 4x progeny plants (S29, S48, S60). They were maintained
   by yearly cloning through stem cuttings. Three plants per genotype
   (biological replicates) were grown in pots filled with field soil, sand
   and peat moss 2:1:1 and randomized on benches within a bee-proof cage
   under natural conditions at the departmental experimental station (S.
   Andrea d’Agliano, Perugia, Italy). All the plants were clipped at the
   beginning of flowering. Three-week-old shoots (vegetative growth stage)
   of the regrowth were sampled by taking the first three fully expanded
   leaves from several shoots of each of the three cloned plants per
   genotype. The leaves were then immediately flash-frozen in liquid
   nitrogen and stored at -80 °C until DNA or RNA extraction. Genomic DNA
   was extracted using the GenElute Plant Kit (Sigma-Aldrich, St. Louis,
   MO, USA). RNA isolation was performed using the RNeasy Plant Mini Kit
   (Qiagen, Hilden, Germany) according to the manufacturer’s instructions.
   Acid nucleic purity and concentration were first assayed using the
   NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA,
   USA). For sequencing, RNA quantification was performed using
   Bioanalyzer (Agilent) and equal amounts of total RNA per sample were
   used.

RNA-seq

   RNA-seq (Novaseq6000 2 × 150 bp, ~ 18 million fragments per sample) was
   performed by Sequentia Biotech (Barcelona, Spain). The raw fastq files
   were cleaned and trimmed with Cutadapt 4.6 [[98]62], using default
   parameters for Illumina sequencing. FastQC 0.11.9 was applied to check
   for quality of both the raw and the clean sequences. To reduce bias for
   multiple copies of the same exact gene in polyploid genomes, cleaned
   reads were aligned with STAR 2.7.10b [[99]63] to a nonredundant
   (collapsed) version of the alfalfa genome sequence of Chen et al.
   [[100]64], produced by Sequentia as described below. Bam files from the
   alignment were managed with samtools 1.19 [[101]65] and a count matrix,
   accounting only for uniquely mapped reads, produced with FeatureCounts,
   from the subread package [[102]66] with the help of the annotation
   previously produced. This matrix was then imported into R (R Code Team
   2024) [[103]67] and processed with DESeq2 [[104]68] with an in-house
   built script. The raw matrix was first purged from low-expressed genes,
   and DESeq2 best practices were applied to retrieve differentially
   expressed genes. The hierarchical clustering map for differential
   expression genes is shown in Supplementary Fig. [105]S1A. Genes were
   considered differentially expressed between genotypes or ploidy levels
   when their log[2] fold change (FC) was ± 1.0 with Padj < 0.05; P values
   were adjusted using the Benjamini and Hochberg’s approach for
   controlling the false discovery rate [[106]69]. The iTAK tool (version
   1.6) was used for transcription factor prediction with default
   parameters ([107]http://itak.feilab.net/cgi-bin/itak/index.cgi)
   [[108]70]. Pathway enrichment analysis of nonadditively expressed or
   differentially expressed genes was conducted using the ShinyGO V0.80
   online tool [[109]71], with M. truncatula used as the reference gene
   set. To achieve comprehensive gene functional annotation, M. sativa
   NCBI non-redundant protein IDs were converted into M. truncatula
   orthologs using gProfiler
   ([110]https://biit.cs.ut.ee/gprofiler/convert). KOBAS (v 3.0 version,
   adjusted P < 0.05) was used to test the statistical enrichment of
   differentially expressed genes (DEGs) in KEGG pathways [[111]72].

Small RNA (smRNA) sequencing

   Small RNA libraries were generated and sequenced by Novogene Co., Ltd.
   Twenty-four sequencing libraries were constructed (eight samples ×
   three biological replicates) using the smRNA Library Prep Kit
   (Illumina, San Diego, CA, USA) following the manufacturer’s
   recommendations. Briefly, 3’ and 5’ adaptors were ligated to the 3’ and
   5’ ends of smRNA. The first strand of cDNA was subsequently synthesized
   after hybridization with a reverse transcription primer. The
   double-stranded cDNA libraries were generated through PCR enrichment
   and checked with a Qubit fluorometer (Thermo Fischer Scientific, MA,
   USA) and real-time PCR for quantification and a sensitive Agilent 2100
   Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) for size
   distribution detection. The libraries with inserts between 18 and 40 bp
   were pooled and sequenced on the Illumina NovaSeq 6000 platform to
   generate 50 bp single-end reads. Raw fastq format reads were first
   processed through custom Perl and Python scripts to remove low-quality
   reads, adapter sequences or the insert tag, which contained poly A, T,
   G, or C. At the same time, the Q20, Q30, and GC contents of the raw
   data were calculated. All downstream analyses were based on clean,
   high-quality data. The smRNA tags were mapped to the M. sativa
   reference genome [[112]64] by Bowtie (bowtie-0.12.9 version) [[113]73]
   without mismatches. Mapped smRNA tags were used to search for known or
   conserved miRNAs against miRbase (version 20.0). miRDeep2
   (mirdeep2_0_0_5 version) [[114]74] and srna-tools-cli
   ([115]https://github.com/sRNAworkbenchuea/UEA_sRNA_Workbench) were used
   to identify candidate miRNAs and draw their secondary structures. Then,
   smRNA tags were mapped to the RepeatMasker (version 4.0.3) [[116]75]
   and Rfam [[117]76] databases to exclude reads originating from
   protein-coding genes, repetitive sequences, rRNAs, tRNAs, snRNAs and
   snoRNAs. The resulting reads were used for novel miRNAs prediction
   based on the hairpin structure of miRNA precursors using miREvo
   (miREvo_v1.1 version) [[118]77] and miRDeep2 (mirdeep2_0_0_5 version)
   [[119]74]. The miRNA target genes were predicted using psRobot
   (psRobot_v1.2 version) [[120]78].

Small RNA differential expression, gene ontology and KEGG enrichment

   The expression levels of known and unique miRNAs estimated via
   transcripts per million were statistically analysed and normalized
   [[121]79]. The read counts of each gene were used as input data for
   DESeq2 (1.12.0) to obtain differentially expressed miRNAs (DEMs)
   [[122]68]. The resulting P values were adjusted using the Benjamini and
   Hochberg’s approach to control the false discovery rate [[123]69]. The
   significant genes (Padj ≤ 0.05) with corresponding log[2] fold change
   values of ± 1.0 were considered differentially expressed. A correlation
   analysis was performed to demonstrate experimental repeatability and
   reveal differences in gene expression among samples (Supplementary Fig.
   [124]S1B). Hierarchical clustering was performed using the read counts
   of each sample as input data. The GO enrichment analysis of the DEMs’
   target genes was implemented using the ShinyGO online tool (8.08
   version, Padj ≤ 0.05) [[125]71] with M. truncatula as the reference
   species. Gene functions were determined using the KEGG pathway
   database. KOBAS (3.0 version, Padj ≤ 0.05) was used to test the
   statistical enrichment of the target genes in KEGG pathways [[126]72].

Gene coexpression network analysis

   The integrated network analysis between miRNA‒target pairs with
   negative regulatory relationships was implemented using the SWIM tool
   [[127]80, [128]81]. The read counts from both mRNA- and miRNA-related
   genes were used as input data to identify statistically significant
   master regulators between diploids and tetraploids. An adjusted p-value
   cutoff of 0.05 and a log[2]fold change (log[2]FC) threshold of ± 1 were
   adopted to construct a gene coexpression network based on Pearson
   correlation.

Allele-specific expression analysis (ASE)

   A custom reference genome was constructed for allele-specific
   expression analysis on the basis of a recently published allele-aware
   chromosome-level genome assembly for cultivated alfalfa [[129]64]. This
   genomic assembly consists of 32 allelic chromosomes (8 chromosomes with
   four copies each) generated by the combination of high-fidelity
   single-molecule sequencing and Hi-C data. Since variant calling and
   RNA-Seq tools require a linear reference genome, we constructed a
   unified reference genome that includes all unique genes without
   redundancy, to avoid losing information due to possible gene content
   differences between the four haplotypes. To this end, two tools were
   combined: MashMap ([130]https://github.com/marbl/MashMap) [[131]82], a
   fast mapper specifically designed to work with long DNA sequences such
   as genome assemblies, and OrthoFinder
   ([132]https://github.com/davidemms/OrthoFinde) [[133]83], which
   identifies orthogroups and orthologous genes. MashMap alignments
   allowed to identify shared regions between haplotypes and
   cross-references gene annotations to determine whether genes overlap