Abstract How allopolyploids are able not only to cope but profit from their condition is a question that remains elusive, but is of great importance within the context of successful allopolyploid evolution. One outstanding example of successful allopolyploidy is the endemic Iberian cyprinid Squalius alburnoides. Previously, based on the evaluation of a few genes, it was reported that the transcription levels between diploid and triploid S. alburnoides were similar. If this phenomenon occurs on a full genomic scale, a wide functional ‘‘diploidization’’ could be related to the success of these polyploids. We generated RNA-seq data from whole juvenile fish and from adult livers, to perform the first comparative quantitative transcriptomic analysis between diploid and triploid individuals of a vertebrate allopolyploid. Together with an assay to estimate relative expression per cell, it was possible to infer the relative sizes of transcriptomes. This showed that diploid and triploid S. alburnoides hybrids have similar liver transcriptome sizes. This in turn made it valid to directly compare the S. alburnoides RNA-seq transcript data sets and obtain a profile of dosage responses across the S. alburnoides transcriptome. We found that 64% of transcripts in juveniles’ samples and 44% in liver samples differed less than twofold between diploid and triploid hybrids (similar expression). Yet, respectively 29% and 15% of transcripts presented accurate dosage compensation (PAA/PA expression ratio of 1 instead of 1.5). Therefore, an exact functional diploidization of the triploid genome does not occur, but a significant down regulation of gene expression in triploids was observed. However, for those genes with similar expression levels between diploids and triploids, expression is not globally strictly proportional to gene dosage nor is it set to a perfect diploid level. This quantitative expression flexibility may be a strong contributor to overcome the genomic shock, and be an immediate evolutionary advantage of allopolyploids. Introduction In polyploid lineages resulting from hybridization (allopolyploids), the combination of homeologous chromosomes from divergent species promotes a multitude of biological events [[29]1]. Heterozygosity, divergence of duplicate genes, and novel gene interactions lead to genetic and phenotypic variability [[30]2] that are stably and successfully maintained in these lineages [[31]1]. Allopolyploids are, in this scope, great evolutionary projects full of opportunities for selection and adaptation. On the other hand, allopolyploid lineages have to face an important challenge, namely to overcome genomic shock caused by the simultaneous high level of heterozygosity (due to hybridization) and gene dosage increase (due to polyploidy) [[32]3]. However, mostly plants and invertebrates but also lower vertebrates, deal with these challenges very successfully [[33]4] as they survive and perpetuate. The evolutionary success of several animal allopolyploid lineages like Squalius alburnoides [[34]5], Rana esculenta [[35]6], Bufo viridis [[36]7] or Poecilia formosa [[37]8], outdates research that suggests that the fate of (allo)polyploids is a rapid extinction, and suggests that such animals might developed mechanisms that stabilize their genomes as already widely reported in plants [[38]9]. In allopolyploid plants, the reduction of gene redundancy towards a functional diploidization (dosage compensation) has been pointed out as a way to cope with gene dosage increase [[39]10], but in vertebrates this hypothesis has been scarcely investigated. However, the recent recognition that hybridization and polyploidy are much more frequent in animals than previously inferred [[40]11] and that this might have significantly shaped vertebrate genomes [[41]4] highlighted the importance to extend these studies further than to allopolyploid plants and invertebrates. In a first attempt to study gene expression regulation in a vertebrate allopolyploid context, the expression level of 7 genes (gene set encompassing tissue specific and housekeeping genes), were evaluated and the occurrence of a compensation mechanism was reported in the allopolyploid cyprinid Squalius alburnoides [[42]12]. In this fish, for those first analysed genes, the existence of a dosage compensation mechanism that brings transcript levels in triploids to the diploid state was shown. Yet, the genomic extension of the phenomenon remains unknown. It may be a global gene dosage compensation event, acting without exception throughout the whole genome. On the other hand, a certain number of genes may escape the dosage-regulation mechanism, or dosage compensation may be restricted to specific subsets of genes, or any other still unformulated conjecture [[43]12]. However, either taking place on a full genomic scale or only partially, the occurrence of similar transcription levels between diploids and triploids can be a relevant factor contributing to the success and perpetuation of polyploids among lower vertebrates. Analyses of entire transcriptomes (RNA-seq), available in the meantime, are now the imminent choices to disentangle this kind of questions, [[44]13, [45]14, [46]15]. RNA-seq allows in a fast and cost-effective way to do a simultaneous qualitative and quantitative analysis of complex transcriptomes [[47]16]. It showed to be a general improvement compared to microarrays [[48]17] and was extensively validated by qPCR, exceeding it in range [[49]14, [50]18]. Squalius alburnoides is an allopolyploid cyprinid, resulting from interspecific hybridization between females of Squalius pyrenaicus (P genome) and males of a now extinct species related to Anaecypris hispanica (A genome) [[51]5, [52]19] ([53]Fig. 1A). S. alburnoides natural populations are composed of animals of different ploidy levels and genomic constitutions ([54]Fig. 1B) referred to as genomotypes, and include fertile sexual and non-sexual forms ([55]Fig. 1C) Currently, in the Iberian southern basins the predominant S. alburnoides genomotypes are the hybrid triploid PAA, diploid PA, and the parental-like diploid AA. Individuals of AA genomotype are all males, and are only reconstituted from hybrids within the complex [[56]5], so despite the homogenomic A. hispanica-like nucleus, they carry P mitochondrial DNA and are called nuclear non-hybrids ([57]Fig. 1C). Fig 1. S. alburnoides complex simplified overview. [58]Fig 1 [59]Open in a new tab A) Initial hybridization event in the origin of the complex. B) Diversity of genomotypes found in the main southern portuguese river basins. In gray background are the S. alburnoides genomotypes more frequent in nature, which are in focus in this work. Dashed lines indicate naturally occurring but rare genomotypes. C) Diversity of gametes, produced through a variety of mechanisms—clonally, by hybridogenesis, meiotic hybridogenesis or normal meiosis, depending on the sex and genomic composition of the individual. Asterisk represents mitochondrial genotype: blue from S. pyrenaicus and red from A. hispanica-like. In this work, our goal is to expose how an evolutionary successful allopolyploid vertebrate, the cyprinid S. alburnoides, deals on the transcriptional level with the genomic stress derived from hybridization and polyploidy. Also, this study aims to contribute to understand the role of gene dosage compensation in the S. alburnoides breeding complex. In the present work, the quantitative expression profiles of diploid and triploid S. alburnoides were compared after RNA-sequencing and de novo assembly of the S. alburnoides transcriptome. Since it is known that there are many transcriptional changes from juveniles to adults [[60]20] and that gene expression patterns can be tissue-specific [[61]12, [62]21], RNA-seq was done both from whole bodies and at a single tissue level. However, the RNA-seq transcript profiling experiments the differences in expression of a gene between two samples are in fact differences in expression per unit of RNA or “per transcriptome. To directly infer global expression dosage responses from the RNA-Seq transcript profiling experiments the transcriptomes compared must be of equal size. Without information about the sizes of the transcriptomes compared, direct assumptions about the expression per gene copy or expression per cell drawn from the transcriptome-normalized expression can be flawed [[63]22; [64]23]. Based only on the expression per transcriptome, the differences in expression per cell that are proportional to the change in the total transcriptome size will appear as equal expression per transcriptome [[65]23]. Consequently, we also estimated the relative transcriptome size from liver samples between diploid and triploid S. alburnoides hybrids. Materials and Methods Fish samples and genotyping Adult specimens From the area of sympatry of S. pyrenaicus and S. alburnoides, in southern Portuguese river basins, a total 20 specimens (6 S. pyrenaicus and 14 S. alburnoides) were collected to perform experimental crosses and provide adult biological material. Individuals were collected from tree locations: Algarve basin, Almargem stream (29 S; 622495.24 m E; 4113964.49 m N (UTM)); Guadiana basin, Oeiras stream (29 S; 604985.29 m E; 4164883.94 m N (UTM)) and Tejo basin, Cobrão stream (29 S; 606212.42 m E; 4398531.98 m N (UTM)). Sampling locations were chosen according with the legal permits and considering the existence of a differential geographical distribution of genomotypes, different relative frequencies of each genomotype in each river basin and the fact that diploid and triploid S. alburnoides are not morphologically distinguishable [[66]5]. All specimens were adults, sexually mature (determined by obvious abdominal distension and gametes releasing upon slight abdominal pressure) and approximately one to three years old (estimated from the length of each specimen). Fish were captured by electrofishing and brought alive to the laboratory. Each fish was photographed to posterior identification by Scaleprinting [[67]24]. Also, DNA was obtained from fin clips and the specimens were genotyped according to [[68]25]. Fish were acclimated for three weeks in high-quality glass tanks (30 l capacity) equipped with filtration units, under the same standard conditions of light (14 hours light, 10 hours dark), temperature (22°C ±1 °C), water quality (pH between 6.5 and 7.30) and nutrition (twice a day feeding with frozen anthemia and commercial fish food flacks). Fourteen individuals (6 PA; 6 PAA; 1 AA and 1 PP) were sacrificed and organs were dissected and preserved in RNA later (Ambion) at -20°C. Juveniles During the reproductive season, and before its use in the previous section, all adults of S. alburnoides and S. pyrenaicus (previously genotyped) and visibly sexually maturated were used to perform defined experimental crosses in order to obtain progenies specifically with PAA, PA, AA and PP genomotypes [[69]5, [70]26]. For each cross, eggs and sperm were collected (by gentle abdominal pressure) from the selected individuals and used for embryo production in petri dishes. Successful fertilization was assessed by observation under the stereoscope after 2h. Viable progeny wit PP genomotype was not obtained. At least 2 viable progenies of each, putatively of AA, PA and PAA genomotypes were reared in high-quality glass tanks (5 l capacity) equipped aerating units, under the same standard conditions of light (14 hours light, 10 hours dark), temperature (20°C ±1°C), water quality (pH between 6.5 and 7.30) and nutrition (twice a day feeding with commercial powder food for fish larvae). 30 days after hatching (dah) several siblings from the same clutch and from each genotype were collected and preserved in RNA later (Ambion) at-20°C. Simultaneous extractions of DNA and total RNA were performed with the AllPrep DNA/RNA Mini Kit (Qiagen). The extracted DNA was used to assess the genomic composition of the selected progenies according to [[71]25]. Library construction and sequencing Total RNA extracted with the AllPrep DNA/RNA Mini Kit (Qiagen) was DNase treated on-column with the RNase-free DNase Set (Qiagen). At least 15 μg of RNA were obtained per sample. Integrity evaluation and quantification of the extracted RNA was performed with Nanodrop 1000 (Thermo Scientific) and 2100 Bioanalyser (Agilent Technologies). All samples presented a RIN higher than 8.5 (Bioanalyser). Normalized juvenile cDNA libraries were homemade prepared according to [[72]27]. Non-normalized liver libraries were prepared with TruSeq RNA Sample Preparation Kit (Illumina) according to the Illumina specifications. All libraries were paired-end sequenced using Illumina HiSeq 2000. Juvenile samples Three barcoded RNA libraries were constructed: juvenile-AA, juvenile-PA and juvenile-PAA. For the construction of the 3 libraries RNA was purified from whole bodies of pools of 4 larvae of each genomotype at 30 dah. At this age, all major organs (except the reproductive systems that are not yet fully defined) are already formed in all 3 investigated genomic forms of S. alburnoides (unpublished data). We did pooling of individuals in order to obtain the minimal amount of RNA required for library construction and sequencing. For each library only siblings from the same cross were used. The 3 libraries were sequenced producing 12 Gb clean data (~ 4Gb per library) in 3 data sets (juv-AA; juv-PA; and juv-PAA) of Illumina HiSeq short paired-end sequence reads (90 bp). The output statistics of sequenced data is available as [73]S1 Table. Liver samples Four barcoded RNA libraries were constructed: one for S. pyrenaicus (liver- PP) and three for S. alburnoides (liver-AA, liver-PA and liver-PAA). For the construction of the libraries RNA was purified from livers, independently for each sample/library. The 4 libraries were sequenced producing 4Gb of clean data (~1Gb per library) in 4 data sets (liv-AA; liv-PA; liv-PP and liv-PAA) (short paired-end sequence reads around 50 bp). The output statistics of sequenced data is available as [74]S2 Table. Processing of RNA-seq data for gene expression quantification The raw data of juv-AA, juv-PA, and juv-PAA were processed into clean data by removing reads with adaptors, reads with more than 5% of unknown nucleotides and reads where more than half of the bases’ quality values were less than 5. Also, orphan reads were excluded. Transcriptome de novo assembly was carried out with Trinity [[75]28]. Assemblies were taken into further processes of sequence splicing and redundancy removing with the sequence clustering software TGICL [[76]29]. After clustering, UniGenes were divided in two classes: clusters (prefix CL) and singletons (prefix unigene) (Statistics of assembly quality provided as [77]S3 Table). blastx alignment (e-value < 0.00001) between unigenes and protein databases (nr, Swiss-Prot, KEGG, COG) was performed, and the best aligning results were used to decide sequence direction. When results of different databases conflicted, the priority order of nr, Swiss-Prot, KEGG and COG was followed (statistics of annotation results provided as [78]S4 Table). Unigenes that were not aligned to any of these databases were scanned by ESTScan (v2.1) [[79]30], to decide the sequence direction. The expression level of each unigene was calculated as FPKM, defined as fragments per kilobase of exon model per million mapped fragments [[80]31], with the Cufflinks package (v0.9.3). Concerning raw sequencing data of liv-AA, liv-PA, liv-PP and liv-PAA, quality filtering was performed: low quality (phred score<20), and ambiguous nucleotides were trimmed off and the quality assessed using FastQC v0.10.1. Reads were mapped to the Danio rerio reference genome (Ensembl Danio rerio genome Zv9.69) using Stampy v1.0.21 (substitution rate of 11% and no multiple hits allowed) (Mapping statistics are presented as [81]S2 Table). To use the mapping approach the divergence rate between D. rerio and S. pyrenaicus (11%) and D. rerio and S. alburnoides AA genomotype (10%) had to be assessed and the higher value was used (11%). Fragments mapped into genes were counted using HTSeq v0.5.3p9 (htseq-count option for no stranded data). FPKM values were calculated using fragment counts from HTSeq and total fragments mapped obtained with Samtools v0.1.18. (flagstat option, counting pair reads plus singletons mapped). Differentially expressed genes were calculated using Bioconductor edgeR package v3.0.8 (for a FDR<0.05) after data normalization using Bioconductor EDASeq package v1.4.0 (first normalized to gene length and second to the libraries size). Comparative analysis of expression levels Expression differences were obtained by dividing the normalized expression values (FPKM) in one library by the corresponding expression value of the same transcript in each other library (fold change) independently for juveniles and livers. The quantitative comparative profiles were displayed through orderly plotting of log[2] (fold change). The value of |log[2](Ratio)|<1 was considered to be the threshold for similar expression. A false discovery rate (FDR) ≤ 0.05 was used as cutoff threshold to determine the significance of differential expression (FDR correction, version for dependent tests, applied to the raw p-value of all transcripts). Also, the observed gene expression level of each transcript in the hybrids was compared to an expected expression level if P and A alleles are expressed exactly at the same level as in the non-hybrid situation (additivity expectation). The expression level of each gene in the parental diploids (AA and PP) was used to calculate the expected additive expression for each gene in both hybrids. Then, the observed expression value (obs) of each transcript was divided by its corresponding expected additive value (exp), both in liv-PAA and in liv-PA. These ratios were log[2] transformed and when within the interval -1exp). The percentages of transcripts in each category, for both a) PA and b) PAA are represented. In the case of the diploid hybrid (PA), we observed that the percentage of additively expressed transcripts rises to more than half of the transcripts (60%) ([110]Fig. 4). From the non-additively expressed ones a significant (χ-text, p<0.001) majority (30% of the total) is also under-expressed in comparison to the additivity expectations ([111]Fig. 4). Similar gene expression and dosage compensation We quantified the similarly expressed transcripts (SE) between each pair of 3n vs 2n genomotypes, both in juveniles and liver data sets ([112]S6 Table). Focusing on the comparisons between hybrids ([113]Table 1), we observe that 64% of the transcripts in juveniles and 44% in the livers are represented similarly in diploids and triploids. Within the SE group we also evaluated the occurrence and/or extension of strictly diploid expression levels in triploids (fold change equal to 1) and also of increased expression in triploids proportional to dosage increase (1.5 fold higher). So, we sorted the SE transcripts into 4 classes. Class I comprises the compensated transcripts, with ratio PAA/PA approximately equal to 1 and within the interval] 0.75;1.25[; in class II are the dosage sensitive ones, with ratio approximately equal to 1.5 and within] 1.25;1.75[; in class III are transcripts affected by some repression, with ratios lower than 0,75 and; in class IV are transcripts overexpressed, with ratios higher than 1.75 ([114]Table 1). The results show that in triploid hybrids more than one third of the SE transcripts are strictly dosage compensated to the diploid hybrid level (45% in juveniles and 35% in liver) ([115]Table 1). On the other hand, there is a much smaller representation of SE transcripts that follow the “1.5-fold rule”, being expressed proportionally to gene dosage (17% in juveniles and 13% in liver) ([116]Table 1). Of notice is that in both data sets only a very small percentage of the SE PAA transcripts (4% in juveniles and 3% in livers) present an expression level higher than 1.5 fold (class IV), while 34% in juveniles and 49% in livers are repressed beyond dosage compensation (class III). Table 1. Similarly expressed transcripts (SE) between triploid and diploid S. alburnoides hybrids in juveniles and livers. Comparisons SE (% of total) SE per class % of SE % of total Juveniles PAA/PA 58076 (64%) I 26376 45 29 II 9935 17 11 III 19672 34 22 IV 2093 4 2 Liver PAA/PA 10068 (44%) I 3508 35 15 II 1308 13 6 III 4947 49 21 IV 305 3 1 [117]Open in a new tab Total numbers and percentages of SE unigenes (juveniles) or mapped genes (livers) and total numbers and percentages of SE’s per expression class. Differential expression Considering the significance criteria for differential gene expression (see [118]material and methods) we quantified the significantly different expressed transcripts between diploid and triploid juveniles and livers ([119]S7 Table). Focusing on the comparisons between hybrids ([120]Table 2), we observed that 22.5% of unigenes in juveniles and 0.83% genes in livers are DE between diploids and triploids. Also, both for juveniles and livers the significant majority of the DE transcripts are higher represented in PAs then in PAAs, despite the higher gene dosage in triploids ([121]Table 2). Table 2. Differentially expressed transcripts (DE) between triploid and diploid S. alburnoides hybrids in juveniles and livers. Comparisons DE (% of total) DE group DE per group % of DE % of total Juveniles PAA/PA 20468 (22.5%) DEH 6813 33 7,5 DEL 13655 67 15 Livers PAA/PA 195 (0.83%) DEH 41 21 0,17 DEL 154 79 0. 65 [122]Open in a new tab Total numbers and percentages of differently expressed (DE) unigenes in juveniles or mapped genes in livers. DE's were divided in two groups: significantly higher expressed in PAA compared to PA (DEH), and significantly lower expressed in PAA compared to PA (DEL). Functional enrichment analysis We used the annotated de novo assembled transcriptome of juveniles and the mapping of the S. alburnoides liver transcriptome to the Zebrafish genome to perform a functional analysis [[123]35]. In order to look for the biological context for the gene expression dosage regulation observed between diploid and triploid hybrids, we performed a GO and a KEGG pathway enrichment analysis in DE and SE groups. We found significant functional enrichment in both DE and SE groups in juveniles ([124]Table 3) and livers ([125]Table 4), and the analysis is quit consistent between the two data sets. Briefly, the SE group is enriched in terms associated with metabolic processes, intracellular parts and constitutes of the ribosomes, while DE is mostly enriched in terms associated with the cell membrane (e.g. transport, adhesion, motility). From the KEGG pathway analysis in the SE group we observed that it is consistent with an enrichment of ribosomal components and ribosomal-linked pathways in both data sets. KEGG pathway analysis of the DE group of juveniles is significantly enriched in components of the circadian rhythm, Wnt signaling and melanogenesis pathways. The DE group in the liver data set is significantly enriched in components of the sphingolipid metabolism and PPAR signaling pathways. Within the SE group we also looked for differential functional enrichment between classes I and II in both juveniles and liver data sets ([126]Table 5). Within our criteria of significance, no significant functional enrichment was detected in class I, both in juveniles and livers. Class II of juveniles is enriched in terms linked to ribosomal complex, to the respiratory chain and to the hemoglobin complex. Livers class II is enriched in terms linked to ribosomes. Table 3. Functional enrichment in GO terms and KEGG pathways of PAA vs PA similarly expressed and differentially expressed gene groups for juveniles. Term # FE p-val. SE BP catabolic process 326 1,1 4,6E-3 BP macromolecule metabolic process 1993 1,0 3,6E-2 BP primary metabolic process 2496 1,0 4,7E-2 CC intracellular part 2829 1,0 6,7E-4 BP ribonucleoprotein complex 195 1,1 7,1E-4 BP cell part 4913 1,0 6,9E-4 BP intracellular 3450 1,0 6,0E-4 BP intracellular organelle 2358 1,0 6,3E-4 MF structural constituent of ribosome 94 1,1 3,0E-2 KEGG Spliceosome 93 1,2 1,4E-3 DE BP cell adhesion 85 1,7 2,1E-6 BP cellular developmental process 154 1,3 1,5E-2 BP cellular component morphogenesis 61 1,5 1,0E-2 BP anatomical structure development 282 1,2 8,5E-3 BP cell motion 56 1,5 4,8E-2 BP anatomical structure morphogenesis 169 1,2 5,0E-2 CC extracellular region part 57 1,5 1,2E-2 CC extracellular matrix 37 1,6 4,7E-2 MF signal transducer activity 272 1,3 3,9E-5 MF ion binding 620 1,1 1,2E-3 KEGG Circadian rhythm 12 3,1 3,2E-2 KEGG Wnt signaling pathway 43 1,5 4,7E-2 KEGG Melanogenesis 31 1,6 3,9E-2 [127]Open in a new tab (BP) Biological process, (MF) molecular function, (CC) cellular component, (KE) KEGG pathway, (#) number of transcripts, (FE) fold enrichment, (p) Benjamini corrected p-value. GO enrichment analysis was performed at the secondary classification of terms. Terms with FE ≥ 1.5 [[128]35] are in bold. Table 4. Functional enrichment in GO terms and KEGG pathways of PAA vs PA similarly expressed and differentially expressed gene groups for livers. Term # FE p-val. SE BP cellular metabolic process 1442 1,1 1,0E-16 BP macromolecule metabolic process 1259 1,2 5,3E-16 BP primary metabolic process 1529 1,1 5,9E-11 BP ribonucleoprotein complex biogenesis 42 1,8 1,8E-5 BP nitrogen compound metabolic process 663 1,1 2,0E-5 BP biosynthetic process 656 1,1 1,4E-4 BP catabolic process 205 1,2 9,0E-3 BP macromolecular complex subunit org. 82 1,3 9,4E-3 BP establishment of RNA localization 18 1,9 1,9E-2 BP translational initiation 24 1,7 2,0E-2 CC ribonucleoprotein complex 185 1,6 2,3E-18 CC intracellular 2021 1,1 5,3E-15 CC intracellular part 1652 1,1 4,5E-12 CC intracellular organelle 1369 1,1 4,7E-9 CC organelle lumen 138 1,5 1,3E-8 CC membrane-bounded organelle 1164 1,1 8,1E-8 CC cell part 2750 1,0 2,9E-4 CC intracellular organelle part 446 1,1 2,5E-3 CC organelle part 446 1,1 2,5E-3 CC non-membrane-bounded organelle 322 1,1 4,6E-3 MF nucleic acid binding 1002 1,2 1,2E-15 MF structural constituent of ribosome 104 1,7 9,3E-13 MF nucleotide binding 793 1,1 1,5E-7 MF translation factor activity 60 1,7 1,3E-6 MF nucleoside binding 501 1,1 1,6E-3 MF transferase activity 630 1,1 1,9E-3 MF ion binding 1052 1,1 4,2E-3 MF ligase activity 116 1,2 2,0E-2 KEGG Ribosome 79 2,0 2,1E-19 KEGG Spliceosome 89 1,7 6,5E-10 KEGG Ubiquitin mediated proteolysis 76 1,4 3,6E-3 KEGG RNA degradation 38 1,5 2,2E-3 DE BP transport 28 2,7 1,5E-5 BP establishment of localization 28 2,7 9,1E-6 BP transmembrane transport 11 3,6 1,1E-2 CC membrane 34 1,5 3,6E-2 CC apical part of cell 3 37,5 2,3E-2 MF substrate-specific transporter activity 17 4,3 1,7E-5 MF hydrolase activity 26 2,5 1,1E-4 MF transmembrane transporter activity 13 3,5 1,6E-3 KEGG Sphingolipid metabolism 4 16,8 2,2E-2 KEGG PPAR signaling pathway 4 13,3 2,2E-2 [129]Open in a new tab BP) Biological process, (MF) molecular function, (CC) cellular component, (KE) KEGG pathway, (#) number of transcripts, (FE) fold enrichment, (p) Benjamini corrected p-value. GO enrichment analysis was performed at the secondary classification of terms. Terms with FE ≥ 1.5 [[130]35] are in bold. Table 5. Differential functional enrichment in GO terms and KEGG pathways between class I and II, both in juvenile and liver data sets. Juveniles Term # FE p-val. Class I No significant enrichment Term # FE p-val. Class II BP heterocycle biosynthetic process 21 2,9 3,9E-3 BP tetrapyrrole metabolic process 12 4,0 1,2E-2 BP tetrapyrrole biosynthetic process 11 4,2 1,2E-2 MF structural molecule activity 65 1,6 3,4E-2 MF heme-copper terminal oxidase activity 10 4,1 3,7E-2 MF cytochrome-c oxidase activity 10 4,1 3,7E-2 MF oxidoreductase activity, acting on heme group 10 4,1 3,7E-2 KEGG Ribosome 21 2,6 5,7E-3 Liver Term # FE p-val. Class I No significant enrichment Term # FE p-val. Class II BP translation 71 3,0 1,4E-16 CC ribosome 62 3,4 3,7E-20 CC ribonucleoprotein complex 65 2,3 1,6E-10 CC intracellular non-membrane-bounded organelle 82 1,7 6,6E-6 CC non-membrane-bounded organelle 82 1,7 6,6E-6 CC ribosomal subunit 11 3,8 6,3E-3 CC small ribosomal subunit 7 4,6 4,8E-2 MF structural constituent of ribosome 60 4,3 8,5E-24 MF structural molecule activity 72 3,4 2,9E-21 KEGG Ribosome 60 5,4 5,2E-35 [131]Open in a new tab (BP) Biological process, (MF) molecular function, (CC) cellular component, (KE) KEGG pathway, (#) number of transcripts, (FE) fold enrichment, (p) Benjamini corrected p-value. GO enrichment analysis was performed at the slim classification of terms. Relative transcriptome size To estimate the relative size of the PAA transcriptome vs the PA transcriptome we used livers from both hybrid genomotypes and analyzed six target genes (rpl8, rpl35, actb2, pabpc1a, eef1a and rpsa) to obtain genome-normalized expression estimates through a qRT-PCR assay and transcriptome-normalized expression estimates from the RNA-Seq assay. In order to estimate relative expression level per genome, we used a qRT-PCR strategy devised by [[132]23] that normalizes cDNA amplification to genomic DNA (gDNA) amplification. The simultaneous RNA and gDNA extraction from the same cells preserves the in vivo RNA/gDNA ratios. This allowed us to normalize gene expression (cDNA amplification) to genome copy number (gDNA amplification), which directly gives the transcript abundance per genome. With this approach we quantified the expression per genome in the allotriploid (PAA) S. alburnoides relatively to its diploid counterpart (PA) for the six target genes ([133]Table 6). Table 6. Data and calculations used for estimating relative triploid vs diploid hybrid transcriptome size. Gene [134]^1 Transcripts/genome (qRT-PCR; N = 5) [135]^2 Transcripts/cell (qRT-PCR; N = 5) [136]^3 Transcripts/transcriptome (RPKM; N = 1) [137]^4 Transcriptome size PAA/PA SD PAA/PA PAA/PA PAA/PA rpl8 0,8 0,1 1,1 1,5 0,8 eef1a 0,8 0,1 1,1 1,3 0,9 actb2 0,6 0,2 0,9 0,6 1,6 rpsa 0,9 0,3 1,3 1,5 0,9 pabpc1a 0,8 0,2 1,2 1,0 1,2 rpl35 0,8 0,2 1,2 1,5 0,8 media 0,8 1,2 1,0 1,0 SD 0,1 0,1 0,3 0,3 [138]Open in a new tab ^1-Expression quantified by qRT-PCR, using total nucleic acid as the template for reverse transcription and normalization to genome copy number. ^2- Because PAA has 3 genomes per cell, meaning 1,5x the amount of genomes per diploid cell, the transcripts/cell values for PAA/PA are equal to 1,5x the values for transcripts/genome. ^3-For each target gene RPKM values were derived from the liver RNA-Seq data set. ^4-Trascriptome size is determined by dividing “transcripts/cell” by “transcripts/transcriptome”. Because PAA has three copies of each gene, for every two copies in PA diploids we calculated expression per cell in PAA relative to PA as 1,5x the relative expression per genome ([139]Table 6). The transcript abundance per transcriptome (RPKM values) for all six target genes were searched within the liver data set and the relative expression per transcriptome between liv-PAA and liv-PA was calculated ([140]Table 6, [141]Fig. 5). With this approach we obtained six independent estimates of the size of the triploid transcriptome relative to the diploid hybrid transcriptome. As expected, there was variation among individual gene estimates, but on average the PAA transcriptome was equal in size to the PA transcriptome. With these data we rejected the null hypothesis that the triploid hybrid transcriptome was subjected to a genome-wide dosage effect, as it was not increased 1.5 fold relative to the diploid hybrid transcriptome (P<0,0001; One sample t-test). On the other hand, the null hypothesis that PAA transcriptome was equal in size (genome wide dosage compensation) to the PA transcriptome was not rejected (P = 0.8867; One sample t-test). Fig 5. S. alburnoides PAA transcriptome size relative to the transcriptome of the diploid PA. [142]Fig 5 [143]Open in a new tab Six individual gene-based estimates of relative transcriptome size and average estimate (±SD; N = 6) of the triploid hybrid transcriptome relative to the diploid hybrid transcriptome. DE represents the expected value if the PAA transcriptome experienced a genome-wide dosage effect. DC represents the expected value if PAA transcriptome experienced genome-wide dosage compensation. Dashed blue line represents the tendency curve of the sactterplot. The blue arrow indicates a possible outlier in the data set but that hypothesis was rejected (Tukey's Method). Discussion Transcriptome size and overall expression To directly infer global expression dosage responses from the RNA-Seq transcript profiling experiments the transcriptomes compared must be of equal size [[144]23]. When comparisons are made between different ploidy levels, intuitively this assumption is flawed due to the real genome-wide differences in gene dosage. However, in the S. alburnoides case, the hypothesis put forward in [[145]12] and explored in the present work, is that there is a common “diploid” state of genic activity between diploid and triploid S. alburnoides individuals. Following the method described and implemented in [[146]23], that couples transcript profiling data with a genome normalized qRT-PCR assay we estimated the liver transcriptome size of the S. alburnoides triploid hybrid (PAA) relatively to the liver transcriptome size of the diploid S. alburnoides hybrid (PA). We showed that the two compared transcriptomes are fairly the same size. This validates the direct use of the RNA-Seq transcript profiling experiments to infer the “gene by gene” global pattern of expression dosage responses between diploid and triploid S. alburnoides. Moreover, it supports at an overall scale the previous conjecture of transcriptional equivalence between diploid and triploid S. alburnoides. Allopolyploid genome regulation One of the puzzling features of S. alburnoides complex is the extraordinary morphological similarity between PA and PAA individuals, which are even undistinguishable by morphometric characters [[147]36]. Conversely, PP and AA genomotypes are easily distinguishable from each other and from the hybrids. The stable phenotypic similarity between PA and PAA hybrids could be interpreted as indication of similar gene expression between them [[148]12, [149]37]. At its quantitative component, this hypothesis was corroborated on a small scale [[150]12, [151]38], when diploid and triploid individuals were found to have similar expression levels for a small analyzed gene set. Conversely, our genome wide approach shows that, despite many genes do not present a significant differential expression between diploid and triploid hybrids, PAA gene expression levels are not globally identical to the ones of PA, or to any of the other diploids. So, the morphological similarity between PAA allotriploids and PA allodiploids is not due to strictly conserved mRNA levels between these genomotypes. On the other hand, the similarities may be at the relative transcriptional contribution of each genome type (P and A alleles) to the overall expression level of each gene, regardless of the total expression level of each gene on each hybrid. Other alternative to the above stated, would be the occurrence of conserved protein levels and identities between 2n and 3n hybrids [[152]39]. So, the regulation between PA and PAA at the translational and post-translational levels and the allele specific contribution to the overall expression of each gene should be investigated in a near future. However, at the transcriptional level, though we ruled out the hypotheses of a global strict full “functional diploidization” of triploids, the majority of transcripts is less represented in PAA then it is in PA and AA genomotypes. One could expect that the level of gene expression would change proportionally to the ploidy variation [[153]40] and not the opposite. But, as an allopolyploid, S. alburnoides combines “ploidy rise” with hybridization and so, the effects associated to hybridization have to be considered. The pre-existing differences between the expression profiles of the parental genomotypes and also unpredictable effects of the complexities of an inter-genomic gene expression regulation are manifesting in the expression profiles of the S. alburnoides hybrids. We observed that the comparisons of both hybrids with PP are consistent with the comparative profile of AA vs PP, where the vast majority of transcripts are less represented in PP. Additionally, we observe a higher amount of transcripts that are less expressed in PAA then in PA. The observed “overcompensation” supports that between triploid and diploid hybrids, dosage compensation occurs, but it is not accurate. We can speculate that in PAA allotriploids the unbalanced genomic contribution and/or a faulty interaction between the different genomes might have to be compensated through allele specific expression regulation [[154]12], and it may well be that the expression level of each allele might be quite variable and adaptable. So, as in plants and invertebrates, also in the S. alburnoides allopolyploid complex there is disruption, due to hybridization and anorthoploidy (odd ploidy) [[155]41] of the quantitative assumptions of additivity, that are usually valid in the case of most homogenomic diploids and autopolyploids [[156]9]. Another hypothesis we can put forward to explain the absence of a positive correlation between copy number and mRNA amount is that the expression level profiles of PAA and PA genomotypes, may be influenced by differences in cell size, which are expected to exist between individuals of different ploidy levels [[157]42]. This hypothesis was not yet explored within the S. alburnoides complex, and is barely investigated in other organisms [[158]43]. Additivity In any hybrid, gene expression is under the influence of divergent genomes, so new qualitative and quantitative gene expression networks are expectedly established, resulting from the interactions of the divergent alleles. In S. alburnoides, we observed that for most genes this expression level divergence from the parental genomotypes was not achieved by averaging the parental allelic contributions. The analysis for additive expression showed that this occurs only for a subset of genes, both in diploid and triploid hybrids of S. alburnoides ([159]Fig. 4). The occurrence of non-additive gene expression in hybrids and allopolyploids has been extensively reported in plants, for example in maize [[160]44], rice [[161]45] and Arabidopsis [[162]46]. Also in animals as oysters [[163]47] and Drosophila [[164]13] the topic was explored and conclusions extended to the animal kingdom. However, for vertebrates, genome wide quantitative gene expression studies were missing. Interestingly, the results of the available studies on plants and invertebrates are not all coincident. Several showed that the majority of genes are expressed additively [[165]48, [166]49], while other studies found higher levels of nonadditive expression [[167]13, [168]44]. The causes for these apparent discrepancies are not yet clear [[169]50]. Anyway, the considerable body of data gathered so far shows a possible positive correlation between size of the fraction of the nonadditively expressed genes and the magnitude of heterotic response. Also, increasing the number of diverse genome copies in an allopolyploid, usually leads to increasingly greater magnitudes of heterosis [[170]51]. Nevertheless, there is no consensus about the amount or identity of the nonadditively expressed genes [[171]50, [172]52]. Concerning the S. alburnoides complex, the phenomenon of heterosis has been barely addressed, except for a few comparisons of growth and reproductive traits between diploid and triploid hybrids [[173]53] and a comparative morphometric study [[174]36], where the AA and PP parental genomotypes were not included. From these studies, mostly non-significant differences between diploid and triploid hybrids have been found, except for a marginal longevity increase in triploids. Yet, the PAA genomotype is far more frequent in the natural populations than PA. So, if we consider the number of non-additively expressed transcripts as an indicator of heterosis, PAA S. alburnoides are favored since the amount of additively expressed transcripts is higher than in PA genomotype. Also, according the Bateson-Dobzhansky-Muller Model a lower fitness in hybrids might result from a bad interaction between divergent genomes due to the differential capacity of interaction between their proteins [[175]54]. In this light, allopolyploid individuals have better chances to evade this weakness. Allopolyploids have more options to non-additively combine allele-specific regulated expressions and so, have higher chances to achieve an optimized and more functional expression pattern than one achieved merely additively. Dosage responses across the S. alburnoides hybrid transcriptome—Expression level regulation In our genome wide prospection for gene expression dosage compensation, a genome wide regulatory mechanism that brings all genic activity of triploids to the diploid state, in a “strict diploidization” of triploids was not seen. However, in both juveniles and liver data sets, we found a considerable fraction of all transcripts (29% in juveniles and 15% in livers) that really suffices the most stringent parameter definition for “fully dosage compensated”. So, “diploidization” might not mean that all genes are down regulated to the diploid level, but only those that need to be “diploidized” in RNA amount to function correctly. Also, a considerable part of all transcripts (around half) do not have a significantly different representation between diploid and triploid hybrids (SE group), and from the SE transcripts that do not belong to class I, the majority are lower represented in PAA than in PA (class III). In fact, there is only a very small percentage of triploid transcripts that are represented strictly proportionally to gene dosage (class II) or even higher (class IV). Thus, our results show a significant “diploidization” in triploid PAA genomotype, but not as a strictly regulated and fine-tuned phenomenon. That is consistent with a switch-like way to regulate the mRNA concentrations, where transcription is turned “on” or “off”, regardless of exact concentrations [[176]55], but within boundaries of similar expression. The mechanisms involved in this regulation of gene expression in S. alburnoides triploids vs diploids are still elusive, but miRNA’s were recently pointed [[177]37] as significant regulators for the functional stability of triploidy in the S. alburnoides complex [[178]37]. The quantitative PAA/PA gene expression analysis in juveniles and livers are conceptually coincidental so, most conclusions should be valid both at the single tissue gene expression regulation as at the full-body scale. There is also convergence between our study and the ones the ones that preceded it. For the same S. alburnoides genes analyzed in [[179]12] and [[180]38] we inspected the PAA/PA expression obtained in our study. All were place inside the SE group (data not shown), except amh and dmrt1 (that are not expected to fit with the dataset). Functional context of the S. alburnoides genome regulation In both, juveniles and liver data sets, the SE group is enriched in terms related to the basal biological maintenance of the cells (eg. metabolism), and mostly in ribosome-linked terms. But regardless of the statistical significance for the enrichment (corrected p-value < 0.05), the fold enrichment of each term is approximately 1. So, in the context of global expression, this enrichment may not be meaningful [[181]35] or may indicate that between triploid and diploid hybrids genes with expression within boundaries of similar RNA amount can occur at any quantity without compromising there function. This expression level flexibility may be a strong contributor to overcome the allopolyploid “genomic shock”. Also it gives an immediate evolutionary advantage to the (allo)polyploids. Within the SE group, we looked for functional enrichment in the strictly dosage compensated class (I) but no significant enrichment was found. That reinforces the previous idea, yet the very small class II, composed of genes with expression strictly proportional to gene dosage, presents a significant enrichment in terms associated to multi-subunit complexes, namely ribosomes. However, the detection of dosage sensitivity in genes whose products are part of multi-subunit complexes is in accordance with the gene balance hypothesis [[182]40, [183]52], which posits that changes in the stoichiometry of the individual subunits would be deleterious. The functional enrichment of the DE gene group may shed some light on phenotypic differences between PAA and PA genomotypes. We verified that in both, the liver and juvenile data sets the GO term enrichment is mostly associated to cell surface and to processes intimately linked to the cells membrane. Previously, it was described the same enrichment for differential expression between budding yeasts (S. cerevisiae) with different ploidy levels [[184]42]. The authors suggested that the differential gene expression observed between ploidy levels was due to cell size and geometry differences between yeasts of different ploidies and not directly to the gene dosage increase [[185]42]. In addition to yeasts, in many other polyploid organisms, including fish [[186]56, [187]57, [188]58], the nucleus and cell volumes expand proportionally to accommodate the enlarged genome of polyploid cells [[189]56]. However, there is a reduction in surface area relative to cell volume [[190]43]. Consequently, the interactions between surface and cytoplasmic signaling, transport of metabolites and the cellular component organization are expected to be affected. Concluding Remarks To our knowledge, this is the first study that globally quantitatively and comparatively profiled by mRNA-seq the transcriptomes of diploid and triploid forms of an allopolyploid vertebrate organism. Our results point towards a certain level of flexibility of expression within a range of mRNA amounts per locus between diploid and triploid hybrids of the S. alburnoides complex. For these allotriploids, gene expression levels are similar to the ones of allodiploids but are neither genome wide strictly diploidized nor strictly proportional to gene dosage. The occurrence of a non-fine-tuned expression regulation at the transcription level might be a key factor for the evolutionary success of allopolyploids. Similar to nucleotide sequence variation, variability at the mRNA expression levels may be also a source for regulatory adaptation to selective pressures. Moreover, the evolution of new functions and subfunctionalizations from redundant genes, that are well known to occur in the allopolyploid situation, are probably facilitated in a context of expression dosage plasticity. In conclusion, this work illustrates how a successful allopolyploid vertebrate transcriptionally deals with the genomic stress derived from hybridization and polyploidy and may shed some light on important features of genome evolution in allopolyploids. Supporting Information S1 Table. Output Statistics of sequencing for juveniles’ data set. (DOCX) [191]Click here for additional data file.^ (17.8KB, docx) S2 Table. Sequencing and mapping statistics for livers data set. (DOCX) [192]Click here for additional data file.^ (17.7KB, docx) S3 Table. Statistics of assembly quality for juveniles’ data set. (DOCX) [193]Click here for additional data file.^ (18.7KB, docx) S4 Table. Summary of annotation results for juveniles’ data set (cds information). (DOCX) [194]Click here for additional data file.^ (17.6KB, docx) S5 Table. qRT-PCR primers. (XLSX) [195]Click here for additional data file.^ (10.9KB, xlsx) S6 Table. Similarly expressed transcripts (SE) between each pair of 3n vs 2n S. alburnoides genomotypes, both in juveniles and liver data sets. (DOCX) [196]Click here for additional data file.^ (19.3KB, docx) S7 Table. Significantly differently expressed (DE) transcripts between each pair of 3n vs 2n S. alburnoides genomotype, both in juveniles and liver data sets. (DOCX) [197]Click here for additional data file.^ (18.2KB, docx) Acknowledgments