Abstract
Cassava, Manihot esculenta Crantz, is the main raw material used in
starch production in China. However, due to the small planting scale
and high demand in China, large-scale imports are needed. To improve
cassava yield and to meet China’s needs, we examine the agronomic
traits of root weight, root number, and root length-to-width ratio per
plant. By constructing two semi-sibling genetic maps and using years of
data for quantitative trait locus (QTL) localization, we compare two
population mapping results to screen co-located 15 QTLs, and
transcriptome analysis to explore candidate genes related to these
traits. We found OsWRKY78 in rice to be homologous to candidate gene
Manes.03G051300, which can regulate rice stem elongation and seed size,
and Manes.18G023500 to be homologous to MeMYB108, which can reduce leaf
shedding and regulate cassava biomass. Through QTL mapping, we identify
key genes related to yield traits that can be used in cassava molecular
breeding to improve cassava yield.
Supplementary Information
The online version contains supplementary material available at
10.1186/s12870-025-06278-3.
Keywords: Manihot esculenta, Cassava root, Linkage map, Quantitative
trait locus
Introduction
Cassava is an important energy plant and one of six major global food
crops [[44]1]. It is widely grown in tropical regions of Africa, Asia,
and Latin America [[45]2]. Characterized by its adaptability, fast
growth and high yield, cassava is an important food crop, especially in
the tropics and subtropics, and is of high economic and social value
because of its ability to grow in less-fertile soils [[46]3] and
because it is not limited by season or moisture. Cassava starch is
suitable for producing high-quality modified starch because of its
large particles and high transparency, and it is necessary for modified
starch in some industries. Cassava tubers are more valuable because
they contain large amounts of starch, and small amounts of protein
among other nutrients. These tubers can be processed into a variety of
food products, used as animal feed, for brewing beer, and for producing
alcohol for bioethanol [[47]4]. For some African countries,
approximately 25% of energy acquisition is obtained from cassava
[[48]5]. However, despite its importance, the average cassava yield in
Africa has not significantly increased since 1961.
Quantitative trait locus (QTL) localization is based on genetic linkage
analysis, using linkage relationships between molecular markers and
target traits to identify genomic regions associated with those traits
[[49]6]. With development of high-throughput sequencing technology the
drawing of fine genetic maps for cassava (Manihot esculenta Crantz) has
considerably progressed, with multiple genetic linkage maps having been
constructed using RFLP (Restriction Fragment Length Polymorphism), SSR
(simple sequence repeat), and SNP (single nucleotide polymorphism)
markers. These maps cover 18 linkage groups, with an average molecular
marker distance ranging 4.51–4.81, with linkage map lengths ranging
1707.9–2412.0 centimorgans (cM) [[50]7–[51]9]. Aided by these maps,
significant progress has also been made in identifying Quantitative
Trait Loci (QTL) for various cassava traits. Boonchanawiwat et al.
searched for QTL that affected the height of cassava plants and the
height of the first branch, by performing linkage map analysis in
self-pollinated progeny [[52]10]. Nang et al. performed QTL analysis on
a reciprocal cross population and reported plant and first branch
heights to be controlled by multiple genes [[53]11]. Okogbenin et al.,
investigating the genetic basis of early root bulking in cassava by
genetic analysis and QTL localization in a population of
non-self-crossing parents, reported early root bulking to be a complex
trait controlled by multiple genes [[54]12]. Akinbo et al. identified
several QTLs that affected the protein content of cassava roots through
QTL analysis in a backcross population [[55]13]. Zou et al. constructed
an amplified-fragment single nucleotide polymorphism and methylation
(AFSM) QTL map by crossing two non-inbred parents, KU50 and SC124 (KS
population), to obtain 186 cassava populations, and identified 260
candidate QTL genes for cold stress and 301 candidate QTL genes for
storing root quality and yield [[56]14, [57]15]. Okogbenin, and Ewa’s
studies identified several QTLs associated with cassava plant size and
yield traits [[58]16–[59]18]. Overall, progress has been made in the
study of QTL localization for root number in cassava, but further
research is needed to improve understanding of genetic mechanisms
controlling root number, and to use this knowledge to improve cassava
varieties.
Hyper-seq is a novel, effective and flexible marker-assisted selection
method for genotyping that is based on the polymerase chain reaction
(PCR). Compared with commonly used whole genome sequencing methods such
as restriction site-associated DNA tag sequencing fragments generated
by, for example, type IIB restriction endonucleases (2b-RAD) and AFSM,
Hyper-seq is more efficient and less costly. Its greatest advantage is
that it can complete genome simplification, barcode addition, and
simplified genome sequence amplification in one step. A Hyper-seq
library is constructed with specially designed primers and DNA (for PCR
amplification) or leaf (for direct PCR amplification) templates,
without the need for restriction enzyme digestion and barcode
connection steps [[60]19].
In the present research, two half sibling populations were obtained by
crossbreeding cassava cultivar SC205(♀) with HB60(♂) and 18R(♂),
respectively. To study the genetic mechanisms and key genes affecting
cassava yield traits, in order to improve cassava breeding and increase
cassava yield. Three traits related to cassava yield were selected for
phenotype analysis, and gene typing was performed using target sequence
method. The phenotype and genotype data were integrated for association
mapping analysis to identify QTLs closely related to the infected area,
which were validated through gene expression analysis.
Materials and methods
Mapping populations and DNA extraction
We crossed cassava cultivar SC205 with cultivars HB60 and 18R to
produce two F1 populations, with 259 offspring from SC205 × 18R (A1
population) and 233 offspring from SC205 × HB60 (A2 population)
(Supplementary Table S1). Populations were planted on two plots of land
in Chengmai Meiyu Village, Hainan Province, in April of 2021 and 2022.
Each planting was replicated twice, with two rows per replicate, and
six plants per row. After 6 months, young leaf samples were collected,
from which DNA was extracted using the CTAB method, and evaluated using
nanodrop [[61]20]. After 10 months of planting, cassava root samples
were collected for RNA extraction for transcriptome sequencing.
Hyper-seq library preparation and sequencing
The DNA concentration of all samples was homogenized to 150–200 ng /µL,
and the reaction volume was increased to 20 µL by adding 1 µL of DNA
sample, mix, F’ primer, and ddH[2]O for PCR amplification reaction;
obtain PCR products, purify PCR products using TIANGEN(DP214) reagent
kits, perform double end sequencing using MGI sequencer (DNBSEQ-T7),
and upload data to BGI raw data were uploaded to BGI.
Sequence analyses and genotyping
Raw data quality control was performed using FastQC(fastqc -o
outputfile inputfile) software [[62]21]. Original data were divided
into each sample using Barcode and fastq-multx(fastq-multx -B
barcode.txt -m 1 -b itaq.1.fastq itaq.2.fastq -o %.R1.fastq -o
%.R2.fastq) software ([63]https://github.com/brwnj/fastq-multx). The
AM5608.1 version genome was selected as a reference and BWA [[64]22]
(bwa index -a bwtsw cassava.fa) was used to establish a reference
genome index to generate BAM files. Variant calling was performed using
SAMtools and BCFtools [[65]23, [66]24], and high-quality SNPs were
filtered from VCF files using PLINK [[67]25] software (–geno 0.1 –maf
0.05 –hwe 0.0001). SNP marker density was visualized using the CMplot
package in R, and variant sites were annotated using SNPEff [[68]26].
Transcriptome analyses and genotyping
Raw transcriptome data were analyzed using FastQC, followed by adapter
removal, base modification, trimming, and filtering of low-quality
sequences using Trimmomatic(trimmomatic PE -phred33 input_forward.fq.gz
input_reverse.fq.gz) [[69]27]. Hisat2 [[70]28] was used to construct a
reference genome index of the AM5608.1 version genome and to align
paired-end clean reads to the reference genome. FeatureCounts [[71]29]
was used to accurately count reads; reads of all samples were then
combined and exported as a single RNA-Seq_Practice_countstable file.
Expression levels of each gene in each sample were quantified using
FPKM and TPM calculation formulae. Gene expression levels were obtained
for different samples.
Linkage map construction
A linkage group refers to a set of genes or markers that are located on
the same chromosome and are genetically linked to each other
[[72]30]. We constructed a genetic linkage map of common parents from
the 259 offspring dataset of Group A1 and the 233 offspring dataset of
Group A2, and compared the differential loci between parents to obtain
the genotype of parent specific variant loci. Firstly, we used VCFtools
[[73]31] to extract the shared specific variant loci between offspring
and parents. PLINK [[74]24] (– bfile myfile – geno 0.05) was then used
to filter mutation sites, and PLINK (– file myfile – recodeA) was used
to encode the offspring genotype data as 012. Then import the markers
into QTL IciMapping software [[75]32] and use it to construct a linkage
map. Initially, markers were grouped into linkage groups based on a LOD
score threshold of over 3. Within each group, markers were ordered
using the nnTwoOpt algorithm, which is a hybrid approach combining the
Nearest Neighbor Algorithm and the Two-opt Algorithm [[76]33], and
genetic distances were calculated using the Kosambi mapping function
[[77]34]. The map was further refined by selecting a window size of 10
markers and adjusting the marker data based on the sum of adjacent
recombination frequencies. This adjustment ensured the shortest
possible genetic distances between markers. During the refinement
process, marker positions were iteratively optimized to minimize map
expansion and distortion, thereby enhancing the accuracy and stability
of the final linkage map. The final linkage map, including marker
names, genetic positions, and linkage group assignments, was exported
from QTL IciMapping. The map was then visualized using the qtl package
in R [[78]35].
Phenotypic assessment analyses
Phenotypic data were measured in the field 10 months after planting,
when three plants with similar growth vitality were selected, RW, RN,
and RLW of each was measured. Finally, the phenotype field raw data for
2022 and 2023 were obtained. Then, outlier phenotypic data were removed
according to the 3 σ rule (values outside μ − 3 σ and μ + 3 σ intervals
are outliers, where μ is the sample data mean and σ is the data
standard deviation).
After handling the outliers, we choose a linear mixed model approach to
analyze all phenotypic traits. The best linear unbiased estimation
(BLUE) of each individual was used to perform QTL mapping. In order to
obtain joint localization results of different populations with data
between single year and multiple years, Model 1 was used to obtain
phenotype data of different populations in 2022 and 2023, and Model 2
was used to obtain merged phenotype data of 22–23.
[MATH: Yijl=μ+βi+γj
+ϵl+e
ijl
:MATH]
1
[MATH: Yijkl=μ+
mo>βi+γj+δk+
ϵjl+eijkl :MATH]
2
where Yijl is the phenotypic observation related to line i, location j,
repetition l, Y[ijkl] is the phenotypic observation related to line i,
location j, year k repetition l,μ is the global intercept. β[i] is the
fixed effect of the i-th variety. γ[j] is the random effect at position
j. δ[k] is the random effect of the kth year. ϵ[l] is a random effect
of l repeated combinations. ϵ [jl] is a random effect of j positions
and l repeated combinations. Eijl and e[ijkl] is error term,
representing random errors in observations.
Lme4 is an R package that can generally be used to solve mixed models.
It supports linear mixed models, generalized linear mixed models, and
nonlinear mixed models. We selected lme4 package based on Model 3 to
calculate the heritability of different traits in two populations.
Spearman’s coefficients were used to evaluate correlations between
traits, and Shapiro–Wilk tests were used to evaluate normality,
kurtosis, and skewness (α = 0.05).
[MATH:
h2=VgVg+VGL/L+VGY/Y+VGLY/L∗Y+
Ve/Y∗L∗R :MATH]
3
where V[g] is the variance component of cultivar, V[GL] is the
interaction variance component between cultivar and location, V[GY] is
the variance component between cultivar and year, and V[e] is the
residual variance component. V[GLY] is the interaction variance
component between cultivar and location and repetition, and Ve is the
residual variance component, Y, L and R are the the number of year,
location, and repetition respectively.
QTL Detection and candidate gene analysis
MapQTL5 software [[79]36] was used for QTL calculation. Linkage maps
and phenotype information from 2022 and 2023 were used for Multiple-QTL
Mapping (MQM Mapping) to map QTL and estimate their effects [[80]37].
The LOD score for significant QTL was determined through a test
analysis (1000 permutations, total error level 5%). Candidate loci with
LOD > 2 were screened, by collating the loci mapped in the two
populations; loci within a 50 Kb interval upstream and downstream of
the candidate loci that are present in both populations and in three or
more data sets were selected as co-located loci. Collation of the
co-located loci to determine the co-located QTL, and estimation of the
contribution of each identified QTL to the total phenotypic variance
through analysis of variance. Loci and QTL were organized based on
population and year information (e.g., 22-A1), and the final QTL were
determined. The nomenclature for identified QTL involves a trait
abbreviation preceded by “q,” the relevant chromosome name, and a
numerical identifier (e.g., qRW—1a). A confidence interval of 1 Mb
flanking the QTL identification region was used to screen candidate
genes.
Candidate genes were extracted from the QTL interval and functionally
annotated using annotation information for the cassava AM560.8.1
genome. Important genes were selected for analysis. Gene numbers were
compared with annotation files of the AM560v8.1 reference genome to
obtain corresponding GO numbers for each gene. Candidate gene numbers
and GO numbers were input into WEGO ([81]http://wego.genomics.org.cn/)
for GO functional annotation, and gene sequences were extracted using
SeqKit subseq [[82]38]. The input file was organized and imported into
KEGG ([83]http://kobas.cbi.pku.edu.cn/genelist/) for pathway enrichment
analysis. An enrichment plot was generated using R.
Results
SNP marker density and annotation
Approximately 406 Gb of sequencing data were obtained, with an average
sequencing depth of 812 Mb per sample (the cassava AM560 8.1 version
genome size is 617 Mb). In total, 1,215,893,952 high-quality reads were
obtained, with an average of 2,461,324 reads per sample. Obtained two
mutation site vcf files for the (2,417,535) and A2 populations
(1,376,544). The A1 population had 161,361 unique variant sites shared
with the SC205 parent (the A1 maternal parent, abbreviated A1_SC205)
and the A2 population had 121,556 unique variant sites shared with the
SC205 parent (the A2 maternal parent, abbreviated A2_SC205). An SNP
marker density map indicated that variant sites were densely
distributed across the genome (Fig. [84]1A, B). Obtained variant sites
were annotated to better understand their position on the genome;
annotation information and SNP statistics by type and region are
provided in Supplementary Table S2.
Fig. 1.
[85]Fig. 1
[86]Open in a new tab
A A1 population SNP density map, with an average of 8610.2 mutation
sites per chromosome; B A2 population SNP density map, with an average
of 6478.9 mutation sites per chromosome; C Collinearity analysis of two
group linkage maps (A1 with an average collinearity of 30.37%, and peak
(38.01%) collinearity in LG16; A2 with an average collinearity of
26.69% and peak (39.40%) collinearity in LG18)
By filtering and separating sites and redundant sites, 5913 were
obtained in population A1 and 6648 in population A2 for construction of
high-density genetic maps of population maternal parents, and 18
linkage groups were identified finally (consistent with the number of
haploid chromosomes in cassava). The total map distance of the A1_SC205
genetic map was 2492.62 cM, the average distance between markers was
0.43 cM, the length of 18 linkage groups ranged 82.42–219.53 cM, and
the longest linkage group was LG01. The total map distance of the
A2_SC205 genetic map was 2445.17 cM, the average distance between
markers was 0.37 cM, the length of linkage groups ranged
72.10–206.95 cM, and the longest linkage group was LG01 (Supplementary
Table S3). Collinearity in linkage map markers between populations was
observed by counting common loci, of which there were 1792; for A1, the
average collinearity rate was 30.37%, with a maximum of 38.01% (LG16);
for A2 the average collinear rate was 26.69%, with a maximum of 39.40%
(LG18) (Supplementary Table S3). TBtools was used to draw chromosome
collinearity for each population (Fig. [87]1C). The collinearity
relationship between linkage maps of each population (Supplementary
Fig. 1) revealed a certain degree of collinearity on each chromosome,
laying a foundation for subsequent co-localization.
Statistical analysis and correlation of phenotypic data
Summary data for parental population phenotypic mean values,
statistical data for different years, and combined Best Linear Unbiased
Estimators (BLUE) values for correlated traits between populations, are
presented in Supplementary Table S4. A combined analysis of populations
revealed a coefficient of variation (CV) for RW to range 0.413–0.656,
and heritability values ranged 0.16–0.23. For RN, the CV ranged
0.244–0.342, and heritability values ranged 0.48–0.5. For RLW, the CV
ranged 0.207–0.283, and heritability values ranged 0.48–0.59
(Table [88]1). In both populations, RW was most variable and had
relatively lower heritability, while RN and RLW had smaller trait
variabilities and higher stability and heritability. This suggests that
data stability may influence the magnitude of trait heritability.
Additionally, the similarity in heritability between populations
indicates a relatively stable heritability.
Table 1.
Statistical analysis of population traits of hybrid progenies
population Trait Year Mean standarddeviation Range Skew Kurtosis CV
Heritability
A1 RN 2022 6.589 1.901 2.510–12.510 0.194 2.472 0.288 0.48
2023 5.353 1.824 0.720–10.720 0.538 3.165 0.341
22–23 5.738 1.620 1.57–11.57 0.079 3.064 0.282
RW 2022 2.614 1.560 0.120–9.310 0.936 4.158 0.597 0.16
2023 2.643 1.734 0.230–9.120 1.297 4.635 0.656
22–23 2.513 1.474 0.08–9.103 0.934 4.357 0.587
RLW 2022 6.788 1.556 3.290–12.330 0.635 3.516 0.229 0.58
2023 8.695 1.956 1.340–13.860 −0.385 3.754 0.225
22–23 7.608 1.572 0.935–12.812 −0.156 4.123 0.207
A2 RN 2022 6.585 2.204 1.00–14.160 0.419 3.318 0.335 0.53
2023 4.684 1.600 0.720–9.720 0.438 3.753 0.342
22–23 5.599 1.364 2.237–10.73 0.211 3.726 0.244
RW 2022 2.161 1.219 0.140–8.670 1.394 6.670 0.564 0.23
2023 1.563 0.791 0.060–5.090 1.010 4.917 0.506
22–23 1.789 0.740 0.081–4.327 0.074 2.866 0.413
RLW 2022 6.818 1.761 3.450–12.460 0.750 3.746 0.258 0.59
2023 7.432 2.106 0.830–14.660 0.406 4.172 0.283
22–23 7.100 1.537 2.876–11.604 0.225 2.835 0.217
[89]Open in a new tab
Overall skewness was close to 0 and kurtosis to 3, suggesting
approximate data normality and suitability for subsequent analysis.
Data also approximately follow a normal distribution (Fig. [90]2A).
Violin plots and trait error distribution plots (Fig. [91]2B) reveal
minimal overall variation in phenotype data for each population over
the two years, indicating low data error and high stability.
Fig. 2.
[92]Fig. 2
[93]Open in a new tab
A Violin plots of trait distribution; B Error plots of trait
distribution; and Trait correlation coefficients for C) Group A1 and D)
Group A2
R software was used to compute trait correlation coefficients and
correlations between populations, and to generate a correlation heatmap
(Fig. [94]2C, D). All three traits correlated positively in both
populations, with RN strongly and positively correlated with RW.
Boxplots revealed similar overall trends in the data for RN and RW.
QTL localization
QTL mapping results reveal 188 QTLs to be located with RLW in different
years between populations, along with 157 QTLs associated with RN and
213 QTLs related to RW. Quantities of QTLs mapped in different
populations and years are summarized in Table [95]2, and detailed in
Supplementary Table S5. To validate QTL effectiveness, positional
information of mapped loci in populations was compared; 3 co-located
RLW-related QTLs, 3 RN-related QTLs, and 9 RW-related QTLs were
identified (Table [96]3).
Table 2.
Number of loci for different traits in different populations and years
Year RLW RN RW Total
A1-22 20 9 17 46
A1-23 38 41 31 110
A1-22 + 23 32 13 19 64
A2-22 38 41 74 153
A2-23 35 21 14 70
A2-22 + 23 25 32 58 115
Total 188 157 213 558
[97]Open in a new tab
Table 3.
Coloucation QTLs mapped for root quantity traits in Cassava
QTL_Name Group Trait Position Population LOD %Expl
RLW-1 1 RLW 446,386–526,632 A1-22,A1-22 + 23,A2-23,A2-22 + 23 2.77 8
RLW-7 7 RLW 7,270,622–7,306,733 A1-22 + 23,A2-23,A2-22 + 23 3.65 14.8
RLW-9 9 RLW 56,068–75,706 A1-22 + 23,A2-22,A2-23,A2-22 + 23 2.5 8.8
RN-3 3 RN 4,862,573–4,922,415 A1-23,A1-22 + 23,A2-22,A2-23,A2-22 + 23
2.5 25.9
RN-16 16 RN 26,309,903–26,309,939 A1-22,A1-22 + 23,A2-23,A2-22 + 23
2.14 11.5
RN-17 17 RN 27,016,633–27,016,679 A1-23,A1-22 + 23,A2-22 + 23 2.85 31.2
RW-7 7 RW 3,472,827–3,497,170 A1-22,A1-22 + 23,A2-22,A2-23,A2-22 + 23
3.08 12.8
RW-9 9 RW 140,164–148,432 A1-22 + 23,A2-22,A2-23,A2-22 + 23 7.6 12.8
RW-10 10 RW 4,262,539–4,262,551 A1-22 + 23,A2-22,A2-23,A2-22 + 23 3.11
13.6
RW-11 11 RW 16,003,135–16,003,215 A1-23,A1-22 + 23,A2-22,A2-22 + 23
3.52 15.4
RW-12 12 RW 21,852,432–21,935,379 A1-23,A1-22 + 23,A2-22,A2-23 2.53 8.4
RW-15 15 RW 23,949,858–23,961,834
A1-22,A1-22 + 23,A2-22,A2-23,A2-22 + 23 4.14 10.2
RW-17 17 RW 31,297,009–31,368,543
A1-22,A1-22 + 23,A2-22,A2-23,A2-22 + 23 3.78 9.8
RW-18 18 RW 16,818,273–16,893,445 A1-22,A1-22 + 23,A2-23,A2-22 + 23
3.08 16.7
RW-18 18 RW 2,317,804–2,317,946 A1-22,A1-22 + 23,A2-22,A2-23,A2-22 + 23
3.34 30.5
[98]Open in a new tab
The 3 QTLs associated with RLW are positioned on chromosomes 1, 7, and
9. Notably, the QTL designated as qRLW-7 exhibits a LOD score exceeding
3. These QTLs contributes an average phenotypic effect of 10.53% and
has been consistently identified in both A1 and A2 populations. The 3
QTLs of RN are located on chromosomes 3, 16, and 17. Two of them have a
mapping interval of less 100 bp. The 9 QTLs of RW occur mainly on
chromosomes 7, 9, 17, and 18, with two QTLs located on chromosome 18
and three QTL positioning intervals less 150 bp. At the same time, QTLs
related to RLW and RN also occur on chromosomes 7 and 9. QRW-18a,
located around 2 Mb on chromosome 18, occurred in 5 sets of data from
populations A1 and A2, with an LOD value of 3.34.
To understand variation in LOD scores of QTLs mapped for different
traits in different years within populations, three co-located QTLs for
each trait were selected, and QTL interval and nearby LOD waveforms
were plotted (Fig. [99]3A). There was a consistent overall fluctuation
in LOD scores within mapped QTL intervals and their vicinity across
datasets. Two linkage maps were drawn using genetic distance and
markers, and QTL localization information was labeled based on
subsequent localization results (Fig. [100]3B, C).
Fig. 3.
[101]Fig. 3
[102]Open in a new tab
A LOD value fluctuation chart: for each trait, select the main three
QTLs for localization, and draw a LOD waveform of the localization
population based on physical location. The trend of LOD values in the
QTL interval is consistent, and LOD peaks in the QTL interval; B A1
linkage map; C A2 linkage map. QTL labels (B, C): yellow, RLW related
QTLs; purple, RW related QTLs; blue, RN related QTLs
Candidate gene analysis
Based on localization results, interval and upstream and downstream
genes were identified as candidate genes. GO and KEGG analyses were
performed to better understand gene functions. GO enrichment analysis
revealed these genes to be mainly enriched in “polar transport of
auxin,” “auxin transport,” “cytokinin metabolic process,” “ATP
biosynthetic process,” “auxin-activated signaling pathway,” and
“transcription coregulator activity” (Fig. [103]4A). KEGG analysis
revealed candidate genes to be mainly enriched in pathways such as
“protein photosynthesis proteins,” “starch and sucrose metabolism,” and
“plant hormone signal transduction” (Fig. [104]4B). Through screening
and identification, a total of 39 genes were obtained (Table [105]4),
including 5 candidate genes associated with RLW, 11 candidate genes
associated with RN, 23 candidate genes associated with RW, and 7 genes
containing 5 classes of transcription factors (WARY, BES 1, bZIP, ARF,
MYB). Remaining candidate genes included auxin-responsive factor, zinc
finger (C3HC4-type RING finger) family protein, ethylene sensor,
SNARE-like superfamily protein, plant glycogenin-like starch initiation
protein, and glucose-6-phosphate dehydrogenase 6. Expression levels of
candidate genes were calculated based on QTL localization candidate
genes and parental transcriptome data (Supplementary Table S6).
Depending on whether expression level changes were consistent with
trends in parental traits, the candidate genes were classified into two
categories, and expression level heat maps of localized genes were
plotted (Fig. [106]4C, D).
Fig. 4.
[107]Fig. 4
[108]Open in a new tab
A GO function enrichment (gene function mainly enriched in cellular
response to auxin stimuli, auxin polar transport, and other biological
processes); B KEGG pathway: Plant hormone signal transduction, Starch
and cross metabolism pathways, etc.; C Gene expression heatmap
inconsistent with trends in change of parental traits, where expression
changes of candidate genes in different parents are not significantly
related to changes in parental traits; D Gene expression heatmap
consistent with trends in change of parental traits, where expression
levels of candidate genes increase or decrease synchronously with the
size of parental traits in different parents
Table 4.
Candidate genes for root quantity traits in Cassava
Trait Chromosome Gene_ID Physical position(bp) Gene Function
RN 3 Manes.03G044400 3,875,045–3,881,657 Ankyrin repeat family protein
RN 3 Manes.03G047500 4,297,715–4,299,007 Winged-helix DNA-binding
transcription factor family protein
RN 3 Manes.03G051300 4,782,760–4,784,024 WRKY
RN 3 Manes.03G053600 5,181,953–5,190,433 Zn-dependent exopeptidases
superfamily protein
RN 3 Manes.03G055200 5,383,408–5,388,267 Ankyrin repeat family protein
RLW 7 Manes.07G056000 6,355,527–6,365,241 Tetratricopeptide repeat
(TPR)-like superfamily protein
RW 7 Manes.07G026100 2,819,441–2,822,633 Smg-4/UPF3 family protein
RW 7 Manes.07G027700 3,050,129–3,053,172 SNARE-like superfamily protein
RW 7 Manes.07G028400 3,120,266–3,131,581 Protein phosphatase 2C family
protein
RW 7 Manes.07G030500 3,349,025–3,352,522 Zinc finger (C3HC4-type RING
finger) family protein
RW 7 Manes.07G031300 3,431,995–3,432,570 Dehydroascorbate reductase 2
RLW 9 Manes.09G001500 149,564–163,612 DHHC-type zinc finger family
protein
RLW 9 Manes.09G000800 433,877–435,511 DNA/RNA helicase protein
RLW 9 Manes.09G000500 459,723–477,394 Ribosomal protein L32e
RLW 9 Manes.09G004500 1,145,489–1,158,661 bZIP
RW 9 Manes.09G001500 149,564–163,612 DHHC-type zinc finger family
protein
RW 10 Manes.10G043100 4,441,036–4,451,646 Plant-specific GATA-type zinc
finger transcription factor family protein
RW 10 Manes.10G045000 4,799,921–4,809,695 Clast3-related
RW 11 Manes.11G081500 15,096,310–15,098,673 Adaptin family protein
RW 11 Manes.11G090300 16,982,368–16,982,661 R-protein L3 B
RW 12 Manes.12G097300 21,899,393–21,904,241 Signal transduction
histidine kinase, hybrid-type, ethylene sensor
RW 15 Manes.15G183200 24,954,254–24,957,013 Triosephosphate isomerase
RN 17 Manes.17G061300 26,043,602–26,053,323 Plant glycogenin-like
starch initiation protein 3
RN 17 Manes.17G066200 26,623,698–26,625,728 WRKY
RN 17 Manes.17G066900 26,690,489–26,691,693 P-loop containing
nucleoside triphosphate hydrolases superfamily protein
RN 17 Manes.17G073900 27,398,788–27,404,693 Auxin efflux carrier family
protein
RN 17 Manes.17G074600 27,527,652–27,529,759 Glucose-6-phosphate
dehydrogenase 6
RN 17 Manes.17G077900 27,772,133–27,779,912 With no lysine (K) kinase 4
RW 17 Manes.17G100500 30,715,977–30,722,346 Transcriptional factor B3
family protein / auxin-responsive factor AUX/IAA-related
RW 17 Manes.17G102000 30,954,184–30,956,884 WRKY
RW 18 Manes.18G014800 1,564,349–1,568,553 Lysine histidine transporter
1
RW 18 Manes.18G015300 1,619,465–1,621,243 Ubiquitin-like superfamily
protein
RW 18 Manes.18G015400 1,638,932–1,645,840 ARF
RW 18 Manes.18G023500 2,256,111–2,258,325 MYB
RW 18 Manes.18G027800 2,392,755–2,401,894 Nodulin MtN3 family protein
RW 18 Manes.18G029600 2,509,358–2,512,562 BES1
RW 18 Manes.18G035600 3,171,581–3,174,946 Raffinose synthase family
protein
RW 18 Manes.18G036300 3,195,620–3,197,448 Octicosapeptide/Phox/Bem1p
family protein
[109]Open in a new tab
Discussion
Genetic map and QTL mapping precision
High-quality SNP markers were obtained using hyper-seq simplified
sequencing technology, and genetic maps were constructed for two
populations. Multiple cassava maps have been previously constructed
using markers such as RFLP and SSR, ranging 100–510 markers, map
lengths of 845.2–1420.3 cM, and with spacing of 5.6–17.92 cM/marker
[[110]27, [111]39–[112]42]. We increase marker density, and by
producing a final genetic map spacing of 0.37–0.43 cM/marker, we
provide a higher precision cassava genetic map. A total of 15 QTLs
related to traits were identified. There are two of the QTLs in RN have
a mapping interval of less than 100 bp, which has high accuracy and
helps to accurately locate candidate genes related to this trait. There
are three QTL in RW positioning intervals less than 150 bp, indicating
high overall positioning accuracy, and the average LOD value of QTLs
exceeds 3, indicating a high possibility of the existence of the
positioned QTLs. Meantime, Shengkui [[113]43] used GWAS to locate
multiple significant loci for cassava agronomic traits, and positioning
results of qRN-17a and qRW-18a are consistent. Partly consistent with
our results, Ewa [[114]16] reported QTLs related to RW traits to be
located on chromosomes 3, 4, and 7, and those related to RN traits to
be located on chromosomes 3 and 7.while QTLs related to RN traits were
located on chromosomes 3 and 7, consistent with some of the results of
this study.
Functional analysis of candidate genes
We selected three traits related to cassava yield (RN, RW, and RLW) for
QTL localization analysis based on a high-precision cassava genetic
map. There is a high positive correlation between these traits; RN、RW
and RLW can effectively evaluate crop yield and tuber shape, which is
also an important factor to affect yield. We also report
Manes.09G001500 to be associated with RN and RW traits; this gene
belongs to the DHHC type zinc finger family protein which affects plant
growth and development. OsDHHC01 increases the number of tillers and
grain yield in rice and oilseed rape (Brassica napus) [[115]44,
[116]45], and OsDHHC13 positively regulates the oxidative stress
response [[117]44]. DHHC type zinc finger protein (DHHC) and S-acylated
protein mainly serve as substrates for S-acyltransferase. S-acylation,
also known as palmitoylation, is an important protein lipid
modification that plays an important role in plant growth, development,
and stress response [[118]46]. OsDHHC30 participates in regulation of
salt tolerance in rice through S-acylation modification of OsCBL2/3
[[119]47]. We speculate that this gene may affect cassava growth and
development by regulating its stress resistance, thereby affecting the
weight and quantity of root tubers. Further research is needed to
identify how this gene affects root tuber traits.
Manes.03G051300, Manes.17G066200, and Manes.17G102000 belong to the
WRKY family, in which Manes.03G051300 and Manes.17G066200 are
associated with RN traits and Manes.17G102000 is associated with RW.
This family is an important regulator of growth and development in
higher plants. The WRKY transcription factor OsWRKY78 regulates rice
stem elongation and seed size [[120]48], and SIWRKY23 expressed mainly
in tomato roots is associated with seed size [[121]49]. In cassava, the
MeWRKY20 gene promotes ABA accumulation [[122]50], and high ABA levels
can promote radial expansion of cortical cells and reduce root
penetration ability [[123]51]. Manes.18G015400 and Manes.17G100500
belong to the AFR transcription factor family, and Manes.17G073900
belongs to the Auxin efflux carrier family protein, auxin, Aux)
Indole-3-acetic acid, IAA is almost involved in all processes of plant
growth and development; most of these processes are regulated by gene
expression [[124]52]. Auxin response factor (ARF) is a family of
transcription factors that regulates expression of auxin responsive
genes [[125]53, [126]54]. ARF can specifically interact with auxin
response elements in the promoter region of auxin responsive genes, and
AuxRE binds to “TGTCTC” to activate or inhibit gene expression
[[127]55]. The auxin signaling pathway mediated by ARF transcription
affects plant growth and development by regulating cell division,
elongation, and differentiation [[128]56, [129]57]. miRNA160 and
miRNA167 are involved in Arabidopsis AtARF6, regulating adventitious
root formation; Arabidopsis AtARF17 is also involved in this process
[[130]58, [131]59]. The SlARF2 gene in tomato regulates lateral root
formation [[132]60]. DnARF11 is specifically expressed in the roots of
Dendrobium officinale, indicating its importance in root system
establishment [[133]61]. We speculate that these genes affect RN and RW
by regulating root tuber development.
The candidate gene Manes.18G023500 belongs to the MYB transcription
factor family, one of the largest transcription factor families in
plants with important roles in physiological and biochemical processes
in growth and development. MeMYB108 can reduce leaf shedding and
regulate cassava biomass [[134]62]. In Arabidopsis, AtMYB3R1 and
AtMYB3R4 (AtMYB3R1/4) act as transcriptional activation factors,
expressed in proliferating tissue, regulating the cell cycle process,
and affecting circadian rhythms [[135]63–[136]65]. Ectopic expression
of AtMYB56 can inhibit root growth, leading to failed root regeneration
after stem cell damage [[137]66]. In soybeans, GmMYB81 regulates
development of soybean tissues and embryos and has a significant effect
on abiotic stress [[138]67]. We speculate that the MeMYB67 and MeMYB61
genes may affect cassava yield by influencing leaf shedding and
regulating the cell cycle process.
RLW not only effectively evaluates the shape of cassava, but also
relates to yield traits [[139]68, [140]69]. We identify Manes.09G004500
(in the bZIP family) to be related to RLW. This family is an important
regulatory factor in higher plants and participates in plant growth and
development [[141]70]. BZIP is involved in regulating certain signaling
pathways related to the ABA pathway. During germination and flowering
of Arabidopsis seeds, bZIP transcription factors ABI5 (ABA sensitive 5)
and ABFs (ABRE binding factors) are key factors regulating ABA
signaling Hossain. Arabidopsis abi5 mutant is less sensitive to ABA.
Because of ABA-inhibiting seed germination, ABI5 can activate
expression of specific genes in seeds through ABA-mediated signaling
pathways, thereby promoting seed germination [[142]71] Additionally,
Arabidopsis AtbZIP29 is specifically expressed in proliferative
tissues, regulates the number of cells in leaf and root meristem
tissues, and promotes tissue regeneration by specifically binding to
cell cycle regulatory factors and cell wall regeneration related genes.
During the cell division cycle, BZIP29 can interact with bZIP69 to form
dimers, which play roles in the root primordia, root crown, and
meristem of lateral roots, thereby affecting root shape [[143]72]. We
speculate that this gene may affect RLW through similar molecular
mechanisms.
Conclusions
To identify genes related to cassava yield, and to improve cassava
yield, we selected three important agronomic traits. By constructing
genetic maps of two half-sibling populations sharing the same parent,
QTL mapping was performed on two populations, co-located loci were
extracted to obtain 15 QTLs for important quantitative traits, and
potential candidate genes were identified that affected these traits.
We also performed transcriptome analysis to analyze expression levels
of different genes. Finally, 5 candidate genes related to RLW, 11
candidate genes related to RN, and 22 candidate genes related to RW
were obtained. We have found that previous studies have shown that
candidate genes such as Manes. 09G001500, Manes. 17G066200, and Manes.
17C073900 belong to gene families that are associated with their
traits. Localization of these candidate genes improves our
understanding of the genetic basis of cassava yield traits, and
identifies potential gene targets to improve cassava yield and to
provide reference for improving cassava yield through molecular
breeding methods.
Supplementary Information
[144]12870_2025_6278_MOESM1_ESM.docx^ (2.1MB, docx)
Supplementary Material 1: Fig. S1. Collinear diagram of chromosomal
spiders. The figure shows the linkage maps of parents from two
different populations. The collinear loci between the two sets of
linkage maps are connected and highlighted in red. At the same time,
QTLs co located between the two populations are randomly marked with
colors in the linkage maps, visually demonstrating the collinearity
between the parental linkage maps of different populations.
[145]12870_2025_6278_MOESM2_ESM.xlsx^ (137.8KB, xlsx)
Supplementary Material 2: Table S1. population smple name, Table S2.
Variants information, Table S3. Information for linkage map, Table S4.
Phenotypic data, Table S5. QTL information, Table S6. Candidate gene
expression levers.
Acknowledgements