Abstract Cassava, Manihot esculenta Crantz, is the main raw material used in starch production in China. However, due to the small planting scale and high demand in China, large-scale imports are needed. To improve cassava yield and to meet China’s needs, we examine the agronomic traits of root weight, root number, and root length-to-width ratio per plant. By constructing two semi-sibling genetic maps and using years of data for quantitative trait locus (QTL) localization, we compare two population mapping results to screen co-located 15 QTLs, and transcriptome analysis to explore candidate genes related to these traits. We found OsWRKY78 in rice to be homologous to candidate gene Manes.03G051300, which can regulate rice stem elongation and seed size, and Manes.18G023500 to be homologous to MeMYB108, which can reduce leaf shedding and regulate cassava biomass. Through QTL mapping, we identify key genes related to yield traits that can be used in cassava molecular breeding to improve cassava yield. Supplementary Information The online version contains supplementary material available at 10.1186/s12870-025-06278-3. Keywords: Manihot esculenta, Cassava root, Linkage map, Quantitative trait locus Introduction Cassava is an important energy plant and one of six major global food crops [[44]1]. It is widely grown in tropical regions of Africa, Asia, and Latin America [[45]2]. Characterized by its adaptability, fast growth and high yield, cassava is an important food crop, especially in the tropics and subtropics, and is of high economic and social value because of its ability to grow in less-fertile soils [[46]3] and because it is not limited by season or moisture. Cassava starch is suitable for producing high-quality modified starch because of its large particles and high transparency, and it is necessary for modified starch in some industries. Cassava tubers are more valuable because they contain large amounts of starch, and small amounts of protein among other nutrients. These tubers can be processed into a variety of food products, used as animal feed, for brewing beer, and for producing alcohol for bioethanol [[47]4]. For some African countries, approximately 25% of energy acquisition is obtained from cassava [[48]5]. However, despite its importance, the average cassava yield in Africa has not significantly increased since 1961. Quantitative trait locus (QTL) localization is based on genetic linkage analysis, using linkage relationships between molecular markers and target traits to identify genomic regions associated with those traits [[49]6]. With development of high-throughput sequencing technology the drawing of fine genetic maps for cassava (Manihot esculenta Crantz) has considerably progressed, with multiple genetic linkage maps having been constructed using RFLP (Restriction Fragment Length Polymorphism), SSR (simple sequence repeat), and SNP (single nucleotide polymorphism) markers. These maps cover 18 linkage groups, with an average molecular marker distance ranging 4.51–4.81, with linkage map lengths ranging 1707.9–2412.0 centimorgans (cM) [[50]7–[51]9]. Aided by these maps, significant progress has also been made in identifying Quantitative Trait Loci (QTL) for various cassava traits. Boonchanawiwat et al. searched for QTL that affected the height of cassava plants and the height of the first branch, by performing linkage map analysis in self-pollinated progeny [[52]10]. Nang et al. performed QTL analysis on a reciprocal cross population and reported plant and first branch heights to be controlled by multiple genes [[53]11]. Okogbenin et al., investigating the genetic basis of early root bulking in cassava by genetic analysis and QTL localization in a population of non-self-crossing parents, reported early root bulking to be a complex trait controlled by multiple genes [[54]12]. Akinbo et al. identified several QTLs that affected the protein content of cassava roots through QTL analysis in a backcross population [[55]13]. Zou et al. constructed an amplified-fragment single nucleotide polymorphism and methylation (AFSM) QTL map by crossing two non-inbred parents, KU50 and SC124 (KS population), to obtain 186 cassava populations, and identified 260 candidate QTL genes for cold stress and 301 candidate QTL genes for storing root quality and yield [[56]14, [57]15]. Okogbenin, and Ewa’s studies identified several QTLs associated with cassava plant size and yield traits [[58]16–[59]18]. Overall, progress has been made in the study of QTL localization for root number in cassava, but further research is needed to improve understanding of genetic mechanisms controlling root number, and to use this knowledge to improve cassava varieties. Hyper-seq is a novel, effective and flexible marker-assisted selection method for genotyping that is based on the polymerase chain reaction (PCR). Compared with commonly used whole genome sequencing methods such as restriction site-associated DNA tag sequencing fragments generated by, for example, type IIB restriction endonucleases (2b-RAD) and AFSM, Hyper-seq is more efficient and less costly. Its greatest advantage is that it can complete genome simplification, barcode addition, and simplified genome sequence amplification in one step. A Hyper-seq library is constructed with specially designed primers and DNA (for PCR amplification) or leaf (for direct PCR amplification) templates, without the need for restriction enzyme digestion and barcode connection steps [[60]19]. In the present research, two half sibling populations were obtained by crossbreeding cassava cultivar SC205(♀) with HB60(♂) and 18R(♂), respectively. To study the genetic mechanisms and key genes affecting cassava yield traits, in order to improve cassava breeding and increase cassava yield. Three traits related to cassava yield were selected for phenotype analysis, and gene typing was performed using target sequence method. The phenotype and genotype data were integrated for association mapping analysis to identify QTLs closely related to the infected area, which were validated through gene expression analysis. Materials and methods Mapping populations and DNA extraction We crossed cassava cultivar SC205 with cultivars HB60 and 18R to produce two F1 populations, with 259 offspring from SC205 × 18R (A1 population) and 233 offspring from SC205 × HB60 (A2 population) (Supplementary Table S1). Populations were planted on two plots of land in Chengmai Meiyu Village, Hainan Province, in April of 2021 and 2022. Each planting was replicated twice, with two rows per replicate, and six plants per row. After 6 months, young leaf samples were collected, from which DNA was extracted using the CTAB method, and evaluated using nanodrop [[61]20]. After 10 months of planting, cassava root samples were collected for RNA extraction for transcriptome sequencing. Hyper-seq library preparation and sequencing The DNA concentration of all samples was homogenized to 150–200 ng /µL, and the reaction volume was increased to 20 µL by adding 1 µL of DNA sample, mix, F’ primer, and ddH[2]O for PCR amplification reaction; obtain PCR products, purify PCR products using TIANGEN(DP214) reagent kits, perform double end sequencing using MGI sequencer (DNBSEQ-T7), and upload data to BGI raw data were uploaded to BGI. Sequence analyses and genotyping Raw data quality control was performed using FastQC(fastqc -o outputfile inputfile) software [[62]21]. Original data were divided into each sample using Barcode and fastq-multx(fastq-multx -B barcode.txt -m 1 -b itaq.1.fastq itaq.2.fastq -o %.R1.fastq -o %.R2.fastq) software ([63]https://github.com/brwnj/fastq-multx). The AM5608.1 version genome was selected as a reference and BWA [[64]22] (bwa index -a bwtsw cassava.fa) was used to establish a reference genome index to generate BAM files. Variant calling was performed using SAMtools and BCFtools [[65]23, [66]24], and high-quality SNPs were filtered from VCF files using PLINK [[67]25] software (–geno 0.1 –maf 0.05 –hwe 0.0001). SNP marker density was visualized using the CMplot package in R, and variant sites were annotated using SNPEff [[68]26]. Transcriptome analyses and genotyping Raw transcriptome data were analyzed using FastQC, followed by adapter removal, base modification, trimming, and filtering of low-quality sequences using Trimmomatic(trimmomatic PE -phred33 input_forward.fq.gz input_reverse.fq.gz) [[69]27]. Hisat2 [[70]28] was used to construct a reference genome index of the AM5608.1 version genome and to align paired-end clean reads to the reference genome. FeatureCounts [[71]29] was used to accurately count reads; reads of all samples were then combined and exported as a single RNA-Seq_Practice_countstable file. Expression levels of each gene in each sample were quantified using FPKM and TPM calculation formulae. Gene expression levels were obtained for different samples. Linkage map construction A linkage group refers to a set of genes or markers that are located on the same chromosome and are genetically linked to each other [[72]30]. We constructed a genetic linkage map of common parents from the 259 offspring dataset of Group A1 and the 233 offspring dataset of Group A2, and compared the differential loci between parents to obtain the genotype of parent specific variant loci. Firstly, we used VCFtools [[73]31] to extract the shared specific variant loci between offspring and parents. PLINK [[74]24] (– bfile myfile – geno 0.05) was then used to filter mutation sites, and PLINK (– file myfile – recodeA) was used to encode the offspring genotype data as 012. Then import the markers into QTL IciMapping software [[75]32] and use it to construct a linkage map. Initially, markers were grouped into linkage groups based on a LOD score threshold of over 3. Within each group, markers were ordered using the nnTwoOpt algorithm, which is a hybrid approach combining the Nearest Neighbor Algorithm and the Two-opt Algorithm [[76]33], and genetic distances were calculated using the Kosambi mapping function [[77]34]. The map was further refined by selecting a window size of 10 markers and adjusting the marker data based on the sum of adjacent recombination frequencies. This adjustment ensured the shortest possible genetic distances between markers. During the refinement process, marker positions were iteratively optimized to minimize map expansion and distortion, thereby enhancing the accuracy and stability of the final linkage map. The final linkage map, including marker names, genetic positions, and linkage group assignments, was exported from QTL IciMapping. The map was then visualized using the qtl package in R [[78]35]. Phenotypic assessment analyses Phenotypic data were measured in the field 10 months after planting, when three plants with similar growth vitality were selected, RW, RN, and RLW of each was measured. Finally, the phenotype field raw data for 2022 and 2023 were obtained. Then, outlier phenotypic data were removed according to the 3 σ rule (values outside μ − 3 σ and μ + 3 σ intervals are outliers, where μ is the sample data mean and σ is the data standard deviation). After handling the outliers, we choose a linear mixed model approach to analyze all phenotypic traits. The best linear unbiased estimation (BLUE) of each individual was used to perform QTL mapping. In order to obtain joint localization results of different populations with data between single year and multiple years, Model 1 was used to obtain phenotype data of different populations in 2022 and 2023, and Model 2 was used to obtain merged phenotype data of 22–23. [MATH: Yijl=μ+βi+γj +ϵl+e ijl :MATH] 1 [MATH: Yijkl=μ+βi+γj+δk+ ϵjl+eijkl :MATH] 2 where Yijl is the phenotypic observation related to line i, location j, repetition l, Y[ijkl] is the phenotypic observation related to line i, location j, year k repetition l,μ is the global intercept. β[i] is the fixed effect of the i-th variety. γ[j] is the random effect at position j. δ[k] is the random effect of the kth year. ϵ[l] is a random effect of l repeated combinations. ϵ [jl] is a random effect of j positions and l repeated combinations. Eijl and e[ijkl] is error term, representing random errors in observations. Lme4 is an R package that can generally be used to solve mixed models. It supports linear mixed models, generalized linear mixed models, and nonlinear mixed models. We selected lme4 package based on Model 3 to calculate the heritability of different traits in two populations. Spearman’s coefficients were used to evaluate correlations between traits, and Shapiro–Wilk tests were used to evaluate normality, kurtosis, and skewness (α = 0.05). [MATH: h2=VgVg+VGL/L+VGY/Y+VGLY/LY+ Ve/YLR :MATH] 3 where V[g] is the variance component of cultivar, V[GL] is the interaction variance component between cultivar and location, V[GY] is the variance component between cultivar and year, and V[e] is the residual variance component. V[GLY] is the interaction variance component between cultivar and location and repetition, and Ve is the residual variance component, Y, L and R are the the number of year, location, and repetition respectively. QTL Detection and candidate gene analysis MapQTL5 software [[79]36] was used for QTL calculation. Linkage maps and phenotype information from 2022 and 2023 were used for Multiple-QTL Mapping (MQM Mapping) to map QTL and estimate their effects [[80]37]. The LOD score for significant QTL was determined through a test analysis (1000 permutations, total error level 5%). Candidate loci with LOD > 2 were screened, by collating the loci mapped in the two populations; loci within a 50 Kb interval upstream and downstream of the candidate loci that are present in both populations and in three or more data sets were selected as co-located loci. Collation of the co-located loci to determine the co-located QTL, and estimation of the contribution of each identified QTL to the total phenotypic variance through analysis of variance. Loci and QTL were organized based on population and year information (e.g., 22-A1), and the final QTL were determined. The nomenclature for identified QTL involves a trait abbreviation preceded by “q,” the relevant chromosome name, and a numerical identifier (e.g., qRW—1a). A confidence interval of 1 Mb flanking the QTL identification region was used to screen candidate genes. Candidate genes were extracted from the QTL interval and functionally annotated using annotation information for the cassava AM560.8.1 genome. Important genes were selected for analysis. Gene numbers were compared with annotation files of the AM560v8.1 reference genome to obtain corresponding GO numbers for each gene. Candidate gene numbers and GO numbers were input into WEGO ([81]http://wego.genomics.org.cn/) for GO functional annotation, and gene sequences were extracted using SeqKit subseq [[82]38]. The input file was organized and imported into KEGG ([83]http://kobas.cbi.pku.edu.cn/genelist/) for pathway enrichment analysis. An enrichment plot was generated using R. Results SNP marker density and annotation Approximately 406 Gb of sequencing data were obtained, with an average sequencing depth of 812 Mb per sample (the cassava AM560 8.1 version genome size is 617 Mb). In total, 1,215,893,952 high-quality reads were obtained, with an average of 2,461,324 reads per sample. Obtained two mutation site vcf files for the (2,417,535) and A2 populations (1,376,544). The A1 population had 161,361 unique variant sites shared with the SC205 parent (the A1 maternal parent, abbreviated A1_SC205) and the A2 population had 121,556 unique variant sites shared with the SC205 parent (the A2 maternal parent, abbreviated A2_SC205). An SNP marker density map indicated that variant sites were densely distributed across the genome (Fig. [84]1A, B). Obtained variant sites were annotated to better understand their position on the genome; annotation information and SNP statistics by type and region are provided in Supplementary Table S2. Fig. 1. [85]Fig. 1 [86]Open in a new tab A A1 population SNP density map, with an average of 8610.2 mutation sites per chromosome; B A2 population SNP density map, with an average of 6478.9 mutation sites per chromosome; C Collinearity analysis of two group linkage maps (A1 with an average collinearity of 30.37%, and peak (38.01%) collinearity in LG16; A2 with an average collinearity of 26.69% and peak (39.40%) collinearity in LG18) By filtering and separating sites and redundant sites, 5913 were obtained in population A1 and 6648 in population A2 for construction of high-density genetic maps of population maternal parents, and 18 linkage groups were identified finally (consistent with the number of haploid chromosomes in cassava). The total map distance of the A1_SC205 genetic map was 2492.62 cM, the average distance between markers was 0.43 cM, the length of 18 linkage groups ranged 82.42–219.53 cM, and the longest linkage group was LG01. The total map distance of the A2_SC205 genetic map was 2445.17 cM, the average distance between markers was 0.37 cM, the length of linkage groups ranged 72.10–206.95 cM, and the longest linkage group was LG01 (Supplementary Table S3). Collinearity in linkage map markers between populations was observed by counting common loci, of which there were 1792; for A1, the average collinearity rate was 30.37%, with a maximum of 38.01% (LG16); for A2 the average collinear rate was 26.69%, with a maximum of 39.40% (LG18) (Supplementary Table S3). TBtools was used to draw chromosome collinearity for each population (Fig. [87]1C). The collinearity relationship between linkage maps of each population (Supplementary Fig. 1) revealed a certain degree of collinearity on each chromosome, laying a foundation for subsequent co-localization. Statistical analysis and correlation of phenotypic data Summary data for parental population phenotypic mean values, statistical data for different years, and combined Best Linear Unbiased Estimators (BLUE) values for correlated traits between populations, are presented in Supplementary Table S4. A combined analysis of populations revealed a coefficient of variation (CV) for RW to range 0.413–0.656, and heritability values ranged 0.16–0.23. For RN, the CV ranged 0.244–0.342, and heritability values ranged 0.48–0.5. For RLW, the CV ranged 0.207–0.283, and heritability values ranged 0.48–0.59 (Table [88]1). In both populations, RW was most variable and had relatively lower heritability, while RN and RLW had smaller trait variabilities and higher stability and heritability. This suggests that data stability may influence the magnitude of trait heritability. Additionally, the similarity in heritability between populations indicates a relatively stable heritability. Table 1. Statistical analysis of population traits of hybrid progenies population Trait Year Mean standarddeviation Range Skew Kurtosis CV Heritability A1 RN 2022 6.589 1.901 2.510–12.510 0.194 2.472 0.288 0.48 2023 5.353 1.824 0.720–10.720 0.538 3.165 0.341 22–23 5.738 1.620 1.57–11.57 0.079 3.064 0.282 RW 2022 2.614 1.560 0.120–9.310 0.936 4.158 0.597 0.16 2023 2.643 1.734 0.230–9.120 1.297 4.635 0.656 22–23 2.513 1.474 0.08–9.103 0.934 4.357 0.587 RLW 2022 6.788 1.556 3.290–12.330 0.635 3.516 0.229 0.58 2023 8.695 1.956 1.340–13.860 −0.385 3.754 0.225 22–23 7.608 1.572 0.935–12.812 −0.156 4.123 0.207 A2 RN 2022 6.585 2.204 1.00–14.160 0.419 3.318 0.335 0.53 2023 4.684 1.600 0.720–9.720 0.438 3.753 0.342 22–23 5.599 1.364 2.237–10.73 0.211 3.726 0.244 RW 2022 2.161 1.219 0.140–8.670 1.394 6.670 0.564 0.23 2023 1.563 0.791 0.060–5.090 1.010 4.917 0.506 22–23 1.789 0.740 0.081–4.327 0.074 2.866 0.413 RLW 2022 6.818 1.761 3.450–12.460 0.750 3.746 0.258 0.59 2023 7.432 2.106 0.830–14.660 0.406 4.172 0.283 22–23 7.100 1.537 2.876–11.604 0.225 2.835 0.217 [89]Open in a new tab Overall skewness was close to 0 and kurtosis to 3, suggesting approximate data normality and suitability for subsequent analysis. Data also approximately follow a normal distribution (Fig. [90]2A). Violin plots and trait error distribution plots (Fig. [91]2B) reveal minimal overall variation in phenotype data for each population over the two years, indicating low data error and high stability. Fig. 2. [92]Fig. 2 [93]Open in a new tab A Violin plots of trait distribution; B Error plots of trait distribution; and Trait correlation coefficients for C) Group A1 and D) Group A2 R software was used to compute trait correlation coefficients and correlations between populations, and to generate a correlation heatmap (Fig. [94]2C, D). All three traits correlated positively in both populations, with RN strongly and positively correlated with RW. Boxplots revealed similar overall trends in the data for RN and RW. QTL localization QTL mapping results reveal 188 QTLs to be located with RLW in different years between populations, along with 157 QTLs associated with RN and 213 QTLs related to RW. Quantities of QTLs mapped in different populations and years are summarized in Table [95]2, and detailed in Supplementary Table S5. To validate QTL effectiveness, positional information of mapped loci in populations was compared; 3 co-located RLW-related QTLs, 3 RN-related QTLs, and 9 RW-related QTLs were identified (Table [96]3). Table 2. Number of loci for different traits in different populations and years Year RLW RN RW Total A1-22 20 9 17 46 A1-23 38 41 31 110 A1-22 + 23 32 13 19 64 A2-22 38 41 74 153 A2-23 35 21 14 70 A2-22 + 23 25 32 58 115 Total 188 157 213 558 [97]Open in a new tab Table 3. Coloucation QTLs mapped for root quantity traits in Cassava QTL_Name Group Trait Position Population LOD %Expl RLW-1 1 RLW 446,386–526,632 A1-22,A1-22 + 23,A2-23,A2-22 + 23 2.77 8 RLW-7 7 RLW 7,270,622–7,306,733 A1-22 + 23,A2-23,A2-22 + 23 3.65 14.8 RLW-9 9 RLW 56,068–75,706 A1-22 + 23,A2-22,A2-23,A2-22 + 23 2.5 8.8 RN-3 3 RN 4,862,573–4,922,415 A1-23,A1-22 + 23,A2-22,A2-23,A2-22 + 23 2.5 25.9 RN-16 16 RN 26,309,903–26,309,939 A1-22,A1-22 + 23,A2-23,A2-22 + 23 2.14 11.5 RN-17 17 RN 27,016,633–27,016,679 A1-23,A1-22 + 23,A2-22 + 23 2.85 31.2 RW-7 7 RW 3,472,827–3,497,170 A1-22,A1-22 + 23,A2-22,A2-23,A2-22 + 23 3.08 12.8 RW-9 9 RW 140,164–148,432 A1-22 + 23,A2-22,A2-23,A2-22 + 23 7.6 12.8 RW-10 10 RW 4,262,539–4,262,551 A1-22 + 23,A2-22,A2-23,A2-22 + 23 3.11 13.6 RW-11 11 RW 16,003,135–16,003,215 A1-23,A1-22 + 23,A2-22,A2-22 + 23 3.52 15.4 RW-12 12 RW 21,852,432–21,935,379 A1-23,A1-22 + 23,A2-22,A2-23 2.53 8.4 RW-15 15 RW 23,949,858–23,961,834 A1-22,A1-22 + 23,A2-22,A2-23,A2-22 + 23 4.14 10.2 RW-17 17 RW 31,297,009–31,368,543 A1-22,A1-22 + 23,A2-22,A2-23,A2-22 + 23 3.78 9.8 RW-18 18 RW 16,818,273–16,893,445 A1-22,A1-22 + 23,A2-23,A2-22 + 23 3.08 16.7 RW-18 18 RW 2,317,804–2,317,946 A1-22,A1-22 + 23,A2-22,A2-23,A2-22 + 23 3.34 30.5 [98]Open in a new tab The 3 QTLs associated with RLW are positioned on chromosomes 1, 7, and 9. Notably, the QTL designated as qRLW-7 exhibits a LOD score exceeding 3. These QTLs contributes an average phenotypic effect of 10.53% and has been consistently identified in both A1 and A2 populations. The 3 QTLs of RN are located on chromosomes 3, 16, and 17. Two of them have a mapping interval of less 100 bp. The 9 QTLs of RW occur mainly on chromosomes 7, 9, 17, and 18, with two QTLs located on chromosome 18 and three QTL positioning intervals less 150 bp. At the same time, QTLs related to RLW and RN also occur on chromosomes 7 and 9. QRW-18a, located around 2 Mb on chromosome 18, occurred in 5 sets of data from populations A1 and A2, with an LOD value of 3.34. To understand variation in LOD scores of QTLs mapped for different traits in different years within populations, three co-located QTLs for each trait were selected, and QTL interval and nearby LOD waveforms were plotted (Fig. [99]3A). There was a consistent overall fluctuation in LOD scores within mapped QTL intervals and their vicinity across datasets. Two linkage maps were drawn using genetic distance and markers, and QTL localization information was labeled based on subsequent localization results (Fig. [100]3B, C). Fig. 3. [101]Fig. 3 [102]Open in a new tab A LOD value fluctuation chart: for each trait, select the main three QTLs for localization, and draw a LOD waveform of the localization population based on physical location. The trend of LOD values in the QTL interval is consistent, and LOD peaks in the QTL interval; B A1 linkage map; C A2 linkage map. QTL labels (B, C): yellow, RLW related QTLs; purple, RW related QTLs; blue, RN related QTLs Candidate gene analysis Based on localization results, interval and upstream and downstream genes were identified as candidate genes. GO and KEGG analyses were performed to better understand gene functions. GO enrichment analysis revealed these genes to be mainly enriched in “polar transport of auxin,” “auxin transport,” “cytokinin metabolic process,” “ATP biosynthetic process,” “auxin-activated signaling pathway,” and “transcription coregulator activity” (Fig. [103]4A). KEGG analysis revealed candidate genes to be mainly enriched in pathways such as “protein photosynthesis proteins,” “starch and sucrose metabolism,” and “plant hormone signal transduction” (Fig. [104]4B). Through screening and identification, a total of 39 genes were obtained (Table [105]4), including 5 candidate genes associated with RLW, 11 candidate genes associated with RN, 23 candidate genes associated with RW, and 7 genes containing 5 classes of transcription factors (WARY, BES 1, bZIP, ARF, MYB). Remaining candidate genes included auxin-responsive factor, zinc finger (C3HC4-type RING finger) family protein, ethylene sensor, SNARE-like superfamily protein, plant glycogenin-like starch initiation protein, and glucose-6-phosphate dehydrogenase 6. Expression levels of candidate genes were calculated based on QTL localization candidate genes and parental transcriptome data (Supplementary Table S6). Depending on whether expression level changes were consistent with trends in parental traits, the candidate genes were classified into two categories, and expression level heat maps of localized genes were plotted (Fig. [106]4C, D). Fig. 4. [107]Fig. 4 [108]Open in a new tab A GO function enrichment (gene function mainly enriched in cellular response to auxin stimuli, auxin polar transport, and other biological processes); B KEGG pathway: Plant hormone signal transduction, Starch and cross metabolism pathways, etc.; C Gene expression heatmap inconsistent with trends in change of parental traits, where expression changes of candidate genes in different parents are not significantly related to changes in parental traits; D Gene expression heatmap consistent with trends in change of parental traits, where expression levels of candidate genes increase or decrease synchronously with the size of parental traits in different parents Table 4. Candidate genes for root quantity traits in Cassava Trait Chromosome Gene_ID Physical position(bp) Gene Function RN 3 Manes.03G044400 3,875,045–3,881,657 Ankyrin repeat family protein RN 3 Manes.03G047500 4,297,715–4,299,007 Winged-helix DNA-binding transcription factor family protein RN 3 Manes.03G051300 4,782,760–4,784,024 WRKY RN 3 Manes.03G053600 5,181,953–5,190,433 Zn-dependent exopeptidases superfamily protein RN 3 Manes.03G055200 5,383,408–5,388,267 Ankyrin repeat family protein RLW 7 Manes.07G056000 6,355,527–6,365,241 Tetratricopeptide repeat (TPR)-like superfamily protein RW 7 Manes.07G026100 2,819,441–2,822,633 Smg-4/UPF3 family protein RW 7 Manes.07G027700 3,050,129–3,053,172 SNARE-like superfamily protein RW 7 Manes.07G028400 3,120,266–3,131,581 Protein phosphatase 2C family protein RW 7 Manes.07G030500 3,349,025–3,352,522 Zinc finger (C3HC4-type RING finger) family protein RW 7 Manes.07G031300 3,431,995–3,432,570 Dehydroascorbate reductase 2 RLW 9 Manes.09G001500 149,564–163,612 DHHC-type zinc finger family protein RLW 9 Manes.09G000800 433,877–435,511 DNA/RNA helicase protein RLW 9 Manes.09G000500 459,723–477,394 Ribosomal protein L32e RLW 9 Manes.09G004500 1,145,489–1,158,661 bZIP RW 9 Manes.09G001500 149,564–163,612 DHHC-type zinc finger family protein RW 10 Manes.10G043100 4,441,036–4,451,646 Plant-specific GATA-type zinc finger transcription factor family protein RW 10 Manes.10G045000 4,799,921–4,809,695 Clast3-related RW 11 Manes.11G081500 15,096,310–15,098,673 Adaptin family protein RW 11 Manes.11G090300 16,982,368–16,982,661 R-protein L3 B RW 12 Manes.12G097300 21,899,393–21,904,241 Signal transduction histidine kinase, hybrid-type, ethylene sensor RW 15 Manes.15G183200 24,954,254–24,957,013 Triosephosphate isomerase RN 17 Manes.17G061300 26,043,602–26,053,323 Plant glycogenin-like starch initiation protein 3 RN 17 Manes.17G066200 26,623,698–26,625,728 WRKY RN 17 Manes.17G066900 26,690,489–26,691,693 P-loop containing nucleoside triphosphate hydrolases superfamily protein RN 17 Manes.17G073900 27,398,788–27,404,693 Auxin efflux carrier family protein RN 17 Manes.17G074600 27,527,652–27,529,759 Glucose-6-phosphate dehydrogenase 6 RN 17 Manes.17G077900 27,772,133–27,779,912 With no lysine (K) kinase 4 RW 17 Manes.17G100500 30,715,977–30,722,346 Transcriptional factor B3 family protein / auxin-responsive factor AUX/IAA-related RW 17 Manes.17G102000 30,954,184–30,956,884 WRKY RW 18 Manes.18G014800 1,564,349–1,568,553 Lysine histidine transporter 1 RW 18 Manes.18G015300 1,619,465–1,621,243 Ubiquitin-like superfamily protein RW 18 Manes.18G015400 1,638,932–1,645,840 ARF RW 18 Manes.18G023500 2,256,111–2,258,325 MYB RW 18 Manes.18G027800 2,392,755–2,401,894 Nodulin MtN3 family protein RW 18 Manes.18G029600 2,509,358–2,512,562 BES1 RW 18 Manes.18G035600 3,171,581–3,174,946 Raffinose synthase family protein RW 18 Manes.18G036300 3,195,620–3,197,448 Octicosapeptide/Phox/Bem1p family protein [109]Open in a new tab Discussion Genetic map and QTL mapping precision High-quality SNP markers were obtained using hyper-seq simplified sequencing technology, and genetic maps were constructed for two populations. Multiple cassava maps have been previously constructed using markers such as RFLP and SSR, ranging 100–510 markers, map lengths of 845.2–1420.3 cM, and with spacing of 5.6–17.92 cM/marker [[110]27, [111]39–[112]42]. We increase marker density, and by producing a final genetic map spacing of 0.37–0.43 cM/marker, we provide a higher precision cassava genetic map. A total of 15 QTLs related to traits were identified. There are two of the QTLs in RN have a mapping interval of less than 100 bp, which has high accuracy and helps to accurately locate candidate genes related to this trait. There are three QTL in RW positioning intervals less than 150 bp, indicating high overall positioning accuracy, and the average LOD value of QTLs exceeds 3, indicating a high possibility of the existence of the positioned QTLs. Meantime, Shengkui [[113]43] used GWAS to locate multiple significant loci for cassava agronomic traits, and positioning results of qRN-17a and qRW-18a are consistent. Partly consistent with our results, Ewa [[114]16] reported QTLs related to RW traits to be located on chromosomes 3, 4, and 7, and those related to RN traits to be located on chromosomes 3 and 7.while QTLs related to RN traits were located on chromosomes 3 and 7, consistent with some of the results of this study. Functional analysis of candidate genes We selected three traits related to cassava yield (RN, RW, and RLW) for QTL localization analysis based on a high-precision cassava genetic map. There is a high positive correlation between these traits; RN、RW and RLW can effectively evaluate crop yield and tuber shape, which is also an important factor to affect yield. We also report Manes.09G001500 to be associated with RN and RW traits; this gene belongs to the DHHC type zinc finger family protein which affects plant growth and development. OsDHHC01 increases the number of tillers and grain yield in rice and oilseed rape (Brassica napus) [[115]44, [116]45], and OsDHHC13 positively regulates the oxidative stress response [[117]44]. DHHC type zinc finger protein (DHHC) and S-acylated protein mainly serve as substrates for S-acyltransferase. S-acylation, also known as palmitoylation, is an important protein lipid modification that plays an important role in plant growth, development, and stress response [[118]46]. OsDHHC30 participates in regulation of salt tolerance in rice through S-acylation modification of OsCBL2/3 [[119]47]. We speculate that this gene may affect cassava growth and development by regulating its stress resistance, thereby affecting the weight and quantity of root tubers. Further research is needed to identify how this gene affects root tuber traits. Manes.03G051300, Manes.17G066200, and Manes.17G102000 belong to the WRKY family, in which Manes.03G051300 and Manes.17G066200 are associated with RN traits and Manes.17G102000 is associated with RW. This family is an important regulator of growth and development in higher plants. The WRKY transcription factor OsWRKY78 regulates rice stem elongation and seed size [[120]48], and SIWRKY23 expressed mainly in tomato roots is associated with seed size [[121]49]. In cassava, the MeWRKY20 gene promotes ABA accumulation [[122]50], and high ABA levels can promote radial expansion of cortical cells and reduce root penetration ability [[123]51]. Manes.18G015400 and Manes.17G100500 belong to the AFR transcription factor family, and Manes.17G073900 belongs to the Auxin efflux carrier family protein, auxin, Aux) Indole-3-acetic acid, IAA is almost involved in all processes of plant growth and development; most of these processes are regulated by gene expression [[124]52]. Auxin response factor (ARF) is a family of transcription factors that regulates expression of auxin responsive genes [[125]53, [126]54]. ARF can specifically interact with auxin response elements in the promoter region of auxin responsive genes, and AuxRE binds to “TGTCTC” to activate or inhibit gene expression [[127]55]. The auxin signaling pathway mediated by ARF transcription affects plant growth and development by regulating cell division, elongation, and differentiation [[128]56, [129]57]. miRNA160 and miRNA167 are involved in Arabidopsis AtARF6, regulating adventitious root formation; Arabidopsis AtARF17 is also involved in this process [[130]58, [131]59]. The SlARF2 gene in tomato regulates lateral root formation [[132]60]. DnARF11 is specifically expressed in the roots of Dendrobium officinale, indicating its importance in root system establishment [[133]61]. We speculate that these genes affect RN and RW by regulating root tuber development. The candidate gene Manes.18G023500 belongs to the MYB transcription factor family, one of the largest transcription factor families in plants with important roles in physiological and biochemical processes in growth and development. MeMYB108 can reduce leaf shedding and regulate cassava biomass [[134]62]. In Arabidopsis, AtMYB3R1 and AtMYB3R4 (AtMYB3R1/4) act as transcriptional activation factors, expressed in proliferating tissue, regulating the cell cycle process, and affecting circadian rhythms [[135]63–[136]65]. Ectopic expression of AtMYB56 can inhibit root growth, leading to failed root regeneration after stem cell damage [[137]66]. In soybeans, GmMYB81 regulates development of soybean tissues and embryos and has a significant effect on abiotic stress [[138]67]. We speculate that the MeMYB67 and MeMYB61 genes may affect cassava yield by influencing leaf shedding and regulating the cell cycle process. RLW not only effectively evaluates the shape of cassava, but also relates to yield traits [[139]68, [140]69]. We identify Manes.09G004500 (in the bZIP family) to be related to RLW. This family is an important regulatory factor in higher plants and participates in plant growth and development [[141]70]. BZIP is involved in regulating certain signaling pathways related to the ABA pathway. During germination and flowering of Arabidopsis seeds, bZIP transcription factors ABI5 (ABA sensitive 5) and ABFs (ABRE binding factors) are key factors regulating ABA signaling Hossain. Arabidopsis abi5 mutant is less sensitive to ABA. Because of ABA-inhibiting seed germination, ABI5 can activate expression of specific genes in seeds through ABA-mediated signaling pathways, thereby promoting seed germination [[142]71] Additionally, Arabidopsis AtbZIP29 is specifically expressed in proliferative tissues, regulates the number of cells in leaf and root meristem tissues, and promotes tissue regeneration by specifically binding to cell cycle regulatory factors and cell wall regeneration related genes. During the cell division cycle, BZIP29 can interact with bZIP69 to form dimers, which play roles in the root primordia, root crown, and meristem of lateral roots, thereby affecting root shape [[143]72]. We speculate that this gene may affect RLW through similar molecular mechanisms. Conclusions To identify genes related to cassava yield, and to improve cassava yield, we selected three important agronomic traits. By constructing genetic maps of two half-sibling populations sharing the same parent, QTL mapping was performed on two populations, co-located loci were extracted to obtain 15 QTLs for important quantitative traits, and potential candidate genes were identified that affected these traits. We also performed transcriptome analysis to analyze expression levels of different genes. Finally, 5 candidate genes related to RLW, 11 candidate genes related to RN, and 22 candidate genes related to RW were obtained. We have found that previous studies have shown that candidate genes such as Manes. 09G001500, Manes. 17G066200, and Manes. 17C073900 belong to gene families that are associated with their traits. Localization of these candidate genes improves our understanding of the genetic basis of cassava yield traits, and identifies potential gene targets to improve cassava yield and to provide reference for improving cassava yield through molecular breeding methods. Supplementary Information [144]12870_2025_6278_MOESM1_ESM.docx^ (2.1MB, docx) Supplementary Material 1: Fig. S1. Collinear diagram of chromosomal spiders. The figure shows the linkage maps of parents from two different populations. The collinear loci between the two sets of linkage maps are connected and highlighted in red. At the same time, QTLs co located between the two populations are randomly marked with colors in the linkage maps, visually demonstrating the collinearity between the parental linkage maps of different populations. [145]12870_2025_6278_MOESM2_ESM.xlsx^ (137.8KB, xlsx) Supplementary Material 2: Table S1. population smple name, Table S2. Variants information, Table S3. Information for linkage map, Table S4. Phenotypic data, Table S5. QTL information, Table S6. Candidate gene expression levers. Acknowledgements