Graphical abstract graphic file with name fx1.jpg [46]Open in a new tab Highlights * • Generated phased diploid genome assembly of Citrus sinensis Newhall for ASE analysis * • Citrus pan-genome contains 50,442 genes including 13,301 core genes * • Sequenced 91 citrus accessions and conducted GWAS using 447 citrus accessions * • Identification of candidate genes for HLB resistance, tolerance, or susceptibility __________________________________________________________________ Association analysis; Plant pathology; Genomics Introduction Citrus is one of top three fruit crops worldwide and is an important source of vitamin C in human diet.[47]^1 However, citrus production faces many challenges including diseases, drought, flood, and freezes. Among them, citrus Huanglongbing (HLB, also known as greening) caused by Candidatus Liberibacter spp. presents an unprecedented challenge and has spread to most citrus growing regions such as China, Brazil, and USA.[48]^2^,[49]^3Ca. L. asiaticus (CLas) is the most prevalent HLB pathogen. Almost all commercial citrus varieties are susceptible to HLB with few exceptions, such as Sugar Belle Mandarin (Citrus reticulata), Persian lime (Citrus latifolia), and US-897 (C. reticulata Blanco × Poncirus trifoliata L. Raf.) that show tolerance against HLB.[50]^4^,[51]^5^,[52]^6^,[53]^7 In addition, multiple citrus relatives such as Microcitrus australis, Eremocitrus glauca, Swinglea glutinosa, M. warburgiana, and M. papuana, have shown resistance against HLB.[54]^8^,[55]^9 HLB has been suggested to be a pathogen-triggered immune disease.[56]^10 CLas is vectored mainly by Asian citrus psyllid (Diaphorina citri) and after transmission begins to colonize the phloem where it initiates a systemic and chronic immune response including reactive oxygen species (ROS) production, subsequent cell death of phloem tissues, and eventual HLB symptom development. Consistent with this model, CLas causes fewer changes in the expression of immunity genes in the tolerant rootstock variety US-897 than the susceptible variety ‘Cleopatra’ mandarin.[57]^7 In addition, CLas stimulates significantly higher levels of ROS production in both the HLB susceptible Mexican lime and the HLB-tolerant Persian lime, with the latter demonstrating higher antioxidants, suggesting that high ROS levels are tolerated in Persian lime.[58]^6 However, the genetic determinants responsible for resistance, tolerance, and susceptibility of different citrus genotypes against HLB remain unknown. With HLB being a pathogen-triggered immune disease, it has been proposed that CLas recognition and downstream immune signaling, ROS production, and antioxidants may be responsible for variation in the response to HLB in citrus (resistance, tolerance, and susceptibility).[59]^10 Antimicrobial peptides have also been associated with resistance to HLB.[60]^11 Bacterial pathogens trigger immune response via recognition by either membrane-localized pattern recognition receptors (PRRs) and intracellular nucleotide-binding domain leucine-rich repeat receptors (NLRs),[61]^12^,[62]^13 leading to PAMP-triggered immunity (PTI) and effector-triggered immunity (ETI), respectively. PTI and ETI share multiple downstream immune responses including the influx of Ca^2+, ROS burst, activation of mitogen-activated protein kinase (MAPK) cascades, defense gene induction, and biosynthesis of defense phytohormones.[63]^12^,[64]^13 Recent studies showed that both PTI and ETI are needed to mount a robust immune response, as they synergistically enhance each other.[65]^14^,[66]^15^,[67]^16^,[68]^17^,[69]^18^,[70]^19^,[71]^20 In citrus there are a large number of genes involved in these two signaling pathways. For example, citrus immune signaling cascades include approximately 925 pattern recognition receptors (PRRs), 703 nucleotide-binding, leucine-rich domains (NLRs), 45 calcium-dependent protein kinases (CDPKs or CPKs), 100 mitogen-activated protein kinase (MAPKs), 119 cytoplasmic receptor-like kinases (CRLK), as well as 137 pathogenesis-related (PR) genes, 19 ROS genes, and 436 antioxidant enzyme genes. We hypothesized that the genetic determinants of HLB pathogenicity, including candidates in these immune signaling pathways, can be identified through the integration of existing and newly produced citrus genomic resources to facilitate pan-genome analysis, a genome-wide association study (GWAS) for HLB response, and allelic-specific expression analyses of citrus accessions that differ in HLB resistance, tolerance and susceptibility. The genomes of multiple citrus genotypes and relatives have been assembled including C. sinensis,[72]^21 Atalantia buxifolia, Citrus medica, Citrus ichangensis,[73]^22 Fortunella hindsii,[74]^23 Citrus clementina,[75]^1C. reticulata,[76]^24Citrus grandis, and P. trifoliata.[77]^25 However, chromosome-level phased genomes were not available except C. sinensis cv. Valencia,[78]^26 limiting exploration of haplotype-specific differences in gene content. In addition, thousands of accessions of citrus cultivars and relatives are available worldwide, representing a treasure to be mined. In this study, we developed a haplotype-resolved genome assembly of C. sinensis ‘Newhall’, which was used for identification of genes that are specific to HLB resistant accessions. In total, pan-genome analysis was performed using genome assemblies from both susceptible and tolerant/resistant citrus cultivars. In addition, 26 citrus accessions (HLB resistant, tolerant, or susceptible) that have high quality genome sequencing data were mined for small indels linked to HLB response. The phased chromosome-level genome of C. sinensis was also used to investigate the difference in HLB susceptibility and tolerance of Valencia sweet orange and Sugar Belle mandarin LB9-9, respectively. We have also sequenced 91 citrus accessions, which, together with 356 previously sequenced citrus accessions, were used for GWAS analysis of genetic determinants responsible for HLB pathogenicity. Overall, this study has developed a significant resource of citrus genomic information and identified candidate genes to be further explored to understand the genetic determinants of HLB pathogenicity and to generate HLB resistant/tolerant citrus varieties. Results Phased diploid genome assembly of C. sinensis ‘Newhall’ We have selected C. sinensis ‘Newhall’ for sequencing because of its interesting features including maturation in the winter, whereas most other sweet orange cultivars mature before the summer, a second “twin” fruit opposite its stem (navel), and seedlessness. As a side note, it is worth mentioning that this project was started in 2019, which has suffered many hurdles caused by COVID19 similar to others. In total, 54 gigabases (Gb) Illumina paired-end short read data, 35 Gb PacBio HIFI read data, and 45 Gb Hi-C data were produced ([79]Table S1). The genome of Newhall navel orange was de novo assembled using both HIFI and Illumina reads. The de novo assembly length was 685.27 Mb, including 1,624 contigs with N50 value of 12.5 Mb. Those contigs were further assembled into 1,013 scaffolds with N50 value of 32.8 M ([80]Table S2) using Hi-C scaffolding methods. The final assembly contained 618 Mb in 18 chromosomes, which were assigned into primary or secondary haplotypes. Only 67 Mb sequences were unassigned to a scaffold ([81]Figure 1, [82]Tables S3 and [83]S4). Syntenic blocks between homologous chromosomes indicated that the haplotypes were phased correctly ([84]Figure S1) Figure 1. [85]Figure 1 [86]Open in a new tab Genomic features of the Citrus sinensis Osbeck cv. Newhall (A–G) The circle graph depicts the genomic characteristics, including chromosome length (A), gene density (B), density of transposable element (C), density of miRNA (D), density of snRNA (E), density of rRNA (F), density of tRNA (G), and the synteny between homologous chromosomes (H). The quality of assembly was supported by the k-mer analysis, which indicated that the haploid genome size was 350.15 Mb, approximately half of the assembled diploid genome size ([87]Tables S2 and [88]S5). In addition, BUSCO (95.1% of completeness), CEGMA (97.74% of completeness) assessments, and genomic sequencing coverage (99.97%) indicated high integrity of the phased diploid sweet orange genome assembly ([89]Tables S6–S8). Synteny blocks were identified and revealed an overall synteny between the homoeologous chromosomes in Newhall navel orange ([90]Figures 1, [91]S1 and [92]S2). Relatively low similarity values (37.41%–46.82%) were observed between the homoeologous chromosomes in Newhall navel orange ([93]Table S9). A total of 356 Mb (52%) of the assembled genome was masked and annotated as repeated sequences, of which 44.42% were long terminal repeat (LTR) retrotransposons, 1.74% were DNA transposable elements, 0.83% were long interspersed nuclear elements (LINE), and 0.06% were short interspersed nuclear elements (SINE) ([94]Figure 1, [95]Tables S2 and [96]S10). To reduce false positives in gene prediction, we combined de novo, homology, and RNA-seq based approaches ([97]Figure S3). Consequently, a total of 46,616 gene models were identified ([98]Tables S2 and [99]S11, [100]Figure S3), among which 45,431 (97.46% of total predicted genes) were protein-coding genes ([101]Table S12). The Newhall genome contained 22,916 genes on one chromosome set and 22,824 genes on the other chromosome set, and 876 genes unable to be assigned to either chromosome set. The annotated genes were classified into 26 COG categories including signal transduction, transcription, post-translational modification, protein turnover, and chaperones, carbohydrate transport and metabolism, translation, ribosomal structure, and biogenesis, intracellular trafficking, secretion, and vesicular transport, and secondary metabolites biosynthesis, transport, and catabolism ([102]Table S13), with more than 42.6% of predicted genes as functionally unknown. In addition, 11,221 non-coding RNA sequences were identified and annotated, including 931 micro RNAs (miRNAs), 1,200 transfer RNAs (tRNAs), 6,765 ribosomal RNA (rRNAs) and 2,325 small nuclear RNA (snRNA) ([103]Figure 1, [104]Table S14). We found most homologous gene pairs (13,257/15,423) have a Ka/Ks ratio <1, indicative of purifying selection ([105]Figure S4). A total of 894 gene pairs with a Ka/Ks ratio >1 may have undergone positive selection, which included genes involved in metabolic and biosynthetic process, and response to stress and stimulus ([106]Figure S5). Pan-genome analyses of genes that are differentially present in HLB tolerant and susceptible genotypes We explored the level of presence/absence sequence variation across the genus citrus and its relatives to dissect the genetic determinants of HLB pathogenicity. This was done using 9 previously assembled genomes of citrus and relatives and the newly sequenced Newhall genome. The 10 citrus accessions and relatives were classified into HLB- susceptible (6 citrus accessions) and-tolerant groups (4 citrus accessions) ([107]Table S15). Here we have defined HLB-tolerant trees as those showing vigorous growth (such as thick canopy and not dying) in the presence of CLas and HLB symptoms, whereas HLB-susceptible trees refer to those without vigorous growth (such as with thin canopy and dying) in the presence of CLas and HLB symptoms based on the description in the original publications. Pan-genome analyses of the 10 genomes identified 50,442 gene families. The total number of gene sets continued to increase with the addition of each genome and was approaching but did not reach a plateau at n = 10 ([108]Figure 2A), suggesting more high-quality genomes of citrus and its relatives are required. Specifically, 13,301, 4,559, 12,135, and 20,447 gene families were defined as core, soft-core, dispensable, and private genes, respectively ([109]Figure 2B). The fraction of core genes (core plus soft core) in the citrus pan-genome (35%) was in line with previous studies (35–87%).[110]^27^,[111]^28^,[112]^29^,[113]^30^,[114]^31^,[115]^32^,[1 16]^33^,[117]^34^,[118]^35 The dispensable and private gene families accounted for 64.6% of the total pan gene families, enlarging gene resources of citrus reference genome. The core gene families accounted for more than 51.7% genes of each genome ([119]Figure 2C), suggesting conserved genomic features among citrus and relatives. In addition, wild citrus species, such as Ichang papeda, citron, and kumquat, and primitive citrus Atlantia, showed higher proportion of private genes than cultivated citrus accessions ([120]Figure 2C). More than 47.2% of pan genome gene families were functionally unknown, which was higher than the diploid genome of C. sinensis ‘Newhall’ (42.3%) ([121]Figure S6, [122]Tables S13, and [123]S16). Figure 2. [124]Figure 2 [125]Open in a new tab Pan genome of Citrus and relatives (A) Rarefaction curve of detected genes in the pan and core genomes. (B) The composition of the pan genome, including core, soft core, dispensable and private genes (pie plot). The histogram depicting the number of gene families of pan genome under different presence frequency in 26 accessions of Citrus and relatives. (C) Presence and absence information of pan gene families in accessions of citrus and citrus relatives. (D) Gene number of each composition for each individual genome. HKC: Atalantia buxifolia, CSV: ‘Valencia’ sweet orange, FOR: Fortunella hindsii, CSN: ‘Newhall’ sweet orange, XZ: Citrus medica, XJC: Citrus ichangensis, HWB: pummelo, CMS: Citrus reticulata'Mangshan', PON: Poncirus trifoliata, and CLM: Clementine mandarin. Whole-genome alignment between the C. sinensis Valencia genome and each of the 9 other genome assemblies was performed. A total of 77,609 InDels (37,355 deletions, 40,254 insertions) were identified in addition to 3 million single nucleotide polymorphisms (SNPs) ([126]Figure S7, [127]Table S17). We searched genes with presence/absence variation between the HLB tolerant and susceptible groups ([128]Table S15). Because HLB is a pathogen-triggered immune disease,[129]^10 we paid close attention to immunity related genes. The genes families involved in plant immunity, such as CDPK, MAPK, NBS-LRR, NPR, hormone, PLCP, PR, RLKs, ROS, and antioxidant genes, were presented mostly in dispensable and private gene sets ([130]Figure 3A). Furthermore, HLB-susceptible citrus accessions were enriched for the plant immunity genes that were absent in HLB-tolerant accessions ([131]Figure 3B, [132]Table S18), suggesting a possible explanation for the stronger systemic and chronic immune responses observed in susceptible citrus accessions compared to those that are tolerant. Figure 3. [133]Figure 3 [134]Open in a new tab Plant immunity-related genes in citrus pan genome (A) The composition of plant immunity-related genes in citrus pan genome, including core, soft core, dispensable and private genes. (B) The HLB-tolerant (including resistant accessions) and-susceptible citrus accessions specific plant immunity-related genes. The plant immunity-related genes here refer to CDPK(Calcium-Dependent Protein Kinases), MAPK(Mitogen-Activated Protein Kinase Cascades), NBS-LRR (Nucleotide Binding Domain and Leucine-rich Repeat), NPR(Nonexpressor of Pathogenesis-related Genes), PLCP(Papain-Like Cysteine Proteases), PR (Pathogenesis-related Protein), RLKs (Receptor-Like Kinases), ROS(Reactive Oxygen Species), snakin, and plant hormone. We further analyzed group (HLB-tolerant and-susceptible)-specific genes related to immunity ([135]Table S15). Most of immunity related genes were present in all the accessions. Nevertheless, we have identified multiple interesting genes including orange1.1t03332.1 (NBS-LRR), orange1.1t04682.1 (NBS-LRR), orange1.1t05285.1 (PLCP, cysteine protease-like protein), Cs6g22310.1 (lectin), orange1.1t05183.1 (Leucine-rich repeat receptor-like protein kinase), Cs1g05340.1 (LRR-XII), Cs9g13810.1 (RLCK-XII/XIII), and Cs6g09910.5 (MAPKKK, Raf31), which were present in most HLB susceptible accessions (67–83%), but were absent in all HLB resistant accessions. On the contrary, only Cs2g10550.1 (Leucine-rich repeat receptor-like protein kinase), and Cs1g05370.1 (Serine-threonine protein kinase, plant-type) were present in 75% of four HLB tolerant accessions but absent in the 6 HLB susceptible accessions. In addition, 74 antioxidant biosynthesis and antioxidant enzyme genes were absent in susceptible citrus accessions but were present in one or more tolerant citrus accessions ([136]Data S1). Indel analyses of genes involved in plant immunity in 26 HLB-resistant,-tolerant, or-susceptible citrus accessions The pan-genome analysis of 10 assembled citrus genomes included only accessions susceptible or tolerant to HLB, but none with HLB resistance (refers to citrus plants with no HLB symptoms in the presence of CLas or inhibiting CLas growth). To search for genomic signatures associated with resistance to HLB we expanded our interspecific analysis of genomic variation to a panel of 26 citrus accessions and relatives with members exhibiting all three classes of response to HLB infection. The additional 16 accessions were selected based on their phylogenetic relationships and high-quality genome resequencing data (20–126 x coverages) ([137]Table S15). In total, we have identified 263,177, 26,891, and 1369 indels in the resistant, tolerant, and susceptible groups, respectively. Specifically, we identified indels in the coding region of 26 NBS-LRR, 3 receptor-like kinase, and 1 SOD genes in the resistant group, 4 NBS-LRR and 1 SOD genes in the tolerant group, 1 NBS-LRR and 1 receptor-like kinase genes in the susceptible group ([138]Table S19). Consistent with the model that HLB is a pathogen-triggered immune disease,[139]^10 HLB resistant/tolerant citrus accessions contain more indel mutations in plant immunity genes than HLB susceptible accessions that might contribute to the reduced immune responses in tolerant/resistant citrus genotypes compared to the susceptible genotypes.[140]^6^,[141]^7 GWAS analysis of citrus genes that are potentially responsible for the HLB resistance/tolerance or susceptibility GWAS has been widely used to understand the genetic basis of plant disease resistance and susceptibility. To further explore the genetic basis of HLB resistance, we performed GWAS analysis using a large panel of both HLB susceptible and resistant/tolerant varieties ([142]Figure 4, [143]Table S20, [144]Data S2) using whole genome sequences of 447 citrus accessions and relatives.[145]^36 We have sequenced 91 additional citrus genotypes ([146]Table S21) to complement public sequence datasets of 356 accessions. A total of 7.59 million SNPs for 447 citrus accessions and relatives were generated, which were subsequently filtered by quality and sequence depth, and SNPs with minor allele frequency (MAF) more than 0.01 and individual level missingness less than 10%. A total of 252,357 high quality SNPs across the whole genome were used for HLB GWAS analysis ([147]Figure S8). The principal component analysis based on SNP data suggested 447 citrus accessions and relatives showed population stratification, which was accounted in GWAS analysis ([148]Figure S9). The Quantile-Quantile Plot suggested the robustness of our GWAS analysis ([149]Figure S10). We found HLB-associated citrus genomic SNPs were from a large number of genes across whole genome ([150]Figure 4A), suggesting that the citrus genetic effect on HLB may be explained by the omnigenic model (a large number of genes). Such a model indicates that complex traits are influenced by core genes with direct effects as well as by a modest number of genes or pathways with small effects.[151]^37 252 SNPs, including 86 from coding region and 166 from non-coding region were significantly associated with HLB resistance/tolerance and susceptibility ([152]Figure 4A, [153]Data S2, adjusted pvalue <1e-5). 37% of genes (32/86) containing SNPs with significant association with HLB resistance/tolerance and susceptibility were involved in plant immunity, ROS, and stress response ([154]Figure 4B, [155]Data S2). However, no group-specific SNPs were identified for HLB resistant/tolerant and susceptible groups even though SNPs from 10 genes showed different patterns between HLB susceptible and tolerant/resistant accessions ([156]Figure 5, [157]Data S2). These 10 genes included PCS1 (Phytochelatin synthase1), mutation of which impairs callose deposition, bacterial pathogen defense and auxin content[158]^38^,[159]^39; CNGC1 (Cyclic nucleotide-gated ion channel 1) that acts as Ca^2+-permeable channels involved in Ca^2+ oscillations and receptor-mediated signaling during plant immunity[160]^40; two RLKs, pabAB (para-aminobenzoate synthetase) which has been shown to scavenge the reactive oxygen species in vitro[161]^41; and RPPL1 (NLB RPP13-like protein 1). Figure 4. [162]Figure 4 [163]Open in a new tab The genomic variations showing significant associations with citrus HLB based on GWAS (A) Manhattan plots depicting HLB (A) showing significant associations with citrus genomic variations. Points over the blue line represent SNPs showing significant associations with HLB (corrected P-value < 1e-5). (B) The functional pathway distribution of genes that contain the HLB associated SNPs. Figure 5. [164]Figure 5 [165]Open in a new tab The type of citrus HLB disease associated SNPs among different citrus varieties These SNPs were from genes involved in plant defense (AGAP1, LLR4, MIK2-LIKE, LRR2, RPPL1 and SUT1), and stress response (SEC11 and SPHK). AGAP1, ACYLATED GALACTOLIPID- ASSOCIATED PHOSPHOLIPASE 1; LLR4, MALECTIN-LIKE DOMAIN (MLD)- AND LEUCINE-RICH REPEAT (LRR)-CONTAINING PROTEIN 4; MIK2-LIKE, MDIS1-INTERACTING RECEPTORLIKE KINASE2 like; LRR2, Leucine-rich repeat protein 2; RPPL1, disease resistance RPP13-like protein 1; SUT1, SUPPRESSORS OF TOPP4-1; SEC11, signal peptidase I; SPHK, sphingosine kinase. Allele-specific expression (ASE) Next, we used the phased diploid genome assembly of C. sinensis to investigate whether ASE contributes to the differences in HLB response between C. sinensis ‘Valencia’, an HLB susceptible cultivar, and Sugar Belle mandarin LB9-9, an HLB tolerant cultivar. Both cultivars mainly contain genes originated from either mandarin or pummelo. Mandarin LB9-9 is tolerant to HLB, whereas Valencia sweet orange is susceptible to HLB.[166]^5 We conducted ASE analyses for two set of RNA-seq data of C. sinensis ‘Valencia’ and Sugar Belle mandarin LB9-9 including both HLB symptomatic and asymptomatic samples.[167]^42 A total of 1623 and 1668 genes with ASE were identified for Valencia and LB9-9, respectively ([168]Data S3). For Valencia, expression of 892 genes of the mandarin allele was higher than the allele with pummelo ancestry, with 731 displaying the reverse pattern with higher expression of the pummelo allele. For LB9-9, 1012 genes of mandarin origin had higher expression whereas 656 genes of pummelo origin had higher expression ([169]Data S3). We further compared the ASE genes and differentially expressed genes (DEGs) between HLB symptomatic vs. asymptomatic Valencia or LB9-9. Intriguingly, 615 DEGs in symptomatic vs asymptomatic Valencia and 484 DEGs of LB9-9 were also ASE genes. Only 177 genes were shared between the DEG and ASE analysis including 166 genes showing similar response to HLB in both Valencia and LB9-9 and only 11 genes showed opposite expression pattern to HLB ([170]Data S3). The 11 genes included orange1.1t03406 (RLK), Cs9g12460 (raffinose synthase), orange1.1t03953 (haloacid dehalogenase-like hydrolase domain-containing protein), Cs5g18710 (licodione synthase), Cs5g21200 (indole-3-acetate beta-glucosyltransferase), Cs4g03330 (mandrin 4-coumarate—CoA ligase 1), Cs2g19490 (leucine-rich repeat, cysteine-containing type RLK), Cs2g19440 (Glucose-6-phosphate dehydrogenase), Cs2g04330 (LRR-VIII-2 RLK), Cs2g01740 (peptidase aspartic), and Cs1g12660 (caffeoyl-CoA O-methyltransferase). Nine genes, including orange1.1t03406, orange1.1t03953, Cs5g18710, Cs5g21200, Cs4g03330, Cs2g19490, Cs2g19440, Cs2g01740, and Cs1g12660 were upregulated in sweet orange and downregulated in LB9-9 under CLas infection condition. Two genes, including Cs9g12460 and Cs2g04330 were downregulated in sweet orange and upregulated in LB9-9 under CLas infection condition ([171]Data S3). Discussion We have completed the phased diploid genome assembly of C. sinensis ‘Newhall’. There are currently nine assembled genomes for citrus and relatives.[172]^1^,[173]^21^,[174]^22^,[175]^23^,[176]^24^,[177]^25 Even though citrus and its relatives are diploid, none of the nine assembled genomes were chromosome-level phased genomes. A phased genome assembly with allelic information is critical for dissecting the special genetic characteristics, accurately evaluating somatic mutation calling and gene expression, as well as conducting allelic level analysis.[178]^43 The first chromosome-level phased genome of C. sinensis was completed for Valencia sweet orange.[179]^43 ‘Newhall’ navel is also sweet orange but with distinct characteristics from Valencia sweet orange, including the harvest time, navel, seeds, and juice contents. Completion of the chromosome-level phased genome for both Newhall and Valencia will provide useful resource to investigate the underlying genetic determinants for such traits. In addition, Newhall chromosome-level phased genome was successfully used in this study to identify multiple candidate genes contributing to the difference in HLB tolerance between mandarin LB9-9 and Valencia. We have sequenced 91 citrus accessions in this study. In total, 447 citrus accessions and relatives have been sequenced so far. Importantly, the power of GWAS is boosted when sample size is large.[180]^44 Imai and colleagues have demonstrated GWAS is suitable for citrus analysis based on analysis of 110 citrus accessions composed of landraces, modern cultivars, and in-house breeding lines.[181]^45 The large amount of whole genome sequencing data of citrus and relatives enabled GWAS analysis of putative genetic determinants of HLB resistance/tolerance and susceptibility. Mattia et al. conducted GWAS analysis of genes responsible for flavonoid biosynthesis of diverse mandarin accessions.[182]^46 Another study by Minamikawa et al. used GWAS analysis to identify putative genetic traits controlling fruit quality.[183]^47 In both studies, SNP arrays were used. With the advances of high-throughput sequencing, whole genome sequencing has become a viable genotyping technology for use in GWAS analyses, offering the potential to analyze a broader range of genome-wide variations.[184]^36 Whole genome sequence provides a significant advantage over array-based methods, with the potential to detect and genotype all variants present in a sample, not only those present on an array or imputation reference panel.[185]^36 However, few GWAS analyses have been performed for citrus using whole genome sequencing data to date, probably owing to the challenges in handling a large amount of genomic data. This GWAS analysis using whole genome sequencing data in this study has advanced our understanding of HLB resistance, tolerance or susceptibility in citrus accessions. Citrus genomic resources generated in the past and in this study enabled identification of candidate genes underlying the HLB resistance, tolerance, or susceptibility via Pan genome analysis of genes presence and absence in 10 citrus accessions, indel analyses of 26 citrus accessions, GWAS analysis of 447 citrus accessions, and ASE analysis of Valencia and mandarin LB9-9. Among the identified genes, NBS-LRR genes are the most abundant. This probably evolves from that CLas resides inside the phloem sieve elements[186]^2 and NBS-LRRs are involved in direct recognition of CLas proteins either on the surface or released.[187]^48^,[188]^49^,[189]^50 More unique NLRs genes were found in HLB-susceptible citrus accessions than in HLB-tolerant accessions, suggesting CLas might trigger more severe immune responses in HLB susceptible citrus genotypes. More indel mutations were identified in NLR genes in the HLB resistant group than in the susceptible group, which suggest that such mutations might lead to reduced immune responses to CLas in the HLB resistant group. In addition to NBS-LRR, genes encoding CDPK, PRRs, and RLCKs were also identified. CDPKs play critical roles in plant immunity, including regulation of oxidative burst, gene expression, and hormone signal transduction.[190]^51 PRRs including receptor-like kinases (RLKs) and receptor-like proteins (RLPs) usually localize on the membrane to detect MAMPs in the apoplast. It is possible that some PAMPs of CLas, such as LPS and flagella,[191]^52^,[192]^53 are released into the apoplast to be sensed by citrus cells. RLCKs including BOTRYTIS-INDUCED KINASE 1 (BIK1) and related PBS1-like kinases act synergistically with multiple PRRs to allow subsequent phosphorylation of the transmembrane NADPH-oxidase RESPIRATORY BURST OXIDASE D (RBOHD), the main producer of ROS during pathogen infection.[193]^54^,[194]^55^,[195]^56 Consistent with our observation, it was reported that CLas causes less expression changes in immunity genes for the HLB-tolerant US-897 than the HLB-susceptible ‘Cleopatra’ mandarin,[196]^7 which might result from the differences in the NLR, RLK, RLCK, and CDPK genes. Intriguingly, 74 antioxidant biosynthesis or antioxidant enzyme genes were absent in susceptible citrus accessions but were present in one or more tolerant citrus accessions. This is consistent with that higher antioxidant levels and antioxidant enzyme activities were reported to account for the higher tolerance of Persian triploid lime than Mexican lime.[197]^6 Because the presence/absence pattern of antioxidant biosynthesis or antioxidant enzyme genes was not universal, they are likely responsible for some citrus genotypes, but not all genotypes. For example, HLB-tolerant Sugar Belle mandarin LB9-9 has similar antioxidant enzyme activities as HLB-susceptible Valencia sweet orange but has higher phloem regeneration capacity than the later. Antimicrobial peptides were also reported to be responsible for HLB resistance in Australian Finger lime.[198]^11 Overall, this study has provided a phased chromosome-level genome assembly for C. sinensis ‘Newhall’, and sequenced 91 new citrus accession, which were used for identification of putative genes responsible for HLB resistance, tolerance, or susceptibility. Such genes should be further verified using other approaches such as mutagenesis or gene silencing. Identification of genetic determinants responsible for HLB resistance, tolerance, or susceptibility in citrus accessions provide useful targets for developing of HLB-resistant/tolerant citrus cultivars using the CRISPR genome editing tool.[199]^57^,[200]^58 Limitations of the study By analyzing HLB susceptible, tolerant, or resistant citrus genotypes, we have identified putative genetic determinants of HLB pathogenicity. However, none of them have been verified using experimental approaches such as RNAi or CRISPR technologies.[201]^59^,[202]^60 We have conducted pan-genome analyses using 10 genomes. However, the pan-genome of Citrus did not reach a plateau, which requires more high-quality genomes of citrus and its relatives. STAR★Methods Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Biological samples __________________________________________________________________ C. sinensis Osbeck cv. Newhall National Navel Orange Engineering Research Center, Ganzhou, Jiangxi Province, China N/A Leaf and root samples of citrus germplasms Lower Variety Grove, California Citrus State Historic Park, Riverside, California N/A __________________________________________________________________ Deposited data __________________________________________________________________ Genome sequencing data of C. sinensis'Newhall' This study SRA Bioproject: PRLNA810206. Genome sequencing data of new sequenced 91 citrus accessions This study SRA Bioproject: PRJNA698060 __________________________________________________________________ Software and algorithms __________________________________________________________________ Hifiasm v 0.15.4 Cheng et al., 2021[203]^61 [204]https://github.com/chhylp123/hifiasm ALLHIC v 0.9.8 Zhang et al., 2019[205]^62 [206]https://github.com/tangerzhang/ALLHiC juicebox v1.11.08 Durand et al., 2016[207]^63 [208]https://github.com/aidenlab/Juicebox BUSCO V10 Manni et al., 2021[209]^64 [210]https://github.com/WenchaoLin/BUSCO-Mod CEGMA v2 Parra et al., 2007[211]^65 [212]https://github.com/KorfLab/CEGMA_v2 BWA v 0.7.17 Li and Durbin, 2010[213]^66 [214]https://github.com/lh3/bwa TRF Benson, 1999[215]^67 [216]https://github.com/Benson-Genomics-Lab/TRF LTR_FINDER Xu and Wang, 2007[217]^68 [218]https://github.com/xzhub/LTR_Finder RepeatScout Lian et al., 2016[219]^69 [220]https://github.com/mmcco/RepeatScout RepeatMasker Tempel, 2012[221]^70 [222]https://github.com/rmhubley/RepeatMasker BLAST v2.2.26 Camacho et al., 2009[223]^71 [224]https://github.com/ncbi/blast_plus_docs GeneWise v2.4.1 Birney et al., 2004[225]^72 [226]https://bio.tools/genewise Geneid v1.4 Alioto et al., 2018[227]^73 [228]https://github.com/guigolab/geneid Genescan v1.0 Burge and Karlin, 1997[229]^74 [230]http://hollywood.mit.edu/GENSCAN.html GlimmerHMM v3.04 Majoros et al., 2004[231]^75 [232]https://github.com/kblin/glimmerhmm Trinity v2. 1. 1 Grabherr et al., 2011[233]^76; Haas et al., 2013[234]^77 [235]https://github.com/trinityrnaseq/trinityrnaseq Hisat v2.0.4 Kim et al., 2019[236]^78 [237]https://github.com/DaehwanKimLab/hisat2 StringTie v1.3.3 Pertea et al., 2015[238]^79 [239]https://github.com/gpertea/stringtie EvidenceModeler (EVM) v1. 1. 1 Haas et al., 2008[240]^80 [241]https://github.com/EVidenceModeler DIAMOND v0.8.22 Buchfink et al., 2015[242]^81 [243]https://github.com/bbuchfink/diamond tRNAscan-SE Lowe and Chan, 2016[244]^82 [245]https://github.com/UCSC-LoweLab/tRNAscan-SE SAMtools v1.10 Li et al., 2009[246]^83 [247]https://github.com/samtools/ GATK v4.1.9.0 McKenna et al., 2010[248]^84 [249]https://github.com/broadgsa/gatk CD-HIT v4.8 Li and Godzik, 2006[250]^85 [251]https://github.com/weizhongli/cdhit OrthoFinder v2.2.7 Emms and Kelly, 2019[252]^86 [253]https://github.com/davidemms/OrthoFinder SOAPnuke v1.5.3 Chen et al., 2018[254]^87 [255]https://github.com/BGI-flexlab/SOAPnuke Bowtie2 v 2.2.6 Langmead and Salzberg, 2012[256]^88 [257]https://github.com/BenLangmead/bowtie2 GEMMA v 0.98.1 Zhou and Stephens, 2014[258]^89 [259]https://github.com/genetics-statistics/GEMMA HTSeq-count v0.11.2 Anders et al., 2015[260]^90 [261]https://github.com/simon-anders/htseq DESeq2 Love et al., 2014[262]^91 [263]https://github.com/mikelove/DESeq2 [264]Open in a new tab Resource availability Lead contact Further information and requests for resources should be directed to, and will be fulfilled by, the lead contact, Nian Wang ([265]nianwang@ufl.edu). Materials availability Besides data, this study did not generate any new reagents or materials. Experimental model and subject details This study does not include experimental model or subjects. Method details Plant materials The Newhall navel sweet orange (C. sinensis Osbeck cv. Newhall) was sequenced in this study. The plant materials of Newhall navel sweet orange were obtained for genomic DNA and RNA extractions from the National Navel Orange Engineering Research Center, Ganzhou, Jiangxi Province, China. Leaf and root samples of citrus germplasms were collected from the Lower Variety Grove, California Citrus State Historic Park, Riverside, California for GWAS study. DNA and RNA extraction Genomic DNA for Illumina sequencing was extracted using the phenol-chloroform method.[266]^92 Genomic DNA was isolated using Nanobind Plant Nuclei Big DNA Kit (Circulomics Inc., Baltimore, MD, USA) following the manufacturer’s instructions for PacBio and Hi-C sequencing. For full length transcript sequencing (Iso-Seq), the RNA was extracted using an RNAprep Plant Kit (Qiagen, Valencia, CA, USA). Genomic DNA for BGI sequencing was extracted using MoBio Powersoil DNA extraction kit (MoBio Laboratories Inc. Carlsbad, CA, USA) following the manufacturer’s instructions. The quality of genomic DNA and RNA was evaluated using agarose gel electrophoresis and Qubit 3.0 Fluorometer (Life Technologies, USA). Library construction and sequencing Following the manufacturer’s protocol of short read DNA sequencing from Illumina,[267]^93 the library was prepared. After quality control, quantification, and normalization of the DNA libraries, 150-bp paired-end reads were generated using the Illumina NovaSeq 6000 platform according to the manufacturer’s instructions. PacBio HiFi SMRTbell Library with 15 kb DNA fragment was constructed following the manufacturer’s protocol. The constructed library was then sequenced by Pacbio Sequel II platform according to the manufacturer’s instructions. Hi-C libraries were prepared by a standard procedure.[268]^94 Formaldehyde solution was used to fix plant cells. Then, 2.5 M glycine was added to quench the cross-linking reaction. The crosslink DNA was treated with restriction enzymes (e.g., HindIII), creating a gap on both sides of the crosslinking point. The exposed DNA ends were repaired and covalently linked with biotin-14-dCTP (Invitrogen Life Technologies, Carlsbad, CA), and then connected by T4 DNA ligase (Invitrogen Life Technologies, Carlsbad, CA) to form a closed random circular DNA structure. Proteinase K (Invitrogen Life Technologies, Carlsbad, CA) was used to digest proteins at the junction point to disconnect the crosslink of the protein and DNA. Genomic DNA was extracted and later fragmented into 350 bp. Capturing and enriching biotin-labeled fragments was conducted via the affinity of streptomycin to biotin, which was used to construct Illumina library. Hi-C sequencing libraries were amplified by PCR and sequenced on Illumina HiSeq-2500 platform (PE 125 bp). The 91 citrus genomes for GWAS study were sequenced using BGISEQ500 platform. Shotgun genomic library preparation and sequencing were performed per the manufacturer’s protocol at BGI-Shenzhen, China. Briefly, 500 ng of input DNA was used for library generation and fragmented ultrasonically to yield 400 to 600 bp of fragments. DNA fragments were then end-repaired and A-tailed, and adaptors with specific barcodes were added. PCR amplification of DNA fragments was carried out to generate a single-strand circular DNA library. The DNA libraries were sequenced by BGISEQ500 using a paired-end 100-bp sequencing strategy. On average, more than 41.9 Gb of raw data were generated for each genomic sample for GWAS analysis. De novo genome assembly for Newhall navel sweet orange Hifiasm v 0.15.4[269]^61 with default parameters was used to assemble HiFi reads into scaffolds, which were corrected using short Illumina DNA reads. Based on Hi-C sequencing data, the assembled scaffold sequences were mounted to the near-chromosome level using ALLHIC v 0.9.8.[270]^62 According to chromosome interaction intensity using juicebox v1.11.08,[271]^63 the near-chromosome level genome was manually corrected into a chromosome-level genome. Genome quality assessment To assess the assembly quality of the assembled genomes, the completeness of the assembly was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) v10,[272]^64 and the Conserved Core Eukaryotic Gene Mapping Approach (CEGMA) v2.[273]^65 The coverage of assembled genomes was also calculated by mapping Illumina short reads to the assembly using Burrows-Wheeler Aligner (BWA).[274]^66 Repeat element identification Repeat elements of whole genome were identified using both homology alignment and de novo prediction. Tandem repeat was extracted using TRF[275]^67 by ab initio prediction. The de novo repetitive element library was constructed using LTR_FINDER,[276]^68 RepeatScout,[277]^69 and RepeatModeler.[278]^95 Homolog repeats were predicted based on Repbase database[279]^96 employing RepeatMasker software[280]^70 and its in-house scripts RepeatProteinMask with default parameters. Using uclust program, a non-redundant transposable element (TE) library (a combination of homolog repeats and de novo TE) was generated, which was applied to mask the genome using RepeatMasker software. Gene structure annotation Homology-based prediction, ab initio prediction, and RNA-Seq assisted prediction were employed to perform the gene model prediction. Genomic sequences were aligned to homologous proteins using tblastn v2.2.26[281]^71 with a threshold of E-value ≤ 1e−5. Based on the matched proteins from reference genomes, GeneWise (v2.4.1)[282]^72 software was used to predict gene structure. Augustus v3.2.3,[283]^97 Geneid v1.4,[284]^73 Genescan v1.0,[285]^74 GlimmerHMM v3.04,[286]^75 and SNAP_2013-11-29 were used for the automated de novo gene prediction. The transcriptome assembly was performed using Trinity v2. 1. 1[287]^77^,[288]^76 for the genome annotation. To identify exon region and splice positions, the RNA-Seq reads from leaf, root, and fruit tissues were aligned to genome using Hisat v2.0.4[289]^78 with default parameters. The alignment results were then used as input for StringTie v1.3.3[290]^79 with default parameters for genome-based transcript assembly. The non-redundant reference gene set was generated by merging genes predicted from three methods using EvidenceModeler (EVM) v1. 1. 1.[291]^80 Gene functional annotation Gene functions were assigned according to the best match by aligning the protein sequences to the Swiss-Prot[292]^98 using Blastp[293]^71 with a threshold of E-value ≤ 1e−5.[294]^99 The motifs and domains were annotated using InterProScan v5.31[295]^100 by searching publicly available databases, including Pro-Dom,[296]^101 PRINTS,[297]^102 Pfam,[298]^103 SMRT,[299]^104 PANTHER[300]^105^,[301]^106 and PROSITE.[302]^107 The Gene Ontology (GO)[303]^108 IDs for each gene were assigned according to the corresponding Inter-Pro[304]^109 entry. We predicted the proteins function by transferring annotations from the closest BLAST hit with a threshold of E-value <10^−5 from the Swiss-Prot[305]^98 and the NR databases[306]^110 using DIAMOND v0.8.22.[307]^81 We also mapped gene set to the KEGG[308]^111 and eggNOG databases[309]^112 to generate more comprehensive gene functional information. To infer the evolutionary trajectories of genes in Newhall navel orange, we calculated the number of substitutions per synonymous site (Ks) and the number of substitutions per nonsynonymous site (Ka) for each homologous gene pairs. A Ka/Ks ratio more than 1, less than 1, and equal to 1 indicates positive selection, purifying selection and neutral evolution, respectively.[310]^113^,[311]^114 Non-coding RNA annotation tRNAs were predicted using the program tRNAscan-SE.[312]^82 rRNA sequences were predicted using Blast[313]^71 with relative species’ rRNA sequences. Other ncRNAs, including miRNAs and snRNAs, were identified by searching the Rfam database[314]^115 with default parameters. Identification of genomic variations We used For SNPs and InDels identification, short sequencing reads were aligned to the high-quality sweet orange genome as the reference genome[315]^116 using BWA software v 0.7.17 (Li and Durbin, 2010). Duplicated mapping reads and unmapped reads were removed with the SAMtools v1.10.[316]^83 All genotype information for the polymorphic sites was generated using the GATK v4.1.9.0 population method.[317]^84 The generated SNPs and InDels were filtered by quality and sequence depth. Pan genome construction To construct the citrus pan genome, we generated the gene families from 10 citrus and relative genomes ([318]Table S1). For each genome, a gene containing CDS with 100% similarity to other genes was removed by using the cd-hit-est of CD-HIT v4.8 toolkit.[319]^85 Protein sequences of the remaining genes were subjected to homologous searching by BLASTp[320]^71 with default parameters. OrthoFinder v2.2.7 [321]^86 was used to deal with the BLAST result with default parameters to make gene family clustering. The gene families shared among all accessions were defined as core gene families. The gene families that were missed in one or two accessions were defined as softcore gene families. The gene families that were missed in more than two accessions were defined as dispensable gene families, and gene families that only existed in one accession were defined as private gene families. We used eggNOG database to generate gene functional information for the citrus pan genome. The eggNOG pathway enrichment analysis of core, softcore, dispensable, and private gene families was performed using Fisher exact test in R program. GWAS To generate comprehensive SNP genotyping data for GWAS analysis, we sequenced 91 new citrus genomes and collected 356 representative genomic data of citrus and its relatives from public database ([322]Tables S20 and [323]S21). Raw reads for each genomic sample were filtered by SOAPnuke (v1.5.3) with the parameters set as “filterMeta-Q 2-S-L 15-N 3 –P 0.5-q 20-L 60-R 0.5–5 0”.[324]^87 The trimmed reads were mapped to the sweet orange genomes,[325]^21^,[326]^116C. clementina,[327]^1 pummelo, Citron, kumquat, Atlantia, Papeda,[328]^22Citrus mangshanensis,[329]^24 and Swingle citrumelo[330]^117 genomes using Bowtie2 software[331]^88 to identify the citrus reads. The high quality paired-end short citrus genomic reads were mapped to pummelo (Citrus maxima)[332]^22 reference genome, which is of the highest-quality among sequenced genomes, using BWA software v 0.7.17.[333]^66 Duplicated mapping reads were removed with the SAMtools package v1.10.[334]^83 All genotype information for the polymorphic sites was generated using the GATK v4.1.9.0 population method.[335]^84 The generated SNPs were filtered by quality and sequence depth, and SNPs with minor allele frequency (MAF) more than 0.01 and individual level missingness less than 10% were retained for GWAS analysis. To perform the association between HLB and citrus genomic variations, we collected the HLB resistant or tolerant information for each citrus genotypes from previous studies ([336]Table S20). The population structure of citrus was estimated using PCA method.[337]^47GWAS for HLB was conducted using the univariate linear mixed model in the GEMMA package v0.98.1, which accounts for population stratification using the first two principal components (PCs) of population structure and kinship matrices.[338]^89 P-values for multiple testing were corrected using the FDR method in R program.[339]^118 All items with corrected P-values below 1e-5 were considered significant. Significantly associated SNPs were mapped to the citrus reference genome to acquire gene annotations. The genes containing significantly associated SNPs were functionally annotated using the KEGG and agriGO 2.0 databases.[340]^119 Allelic specific expression of transcriptomes of valencia sweet orange and Mandarin LB9-9 in response to CLas infection We generated the allelic specific expression of genes (ASE) of Valencia sweet orange and Mandarin LB9-9.[341]^27 Briefly, cleaned RNA-seq reads were first mapped against both alleles of genome of Newhall navel sweet orange using HISAT2 v2.0.4[342]^78 and SAMtools v1.10[343]^83 to select good-quality alignments. Reads uniquely aligned to the same chromosome in both alleles were preserved and assigned to the allele if the alignment had a higher score and fewer mismatches than to the other allele. Reads that failed to be assigned, which were mainly from homozygous genomic region or regions that were unassembled or haplotype unresolved, were not considered in downstream analyses. The allele-specific reads were mapped to the Newhall navel sweet orange consensus genome using HISAT2 v2.0.4 and SAMtools v1.10. Raw counts for each genes were quantified using HTSeq-count v0.11.2.[344]^90 Genes with total counts >10 in all biological replicates were analyzed for ASE using DESeq2 in R program.[345]^91 A gene was considered to have ASE if the expression difference of the two alleles was significantly greater than 2-fold (adjusted p < 0.05). Quantification and statistical analysis Statistical analysis The statistical analysis details have been described in methods and results. All the statistical analysis was perform using R program. Acknowledgments