Abstract Spiny head croaker (Collichthys lucidus), belonging to the family Sciaenidae, is a small economic fish with a main distribution in the coastal waters of Northwestern Pacific. Here, we constructed a nonredundant chromosome-level genome assembly of spiny head croaker and also made genome-wide investigations on genome evolution and gene families related to otolith development. A primary genome assembly of 811.23 Mb, with a contig N50 of 74.92 kb, was generated by a combination of 49.12-Gb Illumina clean reads and 35.24 Gb of PacBio long reads. Contigs of this draft assembly were further anchored into chromosomes by integration with additional 185.33-Gb Hi-C data, resulting in a high-quality chromosome-level genome assembly of 817.24 Mb, with an improved scaffold N50 of 26.58 Mb. Based on our phylogenetic analysis, we observed that C. lucidus is much closer to Larimichthys crocea than Miichthys miiuy. We also predicted that many gene families were significantly expanded (p-value <0.05) in spiny head croaker; among them, some are associated with “calcium signaling pathway” and potential “inner ear functions.” In addition, we identified some otolith-related genes (such as otol1a that encodes Otolin-1a) with critical deletions or mutations, suggesting possible molecular mechanisms for well-developed otoliths in the family Sciaenidae. Keywords: spiny head croaker, whole genome sequencing, chromosome-level genome assembly, otolith development, Sciaenidae Introduction Spiny head croaker (Collichthys lucidus), belonging to the family Sciaenidae, is a small economic fish with a main distribution in the coastal waters of Northwestern Pacific ([39]Cheng et al., 2012a), from Philippines, China to Japan. With excellent properties and good meat quality, spiny head croaker has been favored by Chinese consumers with a high market value, whereas it has been overfished in the Yangtze estuary area of China ([40]Hu et al., 2015). Furthermore, Sciaenidae fishes are well known for their well-developed otoliths ([41]Xu et al., 2016), which are acellular crystalline mineral deposits in the inner ears of various teleost fishes ([42]Pracheil et al., 2019). Otoliths, mainly composed of calcium carbonate and organic matrix, play vital roles in sound sensing, balance, linear acceleration, and gravity in bony fishes ([43]Schulz-Mirbach et al., 2019). Moreover, otoliths are widely applied in fisheries sciences, such as evaluation of fish populations or population migration patterns, and are essential for paleoichthyological and archeological studies ([44]Barrett, 2019; [45]Barnett et al., 2020; [46]Heimbrand et al., 2020). Several critical genes, including Otoconin-90 (Oc90), secreted protein acidic and rich in cysteine (SPARC), SPARC-like1 (SPARCL1), otopetrin-1 (otop1), otolin-1 (otol1), and Otolithmatrixprotein-1 (OMP-1), have been identified to be related to otolith growth and formation in various vertebrates including bony fishes ([47]Hurle et al., 2003; [48]Murayama et al., 2005; [49]Kang et al., 2008; [50]Petko et al., 2008; [51]Xu et al., 2016b; [52]Hołubowicz et al., 2017). Nevertheless, a systematic screening of otolith-related genes in Sciaenidae species has not been reported yet. With the rapid development of genome sequencing technology and genome-based bioinformatics methods, studies on aquatic genomes and related applications, such as molecular breeding, drug development, new biomaterials, and DNA barcoding technology, have been accumulated ([53]Zhang et al., 2019; [54]Houston et al., 2020; [55]Li et al., 2020; [56]Long et al., 2020). Whole-genome sequencing (WGS) of about 90 fishes has been published around the world by far ([57]Bian et al., 2019; [58]You et al., 2020). At present, genome studies on the Sciaenidae family have focused on the popular large yellow croaker (Larimichthys crocea) and its economic traits ([59]Wu et al., 2014; [60]Ao et al., 2015; [61]Gui et al., 2018; [62]Mu et al., 2018; [63]Chen et al., 2019), while there are only few reports on other Sciaenidae species such as spiny head croaker. Mitochondrial genome maps of spiny head croaker and candidate genes related to its sex determination have been examined by mitochondrial genome sequencing ([64]Cheng et al., 2012b), chromosome assembly ([65]Cai et al., 2019), and RNA sequencing ([66]Song et al., 2020). However, the genetic basis of well-developed otoliths in Sciaenidae species is still unknown. It is therefore necessary to explore genomic resources to gain insights into otolith development mechanisms and to accelerate genome-assisted improvements in biodiversity protection, breeding, and disease prevention of Sciaenidae species. In the past decades, Illumina short-read sequencing technology has been generally employed to assemble various fish genomes. However, with cost reduction of both short- and long-read sequencing technologies, more and more recent WGS projects have introduced PacBio long-read sequences in order to assemble high repetitive regions and improve assembly quality ([67]You et al., 2020). Here, we produced a nonredundant chromosome-level assembly of the spiny head croaker by combination of Illumina short reads, PacBio Single Molecule Real-Time (SMRT) long reads, high-throughput chromosome conformation capture (Hi-C) data, and transcriptome sequences. Moreover, we performed comparative genomics studies on candidate genes related to otolith development to figure out potential mechanisms for the well-developed otoliths in the family Sciaenidae. Materials and Method Sample Collection and Sequencing We extracted genomic DNAs from pooled muscle tissues of a wild female spiny head croaker and sequenced by using an Illumina HiSeq 2500 sequencing platform (San Diego, CA, United States) and a PacBio Sequel sequencing platform (Menlo Park, CA, United States). The construction of DNA libraries (insert sizes of 500 and 800 bp for Illumina, and 20 kb for PacBio) and subsequent sequencing were performed according to the standard protocols. In total, 52.48 Gb of raw Illumina data and five SMRT cells produced using the P6 polymerase/C4 chemistry, producing 35.24 Gb of PacBio long reads, were generated. After filtering by SOAPnuke (v.1.5.6; [68]Chen et al., 2017), we obtained 49.12 Gb of Illumina clean data and 35.24 Gb of PacBio data for subsequent assembly. To acquire a chromosome-level assembly of the genome, genomic DNAs were fixed with formaldehyde and were sheared by a restriction enzyme (MboI) to build a Hi-C library, and then sequenced by an Illumina HiSeq X Ten platform. A total of 185.33 Gb of 150 PE Hi-C data were generated. All sequenced data generated in this study were deposited in the CNGB Nucleotide Sequence Archive under Program no. CNP0001197. We also extracted muscle RNA for transcriptome sequencing by using a HiSeq 2500 platform. Furthermore, to obtain the full-length transcript, the mixed RNA sample from 13 tissues was transcribed to generate full-length cDNA, and the SMRT bell library was constructed using the SMRT bell Template Prep Kit. The libraries were then prepared for sequencing on the PacBio Sequel sequencing platform. Genome Assembly and Chromosome Assembly Firstly, we employed Kmerfreq ([69]https://github.com/fanagislab/kmerfreq) to estimate the genome size with 17-bp k-mers and applied GenomeScope (v1.0; [70]Vurture et al., 2017) to estimate genome heterozygosity. Subsequently, a hybrid genome assembly pipeline was employed to obtain genome assembly. Short Illumina reads were first assembled by using Platanus with “-m 300 -k 27 -s 3” ([71]Kajitani et al., 2014), and DBG2OLC ([72]Ye et al., 2016) was performed to combine Platanus-generated contigs with PacBio reads to generate a hybrid contig assembly with default parameters. Pilon (v.1.225; ([73]Walker et al., 2014) was employed to polish the hybrid assembly. After then, redundancies of the primary assembly were removed by Redundans ([74]Pryszcz and Gabaldón, 2016) with “--identity 0.85 --overlap 0.36.” We performed quality control of Hi-C raw reads and obtained valid Hi-C-connected reads by Juicer (v.1.5; [75]Durand et al., 2016). A 3D de novo assembly (3D-DNA, v.180922; [76]Dudchenko, et al., 2017) pipeline ([77]Dudchenko et al., 2017) was applied to anchor primary contigs into chromosome-level scaffolds. Completeness of the genome assembly was evaluated using by BUSCO v3.0 ([78]Simão et al., 2015) with “-l actinopterygii_odb9 -m genome -c 3 -sp zebrafish.” De Novo Assembly of Transcriptomes We de novo assembled the RNA-seq reads using the Trinity assembler (v2.9.0; [79]Haas et al., 2013) and TGI clustering tool (TGICL; [80]Pertea et al., 2003). The PacBio ISO-Seq3 pipeline ([81]https://github.com/PacificBiosciences/IsoSeq) was used to obtain full-length non-chimeric (FLNC) transcripts via ccs, classify, cluster, and polish stage. FLNC was aligned with genome by minimap2. Gene Prediction and Annotation Repetitive elements in the spiny head croaker genome were identified through a combination of homolog-based and de novo approaches. For the homolog-based method, RepeatMasker (v.4.0.7) ([82]Smit et al., 2019) and RepeatProteinMask (v.4.0.7; [83]Smit et al., 2019) were used to detect repeats by aligning against the Repbase database (v 21.0; [84]Bao et al., 2015). For the de novo method, LTRharvest ([85]Ellinghaus et al., 2008) was applied to predict full long terminal repeat (LTR) retrotransposons. RepeatModeler (v1.0.11; [86]Smit et al., 2019) was employed to build transposable element (TE) consensus sequences as a de novo TE library, and TRF (v.4.09; [87]Benson, 1999) was used to obtain tandem repetitive sequences. RepeatMasker was then used to discover and identify repetitive sequences with the combined library of the de novo TEs. Based on the repeat masked genome, we employed de novo, homology-based, and transcriptome-based prediction methods to annotate protein-coding genes in the assembled genome. Protein sequences of zebrafish (Danio rerio), three-spined stickleback (Gasterosteus aculeatus), Atlantic cod (Gadus morhua), channel catfish (Ictalurus punctatus), spotted gar (Lepisosteus oculatus), Nile tilapia (Oreochromis niloticus), fugu (Takifugu rubripes), downloaded from Ensembl ([88]Hunt et al., 2018), and large yellow croaker (L. crocea, GCF_000972845.2) from NCBI were aligned to the spiny head croaker genome by tBLASTn ([89]Kent, 2002) with “-e 1e-5.” Subsequently, GeneWise ([90]Birney et al., 2004) was used to predict gene structures from BLAST hits. Augustus (v3.3.1; [91]Stanke et al., 2006) was performed to predict de novo genes. We obtain 3,000 intact gene models generated from the homolog-based method randomly to train the parameters of AUGUSTUS then used AUGUSTUS to perform de novo prediction based on the repeat-masked genome with the training parameters. These gene sets that were predicted by different methods were integrated into a nonredundant gene set through the pipelines described in previous research ([92]Xiong et al., 2016). After that, the combined gene set was modified with transcriptome data through PASA (v2.3.3; [93]Haas et al., 2003). Gene functional annotation was performed based on consensus of sequence and domain. The protein sequences were aligned to NCBI Non-Redundant Protein Sequence (NR) databases, Kyoto Encyclopedia of Genes and Genomes (KEGG v89.0; [94]Kanehisa and Goto, 2000), SwissProt, and TrEMBL (Uniprot release 2020-06) ([95]Boeckmann et al., 2003) by BLASTp with “-e 1e-5.” The domains were searched and predicted by using InterProScan v5.11–55.0 ([96]Zdobnov and Apweiler, 2001) ([97]Jones et al., 2014) with publicly available databases including PANTHER ([98]Thomas et al., 2003), Pfam ([99]Bateman et al., 2004), PRINTS ([100]Attwood et al., 2000), ProDom ([101]Servant et al., 2002), PROSITE profiles ([102]Sigrist et al., 2010), and SMART ([103]Letunic et al., 2012). Gene ontology (GO) terms ([104]Ashburner et al., 2000) for each gene were predicted from the InterPro descriptions. Genome Evolution and Gene Family Analysis In order to identify gene families in spiny head croaker, we collected protein sequences of the same species used for homologous annotation as well as miiuy croaker (Miichthys miiuy) and performed the TreeFam methodology ([105]Li et al., 2006) to obtain gene families of these species. We then used RaxML ([106]Stamatakis, 2006) to construct the phylogenetic tree by using the single copy orthologous gene families with the GTRGAMMA model. To identify the synteny between spiny head croaker and large yellow croaker, BLASTp was used to calculate pairwise similarities (e value < 1e-5), and MCScanX package with default parameters was then used for classification. Then, JCVI was performed to generate visualization. A MCMCtree program in PAML (v4.9e; [107]Yang, 2007) was performed to estimate the divergence time between various species in the phylogenetic tree with the REV substitution model. Three calibration time points based on the TimeTree database ([108]http://www.timetree.org) were used as references (T. rubripes-G.