Abstract

Background: Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases and in the regulation of gene expression. Long-read sequencing (LRS) offers a potential solution for genome-wide STR analysis. However, characterization of STRs in human genomes using LRS at a large population scale has not been reported.

Methods: We conducted a large LRS-based STR analysis in 193 unrelated samples from the Chinese population and performed genome-wide profiling of STR variation in the human genome. The repeat dynamic index (RDI) was introduced to evaluate the variability of STRs. We sourced expression data from the Genotype-Tissue Expression project to explore the tissue specificity of genes related to highly variable STRs. Enrichment analyses were also conducted to identify potential functional roles of the highly variable STRs.

Results: This study reports a large-scale analysis of human STR variation by LRS and offers a reference STR database based on the LRS dataset. We found that disease-associated STRs (dSTRs) and STRs associated with the expression of nearby genes (eSTRs) were highly variable in the general population. Moreover, tissue-specific expression analysis showed that genes related to these highly variable STRs had their highest expression levels in brain tissues, and pathway enrichment analysis found that these STRs are involved in synaptic function-related pathways.

Conclusion: Our study profiled the genome-wide landscape of STRs using LRS and highlighted the highly variable STRs in the human genome, providing a valuable resource for studying the role of STRs in human disease and complex traits.

Keywords: short tandem repeats, long-read sequencing, highly variable STRs, TRcards, database, brain tissue, synaptic function

Introduction

Short tandem repeats (STRs) are abundant repetitive elements composed of recurring DNA motifs of two to six bases.
Due to their repetitive nature, STRs have the highest mutation rate in the genome and are typically polymorphic. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases (Gymrek, 2017; Hannan, 2018). STR expansions in coding or non-coding regions are linked to more than 50 known disorders (Depienne and Mandel, 2021), many of which affect the nervous system. Well-known examples of STR expansion diseases in protein-coding regions are the “polyglutamine” (PolyQ) diseases (e.g., Huntington disease and spinocerebellar ataxia), caused by variable stretches of the repeated trinucleotide CAG. Non-coding repeat expansions are even more diverse and can occur in the 5′ UTRs, introns, or 3′ UTRs of genes. Their impact strongly depends on the type, length, and location of the repeat motif within the gene. Examples of these repeat disorders include Fragile X syndrome (FXS), caused by CGG repeats, and myotonic dystrophy type 1 (DM1), caused by CTG repeats (Tang et al., 2017; Trost et al., 2020; Depienne and Mandel, 2021).

Recently, deep whole-genome sequencing (WGS) and gene expression data collected by the Genotype-Tissue Expression Project (GTEx) were leveraged to identify more than 28,000 STRs in 17 tissues for which the number of repeats was associated with the expression of nearby genes, termed expression STRs (eSTRs). These eSTRs were then ranked with a statistical fine-mapping framework to prioritize potentially causal eSTRs, and 5% of them were referred to as fine-mapped eSTRs (FM-eSTRs) (Fotsing et al., 2019). It is becoming increasingly clear that STRs across the genome are likely to contribute widely to complex polygenic traits.
In these cases, smaller expansions or contractions may subtly increase or decrease the risk for a trait and act together to modulate an individual’s disease risk (Gymrek et al., 2016; Fotsing et al., 2019; Jakubosky et al., 2020).

Genome-wide surveys of STRs in individual genomes have become feasible with the development of high-throughput sequencing technologies. Most studies have genotyped STRs using whole-genome sequence data based on short-read sequencing (SRS) (Willems et al., 2014; Tang et al., 2017; Mousavi et al., 2019; Trost et al., 2020; Mitra et al., 2021). However, the intrinsic limitations of SRS prevent comprehensive characterization of all STRs and the discovery of novel disease-relevant repeat expansions that are longer than the read length (Gymrek, 2017; Liu et al., 2020).

Long-read sequencing (LRS) technologies offer a promising solution for genome-wide STR analysis. Current LRS technologies, such as Pacific Biosciences (PacBio) sequencing and Oxford Nanopore Technologies (ONT) sequencing, achieve average read lengths above 10 kb, so reads are likely to span entire tandem repeats together with their unique flanking sequences (Pollard et al., 2018; Midha et al., 2019; Amarasinghe et al., 2020; Logsdon et al., 2020). LRS has recently been applied to genotype long and complex repeats, such as the C9orf72 GGGGCC expansion implicated in frontotemporal lobar degeneration and a complex pentamer repeat in SAMD12 implicated in myoclonus epilepsy (Zeng et al., 2019; Mitsuhashi and Matsumoto, 2020; DeJesus-Hernandez et al., 2021). Additional human diseases caused by STR expansions have also been reported in recent studies that used LRS (Sone et al., 2019; Tian et al., 2019; Zeng et al., 2019; Deng et al., 2020).

The normal ranges of different STRs may vary significantly in the general population.
Thus, knowledge of the normal repeat ranges of STRs is critically important for determining the pathogenicity of observed repeats in known STRs and for discovering novel disease-relevant repeat expansions (Liu et al., 2020). To the best of our knowledge, although studies have detected and characterized STRs in human genomes using LRS in selected small datasets, analysis at scale has not been reported (Liu et al., 2020). Herein, we conducted a large-scale analysis of human STR variation by LRS in the Chinese population and developed a reference STR database, named TRcards, from an LRS dataset of 193 individuals. In addition, we performed genome-wide profiling of STR variation in the human genome with LRS data, evaluated the variability of STRs, and characterized the highly variable STRs.

Materials and Methods

Participants

A set of 193 unrelated Chinese individuals was included in our study for ONT sequencing. Of these, 102 (52.85%) were male and 91 (47.15%) were female. Ages ranged from 26 to 85 years, with a median of 50 years. This study was approved by the Ethics Committee of Xiangya Hospital, Central South University. All participants gave informed consent.

Long-Read Whole-Genome Sequencing

The DNA samples sequenced in this study were isolated from whole blood and sequenced using a PromethION sequencer (Oxford Nanopore Technologies). Library preparation was carried out using a 1D Genomic DNA ligation kit (SQK-LSK109) according to the manufacturer’s protocol. For each individual, one PRO-002 (R9.4.1) flow cell was used. PromethION base-calling was performed using Guppy v3.3.0 (Oxford Nanopore Technologies), and only pass reads (Qscore ≥7) were used for subsequent analysis (Sun et al., 2020). Sample LNT00178 was also sequenced with the PacBio Sequel II platform.
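As an illustration of the pass-read criterion applied to the ONT data (Qscore ≥7), the following minimal sketch computes a read-level Q-score from a FASTQ quality string. It assumes Phred+33 encoding and assumes that the read-level score is derived from the mean per-base error probability rather than from the arithmetic mean of the per-base Q-scores. The function names and the `min_qscore` parameter are hypothetical and are not part of Guppy or of the pipeline used in this study.

```python
import math


def mean_qscore(quality_string, phred_offset=33):
    """Read-level Q-score computed from the mean per-base error probability.

    Assumes Phred+33 encoded qualities (hypothetical helper, not a Guppy API).
    """
    if not quality_string:
        return 0.0
    # Convert each quality character to an error probability: p = 10^(-Q/10)
    error_probs = [10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string]
    mean_error = sum(error_probs) / len(error_probs)
    # Convert the averaged error probability back to the Phred scale
    return -10 * math.log10(mean_error)


def is_pass_read(quality_string, min_qscore=7.0):
    """Return True if the read meets the assumed pass threshold (Q >= 7)."""
    return mean_qscore(quality_string) >= min_qscore


if __name__ == "__main__":
    uniform = "+++++"   # five bases, each Q10
    mixed = "!I"        # one Q0 base and one Q40 base
    print(round(mean_qscore(uniform), 2), is_pass_read(uniform))
    print(round(mean_qscore(mixed), 2), is_pass_read(mixed))
```

Averaging error probabilities means that a few very low-quality bases dominate the read-level score: in the second example the arithmetic mean of the per-base Q-scores is 20, but the read would still fail a Q7 threshold under this assumption.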
High molecular weight (HMW) DNA was extracted, and HiFi libraries were constructed using the SMRTbell Express Template Prep Kit v2 and SMRTbell Enzyme Clean Up Kit (PacBio) (Du et al., 2021). Size selection was performed with SageELF, and 15 kb fragments were chosen for sequencing on the Sequel II platform using 30 h movies. The resulting raw subreads were then converted to circular consensus sequencing (CCS) reads using the CCS v4.2 algorithm with the parameters --minPasses 3 --minPredictedAccuracy 0.99. Furthermore, ONT data for HG002 and the corresponding PacBio CCS data were downloaded from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/Ashk