Abstract Schizophrenia is a neuropsychiatric disorder with heritability estimates between 60% and 80%. Although genome-wide association studies have identified many genetic loci linked to the disorder, most of these are noncoding variants whose functional impacts are not well understood. To bridge this gap, we prioritized potential functional variants linked to schizophrenia by utilizing a human brain epigenomic roadmap. We assessed the regulatory activity of these variants using an adapted STARR-seq screening method across four cell lines: Neuro-2a, SH-SY5Y, HEK-293T, and PC-12. Furthermore, we pinpointed candidate target genes through functional characterization and investigated their roles using zebrafish models. Our study identified 351 candidate single nucleotide polymorphisms (SNPs). Among these, 46 SNPs exhibited biased allelic enhancer activity, termed baaSNPs, with notable cell-type specificity. Chromatin interaction profiling and expression quantitative trait loci analyses linked these baaSNPs to 217 candidate target genes, and pathway enrichment analysis indicated that these genes are involved in critical neurological processes such as synaptic transmission and GABAergic signaling. One baaSNP in particular, rs13072690, showed regulatory effects across all examined cell lines and was associated with reduced expression of the PCCB gene in multiple brain regions. Heterozygous pccb knockout zebrafish exhibited abnormal behaviors, including hyperactivity, increased anxiety-like responses, and social deficits. In conclusion, our study demonstrates a valuable strategy for functionally annotating putative risk variants, complementing epigenetic approaches by providing insights into the regulatory potential of noncoding variants associated with schizophrenia. These findings offer promising candidates for further research into the underlying mechanisms of the disorder. graphic file with name 42003_2025_8456_Figa_HTML.jpg Subject terms: Functional genomics, Schizophrenia __________________________________________________________________ Functional screening of schizophrenia-associated variants using STARR-seq identifies cell-type-specific regulatory SNPs, highlighting the role of rs13072690; zebrafish models confirm behavioural effects, offering insights into noncoding variant functions in schizophrenia. Introduction Schizophrenia (SCZ) has a substantial genetic component, with heritability estimates ranging from 60%−80%^[56]1. Although genome-wide association studies (GWASs) have identified nearly 300 genetic loci associated with SCZ^[57]2–[58]7, most of these are in noncoding regions of the genome. This presents a challenge in determining which genes are functionally relevant to the disorder. Identifying causal variants and understanding their functional impacts are crucial for linking these genetic loci to the biological mechanisms underlying SCZ. While computational fine-mapping techniques can predict potential causal variants by analyzing linkage disequilibrium patterns^[59]8, they often lack functional validation and may produce different sets of candidates depending on the algorithm used^[60]9. Causal variants are thought to contribute to SCZ by altering gene expression, particularly through changes in transcription factor binding and the function of regulatory elements^[61]10. Increasing evidence suggests that disease-risk variants often reside within enhancer elements, where they can alter gene transcription via spatial enhancer-promoter interactions^[62]11–[63]13. Disruptions in gene expression play a critical role in the onset and progression of various diseases, including SCZ. Therefore, systematically identifying disease-risk single nucleotide polymorphisms (SNPs) that affect enhancer activity and mapping their target genes is essential for advancing our understanding of the molecular mechanisms underlying SCZ. Traditional methods for assessing the regulatory functions of individual candidate SNPs, such as dual-luciferase reporter assays and CRISPR-Cas9 techniques^[64]14,[65]15, offer high accuracy but are limited by their low throughput. These approaches are inefficient when scaling to hundreds of genetic loci^[66]16,[67]17, making them impractical for comprehensive genomic studies. Recently, high-throughput assays like self-transcribing active regulatory region sequencing (STARR-seq)^[68]18 and massively parallel reporter assays (MPRA)^[69]19 have emerged as powerful tools. These assays allow for the simultaneous evaluation of regulatory activity across large sets of DNA sequences, providing a more efficient and scalable approach for identifying functional variants^[70]18. For example, a recent study used a MPRA to examine fine-mapped SCZ GWAS variants in human neural progenitor cells, identifying 439 variants with allelic regulatory effects^[71]10. Despite these advancements, a comprehensive understanding of the biased allelic enhancer activity of SCZ-associated risk variants remains to be achieved. Regulatory elements exhibit significant cell type specificity^[72]20, indicating that the effects of functional variants may vary by cell type. To thoroughly identify potential functional variants in the brain, it is crucial to catalog the regulatory elements active in each brain cell type, considering both organismal and regional contexts. Such datasets are essential for understanding the functional relevance of genetic risk loci in the molecular pathogenesis of SCZ. Extensive research has mapped cell-type-specific gene regulatory landscapes in the human brain, particularly during early developmental stages^[73]21, using organoid culture systems^[74]22–[75]24 and induced pluripotent stem cell-derived models^[76]25,[77]26. Additionally, chromatin accessibility has been profiled in postmortem adult human brain tissue^[78]27–[79]30. These comprehensive datasets are valuable for identifying risk variants associated with SCZ through multi-omic approaches. In this study, we leveraged human brain epigenomic datasets and applied a multi-omics framework^[80]31 to predict risk variants associated with SCZ. We identified 351 candidate risk variants and employed a modified STARR-seq approach to systematically evaluate their regulatory activity across four cell lines: HEK-293T, SH-SY5Y, Neuro-2a, and PC-12. These cell lines were selected for their relevance to neuronal biology and experimental utility^[81]32–[82]35. To link SNPs with biased allelic enhancer activity (baaSNPs) to their target genes, we used two complementary genomic strategies: chromatin interaction profiling (Hi-C) and expression quantitative trait loci (eQTL) analysis. Through this valuable approach, we identified candidate target genes linked to baaSNPs associated with SCZ and further investigated their functions using zebrafish models. Results Identification of candidate risk variants using a multi-omic approach To capture a broad range of potential variants associated with SCZ, we first collected all SNPs with p-values <5 × 10^−7 from recent GWASs^[83]2–[84]7. We then included SNPs that co-localized with GWAS and eQTL signals from the Genotype-Tissue Expression (GTEx) project (version 8)^[85]36. To ensure comprehensive coverage, we added SNPs in strong linkage disequilibrium (r^2 > 0.8) with the top variants identified. This process yielded a total of 83,976 SCZ-associated SNPs. Using this extensive catalog, we implemented a tiered multi-omic strategy to predict candidate risk variants (Fig. [86]1A)^[87]31. This approach integrated data from bulk assay for transposase-accessible chromatin using sequencing (ATAC-seq)^[88]37, single-cell ATAC-seq (scATAC-seq, Fig. [89]1B)^[90]38, and high-throughput chromosome conformation capture (HiChIP) experiments^[91]39,[92]40 to systematically prioritize SNPs that are likely to impact gene regulation. Through this comprehensive approach, we identified 351 candidate risk regulatory variants associated with SCZ (Fig. [93]1C–D; Supplementary Data [94]1). Fig. 1. Identification of Candidate Risk Variants Associated with SCZ. [95]Fig. 1 [96]Open in a new tab A Schematic representation of the analytical framework used to prioritize SNPs and predict their functional significance. B Chromatin accessibility across various adult brain regions, revealed by single-cell ATAC-seq (scATAC-seq), highlights potential regulatory elements^[97]31 (C) A Manhattan plot displays the 351 candidate risk variants associated with SCZ identified through our multi-omic approach. D Overlap of these candidate risk variants across different GWAS were shown, indicating shared findings among studies. STARR-seq identifies SNPs with biased allelic enhancer activity To systematically assess the regulatory activity of these candidate SNPs, we adapted the STARR-seq method using synthesized oligonucleotides. We selected four cell lines for this analysis: HEK-293T, Neuro-2a, SH-SY5Y, and PC-12 cells. For each of the 351 candidate variants (Supplementary Data [98]1), we synthesized a 200-base-pair oligonucleotide centered on the reference and alternative alleles to evaluate functional activity. These oligonucleotides were cloned into the hSTARR-seq_ORI vector^[99]41 to create a library containing all SNPs. We introduced this library into each of the four cell lines. For each variant, we quantified both the plasmid library (input) and the RNA transcription products (output) using next-generation sequencing, followed by data analysis (Fig. [100]2A; Materials and Methods). Fig. 2. STARR-seq identification of SCZ-associated regulatory SNPs with biased allelic enhancer activity. [101]Fig. 2 [102]Open in a new tab A Schematic representation of the high-throughput STARR-seq workflow used to assess biased allelic activity of 351 candidate schizophrenia-associated variants. This process involves synthesizing oligonucleotides centered on each SNP, cloning them into the hSTARR-seq_ORI vector, and introducing them into four cell lines for analysis. B Volcano plots displaying the differences in regulatory activity for all SNP allele fragments across four cell lines: HEK-293T, Neuro-2a, SH-SY5Y, and PC-12. The fold change (FC) represents the expression change between the output (RNA transcripts) and input (plasmid library). SNPs showing significantly increased expression (log₂FC > 0.585, false discovery rate [FDR] < 0.05) are highlighted in red. Those with significantly decreased expression (log₂FC<−0.585, FDR < 0.05), identified as candidate silencers, are depicted in blue. All other SNP fragments, considered inactive, are shown in gray. C Heatmap illustrating biased allelic enhancer SNPs (baaSNPs) identified across the four cell lines. The full cell color represents the log₂ fold change (logFC), using a diverging color scale centered at 0 (blue for negative logFC, red for positive logFC, white for neutral), with values capped at −6 and +6. Statistical significance is indicated with asterisks (*p < 0.05, **p < 0.01, ***p < 0.001). To evaluate the regulatory activity of the SNP-containing fragments, we compared output reads to input reads using the voom method^[103]42. This analysis identified SNP fragments with potential enhancer activity—termed eSNPs—for either the reference or alternative allele. Specifically, we found 78 eSNPs in HEK-293T cells, 83 in Neuro-2a cells, 68 in SH-SY5Y cells, and 178 in PC-12 cells (log₂ fold change > 0.585, false discovery rate < 0.05; Fig. [104]2B; Supplementary Data [105]2). We further explored SNPs exhibiting biased allelic enhancer activity effects (baaSNPs) among the eSNPs. Using the mpralm method^[106]43, we analyzed changes in allelic expression ratios between the output and input libraries. This analysis identified 12 baaSNPs in HEK-293T cells, 10 in Neuro-2a cells, 14 in SH-SY5Y cells, and 24 in PC-12 cells (Fig. [107]2C; Supplementary Data [108]3). Notably, most baaSNPs (82.6%, n = 38) were cell-specific, with only eight loci shared across at least two cell types. The SNP rs13072690 was shared among all four cell types (Fig. [109]2C). Target gene prediction for baaSNPs using eQTL and Hi-C analyses To identify potential target genes for the baaSNPs, we employed an integrative approach that combined eQTL data with analyses of chromatin interactions. This strategy enabled us to leverage both gene expression associations and the spatial organization of the genome to predict candidate genes. We compared our baaSNPs with eQTLs detected in the adult brain, which is relevant for understanding the complex gene regulatory networks involved in neuropsychiatric disorders like SCZ^[110]44. Interestingly, only 38% of the baaSNPs exhibited eQTL signals in the adult brain. In total, we identified 53 candidate target genes associated with the baaSNPs. Specifically, this included 16 genes in HEK-293T cells, 11 genes in Neuro-2a cells, 18 genes in SH-SY5Y cells, and 27 genes in PC-12 cells based on STARR-seq data, with three genes—PCCB, PPP2R3A, and RP11-102M11.2—common across all four cell lines (Fig. [111]3A; Supplementary Data [112]4). To explore the biological roles of these genes, we conducted a Gene Ontology pathway enrichment analysis, using all genes with brain eQTLs as the background set. Our analysis revealed several significantly enriched pathways; notably, GABAergic synaptic transmission emerged as the most directly relevant to neuronal function and SCZ pathology. Additionally, other enriched processes point to broader impacts on neuronal development, signal transduction, and cellular architecture (Fig. [113]3B). Fig. 3. Prediction and functional characterization of target genes regulated by baaSNPs. [114]Fig. 3 [115]Open in a new tab A A circular diagram illustrates 53 candidate target genes predicted to be regulated by SNPs exhibiting biased allelic enhancer activity (baaSNPs) across four cell lines, based on expression quantitative trait loci (eQTL) analysis. The p-values shown are derived from the eQTL study. B Bar graphs display the most significantly enriched biological pathways among these potential target genes across the four cell lines, highlighting key neurological processes. C Chromatin loop data from neurons at three different developmental stages were utilized to map baaSNPs to their target genes, providing spatial context for gene regulation. D A Venn diagram depicts the overlap between genes associated with baaSNPs through eQTL analysis and those identified via Hi-C chromatin interaction data, indicating distinct but complementary sets of genes. E Results from a dual-luciferase reporter assay demonstrate the biased allelic enhancer activity of SNP rs3088186, confirming its functional impact. Bars represent the mean ± SEM. F An example locus of the MSRA gene illustrates that the baaSNP rs3088186 physically interacts with the MSRA promoter in neural progenitors (germinal zone), immature neurons (cortical plate), and adult neurons. Epigenetic annotation data were obtained from the UCSC Genome Browser. Recognizing that baaSNPs might influence genes through long-range interactions not captured by eQTLs, we utilized chromatin interaction data from the human brain. We analyzed chromatin loop data from neural progenitors (germinal zone), immature neurons (cortical plate), and adult neurons (Fig. [116]3C). This neuronal chromatin interaction analysis linked 46 baaSNPs to 168 protein-coding genes (Supplementary Data [117]5). Remarkably, only four genes overlapped between the genes assigned by Hi-C data and those assigned by eQTL data (Fig. [118]3D; Supplementary Data [119]6), indicating that these methods identify largely distinct gene sets. As an example of a gene interacting with a baaSNP, we focused on rs3088186 within the MSRA locus. Dual-luciferase reporter assays confirmed that this variant exhibited significant differences in regulatory activity between alleles, consistent with our STARR-seq results (p < 0.05, Fig. [120]3E). The variant is located within a neuronal-specific enhancer marked by H3K27ac, indicating its activity in neurons^[121]20 but not in other brain cell types. It resides within an intron of MSRA, a gene involved in cellular antioxidant defense and previously associated with SCZ^[122]45. Chromatin interaction studies revealed that rs3088186 interacts with the MSRA promoter in neural progenitors, immature neurons, and adult neurons (Fig. [123]3F). This enhancer-promoter interaction suggests that the variant affects MSRA expression through a neuronal chromatin loop, potentially contributing to SCZ pathology. SNP rs13072690 acts as a biased allelic enhancer regulating PCCB expression The STARR-seq assay demonstrated that the G allele of rs13072690 has a stronger enhancer effect compared to the A allele. To confirm this biased activity, we first performed dual-luciferase reporter assays in our original STARR-seq cell lines—Neuro-2a, SH-SY5Y, HEK-293T, and PC-12—which all showed significantly different regulatory activities between alleles (p < 0.05, Fig. [124]4B). Additionally, we extended our validation to three new cell systems—human microglial HMC3 cells, mouse microglial BV2 cells, and mouse cortical primary neurons—where we observed consistent biased allelic enhancer activity (p < 0.05, Fig. [125]4B). Despite technical challenges associated with lower transfection efficiency, particularly in primary neurons^[126]17,[127]46, these results collectively reinforce the robustness and reproducibility of our findings across distinct cellular contexts. Recognizing that PCCB is the most compelling target gene candidate for rs13072690 based on Open Targets Genetics data (Supplementary Data [128]7)^[129]47, we focused our investigation on the regulatory relationship between rs13072690 and PCCB (Fig. [130]4A). Chromatin interaction analyses indicated that the PCCB promoter interacts with rs13072690 in glial cells (Fig. [131]4A)^[132]48 suggesting a potential regulatory role for rs13072690 via chromatin interactions. Fig. 4. SNP rs13072690 functions as a biased allelic enhancer regulating PCCB expression. [133]Fig. 4 [134]Open in a new tab A This diagram illustrates how the biased allelic enhancer SNP (rs13072690) potentially regulates the expression of the distant gene PCCB. The aggregated single-cell ATAC-seq (scATAC-seq) tracks display chromatin accessibility signals from all cells of the specified cell type, while Hi-C loop data derived from microglia cells show the physical interactions between rs13072690 and the PCCB promoter. B Results from the dual-luciferase reporter assay demonstrate the biased allelic enhancer activity of rs13072690, confirming its functional impact on gene regulation. Bars represent the mean ± SEM. C A comparison of brain tissue eQTLs from GTEx (V8) shows the association between rs13072690 and PCCB expression across various brain regions. Further examination of eQTL associations from PsychENCODE revealed that the rs13072690-G allele is correlated with decreased PCCB expression in the adult brain (β = −0.11, false discovery rate = 8.1 × 10⁻¹¹; Supplementary Data [135]4). Additionally, data from GTEx (V8)^[136]36 confirmed that this association extends across various brain tissues, including the nucleus accumbens, frontal cortex, and putamen (Fig. [137]4C). In summary, our findings demonstrate that the rs13072690-G allele functions as a biased allelic enhancer that reduces PCCB expression and increases the risk of SCZ. Reduced pccb expression in adult zebrafish leads to hyperactivity and impaired social behaviors To investigate the role of the pccb gene, we generated a zebrafish knockout model using CRISPR/Cas9 technology. Through F₃ generation screening, we isolated a pccb mutant allele with a one-base-pair insertion causing a frameshift and premature termination of the protein at amino acid position 138 (Fig. [138]5A–C). Expression analysis revealed that pccb expression in the brain tissues of heterozygous (pccb^e3 ⁄ ⁺) zebrafish was significantly reduced, suggesting that the mutation led to nonsense-mediated decay of the transcript (Fig. [139]5D). The pccb^e3⁄⁺ zebrafish were viable and fertile, allowing for behavioral assessments in adulthood. Fig. 5. Generation of the pccb knockout mutants using CRISPR/Cas9 gene editing and behavioral assessment in zebrafish. [140]Fig. 5 [141]Open in a new tab A Schematic representation of the zebrafish pccb gene structure. The protein comprises an MmdA domain of acetyl-CoA carboxylase and a crotonase-like domain of the crotonase/enoyl-Coenzyme A (CoA) hydratase superfamily. A one-nucleotide insertion allele was generated by injecting a single guide RNA (sgRNA) targeting exon 3 of the pccb gene. The asterisk (“*“) indicates the predicted premature stop codon resulting from the frameshift mutation, leading to early truncation of the protein before the functional domains. B Sequencing results comparing wild-type (pccb^+/+) and mutants (pccb^e3/+ and pccb^e3/e3). C Predicted three-dimensional models of the pccb^+/+ and pccb^e3/e3 proteins generated using the SWISS-MODEL online tool. D Expression levels of pccb mRNA in brain tissues, indicating a significant reduction in the mutant zebrafish. E Schematic of the open field test setup and representative heatmaps illustrating movement patterns of zebrafish during the test. F Total distance moved during the open field test, showing increased locomotor activity in pccb^e3/+ zebrafish compared to wild-type controls. G Average swimming velocity in the open field test (pccb^+/+, n = 29; pccb^e3/+, n = 36), demonstrating hyperactivity in the mutant fish. H Ratio of distance traveled in the center zone to the total distance moved, reflecting anxiety-like behavior. I Ratio of time spent in the center zone to the total time (pccb^+/+, n = 20; pccb^e3/+, n = 25), further indicating increased anxiety levels in pccb^e3/+ zebrafish. Bars represent the mean ± SEM. Compared to wild-type controls (pccb^+/⁺), pccb^e3 ⁄ ⁺ zebrafish exhibited a significant increase in total distance moved and average swimming velocity (Fig. [142]5F–G). Additionally, they spent less time and traveled shorter distances in the central zone of the tank, indicating heightened anxiety levels (Fig. [143]5H–I). To further evaluate anxiety-related behaviors, we conducted the novel tank and novel object tests (Fig. [144]S1A, [145]S1F). In the novel tank test, there was no significant difference between pccb^e3 ⁄ ⁺ and wild-type zebrafish in the time or distance moved to the upper or bottom zones (Fig. [146]S1B–E). However, when a new object was introduced into the tank, pccb^e3 ⁄ ⁺ zebrafish spent less time and traveled shorter distances in the new object zone (Fig. [147]S1G–J), indicating elevated anxiety and reduced exploratory behavior toward novelty. Zebrafish are inherently social animals, and social withdrawal is a key feature observed in individuals with SCZ^[148]49. Therefore, we examined the social behaviors of pccb^e3 ⁄ ⁺zebrafish. In the mirror test, pccb^e3 ⁄ ⁺ zebrafish spent less time and traveled shorter distances in the mirror zone, where they could interact with their own reflections (Fig. [149]6A–C). In the shoaling behavior test, wild-type zebrafish spent more time in contact with each other and maintained closer inter-individual distances. In contrast, pccb^e3⁄⁺ zebrafish reduced the time spent in body contact and increased the time without contact, maintaining greater distances from one another (Fig. [150]6D–G). Additionally, in the social interaction test, pccb^e3 ⁄ ⁺ zebrafish moved significantly less distance and spent less time in the social zone (Fig. [151]6H–J). Collectively, these data suggest that pccb^e3 ⁄ ⁺ zebrafish with reduced pccb expression exhibit decreased interest in social interaction. Fig. 6. Decreased aggressiveness, reduced social cohesion, and impaired social interaction in pccb^e3/+ zebrafish. [152]Fig. 6 [153]Open in a new tab A Schematic of the aggressiveness test and representative heatmaps (n = 10), showing zebrafish behavior in response to their mirror image. B Ratio of distance traveled in the biting zone (adjacent to the mirror) to the total distance moved, indicating reduced aggressiveness in pccb^e3/+ zebrafish. C Ratio of time spent in the biting zone to the total time, supporting findings of decreased aggressiveness. D Schematic of the shoaling test with representative tracking plots. Each color represents an individual fish in the control group (pccb^+/+, n = 28; pccb^e3/+, n = 20), illustrating social cohesion. E Time spent near other fish without direct contact, reflecting social interest without physical engagement. F Time spent in direct body contact with other fish, indicating levels of social interaction. G Average inter-individual distance between fish, measuring social spacing and cohesion. H Schematic of the social preference test and representative heatmaps (n = 20), assessing the tendency of zebrafish to engage with conspecifics. I Ratio of distance traveled in the social zone (area adjacent to conspecifics) to the total distance moved, indicating social preference. J Ratio of time spent in the social zone to the total time, demonstrating impaired social interaction in pccb^e3/+ zebrafish compared to controls. Bars represent the mean ± SEM. Discussion GWASs have identified numerous genetic loci associated with SCZ, significantly advancing our understanding of the disorder’s genetic architecture^[154]2–[155]7. However, translating these findings into biological mechanisms and potential therapeutic targets remains a considerable challenge. The STARR-seq method has proven effective in refining GWAS-identified variants to a subset of functionally validated variants exhibiting differential allelic activity^[156]13. Despite recent progress in elucidating the regulatory mechanisms of certain SCZ risk variants^[157]10,[158]50,[159]51, the functional significance of many common variants linked to SCZ remains largely unexplored. Recent studies have employed MPRAs and complementary tools to identify regulatory variants associated with brain disorders such as SCZ. For instance, McAfee et al.^[160]10 analyzed 5,173 fine-mapped SCZ GWAS variants in primary human neural progenitors using the FINEMAP tool^[161]52 to pinpoint allelic regulatory effects, whereas our study adopted a tiered multi-omics strategy^[162]31 that predicted 351 potential functional risk variants. Only 67 variants overlapped between the two studies, and just five exhibited allelic regulatory activity in the MPRA screen. Among these, rs2905426 acted as an enhancer in our STARR-seq assay, although it was not classified as a baaSNP. These discrepancies likely stem from differences in variant-prioritization strategies, assay platforms, and cellular contexts. Similarly, Zeng et al.^[163]53 leveraged cell-type-specific chromatin accessibility QTL (caQTL) analysis in human induced pluripotent stem cell-derived excitatory neurons to investigate 19,893 potential risk variants and identifying 476 variants with detected allelic effects. Due to the absence of a detailed variant list in that study, a full meta-analysis was not feasible. However, this comparison emphasizes that despite methodological divergences, high-throughput approaches collectively contribute to refining our understanding of enhancer function in disease contexts. In response to previous work, our study provides an integrative perspective that complements these approaches by combining functional screening with multi-layered omics data. Our findings revealed 46 SNPs exhibiting biased allelic enhancer activity, demonstrating extensive cell specificity. Specifically, 38 loci showed activity in only one cell type, while eight loci exhibited biased activity across multiple cell lines. We prioritized 53 candidate target genes associated with baaSNPs across the four cell lines. These target genes were significantly enriched in the GABAergic signaling pathway, which plays a critical role in the pathophysiology of SCZ^[164]54,[165]55. This enrichment supports the validity of our multi-omics approach, suggesting that our method is effective in identifying biologically relevant genes implicated in SCZ. Therefore, our findings not only reinforce the connection between GABAergic dysfunction and SCZ but also demonstrate the usefulness of our strategy in uncovering meaningful biological insights. Of particular interest, SNP rs13072690 exhibited biased allelic enhancer activity in all seven cell types, highlighting its potential regulatory significance. This SNP is associated with PCCB expression. Specifically, eQTL data from both PsychENCODE and GTEx consistently showed that the rs13072690-G allele was associated with reduced PCCB expression in multiple brain regions. Previous studies have identified PCCB as a key gene implicated in various neuropsychiatric traits. Zhang et al. demonstrated that downregulation of PCCB in human forebrain organoids contributes to SCZ-related phenotypes, including increased neuroactivity and impaired neural synchronization^[166]54. Additionally, other studies have also connected PCCB to neuropsychiatric disorders, such as SCZ and bipolar disorder, reinforcing its potential involvement in these conditions^[167]56,[168]57. In our study, zebrafish with reduced pccb expression displayed hyperlocomotion, heightened anxiety-like behaviors, and social deficits, all of which are characteristic of SCZ phenotypes^[169]58,[170]59 These findings support the emerging view of PCCB as a key regulatory gene in the pathogenesis of neuropsychiatric disorders, reinforcing and extending prior work in this area. Notably, the G allele of rs13072690 exhibited stronger enhancer activity across four cell lines, as confirmed by dual-luciferase assays. The observed discrepancy in the direction of effect between the STARR-seq assay and eQTL data for rs13072690 may arise due to differences in the experimental contexts of each approach. Specifically, STARR-seq assesses enhancer activity by evaluating the SNP in synthetic reporter constructs positioned at specific loci, eQTL studies identify associations between SNPs and gene expression levels across various tissues. Previous studies, such as McAfee et al.^[171]10, have also highlighted that a portion of MPRA allelic regulatory activity and eQTL signals do not always align in terms of effect direction, suggesting inherent complexities in interpreting regulatory variants. These differences emphasize the need for integrating various experimental approaches to fully understand the functional implications of genetic variants. Traditionally, linking non-coding genetic variants to genes relies on eQTL resources. However, recent studies suggest that natural selection pressures may cause eQTL studies to identify different sets of variants than those found in GWAS^[172]60. Consistent with this observation, we found that 62% of our baaSNPs did not overlap with variants identified in adult brain eQTL studies. This limited overlap may be partly due to differences in cell types and developmental stages between eQTL studies, which often use heterogeneous adult brain tissues, and our experiments, which were conducted in specific cell lines. Since these cell lines do not fully represent the neuronal or glial environments of the brain, the observed lack of overlap could also reflect this difference in cellular context. Additionally, the relatively small sample sizes of current eQTL studies might contribute to this discrepancy. Therefore, a significant number of regulatory variants may exist beyond those identified by eQTL studies. By leveraging long-range chromatin interactions, we linked these baaSNPs to their target genes. These genes exhibited more extensive functional annotations compared to those identified through eQTL analysis. Specifically, genes identified through chromatin interaction analyses were involved in more distal regulatory interactions, consistent with the enhancer redundancy observed in mutation-intolerant, disease-associated genes^[173]61. This finding underscores the importance of integrating chromatin interaction data to fully elucidate gene regulatory networks. One limitation of Hi-C data is the potential for false negatives, which can result from factors in the library preparation process, such as the loop size. These factors can influence the detection of chromatin interactions, potentially contributing to the limited overlap observed with eQTLs. Overall, our study demonstrates that using STARR-seq to characterize GWAS variants can reveal regulatory variants with functional potential, which may not be detectable through traditional eQTL studies. While STARR-seq indicates that an allele can have a functional effect, it does not necessarily confirm that the effect occurs in the brain, offering a complementary type of information to eQTL studies. While STARR-seq is highly effective for validating the regulatory effects of genetic variants on a large scale, it has certain limitations including its episomal design does not replicate the native chromatin context, and it cannot directly identify target genes. Furthermore, because STARR-seq quantifies the expression of a reporter gene driven by candidate enhancers, the observed signals can reflect both transcriptional regulation and posttranscriptional processes. Consequently, differences in reporter activity may not solely indicate changes in transcription but might also be influenced by variations in RNA dynamics. These factors should be carefully considered when interpreting the regulatory mechanisms^[174]62. Another limitation of our current design is that our library did not include active (positive control) or inactive (negative control). Instead, we relied on technical replicates and cross-validation across multiple cell lines to assess the reproducibility and reliability of the observed regulatory effects. To address these limitations, further studies incorporating allele knock-in experiments would further bolster our understanding of the in vivo relevance of the identified regulatory variants. Such experiments could provide critical insights into gene-SNP relationships and enhance our comprehension of cell-type specificity by confirming that these regulatory effects occur within the native genomic context. Finally, SNPs located in putative silencer elements warrant further investigation of biased allelic activity to elucidate their contributions to gene regulation and SCZ pathogenesis. In conclusion, our study systematically elucidated the molecular genetic regulatory mechanisms of risk variants associated with SCZ across four distinct cell types. We hope that these findings will provide valuable biological insights into the pathogenesis of SCZ and potentially inform the development of targeted therapeutic strategies with further research. Methods Multi-omics framework for candidate risk variant selection in SCZ To identify candidate risk variants associated with SCZ, we first collected SNPs from recent GWASs^[175]2–[176]7 that had p-values <5 × 10^−7. This suggestive threshold was adopted to include more potential variants. We then incorporated SNPs that co-localized with GWAS and eQTL signals, using a posterior probability threshold of 0.01. eQTL analysis was performed using the GTEx (V8) database to assess the regulatory impact of these SNPs. Finally, we included SNPs in strong linkage disequilibrium (r² ≥ 0.8) with the previously identified SNPs. Using this comprehensive catalog of variants, we employed a multi-omics framework^[177]31 as described by Corces et al. This framework combined bulk assay for ATAC-seq^[178]37, scATAC-seq^[179]38, and HiChIP^[180]39,[181]40 enhancer connectome data. Initially, we mapped these SNPs to regions of chromatin accessibility identified from scATAC-seq data derived from brain tissues, focusing on variants located in open chromatin regions likely to be functionally relevant. We then refined our selection by identifying SNPs located in regions with potential regulatory interactions between enhancers and promoters, using HiChIP data and chromatin loop predictions. To further narrow down the candidates, we employed machine learning models based on the gkm-SVM^[182]63 method to predict SNPs that could disrupt transcription factor binding. These models were trained on chromatin accessibility data from 24 cell clusters derived from scATAC-seq experiments. We evaluated the allelic effects of the candidate SNPs on chromatin accessibility and transcription factor binding using three complementary methods—GkmExplain^[183]64, in silico mutagenesis^[184]65, and deltaSVM^[185]66—by analyzing the sequences of both alleles for each SNP. This rigorous filtering approach led to the identification of 351 candidate risk variants associated with SCZ (Supplementary Data [186]1). STARR-seq plasmid library construction For each selected SNP, we designed six 200-base-pair sequences corresponding to the reference and alternative alleles in three genomic contexts, positioning the variant at −50 bp, 0 bp, and +50 bp from the center. This design ensures that each SNP is represented by multiple fragments, reducing the likelihood of missing data from any single fragment. Oligonucleotides for these sequences were synthesized by Dynegene Technologies (Shanghai, China). Fifteen-base-pair adapter sequences were added to both the 5′ and 3′ ends to facilitate cloning, resulting in a total of 2,106 oligonucleotides representing 351 SNPs. The oligonucleotides were amplified using adapter primers synthesized by Sangon Biotech (Shanghai, China) and purified with AMPure XP beads (Beckman Coulter, USA). The purified oligonucleotides were inserted into a linearized hSTARR-seq_ORI vector (Addgene, Plasmid No. 99296, USA)^[187]41 using Age I-HF and Sal I-HF restriction enzymes (New England Biolabs, USA). DNA assembly was performed with the In-Fusion® HD Cloning Kit (Clontech, Cat. No. 639650, USA). The In-Fusion products were precipitated using sodium acetate (pH 5.2) and ethanol, then transformed into MegaX DH10B T1R Electrocompetent Cells (Invitrogen, Cat. No. C640003, USA). Cell lines and plasmid library transfection We employed a modified STARR-seq approach to assess the regulatory activity of SNPs across four cell lines: HEK-293T, SH-SY5Y, Neuro-2a, and PC-12. These lines were selected for their relevance to neuronal biology and experimental utility: HEK-293T cells for high transfection efficiency and robust reporter assays^[188]32; SH-SY5Y cells for neuronal characteristics^[189]33; Neuro-2a cells for studying neurodevelopmental signaling^[190]34; and PC-12 cells for differentiation and neuroendocrine studies^[191]35. Human embryonic kidney HEK-293T and human neuroblastoma SH-SY5Y cells were cultured in high-glucose Dulbecco’s Modified Eagle Medium (DMEM) (Procell, Cat. No. PM150210, China) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin. Mouse neuroblastoma Neuro-2a cells were cultured in Minimum Essential Medium (MEM) (Procell, Cat. No. PM150410, China) with 10% FBS and 1% penicillin-streptomycin. Rat pheochromocytoma PC-12 cells were cultured in RPMI-1640 medium (Procell, Cat. No. PM150110, China) supplemented similarly. All cell lines were maintained at 37 °C in a humidified incubator with 5% carbon dioxide. For dual-luciferase reporter validation experiments, human microglial HMC3 cells and mouse microglial BV2 cells were revived and passaged one to two generations before being seeded onto 48-well plates once in optimal growth condition. Mouse cortical primary neuron cells were purchased from Wuhan Procell Life Science & Technology Co., Ltd. Upon arrival, these cells were allowed to stabilize overnight, then passaged onto 48-well plates pre-coated overnight with 1% poly-D-lysine and washed three times with sterile PBS prior to cell seeding. All cell lines, including Neuro-2a, SH-SY5Y, HEK-293T, PC-12, HMC3, BV2, and mouse cortical primary neurons, were cultured to logarithmic growth phase, reaching a density of 2 × 10^7 cells on the day of transfection. Transfections were performed using Xfect™ Transfection Reagent (Takara, Cat. No. 631317, China) under optimized conditions established from preliminary experiments. The purified plasmid DNA from the constructed library was transfected into the cells, and each experiment was conducted in triplicate to ensure reproducibility. STARR-seq sequencing library construction Before cell transfection, the plasmid library was used to construct the input sequencing library. Specifically, 100 ng of the plasmid library was amplified using human STARR-seq-specific primers (98 °C for 45 s followed by 15 cycles of 98 °C for 15 s, 65 °C for 30 s, 72 °C for 70 s). The purified products were then subjected to two additional rounds of PCR with sample index primers. High-fidelity KAPA HiFi HotStart ReadyMix (Roche, Cat. No. KK2601, USA) was used in all PCR steps to ensure accurate amplification. The final products, representing entire regulatory elements, were amplified and sequenced with full coverage using paired-end 150 bp reads on the Illumina NovaSeq 6000 platform. After transfection, total RNA was extracted using the E.Z.N.A.® Total RNA Midiprep Kit (Omega Bio-Tek, Cat. No. R6664, USA). Polyadenylated RNA was isolated using VAHTS mRNA Capture Beads (Vazyme, Cat. No. N401, China) and reverse-transcribed with the HiScript III 1st Strand cDNA Synthesis Kit (Vazyme, Cat. No. R312-02, China). The resulting cDNA was treated with RNase A and purified using AMPure XP beads. Two rounds of PCR amplification were performed, with products purified using Agencourt SPRIselect Reagent (Beckman Coulter, Cat. No. [192]B23318, USA). Three biological replicates were prepared for both input and output sequencing libraries, which were then sequenced using paired-end 150 bp reads on the Illumina NovaSeq 6000 platform. STARR-seq data analysis Sequencing data were processed using the DADA2 (v1.26) pipeline to generate amplicon sequence variant (ASV) tables at single-nucleotide resolution. Raw sequencing reads were first subjected to quality filtering and trimming based on sequence quality scores. The DADA2 algorithm then learned sequencing error rates directly from the dataset and used these error models to infer true biological sequence variants, distinguishing even single-nucleotide differences. Paired-end reads were subsequently merged to reconstruct full-length amplicon sequences, followed by detection and removal of chimeric sequences. Finally, an ASV abundance table was constructed, detailing the accurate count-level abundance of each unique sequence variant across all samples. The count-level abundance data were saved for further analysis (Fig. [193]S2). Enhancer activity analysis followed the method described by Tippens et al.^[194]67. Raw data were processed using the “voom” function from the R limma package (v3.40.6)^[195]68. Input and output samples were treated as separate experimental conditions, and eSNPs were identified based on significantly higher expression in the output compared to the input. A false discovery rate (FDR)-adjusted p-value <0.05 and a fold change >1.5 (log₂FC > 0.585) defined eSNPs (Supplementary Data [196]2). SNPs showing significantly reduced expression (FDR < 0.05, log₂FC<−0.585) were classified as candidate silencer SNPs. To identify baaSNPs among the eSNPs, differential allelic regulatory activity was calculated using the mpra package (v1.14.0)^[197]43. The analysis employed the ‘mpralm()‘ function, which uses a linear model to measure differential regulatory activity between two alleles. This provided summary statistics for each variant, including t-statistics, p-values, log fold changes, B-statistics, and average expression (Supplementary Data [198]3). Dual-Luciferase Reporter Assay To validate enhancer activity, 200 bp fragments containing the SNPs were cloned into a KpnI-Xhol linearized pGL3-promoter luciferase vector (Promega, Cat. No. E1751, USA) following the manufacturer’s instructions. The constructs were co-transfected with a Renilla luciferase plasmid into cells using the Promega transfection reagent, with each construct transfected in triplicate. Forty-eight hours post-transfection, luciferase and Renilla activities were measured using the Dual-Luciferase Reporter Assay System (Beyotime, Cat. No. RG027, China). Luciferase activity was normalized to Renilla activity, and relative luciferase signals were compared to the pGL3-Promoter vector control. eQTL analysis Expression quantitative trait loci (eQTL) datasets from the adult dorsolateral prefrontal cortex (DLPFC; n = 1387) were obtained from Wang et al.^[199]44. We integrated these with our baaSNPs by matching variant information (chromosome, position, rsID), annotating all data to the hg38 reference genome. For variant-level overlap analysis, the proportion of eQTL overlap was calculated by dividing the number of baaSNPs matching eQTLs by the total number of baaSNPs. The genomic coordinates of the baaSNPs were overlapped with associated eQTL genes using the ‘findOverlaps()‘ function in the GenomicRanges Bioconductor package^[200]69. The number of genes and loci involved in each overlap was recorded. Gene Ontology (GO) enrichment analysis for these genes was performed using WebGestalt^[201]70, with results considered significant at p < 0.01. In our analysis, we used all genes with brain eQTLs as the background gene set for WebGestalt. The GO analysis provided the enrichment ratio associated with each GO term. Assigning genes to baaSNPs using Hi-C data To assign genes to baaSNPs using long-range chromatin interaction data, we filtered Hi-C loops from three datasets: germinal zone, cortical plate, and adult neurons^[202]48,[203]71. We overlapped the coordinates of baaSNPs with the non-promoter anchors of these loops to identify variants interacting with promoters through chromatin loops. The resulting SNP-gene pairs were filtered to include only protein-coding genes with official gene symbols, resulting in a total of 168 genes (Hi-C genes). Epigenetic annotation We used H3K27ac peaks from sorted brain cells as cell-type-specific adult brain enhancers^[204]20. These enhancers were utilized to analyze the epigenetic profiles of the baaSNPs. Overlaps between baaSNPs and each enhancer set were assessed using the ‘findOverlaps()‘ function from the GenomicRanges package in Bioconductor. pccb knockout zebrafish Wild-type zebrafish (AB strain) were obtained from the China Zebrafish Resource Center (Wuhan, China) and maintained under standard conditions with a 14 h light/10-h dark cycle at 28.5 °C. All animal procedures were reviewed and approved by the Institutional Animal Care and Use Committee of Qingdao University and conducted in accordance with the Guide for the Care and Use of Laboratory Animals published by the U.S. National Institutes of Health. We have complied with all relevant ethical regulations for animal use. To investigate the role of the pccb gene in regulating behavior, we generated a zebrafish knockout model using CRISPR/Cas9 technology. We microinjected 600 pg of Cas9 mRNA and 80 pg of guide RNA (gRNA) targeting the third exon of the pccb gene into one-cell-stage fertilized eggs. The gRNA sequence (5′-GGGACGGATCAATGGCAGAC-3′) was synthesized by Genescript (Nanjing, China) and validated via PCR and Sanger sequencing using primers 5′-CTCCAAACACTGTGAAGTCCTG-3′ and 5′-CGTATGCACCAGACGCCTTC-3′. Heterozygous mutant adult zebrafish (pccb^e3/+) and their wild-type siblings (pccb^+/+) were used for subsequent phenotypic analyses. Analysis of pccb gene expression and behavioral assessment in zebrafish Brain tissue from 6-month-old zebrafish was collected for RNA extraction. Expression of pccb was analyzed by reverse transcription quantitative PCR (RT-qPCR) using β-actin as the control. Primers for pccb were 5′-GAAAGCAGAACCAAGCGGAG-3′ (forward) and 5′-CTCTAAAGCAGGGGTTGGTCT-3′ (reverse). RNA quality was assessed using a NanoDrop spectrophotometer, and relative quantification was performed using the comparative Ct (2 − ΔΔCT) method. Behavioral tests were conducted on 6-month-old zebrafish. In the open field test, individual zebrafish were placed in a cylindrical tank divided into central and peripheral zones^[205]72, and swimming behavior was recorded for 30 min. In the mirror biting test, a mirror was introduced into the tank, and interactions with their reflections were observed for 15 min^[206]73. The shoaling test involved placing five zebrafish in a tank to assess social behavior. Social interaction and novel object exploration were evaluated in cross-shaped and rectangular tanks^[207]74, respectively, while the novel tank test assessed anxiety-related behaviors^[208]75. Data were analyzed using GraphPad Prism 9 software. Comparisons between pccb^e3/+ and pccb^+/+ zebrafish were made using unpaired t-tests, with significance set at p < 0.05. Statistics and reproducibility All data are presented as mean ± SEM, with individual data points shown in boxplots or dot plots. Statistical comparisons were performed using Student’s t-test or Mann–Whitney U test, depending on normality. A p-value  <  0.05 was considered significant, and exact p-values are reported. STARR-seq analyses used limma-voom and mpralm to identify eSNPs and baaSNPs. Luciferase assays were performed in triplicate per condition. Zebrafish behavioral tests included 10–36 individuals per group; exact sample sizes are provided in figure legends. All experiments were independently replicated at least three times. Reporting summary Further information on research design is available in the [209]Nature Portfolio Reporting Summary linked to this article. Supplementary information [210]Transparent Peer Review file^ (1MB, pdf) [211]Supplementary Information^ (5.5MB, pdf) [212]42003_2025_8456_MOESM3_ESM.pdf^ (584.2KB, pdf) Description of Additional Supplementary Files [213]Supplementary Data 1-7^ (1MB, xlsx) [214]Reporting Summary^ (2.1MB, pdf) Acknowledgements