Abstract Depression is the most prevalent mental disorder with substantial morbidity and mortality. Although genome-wide association studies (GWASs) have identified multiple risk variants for depression, due to the complicated gene regulatory mechanisms and complexity of linkage disequilibrium (LD), the biological mechanisms by which the risk variants exert their effects on depression remain largely unknown. Here, we perform a transcriptome-wide association study (TWAS) of depression by integrating GWAS summary statistics from 807,553 individuals (246,363 depression cases and 561,190 controls) and summary-level gene-expression data (from the dorsolateral prefrontal cortex (DLPFC) of 1003 individuals). We identified 53 transcriptome-wide significant (TWS) risk genes for depression, of which 23 genes were not implicated in risk loci of the original GWAS. Seven out of 53 risk genes (B3GALTL, FADS1, TCTEX1D1, XPNPEP3, ZMAT2, ZNF501 and ZNF502) showed TWS associations with depression in two independent brain expression quantitative loci (eQTL) datasets, suggesting that these genes may represent promising candidates. We further conducted conditional analyses and identified the potential risk genes that driven the TWAS association signal in each locus. Finally, pathway enrichment analysis revealed biologically pathways relevant to depression. Our study identified new depression risk genes whose expression dysregulation may play a role in depression. More importantly, we translated the GWAS associations into risk genes and relevant pathways. Further mechanistic study and functional characterization of the TWS depression risk genes will facilitate the diagnostics and therapeutics for depression. Subject terms: Genetics, Depression Introduction Depression is a complex and heterogeneous mental disorder characterized by depressed mood, loss of interests, appetite and sleep disturbances, cognitive impairments, feelings of worthlessness and hopelessness^[36]1,[37]2. Depression has a high global prevalence (~4.7%) and over 298 million of the global population were affected by depression^[38]3–[39]5. Females were more likely (about twice) to develop depressive symptoms than males^[40]6. As depression has a high prevalence and is accompanied with substantial morbidity and mortality, it becomes a leading cause of disability worldwide and a major contributor to global disease burden (e.g. the economic burden of depression was estimated about $210.5 billion in US in 2010^[41]4,[42]7). To date, the etiology of depression remains largely unknown. However, accumulating evidence indicate that depression was caused by a combination of genetic and environmental factors. Twin study has estimated the heritability of depression to be ~37%^[43]8, indicating the important role of genetic component in the development of depression. In the past decade, we have witnessed the rapid progress of genetic study of depression. Since CONVERGE consortium identified two genome-wide significant risk loci for depression in 2014^[44]9, multiple exciting findings have been reported by genome-wide association studies (GWASs)^[45]10–[46]13. Recently, Howard et al. reported the largest GWAS of depression and identified over 100 risk loci that reached genome-wide significance level^[47]14. Despite the fact that GWASs of depression have made great progress in recent years and over 100 depression risk loci have been identified by GWASs, mechanistic investigations and biological interpretations lag far behind. For most of the risk loci, the implicated genes and corresponding mechanisms remain largely unknown. In traditional GWASs, the gene (or genes) that located the nearest to the most significant risk variant was usually assigned as the potential candidate gene. However, due to the complexity of LD (each risk locus usually contains several genes that were in high linkage disequilibrium (LD)) and gene regulation (genetic variants may regulate distal genes rather than the nearest gene through affecting chromosomal conformation change), the nearest gene may not necessarily represent the causal gene by which the identified GWAS risk variants exert their effects on depression^[48]15. As the vast majority of depression risk variants identified by GWASs were located in non-coding regions, it is possible that these risk variants confer depression risk through modulating gene expression rather than altering protein sequences or structures^[49]16. Transcriptome-wide association study (TWAS) is a powerful approach aimed at identifying risk genes whose expression perturbations may confer disease susceptibility. Through integrating expression quantitative loci (eQTL) results with GWAS associations, TWAS could identify genes whose genetically regulated expression may be associated with diseases risk^[50]17. This method leverages a relatively small set of reference individuals with expression and genotype measured data to impute the expression-trait association statistics from GWAS summary data^[51]18. In addition to prioritizing putative target genes at genome-wide significant loci, TWAS provides an opportunity to detect genes with small effect sizes and located in regions that do not contain genome-wide significant variants^[52]19. Furthermore, the susceptibility genes identified by TWAS can more accurately inform follow-up experimental validation and potential treatment strategies. In this study, we carried out a TWAS of depression by integrating brain eQTL data (from the dorsolateral prefrontal cortex (DLPFC)) of 1003 subjects (including the CommonMind Consortium (CMC)^[53]20 and the second phase of the BrainSeq Consortium (BrainSeq2)^[54]21) and the largest depression GWAS (including 246,363 cases and 561,190 controls)^[55]14. We identified 53 transcriptome-wide significant (TWS) depression risk genes that reached Bonferroni-corrected significance level. We subsequently performed conditional analysis of all significant TWAS associations to identify independent associations (i.e. the driven genes at each risk locus). Pathway and gene ontology analyses of the TWS genes identified relevant pathways, including formation of fibrin clot, female pregnancy and peripheral axonal degeneration pathways. Finally, we compared the expression level of the TWS depression risk genes in brains of depression cases and controls using RNA-sequencing (RNA-seq) expression data. Overall, our findings highlight the power of TWAS in identifying depression risk genes with small effect size and provide testable targets for further functional validation of depression. Future mechanistic investigations and functional experiments of the TWS depression risk genes will provide pivotal information for the diagnostics and therapeutics of depression. Methods and materials GWAS meta-analysis The summary statistics from the largest GWAS of depression^[56]14 were used in this study. To identify the common risk variants for depression, Howard et al. performed a genome-wide meta-analysis through combining three large-scale depression cohorts (including 23andme, PGC2 and UK Biobank)^[57]10,[58]11,[59]22 (a total of 246,363 cases and 561,190 controls, after excluding overlapping samples) and identified 102 independent genetic variants associated with depression^[60]14. Detailed information about the participants ascertainment, genotyping, quality control and statistical analysis can be found in the original study^[61]14. The summary statistics for the meta-analysis of depression in UK Biobank and PGC_139k cohorts were downloaded from the Edinburgh data share center (10.7488/ds/2458)^[62]14. We then conducted a meta-analysis by combining summary statistics from above two cohorts and 23andMe^[63]10 using PLINK^[64]23 software and more detailed procedure of fixed-effect meta-analysis has been previously described in study of Li et al.^[65]12. We obtained the depression GWAS summary statistics of 23andme under a data transfer agreement^[66]12. Transcriptome-wide association analysis We performed a TWAS through integrating the depression GWAS summary statistics and two sets of eQTL data (i.e. CMC^[67]20 (N = 452) and BrainSeq2^[68]21 (N = 551), respectively). The CMC SNP-expression weights (The SNP-expression weights represent the correlations between SNPs and gene expression in the reference panel while accounting for LD and were computed from different linear models (including BLUP, BSLMM, LASSO, Elastic Net and top SNPs)) were downloaded directly from the FUSION/TWAS website ([69]http://gusevlab.org/projects/fusion/). The BrainSeq2 SNP-expression weights were obtained from the Lieber Institute for Brain Development (LIBD) browser ([70]http://eqtl.brainseq.org/phase2/). Please refer to the original papers for further details on the sample collection, RNA extraction and sequencing, genotyping and statistical analysis^[71]20,[72]21. The SNP-expression weights (i.e. expression weights) were derived using the default processing pipelines described in FUSION ([73]http://gusevlab.org/projects/fusion)^[74]17. The TWAS was performed using the FUSION software with default settings^[75]17 and a strict Bonferroni-corrected study-wise P threshold (i.e. P = 3.95 × 10^−6 (0.05/12,647) (total number of genes across panels) was used in this study. TWAS analysis utilizes several regularized linear models (including BLUP, LASSO and elastic net) and an additional Bayesian linear mixed model (BSLMM) to evaluate expression imputation. Furthermore, FUSION performs a fivefold cross-validation for each of the desired models to determine which model is the best. For a given gene, SNP-expression weights in the cis-locus were computed using the best prediction model and FUSION typically restricts the cis-locus to 500 kb boundary on either side of the gene. The TWAS calculated Z-score results were used to assess the association between gene and depression and the absolute value of the Z-score reflects the association strength between implicated genes and disease. More detailed information about the principle of FUSION, statistical model and Z-score calculation can be found in the original paper^[76]17. Transcriptome-wide association analysis using SNP-expression weights from PsychENCODE We further performed a TWAS using the SNP-expression weights data from PsychENCODE^[77]24. Briefly, TWAS was performed using the FUSION package^[78]17 ([79]https://github.com/gusevlab/fusion_twas) as above described, with the use of SNP-expression weights from PsychENCODE (1321 unique individuals). Detailed information about PsychENCODE was provided in PsychENCODE website ([80]http://resource.psychencode.org/) and related publication^[81]17. Transcriptome-wide association study in the Asian dataset To further explore the ethnic difference for the TWAS results in depression across populations, we carried out a depression TWAS in the Asian population by integrating eQTL data from lymphoblastoid cell lines of 162 samples (including 80 Han Chinese from Beijing and 82 Japanese in Tokyo, Japan populations)^[82]25 and the Chinese GWAS summary statistics (including 5303 women depression and 5337 controls) from the CONVERGE consortium^[83]9. More detailed information about eQTL and GWAS datasets can be found in the original studies^[84]9,[85]25. Defining of genes implicated in depression GWAS We first extracted the genomic coordinates of all TWS genes (based on hg19). We then compared the genomic locations of the TWS genes with genes identified by Howard et al.^[86]14 (i.e. gene located in the 102 depression risk loci identified by Howard et al.). Genes that overlap with any risk locus defined by Howard et al. were considered as genes implicated in depression GWAS. Conditional and joint analysis Conditional and joint analysis (based on genes rather than SNPs) were performed for genome-wide significant (Bonferroni-corrected) TWAS signals using FUSION^[87]17. The joint and conditional tests aim to evaluate the GWAS association signal after removing expression association from TWAS (i.e., to investigate if the GWAS signals are still significant after removing the expression association from TWAS). To evaluate the joint/conditional gene model, marginal association statistics (i.e., the main TWAS results) and a correlation/LD matrix are required. Each depression GWAS SNP association was conditioned on the joint gene model (one SNP at a time). The permutation test was used in conditional and joint analysis, with a maximum of 100,000 permutations and an initiate permutation P-value threshold of 0.05 for each feature. “FUSION.post_process.R” script was used for post-processing and generating multiple conditional output plots along with summary statistics. Pathway analysis Considering the correlation (i.e., co-expression) of the TWAS implicated genes may affect the independence of the Bonferroni-correction assumption, we tend to relax the Bonferroni-corrected P threshold instead of stringent Bonferroni-corrected standard. Therefore, a relaxed Bonferroni significance threshold (estimated as P = 7.91 × 10^−6 (0.10/12,647)) was used for pathway analyses in our analysis (which allows for more genes to include in the pathway enrichment analysis). GeneNetwork v2.0 ([88]https://genenetwork.nl), which uses gene co-regulation to predict pathway membership and HPO term associations^[89]26 was used for pathway analysis. By integrating 31,499 public RNA-seq samples from a wide range of tissues and cell types, GeneNetwork generated multiple pathways for pathway analysis^[90]26. As genes are known to cause a particular disease or disease symptom tend to have similar molecular functions or are involved in the same pathway or biological processes^[91]27,[92]28, it’s possible for GeneNetwork to accurately predict gene functions and to prioritize candidate disease genes with high accuracy. Agnostic analyses of pathways in databases (including Gene Ontology (GO) and Reactome) were done to identify special pathways relevant to depression. More detailed information about the principle of GeneNetwork such as principal component analysis (PCA), co-regulation scores calculation, P-value calculation and statistical analyses can be found in the original study^[93]26. Expression analysis of TWAS significant genes in depression cases and controls TWAS identified a total of 53 genes that showed significant association with depression (after correcting for Bonferroni test). To explore whether the expression levels of depression-associated risk genes identified by TWAS were dysregulated in depression cases compared with controls, we obtained expression data (based on RNA-seq) of brain tissues (only the expression data from the DLPFC were used for further analysis) from three datasets, including [94]GSE102556 (26 depression cases and 22 controls)^[95]29, [96]GSE101521 (30 depression cases and 29 controls)^[97]30 and [98]GSE80655 (23 depression cases and 24 controls)^[99]31. The same processing procedure was conducted for quality control, alignment and gene-expression quantification in three RNA-seq datasets. Briefly, Trimmomatic^[100]32 was utilized to examine the sequencing quality and trim reads. The clean paired-end reads were then aligned to the human reference genome (GRCh38) by using Hisat2^[101]33 and gene-level reads counts were quantified as transcripts per million (TPM) with featureCounts^[102]34. Protein-coding genes with average TPM ≥ 1.0 were extracted and differentially expressed genes (DEGs) were identified (based on read counts using likelihood ratio test (LRT)) for each of three datasets using DESeq2^[103]35 R package. More detailed information about sample collection, sample diagnose and RNA sequencing can be found in the original studies^[104]36. Spatio-temporal expression pattern analysis To explore the spatio-temporal expression pattern of the depression candidate genes identified by TWAS analysis in human brain, we downloaded expression data (based on RNA-seq) from the Allen Institute for Brain Science ([105]http://www.brainspan.org/)^[106]37. For a specific brain developmental stage, the mean expression level of all the genes in a geneset was represented as the expression level of the geneset at this stage. The gene-expression level was measured by RPKM (read per kilobase per million mapped reads) and two genesets implicated by TWAS analysis were used in this study. Background genes were extracted from a previous study^[107]38. Cell-type-specific expression analysis of TWS depression genes The cell-type-specific expression pattern data were obtained from a previous study^[108]39. Briefly, Skene et al. carried out a comprehensive single-cell analysis of mouse brain tissues (including the hippocampus, neocortex, striatum and etc). A total of 9970 cells were sequenced and 24 cell types were identified by Skene et al. Skene et al. then calculated the specificity score of each gene in each cell type (a higher specificity score indicates a higher specificity of gene expression in a specific cell type). We firstly converted candidate human genes into mouse orthologs. We then extracted the specificity score for each gene in each cell type. We set a cutoff value of the specificity score to 0.1, and we calculated the number of genes with a specificity score greater than 0.1 in each cell type. All the analyses were performed with R software and more details about the single-cell data could be found in the original paper^[109]39. Spatio-temporal expression pattern analysis of TWS genes in the human brain We used the expression data from the Allen Institute for Brain Science ([110]http://www.brainspan.org/)^[111]37 for spatio-temporal expression analysis. Gene-expression values of the TWS risk genes in the prefrontal cortex (PFC) (N = 42) were downloaded and transformed as previously described^[112]40. Tissue-type-specific expression analysis of TWS depression genes To explore the expression pattern of TWS genes in human tissues, we investigated their expression level in diverse human tissues using the Genotype-Tissue Expression (GTEx) project ([113]http://gtexportal.org/)^[114]41, which includes expression data in 54 human tissues. Detailed information about the GTEx (e.g. sample source or size, gene-expression normalization) can be found in original publication^[115]41 and the GTEx website. Results TWAS identifies 53 susceptibility genes for depression To identify genes associated with depression, we performed a TWAS by integrating two gene-expression reference panels (i.e. CMC and BrainSeq2) and summary-level association data from the largest depression GWAS meta-analysis so far (a total of 807,553 individuals). We performed TWAS using the FUSION software (Methods)^[116]17. Briefly, this approach uses GWAS summary statistics and gene-expression panels with reference LD to estimate the association between the cis-genetic components of gene expression and depression risk. In total, we identified 53 TWS depression-associated genes (summed across two expression reference panels) after Bonferroni correction, including 33 significant genes detected using the CMC dataset (Supplementary Table [117]1 and Fig. [118]1a) and 27 genes detected using the BrainSeq2-DLPFC dataset (Supplementary Table [119]2 and Fig. [120]1b). Among the TWS associations (53 genes), 30 of the genes were implicated in the original depression GWAS and the remaining 23 genes did not fall within previous GWAS loci (definition of genes implicated in depression GWAS can be found in methods). Notably, we found that 7 out of 53 genes reached transcriptome-wide significance level in both two expression panels (CMC and BrainSeq2), including B3GALTL, FADS1, TCTEX1D1, XPNPEP3, ZMAT2, ZNF501 and ZNF502 (Table [121]1 and Fig. [122]1). Among the seven TWS genes, upregulation of XPNPEP3 may be associated with depression risk as it has a positive Z-score (Z > 0 suggests that the gene is predicted to be upregulated in cases compared with controls). However, downregulation of other genes may increase risk of depression (Z < 0). These data suggested that the TWAS significant genes may have a role in depression risk. In addition, the overlapping genes identified in both expression panels represent plausible candidate genes for depression as these genes showed TWS association with depression in two independent eQTL datasets. Fig. 1. Manhattan plot of the TWAS results for depression (246,363 cases and 561,190 controls). [123]Fig. 1 [124]Open in a new tab a Manhattan plot of TWAS results in CMC dataset (a total of 33 significant genes detected). b Manhattan plot of TWAS results in BrainSeq2 dataset (a total of 27 significant genes detected). Each point represents a gene, with physical genomic position (chromosome, basepair) plotted on the x-axis and association P-value (the −log[10] (FUSION P-value)) between gene expression in the DLPFC and depression plotted on the y-axis. Bonferroni-corrected significant genes are labeled and the significance threshold of P = 7.91 × 10^−6 was used. Seven genes reached transcriptome-wide significance in both two expression panels (including B3GALTL, FADS1, TCTEX1D1, XPNPEP3, ZMAT2, ZNF501 and ZNF502) and are highlighted in red colour. Table 1. Significant TWAS genes both in CMC and LIBD datasets for depression. Gene Region CMC dataset BrainSeq2 dataset Best eQTL TWAS.Z TWAS.P Best eQTL TWAS.Z TWAS.P TCTEX1D1 chr 1:67218142–67244470 rs512691 −5.81 6.26E-09 rs10493416 −5.49 3.99E-08 ZNF501 chr 3:44771088–44778575 rs10514710 −4.69 2.68E-06 rs10514710 −4.79 1.68E-06 ZNF502 chr 3:44754135–44765323 rs10514710 −4.93 8.21E-07 rs10514710 −4.98 6.50E-07 ZMAT2^a chr 5:140078265–140086248 rs801183 −5.24 1.65E-07 rs3756341 −5.02 5.28E-07 FADS1^a chr 11:61567099–61596790 rs174568 −4.82 1.42E-06 rs174566 −5.12 3.08E-07 B3GALTL chr 13:31774073–31906413 rs4065552 −4.79 1.70E-06 rs9543390 −5.52 3.44E-08 XPNPEP3 chr 22:41253081–41363838 rs138354 5.93 2.97E-09 rs2899341 5.18 2.26E-07 [125]Open in a new tab ^aIndicates that the TWAS gene was not implicated in the original depression GWAS. Expression signals driven the depression TWAS hits TWAS usually identifies several TWS genes for each of the risk locus. To detect if the identified transcriptome-wide association signals were conditionally independent and to explore whether the GWAS signals remain significant after removing the expression weights from TWAS, we conducted conditional and joint analyses. Our conditional analyses identified several independent TWS genes from both of the brain eQTL datasets (Fig. [126]2). We found that TCTEX1D1 explains most of the signal at its locus in both CMC dataset (rs10789214 lead SNP P[GWAS] = 7.53 × 10^−8, conditioned on TCTEX1D1 lead SNP P[GWAS] = 7.83 × 10^−3) (Fig. [127]2a) and BrainSeq2 dataset (rs10789214 lead SNP P[GWAS] = 7.53 × 10^−8, conditioned on TCTEX1D1 lead SNP P[GWAS] = 2.12 × 10^−2) (Fig. [128]2b). Similarly, conditional analyses showed that ZNF445 explained most of the signal at its locus in CMC dataset (Fig. [129]2c) and ZNF502 explained most of the signal at its locus in BrainSeq2 dataset (Fig. [130]2d). In addition, we also found that FADS1 (Fig. [131]2e, f), B3GALTL (Fig. [132]2g, h) and XPNPEP3 (Fig. [133]2i, j) genes explained most of the variance at their loci in both CMC and BrainSeq2 datasets. Collectively, our conditional analyses identified independent genes that driven the transcriptome-wide association signals. Fig. 2. Regional association of transcriptome-wide significant genes. [134]Fig. 2 [135]Open in a new tab a Chr 1 regional association plot in CMC dataset. b Chr 1 regional association plot in BrainSeq2 dataset. Of note, TCTEX1D1 driven the association signal in both two eQTL datasets. c Chr 3 regional association plot in CMC dataset. d Chr 3 regional association plot in BrainSeq2 dataset. e Chr 11 regional association plot in CMC dataset. f Chr 11 regional association plot in BrainSeq2 dataset. Notably, FADS1 explained most of the association signal in both two eQTL datasets. g Chr 13 regional association plot in CMC dataset. h Chr 13 regional association plot in BrainSeq2 dataset. Notably, B3GALTL driven the association signal in both two eQTL datasets. i Chr 22 regional association plot in CMC dataset. j Chr 22 regional association plot in BrainSeq2 dataset. Notably, XPNPEP3 explained most of the association signal in both two eQTL datasets. The top panel in each plot shows all of the genes in the locus. The marginally TWAS associated genes are highlighted in blue, and those that are jointly significant highlighted in green. The bottom panel shows a Manhattan plot of the GWAS data before (grey) and after (blue) conditioning on the predicted expression of the green genes. The x-axis denotes genome coordinates and the y-axis denotes association P-values in GWAS. Pathway enrichment analysis of the identified TWS genes revealed related biological processes To detect whether the TWS genes identified by TWAS were enriched in specific pathways, we carried out pathway and gene ontology analysis. All of genes that reached a relaxed Bonferroni-corrected significance were grouped into three different clusters based on co-expression of 31,499 public samples (expression level was quantified with RNA-seq) (Fig. [136]3). Several biological pathways were significantly enriched among the TWS depression genes, including female pregnancy (Mann–Whitney U-Test, P = 8.53 × 10^−6), formation of Fibrin Clot (Mann–Whitney U-Test, P = 7.92 × 10^−5), cAMP binding (Mann–Whitney U-Test, P = 7.45 × 10^−4) and ephrin signaling (Mann–Whitney U-Test, P = 7.37 × 10^−4) (Table [137]2). In addition, Human Phenotype Ontology (HPO) analyses suggested that the identified TWAS significant genes were enriched in peripheral axonal degeneration (Mann–Whitney U-Test, P = 5.49 × 10^−4) and neurodevelopmental delay (Mann–Whitney U-Test, P = 7.08 × 10^−4) (Table [138]2). Interestingly, we noticed that several marginally significant enriched pathways were consistent with previous findings^[139]42, including G-protein coupled receptor activity (Mann–Whitney U-Test, P = 8.54 × 10^−2). Fig. 3. Gene clustering for the identified TWAS genes based on co-expression. [140]Fig. 3 [141]Open in a new tab Public RNA-sequencing data from 31,499 samples was used to determine co-expression profiles. Co-expression cluster 1 was showed in blue. Co-expression cluster 2 was showed in green. Co-expression cluster 3 was showed in purple. Darker lines represent stronger co-expression. Table 2. Significant pathways of TWAS genes identified through gene-network analysis. Pathway Significance Database Formation of Fibrin Clot (Clotting Cascade) 7.92E-05 Reactome Common Pathway of Fibrin Clot Formation 3.98E-04 Reactome Keratinization 4.95E-04 Reactome Ephrin signaling 7.37E-04 Reactome Female pregnancy 8.53E-06 GO Processes Defense response to Gram-negative bacterium 3.05E-05 GO Processes Fibrinolysis 3.65E-05 GO Processes Epithelial to mesenchymal transition 5.63E-04 GO Processes Pattern specification process 6.96E-04 GO Processes Protease binding 7.00E-05 GO Function cAMP binding 7.45E-04 GO Function Peripheral axonal degeneration 5.49E-04 Human Phenotype Ontology Neurodevelopmental delay 7.08E-04 Human Phenotype Ontology [142]Open in a new tab TWS genes showed higher expression level than background genes in the human brain In addition to pathway enrichment analysis, another important approach to explore the biological function of geneset is the spatio-temporal gene-expression profiling. Based on the expression data from the BrainSpan ([143]http://www.brainspan.org/), we carried out spatio-temporal expression pattern analysis for two TWS genesets (geneset 1: the TWS depression genes identified both in CMC and BrainSeq2 expression panels, N = 7; geneset 2: all TWS depression genes across two expression panels, N = 53). More detailed information about spatio-temporal expression pattern analysis can be found in the study of Gilman et al.^[144]43. The average expression level of all TWS depression genes (Supplementary Table [145]1 and [146]2) was significantly higher than the expression level of the background genes across all developmental stages (Wilcoxon rank-sum test, P < 8.66 × 10^−5) (Supplementary Fig. [147]1). Moreover, we found that the expression level of the TWS depression genes was higher in prenatal stage than the postnatal stages in geneset 2 (P = 1.30 × 10^−2, Wilcoxon rank-sum test) but not in geneset 1 (P = 6.60 × 10^−2, Wilcoxon rank-sum test). These data suggest that the identified TWS depression genes may have pivotal roles in brain development and function. Cell-type-specific expression analysis of target genes To explore the expression pattern of the TWS depression genes in different brain cell types, we conducted cell-type-specific expression analyses^[148]39. TWS depression genes identified in CMC and BrainSeq2 datasets (a total of 53 genes) (Supplementary Table [149]1 and [150]2) were analyzed to investigate if TWS depression genes were specifically expressed in specific cell populations. Notably, we found that the TWS depression genes were primarily expressed in two pyramidal cell types (including hippocampal CA1 pyramidal cells and somatosensory pyramidal cells) (i.e. these two pyramidal cell types contained the most numbers of the TWS depression genes that above the specificity score (i.e. specificity score > 0.1)) (Supplementary Fig. [151]2). This result is consistent with previous findings and provides further support for the involvement of pyramidal cells in depression^[152]44,[153]45. In addition, this result also suggests that the TWS genes may confer depression risk by affecting the functions of pyramidal cells. To explore the expression pattern of the seven overlapping TWS genes in different human tissues, we performed tissue-type-specific expression analysis using expression data from GTEx. Our results indicated that expression level of FADS1 and TCTEX1D1 is low in most human tissues. However, other five genes (B3GALTL, XPNPEP3, ZMAT2, ZNF501 and ZNF502) are all expressed in human brains (Supplementary Figs. [154]3–[155]9). Expression analysis of TWS genes in brains of depression cases and controls TWAS identified depression-associated genes (Supplementary Tables [156]1 and [157]2) under the assumption that genetic variants influence depression risk by modulating gene expression. To further explore if the significant genes identified by TWAS analysis were dysregulated in depression cases compared with controls, we compared the expression level of the TWS genes in brains of depression cases and healthy controls using the expression data from three RNA-seq studies^[158]29–[159]31. As previous studies have indicated that the DLPFC may play a pivotal role in depression^[160]46,[161]47, we only selected the expression data from the DLPFC to perform differential expression analysis. These three datasets were [162]GSE102556 (26 depression cases and 22 controls)^[163]29, [164]GSE101521 (30 depression cases and 29 controls)^[165]30 and [166]GSE80655 (23 depression cases and 24 controls)^[167]31. The P value was corrected by the Bonferroni correction (for multiple testing) approach, which resulted in a significance threshold of P = 9.43 × 10^−4 (=0.05/53 TWS genes were retained for differential expression analysis analysis). We found that PCDHA8 (P = 1.31 × 10^−3) was significantly downregulated in depression cases compared with controls in [168]GSE101521 dataset. By contrast, FANCL (P = 4.88 × 10^−2) showed a significant upregulation in depression cases compared with controls in [169]GSE101521 dataset (Supplementary Table [170]3). In [171]GSE80655 dataset, PCDHA7 showed a significant upregulation (P = 4.34 × 10^−3) (Supplementary Table [172]3). Interestingly, five TWAS significant genes (TMEM161B-AS1, GMPPB, STAU1, NDUFA2 and GPX1) were significantly upregulated in brains of depression cases compared with controls in [173]GSE102556 dataset (Supplementary Table [174]3). Taken together, these results further support that the identified TWAS significant genes may have a role in depression. In addition, these results also suggest that the identified risk variants may confer depression risk through regulating gene expression. Discussion Depression is a common psychiatric disorder that is caused by a combination of multiple risk factors, including genetic, psychological, biological and social factors. Although recent GWASs have successfully identified multiple depression risk loci, the biological interpretations and functional understanding of these associations remain poorly understood (partly due to the inability to fine-map depression-relevant genes). TWAS^[175]17 is a powerful approach to identify associated genes by combining the GWAS results and expression data. In this study, we performed a depression TWAS and identified candidate risk genes for depression. Considering that sample size and tissue matching have pivotal roles in TWAS, we used expression data from the DLPFC (i.e. CMC^[176]20 and BrainSeq2^[177]21) to conduct TWAS analysis. We integrated the summary statistics from the largest depression GWAS^[178]14 so far (N = 807,553 individuals) and gene-expression measurements from two large-scale expression studies to identify TWS genes for depression. Overall, we identified 53 significant genes whose expression were associated with depression risk, of which 23 genes did not overlap with a genome-wide significant locus in the depression GWAS. Our study also highlighted seven overlapping TWS genes (including B3GALTL, FADS1, TCTEX1D1, XPNPEP3, ZMAT2, ZNF501 and ZNF502) in both CMC and BrainSeq2 datasets. These seven TWS genes may represent high-confidence risk genes for depression as they reached transcriptome-wide significance level in two independent expression datasets. B3GLCT encodes Beta 3-Glucosyltransferase, which is involved in metabolism of proteins and O-glycosylation of TSR domain-containing proteins^[179]48. No previous study showed association between this gene and depression. TCTEX1D1 (Tctex1 Domain Containing 1) has a role in organelle biogenesis and maintenance, and intraflagellar transport^[180]49. The protein encoded by XPNPEP3 belongs to the family of X-pro-aminopeptidases, which remove the N-terminal amino acid from peptides with a proline residue in the penultimate position^[181]50. ZMAT2 (Zinc Finger Matrin-Type 2) is related to nucleic acid binding. ZNF501 (Zinc Finger Protein 501) encodes a DNA-binding transcription factor. An important paralog of ZNF501 is ZNF502. There were no previous studies reported associations between ZNF501/ZNF502 and depression. It is possible that ZNF501/ZNF502 confer risk of depression by regulating gene expression. FADS1 encodes fatty acid desaturase 1, a protein that is mainly involved in metabolism of alpha-linolenic (omega3), linoleic (omega6) acid and metabolism^[182]51. Interestingly, a previous study has showed that mRNA expression of FADS1 was significantly lower in depression patients compared with controls, suggested that FADS1 may have pivotal roles in depression^[183]52. More work is needed to characterize the potential role of these genes in depression. We further performed a depression TWAS analysis using the expression weights data from PsychENCODE^[184]24. Among the seven overlapping TWS genes, three genes (i.e. B3GALTL, FADS1 and ZMAT2) showed significant associations in PsychENCODE dataset (Supplementary Table [185]4). Spatio-temporal expression pattern analysis showed distinct expression patterns of these seven overlapping TWS genes in the developing and adult human brain (Supplementary Fig. [186]10). Five genes (FADS1, XPNPEP3, ZMAT2, ZNF501 and ZNF502) showed higher expression in the prenatal developing human brain than adult brain. However, other two genes (B3GALTL and TCTEX1D1) showed reverse spatio-temporal expression patterns. Of note, we noticed that previous studies^[187]11,[188]53 also used CMC dataset as SNP-expression weights for depression TWAS. To identify genes whose genetically regulated expression are associated with complex diseases and traits, Mancuso et al. performed a comprehensive TWAS by integrating multiple expression references and GWAS summary