Abstract Objective This study aims at identifying master regulators of transcriptional networks in autism spectrum disorders (ASDs). Results With two sets of independent RNA-Seq data generated on cerebellum from patients with ASDs and control subjects (N = 39 and 45 for set 1, N = 24 and 38 for set 2, respectively), we carried out a network deconvolution of transcriptomic data, followed by virtual protein activity analysis. We identified PPP1R3F (Protein Phosphatase 1 Regulatory Subunit 3F) as a candidate master regulator affecting a large body of downstream genes that are associated with the disease phenotype. Pathway analysis on the identified targets of PPP1R3F in both datasets indicated alteration of endocytosis pathway. Despite a limited sample size, our study represents one of the first applications of network deconvolution approach to brain transcriptomic data to generate hypotheses that may be further validated by large-scale studies. Electronic supplementary material The online version of this article (10.1186/s13104-018-3594-0) contains supplementary material, which is available to authorized users. Keywords: Autism spectrum disorders, RNA-Seq, Next generation sequencing, Network deconvolution, Gene expression Introduction Autism spectrum disorders (ASD) comprise a set of highly inheritable neurodevelopmental conditions characterized by impairments in social communication, repetitive behaviors and restricted interests [[29]1, [30]2]. ASDs are estimated to affect 1 in 68 children in the United States, and boys are 4.5 times more likely than girls to develop ASDs [[31]3]. Several studies showed that the heritability of autistic phenotypes is estimated to be around 90% [[32]4, [33]5]. The number of genes potentially implicated in ASDs is rapidly growing, mainly from large-scale genetic studies such as next generation sequencing (NGS) [[34]6–[35]12] and genome-wide association studies (GWAS) [[36]13–[37]16]. Although these studies have substantially advanced our understanding of the etiology of ASDs, the underlying molecular mechanisms remain elusive [[38]17]. Transcriptome analysis is gaining momentum as a complementary approach to genetic association studies [[39]17], enabling us to understand the molecular pathophysiology of ASDs. A number of studies have evaluated whole-genome gene expression that may contribute to the onset of ASD. In a large-scale RNA-Seq effort, matched brain regions from subjects affected with ASDs and controls were utilized to identify neuronal genes strongly dysregulated in cortical regions [[40]17]. Utilizing microarray technology, Voineagu et al. [[41]18] demonstrated consistent differences in transcriptome organization between autistic/normal human brain tissues using gene co-expression network analysis. However, the potential molecular drivers of co-expressed modules have not been identified [[42]18]. Despite applications of co-expression network approaches in the inference of regulatory machinery in ASD [[43]19], state-of-the-art approaches such as network deconvolution methods are barely adopted in this area. Network deconvolution methods have been successfully used to study prostate differentiation [[44]20] and cancers [[45]21]. They can overcome limitations of the existing methods such as connecting genes with indirect interactions leaving their mutual causal effects aside as well as suffering from the exponentially increasing computational cost, etc. [[46]22]. These methods can illuminate the underlying transcription circuitry of diseases and illustrate potential regulation drivers. For example, with transcriptional network deconvolution approach, we have recently provided novel insights on post-traumatic stress disorder (PTSD) [[47]23] by identifying several genes as drivers of innate immune function. In the current study, we used ARACNe (algorithm for reconstruction of accurate cellular networks) [[48]24] to deconvolve cellular networks. In this approach, gene–gene co-regulatory patterns are first identified using mutual information (MI), and the constructed networks are further pruned by removing indirect connections where two genes are co-regulated through one or more intermediaries. Using two of the largest transcriptomic datasets of postmortem brain tissues from ASD individuals and control subjects by Parikshak et al. [[49]19] and Gupta et al. [[50]17], we reconstructed the transcriptional networks followed by virtual protein activity analysis, to identify “master regulators” (MRs) that may differentially regulate the expression levels of multiple downstream genes in the cerebellum region of ASD individuals and controls. Main text Methods Network construction and analysis tools are explained in the Additional file [51]1. Upon constructing the transcriptional networks, we used an algorithm called VIPER (virtual inference of protein-activity by enriched regulon analysis [[52]21]). VIPER aims at inferring the protein activity of a MR by a systematic analysis of the expression patterns of its targets (regulons). VIPER directly integrates target mode of regulation indicating whether targets are repressed or activated given the statistical confidence in regulator–target interactions and target overlap between different regulators in order to obtain the enrichment of a protein regulon in differentially expressed genes [[53]23]. Compared to the existing approaches such as T-profiler [[54]25], gene set enrichment analysis (GSEA) [[55]26], and Fisher’s exact test [[56]27], VIPER supports seamless integration of genes with different likelihoods of representing activated, repressed or undetermined targets. Both datasets contain multiple regions including cerebellum, which is relevant for ASDs since specific cerebellar zones can affect neocortical substrates for social interaction and cognitive functions such as language and executive functions [[57]28–[58]30]. Abnormalities of the cerebellum, which is believed to be involved in cognitive functions, can in part underlie autistic symptoms [[59]31]. Several other brain regions, such as gyral surface of the anterior cingulate cortex and ventromedial prefrontal cortex [[60]32], posterior superior temporal sulcus (pSTS) [[61]33], amygdala, orbital frontal cortex, and fusiform gyrus [[62]34] are also known to be ASD-relevant. We reasoned that in the same brain region, there should be highly active proteins whose expression regulates a large set of target genes and such patterns should be replicated in an independent dataset. Our preliminary finding indicates PPP1R3F (Protein Phosphatase 1 Regulatory Subunit 3F) as a potential master regulator (MR). The framework of the in silico experiments is illustrated in Fig. [63]1. Influence of dysregulation of this gene on ASD pathogenesis was then examined. Fig. 1. [64]Fig. 1 [65]Open in a new tab The overall process of network construction and virtual protein activity analysis to identify a master regulator Results and discussion We first used the data from Parikshak et al. [[66]19] to construct the regulatory networks. This data is part of a large RNA-Seq repository on post-mortem human brain tissue (39 cases vs. 45 controls) from cerebellum, frontal cortex, temporal cortex, prefrontal cortex, and visual cortex. During the process of network deconvolution (see Methods in Additional file [67]1), pairwise MI between all of the available transcripts were obtained. Next, the constructed network was trimmed to remove genetic intermediaries, resulting in potential direct connections between MRs and their targets (we used the recommended P value threshold of 10^−8, as a measure of confidence of regulatory relationships between two genes [[68]24]). This analysis yielded a repertoire of 672,973 interactions, 23,935 regulators, and 24,847 targets in the constructed network using the dataset from Parikshak et al. [[69]19]. We similarly analyzed the second dataset from Gupta et al. [[70]17], a RNA-Seq data of post-mortem brain tissues with more samples of cerebellum region than other brain regions. Using the same network construction settings on this dataset [[71]17] containing 24 cases and 38 controls, we deconvolved a network of 297,870 interactions containing 12,040 regulators and 12,529 targets. Both constructed networks are provided in Additional files [72]2 and [73]3. After applying VIPER, we compared the list of significant MRs at FDR ≤ 0.05. We identified PPP1R3F as the only MR shared between the two datasets. Given the small sample size of the data, it is possible that our analysis was underpowered and may have missed other relevant MRs in ASDs. Figure [74]2 illustrated how downregulation of this MR influences the expression of its regulons in the constructed networks of both data sets. PPP1R3F was significantly downregulated in Parikshak et al. data (FDR from one-sided t test: 0.029) as well as Gupta et al. data (FDR = 3.58 × 10^−4). Fig. 2. [75]Fig. 2 [76]Open in a new tab Gene set enrichment analysis (GSEA) of PPP1R3F targets in the constructed networks using the data by a Parikshak et al. [[77]16] and b Gupta et al. [[78]14]. Black bars in the both figures depict the rank of the PPP1R3F targets in terms of correlation with the phenotype among the entire list of genes in the both datasets PPP1R3F is one of the type-1 protein phosphatase (PP1) regulatory subunits. Protein phosphorylation is a key mechanism by which cells regulate signaling transduction pathways, and PPP1 family enzymes are associated with dephosphorylation of several proteins such as TGF-ß cascade [[79]35]. PPP1R3F has been found to be important to neuronal activities [[80]36]. A systematic resequencing of X-chromosome synaptic genes in a group of individuals with ASD (122 males and 20 females) has identified a rare non-synonymous variant in PPP1R3F that can predispose to developing ASDs [[81]36]. This potentially damaging variant, c.733T > C, was observed in a boy with a diagnosis of asperger syndrome and was transmitted from a mother who suffered from learning disabilities and seizures [[82]36]. Further, we examined the overlaps between PPP1R3F regulons and known candidate genes implicated in ASD and its related disorders (Table [83]1). The most significant overlap was found with SFARI gene list [[84]37] (P = 8 × 10^−4), followed by overlap with an intellectual disability database gene list (P = 0.072) [[85]38]. The overlaps with other ASDs candidate gene lists also showed trends towards to being significant. These results suggest the potential relevance of the predicted PPP1R3F network to ASDs. Table 1. The overlap between the identified PPP1R3F regulons from both datasets (n = 177 genes) and several candidate gene lists of ASDs and ID (intellectual disability) Source of gene list # Genes in the gene list Overlap P value Fold enrichment References