Abstract Autism spectrum disorder (ASD) is a complex neurodevelopmental disease in early childhood, and growing up to be a major cause of disability in children. However, the underlying molecular mechanism of ASD remains elusive. Hence, we represented integrated multifactor analysis exploring dysfunctional modules based on RNA-Seq data from corpus callosum in 6 patients with ASD and 6 normal individuals. According to protein-protein interactions (PPIs) and WGCNA, we performed co-expression modules analysis for ASD-associated genes, and identified 25 modules with differentially expressed genes (DEGs), observing that genes in these modules were significantly involved in various biological processes in nervous system, sensory system, phylogenetic system and variety of signaling pathways. Then, based on transcriptional and post-transcriptional regulations, integrating transcription factor (TF)-target and RNA-associated interactions, significant regulators of co-expression modules were identified as pivot regulators, including 67 pivot TFs, 13 pivot miRNAs and 6 pivot lncRNAs. GO and KEGG pathway enrichment analysis demonstrated that the pivot miRNAs significantly enriched in neural or mental-associated biological progresses. The pivot TFs were mainly involved in various regulation of transcription, immune system and organs development. Finally, our work deciphered a multifactor dysfunctional co-expression subnetwork involved in ASD, helps uncover core dysfunctional modules for this disease and improves our understanding of its underlying molecular mechanism. Keywords: Autism spectrum disorder, Multifactor analysis, Co-expression, Core dysfunctional module Introduction Autism spectrum disorder (ASD), also known as autism, is a representative disease of pervasive neurodevelopmental disorders in children and one of the most rapidly growing diseases in the world [35]^1^, [36]^2. It's a congenital mental disorder that has nothing to do with upbringing, which is mainly characterized by essential defects in social and communication ability, language ability and repetitive patterns of behavior [37]^3. In addition to these core symptoms, there are also some peripheral symptoms, such as senses interference caused by the digestive system, immune system and sensory system problems. They also exhibit neurobehavioral problems such as learning disabilities, cognitive deficits, anxiety, and hyperactivity. The symptoms of children with ASD are much characterized by social communication disorders, so many early researches focused on social cognition to find the cause of autism. But in recent years, researchers mainly focused on the medical and biology point of view, and made some progress. For example, it's validated that Taar7h and Taar7b in neuroactive ligand-receptor interaction pathway were downregulated in ASD [38]^4. The study by Rikki Hullinger et.al indicated that an autistic-like phenotype will be caused by the upregulation of AT-1, which affecting key neuronal metabolic pathways [39]^5. IGF-1 (insulin-like growth factor-1) has been verified related to multi neuropsychiatric disorders, such as depression, Alzheimer's disease and ASD [40]^6. Studies have demonstrated that ASD are caused by a combination of genetic, epigenetics and environmental factors [41]^7^, [42]^8. There are more than 300 autism-associated genes identified at the human genome level, however, most of them have no clear genetic causes. Regulation of gene expression includes transcriptional levels, post-transcriptional levels, and translational levels. Transcription regulation refers to changing the level of gene expression by changing the transcription rate, which plays an important role in the accuracy and diversity of the transmission of genetic information [43]^9^, [44]^10. Transcriptional regulation of eukaryotes includes various forms [45]^11^-[46]^13, such as DNA methylation, histone modification, chromatin remodeling, transcription factors (TFs), and so on. Transcription of eukaryotic genes takes place in the nucleus and translation takes place in the cytoplasm. Therefore, post-transcriptional regulation is another important aspect of gene expression regulation [47]^14^-[48]^20, including alternative splicing of RNA, RNA methylation, and various regulatory RNAs (miRNAs, lncRNAs) are involved in post-transcriptional regulation. The in-depth study of post-transcriptional and post-translational regulatory mechanisms of genes is of great significance in revealing the nature of life rhythmic activities, the molecular basis of biological evolution, and exploring new fields in genetics research. To further explore the role of genomic in autism, we performed a systematic integrated strategy constructing multifactor regulatory network to identify meaningful gene modules underlying ASD. Combining RNA-Seq data, protein-protein interactions (PPIs), RNA-associated interactions and co-expression analysis, we first identified gene co-expression modules, observing that genes in these modules were significantly involved in various biological processes in nervous system and sensory system and variety of signaling pathway, and four core modules were identified in which genes significantly involved in ASD symptoms-associated biological processes and pathways. Then, based on transcriptional and post-transcriptional regulations, we identified pivot regulators for each module. And, the relationships between DEGs in the four core dysfunctional modules were further analysis. The results indicated that, in addition to the disorder of the genes within the core modules, the regulation of genes by pivot regulators will also play a key role in the occurrence and development of ASD. In a word, our multifactor co-expression network analysis, not only help to explore the relationship of gene modules and ASD, but also provide a novel direct for biologists to further design their researches. Results Identification of gene co-expression modules We downloaded the RNA-seq data ([49]GSE62098) from the NCBI Gene Expression Omnibus (GEO) database, referring to 6 normal individuals and 6 patients with autism spectrum disorder, and FastQC was performed on the quality control of the data. The filtered reads were aligned to the human genome reference (GRCh38). Based on gene level raw counts, we identified 502 differentially expressed protein-coding genes (DEGs), 24 miRNAs (DEMs) and 139 lncRNAs (DELncRNAs), respectively (|FC| > 1.5, p value < 0.05, Supplementary Table [50]S1). From the point of view of a single gene, gene module represents a series of highly correlated genes, and the genes in the same module may have similar biological function. And from the perspective of system biology, the search for a gene module with potential function is actually a bridge to understand the function of a single gene and the characteristics of the global network. Hence, identifying gene functional modules is a key step for understanding molecular mechanisms of diseases. We first extracted a protein-protein interaction (PPI) subnetwork (Figure [51]1), which consisting of DEGs and their 2847 interactors from the human PPI network (see Materials and Methods). Then, based on WGCNA, we performed co-expression modules analysis for these genes in the network. 30 modules were identified (Figure [52]2) and 25 of these modules containing DEGs were kept for further analysis (Supplementary Table [53]S2). The functional enrichment analysis revealed that the GO biological processes and KEGG pathway involved in nervous system, sensory system and behavioral control tended to be enriched by its dysregulated genes in the 25 modules (Supplementary Table [54]S3), for example, “neuroactive ligand-receptor interaction”, “sensory perception” and “behavioral fear response”, which are autisms symptoms-associated functions. In fact, it is no doubt that those genes also involved in the brain development, like “forebrain development” and “pituitary gland development”. There were 196 genes (18 DEGs) in 25 modules have been reported associated with ASD susceptibility exacted from the SFARI Gene ([55]https://gene.sfari.org/autdb/). Figure 1. [56]Figure 1 [57]Open in a new tab Protein-protein interaction network. Nodes in red represent DEGs and blue represent interactors of these DEGs extracted from STRING. Figure 2. [58]Figure 2 [59]Open in a new tab Visualization of WGCNA results. A. Clustering dendrogram of genes. A total of 31 colors corresponding to 30 modules and a gene set containing genes are not included in any module (grey). X-axis represents gene and y-axis represents the height of the gene tree. B. Heatmap plot of the gene network. The heatmap depicts the Topological Overlap Matrix (TOM) among all genes in the analysis. Light color represents low overlap and progressively darker red color represents higher overlap. Blocks of darker colors along the diagonal are the modules. Identification of pivot regulators Transcriptional and post-transcriptional regulations have long been involved in the initiation and progression of various diseases in human. However, the intricate regulatory mechanisms underlying ASD still remain ambiguous. Integrating TF-target, miRNA- and lncRNA-associated interactions, we constructed multifactor-mediated regulation network for co-expression modules. Finally, pivot regulators significantly regulating the co-expression modules were identified (hypergeometric test, p < 0.05, see Materials and Methods), including 67 pivot TFs, 13 pivot miRNAs and 6 pivot lncRNAs (LINC00668, APCDD1L-AS1, ESRG, LINC01121, LINC01749 and LINC00466) for 20 modules (Supplementary Table [60]S4). Similarly, the functional enrichment was performed for the pivot TFs and miRNAs, respectively. The results demonstrated that the pivot TFs were mainly involved in biological processes in transcriptional regulation and organs development (Figure [61]3A); the pivot miRNAs significantly enriched in neural or mental-associated biological progresses, for example, “neurotrophin TRK receptor signaling pathway” (p=6.16E-19), “nervous system development” (p =1.43E-4), “long-term depression” (p =2.37E-2) (Figure [62]3B). Moreover, some pivot lncRNAs have been reported to be directly or indirectly related to ASD. The lncRNA LINC00668 is located downstream of gene LAMA1 involved in cerebellum and retinal development [63]^21 suggesting that it may implicated autism etiology. LINC01749 is validated associated with Major Depressive Disorder (MDD) [64]^22, and so on. Figure 3. [65]Figure 3 [66]Open in a new tab Functional enrichment results. A. Enrichment results of pivot TFs. The color depth represents p-value, the number of genes represented by the node size B. Enrichment results of pivot miRNAs. Nodes in blue represent KEGG pathway enrichment results and in purple represent GO functional enrichment results. The size of nodes represents the number of genes. Profiling core dysfunctional modules Based on the functional enrichment results, we focused on 4 core dysfunctional modules, including blue, yellow, green and darkred module. The enrichment analysis observed that all of the four modules were significantly enrichment for “neuroactive ligand-receptor interaction”; various G-protein coupled receptor signaling pathway enriched by blue, yellow and green module; “neuropeptide signaling pathway” by yellow, green and darkred module. But beyond that, each module also had own characteristic functions (Supplementary Table [67]S3). The blue module consisting of 314 genes (176 DEGs) was involved in biological progresses of sensory system, such as gustation, smell and optesthesia. The yellow module containing 257 genes (16 DEGs) was observed significantly enrichment for neural system, various synapses, forebrain development and learning. These 228 genes in green module were significantly enrichment for long-term depression, sensory perception, behavioral, learning, cognition and central nervous system neuron development. And 47 genes in darkred module were observed significantly enriched on the biological processes of regulation of behavior. In total, there were 75 overlapping genes with SFARI Gene, including 13 in blue, 37 in yellow and 27 in green. Functional enrichment showed that, pivot miRNAs (hsa-miR-640 and hsa-miR-1262 for blue module, has-let-7d-5p and hsa-miR-5190 for yellow module) were significantly enrichment for neurotrophin-related signaling pathways and nervous system development, like “neurotrophin TRK receptor signaling pathway” (p=6.160E-19) , “Neurotrophin signaling pathway” (p=7.314E-03) and “nervous system development” (p= 1.431E-04). Has-let-7d-5p have been validated for its role in Alzheimer's disease [68]^23^, [69]^24 and brain cancer [70]^25, suggesting that its potential function in ASD. 13 pivot TFs of the four core module, including HNF4A, HNF1A, POU2F1, POU5F1, TBX21, GATA1 (for blue), FOXM1, HNF1A (for yellow), STAT6, HMGA1, ATF3, PAX5, MAZ (for green), POU2F1 and ESR1 (for darkred) were mainly involved in transcription regulation and organ morphogenesis (Figure [71]4A). And HNF1A, POU2F1 significantly regulated blue and yellow, blue and darkred, respectively. It has been showed that HNF1A is associated with neuropsychiatric and neuropsychological characteristics [72]^26^, [73]^27. The alternative promoter usage and differential expression of POU2F1 transcript variants has an impact on cerebellar development [74]^28, lens and olfactory placode development [75]^29. ESR1 is also with evident role for autism [76]^30^, [77]^31. Embryonic stem cell related gene (ESRG) is an lncRNA downregulated during the transition from induced pluripotent stem cell (iPSCs) to neural progenitor cells (NPCs) [78]^32. Figure 4. [79]Figure 4 [80]Open in a new tab A. The biological processes enriched by 13 pivot TFs of 4 core modules. B. The transcriptional and post-transcriptional regulation for core modules. Circles in red represent DEGs. Triangles, diamonds and “V” in gray represent significant relationship between pivot regulators and genes in module. And the blue lines are interactions among these pivot regulators. C. DEG-DEG interactions between modules. The size of nodes represents the degree. Considering each of these four core modules corresponded to the most typical features of autisms, we further analyzed the regulation subnetwork and interactions between differentially expressed genes in the four modules (Figure [81]4B, C). Analysis results presented the interactions among has-let-7d-5p, ESR1 and POU2F1, indicating they can regulate the modules genes together to affect autism. In addition, it was found that the differentially expressed genes in yellow and green are tightly linked with genes in blue, modules are tightly linked. In short, dysregulated genes in one module may directly regulated the expression of genes in another, and then pivot regulators can not only directly regulate module genes, but also indirectly regulate other functional modules through interaction with other pivot regulators of the functional modules. Discussion Autism is a complex disease with a strong genetic basis, including a wide range of neurodevelopmental disorders in clinical and etiology [82]^33^, [83]^34. Today, most of our knowledge on ASD genetics has been obtained from the genetic linkage of large ASD patient cohorts or exome sequencing analysis [84]^35, which provides us an opportunity to observe the molecular basis of this disease. However, a complete picture of this disease may require the integration of ASD gene data from different dimensions. Transcription and post-transcription regulation is closely related to various human diseases[85]^20^, [86]^36^, [87]^37, including brain diseases [88]^38. In this study, we performed an integrated multifactor analysis to identified core dysfunctional modules on RNA-seq data. To determine the role of dysfunctional gene set in the occurrence and development of ASD, we first identified the differentially expressed genes based on RNA-seq data. Then, combining protein-protein interactions and WGCNA, 25 co-expression modules with differentially expressed genes were exacted, and observed genes in these modules were significantly involved in various biological processes in nervous system, sensory system, phylogenetic system and variety of signaling pathways. In addition to, four core dysfunction modules were identified for further analysis, which mainly enriched in gene ontology (GO) biological progresses or KEGG pathways closely related with the core symptoms of autisms, respectively. Finally, to explored whether transcripttional and post-transcriptional regulation influence the development of ASD by regulating the genes in the four core dysfunction modules, significant regulators of co-expression modules were identified as pivot regulators by integrating TF-targets and RNA-associated interactions, including TFs, miRNAs and lncRNAs. These pivot regulators were also mainly involved in a variety of developmental, neural or mental-associated biological progresses, have been reported to be more or less directly or indirectly related to neurological diseases [89]^32^, [90]^39^-[91]^41, suggesting their vital role in the regulation subnetwork. Finally, we deciphered several core dysfunctional modules for this disease, and improved our understanding of its underlying molecular mechanism further. Experts believe that autism is also a neuronal synaptic disease which closely related with neuronal development, but its molecular mechanism needs further study. From a molecular neurobiology point of view, the functions of dendrites, axons-related proteins and non-coding genes during of the neuro-synapse are helpful to understand their role in brain development and mature, and reveal the autism pathogenesis. In summary, our work detailed multifactor-mediated core dysfunctional modules under ASD, which may also contribute to the discovery of more detailed molecular mechanisms and provided a rich resource of potential candidates for future experimental validations and a theoretical guidance for biological research in the future. Materials and Methods Data resources and Differentially expression analysis The RNA-seq data ([92]GSE62098 [93]^42) was collected from the NCBI Gene Expression Omnibus [94]^43 (GEO) database, which contained 6 patients with autism spectrum disorders and 6 normal individuals (tissue: corpus callosum). First, the quality control of the downloaded data was performed by FastQC (Version 0.11.5). Then, the filtered reads were used to mapped to the hg38 genome reference genome (GRCh38) using HISAT2 (version 2.1.0) [95]^44 with default parameters. The reference genome and gene annotation file (.gtf format) was downloaded from GENCODE Release 27 ([96]http://www.gencodegenes.org/). Third, FPKM (Fragments Per Kilobase transcriptome per Million reads) values were calculated using StringTie (v1.3.3 release) [97]^45 with parameters “--known-splicesite-infile”. Fourth, gene level raw counts were calculated by the preDE.py script within StringTie. Finally, based on raw counts, we identified differentially expressed protein-coding genes, miRNAs and lncRNAs using DEseq2 (|fold change (FC)| > 1.5 and p-value < 0.05) for further analysis. Generating gene co-expression modules The WGCNA R software package [98]^46 was performed to identify gene co-expression modules. We firstly extracted a protein-protein physical interaction (PPI) subnetwork (combined_score > 900) in human from the STRING [99]^47 database (v10.5), containing 9637 proteins and 170987 interactions. Then, DGEs and genes interacted with these DEGs in the PPI subnetwork were identified to construct weighted gene correlation network. Finally, the expression profile of genes in the weighted gene correlation network was input to WGCNA for co-expression modules. The parameters (minModuleSize = 20; minimum height = 0.15) were set to cut tree. Identifying pivot regulators For each gene co-expression module, we explored transcriptional and post-transcriptional regulations. From AnimalTFDB2.0 [100]^48 and TRRUST v2 [101]^49, we recruited 3632 regulatory relationships in human, referring to 393 TFs and 1463 target genes. As for post-transcriptional regulations, 18294 differentially expressed miRNA (DEM)-association interacttions were collected from RAID v2.0 [102]^50 database. In addition to, we also collected 321 differentially expressed lncRNA (DELncRNA)-associated interacttions from RAID v2.0. If a regulator i) Inline graphic 2 relationships between module and the regulator; and (ii) the number of its targets significant enriched for per module (hypergeometric test, p value < 0.05) [103]^51, we defined pivot regulator as the regulator significantly regulated the module. Gene ontology and KEGG pathway enrichment analysis Significant gene ontology (GO) biological processes and KEGG pathway enrichment analysis were performed on DAVID [104]^52. And miRNAs function was deciphered by DIANA-miRPath v3.0 [105]^53. Supplementary Material Supplementary Table S1. [106]Click here for additional data file.^ (74KB, xls) Supplementary Table S2. [107]Click here for additional data file.^ (288.5KB, xls) Supplementary Table S3. [108]Click here for additional data file.^ (387KB, xls) Supplementary Table S4. [109]Click here for additional data file.^ (37KB, xls) Acknowledgments