Abstract
The dementia epidemic is likely to expand worldwide as the aging population continues to grow. A better understanding of the molecular mechanisms that lead to dementia is expected to reveal potentially modifiable risk factors that could contribute to the development of prevention strategies. Alzheimer’s disease (AD) is the most prevalent form of dementia. Currently, the pathophysiological mechanisms that lead to development of the disease in aging individuals are only partially understood. In this study, SWItch Miner (SWIM) software was used to identify key switch genes in the brain whose expression may lead to the development of Alzheimer’s disease. The results indicate that switch genes in the hippocampus and posterior cingulate cortex are enriched in pathways involved in the proteasome, oxidative phosphorylation, Parkinson’s disease, Huntington’s disease, Alzheimer’s disease and metabolism. Network analysis identified Krüppel-like factor 9 (KLF9), potassium channel tetramerization domain 2 (KCTD2), Sp1 transcription factor (SP1) and chromodomain helicase DNA binding protein 1 (CHD1) as key transcriptional regulators of switch genes in the brain of AD patients. These transcription factors have been implicated in conditions and processes associated with Alzheimer’s disease, including diabetes, glucocorticoid signaling, stroke, and sleep disorders. The specific pathways affected point to potential risk factors that may be modifiable through lifestyle changes.
Introduction
Dementia affects over 50 million people worldwide, approximately 67% of whom have Alzheimer’s disease (AD) [1]. By 2050, as many as 152 million people are predicted to have dementia [1]. Unfortunately, there is still no cure for AD, and only four drugs have been approved; these manage the symptoms of dementia in only some patients. The cause of AD is not completely understood, but strong evidence supports the involvement of the proteins amyloid and tau.
In AD patients, amyloid-β and hyperphosphorylated tau are produced abundantly in the brain. Amyloid-β forms inter-neuronal plaques that disrupt cell function, whereas tau forms intra-neuronal neurofibrillary tangles that block intracellular transport. Risk factors for AD include genetic factors (e.g., being an APOE ε4 carrier), biological factors (e.g., aging and gender), and environmental factors (e.g., glucose and cholesterol metabolism, inflammation and oxidative stress) related to lifestyle choices or accidents (e.g., diet, exercise, smoking, education and head trauma). Plaques and tangles may appear in the brain as early as 18 years before the onset of symptoms [2]. Plaques and tangles initially form in the hippocampus (HIP) and entorhinal cortex (EC) of the temporal lobe, regions involved in learning and memory [3]. In addition to the HIP and EC, metabolic and pathological differences have been found in the middle temporal gyrus (MTG), posterior cingulate cortex (PCC), and superior frontal gyrus (SFG) of AD patients [4–24]. The primary visual cortex (VCX) usually does not show disease-related neurodegeneration [25, 26]. Recently, laser capture microdissection (LCM) was used to isolate neurons from brain regions affected in AD, in both AD patients and healthy elderly controls. Gene expression profiling of these neurons identified changes that occur during the development and pathogenesis of AD [19, 26]. One striking finding from these studies is that AD patients have significantly reduced expression of mitochondrial electron transport chain genes in the PCC, MTG and HIP compared with controls [27]. The key events in AD development remain unknown, but gene expression studies of postmortem brain tissue are expected to reveal pathways that are dysregulated in patients.
SWItch Miner (SWIM) software was recently developed; it combines gene expression data with the topological properties of correlation networks to reveal major changes in cellular phenotype that underlie biological processes [28]. In SWIM, the Pearson correlation coefficient between the expression profiles of two genes is used to build co-expression networks. RNA transcripts are the nodes of the network, and two nodes are connected if the expression of the corresponding genes is significantly correlated or anti-correlated. Clustering algorithms are then used to identify disease modules. SWIM analysis identifies “switch genes” that may be fundamental to a disease. SWIM has been used to identify switch genes involved in the transition of glioblastoma cells from a stem-like to a differentiated state [29] and in the developmental reprogramming of grapevine from immature to mature growth [30]. The use of SWIM could be extended to the study of chronic diseases to reveal key players in disease development. In this study, we used SWIM to identify genes whose expression is associated with drastic changes in the brain of AD patients. The results show that the switch genes in the HIP and PCC, two brain regions affected in AD, are enriched in proteasome, oxidative phosphorylation, metabolic, Parkinson’s disease, Huntington’s disease, and Alzheimer’s disease pathways.
Methods
Database mining
The NCBI GEO database (https://www.ncbi.nlm.nih.gov/gds) and ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/) were searched on June 3, 2019 for studies in which gene expression data were available from laser-captured neurons in the brain of Alzheimer’s patients (Fig 1). The NCBI GEO database was queried using the search terms Alzheimer’s, brain, neuron and "Homo sapiens"[Organism] for the study types expression profiling by array and expression profiling by high-throughput sequencing.
Forty-four studies were identified; 21 of these were brain-specific, and 4 had data from laser-captured neurons (GSE28146, GSE66333, GSE5281, GSE4757). The ArrayExpress database was searched using the keywords Alzheimer’s, Homo sapiens, and transcription; 74 studies were identified, 27 of which were brain-specific and 3 of which had data from laser-captured neurons (GSE28146, GSE29652, and GSE4757).
Fig 1. Database mining.
The NCBI GEO database (https://www.ncbi.nlm.nih.gov/gds) and ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/) were searched on June 3, 2019 for studies in which gene expression data were available from laser-captured neurons in the brain of Alzheimer’s patients.
SWIM analysis to identify switch genes
Raw data from the expression arrays were imported into SWIM. The SWIM algorithm comprises several steps, shown in Fig 2 [31]. In the pre-processing phase, genes that are not expressed or are only weakly expressed are removed. In the filtering phase, the fold-change threshold was set between 2 and 4, and genes that were not significantly differentially expressed between AD patients and controls were removed. The false discovery rate (FDR) method was used to correct for multiple testing [32], and Pearson correlation analysis was then used to build a co-expression network of the genes differentially expressed between AD patients and controls. In step 4, the k-means algorithm was used to identify communities within the network [33]. To determine the number of clusters, SWIM uses a scree plot: the clustering is replicated many times, each time with a new set of initial cluster centroid positions, and for each replicate the k-means algorithm iterates until the sum of the squared error (SSE) function reaches a minimum. For each number of clusters, the configuration with the lowest SSE among the replicates is retained, and the number of clusters is chosen from the resulting scree plot of SSE against the number of clusters.
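The replicated k-means step and the scree plot described above can be illustrated with the following minimal NumPy sketch. It is not SWIM's implementation: it assumes each network node is represented by a numeric feature vector (for example, its expression profile across samples), uses plain Lloyd iterations started from randomly chosen data points, and keeps, for each k, the replicate with the lowest SSE. The function names kmeans_sse and scree_sse, and the defaults of 10 replicates and 100 iterations, are hypothetical.

```python
import numpy as np


def _assign(X, centroids):
    """Label each row of X with the index of its nearest centroid (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans_sse(X, k, n_replicates=10, max_iter=100, seed=0):
    """Run k-means n_replicates times from random initial centroids.

    Returns the lowest SSE found among the replicates and the matching labels.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best_sse, best_labels = np.inf, None
    for _ in range(n_replicates):
        # New set of initial centroid positions: k distinct data points.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            labels = _assign(X, centroids)
            # Move each centroid to the mean of its members (keep it if its cluster is empty).
            updated = np.array([
                X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                for c in range(k)
            ])
            if np.allclose(updated, centroids):
                break
            centroids = updated
        labels = _assign(X, centroids)
        sse = float(((X - centroids[labels]) ** 2).sum())
        if sse < best_sse:  # keep the replicate with the lowest SSE
            best_sse, best_labels = sse, labels
    return best_sse, best_labels


def scree_sse(X, k_max, **kwargs):
    """Best (lowest) SSE for k = 1..k_max; plotting these against k gives the scree plot."""
    return [kmeans_sse(X, k, **kwargs)[0] for k in range(1, k_max + 1)]
```

Plotting the list returned by scree_sse against k = 1, 2, ... gives the scree plot; the number of clusters is then usually read off where the SSE stops decreasing sharply. Keeping the best of several replicates guards against the local minima that a single k-means run can fall into.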
The heat cartography map is built using the clusterphobic coefficient Kπ, which measures the balance between a node's external and internal connections, and the global within-module degree Zg, which measures how strongly each node is connected to the other nodes of its own community. A node with Zg above 2.5 is considered a local hub, that is, a hub within its own module. The average Pearson correlation coefficient (APCC) between the expression profile of each node and those of its nearest neighbors is used to color the nodes of the heat cartography map. Using APCC, three types of hubs can be distinguished. Date hubs show low positive co-expression with their partners (low APCC), party hubs show high positive co-expression (high APCC), and hubs with negative APCC values are called fight-club hubs [28]. In the final step of SWIM analysis, switch genes are identified as the subset of fight-club hubs that interact mainly outside their own community. Switch genes are characterized by not being hubs in their own cluster (low Zg, < 2.5), having many links outside their own cluster (Kπ > 0.8; when Kπ is close to 1, most of a node's links are external to its module), and having a negative average weight of incident links (APCC < 0) [28].
Fig 2. SWIM analysis to identify switch genes.
Data from the expression arrays were imported into SWIM. The SWIM algorithm comprises the steps depicted in the figure [31]. For each brain section and each group (diseased and normal), the mean gene expression value is calculated for every cell of the microarray. A p-value is then calculated for each gene, and the false discovery rate is controlled using the Benjamini-Hochberg approach. The set of genes to analyze is further filtered by setting thresholds on the magnitude of the fold change and on the false discovery rate. Next, the Pearson correlation coefficient is calculated for every pair of remaining genes, and pairs below a threshold are eliminated. At this point, 1000 to 2000 of the original 54675 genes remain.
These genes become the nodes of the network to be analyzed. The switch genes are the genes that interact outside their own community, are not local hubs, and are mainly anti-correlated with their interaction partners. After the switch genes are identified, KEGG pathway analysis is conducted to determine whether the results point to a relationship with a disease. Network robustness is assessed by studying the effect on network connectivity of removing different types of nodes in order of decreasing degree, with a random set of nodes chosen from the network nodes identified above serving as a reference. The total number of nodes removed equals the total number of switch genes, and cumulative node deletion is carried out by node type (i.e., total hubs, party hubs, date hubs, fight-club hubs, switch genes, and randomly chosen nodes).
Pathway analysis
Entrez gene identifiers from the SWIM analysis were imported into the Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/), which uses singular enrichment analysis [34, 35]. The functional annotation tool was used to select Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. KEGG pathways with p < 0.05 were considered enriched.
Transcription factor analysis
Entrez gene identifiers from the SWIM analysis were imported into NetworkAnalyst for network analysis of transcription factors [36]. In NetworkAnalyst, we used transcription factor and gene target data derived from ENCODE ChIP-seq data. Transcription factor analysis in NetworkAnalyst uses the BETA Minus algorithm, in which only peaks with an intensity signal < 500 and a predicted regulatory potential score < 1 are used. Transcription factors were ranked according to network topology measures, including degree and betweenness centrality.
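To make the switch-gene criteria from the SWIM analysis description concrete, the following is a minimal Python sketch, not SWIM's own code. Given a correlation matrix, the adjacency matrix of the co-expression network and the community labels, it computes Zg, Kπ and APCC for each node and flags the nodes that meet the three stated thresholds (Zg < 2.5, Kπ > 0.8, APCC < 0). Two details are assumptions: Kπ is taken as 1 - (k_in/k)^2, where k_in is the number of a node's links inside its own module and k its total degree, and Zg is taken as the z-score of k_in over all nodes of the network. The function name node_roles is hypothetical, and any additional hub-degree requirement SWIM may apply before calling a node a fight-club hub is omitted.

```python
import numpy as np


def node_roles(corr, adj, labels):
    """Compute Zg, Kpi and APCC for every node and flag candidate switch genes.

    corr   -- (n, n) Pearson correlation matrix between the genes' expression profiles
    adj    -- (n, n) boolean adjacency matrix of the co-expression network (no self-loops)
    labels -- (n,) community label of each node, e.g. from k-means
    """
    corr = np.asarray(corr, dtype=float)
    adj = np.asarray(adj, dtype=bool)
    labels = np.asarray(labels)

    degree = adj.sum(axis=1)                      # total number of links k
    same_module = labels[:, None] == labels[None, :]
    k_in = (adj & same_module).sum(axis=1)        # links inside the node's own module

    # Zg: within-module degree as a z-score over all nodes (assumed normalization).
    sd = k_in.std()
    zg = (k_in - k_in.mean()) / sd if sd > 0 else np.zeros(len(k_in))

    with np.errstate(divide="ignore", invalid="ignore"):
        # Kpi: 1 - (k_in / k)^2, close to 1 when most links leave the module (assumed formula).
        kpi = np.where(degree > 0, 1.0 - (k_in / degree) ** 2, 0.0)
        # APCC: mean correlation between a node and its network neighbours.
        apcc = np.where(degree > 0, (corr * adj).sum(axis=1) / degree, 0.0)

    # Switch-gene criteria stated in the text: Zg < 2.5, Kpi > 0.8, APCC < 0.
    switch = (zg < 2.5) & (kpi > 0.8) & (apcc < 0)
    return {"zg": zg, "kpi": kpi, "apcc": apcc, "switch": switch}
```

For example, a node whose links mostly go to another community and whose neighbours are mostly anti-correlated with it gets a Kπ close to 1 and a negative APCC, and is flagged as a candidate switch gene, whereas a node with mainly positive, within-module links is not.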
Results
To identify gene expression changes in the brain that may underlie the transition from a healthy aging brain to that of an AD patient, we used the SWIM algorithm. The ArrayExpress and NCBI GEO databases were searched to identify studies containing expression data from postmortem brain tissue of AD patients and age-matched controls (Fig 1). LCM neuron studies (GSE28146, GSE29652, GSE4757, GSE66333 and GSE5281) and several microarray studies with samples from brain sections were identified. SWIM analysis was conducted on each of these datasets. For all of the brain section studies and four of the LCM studies, the p-values were not sufficiently robust to complete the SWIM analysis. Only GSE5281 had data with sufficiently robust p-values for the analysis to be completed. The characteristics of the participants in the GSE5281 study [19, 26, 27] are presented in Table 1. Brain samples were obtained from a total of 22–23 participants (10 AD patients and 12–13 controls, depending on the brain region). The average age of the participants was 83 years. The controls were matched as closely as possible for age at death and mean education level. The AD patients had Braak stages ranging from III to IV [26].
Table 1. Characteristics of study participants.
GSE5281, by brain region. HC, healthy controls; AD, Alzheimer’s disease patients; SD, standard deviation.

| Brain region | HC, n | AD, n | Age HC, mean (SD) | Age AD, mean (SD) | P value (age) | Female/male HC (% male) | Female/male AD (% male) | P value (sex) |
|---|---|---|---|---|---|---|---|---|
| Entorhinal cortex | 13 | 10 | 80.3 (9.2) | 85.6 (6.3) | 0.13 | 3/10 (76) | 6/4 (40) | 0.07 |
| Hippocampus | 13 | 10 | 79.6 (9.4) | 77.8 (5.7) | 0.6 | 3/10 (76) | 4/6 (60) | 0.38 |
| Mid temporal gyrus | 12 | 10 | 80.1 (9.8) | 79.1 (6.4) | 0.76 | 4/8 (67) | 6/10 (63) | 0.82 |
| Posterior cingulate cortex | 13 | 10 | 79.8 (9.4) | 77.5 (6.2) | 0.56 | 4/9 (69) | 3/6 (67) | 0.90 |
| Superior frontal gyrus | 13 | 10 | 79.3 (10.2) | 79.2 (7.5) | 0.97 | 4/7 (64) | 10/13 (67) | 0.69 |
| Primary visual cortex | 13 | 10 | 79.9 (7) | 80.2 (6.7) | 0.38 | 3/9 (75) | 8/11 (58) | 0.33 |

Gene expression data from laser-captured neurons from the HIP, EC, MTG, PCC, SFG and VCX were imported into SWIM. The data for the EC using a linear fold-change threshold of 3 are presented in Fig 3. The transcripts filtered out in step 2 of the analysis are depicted as grey bars in Fig 3A, whereas those retained for further analysis are shown in red. In Fig 3B, the correlation communities are identified. The fight-club hubs, which are negatively correlated in expression with their interaction partners, are shown in blue; those in region R4 are the switch genes. A heat map of the expression of the switch genes (step 5) is shown in Fig 3C. After the switch genes are identified, the network is analyzed further to assess robustness. The data in Fig 3D indicate that fight-club hubs differ from date and party hubs and that the switch genes differ significantly from random nodes. The switch genes identified in the EC are listed in S1 Table.
Fig 3. SWIM analysis of the entorhinal cortex.
A. Distribution of the fold-change values for GSE5281 brain microarray gene expression data from the EC. The x-axis represents the fold-change value (log2 of the fold change), i.e., the ratio of the average expression in AD patients to the average expression in normal controls, computed for protein-coding and non-coding RNAs. The y-axis represents the frequency of the obtained fold-change values. The grey bars represent the fold-change values of protein-coding and non-coding RNAs that are discarded according to the selected threshold. The red bars represent the fold-change values of protein-coding and non-coding RNAs that were retained for further analysis. B. Heat cartography map for the GSE5281 entorhinal cortex correlation network. The plane is defined by two parameters, Zg (within-module degree) and Kπ (clusterphobic coefficient), and is divided into seven regions, each defining a specific node role (R1-R7). High Zg values correspond to nodes that are hubs within their module (local hubs), whereas low Zg values correspond to nodes with few connections within their module (non-hubs within their communities, although they could be hubs in the network). Each node is colored according to its average Pearson correlation coefficient (APCC) value. Yellow nodes are party and date hubs, which are positively correlated in expression with their interaction partners. Blue nodes are fight-club hubs, which have an average negative correlation in expression with their interaction partners. Blue nodes falling in region R4 are the switch genes, which are characterized by low Zg and high Kπ values and are connected mainly outside their module. C. Dendrogram and heat map of the switch genes in GSE5281 brain microarray gene expression data from the entorhinal cortex.
The expression profiles of the switch genes (including protein-coding and non-coding RNAs) are clustered by both rows (switch genes) and columns (samples) of the switch gene expression data (biclustering). Colors represent expression levels, increasing from blue to yellow. The red line under the x-axis labels denotes AD samples. D. Robustness of the GSE5281 entorhinal cortex correlation network. The x-axis represents the cumulative fraction of removed nodes, and the y-axis represents the average shortest path. The shortest path between two nodes is the minimum number of consecutive edges connecting them. Each curve shows how the average shortest path of the correlation network changes as the nodes of the type indicated by its color are removed.
The data for the HIP using a linear fold-change threshold of 3 are presented in Fig 4. The transcripts retained for further analysis are depicted in red in Fig 4A, and the correlation communities are identified in Fig 4B, with the blue fight-club hubs in region R4 corresponding to the switch genes. A heat map of the expression of the switch genes is shown in Fig 4C. The data indicate that fight-club hubs differ from date and party hubs and that the switch genes differ significantly from random nodes (Fig 4D). The switch genes identified in the HIP are listed in S2 Table.
Fig 4. SWIM analysis of the hippocampus.
A. Distribution of the fold-change values for GSE5281 brain microarray gene expression data from the HIP. The x-axis represents the fold-change value (log2 of the fold change), i.e., the ratio of the average expression in AD patients to the average expression in normal controls, computed for protein-coding and non-coding RNAs. The y-axis represents the frequency of the obtained fold-change values.
The grey bars represent the fold-change values of protein-coding and non-coding RNAs that are discarded according to the selected threshold. The red bars represent the fold-change values of protein-coding and non-coding RNAs that were retained for further analysis. B. Heat cartography map for the GSE5281 HIP correlation network. The plane is defined by two parameters, Zg (within-module degree) and Kπ (clusterphobic coefficient), and is divided into seven regions, each defining a specific node role (R1-R7). High Zg values correspond to nodes that are hubs within their module (local hubs), whereas low Zg values correspond to nodes with few connections within their module (non-hubs within their communities, although they could be hubs in the network). Each node is colored according to its average Pearson correlation coefficient (APCC) value. Yellow nodes are party and date hubs, which are positively correlated in expression with their interaction partners. Blue nodes are fight-club hubs, which have an average negative correlation in expression with their interaction partners. Blue nodes falling in region R4 are the switch genes, which are characterized by low Zg and high Kπ values and are connected mainly outside their module. C. Dendrogram and heat map of the switch genes in GSE5281 brain microarray gene expression data from the HIP. The expression profiles of the switch genes (including protein-coding and non-coding RNAs) are clustered by both rows (switch genes) and columns (samples) of the switch gene expression data (biclustering). Colors represent expression levels, increasing from blue to yellow. The red line under the x-axis labels denotes AD samples. D. Robustness of the GSE5281 HIP correlation network. The x-axis represents the cumulative fraction of removed nodes, and the y-axis represents the average shortest path.
The shortest path between two nodes is the minimum number of consecutive edges connecting them. Each curve shows how the average shortest path of the correlation network changes as the nodes of the type indicated by its color are removed.
The data for the MTG using a linear fold-change threshold of 4 are presented in Fig 5. The transcripts retained for further analysis are depicted in red in Fig 5A, and the correlation communities are identified in Fig 5B, with the blue fight-club hubs in region R4 corresponding to the switch genes. A heat map of the expression of the switch genes is presented in Fig 5C. The data indicate that fight-club hubs differ from date and party hubs and that the switch genes differ significantly from random nodes (Fig 5D). The switch genes identified in the MTG are listed in S3 Table.
Fig 5. SWIM analysis of the mid temporal gyrus.
A. Distribution of the fold-change values for GSE5281 brain microarray gene expression data from the MTG. The x-axis represents the fold-change value (log2 of the fold change), i.e., the ratio of the average expression in AD patients to the average expression in normal controls, computed for protein-coding and non-coding RNAs. The y-axis represents the frequency of the obtained fold-change values. The grey bars represent the fold-change values of protein-coding and non-coding RNAs that are discarded according to the selected threshold. The red bars represent the fold-change values of protein-coding and non-coding RNAs that were retained for further analysis. B. Heat cartography map for the GSE5281 MTG correlation network. The plane is defined by two parameters, Zg (within-module degree) and Kπ (clusterphobic coefficient), and is divided into seven regions, each defining a specific node role (R1-R7).
High Zg values correspond to nodes that are hubs within their module (local hubs), whereas low Zg values correspond to nodes with few connections within their module (non-hubs within their communities, although they could be hubs in the network). Each node is colored according to its average Pearson correlation coefficient (APCC) value. Yellow nodes are party and date hubs, which are positively correlated in expression with their interaction partners. Blue nodes are fight-club hubs, which have an average negative correlation in expression with their interaction partners. Blue nodes falling in region R4 are the switch genes, which are characterized by low Zg and high Kπ values and are connected mainly outside their module. C. Dendrogram and heat map of the switch genes in GSE5281 brain microarray gene expression data from the MTG. The expression profiles of the switch genes (including protein-coding and non-coding RNAs) are clustered by both rows (switch genes) and columns (samples) of the switch gene expression data (biclustering). Colors represent expression levels, increasing from blue to yellow. The red line under the x-axis labels denotes AD samples. D. Robustness of the GSE5281 MTG correlation network. The x-axis represents the cumulative fraction of removed nodes, and the y-axis represents the average shortest path. The shortest path between two nodes is the minimum number of consecutive edges connecting them. Each curve shows how the average shortest path of the correlation network changes as the nodes of the type indicated by its color are removed.
The data for the PCC using a linear fold-change threshold of 4 are presented in Fig 6. The transcripts retained for further analysis are depicted in red in Fig 6A, and the correlation communities are identified in Fig 6B, with the blue fight-club hubs in region R4 corresponding to the switch genes.
A heat map of the expression of the switch genes (step 5) is shown in Fig 6C. The data indicate that fight-club hubs differ from date and party hubs and that the switch genes differ significantly from random nodes (Fig 6D). The switch genes identified in the PCC are listed in S4 Table.
Fig 6. SWIM analysis of the posterior cingulate cortex.
A. Distribution of the fold-change values for GSE5281 brain microarray gene expression data from the PCC. The x-axis represents the fold-change value (log2 of the fold change), i.e., the ratio of the average expression in AD patients to the average expression in normal controls, computed for protein-coding and non-coding RNAs. The y-axis represents the frequency of the obtained fold-change values. The grey bars represent the fold-change values of protein-coding and non-coding RNAs that are discarded according to the selected threshold. The red bars represent the fold-change values of protein-coding and non-coding RNAs that were retained for further analysis. B. Heat cartography map for the GSE5281 PCC correlation network. The plane is defined by two parameters, Zg (within-module degree) and Kπ (clusterphobic coefficient), and is divided into seven regions, each defining a specific node role (R1-R7). High Zg values correspond to nodes that are hubs within their module (local hubs), whereas low Zg values correspond to nodes with few connections within their module (non-hubs within their communities, although they could be hubs in the network). Each node is colored according to its average Pearson correlation coefficient (APCC) value. Yellow nodes are party and date hubs, which are positively correlated in expression with their interaction partners. Blue nodes are fight-club hubs, which have an average negative correlation in expression with their interaction partners.
Blue nodes falling in region R4 are the switch genes, which are characterized by low Zg and high Kπ values and are connected mainly outside their module. C. Dendrogram and heat map of the switch genes in GSE5281 brain microarray gene expression data from the PCC. The expression profiles of the switch genes (including protein-coding and non-coding RNAs) are clustered by both rows (switch genes) and columns (samples) of the switch gene expression data (biclustering). Colors represent expression levels, increasing from blue to yellow. The red line under the x-axis labels denotes AD samples. D. Robustness of the GSE5281 PCC correlation network. The x-axis represents the cumulative fraction of removed nodes, and the y-axis represents the average shortest path. The shortest path between two nodes is the minimum number of consecutive edges connecting them. Each curve shows how the average shortest path of the correlation network changes as the nodes of the type indicated by its color are removed.
For the SFG, the fold-change threshold was initially set at 3, and SWIM was not able to identify any switch genes. We then set the linear fold-change threshold to 2.5; the data obtained from the SWIM analysis are presented in Fig 7. The transcripts retained for further analysis are depicted in red in Fig 7A, the correlation communities are identified in Fig 7B, and very few fight-club hubs were found. A heat map of the expression of the switch genes (step 5) is shown in Fig 7C. The data indicate that fight-club hubs differ from date and party hubs, but the switch genes differ only slightly from random nodes (Fig 7D). The switch genes identified in the SFG are listed in S5 Table.
Fig 7. SWIM analysis of the superior frontal gyrus.
A. Distribution of the fold-change values for GSE5281 brain microarray gene expression data from the SFG.
The x-axis represents the fold-change value (log2 of the fold change), i.e., the ratio of the average expression in AD patients to the average expression in normal controls, computed for protein-coding and non-coding RNAs. The y-axis represents the frequency of the obtained fold-change values. The grey bars represent the fold-change values of protein-coding and non-coding RNAs that are discarded according to the selected threshold. The red bars represent the fold-change values of protein-coding and non-coding RNAs that were retained for further analysis. B. Heat cartography map for the GSE5281 SFG correlation network. The plane is defined by two parameters, Zg (within-module degree) and Kπ (clusterphobic coefficient), and is divided into seven regions, each defining a specific node role (R1-R7). High Zg values correspond to nodes that are hubs within their module (local hubs), whereas low Zg values correspond to nodes with few connections within their module (non-hubs within their communities, although they could be hubs in the network). Each node is colored according to its average Pearson correlation coefficient (APCC) value. Yellow nodes are party and date hubs, which are positively correlated in expression with their interaction partners. Blue nodes are fight-club hubs, which have an average negative correlation in expression with their interaction partners. Blue nodes falling in region R4 are the switch genes, which are characterized by low Zg and high Kπ values and are connected mainly outside their module. C. Dendrogram and heat map of the switch genes in GSE5281 brain microarray gene expression data from the SFG. The expression profiles of the switch genes (including protein-coding and non-coding RNAs) are clustered by both rows (switch genes) and columns (samples) of the switch gene expression data (biclustering).
Colors represent expression levels, increasing from blue to yellow. The red line under the x-axis labels denotes AD samples. D. Robustness of the GSE5281 SFG correlation network. The x-axis represents the cumulative fraction of removed nodes, and the y-axis represents the average shortest path. The shortest path between two nodes is the minimum number of consecutive edges connecting them. Each curve shows how the average shortest path of the correlation network changes as the nodes of the type indicated by its color are removed.
For the VCX, the fold-change threshold was initially set at 3, and SWIM was not able to identify any switch genes. We then set the linear fold-change threshold to 2; the data obtained from the SWIM analysis are presented in Fig 8. The transcripts retained for further analysis are depicted in red in Fig 8A, and the correlation communities are identified in Fig 8B. The blue fight-club hubs in region R4 correspond to the switch genes. A heat map of the expression of the switch genes (step 5) is shown in Fig 8C. The data indicate that fight-club hubs do not differ measurably from date and party hubs, but the switch genes differ significantly from random nodes (Fig 8D). The switch genes identified in the VCX are listed in S6 Table.
Fig 8. SWIM analysis of the primary visual cortex.
A. Distribution of the fold-change values for GSE5281 brain microarray gene expression data from the VCX. The x-axis represents the fold-change value (log2 of the fold change), i.e., the ratio of the average expression in AD patients to the average expression in normal controls, computed for protein-coding and non-coding RNAs. The y-axis represents the frequency of the obtained fold-change values. The grey bars represent the fold-change values of protein-coding and non-coding RNAs that are discarded according to the selected threshold.
Red bars represent the fold-change values of protein-coding and non-coding RNAs that were retained for further analysis. B. Heat cartography map for the GSE5281 brain correlation network from the VCX. The plane is defined by two parameters, Zg (within-module degree) and Kπ (clusterphobic coefficient), and is divided into seven regions, each defining a specific node role (R1-R7). High Zg values correspond to nodes that are hubs within their module (local hubs), whereas low Zg values correspond to nodes with few connections within their module (non-hubs within their communities, although they could still be hubs in the network as a whole). Each node is colored according to its average Pearson correlation coefficient (APCC). Yellow nodes are party and date hubs, which are positively correlated in expression with their interaction partners. Blue nodes are fight-club hubs, which have an average negative correlation in expression with their interaction partners. Blue nodes falling in region R4 are the switch genes, which are characterized by low Zg and high Kπ values and are connected mainly outside their module. C. Dendrogram and heat map for switch genes in the GSE5281 brain microarray gene expression data from the VCX. The expression profiles of the switch genes (including protein-coding and non-coding RNAs) are clustered by both rows (switch genes) and columns (samples) of the switch gene expression data (biclustering). The colors represent expression levels, increasing from blue to yellow. The red line under the x-axis labels denotes AD samples. D. Robustness of the GSE5281 brain correlation network from the VCX. The x-axis represents the cumulative fraction of removed nodes, and the y-axis represents the average shortest path. The shortest path between two nodes is the minimum number of consecutive edges connecting them.
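The robustness curves can be outlined by deleting nodes cumulatively and recomputing the average shortest path with breadth-first search after each deletion. This is a sketch under assumptions, not SWIM's implementation. Distances are averaged only over pairs of nodes that remain connected. The caller supplies the removal order, for example switch genes first, or a random order as a baseline. The function names are hypothetical.

```python
from collections import deque


def average_shortest_path(adj, removed):
    """Mean BFS distance over ordered pairs of still-connected nodes, skipping removed nodes."""
    total, pairs = 0, 0
    for src in adj:
        if src in removed:
            continue
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def robustness_curve(adj, removal_order):
    """List of (cumulative fraction removed, average shortest path) after each removal."""
    n = len(adj)
    removed = set()
    curve = [(0.0, average_shortest_path(adj, removed))]
    for node in removal_order:
        removed.add(node)
        curve.append((len(removed) / n, average_shortest_path(adj, removed)))
    return curve
```

Passing `random.sample(list(adj), k)` as the removal order gives a random-removal baseline against which a targeted order can be compared.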
Each curve shows the variation of the average shortest path of the correlation network as a function of the removal of the nodes indicated by its color.

A Venn diagram and an UpSetR plot of the switch genes identified in each brain region are shown in Fig 9 and S7 Table. Ranked from the largest to the smallest number of switch genes, the brain regions are HIP > PCC > EC > MTG > VCX > SFG. Ranked by the number of unique switch genes, they are HIP > EC > PCC > MTG > VCX > SFG. The PCC shares 53 switch genes with the HIP (S7 Table). Interestingly, the EC shares only one switch gene with the HIP and one with the PCC; most EC switch genes are unique to that brain region.

Fig 9. Venn diagram and UpSetR plot of switch genes from different brain regions. A. The Venn diagram was created using http://www.interactivenn.net/. The gene symbols were imported for the different brain areas. EC: entorhinal cortex; HIP: hippocampus; MTG: middle temporal gyrus; SFG: superior frontal gyrus; PCC: posterior cingulate cortex; VCX: primary visual cortex. B. The UpSetR plot was created as described [37]. The horizontal bars with labels at the lower left of the panel represent the six data sets included in the Venn diagram, and the length of each bar shows the total set size. The dot pattern to the right shows the intersections between the sets. The vertical bars at the top show the size of the corresponding intersection, ranked by decreasing size; a gray dot marks a set that is not part of that intersection, and a single black dot marks genes not shared with any other set.

Pathway analysis of the switch genes was performed to identify the functions they affect. In the HIP, the majority of the disrupted pathways were involved in metabolism, specifically glutamine, glutamate, steroid, arginine, pyruvate and amino acid metabolism (Fig 10).
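The pathway enrichment figures report fold enrichment. DAVID computes this as the fraction of the input gene list annotated to a pathway divided by the fraction of the background population annotated to it. The sketch below computes that ratio together with a plain hypergeometric upper-tail p-value. The p-value is illustrative only: DAVID's own EASE score is a modified Fisher exact test and will give different values. The argument names are hypothetical.

```python
from math import comb


def fold_enrichment(list_hits, list_total, pop_hits, pop_total):
    """(list_hits / list_total) / (pop_hits / pop_total), the ratio DAVID reports as Fold Enrichment."""
    return (list_hits / list_total) / (pop_hits / pop_total)


def hypergeom_upper_tail(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when list_total genes are drawn without replacement
    from pop_total genes, pop_hits of which belong to the pathway (not the EASE score)."""
    top = min(list_total, pop_hits)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, top + 1)
    )
    return favourable / comb(pop_total, list_total)
```

As a hypothetical example, a pathway that contains 10 of 100 switch genes but only 50 of 10,000 background genes has a fold enrichment of (10/100)/(50/10,000) = 20.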
Genes involved in oxidative phosphorylation, RNA transport and the spliceosome are also enriched among the HIP switch genes, as are genes in the Parkinson's, Alzheimer's and Huntington's disease pathways. The PCC switch genes are likewise enriched in metabolic pathways and in the Parkinson's, Alzheimer's and Huntington's disease pathways (Fig 11). The dysregulated pathways shared by the PCC and HIP switch genes are the proteasome, oxidative phosphorylation and metabolism (S1 Fig and S7 Table). The switch genes in the EC, MTG, VCX and SFG are not enriched in any particular pathway.

Fig 10. Hippocampus pathway enrichment analysis. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the HIP SWIM genes was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID). The gene count for each pathway is shown in A, and the fold enrichment in B.

Fig 11. Posterior cingulate cortex (PCC) pathway enrichment analysis. KEGG pathway enrichment analysis of the PCC SWIM genes was performed using DAVID. The gene count for each pathway is shown in A, and the fold enrichment in B.

To identify key transcriptional regulators of the switch genes, a transcription factor analysis was performed using NetworkAnalyst [36] on the two brain regions with the largest numbers of switch genes, the HIP and PCC. Network analysis revealed that the HIP switch genes were regulated by the transcription factors Krüppel-like factor 9 (KLF9) and potassium channel tetramerization domain 2 (KCTD2), whereas the PCC switch genes were regulated by KLF9, Sp1 transcription factor (SP1) and chromodomain helicase DNA binding protein 1 (CHD1).
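The transcription factor networks in S3 and S4 Figs rank nodes by degree and betweenness centrality. A minimal Python sketch of such a ranking on a transcription factor/gene network follows, using Brandes' algorithm for unweighted betweenness. It does not reproduce NetworkAnalyst's exact scoring. Ordering by degree with ties broken by betweenness, the function names and the toy node names in the test are all assumptions for illustration.

```python
from collections import deque


def betweenness(adj):
    """Unweighted betweenness centrality (Brandes' algorithm) for an undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def rank_transcription_factors(edges, tfs):
    """Order transcription factors by degree, breaking ties by betweenness, then by name."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    bc = betweenness(adj)
    return sorted(tfs, key=lambda t: (-len(adj.get(t, ())), -bc.get(t, 0.0), t))
```

The top-ranked factors of two regions can then be compared with a set intersection. For the factors reported here, `{"KLF9", "KCTD2"} & {"KLF9", "SP1", "CHD1"}` evaluates to `{"KLF9"}`, the factor shared by the HIP and PCC.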
KLF9 was thus shared between the HIP and PCC (S3 and S4 Figs).

Discussion

In this study, we used SWIM analysis to identify key genes in brain regions known to show metabolic and pathological differences between AD patients and healthy aging individuals, namely the HIP, PCC, EC, MTG and SFG. For comparison, we also analyzed gene expression data from the VCX, which usually does not show disease-related neurodegeneration. Transcriptomic data from laser-capture microdissected neurons were interrogated to identify switch genes. The results indicate that changes in gene expression in both the HIP and PCC may alter brain function by disrupting metabolism, oxidative phosphorylation and the proteasome (S1 Fig). Previous studies have shown that the PCC and HIP are affected in AD patients [12]. The PCC shows reduced glucose metabolism in early AD and the largest abnormalities in positron emission tomography scans of cognitively normal, late-middle-aged carriers of the APOE epsilon 4 allele [24, 38, 39]. The HIP shows neurofibrillary tangles in AD patients [12, 14–16]. In addition, energy metabolism genes showed lower expression in the PCC and HIP of AD patients than in controls [27]. In contrast to our results, a previous study that analyzed gene expression changes in the same microarray data found that cellular physiological processes, transport, metabolism and cellular localization were affected across most brain regions, including the PCC, HIP, EC and MTG [26]. The feature of SWIM most likely to explain this difference is that it explicitly considers fight-club hubs, which are negatively correlated with their interaction partners.
Therefore, although gene expression is dysregulated in ways that affect pathways in multiple brain regions of AD patients, our data indicate that the key switch gene changes that alter significant pathways are present mainly in the PCC and HIP. Our results also showed that the switch genes in the EC, MTG, VCX and SFG were not enriched in any particular pathway. This suggests that these brain regions may be more resistant than the PCC and HIP to the key switch events that cause neurodegeneration. Consistent with our findings, the earlier study of the GSE5281 data found that the SFG and VCX, which are affected in later stages of AD, are relatively neuroprotected and capable of resisting disease pathology [26]. Caberlotto and colleagues used network analysis of AD-related genes to conclude that metabolism-associated processes, including insulin and fatty acid metabolism, underlie the development of AD [40]. In that study, seed genes associated with AD were obtained from the same transcriptomic data we used (microarray GSE5281), together with single nucleotide polymorphism data from AD, molecular targets of AD drugs and AD genes listed in the Online Mendelian Inheritance in Man database. We compared our switch genes with the AD-related seed genes identified by Caberlotto and colleagues. Many of the switch genes, especially in the HIP and PCC, were also seed genes, strongly suggesting that dysregulation of metabolic processes is a key event in the development of AD (S2 Fig and S8 Table). Network analysis of the switch genes in the HIP and PCC identified several transcription factors relevant to the pathogenesis of AD. For example, KLF9 and KCTD2 were identified as the main regulatory transcription factors of the HIP switch genes.
Recently, Cui and colleagues demonstrated that KLF9 promotes the expression of peroxisome proliferator-activated receptor γ coactivator 1α (PGC-1α), resulting in hepatic gluconeogenesis, and suggested that KLF9 may be responsible for glucocorticoid therapy-induced diabetes [41]. Diabetes, in turn, has been extensively associated with an increased risk of AD [42]. Moreover, glucocorticoid overexposure has been associated with cognitive decline, amyloid beta misprocessing and, ultimately, the development of AD [43, 44]. Given the involvement of KLF9 in glucose homeostasis and glucocorticoid signaling, its potential as a therapeutic target for AD warrants further investigation. In addition to KLF9, KCTD2 was another key transcription factor regulating the HIP switch genes. Genome-wide association studies identified KCTD2 as a shared susceptibility gene for AD and ischemic stroke [45, 46]. Interestingly, KCTD2 may play a role in sleep regulation [47, 48], and sleep disturbances have been linked to the development of AD [49]. Similarly, network analysis identified KLF9, SP1 and CHD1 as central regulators of the PCC switch genes. Dysregulation of SP1 in AD has been documented in several studies. For instance, SP1 mRNA was upregulated in the brains of both AD patients and transgenic AD model mice [50]. Inhibition of SP1 function in transgenic AD model mice increased memory deficits, suggesting that SP1 may be a useful therapeutic target [51]. Another important transcription factor, CHD1, is involved in TDP-43-mediated neurodegeneration [52]. Recently, Chd1 was found to play a role in learning and memory in mice [53]. Furthermore, Chd1 knockdown in mouse embryonic stem cells mimicked high-fat diet- and aging-induced gene expression changes [54].
Collectively, the transcription factors identified in this study are involved in processes related to the pathogenesis of AD and thus may be important therapeutic targets. One caveat should be kept in mind when interpreting these results. Although several candidate gene expression datasets were identified, only one achieved the stringent p-values required for the SWIM analysis. Therefore, the results presented here may be specific to this dataset rather than representative of AD in general. Nonetheless, the pathways and transcription factors identified in this study have been associated with AD in other investigations. Future studies will seek to confirm these findings in an independent microarray dataset.

Conclusions

This study provides novel insights into the key switch events in the HIP and PCC that are involved in the transformation from a healthy aging brain to that of an AD patient. The majority of the pathways altered in the HIP and PCC of AD patients are involved in metabolism, including glutamine, glutamate, steroid, arginine, pyruvate and amino acid metabolism. In addition, some of the transcriptional regulators of the switch genes are involved in glucose homeostasis, glucocorticoid signaling, sleep regulation and memory. Targeting these transcription factors may provide novel therapeutics for AD.

Supporting information

S1 Fig. Pathway enrichment analysis of genes common to the HIP and PCC. KEGG pathway enrichment analysis of the SWIM genes common to the HIP and PCC was performed using DAVID. The gene count for each pathway is shown in A, and the fold enrichment in B. (TIFF)

S2 Fig. Venn diagram between SWIM genes and Caberlotto et al seed genes. The Venn diagrams were created using http://www.interactivenn.net/.
The orange sets represent the SWIM genes, whereas the green sets represent the seed genes from the Caberlotto et al study. (TIF)

S3 Fig. Transcription factor analysis of the switch genes identified in the hippocampus. Network analysis of the HIP switch genes was performed using NetworkAnalyst. Transcription factor data were derived from the ENCODE ChIP-seq database. Transcription factors (blue rectangles) and switch genes (pink circles) are ranked according to the network topology measures degree and betweenness centrality. Transcription factors with the highest degree and betweenness centrality are enclosed in the yellow oval. Gray lines represent protein-protein interactions. Network analysis was performed in June 2019. (TIF)

S4 Fig. Transcription factor analysis of the switch genes identified in the posterior cingulate cortex. Network analysis of the posterior cingulate cortex switch genes was performed using NetworkAnalyst. Transcription factor data were derived from the ENCODE ChIP-seq database. Transcription factors (blue rectangles) and switch genes (pink circles) are ranked according to the network topology measures degree and betweenness centrality. Transcription factors with the highest degree and betweenness centrality are enclosed in the yellow oval. Gray lines represent protein-protein interactions. Network analysis was performed in June 2019. (TIF)

S1 Table. Entorhinal cortex SWIM genes. (XLSX)

S2 Table. Hippocampus SWIM genes. (XLSX)

S3 Table. Mid temporal gyrus SWIM genes. (XLSX)

S4 Table. Posterior cingulate cortex SWIM genes.
(XLSX)

S5 Table. Superior frontal gyrus SWIM genes. (XLSX)

S6 Table. Primary visual cortex SWIM genes. (XLSX)

S7 Table. HIP and PCC SWIM genes Venn diagram. (XLSX)

S8 Table. Hippocampus pathway enrichment analysis. (XLSX)

S9 Table. Posterior cingulate cortex pathway enrichment analysis. (XLSX)

S10 Table. HIP/PCC pathway enrichment analysis. (XLSX)

Acknowledgments