Abstract

Central nervous system dysfunction is an important cause of morbidity and mortality in patients with human immunodeficiency virus type 1 (HIV-1) infection and acquired immunodeficiency syndrome (AIDS). Patients with AIDS are usually affected by HIV-associated encephalitis (HIVE) with viral replication limited to cells of monocyte origin. To examine the molecular mechanisms underlying HIVE-induced dementia, the GSE4755 Affymetrix data were obtained from the Gene Expression Omnibus database, and the differentially expressed genes (DEGs) between the samples from AIDS patients with and without apparent features of HIVE-induced dementia were identified. In addition, protein–protein interaction networks were constructed by mapping DEGs into protein–protein interaction data to identify the pathways that these DEGs are involved in. The results revealed 1,528 DEGs, which are mainly involved in the immune response, regulation of cell proliferation, cellular response to inflammation, signal transduction, and viral replication cycle. Heat-shock protein alpha, class A member 1 (HSP90AA1), and fibronectin 1 were detected as hub nodes with degree values >130. In conclusion, the results indicate that HSP90AA1 and fibronectin 1 play important roles in HIVE pathogenesis.

Keywords: microarray, human immunodeficiency virus, differentially expressed genes, protein-protein interaction network, gene ontology, encephalitis, dementia

Introduction

Central nervous system dysfunction is an important cause of mortality and morbidity in patients with human immunodeficiency virus type 1 (HIV-1) infection and acquired immunodeficiency syndrome (AIDS).1,2 Mild neurocognitive disorder and motor dysfunction as outcomes of HIV-associated dementia (HAD) may develop in association with opportunistic infections, neoplasms, and cerebrovascular disease.3,4 Patients with HAD usually suffer from HIV-associated encephalitis (HIVE) with viral replication limited to cells of monocyte origin.5 The mechanisms of brain injury in HIV-1 infection may be multiple. At the center of the HIVE pathology, brain inflammatory infiltration includes activation of microglia, perivascular or parenchymal monocytes, multinucleated giant cells, and lymphocytes.6,7 Furthermore, HIVE and HAD may be accompanied by significant loss of neurons and presynaptic terminals and formation of abnormal dendrites, followed by altered expression of neuronal growth factors, proinflammatory chemokines, and cell death genes, which have been identified as possible mechanisms of brain injury.7,8 According to the current data, changes in gene expression occurring in the brains of AIDS patients may induce development of brain atrophy, mild gliosis, and impaired blood–brain barrier permeability along with alterations in neuronal HIV-1 chemokine co-receptors.9,10 In these cases, HIVE/HAD pathology exhibits remarkable gene expression profiles as a result of HIV-altered neuronal signal transduction and astrocyte function together with stimulated apoptosis through CXCR4 receptors, independent of CD4 binding.11,12

Rigorous laboratory investigations of HIV-1 infection conducted over the past two decades have identified various factors and genes involved in the pathogenesis of HIVE, including T-cell receptor-mediated signaling, subcellular trafficking, transcriptional regulation, and a variety of cellular metabolic pathways.13 In particular, increasing evidence has demonstrated that different serological markers such as tumor necrosis factor alpha, monocyte chemoattractant protein 1, interleukin-6, high-sensitivity C-reactive protein, and soluble CD14 protein are responsible for the pathogenic progression of HAD, leading to severe metabolic alterations and accelerated senescence.14 However, despite recent advances in the elucidation of HIV-1 pathophysiology, the molecular mechanisms involved in HIVE-induced dementia remain poorly understood.

Therefore, in the present study, microarrays were used to identify the differentially expressed genes (DEGs) between the samples from AIDS patients with and without apparent features of HIVE-induced dementia according to the scheme depicted in Figure 1. Gene ontology (GO) enrichment analysis was performed, and a protein–protein interaction (PPI) network was created by mapping the DEGs to the PPI data. The results of this investigation may facilitate a better understanding of the detailed molecular mechanisms underlying HIVE-induced dementia and thus assist in the selection of appropriate and effective treatment strategies for patients with neuro-AIDS.

Figure 1. Scheme of the brain tissue analysis from AIDS patients with and without apparent features of HIVE-induced dementia using the Affymetrix microarray platform.
Note: Brain tissue sample preparation (A). Microarray data processing (B, C) is performed to identify DEGs and construct the PPI network.
Abbreviations: HIVE, HIV-associated encephalitis; DEGs, differentially expressed genes; PPI, protein-protein interaction.

Materials and methods

Affymetrix microarray data

The transcriptional profile of GSE4755 (Torres-Munoz JE, unpublished data, 2006) was obtained from the National Center for Biotechnology Information Gene Expression Omnibus (GEO) database,15 which is based on the Affymetrix Human Genome U95 annotation data (chip hgu95) that was assembled using data from public repositories. In total, four specimens (n=4) were available based on the Agilent RNA 6000 Assay Platform. Samples A00-44 and D03-046 correspond to AIDS patients with premortem histories of HIVE and dementia. Samples DME01-1991 and DME02-0053 correspond to AIDS patients without premortem history of HIVE and dementia. HIVE status was confirmed by microscopic observation. RNA of HIV-positive patients was determined by a viral load quantification method using Gag-primers.

Data preprocessing

Probe cell intensity data (CEL files) (Affymetrix Inc., Santa Clara, CA, USA) were converted into expression values, and background correction was performed by the robust multiarray average algorithm16 using the default settings within the Bioconductor environment.17 Measurements of multiple probe sets obtained from the same genes were averaged.

Analysis of DEGs

For the GSE4755 data set, the LIMMA (Linear Models for Microarray Data) package of the Bioconductor software17 was used to identify relevant DEGs between the two groups of AIDS patients.18,19 Only DEGs with a P-value less than 0.05 were selected for further analysis. All steps of Affymetrix data preprocessing and DEG analysis were carried out with an in-house R script (Table S1).

GO and Kyoto Encyclopedia of Genes and Genomes pathway analysis

The Kyoto Encyclopedia of Genes and Genomes (KEGG) database20,21 contains information on networked molecules and genes.
The database for annotation, visualization, and integrated discovery (DAVID) was used to analyze lists of genes derived from high-throughput genomic experiments22 and to identify over-represented GO categories in biological processes with a P-value less than 0.05.

PPI network construction

To demonstrate the potential PPI correlations, all DEGs were mapped on the compiled data set of the human interactome for PPI network construction and microarray data enrichment analysis. The human interactome, kindly provided by Dr C Laudanna from the Laboratory of Cell Trafficking and Signal Transduction (University of Verona, Verona, Italy), represents a nonredundant, undirected, and no-loop physical protein–protein binary interaction data set in Cytoscape sif format comprising HGNC (HUGO Gene Nomenclature Committee)-curated protein IDs compiled from different sources. Next, a PPI network was constructed using the Cytoscape v2.8 software platform23 based on the PPI correlations.

Molecular complex detection analysis

The molecular complex detection (MCODE) algorithm,24 a well-known automated method to find highly interconnected subgraphs, detects densely connected regions in large PPI networks that may represent molecular complexes. In the present study, clustered subnetworks of highly intra-connected nodes (n>20) in the network were searched using the Cytoscape AllegroMCODE plug-in. Next, the identified subnetworks were used for functional enrichment analysis.

Results and discussion

Determination of DEGs

To obtain the DEGs between the different groups of AIDS patients with different race, age, and sex backgrounds, publicly available microarray data sets were retrieved from the GEO repository using the Agilent RNA 6000 Assay Microchip technology.
Fresh frozen autopsied human brain samples were prepared as 0.8–1.0 cm^3 fragments for microarray gene expression analysis from frontal cortex dissections, which were validated by quantitative real-time polymerase chain reaction (Torres-Munoz JE, unpublished data, 2006). Prior to DEG determination, the raw microarray data were assessed for quality by comparing each array with a synthetic array created by taking probe-wise medians. Quality problems are most apparent either from an MA-plot where the loess smoother oscillates a great deal or if the variability of the log fold change (M) values seems greater than that of other arrays in the data set. None of the analyzed microarrays showed distorted or aberrant loess lines on the MA-plots, which usually indicates the absence of potential quality problems. The median and interquartile range curves on each plot are close to the zero line, representing good data quality (Figure 2A–D). Subsequently, data preprocessing was performed to eliminate the effect of background noise and to normalize and summarize expression values for each probe set (Figure 3A and B). The samples have 1,528 overlapping genes with a P-value less than 0.05, and no distinct set of DEGs was detected through a screening of 12,625 genes identified in all samples of AIDS patients with and without HIVE-induced dementia.

Figure 2. MA-plots of Affymetrix microarrays (n=4) plotted against a common pseudo-array reference, with each gene represented by a dot, for AIDS patients with (A and B) and without (C and D) HIVE-induced dementia.
Notes: Loess line and horizontal axis at M=0 are depicted in red and blue, respectively. The x- and y-axes are on a binary logarithmic (log2) scale.
Abbreviations: AIDS, acquired immunodeficiency syndrome; HIV, human immunodeficiency virus; HIVE, HIV-associated encephalitis; IQR, interquartile range.

Figure 3.
Box-plots of raw (A) and RMA-normalized (B) data from four replicate Affymetrix arrays.
Note: The y-axis is on a binary logarithmic scale.
Abbreviation: RMA, robust multiarray average.

GO enrichment analysis of DEGs

GO analysis has become a commonly used approach for functional annotation of large-scale genomic data.25 To investigate the functional changes in the pathological course of HIV-1 infection, the DEGs were mapped to the DAVID database. DAVID is one of the most popular tools in the field of high-throughput functional annotation and has been cited in more than 2,000 publications.26 The GO project provides three structured networks of defined terms to describe gene product attributes: biological process, molecular function (MF), and cellular component (CC). In the present investigation, the majority of the enriched genes were upregulated in the samples from AIDS patients with and without HIVE-induced dementia. The DEGs were most commonly associated with CC and MF, including plasma membrane/nonmembrane-bound organelle involvement and nucleotide/nucleoside binding (Table 1). To visualize the category relationships by “induced” graphs for the top GO:0005886, GO:0044459, and GO:0000166 terms with the highest counts of DEGs in the samples from AIDS patients, the biograph function and the getmatrix method from the MATLAB Bioinformatics Toolbox (MathWorks, Natick, MA, USA) were used. They return a square matrix of relationships between GO terms for each GO object, where the relations of GO CC or MF with cellular parts and metabolic processes are shown (Figure 4A–C).

Table 1.
Top ten significantly enriched GO terms with high counts of DEGs in samples from AIDS patients with and without HIVE-induced dementia

Term | Category | Description | Count | P-value
GO:0005886 | CC | Plasma membrane | 380 | 4.54E−07
GO:0044459 | CC | Plasma membrane part | 262 | 1.78E−11
GO:0000166 | MF | Nucleotide binding | 246 | 7.66E−07
GO:0043228 | CC | Nonmembrane-bounded organelle | 243 | 0.009174
GO:0043232 | CC | Intracellular nonmembrane-bounded organelle | 243 | 0.009174
GO:0017076 | MF | Purine nucleotide binding | 216 | 6.24E−07
GO:0032553 | MF | Ribonucleotide binding | 211 | 1.97E−07
GO:0032555 | MF | Purine ribonucleotide binding | 211 | 1.97E−07
GO:0001883 | MF | Purine nucleoside binding | 180 | 8.27E−06
GO:0001882 | MF | Nucleoside binding | 180 | 1.23E−05

Abbreviations: GO, gene ontology; DEGs, differentially expressed genes; AIDS, acquired immunodeficiency syndrome; HIV, human immunodeficiency virus; HIVE, HIV-associated encephalitis; CC, cellular component; MF, molecular function.

Figure 4. “Induced” graphs of the enriched GO categories for GO:0005886 (A), GO:0044459 (B), and GO:0000166 (C) terms with highest counts of DEGs in samples from AIDS patients with and without HIVE-induced dementia.
Abbreviations: GO, gene ontology; DEGs, differentially expressed genes; AIDS, acquired immunodeficiency syndrome; HIV, human immunodeficiency virus; HIVE, HIV-associated encephalitis.

To gain further insights into the biological pathway changes, the online biological classification tool incorporated in the DAVID database was used.22 The main goal of the pathway analysis was to identify the pathways that were significantly enriched in the pathological process represented in the brain samples of AIDS patients. Significant enrichment of those DEGs in multiple KEGG terms was observed.
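
The enrichment P-values above come from an over-representation test. As a rough illustration only, the sketch below uses a plain one-sided hypergeometric test in base R. DAVID reports a modified Fisher exact P-value (the EASE score), so its numbers will differ from this calculation. The function name enrich_p and the background term size of 2,600 are hypothetical; the counts 380, 1,528, and 12,625 are taken from the text and Table 1 purely as example inputs.

```r
# One-sided hypergeometric (over-representation) test.
# k: DEGs annotated to the term; n: total DEGs;
# K: background genes annotated to the term; N: background size.
enrich_p <- function(k, n, K, N) {
  # P(X >= k) when n genes are drawn from N, of which K carry the annotation
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Example inputs: 380 of 1,528 DEGs fall in a term.
# The term size K = 2600 is a hypothetical placeholder, not a value from the study.
p <- enrich_p(k = 380, n = 1528, K = 2600, N = 12625)
print(p)

# When many terms are tested, P-values are usually adjusted together,
# for example with the Benjamini-Hochberg procedure (0.04 and 0.2 are dummy values).
p_adj <- p.adjust(c(p, 0.04, 0.2), method = "BH")
print(p_adj)
```

Because the term size is made up, the printed P-value is not comparable with Table 1; the point is only the shape of the calculation.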
The most significantly enriched pathways for DEGs in samples of AIDS patients with and without HIVE-induced dementia were the ErbB (erythroblastic leukemia viral oncogene) and MAPK (mitogen-activated protein kinase) signaling pathways, with a particularly high number of genes involved in the latter. The GO terms were significantly overrepresented for DEG-related biological phenomena, such as nucleic acid and cellular metabolic processes. The results of the KEGG pathway enrichment analysis show that the majority of enriched terms of DEGs from the samples of AIDS patients were correlated with cell signaling pathways, endocytosis, and cell–cell adhesion mechanisms (Table 2). Some pathways, such as the neurotrophin signaling pathway and the pathways involved in colorectal cancer, suggested that neuronal death features of HIV-related brain damage and a high prevalence of neoplastic tumors of the gastrointestinal tract are also present. The last finding is consistent with various published data reporting an elevated risk and earlier age of onset of colonic neoplasia in the HIV/AIDS population.27–30

Table 2.
Top ten enriched KEGG pathways of DEGs with low P-values in samples of AIDS patients with and without HIVE-induced dementia

Pathway | Genes | P-value
ErbB signaling pathway | 25 | 7.18E−06
MAPK signaling pathway | 53 | 7.40E−06
Focal adhesion | 42 | 2.31E−05
Endocytosis | 39 | 3.47E−05
Epithelial cell signaling | 20 | 5.34E−05
Gap junction | 23 | 1.08E−04
Phosphatidylinositol signaling | 20 | 1.85E−04
Colorectal cancer | 20 | 0.001031
ECM–receptor interaction | 20 | 0.001031
Neurotrophin signaling pathway | 26 | 0.001098

Abbreviations: KEGG, Kyoto Encyclopedia of Genes and Genomes; DEGs, differentially expressed genes; AIDS, acquired immunodeficiency syndrome; HIV, human immunodeficiency virus; HIVE, HIV-associated encephalitis; ErbB, erythroblastic leukemia viral oncogene; MAPK, mitogen-activated protein kinase; ECM, extracellular matrix.

Construction of PPI network

Network analysis has been shown to be a powerful tool to understand biological responses in health and disease.31 In the PPI network, the nodes are proteins and the edges are functional interactions. To construct the PPI network and maximize the coverage of the genome, PPI data were obtained from the human interactome as an example of a genome-scale functional network, or “human reference network”. This reference interactome network represents a nonredundant, undirected, and no-loop physical protein–protein binary interaction data set comprising 16,018 unique HGNC-curated protein IDs with 299,760 binary interactions compiled from different databases and literature. The network is generic and is the same regardless of tissue, genotype, and pathologic state. To obtain correct and relevant PPIs, only DEGs related to AIDS patients were mapped on the reference network, and the rest were deleted.
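
The mapping step can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the Cytoscape workflow used in the study: the helper name induced_edges, the column names a and b, and the toy interactome with placeholder protein IDs (P1 to P5) are all hypothetical. The sketch keeps only interactions whose two partners are both in the DEG list, drops self-loops and duplicated undirected edges, and then counts the degree of each remaining node.

```r
# Keep only edges whose two endpoints are both in the DEG list,
# then drop self-loops and duplicated undirected edges.
induced_edges <- function(edges, degs) {
  keep <- edges$a %in% degs & edges$b %in% degs & edges$a != edges$b
  e <- edges[keep, , drop = FALSE]
  # Order each pair alphabetically so that A-B and B-A count as the same edge
  lo <- pmin(e$a, e$b)
  hi <- pmax(e$a, e$b)
  unique(data.frame(a = lo, b = hi, stringsAsFactors = FALSE))
}

# Toy reference interactome with placeholder IDs (not real interaction data)
ref <- data.frame(
  a = c("P1", "P2", "P2", "P3", "P4", "P5"),
  b = c("P2", "P1", "P2", "P1", "P5", "P2"),
  stringsAsFactors = FALSE
)
degs <- c("P1", "P2", "P3", "P4")   # P5 is not differentially expressed

local_net <- induced_edges(ref, degs)
print(local_net)

# Node degree = number of edges touching each node in the local network
node_degree <- table(c(local_net$a, local_net$b))
print(sort(node_degree, decreasing = TRUE))
```

In this toy case, P4 disappears from the local network because its only partner is not a DEG, which mirrors how unmapped proteins drop out of the reference network.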
The new “local” network was further refined by removing duplicated edges and self-loops and comprised only 949 nodes and 4,789 edges (Figure 5), revealing highly expressed genes with minimal (P-value ≥1.33E−10) and maximal (P-value ≤0.006) P-values. The resulting PPI network was not weighted, since each PPI occurred only once. Since the resulting network was too large to provide more specific and detailed information, it proved necessary to divide the network into subnetworks, each of which represented protein subcomplexes or functional modules.

Figure 5. PPI network constructed from the list of DEGs on the basis of the human interactome using a spring-embedded layout.
Notes: The red and blue nodes indicate the highly expressed DEGs with minimal (P-value ≥1.33E−10) and maximal (P-value ≤0.006) P-values. The ellipse and rectangular node shapes indicate the DEGs with low (log2 FC ≥−2.54) and high (log2 FC ≤3.22) log2 FC values. Nodes with high degree values (hubs) are depicted using larger node sizes through a pseudo-exponential gradient mapping.
Abbreviations: PPI, protein–protein interaction; DEGs, differentially expressed genes; FC, fold change; min, minimum; max, maximum.

To achieve this, the new network was analyzed using the AllegroMCODE plug-in, implementing the MCODE algorithm with a k-core value of 2.0, a node score cutoff of 0.2, a maximum depth from the seed node of 100, and graphics-processing-unit-based parallelization to find clusters efficiently. The MCODE method is a typical seed-growth-style clustering algorithm, which ranks the clustered subnetworks according to the average number of connections per protein in the complex as node score. A total of 22 subnetworks were found, among which six had more than 20 intra-connected nodes and a node score >2.0 (Table 3 and Figure 6). These clustered subnetworks varied in size and contained a total of 166 proteins and 706 PPIs.
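
The full MCODE procedure (vertex weighting by local density, seed growth, and post-processing) is not reproduced here. As a small illustration of one ingredient only, the k-core parameter, the sketch below repeatedly removes nodes with fewer than k neighbours from an undirected edge list until every remaining node has degree of at least k. The function name k_core, the column names, and the toy network with placeholder IDs are hypothetical, and the input is assumed to contain no duplicated edges or self-loops.

```r
# Return the edges of the k-core: the largest subgraph in which
# every node has degree >= k. Edges are rows of a two-column data frame
# (columns a and b) with no duplicated edges and no self-loops.
k_core <- function(edges, k) {
  repeat {
    deg <- table(c(edges$a, edges$b))
    weak <- names(deg)[deg < k]
    # Stop when no node falls below k (or nothing is left)
    if (length(weak) == 0 || nrow(edges) == 0) break
    edges <- edges[!(edges$a %in% weak | edges$b %in% weak), , drop = FALSE]
  }
  edges
}

# Toy network: a triangle P1-P2-P3 with a pendant chain P3-P4-P5
net <- data.frame(
  a = c("P1", "P2", "P1", "P3", "P4"),
  b = c("P2", "P3", "P3", "P4", "P5"),
  stringsAsFactors = FALSE
)

core2 <- k_core(net, k = 2)
print(core2)
```

With k = 2 the pendant chain is peeled away one node at a time and only the triangle remains; with k = 3 nothing survives in this toy network.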
The clusters were mutually inclusive, indicating that nodes were shared between the individual clusters. In each subnetwork, the score was mainly determined by a node score cutoff value to define the number of nodes in a subnetwork. The functions of these subnetworks corresponded primarily to immune response, regulation of cell proliferation, and cellular response to inflammation.

Table 3. Statistics for top six subnetworks identified by MCODE method in PPI network for AIDS patients with and without HIVE-induced dementia

Subnetwork | Score | Proteins | Interactions
1 | 8.34 | 35 | 292
2 | 6.27 | 22 | 138
3 | 3.26 | 27 | 88
4 | 2.43 | 28 | 68
5 | 2.32 | 28 | 65
6 | 2.12 | 26 | 55

Abbreviations: MCODE, molecular complex detection; PPI, protein–protein interaction; AIDS, acquired immunodeficiency syndrome; HIV, human immunodeficiency virus; HIVE, HIV-associated encephalitis.

Figure 6. Subnetworks identified from the local PPI network using an unweighted force-directed layout.
Notes: The red and blue nodes indicate the highly expressed DEGs with minimal (P-value ≥1.33E−10) and maximal (P-value ≤0.006) P-values. The ellipse and rectangular node shapes indicate the DEGs with low (log2 FC ≥−2.54) and high (log2 FC ≤3.22) log2 FC values. Nodes with high degree values (hubs) are depicted using larger node sizes through a pseudo-exponential gradient mapping.
Abbreviations: PPI, protein–protein interaction; DEGs, differentially expressed genes; FC, fold change.

Next, the degree of each node in the network was calculated with a degree-sorted circle layout algorithm, which identified hubs as nodes with a degree value of >55 (Table 4). The degree of a node corresponds to the number of edges connecting it to other nodes in the network. A higher value for the degree indicated a highly connected network, which was likely to be more robust.

Table 4.
Top 15 hub nodes identified in PPI network for DEGs from samples of AIDS patients with and without HIVE-induced dementia

ID | Name | P-value | log2 FC | Degree
HSP90AA1 | Heat-shock protein alpha, class A member 1 | 0.003639 | 0.454711 | 139
FN1 | Fibronectin 1 | 5.74E−06 | −1.07768 | 138
SUMO1 | SMT3 suppressor of mif two 3 homolog 1 | 2.14E−04 | 0.489265 | 122
TUBB | Tubulin, beta | 8.60E−04 | −0.38154 | 83
HSP90AB1 | Heat-shock protein alpha, class B member 1 | 8.31E−04 | 0.390789 | 82
NEDD8 | Neural precursor cell 8 | 0.00373 | 0.301632 | 76
FYN | FYN oncogene related to SRC, FGR, YES | 9.45E−04 | −0.36713 | 74
PRKDC | DNA-activated, catalytic polypeptide | 0.002548 | 0.31338 | 68
CALM1 | Calmodulin 1 (phosphorylase kinase, delta) | 0.001049 | 0.373122 | 67
ABL1 | c-abl oncogene 1, receptor tyrosine kinase | 0.005356 | −0.28995 | 62
MAPK14 | Mitogen-activated protein kinase 14 | 0.003497 | 0.321649 | 58
CSNK1E | Casein kinase 1, epsilon | 0.00417 | −0.29183 | 57
YBX1 | Y box binding protein 1 | 1.86E−06 | −1.3169 | 57
DDX3X | DEAD box polypeptide 3, X-linked | 0.001003 | −0.49537 | 56
RPA1 | Replication protein A1, 70 kDa | 6.28E−04 | 0.400196 | 56

Abbreviations: PPI, protein–protein interaction; DEGs, differentially expressed genes; AIDS, acquired immunodeficiency syndrome; HIV, human immunodeficiency virus; HIVE, HIV-associated encephalitis; FC, fold change; SMT3, ubiquitin-like protein of the SUMO family; SUMO, small ubiquitin-like modifier; FYN, proto-oncogene tyrosine-protein kinase Fyn; SRC, proto-oncogene tyrosine-protein kinase Src; FGR, tyrosine-protein kinase Fgr; YES, tyrosine-protein kinase Yes; DEAD, D-E-A-D amino acid sequence (asp-glu-ala-asp).

Subnetworks 1, 2, and 6 were determined to be cytoskeleton- and protein folding-associated networks containing the highest percentage of hubs of highly expressed genes (Figure 7). On the other hand, subnetworks 3–5 were associated with RNA modifications, protein synthesis, and inflammatory processes.

Figure 7.
Total percentage of hubs identified for six subnetworks to detect main DEGs associated with different cellular processes.
Abbreviation: DEGs, differentially expressed genes.

Heat-shock protein alpha (90 kDa), class A member 1 (HSP90AA1), and fibronectin 1 (FN1) were detected as “superhubs” with a degree value of >130. Previously, members of the HSP90 family have been reported to interact with a large number of viral proteins, including HIV-1 Pr55(Gag) and Gag-Pol proteins.32,33 In particular, positional proteomics analysis identified the cleavage of cytosolic HSP90AA1 at amino acid residues 473–474 and 491–492 by the HIV-1 protease.34 Moreover, hyperthermia-increased expression of HSP90 proteins in CD4+ T-cells was found to enhance HIV-1 Tat-mediated transactivation of the retroviral long terminal repeats, stimulating single-cycle HIV-1 infection.35 On the other hand, FN1, a high-molecular weight (~440 kDa) glycoprotein, was also found to interact with the structural and regulatory proteins of HIV-1, including envelope surface glycoproteins (gp120, 160, and 41), Nef/Tat, and Gag-Pol proteins.36–41 Notably, HIV-1 gp120 induces phosphorylation of FN1 and enhances the physical association between this protein and Robo4 in human lymphatic endothelial cells.41 Additionally, the FN1 molecule modulates the effects of the HIV-1 Tat protein on endothelial cells and murine Kaposi’s sarcoma-like cells.42

Contrary to the previous report of Petito et al,43 we did not find any evidence of up-regulated gene expression profiles for CD8+ and cytotoxic T lymphocyte-associated genes at the systems biology level in autopsied AIDS brains with and without HIVE, probably because of the differences in brain samples taken from different parts of the brain.
The absence of DEGs participating in T-cell recruitment and associated with neurological lesions in the temporal lobe was due to the remote location of these lesions from the frontal cortex sampled for RNA extraction. Furthermore, our findings showed that a number of proteins associated with the hub DEGs might also be used as markers of the development and progression of AIDS or HIVE and dementia, and to confirm some adverse effects of antiretroviral therapy. In particular, some of the identified proteins (SUMO1, FYN, and NEDD8) were found to be involved in HIV-1 covalent modification and replication44,45 and HIV-associated lipodystrophy.46 The NEDD8 protein is a relevant parameter to monitor in AIDS patients receiving anti-HIV therapy, because anti-HIV medications cause lipodystrophy in the long term.

Conclusion

The present study analyzed the gene expression profiles and pathways that may be involved in the progression of neuro-AIDS in patients with or without apparent features of HIVE-induced dementia by using a comprehensive bioinformatics analysis. To achieve this goal, the GSE4755 Affymetrix data were extracted from the GEO database to identify DEGs associated with the disease. In the next step, a PPI network was created by mapping relevant DEGs into the PPI data to pinpoint the interactions of the DEGs involved in this process. The results indicated 1,528 DEGs, which were mainly involved in the immune response, regulation of cell proliferation, cellular response to inflammation, signal transduction, and viral replication cycle. The data also show that subnetworks 1, 2, and 6 contained the maximum number of hubs, mainly relating to the regulation of cellular metabolism and response to inflammation.
This is important because skewing of cellular metabolism and inflammation plays a significant part in progression to AIDS and also in neurodegeneration. The identification of HSP90 proteins is notable in the context of HIV-1, since their increased expression in CD4+ T-cells has been found to enhance HIV-1 Tat-mediated transactivation of the retroviral long terminal repeats, stimulating single-cycle HIV-1 infection. It was also determined that HSP90AA1 and FN1 might play important roles in the mechanism of HIVE, since they were identified as hub nodes with degree values >130. However, further studies are required to confirm these observations and determine their clinical utility in the therapeutic management of HIV-related neurological syndromes.

Supplementary material

Table S1. R script to define and analyze differentially expressed genes (DEGs) from four Affymetrix microarrays

#####################
####### script.R ####
#####################
## How to run this script:
# source("script.R")

## (A) RMA normalization
# Load required libraries
library(affy); library(limma); library(gcrma)
# Import raw expression data (CEL files in the working directory) as an AffyBatch object
data <- ReadAffy()
# Normalize the data with the 'rma' function and store the result as an ExpressionSet
eset_rma <- rma(data)
# Create box plots for raw data and normalized data
pdf(file = "raw_boxplot.pdf"); boxplot(data, col = "gray", main = "Raw Data"); dev.off()
pdf(file = "rma_boxplot.pdf"); boxplot(data.frame(exprs(eset_rma)), col = "gray", main = "RMA Normalized Data"); dev.off()

## (B) DEG analysis for RMA data
# Create the design matrix (arrays 1-2 form group 1, arrays 3-4 form group 2)
design <- model.matrix(~0 + factor(c(1, 1, 2, 2)))
# Assign nicer column names
colnames(design) <- c("G1", "G2")
# Create the contrast matrix for pairwise comparisons
contrast.matrix <- makeContrasts(G2-G1, G1-G2, levels = design)
# Fit a linear model for each gene based on the given series of arrays
fit <- lmFit(eset_rma, design)
# Compute estimated coefficients and standard errors for the given set of contrasts
fit2 <- contrasts.fit(fit, contrast.matrix)
# Compute moderated t-statistics and log-odds of differential expression by empirical Bayes shrinkage
fit2 <- eBayes(fit2)
rma_deg <- topTable(fit2, coef = 1, adjust = "fdr", sort.by = "B", number = 50000)
# Keep genes with adjusted P-value < 0.05 (filter on rma_deg, which exists at this point)
rma_deg_result <- rma_deg[rma_deg$adj.P.Val < 0.05, ]
# Export the DEG table to a tab-delimited text file
write.table(rma_deg_result, "rma_deg_result.xls", col.names = NA, quote = FALSE, sep = "\t")

## (C) Create Venn diagram for RMA data
rma_venn <- decideTests(fit2, p.value = 0.05)
pdf(file = "rma_venn.pdf"); vennDiagram(rma_venn); dev.off()

## (D) DEG annotations
# Verify the annotation platform of the expression set
eset_rma@annotation
# Load required libraries
library(annotate); library(hgu95av2.db)
# List objects available in the annotation package
ls("package:hgu95av2.db")
# Extract feature names from the normalized data set
ID <- featureNames(eset_rma)
# Look up gene symbol, name, and Ensembl gene IDs
Symbol <- getSYMBOL(ID, "hgu95av2.db")
Name <- as.character(lookUp(ID, "hgu95av2.db", "GENENAME"))
Ensembl <- as.character(lookUp(ID, "hgu95av2.db", "ENSEMBL"))
# Create hyperlink for the Ensembl genome browser
# (the link markup around the ID is missing from the source text and is left empty here)
Ensembl <- ifelse(Ensembl == "NA", NA, paste("", Ensembl, "", sep = ""))
# Make temporary data frame with all IDs
tmp_rma <- data.frame(ID = ID, Symbol = Symbol, Name = Name, Ensembl = Ensembl, stringsAsFactors = FALSE)
# Convert "NA" strings in the temporary data frame to missing values
tmp_rma[tmp_rma == "NA"] <- NA
# Export the annotation table to a tab-delimited text file
write.table(tmp_rma, file = "annot_rma.xls", row.names = FALSE, sep = "\t")
# Merge annotations and expression data values
# (newer limma versions keep probe IDs as row names, so add an ID column if it is absent)
if (!"ID" %in% colnames(rma_deg_result)) rma_deg_result$ID <- rownames(rma_deg_result)
all <- merge(tmp_rma, rma_deg_result, by.x = "ID", by.y = "ID", all = TRUE)
# Export the merged table to a tab-delimited text file
write.table(all, file = "annot_deg_rma_final.xls", col.names = NA, sep = "\t")

Acknowledgments