Abstract Although recognized as a promising microbial cell factory for producing biofuels, current productivity in cyanobacterial systems is low. To make the processes economically feasible, one of the hurdles, which need to be overcome is the low tolerance of hosts to toxic biofuels. Meanwhile, little information is available regarding the cellular responses to biofuels stress in cyanobacteria, which makes it challenging for tolerance engineering. Using large proteomic datasets of Synechocystis under various biofuels stress and environmental perturbation, a protein co-expression network was first constructed and then combined with the experimentally determined protein–protein interaction network. Proteins with statistically higher topological overlap in the integrated network were identified as common responsive proteins to both biofuels stress and environmental perturbations. In addition, a weighted gene co-expression network analysis was performed to distinguish unique responses to biofuels from those to environmental perturbations and to uncover metabolic modules and proteins uniquely associated with biofuels stress. The results showed that biofuel-specific proteins and modules were enriched in several functional categories, including photosynthesis, carbon fixation, and amino acid metabolism, which may represent potential key signatures for biofuels stress responses in Synechocystis. Network-based analysis allowed determination of the responses specifically related to biofuels stress, and the results constituted an important knowledge foundation for tolerance engineering against biofuels in Synechocystis. Keywords: biofuels, network, WGCNA, tolerance, Synechocystis Introduction Human society has been dependent on fossil fuels for centuries. However, fossil fuels are not an infinite resource, and the possibility of their running out in the future and the increasing concerns over energy security and global climate change pose an urgent call for developing renewable ways to produce fuels. Among all alternatives, photosynthetic cyanobacteria have recently attracted significant attention as a promising “microbial cell factory” to produce renewable biofuels due to their capability to utilize solar energy and CO[2] as the sole energy and carbon sources, respectively (Ducat et al., [31]2011; Quintana et al., [32]2011; Robertson et al., [33]2011). Cyanobacteria contain considerable amounts of lipids in the thylakoid membranes and possess higher photosynthetic efficiency and faster growth rate compared to eukaryotic green algae and higher plants (Quintana et al., [34]2011). In addition, cyanobacteria have a relatively simple genetic background and are amenable to modification by metabolic engineering and synthetic biology (Wang et al., [35]2012a). Recent efforts have led to successful production of various biofuels in engineered cyanobacterial cells, such as ethanol (Deng and Coleman, [36]1999), butanol and isobutanol (Atsumi et al., [37]2009), alkanes (Choi and Lee, [38]2013), and biodiesel (Da Ros et al., [39]2013). However, the current biofuel productivity in the cyanobacterial systems is several orders of magnitude lower than their native producing microbes (Jin et al., [40]2014). In addition to ongoing efforts to optimize the existing pathways and to discover and construct novel pathways, one option to achieve high productivity is to improve cellular tolerance to toxic biofuel products synthesized by the cyanobacterial hosts (Dunlop, [41]2011; Zingaro and Papoutsakis, [42]2012). Although response mechanisms against biofuels have been extensively studied in many native biofuel-producing microbes (Couto et al., [43]1997; Dunlop, [44]2011), it remains unclear for cyanobacteria. As part of our long-term goal to construct more robust and product-tolerant photosynthetic “chassis” for synthesizing various renewable biofuels, our laboratory has applied integrated transcriptomic, proteomic, and metabolomic approaches to determine the metabolic profiles of a model cyanobacterium Synechocystis sp. PCC 6803 (hereafter Synechocystis) stressed under various biofuels (Liu et al., [45]2012; Qiao et al., [46]2012; Wang et al., [47]2012b; Tian et al., [48]2013; Zhu et al., [49]2013). Consistent with early genome-level studies in other microbes (Nicolaou et al., [50]2010; Dunlop, [51]2011), our previous results showed that Synechocystis cells employed a combination of multiple resistance mechanisms in dealing with biofuels stress (Wang et al., [52]2012b). In addition, the comparative proteomic analysis provided strong evidence that proteins involved in multiple aspects of photosynthesis (i.e., photosystems I and II, cytochrome, and ferredoxin) were up-regulated in ethanol-treated Synechocystis (Qiao et al., [53]2012), suggesting there could be unique response mechanisms employed by cyanobacteria to combat biofuel toxicity. Although initial efforts using a conventional approach of analyzing individual genes/proteins according to fold change and statistical significance has led to determination of the responses associated with each of the biofuels in Synechocystis (Liu et al., [54]2012; Qiao et al., [55]2012; Wang et al., [56]2012b; Tian et al., [57]2013; Zhu et al., [58]2013), it becomes clear that network-focused rather than individual gene/protein-focused methodologies would be more appropriate to obtain a complete picture of cellular response (Lehtinen et al., [59]2013). In addition, the network analysis defines modules and their possible biological roles based on connectivity of proteins or genes rather than using any artificial cutoff, which may avoid information loss related to genes/proteins of low abundance or small fold changes, such as signal transduction genes. In recent years, network analysis has been applied to cyanobacterial studies. For example, Singh et al. ([60]2010) constructed a Bayesian network of Synechocystis using transcriptomic data and defined a set of genes as the core transcriptional response (CTR) that are commonly regulated under most of environmental perturbations (Singh et al., [61]2010). McDermott et al. ([62]2011) developed a predictive in silico model of diurnal and circadian behavior of Cyanothece 51142 using transcriptomic data, and the results showed that incorporation of network topology into the model could improve the ability to explain the behavior (McDermott et al., [63]2011). Recently, Wang et al. ([64]2013b) utilized a weighted gene co-expression network analysis (WGCNA) approach to establish transcriptional networks for four cyanobacterial species under metal stresses, and a further cross-species network comparison led to the discovery of several core response modules and genes that may be essential to all metal stresses, as well as species-specific hub genes for metal stresses (Wang et al., [65]2013b). The studies demonstrated that network-based analysis could be a powerful tool in deciphering cellular responses. In this study, to further identify responses specifically related to biofuels stress that could be used as potential targets for rational tolerance engineering, a topological analysis of global proteins co-expression network combined with protein–protein interaction (PPI) network was first performed to uncover a core set of proteins commonly responsive to both biofuels stress and environmental perturbations. Then, a WGCNA was applied to identify responses specifically related to biofuels stress. The combination of both analyses allowed the identification of the protein network signatures associated with exogenous biofuels treatments, and provided new insights into the molecular mechanisms against biofuels stress in Synechocystis. Materials and Methods Proteomic data sources A total of five iTRAQ LC-MS/MS datasets of Synechocystis sp. PCC 6803 from our previous study were re-analyzed at a peptide level. Growth of Synechocystis under ethanol, butanol, hexane, salt stress conditions with dosages of 1.5% (v/v), 0.2% (v/v), 0.8% (v/v), 4% (w/v), and nitrogen starvation, which led to ~50% growth reduction were then determined. For each condition, cells were harvested at two time points (24 and 48 h) that were corresponding to middle-exponential and exponential-stationary transition phases in the growth time courses for proteomics analysis. Each biological replicates sample has two technical replicates. Due to the page limitation, for details about the environmental perturbation and biofuel stress experiments and original proteomic datasets please find from several previous publications (Liu et al., [66]2012; Qiao et al., [67]2012, [68]2013; Huang et al., [69]2013; Tian et al., [70]2013). Proteomic data analysis The mass spectroscopy analysis was performed using a AB SCIEX TripleTOF™ 5600 mass spectrometer (AB SCIEX, Framingham, MA, USA), coupled with online micro flow HPLC system (Shimadzu Co, Kyoto, Japan) as described previously. Genome sequence and annotation information of Synechocystis sp. PCC 6803 were downloaded from NCBI ([71]ftp://ftp.ncbi.nlm.nih.gov/genomes). The details for the experimental design, execution, and proteomic data analysis can be found in the original publications (Liu et al., [72]2012; Qiao et al., [73]2012, [74]2013; Huang et al., [75]2013; Tian et al., [76]2013). Protein co-expression network construction To construct the association network from proteomic data, we used a multi-step procedure for network construction: first, we performed a procedure for data normalization identical with Principal component analysis (PCA) (See below); second, correlation values were calculated between present values for all pairs of peptides. In this study, we used peptides rather than proteins to construct the protein co-expression network. One reason is lots of related peptides from the same protein are always observed in discordance, which may be due to different post-translational modifications or isoforms. Correlation is calculated as the Pearson correlation coefficient for all pairwise peptides. Third, in order to generate a reliable protein co-expression network, high correlation coefficients (r > 0.9) was used, where only gene pairs with a correlation coefficient higher than 0.9 were considered connected. Finally, we combined protein co-expression network with experimentally determined PPI network (Sato et al., [77]2007). In this process, known PPI between observed proteins already in the co-expression network were added as new edges to the network. Topological analysis Topological analysis of networks was performed using Cytoscape software. Bottleneck and hub proteins were defined as the top 20% of proteins ranked by the values of betweenness and degree centrality, respectively (McDermott et al., [78]2011, [79]2012). The degree and betweenness centrality metrics were defined according to the methods described by McDermott et al. ([80]2012). Briefly, degree centrality is a metric of the connectedness of a node, and betweenness centrality is a metric that measures how often paths between nodes must traverse a given node. Generally, degree centrality is the fraction of edges for a particular protein out of all possible interactions for that protein in the network, and betweenness is the number of shortest paths between all pairs of proteins in the network that pass through a specific node. Principal components analysis The proteomics data were converted to a ratio versus control conditions. The data were then log2 scaled, and each unit reflects a twofold change in abundance. In order to avoid influence caused by missing data, peptides with any missing data in any condition were removed. Remaining core peptides identified in all conditions were subjected to PCA and partial least square-discriminant analysis (PLS-DA) by SIMCA-P 11.5 software. Averaging was taken for all technical replicates of samples, as in general good reproducibility was observed between replicates (Liu et al., [81]2012; Qiao et al., [82]2012; Tian et al., [83]2013). Weighted gene co-expression network analysis Weighted gene co-expression network analysis approach was used to establish a co-expression network from the LC-MS/MS proteomic data (Langfelder and Horvath, [84]2008). The co-expression network was created first by calculating weighted Pearson correlation matrices corresponding to peptide abundance expression, and then by following the standard procedure of WGCNA to create the networks. Briefly, weighted correlation matrices were transformed into matrices of connection strengths using a power function. These connection strengths were then used to calculate topological overlap (TO) (Langfelder and Horvath, [85]2008). The topological overlap matrix (TOM) is computed as TOM[ij] = (l[ij] + a[ij])/[min (k[i],k[j]) + 1 − a[ij]] where l[ij] is defined as the dot product on row i and column j in adjacency matrix [a] and k[i] (the connectivity) is the summation of row i in adjacency matrix [a] (Gibbs et al., [86]2013). Hierarchical clustering based on TO was used to group proteins with highly similar co-expression relationships into modules. Protein dendrograms were obtained by average linkage hierarchical clustering, while the color row underneath the dendrogram showed the module assignment determined by the Dynamic Tree Cut method. The network for each module was generated with the minimum spanning tree with a dissimilarity matrix from WGCNA. The modules with r > 0.55 and a p-value <0.1 were extracted (Wang et al., [87]2013a). Functional enrichment analysis Metabolic pathway enrichment analysis was conducted according to Kyoto Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous Groups of proteins (COG) database using the following formula: [MATH: p=1i=0m1< /mstyle>MiNMniNn :MATH] N is the number of all proteins with KEGG pathway annotation information, M is the number of proteins with a given KEGG pathway annotation, n is the number of the associated proteins with KEGG pathway annotation information, and m is the number of the associated proteins with a given KEGG pathway annotation. All pathway mapping was manually checked for each of the proteins. We also calculated functional enrichment by considering each group or module of interest versus all proteins in the network as a background, as the ratio of m/n versus M/N. Results and Discussion Overview of proteomics analysis The proteomic datasets used in this study are listed in Table [88]1. Briefly, the datasets contain four sets of quantitative iTRAQ–LC-MS analyses of Synechocystis grown under five stress conditions, i.e., biofuel stresses of ethanol, butanol and hexane, and environmental perturbations of high salt and nitrogen starvation. For each condition, treated and corresponding wild-type control cells were harvested at two time points (i.e., 24 and 48 h). Each biofuel-stressed dataset has two technical replicates. For overall data quality, reproducibility and full description of the proteomic datasets, please refer to several previous publications (Liu et al., [89]2012; Qiao et al., [90]2012, [91]2013; Huang et al., [92]2013; Tian et al., [93]2013). Table 1. The proteomic datasets used in this study. Condition[94]^a Unique spectra Peptides Unique peptides Proteins Ethanol (Qiao et al., [95]2012) 21,066 7,337 7,192 1,523 Butanol (Tian et al., [96]2013) 18,745 6,355 6,252 1,300 Hexane (Liu et al., [97]2012) 19,217 6,995 6,875 1,389 Salt (Qiao et al., [98]2013) 23,822 8,379 8,257 1,702 N-starvation (Huang et al., [99]2013) 23,674 8,404 8,282 1,703 [100]Open in a new tab ^aReferences for each dataset are provided.