Abstract
Although recognized as a promising microbial cell factory for producing
biofuels, current productivity in cyanobacterial systems is low. To
make the processes economically feasible, one of the hurdles, which
need to be overcome is the low tolerance of hosts to toxic biofuels.
Meanwhile, little information is available regarding the cellular
responses to biofuels stress in cyanobacteria, which makes it
challenging for tolerance engineering. Using large proteomic datasets
of Synechocystis under various biofuels stress and environmental
perturbation, a protein co-expression network was first constructed and
then combined with the experimentally determined protein–protein
interaction network. Proteins with statistically higher topological
overlap in the integrated network were identified as common responsive
proteins to both biofuels stress and environmental perturbations. In
addition, a weighted gene co-expression network analysis was performed
to distinguish unique responses to biofuels from those to environmental
perturbations and to uncover metabolic modules and proteins uniquely
associated with biofuels stress. The results showed that
biofuel-specific proteins and modules were enriched in several
functional categories, including photosynthesis, carbon fixation, and
amino acid metabolism, which may represent potential key signatures for
biofuels stress responses in Synechocystis. Network-based analysis
allowed determination of the responses specifically related to biofuels
stress, and the results constituted an important knowledge foundation
for tolerance engineering against biofuels in Synechocystis.
Keywords: biofuels, network, WGCNA, tolerance, Synechocystis
Introduction
Human society has been dependent on fossil fuels for centuries.
However, fossil fuels are not an infinite resource, and the possibility
of their running out in the future and the increasing concerns over
energy security and global climate change pose an urgent call for
developing renewable ways to produce fuels. Among all alternatives,
photosynthetic cyanobacteria have recently attracted significant
attention as a promising “microbial cell factory” to produce renewable
biofuels due to their capability to utilize solar energy and CO[2] as
the sole energy and carbon sources, respectively (Ducat et al.,
[31]2011; Quintana et al., [32]2011; Robertson et al., [33]2011).
Cyanobacteria contain considerable amounts of lipids in the thylakoid
membranes and possess higher photosynthetic efficiency and faster
growth rate compared to eukaryotic green algae and higher plants
(Quintana et al., [34]2011). In addition, cyanobacteria have a
relatively simple genetic background and are amenable to modification
by metabolic engineering and synthetic biology (Wang et al.,
[35]2012a). Recent efforts have led to successful production of various
biofuels in engineered cyanobacterial cells, such as ethanol (Deng and
Coleman, [36]1999), butanol and isobutanol (Atsumi et al., [37]2009),
alkanes (Choi and Lee, [38]2013), and biodiesel (Da Ros et al.,
[39]2013). However, the current biofuel productivity in the
cyanobacterial systems is several orders of magnitude lower than their
native producing microbes (Jin et al., [40]2014). In addition to
ongoing efforts to optimize the existing pathways and to discover and
construct novel pathways, one option to achieve high productivity is to
improve cellular tolerance to toxic biofuel products synthesized by the
cyanobacterial hosts (Dunlop, [41]2011; Zingaro and Papoutsakis,
[42]2012).
Although response mechanisms against biofuels have been extensively
studied in many native biofuel-producing microbes (Couto et al.,
[43]1997; Dunlop, [44]2011), it remains unclear for cyanobacteria. As
part of our long-term goal to construct more robust and
product-tolerant photosynthetic “chassis” for synthesizing various
renewable biofuels, our laboratory has applied integrated
transcriptomic, proteomic, and metabolomic approaches to determine the
metabolic profiles of a model cyanobacterium Synechocystis sp. PCC 6803
(hereafter Synechocystis) stressed under various biofuels (Liu et al.,
[45]2012; Qiao et al., [46]2012; Wang et al., [47]2012b; Tian et al.,
[48]2013; Zhu et al., [49]2013). Consistent with early genome-level
studies in other microbes (Nicolaou et al., [50]2010; Dunlop,
[51]2011), our previous results showed that Synechocystis cells
employed a combination of multiple resistance mechanisms in dealing
with biofuels stress (Wang et al., [52]2012b). In addition, the
comparative proteomic analysis provided strong evidence that proteins
involved in multiple aspects of photosynthesis (i.e., photosystems I
and II, cytochrome, and ferredoxin) were up-regulated in
ethanol-treated Synechocystis (Qiao et al., [53]2012), suggesting there
could be unique response mechanisms employed by cyanobacteria to combat
biofuel toxicity.
Although initial efforts using a conventional approach of analyzing
individual genes/proteins according to fold change and statistical
significance has led to determination of the responses associated with
each of the biofuels in Synechocystis (Liu et al., [54]2012; Qiao et
al., [55]2012; Wang et al., [56]2012b; Tian et al., [57]2013; Zhu et
al., [58]2013), it becomes clear that network-focused rather than
individual gene/protein-focused methodologies would be more appropriate
to obtain a complete picture of cellular response (Lehtinen et al.,
[59]2013). In addition, the network analysis defines modules and their
possible biological roles based on connectivity of proteins or genes
rather than using any artificial cutoff, which may avoid information
loss related to genes/proteins of low abundance or small fold changes,
such as signal transduction genes. In recent years, network analysis
has been applied to cyanobacterial studies. For example, Singh et al.
([60]2010) constructed a Bayesian network of Synechocystis using
transcriptomic data and defined a set of genes as the core
transcriptional response (CTR) that are commonly regulated under most
of environmental perturbations (Singh et al., [61]2010). McDermott et
al. ([62]2011) developed a predictive in silico model of diurnal and
circadian behavior of Cyanothece 51142 using transcriptomic data, and
the results showed that incorporation of network topology into the
model could improve the ability to explain the behavior (McDermott et
al., [63]2011). Recently, Wang et al. ([64]2013b) utilized a weighted
gene co-expression network analysis (WGCNA) approach to establish
transcriptional networks for four cyanobacterial species under metal
stresses, and a further cross-species network comparison led to the
discovery of several core response modules and genes that may be
essential to all metal stresses, as well as species-specific hub genes
for metal stresses (Wang et al., [65]2013b). The studies demonstrated
that network-based analysis could be a powerful tool in deciphering
cellular responses.
In this study, to further identify responses specifically related to
biofuels stress that could be used as potential targets for rational
tolerance engineering, a topological analysis of global proteins
co-expression network combined with protein–protein interaction (PPI)
network was first performed to uncover a core set of proteins commonly
responsive to both biofuels stress and environmental perturbations.
Then, a WGCNA was applied to identify responses specifically related to
biofuels stress. The combination of both analyses allowed the
identification of the protein network signatures associated with
exogenous biofuels treatments, and provided new insights into the
molecular mechanisms against biofuels stress in Synechocystis.
Materials and Methods
Proteomic data sources
A total of five iTRAQ LC-MS/MS datasets of Synechocystis sp. PCC 6803
from our previous study were re-analyzed at a peptide level. Growth of
Synechocystis under ethanol, butanol, hexane, salt stress conditions
with dosages of 1.5% (v/v), 0.2% (v/v), 0.8% (v/v), 4% (w/v), and
nitrogen starvation, which led to ~50% growth reduction were then
determined. For each condition, cells were harvested at two time points
(24 and 48 h) that were corresponding to middle-exponential and
exponential-stationary transition phases in the growth time courses for
proteomics analysis. Each biological replicates sample has two
technical replicates. Due to the page limitation, for details about the
environmental perturbation and biofuel stress experiments and original
proteomic datasets please find from several previous publications (Liu
et al., [66]2012; Qiao et al., [67]2012, [68]2013; Huang et al.,
[69]2013; Tian et al., [70]2013).
Proteomic data analysis
The mass spectroscopy analysis was performed using a AB SCIEX
TripleTOF™ 5600 mass spectrometer (AB SCIEX, Framingham, MA, USA),
coupled with online micro flow HPLC system (Shimadzu Co, Kyoto, Japan)
as described previously. Genome sequence and annotation information of
Synechocystis sp. PCC 6803 were downloaded from NCBI
([71]ftp://ftp.ncbi.nlm.nih.gov/genomes). The details for the
experimental design, execution, and proteomic data analysis can be
found in the original publications (Liu et al., [72]2012; Qiao et al.,
[73]2012, [74]2013; Huang et al., [75]2013; Tian et al., [76]2013).
Protein co-expression network construction
To construct the association network from proteomic data, we used a
multi-step procedure for network construction: first, we performed a
procedure for data normalization identical with Principal component
analysis (PCA) (See below); second, correlation values were calculated
between present values for all pairs of peptides. In this study, we
used peptides rather than proteins to construct the protein
co-expression network. One reason is lots of related peptides from the
same protein are always observed in discordance, which may be due to
different post-translational modifications or isoforms. Correlation is
calculated as the Pearson correlation coefficient for all pairwise
peptides. Third, in order to generate a reliable protein co-expression
network, high correlation coefficients (r > 0.9) was used, where only
gene pairs with a correlation coefficient higher than 0.9 were
considered connected. Finally, we combined protein co-expression
network with experimentally determined PPI network (Sato et al.,
[77]2007). In this process, known PPI between observed proteins already
in the co-expression network were added as new edges to the network.
Topological analysis
Topological analysis of networks was performed using Cytoscape
software. Bottleneck and hub proteins were defined as the top 20% of
proteins ranked by the values of betweenness and degree centrality,
respectively (McDermott et al., [78]2011, [79]2012). The degree and
betweenness centrality metrics were defined according to the methods
described by McDermott et al. ([80]2012). Briefly, degree centrality is
a metric of the connectedness of a node, and betweenness centrality is
a metric that measures how often paths between nodes must traverse a
given node. Generally, degree centrality is the fraction of edges for a
particular protein out of all possible interactions for that protein in
the network, and betweenness is the number of shortest paths between
all pairs of proteins in the network that pass through a specific node.
Principal components analysis
The proteomics data were converted to a ratio versus control
conditions. The data were then log2 scaled, and each unit reflects a
twofold change in abundance. In order to avoid influence caused by
missing data, peptides with any missing data in any condition were
removed. Remaining core peptides identified in all conditions were
subjected to PCA and partial least square-discriminant analysis
(PLS-DA) by SIMCA-P 11.5 software. Averaging was taken for all
technical replicates of samples, as in general good reproducibility was
observed between replicates (Liu et al., [81]2012; Qiao et al.,
[82]2012; Tian et al., [83]2013).
Weighted gene co-expression network analysis
Weighted gene co-expression network analysis approach was used to
establish a co-expression network from the LC-MS/MS proteomic data
(Langfelder and Horvath, [84]2008). The co-expression network was
created first by calculating weighted Pearson correlation matrices
corresponding to peptide abundance expression, and then by following
the standard procedure of WGCNA to create the networks. Briefly,
weighted correlation matrices were transformed into matrices of
connection strengths using a power function. These connection strengths
were then used to calculate topological overlap (TO) (Langfelder and
Horvath, [85]2008). The topological overlap matrix (TOM) is computed as
TOM[ij] = (l[ij] + a[ij])/[min (k[i],k[j]) + 1 − a[ij]] where l[ij] is
defined as the dot product on row i and column j in adjacency matrix
[a] and k[i] (the connectivity) is the summation of row i in adjacency
matrix [a] (Gibbs et al., [86]2013). Hierarchical clustering based on
TO was used to group proteins with highly similar co-expression
relationships into modules. Protein dendrograms were obtained by
average linkage hierarchical clustering, while the color row underneath
the dendrogram showed the module assignment determined by the Dynamic
Tree Cut method. The network for each module was generated with the
minimum spanning tree with a dissimilarity matrix from WGCNA. The
modules with r > 0.55 and a p-value <0.1 were extracted (Wang et al.,
[87]2013a).
Functional enrichment analysis
Metabolic pathway enrichment analysis was conducted according to Kyoto
Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous
Groups of proteins (COG) database using the following formula:
[MATH: p=1−∑i=0m−1<
/mstyle>MiN−Mn−iNn :MATH]
N is the number of all proteins with KEGG pathway annotation
information, M is the number of proteins with a given KEGG pathway
annotation, n is the number of the associated proteins with KEGG
pathway annotation information, and m is the number of the associated
proteins with a given KEGG pathway annotation. All pathway mapping was
manually checked for each of the proteins. We also calculated
functional enrichment by considering each group or module of interest
versus all proteins in the network as a background, as the ratio of m/n
versus M/N.
Results and Discussion
Overview of proteomics analysis
The proteomic datasets used in this study are listed in Table [88]1.
Briefly, the datasets contain four sets of quantitative iTRAQ–LC-MS
analyses of Synechocystis grown under five stress conditions, i.e.,
biofuel stresses of ethanol, butanol and hexane, and environmental
perturbations of high salt and nitrogen starvation. For each condition,
treated and corresponding wild-type control cells were harvested at two
time points (i.e., 24 and 48 h). Each biofuel-stressed dataset has two
technical replicates. For overall data quality, reproducibility and
full description of the proteomic datasets, please refer to several
previous publications (Liu et al., [89]2012; Qiao et al., [90]2012,
[91]2013; Huang et al., [92]2013; Tian et al., [93]2013).
Table 1.
The proteomic datasets used in this study.
Condition[94]^a Unique spectra Peptides Unique peptides Proteins
Ethanol (Qiao et al., [95]2012) 21,066 7,337 7,192 1,523
Butanol (Tian et al., [96]2013) 18,745 6,355 6,252 1,300
Hexane (Liu et al., [97]2012) 19,217 6,995 6,875 1,389
Salt (Qiao et al., [98]2013) 23,822 8,379 8,257 1,702
N-starvation (Huang et al., [99]2013) 23,674 8,404 8,282 1,703
[100]Open in a new tab
^aReferences for each dataset are provided.