Abstract
Despite responses to salinity stress in Dunaliella salina, a
unicellular halotolerant green alga, being subject to extensive study,
but the underlying molecular mechanism remains unknown. Here, Empirical
Bayes method was applied to identify the common differentially
expressed genes (DEGs) between hypersaline and normal conditions. Then,
using weighted gene co-expression network analysis (WGCNA), which takes
advantage of a graph theoretical approach, highly correlated genes were
clustered as a module. Subsequently, connectivity patterns of the
identified modules in two conditions were surveyed to define preserved
and non-preserved modules by combining the Zsummary and medianRank
measures. Finally, common and specific hub genes in non-preserved
modules were determined using Eigengene-based module connectivity or
module membership (k[ME]) measures and validation was performed by
using leave-one-out cross-validation (LOOCV). In this study, the power
of beta = 12 (scale-free R2 = 0.8) was selected as the
soft-thresholding to ensure a scale-free network, which led to the
identification of 15 co-expression modules. Results also indicate that
green, blue, brown, and yellow modules are non-preserved in salinity
stress conditions. Examples of enriched Kyoto Encyclopedia of Genes and
Genomes (KEGG) pathways in non-preserved modules are Sulfur metabolism,
Oxidative phosphorylation, Porphyrin and chlorophyll metabolism,
Vitamin B6 metabolism. Moreover, the systems biology approach was
applied here, proposed some salinity specific hub genes, such as
radical-induced cell death1 protein (RCD1), mitogen-activated protein
kinase kinase kinase 13 (MAP3K13), long-chain acyl-CoA synthetase
(ACSL), acetyl-CoA carboxylase, biotin carboxylase subunit (AccC), and
fructose-bisphosphate aldolase (ALDO), for the development of
metabolites accumulating strains in D. salina.
Subject terms: Systems biology, Regulatory networks, Computational
biology and bioinformatics, Gene regulatory networks
Introduction
Dunaliella salina is a unicellular halotolerant green alga that can
survive in saturated brine (up to 5.5 M NaCl)^[26]1. This feature makes
it an interesting model organism for studying salt tolerance. On the
other hand, as the main producer of carotenoids and lipids, D. salina
is widely used for food and drug industries^[27]2.
Modifying the environmental circumstances, especially salinity
concentration, are among the most effective approach put forth to
enhance different metabolites accumulation in these microalgae.
However, the biomass productivity of D. salina is retarded in a
hypersaline conditions^[28]3, making it difficult to improve metabolite
production in large scales.
Exploration of salt stress-responding mechanisms is an inevitable step
in resolving these problems. The contribution of ROS and calcium
signaling pathway in response to salinity condition has been reported
previously^[29]4. Expressed sequence tag (EST) profiling of D. salina
in hypersaline condition also identified 1401 unique responsive
transcripts were contributed in protein synthesis, energy, primary
metabolism, and protein fate^[30]5. Moreover, a recent global
transcriptome sequencing showed that enhancements of photosynthesis and
biosynthesis of porphyrins, as well as degradation of starch, synthesis
of glycerol, membrane lipid desaturation, are taken part in D.salina
responses to hypersaline conditions^[31]6. Hovewer, due to the
complexity of salt stresses responding processes in microalgae,
underlying molecular mechanisms for salt stress response in microalgae
remain a daunting challenge^[32]7.
There is a lack of integrative investigation on transcriptomes data of
D.salina under various salinity stress pressures over ranges of
treatment periods. Moreover, all of the previous researches have solely
focused on differentially expressed genes identification, wherease
connectivity analysis has not yet been considered. In contrast to
focusing on differentially expressed genes, co-expression module-based
network analysis provides new insight into the role of different genes
associated with a specific condition, which cannot be detected by
standard transcriptome and network analysis^[33]4. Meanwhile, in these
networks, hubs can represent essential genes that may indeed contribute
to a specific phenotype^[34]8. This powerful approach has been widely
used on a range of systems mainly in plant species, including
Barley^[35]9, Wheat^[36]10, Rice^[37]11, elucidating the comprehensive
picture of stress responses. More recently, the efficiency of this
approach in identifying groups of expressed genes and highly connected
hubs which contributed in secondary metabolites accumulation in
microalgae Auxenochlorella protothecoides has been confirmed^[38]12.
The current study focused on co-expression network construction using
weighted gene co-expression network analysis (WGCNA) in combination
with the identification of hub genes in respective co-expressed modules
of salt responsive genes in D. salina.
Materials and methods
Eligible RNA seq data collection
A search for RNA seq datasets was performed on NCBI Sequence Read
Archive (SRA) database using the following keywords: microalgae,
Dunaliella, salina, salt, saline, osmotic, stress. Finally, four
independent studies that surveyed global transcriptome profiling in D.
salina at salt stress conditions were selected and raw data were
retrieved in fastq format. The first dataset (SRP134914) contains
twelve biological samples that were grown in sterile ATCC-1174 DA
medium supplemented with NaCl, sorbitol, and H[2]O[2]. We only included
the control and NaCl treated samples of this dataset. This dataset was
generated by deep sequencing, in triplicate, using Illumina NextSeq500
platform. The second dataset (SRP149387) contains nine samples were
generated in triplicate, using Illumina HiSeq 4000 platform. The third
dataset (SRP184449) contains twelve samples which were generated using
Illumina HiSeq 2000 platform (Table [39]1).
Table 1.
Details of datasets and treatment conditions with hypersalinity which
were used in this study.
Data set ID Salinity condition Sampling time points
SRP134914 NaCl (2 M) 24 h after treatment
SRP149387 NaCl (2.5 M) 6, 12, and 24 h after treatment
SRP184449 NaCl (2.5 M) 0.5, 1 and 2 h after treatment
[40]Open in a new tab
Pre-processing and differential expression analysis
Quality control of raw data sets was performed with FastQC (v 0.11.5)
software^[41]13. Adaptor sequence and low-quality reads with Phred
score < 30 were trimmed by using Trimmomatic (v0.32) software^[42]14.
The processed reads were subjected to de novo assembly using Trinity
(v2.4.0) software^[43]15 using default parameters. Protein orthology
was determined using Blastx (cutoff value of 6) against C. reinhardtii
and D. salina proteins ([44]https://phytozome.jgi.doe.gov/) as
described by Dums et al.^[45]16. The best hits were extracted with
in-house python scripts (Supplementary File [46]S1). Filtered reads
were aligned to the de novo assembled transcripts using
align_and_estimate_abundance Perl script implemented in RSEM (v1.3.1)
software^[47]17. Gene counts were then subjected to Bioconductor DESeq2
package version 1.10.1^[48]18 to identify differentially expressed
genes (DEGs). Comparisons were done using Wald’s test to determine the
log2-fold change. To overcome the inconsistency of results of different
studies and stabilize the genes’ expression ratios, batch effect
correction by using the Empirical Bayes method was performed^[49]19.
This correction enables direct comparisons of expression profiles
between biological groups from independent experiments^[50]20–[51]22.
Moreover, genes with low CV less than 10% were filtered out. Finally,
common DEGs between four datasets with a threshold |> 1.0| and adjusted
p-value < 0.05 were selected to further analysis.
Weighted gene co-expression network analysis (WGCNA)
Co-expression networks were constructed using the WGCNA algorithm
implemented in R WGCNA package^[52]23. To import selected DEGs in WGCNA
package, raw expression values of selected DEGs were normalized with
variance StabilizingTransformation (vst) function in R software. Then,
a similarity co-expression matrix was calculated with Pearson's
correlation
[MATH: Cori,j :MATH]
for all common DEGs. The similarity matrix was transformed into an
adjacency matrix (AM) by using the following equation
[MATH: aij=0.5∗1+cori,jβ :MATH]
where
[MATH: aij :MATH]
denotes the adjacencies between DEGs as a connection strengths index.
The soft-thresholding power beta of the co-expression network was
chosen by the criterion of scale-free topology with R2 cutoff (0.8).
Finally, the adjacency was transformed into a topological overlap
matrix (TOM) and corresponding dissimilarity matrix
[MATH: (1-TOM) :MATH]
using the following formula
[MATH: TOMi,j=∑u
aiuauj+aijminki,kj+1-a<
mrow>ij,Ki=∑uaiu :MATH]
where, row index u
[MATH: (u=1,⋯,m) :MATH]
represents sample measurements.
To obtain co-expressed modules, the parameters were adjusted to
minModuleSize = 20 and minimum height = 0.2 to cut the tree.
Network preservation analysis
ModulePreservation function implemented in WGCNA Bioconductor R package
was applied to survey preservation levels of control network modules in
the salinity coexpressed modules based on the combination of two
preservation statistics including medianRank and Zsummary. Zsummary
combines multiple statistics into a single overall measure of
preservation that considers density and connectivity aspects of
preservation using the following formula
[MATH: Zsummary=<
msub>Zdensity+Zconnectivity2
mfrac> :MATH]
The higher value of a Zsummary indicates the strong preservation in
control and treatment conditions. However, the dependency of Zsummary
to module size is a challenge, especially when modules with different
sizes must be compared. medianRank as a module size independent index
is another statistic to test the preservation level. The lower value of
a medianRank indicates the strong preservation in control and treatment
conditions^[53]23. Statistical significance of both indexes was tested
using permutation testing (here we applied 200 permutations). As
prescribed in original reports Zsummary and medianRank were combined
and Zsummary < 5 or medianRank < 8 were considered as criteria for
considering a module as a non-preserved module^[54]8,[55]23.
Identification and validation of hub genes
Hub genes in each co-expressed module were defined according to
Eigengene-based module connectivity or module membership (k[ME]) index
in non-preserved modules. To determine the k[ME], the correlation of
expression value of a gene and eigengene of the module were estimated.
This index measures the closeness of a gene in a given module. Genes
with |k[ME]| ≥ 0.7 were considered as hub genes in the respective
module^[56]23. Validation of hubs was performed by using leave-one-out
cross-validation (LOOCV) as prescribed in^[57]12.
Statistical analysis and functional enrichment
Kyoto encyclopedia of genes and genome (KEGG) Pathway enrichment was
performed using Algal Functional Annotation Tool (available
[58]http://pathways.mcdb.ucla.edu/algal/index.html)^[59]24 by setting
P-value < 0.05 as a cut-off criterion with Dunaliella salina and
Chlamydomonas reinhardtii genomes as references.