Abstract

   Nearly one-third of the world's population is latently infected with
   Mycobacterium tuberculosis (M. tb), which represents a huge disease
   reservoir for reactivation and a major obstacle for effective control
   of tuberculosis. During latent infection, M. tb is thought to enter
   nonreplicative dormant states by virtue of its response to hypoxia and
   nutrient-deprived conditions. Knowledge of the genetic programs used to
   facilitate entry into and exit from the nonreplicative dormant states
   remains incomplete. In this study, we examined the transcriptional
   changes of Mycobacterium marinum (M. marinum), a pathogenic
   mycobacterial species closely related to M. tb, at different stages of
   resuscitation from hypoxia-induced dormancy. RNA-seq analyses were
   performed on M. marinum cultures recovered at multiple time points
   after resuscitation. Differentially expressed genes (DEGs) at each time
   period were identified and analyzed. Co-expression networks of
   transcription factors and DEGs in each period were constructed. In
   addition, we performed a weighted gene co-expression network analysis
   (WGCNA) on all genes and obtained 12 distinct gene modules.
   Collectively, these data provided valuable insight into the
   transcriptome changes of M. marinum upon resuscitation as well as gene
   module function of the bacteria during active metabolism and growth.

   Keywords: transcriptional regulation, resuscitation, M. marinum,
   hypoxia, latency

Introduction

   Mycobacterium tuberculosis (M. tb), the causative agent of tuberculosis
   (TB), is the leading cause of death due to an infectious disease
   globally, with an estimated 10 million new cases and 1.3 million deaths
   in 2017. There were an additional 300,000 deaths from TB among
   HIV-positive people. The success of M. tb as a leading pathogen is
   associated with its ability to infect and persist in the host. About
   1.7 billion people, 23% of the world's population, are estimated to
   have a latent TB infection (LTBI), which is asymptomatic but can
   persist for decades ([41]Stewart et al., 2003; [42]North and Jung,
   2004). About 5–10% of LTBI will eventually develop active disease, and
   host immunosuppression (e.g., HIV coinfection) markedly increases the
   risk of reactivation ([43]Corbett et al., 2007). LTBI poses a major
   challenge to the effective control of TB because of the difficulty in
   treatment and the fact that LTBI represents a huge disease reservoir.

   During LTBI, M. tb is thought to enter nonreplicative ‘dormant' states
   by virtue of its lowered or altered metabolism in response to hypoxia,
   nitrosative stress, and/or nutrient deprivation ([44]Boshoff and Barry,
   2005). Accordingly, much research has been focused on environmental
   conditions and genetic programs that induce bacteriostasis, and the
   most extensively studied culture condition is hypoxia ([45]Wayne and
   Sohaskey, 2001; [46]Rustad et al., 2009). It was shown that an
   immediate bacterial response (2 hr) was the coordinated upregulation of
   47 M. tb genes under the control of the response regulator (DosR) and
   two sensor kinases (DosS and DosT), known as the DosR regulon
   ([47]Sherman et al., 2001; [48]Boon and Dick, 2002; [49]Park et al.,
   2003). A second set of 230 genes, induced by longer hypoxia exposure (7
   days), was also identified ([50]Rustad et al., 2008). These genes,
   collectively known as the enduring hypoxic response (EHR), were
   DosR-independent genes ([51]Rustad et al., 2008).

   During the reactivation of LTBI, the dormant bacteria are believed to
   resuscitate and resume active growth and metabolism. A few recent
   studies have used reaeration of hypoxic cultures for in vitro modeling
   of reactivation or resuscitation ([52]Veatch and Kaushal, 2018).
   Several regulatory proteins, such as transcription factor ClgR and
   sigma factors SigH and SigE, were found to play a role in M. tb
   resuscitation from hypoxia ([53]Mcgillivray et al., 2015; [54]Iona et
   al., 2016; [55]Veatch et al., 2016).

   Despite the progress, knowledge of the genetic programs used to
   facilitate entry into and exit from the nonreplicative dormant states
   remains incomplete. In this study, we examined the transcriptional
   changes of Mycobacterium marinum (M. marinum) at different stages of
   resuscitation from hypoxia-induced dormancy. M. marinum is a pathogenic
   Mycobacterium and the closest genetic relative of the M. tb complex. M.
   marinum is an excellent model through which to understand various
   aspects of host–pathogen interactions in M. tb pathogenesis. For
   example, M. marinum and M. tb share many virulence determinants, such
   as the ESX-1 secretion system ([56]Tobin and Ramakrishnan, 2008) and
   lipid virulence factors phthiocerol dimycocerosates and phenolic
   glycolipids ([57]Yu et al., 2012). As such, findings from the current
   study of M. marinum may be applicable to M. tb.

Result

RNA-Seq Analysis of M. marinum Recovered From Hypoxia

   Larry Wayne and co-workers were the first to develop an in vitro model
   to mimic the hypoxic environment of the human granuloma ([58]Wayne,
   1977; [59]Wayne and Hayes, 1996; [60]Wayne and Sohaskey, 2001). In the
   Wayne model, a sealed, standing culture is incubated over an extended
   period while the bacteria deplete the available oxygen. The gradual
   depletion of oxygen leads to nonreplicating persistence states with a
   concomitant shift in gene expression and metabolism.

   To gain insight into the genetic mechanisms that facilitate the exit of
   mycobacteria from the nonreplicative state, we grew M. marinum under
   hypoxia for 7 days using the Wayne model and then reaerated the
   cultures. At different time points thereafter (0, 0.5, 4, 12, 24, and
   48 hr), M. marinum cultures were collected and subjected to RNA-seq
   analysis. The growth curve of the bacteria is shown in
   [61]Supplementary Figure 1 . A total of 18 samples were collected
   (three biological replicates at each time point) and analyzed.

   The RNA-seq reads showed a high mapping ratio for all samples (>96%)
   ([62] Table 1 ), supporting the overall sequencing accuracy.
   Transcripts of more than 4,900 genes were detected in each sample. We
   compared the RNA-seq data of cultures recovered at different time
   points under aerobic conditions. As expected, results showed that the
   correlation coefficient decreased as the interval between two samples
   increased ([63] Figure 1 ). This result also suggested that the
   recovery from hypoxia is a gradual but dynamic process.

Table 1.

   RNA-seq reads of 18 samples (three biological replicates at each time
   point).
   Sample   Total reads Mapped reads Pair mapped reads Single mapped reads
   Mapped ratio(%)
 0hr_rep1    25667618     25206648       25071544            135104        98.2
 0hr_rep2    24461374     24025951       23875630            150321        98.22
 0hr_rep3    24356981     23991626       23851008            140618        98.5
 0.5hr_rep1  27750348     27226045       27037288            188757        98.11
 0.5hr_rep2  29227286     28693463       28508636            184827        98.17
 0.5hr_rep3  30215819     29671934       29509385            162549        98.2
 4hr_rep1    29552072     28996782       28795942            200840        98.12
 4hr_rep2    25707932     25227133       25058798            168335        98.13
 4hr_rep3    26459312     25956585       25781988            174597        98.1
 12hr_rep1   20056376     19435701       19294860            140841        96.91
 12hr_rep2   25849366     25305478       25143598            161880        97.9
 12hr_rep3   24639687     24048335       23903266            145069        97.6
 24hr_rep1   28399172     27857378       27662324            195054        98.09
 24hr_rep2   31131638     30539305       30334952            204353        98.1
 24hr_rep3   29686534     29152176       28950812            201364        98.2
 48hr_rep1   28813928     28353514       28145540            207974        98.4
 48hr_rep2   27427442     26995636       26807948            187688        98.43
 48hr_rep3   28062635     27641695       27444061            197634        98.5
   [64]Open in a new tab

Figure 1.

   Figure 1
   [65]Open in a new tab

   Calculated correlation coefficients between RNA-seq data from samples
   of different time points (0_hour, 0.5_hour, 4_hour, 12_hour, 24_hour,
   and 48_hour) during resuscitation.

Dynamic Changes of Gene Expression at Different Stage of Resuscitation

   To analyze transcriptome changes of M. marinum, we focused on genes
   with RPKM ≥ 10 and compared samples from adjacent intervals: between
   0.5 and 0 hr, 4 and 0.5 hr, 12 and 4 hr, 24 and 12 hr, as well as 48
   and 24 hr. Differentially expressed genes (DEGs) were identified, and
   this was defined as a fold change greater than 2 and false discovery
   rate P value less than 0.05.

   At the earliest time point after resuscitation (0.5 hr), 136 DEGs were
   detected, of which 71 were upregulated and 65 were downregulated.
   Between 4 and 0.5 hr, most of the DGEs were downregulated (81 out of 88
   total DEGs). The numbers of DGEs found between 12 and 4 hr, 24 and 12
   hr, as well as 48 and 24 hr were 72, 85, and 172, respectively. The
   heat map of the five groups of DEGs is shown in [66]Figure 2 .

Figure 2.

   [67]Figure 2
   [68]Open in a new tab

   Heatmaps of DEGs between adjacent time points. At each time point, data
   from three biologically independent samples were included. (A) 0.5 vs.
   0 hr; (B) 4 vs. 0.5 hr; (C) 12 vs. 4 hr; (D) 24 vs. 12 hr; and (E) 48
   vs. 24 hr. The red color indicates upregulation. The blue color
   indicates downregulation.

   We performed a Venn analysis of the five DEG groups ([69] Figure 3A ).
   The proportion of unique genes in each group was high: 68.4% (93/136,
   between 0.5 and 0 hr), 60.2% (53/88, between 4 and 0.5 hr), 46.7%
   (35/72, between 12 and 4 hr), 47.1% (40/85, between 24 and 12 hr), and
   74.4% (128/172, between 48 and 24 hr). This suggests that a variety of
   genes were involved at different stages of resuscitation.

Figure 3.

   [70]Figure 3
   [71]Open in a new tab

   Analysis of DEGs in five groups. (A) The Venn diagram of five DEGs. (B)
   The expression trend of all DEGs (440 genes) in the five periods
   (left). The number of DEGs in each period is shown on the right.

   Combing the DEGs of different stages, a total of 440 genes were
   identified ([72] Supplementary Table 1 ), and their expression
   underwent dynamic changes during resuscitation ([73] Figure 3B ). For
   example, the expression of MMAR_0922, MMAR_3562, and MMAR_1654 were
   significantly changed at the early stage of resuscitation, suggesting
   that they may play an important role in this period. Some genes had
   changed in multiple time periods. For example, the expression of
   MMAR_3403 was changed in last three periods, suggesting that this gene
   may be associated with the late stage of resuscitation.

Validation of RNA-seq Results by RT-qPCR

   To validate the RNA-seq results, a real-time quantitative (RT-qPCR)
   analysis was performed. Three biologically independent samples at each
   time point were used for this experiment. For each DEG group, we
   selected the top 10 upregulated genes and 10 downregulated genes for
   this analysis ([74] Figure 4 ). Between 0 and 0.5 hr, MMAR_4852 and
   MMAR_5170 (whiB4) were significantly downregulated, and six genes,
   MMAR_5122 (lipX), MMAR_0548 (espG3), MMAR_0547 (esxR), MMAR_0551
   (eccE3), MMAR_0546 (esxG), and MMAR_0550 (mycP3), were significantly
   upregulated ([75] Figure 4A ). Between 0.5 to 4 hr, six genes,
   including MMAR_1656, MMAR_1658 (hycQ), MMAR_1653 (Rv0081), MMAR_1655,
   MMAR_5122 (lipX), and MMAR_5170 (whiB4), were significantly
   downregulated and four genes MMAR_0845 (hemB), MMAR_5484, MMAR_1908
   (ATC1), and MMAR_3776 (rpfE) were significantly upregulated ([76]
   Figure 4B ). Between 4 and 12 hr, MMAR_2343 (papA1), MMAR_3555, and
   MMAR_2320 (wecE) were significantly upregulated and MMAR_4903 were
   significantly downregulated ([77] Figure 4C ). Between 12 and 24 hr,
   six genes, MMAR_0335, MMAR_0602, MMAR_4903, MMAR_4899, MMAR_2009, and
   MMAR_0615 (iniA), were significantly downregulated ([78] Figure 4D ).
   Between 24 and 48 hr, six genes, MMAR_3465 (PPE51), MMAR_4824,
   MMAR_4482 (cypM), MMAR_4750, MMAR_2944, and MMAR_1790 (PPE2), were
   significantly downregulated, and seven genes, MMAR_2651, MMAR_2914
   (katG), MMAR_2649, MMAR_5315 (lpqH), MMAR_5319, MMAR_2839 (mpt63), and
   MMAR_0656, were significantly upregulated ([79] Figure 4E ).

Figure 4.

   [80]Figure 4
   [81]Open in a new tab

   qRT-PCR results. (A–E) Approximately 20 genes from each period were
   selected and analyzed by qRT-PCR. *p < 0.05; **p < 0.01; and ***p <
   0.001. (F) Scatter plot of RT-PCR data of all genes analyzed in (A–E),
   comparing them to RNA-seq data of the same genes.

   There is a good agreement between the RNA-seq and qPCR data, evident
   from the scatter plot using the expression levels of all 97 genes that
   were analyzed by both RNA-seq and qPCR (R^2^= 0.784) ([82] Figure 4F ).
   Based on this result, we consider that the RNA-seq data is reliable.

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway
Enrichment Analysis

   To gain insight into the biological consequence of the observed
   transcriptome changes, we performed GO and KEGG pathway analyses of the
   five DEG groups ([83] Figure 5 ). A GO analysis was applied to identify
   the functional categories of DEGs. Between 0.5 and 0 hr, more than half
   of DEGs were involved in membrane and cell wall processes and were
   significantly enriched ([84] Figure 5 ). This is consistent with the
   notion that, upon reaeration, the bacteria resumed cell division, which
   involved cell wall and membrane biogenesis. A KEGG analysis
   consistently revealed that DEGs involved in metabolic pathways and
   biosynthesis were significantly enriched and accounted for the largest
   number. A similar trend was observed for later periods, in which genes
   involved in cell wall and membrane biogenesis were highly enriched and
   accounted for the majority of DEGs at these stages. These results
   provide snapshots into the recovery of the bacteria from hypoxia and
   active growth under aerobic conditions.

Figure 5.

   [85]Figure 5
   [86]Open in a new tab

   GO and KEGG analysis of DEGs in the five periods.

Co-Expression Analysis Between Transcriptional Regulators and mRNAs

   Co-expression networks can show relationships between genes. To explore
   the regulatory mechanisms at different stages of resuscitation, we
   constructed a co-expression network between transcriptional regulators
   and mRNAs. For this, we selected known transcriptional regulators,
   including transcription factors and sigma factors from the five DEGs,
   and calculated the correlation coefficients between these transcription
   factors and the remaining DEGs in the same group. We considered that a
   relationship existed between a given transcriptional regulator and
   other genes if the absolute value of the correlation coefficient was
   greater than 0.9, which included both positive and negative
   correlation. Based on these results, we constructed five co-expression
   modules and integrated them into a large network ([87] Figure 6 ). The
   dark blue nodes in the figure represent transcription factors, and the
   green nodes denote DEGs in the same period. The size of the node is
   determined by the degree of connectivity. Greater degrees of
   connectivity are indicated by larger points. If there is a line between
   two nodes, then there is a relationship between them.

Figure 6.

   [88]Figure 6
   [89]Open in a new tab

   Co-expression networks in five periods. Each dot represents a gene.
   Transcription factors are labeled in blue, and other genes are labeled
   in green. Line between dots represents co-expression relationship
   between genes. The size of dot is proportional to the level of
   connectivity.

   In the first co-expression module (between 0 and 0.5 hr), three
   transcription factors, MMAR_4874 (CosR), MMAR_1653 (Rv0081), and
   MMAR_4852 (KmtR), formed the major regulatory hubs, and MMAR_4874
   (CosR) was the largest hub and interacted with other hubs in the
   network ([90] Figure 6 ). The MMAR_4874 (CosR) and MMAR_1653 (Rv0081)
   hubs remained in the second co-expression module (between 0.5 and 4
   hr), in addition to three new hubs formed by MMAR_4254, MMAR_1725 and
   MMAR_1132. In the third period (between 4 and 12 hr), MMAR_0229 and
   MMAR_4902 formed the hubs. In the fourth period (between 12 and 24 hr),
   MMAR_2003 (SigB) and MMAR_4219 formed the hubs. In the final period (24
   to 48 hr), MMAR_2651, MMAR_1555, and MMAR_0249 formed the hubs.

Weighted Gene Co-Expression Network Analysis (WGCNA)

   The DEG analysis focused on partial dynamic changes during
   resuscitation. While the co-expression network of transcription factors
   provides an overview of the regulatory programs enabling the
   resuscitation of M. marinum, our knowledge on the overall genetic
   changes is still missing. Therefore, in this section, we analyzed the
   expression of all genes from cultures at different stages of
   resuscitation.

   Weighted gene co-expression network analysis (WGCNA) ([91]Langfelder
   and Horvath, 2008) is a method for analyzing the gene expression
   patterns of multiple samples. It clusters genes into modules by similar
   expression trends and reveals the relationship between gene modules and
   specific traits or phenotypes. We applied this method to analyze the
   RNA-seq data of M. marinum at different stages of resuscitation.

   The gene modules were identified by the WGCNA package in R software. We
   first determined the appropriate “soft-thresholding” value, which
   emphasizes strong gene–gene correlations at the expense of weak
   correlations. An optimal parameter (power = 20) was determined by
   plotting the strength of correlation against a series (range 2 to 20)
   of soft threshold powers ([92] Figure 7A ).

Figure 7.

   [93]Figure 7
   [94]Open in a new tab

   WGCNA cluster analysis. (A) Plot of the strength of correlation against
   a series (range 2 to 20) of soft threshold powers. (B) Gene clusters
   and gene module fusion.

   An unsigned pairwise correlation matrix was calculated, and the WGCNA
   algorithm was used to transfer the correlation coefficient between
   genes into the adjacent coefficient. Then, the dissimilarity of the
   topological overlap matrix was calculated based on the adjacent
   coefficient. Using the calculated dissimilarity, we carried out a
   hierarchical analysis by using agglomerative hierarchical clustering,
   also known as the bottom-up method. Other assumptions were made: (i)
   distances between different classes were measured by the average
   connectivity, and (ii) there should be at least 30 genes in each gene
   module.

   Based on these analyses, we initially obtained 48 gene modules. The
   hierarchical cluster tree was then treated using the dynamic tree cut
   algorithm in the WGCNA package. A total of 13 gene modules were
   obtained. The “gray” module was the default module, which included
   discarded genes that could not be clustered. Thus, we focused on the
   analysis in the remaining 12 gene modules. The process of fusion is
   shown in [95]Figure 7B . The number of genes varied among these 12
   modules, and the detailed information is listed in [96]Table 2 and
   [97]Supplementary Table 2 .

Table 2.

   Information of gene modules identified by WGCNA.
   Module_ID Number of genes
   module_34      1711
   module_26       919
   module_43       424
   module_14       320
   module_20       298
   module_3        268
   module_35       142
   module_29       136
   module_12       127
   module_22       89
   module_47       89
   module_45       46
   [98]Open in a new tab

   The first principal component analysis (PCA) was performed on the 12
   gene modules ([99] Figure 8 ). The PCA results reflected the main trend
   of gene expression in the modules. Module 20 played an important role
   in the early stage (0 to 0.5 hr) of recovery, module 35 played a role
   mostly in the middle stage (4 to 12 hr), and module 12 was only
   involved in the last stage (24 to 48 hr). Other modules appeared to
   play roles in more extended periods.

Figure 8.

   [100]Figure 8
   [101]Open in a new tab

   PCA analysis of 12 gene modules. The X axis represents time period, and
   the Y axis represents expression of first principal component.

Identification of Key Gene Modules Associated With Different Stages of
Resuscitation

   Thus far, we have identified 5 DEGs and 12 gene modules. We then
   performed an enrichment analysis between them. When the P value of
   Fisher's exact test was less than 0.001, we considered that these gene
   modules were significantly enriched in the DEG sets. The results are
   shown in [102]Figure 9 . Interestingly, module 20 was significantly
   enriched in DEGs of the early stage (0 to 0.5 hr) and module 12 was
   significantly enriched in DEGs of the last stage (24 to 48 hr) of the
   recovery. This is consistent with the result that these two modules
   were only involved in the early and last stages of resuscitation,
   respectively ([103] Figure 8 ).

Figure 9.

   [104]Figure 9
   [105]Open in a new tab

   Enrichment analysis between 12 gene modules and 5 DEGs. The above bar
   chart represents the number of DEGs in each period, and the bar chart
   on the right represents the number of genes in each gene module. Blue
   squares represent a significant enrichment between the row set and the
   column set.

Discussion

   In this study, we examined the transcriptome changes of M. marinum
   recovered from hypoxia-induced dormancy. To gain a comprehensive view,
   multiple time points, including shortly after resuscitation (0.5 hr) to
   more extended periods up to 48 hr, were included. For each time point,
   three biologically independent samples were analyzed. Transcripts of
   the whole genome were analyzed by RNA-seq, and the quality of the
   RNA-seq data was reflected by the high genome mapping ratio and further
   validated by qPCR analysis of close to 100 genes. With these
   high-quality sequence data, we performed in-depth analyses, which
   included the identification of DEGs and the construction of
   co-expression network of transcription factors in each period. The
   availability of transcriptomes of independent samples at multiple time
   points also allowed us to employ a weighted gene co-expression network
   analysis to identify gene modules of M. marinum. Collectively, these
   data provide valuable insight into not only the genetic changes of the
   bacteria upon resuscitation but also the gene module function of M.
   marinum during active metabolism and growth.

   A total of 136 DEGs were identified in M. marinum upon resuscitation
   from dormancy (0 to 0.5 hr), including eight transcription factors
   ([106] Figure 6 ). Interestingly, all of these transcription factors
   were significantly downregulated. Among them, MMAR_1653 is a homolog of
   Rv0081 in M. tb. Previous studies have shown that Rv0081 is a member of
   the DosR regulon and is induced at the early stage of hypoxia
   ([107]Sherman et al., 2001). Rv0081 is a major regulator of M. tb
   response to hypoxia and forms a large regulatory hub ([108]Galagan et
   al., 2013; [109]Chawla et al., 2018; [110]Sun et al., 2018). Rv0081
   plays an important role connecting the early and enduring hypoxic
   responses ([111]Sun et al., 2018). WhiB4 (MMAR_5170) is an
   oxygen-sensitive transcription factor and has been shown to regulate
   PE/PPE family proteins, and it plays a role in M. marinum virulence
   ([112]Chawla et al., 2012; [113]Wu et al., 2017). CosR (MMAR_4874) is a
   copper-inducible transcriptional regulator, and the loss of cosR
   resulted in a hypoxia-type response with the induction of the DosR
   regulon ([114]Talaat et al., 2004; [115]Ward et al., 2008; [116]Marcus
   et al., 2016). Given their roles in the hypoxic response, it is not
   surprising that Rv0081, WhiB4, and CosR underwent dynamic changes in
   expression upon resuscitation by reaeration. Two other transcription
   factors that were downregulated at this stage, MMAR_5405 (EthR) and
   MMAR_1394 (Rv3176c), belong to the TetR family transcription factors
   ([117]Leiba et al., 2014; [118]Sharma et al., 2017).

   We also found that multiple members of the ESAT-6 family proteins were
   upregulated upon resuscitation ([119]Harboe et al., 1996;
   [120]Priscille et al., 2004), including EsxA ([121]Sandra et al., 2010;
   [122]Zhang et al., 2016), EsxB ([123]Sandra et al., 2010), EsxG
   ([124]Sweeney et al., 2011), EsxH ([125]Alka et al., 2013;
   [126]Portal-Celhay et al., 2016), EsxK, and EsxN ([127]Zhigang et al.,
   2017). These proteins are components of the Type VII secretion systems,
   and many of them are important T cell antigens and play a critical role
   modulating the host–pathogen interactions ([128]Abdallah et al., 2007).

   From 0.5 to 4 hr after reaeration, 11 transcription factors were
   downregulated ([129] Figure 6 ), including Rv0081, CsoR, and WhiB4 as
   mentioned above. Notably, the expression of two other WhiB family
   proteins ([130]Averina et al., 2012), WhiB3 and WhiB5, were also
   significantly altered. WhiB3 responds to dormancy signals, including
   hypoxia and NO, and controls redox homeostasis of the bacteria
   ([131]Priscille et al., 2004). WhiB5 responds to oxygen and controls
   the expression type VII secretion systems ([132]Priscille et al.,
   2004). Consistently, we found that whiB3 was downregulated while whiB5
   was upregulated at this stage of resuscitation.

   During the period of 4 to 12 hr, we identified 72 DEGs, including 7 PPE
   family genes (MMAR_5121, MMAR_1095, MMAR_1235, MMAR_1847, MMAR_1905,
   MMAR_2669, and MMAR_3989). The PE/PPE family proteins play a critical
   role in mycobacterial pathogenesis ([133]Fishbein et al., 2015). In
   addition, several genes of the Mce family were upregulated, including
   MMAR_3865 (mce2B), MMAR_3868 (mce5E), MMAR_3867 (mce5D), MMAR_3866
   (mce5C). The Mce family genes comprise four mammalian cell invasion
   factor (mce) operons (mce1-4), and some of these are involved in the
   invasion of host cells ([134]Zhang and Xie, 2011).

   A picture appears to have emerged from these analyses; in the early
   stage of resuscitation from the hypoxia-induced dormancy, transcription
   factors critical for a hypoxia-induced response are downregulated, and,
   as the recovery continues, genes important for virulence and host
   interactions are upregulated.

   A WGCNA analysis revealed 12 distinct gene modules. Of particular
   interest is gene module 20, which was involved in the early stage of
   resuscitation only ([135] Figure 8 ). This module comprises of ~300
   genes, many of which have unknown functions or annotations. Future
   studies focusing on genes in this module may help to understand the
   molecular machinery enabling the exit of the bacteria from dormancy.

Methods and Materials

Bacterial Strain, Media, and Growth Conditions

   M. marinum 1218R (ATCC 927) was grown in Middlebrook 7H9 broth to
   OD600~0.5, at which point they were aliquoted and cultured in
   screw-capped conical flasks at 30°C without additional oxygen. The
   hypoxic culture conditions were described previously by Wayne and Hayes
   ([136]Wayne and Hayes, 1996). After 7 days in hypoxic conditions, the
   screw cap was replaced with a permeable membrane, and the rest of the
   conditions were unchanged. After aeration, samples were taken at 0 h,
   0.5 h, 4 h, 12 h, 24 h, and 48 h, and an aliquot was used to measure
   the OD value ([137] Supplementary Figure 1 ). The remaining samples
   were collected and snap frozen in liquid nitrogen for RNA sequencing.

RNA Extraction, Illumina Sequencing, and RT-qPCR

   M. marinum cultures were centrifuged at 4,500 × g for 5 min at room
   temperature and frozen on dry ice. The frozen cell pellets were
   resuspended in 1 mL TRIzol reagent (CW Bio). RNA extraction and
   illumina sequencing were performed as previously described ([138]Wu et
   al., 2017). Raw data of RNA sequencing have been uploaded to the GEO
   database (BioProject ID : PRJNA588556).

   For RT-qPCR validation of RNA-seq data, 1 µg RNA was
   reversed-transcribed to cDNA, which was then used as the template for
   RT-PCR analysis. The primers for analyzing the selected genes were
   listed in [139]Supplementary Table 3 .

Transcriptome and Bioinformatics Analysis

   The RNA-seq analysis and identification of differentially expressed
   genes (DEGs) were performed as previously described ([140]Lee et al.,
   2019).

GO and KEGG Analysis

   We used the Kyoto Encyclopedia of Genes and Genomes (KEGG) database
   ([141]Ogata et al., 2000) ([142]www.genome.jp/kegg/) and the Gene
   Ontology (GO) database ([143]Ashburner et al., 2000)
   ([144]www.geneontology.org/) for our data analysis.

Construction of the Gene Co-Expression Network

   We calculated the correlation coefficient between identified
   transcription factors and other DEGs in each time period. The
   correlation coefficient value ranges from -1 to 1, representing a
   negative and positive correlation, respectively. We considered the
   expression of two genes as correlated if the absolute value of the
   correlation coefficient was larger than 0.8. The results were imported
   into Cytoscape 3.0 to generate the network map ([145]Kohl et al.,
   2011).

Weighted Gene Co-Expression Network Analysis (WGCNA)

   We used the RNA-seq data from multiple time points (three biological
   independent samples at each time point) for WGCNA analysis. We used the
   WGCNA package to cluster gene modules as follows.
    1. Define gene co-expression similarity: calculate the similarity
       between any two genes using Pearson's correlation coefficient (Sij 
       =  |cor(i,j)|, the correlation coefficient of gene i and gene j),
       which then forms the correlation matrix (S  =  [S[ij]]).
    2. Define the exponential weighted value β: for any gene pair (i and
       j), apply the exponential adjacency function in the WGCNA algorithm
       to measure their relation index, namely, the exponential weighted β
       square of the correlation coefficient (a[ij]  =  power(S[ij], β)  =
        |S[ij]|^β). Exponential weighted β is the power of the correlation
       coefficient. We selected β  =  5 after the analysis (fit value R^2
       to approximately 0.9).
    3. Define a measure of node dissimilarity: after determining the
       adjacency function parameter β, the correlation matrix S  =
        [S[ij]] is switched into the adjacency matrix A  =  [a[ij]] and
       converted into the topological overlap matrix Ω  =  [ω[ij]]. k[i]
       or k[j] indicate the sum of one node's adjacency coefficients. The
       node is a gene (i or j).

   [MATH:
   <mrow><msub><mi>ω</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>=</mo
   ><mfrac><mrow><msub><mi>l</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><
   mo>+</mo><msub><mi>a</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub></mrow
   ><mrow><mi>min</mi><mrow><mo>{</mo><mrow><msub><mi>k</mi><mi>i</mi></ms
   ub><mo>,</mo><msub><mi>k</mi><mi>j</mi></msub></mrow><mo>}</mo></mrow><
   mo>+</mo><mn>1</mn><mo>−</mo><mo> </mo><msub><mi>a</mi><mrow><mi>i</mi>
   <mi>j</mi></mrow></msub></mrow></mfrac></mrow> :MATH]

   [MATH:
   <mrow><mtable><mtr><mtd><mrow><msub><mi>l</mi><mrow><mi>i</mi><mi>j</mi
   ></mrow></msub><mo>=</mo><msub><mo>∑</mo><mi>μ</mi></msub><msub><mi>a</
   mi><mrow><mi>i</mi><mi>μ</mi></mrow></msub><msub><mi>a</mi><mrow><mi>μ<
   /mi><mi>j</mi></mrow></msub></mrow></mtd><mtd><mrow><msub><mi>k</mi><mi
   >i</mi></msub><mo>=</mo><msub><mo>∑</mo><mi>μ</mi></msub><msub><mi>a</m
   i><mrow><mi>i</mi><mi>μ</mi></mrow></msub></mrow></mtd><mtd><mrow><msub
   ><mi>k</mi><mi>i</mi></msub><mo>=</mo><msub><mo>∑</mo><mi>μ</mi></msub>
   <msub><mi>a</mi><mrow><mi>i</mi><mi>μ</mi></mrow></msub></mrow></mtd></
   mtr></mtable></mrow> :MATH]
    4. Build hierarchical clustering tree to identify gene modules: the
       hierarchical clustering tree built using the dissimilarity
       coefficient
       [MATH:
       <mrow><msubsup><mi>d</mi><mrow><mi>i</mi><mi>j</mi></mrow><mi>ω</mi
       ></msubsup></mrow> :MATH]
       [MATH: <mrow><mo
       stretchy="false">(</mo><msubsup><mi>d</mi><mrow><mi>i</mi><mi>j</mi
       ></mrow><mi>ω</mi></msubsup><mo>=</mo><mn>1</mn><mo>−</mo><msub><mi
       >ω</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub></mrow> :MATH]
       ), and the different branches represent the gene modules.

Enrichment Analysis and PCA Analysis

   To determine whether one set of genes were more enriched in another set
   of genes, we used the Chi-square test or Fisher's exact test
   ([146]Upton, 1992). First, the two sets of genes were taken and used to
   form a 2*2 contingency table. If there was a value less than or equal
   to five in the table, the Fisher's exact test was applied; otherwise,
   the Chi-square test was applied. When the p value was less than 0.05,
   the two sets were considered significantly enriched to each other. PCA
   analyses were performed by the princomp function in R software (version
   3.5.1)

Data Availability Statement

   Raw data of RNA sequencing have been uploaded to the GEO database
   (BioProject ID:PRJNA588556).

Author Contributions

   JJ and CL performed the experiment. JJ, CL, and JL wrote the
   manuscript. YW and JZ prepared [147]Figures 1 –[148] 9 . LS, KY, and WX
   prepared [149]Tables 1 and [150]2 . LZ, YL, and JL provided guidance
   for experiments. All authors reviewed the manuscript.

Conflict of Interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Acknowledgments