Abstract

   C. difficile is the most common cause of nosocomial diarrhea in North
   America and Europe. Genomes of individual strains of C. difficile are
   highly divergent. To determine how divergent strains respond to
   environmental changes, the transcriptomes of two historic and two
   recently isolated hypervirulent strains were analyzed following
   nutrient shift and osmotic shock. Illumina based RNA-seq was used to
   sequence these transcriptomes. Our results reveal that although C.
   difficile strains contain a large number of shared and strain specific
   genes, the majority of the differentially expressed genes were core
   genes. We also detected a number of transcriptionally active regions
   that were not part of the primary genome annotation. Some of these are
   likely to be small regulatory RNAs.

Introduction

   Clostridium difficile is a toxin producing anaerobic bacillus and is
   the leading cause of hospital associated diarrhea in North America and
   Europe[[33]1,[34]2]. Clinical presentation of C. difficile infection
   (CDI) ranges from asymptomatic colonization, mild diarrhea, severe
   pseudomembranous colitis, paralytic ileus, to sepsis and death[[35]3].
   Advanced age, prolonged hospitalization, antibiotic use and acid
   suppression therapy are some of the risk factors for
   CDI[[36]3,[37]4].The mortality rate is more than 80% in fulminant cases
   requiring colectomy[[38]5]. C. difficile produces two toxins, toxin A
   and toxin B, which are responsible for most of the damage caused to the
   host[[39]6,[40]7].

   During the last decade, there has been a significant increase in the
   rate of CDI across the United States, Canada, and
   Europe[[41]8-[42]10].Emergence of more virulent strains and changes in
   antibiotic treatment regimens are some of the established causes of the
   increase in CDI [[43]11,[44]12]. Some of these strains, which have the
   capacity to produce more severe colitis and mortality, have been termed
   as hypervirulent [[45]13]. These strains belong to PCR ribotype
   027[[46]13]. Ribotype 027 strains are also characterized as toxinotype
   III, North American pulsed field gel electrophoresis type1 (NAP1) and
   restriction endonuclease analysis group BI and in contrast to
   historical control strains, are fluoroquinolone resistant[[47]10]. In
   the recent years, C. difficile infection in the community setting has
   increased [[48]14]. It is also causing a significant number of
   infections in food animals[[49]15]. Recent studies have shown the
   possibility of foodborne transmission of C. difficile, which might
   explain the spread of C. difficile infections in the general community
   [[50]16].

   Comparative genomic studies have revealed that the genome conservation
   in C. difficile is very low and this is a major contributing factor in
   the outcome of infection by a given strain [[51]17-[52]19]. The number
   of core genes in C. difficile is estimated to be less than 20% of the
   pangenome[[53]20]. This variation in the genome content could enable C.
   difficile strains to respond differently to environmental changes. A
   number of functional genomics studies have been conducted to indentify
   the differential gene network that controls C. difficile response to
   environmental changes[[54]21-[55]24]. However, all these studies are
   based C. difficile 630, a strain isolated in 1985 from Zurich,
   Switzerland. Genome of C. difficile 630 is the most commonly used
   reference for functional genomics studies as it was the first strain to
   be sequenced [[56]17]. Another reason for using C. difficile 630 as
   reference is that microarray based studies require a high quality
   complete genome for probe design.

   Recent advances in new generation sequencing platforms have
   revolutionized microbial genomics. Genome sequencing using new
   generation sequencing platforms has resulted in sequencing of several
   C. difficile strains from different continents that are old as well as
   newer hypervirulent isolates[[57]20,[58]25-[59]27]. These comparative
   genomics studies have revealed that when compared to historic strains
   (isolated in the 1980s), new hypervirulent strains (isolated after
   2000) have undergone several genome changes that result in the
   hypervirulent phenotype[[60]12,[61]25]. The development of
   transcriptional profiling using RNA sequencing (RNA-seq) also offers
   vast improvements over microarray based transcriptional profiling. In
   order to determine how historic and recently emerged C. difficile
   strains respond to environmental changes we used RNA-seq to compare the
   transcriptome of four C. difficile strains after subjecting them to
   physiologically relevant in vitro stress conditions. In the first
   condition, cells were shifted from a rich medium [Brain heart infusion
   broth (BHI)] to a poor medium [Basal defined medium (BDM)] [[62]28]
   with supplementation of 0.5% sucrose. It has been reported that the
   presence of glucose and other easily metabolizable carbon sources in
   the growth medium suppress production of toxin A and toxin
   B[[63]29,[64]30]. The shift from BHI to BDM was designed to induce
   multiple nutritional changes and to analyze the impact of those changes
   in C. difficile pathways. Osmotic shock (shift from BHI to BHI
   supplemented with 1.5% NaCl) was chosen as the second test condition
   because C. difficile has been shown to have enhanced host cell
   adherence following osmotic shock[[65]31]. Since adhesion and
   colonization of animal tissue by bacteria is important in establishing
   infection, it is probable that without attachment, C. difficile cannot
   colonize and will be quickly removed by non-specific host defense
   mechanisms which include intestinal peristalsis, mucosal cell
   exfoliation and intestinal mucins[[66]32,[67]33].

   Two strains in this comparison were isolated in the 1980s (CD630 and
   CD196), and two were hypervirulent strains isolated after 2000
   ([68]R20291 and QCD_32g58). Although s C. difficile 630 is not a
   ribotype 027 strain, we included this strain in the comparisons as it
   is the only strain that was sequenced using Sanger method and to enable
   cross comparison with the previously reported functional genomics
   studies. Consistent with the large genome diversity in C. difficile,
   our transcriptome sequencing results show that differentially expressed
   core genes in various strains are not identical. We also report
   expression of several novel transcripts that are not part of the
   primary genome annotation.

Materials and Methods

C. difficile strains and culturing

   We selected four C. difficile strains for transcriptomic comparisons.
   The characteristics of these strains are given in [69]Table 1.
   Bacterial culturing was performed inside a Bactron IV anaerobic chamber
   (Shel Lab, Cornelius, OR). The chamber was filled and purged with an
   anaerobic gas mixture (10% CO2, 85% N2, 5% H2). A palladium catalyst
   was used in the chamber to remove any trace of oxygen. All materials
   used in the anaerobic chamber were pre-reduced. Spores of C. difficile
   strains CD630, CD196, QCD_32g58 and [70]R20291 were streaked on
   brain-heart infusion (BHI) agar plates containing 0.1% L-cysteine and
   taurocholate. The plates were incubated overnight at 37°C. Single
   colonies from these plates were then used inoculate pre-reduced BHI
   broth and were incubated at 37°C overnight. Fresh BHI broth was then
   inoculated by transferring 1% overnight culture. Cultures were
   incubated at 37°C until the OD600 reached between 0.4 - 0.5. Bacteria
   were then collected by centrifugation at 2,000 x g for 5 minutes. These
   cells were then shifted to two physiologically relevant in vitro
   conditions. In the first condition, cells were subjected to nutrient
   change by shifting to an equal volume of Basal defined medium
   (BDM)[[71]28] with supplementation of 0.5% sucrose. This shift was
   designed to induce multiple nutritional changes and to analyze the
   impact of those changes in C. difficile pathways. In the second
   condition, cells were subjected to osmotic shock by shifting to an
   equal volume of BHI supplemented with 1.5% NaCl. The same number of
   cells was transferred to fresh BHI as the control group. After
   incubating for 1 hour at 37°C, twice the volume of RNAprotect bacteria
   reagent (Qiagen, Valencia, CA) was added to the cultures to halt
   transcription and RNA degradation, and cells were collected by
   centrifugation at 2,000 x g for 10 minutes.All experiments were
   performed as two biological replicates.

Table 1. Characteristics of C. difficile strains used in this study.

   Strain name  Ribotype Isolation year Isolation country Hypervirulence
   CD630        13       1982           Switzerland       no
   CD196        27       1985           France            no
   QCD-32g58    27       2004           Canada            yes
   [72]R20291   27       2006           UK                yes
   [73]Open in a new tab

Isolation of total RNA

   Total RNA extraction was performed with TRIzol/RNeasy hybrid RNA
   extraction protocol. Briefly, the bacterial pellets were re-suspended
   with 1 ml of TRIzol reagent and were transferred to 2 ml sterile
   screw-cap microcentrifuge tube. Then 0.5ml of sterile RNase-free 0.1 mm
   zirconia beads was added to each tube. Cells were homogenized and lysed
   by bead beating four times in a Mini Bead-Beater (BioSpecProducts,
   Inc., Bartlesville, OK) for 30 seconds with a gap of 30 seconds. After
   the chloroform extraction, the aqueous phase was transferred to a 1.5
   ml sterile RNase-free micro-centrifuge tube and mixed with an equal
   volume of 100% ethanol (Sigma). This mix was loaded into an RNeasy
   column (Qiagen kit) and centrifuged for 30 seconds at 8,000 x g.
   Washing of the column, DNA digestion and elution steps were performed
   following the standard Qiagen protocol. Integrity of isolated RNA was
   estimated using Agilent Bioanalyzer 2100. Only those samples with an
   RNA Integrity Number (RIN) >9 were used for RNA sequencing.

cDNA library synthesis and sequencing

   Ribo-Zero™ rRNA Removal Kit ( Epicentre® Biotechnologies, Madison, WI,
   USA) was used to purify mRNA from 10µg total RNA. First strand cDNA was
   synthesized using SuperScript® III (Invitrogen). The second-strand cDNA
   was synthesized using RNase H (Invitrogen) and DNA polymerase I (New
   England BioLabs). Then the cDNA libraries were prepared using the
   Illumina Paired-end Genomic DNA Sample Prep kit (Illumina) following
   the manufacturer's protocol. For each sample, two libraries were
   prepared from biological replicates. Each library was then loaded onto
   flow cell channels of the Illumina High-seq 2000 platform for
   paired-end 90 bp × 2 sequencing. The average insert size for the
   paired-end libraries was 200 bp (from 180 to 220 bp). For each strain,
   six paired-end cDNA libraries were constructed. Therefore twenty four
   libraries were sequenced across all strains.

RNA-Seq data analysis

   For data processing, we used a customized RNA-Seq data analysis
   pipeline developed at Virginia Bioinformatics Institute (VBI) by
   combining open source programs. Briefly, the quality of the raw
   sequence reads was checked using the FastQC program
   ([74]http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). The
   processed reads were then aligned using Bowtie version 0.12.7 [[75]34]
   to the corresponding C. difficile reference genome. Read alignments
   with mapping quality score (MAPQ) < 10 were removed. Cufflinks software
   package version 1.3.1 [[76]35] was used to assemble transcripts and
   estimate the relative abundances of the transcripts. Transcript
   expression levels are estimated as Fragments Per Kilobase per Million
   mapped reads (FPKM). Cuffdiff [[77]36], a component of Cufflinks was
   used to calculate transcript expression levels. When compared to
   control condition, genes with log[2] ratio ≥ 1.5 and FDR- adjusted p
   value less than or equal to 0.05 were considered as differentially
   expressed. The processed data and raw files from this experiment have
   been submitted to NCBI Gene expression Omnibus (GEO) under the
   accession # [78]GSE50497 and NCBI short read archive (SRA) under the
   accession # SRP029366.

Novel gene discovery

   Cufflinks program provides reference annotation based assembly and
   seeks to build upon available information about the transcriptome of an
   organism to find novel genes and isoforms [[79]35]. Cufflinks output
   includes all annotated reference transcripts and any novel genes and
   isoforms that are assembled. The novel gene transcripts identified by
   the pipeline can be novel small RNA genes or unannotated CDS. We
   performed the following steps to categorize these novel transcripts.
   First, these transcripts were used to search the Rfam database for sRNA
   matches. Rfam is a comprehensive database containing families of
   structural RNAs, including non-coding RNA genes as well as
   cis-regulatory RNA elements[[80]37]. It incorporates literature curated
   known sRNAs and uses them as seeds to predict sRNAs for sequenced
   genomes using covariance model[[81]37]. In the second step, the new
   transcripts were searched against the non-redundant (nr) database using
   BLASTX to check for any protein coding gene hits. Finally, we used ORF
   Finder program to verify whether any transcripts are potentially
   protein coding genes. Assembled transcripts with no BLASTX hits and no
   ORF assignment were considered as sRNAs.

Functional analysis of differentially expressed genes

   Since it is well established that C. difficile strains are known to be
   highly divergent [[82]18-[83]20], we classified genes in each strain
   into core, shared and unique categories using OrthoMCL program
   [[84]38]. We then combined this gene classification with gene
   expression level data to obtain a comparison of these genes across all
   strains. Pathologic program in Pathwaytools v 16.0[[85]39] was used to
   reconstruct the metabolic pathways of strains QCD_32g58, CD196 and
   [86]R20291. A previously curated high quality pathway annotation for
   strain CD630 by our group [[87]23] was used to remove false positive
   pathway predictions in these strains. Omics viewer [[88]40] was used to
   map the differentially expressed genes onto cellular pathways and to
   compare differentially expressed pathways across strains. For
   identifying how gene interaction networks adjust to the stress
   conditions tested, the differentially expressed genes were mapped to
   the systems level gene interaction network of C. difficile. The base
   interaction data for this analysis was obtained from STRING
   database[[89]41]. Complete interaction data from STRING v 9.0 was
   downloaded and C. difficile specific interaction data was then
   extracted using custom Linux shell scripts. The interaction data in
   STRING database includes both experimental as well as predicted
   interactions and each interaction is assigned a confidence score. We
   then selected interactions with confidence score of 400 or above. These
   would represent medium and high confidence interactions in C.
   difficile. This interaction network was then imported into Cytoscape
   [[90]42] for visualization and overlaying of transcriptomic expression
   data.

qRT-PCR

   We used qRT-PCR to verify the expression levels of selected genes.
   Following the manufacturer’s instructions, 3.0 µg of total RNA isolated
   from each stress condition was converted to cDNA by using SuperScript
   III reverse transcriptase (Invitrogen) with random hexamers. The
   real-time reaction mixture included 12.5 ng cDNA, 200 nM of each of
   both forward and reverse primers, and 1X SYBR GreenER qPCR SuperMix
   (Invitrogen). Primers used in this study are listed in the File S1.
   qPCR was performed in 96-well optical plates using the AB 7500
   Real-Time PCR System instrument and software (Invitrogen) and was
   analyzed by the method previously reported by our group[[91]43].

Results

Genome coverage

   C. difficile is an unusual species because the number of conserved
   genes in a given strain is very low and many of the strains contain a
   very high number of genes that are unique to that
   strain[[92]19,[93]20]. It has also undergone very rapid evolution in
   the last two decades by acquiring several new genes [[94]25]. To
   understand how historic and recently emerged C. difficile respond to
   physiological stress, two historical (CD630 and CD196) and two recently
   emerged (QCD-32g58 and [95]R20291) strains were subjected to nutrient
   shift and osmotic shock. Gene expression under these conditions was
   determined using RNA-seq and these results were compared to gene
   expression levels during growth in BHI (control condition). A total of
   24 samples from these conditions were used for paired-end
   bi-directional Illumina sequencing. Illumina sequence files were
   converted to Sanger fastq format and rRNAs were filtered. The quality
   of the sequence data was checked using the FastQC program. The sequence
   reads were 90 nucleotides in length and the total number of reads per
   sample was ~ 26.6 million on average. The filtered RNA-Seq sequence
   reads from the 24 samples were aligned to their corresponding C.
   difficile reference genome using Bowtie. For all samples analyzed,
   88-95% of reads were mapped with MAPQ greater than or equal to 10. The
   transcript expression levels were estimated as FPKM using
   Cufflinks[[96]44]. The number of genes expressed (FPKM>0) was
   calculated for each sample. There was very high correlation between
   samples when the number of expressed genes was compared between
   biological replicates ([97]Table 2). We used PATRIC [[98]45]
   annotations of the C. difficile genomes as references. In this