ABSTRACT

   As for many model organisms, the amount of Listeria omics data produced
   has recently increased exponentially. There are now >80 published
   complete Listeria genomes, around 350 different transcriptomic data
   sets, and 25 proteomic data sets available. The analysis of these data
   sets through a systems biology approach and the generation of tools for
   biologists to browse these various data are a challenge for
   bioinformaticians. We have developed a web-based platform, named
   Listeriomics, that integrates different tools for omics data analyses,
   i.e., (i) an interactive genome viewer to display gene expression
   arrays, tiling arrays, and sequencing data sets along with proteomics
   and genomics data sets; (ii) an expression and protein atlas that
   connects every gene, small RNA, antisense RNA, or protein with the most
   relevant omics data; (iii) a specific tool for exploring protein
   conservation through the Listeria phylogenomic tree; and (iv) a
   coexpression network tool for the discovery of potential new
   regulations. Our platform integrates all the complete Listeria species
   genomes, transcriptomes, and proteomes published to date. This website
   allows navigation among all these data sets with enriched metadata in a
   user-friendly format and can be used as a central database for systems
   biology analysis.

   IMPORTANCE In the last decades, Listeria has become a key model
   organism for the study of host-pathogen interactions, noncoding RNA
   regulation, and bacterial adaptation to stress. To study these
   mechanisms, several genomics, transcriptomics, and proteomics data sets
   have been produced. We have developed Listeriomics, an interactive web
   platform to browse and correlate these heterogeneous sources of
   information. Our website will allow listeriologists and microbiologists
   to decipher key regulatory mechanisms by using a systems biology
   approach.

INTRODUCTION

   Listeria monocytogenes is a foodborne pathogen that causes infections
   with a mortality rate of 25%. It is responsible for gastroenteritis,
   sepsis, and meningitis and can cross three host barriers: the
   intestinal, placental, and blood-brain barriers. It is a major concern
   for pregnant women, as it induces abortions (1). L. monocytogenes can
   enter, replicate in, and survive in a wide range of human cell types,
   such as macrophages, epithelial cells, and endothelial cells. Moreover,
   Listeria has emerged as a model organism for the study of host-pathogen
   interactions (1–3).

   Listeria belongs to the Firmicutes phylum. The Listeria genus is made
   up of the widely studied pathogenic species L. monocytogenes; another
   pathogenic species, Listeria ivanovii, that mostly affects ruminants;
   and 15 nonpathogenic species (4–10). In 2001, the genomes
   of L. monocytogenes strain EGD-e and one Listeria innocua strain were
   sequenced (11). Since then, many other Listeria genomes, covering
   all the lineages, have been sequenced (12–17). Currently,
   the NCBI RefSeq database contains 83 complete Listeria genomes,
   including 70 L. monocytogenes genomes. The number of Listeria strains
   sequenced will probably grow exponentially in the coming years. Efforts
   have been made to summarize all these genomes on specific databases
   like ListiList (11), GenoList (18), GECO-LisDB server (16),
   and ListeriaBase (19) to find common gene features and to develop
   pangenome studies of Listeria species.

   The first Listeria transcriptomic data set was published in 2007
   (20). Since that report, 64 ArrayExpress studies, corresponding to
   362 different biological conditions, have been produced (21). Only
   seven of them are transcriptome sequencing (RNA-Seq) studies, and all
   the others correspond to transcription profiling by microarrays, with
   the EGD-e strain being the most frequently used strain. Listeria is
   also a key organism in the study of bacterial regulatory small
   noncoding RNAs (sRNAs). Despite the high number of studies on Listeria
   noncoding RNAs, only two websites with Listeria-related data sets have
   been published. The first one is a genome viewer published along with a
   transcription start site (TSS) study of Listeria (22). The second
   is the sRNAdb database (23), which provides tools to visualize the
   conservation of gene loci surrounding noncoding RNAs in different
   Gram-positive bacteria.

   The ability of L. monocytogenes to enter into various types of cells is
   due to the variety of proteins it secretes or anchors to its cell wall
   and external membrane. Consequently, many proteomic studies have been
   performed to analyze the exoproteome of Listeria (24–35).
   Other studies have focused on cytoplasmic proteins (27, 33,
   35–44). To our knowledge, 74 proteome studies have been
   conducted to decipher the production and localization of Listeria
   proteins. Nevertheless, no database exists that combines all these
   proteomics data sets into a single, user-friendly resource.

   The number of omics data sets produced has increased exponentially. The
   number of tools to analyze these data, as well as the diversity of
   databases to store them, has also burgeoned. In parallel with this
   increase, many efforts have been made to develop accurate web-based
   tools to integrate diverse omics data for each model organism. One of
   the most complete resources is certainly the University of California
   Santa Cruz Encyclopedia of DNA Elements (ENCODE at UCSC) Genome Browser
   (45), which allows the visualization of a large variety of human
   and mouse omics data sets. For prokaryotic organisms, the BioCyc
   (46, 47) and Pathosystems Resource Integration Center (48)
   websites have been created. These websites connect all the published
   genomic and transcriptomic data sets for prokaryotic organisms to
   metabolic pathways. Such wide-ranging web resources are useful for
   microbiologists, but for in-depth analyses, the development of
   individual web resources with curated metadata per model organism is
   also required. In the case of bacteria, few heterogeneous omics data
   sets are available (49) and only a few model organisms have dedicated
   web resources, notably Escherichia coli, with RegulonDB (50) and
   PortEco (51), and Bacillus subtilis, with SubtiWiki (52). As
   yet, resources for Listeria species are limited.

   Here, we present Listeriomics (http://listeriomics.pasteur.fr/), a
   highly interactive web resource summarizing many omics data sets
   related to the genus Listeria. We have curated and integrated all the
   available Listeria transcriptomic, proteomic, and genomic data sets to
   date. The Listeriomics platform was developed not only to integrate
   these diverse data sets but also to display them in a single viewer. To
   interactively explore these data sets, our website also provides
   different tools, i.e., (i) a genome viewer for displaying gene
   expression arrays, tiling arrays, and sequencing data, along with
   proteomic and genomic data sets; (ii) an expression atlas and protein
   atlas, inspired by the EBI Expression Atlas, that connects genomic
   elements (genes, small RNAs, antisense RNAs [asRNAs]) to the most
   relevant omics data; (iii) a specific tool for exploring protein
   conservation through the Listeria phylogenomic tree; and (iv) a
   coexpression network analysis tool for the discovery of potential new
   regulations.

RESULTS

The Listeriomics web interface.

   Genomic, transcriptomic, or proteomic data can be browsed by using the
   Listeriomics website (http://listeriomics.pasteur.fr/) main page
   (Fig. 1; Table 1; see Fig. S1 in the supplemental
   material). For each type of data, we designed a summary panel to
   navigate through the different data sets. The top banner of the website
   gives direct access to them. As summarized in Table 1, users can
   search 83 complete Listeria genomes and browse 362 transcriptome and 74
   proteome data sets. Listeriomics integrates four tools for omics data
   management, i.e., (i) a genome viewer for displaying gene expression
   array, tiling array, and sequencing data along with proteomics and
   genomics data; (ii) an expression atlas and protein atlas that connect
   every genomic element (genes, small RNAs, asRNAs) to the most relevant
   omics data; (iii) a protein conservation tool for the direct
   visualization of the presence or absence of a protein in a specific
   Listeria strain; and (iv) a coexpression network analysis tool for the
   visualization of genome features with the same expression profile.
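
   The idea behind the coexpression network tool, linking genome features
   that share an expression profile, can be illustrated with a minimal
   sketch. The similarity measure (Pearson correlation), the cutoff of
   0.9, the feature names, and the values below are all assumptions made
   for illustration; this section does not state which measure or
   threshold Listeriomics actually uses.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, cutoff=0.9):
    """Return (feature_a, feature_b, r) for every pair with r >= cutoff.

    The cutoff is an assumed value, not one taken from the article.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= cutoff:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical log fold changes across four comparisons (made-up values).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "sRNA1": [1.0, -1.0, 1.0, -1.0],
}
edges = coexpression_edges(profiles)
```

   In this toy input, only geneA and geneB vary together across the four
   comparisons, so they form the single edge of the network. Only positive
   correlations are kept here because the text speaks of features with the
   same expression profile; keeping strongly anticorrelated pairs as well
   would be an equally defensible design choice.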

FIG 1

   Overview of the Listeriomics platform. (Center) The four major tools of
   Listeriomics, i.e., gene conservation and synteny, coexpression
   network, genome viewer, and expression and protein atlas. (Left) Summary
   of all the genomic information available on the website.
   (Right) List of all the transcriptomic information available in
   Listeriomics. (Bottom) View of all the proteomic information that can
   be accessed.

TABLE 1

   Summary of omics data sets included in the Listeriomics database

   Category         Data sizes                   Data type(s)            Tools available
   ---------------  ---------------------------  ----------------------  -----------------------
   Genomics         83 complete genomes (NCBI),  Genome, phylogeny,      Genome summary, gene
                    all protein coding genes     genome elements,        panel, small RNA panel,
                    and noncoding RNAs, 304      homologs                genome viewer
                    small RNAs
   Transcriptomics  362 biological conditions,   Gene expression array,  Transcriptome summary,
                    8 Listeria strains, 342      tiling array, TSS,      expression atlas, heat
                    comparisons                  RNA-Seq                 map, genome viewer
   Proteomics       74 biological conditions,    Mass spectrometry       Proteome summary,
                    4 Listeria strains, 28                               protein atlas, heat
                    comparisons                                          map, genome viewer

FIG S1

   Flowchart of omics data set integration in the Listeriomics database.
   (A) Complete genome sequences from the RefSeq and GenBank databases
   were downloaded and integrated into Listeriomics, along with pathway
   information and small RNAs. (B) MAGE-TAB data sets were downloaded from
   ArrayExpress. Metadata on the data sets were manually curated, and
   processed gene expression array tables were added. Raw RNA-Seq data
   were downloaded and mapped to a reference genome. After log fold change
   calculation, all of the data sets were normalized with variance
   normalization to fix the standard deviation at 1 and ensure
   comparability. (C) Proteomics data sets were manually curated from the
   core articles and related supplementary data.

   Copyright © 2017 Bécavin et al.

   This content is distributed under the terms of the Creative Commons
   Attribution 4.0 International license.
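
   The two numerical steps named in the Fig. S1 legend, log fold change
   calculation followed by variance normalization, can be sketched as
   follows. This is an illustration under stated assumptions, not the
   pipeline used for Listeriomics: the base-2 logarithm, the use of the
   population standard deviation, scaling without centering, and the
   gene names and expression values are all hypothetical.

```python
from math import log2
from statistics import pstdev


def log_fold_changes(treated, control):
    """log2(treated / control) for every gene present in both conditions.

    Both arguments map a gene name to a strictly positive expression value.
    """
    return {
        gene: log2(treated[gene] / control[gene])
        for gene in treated
        if gene in control
    }


def scale_to_unit_sd(values):
    """Divide every value by the data set's standard deviation (SD becomes 1).

    Assumption: the population SD is used and values are not centered.
    """
    sd = pstdev(values.values())
    if sd == 0:
        raise ValueError("cannot scale a data set with zero variance")
    return {gene: value / sd for gene, value in values.items()}


# Hypothetical expression values for three genes (made-up numbers).
control = {"gene1": 1.0, "gene2": 3.0, "gene3": 4.0}
treated = {"gene1": 4.0, "gene2": 3.0, "gene3": 1.0}

lfc = log_fold_changes(treated, control)
scaled = scale_to_unit_sd(lfc)
```

   After scaling, every data set has the same spread, which is what makes
   fold changes from different platforms and studies comparable on a
   shared color scale or cutoff.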

   The genomic interface is designed to browse every complete genome of
   the Listeriomics resource. Users can access strain name, serotype,
   lineage, and isolation information, along with a complete phylogenomic
   tree of Listeria strains (Fig. 1). From this table, scientists can
   access all the annotated genes of a specific strain. For each Listeria
   gene, five different information panels are available. The first panel
   shows general information about the gene: its position and its
   predicted function. DNA and amino acid sequences can be
   accessed and saved as FASTA files or sent directly for a BLASTn or
   BLASTp search (53). The predicted subcellular localization
   (cytoplasm, cytoplasmic membrane, cell wall, cell surface, and
   extracellular milieu [27]) of each protein is also displayed along
   with information about the secretion pathway possibly used by the
   protein. The second panel provides an instant view of the conservation
   of a specific protein in other Listeria strains. This panel dynamically
   displays homologs on the Listeria reference tree in each existing
   Listeria strain. It also displays a summary table of all the homologous
   proteins with their similarity percentages and amino acid sequences.
   Users can also create a multialignment file of the homologous proteins.
   With the third panel, the user can visualize the protein locus synteny
   in all Listeria strains. We built an external synteny website by using
   the SynTView architecture (54). A fourth panel uses the expression
   atlas to show in which transcriptomics data sets the selected gene is
   differentially expressed. The fifth panel displays every proteomics data
   set in which the protein encoded by the selected gene has been
   detected. Finally, from the home webpage, a summary panel with all the
   small RNAs in L. monocytogenes EGD-e can be accessed (Fig. 1). For
   each noncoding RNA element, one can display its position, its
   nucleotide sequence, its predicted secondary structure at 37°C, and a
   table displaying all supplementary information provided in source
   references (22, 55–58).
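
   The lookup performed by the fourth panel, listing the transcriptomic
   comparisons in which a selected gene is differentially expressed, can
   be pictured with a short sketch. The data layout, the function name
   atlas_hits, the comparison names, and the cutoff of 1.0 on the
   normalized log fold change are hypothetical; they are not taken from
   the Listeriomics implementation.

```python
def atlas_hits(gene, comparisons, cutoff=1.0):
    """Split comparisons into those where `gene` is over- or underexpressed.

    `comparisons` maps a comparison name to a {gene: log fold change} dict.
    The cutoff is an assumed value chosen for illustration.
    """
    up, down = [], []
    for name, values in sorted(comparisons.items()):
        lfc = values.get(gene)
        if lfc is None:
            continue  # gene not measured in this comparison
        if lfc >= cutoff:
            up.append(name)
        elif lfc <= -cutoff:
            down.append(name)
    return {"up": up, "down": down}


# Hypothetical comparisons with made-up log fold changes.
comparisons = {
    "condition_A_vs_control": {"gene1": 2.3, "gene2": -0.2},
    "condition_B_vs_control": {"gene1": -1.8, "gene2": 0.1},
    "condition_C_vs_control": {"gene2": 3.0},
}
hits = atlas_hits("gene1", comparisons)
```

   The same structure would serve the fifth panel if the inner
   dictionaries recorded protein detection instead of fold changes.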