Abstract

   Retting of bast fibres requires removal of pectin, hemicellulose and
   other non-cellulosic materials from plant stem tissues by a complex
   microbial community. A microbial retting consortium with
   high-efficiency pectinolytic bacterial strains is effective in reducing
   retting-time and enhancing fibre quality. We report comprehensive
   genomic analyses of three bacterial strains (PJRB 1, 2 and 3) of the
   consortium and resolve their taxonomic status, genomic features,
   variations, and pan-genome dynamics. The genome sizes of the strains
   are ~3.8 Mb with 3729 to 4002 protein-coding genes. Detailed
   annotations of the protein-coding genes revealed different
   carbohydrate-degrading CAZy classes viz. PL1, PL9, GH28, CE8, and CE12.
   Phylogeny and structural features of pectate lyase proteins of PJRB
   strains divulge their functional uniqueness and evolutionary
   convergence with closely related Bacillus strains. Genome-wide
   prediction of genomic variations revealed 12461 to 67381 SNPs, and
   notably many unique SNPs were localized within the important pectin
   metabolism genes. The variations in the pectate lyase genes possibly
   contribute to their specialized pectinolytic function during the
   retting process. These findings encompass a strong foundation for
   fundamental and evolutionary studies on this unique microbial
   degradation of decaying plant material with immense industrial
   significance. These have preponderant implications in plant biomass
   research and food industry, and also posit application in the
   reclamation of water pollution from plant materials.

   Subject terms: Applied microbiology, Comparative genomics, Bacterial
   genomics

Introduction

   The unique versatility, biodegradability, and affordability of bast
   fibre of jute (Corchorus olitorius L. and C. capsularis L.), has earned
   it the name ‘golden fibre’. A sizeable population of South Asian
   farmers depend on jute cultivation. The phloic bast fibres in jute and
   mesta (Hibiscus sabdariffa L. and H. cannabinus L.), originating from
   cambium tissues are characterized by thick cells consisting of
   ligno-cellulosic materials. The fibre bundles are bonded together with
   other non-fibrous tissues and woody core by pectinous substances and
   hemicelluloses^[30]1. Removal of these cementing biopolymers for
   separating the fibres holds the key to the quality and strength of the
   fibres, and thus has remained the focus of research for quite a long
   time. The process of separation and extraction of bast fibres from jute
   and mesta stem by removing pectin and other complex carbohydrates is
   called retting, which exploits the actions of microbial community
   present in the retting water to decompose plant tissues.

   Microbial activities influence the quality of fibre in hemp, flax,
   ramie, kenaf and jute^[31]2–[32]5. The stems, when steeped in retting
   water release pectins, hemicelluloses and free sugars, which in turn
   promotes microbial growth. Microbes having pectinolytic and xylanolytic
   properties but no cellulolytic activity are considered efficient
   retting microbes^[33]6. Pectin of jute and mesta is highly methyl
   esterified and its degradation requires the combined action of a group
   of pectin degrading enzymes, polygalacturonase and pectin/pectate
   lyase^[34]7,[35]8.

   Several researchers have made notable contribution to isolate and
   identify the microbes responsible for the retting of bast fibres,
   including jute and mesta^[36]6,[37]9–[38]11. Banik et al.^[39]12
   reported that the combined effects of urea and pectinolytic mixed
   bacterial culture reduced the retting time without explaining the role
   of enzymes involved. A composition of four Bacillus strains having
   pectinolytic, xylan and cellulose degrading abilities were used by Das
   et al. for ribbon retting of jute^[40]5. Notwithstanding the importance
   of these initial efforts, none of these microbial strains were further
   promoted and commercialized. An effective microbial retting consortium
   (CRIJAF SONA) was commercialized and adopted on a large-scale among
   jute growers, to reduce retting-time and enhancing fibre
   quality^[41]13,[42]14. Application of CRIJAF SONA consortium during
   retting reduced the retting duration of jute by 7 days, with improved
   fibre recovery and fibre quality i.e. colour, lustre, fibre strength
   (27.0–28.1 g/tex, fineness (2.7–2.8 tex) and fibre recovery by
   13.8–15.24% over control. The three Bacillus strains in the consortia
   have high polygalacturonase (PG) (5.1–6.0 IU/ml), pectin lyase (PNL)
   (185.7–203.7 IU/ml), and xylanase (15–16.2 IU/ml) activity^[43]13. The
   organisms present in microbial consortium were primarily identified up
   to species-level by metabolic fingerprinting pattern using Biolog
   system and further by ribotyping of a 977 bp 16 S rDNA fragment^[44]13.

   Further improvement of the efficacy of the retting consortium warrants
   comprehensive molecular characterization of these strains and precise
   identification of genes and enzymes involved in the retting process.
   Comprehensive genome-scale analyses permit the unambiguous
   establishment of species and strain typing^[45]15,[46]16. Also,
   verification of the functional uniqueness of these bacterial strains
   requires annotation of the complete set of genes, genomic variations,
   and specific gene analyses.

   We report here, reliable genome-level taxonomic resolution of PJRB
   strains in retting consortium, CRIJAF SONA. We further inquired into
   the functional uniqueness of these bacterial strains through specific
   gene analyses and genomic variations.

Materials and Methods

The microbial retting consortium

   The microbial consortium- consisting of three bacterial isolates (PJRB1
   – Acc. No. MTCC 5573, PJRB2 - MTCC 5574, and PJRB3 - MTCC 5575) with
   high polygalactouronase, pectin lyase, and xylanase
   activity^[47]13,[48]14 were used in the present study. Gram staining of
   the PJRB strains was performed using the standard protocol and
   visualized using light microscope (Olympus VX43). The size, morphology
   and other features were visualized using a scanning electron microscope
   (SEM, Hitachi S-530). Briefly, the protocol for sample preparation is
   as follows: log-phase cultures were fixed overnight in 0.25%
   glutaraldehyde (in 50 mM Na-Phosphate, pH 7.2) at room temperature. The
   bacterial cells were subsequently dehydrated for 10 min each in
   different ethanol grades (30%, 50%, 70%, 80%, and 90%) followed by
   storage in absolute ethanol prior to preparation of SEM stub. Finally,
   the bacterial cells were coated with gold in a sputterer and observed
   under scanning electron microscope at an electron high tension (EHT) of
   5 kV.

   The assay for extracellular pectinase enzyme was performed by
   inoculating the strains on pectin agar plates ameliorated with 1%
   citrus pectin followed by precipitation of the undigested pectin by
   soaking in 2% cetyl trimethyl ammonium bromide (CTAB) solution. The
   zone of substrate hydrolysis was measured as a ratio of diameter of
   clear zone to the diameter of colony as ‘potency index’.

Genomic DNA isolation, library preparation, and sequencing

   Bacillus strains were allowed to grow overnight in Luria Bertani broth
   at 37 °C and 5 ml overnight cultures were centrifuged at 5000×g for
   10 min at 4 °C. Genomic DNA was extracted using PureLink® Genomic DNA
   Kits (Invitrogen # K1820–01). The quality of the final DNA samples was
   evaluated by gel electrophoresis (1.5% agarose gel) and DNA
   concentration was measured in NanoDrop 2000с Spectrophotometer (Thermo
   Scientific, MA, USA). Three separate paired-end (PE) sequencing
   libraries were prepared with NEB Next Ultra DNA Library kit (NEB
   #E7370). Three individual PE libraries for each PJRB strains were
   sequenced from AgriGenome Labs Private Limited, Kerala, India using
   Illumina HiSeq. 2500 platform at 100 × coverage. The adapter sequences
   were trimmed using TrimmomaticPE-0.39 program^[49]17. The low quality
   reads (Phred quality score Q < 30) were filtered out and the unique
   reads were fetched using FastUniq^[50]18. Additionally, FastQ Screen
   v0.13.0^[51]19 was used to sample the Bacillus origin of reads as a
   quality control measure before assembly.

De novoassembly, contig ordering and scaffolding

   De novo assembly was initially performed using three different assembly
   methods, viz. SPAdes v.3.13.0^[52]20, ABySS v.2.1.5^[53]21 and Velvet
   v.1.2.10^[54]22. The default k-mer sizes were used for SPAdes assembly,
   whereas, a range of k-mers from 31 to 95 was used for Velvet and ABySS
   assemblies. Finally, the ABySS derived assemblies with kmer 95 were
   used in further downstream analyses. Scaffolding and gap filling of the
   ABySS-assembled contigs were performed using SSPACE v.3.0 16^[55]23 and
   GapFiller v1–10^[56]24. After genome comparison for taxonomic
   classification, the assemblies were finally ordered against complete
   genomes of the reference strains ([57]NZ_CP010405.1 – B. safensis
   FO-36b, [58]NZ_CP031880.1 – B. velezensis OSY-GA1, and
   [59]NZ_CP024204.1 – B. altitudinis P10) using Progressive MAUVE^[60]25.
   The ordered scaffolded assemblies were considered as ‘draft’ genomes.
   Genome wide-identification of phage-like and insertion sequences were
   performed using a webserver PHAST (PHAge Search Tool:
   [61]http://phast.wishartlab.com)^[62]26, and ISEscan^[63]27.

Genome-based taxonomic classification

   For genome-scale taxonomic analysis, the genome assemblies were
   searched at Microbial Genomes Atlas (MiGA) online server
   ([64]http://www.microbial-genomes.org)^[65]28 that uses NCBI
   non-redundant prokaryotic genomes database, JSpeciesWS
   ([66]http://jspecies.ribohost.com/jspeciesws)^[67]29, and TrueBac^TM ID
   system from the EzBioCloud server
   ([68]https://www.ezbiocloud.net/contents/genome). The species
   identification carried out in TrueBac-ID was based on the comparison of
   average nucleotide sequence identity (ANI) between the genomes of PJRB
   and their type strains. For a reliable identification of the strains,
   the Type (Strain) Genome Server (TYGS)^[69]30 was employed.

Gene prediction and annotations

   All three draft PJRB genomes were annotated using the NCBI Prokaryotic
   Genome Annotation Pipeline (PGAP)^[70]31. The functional annotation of
   genes was also carried out using web-based Rapid Annotation Using
   Subsystem Technology (RAST) annotation server
   ([71]http://rast.theseed.org/FIG/rast.cgi)^[72]32. Protein coding genes
   were scanned for their organization into operons using the web server
   Operon-mapper^[73]33. Functional annotation of genes in terms of Kyoto
   Encyclopedia of Genes and Genomes (KEGG) orthology assignments and
   predictions of KEGG pathways were carried out through KEGG Automatic
   Annotation Server (KAAS) server
   ([74]https://www.genome.jp/kegg/kaas/)^[75]34 using bi-directional best
   hit (BBH) method in GhostZ. Similarly, the PJRB genomes were also
   scanned for cluster of orthologous groups (COGs) annotations using
   eggNOG-mapper v2 ([76]http://eggnog-mapper.embl.de)^[77]35,[78]36.
   Automated Carbohydrate-active enzyme ANnotation web server, (dbCAN meta
   server ([79]http://bcb.unl.edu/dbCAN2/blast.php)^[80]37, was employed
   to annotate Carbohydrate-Active enZYmes (CAZy)^[81]38. Potential
   interaction among the pectin and xylan-degrading CAZys were carried out
   using protein-protein interaction database, STRING v.11^[82]39.

   For phylogenetic analyses of pectate lyase proteins, a total of 48 PLs
   from bacteria, fungi, nematode, and land plants were retrieved from the
   NCBI protein database. Domains for polysaccharide lyase (PL) were
   identified using SUPERFAMILY 2 database ([83]http://supfam.org/)^[84]40
   before analyzing their phylogenetic evolutions in MEGA X by MUSCLE
   alignment, Whelan And Goldman model (WAG) amino acid substitution model
   (Gamma distributed rates among sites) and maximum likelihood tree
   reconstruction^[85]41. Protein structure homology models, enzyme active
   site, and consensus ligand docking residues of PJRB pectate lyases were
   predicted using COFACTOR and COACH
   ([86]https://zhanglab.ccmb.med.umich.edu/COACH/)^[87]42.

Comparative genome analyses

   Pan-genome comparative analyses of the PJRB strains were carried out
   using ROARY pipeline^[88]43. For comparison, complete genomes of
   strains of each species were retrieved from the NCBI database
   (Table [89]S1). Both PJRB1 and PJRB2 genomes were compared with genomes
   of thirty strains each of B. safensis and B. velezensis, respectively.
   PJRB3 genome was compared with genomes of twenty-seven strains of B.
   altitudinis. The comparison and annotation of orthologous gene clusters
   among PJRB and their closely related strains were carried out using
   OrthoVenn2 ([90]https://orthovenn2.bioinfotoolkits.net/home)^[91]44.
   For genome synteny and collinearity analyses, online tools
   D-GENIES^[92]45 and C-Sibelia software^[93]46 were used and alignment
   of syntenic blocks were visualized in Circos^[94]47.

SNP identification

   For SNP and variant identification among the bacterial strains, three
   separate tools viz. Snippy ([95]https://github.com/tseemann/snippy),
   BactSNP ([96]https://github.com/IEkAdN/BactSNP)^[97]48 and Parsnp-Gingr
   of Harvest suite^[98]49 were used. For Snippy and BactSNP, fastq raw
   reads of PJRB1, PJRB2, and PJRB3 were aligned against complete genomes
   of respective reference strains ([99]NZ_CP010405.1, [100]NZ_CP031880.1,
   and [101]NZ_CP024204.1). In case of Parsnp, the PJRB1 and PJRB2 genomes
   were aligned to thirty genomes each of B. safensis and B. velezensis,
   respectively and PJRB3 genome with twenty-seven B. altitudinis genomes.

Results

Microscopy and pectinase activity of jute retting bacterial consortium

   After Gram staining, the bacterial strains were observed as purple,
   rod-shaped cells under the compound light microscope indicating those
   are Gram-positive Bacilli (Fig. [102]1a–c). In order to examine macro
   structural features and differences on the surface of jute retting PJRB
   strains, we performed high-resolution SEM imaging (Fig. [103]1d–f). The
   micrograph showed that all three strains are rod shaped and have
   comparable shapes and size (length ~1.5 µM, width ~0.5 µM). The
   pectinase assay of the above PJRB strains showed high pectinase
   activity measured as potency index; PJRB1–2.8, PJRB2–2.6, and
   PJRB3–2.45. On pectin agar plates, the hydrolysis of pectin was
   observed as halo zones surrounding the bacterial colony amidst opaque
   area with residual pectin (Fig. [104]1g–i). The zone of substrate
   hydrolysis was measured as a ratio of diameter of clear zone to the
   diameter of colony as ‘potency index’ and all the three strains
   confirmed the potency index of >3 as reported by Das et al.^[105]13.

Figure 1.

   [106]Figure 1
   [107]Open in a new tab

   Morphological and phenotypic characterization of PJRB strains.
   Rod-shaped Gram positive bacterial cells with size are resolved in the
   upper and middle panel through light microscopy (a–c) and scanning
   electron microscopy (SEM) (d–f), respectively. Scale bars are shown in
   each figure. In the lower panel, the strains grown on pectin agar
   plates with 1% citrus pectin. Halo indicates the zone of substrate
   hydrolysis by the pectinolytic activity of PJRB1 (g), PJRB2 (h) and
   PJRB3 (i).

Genome assemblies and genomic features of PJRB strains

   Whole genome shotgun sequencing of the three PJRB strains were
   performed individually to generate ~4.0 GB data each with nearly 100 ×
   genome coverage. The PE sequencing (mean read length 150 bp) generated
   approximately 16.5 to 17.0 million reads with an average Phred quality
   score of 38.8. After quality filter (Q > 30), approximately 14.6 to
   14.8 million reads were assembled using three different tools SPAdes,
   Velvet, and ABySS (Table [108]1). On preliminary examination, out of
   the three assembly methods, the ABySS generated assemblies with kmer 95
   were found to have consistent statistics and genome sizes comparable to
   other Bacillus strains and therefore, selected for further downstream
   analysis. The primary assemblies of PJRB1, PJRB2, and PJRB3 consisted
   of 3813976, 3880712, and 3899490 base pairs with N50 values of
   788506 bp, 528477 bp, and 205868 bp respectively (Table [109]1).

Table 1.

   Sequence metrics and features of PJRB genomes.
          Features           PJRB1      PJRB2      PJRB3
   No. of reads            14,843,867 14,851,391 14,639,509
   Primary assemblies
   SPAdes
   Assembly length         4,902,156  6,892,657  5,825,773
   N50                     708,884    569,445    173,580
   Velvet
   Assembly length         8,060,469  6,240,148  3,722,406
   N50                     561        625        671
   ABySS
   Assembly length         3,813,976  3,880,712  3,899,490
   N50                     259,390    458,936    98,293
   Final assembly post scaffolding and contig reordering
   Scaffold Nos.           21         21         46
   Total bases             3,809,132  3,876,440  3,883,973
   Maximum scaffold length 981,857    1,096,081  392,651
   N50                     788,506    528,477    205,868
   GC %                    41.42      46.55      40.88
   [110]Open in a new tab

   In order to improve the genome assemblies, contigs were arranged to be
   part of larger scaffolds and ordered according to the complete genomes
   of respective closest strains (Fig. [111]S1). The final scaffolded and
   reordered PJRB1, PJRB2, and PJRB3 assemblies consisted of 21, 21, and
   46 scaffolds with a maximum size of 981857, 1096081 and 392651 bp,
   respectively. The final genome size of PJRB1, PJRB2, and PJRB3
   consisted of 3809132, 3876440 and 3883973 base pairs with a GC
   percentage of 41.42, 46.55, and 40.88, respectively (Fig. [112]2).
   PJRB1 genome comprised three insertion elements and one intact phage
   sequence, PJRB2 genome comprised six insertion sequence and no intact
   phage, while the PJRB3 genome comprised 14 insertion sequences and one
   intact phage (Fig. [113]2).

Figure 2.

   [114]Figure 2
   [115]Open in a new tab

   Circular representation of draft genomes and features of the PJRB
   strains. (a) Draft genome of PJRB1 aligned to B. safensis FO-36b
   strain; (b) PJRB2 aligned to B. velezensis OSY-GA1 strain; and (c)
   PJRB3 to B. altitudinis strain P-10. The contents of the featured rings
   (starting with the outermost ring to the centre) are as follows. Ring
   1: Position of insertion (IS) and phage-like (phi) sequences; Ring 2:
   distribution of the scaffolds; Ring 3 and 4: ORFs in forward and
   reverse strands; Ring 5: BLASTn hits to reference Bacillus strains;
   Ring 6: Genome-wide COG class annotation of the PJRB genomes. Each COG
   classes are depicted in different colour; Ring 7: plots of GC content;
   Ring 8: GC skew plot, values above average is depicted in green and
   below average in purple. Important pectin degradation related genes
   were marked as pelA, pelB (Pectate lyase), pemA (Pectin esterase A) and
   xylB (Xylulose kinase). The figures were produced using CGView
   Server^BETA ([116]http://cgview.ca/).

Genome-level resolution of taxonomic identities

   Previously, with 16 S rDNA sequencing and metabolic fingerprinting in
   Biolog, the PJRB1, PJRB2, and PJRB3 were identified as strains of
   Bacillus pumilus^[117]14. Notwithstanding the importance of these
   methods in initial classification, the genome sequences of the strains
   are imperative for insights into their precise taxonomic status using
   different genome-based taxonomic tools. For PJRB1, the closest
   relatives found in the MiGA database were B. safensis [118]CP032830
   (98.85% ANI) and B. safensis NZ [119]CP018197 (98.28% ANI). PJRB1 was
   suggested to most likely belong to the species B. safensis (p-value:
   0.0016) or to the same subspecies of B. safensis [120]CP032830
   (p-value: 0.051). Similarly, the closest relative of PJRB2 was B.
   velezensis NZ [121]CP031880 (99.51% ANI). This strain most likely
   belongs to either B. velezensis (p-value: 0.0008) or the same
   subspecies of B. velezensis NZ [122]CP031880 (p-value: 0.05). In case
   of PJRB3, the closest relatives found by MiGA was B. altitudinis NZ
   [123]CP024204 (98.23% ANI). PJRB3 therefore, most likely belongs to the
   species B. altitudinis (p-value: 0.0032) or to the same subspecies of
   B. altitudinis NZ [124]CP024204 (p-value: 0.053). The RDP classifier
   which estimates the broad taxonomy of bacterial cells using 16 S rRNA
   training set, classified all the above three strains as Bacillus spp.
   with 100% confidence.

   The taxonomic identities of the above three strains were further
   confirmed using the tools TrueBac ID from the EzBioCloud server and
   TYGS server. Based on genomic evidence, the above tools confirmed the
   identity of these strains as PJRB1: B. safensis (97.5% genomic
   similarity and ANI 97.5%); PJRB2: B. velezensis (97.8% genomic
   similarity and ANI 97.8%), and PJRB3 as B. altitudinis (98.3% genomic
   similarity and ANI 98.3%). Using Genome BLAST Distance Phylogeny (GBDP)
   approach of TYGS, the three query strains were assigned to 14 species
   clusters and were grouped accordingly with the respective species
   (Fig. [125]S2).

Pan-genome comparative analyses of PJRB strains and genome synteny

   In order to compare the functional genes in terms of core and accessory
   genes shared or unique among strains, the PJRB strains in their
   respective clades were inferred at pan genome-scale (Fig. [126]S3). The
   PJRB1 genome was analyzed based on 9363 gene clusters of thirty-one B.
   safensis strains. Out of 9363 orthologous gene clusters, 2727 genes
   (29.13%) constituted the core-genome (present in >99% of genomes), and
   only 214 (2.29%) as soft core genes (in 95–99% of genomes). Shell genes
   (present in 15–95% of genomes) and cloud genes (present in 0–15% of
   genomes) were 1370 and 5050, respectively indicating that 68.59% genes
   present as either shell genes or cloud genes. The pan-genome of PJRB2
   and other B. velezensis strains comprised a total of 8107 genes as gene
   cluster, with 37.18% as core genes and 60.59% as shell and cloud genes.
   Although the B. altitudinis pan-genome was comprised of least number of
   7751 gene cluster, the proportion of core genes was highest (38.45%)
   and shell and cloud genes together comprised slightly less than 60% of
   the pan-genome. The above analysis indicates a considerable proportion
   of genes comprised the unshared portion in shell and cloud genes.
   Moreover, for all the above three pan genomic analyses the size of
   pan-genome increased with further addition of genomes (Fig. [127]S4).

   Pair-wise genome alignments of PJRB strains with other strains from
   same species revealed that they have very high genomic similarity and
   collinearity. No significant genomic rearrangements were evident from
   the dot plots except few chromosomal deletions. Chromosomal inversion
   was observed only in case of B. altitudinis strains PJRB3 and GR8. The
   pairwise comparison of multiple alignment blocks between PJRB1 and
   FO-36b strains showed 58 syntenic regions distributed in 10 scaffolds.
   Similarly, PJRB2 and OSY-GA1 strains showed 48 synteny blocks
   distributed in 10 scaffolds and PJRB3 and P-10 showed 51 synteny blocks
   distributed in 22 scaffolds (Fig. [128]3).

Figure 3.

   [129]Figure 3
   [130]Open in a new tab

   Genomic synteny shared between PJRB and their phylogenetically nearest
   strains. The pairwise multiple synteny blocks between PJRB1 and FO-36b
   strains showed 58 syntenic regions distributed in 10 scaffolds.
   Similarly, PJRB2 and OSY-GA1 strains showed 48 syntenic blocks
   distributed in 10 scaffolds and PJRB3 and P-10 showed 51 syntenic
   blocks distributed in 22 scaffolds. The syntenic diagrams were
   generated using Circos as an inbuilt tool of C-Sibelia^[131]46.

Genome annotation of jute retting bacterial consortium

   The genome assemblies of PJRB1, PJRB2, and PJRB3 were annotated
   in-depth using the NCBI PGAP (Table [132]2; details in
   Tables [133]S2–[134]S4). The genome of PJRB3 contains highest number of
   3901 genes organized in 2158 operons and PJRB2 had 3699 protein coding
   genes in 2008 operons; whereas PJRB1 had 3780 protein coding genes in
   2034 operons. In contrast, PJRB3 consisted lowest number of RNA coding
   genes (88 genes) compared to PJRB2 and PJRB1 (111 and 89 RNA coding
   genes, respectively). Pseudogenes found in these three assembles were
   47, 99, and 88, respectively for the PJRB1, PJRB2, and PJRB3 genomes.

Table 2.

   Annotation details of PJRB genomes through NCBI Prokaryotic Genome
   Annotation Pipeline (PGAP).
     Features   PJRB1 PJRB2 PJRB3
   Total genes  3,916 3,909 4,077
   CDSs         3,827 3,798 3,989
   Proteins     3,780 3,699 3,901
   Total RNA    89    111   88
   rRNAs
     5 S        3     7     5
     16 S       3     4     2
     23 S       4     5     1
   tRNAs        74    90    75
   ncRNAs       5     5     5
   Pseudo genes 47    99    88
   [135]Open in a new tab

   The functional annotation tool RASTtk, which provides accurate
   subsystem level annotations besides predicting number of genes, was
   also employed to analyse the PJRB1, PJRB2, and PJRB3 genomes. The
   number of genes (including both protein encoding and RNA genes) were
   4078, 4054, and 4258, respectively for the three genomes. As per RASTtk
   seed subsystem annotations, 1271 genes (32%) of PJRB1 fall into 337
   subsystems, 1246 genes (32%) of PJRB2 into 342, and 1267 genes (31%) of
   PJRB3 into 345 subsystems. Among different subsystem categories ‘amino
   acid and derivatives’ comprised highest number of genes (350 in PJRB1,
   321 in PJRB2, and 328 in PJRB3) followed by ‘carbohydrates’ (267 in
   PJRB1, 234 in PJRB2, and 276 in PJRB3) (Fig. [136]S5).

Analysis of orthologous genes in PJRB strains

   Clusters of orthologous groups (COG) annotation of the PJRB genomes
   revealed that 88.87% to 91.32% of the protein-coding genes were
   annotated using COG database (Fig. [137]4a) (details in
   Tables [138]S5–[139]S7). Among the COG functions, a substantial
   fraction of genes (36.54% to 38.81%) was involved in ‘metabolism’; of
   which ‘amino acid transport and metabolism (E)’ is the predominant
   functional category (333 to 339 genes). The other COG functional
   categories consisted ‘information storage and processing’ of about 18%
   and ‘cellular process and signalling’ of about 16.5% of the COG
   categories. viz. cell wall/membrane/envelope biogenesis (M168–186
   replication, recombination and repair; L129–149 signal transduction
   mechanisms; T 105–108 translation, ribosomal structure and biogenesis;
   J 175–186 transcription; K 295–318).

Figure 4.

   [140]Figure 4
   [141]Open in a new tab

   Analysis of orthologous genes in PJRB strains using COG, KEGG, and
   OrthoVenn. (a) Bar plot showing number of genes under 22 different COG
   categories depicted on X-axis according to four broad functional
   groups. (b) KEGG pathway enrichment analysis represented through bar
   chart shows distribution of number of proteins in 20 different KEGG
   functional categories annotated using the KAAS. (c) Venn diagram
   represents distribution of shared and unique gene clusters among
   different PJRB strains.

   Furthermore, KEGG pathway enrichment analysis could annotate nearly
   half of the total protein-coding genes, majority of which are involved
   in metabolic processes (Fig. [142]4b). Carbohydrate metabolism (~240
   genes) and amino acid metabolism (~128 genes) constitute the
   predominant categories of metabolic genes. In each of the PJRB strains
   ~22 genes are involved in ‘biosynthesis and metabolism of glycans’.
   Xenobiotics biodegradation and metabolism is another important category
   of genes in KEGG analysis. PJRB1 and PJRB2 genome contained seven genes
   under this category, whereas PJRB3 consisted only three such genes.
   Apart from the metabolism-related genes, protein families for genetic
   information processing (~262 genes) and protein families for signalling
   and cellular processes (~190 genes) also correspond to the major share
   of KEGG classifications in all the three PJRB genomes.

   At the protein sequence level, analysis with OrthoVenn exhibited that
   the three strains form 3603 clusters, 1036 orthologous clusters (at
   least contains two species), and 2567 single-copy gene clusters
   (Fig. [143]4c). Number of ortholog clusters shared by all three species
   were 2590, while 962 clusters were shared by at least two genomes. A
   total of 51 gene clusters were specific to only a single genome. Out of
   the 51 gene clusters, three belonged to PJRB1, whereas, 25 and 23
   specific gene clusters were from PJRB2 and PJRB3, respectively.

Major CAZy categories and pectin degradation genes

   Since PJRB strains are essentially selected for their pectin
   degradation characteristics, it is imperative to analyse the
   genome-wide distribution of genes encoding CAZys. Total number of
   CAZymes were highest in strains of B. velezensis (88–108) as compared
   to that of B. safensis (77–81) and B. altitudinis (79–83)
   (Table [144]3) (details in Tables [145]S8–[146]S10). The major
   carbohydrate degrading CAZy classes observed are PL1, PL9, GH28, CE8,
   and CE12. Among the CAZy classes, the glycoside hydrolase family 28
   (GH28) was most predominant with their numbers varying between 32–39
   among different strains of B. safensis, B. velezensis and B.
   altitudinis. Among the three strains, PJRB2 contained maximum 38 GHs,
   while PJRB1 and PJRB2 consisted 35 GHs each. With a closer look on the
   polysaccharide lyase CAZy family, which encodes pectin degrading
   enzymes like exo-pectate lyase (PL1; EC 4.2.2.9) and
   exopolygalacturonate lyase (PL9; EC 4.2.2.9), one copy each of PL1 and
   PL9 were observed in PJRB1 and PJRB3 strains. Whereas, PJRB2 genome
   consisted two copies of PL1 and single PL9. Carbohydrate esterase
   family 8 (CE8) encoding for pectin methylesterase (EC 3.1.1.11) and
   CE12 for pectin acetylesterase (EC 3.1.1.6), which catalyze the
   acylation of carbohydrates constitutes an important CAZy family. The
   number of CEs varied between 13–15 in B. altitudinis strains, while all
   strains of B. safensis contained exactly 15 CEs. Although strains of B.
   velezensis contained fewer CEs (11), PJRB2 in contrast contained 20
   CEs. By incorporating the STRING interaction results with the pectin
   and xylan degrading enzymes of other Bacillus species, we could predict
   an enzyme interaction network based on co-occurrence of the respective
   genes across the genomes (Fig. [147]S6). Pectin esterase, derived by
   automated computational analysis using gene prediction method, was
   found to interact with arabinoxylan hydrolase. Pectin lyase, catalyzing
   the depolymerization of methyl-esterified pectins was found to interact
   with rhamnogalacturonan acetylesterase, which play a considerable role
   in the degradation of rhamnogalacturonan derived from plant cell walls.
   Similarly, pectate lyase was found to interact with glycoside
   hydrolases (Table [148]S11).

Table 3.

   Genome-wide comparative distribution of CAZymes in PJRB genomes and
   selected Bacillus species.
   Bacterial species and strains Total CAZy* Major CAZy categories
   Pectinase coding CAZys
   GT GH PL CE AA CBMs
   B. safensis PJRB1 78 25 35 2 15 0 3 PL1, PL9, GH28,CE8, CE12
   B. safensis FO-36b 75 24 33 2 15 0 3 PL1, PL9, GH28, CE8, CE12
   B. safensis U14-5 76 23 35 2 15 0 3 PL1, PL9, GH28, CE8, CE12
   B. safensis BRM1 75 24 33 2 15 0 3 PL1, PL9, GH28, CE8, CE12
   B. safensis U41 77 25 33 2 15 0 4 PL1, PL9, GH28, CE8, CE12
   B. safensis KCTC 12796BP 79 24 36 2 15 0 4 PL1, PL9, GH28, CE8, CE12
   B. velezensis PJRB2 84 36 38 3 20 5 6 PL1 (2), PL9
   B. velezensis OSY-GA1 85 32 37 3 11 1 4 PL1 (2), PL9
   B. velezensis SQR9 87 32 39 3 11 1 4 PL1 (2), PL9
   B. velezensis L-S60 85 32 37 3 11 1 4 PL1 (2), PL9
   B. velezensis JS25R 88 33 39 3 11 1 5 PL1 (2), PL9
   B. velezensis AS43.3 85 32 37 3 11 1 4 PL1 (2), PL9
   B. altitudinis PJRB3 81 28 35 2 14 0 4 PL1, PL9, GH28, CE8, CE12
   B. altitudinis P-10 78 27 34 2 13 0 4 PL1, PL9, GH28, CE8, CE12
   B. altitudinis GR-8 78 27 33 2 14 0 4 PL1, PL9, GH28, CE8, CE12
   B. altitudinis W3 80 27 34 2 15 0 4 PL1, PL9, GH28, CE8, CE12
   B. altitudinis GQYP101 80 27 34 2 15 0 4 PL1, PL9, GH28, CE8, CE12
   B. altitudinis HQ-51-BA 77 27 32 2 14 0 4 PL1, PL9, GH28, CE8, CE12
   [149]Open in a new tab

   *Consensus CAZy predictions by HMMER, Diamond BLAST hits, and HotPep.

   GHs - Glycoside hydrolases; GTs - Glycosyl transferases; PLs -
   Polysaccharide lyases; CEs - Carbohydrate esterases; AAs - Auxiliary
   activities; CBMs - Carbohydrate-binding modules.

Phylogenetic analyses and homology modelling of pectate lyases

   Phylogenetic relationship was analyzed among 48 pectate lyase proteins
   from PJRB and their related strains and also from other well
   characterized bacteria (Fig. [150]5). Pectate lyase being a ubiquitous
   enzyme, sequences from fungi, nematode, and land plants were also
   included to analyze the diversity. The Maximum Likelihood tree grouped
   together all the pectate lyases from Bacillus safensis and Bacillus
   altitudinis including those of PJRB1 and PJRB3 as per their species
   affiliations. Whereas the two isoforms of pectate lyases of PJRB2
   formed two separate and distinct clusters with B. velezensis strains
   OSY-GA1, S3–1, SQR9 and GQJK49. None of the Bacillus pectate lyases
   clustered with those of plants, fungi or nematodes.

Figure 5.

   [151]Figure 5
   [152]Open in a new tab

   Phylogenetic relationship of pectate lyases in PJRB strains with other
   organisms. The Maximum Likelihood tree of 48 pectate lyase proteins
   were constructed using WAG amino acid substitution model with Gamma (G)
   distributed rates among sites and complete deletion of gaps and missing
   data. The tree with the highest log likelihood (−5121.95) is shown.
   PJRB strains are marked in red. Bsaf: Bacillus safensis; Bvel: Bacillus
   velezensis; Balt: Bacillus altitudinis; Bsub: Bacillus subtilis; Rasol:
   Ralstonia solanacearum; Blic: Bacillus licheniformis; Psav: Pseudomonas
   savastanoi; Xcam: Xanthomonas campestris; Saur: Streptomyces aureus;
   Dchr: Dickeya chrysanthemi; Tmar: Thermotoga maritima; Pbar:
   PaeniBacillus barcinonensis; Cace: Clostridium acetobutylicum; Cjap:
   Cellvibrio japonicus; Sdeg: Saccharophagus degradans; Fsuc: Fibrobacter
   succinogenes; Chut: Cytophaga hutchinsonii; Minc: Meloidogyne
   incognita; Gros: Globodera rostochiensis; Have: Heterodera avenae;
   Cchl: Colletotrichum chlorophyti; Foxy: Fusarium oxysporum; Rsol:
   Rhizoctnia solani; Pbra: Penicillium brasilianum; Anig: Aspergillus
   niger; Atha: Arabidopsis thaliana; Ntab: Nicotiana tabacum; Ptri:
   Populus trichocarpa; Slyc: Solanum lycopersicum; Mtra: Medicago
   truncatula; Gmax: Glycine max; Osat: Oryza sativa japonica.
   Representative models of the PJRB pectate lyases derived from closest
   PDB homologues using COFACTOR module of COACH server are show in the
   right side with the active site residues denoted in pink. The
   respective Polysaccharide lyase-6 (PL-6) domains as identified using
   SUPERFAMILY 2, and the ligand docking residues are illustrated below
   the models.

   In addition to the phylogenetic analysis, three dimensional structures
   of pectate lyases were also compared. Ab initio model prediction of
   pectate lyases from PJRB strains were performed to explore the
   structural divergence and identification of ligand binding and active
   site residues (Fig. [153]5). As per COACH analyses, the closest
   structural homologue of PJRB1 and PJRB2 pectate lyases was a
   hexasaccharide I bound to B. subtilis pectate lyase (PDB model 2NZM),
   whereas, the pectate lyase from PJRB3 strain found its closest
   homologue in pectate Lyase C of Dickeya chrysanthemi mutant R218K (PDB
   model 2EWE). In case of PJRB1, the ligand ADA (α-D-galacturonic acid)
   interacted with four residues of the enzyme- Asp 186, Lys 210, Leu 213
   and Arg 244. Similarly, the ligand binding residues of PJRB2 PL2
   comprised of Gly 185, Tyr 209, Ile 212 and Arg 250. The pectate lyase
   enzymes of PJRB strains exhibited considerable divergence in their
   sequence as evidenced in the ligand binding site residues predicted by
   ab initio modelling. Pectate lyase of PJRB3 and PL1 of PJRB2 had five
   residues each in their ADA binding site. In PJRB2 PL1, the binding
   residues are Asn 203, Asp 246, Lys270, Val 273 and Arg 307; whereas in
   PJRB3 PL these residues are Lys 178, Lys 206, Leu 209, Ser 212 and Arg
   240.

   Active site residues were predicted by COFACTOR module of COACH, and,
   except PJRB PL2 which contained a single Arg at position 244, all other
   contained one Asp and one Arg each in their active sites. The closest
   active site homologue observed in PJRB1 and PJRB3 pectate lyases was a
   pectate lyase of D. chrysanthemi (PDB model 1pclA), whereas, both the
   PLs of PJRB2 had highest homology with that of Bacillus sp. TS-47 (PDB
   model 1vblA).

Genome-wide variant identification in PJRB strains

   Several genomic variants were discovered in PJRB genomes based on two
   different variant calling approaches by Snippy and BactSNP. The PJRB1
   genome when compared to B. safensis strain FO-36b, PJRB2 to B.
   velezensis strain OSY-GA1, and PJRB3 to B. altitudinis strain P-10,
   they produced total sequence variants of 67381 (61001 in CDS), 12461
   (11012 in CDS), and 41345 (37125 in CDS), respectively
   (Tables [154]S12–[155]S14). Majority of these variants (84%-90%) are
   SNPs (Table [156]S15). Deletions are only around 0.5% in both PJRB1 and
   PJRB3, and slightly higher (1%) in PJRB2. An almost similar pattern was
   observed for insertion type variants. Other complex type variations are
   5.8% in PJRB2 and nearly 12% in PJRB1 and PJRB3. However, only a
   fraction of these variations had any real effect on bacterial
   metabolism and growth, as majority of them were found to be of
   synonymous type mutations. Among the total variants in CDS, 47650
   (78.11%) are categorized as synonymous variants without any change in
   the coded amino acids. Similarly, in PJRB2 and PJRB3, 71% and 77%
   variants are of synonymous types. The proportion of missense mutations
   (non-synonymous that effect codon change) are 21.44% in PJRB1, 28.18%
   in PJRB2, and 22.08% in PJRB3. Frameshift mutations which changes the
   reading frames of genes were rather rare (0.17% to 0.44%). Base
   substitutions are mainly transitions types (68.57% to 69.49%), and
   nearly 30% changes are due to transversions. Most frequent base
   substitution type was found G > A (17.44% to 18.28%); whereas, the
   least frequent was C > G (2.29% to 3.01%).

   Based on genome alignment approach of phylogenetically close bacterial
   strains, 78301 SNPs in PJRB1, 13080 in PJRB2 and 48406 in PJRB3 were
   identified, out of which 2339, 909, and 3670 were found unique to
   PJRB1, PJRB2 and PJRB3, respectively. Interestingly, several SNPs were
   located in the pectin degradation related genes in all the PJRB genomes
   (Fig. [157]S7a–c). A unique transition type SNP, T > C was located in
   gene (locus: FJP00_18325) coding for pectate lyase in the PJRB1 genome.
   Similarly, two SNPs, G > A and C > T in genes (loci: FJP86_03645 and
   FJP86_18420) coding for pectate lyases in PJRB2 genome, and an G > A in
   gene coding for pectin esterase A (locus: FJP11_10115) in PJRB3 genome
   were also identified. Although not yet validated, these unique SNPs in
   the genes coding for pectin degrading enzymes possibly could account
   for their high pectinolytic efficiency in the retting process.

Discussion

   Microbial retting of bast tissues from jute and other allied fibre
   crops, such as mesta, is crucial for extracting good quality fibres of
   economic importance. Bast fibres are secondary phloem tissues that are
   affixed to the inner bark and the outside of cambium tissues. The fibre
   extraction process involves water-borne microbial decomposition of stem
   tissues adhered together by complex substances including pectin,
   hemicellulose, xylan, and lignin^[158]1. Since bast fibres are
   cellulosic in nature, the retting microbes must not have cellulolytic
   activity but should possess pectin and xylan degrading
   enzymes^[159]5,[160]13. The retting process involves the build-up of
   microbial communities by dissolution of sugars and nitrogenous
   substances from the plant stems. The aerobic Bacillus initiate the
   degradation of pectin and xylans till the anaerobic bacteria of
   possibly Clostridium genus replaces them because of diminished
   availability of oxygen in the retting environment^[161]2. Earlier we
   have isolated and characterized three individual pectinolytic and
   xylanolytic bacterial strains devoid of cellulolytic activity,
   characterized them as Bacillus sp., and applied them as commercial
   formulation for jute retting consortium named ‘CRIJAF
   SONA’^[162]13,[163]14. This formulation has become very popular among
   the jute farmers owing to its ability to reduce retting time and
   improve fibre quality and fibre recovery, thereby increasing the net
   income of jute farmers.

   Sustained commercial success of this bacterial consortium demands
   precise characterization of these strains at genome sequence level to
   identify the genes for further improvisation of their efficiency and
   molecular strain typing. To date, sequencing and comparative analysis
   of genomes of several Bacillus species have provided important insights
   about their evolution, genetics, and physiology including a wide range
   of applications in agriculture and industry. Here, we report
   whole-genome shotgun sequencing of the pectinolytic strains of three
   Bacillus sp. at sufficiently high genome coverage and typical genomic
   features including annotation and variations of the pectin-degrading
   genes.

Genome assemblies and annotations of PJRB strains establish their genomic
organization and typical features

   The genome size and GC content are considered very important in order
   to understand the ability of any microorganism to adapt to varying
   environmental conditions^[164]50. Based on the available complete
   genomes of Bacillus at NCBI, it is evident that there is up to 79% of
   genome size variation (3.42 Mb to 6.13 Mb), exhibiting considerable
   plasticity in Bacillus species. Similarly, the GC content of the
   studied species also vary from 34.7% to 94.7%. Nearly 3.8 Mb genomes
   assembled in contigs of three PJRB strains of Bacillus in the present
   study are consistent with these genome sizes and GC content. The GC
   content of PJRB1 and PJRB3 differed with PJRB2, which had slightly
   higher GC content and these results are consistent with the average GC%
   of 46.3% of the selected Bacillus species. Gene content, number of tRNA
   sequences, and other features like phage-like sequences and insertion
   sequences were also found similar to the genomes of respective Bacillus
   species to which the PJRB strains belong. The insertion sequences in
   microbes are known to influence the genome plasticity and
   adaptability^[165]51. All these features, including the insertion sites
   observed in the PJRB genomes, may be crucial in analysing their
   influence in genome plasticity and adaptability to a complex
   environmental niche like jute-retting water.

Genome-level resolution of PJRB consortium reveal taxonomic affiliations to
threeBacillusspecies

   One of the major highlights of the present genome sequence analysis of
   jute retting bacterial consortium is their definitive taxonomic
   resolution based on genome-based phylogeny. It is now widely accepted
   that whole-genome sequence-based taxonomy using core genome alignment
   and ANI values is a better approach than the conventional method of
   bacterial taxonomic studies and 16 S rRNA sequencing^[166]52–[167]54.
   Prior to this genome analysis, identity of the strains of the
   consortium was established as Bacillus pumilus by morphological,
   biochemical, and 16 S rDNA sequencing^[168]13. However, different
   genome-based taxonomic analysis in this study unequivocally identified
   them as three different Bacillus species, viz. PJRB1-Bacillus safensis,
   PJRB2-Bacillus velezensis, and PJRB3-Bacillus altitudinis. Although
   ribotyping based on 16 S rRNA is widely used for bacterial
   classification, it is often inadequate for the species whose 16 S rRNA
   genes are highly conserved^[169]55. In case of Bacillus cereus, it has
   been demonstrated that whole genome sequence analysis provided higher
   precision over the 16 S rRNA gene sequence to resolve the taxonomic
   ambiguities^[170]15. Similarly, genome-based taxonomic analysis proved
   effective in the reclassification of Paenibacillus riograndensis as a
   genomovar of P. sonchi^[171]56. Our findings too lead us to revise the
   taxonomic resolution of the jute retting bacterial consortium strains
   into three different Bacillus species in contrary to their earlier
   classification as strains of B. pumilus^[172]13.

Comparative genomics ofBacillusstrains highlights genome collinearity and
gene orthology

   Pan-genome profiles of bacterial genomes shed light on their core and
   accessory genes with differences in genomic signatures. While the
   conserved core genes can efficiently identify bacterial genus-level
   identity from a microbiome, strain-specific accessory genes actually
   define lateral gene transfer with novel functions^[173]57. Therefore, a
   comprehensive analysis of pan-genome aids in understanding the
   essential and laterally transferred functions of a newly sequenced
   bacterial genome. Since we have taxonomically resolved the identity of
   jute retting bacterial consortium into three different bacterial
   species, we performed separate pan-genome analysis of PJRB strains. In
   all the analyses, the steady increase of pan-genome with each further
   addition of genome explains that they have large and open pan-genome.
   We found that for all the PJRB strains, ~30% of the protein-coding
   genes in pan-genome cluster were grouped as core and ~60% as accessory
   genes. These findings are in agreement with that of other bacterial
   species^[174]58. As expected, these core genes are mainly involved in
   basic metabolism function of bacteria like recombination and repair,
   membrane biogenesis, carbohydrate metabolism, etc., whereas the
   accessory genomic regions may be useful to infer evolutionary history
   and specific adaptation of the bacterial strains. A comparative
   analysis of predicted protein-coding genes for their orthologous
   relationship also revealed significant gene overlaps among the three
   PJRB genomes. The number of overlapped genes are very similar to the
   core pan-genome and also corroborated to high genomic synteny and
   collinearity. Though overlapped genes are considered least effective
   markers in bacterial phylogenomic analysis of closely related strains,
   they can be combined with locally collinear blocks for a robust
   phylogenomic inference^[175]59. Thus, the orthologous gene clusters
   identified from the pan-genome analysis will serve as a solid platform
   for an in-depth assessment of genome translocations, horizontal gene
   transfer, and gene losses in these PJRB strains.

Analysis of carbohydrate-degrading genes exhibit distinct patterns and
variations associated with PJRB strains

   Although, the carbohydrate degrading enzymes are currently classified
   into 23 families in the CAZy database, they fall into two main
   groups^[176]38. Glycoside hydrolases (GHs) catalyze the hydrolysis of
   glycosidic bonds and polysaccharide lyases (PLs) cleave uronic
   acid-containing polysacharides via a β-elimination mechanism. On some
   substrates, carbohydrate esterases (CEs) have also been reported to
   catalyze acylation of substituted saccharides. Whole-genome or
   transcriptome analysis are now routinely used in many bacterial species
   to locate or study the expression of genes that are involved in
   polysaccharide degradation. Often genes encoding CAZymes are reported
   to be organized in gene clusters. The extensive studies in the last few
   years have offered a number of insights into the complex enzymatic
   pathway required to degrade pectin^[177]60,[178]61. Since the key
   characteristics of the studied bacterial strains of retting consortium
   are essentially based on efficient pectin degradation, the presence and
   organization of CAZymes constitute an important part of our study. We
   observed different carbohydrate degrading CAZy classes viz. PL1, PL9,
   GH28, CE8, and CE12 in all the three PJRB genomes, although their
   numbers varied from other strains of B. safensis, B. velezensis and B.
   altitudinis. Pectin lyase activity (U/ml) varies among the selected
   strains: PJRB1 185.7, PJRB2 197.7, and PJRB3 203.7^[179]13; this may
   possibly be attributed to variation in CAZymes in the genomes. Total
   annotated CAZymes were highest (84) in PJRB2, which also contained an
   extra copy of PL gene. It is also noteworthy that the commercial
   consortium contains PJRB2 in twice the concentration of the other two
   strains. Computational predictions also established an intricate enzyme
   interaction network of pectin degrading genes. This will facilitate to
   build models to further improve the synergy of this consortium for
   efficient retting.

   Pectate lyase domain-based phylogenetic analysis of PJRB and other
   bacterial strains resolved their evolutionary convergence among the
   Firmicutes. Similarly, a phylogenetic tree based on 121 pectate lyases
   clustered them according to the source organisms such as bacteria,
   fungi, plants, and nematodes^[180]61. In another study, a multiple
   sequence alignment of 48 pectin lyase protein sequence of different
   bacteria and fungi revealed the presence of conserved motifs as well as
   variable active sites, explained their catalytic variabilities^[181]60.
   In this study, identification of ligand binding and active site
   residues through homology modelling of pectate lyases from PJRB strains
   offer an opportunity to improve the efficiency by modulating these
   sites. Changes in N terminal region of PelA protein of Bacillus sp.
   BP-23 PelA were demonstrated to render different substrate specificity
   and unusual features as compared to PelA proteins in other Bacillus
   species^[182]62. The large number of SNPs identified in the PJRB
   strains of Bacillus spp. reported here are expected to serve as crucial
   resources towards molecular typing and inference of intra-species
   genetic polymorphism. Several SNPs were located in pectin degradation
   related genes of all the PJRB strains and this is perhaps one of the
   first attempts to associate functional variation with SNPs in CAZy
   genes.

Conclusions

   CRIJAF SONA microbial formulation has helped in shortening the retting
   duration and improvement in fibre quality, providing the much-required
   fillip to the jute sector allowing diversification of jute products to
   tap the world market. Now with the availability of the complete genome
   sequences of the retting bacterial strains, it will be possible to
   further improve this technique by cloning and functional validation of
   the pectin degrading genes. In this study, the presence and
   organization of different carbohydrate degrading CAZymes classes viz.
   PL1, PL9, GH28, CE8, and CE12 in all the three PJRB genomes were
   established. Definitive genome-level taxonomic resolution of PJRB
   consortium revealed taxonomic affiliations to the three Bacillus
   species.

   Also, a comprehensive understanding of the diversity of microbial
   population will help in incorporating other strains into the consortium
   and protecting IPR of the indigenous microbiome. Further application of
   high throughput next-generation sequencing can generate genome level
   data from each retting environment indicating the microbial diversity
   and their correlation with fibre quality. Genome analysis of the
   retting microbes provides additional insights into the mechanisms of
   carbohydrate metabolism and could be relevant for improving the next
   generation of efficient consortium. The global market for natural
   fibres is continuously growing due to changing environmental
   legislations and regulations, and a deeper understanding of retting
   process is prerequisite to achieve sustainable natural fibre production
   system.

Data availability

   Genome sequences of PJRB1, PJRB2 and PJRB3 are accessible at NCBI under
   the following accession nos.: PJRB1 (BioSample Accession SAMN11964560,
   Genome Accession [183]VFLO00000000, SRA accession SRR9317474); PJRB2
   (BioSample Accession SAMN11964574, Genome Accession [184]VFLN00000000,
   SRA accession SRR9317475) and PJRB3 (BioSample Accession SAMN11964575,
   Genome Accession [185]VFLM00000000, SRA accession SRR9317473).

Supplementary information

   [186]Supplementary information.^ (2.9MB, xlsx)
   [187]Supplementary information2.^ (2.7MB, pdf)

Acknowledgements