Graphical abstract

   graphic file with name fx1.jpg
   [34]Open in a new tab

Highlights

     * •
       A protocol to characterize transcriptionally active microbes (TAMs)
       using RNA-seq
     * •
       Steps for RNA-seq library preparation, sequencing, and
       metatranscriptomic analysis
     * •
       Detailed procedure for taxonomic and functional classification of
       microbes
     * •
       Correlation between bacterial species and the expressed host genes
     __________________________________________________________________

   Publisher’s note: Undertaking any experimental protocol requires
   adherence to local institutional guidelines for laboratory safety and
   ethics.
     __________________________________________________________________

   The elucidation of the role of microorganisms in human infections has
   been hindered by difficulties using conventional culture-based
   techniques. Here, we present a protocol for the investigation of
   transcriptionally active microbes (TAMs) using an RNA sequencing
   (RNA-seq)-based approach. We describe the steps for RNA isolation,
   viral genome sequencing, RNA-seq library preparation, and
   metatranscriptomic and transcriptomic analysis. This protocol permits a
   comprehensive evaluation of TAMs’ contributions to the differential
   severity of infectious diseases, with a particular focus on diseases
   such as COVID-19.

Before you begin

   Comprehending the significance of TAMs in influencing the course or
   outcome of a disease represents a crucial aspect of clinical research
   that has not received in-depth examination thus far. This protocol
   outlines the precise steps required for conducting a functional
   metagenomics study among the various SARS-CoV-2 variants of concern
   (VOCs) from the nasopharyngeal swab isolated from the COVID-19 positive
   patients. These same procedures are applicable for investigating the
   metagenomic profile of any other sample/disease type. In case of blood
   samples, it is imperative to first eliminate globin mRNA along with the
   ribosomal RNA before proceeding with the preparation of the RNA-seq
   library.

Nasopharyngeal swab collection (performed by a trained healthcare provider,
only)

   The trained paramedical staff at the MAX Hospital, Delhi collect the
   nasopharyngeal swabs of patients on the day of reporting to the
   hospital. Put the tip of the swab into a vial containing 3 mL of Viral
   Transport Media (VTM) (HiViral Transport Kit, HiMedia, Cat. No:
   MS2760A-50NO), by breaking the applicator’s stick and sealing the tube
   tightly. Vortex the tube for 2 min to allow the dissolution of the
   sample into the VTM solution followed by centrifugation and allow it to
   settle for some time before processing.

Preparation of running the codes

Guppy installation

     Inline graphic Timing: 10 min

     * 1.
       Login to Nanopore community website
       ([35]https://community.nanoporetech.com) using nanopore login
       credential, login id and password
       ([36]https://community.nanoporetech.com/).
     * 2.
       Direct to “Software downloads” page from right selection options
       ([37]https://community.nanoporetech.com/downloads).
     * 3.
       Download Guppy (latest v6.5.7) for Linux 64-bit GPU download
       option.
     * 4.
       Install Guppy to GPU enabled server.

R and RStudio installation

     Inline graphic Timing: 20–30 min

     * 5.
       The latest version of R can be found at
       [38]https://cran.r-project.org/. This is the official website for
       the R programming language, where you can download the latest
       version and related packages.
     * 6.
       The installation of RStudio is not mandatory to follow the
       protocol. However, it makes viewing and interacting with files,
       packages, objects in the environment, tables, and graphs easy.
       RStudio can be found at
       [39]https://posit.co/download/rstudio-desktop/ and can greatly
       enhance your R programming experience by providing a user-friendly
       interface for working with R scripts and data.

Installation of tools/softwares/packages before analysis

     Inline graphic Timing: 1 h

   conda install -c bioconda::fastqc

   conda install -c bioconda::trimmomatic

   conda install -c bioconda::hisat2

   conda install -c bioconda::samtools

   conda install -c bioconda::kraken2

   conda install -c bioconda::bracken

   conda install -c bioconda::humann

     Note: Set up the “conda” package manager using the official link
     [40]https://conda.io/projects/conda/en/latest/user-guide/getting-sta
     rted.html for creating and activating the conda environment.

Key resources table

   REAGENT or RESOURCE SOURCE IDENTIFIER
   Critical commercial assays
     __________________________________________________________________

   Viral transport medium (VTM) HiViral Transport Kit, HiMedia Cat. No.
   MS2760A-50NO
   Viral RNA extraction QIAmp viral mini kit, QIAGEN Cat. No. 52906
   TRUPCR SARS-CoV-2 kit 3B BlackBio Biotech India Ltd. Cat. No. 3B304
   LunaScript RT SuperMix Kit New England Biolabs (NEB) Cat. No. E3010
   Q5 Hot Start High-Fidelity 2X master mix NEB Cat. No. M0494
   NEBNext Ultra II End Repair/dA – tailing module NEB Cat. No. E7546
   Blunt/TA ligase master mix NEB Cat. No. M0367
   NEBNext Quick ligation module NEB Cat. No. E6056
   Ligation Sequencing Kit Oxford Nanopore Technologies SQK-LSK109
   Native Barcoding Expansion 96 Oxford Nanopore Technologies EXP-NBD196
   Sequencing flow cells Oxford Nanopore Technologies FLO-MIN106 (R9.4.1)
   Flow Cell Wash Kit Oxford Nanopore Technologies EXP-WSH004
   TruSeq Stranded Total RNA Library Prep Gold Illumina Cat. No. 20020599
   IDT for Illumina – TruSeq RNA UD indexes (96 indexes, 96 samples)
   Illumina Cat. No. 20022371
   AMPure XP Beckman Coulter Cat. No. A63881
   Agencourt RNAClean XP Kit Beckman Coulter Cat. No. A63987
   Qubit dsDNA HS Assay Kit Symbio (Thermo Fisher Scientific) Cat. No.
   [41]Q32854
   Bioanalyzer DNA high sensitivity kit Agilent Cat. No. 5067-4626
     __________________________________________________________________

   Deposited data
     __________________________________________________________________

   RNA-seq data This study, NCBI Sequence Read Archive (SRA) database NCBI
   SRA: PRJNA676016, PRJNA678831 (Pre-VOC), PRJN868733, PRJNA952815 (VOCs)
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   R (v4.3.3) CRAN [42]https://cran.r-project.org/bin/windows/base/
   RStudio Posit [43]https://posit.co/download/rstudio-desktop/
   bcl2fastq (v2.19.0.316) GitHub [44]https://github.com/brwnj/bcl2fastq
   Guppy (v2.1.0) Oxford Nanopore Technologies
   [45]https://community.nanoporetech.com/
   Minimap2 (v2.17) Li, Heng[46]^2 [47]https://github.com/lh3/minimap2
   Nanopolish (v0.14.1) Loman et al.[48]^3
   [49]https://github.com/jts/nanopolish
   FastQC (v0.12.0) GitHub [50]https://github.com/s-andrews/FastQC
   Trimmomatic (v0.39) Bolger et al.[51]^4
   [52]http://www.usadellab.org/cms/?page=trimmomatic
   HISAT2 Kim et al.[53]^5 [54]https://github.com/DaehwanKimLab/hisat2
   Samtools (1.17) Danecek et al.[55]^6 [56]https://github.com/samtools/
   Kraken2 (v2.0.8) Wood et al.[57]^7
   [58]https://github.com/DerrickWood/kraken2/wiki
   Bracken2 (v2.7.0) Lu et al.[59]^8
   [60]https://github.com/jenniferlu717/Bracken
   HUMAnN3 Beghini et al.[61]^9
   [62]https://github.com/biobakery/humann?tab=readme-ov-file
   metagenomeSeq (v3.16) GitHub
   [63]https://github.com/HCBravoLab/metagenomeSeq
   Vegan (v2.6–2) & Phyloseq (v1.40.0) McMurdie, Dixon
   et al.[64]^10^,[65]^11
   [66]https://cran.r-project.org/web/packages/vegan/vegan.pdf &
   [67]https://joey711.github.io/phyloseq/tutorials-index.html
   DESeq2 (v1.38.3) Love et al.[68]^12
   [69]https://www.bioconductor.org/packages/release/bioc/html/DESeq2.html
   clusterProfiler (v4.8.3) Guangchuang et al.[70]^13
   [71]https://bioconductor.org/packages/release/bioc/html/clusterProfiler
   .html
   ggplot2 (v3.4.2) CRAN
   [72]https://cran.r-project.org/web/packages/ggplot2/index.html
     __________________________________________________________________

   Others
     __________________________________________________________________

   KEGG Kanehisa and Goto[73]^14 [74]https://www.genome.jp/kegg/
   Centrifuge Eppendorf 5810R refrigerated centrifuge Cat. No. 5810R
   ROTOSPIN rotary mixer Tarsons Cat. No. 3090X
   Magnetic stand-96 Invitrogen Cat. No. 12027
   NanoDrop 2000 spectrophotometer Thermo Fisher Scientific Cat. No.
   ND2000CLAPTOP
   Qubit 4.0 fluorometer Thermo Fisher Scientific Cat. No. [75]Q33240
   MinION Mk1C Oxford Nanopore Technologies MIN-101C
   Bioanalyzer 2100 instrument Agilent Technologies Cat. No. G2939BA
   NextSeq 2000 system Illumina Cat. No. 20038897
   [76]Open in a new tab

Materials and equipment

   All resources, materials, and software are listed in the [77]key
   resources table.

Bioinformatics analysis

   All bioinformatics analyses have been carried out on the CSIR-IGIB’s
   high-performance GPU-linux based scientific computing server (CentOS
   7.7) and on a local workstation. A basic knowledge of scripting
   languages (bash, R, and Python) is required to understand and apply
   this protocol. Software and scripts used in this protocol are provided
   in the “Software and algorithms” section of the [78]key resources
   table.

Computational requirements for analysis

   Basic installation requirements:
     * •
       Memory = 128 GB.
     * •
       Cores = 4.
     * •
       compute nodes = 40.
     * •
       Operating system (Linux or Mac).
     * •
       GPUs (V100)-Linux-based (CentOS 7.7) compute cluster.

     Note: A computer with a linux and network connection is required.
     The RAM requirement depends on the number of samples to be analyzed.
     16 GB RAM should be sufficient for an initial analysis. On the basis
     of availability of RAM, analysis time may fluctuate (fast).

Step-by-step method details

RNA isolation

     Inline graphic Timing: 2 h

   This major step describes the isolation of RNA from the nasopharyngeal
   swab samples collected from hospital admitted COVID-19 positive
   patients. RNA is extracted using commercially available RNA extraction
   kits (QIAmp viral mini kit, Qiagen, Cat. No. 52906), in accordance with
   the kit protocol (QIAamp Viral RNA Mini Handbook).
     * 1.
       Lyse the sample (VTM).
          + a.
            Take 200 μL of VTM solution in a 1.5 mL microcentrifuge tube.
          + b.
            Add 560 μL prepared Buffer AVL containing carrier RNA.
          + c.
            Mix thoroughly by pulse-vortexing for 15 s to ensure efficient
            lysis.
          + d.
            Incubate at room temperature for 10 min.

     Inline graphic CRITICAL: Buffer AVL–carrier RNA should be prepared
     fresh, and is stable at 2°C–8°C for up to 48 hours. This solution
     develops a precipitate when stored at 2°C–8°C that must be
     re-dissolved by warming at 80°C before use.

     Note: Add Buffer AVE to the tube containing lyophilized carrier RNA
     to obtain a solution of 1 μg/μL (i.e., add 310 μL Buffer AVE to
     310 μg lyophilized carrier RNA). Dissolve the carrier RNA
     thoroughly, divide it into conveniently sized aliquots, and store it
     at −20°C. Do not freeze–thaw the aliquots of carrier RNA more than 3
     times.

     * 2.
       Binding of the viral RNA to the QIAamp membrane.
          + a.
            Add 560 μL 96% ethanol to the sample.
          + b.
            Mix by pulse-vortexing for 15 s.
          + c.
            Carefully apply 630 μL of the solution to the QIAamp Mini
            column.
          + d.
            Close the cap, and centrifuge at 6000 × g (8000 rpm) for
            1 min.
          + e.
            Place the QIAamp Mini column into a clean 2 mL collection
            tube, and discard the tube containing the filtrate.
          + f.
            Repeat this step until all of the lysate has been loaded onto
            the spin column.

     Inline graphic CRITICAL: Use only ethanol, since other alcohols may
     result in reduced RNA yield and purity. Only use freshly prepared
     ethanol.

     * 3.
       Wash.
          + a.
            Carefully open the QIAamp Mini column, and add 500 μL Buffer
            AW1.
          + b.
            Close the cap and centrifuge at 6000 × g (8000 rpm) for 1 min.
          + c.
            Place the QIAamp Mini column in a clean 2 mL collection tube
            and discard the tube containing the filtrate.
          + d.
            Carefully open the QIAamp Mini column and add 500 μL Buffer
            AW2.
          + e.
            Close the cap and centrifuge at full speed (20,000 × g;
            14,000 rpm) for 3 min.

     Note: Buffer AW1 and AW2 are supplied as a concentrate. Before using
     for the first time, add the appropriate amount of ethanol (96%–100%)
     to Buffer concentrates.

     * 4.
       Dry Spin.
          + a.
            Place the QIAamp Mini column in a new 2 mL collection tube,
            and discard the old collection tube with the filtrate.
          + b.
            Centrifuge at full speed for 1 min.

     Inline graphic CRITICAL: Dry spin is always recommended since
     residual Buffer AW2 in the eluate may cause problems in downstream
     applications.

     * 5.
       Elute.
          + a.
            Place the QIAamp Mini column in a clean 1.5 mL microcentrifuge
            tube. Discard the old collection tube containing the filtrate.
          + b.
            Carefully open the QIAamp Mini column and add 60 μL Buffer AVE
            equilibrated to room temperature.
          + c.
            Close the cap and incubate at room temperature for 1 min.
          + d.
            Centrifuge at 6000 × g (8000 rpm) for 1 min.
     * 6.
       Quantify the isolated RNA using a NanoDrop Spectrophotometer taking
       1 μL of the sample.
     * 7.
       Store the viral RNA at −20°C or −80°C for long term storage.

     Inline graphic Pause point: RNA collected can be stored at −80°C
     until library preparation. However, RNA is prone to degradation and
     samples should be used as soon as possible. To avoid possible batch
     effects due to different storage times, RNA from all study groups
     (Pre-VOCs, VOCs: Delta and Omicron) need to be included in each
     batch.

Viral detection and quantification

     Inline graphic Timing: 2 h

   We perform a real-time reverse transcription polymerase chain reaction
   (RT-PCR) test for SARS-CoV-2 detection and quantification using TRUPCR
   SARS-CoV-2 kit (3B BlackBio Biotech India Ltd.).
     * 8.
       Thaw the necessary kit (HiScribe T7 ARCA mRNA Kit (with tailing))
       components, mix and pulse-spin in microfuge to collect solutions to
       the bottoms of tubes.
     * 9.
       Reaction preparation.
          + a.
            Prepare the reaction mix as follows:

                Reagent      For 1 rxn. Storage
            Master mix       10 μL      −20°C
            Enzyme mix       0.35 μL    −20°C
            Primer probe mix 4.65 μL    −20°C
            Total            15 μL      -
            [79]Open in a new tab
          + b.
            Transfer 15 μL of the above prepared Reaction mix into a
            0.2 mL PCR tube.
          + c.
            Add 10 μL of RNA sample, positive control, or negative control
            to make a final volume of 25 μL.
     * 10.
       Program set up.
          + a.
            Define the following setting for temperature profile.

   Stop Temperature, °C  Time  Dye acquisition Cycles
   1    50              15 min -               1
   2    95              05 min -               1
   3    95              05 s   -               38
   4    60              40 s   Yes
   5    72              15 s   -
   [80]Open in a new tab

     Note: Choose passive reference dye as “None”.

     * 11.
       Channel selection.
          + a.
            Define the following setting for channel selection.

   Detector name       Reporter       Quencher
   E gene        Texas Red/Orange/ROX None
   RNase P       HEX/Yellow/VIC       None
   RdRp + N gene FAM/Green            None
   [81]Open in a new tab
     * 12.
       Result analysis: We consider the cycle threshold (Ct) value of 35
       for interpretation of the results.

SARS-CoV-2 whole genome sequencing

     Inline graphic Timing: 16 h

   COVID-19 positive RNAs are subjected to sequencing, which allows the
   identification of SARS-CoV-2 variants for further classification of
   samples into Pre-VOCs and VOCs (Delta and Omicron).

     Note: Prepare the sequencing library following the recommendation of
     Oxford Nanopore Native Barcoding protocol.

     * 13.
       Reverse transcription.
          + a.
            Proceed with first strand cDNA generation from the isolated
            RNA using the following reaction mix.

                              Reagent                    Volume/well Storage
            RNA sample (∼100 ng)                         11 μL       −20°C
            60 μL random hexamers and anchored polyT(23) 1 μL        −20°C
            10 mM dNTPs                                  1 μL        −20°C
            Total                                        13 μL       -
            [82]Open in a new tab
          + b.
            Do pipette mixing and give a short spin.
          + c.
            Put the reaction plate for incubation at 65°C for 5 min, and
            then snap cool on ice for 1 min followed by first strand cDNA
            synthesis.
     * 14.
       First strand cDNA synthesis.
          + a.
            Add the following reagents to the annealed template RNA.

                          Reagent                Volume/well Storage
            5X SuperScript IV buffer             4 μL        −20°C
            100 mM DTT                           1 μL        −20°C
            RNaseOUT RNase inhibitor             1 μL        −20°C
            SuperScript IV Reverse Transcriptase 1 μL        −20°C
            Total                                7 μL        -
            [83]Open in a new tab
          + b.
            Mix using pipette and spin down the residual volume.
          + c.
            Put the reaction plate in the thermal cycler for incubation as
            per using the below program.

            Steps Temperature  Time  Cycles
                  23°C        10 min 1
                  50°C        10 min 1
                  80°C        10 min 1
            Hold  4°C         forever
            [84]Open in a new tab
          + d.
            To degrade the RNA strand attached to the cDNA, add 1 μL of
            RNase H to each well.
          + e.
            Mix properly and incubate at 37°C for 20 min.
     * 15.
       Second strand cDNA synthesis.
          + a.
            Prepare the following master mix and add it to cDNA.

                 Reagent      Volume/well Storage
            cDNA              21 μL       −20°C
            Random primers    1 μL        −20°C
            10 mM dNTPs       1 μL        −20°C
            10X Klenow Buffer 1 μL        −20°C
            Total             25 μL       -
            [85]Open in a new tab
          + b.
            Mix by vortexing, spin down the residual volume.
          + c.
            Incubate the plate at 95°C for 3 min and snap cooled on ice
            for >1 min.
          + d.
            Add 1 μL of Klenow Fragment to each well, followed by
            vortexing and spinning down the residual volume.
          + e.
            Incubate the plate in a thermal cycler as per the program
            below.

            Steps Temperature  Time  Cycles
                  37°C        60 min 1
                  75°C        10 min 1
            Hold  4°C         forever
            [86]Open in a new tab
     * 16.
       Double stranded cDNA purification.
          + a.
            Bead Addition and Mixing: Add 45 μL of AMPure XP beads to the
            double-stranded cDNA and ensure thorough mixing by vortexing.
          + b.
            Incubation: Incubate the mixture at room temperature for
            15 min to allow binding of cDNA to the beads.
          + c.
            Magnetic Separation: Place the plate on a magnetic stand for
            5 min, allowing the beads to settle on the magnet’s side.
            Carefully remove the clear supernatant without disturbing the
            beads.
          + d.
            Ethanol Wash: Wash the bead-bound cDNA twice with 100 μL of
            80% ethanol to remove impurities. Ensure that residual ethanol
            is removed by spinning the plate.
          + e.
            Air Drying: Allow the plate to air dry, ensuring that all
            residual ethanol evaporates.
          + f.
            Resuspension: Resuspend the bead-bound cDNA in 26 μL of
            resuspension buffer and incubate for 2 min at room
            temperature.
          + g.
            Elution: Carefully elute 25 μL of purified double-stranded
            cDNA from the beads.
          + h.
            Concentration Measurement: Measure the concentration of the
            purified cDNA product using a NanoDrop or similar instrument.
     * 17.
       PCR tiling of SARS-CoV-2 virus.
       We amplify the SARS-CoV-2 genome from the double-stranded cDNA
       using ARTIC V3 primers specific for SARS-CoV-2.

     Note: Divide these primers into two pools: Pool A and Pool B, and
     dilute to a concentration of 10 nM before being utilized in the
     subsequent step.
          + a.
            Prepare the reaction mixture according to the following
            instructions.
            PCR reaction master mix

                Reagent                  Volume (pool A) Volume (pool B) Storage
Purified double stranded cDNA (∼100 ng)  x μl            x μl            −20°C
Q5 Hot Start High-Fidelity 2X Master Mix 12.5 μL         12.5 μL         −20°C
V2 Primer pool at 10 μM (A or B)         3.7 μL          3.7 μL          −20°C
V3 Primer pool at 10 μM (A or B)         0.45 μL         0.38 μL         −20°C
Nuclease-free water                      8.35-x μl       8.42-x μl       RT
Total                                    25 μL           25 μL           -
            [87]Open in a new tab
          + b.
            Gently mix the reaction plate by pipetting and incubate in a
            thermal cycler as per the program below.
            PCR cycling conditions

                     Steps          Temperature Time  Cycles
            Initial Denaturation    98°C        30 s 1
            Denaturation            98°C        15 s 35 cycles
            Annealing and Extension 65°C        5 min
            Hold                    4°C         forever
            [88]Open in a new tab
          + c.
            Briefly spin the plate after PCR reaction and perform the
            following steps:
               o i.
                 Centrifugation: Centrifuge the plate once before
                 proceeding to the purification step.
               o ii.
                 Combining Pools: Combine the reactions from Pool A and
                 Pool B into a new deep 96-well plate, with one well
                 designated for each sample. Then, add 50 μL of
                 resuspended AMPure XP beads to each well.
               o iii.
                 Incubation: Incubate the plate at room temperature for
                 10 min.
               o iv.
                 Magnetic Separation: Subsequently, transfer the plate to
                 a magnetic stand and wait for 5 min. Once the beads have
                 completely moved to the side of the magnet, carefully
                 discard the supernatant.
               o v.
                 Ethanol Wash: Wash the pellet with 200 μL of 80% ethanol
                 twice to remove any impurities. Afterwards, remove the
                 residual ethanol by spinning the plate, and allow it to
                 air dry.
               o vi.
                 Elution: Elute the amplified product using 15 μL of
                 nuclease-free water.
               o vii.
                 Dilution and Quantification: After diluting the eluted
                 product in 1:10 ratio, quantify using the Qubit High
                 Sensitivity dsDNA assay kit.
     * 18.
       End-prep.
       In order to add the barcodes to the amplicons, short PolyA should
       be added to the end of the amplicons.
          + a.
            Prepare the following reaction mix:

                         Reagent                 Volume     Storage
            cDNA (∼200 ng)                    x μL          −20°C
            Nuclease-free water               (12.5 - x) μL −20°C
            Ultra II End-prep reaction buffer 1.75 μL       −20°C
            Ultra II End-prep enzyme mix      0.75 μL       −20°C
            Total                             15 μL         -
            [89]Open in a new tab
          + b.
            Gently mix the reaction by pipetting up and down 8–10 times
            and spin down the plate.
          + c.
            Incubate at 20°C for 5 min and 65°C for 5 min in a thermal
            cycler.
          + d.
            Spin the plate.
     * 19.
       Native barcode ligation.
       To the end-prepped DNA, unique barcode sequences are added.
          + a.
            Spin down the Native Barcode plate in a microplate centrifuge.
          + b.
            Based on the number of samples chosen for sequencing, prepare
            the following reaction mix:

                          Reagent                Volume Storage
            End-prepped DNA                      5.5 μL −20°C
            Nuclease-free water                  1.5 μL RT
            Native Barcode                       2.5 μL −20°C
            NEBNext Ultra II Ligation Master Mix 10 μL  −20°C
            NEBNext Ligation Enhancer            0.5 μL −20°C
            Total                                15 μL  -
            [90]Open in a new tab
          + c.
            Gently mix the reaction by pipetting and incubate at 20°C for
            20 min and at 65°C for 10 min in a thermal cycler.
          + d.
            Spin down the plate.
     * 20.
       Pooling Barcoded Samples: Combine the barcoded samples into one
       tube. Add 8 μL of AMPure XP beads to the tube. Incubate at room
       temperature for 10 min on a hula mixer.
     * 21.
       Magnetic Separation: Transfer the tube to a magnetic stand for
       5 min. Carefully remove the clear supernatant once the beads have
       moved towards the magnet completely.
     * 22.
       Washing: Resuspend the pellet in 700 μL of Short Fragment Buffer
       (SFB). Return the tube to the magnet, and remove the supernatant.
       Repeat this step for a total of two washes.
     * 23.
       Ethanol Wash: Wash the pellet with 100 μL of 80% ethanol.
     * 24.
       Elution: Elute the pooled barcoded sample with 35 μL of
       nuclease-free water.
     * 25.
       Quantification: Quantify the pooled barcoded sample using the Qubit
       High Sensitivity dsDNA assay kit.
     * 26.
       Add the sequencing adapter to the pooled barcoded sample as per the
       following reaction:

                         Reagent                   Volume Storage
       End-prepped DNA                             5.5 μL −20°C
       Adapter Mix II (AMII)                       5 μL   −20°C
       NEBNext Quick Ligation Reaction Buffer (5X) 10 μL  −20°C
       Quick T4 DNA Ligase                         5 μL   −20°C
       Total                                       15 μL  -
       [91]Open in a new tab
          + a.
            Gently pipette mix and spin the tube.
          + b.
            Incubate the sample at room temperature for 20 min.
     * 27.
       Purification: Purify the final library using 20 μL AMPure XP beads.
       Following separation of beads from the library by placing onto the
       magnetic stand, wash the pellet twice using 125 μL Short Fragment
       Buffer (SFB) and elute with 15 μL Elution Buffer.
     * 28.
       Quantification: Quantify the final library using 1 μL of library in
       Qubit High sensitivity dsDNA assay kit.
          + a.
            If concentration is above 100 ng/μL, dilute with EB buffer for
            a final concentration of 50 ng/μL in a volume of 13 μL.
            Measure the concentration again using 1 μL of sample in Qubit.
     * 29.
       Priming and loading the SpotON flow cell.
          + a.
            Allow the flow cell to come to room temperature (∼ 24°C) for
            at least 5 min before inserting it into the MinION Mk1B
            device.
          + b.
            Determine the number of active pores in the flow cell by
            selecting “Check flow cell” in MinKNOW.
          + c.
            Open the priming port and check for a small air bubble under
            the cover.
               o i.
                 Draw back a small volume to remove any bubble:
               o ii.
                 Insert a P1000 tip into the priming port.
               o iii.
                 Turn the pipette’s thumbwheel anti-clockwise until the
                 dial shows a gain of no more than 20–30 μL or until a
                 small volume of storage buffer (yellow color) is entering
                 the pipette tip.
          + d.
            Prepare the flow cell priming mix by adding 30 μL of FLT to
            the entire tube (1,170 μL) of FB, and mix by pipetting, being
            careful to avoid foaming/air bubbles.
          + e.
            The priming process involved two steps:
               o i.
                 First Priming -
                 Open the priming port.
                 Slowly, load a total of 800 μL of the priming solution
                 into the flow cell.
                 Allow the flow cell to incubate at room temperature for
                 5 min.
               o ii.
                 Second Priming:
                 For the second priming step, open the SpotON port as
                 well.
                 Add 200 μL of priming solution into the flow cell.
                 These steps ensure that the MinION flow cell is properly
                 primed and ready for sequencing.
     * 30.
       Sequencing run.
          + a.
            The final library was prepared for loading as per the formula
            mentioned below.

                                Reagent                      Volume  Storage
            Sequencing Buffer (SQB)                          37.5 μL −20°C
            Loading Beads (LB), mixed immediately before use 25.5 μL −20°C
            DNA library (100 ng)                             12 μL   −20°C
            Total                                            75 μL   -
            [92]Open in a new tab
          + b.
            Mix the entire library very slowly by pipetting, followed by
            loading into the flow cell through the SpotON port in a
            dropwise manner.
          + c.
            Configure the MinION Mk1c software for sequencing, selecting
            the high-accuracy base calling option. This configuration was
            done after providing the relevant information about the
            library preparation kit, including SQK-LSK109, EXP-NBD104, and
            EXP-NBD114.
          + d.
            Allow the sequencing process to continue until a sufficient
            amount of data has been generated, and subsequently, stop the
            sequencing run.
          + e.
            Once the sequencing run has been stopped, wash the flow cell
            using a Flow Cell wash kit, and QC it to confirm if there are
            still a sufficient number of functional pores on the flow cell
            for another sequencing run.
          + f.
            Following this, store the flow cell at 4°C to preserve its
            integrity, and transfer the generated sequencing data to an
            HPC cluster for further data analysis.

Data analysis and interpretation for SARS-CoV-2 genome sequencing

     Inline graphic Timing: 5–6 h

   This step explains the detailed description of the analysis methods
   used for interpretation of the SARS-CoV-2 variant.
     * 31.
       Data transfer from nanopore sequencing device (Mk1B/Mk1C).
          + a.
            The data is stored in the device’s “/data/” folder with the
            run name.
          + b.
            Transfer the data using “scp” (relatively slow) or “rsync”
            (fast copying) tool.

     Note: It can copy locally, to/from another host over any remote
     shell, or to/from a remote rsync daemon.

     * 32.
       Software Setup for SARS-CoV-2 genome analysis.
          + a.
            Operating System Requirement: A 64-bit UNIX, Linux, or a
            similar environment. This includes Mac OS, Linux distributions
            such as Ubuntu 16 or later, or Windows 10 Subsystem for Linux.
          + b.
            Create a custom conda environment: Install the relevant
            version of Miniconda (recommended as it is relatively small
            and quick to install) following the installation guidelines at
            [93]www.conda.io/projects/conda/en/latest/user-guide/install/i
            ndex.html.
          + c.
            Install ARTIC nCoV-2019 specific data and software repository:
            Update the environment using the following command.

   git clone [94]https://github.com/artic-network/artic-ncov2019.git

   cd artic-ncov2019

   conda env remove -n artic-ncov2019

   conda env create -f environment.yml
     * 33.
       Make a new directory for analysis: Give a meaningful name to the
       analysis directory e.g., analysis/run_name.

   mkdir analysis

   cd analysis

   mkdir run_name

   cd run_name

   mkdir basecalling demultiplexing guppyplex nanopolish
     * 34.
       Activate the ARTIC environment: Perform all steps in this tutorial
       in the artic-ncov2019 conda environment:

   source activate artic-ncov2019.
     * 35.
       Basecalling with Guppy:

   guppy_basecaller -c dna_r9.4.1_450bps_fast.cfg -i /path/to/Runfolder/
   -s ./basecalling/ -x "cuda:1" -r.

   [95]graphic file with name fx5.jpg
   [96]Open in a new tab
     * 36.
       Demultiplexing with Guppy:

   guppy_barcoder --require_barcodes_both_ends -i ./basecalling/ -s
   ./demultiplexing/ --arrangements_files "barcode_arrs_nb12.cfg
   barcode_arrs_nb24.cfg" -x "cuda:1"

   [97]graphic file with name fx6.jpg
   [98]Open in a new tab
     * 37.
       Read filtering: Perform this step for individual barcodes.

   cd guppyplex

   artic guppyplex --skip-quality-check –directory
   ../demultiplexing/barcode01.
     * 38.
       Run the ARTIC MinION pipeline: Perform this step for individual
       barcodes.

   artic minion --normalise 200 --scheme-directory
   /home/ranjeet/nanopore/artic-ncov2019/primer_schemes/ --read-file
   ../guppyplex/barcode01.fastq --fast5-directory
   ../path/to/Sequencing_run/fast5_files/ --sequencing-summary
   ../path/to/sequencing_summary.txt nCoV-2019/V3 Sample_name

   #Where,

   --scheme-directory: specifies the directory containing primer schemes
   for the analysis.

   --read-file: specifies the location of the input FASTQ file containing
   the sequencing reads.

   --fast5-directory: specifies the directory containing the raw FAST5
   files generated by the sequencer.

   --sequencing-summary: specifies the location of the sequencing summary
   file. This file contains information about the sequencing run, such as
   the number of reads generated, quality metrics.

   “nCoV-2019/V3”: specifies the primer scheme specific to the novel
   coronavirus(nCoV-2019) targeting version 3 of the genome.

   “Sample_name”: specifies the name of the sample being analyzed.

   Replace /path/to/ with the actual paths to the relevant directories and
   files on system.

     Note: In the above code, “--normalise 200” parameter is used to
     normalize coverage depth to 200X during data analysis. This
     adjustment aims to enhance the robustness of results by ensuring a
     more consistent coverage across the sequenced regions.

   Output files
   File name Description
   samplename.rg.primertrimmed.bam BAM file for visualisation after
   primer-binding site trimming
   samplename.trimmed.bam BAM file with the primers left on (used in
   variant calling)
   samplename.merged.vcf all detected variants in VCF format
   samplename.pass.vcf detected variants in VCF format passing quality
   filter
   samplename.fail.vcf detected variants in VCF format failing quality
   filter
   samplename.primers.vcf detected variants falling in primer-binding
   regions
   samplename.consensus.fasta consensus sequence
   [99]Open in a new tab
     * 39.
       Merge the Variant call files: Merge multiple variant call format
       (vcf) files into a single VCF file.

   bcftools merge --force-samples ∗.vcf.gz > all_sample.vcf.
     * 40.
       Convert the barcode name with Sample name: Perform this step for
       individual barcodes.

   for filename in barcode01∗; do echo mv ∖"$filename∖"
   ∖"${filename//barcode01/IGIB1130001}∖"; done | /bin/bash.

   # Here, “| /bin/bash/” pipes the printed 'mv' commands into the Bash
   shell (/bin/bash) for execution.

   # An alternative approach using 'rename' command for converting barcode
   names to sample names for each sample. rename -v
   's/barcode01/IGIB1130001/' ∗
     * 41.
       Change of SARS-CoV-2 header with Institute tag name for all the
       samples

   for i in ∗.fasta;do temp = ${i%.∗};echo "sed
   '1s/>[100]MN908947.3/>${temp}_CSIR-IGIB/g'
   $i > ./fastachanged/$temp.fasta";done > name.sh

   bash name.sh.

   # An alternative way to change of SARS-CoV-2 header with sample tag
   name for each samples individually.

   sed '1s/>[101]MN908947.3/>${temp}_CSIR-IGIB/g'
   barcode01.fasta > ./fastachanged/barcode01.consensus.fasta

   sed '1s/>[102]MN908947.3/>${temp}_CSIR-IGIB/g'
   barcode02.fasta > ./fastachanged/barcode02.consensus.fasta

   sed '1s/>[103]MN908947.3/>${temp}_CSIR-IGIB/g'
   barcode03.fasta > ./fastachanged/barcode03.consensus.fasta.
     * 42.
       Concatenate all fasta to one file.

   cat ∗.consensus.fasta > concatenated_consensus.fasta.
     * 43.
       Upload the concatenated fasta files to the Nextclade and Pangolin
       webtool for clade and lineage classification respectively
       ([104]Figure 1).

Figure 1.

   [105]Figure 1
   [106]Open in a new tab

   Clade and Lineage classification

   (A) Nextclade webtool for clade and (B) Pangolin webtool for lineage
   classification.

RNA-seq library preparation

     Inline graphic Timing: 13–14 h

     Inline graphic Timing: 15 min (for step 44)

     Inline graphic Timing: 1 h 30 min (for step 45)

     Inline graphic Timing: ∼ 50 min with handling time (for step 46)

     Inline graphic Timing: ∼ 2 h with handling time (for step 47)

     Inline graphic Timing: ∼ 45 min with handling time (for step 48)

     Inline graphic Timing: ∼2 h 30 min with handling time (for step 49)

     Inline graphic Timing: ∼2 h with handling time (for step 50)

     Inline graphic Timing: 1 h 30 min (for step 51)

     Inline graphic Timing: 1 h (for step 52)

   This method explains how to convert total RNA into a library suitable
   for subsequent sequencing. Steps involve RNA quantification, ribosomal
   RNA removal, RNA fragmentation, first and second cDNA strand synthesis,
   3′ end adenylation, adapter ligation, DNA fragment enrichment, final
   library quality and quantity assessment, cluster generation, and
   sequencing of reads 1 and 2.
     * 44.
       RNA Quality Check.
          + a.
            Check the concentration and purity of the isolated RNA using
            the Nanodrop system.
          + b.
            For library preparation, take the input RNA volume according
            to the 250 ng concentration and make up the volume to 10 μL
            using nuclease-free water.
     * 45.
       Depletion of rRNA from total RNA.
       In this step, rRNA is depleted from each sample and further
       purified, fragmented and primed for cDNA synthesis.
       Bind rRNA
          + a.
            In a 96-well plate, add an input concentration of 250 ng of
            the RNA and make up the volume to 10 μL for each sample using
            NFW.
          + b.
            Add 5 μL RBB (rRNA Binding Buffer) to each well.
          + c.
            Add 5 μL RRM G (rRNA Removal Mix - Gold) specific for the
            Ribo-zero gold kit to the well.
          + d.
            Carefully mix the reagents by pipetting up and down 10 times
            (Do not vortex mix).
          + e.
            Centrifuge at 280 g for 1 min.
          + f.
            Incubate the reaction plate in the thermal cycler at 68°C for
            5 min to allow denaturation of the RNA.
          + g.
            Once the reaction is over, take the plate out and incubate at
            room temperature for 1 min.
            Remove rRNA
          + h.
            Thoroughly vortex and add 35 μL RRB (rRNA Removal Beads) to
            each well of the above plate.
          + i.
            Again, mix by pipetting up and down 10 times.
          + j.
            Incubate at room temperature for 1 min.
          + k.
            Place the plate on a magnetic stand for 1 min.
          + l.
            Transfer all the supernatant to the corresponding well of the
            96-well plate.
            Clean up RNA
          + m.
            Vortex RNAClean XP Beads until well dispersed.
          + n.
            Add 99 μL or 193 μL (if starting with degraded total RNA)
            RNAClean XP Beads to each well, and then mix thoroughly by
            pipetting up and down nearly 10 times.
          + o.
            Incubate at room temperature for 15 min.
          + p.
            Place on a magnetic stand and wait until the liquid is clear
            (∼5 min).
          + q.
            Remove and discard all of the supernatant from each well.
          + r.
            Ethanol washing:
               o i.
                 Add 200 μL freshly prepared 70% EtOH to each well to
                 wash.
               o ii.
                 Incubate on the magnetic stand for 30 s.
               o iii.
                 Remove and discard all supernatant from each well.
                 (Remove all the residual EtOH from each well).
               o iv.
                 Air-dry on the magnetic stand for 15 min.
               o v.
                 Remove from the magnetic stand.

     Note: Always use freshly prepared Ethanol. Make sure all the drops
     of ethanol are removed/ dried from the walls of the plate before
     adding ELB.
          + s.
            Elution:
               o i.
                 Add 11 μL ELB to each well, and then mix thoroughly by
                 pipetting up and down.
               o ii.
                 Incubate at room temperature for 2 min.
               o iii.
                 Centrifuge at 280 × g for 1 min.
               o iv.
                 Place on a magnetic stand and wait until the liquid is
                 clear (∼5 min).
               o v.
                 Transfer 8.5 μL supernatant to the corresponding well.
          + t.
            Add 8.5 μL EPH to each well, and then mix thoroughly by
            pipetting up and down.
          + u.
            Incubate on the thermal cycler at 94°C for 8 min and hold at
            4°C.
          + v.
            Centrifuge briefly after taking out of the thermal cycler.

     Inline graphic CRITICAL: Always avoid mixing using a vortexer during
     the complete library preparation protocol (do pipette mixing). All
     the beads must be taken out and let stand at room temperature for
     30 min (never use cold beads). Especially, rRNA Removal Beads are
     highly dense, requiring meticulous and deliberate pipetting for
     precise and accurate handling. During cleanup, transfer the
     supernatant carefully by checking under a white light or on a white
     surface for the beads on tips.
     * 46.
       First strand cDNA synthesis.
       In this step, the cleaved RNA fragments are reverse transcribed
       into the first strand cDNA.
       Check the concentration and purity of the isolated RNA using the
       Nanodrop system.
          + a.
            Add SuperScript IV at a ratio of 1 μL SuperScript IV to 9 μL
            FSA.

     Note: SuperScript IV is an enzyme and should be taken out from −20°C
     temperature at the time of use only, and in a cooler/ ice. FSA
     should be handled with precautions since it contains Actinomycin D,
     a toxin
          + b.
            Add 8 μL FSA and SuperScript II mixture to each well of the
            plate, and then mix thoroughly by pipetting up and down.
          + c.
            Centrifuge at 280 × g for 1 min.
          + d.
            Incubate on the preprogrammed thermal cycler and run the
            following “synthesize 1st Strand program”. Each well contains
            25 μL.

                    Steps         Temperature  Time  Cycles
            1^st strand synthesis 25°C        10 min 1
                                  42°C        15 min 1
                                  70°C        15 min 1
            Hold                  4°C         forever
            [107]Open in a new tab
     * 47.
       Second strand cDNA synthesis.
       This process removes the RNA template, synthesizes a replacement
       strand, and incorporates dUTP in place of dTTP to generate ds cDNA.
       The incorporation of dUTP quenches the second strand during
       amplification. Magnetic beads separate the ds cDNA from the second
       strand reaction mix. The result is blunt-ended cDNA.
          + a.
            Dilute CTE to 1:50 in RSB. For example, 2 μL CTE + 98 μL RSB.
          + b.
            Add 5 μL diluted CTE to each well. Discard diluted CTE after
            use.
          + c.
            Add 5 μL RSB to each well.
          + d.
            Centrifuge SMM at 600 × g for 5 s.
          + e.
            Add 20 μL SMM to each well, and then mix thoroughly by
            pipetting up and down 6 times.
          + f.
            Centrifuge at 280 × g for 1 min.
          + g.
            Place on the preprogrammed thermal cycler and incubate at 16°C
            for 1 h. Each well contains 50 μL.
          + h.
            Place on the bench and let stand to bring to room temperature.
          + i.
            Purify cDNA.
               o i.
                 Add 90 μL AMPure XP beads to each well of the DFP plate.
               o ii.
                 Mix thoroughly by pipetting up and down 10 times.
               o iii.
                 Incubate at room temperature for 15 min.
               o iv.
                 Centrifuge at 280 × g for 1 min.
               o v.
                 Place on a magnetic stand and wait until the liquid is
                 clear (∼5 min).
               o vi.
                 Remove and discard 135 μL supernatant from each well.
          + j.
            Two times ethanol wash.
               o i.
                 Add 200 μL fresh 80% EtOH to each well.
               o ii.
                 Incubate on the magnetic stand for 30 s.
               o iii.
                 Remove and discard all supernatant from each well.
               o iv.
                 Use a 20 μL pipette to remove residual EtOH from each
                 well.
               o v.
                 Air-dry on the magnetic stand for 15 min. Do not over dry
                 beads.
          + k.
            Elution.
               o i.
                 Remove from the magnetic stand.
               o ii.
                 Add 17.5 μL RSB to each well, and then pipette mix
                 thoroughly.
               o iii.
                 Incubate at room temperature for 2 min.
               o iv.
                 Centrifuge at 280 × g for 1 min.
               o v.
                 Place on a magnetic stand and wait until the liquid is
                 clear (∼5 min).
               o vi.
                 Transfer 15 μL supernatant to the corresponding well of
                 the plate.

     Inline graphic Pause point: If you are stopping, seal the plate and
     store at −25°C to −15°C for up to 7 days.
     * 48.
       Adenylate 3′Ends.
       One adenine (A) nucleotide is added to the 3ʹ ends of the blunt
       fragments to prevent them from ligating to each other during
       adapter ligation reaction. One corresponding thymine (T) nucleotide
       on the 3ʹ end of the adapter provides a complementary overhang for
       ligating the adapter to the fragment. This strategy ensures a low
       rate of chimera (concatenated template) formation.
          + a.
            Dilute CTA to 1:100 in RSB. For example, 1 μL CTA + 99 μL RSB.
          + b.
            Add 2.5 μL diluted CTA to each well. Discard diluted CTA after
            use.
          + c.
            Centrifuge ATL at 600 × g for 5 s.
          + d.
            Add 12.5 μL ATL to each well, and then mix thoroughly by
            pipetting up and down 10 times.
          + e.
            Seal the ALP plate with a Microseal 'B' adhesive seal.
          + f.
            Centrifuge at 280 × g for 1 min.
          + g.
            Put the samples on the thermal cycler and initiate the ATAIL70
            program. Utilize the preheat lid option, setting it to 100°C,
            and then proceed with incubation at 37°C for 30 min, followed
            by 70°C for 5 min. Hold at 4°C. Each well contains 30 μL.
          + h.
            Centrifuge at 280 × g for 1 min.
     * 49.
       Ligate Adapters.
       This process ligates multiple indexing adapters to the ends of the
       ds cDNA fragments, which prepares them for hybridization onto a
       flow cell.
          + a.
            Dilute CTL 1:100 in RSB. For example, 1 μL CTL + 99 μL RSB.
            Discard the diluted CTL after use.
          + b.
            Remove LIG from −25°C to −15°C storage.
          + c.
            Add the following reagents in the order listed to each well.
               o i.
                 Diluted CTL (2.5 μL).
               o ii.
                 LIG (2.5 μL).
               o iii.
                 RNA adapters (2.5 μL).
          + d.
            Mix thoroughly by pipetting up and down 10 times.
          + e.
            Centrifuge at 280 × g for 1 min.
          + f.
            Incubate at 30°C for 10 min on the thermal cycler.
          + g.
            Add 5 μL STL to each well, and then pipette mix thoroughly.
          + h.
            Centrifuge at 280 × g for 1 min.
          + i.
            CleanUp Ligated Fragments: This cleanup is done in 2 rounds
            with different volumes of AMPure XP beads and RSB. Perform
            steps i. to xvii using the Round 1 volumes. Repeat steps i.
            through xvii with the new plate using the Round 2 volumes.

                Reagent     Round 1 Round 2
            AMPure XP beads 42 μL   50 μL
            RSB             52.5 μL 22.5 μL
            [108]Open in a new tab
               o i.
                 Add AMPure XP beads to each well.
               o ii.
                 Mix thoroughly by pipetting up and down.
               o iii.
                 Incubate at room temperature for 15 min.
               o iv.
                 Centrifuge at 280 × g for 1 min.
               o v.
                 Place on a magnetic stand and wait until the liquid is
                 clear (2–5 min).
               o vi.
                 Remove and discard all supernatant from each well.
               o vii.
                 Wash two times with 200 μL fresh 80% EtOH by incubating
                 on the magnetic stand for 30 s.
               o viii.
                 Remove and discard all supernatant from each well.
               o ix.
                 Air-dry on the magnetic stand for 15 min.
               o x.
                 Remove from the magnetic stand.
               o xi.
                 Add RSB to each well and mix by pipetting.
               o xii.
                 Incubate at room temperature for 2 min.
               o xiii.
                 Centrifuge at 280 × g for 1 min.
               o xiv.
                 Place on a magnetic stand and wait until the liquid is
                 clear (2–5 min).
               o xv.
                 Transfer 50 μL supernatant to the corresponding well of
                 the CAP plate.
               o xvi.
                 Perform round 2 clean up and transfer 20 μL supernatant
                 to the corresponding well of the PCR plate.

     Inline graphic Pause point: If you are stopping, seal the plate and
     store at −25°C to −15°C for up to 7 days.
     * 50.
       Enrich DNA Fragments.
       This process uses PCR to selectively enrich those DNA fragments
       that have adapter molecules on both ends and to amplify the amount
       of DNA in the library. PCR is performed with PPC (PCR Primer
       Cocktail) that anneals to the ends of the adapters. Minimize the
       number of PCR cycles to avoid skewing the representation of the
       library.
          + a.
            Amplify DNA Fragments.
               o i.
                 Place the PCR plate on ice and add 5 μL PPC to each well.
               o ii.
                 Add 25 μL PMM to each well, and then mix thoroughly by
                 pipetting up and down 10 times.
               o iii.
                 Centrifuge at 280 × g for 1 min.
               o iv.
                 Place on the preprogrammed thermal cycler and run the PCR
                 program. Each well contains 50 μL.
               o v.
                 Choose the preheat lid option and set to 100°C.
                 PCR reaction master mix

                     Reagent      Amount
                 PPC              5 μL
                 PMM              25 μL
                 Purified library 20 μL
                 [109]Open in a new tab
                 PCR cycling conditions

                        Steps         Temperature Time   Cycles
                 Initial Denaturation 98°C        30 s  1
                 Denaturation         98°C        10 s  15 cycles
                 Annealing            60°C        30 s
                 Extension            72°C        30 s
                 Final extension      72°C        5 min 1
                 Hold                 4°C         forever
                 [110]Open in a new tab
          + b.
            Clean Up Amplified DNA.
               o i.
                 Centrifuge at 280 × g for 1 min.
               o ii.
                 Add 47.5 μL AMPure XP beads to each well. Adapter Type
                 Volume AMPure XP beads.
               o iii.
                 Mix thoroughly by pipetting up and down 10 times.
               o iv.
                 Incubate at room temperature for 15 min.
               o v.
                 Centrifuge at 280 × g for 1 min.
               o vi.
                 Place on a magnetic stand and wait until the liquid is
                 clear (2–5 min).
               o vii.
                 Remove and discard all supernatant from each well.
               o viii.
                 Wash two times with 200 μL fresh 80% EtOH as follows.
               o ix.
                 Incubate on the magnetic stand for 30 s.
               o x.
                 Remove and discard all supernatant from each well.
               o xi.
                 Air-dry on the magnetic stand for 15 min.
               o xii.
                 Remove from the magnetic stand and add 32.5 μL RSB to
                 each well, and then mix thoroughly by pipetting up and
                 down 10 times.
               o xiii.
                 Incubate at room temperature for 2 min.
               o xiv.
                 Centrifuge at 280 × g for 1 min.
               o xv.
                 Place on a magnetic stand and wait until the liquid is
                 clear (2–5 min).
               o xvi.
                 Transfer 30 μL supernatant to the corresponding well of
                 the TSP1 plate.

     Inline graphic Pause point: If you are stopping, seal the plate and
     store at −25°C to −15°C for up to 7 days.
     * 51.
       Quantify Libraries.
          + a.
            Use 1 μL sample to quantify all the libraries individually
            using the Qubit dsDNA HS Assay kit.
          + b.
            Check the size of the quantified libraries using a DNA 1000
            chip on an Agilent Technologies 2100 Bioanalyzer using
            Bioanalyzer High Sensitivity DNA kit.

     Note: Expect the final product to be a band at ∼280 bp. A single
     peak should be observed for the library. If an unexpected small peak
     is observed at 120–170 bp, it indicates the presence of adapter
     dimers ([111]Figure 2A). If adapter dimers are present in the
     library, perform an additional clean-up step with AMPure beads
     (second round of purification may reduce the library yields). A bead
     ratio of 0.8x to 1x is usually recommended and sufficient to remove
     the unwanted adapter dimers ([112]Figure 2B).

     Inline graphic CRITICAL: One day prior to anticipated run, remove
     cartridge from −20°C storage. Thaw at room temperature for 6 hours.
     Transfer to refrigerator 4°C, and continue to thaw for a minimum of
     12 hours for 300 cycle or smaller kits. Before beginning with
     normalization and pooling, thaw the cartridge at room temperature,
     cartridge should have air on all sides except the bottom and should
     not be stacked. The NextSeq flow cell should also be taken out from
     4°C at this point (half an hour before final loading of the library)

     * 52.
       Normalize and Pool Libraries.

     Note: For best practice, perform normalization and pooling directly
     prior to sequencing. To minimize index hopping, do not store
     libraries in the pooled form.
          + a.
            Calculate the concentration of the libraries in nM based on
            the ng/μL value using the formula: (concentration in ng/μL) /
            (660 g/mol × average library size) × 10ˆ6 = concentration in
            nM.
          + b.
            Dilute the libraries to the concentration of 4 nM.
          + c.
            Take 5 μL of each library to prepare a single pool for the
            total samples intended for the NextSeq run.
          + d.
            Quantify the pooled cDNA library using Qubit for 4 nM.
          + e.
            At this stage, do the following calculations to prepare a
            final concentration of 650 pmol.
     * 53.
       Start sequencing.
          + a.
            Mix the cartridge for sequencing on the NextSeq 2000 at
            2 × 151 read length at a final loading concentration of 650
            p.m.
          + b.
            Insert the flow cell into the cartridge.
          + c.
            Load 20 μL of the 650 pm library into the cartridge.
          + d.
            Place the cartridge into the NextSeq 2000 instrument and start
            sequencing.

Figure 2.

   [113]Figure 2
   [114]Open in a new tab

   BioAnalyzer profile for the prepared library

   (A) 2 peaks indicating the adapter dimer (140 bp) and library (280 bp).

   (B) After cleanup with beads, adapter dimer removed and only 1 peak
   indicating purified library.

Meta-transcriptome analysis

     Inline graphic Timing: 9–10 days (according to the availability of
     computational resources, may take less time)

     Inline graphic Timing: 5 h (depending on the core availability might
     run in less time also) (for step 54)

     Inline graphic Timing: ∼ 10 h (Trimmomatic) (for step 55)

     Inline graphic Timing: ∼ 2 days (depending on the core availability
     might run in less time also) (for step 56)

     Inline graphic Timing: ∼ 5 h (for step 57)

     Inline graphic Timing: ∼ 5 h (for step 58)

     Inline graphic Timing: ∼ 2 days (for steps 59–61)

     Inline graphic Timing: ∼ 4 days (for step 62)

   Submit the following batch script to run analysis for quality check,
   removal of adapters and low quality sequences, alignment, taxonomic
   classification as well as functional analysis of microbes in parallel
   to the files where paired reads were concatenated into single files if
   needed.
     * 54.
       Quality check of reads.
          + a.
            Use FastQC to see the quality of each sample.
          + b.
            Use the below code to run the samples.

   fastqc -o out_file rawdata/file1_R1.fastq.gz
     * 55.
       Remove adapter sequences.
          + a.
            For each sample, filter and trim the raw metatranscriptomic
            reads using Trimmomatic v0.39.
          + b.
            Batch script is given below to run the samples in parallel.
          + c.
            Use the below code to run the analysis with precise input and
            output files.

   #!/bin/bash.

   ##run: sbatch job_script.sh.

   #SBATCH --job-name = pallawi_job

   #SBATCH --nodes = 3 #squeue

   #SBATCH --ntasks-per-node = 35.

   #SBATCH --mem = 128 GB.

   #Memory (RAM) per node. Number followed by unit prefix.

   #SBATCH --partition = compute.

   #Partition/queue in which run the job.

   adapter = /home/pallawik/anaconda3/share/trimmomatic-0.39–2/adapters

   rawdata = /lustre/pallawik/humann3/raw_data_voc_humann/raw_data

   outdata = /lustre/pallawik/humann3/raw_data_voc_humann/Analysis

   for i in $(ls ∗.fastq.gz |rev | cut -c 13- | rev | uniq)

   do.

   #remove adapter sequences

   trimmomatic PE -threads 20 $rawdata/${i}_R1.fastq.gz
   $rawdata/${i}_R2.fastq.gz $trim_out/paired2/${i}_R1.fastq.gz
   $trim_out/unpaired2/${i}up_R1.fastq.gz
   $trim_out/paired2/${i}_R2.fastq.gz
   $trim_out/unpaired2/${i}up_R2.fastq.gz
   ILLUMINACLIP:$adapter/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3
   SLIDINGWINDOW:4:15 MINLEN:36

   done.
     * 56.
       Mapping of human RNA reads to the human reference sequence.
       Time (Index build): ∼15–20 min.
          + a.
            Download reference sequence using the following link here.
          + b.
            Install HISAT2 as given in “[115]installation of
            tools/softwares/packages before analysis” section.
          + c.
            Build index for human reference sequence using the code given
            below.
          + d.
            Map the raw reads using the HISAT2 v2.2.1 algorithm onto the
            human reference.
            # Here, the command “hisat2-build” is used to create index of
            human reference sequence.
            hisat2-build -p 16 genome-human.fa genome-human.
            # hisat2 was used to map the query sequence from human
            reference sequence. Since, hisat gives output as .sam file,
            samtools is used to sort and convert the output in .bam file
            using the code below.
            hisat2 -p 100 --mp 3 --rdg 3 -x $index_hisat −1
            $rawdata/${i}_R1.fastq.gz −2 $rawdata/${i}_R2.fastq.gz
            --summary-file $outdata/hisat2/smry/${i}.txt | $samtools view
            -bSu - | $samtools sort -o $outdata/hisat2/h2_out/${i}.bam.

     Note: The parameter values as given to “--mp” (mismatch penalty) and
     “--rdg” (read-gap penalty) are taken based on recommendation of the
     published paper by Kim et al.
     [116]https://doi.org/10.1038/s41587-019-0201-4.
          + e.
            Result HISAT2 analysis (HISAT Output).
            [117]graphic file with name fx7.jpg
            [118]Open in a new tab
     * 57.
       Identification and removal of human RNA reads from
       meta-transcriptomic (Unmapped) data.
       This step removes the aligned human RNA reads.
          + a.
            Use Samtools to generate qualified non-human RNA-seq data.
          + b.
            Rest unaligned reads are considered as microbial and used in
            taxonomic classification.
            samtools view -@ 50 -b -f 2
            $outdata/${i}.bam > mapped/${i}_mapped.bam
            samtools view -@ 50 -b -F 2
            $outdata${i}.bam > unmapped/${i}_umapped.bam.
     * 58.
       Split unmapped bam into fastq.
          + a.
            For taxonomic classification mapped reads need to split as
            fastq file.
          + b.
            Kraken2 takes paired fastq files as input.
            samtools fastq -@ 50 $outdata/unmapped/${i}_umapped.bam −1
            $outdata/bam2fastq/${i}_R1.fastq.gz −2
            $outdata/bam2fastq/${i}_R2.fastq.gz −0 /dev/null -s /dev/null
            -n.
          + c.
            Output.
            [119]graphic file with name fx8.jpg
            [120]Open in a new tab
     * 59.
       Taxonomic classification.
          + a.
            Taxonomic classification with Kraken2.
               o i.
                 Installation of kraken2.
               o ii.
                 Download database here.
                 [121]https://lomanlab.github.io/mockcommunity/mc_database
                 s.html.
               o iii.
                 Download the database from the Kraken2 website, which
                 contains bacteria, archaea, and viral reference
                 sequences.
               o iv.
                 Kraken2 map reads with taxa using k-mers from the genomic
                 database and assigns taxonomy.
               o v.
                 Run:
                 kraken2 --db $index --report
                 $outdata/kraken2/k2_report/${i}.report --output
                 $outdata/kraken2/${i}.output --gzip-compressed --paired
                 $outdata/bam2fastq/${i}_R1.fastq.gz
                 $outdata/bam2fastq/${i}_R2.fastq.gz.
               o vi.
                 Output:
                 [122]graphic file with name fx9.jpg
                 [123]Open in a new tab

     Note: Although, kraken2 provides faster and more accurate
     classification, however, it cannot assign each read to the species.
     To overcome this limitation, we analyzed Kraken2 output with
     Bracken2. The provided screenshot illustrates the increase in reads
     identified by Bracken2 compared to those assigned by Kraken2 (reads
     assigned to a specific species in a particular sample).
                 [124]graphic file with name fx10.jpg
                 [125]Open in a new tab
          + b.
            Taxonomic classification with Bracken.
               o i.
                 Bracken2 (Bayesian Re-estimation of Abundance after
                 classification with KrakEN) uses the taxonomy assigned by
                 Kraken2 to estimate the number of reads per sample that
                 come from different species, using the Kraken report
                 output file.
               o ii.
                 Bracken2 provides faster and more accurate
                 classification.
               o iii.
                 Run:
                 bracken -d $index -i $outdata/kraken2/${i}.report -o
                 $outdata/braken2/${i}.bracken.
               o iv.
                 Output:
                 [126]graphic file with name fx11.jpg
                 [127]Open in a new tab
                 Output for one complete loop
                 [128]graphic file with name fx12.jpg
                 [129]Open in a new tab
                 Downstream Analysis
     * 60.
       Normalisation.
          + a.
            To normalize the sampling depth, use the CSS (Cumulative Sum
            Scaling: median-like quantile normalization) method from
            R-package metagenomeSeq.
          + b.
            Downloading and installing metagenomeseq.
            if (!requireNamespace("BiocManager", quietly = TRUE))
            install.packages("BiocManager")
            BiocManager::install("metagenomeSeq")
            library(metagenomeSeq)
          + c.
            Run:
            norm_voc <- read.csv(file = "voc/kraken/bracken-out.csv")
            norm_voc_1 <- norm[ ,c(2:ncol(norm_voc))]
            metaSeqObject = newMRexperiment(norm_voc_1)
            metaSeqObject_CSS = cumNorm( metaSeqObject , p =
            cumNormStatFast(metaSeqObject) )
            read_count_CSS = data.frame(MRcounts(metaSeqObject_CSS, norm =
            TRUE, log = TRUE))
            rownames(read_count_CSS) <- norm$species
            write.csv(read_count_CSS, "voc/normalized_bracken_voc.csv")
     * 61.
       Alpha and Beta diversity.
          + a.
            Execute alpha and beta diversity analyses in R (4.2.0) using
            vegan (v2.6–2) (Dixon, 2009) and Phyloseq (v1.40.0).
          + b.
            For alpha diversity: Use the estimated richness function
            (Shannon’s, and Chao-1) from the phyloseq package.
          + c.
            For beta diversity: Use the phyloseq::distance from the vegan
            package to generate Bray-Curtis dissimilarity matrix and
            principal coordinate (PCoA) values.
          + d.
            After the analysis, export the data from R as a CSV file.
          + e.
            For data plotting and visualization: Use the library Seaborn
            (0.12.2) and Matplotlib (3.1) build on Numpy (1.11.0) in
            Python (3.6). The detailed scripts and documentation can be
            found in the GitHub repository:
            [130]https://github.com/pallawikumari/visualization-in-python/
            tree/V1.0.0.

   if (!require("BiocManager", quietly = TRUE))

   install.packages("BiocManager")

   BiocManager::install("phyloseq")

   install.packages("vegan")

   #Alpha diversity

   library("vegan")

   data_biom <- import_biom("kraken_out.biom", parseFunction =
   parse_taxonomy_default)

   data.alpha <- estimate_richness(data_biom)

   #Beta diversity

   braycurtis <- ordinate(physeq = data_biom, method = "PCoA", distance =
   "bray", weighted = TRUE)

   braycurtis.export <- as.data.frame(braycurtis$vectors, row.names =
   NULL, optional = FALSE, cut.names = FALSE, col.names =
   names(braycurtis.$vectors), fix.empty.names = TRUE, stringsAsFactors =
   default.stringsAsFactors())

   write.csv(braycurtis.export, file = "braycurtis-krakenout-voc.csv")

   percent_explained <- 100 ∗ braycurtis$values$Eigenvalues /
   sum(braycurtis$values$Eigenvalues)

   percent_explained[1:5]
     * 62.
       HUMAnN3 analysis.
          + a.
            Use HUMAnN3 (The Human Microbiome Project Unified Metabolic
            Analysis Network 3) analysis pipeline to illustrate the
            microbial population of the metabolic pathways based on the
            MetaCyc database.
            conda create -n METAPH
            conda activate METAPH
            conda install -c bioconda humann
          + b.
            Download the database using:
            humann_databases --download uniref uniref. 90_diamond.
          + c.
            Run:
            input = /lustre/pallawik/fastq/merge
            output = /lustre/pallawik/humann3/humann_out
            chocophyln = /lustre/pallawik/humann3/db/chocophlan
            diamond = /lustre/pallawik/humann3/db/uniref
            for i in $(ls ∗.fastq.gz |rev | cut -c 10- | rev | uniq)
            do
            humann --threads 1000 --input $input/${i}.fastq.gz --output
            $output/${i} --remove-temp-output --nucleotide-database
            $chocophyln --protein-database $diamond
            done.
          + d.
            Output:
            [131]graphic file with name fx13.jpg
            [132]Open in a new tab
          + e.
            Output Interpretation:
          + f.
            Use Humann_renorm_table to obtain normalized pathway abundance
            from RPK (Reads per kilobase) to CPM (Counts per million) and
            join tables using humann_join_tables.
            humann_renorm_table --input ${i}_pathabundance.tsv --output
            output/${i}_pathabundance_relab.tsv --units relab
            humann_join_tables --input humann-output/ --output
            humann_pathabundance.tsv --file_name pathabundance_relab
            humann_join_tables --input humann-output/ --output
            humann_pathcoverage.tsv --file_name pathcoverage
            humann_join_tables --input humann-output/ --output
            humann_genefamilies.tsv --file_name genefamilies_relab
            humann_barplot -i path_with_bac_test_barplot.tsv -f
            ANAGLYCOLYSIS-PWY -m condition --output ANAGLYCOLYSIS-PWY.png
            --remove-zeros --sort sum.

     Note: The tool (humann_barplot) is used to assess the abundance of a
     specific pathway, such as “ANAGLYCOLYSIS-PWY” as mentioned in the
     provided code block. It's important to note that this tool can be
     employed to visualize abundance of various pathways, not limited to
     just one.

Transcriptome analysis

     Inline graphic Timing: 1–2 days (according to the availability of
     computational resources, may take less time)

     * 63.
       Indexing Transcriptome and Quantifying Reads using Salmon.
          + a.
            Download reference transcriptome in FASTA format for the
            Protein-coding transcript sequences on the human reference
            from ([133]https://www.gencodegenes.org/human/).
            wget
            [134]https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human
            /release_44/gencode.v44.pc_transcripts.fa.gz
            gzip -d gencode.v44.pc_transcripts.fa.gz.
          + b.
            Install the Salmon on the working directory of the server.
            That can install using the conda package manager or download
            it from the official Salmon website e.g.,
            SALMON-v.2.2.0.tar.gz
            ([135]https://salmon-tddft.jp/download.html) for RNA-seq read
            quantification.
            conda install bioconda::salmon.
          + c.
            Indexing the Transcriptome: In terminal navigate to the
            directory containing the transcriptome FASTA file and execute
            the below command.
            salmon index -t gencode.v44.pc_transcripts.fa -i index.
          + d.
            Quantify Reads using Salmon: Navigate to the directory where
            RNA-seq data is located. With the below command Salmon will
            produce several output files, including quantification
            results. Key files include "quant.sf" containing
            transcript-level abundance estimates.
            for i in ∗_R1.fastq.gz; do n = ${i%.∗}; echo "salmon quant -i
            /home/ranjeet.maurya/reference/index/ -l A −1 $i −2
            ${i%%_∗}_R2.fastq.gz -p 300 --validateMappings -o
            quants/${i%%_∗}"; done > quant.sh
            bash quant.sh.
     * 64.
       Differential expression analysis using DESeq2.
          + a.
            Download the comprehensive gene annotation (GTF format) file
            (gencode.v40.annotation.gtf) for human reference GRCh38 from
            GENCODE ([136]https://www.gencodegenes.org/human/).
          + b.
            Install the following R packages for DESeq2 analysis.
            if (!require("BiocManager", quietly = TRUE))
            install.packages("BiocManager")
            BiocManager::install("tximport")
            install.packages("dplyr")
            BiocManager::install("GenomicFeatures")
            install.packages("readr")
            BiocManager::install("DESeq2″)
          + c.
            Load the above installed R packages using the following R
            code.
            setwd("/path/to/quant_files")
            library(tximport)
            library(dplyr)
            library(GenomicFeatures)
            library(readr)
            library(DESeq2)
            library(RColorBrewer)
            library(ggplot2)
          + d.
            Load Sample tables and quantification files.
            txdb <- makeTxDbFromGFF("gencode.v40.annotation.gtf")
            k <- keys(txdb, keytype = "GENEID")
            tx2gene <- select(txdb, keys = k, keytype = "GENEID",
            columns = "TXNAME")
            head(tx2gene)
            write.csv(as.data.frame(tx2gene),file =
            "gencode.v40.annotation.csv")
          + e.
            Run DESeq2 with the settings comparing condition A vs.
            condition B.
            tx2gene<-read.csv("gencode.v40.annotation.csv", header = TRUE)
            samples <- read.csv("group1-vs-group2-samples.csv", header =
            TRUE)
            #Here, the creation of the “samples” object is importing a
            sample sheet where the ‘Group’ contains the variables denoting
            the ‘group1’ and ‘group2’ that would be used to perform the
            differential gene-expression.
            files <- file.path(samples$Patient.ID, "quant.sf")
            names(files) <- paste0(samples$Patient.ID)
            txi.salmon <- tximport(files, type = "salmon", tx2gene =
            tx2gene)
            dds <- DESeqDataSetFromTximport(txi.salmon, samples,
            ∼Patient.ID)
            keep <- rowSums(counts(dds)) >= 10.
            #creation of the “keep” object is to find genes which are
            expressed above a certain expression value threshold.
            head(dds)
            dds
            dds <- dds[keep,]
            cts<-counts(dds)
            dd<-as.matrix(cts +1)
            head(dd)
            write.csv(dd, "count_data.csv", row.names = TRUE)
            dds <- DESeqDataSetFromMatrix (countData = dd, colData =
            samples, design = ∼Group)
            dds<-DESeq(dds,)
            head(dds)
            res <- results(dds,contrast = c("Group", "group1", "group2")))
            head(res)
            summary(res)
          + f.
            Write and save the results.
            res<-res[order(res$padj),]
            head(res)
            write.csv (res, "DEGs-group1-vs-group2.csv")

     Note: This will save the differential expression results to a
     comma-separated file (csv) file for further analysis or
     visualization. The analysis outcome will produce a CSV file that
     presents the outcomes of the differential expression tests for all
     genes, ordered by the adjusted p-value. The columns in the file will
     encompass the mean expression, fold change between conditions,
     relevant statistical test metrics, p-value and adjusted p-value. The
     adjusted p-values are derived from the p-values and should be used
     in preference over the p-values.

Pathway enrichment analysis

     Inline graphic Timing: 1–2 h

     * 65.
       Functional enrichment analysis using clusterProfiler and KEGG
       pathways.
          + a.
            Install and load “clusterProfiler” R package.
          + b.
            Consider pathways with a p value cutoff of 0.05.

   # Step 1: Install the 'ClusterProfiler' package

   install.packages("ClusterProfiler")

   # Step 2: Install the 'org.Hs.e.g.,.db' Bioconductor package

   if (!requireNamespace("BiocManager", quietly = TRUE))

   install.packages("BiocManager")

   BiocManager::install("org.Hs.e.g.,.db")

   # Step 3: Load required libraries

   library(ClusterProfiler)

   library(org.Hs.e.g.,.db)

   # Step 4: Read data from CSV file and select relevant columns for
   Ensembl Ids and Fold change

   df <- read.csv("DEGs-moderate-vs-mild.csv")[,c(1,3)]

   ids <- bitr(df$Ensembl.Ids, fromType = "ENSEMBL", toType = "ENTREZID",
   OrgDb = "org.Hs.e.g.,.db")

   dedup_ids = ids[!duplicated(ids[c("ENSEMBL")]),]

   df2 = df[df$Ensembl.Ids %in% dedup_ids$ENSEMBL,]

   # Step 5: Map Entrez IDs to the dataframe

   df2$Y = dedup_ids$ENTREZID

   kegg_gene_list <- df2$log2FoldChange

   names(kegg_gene_list) <- df2$Y.

   # Step 6: Create a gene list with log2FoldChange values

   kegg_gene_list<-na.omit(kegg_gene_list)

   kegg_gene_list = sort(kegg_gene_list, decreasing = TRUE)

   # Step 7:Perform KEGG pathway enrichment analysis

   gse <- gseKEGG(geneList = kegg_gene_list,

   organism = "hsa",

   nPerm = 10000,

   minGSSize = 3,

   maxGSSize = 800,

   pvalueCutoff = 0.05,

   pAdjustMethod = "BH",

   keyType = "ncbi-geneid")

   # Step 8: Write the results to a CSV file

   write.csv(gse@result, "group1-vs-group2_kegg_enrich.csv")

Correlation analysis

     Inline graphic Timing: 30 min

   Use this analysis to identify the correlation between bacterial species
   and the expressed host genes.
     * 66.
       Utilize microbial species for correlation analysis with
       differentially expressed host genes using the corrr (0.4.4) package
       in R, employing Spearman’s correlation analysis.
     * 67.
       Establish a correlation coefficient cut-off of ≥±0.8 and a p-value
       of ≤0.05 to construct a correlation plot and visualize the
       interaction network in Cytoscape (version 3.9.1), a GUI-based tool.

   library(dplyr)

   library(tibble)

   library(tidyr)

   library(magrittr)

   library(corrplot)

   library(RColorBrewer)

   library(corrr)

   my_data <- read.csv("gene.csv")

   bacteria <- read.csv("bacteria.csv")

   df <- bind_cols(bacteria[, −1],my_data[, −1])

   res1 <- cor.mtest(df, conf.level = 0.95)

   res2 <- res1$p %>% as.data.frame()

   rownames(res2) <- colnames(df)

   colnames(res2) <- colnames(df)

   res2 <- res2%>%

   rownames_to_column("rowname") %>%

   gather("var", "p", -rowname)

   A < - corrr::correlate(bind_cols(my_data[, −1], bacteria[, −1])) %>%

   pivot_longer(cols = -term,names_to = "var",values_to = 'value') %>%

   rename(rowname = term) %>%

   left_join(res2) %>%

   mutate(value = ifelse(p < 0.05, NA_integer_, value)) %>%

   select(-p) %>%

   spread(var, value) %>%

   filter(rowname %in% colnames(my_data)) %>%

   select(one_of(c("rowname", colnames(bacteria)[-1]))) %>%

   as.data.frame() %>%

   column_to_rownames("rowname") %>%

   as.matrix() %>%

   corrplot::corrplot(is.corr = FALSE, na.label.col = "white",

   col = brewer.pal (n = 10, name = "PuOr"), tl.col = "black", tl.srt =
   45)

Expected outcomes

   The major outcome of this protocol involves the identification of
   Transcriptionally Active Microbes (TAMs) and their integration with
   host-response genes. Through an integrative approach of dual
   RNA-Sequencing and metagenomic analysis, this method provides new
   insights into potentially functional host-microbiome dynamics and
   explores the association between host pathways, genes, and functional
   microbial species. This method offers an effective means of identifying
   putative microbial candidates’ vis-à-vis various diseases caused by a
   primary infecting pathogen.

Limitations

   The RNA-seq protocol has been fine-tuned to work best with a
   substantial input of total RNA, typically ranging from 0.1 to 1 μg.
   This range offers an ideal opportunity to include a diverse range of
   RNA samples. However, it’s crucial to optimize the protocol initially
   to ensure that each sample is consistently processed with the same RNA
   concentration. This approach eliminates any potential bias in further
   steps of library preparation. This is even more important as we
   undertake integrative analysis for the expressed host genes and the
   transcribed microbes in those cohorts of patients.

   Moreover, one of the protocol’s key steps involves removing rRNA to
   enrich the desired mRNA reads. This step significantly increases the
   coverage of the desired mRNA transcripts. Unfortunately, it’s observed
   that not every sample in our library preparation process achieves
   uniform rRNA depletion. This discrepancy results in some more reads
   being assigned to rRNA in few samples. This protocol focuses mainly on
   the coding mRNA and long non coding RNAs leaving the small mRNAs.

   Additionally, during the purification step of adapter ligation, if not
   executed meticulously, there’s a risk of adapter dimer formation. These
   adapter dimers can be problematic for the success of the library
   preparation process. If not adequately removed, they may lead to issues
   in our final library.

Troubleshooting

Problem 1

   Insufficient rRNA depletion (related to Step 45).

Potential solution

     * •
       Make sure that the rRNA removal beads (RRB) are at room temperature
       before adding it to the sample and do not allow the RRB pellets to
       dry.
     * •
       Pipette up and down quickly to ensure thorough mixing. Insufficient
       mixing leads to lower levels of rRNA depletion.
     * •
       Pipette with the tips at the bottom of the well to prevent foaming.
       Excess foam leads to sample loss because foam does not transfer
       efficiently.

Problem 2

   Improper RNA fragmentation (related to Step 45t).

Potential solution

     * •
       The fragmentation of nucleic acids is a crucial step in optimizing
       library preparation, clustering, and sequencing, and the specific
       conditions for fragmentation depend on the initial state of the
       RNA, whether it’s intact or degraded.
     * •
       When working with intact RNA, the TruSeq Stranded Total RNA Library
       Prep protocol for transcriptome analysis involves fragmenting the
       RNA after rRNA depletion. For samples containing degraded RNA, it’s
       important to avoid over-fragmentation. In such cases, the
       fragmentation time should be reduced.
     * •
       This can be achieved by either omitting or modifying the thermal
       cycler Elution 2-Frag-Prime program to involve a 94°C incubation
       for X min, followed by a 4°C hold.

Problem 3

   Adapter Dimers formation (related to Step 51b).

Potential solution

     * •
       A Bioanalyzer chip is run to check the adapter dimers in the
       library which hampers the library binding to the flow cell. So to
       remove dimers before putting sequencing is inevitable and for this
       1:1 AMPure beads purification is done for libraries which have
       adapter dimers in BA profile.

Problem 4

   Storage issue during HUMAnN3 analysis (related to Step 62).

Potential solution

   In HUMAnN3, intermediate files are generated during the analysis which
   need a vast amount of space (30–40 GB/sample). Removing those
   intermediate files will be helpful if there is limited storage
   availability.

Problem 5

   Potential error during DESeq2 analysis (related to Step 64).
     * •
       Error: 'counts' must be a numeric matrix.
     * •
       Error: all genes have equal values across samples.
     * •
       Error: sample names in colData must be the same as colnames in
       countData.

Potential solution

   Performing DESeq2 analysis can sometimes lead to errors due to various
   reasons such as issues with data format, missing values, or incorrect
   input.
     * •
       This error occurs if the input data for DESeq2 is not a numeric
       matrix. Make sure your count data is in a numeric matrix or data
       frame. For example, if your counts are in a table named countData,
       you can convert it to a matrix using the “as.matrix” function.
     * •
       This error may occur if all counts for a particular gene are the
       same across all samples. Check your data for such cases and
       consider filtering out genes with very low variance.
     * •
       Ensure that the sample names in your colData match the column names
       in your count data. It’s crucial that they are in the same order.

Problem 6

   The protocol demonstrated here uses a large compute resource, which
   might not be and often not feasible for many researchers/ groups.

Potential solution

     * •
       All this analysis can also be done with a computer with ∼16–32 GB
       of RAM, only the execution time will increase. Instead of servers,
       laptops or personal systems can also be used for analyses utilizing
       alternate installation methods.
     * •
       Apart from this, alternatively the user can use web-based analysis
       platforms, such as Galaxy server ([137]https://usegalaxy.org/),
       where tools are available and easy to execute.

Resource availability

Lead contact

   Further information and requests for resources and reagents should be
   directed to and will be fulfilled by the lead contact, Rajesh Pandey
   (rajeshp@igib.in; rajesh.p@igib.res.in).

Technical contact

   Technical questions on executing this protocol should be directed to
   and will be answered by the technical contact, Rajesh Pandey
   (rajeshp@igib.in; rajesh.p@igib.res.in).

Materials availability

   This study did not generate new unique reagents and material.

Data and code availability

     * •
       RNA-seq data have been deposited at NCBI SRA, and are publicly
       available as of the date of publication. The accession numbers for
       the RNA-seq data reported in this paper are NCBI
       SRA: [138]PRJNA676016, [139]PRJNA678831 (Pre-VOC), [140]PRJN868733,
       [141]PRJNA952815 (VOCs). Accession numbers are listed in the
       [142]key resources table.
     * •
       The software code and detailed script used by this protocol (for
       data plotting and visualization) can be accessed at
       [143]https://github.com/pallawikumari/visualization-in-python/tree/
       V1.0.0 and has been deposited at Zenodo
       ([144]https://zenodo.org/doi/10.5281/zenodo.10983961).
     * •
       This paper does not report original code.
     * •
       Any additional information required to reanalyze the data reported
       in this paper is available from the [145]lead contact upon request.

Acknowledgments