Abstract

   In recent years, the overuse of antibiotics has led to the emergence of
   antimicrobial-resistant (AMR) bacteria. To evaluate the spread of AMR
   bacteria, the reservoir of AMR genes (resistome) has been identified in
   environmental samples, hospital environments, and human populations,
   but the functional role of AMR bacteria and their persistence within
   individuals has not been fully investigated. Here, we performed a
   strain-resolved in-depth analysis of the resistome changes by
   reconstructing a large number of metagenome-assembled genomes from the
   gut microbiome of an antibiotic-treated individual. Interestingly, we
   identified two bacterial populations with different resistome profiles:
   extensively acquired antimicrobial-resistant bacteria (EARB) and
   sporadically acquired antimicrobial-resistant bacteria, and found that
   EARB showed broader drug resistance and a significant functional role
   in shaping individual microbiome composition after antibiotic
   treatment. Our findings of AMR bacteria would provide a new avenue for
   controlling the spread of AMR bacteria in the human community.

   Subject terms: Next-generation sequencing, Antimicrobials, Microbial
   genetics, Metagenomics

Introduction

   Recently, the misuse and overuse of antibiotics in medicine and food
   production have led to the emergence of antibiotic-resistant
   bacteria^[46]1,[47]2. For example, the increased use of antibiotics
   during the COVID-19 pandemic has accelerated the development of
   multidrug-resistant bacteria^[48]3, posing a new threat to modern
   society. Importantly, the human gut microbiome serves as a reservoir
   for antimicrobial resistance (AMR) genes^[49]4–[50]6, spreading these
   genes to the community. Initially, monitoring the spread of AMR genes
   was conducted using environmental samples, such as urban sewage^[51]7
   and samples from hospital environments^[52]8, as well as by studying
   normal human populations in different countries^[53]9,[54]10 and
   vertical transmission from mothers to infants^[55]11. Such surveillance
   approaches^[56]12 have been successful in the identification of AMR
   reservoirs, called resistomes, in the human community; however, a deep
   understanding of the taxonomical origins of AMR gene carriers and the
   impact of AMR bacteria on the gut commensal community is lacking. Some
   initial in-depth metagenomic studies have found that individual
   resistomes can persist for at least one year^[57]9, but the dynamic
   changes in AMR bacteria and their impact on the community have been
   poorly studied at species or strain resolution. Therefore, functional
   analysis of AMR bacteria in the individual gut microbiome will be a key
   to understand their functional niche in the human community and
   managing their unintended consequences in the human gut microbiomes.

   Here, we performed a strain-resolved, in-depth analysis of resistome
   changes for the identification of the driver microbial species of
   resistome changes and their functional niche in a given community. To
   this end, we analyzed publicly available shotgun metagenomic samples of
   healthy adults who underwent 4-day treatments with antibiotic
   cocktails^[58]13, focusing on resistome dynamics at the species/strain
   level through de novo assembly of the metagenome-assembled genome
   (MAG). Interestingly, we identified bacteria that had acquired extreme
   resistance (EARB), those that had acquired sporadic resistance (SARB),
   and non-carriers, based on their AMR gene counts. Based on further
   functional analysis, we found that EARB displayed broad drug resistance
   and a significant functional role in shaping individual microbiome
   compositions with the consequential strain switching, validated in
   multiple independent cohorts, including those with recurrent urinary
   tract infections (RUTI) and liver cirrhosis, and preterm infants.
   Therefore, our findings provide insights into the distribution and
   enrichment of AMR genes within bacterial strains and highlight the
   functional significance of multi-resistant bacteria in the
   antibiotic-treated gut community.

Methods

De novo metagenomic assembly

   We obtained 57 metagenomic samples from the original paper^[59]13 and
   generated high-quality (HQ) non-human reads. These data were collected
   before and after a 4-day treatment with antibiotic cocktails once a day
   (meropenem, gentamicin, and vancomycin). To observe longitudinal
   changes following the cessation of antibiotics, additional samples were
   obtained at day 8, day 42, and day 180. Note that among the possible 60
   samples, three samples were excluded due to the failure in library
   construction from the original study. Human contaminations were removed
   from metagenomic samples again using GRC38 and bowtie2 (default option;
   version 2.3.5.1)^[60]14. Host removed reads were used for MAG
   construction using metaWRAP (default option; version 1.3.2)
   pipeline^[61]15. First, FASTQ files were assembled by MEGAHIT
   (v1.1.3)^[62]16. The assembled files were binned using three different
   representative binning tools (MetaBat2, MaxBin2, CONCOCT). The initial
   bins were refined to obtain final MAGs. CheckM (version 1.0.12)^[63]17
   was used for quality control of MAGs, and those with >70% completeness
   and <5% contamination were considered HQ.

Antimicrobial resistance gene analysis

   AMR genes in the assembled genome were searched by Resistance Gene
   Identifier software (RGI; version 5.2.1) with Comprehensive Antibiotic
   Resistance Database (CARD) database (3.2.2)^[64]18 as a reference
   (--input_type contig --alignment_tool DIAMOND --split_prodigal_jobs
   –clean). MAGs that contain more than 17 AMR genes were defined as
   Extremely Acquired Resistant Bacteria (EARB) in this study, and MAGs
   containing AMR genes <17 were considered as SARB; MAGs having no AMR
   genes were considered as non-carriers.

   To analyze AMR dynamics, we counted the number of MAGs and AMR genes at
   different sampling points. In addition, we tracked the average number
   of AMR genes per MAG (by dividing the total number of AMR genes by the
   number of MAG) to identify individual variations in response to
   antibiotic treatment. We categorized individuals into ‘Susceptible’ and
   ‘Tolerant’ groups by considering the timing of the peak of AMR gene
   increases and the degree of the increase. To confirm microbial
   composition displacement among the tolerant group and the distances
   between different time points of given individual, we used
   metagenomic-based operational taxonomic unit (mOTU) relative abundance,
   from the original paper^[65]13, and R package vegan (version 2.6-4) to
   calculate Bray-Curtis dissimilarity between day 0 (baseline) and day
   180 of all subjects. We obtained multi-dimensional scaling coordinates
   and calculated the displacement.

   Drug classes and mechanisms of AMR genes in each MAG were obtained from
   the RGI analysis results. To analyze the distribution of various drug
   classes, the number of occurrences of each drug class was counted in
   EARB and SARB. To examine the differences in drug classes between EARB
   and SARB, the frequency of each drug class was calculated, and the
   number of MAGs carrying each drug class was counted. To get a
   prevalence score, the proportion of MAGs carrying a given drug class
   (carrier ratio) was multiplied by the frequency of the drug class.

   To analyze the sharing of drug classes and AMR genes among EARBs and
   SARBs, we counted the frequency of drug classes or AMR genes at a
   certain taxonomic level (genus level for EARB and phylum level for
   SARB). Specifically, for EARB, we considered a drug class or AMR gene
   to be “shared” if it was found in at least three out of the four total
   EARB genera. Similarly, for SARB, it was considered shared if it was
   detected in at least six out of the eight total SARB phyla.

   To explore potential correlations between specific drug classes or AMR
   genes and taxonomies, we calculated the frequency of each drug class
   within every phylum or species. This was done by dividing the number of
   times an AMR gene associated with a drug class appeared within a given
   taxonomy by the total occurrences of that taxonomy level. We performed
   this calculation separately for EARB and SARB. Following the frequency
   determination, we added one to the mean frequency value to adjust for
   any instances of zero occurrence, then applied a logarithmic
   transformation to normalize the data distribution. We visualized the
   results using a pheatmap to provide an intuitive understanding of the
   frequency and distribution patterns. We also checked the AMR signature
   from MAGs based on non-negative matrix factorization (NMF) technique
   using R package NMF. By setting the factorization rank to two, we
   extracted basis and coefficient vectors from the given NMF factors.

Taxonomy annotation of MAGs

   The taxonomy of MAGs, assembled in this study, was assigned using
   GTDB-TK classify_wf (version 2.1.0) with release 207_v2
   database^[66]19. A phylogenetic tree for EARB was constructed from 12
   healthy adults using GTDB-TK de_novo_wf. Since all EARBs were assigned
   within Enterobacteriaceae family, following parameter was used
   (--bacteria --taxa_filter o__Enterobacterales --outgroup_taxon
   f__Enterobacteriaceae). The result was used for plotting a phylogenetic
   tree in iTOL website ([67]https://itol.embl.de/)^[68]20.

Relative abundance profiling of MAGs

   The abundance of MAGs constructed from 12 healthy adults was quantified
   by the quant function of salmon program (version 0.13.1) in metaWRAP
   pipeline^[69]15 and merged with the taxonomy table. Phylum
   ‘Firmicutes_C’, ‘Firmicutes_A’ were converted to ‘Firmicutes’ to remove
   clade annotations for clarity.

   To avoid bias caused by differences in absolute abundance between
   samples, the abundance of the same phylum in the same host was added,
   and its relative abundance was calculated and added based on the
   sampling point. Relative abundance was recalculated to obtain the
   relative abundances at the phylum level for each sampling point. To
   avoid bias caused by differences in absolute abundance between samples,
   we first summed the absolute abundance of each phylum within each
   sample and converted it into a sample-level relative abundance. Next,
   we grouped these sample-level relative abundances by sampling point,
   summed them again for each phylum, and recalculated the phylum-level
   relative abundances at the sampling-point level. To compare the
   composition of the top-5 abundant phyla between sampling points, minor
   phyla, whose abundance was less than the top-5, were converted to
   ‘others’ at each sampling point. The abundance table of the
   EARB-containing community was further investigated at the species level
   and categorized into ‘EARB,’ ‘top-5’ (the top-5 abundant species,
   excluding EARB), and ‘others’. Lastly, to generate the relative
   abundance of Enterobacteriaceae and Veillonellaceae, family-level
   taxonomy information of MAGs was categorized into ‘Enterobacteriaceae’,
   ‘Veillonellaceae’, or ‘others’.

Functional analysis of MAGs

   Gene prediction using Prodigal (default mode; version 2.6.3)^[70]21 was
   conducted for further functional and community power analyses.
   Functions of predicted genes were annotated with KEGG using hmmsearch
   function of HMMER program (default option; version 3.3.1)^[71]22 using
   pre-trained hidden Markov Models (prok90_kegg94) from Raven toolbox
   ([72]https://github.com/SysBioChalmers/RAVEN/wiki/Use-Pre-Trained-HMMs)
   . KEGG pathways/modules and the list of KEGG orthology (KO) terms were
   manually parsed from the KEGG database (release 103; 2022/07)^[73]23
   and used for functional analysis. For comparison between non-EARB and
   EARB, samples that contain EARB were selected, and the KO pathway and
   module table were generated by counting the number of KO genes for the
   pathway or module in each MAG. A significant difference pathway or
   module was identified using a two-sided Wilcoxon rank-sum test with a
   confidence level of 0.95. Significant pathways were filtered based on
   the criteria: pathway gene coverage >30% and fold change >0.5 compared
   to the non-EARB. Significant modules were filtered based on the
   criteria: module gene coverage >0.8 and fold change >0.5 compared to
   the non-EARB.

Community power analysis

   To investigate the functional importance of EARB in a community, a
   community power analysis was conducted. Genes of MAGs in the
   communities with EARB existing were predicted using prodigal (default
   mode; version 2.6.3) and KEGG annotated by hmmsearch function of HMMER
   (default option version 3.3.1). Therefore, KEGG enzyme count table was
   generated for each community. Functional uniqueness was measured by
   counting number of unique KEGG enzyme in a MAG when comparing others in
   the same community by leave-one-out approach. To calculate the
   functional importance of a microbe, the KEGG enzyme counts of a microbe
   were divided by the sum of the same enzyme counts of the entire
   community (proportion of KEGG enzymes for the microbe). The sum of the
   proportion of all KEGG enzymes of a microbe was considered as the
   community power score of the given microbe. Community power scores of
   three categorized microbes (EARB, SARB, non-carrier) were also
   calculated.

Co-enrichment analysis

   Co-enrichment matrix was generated using mOTU relative abundance
   rarefied table downloaded from the original paper^[74]13. First, we
   choose three mOTU species corresponding to EARB (Escherichia coli,
   Klebsiella oxytoca, Klebsiella pneumoniae). Four non-EARB species
   (Bacteroides thetaiotaomicron, Bacteroides ovatus, Bilophila
   wadsworthia, Parabacteroides distasonis) were manually picked based on
   high prevalence and high community power score for comparison.
   Co-enrichment correlation between each selected microbes’ abundance and
   other mOTUs’ were calculated by Spearman test. mOTU species that were
   annotated ambiguously (annotated as ‘motu_linkage_group’) and showed
   correlation between –0.3 < and <0.3 with all selected species were
   excluded. The correlations that both microbes co-existed in at least 5
   communities and showed significant p value (>0.05) were selected and
   plotted using the pheatmap R package. To count positively or negatively
   related mOTUs, filtering spearman correlation test by >0.3 or < −0.3
   for each comparing bacteria. To verify the relationship between the
   relative abundance of EARB or non-EARB with mOTU richness, we sum up
   the relative abundance of all EARB mOTUs or non-EARB in each sample.
   Then we counted mOTUs, which are non-zero in relative abundance in each
   sample. We calculated the sum of relative abundance as log2 and then
   divided the samples into three categories (Rare: log2 relative
   abundance sum < −10, Normal: −10 ≤ log2 relative abundance sum <−5,
   Rich: 5 ≤ log2 relative abundance sum). Correlation between relative
   abundance sum of EARB or non-EARB and number of non-zero mOTUs (mOTU
   richness) was measured by spearman’s correlation coefficients.

Tracking endogenous strains among metagenomic samples

   To check genomic similarity between basic EARBs, we created an ANI
   matrix using fastANI^[75]24 with –matrix parameter with default options
   (version 1.33). For strain analysis, gene prediction was conducted for
   12 EARB E. coli using Prodigal^[76]21. All predicted genes were
   concatenated and then clustered based on similarity using CD-HIT
   (version 4.8.1)^[77]25 with -aS 0.9 -c 0.98 -n 10 -M 0 -d 0 -T 0 -G 0
   parameters. From CD-HIT clustering information, extract the longest 20
   homologous genes that are present in all 12 E. coli strains with the
   same length using cdbfasta (version 1.00). 20 homologous genes in most
   HQ E. coli genome (ERR1995253 bin.3) were extracted and built as a
   reference by bwa (version 0.7.17-r1188) for variant call and SNP
   search. Variant call was done by following steps. Paired FASTQ files
   from EARB E. coli containing samples were mapped against a previously
   built E. coli homologous gene reference. Generated bam file was sorted,
   indexed, and mpileup (mpileup -ugf) using samtools (version
   1.9)^[78]26. Variants were called using bcftools (bcftools call -cv)
   (version 1.9)^[79]27, and filtered to include only variants with a
   quality score of 20 or higher. Bam files were converted tabula form
   using sam2tsv version (d29b24f2b). The frequency of nucleotides in
   every variant point was counted using the converted tsv file.
   Nucleotide bases that existed in variant position <5% were discarded
   because of the possibility of sequencing error. SNP ratio of each
   variant position was counted using basic R functions. For SNP analysis
   for AMR gene, we did the same process except using AMR gene from
   ERR1995248 bin.3 (host 5 & day 8) for reference because it is the
   dominant E. coli of the host. Using the dominant strain AMR gene for
   variant calling, we were able to find a nucleotide ratio clustered into
   two in the day 4 sample. Comparing variant position and SNP information
   between longitudinal data, we could track the SNP ratio (SNP nucleotide
   frequency/total nucleotide frequency) of the variant position through
   sampling points. To confirm the existence of two E. coli strains in
   host 5 and the day 4 sample, all variant positions of the 54 AMR gene
   reference host 5 at day 4 were investigated for their sequence depth.

Validation using a different cohort dataset

   To confirm the reproducibility of EARB in other antibiotic-exposed
   human metagenome data, we downloaded raw data (i.e., FASTQ files) of
   shotgun metagenomic samples, or isolate strain genome samples, from
   three different study dataset, such as RUTI-causing E. coli isolates
   (n = 556), liver cirrhosis patient (n = 592), and preterm infant
   (n = 399). We constructed MAG from the raw data from liver cirrhosis
   patient and preterm infant data and search AMR gene using same
   pipeline, such as MetaWRAP pipeline, as we described above. For liver
   cirrhosis patients data, Megahit^[80]16 was used with -r parameter (for
   single-end input). For E. coli isolates data, we assembled contigs
   using Megahit then predicted genes using prodigal for AMR gene
   searching. We also investigated the E. coli isolates’ SNP profile of
   variant positions, which were found in SNP analysis between the major
   strain and the minor strain of host 5. We calculated the AMR prevalence
   value and compared it with the healthy adults’ EARB. To verify the
   community power score of EARB in live cirrhosis and preterm infant
   data, we selected communities that contain EARB and more than 5 other
   MAG (liver cirrhosis = 28, preterm infant = 24 communities).
   Communities for healthy adults, liver cirrhosis patients, and preterm
   infants were arranged by sampling point, MELD score (Model for
   End-Stage Liver Disease), and host day of life, respectively.

Results

De novo assembly of shotgun metagenomics revealed the enrichment of multi-AMR
genes within specific bacterial strains

   To investigate the longitudinal changes in the AMR gene repertoire of
   the gut microbiome after antibiotic treatment, known as the resistome,
   we obtained and processed publicly available fecal shotgun metagenomics
   data from 12 healthy adults with 5 time points (total 57 samples were
   used among possible 60 samples, with three samples excluded due to the
   library construction failure). These data were collected before and
   after a 4-day treatment with an antibiotic cocktail (meropenem,
   gentamicin, and vancomycin), as well as on days 8, 42, and 180 after
   the cessation of antibiotics (Fig. [81]1)^[82]13. First, we performed
   de novo genome reconstructions of 57 metagenomic samples using the
   metaWRAP pipeline^[83]15, which integrates three binning tools (see
   Methods). A total of 7858 initial bins from the three binning tools
   were filtered into 2585 HQ MAGs, which passed the quality criteria of
   completeness (>70%) and contamination (<5%) based on CheckM
   estimations^[84]17 (Fig. [85]1b and Supplementary Fig. [86]1a, b) and
   refined these into 1358 MAGs, which were merged from the consensus bins
   of the three binning tools (Supplementary Data [87]1–[88]2).

Fig. 1. Metagenome-assembled genomes revealed individual resistome changes
and discovered bacteria that had acquired extensive antimicrobial resistance.

   [89]Fig. 1
   [90]Open in a new tab

   a Overview of the shotgun metagenome analysis conducted in this study.
   From 12 healthy adults, 56 metagenomics datasets were sampled at five
   different time points, and three additional datasets (recurrent urinary
   tract infection [RUTI]-causing Escherichia coli genomes, preterm infant
   metagenomes, and liver cirrhosis metagenomes) were used for further
   validation. b Scatter plot of metagenome-assembled genome (MAG) quality
   score of three different binning tools. Using CheckM algorithm, we
   selected high-quality MAGs (total number of high-quality (HQ)
   MAGs = 2585 [blue], total number of low-quality (LQ) MAGs = 5268 [red])
   based on a completeness (Q) >70% and contamination (T) less than 5%
   (block dotted lines). c Boxplot of the number of refined MAGs based on
   the consensus of three binning tools (completeness > 70%,
   contamination < 5%; n = 1358) and the number of antimicrobial
   resistance (AMR) genes found in subjects at each sampling point. AMR
   genes were discovered by Resistance Gene Identifier (RGI) tool using
   the Comprehensive Antibiotic Resistance Database (statistical
   significance measured by Student’s t test comparing each time point
   group with the day (D)0 group as reference group. ****p ≤ 0.0001,
   **p ≤ 0.01). d Individual dynamics of the number of AMRs per MAG. Each
   line indicates AMR changes in a given host. e Line plots tracking the
   changes in the number of MAGs and average AMR gene burden per MAG of
   susceptible (Host 5 and 12) and tolerant subjects (Host 2 and 4). f AMR
   gene distribution within each MAG (n = 1358). MAGs containing more than
   17 AMR genes (black dotted line) were regarded as bacteria that had
   acquired extreme resistance (EARB) (red dots, n = 20) (statistical
   significance measured by Student’s t test comparing each time point
   group with the D0 group, ****p ≤ 0.0001, *p ≤ 0.05).

   We identified the AMR gene repertoire of the MAGs using CARD^[91]28 as
   a reference (See Methods). We traced the changes in the number of MAGs
   and AMR genes identified in each sample and summarized them by time
   point. The number of MAGs was highest at baseline (day 0),
   significantly reduced from day 4 to day 42, and then recovered on day
   180. We also found a significant increase in the number of AMR genes by
   day 8, suggesting that 4 days after the cessation of antibiotic
   treatment, resistant bacteria persisted and thrived (Fig. [92]1c).
   Interestingly, individual examinations of changes in the AMR gene ratio
   revealed diverse resistance responses of the gut microbiome to
   antibiotic treatment. For example, some individuals (host 5 and 12;
   susceptible group) showed drastic changes in their resistome from
   baseline, while other individuals (host 2 and 4; tolerant group) showed
   a similar level of AMR gene burden. This suggests different
   susceptibilities of individuals to antibiotic-induced dysbiosis (Fig.
   [93]1d, e and Supplementary Fig. [94]1c). In a previous study,
   administering antibiotics to healthy subjects not only altered their
   gut microbiomes to resemble those of intensive care unit patients, but
   also resulted in a protracted recovery in some subjects, as confirmed
   by the maximum displacement between baseline and final data points in
   principal component analysis^[95]29. In line with these findings, our
   study revealed that individuals in the susceptible group, who
   experienced a rapid enrichment of AMR genes following antibiotic
   treatment, demonstrated the longest distance from baseline (day 0) to
   the final sampling point (day 180), indicating a delayed recovery of
   the microbiome (Supplementary Fig. [96]1d).

   Next, we examined the correlation between MAGs and the increased
   resistome by quantifying the prevalence of AMR genes within each MAG at
   different time points (Fig. [97]1f). In our temporal analysis, we found
   that a specific set of 20 MAGs contained a notably high number of AMR
   gene, reaching a saturation point in the cumulative distribution of AMR
   counts, with each MAG containing at least 17 AMR genes (Supplementary
   Fig. [98]2a). Therefore, we defined bacteria with more than 17 AMR
   genes as extensively acquired antimicrobial resistance bacteria (EARB).
   The EARB showed comparable assembly quality compared to other MAGs
   (Supplementary Fig. [99]2b, c), indicating that the higher number of
   AMR genes in EARB was not an artifact of poor metagenome assembly.
   Notably, we found that individuals with different resistome changes,
   i.e., susceptible and resistant groups, showed distinct emergence
   timings of EARB species following antibiotic treatment (Supplementary
   Fig. [100]1c). Therefore, to better understand the changes in
   individual microbiomes after antibiotics treatment, the dynamics of the
   abundance and functions of EARB species will be the key to
   understanding the effects of antibiotics on gut microbiome community.

Characteristics of EARB strains at the taxonomical, functional, and community
level

   We further explored the characteristics of EARB by comparing their
   taxonomy and abundance with those of other microbes within the same
   microbial community. We first annotated the taxonomy of all MAGs
   identified using Genome Database Taxonomy Toolkit (GTDB-TK)^[101]19 and
   constructed a phylogenetic tree for the EARB species (Supplementary
   Fig. [102]3a, see Methods). Of note, all EARB belonged to the
   Proteobacteria phylum (also known as Pseudomonadota), predominantly
   within the genera Escherichia, Klebsiella, Enterobacter, and
   Cronobacter, which are widely recognized as pathobionts associated with
   various chronic diseases and bloodstream infections^[103]30. We also
   examined the taxonomy of MAGs containing fewer than 17 AMR genes,
   referred to as SARB. The majority of SARB belonged to the phyla
   Bacteroidetes (also known as Bacteroidota) (62.8%) and Firmicutes (also
   known as Bacillota) (27.5%), while a small proportion belonged to
   Proteobacteria (3.8%). AMR non-carriers mostly belonged to Firmicutes
   (82.6%), implying different taxonomic preferences for the acquisition