Abstract In recent years, the overuse of antibiotics has led to the emergence of antimicrobial-resistant (AMR) bacteria. To evaluate the spread of AMR bacteria, the reservoir of AMR genes (resistome) has been identified in environmental samples, hospital environments, and human populations, but the functional role of AMR bacteria and their persistence within individuals has not been fully investigated. Here, we performed a strain-resolved in-depth analysis of the resistome changes by reconstructing a large number of metagenome-assembled genomes from the gut microbiome of an antibiotic-treated individual. Interestingly, we identified two bacterial populations with different resistome profiles: extensively acquired antimicrobial-resistant bacteria (EARB) and sporadically acquired antimicrobial-resistant bacteria, and found that EARB showed broader drug resistance and a significant functional role in shaping individual microbiome composition after antibiotic treatment. Our findings of AMR bacteria would provide a new avenue for controlling the spread of AMR bacteria in the human community. Subject terms: Next-generation sequencing, Antimicrobials, Microbial genetics, Metagenomics Introduction Recently, the misuse and overuse of antibiotics in medicine and food production have led to the emergence of antibiotic-resistant bacteria^[46]1,[47]2. For example, the increased use of antibiotics during the COVID-19 pandemic has accelerated the development of multidrug-resistant bacteria^[48]3, posing a new threat to modern society. Importantly, the human gut microbiome serves as a reservoir for antimicrobial resistance (AMR) genes^[49]4–[50]6, spreading these genes to the community. Initially, monitoring the spread of AMR genes was conducted using environmental samples, such as urban sewage^[51]7 and samples from hospital environments^[52]8, as well as by studying normal human populations in different countries^[53]9,[54]10 and vertical transmission from mothers to infants^[55]11. Such surveillance approaches^[56]12 have been successful in the identification of AMR reservoirs, called resistomes, in the human community; however, a deep understanding of the taxonomical origins of AMR gene carriers and the impact of AMR bacteria on the gut commensal community is lacking. Some initial in-depth metagenomic studies have found that individual resistomes can persist for at least one year^[57]9, but the dynamic changes in AMR bacteria and their impact on the community have been poorly studied at species or strain resolution. Therefore, functional analysis of AMR bacteria in the individual gut microbiome will be a key to understand their functional niche in the human community and managing their unintended consequences in the human gut microbiomes. Here, we performed a strain-resolved, in-depth analysis of resistome changes for the identification of the driver microbial species of resistome changes and their functional niche in a given community. To this end, we analyzed publicly available shotgun metagenomic samples of healthy adults who underwent 4-day treatments with antibiotic cocktails^[58]13, focusing on resistome dynamics at the species/strain level through de novo assembly of the metagenome-assembled genome (MAG). Interestingly, we identified bacteria that had acquired extreme resistance (EARB), those that had acquired sporadic resistance (SARB), and non-carriers, based on their AMR gene counts. Based on further functional analysis, we found that EARB displayed broad drug resistance and a significant functional role in shaping individual microbiome compositions with the consequential strain switching, validated in multiple independent cohorts, including those with recurrent urinary tract infections (RUTI) and liver cirrhosis, and preterm infants. Therefore, our findings provide insights into the distribution and enrichment of AMR genes within bacterial strains and highlight the functional significance of multi-resistant bacteria in the antibiotic-treated gut community. Methods De novo metagenomic assembly We obtained 57 metagenomic samples from the original paper^[59]13 and generated high-quality (HQ) non-human reads. These data were collected before and after a 4-day treatment with antibiotic cocktails once a day (meropenem, gentamicin, and vancomycin). To observe longitudinal changes following the cessation of antibiotics, additional samples were obtained at day 8, day 42, and day 180. Note that among the possible 60 samples, three samples were excluded due to the failure in library construction from the original study. Human contaminations were removed from metagenomic samples again using GRC38 and bowtie2 (default option; version 2.3.5.1)^[60]14. Host removed reads were used for MAG construction using metaWRAP (default option; version 1.3.2) pipeline^[61]15. First, FASTQ files were assembled by MEGAHIT (v1.1.3)^[62]16. The assembled files were binned using three different representative binning tools (MetaBat2, MaxBin2, CONCOCT). The initial bins were refined to obtain final MAGs. CheckM (version 1.0.12)^[63]17 was used for quality control of MAGs, and those with >70% completeness and <5% contamination were considered HQ. Antimicrobial resistance gene analysis AMR genes in the assembled genome were searched by Resistance Gene Identifier software (RGI; version 5.2.1) with Comprehensive Antibiotic Resistance Database (CARD) database (3.2.2)^[64]18 as a reference (--input_type contig --alignment_tool DIAMOND --split_prodigal_jobs –clean). MAGs that contain more than 17 AMR genes were defined as Extremely Acquired Resistant Bacteria (EARB) in this study, and MAGs containing AMR genes <17 were considered as SARB; MAGs having no AMR genes were considered as non-carriers. To analyze AMR dynamics, we counted the number of MAGs and AMR genes at different sampling points. In addition, we tracked the average number of AMR genes per MAG (by dividing the total number of AMR genes by the number of MAG) to identify individual variations in response to antibiotic treatment. We categorized individuals into ‘Susceptible’ and ‘Tolerant’ groups by considering the timing of the peak of AMR gene increases and the degree of the increase. To confirm microbial composition displacement among the tolerant group and the distances between different time points of given individual, we used metagenomic-based operational taxonomic unit (mOTU) relative abundance, from the original paper^[65]13, and R package vegan (version 2.6-4) to calculate Bray-Curtis dissimilarity between day 0 (baseline) and day 180 of all subjects. We obtained multi-dimensional scaling coordinates and calculated the displacement. Drug classes and mechanisms of AMR genes in each MAG were obtained from the RGI analysis results. To analyze the distribution of various drug classes, the number of occurrences of each drug class was counted in EARB and SARB. To examine the differences in drug classes between EARB and SARB, the frequency of each drug class was calculated, and the number of MAGs carrying each drug class was counted. To get a prevalence score, the proportion of MAGs carrying a given drug class (carrier ratio) was multiplied by the frequency of the drug class. To analyze the sharing of drug classes and AMR genes among EARBs and SARBs, we counted the frequency of drug classes or AMR genes at a certain taxonomic level (genus level for EARB and phylum level for SARB). Specifically, for EARB, we considered a drug class or AMR gene to be “shared” if it was found in at least three out of the four total EARB genera. Similarly, for SARB, it was considered shared if it was detected in at least six out of the eight total SARB phyla. To explore potential correlations between specific drug classes or AMR genes and taxonomies, we calculated the frequency of each drug class within every phylum or species. This was done by dividing the number of times an AMR gene associated with a drug class appeared within a given taxonomy by the total occurrences of that taxonomy level. We performed this calculation separately for EARB and SARB. Following the frequency determination, we added one to the mean frequency value to adjust for any instances of zero occurrence, then applied a logarithmic transformation to normalize the data distribution. We visualized the results using a pheatmap to provide an intuitive understanding of the frequency and distribution patterns. We also checked the AMR signature from MAGs based on non-negative matrix factorization (NMF) technique using R package NMF. By setting the factorization rank to two, we extracted basis and coefficient vectors from the given NMF factors. Taxonomy annotation of MAGs The taxonomy of MAGs, assembled in this study, was assigned using GTDB-TK classify_wf (version 2.1.0) with release 207_v2 database^[66]19. A phylogenetic tree for EARB was constructed from 12 healthy adults using GTDB-TK de_novo_wf. Since all EARBs were assigned within Enterobacteriaceae family, following parameter was used (--bacteria --taxa_filter o__Enterobacterales --outgroup_taxon f__Enterobacteriaceae). The result was used for plotting a phylogenetic tree in iTOL website ([67]https://itol.embl.de/)^[68]20. Relative abundance profiling of MAGs The abundance of MAGs constructed from 12 healthy adults was quantified by the quant function of salmon program (version 0.13.1) in metaWRAP pipeline^[69]15 and merged with the taxonomy table. Phylum ‘Firmicutes_C’, ‘Firmicutes_A’ were converted to ‘Firmicutes’ to remove clade annotations for clarity. To avoid bias caused by differences in absolute abundance between samples, the abundance of the same phylum in the same host was added, and its relative abundance was calculated and added based on the sampling point. Relative abundance was recalculated to obtain the relative abundances at the phylum level for each sampling point. To avoid bias caused by differences in absolute abundance between samples, we first summed the absolute abundance of each phylum within each sample and converted it into a sample-level relative abundance. Next, we grouped these sample-level relative abundances by sampling point, summed them again for each phylum, and recalculated the phylum-level relative abundances at the sampling-point level. To compare the composition of the top-5 abundant phyla between sampling points, minor phyla, whose abundance was less than the top-5, were converted to ‘others’ at each sampling point. The abundance table of the EARB-containing community was further investigated at the species level and categorized into ‘EARB,’ ‘top-5’ (the top-5 abundant species, excluding EARB), and ‘others’. Lastly, to generate the relative abundance of Enterobacteriaceae and Veillonellaceae, family-level taxonomy information of MAGs was categorized into ‘Enterobacteriaceae’, ‘Veillonellaceae’, or ‘others’. Functional analysis of MAGs Gene prediction using Prodigal (default mode; version 2.6.3)^[70]21 was conducted for further functional and community power analyses. Functions of predicted genes were annotated with KEGG using hmmsearch function of HMMER program (default option; version 3.3.1)^[71]22 using pre-trained hidden Markov Models (prok90_kegg94) from Raven toolbox ([72]https://github.com/SysBioChalmers/RAVEN/wiki/Use-Pre-Trained-HMMs) . KEGG pathways/modules and the list of KEGG orthology (KO) terms were manually parsed from the KEGG database (release 103; 2022/07)^[73]23 and used for functional analysis. For comparison between non-EARB and EARB, samples that contain EARB were selected, and the KO pathway and module table were generated by counting the number of KO genes for the pathway or module in each MAG. A significant difference pathway or module was identified using a two-sided Wilcoxon rank-sum test with a confidence level of 0.95. Significant pathways were filtered based on the criteria: pathway gene coverage >30% and fold change >0.5 compared to the non-EARB. Significant modules were filtered based on the criteria: module gene coverage >0.8 and fold change >0.5 compared to the non-EARB. Community power analysis To investigate the functional importance of EARB in a community, a community power analysis was conducted. Genes of MAGs in the communities with EARB existing were predicted using prodigal (default mode; version 2.6.3) and KEGG annotated by hmmsearch function of HMMER (default option version 3.3.1). Therefore, KEGG enzyme count table was generated for each community. Functional uniqueness was measured by counting number of unique KEGG enzyme in a MAG when comparing others in the same community by leave-one-out approach. To calculate the functional importance of a microbe, the KEGG enzyme counts of a microbe were divided by the sum of the same enzyme counts of the entire community (proportion of KEGG enzymes for the microbe). The sum of the proportion of all KEGG enzymes of a microbe was considered as the community power score of the given microbe. Community power scores of three categorized microbes (EARB, SARB, non-carrier) were also calculated. Co-enrichment analysis Co-enrichment matrix was generated using mOTU relative abundance rarefied table downloaded from the original paper^[74]13. First, we choose three mOTU species corresponding to EARB (Escherichia coli, Klebsiella oxytoca, Klebsiella pneumoniae). Four non-EARB species (Bacteroides thetaiotaomicron, Bacteroides ovatus, Bilophila wadsworthia, Parabacteroides distasonis) were manually picked based on high prevalence and high community power score for comparison. Co-enrichment correlation between each selected microbes’ abundance and other mOTUs’ were calculated by Spearman test. mOTU species that were annotated ambiguously (annotated as ‘motu_linkage_group’) and showed correlation between –0.3 < and <0.3 with all selected species were excluded. The correlations that both microbes co-existed in at least 5 communities and showed significant p value (>0.05) were selected and plotted using the pheatmap R package. To count positively or negatively related mOTUs, filtering spearman correlation test by >0.3 or < −0.3 for each comparing bacteria. To verify the relationship between the relative abundance of EARB or non-EARB with mOTU richness, we sum up the relative abundance of all EARB mOTUs or non-EARB in each sample. Then we counted mOTUs, which are non-zero in relative abundance in each sample. We calculated the sum of relative abundance as log2 and then divided the samples into three categories (Rare: log2 relative abundance sum < −10, Normal: −10 ≤ log2 relative abundance sum <−5, Rich: 5 ≤ log2 relative abundance sum). Correlation between relative abundance sum of EARB or non-EARB and number of non-zero mOTUs (mOTU richness) was measured by spearman’s correlation coefficients. Tracking endogenous strains among metagenomic samples To check genomic similarity between basic EARBs, we created an ANI matrix using fastANI^[75]24 with –matrix parameter with default options (version 1.33). For strain analysis, gene prediction was conducted for 12 EARB E. coli using Prodigal^[76]21. All predicted genes were concatenated and then clustered based on similarity using CD-HIT (version 4.8.1)^[77]25 with -aS 0.9 -c 0.98 -n 10 -M 0 -d 0 -T 0 -G 0 parameters. From CD-HIT clustering information, extract the longest 20 homologous genes that are present in all 12 E. coli strains with the same length using cdbfasta (version 1.00). 20 homologous genes in most HQ E. coli genome (ERR1995253 bin.3) were extracted and built as a reference by bwa (version 0.7.17-r1188) for variant call and SNP search. Variant call was done by following steps. Paired FASTQ files from EARB E. coli containing samples were mapped against a previously built E. coli homologous gene reference. Generated bam file was sorted, indexed, and mpileup (mpileup -ugf) using samtools (version 1.9)^[78]26. Variants were called using bcftools (bcftools call -cv) (version 1.9)^[79]27, and filtered to include only variants with a quality score of 20 or higher. Bam files were converted tabula form using sam2tsv version (d29b24f2b). The frequency of nucleotides in every variant point was counted using the converted tsv file. Nucleotide bases that existed in variant position <5% were discarded because of the possibility of sequencing error. SNP ratio of each variant position was counted using basic R functions. For SNP analysis for AMR gene, we did the same process except using AMR gene from ERR1995248 bin.3 (host 5 & day 8) for reference because it is the dominant E. coli of the host. Using the dominant strain AMR gene for variant calling, we were able to find a nucleotide ratio clustered into two in the day 4 sample. Comparing variant position and SNP information between longitudinal data, we could track the SNP ratio (SNP nucleotide frequency/total nucleotide frequency) of the variant position through sampling points. To confirm the existence of two E. coli strains in host 5 and the day 4 sample, all variant positions of the 54 AMR gene reference host 5 at day 4 were investigated for their sequence depth. Validation using a different cohort dataset To confirm the reproducibility of EARB in other antibiotic-exposed human metagenome data, we downloaded raw data (i.e., FASTQ files) of shotgun metagenomic samples, or isolate strain genome samples, from three different study dataset, such as RUTI-causing E. coli isolates (n = 556), liver cirrhosis patient (n = 592), and preterm infant (n = 399). We constructed MAG from the raw data from liver cirrhosis patient and preterm infant data and search AMR gene using same pipeline, such as MetaWRAP pipeline, as we described above. For liver cirrhosis patients data, Megahit^[80]16 was used with -r parameter (for single-end input). For E. coli isolates data, we assembled contigs using Megahit then predicted genes using prodigal for AMR gene searching. We also investigated the E. coli isolates’ SNP profile of variant positions, which were found in SNP analysis between the major strain and the minor strain of host 5. We calculated the AMR prevalence value and compared it with the healthy adults’ EARB. To verify the community power score of EARB in live cirrhosis and preterm infant data, we selected communities that contain EARB and more than 5 other MAG (liver cirrhosis = 28, preterm infant = 24 communities). Communities for healthy adults, liver cirrhosis patients, and preterm infants were arranged by sampling point, MELD score (Model for End-Stage Liver Disease), and host day of life, respectively. Results De novo assembly of shotgun metagenomics revealed the enrichment of multi-AMR genes within specific bacterial strains To investigate the longitudinal changes in the AMR gene repertoire of the gut microbiome after antibiotic treatment, known as the resistome, we obtained and processed publicly available fecal shotgun metagenomics data from 12 healthy adults with 5 time points (total 57 samples were used among possible 60 samples, with three samples excluded due to the library construction failure). These data were collected before and after a 4-day treatment with an antibiotic cocktail (meropenem, gentamicin, and vancomycin), as well as on days 8, 42, and 180 after the cessation of antibiotics (Fig. [81]1)^[82]13. First, we performed de novo genome reconstructions of 57 metagenomic samples using the metaWRAP pipeline^[83]15, which integrates three binning tools (see Methods). A total of 7858 initial bins from the three binning tools were filtered into 2585 HQ MAGs, which passed the quality criteria of completeness (>70%) and contamination (<5%) based on CheckM estimations^[84]17 (Fig. [85]1b and Supplementary Fig. [86]1a, b) and refined these into 1358 MAGs, which were merged from the consensus bins of the three binning tools (Supplementary Data [87]1–[88]2). Fig. 1. Metagenome-assembled genomes revealed individual resistome changes and discovered bacteria that had acquired extensive antimicrobial resistance. [89]Fig. 1 [90]Open in a new tab a Overview of the shotgun metagenome analysis conducted in this study. From 12 healthy adults, 56 metagenomics datasets were sampled at five different time points, and three additional datasets (recurrent urinary tract infection [RUTI]-causing Escherichia coli genomes, preterm infant metagenomes, and liver cirrhosis metagenomes) were used for further validation. b Scatter plot of metagenome-assembled genome (MAG) quality score of three different binning tools. Using CheckM algorithm, we selected high-quality MAGs (total number of high-quality (HQ) MAGs = 2585 [blue], total number of low-quality (LQ) MAGs = 5268 [red]) based on a completeness (Q) >70% and contamination (T) less than 5% (block dotted lines). c Boxplot of the number of refined MAGs based on the consensus of three binning tools (completeness > 70%, contamination < 5%; n = 1358) and the number of antimicrobial resistance (AMR) genes found in subjects at each sampling point. AMR genes were discovered by Resistance Gene Identifier (RGI) tool using the Comprehensive Antibiotic Resistance Database (statistical significance measured by Student’s t test comparing each time point group with the day (D)0 group as reference group. ****p ≤ 0.0001, **p ≤ 0.01). d Individual dynamics of the number of AMRs per MAG. Each line indicates AMR changes in a given host. e Line plots tracking the changes in the number of MAGs and average AMR gene burden per MAG of susceptible (Host 5 and 12) and tolerant subjects (Host 2 and 4). f AMR gene distribution within each MAG (n = 1358). MAGs containing more than 17 AMR genes (black dotted line) were regarded as bacteria that had acquired extreme resistance (EARB) (red dots, n = 20) (statistical significance measured by Student’s t test comparing each time point group with the D0 group, ****p ≤ 0.0001, *p ≤ 0.05). We identified the AMR gene repertoire of the MAGs using CARD^[91]28 as a reference (See Methods). We traced the changes in the number of MAGs and AMR genes identified in each sample and summarized them by time point. The number of MAGs was highest at baseline (day 0), significantly reduced from day 4 to day 42, and then recovered on day 180. We also found a significant increase in the number of AMR genes by day 8, suggesting that 4 days after the cessation of antibiotic treatment, resistant bacteria persisted and thrived (Fig. [92]1c). Interestingly, individual examinations of changes in the AMR gene ratio revealed diverse resistance responses of the gut microbiome to antibiotic treatment. For example, some individuals (host 5 and 12; susceptible group) showed drastic changes in their resistome from baseline, while other individuals (host 2 and 4; tolerant group) showed a similar level of AMR gene burden. This suggests different susceptibilities of individuals to antibiotic-induced dysbiosis (Fig. [93]1d, e and Supplementary Fig. [94]1c). In a previous study, administering antibiotics to healthy subjects not only altered their gut microbiomes to resemble those of intensive care unit patients, but also resulted in a protracted recovery in some subjects, as confirmed by the maximum displacement between baseline and final data points in principal component analysis^[95]29. In line with these findings, our study revealed that individuals in the susceptible group, who experienced a rapid enrichment of AMR genes following antibiotic treatment, demonstrated the longest distance from baseline (day 0) to the final sampling point (day 180), indicating a delayed recovery of the microbiome (Supplementary Fig. [96]1d). Next, we examined the correlation between MAGs and the increased resistome by quantifying the prevalence of AMR genes within each MAG at different time points (Fig. [97]1f). In our temporal analysis, we found that a specific set of 20 MAGs contained a notably high number of AMR gene, reaching a saturation point in the cumulative distribution of AMR counts, with each MAG containing at least 17 AMR genes (Supplementary Fig. [98]2a). Therefore, we defined bacteria with more than 17 AMR genes as extensively acquired antimicrobial resistance bacteria (EARB). The EARB showed comparable assembly quality compared to other MAGs (Supplementary Fig. [99]2b, c), indicating that the higher number of AMR genes in EARB was not an artifact of poor metagenome assembly. Notably, we found that individuals with different resistome changes, i.e., susceptible and resistant groups, showed distinct emergence timings of EARB species following antibiotic treatment (Supplementary Fig. [100]1c). Therefore, to better understand the changes in individual microbiomes after antibiotics treatment, the dynamics of the abundance and functions of EARB species will be the key to understanding the effects of antibiotics on gut microbiome community. Characteristics of EARB strains at the taxonomical, functional, and community level We further explored the characteristics of EARB by comparing their taxonomy and abundance with those of other microbes within the same microbial community. We first annotated the taxonomy of all MAGs identified using Genome Database Taxonomy Toolkit (GTDB-TK)^[101]19 and constructed a phylogenetic tree for the EARB species (Supplementary Fig. [102]3a, see Methods). Of note, all EARB belonged to the Proteobacteria phylum (also known as Pseudomonadota), predominantly within the genera Escherichia, Klebsiella, Enterobacter, and Cronobacter, which are widely recognized as pathobionts associated with various chronic diseases and bloodstream infections^[103]30. We also examined the taxonomy of MAGs containing fewer than 17 AMR genes, referred to as SARB. The majority of SARB belonged to the phyla Bacteroidetes (also known as Bacteroidota) (62.8%) and Firmicutes (also known as Bacillota) (27.5%), while a small proportion belonged to Proteobacteria (3.8%). AMR non-carriers mostly belonged to Firmicutes (82.6%), implying different taxonomic preferences for the acquisition