ABSTRACT
Oenococcus oeni, the only species of lactic acid bacteria capable of fully completing malolactic fermentation under challenging wine conditions, continues to intrigue researchers owing to its remarkable adaptability, particularly to acid stress. However, the mechanism underlying its superior adaptation to wine stresses remains elusive because viable genetic manipulation tools are lacking for this species. In this study, we conducted genomic sequencing and acid resistance phenotype analysis of 255 O. oeni isolates derived from diverse wine regions across China, aiming to elucidate their strain diversity and the genotype-phenotype associations of acid resistance through comparative genomics. A significant correlation between phenotypes and evolutionary relationships was observed. Notably, phylogroup B predominantly consisted of acid-resistant isolates, primarily originating from the Shandong and Shaanxi wine regions. Furthermore, we uncovered a linkage between prophage genomic islands and the acid resistance phenotype. Using genome-wide association studies, we identified key genes correlated with acid resistance, primarily involved in carbohydrate and amino acid metabolism. This study offers insights into the genetic diversity of O. oeni and the genetic basis of its adaptation to acid stress.

IMPORTANCE
This study provides valuable insights into the genetic basis of acid resistance in Oenococcus oeni, a key lactic acid bacterium in winemaking. By analyzing 255 isolates from diverse wine regions in China, we identified significant correlations between strain diversity, genomic islands, and acid resistance phenotypes. Our findings reveal that certain prophage-related genomic islands and specific genes are closely linked to acid resistance, offering a deeper understanding of how O. oeni adapts to acidic environments.
These discoveries not only advance our knowledge of microbial stress responses but also pave the way for selecting and engineering acid-resistant strains, enhancing malolactic fermentation efficiency and wine quality. This research underscores the importance of genomics in improving winemaking practices and addressing challenges posed by high-acidity wines.

KEYWORDS: lactic acid bacteria, pan-genome, genomic island, horizontal gene transfer, genome-wide association study

INTRODUCTION
Malolactic fermentation (MLF) is a critical process in winemaking, particularly for red wine. During this process, lactic acid bacteria (LAB) convert the sharp-tasting malic acid into the milder lactic acid, which reduces the wine's acidity, enhances its flavor and aroma, and improves microbial stability (1). However, winemakers often encounter several challenges in the MLF process, such as prolonged fermentation, incomplete fermentation, and contamination by undesirable microbes (2–4). These challenges mainly arise from the difficulty LAB have in adapting to the harsh conditions of winemaking, including high acidity, high ethanol content, low temperatures, high levels of sulfur dioxide (SO₂), and nutrient scarcity (5). The growth of LAB is notably inhibited under acidic stress, which can significantly hinder the fermentation process. Consequently, the selection of LAB strains that exhibit strong acid resistance is imperative to ensure the stability and efficiency of MLF.

Oenococcus oeni is the species most commonly employed in MLF due to its remarkable adaptability to the stress conditions prevalent in wine. In winemaking, MLF is typically carried out under monoculture conditions by inoculating a selected strain of O. oeni (6).
Although other LAB, such as Lactobacillus, Pediococcus, and Leuconostoc, are also capable of performing MLF, they often struggle to survive the extreme conditions in wine, leading to premature fermentation cessation or the production of harmful metabolites, which have a detrimental effect on wine quality (3). Despite O. oeni's ability to tolerate some of these stress factors, its growth can still be inhibited if the wine's acidity is excessively high (7). It is therefore vital to select O. oeni strains with enhanced acid resistance in order to ensure the successful progression of MLF. Such strains can enhance fermentation efficiency, reduce fermentation costs, mitigate the risk of microbial contamination, and ultimately elevate the overall quality of the wine (4, 8).

In contrast to other LAB, there is a paucity of suitable molecular biology techniques for genetically modifying O. oeni, which makes it challenging to investigate gene function and regulatory mechanisms in this species (5). Researchers have devised methods such as electroporation and antisense RNA to study O. oeni at the molecular level (9). Nevertheless, these methodologies are of limited utility due to issues such as poor reproducibility. Heterologous expression in other LAB has partially addressed the difficulty of genetic modification in O. oeni (9), but challenges remain: gene expression levels are often low, and outcomes may be inaccurate due to differences in cellular environments and metabolic pathways among LAB. In contrast, genomics offers a means of effectively studying the stress response mechanism of O. oeni at the molecular level, significantly enhancing research efficiency. By analyzing the genome, we can find potential genotype-phenotype associations, which serve as a solid foundation for future genetic confirmation.
Microorganisms can adapt to stressful environments through horizontal gene transfer (HGT), which results in the generation of a pan-genome comprising multiple accessory genes (10). Comparative genomic analysis of the O. oeni pan-genome can reveal evolutionary relationships among strains and elucidate strain diversity associated with phenotypes. Currently, comparative genomics is frequently integrated with genome-wide association studies (GWASs) to more comprehensively elucidate correlations between genotype and phenotype (11). GWAS has been employed to investigate statistical associations related to bacterial pathogenicity, antibiotic resistance, and other traits (12).

In this study, we sequenced, assembled, and annotated 255 O. oeni isolates for large-scale comparative genomic analysis. Subsequently, the isolates were assessed for their resistance to acid stress. By integrating comparative genomics with GWAS, we elucidated the evolutionary relationships among O. oeni isolates and investigated genotype-phenotype associations of acid resistance. The primary objectives of this study were to examine the relationship between isolation sources, strain diversity, and genomic islands, as well as to identify genes associated with acid resistance. Our comprehensive analysis of population structure provides novel insights into the genetic traits of O. oeni and offers a foundation for further investigation into the potential application of acid-resistant O. oeni in the winemaking industry.

RESULTS

Genomic and pan-genomic characterization
To gain a comprehensive understanding of the strain diversity of O. oeni, we analyzed 255 O. oeni isolates obtained from various regions of China between 1999 and 2016 (Table S1). Among these, the genomes of seven isolates were downloaded from the NCBI database. The genomes of the remaining 248 isolates were sequenced and assembled in this study.
The average genome size of the 255 isolates was 2.1 Mb, with an average GC content of 37.88%, an average N50 of 1.12 Mb, and an average depth of 912× (Table S1). Among the 243 O. oeni genomes deposited in the NCBI database, genome size ranges from 1.7 to 2.5 Mb, N50 ranges from 4.7 kb to 2.0 Mb, and GC content is between 37.5% and 38.5%. The sequencing coverage of bacterial genomes typically ranges from 30× to 150×, and coverage exceeding 150× can enhance the accuracy of genome data (13). Our genomic data fall within these reasonable ranges. The GC content of O. oeni exhibited significant regional differences (P = 4.27e−12, Kruskal-Wallis test) (Fig. 1a).

Fig 1. Genomic characterization and functional annotation of O. oeni. (a) Boxplot illustrating the GC content (%) of O. oeni genomes across various provinces of China. Significant differences in GC content were observed between isolates from different provinces (P < 0.05, Kruskal-Wallis test). Pairwise comparisons indicated the following significance levels: *P < 0.05; **P < 0.01; ***P < 0.001 (Wilcoxon tests). Only significant results are presented in the figure. (b) Fan chart depicting the distribution of pan-genomic genes among 255 O. oeni isolates. (c) Functional annotation of the pan-genome utilizing the COG database. Pan-genome genes are categorized into four groups: information processing and storage, cellular processing and signaling, metabolism, and poorly characterized.

Pan-genomic analysis of the 255 O. oeni isolates identified a total of 16,416 pan-genes. Among these, the core genome comprised 1,028 genes, including 840 core genes and 188 soft-core genes (Fig. 1b).
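As an illustration only, and not the pipeline used in this study, the partition of gene families by prevalence can be sketched in Python from per-gene isolate counts. The thresholds follow the definitions used in this section (core: present in more than 99% of isolates; soft-core: present in 95–99%). The function names, the "accessory" label for all remaining genes, and the toy counts are assumptions made for the example.

```python
from collections import Counter


def classify_gene_family(n_present: int, n_isolates: int) -> str:
    """Label one gene family by the share of isolates that carry it.

    Thresholds: core > 99%, soft-core 95-99% (inclusive).
    Everything else is lumped into a hypothetical 'accessory' bin.
    """
    if n_isolates <= 0 or not 0 <= n_present <= n_isolates:
        raise ValueError("counts must satisfy 0 <= n_present <= n_isolates")
    # Integer arithmetic avoids floating-point edge cases at the cut-offs.
    if n_present * 100 > 99 * n_isolates:
        return "core"
    if n_present * 100 >= 95 * n_isolates:
        return "soft-core"
    return "accessory"


def summarize_pan_genome(presence_counts: dict, n_isolates: int) -> Counter:
    """Count gene families per category from {gene_id: isolates carrying it}."""
    return Counter(
        classify_gene_family(n, n_isolates) for n in presence_counts.values()
    )


# Toy input (made-up gene IDs and counts) for a collection of 255 isolates.
toy_counts = {"geneA": 255, "geneB": 250, "geneC": 40}
print(summarize_pan_genome(toy_counts, 255))
```

With 255 isolates, these cut-offs mean that a gene must be present in at least 253 isolates to count as core and in 243 to 252 isolates to count as soft-core.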
The soft-core genes are those present in 95–99% of the isolates, while the core genes are those found in more than 99% of the isolates. The pan-genome fit curve exhibited an increasing trend. The fit of Heaps' law with an exponent of γ = 0.37 indicated that the pan-genome of O. oeni was open (Fig. S1). The exponent γ of the cumulative curve of the core genome was close to 0, indicating that the core genome was in a stable state. Among the 10,906 pan-genes annotated with the COG database, excluding genes of unknown function, the majority are involved in metabolism, cellular processing and signaling, and information processing and storage (Fig. 1c).

Analysis of the strain diversity
Pairwise average nucleotide identity (ANI) is widely used in genomic studies as a reliable metric for determining species boundaries. An ANI value exceeding 95% generally suggests that the organisms belong to the same species (14). The PSU-1 strain (assembly accession NC_008528.1) was the first O. oeni strain to have its whole genome sequenced. Over the years, it has been extensively studied and now has high-quality data and comprehensive annotations, making it a widely used reference genome in O. oeni research. In this study, we used PSU-1 as the reference strain and compared it with all the isolates. The ANI values ranged from 98.88% to 99.85%, all exceeding 95%, thereby confirming that the isolates all belong to O. oeni. Lorentzen et al. (1) classified O. oeni strains from wine, cider, and kombucha into four phylogroups (A, B, C, and D) based on the core genome phylogenetic tree. To assign the 255 O. oeni isolates used in this study to these phylogroups, we randomly selected 45 strains from the data set of Lorentzen et al. (Table S2), which represent all four phylogroups. These strains served as references to construct a