Abstract Despite the increasing knowledge about factors shaping the human microbiome, the host genetic factors that modulate the skin-microbiome interactions are still largely understudied. This contrasts with recent efforts to characterize host genes that influence the gut microbiota. Here, we investigated the effect of genetics on skin microbiota across three different skin microenvironments through meta-analyses of genome-wide association studies (GWAS) of two population-based German cohorts. We identified 23 genome-wide significant loci harboring 30 candidate genes involved in innate immune signaling, environmental sensing, cell differentiation, proliferation and fibroblast activity. However, no locus passed the strict threshold for study-wide significance (P < 6.3 × 10^−10 for 80 features included in the analysis). Mendelian randomization (MR) analysis indicated the influence of staphylococci on eczema/dermatitis and suggested modulating effects of the microbiota on other skin diseases. Finally, transcriptional profiles of keratinocytes significantly changed after in vitro co-culturing with Staphylococcus epidermidis, chosen as a representative of skin commensals. Seven candidate genes from the GWAS were found overlapping with differential expression in the co-culturing experiments, warranting further research of the skin commensal and host genetic makeup interaction. Subject terms: Microbiome, Genetics research __________________________________________________________________ Microbiome composition is influenced by genetics, although the specific host genetic factors responsible are not well known. Here, the authors performed a genome-wide meta-analysis to discover host genetic effects on skin microbiota and finding potential causal effects of microbiota composition on skin diseases. Introduction Human-associated microbial communities show individual-specific variation shaped by a multitude of factors^[58]1,[59]2. For skin in particular, the bacterial community composition is strongly influenced by host characteristics, such as skin microenvironment, sex, age and body mass index (BMI), and to a lesser extent by lifestyle and environmental expositions^[60]3. The genetic influence of the host on skin microbiome composition and diversity was suggested by findings indicating heritability of up to 56.4% for single taxonomic branches of skin commensals in twins^[61]4. Furthermore, host genetics and skin microbiota interactions haven been suggested by studies including targeted genes^[62]4 and in the context of inflammatory diseases, such as atopic dermatitis^[63]5. Nevertheless, the influence of host genetics on the skin microbiome is largely understudied and no dedicated genome-wide association study (GWAS) of host genetics and the bacterial community inhabiting the skin has been performed so far. This strongly contrasts with what is known about the human gut microbiota, where a variety of associated genomic loci and pathways has been identified by large GWAS^[64]6,[65]7. Together, these gut microbiome-based GWAS have not only suggested how human molecular mechanisms modulate the microbiome but also indicated the consequences of such modulation to the host health and disease. Therefore, we aimed to study the effects of genetics on skin microbiota across skin microenvironments through meta-analyses of GWAS of two German cohorts. To investigate the putative influence of the skin microbiota in skin diseases we applied Mendelian randomization (MR) analysis. Finally, putative effects of the skin microbiome members on the expression of candidate genes identified by GWAS were tested using normal human epidermal keratinocytes cultured with the common skin bacterium Staphylococcus epidermidis. Results and discussion A total of 1656 skin samples from participants of two cross-sectional, population-based German cohorts, KORA FF4 (n[Individuals] = 324) and PopGen (n[Individuals] = 273)^[66]8,[67]9 were analyzed. Skin samples were taken from dry [dorsal and volar forearm (PopGen)], moist [antecubital fossa (KORA FF4 and PopGen)] and sebaceous [retroauricular fold (KORA FF4) and forehead (PopGen)] skin microenvironments (Fig. [68]1a–c, Supplementary Table [69]1). Microbial community profiles were obtained from sequencing of the V1-V2 regions from the 16 S ribosomal RNA (rRNA) gene (see Methods). Genome-wide association analyses were conducted on univariate relative abundances of individual bacteria (amplicon sequence variants; ASVs) and non-redundant taxonomic groups ranging from genus to phylum levels (79 in total; see Methods). Additionally, multivariate community composition (i.e., beta diversity as captured by Bray-Curtis dissimilarity) was analyzed for association with host genetic variation. The umbrella term “microbial feature” will henceforth be used in this article for all 80 analyzed input data. Fig. 1. Characteristics of KORA FF4 and PopGen cohorts. [70]Fig. 1 [71]Open in a new tab a Female (orange) and male (blue) composition of cohorts. b age and, c body mass index (BMI) distribution in cohorts. d ordination of skin microbiome profiles based on Bray-Curtis dissimilarity and principal coordinates analysis. Samples (n = 1,656) were colored by the skin site and represent dry [dorsal (D.) forearm (n = 260) and volar (V.) forearm (n = 251)], moist [antecubital (A.) fossa (n = 318 in KORA FF4, n = 258 in PopGen)] and sebaceous [seb.; forehead (n = 252) and retroauricular (R.) fold (n = 317)] microenvironments. Cohort names were abbreviated, PopGen (P) and KORA FF4 (K). Marginal boxplots are shown to visualize sample distributions along axes. The boxplot area represents the interquartile range (IQR) divided by the median. Lines extend to a maximum of 1.5 × IQR beyond the area. Points are outliers. Percentage of variation explained by each axis is shown in parentheses. We tested the association of microbial features with variation in 4,685,714 human autosomal single nucleotide polymorphisms (SNP), accounting for main confounders of the skin microbiota (age, sex and BMI) and genetic background of study participants (see Methods)^[72]3,[73]7. Cohort-wise association results were combined in a meta-analysis framework according to skin microenvironment, justified by the observed similarity of the microbiota profiles of samples from the same microenvironment (Fig. [74]1d). To assure robustness of association results, only loci with genome-wide significance (P[Meta] < 5 × 10^−8) and with nominal significance in both cohorts (P < 0.05) were further considered (see Methods for details). A total of 23 loci showed a genome-wide significant association with skin microbial features, of which 22 were linked to univariate features (Table [75]1 and Fig. [76]2a). However, none of these passed the strict threshold for study-wide significance (P < 6.3 × 10^−10 for 80 features included in the analysis, see Methods). Most of the associations were found in moist skin microenvironment (n = 11), followed by dry (n = 7) and sebaceous (n = 5) (Fig. [77]2b). There was a tendency for a higher number of associations found in deeper taxonomic levels: the highest number of significant associations were found at the ASV level (n = 8), followed by genus level (n = 6; Fig. [78]2c). Of all microbial features deeper than family level, features within the genus Staphylococcus were associated with most loci (n = 5; Fig. [79]2d). Bayesian fine-mapping or linkage disequilibrium (LD) structure prioritized 462 genetic variants as potentially causal (Supplementary Data [80]1). A total of 30 genes were found of interest for containing potentially causal variants and/or because these variants were significantly associated with the gene expression in skin tissue from the GTEx portal^[81]10 (Table [82]1). Of these, 27 were protein coding genes, one an rRNA pseudogene and two long-non coding RNA (lncRNA) genes (Supplementary Data [83]2). Most of the protein coding genes were expressed in skin tissue and found expressed in different cell types in skin in datasets from previous studies (see Methods for details^[84]11,[85]12) (Fig. [86]3). In the next section, we will explore the genes of interest with functional roles related to the host-microbiome interface. Table 1. Results summary of GWAS for microbial features ID Chr Position rsID Effect allele EAF Other allele Microenv. Feature N (total) Beta ± s.e. P value Genes 1 2 17323400 rs1396075 T 0.76 A Moist c.Gammaproteobacteria 563 0.37 ± 0.067 3.4×10^−08 ■ 2 2 43812831 rs12466030 A 0.66 G Moist a.ASV070 [Veillonella (unc.)] 226 0.583 ± 0.103 1.7×10^-08 THADA 3 3 9164097 rs2664121 T 0.73 G Dry g.Micrococcus 402 −0.33 ± 0.112 4.3×10^-09 ENSG00000269886,SRGAP3 4 3 12514124 rs709165 G 0.54 A Moist a.ASV006 [S. hominis] 398 0.379 ± 0.069 4.0×10^-08 MKRN2,MKRN2OS,RAF1,RNA5SP123,TSEN2 5 4 3266916 rs2159173 A 0.93 T Sebaceous a.ASV093 [Staphylococcus (unc.)] 276 −0.957 ± 0.159 1.8×10^-09 HTT,MSANTD1,RGS12 6 4 55057749 rs55702239 G 0.77 A Dry o.Bacteroidales,g.Bacteroides 349 −0.43 ± 0.103 3.5×10^-08 FIP1L1,PDGFRA 7 5 14584609 rs152620 A 0.79 T Moist g.Acinetobacter 454 −0.444 ± 0.081 3.7×10^-08 OTULINL 8 6 69060156 rs9445997 T 0.62 C Sebaceous g.Staphylococcus 569 −0.328 ± 0.06 4.0×10^-08 ■ 9 6 93109029 rs2757026 C 0.51 T Moist f.Clostridiales_Incertae_Sedis_XI 363 −0.404 ± 0.072 2.2×10^-08 ■ 10 6 144022040 rs9484795 T 0.77 C Dry g.Anaerococcus 352 −0.41 ± 0.132 3.4×10^-08 PHACTR2 11 7 57369680 rs11762959 A 0.55 G Moist o.Lactobacillales 489 0.348 ± 0.064 4.5×10^-08 ■ 12 7 57369974 rs7791487 A 0.56 T Moist f.Streptococcaceae 458 0.378 ± 0.064 3.4×10^-09 ■ 13 8 125763112 rs59379063 T 0.92 A Sebaceous a.ASV093 [Staphylococcus (unc.)] 279 −0.733 ± 0.133 3.2×10^-08 ■ 14 9 126919126 rs10121400 T 0.64 A Moist o.Burkholderiales 509 −0.354 ± 0.063 2.4×10^-08 ■ 15 11 106233170 rs17105612 G 0.94 A Moist a.ASV013 [S. epidermidis] 361 −0.928 ± 0.165 1.8×10^-08 ■ 16 12 96938142 rs12423627 T 0.93 C Sebaceous a.ASV002 [Staphylococcus (unc.)] 568 0.653 ± 0.113 6.9×10^-09 CFAP54 17 12 101194600 rs4764996 G 0.93 A Dry a.ASV013 [S. epidermidis] 348 −1.2 ± 0.204 2.1×10^-08 ANO4 18 13 33581388 rs1543797 T 0.75 C Dry Beta-diversity 511 ■ 3.2×10^-08 ■ 19 13 38067575 rs12583353 A 0.89 G Dry g.Paracoccus 372 −0.94 ± 0.144 5.6×10^-10 ■ 20 14 33629140 rs17100281 G 0.94 A Moist a.ASV021 [Micrococcus (unc.)] 331 −0.969 ± 0.176 3.7×10^-08 NPAS3 21 16 28056516 rs8049083 C 0.72 A Sebaceous a.ASV004 [Corynebacterium (unc.)] 505 −0.378 ± 0.068 2.8×10^-08 GSG1L 22 17 5341050 rs2472614 G 0.89 C Dry a.ASV086 [A. johnsonii] 255 −1.08 ± 0.186 4.7×10^-08 C1QBP,DERL2,DHX33,ENSG00000263272,MIS12,NUP88,RABEP1,RPAIN 23 19 48742067 rs6509364 C 0.65 T Moist f.Rhodobacteraceae 377 −0.415 ± 0.072 8.5×10^-09 CARD8,TMEM143,ZNF114 [87]Open in a new tab Single variant association tests were performed for each sample type and each microbial feature. Tests were adjusted for age, sex, body mass index (BMI) and genetic background (first ten genetic principal components). Positions are given as in genome assembly hg19 (GRCh37). Effect allele frequency (EAF) and total sample number (N) for the meta-analysis (sample pairs for dry) are shown. Results from moist and sebaceous skin sites were combined by meta-analysis (using METAL software for beta diversity and METASOFT software for univariate microbial features) and considered significant when P[Meta] value were genome-wide significant (P[Meta] < 5 × 10^−8) and data sets were nominal significant (P < 0.05). Loci from dry data sets were considered significant when at least one data set resulted in genome-wide significance (lowest P value shown as P[Meta]) and the other in nominal significance. Meta-analyses were weighted by sample size for multivariate microbial feature and by inverse variance for univariate features. Effect sizes (β) and its standard error (s.e.) from meta-analyses are shown for moist and sebaceous. Effect size and standard error from tests with volar forearm are shown for dry skin. Tests were two-sided. Candidate causal variants were identified by fine-mapping or based on LD > 0.6 to the lead genetic variant. Genes with variants within their region (no formatted font) or with variants associated with their expression (italic font) are shown. Genes are shown in bold font when both conditions are met. Microbial features are prefixed with their level, amplicon variant sequence (a.), genus (g.), family (f.), order (o.), class (c.) or phylum (p.). Association with rs55702239 in dry sites have been identified with the non-redundant features o.Bacteroidales and g.Bacteroides. For simplicity, only statistics related to the genus level is shown in this case. ENSG00000263272 is a novel transcript, antisense to RPAIN and ENSG00000269886 is a novel transcript, antisense to TTLL3. Fig. 2. Results from the GWAS. [88]Fig. 2 [89]Open in a new tab a Manhattan plot of per skin microenvironment meta-analysis. Lowest P value of each position is shown and identified by locus ID and rsID. Meta-analysis P values were obtained using the software METAL and METASOFT or by combining P values from data sets that originated from dry skin sites, see Methods. Significant positions are colored according to skin microenvironment and listed, where leading genetic variant, protein coding genes selected by fine-mapping as containing possible causal variants and microbial features are reported. Table [90]1 contains the list of loci characteristics and genes. b Count of significantly associated loci per microenvironment. c Level of microbial features with highest number of significant associations. d Sub-family features with the highest number of significant associations. Fig. 3. Expression of human genes associated with the skin microbiome in public databases. [91]Fig. 3 [92]Open in a new tab Candidate protein coding genes were selected by GWAS in skin. Upper panel shows the normalized transcriptional expression of genes in skin tissue. Data are from Human Protein Atlas version 20.1^[93]11, which additionally includes data sets from the GTEx^[94]10 and the Functional annotation of the mammalian genome (FANTOM5)^[95]68 projects. Bottom panel shows candidate gene expression in different skin cell types. Single-cell expression was normalized by cell type. Genes differently expressed in each cell type in comparison with the others are highlighted. Displayed log-normalized gene expression data and differential expression analyses are retrieved from Solé-Boldo et al.^[96]12. Candidate genes were mapped by gene symbol. Host functions associated with the human skin microbiota Genetic variants associated with the skin bacteria were localized in genes related to pathogen sensing and regulation of response to pathogens. C1QBP (locus id: 22, lead variant rs2472614, P[Meta] = 4.7 × 10^−8, associated with ASV086 [Acinetobacter johnsonii]), for instance, encodes the complement component 1, q subcomponent binding protein (C1qBP, a.k.a. gC1q-R/p33) and is abundantly expressed in keratinocytes (Fig. [97]3^[98]12). C1qBP is an ubiquitous, multi-ligand, multifunctional and multicompartmental protein, which also acts as endothelial receptor to plasma proteins from the complement and kinin/kallikrein systems and is a marker for epithelial cell proliferation^[99]13,[100]14. C1qBP binds to microbial proteins^[101]15, including Staphylococcus aureus protein A^[102]16, and therefore, is suggested to play a role in both the response to and pathogenesis of microbes^[103]17. Additionally, DHX33, (locus id: 22, same locus containing gene C1QBP), and CARD8 (locus id: 23, rs6509364, P[Meta] = 8.5 × 10^−09, associated with the Rhodobacteraceae family) encode proteins which regulate inflammasome activity, which in turn regulate innate immunity caspase 1 activation^[104]18. DHX33 activates the NLRP3 inflammasome after sensing cytosolic RNA derived from viruses, bacteria or achaea^[105]19,[106]20. CARD8 is structurally related to NLRP1, a sensor component of the NLRP1 inflammasome, and has been shown to activate caspase 1 activity in resting T cells and is a negative regulator of NLRP3 inflamasome^[107]21,[108]22. Together, these results suggest that innate immune components carrying out sensing and regulatory activities may be involved in shaping the human skin microbiota. Associated genetic variants were also localized at genes HTT (locus id: 5, rs2159173, P[Meta] = 1.8 × 10^−09, associated with ASV093 [Staphylococcus (uncl.)]) and CFAP54 (locus id: 16, rs12423627, P[Meta] = 6.9 × 10^−^09, associated with ASV002 [Staphylococcus (uncl.)]), which encode proteins required for cilia formation in mammalian cells^[109]23,[110]24 and expressed in different cell types in skin (Fig. [111]3). Further, we found SNPs that were associated with the expression of the transcript ENSG00000269886 (locus id: 3, rs2664121, P[Meta] = 4.3 × 10^−09, associated with the genus Micrococcus). Interestingly, the effector alleles of all of these (rs2664121, rs2075337, rs2543492, rs1300250) were associated with the decrease in both tissue expression of ENSG00000269886 and relative abundance of the genus Micrococcus (the GTEx portal^[112]10 and Supplementary Data [113]1). ENSG00000269886 is an lncRNA antisense to the gene TTLL3, which regulates cilia assembly across eukaryotes^[114]25,[115]26. Skin cells do not have motile cilia. Thus, it is likely that these genes are related to primary cillium, an organelle at the cell surface that senses extracellular signals, such as chemo-mechanical signals, osmolarity, pH, oxygen and light^[116]27. Primary cillium is found in various skin cells such as keratinocytes, fibroblasts, melanocytes and Langerhans cells^[117]28. Its formation is influenced by the dynamics of the actin cytoskeleton^[118]29, which is regulated by SRGAP3 encoded protein (locus id: 3)^[119]30. Together, these results suggest that extracellular sensing through primary cilium may be involved in the regulation of the skin microbiota. Additional associations were observed with SNPs located in genes involved in cellular differentiation and proliferation. These were RAF1 (locus id: 4, rs709165, P[Meta] = 4.0 × 10^−08, associated with ASV006 [Staphylococcus hominis])^[120]31–[121]33 and RGS12 (locus id: 5, rs2159173)^[122]34–[123]36, the latter found abundantly expressed in keratinocytes^[124]12 (Fig. [125]3). Furthermore, SNPS in PDGFRA (locus id: 6, rs55702239, P[Meta] = 3.5 × 10^−08) were associated with order Bacteroidales and genus Bacteroides. PDGFRA is abundantly expressed in fibroblasts^[126]12 (Fig. [127]3) and participates in cellular maintenance^[128]37 and extracellular matrix production^[129]38. Keratinocyte proliferation, differentiation and function as well as innate immune signaling are major forces contributing to the complex function of the skin barrier. Therefore, it is conceivable that the discovered GWAS associations may represent links between the skin barrier and members of the skin microbiota. Expression of candidate genes by keratinocytes co-cultured with Staphylococcus epidermidis To gain insights in the putative participation of the identified candidate genes in the molecular interaction with the skin bacteria, we analyzed the in vitro transcriptional profile of normal human epidermal keratinocytes co-cultured with S. epidermidis, an abundant commensal in human skin^[130]39. Transcriptional profiles (six replicates) of keratinocytes from the foreskin of a 0-year-old male donor co-cultured with the S. epidermidis ATCC 14990 strain clearly differed from the profiles of controls, keratinocytes that were not co-cultured with bacteria (Fig. [131]4a). The S. epidermidis ATCC 14990 strain is a well characterized laboratory strain which is close to the strains found in the skin of the participants of the two cohorts studied. This proximity is suggested by the observation of 100% overlap and identity with the full length of ASV002 [Staphylococcus (uncl.)] amplicon sequence (307 base pairs), the second most abundant ASV in the whole database (~10% of rarefied sequences) and the most abundant ASV assigned to Staphylococcus genus. Fig. 4. GWAS genes expressed by keratinocytes co-cultured with Staphylococcus epidermidis. [132]Fig. 4 [133]Open in a new tab a Bacteria were added to two-dimension keratinocyte cultures, which were cultivated in six replicates, two per weekly batch. First and second dimensions of principal component analysis of gene expression after variance stabilizing transformation (vst) are shown. Differential expression analysis was performed to compare the expression of keratinocyte genes in culture with and without S. epidermidis. b Enrichment of biological processes mapped to Gene Ontology (GO) was performed on differentially expressed genes [q < 0.05 (derived from Wald test on negative binomial generalized linear models) and absolute logarithmic (log2) fold change >1]. Top ten lowest adjusted P values (Fisher exact test) of each up and down regulated processes are shown, ordered by number of detected genes. Large subunit ribosomal ribonucleic acid (LSU-rRNA), transfer RNA (tRNA) and noncoding RNA (ncRNA) are abbreviated. c Change in the transcription of GWAS selected genes which were differentially expressed are shown. Approximate posterior estimation for generalized linear model (apeglm^[134]78) shrinkage was applied to effect size (log2 fold change). Error bars represent posterior standard deviation. A total of 4134 genes were differentially regulated (Supplementary Data [135]3), suggesting a strong transcriptional response of human keratinocytes to S. epidermidis ATCC 14990 strain in vitro. According to pathway enrichment analysis (Supplementary Data [136]4), the most significant biological processes upregulated were related to immune response, including cytokine-mediated and innate immune responses, as well as response to virus and symbionts (Fig. [137]4b). On the other hand, ribosomal biogenesis, processing of ribosomal RNA and non-coding RNA were among the most significantly down regulated biological processes. In this scenario, a quarter of the candidate genes (n = 7) were differentially expressed (q < 0.05 and absolute log2 fold change >1) when comparing cultures with and without S. epidermidis ATCC 14990 (Fig. [138]4c). Based on knockout mouse macrophage cells, the deficiency of C1QBP protein increases the DNA sensor cyclic GMP-AMP (cGAMP) synthase-induced innate immune response^[139]40. Here, we observed the downregulation of C1QBP transcription associated with the upregulation of genes belonging to innate immune response (Fig. [140]4b, c), which sides with our GWAS suggestion that this gene may play a role in the regulation of skin bacteria via innate immunity. On the other hand, DHX33 was downregulated, contrasting to its role in innate immunity via activation of NLRP3, which transcript was upregulated (Fig. [141]4c and see Supplementary Data [142]3). It is thus likely that the reduced expression of DHX33 in our assays may be associated with the role of DHX33 in rRNA synthesis via positive regulation of transcription by RNA polymerase I^[143]41, being both pathways downregulated (Fig. [144]4b, c, Supplementary Data [145]4). Genes coding for SRGAP3 and TTLL3, of which the lncRNA antisense gene was implied by GWAS, were upregulated (Fig. [146]4c and Supplementary Data [147]3). These observations support our discovered association of primary cilium and skin bacteria. However, it is important to bear in mind that the encoded proteins are not exclusively related to primary cilium, and their expression in our assays may also be related to other structures, e.g., cytoskeleton in the case of SRGAP3^[148]30, and processes, e.g., proliferation in the case of TTLL3^[149]26. Finally, the know role of PDGFRA in fibroblast activity are not directly translated to keratinocytes^[150]37,[151]38. Therefore, the consequences of S. epidermidis-induced in vitro upregulation of PDGFRA in keratinocytes remain to be investigated. Our in vitro experiment is explorative in nature and is limited to its reductionist approach: it consists of two-dimensional co-cultures of isolated keratinocytes and a single S. epidermidis laboratory strain. It is well known that the immunomodulatory effects of S. epidermidis depend on the specific strain, and that there is a large S. epidermidis strain level variation. Thus, it is not possible to directly extrapolate our preliminary functional results to an eventual keratinocyte response to skin commensals in vivo. A panel of commensal strains as well as in vitro models closer to the skin physiology, such as three-dimensional human skin models^[152]42, are necessary to uncover the functional dynamics of the host-commensal cellular interactions. Nevertheless, our assays allowed for the observation of the transcriptional regulation of several GWAS selected genes, being a starting point for functional investigations of the roles of these genes in the interaction with the skin microbiota. Influence of skin microbiota on non-infectious skin diseases Summary statistics of univariate microbial features were used as exposures in 2-sample mendelian randomization (MR; see Methods) to assess their influence on non-infectious skin diseases. A total of eight comparisons passed the per-trait suggestive threshold (q[(trait)] value <0.05, Fig. [153]5), although no comparison passed the global threshold (q[(global)] value <0.05; Supplementary Data [154]5). MR results indicated the influence of staphylococci in dermatitis/eczema (Staphylococcus genus, β = 1.5 × 10^−03), and further, modulating roles of Flavobacteriaceae in two allergy-related traits with microenvironment-specific effect direction (β[Moist] = 8.8 × 10^−04; β[Dry] = 1.1 ×10^−03). Additional results suggested involvement of Staphylococcus ASVs in psoriasis (ASV012 [Staphylococcus hominis], β = 4.0 × 10^−04), seborrhoeic keratosis (ASV010 [Staphylococcus (uncl.)], β = 7.2 × 10^−04) and vitiligo (ASV012 [Staphylococcus hominis], β = 1.2 × 10^−04). Potential protective effects of staphylococci in allergic rhinitis were also suggested (ASV012 [Staphylococcus hominis], β = −1.1 × 10^−03). It is noteworthy that these are likely coagulase-negative staphylococci, which are typical members of the skin microbiota^[155]43. However, the ASV-level signals from the MR were only weakly or inconclusively supported by the sensitivity analysis (see Methods, Supplementary Data [156]5). Together, our findings suggest that members of the microbiota may modulate the health-disease balance in skin. Fig. 5. Results from Mendelian Randomization analysis. Fig. 5 [157]Open in a new tab All exposure-to-outcome pairs with q[trait)] <0.05 in the inverse-variance weighted 2-sample MR are shown. Error bars represent standard error. Microbial features are prefixed with their level, amplicon variant sequence (a.), genus (g.) and family (f.). Additional details and sensitive analyses can be found in Supplementary Data [158]5. In summary, we conducted the first genome-wide association analysis dedicated to the human skin microbiota and identified 23 genome-wide significant loci. The combination of samples from different skin microenvironments of participants from two independent German cohorts allowed for robust results, despite the rather small number of included participants. The candidate genes have functions related to innate immune signaling, environmental sensing, cell differentiation, proliferation and fibroblast activity. Keratinocyte cultures challenged with a laboratory strain of S. epidermidis indicated regulation of seven candidate genes identified by GWAS, providing preliminary evidence that GWAS selected genes may be transcriptionally regulated by skin commensals. MR analysis further supported that specific skin microbiota features might have causal roles in the development of atopic dermatitis, but also suggested modulation of other non-infections skin diseases. It needs to be considered that, despite our efforts to integrate information from different molecular levels and databases to understand the exact mechanisms by which the variants influence candidate gene function(s) and or expression and how this influences the skin microbiome, further and advanced experiments are needed. Likewise, it would be important to systematically establish differences in cutaneous gene expression with skin type, skin physiology and across age groups. Nevertheless, our results suggest a close interaction of the host genetic makeup and associated skin microbiomes. Furthermore, our findings point to the skin microbiota as a target for disease prevention and management, with potential for the development of personalized treatments for non-infectious, inflammatory skin conditions. Methods Cohorts’ description, genotyping, imputation and harmonization PopGen cohort participants were randomly recruited via the local population registry in Kiel, Germany, and as blood donors of the University Hospital Schleswig-Holstein, Campus Kiel^[159]9. Genotypes derived from the Affymetrix Genome-Wide Human single nucleotide polymorphism (SNP) Array 6.0 were quality controlled following a previously established protocol^[160]44 and using the IKMB GWAS Quality Control Pipeline ([161]https://github.com/ikmb/gwas-qc). Briefly, variants with excess missing data (>2%) and/or that deviated from Hardy-Weinberg Equilibrium [HWE, False Discovery Rate (FDR^[162]45) P value <10^−5] were excluded. Samples with high missing data (>2%), high overall increased/decreased heterozygosity rates (i.e., ±5 standard deviation from the sample mean) and related individuals with a PLINK^[163]46 PI_HAT score >0.1875 were removed. To assess population structure, we performed a principal components analysis (PCA) including individuals of the 1000 Genomes Phase3 ref. [164]47 and removed outlier individuals not matching a European ancestry. Imputation was performed with the Michigan Imputation Server^[165]48 (Reference Panel: HRCr1.1 2016 (GRCh37/hg19); Array Build: GRCh37/hg19; rsq filter: off; Phasing Eagle 2.4 (phased output); Population: EUR; Mode: Quality Control & Imputation) and was followed by removal of monomorphic variants. These steps were performed following the miQTL cookbook instructions ([166]https://github.com/alexa-kur/miQTL_cookbook#chapter-2-genotype-im putation). KORA FF4 cohort participants from the youngest age group (39-48 years) that were previously genotyped as part of KORA S4 Survey were recruited from the southern German city of Augsburg and its two surrounding counties^[167]8. Genotyping and genotyping imputation were performed by the KORA Study Center. Briefly, genotypes were derived from the Affymetrix Genome-Wide Human SNP Array 6.0 (KORA F4). Samples with missing data (>3%), mismatch with phenotypic and genetic gender and high heterozygosity rates (i.e., ±5 standard deviation from the sample mean) were removed. Samples were also checked for European ancestry, population outliers and compared with other existing genotype data of the same individual within the KORA cohort. Variants with excess missing data (>2%), deviating from HWE (P value <5 × 10^−10) and Minor allele frequency (MAF) (<2%) were removed. Prephasing was done with SHAPEIT v2^[168]49 and imputation with IMPUTE v2.3^[169]50 (reference panel: 1000 Genomes Phase 3 integrated variant set release in NCBI build 37). To harmonize both genotype datasets, resulting VCF (PopGen) and IMPUTE output (KORA FF4) files were converted to PLINK format using PLINK v1.9^[170]46. Participants that had their skin microbiota profiled (see section below) were selected and variants with MAF < 5% were removed. Genotype Harmonizer v 1.4.23 was used to update the KORA FF4 allele reference based on the PopGen data. SNPs with missingness >10% and non-biallelic SNPs were removed from PopGen data using PLINK v2.0-alpha-avx2-20200217. PopGen SNPs were references to set alleles in