Abstract

Bioinformatic analysis is a promising approach for understanding the relationship between the vast tumor microbiome and cancer development. In the present study, we examined the relationships between the intratumoral microbiome and classical clinical risk factors using a bioinformatic analysis of The Cancer Genome Atlas (TCGA) and The Cancer Microbiome Atlas (TCMA) datasets. Using the TCMA database, we investigated the abundance of microbes at the genus level in solid normal tissue (n = 22) and the primary tumors of patients with head and neck squamous cell carcinoma (HNSCC) (n = 154) and identified three major tumor microbes: Fusobacterium, Prevotella, and Streptococcus. The tissue level of Fusobacterium was higher in primary tumors than in solid normal tissue. However, univariate and multivariate analyses of these three microbes showed no significant effects on patient survival. We then extracted 43, 55, and 59 genes that were differentially expressed between the over and under the median groups for Fusobacterium, Prevotella, and Streptococcus using the criteria of >2.5-, >1.5-, and >2.0-fold, respectively, and p < 0.05 in the Mann-Whitney U test. A pathway analysis revealed the association of Fusobacterium- and Streptococcus-related genes with the IL-17 signaling pathway and Staphylococcus aureus infection, while no Prevotella-associated pathways were extracted. A protein-protein interaction analysis revealed dense networks, with density decreasing in the order of Fusobacterium, Streptococcus, and Prevotella. An investigation of the relationships between the intratumoral microbiome and classical clinical risk factors showed that high levels of Fusobacterium were associated with a good prognosis in the absence of alcohol consumption and smoking, while high levels of Streptococcus were associated with a poor prognosis in the absence of alcohol consumption.
In conclusion, intratumoral Fusobacterium and Streptococcus may affect the prognosis of patients with HNSCC, and their effects on HNSCC are modulated by the impact of drinking and smoking.

Keywords: Intratumor microbiome, Oral bacteria, Prevotella, Fusobacterium, Streptococcus, Head and neck cancer, RNA sequencing, TCGA, TCMA

1. Introduction

Head and neck cancer (HNC) is a malignant tumor that develops in the paranasal sinuses, nasal cavity, oral cavity, pharynx, salivary glands, and larynx [1]. It is the seventh most prevalent cancer worldwide [2], and more than 90 % of cases are head and neck squamous cell carcinoma (HNSCC), which originates from squamous cells of the mucous membrane [3]. The 5-year survival rate of HNC is <50 % [4,5]. Smoking, the consumption of alcohol, and human papillomavirus (HPV) infection are important risk or carcinogenic factors for HNSCC [6,7]. Furthermore, studies on oral bacteria on tumor surfaces and in saliva have shown that several types of bacteria are associated with cancer [8], which suggests that the oral microbiome residing in the oral cavity and pharynx is another risk factor for HNSCC. Indeed, oral commensal bacteria from the genera Streptococcus, Rothia, Fusobacterium, Haemophilus, and Prevotella are frequently identified in HNSCC [9–11].

Although bacteria were detected in tumors in the 19th century, their implications have not been examined in detail since then [12,13]. Due to recent advances in omics analyses and various technologies, a relationship between cancer and the microbiome has been reported for the majority of cancers, including colorectal cancer, skin cancer, breast cancer, bone cancer, cervical cancer, esophageal cancer, prostate cancer, stomach cancer, kidney cancer, lung cancer, and HNC, and this is a rapidly developing field [13,14]. The intratumoral microbiome enters the tumor site via hematogenous spread from the mouth, gut, or tumors [13].
However, HNSCC tumors, particularly oral and pharyngeal cancers, lie in close proximity to the oral cavity and are constantly exposed to the oral microbiome. Therefore, the abundance of tumor-associated oral microorganisms in HNSCC is expected to be higher than in other carcinomas and to most strongly affect components of the tumor microenvironment (TME) [15].

The Cancer Genome Atlas (TCGA) is a large-scale cancer genome project that was started in 2006 in the United States and has comprehensively analyzed genome methylation and gene and protein expression aberrations in more than 20 cancer types [16–18]. TCGA reported a comprehensive genomic characterization of HNC in 2015 [19]. Many omics analyses using the TCGA database have been performed on HNC, and we also conducted a transcriptome analysis of gene expression induced by starvation in HNSCC in relation to prognosis and Porphyromonas gingivalis-infected cells and demonstrated the potential of PLAU as a prognostic biomarker using this database [20,21].

In contrast to conventional studies using mucosal swabs and saliva, microbiome studies based on tumor tissue and/or databases are a relatively new research field. The Cancer Microbiome Atlas (TCMA) database, a curated and decontaminated collection of the microbial compositions of oropharyngeal, esophageal, gastrointestinal, and colorectal tissues, has been published, which allows for analyses of the pan-cancer relationship between the microbiome and tumorigenesis [22]. It has promoted research on the intratumor microbiome of HNSCC, including a comprehensive analysis of the intratumor microbiome, the essential involvement of Fusobacterium in the immune microenvironment under inflammatory conditions, the clinical correlation between the intratumor oral microbiome and oral squamous cell carcinoma, and potential novel microbial markers, such as intratumoral Leptotrichia [15,23–27].
A previous study that extracted RNA sequencing data from TCGA investigated the relationships between the bacterial and fungal landscapes of HNSCC and HPV infection, smoking, and drinking habits [7]. We also reported that TCGA-HNSCC patients with sub-median levels of Leptotrichia in the intratumoral microbiome had a poorer prognosis [25]. Although many studies have recently been published on the association between HNSCC and intratumor bacteria [15,23–27], the findings obtained are not always consistent, even for the same bacterial species, and at this stage no conclusions can be drawn on how the intratumoral microbiome may act. Therefore, we herein examined the relationship between the major intratumoral microbiota of HNSCC and the impact of its classical prognostic factors on patient survival, and also investigated differentially expressed genes in tumor cells associated with infections by the major microbes. The present results suggest an impact of intratumoral Fusobacterium and Streptococcus on the prognosis of HNSCC in relation to the consumption of alcohol and smoking.

2. Materials and methods

2.1. Data collection from TCGA and TCMA databases

RNA-Seq count data (HTSeq version) on TCGA-HNSCC (499 primary tumor samples and 45 solid normal tissue samples) were obtained from the GDC Data Portal [28] (accessed on March 20, 2019) with Subio Platform [29] software ver. 1.24.5859 (Subio Inc., Aichi, Japan). The intratumor microbiome compositions of 177 TCMA-HNSCC samples (155 primary tumor samples and 22 solid normal tissue samples) at the genus level were collected from the TCMA database [30] (accessed on 13 July 2023). A total of 154 patients present in both the TCGA and TCMA datasets were examined (Fig. 1A).

Fig. 1.
Abundance of three major microorganisms in solid normal tissue and primary tumors of HNSCC patients. (A) The extraction scheme for patients in both the TCGA and TCMA databases. (B) Abundance of Fusobacterium, Prevotella, and Streptococcus, the top 3 microbes at the genus level. (C) Comparison of the top 3 microbes at the genus level in solid normal tissue and the primary tumors of HNSCC patients. Differences were considered to be significant at p < 0.05.

2.2. Filtering of TCMA genus microbes

Microbes were filtered using Subio Platform [29] software ver. 1.24.5859. The downloaded TCMA data contained 221 microbes at the genus level, which were too many to analyze. Applying a cut-off of 0.01 still left 48 microbes; therefore, the cut-off was changed to 0.1, which left 18 microbes. Of these 18 microbes, we focused on the top three by mean relative abundance. The rates of the other microbes were summed and labeled as “Other.”

2.3. Kaplan-Meier survival analysis and Cox proportional hazards model

The 154 TCGA-HNSCC primary tumor samples were divided into two groups for each of the 18 filter-passed microbes: a rate over and under the median. Kaplan-Meier survival curves were generated using Subio Platform software to compare the groups above and below the median for each microorganism. Interactions with known risk factors were confirmed using Kaplan-Meier curves and the log-rank test with gene expression as a factor and stratification by known risk factors. A Cox proportional hazards model that included significant interaction terms was constructed to confirm the effects of interactions after adjustments for confounding factors.
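As a minimal sketch of the median split and Kaplan-Meier estimate described in this subsection, the following Python code dichotomizes samples by the cohort median and computes a product-limit survival curve. It is not the Subio Platform implementation; the function names, the input layout, and the rule that samples exactly at the median fall into the "under" group are assumptions made for illustration only.

```python
from statistics import median


def split_by_median(abundance):
    """Label each sample 'over' or 'under' relative to the cohort median.

    abundance: dict mapping sample ID -> relative abundance of one genus.
    Assumption: samples exactly at the median are labeled 'under'.
    """
    m = median(abundance.values())
    return {sid: ("over" if value > m else "under")
            for sid, value in abundance.items()}


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each event time.

    times:  follow-up time of each patient
    events: 1 if the patient died at that time, 0 if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= leaving
    return curve
```

In practice, one curve would be computed per group (over and under the median) and the two curves compared with a log-rank test, which is not implemented in this sketch.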
We examined the relationships between classical prognostic factors, namely, the consumption of alcohol (yes: n = 110, no: n = 40), smoking (yes (cigarettes 2.82 ± 1.86/day): n = 85, no: n = 69), the HPV status (positive: n = 42, negative: n = 111), sex (male: n = 112, female: n = 42), lymph node metastasis (yes: n = 79, no: n = 72), and tumor size (T1-2: n = 61, T3-4: n = 92), and the abundance of each microbe.

2.4. Extraction of genes differentially expressed between over and under the median groups for Fusobacterium, Prevotella, and Streptococcus

RNA-Seq data were processed according to our previous study [25]. The Subio Platform (https://www.subioplatform.com/info_technical/293) was used for the normalization and preprocessing of RNA-Seq data. In this platform, low-signal cut-off processing aligns the lower limit of the counts with the lower end of the signal range to prevent the false detection of differentially expressed genes (DEGs) caused by measurement values in the noise range. Setting a single constant cut-off value for all data is not recommended; in this study, cut-off values of 50 and 32 were set. RNA-Seq count data were normalized at the 90th percentile; non-zero counts of less than 50 were then replaced with 50, and counts of 0 were replaced with 32 as the low-signal cut-off. Normalized counts were converted to log2 ratios against the average of solid normal tissue samples. Genes with counts that were too low (count <50 in all samples) or too stable (log2 ratios between −1 and 1 in all samples) were excluded.

2.5. Functional pathway and protein-protein interaction (PPI) analyses

For selected genes, the Database for Annotation, Visualization, and Integrated Discovery (DAVID) server was used to examine the molecular pathways of gene ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.
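The idea behind such an enrichment analysis can be illustrated with a plain one-sided hypergeometric test, which asks how likely it is to see at least the observed number of pathway genes in a gene list drawn at random from the background. This is only a sketch of the over-representation principle and is not DAVID's own scoring procedure; the function name, its parameters, and the numbers in the usage note are hypothetical.

```python
from math import comb


def enrichment_p_value(background, in_pathway, list_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of genes in the background set
    in_pathway: number of background genes annotated to the pathway
    list_size:  number of genes in the extracted gene list
    overlap:    number of extracted genes annotated to the pathway
    """
    total = comb(background, list_size)
    upper = min(in_pathway, list_size)
    tail = sum(
        comb(in_pathway, k) * comb(background - in_pathway, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For example, `enrichment_p_value(20000, 100, 43, 5)` would give the probability of finding five or more genes from a hypothetical 100-gene pathway in a 43-gene list by chance alone. A real analysis would also correct for testing many terms at once.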
GO enrichment was performed at three main levels: cellular components (CC), biological processes (BP), and molecular functions (MF). A PPI network was constructed from these genes using the STRING online database (https://string-db.org/, accessed on 29 February 2024). Next, we visualized the most important modules in the PPI network.

2.6. Statistical analysis

Data were plotted in boxplots, and comparisons between two groups were performed using the Student's t-test with Microsoft Excel (Microsoft, Redmond, WA, USA) according to our previous study [25]. The effects of Fusobacterium, Prevotella, and Streptococcus population rates (Under the median vs. Over the median) on all-cause mortality within 5 years were assessed using a Cox proportional hazards model analysis, with Fusobacterium, Prevotella, and Streptococcus population rates as the independent variables. Multivariate models were constructed adjusting for known risk factors, including age, gender, HPV status, alcohol intake, smoking (number of cigarettes per day), and the M, N, and T stages, which were also used as independent variables. Double log plots were generated to confirm the proportional hazards assumption for Fusobacterium, Prevotella, and Streptococcus population rates. The selected known risk factors did not include those with significant collinearity. SPSS version 24.0 for Windows (IBM Japan, Tokyo, Japan) was used for statistical analyses. P-values were two-tailed, and values < 0.05 indicated a significant difference.

3. Results

3.1. Microbiome profiling of top 3 microbes at the genus level

A total of 154 patients present in both the TCGA and TCMA datasets were selected, and major microbes were selected from the 221 microbes defined at the genus level in TCMA (Fig. 1A). The top 3 microbes in primary tumors were Prevotella (25.0 %), Fusobacterium (17.9 %), and Streptococcus (9.9 %) (Fig. 1B).
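A ranking of this kind, by mean relative abundance with the remaining genera pooled as "Other" (Section 2.2), could be computed along the following lines. This is a hedged sketch and not the Subio Platform procedure; the function name, the input layout (one dictionary of genus-level relative abundances per sample), and the treatment of absent genera as zero are assumptions.

```python
def summarize_top_genera(samples, top_n=3):
    """Mean relative abundance of the top_n genera, with the rest pooled.

    samples: list of dicts, each mapping genus name -> relative abundance
             in one sample (a genus missing from a sample counts as 0).
    Returns a dict ordered from the most to the least abundant genus,
    followed by an 'Other' entry holding the summed remaining means.
    """
    genera = {genus for sample in samples for genus in sample}
    means = {
        genus: sum(sample.get(genus, 0.0) for sample in samples) / len(samples)
        for genus in genera
    }
    # Sort by descending mean; break ties alphabetically for determinism.
    ranked = sorted(means, key=lambda genus: (-means[genus], genus))
    summary = {genus: means[genus] for genus in ranked[:top_n]}
    summary["Other"] = sum(means[genus] for genus in ranked[top_n:])
    return summary
```

Running the function separately on the primary tumor samples and on the solid normal tissue samples would give the two profiles that are compared in Fig. 1B.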
The other genera were Actinomyces (2.1 %), Aggregatibacter (0.7 %), Alloprevotella (1.9 %), Campylobacter (2.0 %), Capnocytophaga (4.8 %), Granulicatella (0.4 %), Haemophilus (4.0 %), Lactobacillus (1.0 %), Leptotrichia (1.7 %), Mycoplasma (0.6 %), Neisseria (2.3 %), Porphyromonas (2.3 %), Rothia (1.0 %), Treponema (4.9 %), and Veillonella (3.4 %) (Fig. 1B). In normal tissue, the most abundant genera were Prevotella, Streptococcus, and Fusobacterium, with population rates of 29.0, 12.1, and 6.9 %, respectively (Fig. 1B). We then investigated differences in the population rates of Fusobacterium, Prevotella, and Streptococcus at the genus level between solid normal tissue (n = 22) and the primary tumors (n = 154) of HNSCC patients. A difference was observed for Fusobacterium between solid normal tissue and primary tumors (Fig. 1C), with higher abundance in tumors than in solid normal tissue.

3.2. Cox regression analysis of relationships of top 3 microbes at the genus level and classical prognostic factors affecting survival in TCGA-HNSCC patients

Using the data in the supplementary material (Table S1), the top 3 microbes and classical risk factors, including sex, HPV, smoking, drinking, age, and TNM stage, were subjected as independent variables to univariate and multivariate analyses (Cox proportional hazards model) of all-cause mortality within 5 years. The population rates of Fusobacterium, Prevotella, and Streptococcus at the genus level were divided into two groups: a rate over and under the median. Double log plots confirmed that the proportional hazards assumption held for the population rates of Fusobacterium, Prevotella, and Streptococcus. In the univariate analysis, Fusobacterium Over the median (vs. Under the median) yielded HR = 0.722, 95 % CI = 0.442–1.178, p = 0.192. Prevotella Over the median (vs. Under the median) yielded HR = 1.589, 95 % CI = 0.971–2.600, p = 0.065. Streptococcus Over the median (vs.
Under the median) yielded HR = 1.037, 95 % CI = 0.637–1.689, p = 0.883 (Table 1). In addition, the multivariate analysis showed that Fusobacterium Over the median (vs. Under the median) yielded HR = 0.884, 95 % CI = 0.497–1.574, p = 0.676. Prevotella Over the median (vs. Under the median) yielded HR = 1.720, 95 % CI = 0.991–2.987, p = 0.054. Streptococcus Over the median (vs. Under the median) yielded HR = 0.988, 95 % CI = 0.573–1.706, p = 0.966. These results indicate that the abundance of these three bacteria was not associated with the prognosis of the TCGA-HNSCC patients examined.

Table 1. Univariate and multivariate analyses of all-cause mortality within 5 years of top 3 selected microbial species in TCGA-HNSCC patients.

Variable | Univariate HR | Univariate 95 % CI | Univariate P-value | Multivariate HR | Multivariate 95 % CI | Multivariate P-value
Fusobacterium_Over (vs. Under) | 0.722 | 0.442–1.178 | 0.192 | 0.884 | 0.497–1.574 | 0.676
Prevotella_Over (vs. Under) | 1.589 | 0.971–2.600 | 0.065 | 1.720 | 0.991–2.987 | 0.054
Streptococcus_Over (vs. Under) | 1.037 | 0.637–1.689 | 0.883 | 0.988 | 0.573–1.706 | 0.966
Age (per 1 year) | 1.000 | 0.979–1.021 | 0.993 | 1.009 | 0.984–1.033 | 0.492
Sex_male (vs. female) | 0.825 | 0.491–1.388 | 0.469 | 0.805 | 0.434–1.494 | 0.491
HPV status_Positive (vs. Negative) | 0.693 | 0.384–1.253 | 0.225 | 0.515 | 0.263–1.008 | 0.053
Alcohol_history_Yes (vs. No) | 1.302 | 0.736–2.304 | 0.365 | 1.350 | 0.667–2.733 | 0.404
Cigarettes per day_>0 (vs. 0) | 1.308 | 0.797–2.146 | 0.288 | 1.366 | 0.780–2.393 | 0.275
M stage_m1 (vs. m0) | 8.793 | 1.166–66.316 | 0.035* | 11.016 | 1.206–100.589 | 0.033*
N stage (Continuous variable per 1) | 1.093 | 0.966–1.236 | 0.158 | | |
N stage (Category): Lymph node metastasis no | 1.000 | ref | | 1.000 | ref |
N stage (Category): Lymph node metastasis yes | 1.388 | 0.842–2.289 | 0.198 | 1.482 | 0.853–2.576 | 0.163
T stage (Category): T1-2 | 1.000 | ref | | 1.000 | ref |
T stage (Category): ≥T3 | 1.250 | 0.748–2.092 | 0.394 | 1.276 | 0.707–2.305 | 0.419

HR: hazard ratio; 95 % CI: 95 % confidence interval; ref: reference value; Over: Over the median; Under: Under the median. * p < 0.05 (shown in bold type in the original table).

3.3. Extraction of genes differentially expressed between over and under the median groups for Fusobacterium, Prevotella, and Streptococcus

To establish whether intratumoral Fusobacterium, Prevotella, and Streptococcus affect gene expression in HNSCC cells, genes with expression levels that were higher or lower in tumors than in solid normal tissue were analyzed. We divided the population rates of Fusobacterium, Prevotella, and Streptococcus at the genus level into the following two groups: a rate over and under the median. To facilitate our functional pathway and PPI analyses, we aimed to extract approximately 50 genes for each microbe. When fold-change thresholds of >1.5, >2.0, and >2.5 were examined together with p < 0.05 in the Mann-Whitney U test, the numbers of extracted genes were 503, 117, and 43 for Fusobacterium, 55, 4, and 0 for Prevotella, and 381, 59, and 14 for Streptococcus, respectively. Therefore, we adopted the criteria of >2.5-fold for Fusobacterium, >1.5-fold for Prevotella, and >2.0-fold for Streptococcus, each with p < 0.05 in the Mann-Whitney U test, which yielded 43, 55, and 59 genes, respectively, that were up- or down-regulated in tumor cells between the over and under the median groups (Fig. 2A–C). Heat maps also showed these 43, 55, and 59 differentially expressed genes, the patterns of which differed among the gene sets extracted for Fusobacterium, Prevotella, and Streptococcus (Fig. 2D–F).

Fig. 2.
Extraction of genes related to Fusobacterium, Prevotella, and Streptococcus. (A) Forty-three genes that were differentially expressed between the over and under the median groups for Fusobacterium were extracted using the criteria of >2.5-fold and p < 0.05 in the Mann-Whitney U test. (B) Fifty-five genes that were differentially expressed between the over and under the median groups for Prevotella were extracted using the criteria of >1.5-fold and p < 0.05 in the Mann-Whitney U test. (C) Fifty-nine genes that were differentially expressed between the over and under the median groups for Streptococcus were extracted using the criteria of >2.0-fold and p < 0.05 in the Mann-Whitney U test. Heat maps and the hierarchical clustering of extracted genes related to Fusobacterium (D), Prevotella (E), and Streptococcus (F). The vital status of TCGA-HNSCC patients (alive or dead) and the rate of each microbe (over or under the median) are indicated by the symbols shown in the figure.

3.4. Functional and PPI analyses of Fusobacterium-, Prevotella-, and Streptococcus-related genes

To investigate the mechanisms associated with these microbes, functional and PPI analyses of differentially expressed genes were performed. Biological properties and potential signaling pathways were examined using GO term and KEGG pathway analyses. The following enriched terms were common to Fusobacterium and Streptococcus in the GO enrichment analysis: antimicrobial humoral immune response mediated by antimicrobial peptide, cellular response to UV-A, cornification, epidermis development, extracellular matrix disassembly, intermediate filament organization, keratinocyte differentiation, keratinization, positive regulation of antibacterial peptide production, and proteolysis. Only epidermis development was common to Fusobacterium, Prevotella, and Streptococcus (Fig. 3A–C).
The KEGG analysis revealed that Fusobacterium- and Streptococcus-related genes correlated with the IL-17 signaling pathway and Staphylococcus aureus infection, while no pathways associated with Prevotella were extracted (Fig. 3D and E). The PPI analysis showed that these genes formed dense networks; density decreased in the order of Fusobacterium, Streptococcus, and Prevotella (Fig. 4A–C). Extracted genes densely related to Fusobacterium were CNFN, CRCT1, DEFB4A, DSG1, IL36G, IL36RN, KLK5, KLK7, KLK8, KLK10, KRT1, KRT14, KRT16, KRT6B, KRT6C, KRT75, KRTDAP, LCE3D, PI3, S100A12, S100A7, S100A7A, SBSN, SPRR1A, SPRR1B, SPRR2A, SPRR2B, SPRR2D, SPRR2E, SPRR2F, SPRR2G, and TGM1. Extracted genes densely related to Streptococcus were CASP14, CRCT1, DEFB4A, DSC1, DSG1, FOXA1, KLK5, KLK7, KLK8, KRT1, KRT14, KRT15, KRT19, KRT6C, KRT75, KRTDAP, LCE3D, LCE3E, NTRK2, NTS, PI3, S100A7, S100A7A, SBSN, SPRR2B, and SPRR2G. The network formation of up-regulated genes (color-coded red) was characteristic of Fusobacterium (Fig. 4A), while that of down-regulated genes (color-coded blue) was observed in Streptococcus (Fig. 4C).

Fig. 3. Functional analyses of Fusobacterium-, Prevotella-, and Streptococcus-related genes. (A) GO terms identified in a GO enrichment analysis of 43 genes that were differentially expressed between the over and under the median groups for Fusobacterium and extracted using the criteria of >2.5-fold and p < 0.05 in the Mann-Whitney U test are shown. (B) GO terms identified in a GO enrichment analysis of 55 genes that were differentially expressed between the over and under the median groups for Prevotella and extracted using the criteria of >1.5-fold and p < 0.05 in the Mann-Whitney U test are shown.
(C) GO terms identified in a GO enrichment analysis of 59 genes that were differentially expressed between the over and under the median groups for Streptococcus and extracted using the criteria of >2.0-fold and p < 0.05 in the Mann-Whitney U test are shown. BP, biological process; CC, cellular component; MF, molecular function. (D) Molecular pathways identified in a KEGG pathway enrichment analysis of 43 extracted genes related to Fusobacterium are shown. (E) Molecular pathways identified in a KEGG pathway enrichment analysis of 59 extracted genes related to Streptococcus are shown.

Fig. 4. Protein–protein interaction analyses of Fusobacterium-, Prevotella-, and Streptococcus-related genes. (A) Proteins encoded by 43 extracted genes related to Fusobacterium were subjected to a PPI network analysis. (B) Proteins encoded by 55 extracted genes related to Prevotella were subjected to a PPI network analysis. (C) Proteins encoded by 59 extracted genes related to Streptococcus were subjected to a PPI network analysis. Genes with fold changes (over/under the median) above the upper limit (2.5, 1.5, or 2.0) or below the lower limit (0.4, 0.66, or 0.5) are coded in red and blue, respectively. Gray was used for all other cases. (For interpretation of the references to color in this figure legend, the