Abstract Oral squamous cell carcinoma is a prominent cancer worldwide, particularly in Taiwan. By integrating omics analyses in 50 matched samples, we uncover in Taiwanese patients a predominant mutation signature associated with cytidine deaminase APOBEC, which correlates with the upregulation of APOBEC3A expression in the APOBEC3 gene cluster at 22q13. APOBEC3A expression is significantly higher in tumors carrying APOBEC3B-deletion allele(s). High-level APOBEC3A expression is associated with better overall survival, especially among patients carrying APOBEC3B-deletion alleles, as examined in a second cohort (n = 188; p = 0.004). The frequency of APOBEC3B-deletion alleles is ~50% in 143 genotyped oral squamous cell carcinoma -Taiwan samples (27A3B ^−/−:89A3B ^+/−:27A3B ^+/+), compared to the 5.8% found in 314 OSCC-TCGA samples. We thus report a frequent APOBEC mutational profile, which relates to a APOBEC3B-deletion germline polymorphism in Taiwanese oral squamous cell carcinoma that impacts expression of APOBEC3A, and is shown to be of clinical prognostic relevance. Our finding might be recapitulated by genomic studies in other cancer types. __________________________________________________________________ Oral squamous cell carcinoma is a prevalent malignancy in Taiwan. Here, the authors show that OSCC in Taiwanese show a frequent deletion polymorphism in the cytidine deaminases gene cluster APOBEC3 resulting in increased expression of A3A, which is shown to be of clinical prognostic relevance. Introduction Oral cancer is a common cancer worldwide. Approximately 300,373 new cases were diagnosed in 2012, making oral cancer a growing health concern^[82]1, [83]2. Oral squamous cell carcinoma (OSCC), which is the major subtype of oral cancer, accounts for >90% of all cases of oral cancer. In Taiwan, OSCC is a prevalent malignancy that represents the fourth most common cancer affecting males (M:F = 10.8:1; 6308:582 new cases in 2012, per the Taiwan Cancer Registry^[84]3). In addition to the known risk behaviors of cigarette smoking and alcohol drinking, Taiwanese men often indulge in the additional risk behavior of betel nut chewing^[85]4, [86]5. However, human papilloma virus (HPV) infection, which is another risk factor, is found at a much lower rate in Taiwan (~10%) compared with worldwide (~24 %)^[87]6–[88]8. Despite the unique epidemiology of this disease, the contribution of the genetic background to its incidence or progression has not yet been explored in the Taiwanese population. Genome-wide analyses of the head and neck cancer data reported by The Cancer Genome Atlas (TCGA)^[89]9–[90]11 and an India-based research group^[91]12 revealed shared mutation spectrums, particularly in genes associated with cancer development (e.g., TP53, NOTCH1, FAT1, CASP8, and PIK3CA) and copy number variations (e.g., deletion of CDKN2A and amplification of FADD). These somatic mutations are likely to arise from distinct origins, and may thus generate unique signatures in combination^[92]13, [93]14. For head and neck cancer, three prevalent mutational signatures have been identified: age-associated signature 1B (~24.9%), APOBEC (Apolipoprotein B mRNA Editing enzyme, Catalytic polypeptide)-associated signature 2 (~35.2%), and smoking-associated signature 4 (~34.8%)^[94]13. The APOBEC genes encode members of a superfamily of cytidine deaminases that deaminate the cytidines of DNA and RNA to uracil, and have been implicated in diverse biological functions, including innate immunity and viral restriction^[95]15–[96]17. The human genome encodes 11 APOBEC genes, seven of which (APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H) form the APOBEC3 gene cluster at 22q13. A germline APOBEC3B (A3B) deletion polymorphism forms the fusion gene, APOBEC3A_B (A3A_B), which comprises the APOBEC3A (A3A) coding region and the APOBEC3B (A3B) 3′UTR. This polymorphism is rare in Africans and Europeans (0.9% and 6%, respectively) but common in East Asians, Amerindians, and Oceanics (36.9%, 57.7%, and 92.9%, respectively)^[97]18. Recent studies in TCGA tumors uncovered elevated expression of A3B in breast, bladder, cervix, lung (adenocarcinoma and squamous cell carcinoma), and head and neck cancers, and addressed the role of A3B in mediating genomic mutations^[98]19, [99]20. However, no significant correlation was found between the clinical outcome of breast cancer and the APOBEC3B-deletion polymorphism^[100]19. Both A3A and A3A_B were found to exhibit stronger cytidine deaminase activity than A3B in an in vitro study^[101]21, and breast cancer patients harboring A3B deletions were found to have more APOBEC-dependent mutations than patients lacking an A3B deletion^[102]22. In this study, we extend our previous targeted sequencing effort in profiling OSCC mutations^[103]23 to a comprehensive and systems-based interrogation of mutational landscapes and their pathological consequences in Taiwanese OSCC patients (OSCC-Taiwan). We carry out exomic and transcriptomic analyses of samples from a first cohort, and further evaluate our findings in a larger second cohort of patients for which we have complete clinical outcome information. These analyses allow us to compile a comprehensive mutational landscape for OSCC-Taiwan and identify novel associated mutations and alterations. In particular, we find that the APOBEC3B-deletion polymorphism is prevalent in the Taiwanese population, and that the corresponding expressional alteration is a key factor in determining the outcome of OSCC in these patients. Taken together, the results from our integrated analyses provide insights into the genetic basis of disease-associated alterations of gene expression in OSCC, and suggest novel powerful biomarkers that may prove useful for precision medicine. Results Mutational landscape for OSCC-Taiwan To uncover genomic alterations that could prove useful as molecular markers for OSCC in Taiwan, we recruited patients who were ethnically Taiwanese and had been admitted to Chang Gung Memorial Hospital at Linkuo. We performed whole-exome sequencing (WES) of matched tumor/PBMC DNAs from 50 OSCC patients (75-Mbp target region, mean depth = 244 ± 54×), and RNA sequencing of matched tumor/adjacent normal tissues from 39 of the 50 patients (mean read count = 38.7 ± 15.2 million; Fig. [104]1a, Supplementary Fig. [105]1, Supplementary Tables [106]1, and Supplementary Data [107]1–[108]2). A subset of the identified somatic mutations was validated in 49 tumor tissues using an IonAmpliSeq Comprehensive Cancer Panel (CCP; 409 cancer-related genes). We obtained an average coverage of 1165 ± 220×. One tumor sample was missed due to problems with DNA availability. A detailed workflow for our sequencing experiments and data analysis is shown in Supplementary Fig. [109]1. Fig. 1. Fig. 1 [110]Open in a new tab An integrated deep-sequencing approach identifies novel variant features underlying OSCC of a unique demographic origin. Summary data for the 50 OSCC-Taiwan cases. The four blocks correspond to the different types of data attributes. They represent, from top to bottom: a Mutation analyses in a series of 50 OSCC samples. The y-axis shows the number of mutation events and the omics data (DNA exome sequencing, RNA RNA-Seq, CCP comprehensive cancer panel), whereas the x-axis indicates the samples of the individual patients. b Heatmap representation of individual genes exhibiting somatic mutations in the 50 OSCC samples. The q-values (false discovery rates) represent the significance of each mutated gene, as determined using MutSigCV. c Heatmap representation of the copy number variations, compared with those from TCGA and India. SNVs were identified with Mutect, which applies a Bayesian classifier to detect mutations with allelic fractions of 0.1 or less ( < 10%). For the number of mutation events, the mutation types are broken down by the indicated sequence features. For the mutation b and copy number analyses c, the tables on the right show the percentages of patients with the respective somatic sequence variation or amplification/deletion, as found in the OSCC-Taiwan (TW), OSCC-TCGA (TCGA), and OSCC-India (India) cohorts. NA data not available. d The risk exposure and A3B deletion genotypes. OSCC patients with the habits of alcohol, betel nut or cigarette are individually marked. For 3 APOBEC3B genotypes, A3B ^−/−, A3B ^+/−, and A3B ^+/+ are shown with full, half and empty squares, respectively In total, our exome sequencing analyses uncovered 24,051 somatic single-nucleotide variants (SNVs) and InDels (Supplementary Data [111]3 and Supplementary Fig. [112]2a). The targeted approach of CCP enabled us to independently validate 84.91% of the SNVs identified by our WES (Supplementary Data [113]4 and Supplementary Fig. [114]2b). We cross-referenced our data with that of 172 oral cavity cancer samples (OSCC-TCGA) from the Head-Neck Squamous Cell Carcinoma (HNSC) project of TCGA (HNSC-TCGA) and 106 oral cavity cancers from India (OSCC-India), as annotated in the International Cancer Genome Consortium (ICGC). We found that our patient cohort was characteristic of OSCC with regard to frequently mutated genes, such as TP53, FAT1, NOTCH1, PIK3CA, CDKN2A, and HRAS. Interestingly, however, our analysis also identified somatic mutations in genes that turned out to be unique (CENPV) or locally prevalent (DHRS4, RASA1, and SETD8; mutated in ≥ 10% of the Taiwanese patients) (Fig. [115]1b, Supplementary Table [116]2, and Supplementary Fig. [117]3). We further used our exome sequencing data to examine copy number alterations in our samples (Fig. [118]1c and Supplementary Fig. [119]4a). Our findings largely recapitulated the profile found in OSCC-TCGA (Supplementary Fig. [120]4b), which included significantly amplified/deleted regions encompassing genes such as EGFR, FGFR1, CCND1, FADD, FAT1, and CDKN2A (Supplementary Data [121]5 and Supplementary Data [122]6). However, we also uncovered new amplification peaks at 9p24.2 and 14q11.2, and new deletion peaks at 5q35.3, 6q21, 17q12, 12p13.31, 5q31.1, and 11p15.4 (Supplementary Fig. [123]4a vs. Supplementary Fig. [124]4b). A summary of the genomic mutational landscapes of all samples, including somatic mutations, copy number variations (CNVs), mutational signatures, and the correlations of these changes with each patient’s clinical information, is presented in Fig. [125]1. Comparative analysis of the transcriptomes of tumor and normal tissues allowed us to identify 3548 genes as being differentially expressed in OSCC-Taiwan; these included 1220 upregulated and 2328 downregulated genes (Supplementary Fig. [126]5). Notably, 91.3% of these differentially expressed genes (DEGs) were similarly identified in the OSCC-TCGA data set, indicating that the overall expression profile of OSCC is highly similar across ethnic groups. When we performed pathway analysis using MetaCore, we found that the identified DEGs were enriched in pathways related to cell adhesion, cytoskeletal remodeling, and immune responses (Supplementary Data [127]7), providing hints into the pathological causes of OSCC. APOBEC mutational signatures correlate with elevated A3A Previous pan-cancer analyses in 7042 cancers of various types defined distinct mutational signatures as being genetic hallmarks of different cancer types^[128]13, [129]14. By applying a similar algorithm to data for the 20,963 somatic SNVs discovered herein, we identified three mutational signatures as being enriched in OSCC-Taiwan: the APOBEC-associated signature (signature 2/13; 40%), the age-associated signature (signature 1A/1B; 37%), and the smoking-associated signature^[130]8 (signature 5; 23%) (Figs. [131]2a and [132]2b). We then compared our data for these signatures with those from the 32 cancer projects archived in the TCGA/ICGC data portal, using the somatic SNVs identified in OSCC-TCGA as a reference. Notably, the APOBEC-associated signature was highly represented in the OSCC-Taiwan and OSCC-TCGA data sets, but to a lesser extent (17%) in the OSCC-India and HNSC-TCGA (31%) data sets (Fig. [133]2c and Supplementary Fig. [134]6). This suggests that the APOBEC-associated signature may be a predominant sequence feature in OSCC. Fig. 2. Fig. 2 [135]Open in a new tab Mutational signature analysis of OSCC-Taiwan. a Three distinct mutational signatures were identified by our OSCC whole-exome sequencing. The spectra of base changes representing APOBEC (signatures 2/13), age (signature 1), and smoking (signature 4/5) are shown. The x-axis indicates the 96 combinations of trinucleotide motifs, while the y-axis represents the relative coefficient of the detected signature. b Heatmap of cosine-similarity results for the mutation spectrums of OSCC-Taiwan, coded by color. The cosine-similarity score, which ranges from 0 to 1, represents the extent of similarity to a particular signature. Among the 27 mutational signatures, the APOBEC, age, and smoking signatures are the most significant mutational signatures detected in OSCC samples (dark red). c The three overrepresented mutation signatures described in a were compared among datasets representing OSCC from Taiwan, India, and TCGA, as well as other TCGA tumor types carrying APOBEC-associated signatures. OSCC-TCGA is a subset of the HNSC (head and neck squamous cell carcinoma) data archived in TCGA. CESC, BLCA, BRCA, ESCA, LUSC, and LUAD correspond to cervical squamous cell carcinoma, bladder urothelial carcinoma, breast invasive carcinoma, esophageal carcinoma, lung squamous cell carcinoma, and lung adenocarcinoma, respectively We next examined whether the observed overrepresentation of this mutation signature might be attributed to tumor-related alterations in APOBEC expression. As shown in Fig. [136]3a, our RNA-Seq results revealed that A3A and A3B were significantly elevated in tumors compared to adjacent normal tissues (p < 0.0001 for both genes), with A3A showing a considerably larger upregulation (the fold changes for A3A and A3B were 8.3 and 3.2, respectively). With respect to the other APOBEC genes, A3G was differentially elevated in tumors (p = 0.02), but its overall expression level was rather low. We further confirmed the significant upregulations of A3A and A3B (p < 0.0001) in a second cohort comprising 188 N/T paired OSCC samples (Supplementary Fig. [137]7 and Supplementary Table [138]3). Importantly, we also detected enrichment of A3A-specific peptides in tumor proteomes, as assessed by iTRAQ-mass spectrometry (Fig. [139]3b and Supplementary Fig. [140]8). Fig. 3. Fig. 3 [141]Open in a new tab APOBEC3 gene expression is altered in OSCC. a The mRNA expression levels of all seven APOBEC3 genes in our paired OSCC samples were determined by RNA-Seq. The y-axis shows the TPM (transcripts per million) of each gene in paired tissue samples. N, adjacent normal tissue; T, tumor tissues. The expression profiles illustrate the tumor-specific up-regulations of A3A, A3B (p < 0.0001, Mann–Whitney U-test), and A3G (p = 0.02, Mann–Whitney U-test). In tumors, A3A was expressed at a higher level than A3B (p < 0.001). b A3A-specific peptides were identified in tissue proteomes using iTRAQ-mass spectrometry. The y axis represents the protein level relative to a common reference. A3A-specific peptides were significantly higher (~3 fold) in tumor tissues compared with corresponding normal tissues (p < 0.001, Mann-Whitney U-test). Of the 38 N/T paired tissues analyzed, we detected A3A peptides in 35 normal tissue samples and 36 tumor tissue samples. c Expression levels of A3A and A3B in different TCGA cancer types. Only cancers reported to have APOBEC-mutation signatures were included in this plot; they are ordered according to the A3A expression levels in their tumors. The y-axis shows the TPM of our RNA-Seq data for the A3A and A3B genes. The HNSC samples were further grouped into OSCC and ‘others’ (HNSC-non-OSCC). UCEC, DLBC, SKCM, STAD, PAAD, THCA, KIRP, CESC, BLCA, BRCA, ESCA, LUSC, and LUAD correspond to uterine corpus endometrial carcinoma, lymphoid neoplasm diffuse large B-cell lymphoma, skin cutaneous melanoma, stomach adenocarcinoma, pancreatic adenocarcinoma, thyroid carcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, bladder urothelial carcinoma, breast invasive carcinoma, esophageal carcinoma, lung squamous cell carcinoma and lung adenocarcinoma, respectively. Box plots show the distribution of expression of indicated APOBEC genes. Boxes extend from the third (Q3) to the first (Q1) quartile, with the line at the median; whiskers extend to 2.5 and 97.5 percentiles To corroborate that APOBEC upregulation is linked with the above-described unique profile of genomic mutations, we analyzed the expression levels of A3A and A3B in cancer types previously reported to be enriched for mutation signatures 2 and 13^[142]13. We found that the expressions of A3A and A3B were generally elevated in these tumors, as previously reported^[143]20, [144]24, and that OSCC exhibited the highest expression level of A3A among them (Fig. [145]3c). Of note, the expression levels of A3A and A3B were both significantly increased (p < 0.001) in the OSCC-TCGA dataset, with A3B showing the greater degree of elevation (Fig. [146]3c). Given that A3A and A3B are reportedly interferon (IFN)-inducible^[147]25–[148]29, this upregulation may reflect the activation of IFN signaling. Indeed, our transcriptome-sequencing data are consistent with this hypothesis (Supplementary Fig. [149]9 and Supplementary Data [150]7). Furthermore, we identified several miRNAs that are downregulated in OSCC (from the TCGA data) and were predicted by TargetScan to target the A3B 3′UTR^[151]30. We examined the regulatory role of one of them, miR-409, and found that it reduced the activity of a luciferase reporter gene carrying the A3B 3′UTR (Supplementary Fig. [152]10). The accumulation of APOBEC-associated mutations requires inactivation of the DNA repair system (i.e., TP53 mutation), as reported in breast cancer^[153]31. Here, we discovered that the amounts of APOBEC-signature mutations were significantly correlated with the transcript abundance of A3A, but not A3B, in 27 OSCC patients with mutation of TP53 (p = 0.038 and 0.239, respectively; Supplementary Fig. [154]11). As the hypermutation signature of A3A may be distinguished from that of A3B (YTCA vs. RTCA)^[155]32, we also examined their relative incidences in our samples. We found that the YTCA:RTCA and YTCW:RTCW ratios were both about 7:3, and that the value of YTCA/YTCW was significantly higher than that of RTCA/RTCW (Wilcoxon rank sum test, p = 0.0009 and 1.704e-07, respectively; Supplementary Table [156]4), pinpointing A3A as the major mutator contributing to the APOBEC-associated signature in OSCC-Taiwan. Four out of the 50 cases were found to be HPV DNA-positive. However, three of them carried TP53 mutations and all four were E6 transcript-negative (Supplementary Table [157]5)^[158]33–[159]35, suggesting that the mutation signatures were unrelated to HPV infection. Viewed together, the results from our integrated sequencing and expression analyses are in line with a model in which the up-regulation of A3A activity in OSCC-Taiwan contributes to the presence of a dominant mutation signature. Genotyping and expression profiling of the APOBEC locus A previous report described a germline deletion polymorphism at the APOBEC3 locus that removes the coding region of A3B, leading to expression of a variant transcript in which the A3A coding sequence is fused to the 3′UTR of A3B ^[160]21 (termed A3A_B) (Fig. [161]4a). In our 50 OSCC matched samples, we detected germline deletion of the A3B coding sequence in 33 individuals: eight A3B ^−/− individuals (homozygous for the deletion allele) and 25 A3B ^+/− individuals (heterozygous for the deletion allele) (Supplementary Fig. [162]12a). We confirmed our exome sequencing results by PCR using genotype-specific primer sets that distinguished among the A3B ^−/− (a 757-bp PCR product), A3B ^+/− (449-bp and 757-bp), and A3B ^+/+ (449-bp) genotypes (Fig. [163]4a). Our RNA-Seq data further substantiated the presence of a variant transcript corresponding to this genomic polymorphism (Supplementary Data [164]8), and RT-PCR analysis with fusion-sensitive primers independently confirmed the expression of the deletion variant in A3B ^−/− and A3B ^+/− samples (Supplementary Fig. [165]12b). Fig. 4. [166]Fig. 4 [167]Open in a new tab The deletion polymorphism of the A3A-A3B genomic locus and upregulation of A3A and A3B in OSCC-Taiwan. a Schematic depiction of the gene structures and genomic organizations of the deletion polymorphism (top) and non-deleted (bottom) versions of the APOBEC3A-APOBEC3B genomic locus. The deletion variant (APOBEC3A_B genotype) arises from a 29.5-kb genomic deletion spanning from the 3’UTR of A3A to the eighth exon of A3B. PCR-based genotyping analysis was used to distinguish non-deletion and deletion alleles according to the size of the amplified product (449 and 757 bp, respectively). A3B ^+/+, A3B ^+/−, and A3B ^−/− represent non-carrier individuals and those heterozygous and homozygous for the deletion allele, respectively. b Genotype-biased expressional alterations of A3A and A3B in OSCC. Based on RNA-Seq-determined TPM values (y-axis), the mRNA expression levels of A3A (left) and A3B (right) were determined in the initial cohort of 39 paired samples (N, adjacent normal tissues; T, tumor). Patients are grouped according to their APOBEC3B-deletion genotypes, with the number of patients in each group (n) indicated at the top. c Genotype-specific relative expression of A3A in tumor (T) vs. normal tissue (N) samples, as determined by RT-PCR. The y-axis represents the expression level of A3A (or A3B) relative to that of TBP (TATA binding protein gene). Individuals of the A3B ^−/− genotype exhibited a greater tumor-specific upregulation of A3A compared with those of the A3B ^+/− genotype (p = 0.0354) (*p < 0.05; **p < 0.001; ***p < 0.0001, Wilcoxon signed-rank test). Box plots show the distribution of expression of indicated APOBEC genes. Boxes extend from the third (Q3) to the first (Q1) quartile, with the line at the median; whiskers extend to 2.5 and 97.5 percentiles Next, we used our RNA-Seq data to analyze the correlation between the various genotypes and the expression levels of A3A and A3B. As shown in Fig. [168]4b and Supplementary Fig. [169]13, A3A expression was significantly elevated in tumor tissues compared to adjacent normal tissues from all three genotypes; moreover, the level was about 2-fold higher in carriers of the A3B ^−/− genotype compared to those harboring the A3B ^+/− and A3B ^+/+ genotypes. The expression of A3B was also significantly elevated in tumor tissues from patients of all three genotypes, but its expression was about 5- to 20-fold lower than that of A3A. Next, we examined the prevalence and expressional consequence of this deletion polymorphism in the second cohort (143 of 188 individuals were genotyped: A3B ^−/−, n = 27; A3B ^+/−, n = 89; and A3B ^+/+, n = 27). Among these individuals, A3A expression was significantly elevated in all tumor tissues, with higher levels seen in A3B ^−/− individuals vs. those with the A3B ^+/− or A3B ^+/+ genotypes (Fig. [170]4c, left). In contrast, the expression levels of A3B were higher in the tumor tissues compared to normal tissues of A3B ^+/− and A3B ^+/+ genotype individuals, whereas it was not detected in the tumors of A3B ^−/− genotype individuals, as expected (Fig. [171]4c, right). We also analyzed data from 313 TCGA oral cancer samples predominantly obtained from the US; the cases included 278 A3B ^+/+ genotype individuals, 34 A3B ^+/− individuals, and 1 A3B ^−/− individual^[172]22. In this dataset, we failed to detect any significant difference in A3A expression between A3B ^+/− and A3B ^+/+ individuals, while the level of A3B was modestly higher in A3B ^+/+ than A3B ^+/− individuals (Mann–Whitney test, p = 0.756 and 0.014, respectively; Supplementary Fig. [173]14). Viewed together, these results strongly indicate that the differential expression of A3A and A3B in OSCC-Taiwan could be attributed in part to the genetic background of these patients. A3A as a key clinicopathological determinant in OSCC Expressional data obtained from qRT-PCR were used to stratify the patients of our second cohort (n = 188; Supplementary Table [174]3) into high vs. low expression groups, using the median as the cutoff value. We then tested whether A3A expression was associated with overall survival (OS). Kaplan–Meier survival analysis revealed that the 10-year OS rates were 85.1% and 65.9% for the high and low A3A expression subgroups, respectively, and were significantly different based on the log-rank test (p = 0.004; Fig. [175]5a, left). We did not find any significant clinical correlation for A3A expression in the OSCC-TCGA dataset (p = 0.239; Fig. [176]5a, right). The 10-year disease-specific-survival (DSS) rates were also significantly different between the high and low expression groups in the second cohort (85.1% and 65.9%, respectively; p = 0.004 by log-rank test). A multivariate analysis was carried out using age, sex, overall stage, perineural invasion, tumor differentiation, bone invasion, and A3A expression. The results demonstrated that higher overall stage and low A3A expression were associated with poorer prognosis for both OS and DSS, indicating that these represent independent factors for prognostic prediction following OSCC treatment (Table [177]1). In contrast, the expression of A3B did not correlate with either of the 10-year survival rates (Fig. [178]5b). Conversely, no significant correlation between OS and the expression of A3A or A3B was found in the OSCC-TCGA dataset (Figs. [179]5a, right, and [180]5b, right). Fig. 5. [181]Fig. 5 [182]Open in a new tab Kaplan–Meier plot for overall survival (OS) by A3A and A3B expression and genotype in OSCC-Taiwan and OSCC-TCGA. a Kaplan–Meier plot showing that the 10-year OS rates for patient subgroups stratified by high vs. low A3A expression were 85.1% and 65.9%, respectively (p = 0.004) in the OSCC-Taiwan data set, whereas no significant difference was found in the OSCC-TCGA data set (p = 0.239). b No significant difference in OS was found for subgroups stratified according to A3B expression (right) in the OSCC-Taiwan and OSCC-TCGA datasets. c Kaplan–Meier plot for OS in subgroups stratified by A3A expression among the 143 patients in the OSCC-Taiwan data set. High A3A in patients with the A3B ^+/− (n = 89) and A3B ^−/− (n = 27) genotypes, but not the A3B ^+/+ (n = 27) genotype, was significantly correlated with better OS. d Kaplan–Meier plot for OS in subgroups stratified by A3A expression among the 312 patients in the OSCC-TCGA data set. No significant correlation was found in patients with the A3B ^+/+ (n = 278) or A3B ^+/− (n = 34) genotypes. The y-axis shows the probability of OS according to high and low A3A expression. The survival rate was estimated by Kaplan–Meier plotting and compared by log-rank test; all p-values are two-sided, with the significance level set at p < 0.05 Table 1. Multivariate analysis of overall survival (OS) or disease specific survival (DSS) in the second cohort (n = 188) after treatment Characteristics Hazards ratio (OS) (95% confidence interval) p-value (OS) Hazards ratio (DSS) (95% confidence interval) p-value (DSS) Age (median: 50.6)  < 50.6y 1.000 (Reference) 0.321 1.000 (Reference) 0.384  > 50.6y 1.351 (0.745–2.451) 1.342 (0.692–2.602) Gender Male 1.000 (Reference) 0.459 1.000 (Reference) 0.206 Female 0.685 (0.252–1.864) 0.517 (0.186–1.440) Overall pathological stage I–II 1.000 (Reference) 0.022^a 1.000 (Reference) 0.079 III–IV 3.128 (1.172–8.344) 2.460 (0.900–6.723) Perineural invasion No 1.000 (Reference) 0.058 1.000 (Reference) 0.018^a Yes 1.823 (0.979–3.395) 2.329 (1.150–4.716) Cell differentiation ^b W-D + M-D P-D 1.000 (Reference) 0.935 1.000 (Reference) 0.750 1.035 (0.456–2.348) 0.857 (0.332–2.215) Bone invasion No 1.000 (Reference) 0.901 1.000 (Reference) 0.634 Yes 1.042 (0.543–1.999) 0.831 (0.389–1.779) APOBEC3A expression level Low 1.000 (Reference) 0.035^a 1.000 (Reference) 0.028^a High 0.499 (0.261–0.954) 0.444 (0.215–0.917) [183]Open in a new tab Multivariate analyses also adjusted with patients’ age and sex ^astatistically significant ^bW-D: well-differentiated, M-D: moderately-differentiated, and P-D: poorly-differentiated, squamous cell carcinoma Given the putative clinical relevance of the A3A alterations identified in our samples, we evaluated the relationships between A3A expression and various clinicopathological characteristics of the OSCC patients. We found that lower A3A expression was significantly associated with advanced overall stage, positive perineural invasion, and bone invasion (p = 0.006, 0.027, and 0.013, respectively; Table [184]2). However, we failed to establish any association between A3A expression in OSCC tumors and patient age, sex, pT status, pN status, extracapsular spread, differentiation, second primary occurrence, or tumor depth. Of the 188 patients with clinical outcome data, we were able to genotype 143 (Fig. [185]4c). We used these data to evaluate the association of A3A expression and survival among patients grouped by genotype. As shown in Fig. [186]5c and Supplementary Fig. [187]15, high A3A expression in the A3B ^+/− and A3B ^−/− genotypes was associated with better OS (p = 0.018 and 0.009, respectively; Fig. [188]5c), DSS (p = 0.010 and 0.009, respectively; Supplementary Fig. [189]15b and Supplementary Fig. [190]15c), and disease-free survival (DFS; p = 0.016 and 0.001, respectively; Supplementary Fig. [191]15b and Supplementary Fig. [192]15c). In contrast, the A3A level was not associated with survival among individuals of the A3B ^+/+ genotype (p = 0.891, 0.200, and 0.998 for OS, DSS, and DFS, respectively; Fig. [193]5c and Supplementary Fig. [194]15a). Our results indicated that high A3A predicts better OS, DSS, and DFS, whereas A3B expression was not significantly correlated with treatment outcomes in our population. Of note, we did not find such a significant clinical correlation for A3A expression levels in A3B ^+/+ or A3B ^+/− individuals from the OSCC-TCGA dataset (Fig. [195]5d). Considering that the OSCC-TCGA data exhibited an overall less pronounced A3A upregulation together with much less frequent A3B deletion (5.8%), our results further suggest that the high A3A expression in our overall dataset arose mainly from the A3B ^+/− and A3B ^−/− genotypes, which were prominent in OSCC-Taiwan but not OSCC-TCGA. Table 2. The clinicopathological characteristics related to the expression of APOBEC3A in the second cohort (n = 188) Patient categories Case No. APOBEC3A expression level p-value Low (%) High (%) Gender Male 172 87 (46.2) 85 (45.2) 0.794 Female 16 7 (3.72) 9 (4.79) Age ^a 51.2 ± 10.8 (80, 29) 52.4 ± 10.0 (79, 30) 0.327 pT status 1–2 87 39 (20.7) 49 (26.1) 0.109 3–4 100 56 (29.8) 44 (23.4) pN status (−) 97 42 (22.3) 55 (29.2) 0.079 ( + ) 91 52 (27.6) 39 (20.7) Overall pathological stage I–II 56 19 (10.1) 37 (19.6) 0.006^b III–IV 132 75 (39.8) 57 (30.3) ECS ^c (–) 142 68 (36.1) 74 (39.3) 0.396 (+) 46 26 (13.8) 20 (10.6) Cell differentiation ^d W-D + M-D 163 78 (41.4) 85 (45.2) 0.197 P-D 25 16 (8.5) 9 (4.7) Perineural invasion No 103 44 (23.4) 59 (31.3) 0.027^b Yes 85 50 (26.6) 35 (18.6) Second primary occurrence No 174 87 (46.2) 87 (46.2) 1.000 Yes 14 7 (3.72) 7 (3.72) Bone invasion No 137 61 (32.4) 76 (40.4) 0.013^b Yes 51 33 (17.5) 18 (9.57) Tumor depth ^a 15.1 ± 10.0 (48, 1) 14.9 ± 11.1 (55, 2) 0.625 [196]Open in a new tab ^aMean ± SD, median (maximum, minimum) ^bStatistically significant ^cECS: extracapsular spread ^dW-D: well-differentiated, M-D: moderately-differentiated, and P-D: poorly-differentiated, squamous cell carcinoma Discussion To our knowledge, this is the first study to explore the physiological manifestation of constitutional variants among APOBEC genes in OSCC. Although our sample size was limited, our integration of exomic and RNA sequencing data from matched samples, combined with the validation of our findings in a larger cohort of clinical samples, provides a complementary breadth and depth of molecular information. This strategy enabled us to uncover several key novel attributes of OSCC, as follows: (1) This is the first report of APOBEC3B-deletion polymorphisms in a Taiwanese population, comprised mainly of Han-Chinese. Our analysis reveals that allele deletion is frequent in this population (~50% in 143 genotyped patients), emphasizing that ethnicity is key to the genetic basis for this disease. (2) The constitutional deletion of both alleles of the A3B locus (A3B ^−/− genotype) is associated with the highest level of A3A expression, which may arise from altered 3′UTR regulation and enhanced IFN signaling in OSCC. (3) High-level A3A expression is correlated with better outcomes, particularly for patients carrying APOBEC3B-deletion alleles. (4) High-level A3A expression may serve as an independent prognostic biomarker for OSCC. Our findings indicate that the APOBEC3B-deletion genotype is strongly associated with A3A expression and clinical outcome. Given that APOBEC-induced mutations are now known to be prevalent in many cancers, it will be worthwhile to test this association in a larger cohort or in other cancer types in the future. Our RNA-Seq data indicate that the transcript level of A3A is generally at least 10-fold higher than that of A3B in OSCC tumors, particularly among A3B ^−/− genotype individuals. Notably, and in contrast to this tumor-associated up-regulation, the A3A expression levels in normal tissues of all three genotypes were low. This difference is likely to reflect two major tumor-intrinsic alterations. First, APOBEC exhibits IFN inducibility^[197]25–[198]29, which is consistent with its role in antiviral innate immunity^[199]36. Enhanced IFN signaling may thus trigger the upregulation of A3A. Indeed, heightened expression of genes in the IFN signaling pathway was evident in our RNA-Seq data. Moreover, a previous proteomic analysis of OSCC also showed that the IFN pathway is enriched at this level in tumor tissues^[200]37. Second, as A3A was highly expressed in the tumors of individuals carrying the A3B ^−/− and A3B ^+/− genotypes, these polymorphisms may be linked in cis to the elevated expression of A3A. However, it is not yet known how this genomic disruption might alter A3A expression. A recent report found that the transcript for the fusion gene, A3A_B, which links the coding region of A3A to the 3′UTR of A3B, was more stable than the A3A transcript^[201]21, suggesting that miRNAs may be involved. To address this possibility, we used sequence-prediction tools^[202]30 to identify several candidate miRNAs with A3B 3′UTR-targeting potential. Interestingly, most of these miRNAs were downregulated in the oral cavity cancer subset of the TCGA dataset. Among the identified candidates, we validated the suppressive effect of one, miR-409, using a 3′UTR reporter assay. We found that overexpression of miR-409 reduced reporter activity driven by the A3B 3′UTR by 20% but did not affect that driven by the A3A 3′UTR (Supplementary Fig. [203]10). Collectively, our observations are consistent with the notion that, among patients carrying the APOBEC3B-deletion polymorphism, the 3′UTR of the variant A3A_B transcript may confer a new layer of regulation in OSCC that depends on both miRNAs and the disease context. Many quantification methods utilize expectation-maximization algorithms to estimate the maximum likelihood expression levels for multiple mapped reads. Special care should be taken, however, when using RNA-Seq data to quantify the expression levels of A3A, A3A_B, and A3B in samples carrying the APOBEC3B-deletion genotype. As A3A_B and A3B are identical in their 3′UTR sequences, the reads for A3A_B could be mis-mapped to A3B and wind up being assigned to A3B in A3B ^−/− samples. As A3A_B is not presently annotated in the human genome references, it may not have been considered in previous analyses