Abstract Background Oral squamous cell carcinoma (OSCC) is one of the most common malignant tumors worldwide. Patients with poorly differentiated OSCC often exhibit a poor prognosis. AUNIP (Aurora Kinase A and Ninein Interacting Protein), also known as AIBp, plays a key role in cell cycle and DNA damage repair. However, the function of AUNIP in OSCC remains elusive. Methods The differentially expressed genes (DEGs) were obtained using R language. Receiver operating characteristic curve analysis was performed to identify diagnostic markers for OSCC. The effectiveness of AUNIP in diagnosing OSCC was evaluated by machine learning. AUNIP expression was analyzed in publicly available databases and clinical specimens. Bioinformatics analysis and in vitro experiments were conducted to explore biological functions and prognostic value of AUNIP in OSCC. Findings The gene integration analysis revealed 90 upregulated DEGs. One candidate biomarker, AUNIP, for the diagnosis of OSCC was detected, and its expression gradually increased along with malignant differentiation of OSCC. Bioinformatics analysis demonstrated that AUNIP could be associated with tumor microenvironment, human papillomavirus infection, and cell cycle in OSCC. The suppression of AUNIP inhibited OSCC cell proliferation and resulted in G0/G1 phase arrest in OSCC cells. The survival analysis showed that AUNIP overexpression predicted poor prognosis of OSCC patients. Interpretation: AUNIP could serve as a candidate diagnostic and prognostic biomarker for OSCC and suppression of AUNIP may be a potential approach to preventing and treating OSCC. Fund Taishan Scholars Project in Shandong Province (ts201511106) and the National Natural Science Foundation of China (Nos. 61603218). Keywords: Oral squamous cell carcinoma, Receiver operating characteristic curve, Biomarker, Weighted gene co-expression network analysis, Survival analysis, AUNIP __________________________________________________________________ Research in context. Evidence before this study Patients with poorly differentiated oral squamous cell carcinoma (OSCC) often have a poor prognosis. Tumor biomarkers can highlight biological differences among cancers and help predict treatment outcomes. However, tumor biomarkers with 100% sensitivity and specificity have not been discovered so far. Although a variety of potential biomarkers of OSCC have been discovered using the bioinformatics method, their verifications failed in large samples. AUNIP, also known as AIBp, is demonstrated to regulate mitotic entry and mitotic spindle assembly during cell cycle. However, the role of AUNIP in solid tumors such as OSCC remains elusive. Added value of this study Our research creatively integrated microarray data of OSCC with relative large samples from publicly available databases and clinical specimens. Bioinformatics analysis and in vitro experiments were carried out to speculate the potential biological functions and clinical significance of AUNIP in OSCC. We found that AUNIP was overexpressed in OSCC tissues and increased AUNIP expression predicted poor clinical outcomes in OSCC patients. The decreased expression of AUNIP caused an abnormity of OSCC cell cycle transition. In addition, AUNIP was associated with tumor microenvironment and HPV infection in OSCC. Implications of all the available evidence The findings of our research suggest that AUNIP could serve as a potential biomarker and the inhibition of AUNIP may be a new strategy for the treatment of OSCC. Integrating AUNIP with well-known biomarkers to detect OSCC and predict treatment outcomes could promote clinical practice. Alt-text: Unlabelled Box 1. Introduction Cancer morbidity and mortality have experienced a gradual increase all over the world. There were approximate 18·1 million cancer cases and 9·6 million cancer deaths worldwide in 2018, with oral cancer accounting for 2% and 1·9% respectively [[41]1,[42]2]. Oral squamous cell carcinoma (OSCC), exhibiting high morbidity and malignancy, is the most common type of oral cancer [[43]3,[44]4]. It normally involves the front two thirds of the tongue, gingiva (gums), floor of mouth, lips, alveolar ridge, buccal mucosa, hard palate, and retromolar trigone ([45]https://www.cancer.gov/). The excision of primary tumor with or without dissection of lymph nodes is the routine treatment of OSCC [[46]4,[47]5]. Most patients with OSCC are in advanced clinical stage at the time of diagnosis [[48]6]. Although the clinical multidisciplinary collaboration and sequential therapy can improve the prognosis [[49]7], the 5-year survival rate of OSCC patients remains about 50% and 25%–50% of them will suffer from local relapse and distant metastasis after treatment [[50]8,[51]9]. It is thus necessary to search for effective diagnostic and prognostic biomarkers for OSCC. A cancer biomarker is defined as a substance or a process that is indicative of the presence of cancer in the body ([52]https://en.wikipedia.org). There have been many papers reporting the discovery of OSCC biomarkers, but only few biomarkers have been validated and successfully applied in routine clinical practice [[53]10]. Moreover, most biomarkers possess limitations for the early detection of OSCC, and the prognostic value of them is plagued by inaccuracies [[54]11]. By taking advantage of bioinformatics analysis, it is more likely to overcome deficiencies in the known biomarkers. Bioinformatics analysis allows researchers to delve into integrated data of numerous clinical samples from different independent studies, which provides data infrastructure for discovering promising biomarkers and updating our understanding about cancers [[55]6,[56]12,[57]13]. The aim of this study was to explore potential diagnostic and prognostic biomarkers and their biological functions in OSCC by bioinformatics analysis. We finally determined Aurora Kinase A and Ninein Interacting Protein (AUNIP) as our gene of interest and confirmed its overexpression in OSCC tissues. AUNIP, also known as AIBp, was demonstrated to interact with Aurora-A at the C-terminal and with hNinein at the N-terminal to promote centrosomal structure maintenance and spindle formation during cell cycle [[58]14]. By controlling activation of both Aurora-A and Plk1, AUNIP could also regulate mitotic entry and mitotic spindle assembly [[59]15]. Apart from these, AUNIP and Aurora-A were co-overexpressed in various brain tumors [[60]14] and researches suggested that AUNIP might have an oncogenic role in human cancers and regulate cell cycle progression. Recently, AUNIP was discovered to be connected with CtIP and be required for DNA double-strand breaks repair [[61]16]. Our present study shed light on the critical role of AUNIP in regulating cell cycle in OSCC cells and promoting OSCC cell proliferation and proved its diagnostic and prognostic value for OSCC patients. Therefore, AUNIP inhibition could serve as a potential strategy to treat OSCC. 2. Materials and methods 2.1. Ethics statement The study protocol was approved by Taizhou Hospital Ethics Committee (Zhejiang, China) and Shandong University Research Ethics Committee (Shandong, China). All experiments were performed complying with the relevant regulations, and written informed consent was obtained from patients. 2.2. Clinical specimens Tissue microarray chips containing 89 samples of OSCC and 16 samples of normal control (nine samples of adjacent normal oral tissue) were purchased from Shanghai Qutdo Biotech Company (Shanghai, China). These samples were collected from Taizhou Hospital (Zhejiang, China). Clinicopathological characteristics of these patients were shown in (Table S1). Four pairs of OSCC and matched adjacent normal oral tissues, obtained from Stomatological Hospital of Shandong University (Shandong, China) and Qilu Hospital of Shandong University (Shandong, China), were frozen in liquid nitrogen until further analysis. Clinicopathological characteristics of these four patients were shown in (Table S2). 2.3. Analysis of upregulated differentially expressed genes (DEGs) In the Gene Expression Omnibus (GEO) database ([62]http://www.ncbi.nlm.nih.gov/geo/), we selected datasets of [63]GSE30784 [[64]17], [65]GSE3524 [[66]18], [67]GSE78060 [[68]19], [69]GSE41613 [[70]20], and downloaded the original files (.CEL files) and platform files. Background correction, quantile normalization, pro summarization, log[2] conversion, and missing values supplement for the matrix data of each GEO dataset were performed using Robust Multi-array Average algorithm with the “affy” package and the “impute” package in R/Bioconductor software (version 3·5·3). Upregulated DEGs of [71]GSE30784, [72]GSE3524, and [73]GSE78060 displaying an overlap region in Venn diagram, were screened and constructed matrices by means of the “limma” package in R/Bioconductor when log[2]FoldChange (FC) > 1 and adjusted P-value <0·01. 2.4. Receiver operating characteristic (ROC) curve and logistic regression analysis By screening out the relevant documents (.count files) and clinical information related to OSCC in The Cancer Genome Atlas (TCGA) database ([74]https://www.cancer.gov) and FireBrowse database ([75]http://www.firebrowse.org) [[76]21], we gained RNA-sequence data and clinicopathological characteristics over two groups of samples, the OSCC tumor category and the adjacent normal one. The “edgeR” package in R/Bioconductor software was used to normalize and process data before log[2] conversion. It was reported that infiltrating stromal and immune cells had important roles in tumor development. ESTIMATE score is the estimation of stromal and immune cells in malignant tumor tissues using expression data of TCGA samples. We calculated ESTIMATE, stromal, and immune scores in OSCC by adopting the ESTIMATE algorithm from ([77]https://bioinformatics.mdanderson.org/estimate/) [[78]22]. A total of 306 OSCC patients with detailed survival information and their clinical samples of oral cavity which included the front two thirds of the tongue, gingiva, floor of mouth, lips, alveolar ridge, buccal mucosa, hard palate, and retromolar trigone were involved in subsequent analysis. Then ROC curve analysis was performed to evaluate the sensitivity (true positive rate) and specificity (true negative rate) of the upregulated DEGs for OSCC diagnosis and we investigated how large the area under the curve (AUC) was by using the statistical package IBM SPSS Statistics software (SPSS, version 23). To further assess the efficacy of the gene in diagnosing OSCC, we performed five-fold cross-validation using logistic regression model in TCGA database [[79]23]. The randomly selected 60 tumor samples and 29 control ones were formed a new dataset, which avoided the size imbalance of two groups. The logistic regression model was built by using “scikit-learn” package in Python software (version 3.6, [80]https://www.python.org) [[81]24]. The confusion matrix of two classifications (the positive and negative) was displayed in (Table S3). Precision, recall, accuracy and F1-score were introduced to evaluate the performance of this classification model. The precision of a classifier was the number of true positives divided by the number that we predicted as positive. High precision meant high accuracy of predicting OSCC by AUNIP. Recall was defined as the number of true positives divided by the number of actual positives. Accuracy referred to the proportion of correctly predicted results among all samples. Because precision and recall affected each other and the values of them could not be ideally simultaneously large, we considered computing the F1-score as a comprehensive evaluation index. F1-score was the harmonic average of precision and recall. 2.5. Immunohistochemistry (IHC) staining The tissue microarray chips through deparaffinization and dehydration were incubated with monoclonal rabbit anti-human AUNIP (dilution 1:500, Bioss, bs-15019R) overnight at 4 °C after epitope retrieval, H[2]O[2] treatment and non-specific antigens blocking. Chips were next incubated with secondary antibody, followed by signal detection with DAB staining kit (Vector Laboratories, USA), and Histochemistry Score (H-Score = ∑ (PI × I) = (percentage of cells of weak intensity × 1) + (percentage of cells of moderate intensity × 2) + percentage of cells of strong intensity × 3) [[82]25] was obtained with Quant Center Analysis tool. 2.6. Construction of weighted gene co-expression network analysis (WGCNA) [83]GSE41613 dataset including 97 OSCC patients with detailed overall survival information was appropriate for the construction of WGCNA. We could link module eigengenes with survival data of OSCC patients. The data matrix of gene expression in [84]GSE41613 was constructed and the top 25% of genes in tumor samples with the largest variance were selected as input data set for the subsequent WGCNA using “WGCNA” package in R/Bioconductor software. Outlier samples were detected and excluded with sample hierarchically clustering method, before selecting an appropriate soft threshold power in order to achieve standard scale-free networks. In the next stage, through the construction of adjacency and topological overlap matrix (TOM) and the calculation of corresponding dissimilarity (1-TOM), gene dendrogram and module identification were accomplished with dynamic tree cut and the minimum module size was 30. Then clustering of module eigengenes was implemented to merge highly similar modules with the dissimilarity of <0·25 and we calculated the correlation between module eigengene and clinical phenotypes of OSCC. The module including AUNIP was chosen for further analysis and gene significance (GS) and module membership (MM) were obtained to identify modules related to clinical traits of OSCC. 2.7. Biological function and pathway enrichment analysis In gene networks conforming to scale-free distribution, genes with similar expression patterns could be co-regulated, functionally related or pathway shared. Through calculating the corresponding topological overlap, genes in the above selected module positively associated with AUNIP were found out (displayed by Cytoscape 3·6·1) and they were next executed gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis via the “clusterprofiler” package in R/Bioconductor software to acquire the enriched biological process and KEGG pathway for next analysis. In addition, samples from TCGA were divided into two groups based on the expression of AUNIP (median value) and gene set enrichment analysis (GSEA) software ([85]http://software.broadinstitute.org/gsea/index.jsp) was applied in the two groups to verify the results of GO and KEGG analysis. The cut-off criteria for GSEA were nominal P-value <0·05 and false discovery rate (FDR) < 0·25. 2.8. Cell culture and transient transfection Human OSCC cell lines SCC-15 and SCC-9 were obtained from ATCC (Beijing Beina Chuanglian Biotechnology Institute) and they were cultured in F12 and DMEM respectively containing 10% fetal bovine serum (Gibco, Carlsbad, CA, USA). Both cell lines were maintained in a humidified incubator with 37 °C, 5% CO[2]. Lipofectamine 2000 (Invitrogen, Carlsbad, CA, USA) was used to transfect Negative Control (NC) and AUNIP siRNAs (Sigma-Aldrich, USA) into OSCC cells as the manufacturer's protocol suggested. Target sequences for siRNAs were UUCUCCGAACGUGUCACGUdTdT ACGUGACACGUUCGGAGAAdTdT (NC si), CCAUUUGAUCCCAGGCUUAdTdT UAAGCCUGGGAUCAAAUGGdTdT (AUNIP si1), and GAAACAAAGCCGUAGUCCAdTdT UGGACUACGGCUUUGUUUCdTdT (AUNIP si2). 2.9. RNA extraction, RT-PCR and qRT-PCR Total RNA extracted from SCC-15 and SCC-9 with Trizol Reagent (Invitrogen, Carlsbad, CA, USA) were reverse transcribed with RT reagent Kit gDNA Eraser (TaKaRa). Next, SYBR-Green (TaKaRa) and qRT-PCR analysis were used for detecting cDNA expression levels and β-ACTIN was used as internal reference. Primers were shown as follows: hβ-ACTIN, Forward (F): 5′-AGTTGCGTTACACCCTTTCTTG-3′, Reverse (R): 5′-CACCTTCACCGTTCCAGTTTT-3′; hAUNIP, Forward (F): 5′- GCGGAAAGTGCAGACACATTT -3′, Reverse (R): 5′- TCTCTGGTGAATGCCTGTAGAT -3′. 2.10. Western blotting Total proteins were extracted with lysis buffer including protease inhibitors, and the concentration was measured. PVDF membranes (Millipore, Bedford, MA) containing proteins separated on SDS-PAGE gels were blocked with 5% nonfat dry milk and incubated with primary antibodies for overnight at 4 °C. Immunoblots were detected using ECL detection reagent (Millipore) according to manufacturer's protocol. Antibodies used were: β-ACTIN (A1978, Sigma-Aldrich, RRID: [86]AB_476692), GAPDH (Santa Cruz Biotechnology, sc-47724), AUNIP (Bioss, bs-15019R). 2.11. Colony formation and cell cycle analysis Cells (500–800 cells/well) were seeded in 6-well plate and supported for 7–14 days in a humidified incubator with 37 °C, 5% CO[2] until colonies of cells appeared. The colonies were fixed with methanol and stained with Giemsa in order to be counted. The colonies containing >50 cells were counted for analysis. For cell cycle analyses, cells were fixed with 70% ethanol at 4 °C overnight and stained with RNase A containing Propidium Iodide (Sigma-Aldrich, USA). Cell cycle distribution was determined using flow cytometry with ModFit software. Blank control, si-NC, and si-AUNIP groups were established. 2.12. Statistical analysis The Kaplan-Meier analysis for overall survival was proceeded based on the gene's expression level whose cut-off level was set at the median value with the aid of GraphPad Prism 8 software and the Log-Rank was utilized to test. SPSS was employed to perform univariate and multivariable analysis so as to establish a Cox proportional hazard regression model. Patients were divided into groups in the light of the target gene's expression whose correlation with clinicopathological characteristics in TCGA was studied by SPSS with the application of two-tailed χ2 test. In addition, Pearson correlation analysis was adopted to determine the linear relationship between two groups. One-way ANOVA test and two-tailed Student's t-tests were used for expression data comparisons by using GraphPad Prism 8 software. Each experiment was repeated three times or more and all data were presented as mean ± standard deviation (SD). Statistical significance was described as follows: n.s, not significant; *P ≤ 0·05; **P ≤ 0·01; ***P ≤ 0·001; ****P ≤ 0·0001. 3. Results 3.1. Identification of upregulated DEGs in OSCC after data integration As the volcano plots illustrated, gene expression profiles from [87]GSE3524 identified 843 differentially expressed genes with 665 genes upregulated and 178 genes downregulated in OSCC samples compared with the expression in normal control tissues ([88]Fig. 1a). From [89]GSE30784 data, we recognized 1266 differentially expressed genes, of which 653 genes were upregulated and 613 genes were downregulated in OSCC ([90]Fig. 1b). 2264 differentially expressed genes were ascertained in [91]GSE78060, comprising 1957 genes upregulated and 307 genes downregulated in OSCC ([92]Fig. 1c). Grounded on the cut-off criteria, DEGs were selected for integrated analysis and we gained 90 overlapping upregulated DEGs from the above mentioned three GSE datasets ([93]Fig. 1d, Table S4). Gene expression profiles from paired adjacent and tumor samples in TCGA database confirmed the overexpression of these 90 genes in OSCC samples ([94]Fig. 1e). Fig. 1. [95]Fig. 1 [96]Open in a new tab Identification of differentially upregulated expressed genes. a–c. Volcano plots of gene expression profiles in [97]GSE3524, [98]GSE30784, and [99]GSE78060. Red/blue symbols classify the upregulated/downregulated genes according to the criteria: log[2]FC > 1 and adjusted P-value <0·01. d. Common upregulated DEGs among [100]GSE3524, [101]GSE30784, and [102]GSE78060. e. The expression matrix of 90 common upregulated DEGs in 29 pairs of OSCC and adjacent normal tissues followed by unsupervised hierarchical clustering in TCGA database. Blue, red and white respectively represents a lower expression level, a higher expression level and no expression difference among the genes. (For interpretation of the references to