Abstract

Background

   Oral squamous cell carcinoma (OSCC) is one of the most common malignant
   tumors worldwide. Patients with poorly differentiated OSCC often
   exhibit a poor prognosis. AUNIP (Aurora Kinase A and Ninein Interacting
   Protein), also known as AIBp, plays a key role in cell cycle and DNA
   damage repair. However, the function of AUNIP in OSCC remains elusive.

Methods

   The differentially expressed genes (DEGs) were obtained using R
   language. Receiver operating characteristic curve analysis was
   performed to identify diagnostic markers for OSCC. The effectiveness of
   AUNIP in diagnosing OSCC was evaluated by machine learning. AUNIP
   expression was analyzed in publicly available databases and clinical
   specimens. Bioinformatics analysis and in vitro experiments were
   conducted to explore biological functions and prognostic value of AUNIP
   in OSCC.

Findings

   The gene integration analysis revealed 90 upregulated DEGs. One
   candidate biomarker, AUNIP, for the diagnosis of OSCC was detected, and
   its expression gradually increased along with malignant differentiation
   of OSCC. Bioinformatics analysis demonstrated that AUNIP could be
   associated with tumor microenvironment, human papillomavirus infection,
   and cell cycle in OSCC. The suppression of AUNIP inhibited OSCC cell
   proliferation and resulted in G0/G1 phase arrest in OSCC cells. The
   survival analysis showed that AUNIP overexpression predicted poor
   prognosis of OSCC patients.

   Interpretation: AUNIP could serve as a candidate diagnostic and
   prognostic biomarker for OSCC and suppression of AUNIP may be a
   potential approach to preventing and treating OSCC.

Fund

   Taishan Scholars Project in Shandong Province (ts201511106) and the
   National Natural Science Foundation of China (Nos. 61603218).

   Keywords: Oral squamous cell carcinoma, Receiver operating
   characteristic curve, Biomarker, Weighted gene co-expression network
   analysis, Survival analysis, AUNIP
     __________________________________________________________________

Research in context.

Evidence before this study

   Patients with poorly differentiated oral squamous cell carcinoma (OSCC)
   often have a poor prognosis. Tumor biomarkers can highlight biological
   differences among cancers and help predict treatment outcomes. However,
   tumor biomarkers with 100% sensitivity and specificity have not been
   discovered so far. Although a variety of potential biomarkers of OSCC
   have been discovered using the bioinformatics method, their
   verifications failed in large samples. AUNIP, also known as AIBp, is
   demonstrated to regulate mitotic entry and mitotic spindle assembly
   during cell cycle. However, the role of AUNIP in solid tumors such as
   OSCC remains elusive.

Added value of this study

   Our research creatively integrated microarray data of OSCC with
   relative large samples from publicly available databases and clinical
   specimens. Bioinformatics analysis and in vitro experiments were
   carried out to speculate the potential biological functions and
   clinical significance of AUNIP in OSCC. We found that AUNIP was
   overexpressed in OSCC tissues and increased AUNIP expression predicted
   poor clinical outcomes in OSCC patients. The decreased expression of
   AUNIP caused an abnormity of OSCC cell cycle transition. In addition,
   AUNIP was associated with tumor microenvironment and HPV infection in
   OSCC.

Implications of all the available evidence

   The findings of our research suggest that AUNIP could serve as a
   potential biomarker and the inhibition of AUNIP may be a new strategy
   for the treatment of OSCC. Integrating AUNIP with well-known biomarkers
   to detect OSCC and predict treatment outcomes could promote clinical
   practice.

   Alt-text: Unlabelled Box

1. Introduction

   Cancer morbidity and mortality have experienced a gradual increase all
   over the world. There were approximate 18·1 million cancer cases and
   9·6 million cancer deaths worldwide in 2018, with oral cancer
   accounting for 2% and 1·9% respectively [[41]1,[42]2]. Oral squamous
   cell carcinoma (OSCC), exhibiting high morbidity and malignancy, is the
   most common type of oral cancer [[43]3,[44]4]. It normally involves the
   front two thirds of the tongue, gingiva (gums), floor of mouth, lips,
   alveolar ridge, buccal mucosa, hard palate, and retromolar trigone
   ([45]https://www.cancer.gov/). The excision of primary tumor with or
   without dissection of lymph nodes is the routine treatment of OSCC
   [[46]4,[47]5]. Most patients with OSCC are in advanced clinical stage
   at the time of diagnosis [[48]6]. Although the clinical
   multidisciplinary collaboration and sequential therapy can improve the
   prognosis [[49]7], the 5-year survival rate of OSCC patients remains
   about 50% and 25%–50% of them will suffer from local relapse and
   distant metastasis after treatment [[50]8,[51]9]. It is thus necessary
   to search for effective diagnostic and prognostic biomarkers for OSCC.

   A cancer biomarker is defined as a substance or a process that is
   indicative of the presence of cancer in the body
   ([52]https://en.wikipedia.org). There have been many papers reporting
   the discovery of OSCC biomarkers, but only few biomarkers have been
   validated and successfully applied in routine clinical practice
   [[53]10]. Moreover, most biomarkers possess limitations for the early
   detection of OSCC, and the prognostic value of them is plagued by
   inaccuracies [[54]11]. By taking advantage of bioinformatics analysis,
   it is more likely to overcome deficiencies in the known biomarkers.
   Bioinformatics analysis allows researchers to delve into integrated
   data of numerous clinical samples from different independent studies,
   which provides data infrastructure for discovering promising biomarkers
   and updating our understanding about cancers [[55]6,[56]12,[57]13].

   The aim of this study was to explore potential diagnostic and
   prognostic biomarkers and their biological functions in OSCC by
   bioinformatics analysis. We finally determined Aurora Kinase A and
   Ninein Interacting Protein (AUNIP) as our gene of interest and
   confirmed its overexpression in OSCC tissues. AUNIP, also known as
   AIBp, was demonstrated to interact with Aurora-A at the C-terminal and
   with hNinein at the N-terminal to promote centrosomal structure
   maintenance and spindle formation during cell cycle [[58]14]. By
   controlling activation of both Aurora-A and Plk1, AUNIP could also
   regulate mitotic entry and mitotic spindle assembly [[59]15]. Apart
   from these, AUNIP and Aurora-A were co-overexpressed in various brain
   tumors [[60]14] and researches suggested that AUNIP might have an
   oncogenic role in human cancers and regulate cell cycle progression.
   Recently, AUNIP was discovered to be connected with CtIP and be
   required for DNA double-strand breaks repair [[61]16]. Our present
   study shed light on the critical role of AUNIP in regulating cell cycle
   in OSCC cells and promoting OSCC cell proliferation and proved its
   diagnostic and prognostic value for OSCC patients. Therefore, AUNIP
   inhibition could serve as a potential strategy to treat OSCC.

2. Materials and methods

2.1. Ethics statement

   The study protocol was approved by Taizhou Hospital Ethics Committee
   (Zhejiang, China) and Shandong University Research Ethics Committee
   (Shandong, China). All experiments were performed complying with the
   relevant regulations, and written informed consent was obtained from
   patients.

2.2. Clinical specimens

   Tissue microarray chips containing 89 samples of OSCC and 16 samples of
   normal control (nine samples of adjacent normal oral tissue) were
   purchased from Shanghai Qutdo Biotech Company (Shanghai, China). These
   samples were collected from Taizhou Hospital (Zhejiang, China).
   Clinicopathological characteristics of these patients were shown in
   (Table S1). Four pairs of OSCC and matched adjacent normal oral
   tissues, obtained from Stomatological Hospital of Shandong University
   (Shandong, China) and Qilu Hospital of Shandong University (Shandong,
   China), were frozen in liquid nitrogen until further analysis.
   Clinicopathological characteristics of these four patients were shown
   in (Table S2).

2.3. Analysis of upregulated differentially expressed genes (DEGs)

   In the Gene Expression Omnibus (GEO) database
   ([62]http://www.ncbi.nlm.nih.gov/geo/), we selected datasets of
   [63]GSE30784 [[64]17], [65]GSE3524 [[66]18], [67]GSE78060 [[68]19],
   [69]GSE41613 [[70]20], and downloaded the original files (.CEL files)
   and platform files. Background correction, quantile normalization, pro
   summarization, log[2] conversion, and missing values supplement for the
   matrix data of each GEO dataset were performed using Robust Multi-array
   Average algorithm with the “affy” package and the “impute” package in
   R/Bioconductor software (version 3·5·3). Upregulated DEGs of
   [71]GSE30784, [72]GSE3524, and [73]GSE78060 displaying an overlap
   region in Venn diagram, were screened and constructed matrices by means
   of the “limma” package in R/Bioconductor when log[2]FoldChange (FC) > 1
   and adjusted P-value <0·01.

2.4. Receiver operating characteristic (ROC) curve and logistic regression
analysis

   By screening out the relevant documents (.count files) and clinical
   information related to OSCC in The Cancer Genome Atlas (TCGA) database
   ([74]https://www.cancer.gov) and FireBrowse database
   ([75]http://www.firebrowse.org) [[76]21], we gained RNA-sequence data
   and clinicopathological characteristics over two groups of samples, the
   OSCC tumor category and the adjacent normal one. The “edgeR” package in
   R/Bioconductor software was used to normalize and process data before
   log[2] conversion. It was reported that infiltrating stromal and immune
   cells had important roles in tumor development. ESTIMATE score is the
   estimation of stromal and immune cells in malignant tumor tissues using
   expression data of TCGA samples. We calculated ESTIMATE, stromal, and
   immune scores in OSCC by adopting the ESTIMATE algorithm from
   ([77]https://bioinformatics.mdanderson.org/estimate/) [[78]22]. A total
   of 306 OSCC patients with detailed survival information and their
   clinical samples of oral cavity which included the front two thirds of
   the tongue, gingiva, floor of mouth, lips, alveolar ridge, buccal
   mucosa, hard palate, and retromolar trigone were involved in subsequent
   analysis. Then ROC curve analysis was performed to evaluate the
   sensitivity (true positive rate) and specificity (true negative rate)
   of the upregulated DEGs for OSCC diagnosis and we investigated how
   large the area under the curve (AUC) was by using the statistical
   package IBM SPSS Statistics software (SPSS, version 23).

   To further assess the efficacy of the gene in diagnosing OSCC, we
   performed five-fold cross-validation using logistic regression model in
   TCGA database [[79]23]. The randomly selected 60 tumor samples and 29
   control ones were formed a new dataset, which avoided the size
   imbalance of two groups. The logistic regression model was built by
   using “scikit-learn” package in Python software (version 3.6,
   [80]https://www.python.org) [[81]24]. The confusion matrix of two
   classifications (the positive and negative) was displayed in (Table
   S3). Precision, recall, accuracy and F1-score were introduced to
   evaluate the performance of this classification model. The precision of
   a classifier was the number of true positives divided by the number
   that we predicted as positive. High precision meant high accuracy of
   predicting OSCC by AUNIP. Recall was defined as the number of true
   positives divided by the number of actual positives. Accuracy referred
   to the proportion of correctly predicted results among all samples.
   Because precision and recall affected each other and the values of them
   could not be ideally simultaneously large, we considered computing the
   F1-score as a comprehensive evaluation index. F1-score was the harmonic
   average of precision and recall.

2.5. Immunohistochemistry (IHC) staining

   The tissue microarray chips through deparaffinization and dehydration
   were incubated with monoclonal rabbit anti-human AUNIP (dilution 1:500,
   Bioss, bs-15019R) overnight at 4 °C after epitope retrieval, H[2]O[2]
   treatment and non-specific antigens blocking. Chips were next incubated
   with secondary antibody, followed by signal detection with DAB staining
   kit (Vector Laboratories, USA), and Histochemistry Score (H-Score = ∑
   (PI × I) = (percentage of cells of weak intensity × 1) + (percentage of
   cells of moderate intensity × 2) + percentage of cells of strong
   intensity × 3) [[82]25] was obtained with Quant Center Analysis tool.

2.6. Construction of weighted gene co-expression network analysis (WGCNA)

   [83]GSE41613 dataset including 97 OSCC patients with detailed overall
   survival information was appropriate for the construction of WGCNA. We
   could link module eigengenes with survival data of OSCC patients. The
   data matrix of gene expression in [84]GSE41613 was constructed and the
   top 25% of genes in tumor samples with the largest variance were
   selected as input data set for the subsequent WGCNA using “WGCNA”
   package in R/Bioconductor software. Outlier samples were detected and
   excluded with sample hierarchically clustering method, before selecting
   an appropriate soft threshold power in order to achieve standard
   scale-free networks. In the next stage, through the construction of
   adjacency and topological overlap matrix (TOM) and the calculation of
   corresponding dissimilarity (1-TOM), gene dendrogram and module
   identification were accomplished with dynamic tree cut and the minimum
   module size was 30. Then clustering of module eigengenes was
   implemented to merge highly similar modules with the dissimilarity of
   <0·25 and we calculated the correlation between module eigengene and
   clinical phenotypes of OSCC. The module including AUNIP was chosen for
   further analysis and gene significance (GS) and module membership (MM)
   were obtained to identify modules related to clinical traits of OSCC.

2.7. Biological function and pathway enrichment analysis

   In gene networks conforming to scale-free distribution, genes with
   similar expression patterns could be co-regulated, functionally related
   or pathway shared. Through calculating the corresponding topological
   overlap, genes in the above selected module positively associated with
   AUNIP were found out (displayed by Cytoscape 3·6·1) and they were next
   executed gene ontology (GO) and Kyoto encyclopedia of genes and genomes
   (KEGG) pathway analysis via the “clusterprofiler” package in
   R/Bioconductor software to acquire the enriched biological process and
   KEGG pathway for next analysis. In addition, samples from TCGA were
   divided into two groups based on the expression of AUNIP (median value)
   and gene set enrichment analysis (GSEA) software
   ([85]http://software.broadinstitute.org/gsea/index.jsp) was applied in
   the two groups to verify the results of GO and KEGG analysis. The
   cut-off criteria for GSEA were nominal P-value <0·05 and false
   discovery rate (FDR) < 0·25.

2.8. Cell culture and transient transfection

   Human OSCC cell lines SCC-15 and SCC-9 were obtained from ATCC (Beijing
   Beina Chuanglian Biotechnology Institute) and they were cultured in F12
   and DMEM respectively containing 10% fetal bovine serum (Gibco,
   Carlsbad, CA, USA). Both cell lines were maintained in a humidified
   incubator with 37 °C, 5% CO[2]. Lipofectamine 2000 (Invitrogen,
   Carlsbad, CA, USA) was used to transfect Negative Control (NC) and
   AUNIP siRNAs (Sigma-Aldrich, USA) into OSCC cells as the manufacturer's
   protocol suggested. Target sequences for siRNAs were
   UUCUCCGAACGUGUCACGUdTdT ACGUGACACGUUCGGAGAAdTdT (NC si),
   CCAUUUGAUCCCAGGCUUAdTdT UAAGCCUGGGAUCAAAUGGdTdT (AUNIP si1), and
   GAAACAAAGCCGUAGUCCAdTdT UGGACUACGGCUUUGUUUCdTdT (AUNIP si2).

2.9. RNA extraction, RT-PCR and qRT-PCR

   Total RNA extracted from SCC-15 and SCC-9 with Trizol Reagent
   (Invitrogen, Carlsbad, CA, USA) were reverse transcribed with RT
   reagent Kit gDNA Eraser (TaKaRa). Next, SYBR-Green (TaKaRa) and qRT-PCR
   analysis were used for detecting cDNA expression levels and β-ACTIN was
   used as internal reference. Primers were shown as follows: hβ-ACTIN,
   Forward (F): 5′-AGTTGCGTTACACCCTTTCTTG-3′, Reverse (R):
   5′-CACCTTCACCGTTCCAGTTTT-3′; hAUNIP, Forward (F): 5′-
   GCGGAAAGTGCAGACACATTT -3′, Reverse (R): 5′- TCTCTGGTGAATGCCTGTAGAT -3′.

2.10. Western blotting

   Total proteins were extracted with lysis buffer including protease
   inhibitors, and the concentration was measured. PVDF membranes
   (Millipore, Bedford, MA) containing proteins separated on SDS-PAGE gels
   were blocked with 5% nonfat dry milk and incubated with primary
   antibodies for overnight at 4 °C. Immunoblots were detected using ECL
   detection reagent (Millipore) according to manufacturer's protocol.
   Antibodies used were: β-ACTIN (A1978, Sigma-Aldrich, RRID:
   [86]AB_476692), GAPDH (Santa Cruz Biotechnology, sc-47724), AUNIP
   (Bioss, bs-15019R).

2.11. Colony formation and cell cycle analysis

   Cells (500–800 cells/well) were seeded in 6-well plate and supported
   for 7–14 days in a humidified incubator with 37 °C, 5% CO[2] until
   colonies of cells appeared. The colonies were fixed with methanol and
   stained with Giemsa in order to be counted. The colonies containing >50
   cells were counted for analysis. For cell cycle analyses, cells were
   fixed with 70% ethanol at 4 °C overnight and stained with RNase A
   containing Propidium Iodide (Sigma-Aldrich, USA). Cell cycle
   distribution was determined using flow cytometry with ModFit software.
   Blank control, si-NC, and si-AUNIP groups were established.

2.12. Statistical analysis

   The Kaplan-Meier analysis for overall survival was proceeded based on
   the gene's expression level whose cut-off level was set at the median
   value with the aid of GraphPad Prism 8 software and the Log-Rank was
   utilized to test. SPSS was employed to perform univariate and
   multivariable analysis so as to establish a Cox proportional hazard
   regression model. Patients were divided into groups in the light of the
   target gene's expression whose correlation with clinicopathological
   characteristics in TCGA was studied by SPSS with the application of
   two-tailed χ2 test. In addition, Pearson correlation analysis was
   adopted to determine the linear relationship between two groups.
   One-way ANOVA test and two-tailed Student's t-tests were used for
   expression data comparisons by using GraphPad Prism 8 software. Each
   experiment was repeated three times or more and all data were presented
   as mean ± standard deviation (SD). Statistical significance was
   described as follows: n.s, not significant; *P ≤ 0·05; **P ≤ 0·01;
   ***P ≤ 0·001; ****P ≤ 0·0001.

3. Results

3.1. Identification of upregulated DEGs in OSCC after data integration

   As the volcano plots illustrated, gene expression profiles from
   [87]GSE3524 identified 843 differentially expressed genes with 665
   genes upregulated and 178 genes downregulated in OSCC samples compared
   with the expression in normal control tissues ([88]Fig. 1a). From
   [89]GSE30784 data, we recognized 1266 differentially expressed genes,
   of which 653 genes were upregulated and 613 genes were downregulated in
   OSCC ([90]Fig. 1b). 2264 differentially expressed genes were
   ascertained in [91]GSE78060, comprising 1957 genes upregulated and 307
   genes downregulated in OSCC ([92]Fig. 1c). Grounded on the cut-off
   criteria, DEGs were selected for integrated analysis and we gained 90
   overlapping upregulated DEGs from the above mentioned three GSE
   datasets ([93]Fig. 1d, Table S4). Gene expression profiles from paired
   adjacent and tumor samples in TCGA database confirmed the
   overexpression of these 90 genes in OSCC samples ([94]Fig. 1e).

Fig. 1.

   [95]Fig. 1
   [96]Open in a new tab

   Identification of differentially upregulated expressed genes.

   a–c. Volcano plots of gene expression profiles in [97]GSE3524,
   [98]GSE30784, and [99]GSE78060. Red/blue symbols classify the
   upregulated/downregulated genes according to the criteria: log[2]FC > 1
   and adjusted P-value <0·01. d. Common upregulated DEGs among
   [100]GSE3524, [101]GSE30784, and [102]GSE78060. e. The expression
   matrix of 90 common upregulated DEGs in 29 pairs of OSCC and adjacent
   normal tissues followed by unsupervised hierarchical clustering in TCGA
   database. Blue, red and white respectively represents a lower
   expression level, a higher expression level and no expression
   difference among the genes. (For interpretation of the references to