Abstract

Oral Tongue Squamous Cell Carcinoma (OTSCC), the most frequently affected oral cancer sub-site, is associated with poor therapeutic outcome and survival despite aggressive multi-modality management. To date, there are no established biomarkers to indicate prognosis and outcome in patients presenting with tongue cancer. There is an urgent need for reliable molecular prognostic factors that identify patients at high risk of recurrence and treatment failure in OTSCC management. In the current study, we present a meta-analysis of microarray-based OTSCC gene expression profiles, deriving a comprehensive molecular portrait of tongue cancer biology and showing the relevant genes and pathways that can be pursued further for novel, tailored therapeutics as well as for prognostication. We studied 5 gene expression profiling data sets available exclusively for the oral tongue sub-site, with a total sample size of n = 190 (111 tumors and 79 normals). The meta-analysis showed 2405 genes differentially regulated between OTSCC tumor and normal tissue. The top up-regulated genes were involved in extracellular matrix (ECM) degradation and epithelial to mesenchymal transition (EMT) pathways. The top down-regulated genes were involved in detoxication pathways. We validated the results in clinical samples (n = 206), comprising histologically normal tissues (n = 10) and prospective (n = 29) and retrospective (n = 167) OTSCC samples, by evaluating MMP9 and E-cadherin gene expression by qPCR and immunohistochemistry. Consistent with the meta-analysis results, MMP9 mRNA expression was significantly up-regulated in OTSCC primary tumors compared to normals. MMP9 protein over-expression was a significant predictor of poor prognosis, disease recurrence and poor Disease Free Survival (DFS) in OTSCC patients.
Analysis by univariate and multivariate Cox proportional hazard models showed that patients with loss of E-cadherin expression in OTSCC tumors had a poorer DFS (HR = 1.566; P value = 0.045) and a poorer Overall Survival (OS) (HR = 1.224; P value = 0.003), respectively. Combined over-expression of MMP9 and loss of E-cadherin membrane positivity in the invasive tumor front (ITF) of OTSCC was significantly associated with poorer DFS (Log Rank = 16.040; P value = 0.001). These results suggest that, along with known clinical indicators of prognosis such as occult node positivity, assessment of MMP9 and E-cadherin expression at the ITF can help identify high-risk patients who require a more intensive treatment strategy for OTSCC. The meta-analysis of gene expression profiles indicates that OTSCC is a disease of ECM degradation leading to activated EMT processes, which implies the aggressive nature of the disease. The triggers for these processes should be studied further. Newer clinical applications of agents that inhibit the mediators of ECM degradation may be key to achieving clinical control of invasion and metastasis in OTSCC.

Introduction

Oral Tongue Squamous Cell Carcinoma (OTSCC) is regarded as a biologically unique entity compared to cancers occurring in the other oral sub-sites. The trend in the epidemiology of oral cancer in Asia in the past decade (2000–2012) shows OTSCC as the most frequently affected oral sub-site. [1] Earlier studies also report a higher incidence of OTSCC in India compared to other countries. [2, 3, 4, 5] According to the population based cancer registry (PBCR), the age adjusted incidence rate (AAR) for OTSCC in Chennai shows an increasing trend, from 3.6 to 5.7 per 100,000 persons above 25 years.
Although there are indicators of poor prognosis for OTSCC, such as occult node positivity, tumor depth, lymphovascular invasion and perineural invasion, there is still a need for reliable and robust molecular prognostic biomarkers to identify patients who are likely to have an adverse outcome. Microarrays, a tool for genome-scale profiling of gene expression, are a well known and potentially valuable means of understanding the complex interactions and networks involved in the development of several diseases, including cancer. [6, 7] These high-throughput studies offer the advantage of understanding the biology of a cancer through an exhaustive analysis. The launch of public microarray data archives such as Gene Expression Omnibus and the advent of advanced computational informatics tools have made it possible to compare and combine gene expression studies done independently across different platforms. However, the hallmark of scientific progress is the reproducibility of published outcomes, which has been difficult to achieve for several microarray studies. The major sources of discordance are variation caused by random noise, biological and experimental differences, and differences in technical methods. [8] Findings are often not reproducible across studies because of data perturbations in individual studies, improper validation, and insufficient control of false positives. Despite these obstacles, several groups have successfully gleaned important insights from the focused comparison of disparate microarray results. [9, 10] Many of the limitations can be mitigated by the use of standard reporting methods, together with careful application of large-scale meta-analysis techniques. As our primary objective, the current study presents, for the first time, a meta-analysis of OTSCC as an exclusive sub-site.
We attempted to overcome the limitations of the individual expression profiling studies by resolving inconsistencies and reducing the likelihood of random errors, thus laying a foundation for uncovering the molecular aspects of OTSCC. We present the differentially expressed genes (DEG) between the OTSCC and normal expression profiles, along with the signaling pathways involved. As our second objective, we validated two biomarkers found in the meta-analysis, MMP9 and E-cadherin, in prospective and retrospective clinical samples.

Materials and Methods

Identification of eligible OTSCC gene expression data sets

OTSCC expression profiling studies were identified by searching the PubMed database. The following keywords and their combinations were used: “Oral tongue cancer gene expression microarray”. The Gene Expression Omnibus database (http://www.ncbi.nlm.gov/geo) was also searched for the terms “Oral Tongue Cancer”, “Oral Tongue Squamous cell carcinoma” and “mobile tongue cancer”.

Inclusion Criteria

Only gene expression data sets from the anterior two-thirds of the tongue (mobile tongue cancer) were included. Original experimental studies comprising gene expression values for tongue tumor and normal tissues were included. Only expression data sets obtained from standard microarray platforms were used for the current study.

Exclusion Criteria

Studies on head and neck cancer that included only a few tongue cancer samples were excluded. Data sets containing base of tongue samples, tongue cancer cell lines or non-human tissues were excluded. Studies that did not include normal samples were excluded.

Individual Study Analysis

GEO accession number, sample type, platform, number of cases and controls, references and gene expression data were extracted from each