Abstract Our previous research demonstrated that miR‐146a‐5p is down‐regulated in hepatocellular carcinoma (HCC) and might play a tumor‐suppressive role. In this study, we sought to validate this decreased expression in a larger cohort and to explore the potential molecular mechanisms. miR‐146a‐5p expression data in HCC, covering 762 HCC and 454 noncancerous liver tissues, were gathered from the GEO and TCGA databases. A meta‐analysis of GEO‐based microarrays, TCGA‐based RNA‐seq data, and additional qRT‐PCR data validated the down‐regulation of miR‐146a‐5p in HCC, and no publication bias was observed. Integrated genes were generated by overlapping predicted miR‐146a‐5p‐related genes with previously reported HCC‐related genes identified through natural language processing. These overlapping genes were comprehensively analyzed to discover the potential gene signatures, regulatory pathways, and networks of miR‐146a‐5p in HCC. A total of 251 potential miR‐146a‐5p target genes were predicted by bioinformatics platforms, and 104 genes were identified as overlaps between the HCC‐related and miR‐146a‐5p‐related gene sets. RAC1 was the most highly connected hub gene for miR‐146a‐5p, and four highly enriched pathways (VEGF signaling pathway, adherens junction, toll‐like receptor signaling pathway, and neurotrophin signaling pathway) were identified for the overlapping genes. The down‐regulation of miR‐146a‐5p in HCC has been validated with the most complete data currently available. The potential gene signatures, regulatory pathways, and networks identified for miR‐146a‐5p in HCC could prove useful for molecular‐targeted diagnostics and therapeutics.
Keywords: expression, gene signature, HCC, hepatocellular carcinoma, microRNA, miR‐146a‐5p
Abbreviations: AUC, area under the curve; EMT, epithelial-mesenchymal transition; FCs, fold changes; GEO, Gene Expression Omnibus; HCC, hepatocellular carcinoma; KEGG, Kyoto Encyclopedia of Genes and Genomes; miRs, microRNAs; NLP, natural language processing; ROC, receiver operating characteristic; TCGA, The Cancer Genome Atlas
Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide and the third leading cause of cancer‐related death [1, 2]. However, many patients are diagnosed at advanced stages, and recurrence and metastasis remain the main challenges in HCC treatment [3]. Novel diagnostic and prognostic biomarkers for HCC are therefore urgently needed. MicroRNAs (miRs) are a large class of short noncoding RNA molecules, 18–25 nucleotides long, that regulate numerous cellular processes, such as cell proliferation, migration, and apoptosis [4, 5]. Among them, miR‐146a‐5p, located on human chromosome 5q34, is thought to be actively involved in multiple oncogenic processes in HCC, such as suppression of antitumor immunity [6], metastasis [7], and angiogenesis [8]. Our previous work [9] demonstrated that down‐regulated miR‐146a‐5p expression is associated with the carcinogenesis and progression of HCC and that miR‐146a‐5p might act as a tumor‐suppressive microRNA in HCC. Nevertheless, the precise molecular mechanisms of miR‐146a‐5p in HCC remain largely unknown. Gene signatures, which are considered promising for cancer diagnosis and prognosis prediction, help reveal the molecular background, regulatory pathways, and networks of cellular activity in HCC [10].
Several resources and techniques serve this purpose. The Gene Expression Omnibus (GEO) database stores public array‐ and sequence‐based functional genomics data and allows users to query and download experiments and gene expression profiles [11]. The Cancer Genome Atlas (TCGA) is another prominent public database, containing genomic information on various cancers. Furthermore, natural language processing (NLP) is a rapidly developing technique that uses algorithms and programs to enable computers to understand and sort natural language, allowing researchers to retrieve papers on topics of interest and to analyze data automatically [12]. In this study, we applied a series of bioinformatics and computational biology resources and techniques, including GEO and TCGA data aggregation, comprehensive meta‐analyses, NLP analysis, target gene prediction, analytic integration, and further bioinformatics analyses. We aimed to validate the down‐regulation of miR‐146a‐5p in HCC with the most complete data currently available and to present the gene signatures, regulatory pathways, and networks of miR‐146a‐5p in the carcinogenesis, metastasis, prognosis, recurrence, survival, and drug resistance (sorafenib and bevacizumab) of HCC.
Materials and methods
The present study consists of several sequential procedures (Fig. 1): GEO‐based verification of clinical value, TCGA‐based RNA‐seq data aggregation, comprehensive meta‐analyses based on GEO, TCGA, and literature data, and multiple bioinformatics analyses.
Figure 1. General flow chart. The present study is composed of several sequential procedures: GEO‐based verification of clinical values, TCGA‐based aggregation of RNA‐seq data, comprehensive meta‐analyses, and multiple bioinformatics analyses.
Clinical value verification of miR‐146a‐5p expression in HCC based on GEO datasets
All functional genomics data on miR‐146a‐5p were retrieved from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) up to 10 September 2016. The search strategy used in GEO DataSets (http://www.ncbi.nlm.nih.gov/gds/) was: (malignan* OR cancer OR tumor OR tumour OR neoplas* OR carcinoma) AND (hepatocellular OR liver OR hepatic OR HCC). The inclusion criteria were as follows: (a) each dataset included HCC tissues, with each group containing more than two samples, regardless of whether adjacent noncancerous (or healthy liver) tissues were included; (b) the sample organism was Homo sapiens; and (c) expression data for miR‐146a (hsa‐miR‐146a or hsa‐miR‐146a‐5p) in the experimental and control groups were provided or could be calculated. Datasets were excluded if (a) they contained no information on miR‐146a‐5p; (b) their data were incomplete for analysis; (c) the samples were derived from cell lines; (d) not all subjects of the included studies were human; or (e) miR‐146a‐5p was measured in HCC patients without a comparison group. The expression values of miR‐146a‐5p and the sample sizes of the test and control groups were calculated, and their means and standard deviations were extracted to estimate the difference in miR‐146a‐5p levels between case and control groups, using Review Manager 5.3 with a random‐effects model. Heterogeneity across studies was evaluated with the chi‐square test and the I² statistic and was considered present when P < 0.05 or I² > 50%. The standardized mean difference (SMD) and its 95% confidence interval (CI) were pooled to assess the stability of the analysis. The result was considered statistically significant if the 95% CI of the pooled SMD did not overlap 1 or −1.
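The pooling and heterogeneity steps performed in Review Manager can be sketched outside that tool. The following is a minimal sketch under stated assumptions: the SMD is taken to be Hedges' bias-corrected g and the random-effects model to be DerSimonian-Laird (the text names only a "random-effects model"), and the variance formulas come from standard meta-analysis textbooks rather than from Review Manager itself, so its output may differ slightly from RevMan's. The names `Study`, `hedges_g`, `chi2_sf`, and `random_effects_smd` are hypothetical. The sketch computes Cochran's Q (the chi-square heterogeneity statistic), its P value, I², and the pooled SMD with its 95% CI, and applies the P < 0.05 or I² > 50% heterogeneity rule described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical per-dataset summary: case = HCC, control = noncancerous liver."""
    n_case: int
    mean_case: float
    sd_case: float
    n_ctrl: int
    mean_ctrl: float
    sd_ctrl: float


def hedges_g(s: Study) -> tuple[float, float]:
    """Return Hedges' g (bias-corrected SMD, case minus control) and its variance."""
    n1, n2 = s.n_case, s.n_ctrl
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * s.sd_case**2 + (n2 - 1) * s.sd_ctrl**2) / df)
    d = (s.mean_case - s.mean_ctrl) / sd_pooled
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor (assumed textbook form)
    return j * d, j**2 * var_d


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail P value of a chi-square statistic with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from the closed forms for df = 1 or df = 2 ...
    k = 2 if df % 2 == 0 else 1
    sf = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    # ... then step up: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        sf += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return sf


def random_effects_smd(studies: list[Study]) -> dict:
    """DerSimonian-Laird random-effects pooling of Hedges' g, with Cochran's Q and I^2."""
    if len(studies) < 2:
        raise ValueError("at least two studies are needed to assess heterogeneity")
    g, v = zip(*(hedges_g(s) for s in studies))
    w = [1 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    g_fixed = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    q = sum(wi * (gi - g_fixed) ** 2 for wi, gi in zip(w, g))  # Cochran's Q
    df = len(studies) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    smd = sum(wi * gi for wi, gi in zip(w_re, g)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p_het = chi2_sf(q, df)
    return {
        "smd": smd,
        "ci_low": smd - 1.96 * se,  # normal-approximation 95% CI
        "ci_high": smd + 1.96 * se,
        "q": q,
        "p_het": p_het,
        "i2": i2,
        "tau2": tau2,
        "heterogeneous": p_het < 0.05 or i2 > 50,
    }
```

Dropping each `Study` from the list in turn and re-running `random_effects_smd` gives a simple leave-one-out check of how strongly any single dataset drives the pooled estimate and its heterogeneity.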
Additionally, a sensitivity analysis was conducted by omitting each study in turn to evaluate the source of heterogeneity.
RNA‐seq data aggregation based on TCGA database
miR‐146a‐5p expression data were downloaded from TCGA (http://cancergenome.nih.gov/) miRNASeqV2 (level 3) in bulk download mode on 15 July 2016. Expression data were presented as upper‐quartile‐normalized RNA‐Seq by Expectation‐Maximization (RSEM) count estimates [13, 14] from the ‘rsem.gene.normalized_results’ file type. The data were used without further transformation, except that some values were rounded to integers. Expression in HCC and adjacent normal liver tissues was compared with the limma package in R, and fold changes (FCs) were calculated as HCC vs. normal liver tissue. A difference was considered statistically significant if FC < 0.5 or FC > 2 with P < 0.05 in Student's t‐test.
Comprehensive meta‐analysis based on GEO, TCGA, and literature data
Comprehensive meta‐analyses were performed on the data gathered from GEO, TCGA, and the relevant literature. Relevant studies were identified by independently searching the online databases PubMed, Embase, Web of Science, Wiley Online Library, Cochrane Library, ScienceDirect, Chinese WanFang Database, Chinese VIP Database, Chinese Biomedical Literature Database, and Chinese CNKI Database up to 15 July 2016. The following combination of keywords and entry words was used: (a) (miR‐146a OR miRNA‐146a OR microRNA‐146a OR miR146a OR miRNA146a OR microRNA146a OR ‘miR 146a’ OR ‘miRNA 146a’ OR ‘microRNA 146a’ OR miR‐146a‐5p OR miRNA‐146a‐5p OR microRNA‐146a‐5p); (b) (hepatocellular OR liver OR hepatic OR HCC); (c) (‘cancer’ OR ‘tumor’ OR ‘tumour’ OR ‘neoplas*’ OR ‘carcinoma’ OR ‘sarcoma’ OR ‘malignan*’). In addition, some references of relevant articles were