Abstract

   Our previous research has demonstrated that miR‐146a‐5p is
   down‐regulated in hepatocellular carcinoma (HCC) and might play a
   tumor‐suppressive role. In this study, we sought to validate the
   decreased expression with a larger cohort and to explore potential
   molecular mechanisms. GEO and TCGA databases were used to gather
   miR‐146a‐5p expression data in HCC, which included 762 HCC and 454
   noncancerous liver tissues. A meta‐analysis of the GEO‐based
   microarrays, TCGA‐based RNA‐seq data, and additional qRT‐PCR data
   validated the down‐regulation of miR‐146a‐5p in HCC and no publication
   bias was observed. Candidate genes were obtained by overlapping
   predicted miR‐146a‐5p target genes with previously reported
   HCC‐related genes identified by natural language processing. The
   overlaps were
   comprehensively analyzed to discover the potential gene signatures,
   regulatory pathways, and networks of miR‐146a‐5p in HCC. A total of 251
   miR‐146a‐5p potential target genes were predicted by bioinformatics
   platforms and 104 genes were considered as both HCC‐ and
   miR‐146a‐5p‐related overlaps. RAC1 was the most connected hub gene for
   miR‐146a‐5p and four pathways with high enrichment (VEGF signaling
   pathway, adherens junction, toll‐like receptor signaling pathway, and
   neurotrophin signaling pathway) were denoted for the overlapped genes.
   The down‐regulation of miR‐146a‐5p in HCC has been validated with the
   most complete data possible. The potential gene signatures, regulatory
   pathways, and networks identified for miR‐146a‐5p in HCC could prove
   useful for molecular‐targeted diagnostics and therapeutics.

   Keywords: expression, gene signature, HCC, hepatocellular carcinoma,
   microRNA, miR‐146a‐5p
     __________________________________________________________________

Abbreviations

   AUC
          area under the curve

   EMT
          epithelial mesenchymal transition

   FCs
          fold changes

   GEO
          gene expression omnibus

   HCC
          hepatocellular carcinoma

   KEGG
          Kyoto Encyclopedia of Genes and Genomes

   miRs
          microRNAs

   NLP
          natural language processing

   ROC
          receiver operator characteristic

   TCGA
          The Cancer Genome Atlas

   Hepatocellular carcinoma (HCC) is considered to be the fifth most
   common cancer worldwide and the third leading cause of cancer‐related
   mortality [36]1, [37]2. Many patients are diagnosed at advanced
   stages, and recurrence and metastasis remain the main challenges for
   HCC treatment [38]3. Novel diagnostic and prognostic biomarkers for
   HCC are therefore urgently needed.

   MicroRNAs (miRs) are a large class of short, noncoding RNA molecules
   of 18–25 nucleotides that mediate numerous cellular processes, such
   as cell proliferation, migration, and apoptosis [39]4, [40]5. Among
   them is miR‐146a‐5p, which is located on human chromosome 5q34 and is
   thought to be actively involved in multiple oncological processes of
   HCC, such as antitumor immune suppression [41]6, metastasis [42]7, and
   angiogenesis [43]8. Our previous work [44]9 demonstrated that
   down‐regulated miR‐146a‐5p expression is associated with the
   carcinogenesis and deterioration of HCC and that miR‐146a‐5p might be a
   tumor‐suppressive microRNA in HCC. Nevertheless, the precise molecular
   mechanisms of miR‐146a‐5p in HCC remain largely unknown.

   Gene signatures are considered promising for cancer diagnosis and
   prognosis prediction, and they help to reveal the molecular
   background, regulatory pathways, and networks of cellular activities
   in HCC [45]10. Several resources and techniques support this kind of
   analysis. The Gene Expression Omnibus (GEO) database stores public
   array‐ and sequence‐based functional genomics data and allows users to
   query and download experiments and gene expression profiles [46]11.
   The Cancer Genome Atlas (TCGA) is a prominent public database that
   contains the genetic information of various cancers. Furthermore,
   natural language processing (NLP) is a rapidly developing technique
   that uses algorithms and programs to enable computers to interpret and
   organize natural language, allowing researchers to retrieve papers on
   topics of interest and to analyze data automatically [47]12.

   This study applied a series of resources and techniques from
   bioinformatics and computational biology, including GEO and TCGA data
   aggregation, comprehensive meta‐analyses, NLP analysis, target gene
   prediction, analytic integration, and bioinformatics analyses. We
   aimed to validate the down‐regulation of miR‐146a‐5p in HCC with the
   most complete data currently available and to characterize the gene
   signatures, regulatory pathways, and networks of miR‐146a‐5p in the
   carcinogenesis, metastasis, prognosis, recurrence, survival, and drug
   resistance (sorafenib and bevacizumab) of HCC.

Materials and methods

   The present study consists of several sequential steps (Fig.
   [48]1): GEO‐based verification of clinical values, TCGA‐based
   aggregation of RNA‐seq data, comprehensive meta‐analyses based on GEO,
   TCGA, and literature data, and multiple bioinformatics analyses.

Figure 1.

   General flow chart. The present study comprises several sequential
   procedures: GEO‐based verification of clinical values, TCGA‐based
   aggregation of RNA‐seq data, comprehensive meta‐analyses, and multiple
   bioinformatics analyses.

Clinical value verification of miR‐146a‐5p expression in HCC based on GEO
datasets

   All the functional genomics data of miR‐146a‐5p were requested and
   assembled from the GEO Database ([50]http://www.ncbi.nlm.nih.gov/geo/)
   with the closing date of 10 September 2016. The search strategy
   formulated in the GEO datasets ([51]http://www.ncbi.nlm.nih.gov/gds/)
   was as follows: (malignan* OR cancer OR tumor OR tumour OR neoplas* OR
   carcinoma) AND (hepatocellular OR liver OR hepatic OR HCC). Inclusion
   criteria were as follows: (a) the dataset included HCC tissues, with
   each group containing more than two samples, regardless of whether
   adjacent noncancerous tissues (or healthy liver tissues) were
   included; (b) the dataset sample organism was Homo sapiens; (c) the
   expression data of miR‐146a (hsa‐miR‐146a or hsa‐miR‐146a‐5p) in the
   experimental and control groups were provided or could be calculated.
   Datasets were excluded under the following conditions: (a) no
   information on miR‐146a‐5p was available; (b) the data were incomplete
   for analysis; (c) the samples were derived from cell lines; (d) not
   all subjects of the included studies were human; or (e) miR‐146a‐5p
   was measured in HCC patients without a comparison group. Expression
   values of miR‐146a‐5p and sample sizes in both test and control groups
   were calculated. The means and standard deviations of these values
   were then extracted to estimate the difference in miR‐146a‐5p levels
   between case and control groups, using Review Manager 5.3 with a
   random‐effects model. The chi‐square test and the I^2 statistic were
   applied to evaluate heterogeneity across studies. Heterogeneity was
   considered present when the P value was <0.05 or I^2 was >50%.
   Furthermore, the standardized mean difference (SMD) and its 95%
   confidence interval (CI) were pooled to assess the stability of the
   analysis. The result was considered statistically significant if the
   corresponding 95% CI for the pooled SMD did not overlap 1 or ‐1.
   Additionally, a sensitivity analysis was conducted by omitting each
   study in turn to evaluate the source of heterogeneity.
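
   The pooling described above can be illustrated with a minimal Python
   sketch. It is not the Review Manager implementation: it assumes a
   Hedges‐type small‐sample correction for the SMD, a commonly used
   large‐sample variance approximation, and the DerSimonian‐Laird
   estimator for the between‐study variance. The function names and the
   example numbers are hypothetical and are not taken from the study.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (SMD, variance) for one study: group 1 = HCC, group 2 = control.

    Assumption: Hedges-type correction and a standard variance approximation;
    Review Manager's exact formula may differ slightly.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d

def random_effects(effects):
    """Pool a list of (SMD, variance) pairs with a DerSimonian-Laird model."""
    k = len(effects)
    w = [1 / v for _, v in effects]
    fixed = sum(wi * g for wi, (g, _) in zip(w, effects)) / sum(w)
    # Cochran's Q (chi-square heterogeneity statistic, k - 1 degrees of freedom)
    q = sum(wi * (g - fixed) ** 2 for wi, (g, _) in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * g for wi, (g, _) in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {
        "smd": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }

def leave_one_out(effects):
    """Sensitivity analysis: re-pool after omitting each study in turn."""
    return [random_effects(effects[:i] + effects[i + 1:])
            for i in range(len(effects))]

if __name__ == "__main__":
    # Hypothetical (mean, SD, n) values for HCC and control groups.
    studies = [
        (5.1, 1.2, 30, 6.0, 1.1, 30),
        (4.8, 1.5, 45, 5.9, 1.4, 40),
        (5.5, 1.0, 20, 5.7, 1.3, 22),
    ]
    effects = [hedges_g(*s) for s in studies]
    print(random_effects(effects))
    for i, result in enumerate(leave_one_out(effects), start=1):
        print("without study", i, result["smd"], result["i2"])
```

   In this sketch a negative pooled SMD corresponds to lower expression
   in the HCC group, and a marked change in I^2 after omitting one study
   points to that study as a likely source of heterogeneity.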

RNA‐seq data aggregation based on TCGA database

   From the TCGA ([52]http://cancergenome.nih.gov/), we downloaded and
   extracted the miR‐146a‐5p expression data from miRNASeqV2 (level 3)
   on 15 July 2016 through bulk download mode. MiR‐146a‐5p expression
   data were presented as upper quartile normalized RNA‐Seq by
   Expectation‐Maximization (RSEM) count estimates [53]13, [54]14 by using
   the ‘rsem.gene.normalized_results’ file type. The data were processed
   without further transformation, except that some values were rounded
   to integers. Expression in HCC and adjacent normal liver tissues was
   compared using the limma package in R. Fold changes (FCs) were
   calculated as HCC vs. normal liver tissue. A difference was
   considered statistically significant if the FC value was <0.5 or >2
   and the P value was <0.05 in Student's t‐test.
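
   The significance rule above can be expressed as a short Python
   sketch. It assumes that the FC is the ratio of group means on the
   linear (untransformed) scale, which the text does not state
   explicitly, and that the P value has already been obtained from a
   separate t‐test. The function names are hypothetical; this is not
   the limma workflow used in the study.

```python
from statistics import mean

def fold_change(tumor_values, normal_values):
    """FC as HCC vs. normal: ratio of group means (assumed linear scale)."""
    return mean(tumor_values) / mean(normal_values)

def classify(fc, p_value, fc_cutoff=2.0, alpha=0.05):
    """Apply the rule: FC < 0.5 or FC > 2, together with P < 0.05."""
    if p_value >= alpha:
        return "not significant"
    if fc > fc_cutoff:
        return "up-regulated"
    if fc < 1 / fc_cutoff:
        return "down-regulated"
    return "not significant"

if __name__ == "__main__":
    # Hypothetical normalized counts, for illustration only.
    fc = fold_change([120, 90, 150], [400, 380, 420])
    print(fc, classify(fc, p_value=0.001))
```

   Both conditions must hold: a large fold change with a P value of 0.05
   or more, or a small P value with an FC between 0.5 and 2, is reported
   as not significant.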

Comprehensive meta‐analysis based on GEO, TCGA, and literature data

   Comprehensive meta‐analyses were performed based on the data gathered
   from GEO, TCGA, and the relevant literature. Related studies were
   identified by independently and comprehensively searching the online
   databases PubMed, Embase, Web of Science, Wiley Online Library,
   Cochrane Library, Science Direct, Chinese WanFang Database, Chinese VIP
   Database, Chinese Biomedical Literature Database, and Chinese CNKI
   Database up to 15 July 2016. The following combination of keywords and
   entry
   words was employed: (a) (miR‐146a OR miRNA‐146a OR microRNA‐146a OR
   miR146a OR miRNA146a OR microRNA146a OR ‘miR 146a’ OR ‘miRNA 146a’ OR
   ‘microRNA 146a’ OR miR‐146a‐5p OR miRNA‐146a‐5p OR microRNA‐146a‐5p);
   (b) (hepatocellular OR liver OR hepatic OR HCC); (c) (‘cancer’ OR
   ‘tumor’ OR ‘tumour’ OR ‘neoplas*’ OR ‘carcinoma’ OR ‘sarcoma’ OR
   ‘malignan*’). In addition, some references of relevant articles were