Abstract

Background

   Hepatocellular carcinoma (HCC) is the second leading cause of death
   among cancers worldwide. In this study, we aimed to identify the
   molecular target genes and detect the key mechanisms of HCC. Three gene
   expression profiles ([40]GSE84006, [41]GSE14323, [42]GSE14811) and two
   miRNA expression profiles ([43]GSE40744, [44]GSE36915) were analyzed to
   determine the molecular target genes, microRNAs (miRNAs) and the
   potential molecular mechanisms in HCC.

Methods

   All profiles were extracted from the Gene Expression Omnibus database.
   The identification of the differentially expressed genes (DEGs) was
   analyzed by the GEO2R method. Kyoto Encyclopedia of Genes and Genomes
   (KEGG) pathway and gene ontology (GO) enrichment analysis performed
   database for Integrated Discovery, Visualization and Annotation. The
   miRNA-gene network and protein–protein interaction (PPI) network were
   correlated by the Cytoscape software. The key target genes were
   identified by the CytoHubba plugin, Molecular Complex Detection (MCODE)
   plugin and miRNA-gene network. The identified hub genes were testified
   for survival curve using the Kaplan–Meier plotter database.

Results

   Expression profiles had 592 overlapped DEGs. The majority of the DEGs
   were enriched in membrane-bounded organelles and intracellular
   membrane-bounded organelles. These DEGs were significantly enriched in
   metabolic, protein processing in the endoplasmic reticulum and thyroid
   cancer pathways. PPI network analysis showed these genes were mostly
   involved in the pathogenic Escherichia coli infection and the
   regulation of actin cytoskeleton pathways. Combining these results, we
   identified 10 key genes involving in the progression of HCC. Finally,
   PLK1, PRCC, PRPF4 and PSMA7 exhibited higher expression levels in HCC
   patients with poor prognosis than those for lower expression via
   Kaplan–Meier plotter database.

Conclusion

   PLK1, PRCC, PRPF4 and PSMA7 could be potential biomarkers or
   therapeutic targets for HCC. Meanwhile, the metabolic pathway, protein
   processing in the endoplasmic reticulum and the thyroid cancer pathway
   may play vital roles in the progression of HCC.

   Keywords: hepatocellular carcinoma, DEGs, bioinformatic analysis, KEGG
   pathway, PPI network, miRNA-gene network

Introduction

   Hepatocellular carcinoma (HCC), the second most common cancer in the
   world, is a major contributor to cancer mortality and incidence.[45]^1
   More than 580,000 new cases are increased every year in Asia, with
   China accounting for more than 50%.[46]^2 Several methods have been
   used for decreasing the incidence rate, containing early detection and
   diagnosis. These methods are well efficacious and could reduce the
   mortality and incidence. However, the 10-year survival rate remains
   unsatisfactory in patients.[47]^3 As is well known, HCC is a
   heterogeneous disease. The occurrence and development of HCC are
   associated with the cellular context, environmental influences, and
   gene aberrations.[48]^1 Recently, several studies suggested that
   cellular pathways and multiple genes may participate in the progression
   of HCC.[49]^4^–[50]^6 However, the underlying molecular mechanisms
   remain unknown. Thus, it is urgent and important to dig the hub
   molecules and to uncover the key molecular mechanisms.

   The miRNAs are a series of small, non-coding regulatory RNAs, which
   regulate gene expression and perform a crucial role in the regulation
   of proliferation, cellular differentiation and cell death. A part of
   miRNAs was regarded as tumor suppressor genes, or oncogenes that
   contribute to several cancer progressions. Notably, some key miRNAs
   provide a novel approach for HCC therapy. For instance, microRNA-21,
   microRNA-26a and microRNA-29a-3p have been reported association with
   HCC and could serve as biomarkers;[51]^7^,[52]^8 Shigoka et al also
   showed that plasma miR-92a could effectively discriminate HCC from the
   control subjects.[53]^9

   In recent years, a large amount of data about transcriptome of cells
   have been established using numerous of high‑throughput technologies.
   In particular, the microarray is a novel high-throughput platform for
   the analysis of gene expression. During tumorigenesis, the microarray
   technique, as a useful tool for the analysis of general genetic
   alteration, has been extensively used. Through accurate microarray
   analyses, the key factors processed by biochip data extraction,
   biological data clustering and sequence alignment can be identified.
   These analytical techniques provide a new and quick method to explore
   the molecular pathogenesis mechanism of various types of cancers.

   In this study, 3 gene expression profiles and 2 miRNA expression
   profiles were downloaded from the Gene Expression Omnibus (GEO)
   database to obtain differentially expressed genes (DEGs) and
   differentially expressed (DE) miRNAs between the HCC tissues and normal
   liver tissues samples. Furthermore, functional enrichment and the PPI
   network analyses were used to identify the DEGs. Meanwhile, with the
   mRNA–microRNA interaction analysis, the ten hub genes in DEGs were
   tested and four key genes and miRNAs as well as their potential
   molecular mechanisms were studied.

Materials and methods

Microarray data

   Three gene expression profiles ([54]GSE84006, [55]GSE14323,
   [56]GSE14811) and two miRNA expression profiles ([57]GSE40744,
   [58]GSE36915) were retrieved from the GEO database.[59]^10

Identification of DEGs and DE miRNAs

   Limma is a Bioconductor package of R that offers a comprehensive
   solution for the analysis of gene expression data. limma has been
   widely used for gene detection.[60]^11 This package contains
   sophisticated facilities for reading, normalizing and exploring for
   large datasets. Moreover, it also applies an adjusted P-value (adj. P)
   to help correct false-positives. Herein, the adj. P for selection of
   DEGs and DE miRNAs was set as <0.05. Funrich webserver
   ([61]http://funrich.org/index.html) was utilized to analyze the overlap
   of DEGs in the three datasets and the overlapping DEMs in the two miRNA
   expression profiles.[62]^12

Functional and pathway enrichment analysis

   Gene ontology (GO) is widely used in annotating genes, gene products
   and sequences. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a
   comprehensive database for biological interpretation of genome
   sequences and other high-throughput data. Both analyses were available
   in the Database for annotation, visualization and integrated discovery
   database (DAVID database), which is a bioinformatics data resource
   composed of an integrated biology knowledge base and analysis tools to
   extract meaningful biological information from large quantities of
   genes and protein collections.[63]^13 Herein, GO and KEGG analyses were
   applied by using the DAVID database to identification of DEGs. The
   cut-off criterion was set (P-value <0.05).

Construction of PPI network and analysis of modules

   The online database of STRING ([64]https://string-db.org/) was applied
   to assess the PPI containing direct (physical) and indirect
   (functional) associations.[65]^14 Then, the Cytoscape program, an
   open-source tool, was used for the visual exploration of the
   interaction networks among different biomolecules, which contained
   protein, gene and other types of interactions.[66]^15 Herein, the DEGs
   were mapped to STRING database to assess the PPI and visualized by
   Cytoscape program. The cut-off criterion of combined score and node
   degree was set to >0.15 and ≥10, respectively. Recent study suggested
   that the newly proposed method MCC performs better than other 10
   previous reported methods.[67]^16 Therefore, the plug-in of Molecular
   Complex Detection (MCODE) was utilized to screen hub gene modules with
   the degree cut-off, haircut on, k-core, node score cut-off, max depth
   set as 10, 0.2, 2, 0.2 and 100. In addition, the functional and pathway
   enrichment analyses of DEGs were carried out by DAVID database in each
   module and the cut-off criterion of P-value was set as <0.05.

Prediction of miRNA targets and prognosis analysis

   The target genes of the DEMs from [68]GSE40744 and [69]GSE36915 were
   predicted by miRNet webserver ([70]https://www.mirnet.ca/), which is
   commonly used for predicting microRNA targets.[71]^17 The target genes
   were superposed with the DEGs to gain an intersection for further
   analysis. To identification of the hub genes, the results from MCODE,
   CytoHubba and miRNA-gene networks were combined. The Kaplan–Meier
   plotter webserver was applied to analyze the prognostic significance of
   the identified hub genes.[72]^17

Results

Identification of DEGs in expression profiles

   [73]GSE84006 datasets contained 38 tumor tissues and 38 paired adjacent
   non-tumor tissues. [74]GSE14811 datasets contained 56 HCC samples and
   56 liver samples. [75]GSE14323 included 19 normal samples, 17 cirrhosis
   HCC samples, 41 cirrhosis samples and 38 HCC samples. The miRNA
   expression profile of [76]GSE40744 included 9 normal samples and 67
   HCV-associated HCC tumor samples. [77]GSE36915 included 21 non-tumor
   samples and 68 tumor samples. A total of 8,655, 11,544, and 1,375 DEGs
   were identified after the analyses of the [78]GSE84006, [79]GSE14323,
   and [80]GSE14811 datasets, respectively. Among them, 592 genes were
   found in all these datasets ([81]Figure 1A).

Figure 1.

   [82]Figure 1
   [83]Open in a new tab

   Identification of overlapping DEGs and DE miRNAs. (A) Identification of
   overlapping upregulated and downregulated DEGs in [84]GSE84006,
   [85]GSE14323 and [86]GSE14811. (B) Identification of overlapping
   upregulated and downregulated DE miRNAs in [87]GSE36915 and
   [88]GSE40744.

Functional and pathway enrichment analyses of DEGs

   The GO analysis indicated 366 overlapping genes that are involved in
   numerous biological processes (BP), such as substance metabolic, single
   organisms and macromolecules ([89]Figure 2A). In terms of cellular
   components (CC), DEGS were mainly enriched in the membrane-bounded
   organelles and intracellular membrane-bounded organelles ([90]Figure
   2B). The overlapping DEGs were primarily associated to the structural
   organization of the extracellular matrix, organic cyclic compounds
   binding, protein binding and heterocyclic compound binding in terms of
   molecular functions ([91]Figure 2C). Additionally, the DEGs were
   enriched in three Kyoto Encyclopedia of Genes and Genomes (KEGG)
   pathways, which included the metabolic pathways, protein processing in
   the endoplasmic reticulum and the thyroid cancer pathway ([92]Figure
   2D).

Figure 2.

   [93]Figure 2
   [94]Open in a new tab

   GO method analyzed the overlapping DEGs. The number of DEGs were showed
   of black bars. (A) Biological processes (BP) of top 10; (B) Cellular
   components (CC) of top 10; (C) Molecular function (MF) of top 10; (D)
   KEGG pathway of top 10.

Construction of PPI network and analysis of modules

   A total of 532 nodes and 3,370 edges were mapped in the PPI network of
   the identified DEGs. As shown in [95]Figure 3, these DEGs-overlapped
   showed significant interactions and networks. The combined scores
   higher than 0.4 in PPIs were used for constructing the PPI networks.
   The MCODE plugin was used to analyze the entire PPI network and four
   modules were selected. As shown in [96]Figure 4, enrichment analysis of
   the module genes of the KEGG pathway reveals enrichment in pathogenic
   Escherichia coli infection and the regulation of actin cytoskeleton.
   The first 50 genes provided by MCC method were selected and
   sequentially ordered as follows: SNRPD2, SF3B4, HNRNPA1, SNRPC, HNRNPU,
   RNPS1, POLR2H, SF3A2, PRPF3, POLR2I and so on ([97]Table 1 ranked by
   MCC method).

Figure 3.

   [98]Figure 3
   [99]Open in a new tab

   PPI network construction and module analysis. DEGs used blue nodes to
   represent. Orange colors of nodes represented the correlation to
   modules with other DEGs; meanwhile, the lines represented the
   relationship of two nodes.

Figure 4.

   [100]Figure 4
   [101]Open in a new tab

   KEGG pathway analysis. The most important four modules analyzed by KEGG
   pathway. Bars signified the number of DEGs. Different colors
   represented different modules.

Table 1.

   Top 50 in PPI network ranked by MCC method
   Rank  Name    Score
   1    SNRPD2  2.09E+13
   2    SF3B4   2.09E+13
   3    HNRNPA1 2.09E+13
   4    SNRPC   2.09E+13
   5    HNRNPU  2.09E+13
   6    RNPS1   2.09E+13
   7    POLR2H  2.09E+13
   8    SF3A2   2.09E+13
   9    PRPF3   2.09E+13
   10   POLR2I  2.09E+13
   11   GTF2F1  2.09E+13
   12   GPKOW   2.09E+13
   13   SF3B5   2.09E+13
   14   RBM17   2.09E+13
   15   PRCC    2.09E+13
   16   PRPF4   2.09E+13
   17   CHERP   2.09E+13
   18   PLK1    2.44E+09
   19   AURKB   2.40E+09
   20   AURKA   1.92E+09
   21   PSMD12  1.48E+09
   22   PSMC4   1.48E+09
   23   PSMA7   1.48E+09
   24   PSMB5   1.48E+09
   25   PSMB4   1.48E+09
   26   PSMD6   1.48E+09
   27   SHFM1   1.48E+09
   28   PSME3   1.48E+09
   29   BUB3    1.44E+09
   30   PPP2R5D 9.98E+08
   31   TOP2A   9.69E+08
   32   CCNB1   9.66E+08
   33   CENPA   9.65E+08
   34   ASF1B   9.65E+08
   35   PCNA    9.62E+08
   36   MCM3    9.62E+08
   37   RAD54L  9.62E+08
   38   OIP5    9.62E+08
   39   RPL8    9.59E+08
   40   RPL17   9.59E+08
   41   SEC61A1 9.59E+08
   42   RPS18   9.59E+08
   43   RPS13   9.58E+08
   44   MCM7    9.58E+08
   45   RPS21   9.58E+08
   46   RPL28   9.58E+08
   47   RPN2    9.58E+08
   48   RPN1    9.58E+08
   49   SSR3    9.58E+08
   50   SSR2    9.58E+08
   [102]Open in a new tab

miRNA-DEG pairs

   As shown in [103]Figure 1B, a total of 22 DE miRNAs were identified by
   evaluating the dataset of miRNA expression dataset ([104]GSE36915 and
   [105]GSE40744). Then, 180 nodes and 224 edges were mapped in the
   miRNA-gene network of the identified DEGs and miRNAs ([106]Figure 5).
   Moreover, the genes predicted by the miRNet webserver were considered
   as DE miRNAs target genes. For further identification of reliable hub
   genes, the identified target genes were compared to the DEGs. In this
   process, only the overlapping genes were considered as the hub genes.
   After combination of the results provided by MCODE, CytoHubba and the
   miRNA-gene network, ten hub genes were selected, including PLK1, PRCC,
   PRPF4, PSMA7, CHERP, GPKOW, HNRNPA1, PSMB5, PSMC4 and RNPS1.

Figure 5.

   [107]Figure 5
   [108]Open in a new tab

   miRNA-gene network. Regulation of DEGs in miRNA-gene network. Light
   blue nodes stand for DEGs, light orange nodes represent DE miRNAs. The
   lines represent the regulation of relationship between two nodes.

Kaplan–Meier survival analysis

   To verify the results from bioinformatics analysis, the Kaplan–Meier
   plotter database was performed to predict the prognostic value of these
   hub genes. As shown in [109]Figure 6, the OS for 364 patients with HCC
   was analyzed using the Kaplan–Meier survival plot. As a result, PLK1,
   PRCC, PRPF4 and PSMA7 were significantly correlated with worse OS for
   HCC patients.

Figure 6.

   [110]Figure 6
   [111]Open in a new tab

   Survival rates of four target genes. Using the Kaplan–Meier plotter
   database to analyze prognostic significance of the target genes in
   individuals of HCC. (A) PLK1. (B) PRCC. (C) PRPF4. (D) PSMA7. The red
   lines signified individuals with high expression of target gene and
   black lines with low expression.

Discussion

   Although HCC has been extensively studied, HCC is still a seriously
   difficult problem in the early diagnosis. The reason for that is the
   lack of depth understanding of the molecular mechanisms in the
   occurrence, progression and development. Therefore, a depth research,
   relating to the key genes and mechanisms of HCC development and
   progression, is important for HCC diagnosis and pre-treatment. It is
   convenient and easy to discover the general genetic alterations in the
   development and progression of diseases by the novel microarray
   technology, which could investigate the underlying hub genes and
   mechanisms for the diagnosis and therapies.[112]^18^–[113]^20

   In this study, we analyzed 592 overlapping DEGs, both up-regulated and
   down-regulated genes, in the [114]GSE84006, [115]GSE14323 and
   [116]GSE14811 expression profile datasets ([117]Figure 1). The GO
   analysis indicated that the 592 overlapping DEGs were correlated to
   substance metabolic, single organisms and macromolecules at the level
   of BPs. Besides, we tested DEGs in the KEGG pathway enrichment
   analysis. The results showed that the overlapping DEGs were mostly
   enriched in the metabolic pathway, protein processing in the
   endoplasmic reticulum and the thyroid cancer pathway. The metabolic
   pathway was identified as one of the essential pathways of the major
   modules of the overlapping DEGs ([118]Figure 2). These enriched
   pathways may provide potential strategies for the development of new
   therapeutic agents.

   Carcinogenesis is a complicated and complex process influencing with
   some speciﬁc genes and involving multiple signaling cascades. Recently,
   bioinformatics analyses have been used for identifying novel diagnosis
   markers and therapeutic targets for many cancers.[119]^21^–[120]^23 Zou
   et al[121]^24 identified NRAGE as a novel biomarker for HCC by
   integrated bioinformatics analysis. Furthermore, in early HCC, the G2/M
   checkpoint participated in the progression.[122]^25 Several similar
   researches have been shown for HCC now. Using bioinformatics analysis,
   Wang et al[123]^26 have found the miR-192-3p-XIAP axis involved in HCC.
   However, compared with this study, previous studies only chose a signal
   data or file alone, as well as analyzed the one module method, with a
   high degree of connectivity, to select those hub genes. Notably, the
   genes provided by our study that not only combining the results from
   CytoHubba, MCODE method and network tools for the identification of the
   key genes, but also using the Kaplan–Meier database. This is forceful
   increasing the reliability of these results ([124]Figures 3–[125]5).

   We predicted 10 hub genes containing PLK1, PRCC, PRPF4, PSMA7, CHERP,
   GPKOW, HNRNPA1, PSMB5, PSMC4 and RNPS1. A few studies have reported
   these genes in other cancers. To ensure the results obtained from the
   bioinformatics analysis, the Kaplan–Meier plotter database was been
   chosen to predict the prognostic value of these key genes. Only PLK1,
   PRCC, PRPF4 and PSMA7 significantly correlated with worse OS for the
   HCC patients ([126]Figure 6).

   PLK1, a serine/threonine-protein kinase that belongs to the polo-like
   kinase family, participates in various biological processes about
   embryonic development, cell cycle and RNA processing.[127]^27 Xu et
   al[128]^28 tested high expression of PLK1 in the HCC tissues showed
   significantly worse effect in the histological type. PLK1 and HOTAIR
   accelerate proteasome degradation of SUZ12 and ZNF198 during hepatitis
   B virus-induced liver carcinogenesis.[129]^29 PLK1 phosphorylation of
   PTEN causes a tumor-promoting metabolic state.[130]^30

   PRCC, the most common TFE3 fusion partner in papillary renal carcinoma,
   is associated with pre-mRNA splicing factors.[131]^31 In a subset of
   papillary renal cell carcinomas, PRCC has been repeatedly reported and
   presumably causes cancer.[132]^32^,[133]^33 By RNA sequencing and FISH
   experiment, the clinicopathological analysis and detection of the gene
   fusion verified that the PRCC-MITF gene fusion defines a novel member
   of the MiT family translocation renal cell carcinoma.[134]^33

   PRPF4, a key component of the spliceosomes, plays a critical role in
   pre-mRNA splicing and its mutations result in retinitis pigmentosa due
   to photoreceptor defects.[135]^34^,[136]^35 PRPF4 is necessary part for
   cell survival, and in zebrafish, PRPF4 participate in posterior lateral
   line primordium migration.[137]^36 However, PRCC and PRPF4 are not
   reported with HCC, and studying these important genes can therefore
   increase the understanding of not HCC but other cancers. Notably, it is
   not coincident in the PRPF4 mRNA level and its corresponding miRNA
   level. That result potential showed that miRNA may be not directly
   control PRPF4.

   The gene PSMA7 encodes a member of the peptidase T1A family, which is a
   20S core alpha subunit.[138]^37 PSMA7 is often over-expressed in
   several human common cancers. In addition, overexpression of PSMA7
   correlates with reduced survival rate in several types of cancers.
   Holmila et al[139]^38 tested a diagnostic use of PSMA7 to screen for
   early HCC. Tan et al[140]^39 investigated that PSMA7 inhibited the
   tumorigenicity of A549 human lung adenocarcinoma cells. The effects of
   sh-PSMA7 on cell proliferation and vascular endothelial growth factor
   expression via the ubiquitin-proteasome pathway in cervical
   cancer.[141]^40 Salivary exosomal PSMA7 is a promising biomarker of
   inflammatory bowel disease.[142]^41

Conclusion

   In conclusion, a total of four genes including PLK1, PRCC, PRPF4 and
   PSMA7 were identified as HCC biomarkers, and metabolic pathway, protein
   processing in the endoplasmic reticulum and thyroid cancer pathway were
   confirmed to be the essential and most mechanisms in the development of
   HCC. Our study, however, had some obvious limitations such as no
   analysis of clinic liver tissue samples, or only using the Kaplan–Meier
   plotter database to analyze the prognostic value of these molecular
   target genes. Further studies should explore these markers in the HCC
   patients to ensure the prognostic effect and need to dig the underlying
   mechanisms or related pathways. In the next step, we would try to use
   computer to predict which compound could target these hub genes to
   inhibit HCC development; meanwhile, using some biological method we
   could investigate these potential functions in vivo and in vitro.
   Overall, our results showed that the interactive network of miRNAs and
   mRNA is highly complex. However, we need more experiments to
   investigate these novel, key and hub genes.

Acknowledgments