Abstract Background Hepatocellular carcinoma (HCC) is the second leading cause of death among cancers worldwide. In this study, we aimed to identify the molecular target genes and detect the key mechanisms of HCC. Three gene expression profiles ([40]GSE84006, [41]GSE14323, [42]GSE14811) and two miRNA expression profiles ([43]GSE40744, [44]GSE36915) were analyzed to determine the molecular target genes, microRNAs (miRNAs) and the potential molecular mechanisms in HCC. Methods All profiles were extracted from the Gene Expression Omnibus database. The identification of the differentially expressed genes (DEGs) was analyzed by the GEO2R method. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and gene ontology (GO) enrichment analysis performed database for Integrated Discovery, Visualization and Annotation. The miRNA-gene network and protein–protein interaction (PPI) network were correlated by the Cytoscape software. The key target genes were identified by the CytoHubba plugin, Molecular Complex Detection (MCODE) plugin and miRNA-gene network. The identified hub genes were testified for survival curve using the Kaplan–Meier plotter database. Results Expression profiles had 592 overlapped DEGs. The majority of the DEGs were enriched in membrane-bounded organelles and intracellular membrane-bounded organelles. These DEGs were significantly enriched in metabolic, protein processing in the endoplasmic reticulum and thyroid cancer pathways. PPI network analysis showed these genes were mostly involved in the pathogenic Escherichia coli infection and the regulation of actin cytoskeleton pathways. Combining these results, we identified 10 key genes involving in the progression of HCC. Finally, PLK1, PRCC, PRPF4 and PSMA7 exhibited higher expression levels in HCC patients with poor prognosis than those for lower expression via Kaplan–Meier plotter database. Conclusion PLK1, PRCC, PRPF4 and PSMA7 could be potential biomarkers or therapeutic targets for HCC. Meanwhile, the metabolic pathway, protein processing in the endoplasmic reticulum and the thyroid cancer pathway may play vital roles in the progression of HCC. Keywords: hepatocellular carcinoma, DEGs, bioinformatic analysis, KEGG pathway, PPI network, miRNA-gene network Introduction Hepatocellular carcinoma (HCC), the second most common cancer in the world, is a major contributor to cancer mortality and incidence.[45]^1 More than 580,000 new cases are increased every year in Asia, with China accounting for more than 50%.[46]^2 Several methods have been used for decreasing the incidence rate, containing early detection and diagnosis. These methods are well efficacious and could reduce the mortality and incidence. However, the 10-year survival rate remains unsatisfactory in patients.[47]^3 As is well known, HCC is a heterogeneous disease. The occurrence and development of HCC are associated with the cellular context, environmental influences, and gene aberrations.[48]^1 Recently, several studies suggested that cellular pathways and multiple genes may participate in the progression of HCC.[49]^4^–[50]^6 However, the underlying molecular mechanisms remain unknown. Thus, it is urgent and important to dig the hub molecules and to uncover the key molecular mechanisms. The miRNAs are a series of small, non-coding regulatory RNAs, which regulate gene expression and perform a crucial role in the regulation of proliferation, cellular differentiation and cell death. A part of miRNAs was regarded as tumor suppressor genes, or oncogenes that contribute to several cancer progressions. Notably, some key miRNAs provide a novel approach for HCC therapy. For instance, microRNA-21, microRNA-26a and microRNA-29a-3p have been reported association with HCC and could serve as biomarkers;[51]^7^,[52]^8 Shigoka et al also showed that plasma miR-92a could effectively discriminate HCC from the control subjects.[53]^9 In recent years, a large amount of data about transcriptome of cells have been established using numerous of high‑throughput technologies. In particular, the microarray is a novel high-throughput platform for the analysis of gene expression. During tumorigenesis, the microarray technique, as a useful tool for the analysis of general genetic alteration, has been extensively used. Through accurate microarray analyses, the key factors processed by biochip data extraction, biological data clustering and sequence alignment can be identified. These analytical techniques provide a new and quick method to explore the molecular pathogenesis mechanism of various types of cancers. In this study, 3 gene expression profiles and 2 miRNA expression profiles were downloaded from the Gene Expression Omnibus (GEO) database to obtain differentially expressed genes (DEGs) and differentially expressed (DE) miRNAs between the HCC tissues and normal liver tissues samples. Furthermore, functional enrichment and the PPI network analyses were used to identify the DEGs. Meanwhile, with the mRNA–microRNA interaction analysis, the ten hub genes in DEGs were tested and four key genes and miRNAs as well as their potential molecular mechanisms were studied. Materials and methods Microarray data Three gene expression profiles ([54]GSE84006, [55]GSE14323, [56]GSE14811) and two miRNA expression profiles ([57]GSE40744, [58]GSE36915) were retrieved from the GEO database.[59]^10 Identification of DEGs and DE miRNAs Limma is a Bioconductor package of R that offers a comprehensive solution for the analysis of gene expression data. limma has been widely used for gene detection.[60]^11 This package contains sophisticated facilities for reading, normalizing and exploring for large datasets. Moreover, it also applies an adjusted P-value (adj. P) to help correct false-positives. Herein, the adj. P for selection of DEGs and DE miRNAs was set as <0.05. Funrich webserver ([61]http://funrich.org/index.html) was utilized to analyze the overlap of DEGs in the three datasets and the overlapping DEMs in the two miRNA expression profiles.[62]^12 Functional and pathway enrichment analysis Gene ontology (GO) is widely used in annotating genes, gene products and sequences. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a comprehensive database for biological interpretation of genome sequences and other high-throughput data. Both analyses were available in the Database for annotation, visualization and integrated discovery database (DAVID database), which is a bioinformatics data resource composed of an integrated biology knowledge base and analysis tools to extract meaningful biological information from large quantities of genes and protein collections.[63]^13 Herein, GO and KEGG analyses were applied by using the DAVID database to identification of DEGs. The cut-off criterion was set (P-value <0.05). Construction of PPI network and analysis of modules The online database of STRING ([64]https://string-db.org/) was applied to assess the PPI containing direct (physical) and indirect (functional) associations.[65]^14 Then, the Cytoscape program, an open-source tool, was used for the visual exploration of the interaction networks among different biomolecules, which contained protein, gene and other types of interactions.[66]^15 Herein, the DEGs were mapped to STRING database to assess the PPI and visualized by Cytoscape program. The cut-off criterion of combined score and node degree was set to >0.15 and ≥10, respectively. Recent study suggested that the newly proposed method MCC performs better than other 10 previous reported methods.[67]^16 Therefore, the plug-in of Molecular Complex Detection (MCODE) was utilized to screen hub gene modules with the degree cut-off, haircut on, k-core, node score cut-off, max depth set as 10, 0.2, 2, 0.2 and 100. In addition, the functional and pathway enrichment analyses of DEGs were carried out by DAVID database in each module and the cut-off criterion of P-value was set as <0.05. Prediction of miRNA targets and prognosis analysis The target genes of the DEMs from [68]GSE40744 and [69]GSE36915 were predicted by miRNet webserver ([70]https://www.mirnet.ca/), which is commonly used for predicting microRNA targets.[71]^17 The target genes were superposed with the DEGs to gain an intersection for further analysis. To identification of the hub genes, the results from MCODE, CytoHubba and miRNA-gene networks were combined. The Kaplan–Meier plotter webserver was applied to analyze the prognostic significance of the identified hub genes.[72]^17 Results Identification of DEGs in expression profiles [73]GSE84006 datasets contained 38 tumor tissues and 38 paired adjacent non-tumor tissues. [74]GSE14811 datasets contained 56 HCC samples and 56 liver samples. [75]GSE14323 included 19 normal samples, 17 cirrhosis HCC samples, 41 cirrhosis samples and 38 HCC samples. The miRNA expression profile of [76]GSE40744 included 9 normal samples and 67 HCV-associated HCC tumor samples. [77]GSE36915 included 21 non-tumor samples and 68 tumor samples. A total of 8,655, 11,544, and 1,375 DEGs were identified after the analyses of the [78]GSE84006, [79]GSE14323, and [80]GSE14811 datasets, respectively. Among them, 592 genes were found in all these datasets ([81]Figure 1A). Figure 1. [82]Figure 1 [83]Open in a new tab Identification of overlapping DEGs and DE miRNAs. (A) Identification of overlapping upregulated and downregulated DEGs in [84]GSE84006, [85]GSE14323 and [86]GSE14811. (B) Identification of overlapping upregulated and downregulated DE miRNAs in [87]GSE36915 and [88]GSE40744. Functional and pathway enrichment analyses of DEGs The GO analysis indicated 366 overlapping genes that are involved in numerous biological processes (BP), such as substance metabolic, single organisms and macromolecules ([89]Figure 2A). In terms of cellular components (CC), DEGS were mainly enriched in the membrane-bounded organelles and intracellular membrane-bounded organelles ([90]Figure 2B). The overlapping DEGs were primarily associated to the structural organization of the extracellular matrix, organic cyclic compounds binding, protein binding and heterocyclic compound binding in terms of molecular functions ([91]Figure 2C). Additionally, the DEGs were enriched in three Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, which included the metabolic pathways, protein processing in the endoplasmic reticulum and the thyroid cancer pathway ([92]Figure 2D). Figure 2. [93]Figure 2 [94]Open in a new tab GO method analyzed the overlapping DEGs. The number of DEGs were showed of black bars. (A) Biological processes (BP) of top 10; (B) Cellular components (CC) of top 10; (C) Molecular function (MF) of top 10; (D) KEGG pathway of top 10. Construction of PPI network and analysis of modules A total of 532 nodes and 3,370 edges were mapped in the PPI network of the identified DEGs. As shown in [95]Figure 3, these DEGs-overlapped showed significant interactions and networks. The combined scores higher than 0.4 in PPIs were used for constructing the PPI networks. The MCODE plugin was used to analyze the entire PPI network and four modules were selected. As shown in [96]Figure 4, enrichment analysis of the module genes of the KEGG pathway reveals enrichment in pathogenic Escherichia coli infection and the regulation of actin cytoskeleton. The first 50 genes provided by MCC method were selected and sequentially ordered as follows: SNRPD2, SF3B4, HNRNPA1, SNRPC, HNRNPU, RNPS1, POLR2H, SF3A2, PRPF3, POLR2I and so on ([97]Table 1 ranked by MCC method). Figure 3. [98]Figure 3 [99]Open in a new tab PPI network construction and module analysis. DEGs used blue nodes to represent. Orange colors of nodes represented the correlation to modules with other DEGs; meanwhile, the lines represented the relationship of two nodes. Figure 4. [100]Figure 4 [101]Open in a new tab KEGG pathway analysis. The most important four modules analyzed by KEGG pathway. Bars signified the number of DEGs. Different colors represented different modules. Table 1. Top 50 in PPI network ranked by MCC method Rank Name Score 1 SNRPD2 2.09E+13 2 SF3B4 2.09E+13 3 HNRNPA1 2.09E+13 4 SNRPC 2.09E+13 5 HNRNPU 2.09E+13 6 RNPS1 2.09E+13 7 POLR2H 2.09E+13 8 SF3A2 2.09E+13 9 PRPF3 2.09E+13 10 POLR2I 2.09E+13 11 GTF2F1 2.09E+13 12 GPKOW 2.09E+13 13 SF3B5 2.09E+13 14 RBM17 2.09E+13 15 PRCC 2.09E+13 16 PRPF4 2.09E+13 17 CHERP 2.09E+13 18 PLK1 2.44E+09 19 AURKB 2.40E+09 20 AURKA 1.92E+09 21 PSMD12 1.48E+09 22 PSMC4 1.48E+09 23 PSMA7 1.48E+09 24 PSMB5 1.48E+09 25 PSMB4 1.48E+09 26 PSMD6 1.48E+09 27 SHFM1 1.48E+09 28 PSME3 1.48E+09 29 BUB3 1.44E+09 30 PPP2R5D 9.98E+08 31 TOP2A 9.69E+08 32 CCNB1 9.66E+08 33 CENPA 9.65E+08 34 ASF1B 9.65E+08 35 PCNA 9.62E+08 36 MCM3 9.62E+08 37 RAD54L 9.62E+08 38 OIP5 9.62E+08 39 RPL8 9.59E+08 40 RPL17 9.59E+08 41 SEC61A1 9.59E+08 42 RPS18 9.59E+08 43 RPS13 9.58E+08 44 MCM7 9.58E+08 45 RPS21 9.58E+08 46 RPL28 9.58E+08 47 RPN2 9.58E+08 48 RPN1 9.58E+08 49 SSR3 9.58E+08 50 SSR2 9.58E+08 [102]Open in a new tab miRNA-DEG pairs As shown in [103]Figure 1B, a total of 22 DE miRNAs were identified by evaluating the dataset of miRNA expression dataset ([104]GSE36915 and [105]GSE40744). Then, 180 nodes and 224 edges were mapped in the miRNA-gene network of the identified DEGs and miRNAs ([106]Figure 5). Moreover, the genes predicted by the miRNet webserver were considered as DE miRNAs target genes. For further identification of reliable hub genes, the identified target genes were compared to the DEGs. In this process, only the overlapping genes were considered as the hub genes. After combination of the results provided by MCODE, CytoHubba and the miRNA-gene network, ten hub genes were selected, including PLK1, PRCC, PRPF4, PSMA7, CHERP, GPKOW, HNRNPA1, PSMB5, PSMC4 and RNPS1. Figure 5. [107]Figure 5 [108]Open in a new tab miRNA-gene network. Regulation of DEGs in miRNA-gene network. Light blue nodes stand for DEGs, light orange nodes represent DE miRNAs. The lines represent the regulation of relationship between two nodes. Kaplan–Meier survival analysis To verify the results from bioinformatics analysis, the Kaplan–Meier plotter database was performed to predict the prognostic value of these hub genes. As shown in [109]Figure 6, the OS for 364 patients with HCC was analyzed using the Kaplan–Meier survival plot. As a result, PLK1, PRCC, PRPF4 and PSMA7 were significantly correlated with worse OS for HCC patients. Figure 6. [110]Figure 6 [111]Open in a new tab Survival rates of four target genes. Using the Kaplan–Meier plotter database to analyze prognostic significance of the target genes in individuals of HCC. (A) PLK1. (B) PRCC. (C) PRPF4. (D) PSMA7. The red lines signified individuals with high expression of target gene and black lines with low expression. Discussion Although HCC has been extensively studied, HCC is still a seriously difficult problem in the early diagnosis. The reason for that is the lack of depth understanding of the molecular mechanisms in the occurrence, progression and development. Therefore, a depth research, relating to the key genes and mechanisms of HCC development and progression, is important for HCC diagnosis and pre-treatment. It is convenient and easy to discover the general genetic alterations in the development and progression of diseases by the novel microarray technology, which could investigate the underlying hub genes and mechanisms for the diagnosis and therapies.[112]^18^–[113]^20 In this study, we analyzed 592 overlapping DEGs, both up-regulated and down-regulated genes, in the [114]GSE84006, [115]GSE14323 and [116]GSE14811 expression profile datasets ([117]Figure 1). The GO analysis indicated that the 592 overlapping DEGs were correlated to substance metabolic, single organisms and macromolecules at the level of BPs. Besides, we tested DEGs in the KEGG pathway enrichment analysis. The results showed that the overlapping DEGs were mostly enriched in the metabolic pathway, protein processing in the endoplasmic reticulum and the thyroid cancer pathway. The metabolic pathway was identified as one of the essential pathways of the major modules of the overlapping DEGs ([118]Figure 2). These enriched pathways may provide potential strategies for the development of new therapeutic agents. Carcinogenesis is a complicated and complex process influencing with some specific genes and involving multiple signaling cascades. Recently, bioinformatics analyses have been used for identifying novel diagnosis markers and therapeutic targets for many cancers.[119]^21^–[120]^23 Zou et al[121]^24 identified NRAGE as a novel biomarker for HCC by integrated bioinformatics analysis. Furthermore, in early HCC, the G2/M checkpoint participated in the progression.[122]^25 Several similar researches have been shown for HCC now. Using bioinformatics analysis, Wang et al[123]^26 have found the miR-192-3p-XIAP axis involved in HCC. However, compared with this study, previous studies only chose a signal data or file alone, as well as analyzed the one module method, with a high degree of connectivity, to select those hub genes. Notably, the genes provided by our study that not only combining the results from CytoHubba, MCODE method and network tools for the identification of the key genes, but also using the Kaplan–Meier database. This is forceful increasing the reliability of these results ([124]Figures 3–[125]5). We predicted 10 hub genes containing PLK1, PRCC, PRPF4, PSMA7, CHERP, GPKOW, HNRNPA1, PSMB5, PSMC4 and RNPS1. A few studies have reported these genes in other cancers. To ensure the results obtained from the bioinformatics analysis, the Kaplan–Meier plotter database was been chosen to predict the prognostic value of these key genes. Only PLK1, PRCC, PRPF4 and PSMA7 significantly correlated with worse OS for the HCC patients ([126]Figure 6). PLK1, a serine/threonine-protein kinase that belongs to the polo-like kinase family, participates in various biological processes about embryonic development, cell cycle and RNA processing.[127]^27 Xu et al[128]^28 tested high expression of PLK1 in the HCC tissues showed significantly worse effect in the histological type. PLK1 and HOTAIR accelerate proteasome degradation of SUZ12 and ZNF198 during hepatitis B virus-induced liver carcinogenesis.[129]^29 PLK1 phosphorylation of PTEN causes a tumor-promoting metabolic state.[130]^30 PRCC, the most common TFE3 fusion partner in papillary renal carcinoma, is associated with pre-mRNA splicing factors.[131]^31 In a subset of papillary renal cell carcinomas, PRCC has been repeatedly reported and presumably causes cancer.[132]^32^,[133]^33 By RNA sequencing and FISH experiment, the clinicopathological analysis and detection of the gene fusion verified that the PRCC-MITF gene fusion defines a novel member of the MiT family translocation renal cell carcinoma.[134]^33 PRPF4, a key component of the spliceosomes, plays a critical role in pre-mRNA splicing and its mutations result in retinitis pigmentosa due to photoreceptor defects.[135]^34^,[136]^35 PRPF4 is necessary part for cell survival, and in zebrafish, PRPF4 participate in posterior lateral line primordium migration.[137]^36 However, PRCC and PRPF4 are not reported with HCC, and studying these important genes can therefore increase the understanding of not HCC but other cancers. Notably, it is not coincident in the PRPF4 mRNA level and its corresponding miRNA level. That result potential showed that miRNA may be not directly control PRPF4. The gene PSMA7 encodes a member of the peptidase T1A family, which is a 20S core alpha subunit.[138]^37 PSMA7 is often over-expressed in several human common cancers. In addition, overexpression of PSMA7 correlates with reduced survival rate in several types of cancers. Holmila et al[139]^38 tested a diagnostic use of PSMA7 to screen for early HCC. Tan et al[140]^39 investigated that PSMA7 inhibited the tumorigenicity of A549 human lung adenocarcinoma cells. The effects of sh-PSMA7 on cell proliferation and vascular endothelial growth factor expression via the ubiquitin-proteasome pathway in cervical cancer.[141]^40 Salivary exosomal PSMA7 is a promising biomarker of inflammatory bowel disease.[142]^41 Conclusion In conclusion, a total of four genes including PLK1, PRCC, PRPF4 and PSMA7 were identified as HCC biomarkers, and metabolic pathway, protein processing in the endoplasmic reticulum and thyroid cancer pathway were confirmed to be the essential and most mechanisms in the development of HCC. Our study, however, had some obvious limitations such as no analysis of clinic liver tissue samples, or only using the Kaplan–Meier plotter database to analyze the prognostic value of these molecular target genes. Further studies should explore these markers in the HCC patients to ensure the prognostic effect and need to dig the underlying mechanisms or related pathways. In the next step, we would try to use computer to predict which compound could target these hub genes to inhibit HCC development; meanwhile, using some biological method we could investigate these potential functions in vivo and in vitro. Overall, our results showed that the interactive network of miRNAs and mRNA is highly complex. However, we need more experiments to investigate these novel, key and hub genes. Acknowledgments