Abstract

Introduction

   Cirrhosis is one of the most important risk factors for development of
   hepatocellular carcinoma (HCC). Recent studies have shown that removal
   or well control of the underlying cause could reduce but not eliminate
   the risk of HCC. Therefore, it is important to elucidate the molecular
   mechanisms that drive the progression of cirrhosis to HCC.

Materials and Methods

   Microarray datasets incorporating cirrhosis and HCC subjects were
   identified from the Gene Expression Omnibus (GEO) database.
   Differentially expressed genes (DEGs) were determined by GEO2R
   software. Functional enrichment analysis was performed by the
   clusterProfiler package in R. Liver carcinogenesis-related networks and
   modules were established using STRING database and MCODE plug-in,
   respectively, which were visualized with Cytoscape software. The
   ability of modular gene signatures to discriminate cirrhosis from HCC
   was assessed by hierarchical clustering, principal component analysis
   (PCA), and receiver operating characteristic (ROC) curve. Association
   of top modular genes and HCC grades or prognosis was analyzed with the
   UALCAN web-tool. Protein expression and distribution of top modular
   genes were analyzed using the Human Protein Atlas database.

Results

   Four microarray datasets were retrieved from GEO database. Compared
   with cirrhotic livers, 125 upregulated and 252 downregulated genes in
   HCC tissues were found. These DEGs constituted a liver
   carcinogenesis-related network with 272 nodes and 2954 edges, with 65
   nodes being highly connected and formed a liver carcinogenesis-related
   module. The modular genes were significantly involved in several KEGG
   pathways, such as “cell cycle,” “DNA replication,” “p53 signaling
   pathway,” “mismatch repair,” “base excision repair,” etc. These
   identified modular gene signatures could robustly discriminate
   cirrhosis from HCC in the validation dataset. In contrast, the
   expression pattern of the modular genes was consistent between
   cirrhotic and normal livers. The top modular genes TOP2A, CDC20, PRC1,
   CCNB2, and NUSAP1 were associated with HCC onset, progression, and
   prognosis, and exhibited higher expression in HCC compared with normal
   livers in the HPA database.

Conclusion

   Our study revealed a highly connected module associated with liver
   carcinogenesis on a cirrhotic background, which may provide deeper
   understanding of the genetic alterations involved in the transition
   from cirrhosis to HCC, and offer valuable variables for screening and
   surveillance of HCC in high-risk patients with cirrhosis.

   Keywords: cirrhosis, hepatocellular carcinoma, transcriptome, module,
   prognosis

Introduction

   Hepatocellular carcinoma accounts for 90% of all primary liver
   malignancies ([27]Mittal and El-Serag, 2013; [28]Galle et al., 2018),
   posing a serious threat to human health and quality of life. Worldwide,
   most patients with HCC have underlying cirrhosis of various etiologies
   ([29]Fattovich et al., 2004; [30]Beste et al., 2015; [31]Walker et al.,
   2016). Growing clinical evidence shows that removal or control of the
   injurious factors, such as hepatitis B or C virus, can reduce but not
   eliminate the risk of HCC ([32]Casado et al., 2013; [33]Marcellin et
   al., 2013; [34]Xu et al., 2015; [35]Sun et al., 2017). Therefore, it is
   important to understand the molecular mechanisms that drive the
   progression of cirrhosis to HCC.

   Hepatocellular carcinoma occurs as a consequence of the complex
   interplay between multiple genetic determinants ([36]Sanyal et al.,
   2010). Previous studies have found that aberrations in genetic
   molecules pertaining to oxidative stress, EMT, inflammatory response,
   cellular senescence, or telomere dysfunction may contribute to the
   progression of cirrhosis to HCC ([37]Ramakrishna et al., 2013). In
   addition, the Wnt/β-catenin, p53, pRb, MAPK, RAS, and JAK/STAT pathways
   are also reported to be canonical molecular pathways in HCC development
   ([38]Aravalli et al., 2008). However, different studies often yield
   diverse results and the global view on the landscape of genomic changes
   is still not very clear.

   With the aid of high-throughput detection techniques, all expressed
   genetic molecules in a given liver tissue sample can be simultaneously
   detected over a wide quantitative range ([39]Mas et al., 2009;
   [40]Villanueva et al., 2011; [41]Yildiz et al., 2013; [42]Wang et al.,
   2014; [43]Lee, 2015; [44]Schulze et al., 2015; [45]Villanueva et al.,
   2015; [46]Diaz et al., 2018; [47]Shen et al., 2018). High-throughput
   sequencing and microarray technologies allow investigators to
   simultaneously measure the changes of genome-wide genes under certain
   biological conditions. These approaches usually generate large
   “interesting” gene lists. By using biological knowledge accumulated in
   public databases (e.g., KEGG^[48]1), it is possible to systematically
   dissect large gene lists in an attempt to assemble a summary of the
   most enriched and pertinent biology. Therefore, integrated analyses of
   multiple datasets generated from different studies may help us to
   identify reliable and reproducible genetic alterations involved in the
   development of HCC on a cirrhotic background.

   Therefore, our present study used multiple bioinformatics tools to
   systematically integrate publicly available transcriptomic datasets and
   performed high-throughput gene expression comparisons between HCC and
   benign cirrhotic tissues.

Materials and Methods

Retrieval of Microarray Datasets on Cirrhosis and HCC From Public Database

   First, we searched and retrieved transcriptome profiles of cirrhotic
   and HCC tissues from GEO which is a public functional genomics data
   repository, allowing users to query, locate, review, and download
   studies and gene expression profiles of interest ([49]Barrett et al.,
   2013).

   The search terms we used included “cirrhosis” and “HCC.” Studies were
   considered eligible for analysis if: (1) studies contained both
   cirrhosis and HCC tissues; (2) species was limited to Homo sapiens; and
   (3) platform was limited to microarray. Then the retrieved datasets
   were further screened by manual retrieval. Our workflow for
   bioinformatics analysis of publicly available datasets is illustrated
   in [50]Figure 1.

FIGURE 1.

   [51]FIGURE 1
   [52]Open in a new tab

   Workflow for bioinformatics analysis. GEO, Gene Expression Omnibus;
   HCC, hepatocellular carcinoma; STRING, Search Tool for the Retrieval of
   Interacting Genes; MCODE, Molecular Complex Detection; HCL,
   hierarchical clustering; PCA, principal component analysis; KEGG, Kyoto
   Encyclopedia of Genes and Genomes; HPA, Human Protein Atlas. ROC,
   receiver operating characteristic.

Identification of DEGs Related to Liver Carcinogenesis From the Retrieved
Microarray Datasets

   Differentially expressed genes between cirrhosis and HCC tissues were
   defined as liver carcinogenesis-related genes that may have important
   implications in driving cirrhosis to HCC.

   Gene expression in all the datasets was normalized by the
   antilog-transformed RMA algorithm. GEO query and limma R packages
   contained in GEO2R, which allows gene expression analysis of published
   microarray datasets, was used to determine the DEGs between cirrhosis
   and HCC tissues ([53]Davis and Meltzer, 2007). FDR < 0.05 and FC > 1.5
   were considered as the cutoff values for DEG screening. The overlapping
   DEGs in datasets were retained for further analyses.

Functional Specification of the Identified DEGs Related to Liver
Carcinogenesis

   To identify and visualize enriched KEGG pathways for the candidate gene
   sets, clusterProfiler, which is an R package for comparing biological
   themes among gene clusters, was employed ([54]Yu et al., 2012).
   Fisher’s exact test followed by the Benjamini correction was performed
   and an adjusted P-value of <0.05 was set as the cutoff criterion.

Establishment of the LiverCarcinogenesis-Related Network andIts Modules

   The internal regulatory relationships between the identified liver
   carcinogenesis-related genes were predicted by the STRING database
   (confidence score > 0.4) ([55]Szklarczyk et al., 2017). Liver
   carcinogenesis-related network was established and visualized with
   Cytoscape software ([56]Shannon et al., 2003).

   We used the MCODE plug-in in the Cytoscape software ([57]Bader and
   Hogue, 2003) to screen the modules concealed in the liver
   carcinogenesis-related network with the following criteria: Max. depth
   = 100, K-Core = 2, mode score cutoff = 0.2, and degree cutoff = 2.
   Likewise, the functional specification of the identified module was
   determined with the clusterProfiler package as mentioned above. An
   adjusted P-value of <0.05 was considered statistically significant.

Verification of the Identified Modules for Discriminating Cirrhosis From HCC

   We used three of datasets ([58]GSE89377, [59]GSE17548, and
   [60]GSE98383) to mine modules from the liver carcinogenesis-related
   network; and used the remaining dataset ([61]GSE56140) to validate the
   findings. To verify the ability of the identified modules to
   discriminate cirrhosis from HCC subjects, we performed hierarchical
   clustering analysis by using R with the complete linkage method and the
   expression of modular genes as a distance metric. To verify the results
   of hierarchical clustering, we applied the identified modular genes
   that were considered as observed variables to PCA plots. PCA was
   conducted with the ggbiplot package in R. The first two principal
   components (PCs) were then subjected to binary logistic regression
   analysis to calculate the predicted probability which was applied to
   the receiver operating characteristic (ROC) curve analysis implemented
   by SPSS 20 (IBM, United States). Area under curve (AUC) was calculated
   to determine the predictive performance of the identified gene module.
   In order to reduce sampling bias, the modules were screened from any
   three out of the four datasets and repeated evaluations of their
   discriminant ability were performed using the remaining dataset.

Comparison of the Identified Modular Genes in Normal and Cirrhotic Samples

   We used GEO2R software to determine the expression differences between
   any two groups in dataset. An FDR of <0.05 and an FC of >1.5 were
   considered as the cutoff values for DEG screening. Modular gene
   expression in normal, cirrhotic, and HCC samples were visualized by
   using a heatmap drawn with MeV software^[62]2.

Analyses of the Association Between the Top Modular Genes and HCC
Histological Grade or Clinical Outcome

   Modular genes with FC > 3 between cirrhosis and HCC tissues in all GEO
   datasets were considered as the top modular genes. Association of the
   top modular genes and HCC grades or prognosis was analyzed by using
   UALCAN ([63]Chandrashekar et al., 2017), which is an interactive
   web-portal for exploring the association between tumor subgroup gene
   expression and survival in TCGA^[64]3. Expression differences of top
   modular genes between normal and different tumor grades were analyzed
   using the statistical method built-in the UALCAN web-software; a
   P-value of <0.05 was considered significant.

   According to the TPM expression values, the top modular genes were
   divided into a high expression group (with TPM values above upper
   quartile) and a low expression group (with TPM values below lower
   quartile). With information on the association between the gene
   expression and survival profiles documented in TCGA, Kaplan–Meier
   survival analyses were performed and overall survival plots were
   generated. The difference between high gene expression and low gene
   expression was compared by log-rank test; a P-value of < 0.05 was
   considered significant.

In silico Analysis of the Top Modular Members in Normal and HCC Specimens

   Protein expression and distribution of the top modular genes in human
   liver tissue were searched in the HPA ^[65]4 database ([66]Uhlen et
   al., 2015).

Results

Retrieved Microarray Datasets Pertaining to Cirrhosis and HCC

   According to the retrieval criteria, four microarray datasets
   ([67]Table 1) containing a total of 95 benign (cirrhosis) and 98
   malignant (HCC) subjects were found from the GEO database. [68]GSE98983
   dataset was produced by Affymetrix Human Genome U133 Plus 2.0 Array
   ([69]GPL570); [70]GSE89377 by Illumina HumanHT-12 V3.0 expression
   beadchip ([71]GPL6947); [72]GSE17548 by Affymetrix Human Genome U133
   Plus 2.0 Array ([73]GPL570); and.[74]GSE56140 by Illumina HumanHT-12
   V3.0 expression beadchip.

Table 1.

   Detailed information of the four microarray datasets used in the
   present study.
   ID Platform Sample Etiology References