Graphical abstract
[29]graphic file with name ga1.jpg
[30]Open in a new tab
Abbreviations: HCC, Hepatocellular carcinoma; H3K4me3, Histone H3
trimethylated at lysine 4; H3K27ac, Histone H3 acetylated at lysine 27;
H3K9ac, Histone H3 acetylated at lysine 9; H3K4me2, Histone H3
dimethylated at lysine 4; H3K27me3, Histone H3 trimethylated at lysine
27; H3K9me3, Histone H3 trimethylated at lysine 9; H2AFZ, H2A histone
family member Z; H3K4me1, Histone H3 monomethylated at lysine 4;
H3K36me3, Histone H3 trimethylated at lysine 36; H3K79me2, Histone H3
dimethylated at lysine 79; H4K20me1, Histone H4 monomethylated at
lysine 20; ONCO, Oncogenes; TSG, Tumor suppressor genes; DHLEG,
Different highly and lowly expressed genes; TH, The genes are highly
expressed in tumor cell line but not in normal cell line; TL, The genes
are lowly expressed in tumor cell line but not in normal cell line; NH,
The genes are highly expressed in normal cell line but not in tumor
cell line; NL, The genes are lowly expressed in normal cell line but
not in tumor cell line; NH-TL, The genes are highly expressed in normal
cell line and lowly expressed in tumor cell line; NL-TH, The genes are
lowly expressed in normal cell line and highly expressed in tumor cell
line
Keywords: Histone modification signals, Gene expression, Oncogenes,
Tumor suppressor genes, Biomarkers
Abstract
Hepatocellular carcinoma (HCC) is one of the leading causes of cancer
death in the world. It has been reported that HCC is closely related to
the changes of histone modifications. However, finding histone
modification patterns in key genes which related to HCC is still an
important task. In our study, the patterns of 11 kinds of histone
modifications in the promoter regions for the different types of genes
were analyzed by hierarchical screening for hepatocyte (normal) cell
line and HepG2 (tumor) cell line. The important histone modifications
and their key modification regions in different types of genes were
found. The results indicate that these important genes may play a
pivotal role in the occurrence of HCC. By analyzing the differences of
histone modifications and gene expression levels for these important
genes between the two cell lines, we found that the signals of H3K4me3,
H3K27ac, H3K9ac, and H3K4me2 in HCC are significantly stronger. The
changed regions of important histone modifications in 17 key genes were
also identified. For example, the H3K4me3 signals increased 150 times
in regions (−1500, −500) bp and (0, 1000) bp of ARHGAP5 in tumor cell
line than in normal cell line. Finally, a prognostic risk scoring model
was constructed, and the effects of key genes on the prognosis of HCC
were verified by the survival analysis. Our results may provide a more
precise potential therapeutic targets for identifying key genes and
histone modifications in HCC as new biomarkers.
1. Introduction
Hepatocellular carcinoma (HCC) is the most common primary live
malignancy [31][1]. Liver cancer is the sixth most commonly diagnosed
cancer and the fourth leading cause of cancer death worldwide in 2018,
with about 841,000 new cases and 782,000 deaths annually [32][2]. For
the reason that patients with advanced HCC are unsuitable for curative
hepatectomy or hepatic transplantation, even if patients undergo
surgical resection, the high recurrence rate is the main cause for the
poor 5-years survival rate of HCC patients [33][3], [34][4]. The
development of HCC is a complex process, and more and more evidence has
shown that epigenetic changes are an important reason for the
occurrence of HCC [35][5], [36][6]. Therefore, understanding the
underlying mechanism of epigenetic changes is crucial for finding new
effective treatments.
A number of studies about histone modifications of HCC have been
reported. The histone modification patterns are dynamically regulated
by enzymes that add and remove covalent modifications to histones
[37][7]. Histone deacetylation is closely related to HCC, especially
the role of histone deacetylase in the pathogenesis of HCC [38][8]. It
has found that histone deacetylase 3 may be an important factor
regulating the proliferation and invasion of HepG2 cell line. HDAC6
suppresses tumors by mediating autophagic cell death in HCC [39][9].
EZH2 represses gene transcription through histone H3 trimethylated at
lysine 27 (H3K27me3). It has also found that EZH2-mediated epigenetic
silencing contributes to constitutive activation of Wnt/β-catenin
signaling and consequential proliferation of HCC [40][10]. Moreover,
EZH2 and SUV39H2 are closely related to the survival and tumor stage,
respectively [41][11].
The microarray studies of tumor tissue have demonstrated that high
H3K27me3 is associated with lower survival and poor prognosis of HCC
[42][12], [43][13], [44][14]. Recent studies have identified histone H4
dimethylated at lysine 20 and histone H4 acetylated at lysine 16 as
novel biomarkers of microvascular infiltration in HCC [45][15]. The
high signals of H3K4me3 are related to poor prognosis in HCC patients
[46][12]. The aberrant modification of histone 3 phosphorylation has
also been found in HCC [47][16]. The H3K4me3 levels in Oct4, Yap1, and
TCF7 promoters are also relevant to the self-renewal of liver the
cancer stem cells [48][17], [49][18], [50][19], [51][20]. These
researches provide new directions for the discovery of new biomarkers
and potential therapeutic targets in HCC.
The epigenetic modification patterns of HCC are still a mystery
throughout the genome, so it is urgent and necessary to analyze the
epigenetic modifications in HCC. Although there were many studies about
enzymes and the changes of histone modifications, almost all of them
were based on experimental levels. At present, there are relatively few
studies on the specific patterns of histone modifications and the
relationship between histone modifications and gene expression.
With the development of next-generation sequencing technologies, it has
been feasible to analyze cancers from the whole genome level [52][13],
[53][21]. In this study, we used the publicly available data of HepG2
(tumor) cell line and hepatocyte (normal) cell line from ENCODE to
identify key genes and the changes of histone modification signals for
HCC [54][22]. The patterns and main features of histone modification
signals in the promoter regions were calculated. The important histone
modification signals of the key genes were found. We also identified
regions where the important histone modification signals have been
changed for the key genes and constructed a prognostic risk scoring
model related to the expression of key genes, which well validated the
impact of these genes on the prognosis of HCC. These results should be
important for the development of HCC.
2. Material and methods
2.1. Gene expression data
The gene expression data (RNA-Seq, hg19) of HepG2 and hepatocyte cell
lines were downloaded from ENCODE database
([55]https://www.encodeproject.org/) (Supplementary appendix file:
[56]Table A1), which are the bam format, then we used the bedtools
software to convert data to the bed format [57][23]. The gene
expression levels in both cell lines were calculated according to the
reads per kilobase of exon model per million mapped sequence reads
(RPKM) [58][24].
2.2. Data of histone modifications
The ChIP-Seq data (hg19) for 11 histone modifications in HepG2 and
hepatocyte were downloaded from ENCODE database. These histone
modifications include histone H3 trimethylated at lysine 9 (H3K9me3),
histone H3 trimethylated at lysine 4 (H3K4me3), histone H3
trimethylated at lysine 27 (H3K27me3), H2A histone family member Z
(H2AFZ), histone H3 monomethylated at lysine 4 (H3K4me1), histone H3
trimethylated at lysine 36 (H3K36me3), histone H3 dimethylated at
lysine 4 (H3K4me2), histone H3 acetylated at lysine 27 (H3K27ac),
histone H3 dimethylated at lysine 79 (H3K79me2), histone H3 acetylated
at lysine 9 (H3K9ac) and histone H4 monomethylated at lysine 20
(H4K20me1) (Supplementary appendix file: [59]Table A1), and these data
are also the bam format. The histone modification signals (
[MATH: Sijk :MATH]
) of k-th histone modification in j-th bin for i-th gene were
calculated by following Eq. [60](1):
[MATH: Sijk=nijk×10
9
ljbin×n
read1⩽<
mi>i⩽19157,1⩽j⩽80,1⩽k⩽11 :MATH]
(1)
where i is i-th gene, j is the j-th bin, k is the k-th histone
modification,
[MATH: nijk :MATH]
is the read counts of k-th histone modification that located in j-th
bin for i-th gene,
[MATH: nread :MATH]
is the total read counts of k-th histone modifications,
[MATH: ljbin :MATH]
is the length of j-th bin,
[MATH: 109 :MATH]
is used to keep the consistent magnitude with RPKM (In the process of
calculating RPKM, the unit of the exon length is kilobase and the
counting unit of mapped reads is million.).
2.3. Known cancer genes and human reference genes
We downloaded 235 oncogenes (ONCO) and 222 tumor suppressor genes (TSG)
from COSMIC somatic mutation catalog
([61]https://cancer.sanger.ac.uk/cosmic). The RefSeq genes of human
genome (hg19) were downloaded from the UCSC database
([62]http://genome.ucsc.edu/cgi-bin/hgTables) (Supplementary appendix
file: [63]Table A2). RefSeq genes include name, chrom, strand, txStart,
txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2,
cdsStartStat, cdsEndStat, and exonFrames. The RefSeq data we downloaded
has a total of 68,534 lines of information. The non-coding genes
starting with NR in the name column were removed, the coding genes
starting with NM (the mature messenger RNA) were retained, leaving
52,162 genes. Then the genes with the same transcription start site
were deduplicated, and only one of them was retained, leaving 25,383
genes. The gene names (name2 column) were deduplicated, leaving 19,198
genes. Finally, all genes on the chromosome Y were removed, leaving
19,157 genes to be selected for analysis.
2.4. Partition of promoter regions
We obtained the transcription start site (TSS, txStart) of each gene
from RefSeq data (Supplementary appendix file: [64]Table A2). For each
gene, the functional region of (−2000, 2000) bp flanking the TSS was
divided into 80 bins of 50 bp in size. The histone modification signals
in each bin were calculated by Eq. [65](1), so that we obtained all
histone modification signals of 80 bins in the promoter regions.
2.5. Correlation analysis
Since different histone modifications may have cooperation for gene
expression, to study the relationship between histone modifications, we
computed Spearman’s correlations between histone modification signals (
[MATH:
rkk′HM :MATH]
) according to Eq. [66](2). Finally, we got an 11 × 11 correlation
coefficients matrix and used the gplots package [67][25] to plot the
heat maps of histone modifications correlations.
[MATH:
rkk′HM=1-6∑j=1mrgHMkj-r
gHMk<
/mi>′j2mm2-1
mn> :MATH]
[MATH: HMkj=∑i=1nlog2Sijk+Δ/n
:MATH]
[MATH:
HMk′
j=∑i=1nlog2Sijk′+Δ/n
:MATH]
[MATH: 1⩽i⩽n
,1⩽j⩽m,<
mn>1⩽k⩽11,1<
mo>⩽k′⩽11<
/mrow> :MATH]
(2)
where m = 80, n = 19157, Δ=
[MATH:
10-2
:MATH]
, i is i-th gene, j is the j-th bin, k (
[MATH: k′ :MATH]
) is the k-th (
[MATH: k′ :MATH]
-th) histone modification,
[MATH: Sijk :MATH]
and
[MATH: Sijk′ :MATH]
were calculated according to Eq. [68](1),
[MATH: HMkj :MATH]
(
[MATH:
HMk′
j :MATH]
) is the average signals of k-th (
[MATH: k′ :MATH]
-th) histone modification in the j-th bin,
[MATH:
rgHM<
mi mathvariant="italic">kj
:MATH]
(
[MATH:
rgHM<
msup>k′j<
/mrow> :MATH]
) is the rank number of the
[MATH: HMkj :MATH]
(
[MATH:
HMk′
j :MATH]
).
In addition, the Spearman’s correlations (
[MATH: rkj,el :MATH]
) between the k-th histone modification signals in the j-th bin of
promoter regions and gene expression levels (el) were also calculated
by Eq. [69](3), so we got an 11 × 80 data matrix.
[MATH: rkj,el=1-6∑i=1nrgHMijk-
rgLRi
2nn2-1
mn> :MATH]
[MATH: HMijk=log2Sijk+Δ
:MATH]
[MATH: LRi=log2Ri+Δ
:MATH]
[MATH: 1⩽i⩽n
,1⩽j⩽m,<
mn>1⩽k⩽11 :MATH]
(3)
where m, n, Δ, i, j, k,
[MATH: Sijk :MATH]
are the same as in Eq. [70](2).
[MATH: HMijk :MATH]
is the logarithm of
[MATH: Sijk :MATH]
,
[MATH: Ri :MATH]
is the RPKM of i-th gene,
[MATH: LRi :MATH]
is the logarithm of the
[MATH: Ri :MATH]
,
[MATH:
rgHM<
mi mathvariant="italic">ijk
:MATH]
(
[MATH:
rgLRi
mi> :MATH]
) is the rank number of the
[MATH: HMijk :MATH]
(
[MATH: LRi :MATH]
).
2.6. Hierarchical screening for genes
The gene expression levels were sorted in the two cell lines,
respectively. We selected the top ten percent of genes as the highly
expressed genes and the expression value is zero as the lowly expressed
genes. Then, we got 1,915 highly expressed genes and 2,922 lowly
expressed genes for the normal cell line. For the tumor cell line, we
got 1,915 highly expressed genes and 5,352 lowly expressed genes.
Subsequently, we selected the differentially expressed genes as the
different highly and lowly expressed genes (DHLEG) from the above four
gene sets of highly and lowly expressed genes.
2.7. Clinical data and corresponding gene expression data of patients with
HCC
Clinical data and corresponding gene expression data of patients with
liver hepatocellular carcinoma (LIHC) were downloaded from The Cancer
Genome Atlas (TCGA) database ([71]https://cancergenome.nih.gov/)
[72][26]. Samples with both clinical data and expression data were
selected, while samples without survival information in clinical data
were deleted. The selected clinical data are all samples with overall
survival time and survival status. These samples data in TCGA were
collected from surgical resection of biopsy biospecimens of patients
diagnosed with hepatocellular carcinoma (HCC), and had not received
prior treatment for their disease (ablation, chemotherapy, or
radiotherapy) [73][26]. Finally, the data of 364 patients with LIHC
were used in this study (Supplementary appendix file: [74]Table A3).
2.8. Construction of risk scoring model
To further identify clinically significant prognostic genes, we
evaluated the relationship between gene expression levels and overall
survival. We performed a multivariate Cox proportional hazard
regression analysis for genes [75][27], the hazard ratio (HR) and 95%
confidence interval (CI) were calculated. The genes with P < 0.05 were
selected, and a risk scoring model (RS) was constructed by Eq. [76](4)
through weighting multivariable regression coefficients of each gene.
Then the risk score was calculated for each sample. Finally, the
patients were divided into high-risk group and low-risk group by using
the median risk score of all samples as the threshold.
[MATH: RS=∑i=
1mcoefi×eli :MATH]
(4)
Where i is the i-th gene, m is the number of genes, RS is a prognostic
risk score for the HCC patient,
[MATH: coefi
:MATH]
is the contribution of i-th gene to prognostic risk scores that were
obtained from the regression coefficient of multivariate Cox analysis,
[MATH: eli :MATH]
represents the expression level of i-th gene.
2.9. Survival analysis
The Kaplan-Meier method was used for survival analysis, the survival
rate and median survival of each prognostic risk group were calculated.
The survival rates of patients in different risk groups were compared,
and the significance of the differences was evaluated using the
log-rank test [77][28]. In order to verify the predictive ability of
the risk scoring model, the performance of the RS was also evaluated by
the time-dependent receiver operating characteristic (ROC) curve
[78][29].
3. Results
3.1. Histone modification signals of DHLEG
In order to analyze the differences between the highly and lowly
expressed genes, we selected the differentially expressed genes as
DHLEG [79][30]. The DHLEG includes the following six situations
([80]Fig. 1A): A. There are 1,051 genes that are highly expressed in
tumor cell line but not in normal cell line (TH). B. 2,816 genes are
lowly expressed in tumor cell line but not in normal cell line (TL). C.
1,051 genes are highly expressed in normal cell line but not in tumor
cell line (NH). D. 386 genes are lowly expressed in normal cell line
but not in tumor cell line (NL). E. 81 genes are highly expressed in
normal cell line and lowly expressed in tumor cell line (NH-TL). F.
Only PEG3 is lowly expressed in normal cell line and highly expressed
in tumor cell line (NL-TH).
Fig. 1.
[81]Fig. 1
[82]Open in a new tab
The average histone modification signals of DHLEG. (A) The Venn diagram
of DHLEG in two kinds of cell lines. (B) and (C) are the average
histone modification signals of the TH (TL) and NH (NL). (D) The
average histone modification signals of NH-TL in two kinds of cell
lines. Abbreviations: DHLEG: different highly and lowly expressed
genes. TH: the genes are highly expressed in tumor cell line but not in
normal cell line. NH: the genes are highly expressed in normal cell
line but not in tumor cell line. TL: the genes are lowly expressed in
tumor cell line but not in normal cell line. NL: the genes are lowly
expressed in normal cell line but not in tumor cell line. NH-TL: The
genes are highly expressed in normal cell line and lowly expressed in
tumor cell line.
By computing the average histone modification signal strength of each
bin in the promoter regions for DHLEG, we subsequently got the
distributions of each histone modification. It was found that most of
the histone modifications are stronger in highly expressed genes than
that in lowly expressed genes by comparing the histone modification
signals between TH (NH) and TL (NL) ([83]Fig. 1B and C), suggesting
that most of the histone modifications play an activating role in the
two kinds of cell lines. However, H3K27me3 signals are stronger in
lowly expressed genes than that in highly expressed genes for the two
kinds of cell lines. These results indicate that the H3K27me3 may play
an inhibiting role in the two kinds of cell lines. Moreover, we also
observed that the signals of H3K27ac and H3K4me3 are stronger in tumor
cell line than that in normal cell line. For the histone modifications
of NH-TL ([84]Fig. 1D), it can be seen that the H3K27me3 signals are
stronger in tumor cell line than that in normal cell line. Furthermore,
the H3K4me3 signals are stronger in tumor cell line, which is
consistent with the results of [85]Fig. 1B and C.
3.2. Correlation between histone modifications in highly and lowly expressed
genes
To illustrate the cooperative effects of histone modifications for the
highly and lowly expressed genes in the two kinds of cell lines, we
calculated Spearman’s correlations between histone modifications then
plotted heat maps ([86]Fig. 2). In highly expressed genes of tumor cell
line, there are two histone modification clusters with strong positive
correlation (
[MATH:
P<2.2×10
-16 :MATH]
), including (H3K27me3, H3K9ac, H3K4me3, and H3K27ac,
[MATH:
rkk′HM :MATH]
>0.87) and (H4K20me1 and H3K36me3,
[MATH:
rkk′HM :MATH]
=0.96), there is just one histone modification cluster between H3K4me1
and (H3K27ac, H3K4me3, and H3K9ac) with strong negative correlation (
[MATH:
rkk′HM :MATH]
<−0.76,
[MATH:
P<2.2×10
-16 :MATH]
) ([87]Fig. 2A). For the lowly expressed genes of tumor cell line,
there is only one histone modification cluster (H3K4me3, H3K4me2,
H3K4me1, H2AFZ, and H3K9ac) that has strong positive correlation (
[MATH:
rkk′HM :MATH]
>0.76,
[MATH:
P<2.2×10
-16 :MATH]
) ([88]Fig. 2B). Three histone modification clusters have strong
positive correlation (
[MATH:
P<2.2×10
-16 :MATH]
) in highly expressed genes of normal cell line, including (H3K36me3,
H3K79me2, and H4K20me1,
[MATH:
rkk′HM :MATH]
>0.86), (H2AFZ and H3K27ac,
[MATH:
rkk′HM :MATH]
=0.94) and (H3K4me2, H3K9ac and H3K4me3,
[MATH:
rkk′HM :MATH]
>0.86) ([89]Fig. 2C). In lowly expressed genes of normal cell line,
there is just one histone modification cluster (H3K79me2, H3K36me3, and
H3K9me3) that has strong positive correlation (
[MATH:
rkk′HM :MATH]
>0.84,
[MATH:
P<2.2×10
-16 :MATH]
). However, the Spearman’s correlations between H3K4me2 and (H3K79me2,
H3K36me3, and H3K9me3) have strong negative correlation (
[MATH:
rkk′HM :MATH]
<−0.58,
[MATH:
P<1.4×10
-8 :MATH]
) ([90]Fig. 2D). Therefore, heat maps show that different types of
genes are modified by different histone modifications clusters.
Fig. 2.
[91]Fig. 2
[92]Open in a new tab
The Spearman correlation coefficient of histone modifications in two
kinds of cell lines. (A) The heat map of highly expressed genes in
tumor cell line. (B) The heat map of lowly expressed genes in tumor
cell line. (C) The heat map of highly expressed genes in normal cell
line. (D) The heat map of lowly expressed genes in normal cell line.
3.3. Analysis of the histone modification signals in oncogenes (ONCO) and
tumor suppressor genes (TSG) of DHLEG for two kinds of cell lines
In order to further find key genes that are related to HCC in DHLEG, we
selected the known cancer genes in DHLEG ([93]Table 1). To illustrate
the characteristics of histone modifications in these genes, we
calculated the average histone modification signals for tumor and
normal cell lines, respectively. For tumor cell line ([94]Fig. 3A), it
shows that the average histone modification signals in highly expressed
ONCO are generally stronger than that in lowly expressed TSG,
especially for H3K4me3, H3K9ac, H3K27ac, and H3K4me2. Whereas for
normal cell line ([95]Fig. 3B), it shows that the average histone
modification signals in highly expressed TSG are generally stronger
than that in lowly expressed ONCO, except for H3K27me3.
Table 1.
Oncogenes and tumor suppressor genes in DHLEG.
DHLEG TYPE NUMBER GENE SYMBOL
TH ONCO 16 ARHGAP5, CCND1, CHD4, ERBB3, ETV4, HLF, MALT1, MAP2K1,
MAPK1, MDM2, MET, MYC, MYD88, PLCG1, WWTR1, XPO1
TSG 18 AXIN1, AXIN2, CDKN1B, CHD2, CLTC, EIF3E, FAT1, FH, FUS, MSH2,
MSH6, PTPRK, RMI2, RNF43, SMARCB1, SETD2, SDHA, TGFBR2
TL ONCO 15 ACKR3, ALK, CD79B, CHST11, CTNNA2, ETV1, FEV, FOXR1, GLI1,
HOXD13, KCNJ5, MYB, NTRK3, RSPO3, ZNF521
TSG 4 CBFA2T3, CDX2, CREB3L1, ROBO2
NH ONCO 14 ACKR3, AKT1, CCND3, CDK4, DDIT3, FOXA1, FSTL3, HIF1A, IDH2,
KDR, SGK1, SH3GL1, STAT3, TFE3
TSG 13 BTG1, CDH1, CREB3L1, DNM2, ELF3, ID3, KLF6, PER1, PPP2R1A,
SDHAF2, SDHC, SDHD, TNFAIP3
NL ONCO 2 P2RY8, POU2AF1
TSG 0 –
NH-TL ONCO 1 ACKR3
TSG 1 CREB3L1
NL-TH ONCO 0 –
TSG 0 –
[96]Open in a new tab
Abbreviations: DHLEG: different highly and lowly expressed genes. ONCO:
oncogenes. TSG: tumor suppressor genes. TH: the genes are highly
expressed in tumor cell line but not in normal cell line. TL: the genes
are lowly expressed in tumor cell line but not in normal cell line. NH:
the genes are highly expressed in normal cell line but not in tumor
cell line. NL: the genes are lowly expressed in normal cell line but
not in tumor cell line. NH-TL: the genes are highly expressed in normal
cell line and lowly expressed in tumor cell line. NL-TH: the genes are
lowly expressed in normal cell line and highly expressed in tumor cell
line.
Fig. 3.
[97]Fig. 3
[98]Open in a new tab
The histone modification signals and gene expression levels in ONCO and
TSG of DHLEG for two kinds of cell lines. (A) The histone modification
signals of tumor cell line. (B) The histone modification signals of
normal cell line. (C) The distribution of histone modification signals
in CREB3L1 that is highly expressed in normal cell line and lowly
expressed in tumor cell line. (D) The distribution of histone
modification signals in ACKR3 that is highly expressed in normal cell
line and lowly expressed in tumor cell line. (E) The heat map of
Spearman’s correlations between histone modification signals in the 80
bins of the promoter regions and the gene expression levels for
TH-ONCO. (F) The heat map of Spearman’s correlations between histone
modification signals in the 80 bins of the promoter regions and the
gene expression levels for NH-TSG. Abbreviations: ONCO: oncogenes. TSG:
tumor suppressor genes. DHLEG: different highly and lowly expressed
genes. TH-ONCO: the highly expressed oncogenes in tumor cell line.
NH-TSG: the highly expressed tumor suppressor genes in normal cell
line.
For TSG CREB3L1 ([99]Fig. 3C), it shows that the histone modifications
of H3K9ac, H3K4me2, and H3K27ac activate the high expression of this
gene in the normal cell line, H3K27me3 inhibits the expression of this
gene in the tumor cell line. For the gene ACKR3, it is highly expressed
in normal cell line and lowly expressed in tumor cell line (NH-TL), and
the signals of H3K9ac, H3K4me2, and H3K27ac are stronger in normal cell
line than that in tumor cell line ([100]Fig. 3D). It suggests that
these histone modifications mainly activated the high expression of
this gene in normal cell line. However, H3K27me3 has a higher level of
modification in tumor cell line, it inhibits the expression of this
gene in tumor cell line.
3.4. The analysis of the correlations between histone modifications and gene
expression levels
In order to study the relationships between the locations of histone
modifications and expression levels, we further calculated the
Spearman’s correlations between histone modifications in 80 bins of
promoter regions and the expression levels of TH-ONCO and NH-TSG, the
results are shown in the heat maps. For the TH-ONCO in tumor cell line,
there are the obvious negative correlations between gene expression
levels and H3K27me3 in the regions (200, 250) bp and (850, 900) bp,
while an obvious positive correlation between gene expression levels
and H3K27ac in the region (1700, 1750) bp ([101]Fig. 3E). For the
NH-TSG in normal cell line, the heat map also shows that there are
prominent correlations between gene expression levels and histone
modifications, such as H3K9ac in the region (1500, 1550) bp, H3K4me3 in
the regions (−1650, −1600) bp and (1800, 1850) bp. In addition, it has
the obvious correlations between gene expression levels and H3K27ac in
the regions (−1500, −1450), (850, 1000), (1050, 1100), (1150, 1250),
(1300, 1450), and (1900, 1950) bp ([102]Fig. 3F), the detailed results
are displayed in [103]Table 2. The above results indicate that H3K4me3,
H3K9ac, H3K27ac, and H3K27me3 are important histone modifications for
gene expression.
Table 2.
The obvious correlation between gene expression levels and histone
modification signals.
GENES HM REGION
[MATH: rkj,el :MATH]
P-value
TH-ONCO H3K27me3 (−1300 bp, −1250 bp) (15-th) −0.536 0.032
(−1150 bp, −1100 bp) (18-th) −0.518 0.040
(−700 bp, −650 bp) (27-th) −0.581 0.018
(−500 bp, −450 bp) (31-st) −0.552 0.027
(200 bp, 250 bp) (45-th) −0.612 0.012
(850 bp, 900 bp) (58-th) −0.630 0.009
(1100 bp, 1150 bp) (63-rd) −0.498 0.050
H3K27ac (1700 bp, 1750 bp) (75-th) 0.519 0.039
(1750 bp, 1800 bp) (76-th) 0.457 0.039
NH-TSG H3K9ac (1500 bp, 1550 bp) (71-st) 0.669 0.012
H3K4me3 (−1650 bp, −1600 bp) (8-th) 0.687 0.010
(1800 bp, 1850 bp) (77-th) 0.615 0.025
H3K27ac (−1500 bp, −1450 bp) (11-th) 0.575 0.040
(850 bp, 900 bp) (58-th) 0.600 0.030
(900 bp, 950 bp) (59-th) 0.598 0.031
(950 bp, 1000 bp) (60-th) 0.574 0.040
(1050 bp, 1100 bp) (62-nd) 0.580 0.040
(1150 bp, 1200 bp) (64-th) 0.617 0.025
(1200 bp, 1250 bp) (65-th) 0.721 0.005
(1300 bp, 1350 bp) (67-th) 0.639 0.019
(1350 bp, 1400 bp) (68-th) 0.727 0.005
(1400 bp, 1450 bp) (69-th) 0.756 0.003
(1900 bp, 1950 bp) (79-th) 0.712 0.006
[104]Open in a new tab
Abbreviations: HM: histone modification.
3.5. Analysis of important histone modifications in key genes related to HCC
To further recognize the differences between histone modification
signals in normal and tumor cell lines for the four types of
differential genes, i.e. TH-ONCO, TL-TSG, NH-TSG, and NL-ONCO, we
calculated the distribution and relative difference of histone
modification signals in TH-ONCO for tumor cell line and normal cell
line. The signals of H3K4me3 and H3K27ac are very stronger for
oncogenes (ERBB3, CCND1, ARHGAP5, PLCG1, HLF, and MYD88) in tumor cell
line ([105]Fig. 4). The results are consistent with the histone
modification clusters of the TH heat map, it also indicates that the
co-modification of the two histones promote the high expression of
these ONCO in tumor cell line. Furthermore, by comparing the histone
modifications of oncogenes, we also found other important genes, such
as MALT1, CHD4, MYC, MDM2, MET, ETV4, and MAP2K1 (Supplementary
appendix file: [106]Fig. A1, [107]Table A4).
Fig. 4.
[108]Fig. 4
[109]Open in a new tab
The distribution of histone modification signals in TH-ONCO for two
kinds of cell lines. In each figure, the same histone modifications are
represented in the same color that the box on the left represents the
histone modifications in tumor cell line (T) and the box on the right
represents the histone modifications in normal cell line (N). The line
represents the relative difference
[MATH: ST¯
-SN¯/SN¯
+10-2
:MATH]
between the same histone modifications. Abbreviations: TH-ONCO: the
highly expressed oncogenes in tumor cell line. HMS: the histone
modification signals.
Similarly, for the TL-TSG, we found that the signals of H3K9ac,
H3K27ac, and H3K4me2 for ROBO2, and H3K9ac, H3K27ac, H2AFZ, and H3K4me2
for CREB3L1 are generally stronger in normal cell line than that in
tumor cell line (Supplementary appendix file: [110]Fig. A2, [111]Table
A4). For the NH-TSG, the signals of H3K9ac, H3K4me3, H2AFZ, H3K27ac,
and H3K4me2 for CREB3L1, and H3K9ac, H3K36me3, H3K79me2, and H3K27ac
for CDH1, as well as H3K9ac, H3K4me3, H3K27ac, and H3K4me2 for BTG1 are
stronger in normal cell line than that in tumor cell line
(Supplementary appendix file: [112]Fig. A3, [113]Table A4).
3.6. Identifying the location of histone modifications in key genes related
to HCC
Through our above analysis, we found 17 key genes (13 oncogenes are
MYD88, MYC, ARHGAP5, PLCG1, MDM2, MET, ETV4, HLF, ERBB3, MAP2K1, MALT1,
CCND1, and CHD4, 4 tumor suppressor genes are CREB3L1, ROBO2, BTG1, and
ACKR3) related to important histone modifications regulating gene
expression in HCC. Afterward, we got the distributions of histone
modifications by calculating the histone modification signals in the
promoter regions for each gene. By comparing with normal cell line, we
identified the key regions of some important histone modifications in
genes that are associated with HCC. The results indicate that the
increase of the H3K4me3, H3K27ac, H3K9ac, etc. histone modification
signals in oncogenes may be the main reason that leads to the
development of HCC. The signals of H3K4me3 in some regions of oncogenes
(MYD88, MYC, ARHGAP5, PLCG1, MDM2, MET, HLF, ERBB3, MAP2K1, and MALT1)
are significantly stronger.
For key ONCO, the main histone modifications and their regions that
have changed in tumor cell line were summarized as the following
example ([114]Table 3A).
* (1)
The H3K4me3 signals are markedly intensified in the regions (−2000,
−1300) bp and (−1000, 1000) bp of MYD88, the region (−1000, 1500)
bp of PLCG1, and the region (−1000, 2000) bp of ERBB3, etc.
Particularly, in the regions (−1500, −500) bp and (0, 1000) bp of
ARHGAP5, the H3K4me3 signals increase 150 times in tumor cell line
than that in normal cell line. The H3K4me3 signals of these regions
in the related genes may result in the HCC.
* (2)
There are also obvious increases for H3K27ac signals in the region
(0, 2000) bp of ERBB3, regions (−1700, −1000) bp and (0, 500) bp of
ETV4, regions (−1000, −300) bp, (0, 500) bp and (500, 1500) bp of
MDM2, region (0, 2000) bp of HLF. Especially, the signals increase
100 times in the region (0, 1500) bp of ARHGAP5.
* (3)
H3K9ac signals in the region (0, 1500) bp of ARHGAP5, regions
(−500, −200) bp and (0, 1500) bp of PLCG1, regions (300, 800) bp
and (1000, 2000) bp of ERBB3 and region (0, 2000) bp of HLF also
increase.
* (4)
H3K4me2 signals in the regions (−1500, −500) bp and (1000, 2000) bp
of ARHGAP5 are also stronger. It can be seen that these histone
modifications in these genes are at least 50 times greater in tumor
cell line than that in normal cell line. This indicates that the
increase of H3K4me2 in these genes may significantly affect the
occurrence of HCC.
Table 3.
Genes and important distribution regions of histone modification
signals that related to HCC. (A) is the oncogenes, (B) is the tumor
suppressor genes.
(A)
GENE (ONCO) H3K4me3
__________________________________________________________________
H3K9ac
__________________________________________________________________
H3K27ac
__________________________________________________________________
H3K4me2
__________________________________________________________________
H3K79me2
__________________________________________________________________
DISTRIBUTION DHMR (bp) Δ DISTRIBUTION DHMR (bp) Δ DISTRIBUTION DHMR
(bp) Δ DISTRIBUTION DHMR (bp) Δ DISTRIBUTION DHMR (bp) Δ
MYD88 graphic file with name fx1.gif (−2000, −1300)
(−1000, 1000) ↑ graphic file with name fx2.gif (−2000, −1500)
(−1000, 0) ↑
MYC graphic file with name fx3.gif (−1300, −700)
(−500, 2000) ↑ graphic file with name fx4.gif (0, 1500) ↑ graphic file
with name fx5.gif (−1800, −800)
(−800, −100)
(200, 1500) ↑
ARHGAP5 graphic file with name fx6.gif (−1500, −500)
(0, 1000) ↑ graphic file with name fx7.gif (0, 1500) ↑ graphic file
with name fx8.gif (−1000, −500)
(0, 1500) ↑ graphic file with name fx9.gif (−1500, −500)
(1000, 2000) ↑ graphic file with name fx10.gif (500, 2000) ↑
PLCG1 graphic file with name fx11.gif (−1000, 1500) ↑ graphic file with
name fx12.gif (−500, −200)
(0, 1500) ↑ graphic file with name fx13.gif (0, 1200) ↑ graphic file
with name fx14.gif (−1500, −500) ↑
MDM2 graphic file with name fx15.gif (−1000, −300)
(0, 500)
(500, 1000) ↑ graphic file with name fx16.gif (−1000, −300)
(0, 500)
(500, 1500) ↑
MET graphic file with name fx17.gif (−800, 800) ↑ graphic file with
name fx18.gif (−500, 0)
(0, 1200) ↑
ETV4 graphic file with name fx19.gif (−1700, −1200)
(1000, 2000) ↑ graphic file with name fx20.gif (1000, 2000) ↑ graphic
file with name fx21.gif (−1700, −1000)
(0, 500)
(1000, 2000) ↑
HLF graphic file with name fx22.gif (0, 2000) ↑ graphic file with name
fx23.gif (0, 2000) ↑ graphic file with name fx24.gif (0, 2000) ↑
graphic file with name fx25.gif (0, 1000)
(1200, 2000) ↑
ERBB3 graphic file with name fx26.gif (−1000, 2000) ↑ graphic file with
name fx27.gif (300, 800)
(1000, 2000) ↑ graphic file with name fx28.gif (0, 2000) ↑
MAP2K1 graphic file with name fx29.gif (0, 1500) ↑ graphic file with
name fx30.gif (−700, −300)
(800, 1500) ↑ graphic file with name fx31.gif (500, 1500) ↑ graphic
file with name fx32.gif (700, 2000) ↑
MALT1 graphic file with name fx33.gif (−800, 1000) ↑ graphic file with
name fx34.gif (200, 1500) ↑ graphic file with name fx35.gif (−2000, 0)
(0, 500)
(700, 1100) ↑ graphic file with name fx36.gif (500, 2000) ↑
CCND1 graphic file with name fx37.gif (−2000, −1500)
(−1000, −2000) ↑ graphic file with name fx38.gif (0, 500) ↑ graphic
file with name fx39.gif (0, 800) ↑
CHD4 graphic file with name fx40.gif (−2000, −1500) ↑ graphic file with
name fx41.gif (−2000, −1400) ↑ graphic file with name fx42.gif (−2000,
−1400) ↑
(B)
GENE (TSG) H3K4me3
__________________________________________________________________
H3K9ac
__________________________________________________________________
H3K27ac
__________________________________________________________________
H3K4me2
__________________________________________________________________
H3K27me3
DISTRIBUTION DHMR (bp) Δ DISTRIBUTION DHMR (bp) Δ DISTRIBUTION DHMR
(bp) Δ DISTRIBUTION DHMR (bp) Δ DISTRIBUTION DHMR (bp) Δ
CREB3L1 graphic file with name fx43.gif (0, 1500) ↓ graphic file with
name fx44.gif (−1500, −300)
(300, 2000) ↓ graphic file with name fx45.gif (−500, 2000) ↓ graphic
file with name fx46.gif (−1800, −500)
(−300, 2000) ↑
ROBO2 graphic file with name fx47.gif (−1500, 500) ↓ graphic file with
name fx48.gif (−1500, 500) ↓ graphic file with name fx49.gif (−1500,
2000) ↓ graphic file with name fx50.gif (−1500, −500)
(0, 500) ↑
BTG1 graphic file with name fx51.gif (−2000, −500)
(700, 2000) ↓ graphic file with name fx52.gif (1000, 2000) ↓ graphic
file with name fx53.gif (−1000, 2000) ↓
ACKR3 graphic file with name fx54.gif (−2000, −800)
(0, 2000) ↓ graphic file with name fx55.gif (−2000, 1300) ↓ graphic
file with name fx56.gif (−1500, 500)
(500, 2000) ↓ graphic file with name fx57.gif (−1500, 2000) ↑
[115]Open in a new tab
In the distribution maps of histone modification signals, the blue
represents the distribution of histone modification signals in tumor
cell line and the red represents the distribution of histone
modification signals in normal cell line. The Δ represents the change
of histone modification signals in tumor cell line relative to normal
cell line, the ↑ represents increasing of histone modification signals,
the ↓ represents weakening of histone modification signals.
Abbreviations: HM: histone modification. DHMR: differentially histone
modification regions in tumor cell line.
For TSG ([116]Table 3B), we found that the signal strength of H3K9ac,
H3K4me2, and H3K27ac are weaker in tumor cell line than normal cell
line, such as H3K9ac in the region (0, 1500) bp of CREB3L1. In
contrast, the signal strength of H3K27me3 increases in tumor cell line
relative to normal cell line. It indicates that the increase of
H3K27me3 and the decrease of H3K9ac, H3K4me2, and H3K27ac in tumor cell
line inhibit the expression of TSG.
3.7. Functional enrichment analysis of ONCO (TSG) for DHLEG
To further analyze the action of these important genes in the
development of HCC, we performed GO and KEGG pathway enrichment
analysis using Metascape [117][31] ([118]http://metascape.org/)
([119]Fig. 5). The Pathways in cancer (hsa05200) is rich in many
significant genes (CCND1, MAP2K1, MAPK1, MDM2, MET, MYC, PLCG1, CDH1,
etc.). These analyses also reveal that these key genes are related to
transcriptional misregulation in cancer (hsa05202), the negative
regulation of cell differentiation (GO: 0045596), mesenchyme
development (GO: 0060485), the regulation of DNA binding transcription
factor activity (GO: 0051090), cell surface receptor signaling pathway
involved in cell–cell signaling (GO: 1905114), and the maintenance of
DNA repeat elements (GO: 0043570) ([120]Fig. 5A). GO enrichment cluster
analysis shows that these genes are indeed related to cancer, such as
the cluster in PI3K-Akt signaling pathway, Pathways in cancer, MAPK
family signaling cascades, epithelial cell proliferation and
differentiation, the positive regulation of cell death, etc. ([121]Fig.
5B). We also found some links between these genes ([122]Fig. 5C). All
these enrichment results indicate that these important genes do have an
impact on the occurrence of cancer.
Fig. 5.
[123]Fig. 5
[124]Open in a new tab
Functional enrichment analysis of key genes. (A) Heat maps showing
enrichment of key genes. Length of bars represent
[MATH: log10P
:MATH]
based on the best-scoring term within each cluster, the color key from
shallow to deep indicates high to low P values, respectively. (B)
Enrichment network of these genes. Each term is indicated by a circular
node. The number of these genes falling into that term is represented
by the circle size and the nodes of the same color belonged to the same
cluster. (C) Circos plots of these key genes. The related genes are
linked by lines.
3.8. Survival analysis of key genes
To further validate the prognostic value of the key genes we found for
HCC, we performed survival analysis using the clinical data and
corresponding expression data of 364 LIHC patients from the TCGA
database (including the HCC-related genes CDK4 and AKT1 that have been
found [125][32], [126][33], [127][34], [128][35]. First, the effect of
high and low expression of these genes on patient survival was
verified. It was found that the survival rate in high expression group
is lower than that in low expression group for ONCO (MYD88, ARHGAP5,
PLCG1, ETV4, ERBB3, MAP2K1, CHD4, CDK4, AKT1) of DHLEG, and the
survival rate in low expression group is lower than that in high
expression group for TSG (ROBO2, BTG1) of DHLEG, but the difference is
not significant (Supplementary appendix file: [129]Fig. A4). It may be
because the expression of a single gene is not sufficient to
significantly change the survival of the patient.
Therefore, we further performed multiple Cox regression analyses
(log-rank test,
[MATH: P=6×10-5 :MATH]
) on the expression of these genes ([130]Fig. 6A), finally constructed
a prognostic model related to the expression of six genes, as shown
below:
[MATH: RS=0.5904
mn>×elARHGAP5
+0.1713<
mo>×elET
mi>V4-0.<
/mo>5468×elMAP2K1-0.2910
mn>×elBTG1
+0.<
/mo>2505×elACKR3+0.6266×elC
DK4 :MATH]
Fig. 6.
[131]Fig. 6
[132]Open in a new tab
Survival analysis of key genes. (A) Forest plot with multiple Cox
analysis results. The horizontal line corresponds to the 95% confidence
interval and the vertical line indicates a C-index of 1. (B)
Kaplan-Meier curve for high-risk and low-risk groups. (C) The ROC curve
corresponding to the risk scoring model.
Among them, ARHGAP5, ETV4, ACKR3, and CDK4 are protection factors,
MAP2K1 and BTG1 are risk factors. The risk scores of all patients were
calculated by using the risk scoring model. According to the median
risk score of 1.3093, the patients were divided into high-risk and
low-risk groups. Kaplan-Meier survival analysis shows that patients in
the high-risk group has significantly lower survival rates than
patients in the low-risk group (P = 0.0001) ([133]Fig. 6B). The ROC
curve was used to evaluate the risk scoring model (AUC = 0.74), which
proves that the risk scoring model has a good predictive ability for
patient survival ([134]Fig. 6C).
In addition, we performed the survival analysis of patients in
different groups at high-risk and low-risk based on age, gender, tumor
stage, T, N, and M stages, combined with clinical pathological factors.
It was found that the high-risk and low-risk groups divided by the risk
scoring model in different age groups (age
[MATH: ≤ :MATH]
54, age
[MATH: > :MATH]
54), male, different pathological stages (I/II, III/IV), different T
stages (T1/T2, T3/T4), M0, N0 stages, the survival rate of patients in
the high-risk group is significantly lower than those in the low-risk
group ([135]Fig. 7). The results show that the predictive ability of
our risk scoring model has nothing to do with other pathological
factors, is a better independent predictor, and can well predict the
survival rate of patients.
Fig. 7.
[136]Fig. 7
[137]Open in a new tab
Kaplan–Meier curve to assess the independent prognostic performance of
risk scoring models in different categories of patients.
4. Discussion
In this study, we aimed at revealing the key genes that are related to
important histone modifications as well as the vital regions of these
histone modifications that resulted in HCC. We analyzed the histone
modification patterns of DHLEG and found that the H3K4me3, H3K27ac, and
H3K9ac are activating modifications, while H3K27me3 is an inhibitory
modification. By analyzing the ONCO and TSG of DHLEG, we found that
some key genes may be important for the occurrence of HCC, that
including MYC, CCND1, CHD4, MAP2K1, MDM2, MET, AKT1, CDK4, ARHGAP5,
ETV4, ERBB3, HLF, MALT1, MYD88, and PLCG1. Besides, in further
researches of the distribution patterns of histone modifications within
these ONCO and TSG, it was observed that H3K4me3, H3K27ac, H3K9ac, and
H3K27me3 play important roles for oncogenes expression. Finally, the
key genes were analyzed for survival, and a prognostic risk scoring
model related to the expression of six genes was established, which can
better predict the survival rate of patients.
In fact, some of the genes (MYC, CCND1, CHD4, MDM2, and MET) were found
in our research that have been reported in previous studies. For
example, MYC is a transcription factor, regulates many programs which
are related to the occurrence of cancer [138][36], [139][37],
[140][38], [141][39]. MYC and CCND1 are also canonical Wnt targets,
often occur overexpression and genomic amplification in HCC [142][10],
[143][40], [144][41], [145][42], [146][43], [147][44], [148][45].
Studies have also reported CHD4 is a good target for the eradication of
HCC [149][46]. MDM2 has been shown to function in HCC [150][47]. MET
frequently undergoes copy number variation and has been identified as
biomarkers in HCC [151][48], [152][49], [153][50], [154][51],
[155][52]. These researches not only have proved that the key genes
obtained in our study are associated with HCC, but also have
demonstrated that our results are credible.
In addition, we also found some important genes for HCC that hardly
reported in previous studies, such as ARHGAP5, ETV4, ERBB3, MAP2K1,
HLF, MALT1, MYD88, and PLCG1. Although there is no evidence that ERBB3
is carcinogenic, it has found that ERBB2 is associated with this gene,
which has genomic amplification in breast cancer [156][53]. Moreover,
the major histone co-modification patterns of the genes were also been
found in both cell lines. Some important histone modifications
(H3K4me3, H3K27ac, H3K9ac, and H3K4me2) and their modification regions
in each gene were identified. Our results will help study the
pathogenesis of HCC and discover new biomarkers as well as using
epigenetic modifications as novel target drugs.
5. Conclusion
In this study, we analyzed the changes of histone modifications in two
kinds of cell lines and the effects of histone modifications on the
levels of gene expression from the perspective of bioinformatics. The
important histone modifications and 17 key genes were identified, the
changes of regions for histone modifications in these key genes were
located, the survival analysis was used to further verify the impact of
key genes on prognosis. These results may help identify histone
modifications as biomarkers and find more potential therapeutic targets
for HCC.
Author contributions
Yu-Xian Liu designed the study, collected data, analyzed data, and
wrote the article. Qian-Zhong Li conceived the idea and was involved in
the study, discussion, writing, and revision of the whole article.
Yan-Ni Cao collected part of the data, discussed part of the results
and revised the whole article. Lu-Qiang Zhang discussed part of the
results and revised the article. They all approved final version of the
manuscript.
Financial support
This work was supported by the National Natural Science Foundation of
China [grant nos. 31870838 and 61861035].
Declaration of Competing Interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.
Acknowledgements