Abstract
The autophagy cell, which can inhibit the formation of tumor in the
early stage and can promote the development of tumor in the late stage,
plays an important role in the development of tumor. Therefore, it has
potential significance to explore the influence of autophagy-related
genes (AAGs) on the prognosis of hepatocellular carcinoma (HCC). The
differentially expressed AAGs are selected from HCC gene expression
profile data and clinical data downloaded from the TCGA database, and
human autophagy database (HADB). The role of AAGs in HCC is elucidated
by GO functional annotation and KEGG pathway enrichment analysis.
Combining with clinical data, we selected age, gender, grade, stage, T
state, M state, and N state as Cox model indexes to construct the
multivariate Cox model and survival curve of Kaplan Meier (KM) was
drawn to estimate patients’ survival between high- and low-risk groups.
Through an ROC curve drawn by univariate and multivariate Cox
regression analysis, we found that seven genes with high expression
levels, including HSP90AB1, SQSTM1, RHEB, HDAC1, ATIC, HSPB8, and BIRC5
were associated with poor prognosis of HCC patients. Then the ICGC
database is used to verify the reliability and robustness of the model.
Therefore, the prognosis model of HCC constructed by autophagy genes
might effectively predict the overall survival rate and help to find
the best personalized targeted therapy of patients with HCC, which can
provide better prognosis for patients.
Keywords: hepatocellular carcinoma, autophagy gene, survival rate,
survival model, TCGA database, ICGC database
Introduction
Hepatocellular carcinoma (HCC) is the most common primary liver cancer
accounting for more than 95%. The main cause of HCC is chronic
infection of hepatitis B virus (HBV) and hepatitis C virus (HCV)
([27]Orcutt and Anaya, 2018). Because of the poor prognosis for HCC,
the mortality of HCC has been increasing in the past few decades. The
3-year survival rate is only 12.7%, and the median survival time is
only 9 months ([28]Kim et al., 2017). Therefore, it is very important
to explore the early diagnosis and accurate prognosis of liver cancer.
At present, autophagy has been proved to be associated with a variety
of human diseases from neurodegenerative diseases to cancer, including
HCC. Autophagy is a process in which damaged, denatured, or aged
proteins and organelles are transported to lysosomes for digestion and
degradation ([29]Janku et al., 2011). Autophagy can prevent the
accumulation of damaged proteins and organelles and inhibit the
carcinogenesis of cells. However, once a tumor is formed, autophagy
provides more abundant nutrition for cancer cells and promotes the
growth of tumors ([30]Mizushima, 2007). Acute spinal cord injury (ASCI)
is considered to be a serious damage to the central nervous system. At
present, the research in spinal surgery and neurology has highlighted
this complexity, in which autophagy is considered to play a crucial
role. Xu et al. used computational tools and published data to identify
genes and molecular pathways related to ASCI and autophagy, and to
identify drugs targeting genes related to ASCI and autophagy. A total
of 20 potential genes, 6 important pathways, and 28 drug candidates
were identified, which laid a foundation for the development of new
clinical trials and new targeted therapies ([31]Xu and Zhao, 2021).
[32]Xu et al. (2020) used a machine-learning method to analyze a large
number of colon cancer patients’ data, constructed a prognosis
evaluation model of colon cancer patients based on autophagy genes, and
identified eight autophagy genes (CTSD, ULK3, CDKN2A, NRG1, ATG4B,
ULK1, DAPK1, and SERPINA1) as key prognostic genes. Eight genes were
verified to be closely related to the autophagy process of
tumorigenesis and development. [33]Wang et al. (2020) and [34]Zhang et
al. (2020) also explored the potential value of autophagy-related genes
(AAGs) in lung adenocarcinoma through the autophagy gene model. Some
prognosis-related autophagy genes were identified to be associated with
Lung adenocarcinoma (LUAD), which provided important value for the
prognosis of LUAD. In the research and treatment of laryngeal cancer,
autophagy has also attracted much attention. [35]Luo et al. (2020)
established an autophagy-related model to predict the prognosis of
patients with laryngeal cancer. Four key genes (IKBKB, ST13, TSC2, and
map2k7) were screened out through the model, and the comprehensive
analysis of ARGs expression profile and clinical results finally
determined that there was a significant correlation with overall
survival of laryngeal cancer. However, due to the existence of single
cohort false positive rate, the results are not reliable. Therefore, in
this study, we combined autophagy genes with the gene data set of HCC
patients to identify the biomarkers of HCC.
Preclinical models and clinical trials have confirmed that the complex
interaction between autophagy and apoptosis determines the degree of
hepatocyte apoptosis and the progress of liver diseases ([36]David,
2012). Autophagy has a great influence on liver diseases, such as
alcoholic fatty liver disease, hepatitis, and fatty liver. Therefore,
further study on the relationship between autophagy and tumor, and the
construction of prognosis model, may be of great value for tumor
treatment strategy.
In this paper, we first combined a complete set of human autophagy
genes with the expression profile data of HCC and mined the autophagy
genes related to prognosis. In order to explore the potential value of
autophagy in HCC, the prognosis risk model of autophagy genes was
constructed, and the risk value of patients was calculated. In the
experiment, 62 differentially expressed autophagy genes were selected
from HCC samples in the TCGA database. The autophagy genes combined
with survival time to determine whether the differentially expressed
autophagy genes in HCC patients are related to the overall survival
time. Then the prognosis model was established, and the accuracy of the
model was determined by Kaplan Meier (KM) curve and Receiver Operating
Characteristic (ROC) curve. In addition, we downloaded HCC data from
the ICGC database as compared group to verify the reliability and
robustness of autophagy gene prognosis model.
Data and Methods
Data Source
Two hundred thirty-three human autophagy genes are downloaded from
human autophagy gene database^[37]1, and HCC sample data including 50
normal samples and 374 cancer samples are downloaded from TCGA
database^[38]2. Human genome annotation is downloaded from ENSEMBL
databas^[39]3. Then another HCC sample data is downloaded from ICGC
database^[40]4, which contains 202 normal samples and 243 cancer
samples.
Select Differentially Expressed Autophagy Genes
Two hundred thirty-three autophagy genes are screened from the
experimental group tumor data downloaded from TCGA database, and the
control group data is downloaded from the ICGC database.
The logarithm of log 2 between tumor samples and normal samples was
taken to normalize the gene expression values. When the screening
threshold was set as | log (foldchange) | > 1 and False Discovery Rate
(FDR) < 0.05, the p-value was corrected by FDR. The smaller the
p-value, the greater the difference of genes. Wilcoxon signed rank test
was used to identify autophagy genes differentially expressed between
tumor samples and normal samples. Finally, 62 AAGs were identified in
the experimental group and 28 autophagy related genes in the control
group.
Construct Prognosis-Model and Verify Accuracy
Cox regression model is a semi-parametric regression model proposed by
the British statistician D.R. Cox. The Cox model takes survival outcome
and survival time as dependent variables, which can simultaneously
analyze the influence of many factors on survival time. The basic form
of the model is as follows:
[MATH: h(t,X)=h0<
mrow>(t)exp(β1X1+β2X2+…+βm
msub>Xm) :MATH]
(1)
β[1,] β[2],……, β[m] are the partial regression coefficient of the
independent variable, which is the parameter to be estimated from the
sample data; h[0](t) is the baseline risk rate of h(t, X) when x vector
is 0, which is the quantity to be estimated from the sample data.
Therefore, in this study, we used univariate and multivariate Cox
regression analysis to study the correlation between overall survival
and gene expression level. We used univariate Cox regression analysis
to determine overall survival-related genes and narrow the range of HCC
marker genes. Then we used the multivariate Cox model to construct a
prognostic-related model of HCC. Seven factors including age, gender,
cell grade, cell stage, T status, M status, and N status are selected
as indicators of the Cox model, which is defined as follows:
[MATH:
riskSco
re=∑i=1n
xiyi :MATH]
(2)
Where x[i] represents the expression of gene I, y[i] represents the Cox
regression coefficient of gene I, and n represents the number of
independent indicators. According to the formula of multivariate Cox
model, we can calculate the median value of risk value with each
patient, which can be used to divide patients into high-risk group and
low-risk group. The KM curve is drawn to judge whether it exists
difference in survival between two groups. Then we draw the multi-index
characteristic curve (ROC), and the area under the curve indicates
whether the model is meaningful.
Enrichment Analysis
Go enrichment analysis can be divided into three parts: biological
process (BP), molecular function (MC), and cell component (CC). BP is a
process that is composed of orderly MCs and has many steps. MC
describes the activity of an individual molecular biological. What kind
of gene or organelle is located in CC.
In order to study the tumor molecular mechanism in AAGs, R packages
with DOSE, CLUSTERPROFILER, ENRICHLOT, GGPLOT2, and ORG.HS.EG.DB is
used to select the differential genes between the experimental group
and the control group. Then we calculate the hypergeometric
distribution relationship between differential genes and a specific
branch in GO classification. P-value is obtained by Go analysis for
each differential gene, and the smaller the p-value, the more
significant the differential gene enrichment (P < 0.05).
The above-mentioned R-package is also used to process the two groups of
data in KEGG pathway analysis. The differential genes are selected
based on P < 0.05 and are regarded as a whole network with expression
information process.
Data Processing
In this paper, all data processing and chart analysis are based on R
language (R 4.0.2) and Perl language. The limma package is used to
obtain the mean value of genes with multiple rows, and the Wilcox test
is utilized to take differentially expressed autophagy genes. Log2 is
used to normalize the gene expression data, and the genes with P < 0.05
are selected as significant difference genes. In order to represent the
survival difference between the two groups, the KM curve is drawn, and
the area under the ROC curve is used to evaluate the performance of the
model.
Results
The experimental flow chart of this paper is shown in [41]Figure 1, and
we will discuss each part in detail later.
FIGURE 1.
[42]FIGURE 1
[43]Open in a new tab
The flow chart of the experiment.
Differential Expression of Autophagy-Related Genes
We process the data with 424 cases in the experimental group and 445
cases in the control group, respectively. In the experimental group, 50
cases are normal samples, and 374 cases are cancer samples. There were
202 normal samples, and 243 tumor samples in the control group. The
Wilcox test is used to select differentially expressed autophagy genes.
The screening threshold of the differential expression with 233
autophagy genes is set as | log(foldchange) | > 1 and fdr < 0.05 in the
experiment group. The threshold of upregulated genes is set as log
(foldchange) > 0; otherwise, it is downregulated genes. Finally, 62
autophagy-related genes that contained 4 downregulated genes and 58
upregulated genes are identified. In the control group, 28
autophagy-related genes are identified in which 2 genes are
downregulated and 26 genes are upregulated.
The Pyrogram and volcano figure of 62 autophagy-related genes in normal
and tumor samples are drawn using pheatmap R package, as shown in
[44]Figure 2. In order to show the gene expression of autophagy
differential genes in normal samples and tumor samples more directly,
the box diagram of autophagy-related genes is also drawn using ggpubr R
package, as shown in [45]Figure 3.
FIGURE 2.
[46]FIGURE 2
[47]Open in a new tab
Differential expression of autophagy-related genes between normal and
tumor samples in HCC. (A) Volcano map indicates the differential
expression of 232 autophagy genes. The abscissa of volcano map is logFC
and the ordinate is -log10(fdr). Black dots represent genes that have
no difference between normal samples and tumor samples, while green
dots and red dots indicate low and high gene expression in tumor
samples, respectively. (B) Thermography intuitively shows the
hierarchical clustering of differentially expressed autophagy-related
genes in normal and tumor samples. The abscissa of the thermogram is
the sample and the ordinate is the differential autophagy-related
genes, the blue is the normal sample, and the pink is the tumor sample.
Green, black, and red represent the level of gene expression.
FIGURE 3.
[48]FIGURE 3
[49]Open in a new tab
The abscissa of the box diagram represents the differential genes, and
the ordinate represents the expression level of the genes. Green line
represents normal samples, and red line represents tumor samples.
Go Enrichment and KEGG Pathway Analysis
In order to explore the molecular mechanism of autophagy-related genes
in HCC, we performed go enrichment analysis and KEGG pathway analysis
on sample data.
In go enrichment analysis and KEGG pathway analysis, the filter
condition is set as p. adjusted <0.05. If the pathway with p. adjusted
does not exist, p < 0.05 can also be used as the selected condition. We
select the top 10 process of three modules in go enrichment analysis.
In BP, they are autophagy, process utilizing autophagy mechanism,
macroautophagy, regulation of autophagy, regulation of macroautophagy,
neuron death, regulation of apoptotic signaling pathway, autophagosome
assembly, autophagosome organization, and intrinsic apoptotic signaling
pathway. The top 10 process of BP is basically related to the activity
of autophagy. The top 10 process of CC is related to the activity of
organelles but also includes autophagosome. The top 10 process of MF is
basically the activity of individual molecular biology, such as
activated protein kinase regulator activity and cysteine-type
endopeptidase activity. They are shown as in [50]Figure 4A.
FIGURE 4.
[51]FIGURE 4
[52]Open in a new tab
Go enrichment analysis and KEGG pathway analysis. (A) Go enrichment
analysis histogram. From blue to red, the higher the value of P. adjust
and the longer the length, the more genes are enriched in the pathway.
(B) Cycle diagram of KEGG pathway analysis. The inner circle represents
the Z-score value, and the larger the value, the redder the color. The
outer ring represents that pathway contains the most upregulated and
downregulated genes. The red dot indicates the upregulated gene and the
blue dot indicates the downregulated gene. The right figure lists KEGG
pathway.
In KEGG pathway analysis, the top 10 pathways with the most enriched
genes are selected according to the count number of genes in the
pathway, as shown in [53]Figure 4B.
Construct Prognosis-Model of Autophagy-Related Gene
We use all autophagy gene expression levels as input for univariate Cox
regression analysis, and finally find that 62 AAGs are associated with
the survival rate of HCC patients. Among them, 32 AAGs are selected as
risk factors (p < 0.05; HRs, 1.003115658-1.041319005), and their
overexpression may reduce the survival rate of patients.
By multivariate Cox regression analysis, 14 AAGs are associated with
HCC patients as shown in [54]Table 1. When the p-value of the gene is
greater than 0.05 in multivariate analysis, we retain the gene as
complement to other genes. The risk value of each gene can be
calculated based on 14 AAGs with Formula (2).
TABLE 1.
Multivariate Cox regression analysis of survival rate in patients with
HCC.
Genes Coef HR HR.95L HR.95H p-value
SPNS1 1.617445499 5.040198664 2.08995838 12.15507582 0.00031676
ATIC 0.664084569 1.942711289 1.307840069 2.88577116 0.001004542
MLST8 –0.760676142 0.467350325 0.296519154 0.736601071 0.001049409
RHEB 0.530714439 1.700146525 1.152867022 2.50722603 0.00741324
HDAC1 0.465738387 1.593190146 1.102562075 2.302142345 0.013144435
HSPB8 0.153099368 1.165440781 1.023859491 1.326600207 0.020515825
SQSTM1 0.230890229 1.259720952 1.029756152 1.541041413 0.024763062
HSP90AB1 –0.288939881 0.749057236 0.564635872 0.993714306 0.045103224
CASP8 –0.469959004 0.625027892 0.379278762 1.030007225 0.065189281
BIRC5 0.184584243 1.202718297 0.985307028 1.468102086 0.06960683
HGS –0.356212259 0.700323954 0.475715593 1.030980795 0.07102271
FOXO1 –0.256847744 0.773485973 0.56725571 1.054692864 0.104503719
RGS19 –0.254924427 0.774975063 0.559873529 1.072717885 0.124343514
ATG10 0.396133246 1.486067317 0.868989001 2.541339495 0.147893068
[55]Open in a new tab
Verify Model Performance
In order to verify the accuracy of the model, we draw KM survival
curves of HCC experimental group and control group, respectively. When
the value of p is equal to 1.987e-08 in the experimental group, the
3-year survival rate is 46.2% in the high-risk group and 73.6% in the
low-risk group.
In the control group, the 3-year survival rate is 66.7% in the
high-risk group and 93.3% in the low-risk group with p equal to
2.605e-07. It can be seen that the survival rate of low-risk patients
is always higher than that of high-risk groups, whether in the
experimental group or the control group, as shown in [56]Figures 5A,B.
FIGURE 5.
[57]FIGURE 5
[58]Open in a new tab
Survival analysis of HCC patients, red line represents high-risk group,
blue line represents low-risk group. (A) KM overall survival curve of
the experimental group in TCGA database shows significantly difference
between high-risk group and low-risk group. (B) Based on the ICGC
database, the KM survival curve of the control group.
We rank the risk values of HCC patients from low to high, and draw a
scatter diagram to analyze the survival characteristics of patients.
From the scatter chart of the experimental group and the control group
in [59]Figures 6A–F, we can see that the mortality rate increases with
the enlargement of the risk value. In the thermogram of the
experimental group, the autophagy gene HSP90AB1 is highly expressed in
the high and low risk groups, and the expressed values of the autophagy
genes SQSTM1, RHEB, HDAC1, ATIC, HSPB8, and BIRC5 increase with the
enlargement of risk value, which are all regard as high-risk genes of
HCC. Because the data of the control group is less than that of the
experimental group, four high-risk autophagy genes, namely ATIC, BIRC5,
MAPK3, NPC1, are finally displayed in the thermogram.
FIGURE 6.
[60]FIGURE 6
[61]Open in a new tab
(A) Risk curve of HCC patients in experimental group based on TCGA
database. Green line represents low risk and red line represents high
risk. The dotted line is the critical value to distinguish between high
and low risks. (B) The risk curve of HCC patients in control group
based on ICGC database. (C) Based on TCGA database, the scatter plot of
HCC patients in experimental group. The green dot represents survival,
and the red dot represents mortality. With the increase of patient risk
value, the number of deaths also increases. (D) Based on ICGC database,
the scatter plot of HCC patients with different survival status in the
control group. (E) Based on TCGA database, the calorimetry of gene
expression for HCC patients with different risk values in experimental
group. Red represents high-risk gene expression. (F) Based on the ICGC
database, the calorimetry of gene expression of HCC patients with
different risk values in the control group.
In addition, we also construct a multivariate ROC curve to evaluate the
accuracy of the model. According to [62]Figures 7A,B, the area of ROC
risk curve in experimental group based on TCGA database is 0.766, which
proves that the performance of prognosis index based on autophagy
related genes is better than that of other clinical traits. The area of
ROC risk curve is 0.760, which is better than any other index except
stage. On the one hand, for [63]Figure 7A, the ROC curve of T state is
good, but for [64]Figure 7B, T state has been filtered because the ROC
curve of T state is too bad. Meanwhile, the ROC curve of stage in
[65]Figure 7A is general, and the stage ROC curve in [66]Figure 7B is
better. This phenomenon indicates that these two indicators in clinical
traits are not stable and have poor robustness. On the other hand, the
risk score has excellent ROC curve performance and good robustness in
both groups, which may be suitable for a variety of databases.
FIGURE 7.
[67]FIGURE 7
[68]Open in a new tab
(A) ROC curve analysis and clinical characteristics analysis of HCC
autophagy-related genes based on TCGA database. The AUC value of gene
prognosis index based on autophagy related genes is higher than that of
other clinical traits. (B) ROC curve analysis and clinical
characteristics analysis of HCC autophagy-related genes based on ICGC
database.
Discussion
HCC is a major cancer in the worldwide. Due to the slow progress of
complex molecular targeted therapy and lack of effective biomarkers for
the prognosis prediction of HCC ([69]Li et al., 2020), we urgently need
effective methods to find biomarkers to provide therapeutic options for
patients with HCC.
Sim et al. proposed that immunotherapy can be for HCC patients.
Meanwhile, immune checkpoint inhibitors have good tolerance in HCC and
have significant clinical benefits. Then the researchers began to
explore the potential synergy of immune checkpoint inhibitors with
local and other systemic therapies. In order to the optimal use of
these drugs, efforts have been made to identify the molecular
biomarkers of treatment response and drug resistance ([70]Sim and Knox,
2018). Sorafenib, a kinase inhibitor, is the only drug approved for the
treatment of advanced HCC ([71]Lovet and Hernandez-Gea, 2014). However,
sorafenib has developed primary or acquired drug resistance, which has
produced obstacles in the treatment of HCC. [72]Sun et al. (2017) found
that sorafenib can induce autophagy in HCC resulting in cell death or
survival, which affects sorafenib treatment. Fibrolamellar HCC is a
kind of primary liver tumor and is relatively rare ([73]Kassahun,
2016). Most cases are diagnosed as the advanced stage for the initial
diagnosis. Surgery (resection/liver transplantation) is the main
treatment and the only potential treatment option. However, due to the
high recurrence rate, the prognosis of patients is also very poor. With
the development of genome sequencing, more and more potential gene
biomarkers have been found and have been discovered to have predictive
value for HCC patients.
In this study, we combined a complete set of human autophagy genes with
HCC for the first time to explore and verify the potential value of
AAGs in HCC. First, we selected 62 differentially expressed AAGs from
50 normal samples and 374 cancer samples downloaded from the TCGA
database. Because these genes may be deeply involved in the initiation
of HCC, we conduct GO and KEGG analysis on these genes. AAGs were
mainly enriched in macroautophagy, intrinsic apoptotic signaling
pathway and regulation of apoptotic signaling pathway by Functional
Enrichment Analysis, which is consistent with previous studies.
Autophagy is a physiological process that depends on lysosomal to
degrade cytoplasmic proteins and organelles, and eliminates the
response of misfolded proteins and damaged organelles to cell stress
([74]Ryter et al., 2013). In the analysis of KEGG pathway, AAGs are
mainly concentrated in apoptosis, p53 signaling pathway, and cellular
senescence. There have been studies on the inhibition of proliferation
and metastasis of gastric cancer ([75]Wei and Wang, 2017), and the
migration and invasion of cervical cancer cells by the p53 signaling
pathway ([76]Liu et al., 2018). Furthermore, univariate and
multivariate Cox regression analysis are used to select a more useful
marker and establish a risk model for predicting the prognosis of HCC.
Finally, we find that seven genes with high expression levels,
including HSP90AB1, SQSTM1, RHEB, HDAC1, ATIC, HSPB8, and BIRC5, are
associated with poor prognosis in HCC patients. We demonstrate that
these seven gene signatures are superior to conventional
clinicopathological factors for HCC patients. Their ability to predict
survival is also demonstrated in the ICGC database. Through the ROC
curve and AUC value, it is shown that the model performs well in both
the experimental group and the control group. Therefore, we believe
that the risk score model based on seven genes might be used to divide
HCC patients into a high-risk group and low-risk group. It can be
useful for detection recurrence or early prevention in high-risk
groups, and it has great significance for the treatment of the disease.
Nowadays, most of the prognostic features for cancer based on
expression profiles have been proposed with the development of
large-scale public databases. For example, [77]Jia et al. (2018) mined
genes with prognostic value in the glioblastoma microenvironment based
on TCGA database. [78]Xiang et al. (2020) discovered that CLDN10, a key
immune-related gene, is associated with lymph node metastasis in
papillary thyroid carcinoma. Due to lack of enough cases in our control
group, there is a quantitative difference between the four
high-expression autophagy genes screened in the control group and the
seven autophagy genes in the experimental group. However, it does not
affect the final prognosis analysis results.
Conclusion
In this study, we deeply explore the function of autophagy genes in HCC
and build a reliable model based on the differential expression of
autophagy genes. However, our model does have some potential
limitations. On the one hand, due to the lack of sufficient cases and
reliable HCC cells in the control group, there will be partial bias
between our control group and the experimental group. On the other
hand, several other important clinical information, such as various
treatments and the number of lymph nodes, is not available now. In
order to provide better prognosis for patients, further prospective
experiments can be used to test the clinical efficacy and help to find
the best personalized targeted therapy in the future.
Data Availability Statement
The original contributions presented in the study are included in the
article/supplementary material, further inquiries can be directed to
the corresponding author/s.
Ethics Statement
Written informed consent was obtained from the individual(s) for the
publication of any potentially identifiable images or data included in
this article.
Author Contributions
SW and DY participated in its design, performed all the molecular
biological analyses of the data, and construct the prognosis model and
analysis data. DY prepared data and made pre-processing with data. WK
helped with data interpretation and manuscript drafting. All authors
read and approved the final manuscript.
Conflict of Interest
The authors declare that the research was conducted in the absence of
any commercial or financial relationships that could be construed as a
potential conflict of interest.
Funding. This work was supported by the National Natural Science
Foundation of China (No. 61803257) and Natural Science Foundation of
Shanghai (No. 18ZR1417200).
^1
[79]http://www.autophagy.lu/
^2
[80]https://portal.gdc.cancer.gov/
^3
[81]http://asia.ensembl.org/index.html
^4
[82]https://dcc.icgc.org/releases/current/Projects
References