Abstract
Hepatitis B Virus (HBV) infection is a major cause of morbidity and
mortality worldwide. However, poor understanding of its pathogenesis
often gives rise to intractable immune escape and prognosis recurrence.
Thus, a valid systematic approach based on big data mining and
genome-wide RNA-seq data is imperative to further investigate the
pathogenetic mechanism and identify biomarkers for drug design. In this
study, systems biology method was applied to trim false positives from
the host/pathogen genetic and epigenetic interaction network (HPI-GEN)
under HBV infection by two-side RNA-seq data. Then, via the principal
network projection (PNP) approach and the annotation of KEGG (Kyoto
Encyclopedia of Genes and Genomes) pathways, significant biomarkers
related to cellular dysfunctions were identified from the core
cross-talk signaling pathways as drug targets. Further, based on the
pre-trained deep learning-based drug-target interaction (DTI) model and
the validated pharmacological properties from databases, i.e., drug
regulation ability, toxicity, and sensitivity, a combination of
promising multi-target drugs was designed as a multiple-molecule drug
to create more possibility for the treatment of HBV infection.
Therefore, with the proposed systems medicine discovery and
repositioning procedure, we not only shed light on the etiologic
mechanism during HBV infection but also efficiently provided a
potential drug combination for therapeutic treatment of Hepatitis B.
Keywords: hepatitis B virus infection, pathogenesis, host/pathogen
interspecies genetic and epigenetic network (HPI-GEN), systems medicine
discovery, drug-target interaction (DTI) model, deep learning,
multiple-molecule drug
1. Introduction
Hepatitis B Virus (HBV) infection remains a prevalent health challenge
worldwide, and is greatly associated with substantial morbidity and
mortality due to serious liver diseases, such as hepatitis, fibrosis,
cirrhosis, and even hepatocellular carcinoma [[28]1]. According to the
latest report of World Health Organization (WHO) [[29]2], HBV affected
an estimated 257 million people and causing 600,000 deaths per year.
While current therapies seek to control the progression of the disease,
life-long treatment and surveillance are still needed because
resistance develops during treatment and reactivation often occurs
after medication discontinuation [[30]3,[31]4]. In recent years,
dedication to new drug design and immunotherapy development has been
made by researchers to hopefully silence HBV in infected hepatocytes
[[32]5,[33]6]. However, the host/pathogen cross-talk mechanism
contributing to the development and persistence of HBV infection
requires further investigation, and its current therapies are still
inadequate. Thus, new treatment options are needed to achieve a better
therapy for HBV-infected patients.
Even so, de novo drug discovery is particularly an expensive,
time-consuming, and high-risk process, let alone exploring a drug to
eradicate HBV from patients, which is generally believed arduous
[[34]4,[35]7,[36]8]. Conventionally, the pipelines of drug development
from laboratory to market take nearly average 13 to 15 years and cost
approximately two to three billion U.S. dollars, in which about 10% of
all drugs that start from pre-clinical trials ever make it to actual
human testing [[37]9,[38]10]. Fortunately, as the advancement of
biological science drives the progress of high-throughput functional
genomics and proteomics technology, novel experimental and
bioinformatics methods have not only accelerated the deciphering of
protein interactions, but also given rise to the construction of
networks for signaling pathways and protein complex identification in
specific diseases [[39]11]. Meanwhile, based on the manifold data types
from wet-lab experiments, several computational approaches have been
suggested to predict biological networks for expediting the modeling of
functional pathways so as to exemplify the molecular mechanisms of
cellular processes. Researchers are thereby being empowered to
profoundly explore the complicated and mysterious mechanisms of
biological systems for better investigating the potential drug targets
[[40]12]. On the other hand, with the development of molecular biology,
drug repurposing, also known as drug repositioning, was proposed to
reduce the time spent and the capital expenditure. The process of drug
repositioning is to find new use outside the scope of the original
medical indication for existing drugs [[41]13]. Its prospect for new
clinical indications is promising, since compared with other drug
development strategies, a repositioned drug could be instantly put into
preclinical testing and clinical trials for its safety and reliability
proven in humans.
Nonetheless, the commonly used correlation networks in bioinformatics
applications are often based on oversimplified algorithms. As inherent
difficulties appear in assessing the impact of microRNA (miRNA), long
non-coding RNA (lncRNA), epigenetic modification, microenvironment
factors, etc., hardly can these networks mirror the complicated
mechanisms of factual biological systems. In addition, while wide doses
of the repositioned drugs can be tested using in vitro and in vivo
experiments, it takes great time and effort. Therefore, to discover
potential drugs for the treatment of HBV-infected patients, the
development of a systematic procedure in computational framework based
on potent models and existing data is indispensable.
In this study, we employed the systems biology technique to identify
the significant biomarkers from the host/pathogen interspecies genetic
and epigenetic interaction network (HPI-GEN) through two-side RNA-seq
data during HBV infection. Subsequently, with the analysis of these
biomarkers by the proposed drug discovery and repositioning procedure,
potential combinations of multi-targets drugs were recommended to the
clinical trials of Hepatitis B therapy. By and large, the process can
be subdivided into a few steps: (1) Construct candidate HPI-GEN
composed of host/pathogen interspecies protein-protein interaction
network (HPI-PPIN) and host/pathogen interspecies gene regulation
network (HPI-GRN) by big data mining. (2) Prune the false positives in
candidate HPI-GEN by system identification analysis with the assistance
of two-side profile RNA-seq data and Akaike Information Criterion (AIC)
to identify the real HPI-GEN between host and pathogen. (3) Extract the
core HPI-GEN from real HPI-GEN via principal network projection (PNP)
approach. (4) Investigate the core signaling pathways related to
manifold cellular function (especially host inflammatory and immune
response, pathological anti-apoptosis, and cell survival) in the core
HPI-GEN by the annotation of KEGG (Kyoto Encyclopedia of Genes and
Genomes) pathways to gain insights into the etiologic mechanism of HBV
infection. (5) Select the significant biomarkers related to
pathological cellular dysfunctions. (6) Predict the candidate drugs
from available drugs for the selected significant biomarkers by the
pre-trained deep learning-based drug-target interaction (DTI) model.
(7) Consider general criteria for drug design specifications including
regulation ability, toxicity, and sensitivity to further sift out the
promising drugs. (8) Combine these agreeable drugs as a potential
multiple-molecule drug for possible therapeutic treatment of Hepatitis
B. It is worthwhile to note that instead of a single drug, a
combination of promising drugs was proposed for the treatment to slow
or even prevent inherent or adaptive drug resistance through the
simultaneous blockade of multiple targets or pathways [[42]14].
Nowadays, researches have been dedicated to drug discovery and design,
while few of them propose an integral end-to-end systematic procedure
from identifying significant biomarkers based on pathogenic mechanism
to recommending promising drugs for one disease. Here, we dig into the
pathogenic characters of viral proteins played on the pathogenesis to
select the potential drug targets from two-side RNA-seq data and
reinterpret the role of available drugs based on a pre-trained deep
learning-based DTI model through big drug-target interaction data,
which gives an alternative way for systems drug discovery and
repositioning to overcome HBV infection.
2. Materials and Methods
2.1. The Construction of the HPI-GEN in HepaRG Cell Line during HBV Infection
and the Application of Deep Learning-Based DTI Model for New Drug Design: An
Overview
To gain new insight into the molecular mechanism of HBV infection,
candidate cross-talk HPI-GEN, which consists of candidate intraspecies,
candidate interspecies PPIN between host and pathogen, and candidate
HPI-GRN encompassing regulatory networks between host gene, pathogen
gene, host miRNA and host lncRNA, was constructed by big data mining
and preprocessing of host/pathogen gene/miRNA/lncRNA expression from
online database.
Given that the information for interspecies PPIN construction is
insufficient, in addition to interactions attained from bio-databases,
data amplification technique through nature-linguistic programming
(NLP) and web scrapping was introduced to expand the dataset. Then, for
identifying real HPI-GEN, a systematic identification approach was
exploited based on the genome-wide RNA-seq data of HepaRG cells and
Hepatitis B Virus under HBV infection. In addition, to deal with the
highly complicate real HPI-GEN, we employed PNP method to extract core
network structure consisted of the top 3000 nodes with highest energy.
Such effective approach benefits us to investigate the core signaling
pathways for establishing the crucial molecular pathogenic mechanism to
identify significant biomarkers that contribute to the progression of
HBV infection. Finally, following the drug discovery procedure, we
built the deep learning-based DTI model to identify the potential
multi-target drugs to target their significant biomarkers for
therapeutic treatment of HBV infection.
2.2. Big Data Mining and Data Preprocessing of RNA-seq Data for Human and
Pathogen
In this study, two-side sequencing-based dataset was obtained from the
Gene Expression Omnibus (GEO) at the National Center for Biotechnology
Information (NCBI). We obtained the data recorded by Mueller et al.
[[43]15], containing human and virus genomes with the time-course
through GEO series with accession number [44]GSE101575 [[45]15]. The
raw data of the dataset, which contains 27,267 human genes and 4 HBV
genes, involves the gene expression profiles of human HepaRG cells from
three patients during HBV infection at 6, 9, 12, and 24 h post
reactivation with drug (RG7834) treatment and without. Only the data
without medication administration was deployed to the ensuing system
identification process.
To unravel the cellular mechanism during HBV infection, core signaling
pathways identified from HPI-GEN are requisite. The candidate HPI-GEN,
involving many experimental data and computational predictions from
numerous databases, consists of multiple networks. Among them, the
candidate human PPIs were obtained from the Biological General
Repository for Interaction Datasets database (BioGRID) [[46]16], DIP
[[47]17], BIND [[48]18], IntAct [[49]19], and MINT [[50]20]; the
candidate human miRNA and lncRNA regulations were individually obtained
from Target Scan Human database [[51]21], and CircuitDB [[52]22]; the
candidate regulations between human transcription factors and their
targets were according to the Integrated Transcription Factor Platform
database (ITFP) [[53]23], the Human Transcriptional Regulation
Interactions database (HTRIdb) [[54]24], and the Transcription Factor
database (TRANSFAC) [[55]25]; the candidate host-pathogen interspecies
PPIs were obtained from VirusMentha [[56]26], IMEx [[57]27], PSICQUIC
[[58]28], the PPIN constructed by Zhong et al. [[59]29], and based on
the automatic pipeline method of NLP; the candidate host-pathogen
interspecies regulations were collected from VIRmiRNA [[60]30], ViRBase
[[61]31], miRecords [[62]32], starBase v2.0 [[63]33], miRTarBase
[[64]34], and based on the automatic pipeline method of NLP.
2.3. Text Mining of Human and Pathogen Protein Interactions
To reinforce the comprehension of the interaction relationships between
host and pathogen, BioBERT (Bidirectional Encoder Representations from
Transformer for Biomedical Text Mining) [[65]35,[66]36], an up-to-date
BioNER (Biomedical Named Entity Recognition) approach is employed on
the PubMed database to analyze binary interactions between host and
pathogen. BioNER is a computerized procedure of identifying biomedical
named entities such as genes, protein complexes, diseases, and
chemicals in texts referring to information retrieval and data
extraction. Amongst all the BioNER application, BioBERT, which has
achieved state-of-the-art performance in terms of precision, is
efficaciously utilized in biomedical text mining applications to
address data.
In brief, we scraped and collected 26,732 existing PubMed articles
associated with Hepatitis B Virus and Homo sapiens within 20 years and
transformed it into XML format. Nonetheless, on account of Pubmed’s
property, the full texts from most articles in PMC aren’t publicly
accessible so that looking forward to automatically download integrated
content is unrealistic. Consequently, we additionally crawled 2831
articles from PLOS ONE to strengthen our network complexity. Then,
BioBERT model was exerted to discern known entities and identify the
types of overlapping entities among each sentence separated from the
relevant content with periods in the corpus.
In order to facilitate analysis and reconfirmation, we also gathered
the synonym names of both host and pathogen proteins (compiled from
BioThesaurus database [[67]37]) to prevent omission of vital
interactions owing to protein names mismatch. Afterwards, in each
sentence, we extracted all present proteins and identified the
correlations by determining the human proteins under the circumstance
that Hepatitis B Virus proteins had existed. To abate computational
complexity, in conformity with the subsequent system identification
processing requirement, the verb recognition and identification were
left out of account and binary interactions were built. For instance,
providing that “HBx” and “BCL2” were simultaneously identified in the
same sentence, we then set the corresponding value to 1 and so forth,
vice versa to 0.
Furthermore, to verify that authentic interaction relationships derived
from the statistical analysis of sentences, we scrutinized entire
extracted entities and completed the construction of ultimate host and
pathogen interaction-relationships matrix.
2.4. Dynamic Models of Candidate HPI-GEN for Human Cells and HBV During the
Infection
Candidate HPI-GEN, constructed via big data mining techniques, is the
epitome of an intricate network comprised of protein interactions and
gene regulations between host and pathogen during HBV infection. As
inevitable false-positive information extraction might incur system
distraction from genuine consequence and discrepancy during the pathway
analysis process, dynamic models were built to carry out systematic
identification approach by two-side time-profile HBV infection RNA-seq
data to prune false-positives in candidate HPI-GEN.
Protein abundances mirror a dynamic balance among a series of linked
processes spanning the transcription, processing, and degradation of
mRNAs to the translation, modification, and breakdown of the proteins
themselves necessary for cellular protein production and maintenance
[[68]38]. As a consequence, for the protein interactive model of PPI
pre-network in candidate HPI-GEN, the dynamic interaction model of the
ith host protein can be described by the following equation:
[MATH: piH(t+1)
=piH
(t)+∑n=1NiainH<
/msubsup>piH(
mo>t)pnH
msubsup>(t)
+∑v=1VibivH<
/msubsup>piH(
mo>t)pvP
msubsup>(t)+αiHgiH(
t)−γiH
piH<
mo>(t)+βi<
/mi>H+viH(t),
for i=
1,2,…,I<
mo>, αiH≥0 and −γiH≤0 :MATH]
(1)
where
[MATH:
piH
(t), pnH(t), giH
mi>(t)
:MATH]
and
[MATH:
pvP
(t) :MATH]
indicate the expression levels of the ith host protein, the nth host
protein, the ith host gene, and the vth pathogen protein at time t,
respectively;
[MATH:
ainH
mi> :MATH]
and
[MATH:
bivH
mi> :MATH]
denote the interactive ability between the ith host protein and nth
host protein and between the ith host protein and vth pathogen protein,
respectively;
[MATH:
Ni
and Vi
:MATH]
signify the number of host proteins and pathogen proteins interacting
with the ith host protein; and
[MATH: I :MATH]
typifies the total number of the human proteins in candidate PPIN.
[MATH:
αiH,
−γiH :MATH]
and
[MATH:
βiH :MATH]
indicate the translation rate from the corresponding mRNA, the
degradation rate, and the basal expression level of the ith host
protein, respectively; the basal level denotes the interactions with
unknown factors, such as acetylation, methylation, and ubiquitination;
and
[MATH:
viH
(t) :MATH]
represents the stochastic noise of the ith host protein at time t. Note
that the intraspecies and interspecies biological mechanism of human
proteins in candidate PPIN are represented in Equation (1) as the form
of
[MATH: ∑n=1NiainH<
/msubsup>piH(
mo>t)pnH
msubsup>(t) :MATH]
and
[MATH: ∑v=1VibivH<
/msubsup>piH(
mo>t)pvP
msubsup>(t) :MATH]
, respectively.
Now that the interactions between the host and pathogen proteins are
mutual and their biological characteristics involved in PPIs are
similar, the dynamic PPIN of the
[MATH: qth :MATH]
pathogen protein in candidate HPI-GEN can be described by the analogous
equation as follows:
[MATH: pqP(t+1)
=pqP
(t)+∑n=1NqaqnP<
/msubsup>pqP(
mo>t)pnH
msubsup>(t)
+∑v=1VqbqvP<
/msubsup>pqP(
mo>t)pvP
msubsup>(t)+αqPgqP(
t)−γqP
pqP<
mo>(t)+βq<
/mi>P+vqP(t),
for q=1,2
mn>,…,Q, αqP≥0 and −γqP≤0 :MATH]
(2)
where
[MATH:
pqP
(t), pnH(t), gqP
mi>(t)
:MATH]
and
[MATH:
pvP
(t) :MATH]
indicate the expression levels of the
[MATH: qth :MATH]
pathogen protein, the nth host protein, the
[MATH: qth :MATH]
pathogen gene, and the vth pathogen protein at time t, respectively;
[MATH:
aqnP
mi> :MATH]
and
[MATH:
bqvP
mi> :MATH]
denote the interactive ability between the
[MATH: qth :MATH]
pathogen protein and nth host protein and between the
[MATH: qth :MATH]
pathogen protein and vth pathogen protein, respectively;
[MATH: Nq
:MATH]
and
[MATH: Vq
:MATH]
signify the number of host proteins and pathogen proteins interacting
with the
[MATH: qth :MATH]
pathogen protein; and
[MATH: Q :MATH]
typifies the total number of the pathogen proteins in candidate PPIN.
[MATH:
αqP,
−γqP :MATH]
and
[MATH:
βqP :MATH]
indicate the translation rate from the corresponding mRNA, the
degradation rate, and the basal expression level of the
[MATH: qth :MATH]
pathogen protein, respectively; the basal level
[MATH:
βqP :MATH]
denotes the interactions with unknown factors such as acetylation and
ubiquitination; and
[MATH:
vqP
(t) :MATH]
represents the stochastic noise of the
[MATH: qth :MATH]
pathogen protein at time t. Similarly, the intraspecies and
interspecies biological mechanism of pathogen proteins in candidate
PPIN are represented in Equation (2) as the form of
[MATH: ∑n=1NqaqnP<
/msubsup>pqP(
mo>t)pnP
msubsup>(t) :MATH]
and
[MATH: ∑v=1VqbqvP<
/msubsup>pqP(
mo>t)pvH
msubsup>(t) :MATH]
, respectively.
For the GRN of host genes in the candidate HPI-GEN, the dynamic
regulatory model of the jth host gene can be described as follows:
[MATH: gjH(t+1)
=gjH
(t)+∑τ=1TjcjτH<
/msubsup>pτH(
mo>t)+∑ℓ=1LjdjℓH<
/msubsup>lℓH(
mo>t)−∑μ=1MjejμH<
/msubsup>gjH(
mo>t)mμH
msubsup>(t)−λjHgjH(
t)+δjH
+ϖjH(t), <
/mtext>for j=1,2,…,J,
−ejμ
H≤0 and−λjH≤0 :MATH]
(3)
where
[MATH:
gjH
(t), pτH(t), mμH
mi>(t)
:MATH]
and
[MATH:
lℓH
(t) :MATH]
indicate the expression levels of the jth host gene, the
[MATH: τth :MATH]
host TF, the
[MATH: μth :MATH]
host microRNA, and the
[MATH: ℓth :MATH]
host lncRNA at time t, respectively;
[MATH:
cjτH
mi>, −ejμH :MATH]
and
[MATH:
djℓH
mi> :MATH]
represent the regulation ability of the
[MATH: τth :MATH]
host TF, the
[MATH: μth :MATH]
host microRNA, and the
[MATH: ℓth :MATH]
host lncRNA on the jth host gene, respectively;
[MATH:
Tj,
Mj :MATH]
and
[MATH: Lj
:MATH]
denote the number of host TFs, host miRNAs, and host lncRNAs,
respectively, which regulate the expression level of the jth host gene;
[MATH: J :MATH]
indicates the total amount of genes in GRN;
[MATH:
−λjH
:MATH]
and
[MATH:
δjH :MATH]
indicate the degradation rate and the expression basal level of the jth
host gene, respectively; and
[MATH:
ϖjH
(t) :MATH]
denotes the stochastic noise due to modeling residue at time t. The
basal level
[MATH:
δjH :MATH]
denotes the unknown regulations such as methylation. Note that since
microRNAs silence gene expression by binding to target mRNAs [[69]39],
repression ability
[MATH:
−ejμH :MATH]
in candidate GRN model should be constrained negative. Likewise,
[MATH:
−λjH
:MATH]
degradation rate should be restricted non-positive due to the
depression regulation of mRNA. Furthermore, though supporting evidence
comes from numerous research showing that HBV proteins play a vital
role in the regulation of viral and host gene expression, different
from other viruses such as HIV, whose viral protein directly binds to
nucleic acids [[70]40], HBV proteins are thought to affect host gene
expression through transregulation, e.g., transactivation and
transrepression [[71]41,[72]42,[73]43,[74]44]. Thus, it seems tenable
to exclude the regulations between pathogen proteins and host gene in
the Equation (3).
In the same way, for the GRN of pathogen genes in the candidate
HPI-GEN, the dynamic regulatory model of the
[MATH: uth :MATH]
pathogen gene can be described as follows:
[MATH: guP(t+1)
=guP
(t)+∑τ=1TucuτP<
/msubsup>pτH(
mo>t)+∑ℓ=1LuduℓP<
/msubsup>lℓH(
mo>t)−∑μ=1MueuμP<
/msubsup>guP(
mo>t)mμH
msubsup>(t)−λuPguP(
t)+δuP
+ϖuP(t), <
/mtext>for u=1,2,…,U,
−euμ
P≤0 and−λuP≤0 :MATH]
(4)
where
[MATH:
guP
(t), pτH(t), mμH
mi>(t)
:MATH]
and
[MATH:
lℓH
(t) :MATH]
indicate the expression levels of the
[MATH: uth :MATH]
pathogen gene, the
[MATH: τth :MATH]
host TF, the
[MATH: μth :MATH]
host microRNA, and the
[MATH: ℓth :MATH]
host lncRNA at time t, respectively;
[MATH:
cuτP
mi>, −euμP :MATH]
and
[MATH:
duℓP
mi> :MATH]
represent the regulation ability of the
[MATH: τth :MATH]
host TF, the
[MATH: μth :MATH]
host microRNA, and the
[MATH: ℓth :MATH]
host lncRNA on the
[MATH: uth :MATH]
pathogen, respectively;
[MATH:
Tu,
Mu :MATH]
and
[MATH: Lu
:MATH]
denote the number of host TFs, host miRNAs, and host lncRNAs,
respectively, which regulate the expression level of the
[MATH: uth :MATH]
pathogen;
[MATH: U :MATH]
indicates the total amount of pathogen genes in GRN;
[MATH:
−λuP
:MATH]
and
[MATH:
δuP :MATH]
indicate the degradation rate and the expression basal level of the
[MATH: uth :MATH]
pathogen, respectively; and
[MATH:
ϖuP
(t) :MATH]
denotes the stochastic noise due to modeling residue at time t. It is
worth noting that there is a consensus on the opinion about the
indispensability of host protein synthesis machinery for pathogen
protein production and gene transcription [[75]45,[76]46]. Walsh et al.
[[77]47] even suggested that viruses are fully reliant on the
translation machinery of their host cells to produce the polypeptides
that are essential for viral replication. As a result, only gene
regulation conducted by host transcription factors
[MATH:
pτH
(t) :MATH]
instead of pathogen proteins
[MATH:
pP<
mo>(t) :MATH]
is included in Equation (4).
Since the expression of the
[MATH: μth :MATH]
host miRNA
[MATH:
mμH
(t) :MATH]
and the
[MATH: ℓ :MATH]
th host
[MATH:
lℓH
(t) :MATH]
in Equation (3) at time t might also be regulated by other regulators,
the
[MATH: μth :MATH]
miRNA can be described by the following stochastic dynamic regulatory
equation:
[MATH: mμH(t+1)
=mμH
(t)+∑τ=1TμfμτH<
/msubsup>pτH(
mo>t)+∑ℓ=1LμhμℓH<
/msubsup>lℓH(
mo>t)−∑r=1MμkμrH<
/msubsup>mμH(
mo>t)mrH
msubsup>(t)−φμHmμH(
t)+ημH
+εμH(t), <
/mtext>for μ=1,2,…,M,
−kμr
H≤0 and−φμH≤0 :MATH]
(5)
where
[MATH:
mμH
(t), pτH(t), mrH
mi>(t)
:MATH]
and
[MATH:
lℓH
(t) :MATH]
indicate the expression levels of the
[MATH: μth :MATH]
host gene, the
[MATH: τth :MATH]
host TF, the
[MATH: rth :MATH]
host miRNA and the
[MATH: ℓth :MATH]
host lncRNA at time t, respectively;
[MATH:
fμτH
mi>, −kμrH :MATH]
and
[MATH:
hμℓH
mi> :MATH]
represent the regulation ability of the
[MATH: τth :MATH]
host TF, the
[MATH: rth :MATH]
host microRNA and the
[MATH: ℓth :MATH]
host lncRNA on the
[MATH: μth :MATH]
host miRNA, respectively;
[MATH:
Tμ,
Mμ :MATH]
and
[MATH: Lμ
:MATH]
denote the number of host TFs, host miRNAs and host lncRNAs,
respectively, which regulate the expression level of the
[MATH: μth :MATH]
host miRNA;
[MATH: M :MATH]
indicates the total number of miRNAs in GRN;
[MATH:
−φμH
:MATH]
and
[MATH:
ημH :MATH]
indicate the degradation rate and the expression basal level of the
[MATH: μth :MATH]
host miRNA, respectively; and
[MATH:
εμH
(t) :MATH]
denotes the stochastic noise due to modeling residue at time t.
Along the same line, the dynamic regulatory model of the host lncRNA in
candidate GEIN can be described by the following dynamic equation:
[MATH: lℓH(t+1)
=lℓH
(t)+∑τ=1TℓxℓτH<
/msubsup>pτH(
mo>t)+∑ℓ=1OℓyℓoH<
/msubsup>loH(
mo>t)−∑r=1MℓzℓμH<
/msubsup>lℓH(
mo>t)mμH
msubsup>(t)−σℓHlℓH(
t)+ϕℓH
+ψℓH(t), <
/mtext>for ℓ=1,2,…,L,
−zℓμ
H≤0 and−σℓH≤0 :MATH]
(6)
where
[MATH:
lℓH
(t), pτH(t), mμH
mi>(t)
:MATH]
and
[MATH:
loH
(t) :MATH]
indicate the expression levels of the
[MATH: ℓth :MATH]
host lncRNA, the
[MATH: τth :MATH]
host TF, the
[MATH: μth :MATH]
host microRNA and the
[MATH: ℓth :MATH]
host lncRNA at time t, respectively;
[MATH:
xℓτH
mi>, −zℓμH :MATH]
and
[MATH:
yℓoH
mi> :MATH]
represent the regulation ability of the
[MATH: τth :MATH]
host TF, the
[MATH: μth :MATH]
host microRNA and the
[MATH: oth :MATH]
host lncRNA on the
[MATH: ℓth :MATH]
host lncRNA, respectively;
[MATH:
Tℓ,
Mℓ :MATH]
and
[MATH: Lℓ
:MATH]
denote the number of host TFs, host miRNAs and host lncRNAs,
respectively, which regulate the expression level of the
[MATH: ℓth :MATH]
host lncRNA;
[MATH: L :MATH]
indicates the total number of lncRNA in GRN;
[MATH:
−σℓH
:MATH]
and
[MATH:
ϕℓH :MATH]
indicate the degradation rate and the expression basal level of the
[MATH: ℓth :MATH]
host lncRNA, respectively; and
[MATH:
ψℓH
(t) :MATH]
denotes the stochastic noise due to modeling residue at time t.
2.5. Parameter Estimation of the Dynamic Models of Candidate HPI-GEN by
System Identification Approach
In order to identify real HPI-GEN, the accurate model parameters
training process by executing a system identification approach to the
candidate HPI-GEN after the construction of dynamic model Equations
(1)–(6) is requisite. Consequently, the stochastic Equations (1) and
(2), which depict the relation of protein interactions of the ith host
and qth pathogen protein, can be converted into the linear regression
forms as below, respectively:
The ith host protein:
[MATH: piH(t+1)=[piHp1H(t)⋯piHpNi
mrow>H(t)piHp1P(t) ⋯piHpViP(t)giH(t)piH(t)1][a
i1H<
mtr>⋮ai
NiH
bi1H⋮
biViHαi<
/mi>H1−γiHβiH]<
/mo>+viH<
mo stretchy="false">(t) <
/mo> ≜ϕiHP(t
)θiHP+viH(t), for i=1<
/mn>,2,…,I<
/mtr> :MATH]
(7)
The qth host protein:
[MATH: pqP(t+1)
=[pqPp1H(t)⋯pqPpNq
mrow>H(t)pqPp1P(t) ⋯pqPpVqP(t)gqP(t)pqP(t)1][a
q1P<
mtr>⋮aq
NqP
bq1H⋮
bqVqPαq<
/mi>P1−γqPβqP]<
/mo>+vqP<
mo stretchy="false">(t) <
/mo> ≜ϕqPP(t)
θqPP
msubsup>+vqP(t),
for q=1,<
/mo>2,…,Q :MATH]
(8)
where
[MATH:
ϕiHP(t)
:MATH]
and
[MATH:
ϕqPP(t)
:MATH]
represent the regression vectors that can be obtained from the
expression data and
[MATH:
θiHP :MATH]
and
[MATH:
θqPP :MATH]
are the unknown parameter vectors to be estimated for the
[MATH: ith :MATH]
host protein,
[MATH: qth :MATH]
pathogen protein in host PPIN, respectively.
Thence, the Equations (7) and (8) of the ith host and qth pathogen
protein can further be augmented for
[MATH:
Yi :MATH]
and
[MATH:
Yq :MATH]
time points as the following forms, respectively:
The ith host protein:
[MATH:
[
piH(t
mi>2)piH(t3)⋮<
mi>piH(
tYi+1)
]=[ϕiHP<
mrow>(t1)
ϕi
HP(
t2)<
mtr>⋮ϕi
HP(<
msub>tYi)]θiHP+[viH(t<
/mi>1)viH(
mo>t2)⋮
viH(tYi
)],<
/mo> for i
=1,2,…,<
mi>I :MATH]
(9)
the Equation (9) can also be simply represented as
[MATH:
PiH=
ΦiHPθiHP+ΩiHP,for i=
1,2,…,I<
/mrow> :MATH]
(10)
The qth host protein:
[MATH:
[
pqP(t
mi>2)pqP(t3)⋮<
mi>pqP(
tYq+1)
]=[ϕqPP<
mrow>(t1)
ϕq
PP(
t2)<
mtr>⋮ϕq
PP(<
msub>tYq)]θqPP+[vqP(t<
/mi>1)vqP(
mo>t2)⋮
vqP(tYq
)],<
/mo> for q
=1,2,…,<
mi>Q :MATH]
(11)
the Equation (11) can also be simply represented as
[MATH:
PqP=
ΦqPPθqPP+ΩqPP,for q=
1,2,…,Q<
/mrow> :MATH]
(12)
After a succession of equation transformation procedure in advance, the
parameters in vectors
[MATH:
θiHP :MATH]
and
[MATH:
θqPP :MATH]
of Equations (10) and (12) can be estimated by individually employing
the following constrained least-squares estimation problems,
[MATH:
θiHP :MATH]
parameters estimation:
[MATH: θ^iHP=minθiHP‖ΦiHPθiHP−PiH‖22, subject to AiHPθ^iHP≤biHPwher
e AiH
mi>P=[<
mtr>0⋯00⋯0<
mo>−1000⋯0
0⋯0010]∈ℝ2
×(I+Q+3) and
biHP=[01] :MATH]
(13)
[MATH:
θqPP :MATH]
parameters estimation:
[MATH: θ^qPP=minθqPP‖ΦqPPθqPP−PqP‖22, subject to AqPPθ^qPP≤bqPPwher
e AqP
mi>P=[<
mtr>0⋯00⋯0<
mo>−1000⋯0
0⋯0010]∈ℝ2
×(I+Q+3) and
bqPP=[01] :MATH]
(14)
Subsequently, we can acquire the interaction parameters in PPIN
Equations (1) and (2) individually by resolving the least-squares
problems in (13) and (14) with the help of the function lsqlin in
MATLAB optimization toolbox and simultaneously ensure the protein
translation rate
[MATH:
αiH :MATH]
and
[MATH:
αqP :MATH]
to be a non-negative value and the protein degradation rate
[MATH:
−γiH
:MATH]
and
[MATH:
−γqP
:MATH]
to be a non-positive value; that is to say
[MATH:
αiH,
αqP≥0
mrow> :MATH]
and
[MATH:
−γiH
,−γqP≤0 :MATH]
.
In the same manner, the dynamic Equations (3)–(6) which depict the
relationship of gene regulations for the jth host gene, uth pathogen
gene,
[MATH: μth :MATH]
host miRNA, and
[MATH: ℓth :MATH]
host lncRNA can be rewritten into the linear regression forms below,
respectively:
The jth host gene:
[MATH: gjH(t+1)
=[p1H(t)⋯pT
mi>jH(t)gjH(t)m1H(t)⋯gjH(t)mMj
mi>H(t) l1H(t)⋯lL
mi>jH(t) gjH(t) 1][c
j1H<
mtr>⋮cj
TjH
−ej1
H⋮
−ejM
jHdj1H⋮djLjH1−λj
mi>Hδj
mi>H]+
ϖjH(t) <
/mo> ≜ϕjHG(t)
θjHG
msubsup>+ϖjH(t),
for j=1,<
/mo>2,…,J :MATH]
(15)
The uth pathogen gene:
[MATH: guP(t+1)
=[p1H(t)⋯pT
mi>uH(t)guP(t)m1H(t)⋯guP(t)mMu
mi>H(t) l1H(t)⋯lL
mi>uH(t) guP(t) 1][c
u1P<
mtr>⋮cu
TuP
−eu1
P⋮
−euM
uPdu1P⋮duLuP1−λu
mi>Pδu
mi>P]+
ϖuP(t) <
/mo> ≜ϕuPG(t)
θuPG
msubsup>+ϖuP(t),
for u=1,2
mn>,…,U
:MATH]
(16)
The
[MATH: μth :MATH]
host miRNA:
[MATH: mμH(t+1)
=[p1H(t)⋯pT
mi>μHmμ
H(t)m1H(t)⋯mμH(t)mRμ
mi>H(t) l1H(t)⋯lL
mi>μH(t) mμH(t) 1][f
μ1H<
mtr>⋮fμ
TμH
−kμ1
H⋮
−kμR
μHhμ1H⋮hμLμH1−φμ
mi>Hημ
mi>H]+
εμH(t) <
/mo> ≜ϕμHM(t)
θμHM
msubsup>+εμH(t),
for μ=1,<
/mo>2,…,M :MATH]
(17)
The
[MATH: ℓth :MATH]
host lncRNA:
[MATH: lℓH(t+1)
=[p1H(t)⋯pT
mi>ℓHlℓ
H(t)m1H(t)⋯lℓH(t)mMℓ
mo>H(t) l1H(t)⋯lL
mi>ℓH(t) lℓH(t) 1][x
ℓ1H<
mtr>⋮xℓ
TℓH
−zℓ1
H⋮
−zℓM
ℓHyℓ1H⋮yℓLℓH1−σℓ
mo>Hϕℓ
mo>H]+
ψℓH(t) <
/mo> ≜ϕℓHL(t)
θℓHL
msubsup>+ψℓH(t),
for ℓ=1,<
/mo>2,…L :MATH]
(18)
where
[MATH:
ϕjHG(t), ϕuPG(t), ϕμHM(t)
:MATH]
and
[MATH:
ϕℓHL(t)
:MATH]
represent the regression vectors that can be obtained from the
expression data and
[MATH:
θjHG, θuPG, θμHM :MATH]
and
[MATH:
θℓHL :MATH]
are the unknown parameter vectors to be estimated for the
[MATH: jth :MATH]
host gene,
[MATH: uth :MATH]
pathogen gene,
[MATH: μth :MATH]
host miRNA, and
[MATH: ℓth :MATH]
host lncRNA in host GRN, respectively.
Then, Equations (15)–(18) of the jth host gene, uth pathogen gene,
[MATH: μth :MATH]
host miRNA, and
[MATH: ℓth :MATH]
host lncRNA can be further augmented for
[MATH:
Yj,
Yu, Y<
/mi>μ :MATH]
and
[MATH: Yℓ
:MATH]
time points as the following forms, respectively:
The jth host gene:
[MATH:
[
gjH(t2)
gjH(t3)⋮gjH(tYj
+1)]=[ϕjHG(t1)
ϕjHG(t2)⋮ϕjHG
mi>(tYj
)]θjHG+[<
mi>ϖjH(t1)
ϖjH(t2)⋮ϖjH(tYj
)], for
j=1,2,…<
mo>,J :MATH]
(19)
the Equation (19) can also be simply represented as:
[MATH:
GjH=ΦjHGθjHG+ΩjHG, for j=1,2,…,J :MATH]
(20)
The uth pathogen gene:
[MATH:
[
guP(t2)
guP(t3)⋮guP(tYu
+1)]=[ϕuPG(t1)
ϕuPG(t2)⋮ϕuPG
mi>(tYu
)]θuHG+[<
mi>ϖuP(t1)
ϖuP(t2)⋮ϖuP(tYu
)], for u
mi>=1,2,…<
mo>,U :MATH]
(21)
the Equation (21) can also be simply represented as:
[MATH:
GuP=ΦuPGθuPG+ΩuPG, for u=1,2,…,U :MATH]
(22)
The
[MATH: μth :MATH]
host miRNA:
[MATH:
[
mμH(t2)
mμH(t3)⋮mμH(tYμ
+1)]=[ϕμHM(t1)
ϕμHM(t2)⋮ϕμHM
mi>(tYμ
)]θμHM+[<
mi>εjH(t1)
εjH(t2)⋮εjH(tYμ
)], for
μ=1,2,…<
mo>,M :MATH]
(23)
the Equation (23) can also be simply represented as:
[MATH:
MμH=ΦμHMθμHM+ΩμHM, for μ=1,2,…,M :MATH]
(24)
The
[MATH: ℓth :MATH]
host lncRNA:
[MATH:
[
lℓH(t2)
lℓH(t3)⋮lℓH(tYℓ
+1)]=[ϕℓHL(t1)
ϕℓHL(t2)⋮ϕℓHL
mi>(tYℓ
)]θjHG+[<
mi>ψℓH(t1)
ψℓH(t2)⋮ψℓH(tYℓ
)], for
ℓ=1,2,…<
mo>,L :MATH]
(25)
the Equation (25) can also be simply represented as:
[MATH:
LℓH=ΦℓHGθℓHG+ΩℓHG, for ℓ=1,2,…,L :MATH]
(26)
After a succession of equation transformation procedure in advance, the
parameters in vectors
[MATH:
θjHG, θuPG, θμHM :MATH]
and
[MATH:
θℓHL :MATH]
of Equations (20), (22), (24) and (26) can be estimated by individually
employing the following constrained least-square estimation problem,
[MATH: θ^jHG=minθjHG12‖ ΦjHGθjHG−GjH‖22, subjec
t to
AjHGθ^jHG≤bjHG where b
jHG=[<
/mo>01<
/mtd>] and AjHG<
/mrow>=[00⋯010⋯000⋯00000⋯001⋯000⋯000⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮00⋯000⋯100⋯00000⋯000⋯000⋯010]∈ℝ(Mj+1
)×(Tj+Mj+Lj
+2) :MATH]
(27)
[MATH: θ^uPG=minθuPG12‖ ΦuPGθuPG−GuP‖22, subjec
t to
AuPGθ^uPG≤buPG where b
uPG=[<
/mo>01<
/mtd>] and AuPG<
/mrow>=[00⋯010⋯000⋯00000⋯001⋯000⋯000⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮00⋯000⋯100⋯00000⋯000⋯000⋯010]∈ℝ(Mu+1
)×(Tu+Mu+Lu
+2) :MATH]
(28)
[MATH: θ^μHM=minθμHM12‖ ΦμHMθμHM−MμH‖22, subjec
t to
AμHMθ^μHM≤bμHM where b
μHM=[<
/mo>01<
/mtd>] and AμHM<
/mrow>=[00⋯010⋯000⋯00000⋯001⋯000⋯000⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮00⋯000⋯100⋯00000⋯000⋯000⋯010]∈ℝ(Mμ+1
)×(Tμ+Mμ+Lμ
+2) :MATH]
(29)
[MATH: θ^ℓHL=minθℓHL12‖ ΦℓHLθℓHL−LℓH‖22, subjec
t to
AℓHLθ^ℓHL≤bℓHL wherebℓ
HL=[01] and AℓHL<
/mrow>=[00⋯010⋯000⋯00000⋯001⋯000⋯000⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮00⋯000⋯100⋯00000⋯000⋯000⋯010]∈ℝ(Mℓ+1
)×(Tℓ+Mℓ+Lℓ
+2) :MATH]
(30)
Through applying the function
[MATH:
lsqlin :MATH]
in MATLAB optimization toolbox to solve the constrained least-squares
estimation problem in (27)–(30), we could obtain optimal regulatory
parameters for GRN Equations in (3)–(6) and concurrently guarantee that
the miRNA repression ability
[MATH:
−ejμH, −euμP,<
mtext> −kμr<
/mrow>H :MATH]
and
[MATH:
−zℓμH :MATH]
as well as the degradation rate
[MATH:
−λjH
, −λuP<
/mi>, −φμH :MATH]
and
[MATH:
−σℓH
:MATH]
values corresponding to the jth host gene, the uth pathogen gene, the
[MATH: μth :MATH]
host miRNA, and the
[MATH: ℓth :MATH]
host lncRNA to be non-positive, respectively.
So far, the structure of dynamic regression model construction is
complete, yet if we substitute the expression data into the linear
model right away, infinite solutions problems will emerge due to the
lack of enough data points, in other words, information deficiency in
comparison with parameters for evaluation, that is, overfitting. Thus,
the cubic spline for extra numbers of time-profile data points
interpolation is applied to prevent this trouble (For more details,
readers can refer to [78]Appendix A). Then, with the processed RNA-seq
data, the accurate solutions of the constrained least-squares
estimation problems in (13)–(14) and (27)–(30) can be attained.
Moreover, since the measurement technology of genome-wide protein
expression in HepaRG cells and Hepatitis B Virus during infection
hasn’t been realized yet, and supporting evidence comes from research
showing that the cellular concentrations of proteins correlate with the
abundances of their corresponding mRNAs, which implied that the
variance in protein abundance can be explained by that of mRNA
[[79]48,[80]49,[81]50,[82]51] and a correlation coefficient of 48% was
also obtained between the mRNA and protein abundances in human liver
[[83]49], i.e., the RNA-seq data of gene expressions can substitute
protein expressions and contribute to sufficient information for
resolving above constrained least-squares parameter estimation problems
in (13) and (14) and (27)–(30). One could retain more information in
our previous research [[84]52].
2.6. Determination of Significant Interaction Pairs
On account of approaching the best combination to constitute the robust
HPI-GEN in the parameter identification procedures, we exploited the
cost function to obtain the optimal fit for the host/pathogen RNA-seq
data, to be more precisely, estimation of the expected relative
distance between the fitted model and the unrecognized authentic
molecular pathogenic mechanism that actually generated the observed
data. On the whole, as the cost function of the parameter estimation
for linear regression model, mean squared error (MSE) is enough to
calculate the residual variance. However, the cost from the model
complexity that might also influences the performance should be taken
into consideration as well. Akaike Information Criterion (AIC), taking
the place of MSE, was thereby selected to assess the residual variance
and model complexity at once. As the expected residual variance
declines with rising parameter numbers for inadequate model
complexities, there should be a minimum around the correct parameter
number after repeated coordination [[85]52]. Meanwhile, owing to
computational efficiency, computing the AIC statistics for all possible
combinations is impracticable. Stepwise methods are developed to
decreasing the complexity of exhaustive searching [[86]52]. Finally,
while reaching the minimum of AIC value, we could acquire the real
number of interactions or regulations for each protein (gene) one by
one in the candidate HPI-GEN. Those insignificant regulations or
interactions out of the real number determined by AIC should be pruned
to obtain the real HPI-GEN.
For each model composing PPIN, the AIC values of the
[MATH: ith :MATH]
host and the
[MATH: qth :MATH]
pathogen protein can be defined individually as the following
equations:
The
[MATH: ith :MATH]
host protein:
[MATH: AICiHP(Ni,Vi)=log(<
mn>1Yi(PiH−ΦiHPθ^iHP)T(PiH−ΦiHPθ^iHP))+2(Ni+Vi)Yi=log[(σiHP)2 ]+2(
Ni+V<
mi>i)Yi :MATH]
(31)
The
[MATH: qth :MATH]
pathogen protein:
[MATH: AICqPP(Nq,Vq)=log(<
mn>1Yq(PqP−ΦqPPθ^qPP)T(PqP−ΦqPPθ^qPP))+2(Nq+Vq)Yq=log[(σqPP)2 ]+2(
Nq+V<
mi>q)Yq :MATH]
(32)
for
[MATH:
i=1,…,I :MATH]
and
[MATH:
q=1,…,Q :MATH]
, where
[MATH:
AICi
HP(N
i,Vi
) :MATH]
with the model complexity
[MATH:
Ni+V
i :MATH]
and
[MATH:
AICq
PP(N
q,Vq
) :MATH]
with the model complexity
[MATH:
Nq+V
q :MATH]
denote the
[MATH: ith :MATH]
host and the
[MATH: qth :MATH]
pathogen protein in PPI model, respectively;
[MATH: (σiHP)2 :MATH]
and
[MATH: (σqPP)2 :MATH]
signify the covariance of estimated residual error between
[MATH:
PiH :MATH]
and
[MATH: ΦiHPθ^iHP :MATH]
and between
[MATH:
PqP :MATH]
and
[MATH: ΦqPPθ^qPP :MATH]
, respectively; and
[MATH: θ^iHP :MATH]
and
[MATH: θ^qPP :MATH]
represent the estimated parameters acquired from the solutions of the
parameter estimation problems in (13) and (14), respectively. Suppose
[MATH:
Ni*,
Vi* :MATH]
and
[MATH:
Nq*,
Vq* :MATH]
could minimize
[MATH:
AICi
HP(N
i,Vi
) :MATH]
and
[MATH:
AICq
PP(N
q,Vq
) :MATH]
in (31) and (32), respectively. Then,
[MATH:
Ni*,
Vi* :MATH]
and
[MATH:
Nq*,
Vq* :MATH]
denote the real number of PPIs in the ith host protein and the qth
pathogen protein. The insignificant PPIs out of these numbers should be
pruned as false positives from candidate PPIs to obtain real PPIs in
HPI-GEN.
For each model comprising GRN, the AIC values of the jth host gene, the
uth pathogen gene, the
[MATH: μth :MATH]
host miRNA, and the
[MATH: ℓth :MATH]
host lncRNA can be defined individually as following equations:
The jth host gene:
[MATH: AICjHG(Tj+Mj+Lj)<
mtd
columnalign="left">=log(<
mn>1Yj(GjH−ΦjHGθ^jHG)T(GjH−ΦjHGθ^jHG))+2(Tj+Mj+
Lj)<
msub>Yj
mtd>=log[(σjHG)2 ]+2(
Tj+M<
mi>j+Lj)
Yj :MATH]
(33)
The uth pathogen gene:
[MATH: AICuPG(Tu+Mu+Lu)<
mtd
columnalign="left">=log(<
mn>1Yu(GuP−ΦuPGθ^uPG)T(GuP−ΦuPGθ^uPG))+2(Tu+Mu+
Lu)<
msub>Yu
mtd>=log[(σuPG)2 ]+2(
Tu+M<
mi>u+Lu)
Yu :MATH]
(34)
The
[MATH: μth :MATH]
host miRNA:
[MATH: AICμHM(Tμ+Mμ+Lμ)<
mtd
columnalign="left">=log(<
mn>1Yμ(MμH−ΦμHMθ^μHM)T(MμH−ΦμHMθ^μHM))+2(Tμ+Mμ+
Lμ)<
msub>Yμ
mtd>=log[(σμHM)2 ]+2(
Tμ+M<
mi>μ+Lμ)
Yμ :MATH]
(35)
The
[MATH: ℓth :MATH]
host lncRNA:
[MATH: AICℓHL(Tℓ+Mℓ+Lℓ)<
mtd
columnalign="left">=log(<
mn>1Yℓ(LℓH−ΦℓHLθ^ℓHL)T(LℓH−ΦℓHLθ^ℓHL))+2(Tℓ+Mℓ+
Lℓ)<
msub>Yℓ
mtd>=log[(σℓHL)2 ]+2(
Tℓ+M<
mo>ℓ+Lℓ)
Yℓ :MATH]
(36)
for
[MATH:
j=1,…,J, q=1
,…,Q, μ=1,…,M :MATH]
, and
[MATH:
ℓ=1,…,L :MATH]
where
[MATH:
AICj
HG(T
j+Mj+<
msub>Lj), AICuP
G(T<
mi>u+Mu+Lu), AICμH<
/mi>M(Tμ+Mμ+Lμ)
:MATH]
and
[MATH:
AICℓ
HL(T
ℓ+Mℓ+<
msub>Lℓ)
:MATH]
with the model complexity
[MATH:
Tj+M
j+Lj,<
mtext> Tu+M
mi>u+Lu, Tμ+
Mμ+Lμ :MATH]
and
[MATH:
Tℓ+M
ℓ+Lℓ :MATH]
denote the AIC values of the jth host gene, the uth pathogen gene, the
[MATH: μth :MATH]
host miRNA, and the
[MATH: ℓth :MATH]
host lncRNA in GRN model, respectively;
[MATH: (σjHG)2, (σuPG)2, (σμHM)2 :MATH]
and
[MATH: (σℓHL)2 :MATH]
signify the covariance of estimated residual error between
[MATH:
GjH :MATH]
and
[MATH: ΦjHGθ^jHG, :MATH]
[MATH:
GuP :MATH]
and
[MATH: ΦuPGθ^uPG, :MATH]
[MATH:
MμH :MATH]
and
[MATH: ΦμHMθ^μHM, :MATH]
and
[MATH:
LℓH :MATH]
and
[MATH: ΦℓHLθ^ℓHL, :MATH]
respectively; and
[MATH: θ^jHG, θ^uPG, θ^μHM :MATH]
and
[MATH: θ^ℓHL :MATH]
represent the estimated parameters acquired from the solutions of the
parameter estimation problems in (27)–(30), respectively. Similarly,
the real numbers of regulation by TF, miRNA, lncRNA are obtained by
minimizing the corresponding AICs in (33)–(36). The insignificant
regulations out of these real numbers should be pruned as false
positives to obtain the real regulations by TF, miRNA, lncRNA in
HPI-GEN.
So far, through a sequence of systematic identification processes, the
construction for real HPI-GENs is broadly accomplished by pruning those
insignificant interactions and regulations out of the corresponding
AIC. Unfortunately, due to the excessively enormous network
architecture, we could hardly extract the core pathways to investigate
the pathogenesis during HBV infection, let alone screening of candidate
proteins as new targets for drug development. We thereby extracted the
core network from the real HPI-GEN via applying the principal network
projection (PNP) method.
2.7. Extracting Core Network Structure from the Real HPI-GEN by Using PNP
Approach
Prior to core network extraction from the real HPI-GEN through applying
PNP method, it is essential to integrate the network estimated
parameters from PPIN and GRN into a system matrix A as follows,
[MATH: A= [AHP
,HPAHP
,PP 00APP
,HPAPP
,PP00AHG
,HP 0AHG
,HMAHG
,HLAHM
,HP 0AHM
,HMAHM
,HLAHL
,HP 0AHL
,HMAHL
,HLAPG
,HP 0APG
,HMAPG
,HL] <
mo>=[ a^11H
⋯a^1IH⋮a^inH⋮a^I1H⋯a^IIHb^11H
⋯b^1QH⋮b^ivH⋮b^I1H⋯b^IQH0⋯
mo>0⋮
0⋮0<
mo>⋯00⋯
mo>0⋮
0⋮0<
mo>⋯0<
mtr columnalign="left"> a^11P
⋯a^1IP⋮a^qnP⋮a^Q1P⋯a^QIPb^11P
⋯b^1QP⋮b^qvP⋮b^Q1P⋯b^QQP0⋯
mo>0⋮
0⋮0<
mo>⋯0 0
mtd>⋯0⋮0⋮0<
/mn>⋯0 c^11H
⋯c^1IH⋮c^jτH⋮c^I1H⋯c^IIH0⋯
mo>0⋮
0⋮0<
mo>⋯0−e^11H
⋯−<
msubsup>e^1MH⋮−e^jμH⋮−e^I1H⋯−e^IMHd^11H
⋯d^1LH⋮d^jℓH⋮d^I1H⋯d^ILH
f^11H
⋯f^1IH⋮f^μτH⋮f^M1H⋯f^MIH0⋯
mo>0⋮
0⋮0<
mo>⋯0−k11H⋯−k1<
mi>MH⋮<
/mo>−kμr
H⋮−kM1H⋯−kMMH
mi>h^11H
⋯h^1LH⋮h^μℓH⋮h^M1H⋯h^MLH
x^11H
⋯x^1IH⋮x^ℓτH⋮x^L1H⋯x^LIH 0
mtd>⋯0⋮0⋮0<
/mn>⋯0−z^11H
⋯−<
msubsup>z^1MH⋮−z^ℓμH⋮−z^L1H⋯−z^LMHy^11H
⋯y^1LH⋮y^ℓoH⋮y^L1H⋯y^LLH
c^11P
⋯c^1IP⋮c^uτP⋮c^Q1P⋯c^QIP 0
mtd>⋯0⋮0⋮0<
/mn>⋯0−e^11P
⋯−<
msubsup>e^1MP⋮−e^uμP⋮−e^Q1P⋯−e^QMPd^11P
⋯d^1LP⋮d^uℓP⋮d^Q1P⋯d^QLP
]∈ℝ(2I+2Q+L+M)×(I+Q+L+M)
:MATH]
where HP, PP, HG, HM, HL and PG denote the host protein, pathogen
protein, host gene, host miRNA, host lncRNA and pathogen gene,
respectively;
[MATH: a^inH, b^ivH :MATH]
and
[MATH: a^qnP, b^qvP :MATH]
mentioned in (1) and (2) could be acquired in
[MATH: θ^iHP :MATH]
and
[MATH: θ^qPP :MATH]
by resolving the parameter estimation problems in (13) and (14) and
pruning false positives by AIC method in (31) and (32), respectively;
[MATH: {c^jτH, −e^jμH, d^jℓH} :MATH]
,
[MATH: {f^μτH, −kμrH,
h^μℓH} :MATH]
,
[MATH: {x^ℓτH, −z^ℓμH, y^ℓoH} :MATH]
, and
[MATH: {c^uτP,−e^uμP,d^uℓP} :MATH]
mentioned in (3)–(6) could be acquired in
[MATH: θ^jHG, θ^uPG, θ^μHM :MATH]
and
[MATH: θ^ℓHL :MATH]
by resolving the parameter estimation problems in (27)–(30) and pruning
false positives by AIC method in (33)–(36) respectively. Note that
since the regulations from pathogen proteins weren’t taken into
consideration, the corresponding parameters in matrix A are padded with
zeros.
Thereafter, we extract the core components of HPI-GEN by PNP approach,
a principal network analysis method for dimensionality reduction based
on network structure projection technique, that is, projecting matrix A
to its principal singular vectors space. Accordingly, the combined
network matrix A can be denoted by singular value decomposition form
below,
[MATH:
A=USVT :MATH]
(37)
where
[MATH: U∈ℝ(2I+2Q+L+M)×(I+Q+L+M), V∈ℝ(I+Q+L+M)×(I+Q+L+M), and S=diag(σ1,<
/mo>…,σs,
…σI+Q+L+M)
:MATH]
is a diagonal matrix of
[MATH:
σ1,…,<
/mo>σs,…σI+Q+L+M :MATH]
which includes the I + Q + L + M singular values of the matrix A in
descending order, i.e.,
[MATH:
σ1≥…≥<
/mo>σs≥…≥σI+Q+L+M≥0
:MATH]
.
[MATH:
S=[σ10⋯0⋯
00σ<
/mi>2⋯0<
/mtd>⋯0⋮⋮⋱⋮
⋱⋮00⋯σs⋯0⋮⋮⋱<
/mtd>⋮⋱⋮00⋯
0⋯σI+Q+L+M
mi>0
0⋯0⋯
mtd>0⋮⋮⋱⋮⋱<
mtd>⋮00⋯0⋯<
mn>0]
:MATH]
Also, we define the expression fraction
[MATH: Ew
:MATH]
of the interaction ability of the proteins and the transcriptional
regulatory ability of the genes, lncRNAs, miRNAs as the normalization
of singular value form,
[MATH:
Ew=σw2∑w=1I+Q+L<
mo>+Mσw
mi>2
:MATH]
(38)
According to the diminishing property of the singular values, under the
basic premise of sustaining system energy from the whole network
structure, we pick out the minimum X such that
[MATH:
Σw=1XEw≥0
.85 :MATH]
, i.e., the vector space spanned by the orthonormal bases composed of
the principal singular vectors corresponding to the top X singular
values contains 85% energy of HPI-GEN structure.
Afterwards, we define and apply the projection value of each node,
including gene, miRNA, lncRNA, and protein in the real HPI-GEN to the
vector space spanned by the orthonormal basis composed of the principal
singular vectors corresponding to the top X singular values as below,
[MATH: ProjR
(Arow<
/mi>,i,V<
msup>T*)=[∑k=1X(Arow<
/mi>,ivk
)2<
mo>]1/2 Proj<
mi>C(Acol<
/mi>,j,U*
mo>)=[∑k=1X(Acol,jTu<
mi>k)2<
mo>]1/2 :MATH]
(39)
where
[MATH:
Arow,
mo>i :MATH]
denotes the ith row vector of A for
[MATH:
i = <
mn>1,…,2I+
2Q+L+M :MATH]
;
[MATH:
Acol,
mo>j :MATH]
signifies jth column vector of A for
[MATH:
j = <
mn>1,…,I+Q
+L+M :MATH]
;
[MATH:
VT* :MATH]
; and
[MATH: U*
:MATH]
represent the vector spaces spanned by the orthonormal basis
[MATH:
{v1,
v2,…,vk} :MATH]
and
[MATH:
{u1,
u2,…,uk} :MATH]
[MATH:
for k=1,…,X
:MATH]
, respectively;
[MATH:
ProjR(Arow<
/mi>,i,V<
msup>T*) :MATH]
and
[MATH:
ProjC(Acol<
/mi>,j,U*
mo>) :MATH]
indicate the projection value from
[MATH:
Arow,
mo>i :MATH]
and
[MATH:
Acol,
mo>j :MATH]
to
[MATH:
VT* :MATH]
and
[MATH: U*
:MATH]
vector spaces, respectively.
While the projection value
[MATH:
ProjR(Arow<
/mi>,i,V<
msup>T*) :MATH]
or
[MATH:
ProjC(Acol<
/mi>,j,U*
mo>) :MATH]
in (39) approaches zero, it intimates that the corresponding node i or
j isn’t the pivotal factor or independent to the core network extracted
via the PNP procedure. Conversely, the larger projection value it gets,
the greater the node contributes to the core network.
Eventually, we select the first 3000 components on the top of the list
with higher energy in (39) as the core host-pathogen network from real
HPI-GEN of HBV infection by ranking the projection value of each node
and subsequently uploading those nodes into DAVID to obtain the KEGG
pathways as shown in Table 3. Taking KEGG pathways as reference, we
systematically identify and investigate the molecular pathogenic
mechanism of HBV infection from the core HPI-GEN as shown in Figure 4
and select the significant biomarkers for the further novel drug
discovery procedure.
2.8. Deep Learning-Based Drug-Target Interaction Prediction
2.8.1. Data Preparation
Following the identification of promising pathogenic biomarkers, a
potent drug-target interaction (DTI) model to predict the interactions
between the identified biomarkers and their corresponding molecule
drugs is crucial for discovering favorable molecular compounds for the
therapeutic treatment. Thus, we constructed a deep learning-based DTI
model to collect viable interactions to identify available drugs for
the selected biomarkers. Meanwhile, since most patients in need of
attention and treatment are infected with chronic hepatitis, the
highest priority was assigned to efficacious drugs with lower toxicity
that would not cause irreparable harm to health.
First, we integrated databases from UniProt [[87]53], DrugBank
[[88]54], ChEMBL [[89]55], Pubchem [[90]56], and BindingDB [[91]57] to
assemble applicable drug-target interactions. As feature descriptors
are simple numerical vectors designed to delineate the complicated
information of objects, they have been widely utilized to describe the
genomic sequences and chemical properties of molecules such as the
characteristics from 2D, 3D spectrum of structure, molecular weight,
predictive values of LogP, etc. of late [[92]58,[93]59]. In view of
this property, we employed both the off-the-shelf build-in molecular
and protein descriptor functions of python package pyBioMed to
transform the features from each drug and its target among the
previously collected drug-target interactions into descriptor under
python 2.7 environment, respectively. These features include molecular
dockings such as amino acid composition and dipeptide composition. For
more details about the descriptor transformation, readers could access
the documents of pyBioMed. Thereafter, we united the descriptors of
both drugs and their relevant targets into a matrix as the training
dataset to train the DTI model. The vector
[MATH:
vdrug
mi>-target
mrow> :MATH]
corresponding to the descriptor of each drug-target pair is given
below:
[MATH:
vdrug
mi>-target=[D,T]=[d1,d2
,⋯,dI,t1,t2,⋯tJ] :MATH]
(40)
where
[MATH:
vdrug
mi>-target
mrow> :MATH]
denotes the vector of the descriptor for each the drug-target pair
comprising two parts. The former part D is the descriptor of the drug
and
[MATH: di
:MATH]
represents the
[MATH: ith :MATH]
drug feature; the latter portion T is the descriptor of the target and
[MATH: tj
:MATH]
indicates the
[MATH: jth :MATH]
target characteristic; I is the total number of drug features; J is the
total number of target features.
Before performing drug-target interaction prediction, to keep our DTI
model from inferior performance arising from between-class imbalance
and distracting variation, some data preprocessing procedures are
requisite. The entire preprocessing procedure in our work encompasses
down sampling, data partitioning, feature scaling, and dimensionality
reduction.
* Down Sampling:
There are two categories of drug-target interaction in our samples:
16,000,000 samples for the unknown interactions (negative instances)
and 60,000 samples for the known interactions (positive instances).
Though deep learning methods scale well with the quantity of data and
can often leverage extremely large datasets for good performance,
imbalanced class distribution often causes the model to be overwhelmed
by the large class and ignore the minority one. So as to mitigate the
effect arising from class imbalance, we reduced the number of majority
samples (negative instances) to 60,000 which will allow the model to
learn from both classes equally.
* Data Partitioning:
After the down sampling procedure, we partition the data (the
reconstructed matrix of drug-target pairs) into two sets, three-fourth
for training and one-fourth for testing.
* Feature Scaling:
Since the variables of the features in each drug-target pair are
measured at different scales, they do not contribute equally to the
model fitting and might end up with creating a bias. To deal with this
potential problem, a feature-wise scaling is usually implemented prior
to model fitting. As powerful techniques of feature scaling, Min-max
scaling (normalization) and Standardization are both widely used in
data analysis. However, despite possessing the capability to shift and
rescale the data into a limited range of values, normalization is
sensitive to the outliers [[94]60]; in contrast, standardization
maintains useful information about outliers and makes the model less
sensitive to them [[95]61]. Thus, we apply Standardization on each
feature and the corresponding mathematical formulation is shown as
follows:
[MATH: di*=di−μiσi
mi>, ∀
mo>i=1,…,Itj*=tj−ωjδj
msub>, ∀j=1,…,J :MATH]
(41)
where
[MATH: di
:MATH]
and
[MATH:
di* :MATH]
represent the
[MATH: ith :MATH]
feature of the drug before and after Standardization, respectively;
[MATH: μi
:MATH]
and
[MATH: σi
:MATH]
signify the mean and standard deviation of the
[MATH: ith :MATH]
drug feature;
[MATH: tj
:MATH]
and
[MATH:
tj* :MATH]
indicate the jth feature of the target before and after
Standardization, respectively;
[MATH: ωj
:MATH]
and
[MATH: δj
:MATH]
denote the mean and standard deviation of the
[MATH: ith :MATH]
target feature; I is the total number of features in the drug while J
is that of the target.
* Dimensionality Reduction:
Since the high-dimensional patterns in samples might augment the number
of neurons in the network and elevate the computational complexity for
training, we adopted principal components analysis (PCA) to distill 692
features from original 1014 features of the samples. One could retain
as much information of interested [[96]62,[97]63].
Note that it is illegal and useless to carry out data preprocessing to
the testing set solely relying on the characteristics in testing data
in that we would never get any information from the features of testing
data until the training phase is over. Hence, aside from the
aforementioned transformation performed on the training data sets, we
also provide the identical transformation on the testing data during
the process of feature scaling and dimensionality reduction. More
exactly, we extracted the mean and standard deviation variables of each
feature in the training data set to standardize the corresponding
feature in the testing data. Likewise, the transformation matrix for
executing PCA of the features in training data was concurrently used to
reduce the dimension of the features in testing data.
So far, the training data for tuning the network parameters of the deep
learning-based DTI model and the testing data for evaluating the model
performance are ready. Nonetheless, if learned merely on the features
from the training set, the hyperparameters would always choose the
maximum possible model capacity, leading to overfitting [[98]63]. As a
result, we randomly split out one-tenth of the training data to
estimate the generalization error during training, allowing for the
hyperparameters to be updated in every epoch. In addition, Early
Stopping method was also employed to specify an arbitrarily large
number of training epochs and stop training once improving the model’s
fit to the training data comes at the expense of increased
generalization error. Eventually, through setting the optimizer as Adam
[[99]64] and learning rate = 0.001, we then trained our DTI model for
40 epochs with 100 samples in each mini-batch.
2.8.2. Parameters Tuning Process Based on Deep Learning Algorithm
In the network architecture, each layer can be simplified into a
function below:
[MATH: h^n=δ(wTxn<
/msub>+b) :MATH]
(42)
where
[MATH: xn
:MATH]
and
[MATH: h^n :MATH]
indicate the input and output vectors corresponding to the descriptor
of the nth drug-target pair, respectively;
[MATH: δ :MATH]
denotes the activation function (ReLU for each hidden layer and Sigmoid
for the output layer); and the vectors w and b are given as follows:
[MATH: w =[
w1w2⋮
we], b=[b1
b2⋮be
] :MATH]
(43)
where the weights parameters
[MATH:
w1,w
2,…we<
/mrow> :MATH]
in w and bias parameters
[MATH:
b1,b
2,…be<
/mrow> :MATH]
in b are free variables that capture the model’s representation of the
data in each layer and are learned from each sample.
To evaluate the model performance, the predicted output is compared
with the true label to compute a loss for the current set of model
weights. Since drug-target interaction prediction issue is a binary
classification problem, binary cross-entropy is chosen to calculate the
loss of each iteration. The summation of the loss
[MATH: L(y,y^) :MATH]
for totally N samples is given by the expression as below:
[MATH: L(y,y^)=−1N<
/mfrac>∑n=1N(yn∗l
og(y^n)+(1−yn
msub>)∗log(1−y^n))=1N
∑n=1NCn
msup>(yn,y^n) :MATH]
(44)
where
[MATH: Cn(yn,y^n) :MATH]
is loss of the nth sample calculated by binary cross-entropy, in which
[MATH: yn
:MATH]
is the nth actual label (1 or 0) and
[MATH: y^n :MATH]
is the nth predicted probability distribution.
During the learning process, the parameter set
[MATH: θ :MATH]
(46) in each layer should achieve the minimization of the objective
function (47) to obtain the optimal network parameters
[MATH: θ*
:MATH]
. Accordingly, backward propagation learning algorithm (48) is designed
to update the network parameters of both weights (
[MATH:
w1,w
2,…,we :MATH]
) and bias (
[MATH:
b1,b
2,…,be :MATH]
).
[MATH:
θ=[w1⋮we
mrow>b1⋮be]
mrow> :MATH]
(45)
[MATH:
θ*=argminθL(θ) :MATH]
(46)
[MATH:
θi=θ
i−1−η<
mo>∇L(θi−1<
/mn>) :MATH]
(47)
where i is the ith iteration of the learning process;
[MATH: η :MATH]
is the learning rate (set as 0.001); and
[MATH: ∇L(θi−1<
/mn>) :MATH]
is the gradient of
[MATH: L(θi−1<
/mn>) :MATH]
given as follows:
[MATH: ∇L(θi−1<
/mn>)=[∂L(θi−1<
/mn>)∂w1⋮∂L(θi−1<
/mn>)∂we
∂L(θi−1<
/mn>)∂b1⋮∂L(θi−1<
/mn>)∂be] :MATH]
(48)
Consequently, by back-propagating a corrective error signal through the
network, weighted connections between neurons in the DTI model are
iteratively adjusted and assessed on the basis of the drug-target pairs
in the training and the validation sets.
2.8.3. Measurement of Prediction Quality
Assessing a model is quite tricky. It isn’t reliable to evaluate model
performance (training accuracy of the final epoch) merely based on one
specific validation set, in that the accuracy obtained from one
validation set can be very different to that from another. Thus, we
took advantage of a general technique named 10-fold cross validation to
avoid the possible bias. That is, the original samples for training is
randomly partitioned into 10 equal size subsamples; one is retained as
the validation data for testing the model, and the remaining 9
subsamples are used as training data. Through repeatedly training and
validating the models 10 times with different partitions, we then
compute an average score over the rounds to give an estimate of the
model’s predictive performance. Afterwards, the model parameters with
different validation errors were applied to the testing data for
generating the final test accuracy and loss.
Aside from that, to compare the effectiveness of our DTI model with
that of other state-of-the-art ML-based approaches, we adopted AUC-ROC
curve [[100]65] to tell how good a specific model can distinguish two
classes, that is, whether or not a drug interacts with a target. ROC
represents the relationship between benefit (true positive rate) and
cost (false positive rate), while AUC is a value in the range of 0 to 1
that indicates the degree of separability between classes. The higher
AUC we obtain, the more accurate outcome we will get from the model.
The formulas for AUC-ROC curve are shown as below:
[MATH:
True Positive
Rate = Rec
all/Sensitivity
= TPTP+FN
mrow> :MATH]
(49)
[MATH:
Specificity =
TNTN+
FP :MATH]
(50)
[MATH:
False Positive Rate = 1−Specificity = FPTN+FP
:MATH]
(51)
where True Positive (TP) means that the actual value is positive and is
judged correctly; False Positive (FP) shows that the actual value is
positive but is judged by mistake; True Negative (TN) indicates the
actual value is negative and is judged accurately; False Negative (FN)
represents the actual value is negative but is judged in error.
3. Results
3.1. Overview of Systems Medicine Discovery Procedure
Though a great advance has been made to design and evaluate medicine
based on the application of diversified drug repositioning approaches,
it still takes a large amount of time and vigor if without a potent
avenue to select the promising biomarkers and drugs for the clinical
trials. In this study, we proposed a systematic biology method
containing two sections including potential biomarkers identification
and novel drugs discovery.
The overall flowchart of the proposed systems biology procedure is
shown in [101]Figure 1. On one hand, we applied a systematic
identification analysis procedure to the constructed candidate HPI-GEN,
considering both accuracy and model complexity with the help of the
information from the two-side real-time profile RNA-seq. Note that the
total nodes and edges respectively given in [102]Table 1 and [103]Table
2 of the real HPI-GEN in [104]Figure 2 abstracted from the candidate
HPI-GEN has considerably shrunk in comparison with that of the
candidate HPI-GEN, which indicates that the false-positives were
eliminated during the identification process. Supported with the PNP
approach, we then identified the core HPI-GEN in [105]Figure 3 by
selecting top-ranked 3000 nodes with significant projection values that
could reflect 85% of the real HPI-GEN. The higher the projection value
is, the stronger regulatory capacity it possesses in the host/pathogen
cross-talk mechanism during HBV infection. In the meantime, given
identified KEGG pathways derived through uploading nodes in core
HPI-GEN into DAVID in [106]Table 3, we selected the core signaling
pathways so as to explore the underlying etiologic mechanism in HBV
infection as shown in [107]Figure 4, involving microenvironmental
factors by surveying relevant papers and research.
Figure 1.
[108]Figure 1
[109]Open in a new tab
Flowchart of applying systems biology method to construct the candidate
HPI-GEN, real HPI-GEN, core HPI-GEN, and core signaling pathways during
the infection of HBV to identify the potential biomarker for later drug
discovery repurposing. The grey blocks in the shape of a cloud indicate
the online databases for constructing candidate HPI-GEN and later drug
repositioning; the white diamond blocks are the corresponding data
types of each database; the orange diamond blocks signify candidate
protein-protein interaction network (PPIN) and gene regelation network
(GRN); the blue blocks represent the input information including the
raw RNA-seq data for system identification and the collected articles
for BioBERT to augment the interspecies interactions and regulations in
PPIN and GRN, respectively; the red blocks denote the models comprising
BioBERT and stochastic regression models utilized in the systems
biology approach; the white rectangle blocks indicate the systematic
methods applied to constructing the candidate HPI-GEN and extracting
core HPI-GEN; the purple blocks imply the drug discovery and
repositioning procedure proposed afterwards; and the yellow rounded
rectangular blocks are the real HPI-GEN and core HPI-GEN during the
infection of HBV.
Table 1.
The total number of nodes in candidate interspecies host/pathogen
interspecies genetic and epigenetic interaction network (HPI-GEN) and
identified HPI-GEN during Hepatitis B Virus (HBV) infection.
Node Candidate HPI-GEN Real HPI-GEN
LncRNA node 143 137
MiRNA node 922 922
TF node 2477 1325
Receptor node 2105 2105
Virus node 4 4
Protein node 13,465 12,258
Total node 19,116 16,751
[110]Open in a new tab
Table 2.
The total number of edges in candidate interspecies HPI-GEN and
identified interspecies HPI-GEN during HBV infection.
Edge Candidate HPI-GEN Real HPI-GEN
LncRNA-LncRNA 0 0
LncRNA-miRNA 0 0
LncRNA-TF 156 9
LncRNA-Receptor 2 0
LncRNA-Protein 90 8
LncRNA-Virus 8 1
MiRNA-LncRNA 510 303
MiRNA-MiRNA 96 86
MiRNA-TF 42,913 7909
MiRNA-Receptor 26,888 9694
MiRNA-Protein 146,144 65,561
MiRNA-Virus 39 15
TF-LncRNA 231 157
TF-MiRNA 1439 944
TF-TF 33,426 8061
TF-Receptor 14,948 6500
TF-Protein 76,441 39,574
TF-Virus 54 38
Receptor-LncRNA 26 15
Receptor-MiRNA 143 102
Receptor-TF 2364 644
Receptor-Receptor 1490 693
Receptor-Protein 7755 4560
Virus-LncRNA 5 3
Virus-MiRNA 109 80
Virus-TF 509 77
Virus-Receptor 164 72
Virus-Protein 682 276
PPIs 4,039,657 906,428
Total edge 4,396,289 1,051,810
[111]Open in a new tab
Figure 2.
[112]Figure 2
[113]Open in a new tab
Real HPI-GEN between host and pathogen under infection of HBV. The real
HPI-GEN describes the cross-talk interaction and regulation between
host and pathogen after system identification analysis to prune false
positives from candidate HPI-GEN. The purple lines denote PPIs; the
green, yellow, blue, and red lines indicate the regulations by
Transcriptional factors (TFs), lncRNAs, miRNAs, and Virus proteins
respectively; and the total numbers of lncRNAs, miRNAs, proteins,
receptors, Virus proteins, and TFs in the network are 137, 922, 12,258,
2015, 4, 1325, respectively.
Figure 3.
[114]Figure 3
[115]Open in a new tab
Core HPI-GEN under infection of HBV. The core HPI-GEN denotes the
cross-talk interactions between host and pathogen after performing PNP
to extract the top 3000 nodes from the real HPI-GEN. The purple lines
denote PPIs; the green, yellow, blue, and red lines indicate the
regulations by TFs, lncRNAs, miRNAs, and Virus proteins, respectively;
and the total numbers of lncRNAs, miRNAs, proteins, receptors, Virus
proteins, and TFs in the network are 6, 262, 2086, 332, 4, 312,
respectively.
Table 3.
The pathway enrichment analysis of proteins by applying the DAVID in
core HPI-GEN during HBV infection.
Term Numbers p-Value
MicroRNAs in cancer 103 8.6 × 10^−18
Pathways in cancer 108 8.7 × 10^−10
FoxO signaling pathway 49 5.6 × 10^−9
Viral carcinogenesis 63 6.3 × 10^−8
Hepatitis B 47 7.3 × 10^−7
Cell cycle 39 1.6 × 10^−5
MAPK signaling pathway 64 6.2 × 10^−5
TNF signaling pathway 33 1.3 × 10^−4
Ras signaling pathway 55 6.2 × 10^−4
PI3K-Akt signaling pathway 77 7.6 × 10^−4
[116]Open in a new tab
Figure 4.
[117]Figure 4
[118]Open in a new tab
The core host/pathogen cross-talk signaling pathways during HBV
infection. This is a network obtained from the core HPI-GEN by the
annotation of KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways,
which represents an intracellular pathogenic mechanism during the
invasion of host hepatocyte by HBV. The solid black lines indicate the
regulation (with an arrowhead) and interaction between nodes. The solid
lines in different colors except black represent the regulations of the
corresponding cellular functions; among them, the lines with an
arrowhead denote the activation of cellular functions while the lines
with a circular head indicate the inhibition of cellular functions.
Furthermore, for the nodes of downstream target genes, the blocks with
an upward pointing arrow signify a rising expression; comparatively,
the blocks with a downward pointing arrow signify a dropping expression
under HBV infection.
On the other hand, after determining the biomarkers from signaling
pathways of the core HPI-GEN as shown in [119]Table 4, we proposed a
deep learning-based drug-target interaction (DTI) method to predict the
potential interactions (dockings) between the identified biomarkers and
their corresponding drugs for further candidate drugs discovery. The
outcome of the DTI model is presented in a form of probability. The
higher probability analyzed by the DTI model, the connection between
the selected biomarker and the predicted drug is more likely to exist.
However, given the enormous number of candidate drugs, the general drug
specifications such as regulation ability, toxicity, and sensitivity
from multiple databases were taken into consideration to reduce the
search space and ensure the safety and efficacy in clinical trials. The
pertinent flow-process diagram of the systems molecular drug design
procedure is shown in [120]Figure 5.
Table 4.
The drug targets identified for HBV infection.
Disease Drug Targets
HBV infection AKT1, NFKBIA, EIF4EBP1, HDAC6, STAT1, STAT3, TP53
[121]Open in a new tab
Figure 5.
[122]Figure 5
[123]Open in a new tab
The flowchart of the proposed systems drug discovery and repositioning
procedure. According to the potential biomarkers identified in
[124]Figure 4, we propose a systems drug discovery and repositioning
method for figuring out promising multiple-molecule drug to alleviate
HBV infection. First, we collected data in respect to drug-target
interaction from various online databases. Then, we turned each drug
and target into descriptors and assembled them into a matrix of
drug-target pairs. To keep our model from poor performance, data
preprocessing prior to training is indispensable. Accordingly, we
adopted data preprocessing approaches including down sampling, data
partitioning, feature scaling, and dimensionality reduction (For more
details, readers can refer to [125]Section 2.8.1 in Materials and
Methods) based on the property of the data. After training the Deep
learning-based drug-target interaction (DTI) model shown in [126]Figure
6 with the constructed training data, we evaluated the model
performance (as shown in [127]Table 5 and [128]Figure 7) and compared
the model efficacy of DTI with that of other traditional machine
learning (ML)-based methods (as shown in [129]Figure 8). Since there is
only a neuron in the output layer which delivers the probability of
whether the relation (docking) between a drug and a target exists, the
DTI model enabled us to stress the importance on specific interactions
(with output score approaching 1) and to identify candidate drugs.
Next, concerning the availability of the predicted drugs (candidate
drugs) via general criteria for drug design specifications such as
regulation ability, toxicity, and sensitivity obtained from databases,
we further filtered promising drugs that modulate multiple molecular
targets and possess the efficacy with lower dosages. After all,
low-dose drugs are more attainable and cause less harm to patients. In
this regard, a set of multi-target drugs with appropriate toxicity for
clinical investigations were selected as a multiple-molecule drug to
overcome HBV infection.
3.2. Extracting Core Signaling Pathways from Identified HPI-GEN and Core
HPI-GEN in HBV Infection
For the purpose of analyzing the molecular mechanism under HBV
infection, extracting core signaling pathways from candidate HPI-GEN is
critical. The process to construct candidate HPI-GEN and identify real
HPI-GEN as well as core HPI-GEN has been shown in the flowchart of
[130]Figure 1.
Thereafter, we identified the real HPI-GEN via parameter identification
concerning the cost of model complexity under the auspices of
corresponding RNA-seq data to prune false positives from candidate
HPI-GEN (refer to [131]Section 2.4, [132]Section 2.5, [133]Section 2.6
for more details). The total numbers of the nodes of proteins,
Transcription factors (TFs), miRNAs, lncRNAs, and Virus proteins as
well as the edges of their interactions and regulations for the
candidate HPI-GEN and identified real HPI-GEN are given in [134]Table 1
and [135]Table 2, respectively. In addition, through employing the
network visualizing software Cytoscape, the network of real HPI-GEN
under HBV infection is shown in [136]Figure 2.
Since the immense complexity of real HPI-GEN limits the possibility to
efficiently investigate the pathogenic mechanism under HBV infection,
applying principal network projection (PNP) and subsequently
abstracting the top 3000 nodes with significant projection values on
the 85% pivotal network structures to distill core HPI-GEN are
essential. Meanwhile, the results of core HPI-GEN under the infection
of HBV is shown in [137]Figure 3.
Moreover, we employed DAVID Bioinformatics Resources version 6.8 to
analyze the enrichments of relative pathways and specific cellular
functions of core HPI-GEN as shown in [138]Table 3.
Through the multifaceted assessment of the core HPI-GEN based on
existing studies and the annotation of KEGG signaling pathways, we
could attain the core signaling pathways associated with
microenvironmental factors including cytokines, chemokines, etc., for
the pathogenesis of HBV infection as represented in [139]Figure 4.
Then, on the basis of the investigating the consequence of the
systematic pathogenic molecular mechanism, we could identify the
potential biomarkers holding great promise for the development of
therapeutic targets.
3.3. Analysis of Core Interspecies Cross-Talk Pathways to Investigate
Host/Pathogen Offensive/Defensive Mechanism during HBV Infection
Based on the core host/pathogen cross-talk signaling pathways in
[140]Figure 4, several identified processes leading to dysfunction of
antiviral machinery are investigated to afford a stepwise understanding
of the host/pathogen offensive and defensive mechanism under HBV
infection.
3.3.1. Inception of HBV Infection
Hepatitis B Virus (HBV) has a 3.2-kb circular and partially
double-stranded DNA genome that contains four genes named S, C, P, and
X genes coding for the envelope proteins, the core protein and a
related protein termed precore protein, the viral DNA polymerase, and
the multifunctional regulatory protein, respectively [[141]66,[142]67].
The journey of HBV virions starts from attaching to host hepatocytes
through heparan sulfate proteoglycans (SDC2 and HSPG2) and promptly
transfers to sodium acid cotransporter (SLC10A1), initiating the HBV
life cycle. Once entering the cells, the viral nucleocapsid containing
the partially double-stranded DNA, known as the relaxed circular DNA
(rcDNA), will be released into the cytoplasm and transported into the
nucleus. The plus strand of the rcDNA is then repaired and completed to
generate the covalently closed circular (cccDNA), which later forms the
HBV minichromosome and is transcribed into RNAs for viral replication
[[143]68].
3.3.2. Liver Microenvironment and Immune Pathogenesis under HBV Infection
Inflammatory and innate immune responses impact the pathogenesis of
numerous diseases and participate in the activation of common
inflammatory mediators and regulatory pathways. On top of the Toll
pathways that could be directly stimulated by the viral particles, in
response to the foreign invasions, leukocytes, e.g., neutrophils and
macrophages that phagocytose and kill pathogens, might simultaneously
coordinate additional host responses by synthesizing a wide range of
inflammatory mediators, such as TNF-α, IL-6, IFN-α, and IFN-γ
[[144]69]. These cytokines and chemokines provoke the secretion of
antiviral effectors and a series of immunoregulatory mechanisms against
HBV infection. In general, cellular and molecular events and
interactions would effectively mitigate impending injury or infection
during inflammatory responses [[145]70]. However, along with the
frequent recurrence of inflammation, once imbalances of the
microenvironment occur, it might lead to abnormal epigenetic variations
and even immune dysregulation [[146]71], which consequently incurs
viral relapse.
3.3.3. Toll Pathway as the First Line of Defense against Infection
Toll-like receptors (TLRs) have long been recognized as the first line
of antiviral immunity in that they initiate intracellular signaling
pathways to induce antiviral mediators, such as interferons (IFNs) and
other cytokines [[147]72]. Upon the recognition of foreign invaders
(HBV-associated molecular patterns), TLRs elicited phosphorylation of
IKKβ (IKBKB) through several signaling transduction proteins, including
MyD88, IRAK1, TRAF6, TAB1, and TAK1. IκBα (NFKBIA) is a downstream
protein of IKKβ keeping NF-κB sequestered in an inactive state in
cytoplasm [[148]73]. When phosphorylated by IKKβ, IκBα dissociated from
NF-κB, resulting in the nuclear translocation of NF-κB and the
induction of its several targets, in which the Interleukin 6 (IL6), the
interferon-alpha receptor 1 (IFNAR1), the interferon-gamma receptor 2
(IFNGR2), and BCL3 were recognized as shown in [149]Figure 4. Among
them, IL-6 is the pleiotropic cytokine encoded by IL6 gene that
catalyzes immune reaction and inflammatory responses [[150]74,[151]75];
IFNAR1 is the receptor that binds type I interferons and activates the
JAK-STAT signaling pathway, along with MAPK, PI3K, and Akt signaling
pathways associated with multiple cellular functions, such as
inflammation and immune response regulation [[152]76]; IFNGR2 is a
receptor that binds to interferon-γ (IFN-γ), which is the immune
interferon produced predominantly by natural killer (NK) and natural
killer T (NKT) cells in response to viral infection [[153]77]; and BCL3
is a protein that inhibits DNA damage-induced apoptosis and acts as a
transcriptional coregulator of NF-κB [[154]78,[155]79].
In addition to stimulation of the downstream NF-κB pathway, the
phosphorylated IKKβ also activated the pathway of mammalian target of
rapamycin (mTOR) by triggering the disruption of TSC1/TSC2 complex. In
general, TSC1 stabilizes TSC2 through direct binding, thereby
preventing TSC2 from ubiquitination and degradation. Once IKKβ
phosphorylates TSC1 incurring collapse of its connection with TSC2, it
obliquely leads to the stabilization of RhebGTP, which mediates
signaling to mTOR [[156]80,[157]81]. As for the substrates of mTOR,
among all the identified downstream proteins in the core host/pathogen
signaling pathways as shown in [158]Figure 4, the translational
repressor 4EBP1 (EIF4EBP1) stood out to be a pivotal constituent. It is
well acknowledged that when the 4EBP1 protein is hyperphosphorylated by
mTOR kinase, it can’t bind and hijack the eIF4E factor, i.e., a general
initiator for translational regulation [[159]82,[160]83]. Concurrently,
it is reported that the released eIF4E might form a complex with the 5′
stem-loop structure of the HBV genome, thus ensuring viral replication
[[161]84].
3.3.4. TNF-α-Stimulated Signaling Pathways
TNF-α is a multifunctional cytokine that plays a principal role in cell
proliferation, apoptosis, inflammation, and immune response. From the
core host/pathogen cross-talk signaling pathways in [162]Figure 4, as
soon as TNF-α bound to TNF receptors (TNFRs), it contributed to the
phosphorylation of Akt via both TNFR2/TRAF2/PIK3CB and
TNFR1/TRADD/TRAF2/PIK3CB signaling transductions. Akt is an effector of
Class I PI3K triggering multiple downstream pathways and has been shown
to be a critical mediator of cell proliferation and survival in a
variety of cell types [[163]85]. In [164]Figure 4, a few pathways of
Akt were incorporated into the core host/pathogen cross-talk network.
First of all, the activated Akt inhibited the induction of autophagy
through transactivating mTOR, i.e., impeding GTPase-activating protein
complex TSC1-TSC2 as previously described. It is worth noting that TSC2
rather than TSC1 was directly phosphorylated by Akt, which destabilized
TSC2 and disrupted its affinity with TSC1 [[165]86]. Secondly, Akt also
enhanced MDM2-mediated ubiquitination leading to p53 (TP53) degradation
[[166]87] and catalyzed the nuclear localization and stabilization of
BCL3 by phosphorylation [[167]79], where BCL3 is the aforementioned
transcription coregulator having been proven to interact with NF-κB so
as to facilitate its transcriptional ability. Moreover, it even caused
nuclear exclusion and cellular dysfunctions of FoxO proteins through
phosphorylation and repression [[168]88]. Therefore, as the identified
target genes of FoxO proteins in the core host/pathogen cross-talk
pathways, BNIP3, BMI1, SOD2, and cyclin-dependent kinase inhibitor 1
(CDKN1A) encoding p21, might probably be down-regulated. Within, BNIP3
is a pro-apoptotic factor that induces cell death through interacting
with anti-apoptotic proteins [[169]89]; BMI1 is a critical epigenetic
repressor that functions through chromatin remodeling and plays a
central role in DNA damage repair [[170]90,[171]91]; and SOD2 is the
antioxidant defense enzyme that participates in apoptotic signaling and
oxidative stress regulation [[172]92]. However, it is well documented
that FoxO1/FoxO3 could rapidly relocalize into the nucleus in response
to STAT3 activation by IL-6, and the accumulation of FoxO1/FoxO3 in
nuclei coincided with elevated expression of CDKN1A and other genes
[[173]93]. This phenomenon could interpret the high expression of the
target genes BNIP3, BMI1, SOD2, and CDKN1A, as well as the promotion of
viral replication, since activator FOXO1 has also been found to bind to
HBV DNA and activate its transcription [[174]94].
Aside from the activation of PI3K/Akt pathway, in response to cytokine
signals, TRAF2 also recruited TAK1/TAB3 complex and IKK complex, which
is composed of IKKα, IKKβ, and IKKγ (IKBKG), bringing about the
autophosphorylation-dependent activation of TAK1 [[175]95] and the
activation of its substrates, such as IKKβ and MKK7. MKK7 then
upregulated the transcriptional activity of c-JUN and ATF2 via
activation of the JNK1. c-Jun is an AP-1 family transcription factor
involved in the regulation of cell death and survival [[176]96] as well
as inflammation and innate immune response [[177]97]. In [178]Figure 4,
the induction of c-Jun is discovered to be associated with the
overexpression of the proinflammatory cytokine receptor IL18R1
essential for IL18 mediated signal transduction [[179]98], the ligand
PD-L1 (CD274) regulating the cellular immune responses to prevent
inflammation [[180]99], and the major cyclins CCND1 controlling the
progression of a cell through the G1 phase by conjugating CDK4/6
enzymes required for the synthesis of cell cycle [[181]100]. Besides,
c-Jun not only modulated the expression of BCL3 by serving as either
the coactivator of NF-κB or the activator of BCL3 gene to reinforce
cell survival [[182]101,[183]102], but also induced miR-221 to
interfere with the regulation ability of the histone deacetylase 6
(HDAC6), which has been reported to both suppress NF-κB activation and
control autophagosome maturation essential for ubiquitin-selective
quality-control autophagy [[184]103]. Apart from that, c-Jun could even
impact the modulation of Fas expression, which sparked off the loss of
Fas-ligand (FasL)-mediated apoptosis by means of down-regulating the
activity of p53.
3.3.5. TAK-STAT Signaling Pathways
The Janus kinase (JAK)-signal transducer and activator of transcription
(STAT) signaling pathway is a chain of interactions between proteins
involved in various physiological processes, including immune function,
cell growth, and cell death [[185]104]. The activation of STAT3 was
mainly provoked by JAK1 while JAK1 was brought in the proximity of
receptors including the Glycoprotein 130 (IL6ST) and the gamma
interferon receptor 1 (IFNGR1), which receive the signal from IL-6 and
IFN-γ, respectively. However, the activation of STAT1 primarily stemmed
from the phosphorylation of TYK2, a member of JAK family that was
previously described by other research groups [[186]105], by the alpha
interferon receptor 1 (IFNAR1) residing in IFN-α stimulation. In spite
of forming homodimers or heterodimers and translocating to the nucleus
where they regulated the transcription of target genes, STAT3 and STAT1
played different even in part opposite roles in the process of pathogen
infection. STAT3 constitutive activity is essential for cell survival
and the extent of inflammation. Among the identified STAT3 targets in
the host/pathogen cross-talk signaling pathway in [187]Figure 4, an
anti-inflammatory and anti-apoptosis factor referred to as PD-L1 and
the major cyclin CCND1 involved in proliferation and cell cycle
progression as mentioned previously were included. Additionally, STAT3
could also inhibit apoptosis by suppressing Fas transcription and p53
expression.
In contrast, STAT1 promoted apoptosis by inducing the expression of
CXCL10 and strengthened the antiviral ability through modulating the
expression of GBP1. Amongst these two downstream targets, CXCL10 is a
ligand for the receptor CXCR3 involved in chemotaxis, induction of
apoptosis, and regulation of cell growth associated with a variety of
human diseases including Hepatitis C [[188]106] and Endotheliitis
[[189]107]. GBP1 is a binding protein which has been shown to provide
broad host protection against different pathogen classes through
expediting oxidative killing and delivering antimicrobial peptides to
autophagolysosomes [[190]108]. Furthermore, STAT1 also negatively
regulated the cell cycle though inducing the expression of the CDK
inhibitors CDKN1A, a target of transcription factors FoxOs as stated
previously.
Albeit STAT3 was directly activated by JAK1, it could also be
indirectly triggered through the cytoplasmic signaling cascade of
SHC/ERK pathway composed of signaling proteins, i.e., SHC1, GRB2, and
SOS1, and the classical MAPK/ERK pathway comprised of signaling
proteins, i.e., HRAS, RAF1, MEK1, and ERK1. Except inducing the
activation of STAT3, MAPK/ERK pathway also contributed to the
MSK1-mediated phosphorylation of c-Jun and ATF2, the inhibition of
FoxO3 nuclear translocation, and most importantly, the regulation of
eIF4E activity having been elucidated chiefly via phosphorylation on
serine 209 of eIF4E by other research groups [[191]83,[192]109], which
ultimately proliferated viral replication. More exactly, this was
carried out by mitogen-activated kinase2 (MNK2), the downstream kinase
of ERK1.
3.3.6. TNFs-Induced Apoptotic Pathways
As a component of the cell response to viral infection, apoptosis
mediated by the sequential activation of caspases is indispensable for
aborting the production and release of progeny virus. While diverse
signaling pathways inducing apoptosis have been found, caspase-mediated
proteolysis of downstream substrates is still a critical element of the
execution pathway common to all forms of apoptosis [[193]110]. And
among all the avenues conducive to apoptosis progression, the pathways
contributing to the modulation of caspase-3 (CASP3), a frequently
activated death protease that catalyzes the specific cleavage of many
integral cellular proteins, occupied a key position and a majority of
them began with the activation of FADD in [194]Figure 4.
Induction of FADD, the upstream activating protein of caspase-10
(CASP10), was mediated by not only the bifurcated TNFR1-TRADD signaling
cascades but also the interaction with the FasL receptor (FAS) and the
TNF-related apoptosis-inducing ligand receptor (DR5). Upon stimulation,
FADD in turn recruited the initiator caspase caspase-10 to form the
death-inducing signaling complex, which leads to activation of a
caspase cascade and eventual cell-death. Having been shown to activate
the executioner caspase-3 through direct processing, caspase-10 is
described to be functional-independent to caspase-8 (CASP8) in
initiating Fas- and tumor necrosis factor-related ligand-mediated
apoptosis [[195]111,[196]112]. Based on the cleavage potential of
caspase-3 associated with the dismantling of the cell and the formation
of apoptotic bodies, the identified critical targets including Beclin-1
(BECN1), DNA-PK (PRKDC), and the DNA fragmentation factor subunit alpha
(DFFA) were specifically cleaved by caspase-3 with subsequent loss of
its original capability. Among them, BECN1 is the component of the
phosphatidylinositol-3-kinase (PI3K) complex that induces the
appearance of autophagosome [[197]113]; PRKDC is the nuclear
serine/threonine protein kinase that takes part in degraded DNA repair
[[198]114]; and DFFA is a subunit of DFF. Further, according to
previous study, whilst DFFA is cleaved, the DNA fragmentation factor
subunit beta (DFFB) would dissociate from the cleaved fragments of DFFA
and trigger both DNA fragmentation and chromatin condensation during
apoptosis [[199]115].
Additionally, it is worthwhile to note that Akt, including AKT1 and
AKT2, has been reported to interact with X-linked inhibitor of
apoptosis protein (XIAP) and protect it from ubiquitination and
degradation [[200]116]. Once phosphorylated by Akt, the XIAP would
exploit its second baculovirus IAP repeat domain (BIR2) to inhibit the
apoptotic executioner caspase-3 [[201]117]. We believe that this
identified episode carries a foreshadowing for the succeeding
virus-induced malfunctions of programmed cell death.
3.3.7. Virus Proteins-Induced Pathogenic Mechanism
Proteins encoded by the viral genomes, especially HBx, a 154 amino acid
gene product of the X gene, have emerged as promiscuous transactivators
activating the transcription of host genes by interacting directly with
nuclear transcription factors or by obliquely activating various signal
transduction pathways in the cytoplasm and have been proven to be
potent epigenetic modifying factors in hepatocytes.
HBx regulated the expression of BCL3 to subvert apoptotic signaling
pathways via both synergistically cooperating with p300 (EP300) to
enhance c-Jun transcriptional activity and activating NF-κB by either
directly interacting with NF-κB for augmenting its binding ability to
its target DNA sequence or promoting phosphorylation of IκB as shown in
[202]Figure 4. In addition, apoptosis inhibition could also be induced
by dysfunction of caspase-3 through enhancing the ability of Akt, an
upstream kinase of caspase-3 as stated earlier, by HBx. Once caspase-3
lost its function, in response to signal from damaged DNA, DNA-PKs, the
substrate of caspase-3, would act in concert with XRCC4 and a number of
tightly coupled proteins to repair DNA. Further, it was suggested that
the intense activity or high expression of XRCC4 might be beneficial to
the promotion of HBV cccRNA formation and thereby facilitate the viral
replication [[203]118]. Since Hepatitis B Virus supremely requires
components from the host cell to complete its replication cycles,
merely one developed avenue to hijack host signaling pathways can’t
meet its demand. In addition to the significant translation initiation
factor eIF4E induced by HBx through Akt/mTOR pathway as just mentioned,
eIF4E was activated via the MAPK/ERK pathway as HBx could also trigger
Ras and its downstream signaling pathways. Hence, forming a node of
convergence of the PI3K/Akt/mTOR and Ras/Raf/MAPK signaling pathways,
the regulation of eIF4E is clearly consequential for HBx to promote
viral replication. Additionally, overexpression of cyclin G1 (CCNG1)
resulted from the loss of microRNA-122 (a liver-specific miRNA) induced
by HBx could also augment virus proliferation. It was reported that
CCNG1 specifically interacts with p53 and blocks its specific binding
to HBV enhancer elements, which simultaneously abrogates p53-mediated
inhibition of HBV transcription and releases cell proliferation from
G1/S arrest [[204]119].
p53 acts as a sequence-specific DNA binding protein that activates the
transcription of substantial target genes associating with the
regulation of cell cycle progression, apoptosis, DNA repair, and
senescence. In [205]Figure 4, HBx interacted with p53 both directly and
indirectly to inhibit its DNA consensus sequence binding and
transcriptional activity, which has been reported to efficiently block
p53-mediated cellular functions [[206]120,[207]121]. Apart from that,
HBx also inhibited p53 by conjugating c-Jun and STAT3 as well as
keeping MDM2, the principal cellular antagonist of p53 acting to limit
the p53 growth-suppressive function [[208]122], from ubiquitin-directed
self-degradation.
As for the targets of p53, apoptosis-associated proteins, such as the
proapoptotic protein Bax, which could permeabilize mitochondria and
engage the apoptotic program giving rise to activation of caspase-3
[[209]123,[210]124], the apoptosis antigen Fas triggering the Fas
signaling pathway as mentioned earlier, and the DDIT4 negatively
regulating cell growth, proliferation, and survival via inhibition of
the activity of mTOR [[211]125] were identified in our core
host/pathogen cross-talk signaling pathways in [212]Figure 4. Further,
having been reported to participate in the generation of ROS and
p53-dependent DNA damage [[213]126], DDIT4 was negative-regulated by
the HBx-induced upregulation of microRNA-221 to promote aberrant
proliferation. Such significant inverse correlation between DDIT4 mRNA
levels and miR-221 expression has been observed in a mouse model of
liver cancer as well [[214]127]. Besides, the direct suppression of
histone deacetylases 6 (HDAC6) by HBx-induced miR-221 would result in
autophagosome maturation failure; nevertheless, HBx yet also bound to
Vps34 (PI3KC3) to enhance its activity, inducing the autophagic
response crucial for the formation of autophagosomes as well as the
intensification of viral replication.
STAT1 and STAT3 are both critical elements in viral replication,
despite acting as distinct roles from each other. Functional
characterization of miR-338-3p indicated that miR-338-3p inhibited cell
proliferation by inducing cell cycle arrest at the G1/S phase. The
identified downregulation of miR-338-3p and tyrosine phosphorylation of
STAT3 triggered by HBx might cause CCND1 overexpression beneficial to
virus cell cycle progression [[215]128]. In addition, forced low
expression of miR-338 by HBx also directly led to the increased level
of autophagy. This process was mainly through the up-regulation of LC3
by the released transcription factor ATF2, which has been identified as
a direct target of miR-338 [[216]129]. Furthermore, instead of FoxO3,
HBx appeared to assist FoxO1 in counteracting the reduction of its
nuclear relocalization arising from Akt phosphorylation by inducing
STAT3 activation, which is conducive to the transcription of cccDNA as
previously described. In contrast, since interferons (IFNs) activated
STATs via phosphorylation [[217]130], once recruited by HBx, histone
acetyltransferase (HAT) CBP (CEBBP) might dynamically regulate STAT1
acetylation to counteract IFN-catalyzed STAT1 phosphorylation, nuclear
translocation, DNA binding, and target gene regulation. On the other
hand, STAT1 nuclear translocation induced by IFN-α might also be
deprived by HBV polymerase [[218]131]. Previous studies demonstrated
that miR-122 regulates type I IFN expression through both
down-regulating SOCS1 and SOCS3 to enhance interferon-mediated
suppression of HBV [[219]132,[220]133]. Nonetheless, it appears that
merely the regulation of SOCS1 by miR-122 was identified in [221]Figure
4. Finally, as a target gene of FoxOs and STAT1, CDKN1A overexpression
was mediated by HBx-induced upregulation of miR-29a. Serving as a
direct upstream of HDAC4, miR-29a could thereby alleviate the
deacetylation of CDKN1A through HDAC4 inhibition.
3.4. Drugs Discovery and Repositioning Based on Selected Biomarkers for HBV
Through reviewing the cellular dysfunctions triggered by the core
host/pathogen cross-talk interspecies signaling pathways under HBV
infection shown in [222]Figure 4, we selected the significant
biomarkers including AKT1, NFKBIA, EIF4EBP1, HDAC6, STAT1, STAT3, and
TP53 as drug targets shown in [223]Table 4. However, there is still a
long way to the clinical trials on account of the intrinsic
difficulties for drug design and development. To this end, we proposed
a systematic drug discovery and repositioning approach to uncover the
multiple-molecule drug for therapeutic treatment based on the deep
learning and data mining technique as shown in [224]Figure 5.
3.4.1. Deep Learning-Based Drug-Target Interaction Prediction Model
To effectively predict whether the identified targets interact with
some existing drugs, we provide a deep learning-based DTI model as
shown in [225]Figure 6. It is a fully-connected neural network
containing an input layer and five layers with weights; four hidden
layers (512, 256, 128, and 64 neurons in each layer) and an output
layer (1 neuron). The intermediate hidden layers are applied with
rectified linear unit (ReLU) activation function that thresholds
negative signals to 0 and passes through positive signal. This type of
activation function allows faster learning in comparison with
alternatives, e.g., sigmoid or tanh unit [[226]134]. In contrast, the
output of the last layer is fed to a sigmoid function which produces a
likelihood probability (0,1) score of the corresponding predicted
docking where a higher value symbolizes a more reliable drug-target
interaction. In addition, to further reducing overfitting, we
incorporated a dropout layer [[227]135] with a probability of 0.5
following the output of each hidden layer.
Figure 6.
[228]Figure 6
[229]Open in a new tab
Configuration of the deep learning-based DTI model. v is the vector of
each drug-target pair for the input layer;
[MATH: I :MATH]
is the total number of drug features; and
[MATH: J :MATH]
is the total number of target features.
For avoiding a poor performance due to assessing the model with a data
set unfairly sampled (outliers or other anomalies are presented in one
set), we introduced 10-fold cross validation measure to authenticate
the learning effectiveness as shown in [230]Table 5 and the
corresponding learning curves that plot the learning performance of DTI
model over experience and time are also presented in [231]Figure 7.
Note that since the model has been diagnosed well-fit and continuous
training will likely lead to an overfit, it automatically stops
learning at the epoch of 70. Subsequently, via calculating the loss and
accuracy of each DTI model yields on the test set, we received an
average accuracy of 92.6 (%) with the standard deviation of 0.294 (%).
Table 5.
Model performance (10-fold cross validation).
Validation Loss Validation Accuracy (%) Testing Loss Testing Accuracy
(%)
1 0.254831 92.09 0.236721 92.28
2 0.237422 91.89 0.2264 92.04
3 0.228304 92.35 0.226482 92.49
4 0.227555 92.62 0.228102 92.6
5 0.223151 92.55 0.218528 92.74
6 0.223817 93.04 0.227157 92.77
7 0.239133 92.39 0.225822 92.58
8 0.235536 92.93 0.226848 92.8
9 0.22345 93.1 0.218475 93.11
10 0.235327 92.21 0.219082 92.64
Average 0.232853 92.517 0.225362 92.605
Standard deviation 0.00983 0.409527 0.005559 0.294099
[232]Open in a new tab
Figure 7.
[233]Figure 7
[234]Open in a new tab
Training and Validation Learning curves (10-fold cross validation).
“-x-” lines in different colors denote the training accuracy and loss,
while “-o-” lines in different colors represent the validation accuracy
and loss in (a,b), respectively. The bold lines in red and blue
indicate the model’s average loss and accuracy of training and
validation in (a,b), respectively. Moreover, since stopping the
training of the neural network early before it has overfitted the
training dataset can reduce overfitting and improve the generalization
of the DTI model, Early Stopping approach is additionally employed to
automatically stop the learning process at the epoch of 70.
Moreover, with the increasement of experimental data, numerous machine
learning approaches such as Random Forest (RF) [[235]136], Support
Vector Machine (SVM) [[236]137], Nearest Neighbor [[237]138], etc. have
been applied to predict drug-target interactions. Therefore, to
evaluate the model performance, we contrast our DTI model with a few of
them, which are RF, Nearest Neighbor, and SVM. By adopting the
visualization tool, Receiver Operating characteristic (ROC), we can
clearly find that DTI outperforms other traditional binary classifiers
as presented in [238]Figure 8.
Figure 8.
[239]Figure 8
[240]Open in a new tab
The Receiver Operating Characteristic (ROC) curve of models based on
different methods. The accuracy of each method is calculated by the
area under the corresponding ROC curve (AUC) (Readers can refer to
[241]Section 2.8.3 for further illustration), where the DTI model could
achieve an accuracy of 0.99, which is higher than that of Random Forest
(RF) (accuracy of 0.70), Nearest Neighbor (accuracy of 0.67), and
Support Vector Machine (SVM) (accuracy of 0.57).
3.4.2. Systems Discovery and Design of the Multiple-Molecule Drug for HBV
Infection
On the basis of the pre-trained DTI model, we could evaluate the
docking ability between the identified biomarkers and the available
drugs. The larger predicted result we obtain, the higher probability
that the specific drug-target pair will have interaction. Namely, the
candidate drugs would be sifted out owing to holding higher probability
(with output score approaching 1) to interact with the identified
biomarkers. However, the exploration of promising drugs from a large
quantity of the candidates (predicted drugs) is still arduous.
Accordingly, in order to narrow down the scope of the possible options
and improve the reliability of the ultimate selected drug combination,
general criteria for drug design specifications including regulation
ability, toxicity, and sensitivity were applied to further filter out
unavailable or inappropriate medications.
Under the criteria of the pharmacological properties, the regulation
ability value for each drug was extracted from the LINCS L1000 level 5
dataset consisting of results from 978 genes of 75 cell lines treated
with 19,811 small molecular compounds. By referring to LINCS L1000, we
can examine whether a specific gene was up-regulated (positive values)
or down-regulated (negative values) after being treated with a small
molecular compound. Screening through a positive-regulation of NFKBIA,
EIF4EBP1, HDAC6, STAT1, and TP53 and the negative-regulation of AKT1
and STAT3, the remaining drugs with agreeable regulation ability for
the treatment of HBV-infected patients are shown in [242]Table 6.
Table 6.
The candidate multi-target drugs for the identified biomarkers.
AKT1
Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
(EC50, nM)
ribavirin −1.5283 1.9876 −1.18823
tacrolimus −0.5897 2.7541 −1.50335
sorafenib −0.3125 2.7885 −0.43586
tenofovir −0.6152 2.955 −0.63588
meloxicam −0.2726 3.4619 −0.68734
NFKBIA
Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
(EC50, nM)
chlortalidone 0.4056 1.8623 0.05198
busulfan 2.501 2.3207 −0.44631
rifaximin 0.681 2.6259 0.03563
sorafenib 3.5635 2.7885 −0.43586
meloxicam 3.2603 3.4619 −0.68734
EIF4EBP1
Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
(EC50, nM)
chlortalidone 0.4153 1.8623 0.05198
busulfan 0.7491 2.3207 −0.44631
rifaximin 1.7372 2.6259 0.03563
sorafenib 0.4733 2.7885 −0.43586
dactinomycin 0.2923 4.3767 0.00307
HDAC6
Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
(EC50, nM)
argatroban 0.5717 1.5976 −0.01405
chlortalidone 0.6495 1.8623 0.05198
entecavir 0.5388 2.3879 0.03615
nutlin-3 0.1749 2.529 −0.6695
meloxicam 0.1047 3.4619 −0.68734
STAT1
Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
(EC50, nM)
chlortalidone 0.234 1.8623 0.05198
ribavirin 0.982 1.9876 −1.18823
zafirlukast 0.2615 2.5723 −0.24097
sorafenib 0.5799 2.7885 −0.43586
hydroflumethiazide 1.5238 3.1299 0.3949
STAT3
Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
(EC50, nM)
argatroban −1.2902 1.5976 −0.01405
ribavirin −0.8934 1.9876 −1.18823
lamivudine −0.3132 2.1348 −1.33248
zafirlukast −0.5053 2.5723 −0.24097
tenofovir −1.5088 2.955 −0.63588
hydroflumethiazide −0.7348 3.1299 0.3949
meloxicam −1.6727 3.4619 −0.68734
TP53
Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
(EC50, nM)
acetylcysteine 1.0705 1.294 −1.2170
busulfan 2.2886 2.3207 −0.44631
nutlin-3 1.6295 2.529 −0.6695
sorafenib 0.5683 2.7885 −0.43586
calcitriol 0.4321 5.1352 0.28577
[243]Open in a new tab
Subsequently, to further narrow down the possibilities for drug
combinations, the drug toxicity specification (LD50) recorded in
DrugBank as well as the drug sensitivity specification (EC50) obtained
from the PRISM dataset, which is a dataset comprising sensitivity
values of 4518 drugs tested across 578 human cell lines, were also
adopted. Being the numeric index of lethality, median lethal dose
(LD50) plays a pivotal role in drug safety evaluation [[244]139]. The
drug with lower LD50 possesses higher toxicity, and thus it is more
detrimental. On the other hand, half maximal effective concentration
(EC50) is used as a measurement of concentration, which is often
adopted to evaluate the suitability and the efficacy of drugs, where
lower EC50 indicates the stronger inducibility, i.e., higher
sensitivity. Based on these concepts, the drug combination comprising
Sorafenib, Nutlin-3, and Tenofovir as shown in [245]Table 7 was then
designed as the multiple-molecule drug to rejuvenate the dysfunctions
in the pathogenic mechanism under HBV infection with minor collateral
tissue damage.
Table 7.
Potential multiple-molecule drug and the corresponding target genes for
HBV infection therapy.
Targets AKT1 NFKBIA EIF4EBP1 HDAC6 STAT1 STAT3 TP53
Drugs
Sorafenib O O O O O
Nutlin-3 O O
Tenofovir O O
Chemical structures of multi-target drugs
Sorafenib Nutlin-3 Tenofovir
graphic file with name biomedicines-08-00320-i001.jpg graphic file with
name biomedicines-08-00320-i002.jpg graphic file with name
biomedicines-08-00320-i003.jpg
[246]Open in a new tab
O indicates the docking between each multi-target drug composing the
multiple-molecule drug and their corresponding drug target.
Among them, Sorafenib is a multikinase inhibitor drug having been
approved for several tumor types, including Hepatocellular Carcinoma
(HCC) [[247]140]. Although prominent effects of Sorafenib for apoptosis
induction were recognized, several limitations have been indicated by
numerous medical studies upon applying Sorafenib to various cancers as
a single agent. Therefore, the use of Sorafenib in combination with
other targeted agents appears to be the main strategy for combatting
drug resistance and has been widely explored with promising results
[[248]141]. Falling under the category of small molecule inhibitor,
Nutlin-3 is conducive to the restoration of p53 pro-apoptotic ability,
leading to the activation of cell cycle arrest and apoptosis pathways
[[249]142]. Though it is a novel antitumor compound, which hasn’t been
clinically approved, the synergistic effect of Nutlin-3 combined with
other agents contributes to diminishing of the dose required in
monotherapy and decreasing the occurrence of adverse drug events
[[250]143,[251]144]. Being a potent nucleotide analogue
reverse-transcriptase inhibitor, Tenofovir is highly active against
human immunodeficiency virus (HIV). In spite of its clinical use
associated with the risk of kidney injury [[252]145], it has been shown
highly effective to patients who have never undergone an antiretroviral
therapy [[253]146]. Owing to its low toxicity, trifle side effects, and
subtle drug resistance compared to other antivirals, Tenofovir is
thereby frequently used as a member of combined drugs against HBV to
ensure stronger antiviral activity and a more favorable safety profile
[[254]147,[255]148]. It is worth noting that while therapy with
Tenofovir is commonly used to treat Hepatitis B, life-long therapy is
needed, which often increases the risk of relapse. In this respect, the
combination of other viable drugs with Tenofovir may have a chance to
improve the therapeutic effect and shorten the course of therapy.
Taken together, by administering the proposed multiple-molecule drug,
an up-regulation of NFKBIA, EIF4EBP1, HDAC6, STAT1, and TP53
accompanied by the down-regulation of AKT1 and STAT3 can plausibly be
attained, yielding encouraging results for the treatment of
HBV-infected patients.
4. Discussion
Traditional drugs commonly used for treatment in patients afflicted
with HBV are interferon and nucleoside analogues. However, the low
therapeutic response of patients to interferon and the inevitable
development of drug resistance to nucleoside analogues render the
clinical treatment challenging [[256]149]. In addition, serious toxic
accumulation of the nucleoside analogues during long-term therapy for
HBV infection is nonnegligible as well. Therefore, great efforts have
been placed into seeking highly specific ligands affecting single
target to alleviate HBV replication and to treat HBV-related
complications. Yet, in view of the prohibitive development costs and
timeline of a newly designed drug as well as the frequent failure of
monotherapy due to drug resistance, growing interest has centered on
discovering effective combinations of drugs for HBV treatment.
In this study, we develop a medicine discovery and repositioning
procedure in terms of system identification approaches via two-side
RNA-seq data and deep learning-based framework considering drug design
specifications to find a possible drug combination for the treatment of
HBV infection. Through investigation of core HPI-GEN annotated with
KEGG pathways, the core signaling pathways responsible for the
abnormally regulated cellular functions were extracted and unified in
[257]Figure 4. However, distinct cooperation between these pathways
sparks diversified implication. Consequently, further discussion of the
integrated effect is conducive to the selection of potential biomarkers
in relation to the remedy of HBV infection.
4.1. Apoptosis in HBV Infection
As a fundamental and highly regulated biological process, apoptosis is
a form of programmed cell death that participates in cell resistance to
viral infection. Although mounting evidence shows that HBx is uniquely
endowed with the capability to regulate apoptosis, its dual role as an
apoptotic mediator raises the difficulty to elucidate the pathological
mechanism [[258]150].
Caspase-3 is the typical hallmark of apoptosis [[259]110]. Numerous
studies have shown that HBx abrogates the activity of caspase-3
[[260]151,[261]152,[262]153], resulting in resistance to assorted
apoptotic stimuli. In [263]Figure 4, HBx-mediated inhibition activity
of caspase-3 was chiefly through both activating XIAP, the inhibitor of
caspase-3, and directly or obliquely regulating transcription factors
such as NF-κB and p53 to affect the expression of their downstream
anti-apoptotic target, e.g., BCL3 and Bax. Further, even if previous
researches did not specifically indicate the paramount downstream
anti-apoptotic genes of NF-κB [[264]154,[265]155], BCL3, which hampers
DNA damage-induced apoptosis and serves as a transcriptional
coregulator of NF-κB [[266]78,[267]79], counted for NF-κB -mediated
anti-apoptosis a lot in the pathogenic mechanism. Furthermore,
inhibition of caspase-3 not only abrogated apoptotic killing triggered
by the pathways in response to TNF-α, FasL, and TRAIL stimulation, but
also contributed to the activation of autophagy, leading towards
alleviated apoptosis coming from caspase-3 loss of its function to
cleave Beclin-1.
Asides from that, Fas, indispensable for the FasL-induced pathway that
gives rise to the activation of caspase-3, was downregulated during HBV
infection. Negative-expression of Fas might impede the sustainable
anti-viral response of death ligand stimulus-specific apoptotic
pathway. In contrast, as a target gene relevant to execution and
completion of apoptosis, the overexpressed CDKN1A triggered by
HBx-induced HDAC4 inhibition was concordant with the earlier research
that HBx can relieve a block on CDKN1A expression and prolong G1→S
transition in human hepatoma cells [[268]156,[269]157]. Even though
p21, the protein encoded by CDKN1A, is treated as a cyclin-dependent
kinase inhibitor capable of inhibiting all cyclin/CDK complexes, it
often promotes the assembly of type-D cyclins with CDK4 and CDK6 and
arrests cell cycle progression in the G1 phase [[270]156,[271]158],
which not only protects cells against apoptosis but also enhances the
chance of tumorigenesis.
4.2. Autophagy in HBV Infection
Autophagy is a catabolic mechanism known to engulf long-lived proteins
and damaged organelles via a lysosomal degradative pathway. Previous
studies have suggested that HBV can hijack components in the autophagic
regulatory pathways to promote its survival through augmentation of
autophagosomes formation and inhibition of autolysosome maturation
[[272]159,[273]160], which indicates the possibility of targeting the
autophagic pathway for the treatment of Hepatitis B.
HBx-catalyzed autophagic response, which is crucial for the formation
of autophagosomes as well as intensification of viral replication,
resided mainly in the increased activity of the Beclin-1/Vps34 complex
accompanied by the unchanged mTOR activity and positive regulation of
LC3 during HBV infection, which has also been reported in several
research [[274]159,[275]160]. The phagophores undergo nucleation and
elongation with Beclin-1/Vps34 complex and LC3 lipidation to form
double-membraned autophagosomes [[276]161], and the autophagic
membranes generated during infection are used for viral envelopment
[[277]162]. However, rather than Beclin-1, only PI3KC3 was activated by
the direct interconnection with HBx, which is in agreement with the
study that the need for HBx to also induce the expression of Beclin-1
is redundant [[278]163]. Moreover, although emerging study revealed
that HBx efficiently impedes autophagic degradation by disturbing
lysosomal acidification relevant to its degradative capacity without
influencing on the fusion of autophagosomes and lysosomes [[279]160],
HBx-induced reduction of HDAC6 regulation ability might lead to
autophagosome maturation failure [[280]103]. Despite the necessity of
further validation through biological experiments, this discovery
provides possible mechanistic insight into the overall observation of
HBV infection. Firstly, the intrinsic ability of HDAC6 to deacetylate
NF-κB, having been detected to hamper the invasion of cancer
[[281]164,[282]165], was destructed by HBx through transrepression,
which might gradually create a potent microenvironment favorable to the
malignant transformation of cells. Meanwhile, the negative-regulation
of HDAC6 in human HCCs and consequential loss of its tumor suppression
ability supported by previous research [[283]166] apparently advocate
the hypothesis.
4.3. Inflammation and Innate Immune Response in HBV Infection
Inflammation and innate immune response represent a highly coordinated
process aimed at fighting infection or tissue injury. When cells are
subject to foreign invasion, the inflammatory response can ensure
successful resolution of the condition and the restoration of tissue
homeostasis. However, inappropriately controlled natural defense
mechanisms will eventually lead to the progression of chronic diseases
[[284]167]. In general, innate immunity counts for a great deal in
governing infection right after contact with the pathogen to limit the
spread of the virus [[285]168]. However, the exploitation of immune and
inflammatory signaling pathways enables viruses to subvert antiviral
immunity and replicate in the hostile environment [[286]169].
Of all the proteins involved in the activation and execution of
inflammation response, NF-κB stands out as being crucial for this
process in diverse metazoan organisms. During HBV infection,
NF-κB-dependent transcription accelerated by HBx not only triggered the
anti-apoptosis mechanism in favor of viral persistence as mentioned
earlier but also augmented the inflammatory response leading to
hepatitis and cell transformation in accordance with the indication of
previous study [[287]170]. Concurrently, emerging research has also
uncovered that the dysregulated continual synthesis of IL-6, an
identified target of NF-κB, plays a pathological effect on chronic
inflammation and autoimmunity [[288]167], which obliquely emphasizes
the magnitude of the correlation between abnormally regulated NF-κB and
excessive inflammation. Besides, regarding viral immunoregulation, even
though it is contradictory that HBx on one hand sensitized cells to
inflammatory stimuli, but on the other positive-regulated PD-L1 through
transactivation of c-Jun and activation of STAT3, the findings are
consistent with the idea that production of PD-L1 transcripts pertained
positively to the intensity of liver inflammation to prohibit and
thereby evade the host immune response [[289]171,[290]172].
Moreover, unlike other proteins in their family which typically share
common characteristics and functions, STAT1 and STAT3 poles apart and
the mutual functional antagonism of them in T-cell-induced inflammation
has also been elucidated [[291]173]. Despite both serving as the target
of miR-122, SOCS1 was regulated by miR-122 and thus transactivated by
HBx in the identified mechanism during HBV infection, while SOCS3 did
not. Therefore, we suggest that the abrogation of STATs activity by HBx
to attenuate the interferon-mediated suppression of infection depends
on STAT1 instead of STAT3. Meanwhile, blockades of IL-6- and
IFNs-stimulated immune signaling pathways by viral proteins were
through STAT1 rather than STAT3, which also supports the idea from
another perspective. In addition, activation of STAT3 mediated by HBx
also bolstered its unshakeable status on maintaining viral persistence.
Likewise, inactivation of STAT1 mediated by HBV proteins counteracted
its effects on the provocation of the innate antiviral system. Such
phenomena have also been revealed in the evidence of numerous studies
[[292]174]. It is worth noting that the reported capacity of HBx to
block STAT1 nuclear import by means of affecting its methylation status
was partially through the communication with CBP in the identified
host/pathogen cross-talk. In such a way, that STAT1 lost its ability to
block cell cycle progression and hinder hepatoma cell transformation in
a course of prolonged inflammatory injury. This led to continuous
destruction and regeneration of hepatocytes, increasing the chances of
genetic alterations.
4.4. Discovery of Potential Drug Combination
Based on the significant biomarkers including AKT1, NFKBIA, EIF4EBP1,
HDAC6, STAT1, STAT3, and TP53, the combination of agreeable
multi-target drugs, i.e., Sorafenib, Nutlin-3, and Tenofovir, was
eventually recommended as a potential treatment for later clinical
researches of the multi-drug therapy treating HBV-infected patients.
Meanwhile, to gain deeper insight into the discovery of promising
drugs, in silico profiling using deep learning-based DTI model was
performed to predict interactions (dockings) between available drugs
and the identified biomarkers. Additionally, deciding whether a drug is
ready for clinical trials greatly pertains to preclinical studies that
yield preliminary efficacy, toxicity, safety, etc., information.
Therefore, to further narrow down the scope and improve the reliability
of the predicted drugs, the general pharmacological specifications for
drug design including regulation ability, toxicity, and sensitivity
were additionally adopted to evaluate the efficacy.
However, the entire process of moving a drug to clinical trials is
actually a long way to go. Better preclinical preparation and
assessment can be beneficial to the approval of a new medicine or drug
combination. Considering the common reasons for withdrawal of an
approved drug, e.g., hepatotoxicity and adverse events as well as the
safety and tolerability assessment of a drug such as HLA (human
leukocyte antigen) test, is conducive to better finding an agreeable
option for drug combination.
Repositioning drugs to treat with both common and rare disease is
gradually becoming an attractive proposition. Taking advantage of drug
repositioning avenues provides a faster discovery and translation of
clinically relevant drug combinations. Furthermore, aided with the
expediting development of computational technology and low-cost
sequencing approaches, mounting publicly accessible data will certainly
open up a broad spectrum of drug discovery and repositioning.
Consequently, by systematically integrating more extensive data and
information for both drugs and targets, a more appropriate drug
combination out of the scope of original medical indication can be
unveiled to bring new hope to HBV therapeutics.
5. Conclusions
The advent of the genomic eras has presented researchers with a myriad
of high throughput biological data, which spark the interests in
diverse biology applications. In this study, to investigate the
pathogenetic mechanism under HBV infection, we constructed candidate
host/pathogen interspecies genetic and epigenetic interaction network
(HPI-GEN) by big data mining. Then, with the help of the two-side
RNA-seq data, system identification strategies were applied to trim the
false positives from the candidate HPI-GEN to obtain the real HPI-GEN.
Thereafter, based on the extraction of PNP and the annotation of KEGG
pathways, interspecies cross-talk signaling pathways are investigated
from the real HPI-GEN for pathogenic mechanism under HBV infection to
identify significant biomarkers. Moreover, in order to discover
promising drugs for the identified drug targets, we trained a deep
learning-based DTI model to predict possible drug-target interactions.
Eventually, with the consideration of general drug design
specifications including regulation ability, toxicity, and sensitivity,
a combination of multi-target drugs as a potential multiple-molecule
drug was selected to abate HBV infection. It is worth pointing out that
although only pathogenesis for HBV infection was investigated in this
work, our unique workflow can also be utilized on a wide variety of
disease in view of systems biology, holding utility to aid in
diversified drug discovery and repurposing process. Meanwhile, along
with the development of NGS (Next-Generation Sequencing) technology,
additionally integrating analysis from available genomics data into our
pipeline can reinforce better downstream analysis and perform more
exact biological interpretation. As more extensive genomic and
pharmacological data being considered, the proposed pipeline could help
reveal a more exhaustive host/pathogen offensive and defensive
mechanism under HBV infection and facilitate the development of optimal
therapy ensuring greater therapeutic outcome.
Appendix A
In order to avoid overfitting in the parameter estimation process, the
cubic spline interpolation method was employed to interpolate 4 time
points into 1440 points. Even if the cubic interpolation couldn’t
increase the information for the parameter estimation, it could indeed
prevent the parameters estimation in Equations (13) and (14) and
(27)–(30) from overfitting during the identification process.
Author Contributions
Conceptualization, S.C. and B.-S.C.; Methodology, S.C. and B.-S.C.;
Software, S.C. and B.-S.C.; Validation S.C. and B.-S.C.; Formal
Analysis, S.C. and B.-S.C.; Investigation, S.C. and B.-S.C.; Data
Curation, S.C.; Writing-original Draft Preparation, S.C.;
Writing-Review and Editing, S.C., L.H.-C.W., and B.-S.C.;
Visualization, S.C.; Supervision, B.-S.C.; Funding Acquisition, B.-S.C.
All authors have read and agreed to the published version of the
manuscript.
Funding
This research was funded by Ministry of Science and Technology grant
number MOST 107-2221-E-007-112-MY3.
Conflicts of Interest
The authors declare no conflict of interest.
References