Abstract
   Hepatitis B Virus (HBV) infection is a major cause of morbidity and
   mortality worldwide. However, poor understanding of its pathogenesis
   often gives rise to intractable immune escape and prognosis recurrence.
   Thus, a valid systematic approach based on big data mining and
   genome-wide RNA-seq data is imperative to further investigate the
   pathogenetic mechanism and identify biomarkers for drug design. In this
   study, systems biology method was applied to trim false positives from
   the host/pathogen genetic and epigenetic interaction network (HPI-GEN)
   under HBV infection by two-side RNA-seq data. Then, via the principal
   network projection (PNP) approach and the annotation of KEGG (Kyoto
   Encyclopedia of Genes and Genomes) pathways, significant biomarkers
   related to cellular dysfunctions were identified from the core
   cross-talk signaling pathways as drug targets. Further, based on the
   pre-trained deep learning-based drug-target interaction (DTI) model and
   the validated pharmacological properties from databases, i.e., drug
   regulation ability, toxicity, and sensitivity, a combination of
   promising multi-target drugs was designed as a multiple-molecule drug
   to create more possibility for the treatment of HBV infection.
   Therefore, with the proposed systems medicine discovery and
   repositioning procedure, we not only shed light on the etiologic
   mechanism during HBV infection but also efficiently provided a
   potential drug combination for therapeutic treatment of Hepatitis B.
   Keywords: hepatitis B virus infection, pathogenesis, host/pathogen
   interspecies genetic and epigenetic network (HPI-GEN), systems medicine
   discovery, drug-target interaction (DTI) model, deep learning,
   multiple-molecule drug
1. Introduction
   Hepatitis B Virus (HBV) infection remains a prevalent health challenge
   worldwide, and is greatly associated with substantial morbidity and
   mortality due to serious liver diseases, such as hepatitis, fibrosis,
   cirrhosis, and even hepatocellular carcinoma [[28]1]. According to the
   latest report of World Health Organization (WHO) [[29]2], HBV affected
   an estimated 257 million people and causing 600,000 deaths per year.
   While current therapies seek to control the progression of the disease,
   life-long treatment and surveillance are still needed because
   resistance develops during treatment and reactivation often occurs
   after medication discontinuation [[30]3,[31]4]. In recent years,
   dedication to new drug design and immunotherapy development has been
   made by researchers to hopefully silence HBV in infected hepatocytes
   [[32]5,[33]6]. However, the host/pathogen cross-talk mechanism
   contributing to the development and persistence of HBV infection
   requires further investigation, and its current therapies are still
   inadequate. Thus, new treatment options are needed to achieve a better
   therapy for HBV-infected patients.
   Even so, de novo drug discovery is particularly an expensive,
   time-consuming, and high-risk process, let alone exploring a drug to
   eradicate HBV from patients, which is generally believed arduous
   [[34]4,[35]7,[36]8]. Conventionally, the pipelines of drug development
   from laboratory to market take nearly average 13 to 15 years and cost
   approximately two to three billion U.S. dollars, in which about 10% of
   all drugs that start from pre-clinical trials ever make it to actual
   human testing [[37]9,[38]10]. Fortunately, as the advancement of
   biological science drives the progress of high-throughput functional
   genomics and proteomics technology, novel experimental and
   bioinformatics methods have not only accelerated the deciphering of
   protein interactions, but also given rise to the construction of
   networks for signaling pathways and protein complex identification in
   specific diseases [[39]11]. Meanwhile, based on the manifold data types
   from wet-lab experiments, several computational approaches have been
   suggested to predict biological networks for expediting the modeling of
   functional pathways so as to exemplify the molecular mechanisms of
   cellular processes. Researchers are thereby being empowered to
   profoundly explore the complicated and mysterious mechanisms of
   biological systems for better investigating the potential drug targets
   [[40]12]. On the other hand, with the development of molecular biology,
   drug repurposing, also known as drug repositioning, was proposed to
   reduce the time spent and the capital expenditure. The process of drug
   repositioning is to find new use outside the scope of the original
   medical indication for existing drugs [[41]13]. Its prospect for new
   clinical indications is promising, since compared with other drug
   development strategies, a repositioned drug could be instantly put into
   preclinical testing and clinical trials for its safety and reliability
   proven in humans.
   Nonetheless, the commonly used correlation networks in bioinformatics
   applications are often based on oversimplified algorithms. As inherent
   difficulties appear in assessing the impact of microRNA (miRNA), long
   non-coding RNA (lncRNA), epigenetic modification, microenvironment
   factors, etc., hardly can these networks mirror the complicated
   mechanisms of factual biological systems. In addition, while wide doses
   of the repositioned drugs can be tested using in vitro and in vivo
   experiments, it takes great time and effort. Therefore, to discover
   potential drugs for the treatment of HBV-infected patients, the
   development of a systematic procedure in computational framework based
   on potent models and existing data is indispensable.
   In this study, we employed the systems biology technique to identify
   the significant biomarkers from the host/pathogen interspecies genetic
   and epigenetic interaction network (HPI-GEN) through two-side RNA-seq
   data during HBV infection. Subsequently, with the analysis of these
   biomarkers by the proposed drug discovery and repositioning procedure,
   potential combinations of multi-targets drugs were recommended to the
   clinical trials of Hepatitis B therapy. By and large, the process can
   be subdivided into a few steps: (1) Construct candidate HPI-GEN
   composed of host/pathogen interspecies protein-protein interaction
   network (HPI-PPIN) and host/pathogen interspecies gene regulation
   network (HPI-GRN) by big data mining. (2) Prune the false positives in
   candidate HPI-GEN by system identification analysis with the assistance
   of two-side profile RNA-seq data and Akaike Information Criterion (AIC)
   to identify the real HPI-GEN between host and pathogen. (3) Extract the
   core HPI-GEN from real HPI-GEN via principal network projection (PNP)
   approach. (4) Investigate the core signaling pathways related to
   manifold cellular function (especially host inflammatory and immune
   response, pathological anti-apoptosis, and cell survival) in the core
   HPI-GEN by the annotation of KEGG (Kyoto Encyclopedia of Genes and
   Genomes) pathways to gain insights into the etiologic mechanism of HBV
   infection. (5) Select the significant biomarkers related to
   pathological cellular dysfunctions. (6) Predict the candidate drugs
   from available drugs for the selected significant biomarkers by the
   pre-trained deep learning-based drug-target interaction (DTI) model.
   (7) Consider general criteria for drug design specifications including
   regulation ability, toxicity, and sensitivity to further sift out the
   promising drugs. (8) Combine these agreeable drugs as a potential
   multiple-molecule drug for possible therapeutic treatment of Hepatitis
   B. It is worthwhile to note that instead of a single drug, a
   combination of promising drugs was proposed for the treatment to slow
   or even prevent inherent or adaptive drug resistance through the
   simultaneous blockade of multiple targets or pathways [[42]14].
   Nowadays, researches have been dedicated to drug discovery and design,
   while few of them propose an integral end-to-end systematic procedure
   from identifying significant biomarkers based on pathogenic mechanism
   to recommending promising drugs for one disease. Here, we dig into the
   pathogenic characters of viral proteins played on the pathogenesis to
   select the potential drug targets from two-side RNA-seq data and
   reinterpret the role of available drugs based on a pre-trained deep
   learning-based DTI model through big drug-target interaction data,
   which gives an alternative way for systems drug discovery and
   repositioning to overcome HBV infection.
2. Materials and Methods
2.1. The Construction of the HPI-GEN in HepaRG Cell Line during HBV Infection
and the Application of Deep Learning-Based DTI Model for New Drug Design: An
Overview
   To gain new insight into the molecular mechanism of HBV infection,
   candidate cross-talk HPI-GEN, which consists of candidate intraspecies,
   candidate interspecies PPIN between host and pathogen, and candidate
   HPI-GRN encompassing regulatory networks between host gene, pathogen
   gene, host miRNA and host lncRNA, was constructed by big data mining
   and preprocessing of host/pathogen gene/miRNA/lncRNA expression from
   online database.
   Given that the information for interspecies PPIN construction is
   insufficient, in addition to interactions attained from bio-databases,
   data amplification technique through nature-linguistic programming
   (NLP) and web scrapping was introduced to expand the dataset. Then, for
   identifying real HPI-GEN, a systematic identification approach was
   exploited based on the genome-wide RNA-seq data of HepaRG cells and
   Hepatitis B Virus under HBV infection. In addition, to deal with the
   highly complicate real HPI-GEN, we employed PNP method to extract core
   network structure consisted of the top 3000 nodes with highest energy.
   Such effective approach benefits us to investigate the core signaling
   pathways for establishing the crucial molecular pathogenic mechanism to
   identify significant biomarkers that contribute to the progression of
   HBV infection. Finally, following the drug discovery procedure, we
   built the deep learning-based DTI model to identify the potential
   multi-target drugs to target their significant biomarkers for
   therapeutic treatment of HBV infection.
2.2. Big Data Mining and Data Preprocessing of RNA-seq Data for Human and
Pathogen
   In this study, two-side sequencing-based dataset was obtained from the
   Gene Expression Omnibus (GEO) at the National Center for Biotechnology
   Information (NCBI). We obtained the data recorded by Mueller et al.
   [[43]15], containing human and virus genomes with the time-course
   through GEO series with accession number [44]GSE101575 [[45]15]. The
   raw data of the dataset, which contains 27,267 human genes and 4 HBV
   genes, involves the gene expression profiles of human HepaRG cells from
   three patients during HBV infection at 6, 9, 12, and 24 h post
   reactivation with drug (RG7834) treatment and without. Only the data
   without medication administration was deployed to the ensuing system
   identification process.
   To unravel the cellular mechanism during HBV infection, core signaling
   pathways identified from HPI-GEN are requisite. The candidate HPI-GEN,
   involving many experimental data and computational predictions from
   numerous databases, consists of multiple networks. Among them, the
   candidate human PPIs were obtained from the Biological General
   Repository for Interaction Datasets database (BioGRID) [[46]16], DIP
   [[47]17], BIND [[48]18], IntAct [[49]19], and MINT [[50]20]; the
   candidate human miRNA and lncRNA regulations were individually obtained
   from Target Scan Human database [[51]21], and CircuitDB [[52]22]; the
   candidate regulations between human transcription factors and their
   targets were according to the Integrated Transcription Factor Platform
   database (ITFP) [[53]23], the Human Transcriptional Regulation
   Interactions database (HTRIdb) [[54]24], and the Transcription Factor
   database (TRANSFAC) [[55]25]; the candidate host-pathogen interspecies
   PPIs were obtained from VirusMentha [[56]26], IMEx [[57]27], PSICQUIC
   [[58]28], the PPIN constructed by Zhong et al. [[59]29], and based on
   the automatic pipeline method of NLP; the candidate host-pathogen
   interspecies regulations were collected from VIRmiRNA [[60]30], ViRBase
   [[61]31], miRecords [[62]32], starBase v2.0 [[63]33], miRTarBase
   [[64]34], and based on the automatic pipeline method of NLP.
2.3. Text Mining of Human and Pathogen Protein Interactions
   To reinforce the comprehension of the interaction relationships between
   host and pathogen, BioBERT (Bidirectional Encoder Representations from
   Transformer for Biomedical Text Mining) [[65]35,[66]36], an up-to-date
   BioNER (Biomedical Named Entity Recognition) approach is employed on
   the PubMed database to analyze binary interactions between host and
   pathogen. BioNER is a computerized procedure of identifying biomedical
   named entities such as genes, protein complexes, diseases, and
   chemicals in texts referring to information retrieval and data
   extraction. Amongst all the BioNER application, BioBERT, which has
   achieved state-of-the-art performance in terms of precision, is
   efficaciously utilized in biomedical text mining applications to
   address data.
   In brief, we scraped and collected 26,732 existing PubMed articles
   associated with Hepatitis B Virus and Homo sapiens within 20 years and
   transformed it into XML format. Nonetheless, on account of Pubmed’s
   property, the full texts from most articles in PMC aren’t publicly
   accessible so that looking forward to automatically download integrated
   content is unrealistic. Consequently, we additionally crawled 2831
   articles from PLOS ONE to strengthen our network complexity. Then,
   BioBERT model was exerted to discern known entities and identify the
   types of overlapping entities among each sentence separated from the
   relevant content with periods in the corpus.
   In order to facilitate analysis and reconfirmation, we also gathered
   the synonym names of both host and pathogen proteins (compiled from
   BioThesaurus database [[67]37]) to prevent omission of vital
   interactions owing to protein names mismatch. Afterwards, in each
   sentence, we extracted all present proteins and identified the
   correlations by determining the human proteins under the circumstance
   that Hepatitis B Virus proteins had existed. To abate computational
   complexity, in conformity with the subsequent system identification
   processing requirement, the verb recognition and identification were
   left out of account and binary interactions were built. For instance,
   providing that “HBx” and “BCL2” were simultaneously identified in the
   same sentence, we then set the corresponding value to 1 and so forth,
   vice versa to 0.
   Furthermore, to verify that authentic interaction relationships derived
   from the statistical analysis of sentences, we scrutinized entire
   extracted entities and completed the construction of ultimate host and
   pathogen interaction-relationships matrix.
2.4. Dynamic Models of Candidate HPI-GEN for Human Cells and HBV During the
Infection
   Candidate HPI-GEN, constructed via big data mining techniques, is the
   epitome of an intricate network comprised of protein interactions and
   gene regulations between host and pathogen during HBV infection. As
   inevitable false-positive information extraction might incur system
   distraction from genuine consequence and discrepancy during the pathway
   analysis process, dynamic models were built to carry out systematic
   identification approach by two-side time-profile HBV infection RNA-seq
   data to prune false-positives in candidate HPI-GEN.
   Protein abundances mirror a dynamic balance among a series of linked
   processes spanning the transcription, processing, and degradation of
   mRNAs to the translation, modification, and breakdown of the proteins
   themselves necessary for cellular protein production and maintenance
   [[68]38]. As a consequence, for the protein interactive model of PPI
   pre-network in candidate HPI-GEN, the dynamic interaction model of the
   ith host protein can be described by the following equation:
   [MATH: piH(t+1)
   =piH
   (t)+∑n=1NiainH<
   /msubsup>piH(
   mo>t)pnH
   msubsup>(t)
   +∑v=1VibivH<
   /msubsup>piH(
   mo>t)pvP
   msubsup>(t)+αiHgiH(
   t)−γiH
   piH<
   mo>(t)+βi<
   /mi>H+viH(t),
    for i=
   1,2,…,I<
   mo>, αiH≥0 and −γiH≤0 :MATH]
   (1)
   where
   [MATH:
   piH
   (t), pnH(t), giH
   mi>(t)
   :MATH]
   and
   [MATH:
   pvP
   (t) :MATH]
   indicate the expression levels of the ith host protein, the nth host
   protein, the ith host gene, and the vth pathogen protein at time t,
   respectively;
   [MATH:
   ainH
   mi> :MATH]
   and
   [MATH:
   bivH
   mi> :MATH]
   denote the interactive ability between the ith host protein and nth
   host protein and between the ith host protein and vth pathogen protein,
   respectively;
   [MATH:
   Ni 
   and Vi
    :MATH]
   signify the number of host proteins and pathogen proteins interacting
   with the ith host protein; and
   [MATH: I :MATH]
   typifies the total number of the human proteins in candidate PPIN.
   [MATH:
   αiH,
    −γiH :MATH]
   and
   [MATH:
   βiH :MATH]
   indicate the translation rate from the corresponding mRNA, the
   degradation rate, and the basal expression level of the ith host
   protein, respectively; the basal level denotes the interactions with
   unknown factors, such as acetylation, methylation, and ubiquitination;
   and
   [MATH:
   viH
   (t) :MATH]
   represents the stochastic noise of the ith host protein at time t. Note
   that the intraspecies and interspecies biological mechanism of human
   proteins in candidate PPIN are represented in Equation (1) as the form
   of
   [MATH: ∑n=1NiainH<
   /msubsup>piH(
   mo>t)pnH
   msubsup>(t)  :MATH]
   and
   [MATH: ∑v=1VibivH<
   /msubsup>piH(
   mo>t)pvP
   msubsup>(t) :MATH]
   , respectively.
   Now that the interactions between the host and pathogen proteins are
   mutual and their biological characteristics involved in PPIs are
   similar, the dynamic PPIN of the
   [MATH: qth :MATH]
   pathogen protein in candidate HPI-GEN can be described by the analogous
   equation as follows:
   [MATH: pqP(t+1)
   =pqP
   (t)+∑n=1NqaqnP<
   /msubsup>pqP(
   mo>t)pnH
   msubsup>(t)
   +∑v=1VqbqvP<
   /msubsup>pqP(
   mo>t)pvP
   msubsup>(t)+αqPgqP(
   t)−γqP
   pqP<
   mo>(t)+βq<
   /mi>P+vqP(t),
    for q=1,2
   mn>,…,Q, αqP≥0 and −γqP≤0 :MATH]
   (2)
   where
   [MATH:
   pqP
   (t), pnH(t), gqP
   mi>(t)
   :MATH]
   and
   [MATH:
   pvP
   (t) :MATH]
   indicate the expression levels of the
   [MATH: qth :MATH]
   pathogen protein, the nth host protein, the
   [MATH: qth :MATH]
   pathogen gene, and the vth pathogen protein at time t, respectively;
   [MATH:
   aqnP
   mi> :MATH]
   and
   [MATH:
   bqvP
   mi> :MATH]
   denote the interactive ability between the
   [MATH: qth :MATH]
   pathogen protein and nth host protein and between the
   [MATH: qth :MATH]
   pathogen protein and vth pathogen protein, respectively;
   [MATH: Nq
   :MATH]
   and
   [MATH: Vq
   :MATH]
   signify the number of host proteins and pathogen proteins interacting
   with the
   [MATH: qth :MATH]
   pathogen protein; and
   [MATH: Q :MATH]
   typifies the total number of the pathogen proteins in candidate PPIN.
   [MATH:
   αqP,
    −γqP :MATH]
   and
   [MATH:
   βqP :MATH]
   indicate the translation rate from the corresponding mRNA, the
   degradation rate, and the basal expression level of the
   [MATH: qth :MATH]
   pathogen protein, respectively; the basal level
   [MATH:
   βqP :MATH]
   denotes the interactions with unknown factors such as acetylation and
   ubiquitination; and
   [MATH:
   vqP
   (t) :MATH]
   represents the stochastic noise of the
   [MATH: qth :MATH]
   pathogen protein at time t. Similarly, the intraspecies and
   interspecies biological mechanism of pathogen proteins in candidate
   PPIN are represented in Equation (2) as the form of
   [MATH: ∑n=1NqaqnP<
   /msubsup>pqP(
   mo>t)pnP
   msubsup>(t)  :MATH]
   and
   [MATH: ∑v=1VqbqvP<
   /msubsup>pqP(
   mo>t)pvH
   msubsup>(t) :MATH]
   , respectively.
   For the GRN of host genes in the candidate HPI-GEN, the dynamic
   regulatory model of the jth host gene can be described as follows:
   [MATH: gjH(t+1)
   =gjH
   (t)+∑τ=1TjcjτH<
   /msubsup>pτH(
   mo>t)+∑ℓ=1LjdjℓH<
   /msubsup>lℓH(
   mo>t)−∑μ=1MjejμH<
   /msubsup>gjH(
   mo>t)mμH
   msubsup>(t)−λjHgjH(
   t)+δjH
   +ϖjH(t), <
   /mtext>for j=1,2,…,J,
    −ejμ
   H≤0 and−λjH≤0 :MATH]
   (3)
   where
   [MATH:
   gjH
   (t), pτH(t), mμH
   mi>(t)
   :MATH]
   and
   [MATH:
   lℓH
   (t) :MATH]
   indicate the expression levels of the jth host gene, the
   [MATH: τth :MATH]
   host TF, the
   [MATH: μth :MATH]
   host microRNA, and the
   [MATH: ℓth :MATH]
   host lncRNA at time t, respectively;
   [MATH:
   cjτH
   mi>, −ejμH :MATH]
   and
   [MATH:
   djℓH
   mi> :MATH]
   represent the regulation ability of the
   [MATH: τth :MATH]
   host TF, the
   [MATH: μth :MATH]
   host microRNA, and the
   [MATH: ℓth :MATH]
   host lncRNA on the jth host gene, respectively;
   [MATH:
   Tj, 
   Mj :MATH]
   and
   [MATH: Lj
   :MATH]
   denote the number of host TFs, host miRNAs, and host lncRNAs,
   respectively, which regulate the expression level of the jth host gene;
   [MATH: J :MATH]
   indicates the total amount of genes in GRN;
   [MATH:
   −λjH
    :MATH]
   and
   [MATH:
   δjH :MATH]
   indicate the degradation rate and the expression basal level of the jth
   host gene, respectively; and
   [MATH:
   ϖjH
   (t) :MATH]
   denotes the stochastic noise due to modeling residue at time t. The
   basal level
   [MATH:
   δjH :MATH]
   denotes the unknown regulations such as methylation. Note that since
   microRNAs silence gene expression by binding to target mRNAs [[69]39],
   repression ability
   [MATH:
   −ejμH :MATH]
   in candidate GRN model should be constrained negative. Likewise,
   [MATH:
   −λjH
    :MATH]
   degradation rate should be restricted non-positive due to the
   depression regulation of mRNA. Furthermore, though supporting evidence
   comes from numerous research showing that HBV proteins play a vital
   role in the regulation of viral and host gene expression, different
   from other viruses such as HIV, whose viral protein directly binds to
   nucleic acids [[70]40], HBV proteins are thought to affect host gene
   expression through transregulation, e.g., transactivation and
   transrepression [[71]41,[72]42,[73]43,[74]44]. Thus, it seems tenable
   to exclude the regulations between pathogen proteins and host gene in
   the Equation (3).
   In the same way, for the GRN of pathogen genes in the candidate
   HPI-GEN, the dynamic regulatory model of the
   [MATH: uth :MATH]
   pathogen gene can be described as follows:
   [MATH: guP(t+1)
   =guP
   (t)+∑τ=1TucuτP<
   /msubsup>pτH(
   mo>t)+∑ℓ=1LuduℓP<
   /msubsup>lℓH(
   mo>t)−∑μ=1MueuμP<
   /msubsup>guP(
   mo>t)mμH
   msubsup>(t)−λuPguP(
   t)+δuP
   +ϖuP(t), <
   /mtext>for u=1,2,…,U,
    −euμ
   P≤0 and−λuP≤0 :MATH]
   (4)
   where
   [MATH:
   guP
   (t), pτH(t), mμH
   mi>(t)
   :MATH]
   and
   [MATH:
   lℓH
   (t) :MATH]
   indicate the expression levels of the
   [MATH: uth :MATH]
   pathogen gene, the
   [MATH: τth :MATH]
   host TF, the
   [MATH: μth :MATH]
   host microRNA, and the
   [MATH: ℓth :MATH]
   host lncRNA at time t, respectively;
   [MATH:
   cuτP
   mi>, −euμP :MATH]
   and
   [MATH:
   duℓP
   mi> :MATH]
   represent the regulation ability of the
   [MATH: τth :MATH]
   host TF, the
   [MATH: μth :MATH]
   host microRNA, and the
   [MATH: ℓth :MATH]
   host lncRNA on the
   [MATH: uth :MATH]
   pathogen, respectively;
   [MATH:
   Tu, 
   Mu :MATH]
   and
   [MATH: Lu
   :MATH]
   denote the number of host TFs, host miRNAs, and host lncRNAs,
   respectively, which regulate the expression level of the
   [MATH: uth :MATH]
   pathogen;
   [MATH: U :MATH]
   indicates the total amount of pathogen genes in GRN;
   [MATH:
   −λuP
    :MATH]
   and
   [MATH:
   δuP :MATH]
   indicate the degradation rate and the expression basal level of the
   [MATH: uth :MATH]
   pathogen, respectively; and
   [MATH:
   ϖuP
   (t) :MATH]
   denotes the stochastic noise due to modeling residue at time t. It is
   worth noting that there is a consensus on the opinion about the
   indispensability of host protein synthesis machinery for pathogen
   protein production and gene transcription [[75]45,[76]46]. Walsh et al.
   [[77]47] even suggested that viruses are fully reliant on the
   translation machinery of their host cells to produce the polypeptides
   that are essential for viral replication. As a result, only gene
   regulation conducted by host transcription factors
   [MATH:
   pτH
   (t) :MATH]
   instead of pathogen proteins
   [MATH:
   pP<
   mo>(t) :MATH]
   is included in Equation (4).
   Since the expression of the
   [MATH: μth :MATH]
   host miRNA
   [MATH:
   mμH
   (t) :MATH]
   and the
   [MATH: ℓ :MATH]
   th host
   [MATH:
   lℓH
   (t) :MATH]
   in Equation (3) at time t might also be regulated by other regulators,
   the
   [MATH: μth :MATH]
   miRNA can be described by the following stochastic dynamic regulatory
   equation:
   [MATH: mμH(t+1)
   =mμH
   (t)+∑τ=1TμfμτH<
   /msubsup>pτH(
   mo>t)+∑ℓ=1LμhμℓH<
   /msubsup>lℓH(
   mo>t)−∑r=1MμkμrH<
   /msubsup>mμH(
   mo>t)mrH
   msubsup>(t)−φμHmμH(
   t)+ημH
   +εμH(t), <
   /mtext>for μ=1,2,…,M,
    −kμr
   H≤0 and−φμH≤0 :MATH]
   (5)
   where
   [MATH:
   mμH
   (t), pτH(t), mrH
   mi>(t)
   :MATH]
   and
   [MATH:
   lℓH
   (t) :MATH]
   indicate the expression levels of the
   [MATH: μth :MATH]
   host gene, the
   [MATH: τth :MATH]
   host TF, the
   [MATH: rth :MATH]
   host miRNA and the
   [MATH: ℓth :MATH]
   host lncRNA at time t, respectively;
   [MATH:
   fμτH
   mi>, −kμrH :MATH]
   and
   [MATH:
   hμℓH
   mi> :MATH]
   represent the regulation ability of the
   [MATH: τth :MATH]
   host TF, the
   [MATH: rth :MATH]
   host microRNA and the
   [MATH: ℓth :MATH]
   host lncRNA on the
   [MATH: μth :MATH]
   host miRNA, respectively;
   [MATH:
   Tμ, 
   Mμ :MATH]
   and
   [MATH: Lμ
   :MATH]
   denote the number of host TFs, host miRNAs and host lncRNAs,
   respectively, which regulate the expression level of the
   [MATH: μth :MATH]
   host miRNA;
   [MATH: M :MATH]
   indicates the total number of miRNAs in GRN;
   [MATH:
   −φμH
    :MATH]
   and
   [MATH:
   ημH :MATH]
   indicate the degradation rate and the expression basal level of the
   [MATH: μth :MATH]
   host miRNA, respectively; and
   [MATH:
   εμH
   (t) :MATH]
   denotes the stochastic noise due to modeling residue at time t.
   Along the same line, the dynamic regulatory model of the host lncRNA in
   candidate GEIN can be described by the following dynamic equation:
   [MATH: lℓH(t+1)
   =lℓH
   (t)+∑τ=1TℓxℓτH<
   /msubsup>pτH(
   mo>t)+∑ℓ=1OℓyℓoH<
   /msubsup>loH(
   mo>t)−∑r=1MℓzℓμH<
   /msubsup>lℓH(
   mo>t)mμH
   msubsup>(t)−σℓHlℓH(
   t)+ϕℓH
   +ψℓH(t), <
   /mtext>for ℓ=1,2,…,L,
    −zℓμ
   H≤0 and−σℓH≤0 :MATH]
   (6)
   where
   [MATH:
   lℓH
   (t), pτH(t), mμH
   mi>(t)
   :MATH]
   and
   [MATH:
   loH
   (t) :MATH]
   indicate the expression levels of the
   [MATH: ℓth :MATH]
   host lncRNA, the
   [MATH: τth :MATH]
   host TF, the
   [MATH: μth :MATH]
   host microRNA and the
   [MATH: ℓth :MATH]
   host lncRNA at time t, respectively;
   [MATH:
   xℓτH
   mi>, −zℓμH :MATH]
   and
   [MATH:
   yℓoH
   mi> :MATH]
   represent the regulation ability of the
   [MATH: τth :MATH]
   host TF, the
   [MATH: μth :MATH]
   host microRNA and the
   [MATH: oth :MATH]
   host lncRNA on the
   [MATH: ℓth :MATH]
   host lncRNA, respectively;
   [MATH:
   Tℓ, 
   Mℓ :MATH]
   and
   [MATH: Lℓ
   :MATH]
   denote the number of host TFs, host miRNAs and host lncRNAs,
   respectively, which regulate the expression level of the
   [MATH: ℓth :MATH]
   host lncRNA;
   [MATH: L :MATH]
   indicates the total number of lncRNA in GRN;
   [MATH:
   −σℓH
    :MATH]
   and
   [MATH:
   ϕℓH :MATH]
   indicate the degradation rate and the expression basal level of the
   [MATH: ℓth :MATH]
   host lncRNA, respectively; and
   [MATH:
   ψℓH
   (t) :MATH]
   denotes the stochastic noise due to modeling residue at time t.
2.5. Parameter Estimation of the Dynamic Models of Candidate HPI-GEN by
System Identification Approach
   In order to identify real HPI-GEN, the accurate model parameters
   training process by executing a system identification approach to the
   candidate HPI-GEN after the construction of dynamic model Equations
   (1)–(6) is requisite. Consequently, the stochastic Equations (1) and
   (2), which depict the relation of protein interactions of the ith host
   and qth pathogen protein, can be converted into the linear regression
   forms as below, respectively:
   The ith host protein:
   [MATH: piH(t+1)=[piHp1H(t)⋯piHpNi
   mrow>H(t)piHp1P(t) ⋯piHpViP(t)giH(t)piH(t)1][a
   i1H<
   mtr>⋮ai
   NiH
   bi1H⋮
   biViHαi<
   /mi>H1−γiHβiH]<
   /mo>+viH<
   mo stretchy="false">(t)   <
   /mo>   ≜ϕiHP(t
   )θiHP+viH(t), for i=1<
   /mn>,2,…,I<
   /mtr> :MATH]
   (7)
   The qth host protein:
   [MATH: pqP(t+1)
   =[pqPp1H(t)⋯pqPpNq
   mrow>H(t)pqPp1P(t) ⋯pqPpVqP(t)gqP(t)pqP(t)1][a
   q1P<
   mtr>⋮aq
   NqP
   bq1H⋮
   bqVqPαq<
   /mi>P1−γqPβqP]<
   /mo>+vqP<
   mo stretchy="false">(t)   <
   /mo>  ≜ϕqPP(t)
   θqPP
   msubsup>+vqP(t), 
   for q=1,<
   /mo>2,…,Q :MATH]
   (8)
   where
   [MATH:
   ϕiHP(t)
   :MATH]
   and
   [MATH:
   ϕqPP(t)
   :MATH]
   represent the regression vectors that can be obtained from the
   expression data and
   [MATH:
   θiHP :MATH]
   and
   [MATH:
   θqPP :MATH]
   are the unknown parameter vectors to be estimated for the
   [MATH: ith :MATH]
   host protein,
   [MATH: qth :MATH]
   pathogen protein in host PPIN, respectively.
   Thence, the Equations (7) and (8) of the ith host and qth pathogen
   protein can further be augmented for
   [MATH:
   Yi  :MATH]
   and
   [MATH:
   Yq  :MATH]
   time points as the following forms, respectively:
   The ith host protein:
   [MATH:
   [
   piH(t
   mi>2)piH(t3)⋮<
   mi>piH(
   tYi+1)
   ]=[ϕiHP<
   mrow>(t1)
   ϕi
   HP(
   t2)<
   mtr>⋮ϕi
   HP(<
   msub>tYi)]θiHP+[viH(t<
   /mi>1)viH(
   mo>t2)⋮
   viH(tYi
   )],<
   /mo> for i
   =1,2,…,<
   mi>I :MATH]
   (9)
   the Equation (9) can also be simply represented as
   [MATH:
   PiH=
   ΦiHPθiHP+ΩiHP,for i=
   1,2,…,I<
   /mrow> :MATH]
   (10)
   The qth host protein:
   [MATH:
   [
   pqP(t
   mi>2)pqP(t3)⋮<
   mi>pqP(
   tYq+1)
   ]=[ϕqPP<
   mrow>(t1)
   ϕq
   PP(
   t2)<
   mtr>⋮ϕq
   PP(<
   msub>tYq)]θqPP+[vqP(t<
   /mi>1)vqP(
   mo>t2)⋮
   vqP(tYq
   )],<
   /mo> for q
   =1,2,…,<
   mi>Q :MATH]
   (11)
   the Equation (11) can also be simply represented as
   [MATH:
   PqP=
   ΦqPPθqPP+ΩqPP,for q=
   1,2,…,Q<
   /mrow> :MATH]
   (12)
   After a succession of equation transformation procedure in advance, the
   parameters in vectors
   [MATH:
   θiHP :MATH]
   and
   [MATH:
   θqPP :MATH]
   of Equations (10) and (12) can be estimated by individually employing
   the following constrained least-squares estimation problems,
   [MATH:
   θiHP :MATH]
   parameters estimation:
   [MATH: θ^iHP=minθiHP‖ΦiHPθiHP−PiH‖22, subject to AiHPθ^iHP≤biHPwher
   e AiH
   mi>P=[<
   mtr>0⋯00⋯0<
   mo>−1000⋯0
   0⋯0010]∈ℝ2
   ×(I+Q+3) and
    biHP=[01] :MATH]
   (13)
   [MATH:
   θqPP :MATH]
   parameters estimation:
   [MATH: θ^qPP=minθqPP‖ΦqPPθqPP−PqP‖22, subject to AqPPθ^qPP≤bqPPwher
   e AqP
   mi>P=[<
   mtr>0⋯00⋯0<
   mo>−1000⋯0
   0⋯0010]∈ℝ2
   ×(I+Q+3) and
    bqPP=[01] :MATH]
   (14)
   Subsequently, we can acquire the interaction parameters in PPIN
   Equations (1) and (2) individually by resolving the least-squares
   problems in (13) and (14) with the help of the function lsqlin in
   MATLAB optimization toolbox and simultaneously ensure the protein
   translation rate
   [MATH:
   αiH :MATH]
   and
   [MATH:
   αqP :MATH]
   to be a non-negative value and the protein degradation rate
   [MATH:
   −γiH
    :MATH]
   and
   [MATH:
   −γqP
    :MATH]
   to be a non-positive value; that is to say
   [MATH:
   αiH,
   αqP≥0
   mrow> :MATH]
   and
   [MATH:
   −γiH
   ,−γqP≤0 :MATH]
   .
   In the same manner, the dynamic Equations (3)–(6) which depict the
   relationship of gene regulations for the jth host gene, uth pathogen
   gene,
   [MATH: μth :MATH]
   host miRNA, and
   [MATH: ℓth :MATH]
   host lncRNA can be rewritten into the linear regression forms below,
   respectively:
   The jth host gene:
   [MATH: gjH(t+1)
   =[p1H(t)⋯pT
   mi>jH(t)gjH(t)m1H(t)⋯gjH(t)mMj
   mi>H(t) l1H(t)⋯lL
   mi>jH(t) gjH(t) 1][c
   j1H<
   mtr>⋮cj
   TjH
   −ej1
   H⋮
   −ejM
   jHdj1H⋮djLjH1−λj
   mi>Hδj
   mi>H]+
   ϖjH(t)   <
   /mo>  ≜ϕjHG(t)
   θjHG
   msubsup>+ϖjH(t), 
   for j=1,<
   /mo>2,…,J :MATH]
   (15)
   The uth pathogen gene:
   [MATH: guP(t+1)
   =[p1H(t)⋯pT
   mi>uH(t)guP(t)m1H(t)⋯guP(t)mMu
   mi>H(t) l1H(t)⋯lL
   mi>uH(t) guP(t) 1][c
   u1P<
   mtr>⋮cu
   TuP
   −eu1
   P⋮
   −euM
   uPdu1P⋮duLuP1−λu
   mi>Pδu
   mi>P]+
   ϖuP(t)   <
   /mo>  ≜ϕuPG(t)
   θuPG
   msubsup>+ϖuP(t), 
   for u=1,2
   mn>,…,U
   :MATH]
   (16)
   The
   [MATH: μth :MATH]
   host miRNA:
   [MATH: mμH(t+1)
   =[p1H(t)⋯pT
   mi>μHmμ
   H(t)m1H(t)⋯mμH(t)mRμ
   mi>H(t) l1H(t)⋯lL
   mi>μH(t) mμH(t) 1][f
   μ1H<
   mtr>⋮fμ
   TμH
   −kμ1
   H⋮
   −kμR
   μHhμ1H⋮hμLμH1−φμ
   mi>Hημ
   mi>H]+
   εμH(t)   <
   /mo>  ≜ϕμHM(t)
   θμHM
   msubsup>+εμH(t), 
   for μ=1,<
   /mo>2,…,M :MATH]
   (17)
   The
   [MATH: ℓth :MATH]
   host lncRNA:
   [MATH: lℓH(t+1)
   =[p1H(t)⋯pT
   mi>ℓHlℓ
   H(t)m1H(t)⋯lℓH(t)mMℓ
   mo>H(t) l1H(t)⋯lL
   mi>ℓH(t) lℓH(t) 1][x
   ℓ1H<
   mtr>⋮xℓ
   TℓH
   −zℓ1
   H⋮
   −zℓM
   ℓHyℓ1H⋮yℓLℓH1−σℓ
   mo>Hϕℓ
   mo>H]+
   ψℓH(t)   <
   /mo>  ≜ϕℓHL(t)
   θℓHL
   msubsup>+ψℓH(t), 
   for ℓ=1,<
   /mo>2,…L :MATH]
   (18)
   where
   [MATH:
   ϕjHG(t), ϕuPG(t), ϕμHM(t)
   :MATH]
   and
   [MATH:
   ϕℓHL(t)
   :MATH]
   represent the regression vectors that can be obtained from the
   expression data and
   [MATH:
   θjHG, θuPG, θμHM :MATH]
   and
   [MATH:
   θℓHL :MATH]
   are the unknown parameter vectors to be estimated for the
   [MATH: jth :MATH]
   host gene,
   [MATH: uth :MATH]
   pathogen gene,
   [MATH: μth :MATH]
   host miRNA, and
   [MATH: ℓth :MATH]
   host lncRNA in host GRN, respectively.
   Then, Equations (15)–(18) of the jth host gene, uth pathogen gene,
   [MATH: μth :MATH]
   host miRNA, and
   [MATH: ℓth :MATH]
   host lncRNA can be further augmented for
   [MATH:
   Yj, 
   Yu, Y<
   /mi>μ :MATH]
   and
   [MATH: Yℓ
   :MATH]
   time points as the following forms, respectively:
   The jth host gene:
   [MATH:
   [
   gjH(t2)
   gjH(t3)⋮gjH(tYj
   +1)]=[ϕjHG(t1)
   ϕjHG(t2)⋮ϕjHG
   mi>(tYj
   )]θjHG+[<
   mi>ϖjH(t1)
   ϖjH(t2)⋮ϖjH(tYj
   )], for 
   j=1,2,…<
   mo>,J :MATH]
   (19)
   the Equation (19) can also be simply represented as:
   [MATH:
   GjH=ΦjHGθjHG+ΩjHG, for j=1,2,…,J :MATH]
   (20)
   The uth pathogen gene:
   [MATH:
   [
   guP(t2)
   guP(t3)⋮guP(tYu
   +1)]=[ϕuPG(t1)
   ϕuPG(t2)⋮ϕuPG
   mi>(tYu
   )]θuHG+[<
   mi>ϖuP(t1)
   ϖuP(t2)⋮ϖuP(tYu
   )], for u
   mi>=1,2,…<
   mo>,U :MATH]
   (21)
   the Equation (21) can also be simply represented as:
   [MATH:
   GuP=ΦuPGθuPG+ΩuPG, for u=1,2,…,U :MATH]
   (22)
   The
   [MATH: μth :MATH]
   host miRNA:
   [MATH:
   [
   mμH(t2)
   mμH(t3)⋮mμH(tYμ
   +1)]=[ϕμHM(t1)
   ϕμHM(t2)⋮ϕμHM
   mi>(tYμ
   )]θμHM+[<
   mi>εjH(t1)
   εjH(t2)⋮εjH(tYμ
   )], for 
   μ=1,2,…<
   mo>,M :MATH]
   (23)
   the Equation (23) can also be simply represented as:
   [MATH:
   MμH=ΦμHMθμHM+ΩμHM, for μ=1,2,…,M :MATH]
   (24)
   The
   [MATH: ℓth :MATH]
   host lncRNA:
   [MATH:
   [
   lℓH(t2)
   lℓH(t3)⋮lℓH(tYℓ
   +1)]=[ϕℓHL(t1)
   ϕℓHL(t2)⋮ϕℓHL
   mi>(tYℓ
   )]θjHG+[<
   mi>ψℓH(t1)
   ψℓH(t2)⋮ψℓH(tYℓ
   )], for 
   ℓ=1,2,…<
   mo>,L :MATH]
   (25)
   the Equation (25) can also be simply represented as:
   [MATH:
   LℓH=ΦℓHGθℓHG+ΩℓHG, for ℓ=1,2,…,L :MATH]
   (26)
   After a succession of equation transformation procedure in advance, the
   parameters in vectors
   [MATH:
   θjHG, θuPG, θμHM :MATH]
   and
   [MATH:
   θℓHL :MATH]
   of Equations (20), (22), (24) and (26) can be estimated by individually
   employing the following constrained least-square estimation problem,
   [MATH: θ^jHG=minθjHG12‖ ΦjHGθjHG−GjH‖22, subjec
   t to 
   AjHGθ^jHG≤bjHG   where b
   jHG=[<
   /mo>01<
   /mtd>] and AjHG<
   /mrow>=[00⋯010⋯000⋯00000⋯001⋯000⋯000⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮00⋯000⋯100⋯00000⋯000⋯000⋯010]∈ℝ(Mj+1
   )×(Tj+Mj+Lj
   +2)  :MATH]
   (27)
   [MATH: θ^uPG=minθuPG12‖ ΦuPGθuPG−GuP‖22, subjec
   t to 
   AuPGθ^uPG≤buPG   where b
   uPG=[<
   /mo>01<
   /mtd>] and AuPG<
   /mrow>=[00⋯010⋯000⋯00000⋯001⋯000⋯000⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮00⋯000⋯100⋯00000⋯000⋯000⋯010]∈ℝ(Mu+1
   )×(Tu+Mu+Lu
   +2)  :MATH]
   (28)
   [MATH: θ^μHM=minθμHM12‖ ΦμHMθμHM−MμH‖22, subjec
   t to 
   AμHMθ^μHM≤bμHM   where b
   μHM=[<
   /mo>01<
   /mtd>] and AμHM<
   /mrow>=[00⋯010⋯000⋯00000⋯001⋯000⋯000⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮00⋯000⋯100⋯00000⋯000⋯000⋯010]∈ℝ(Mμ+1
   )×(Tμ+Mμ+Lμ
   +2)  :MATH]
   (29)
   [MATH: θ^ℓHL=minθℓHL12‖ ΦℓHLθℓHL−LℓH‖22, subjec
   t to 
   AℓHLθ^ℓHL≤bℓHL   wherebℓ
   HL=[01] and AℓHL<
   /mrow>=[00⋯010⋯000⋯00000⋯001⋯000⋯000⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮⋱⋮⋮⋮00⋯000⋯100⋯00000⋯000⋯000⋯010]∈ℝ(Mℓ+1
   )×(Tℓ+Mℓ+Lℓ
   +2)  :MATH]
   (30)
   Through applying the function
   [MATH:
   lsqlin :MATH]
   in MATLAB optimization toolbox to solve the constrained least-squares
   estimation problem in (27)–(30), we could obtain optimal regulatory
   parameters for GRN Equations in (3)–(6) and concurrently guarantee that
   the miRNA repression ability
   [MATH:
   −ejμH, −euμP,<
   mtext> −kμr<
   /mrow>H :MATH]
   and
   [MATH:
   −zℓμH :MATH]
   as well as the degradation rate
   [MATH:
   −λjH
   , −λuP<
   /mi>, −φμH :MATH]
   and
   [MATH:
   −σℓH
    :MATH]
   values corresponding to the jth host gene, the uth pathogen gene, the
   [MATH: μth :MATH]
   host miRNA, and the
   [MATH: ℓth :MATH]
   host lncRNA to be non-positive, respectively.
   So far, the structure of dynamic regression model construction is
   complete, yet if we substitute the expression data into the linear
   model right away, infinite solutions problems will emerge due to the
   lack of enough data points, in other words, information deficiency in
   comparison with parameters for evaluation, that is, overfitting. Thus,
   the cubic spline for extra numbers of time-profile data points
   interpolation is applied to prevent this trouble (For more details,
   readers can refer to [78]Appendix A). Then, with the processed RNA-seq
   data, the accurate solutions of the constrained least-squares
   estimation problems in (13)–(14) and (27)–(30) can be attained.
   Moreover, since the measurement technology of genome-wide protein
   expression in HepaRG cells and Hepatitis B Virus during infection
   hasn’t been realized yet, and supporting evidence comes from research
   showing that the cellular concentrations of proteins correlate with the
   abundances of their corresponding mRNAs, which implied that the
   variance in protein abundance can be explained by that of mRNA
   [[79]48,[80]49,[81]50,[82]51] and a correlation coefficient of 48% was
   also obtained between the mRNA and protein abundances in human liver
   [[83]49], i.e., the RNA-seq data of gene expressions can substitute
   protein expressions and contribute to sufficient information for
   resolving above constrained least-squares parameter estimation problems
   in (13) and (14) and (27)–(30). One could retain more information in
   our previous research [[84]52].
2.6. Determination of Significant Interaction Pairs
   On account of approaching the best combination to constitute the robust
   HPI-GEN in the parameter identification procedures, we exploited the
   cost function to obtain the optimal fit for the host/pathogen RNA-seq
   data, to be more precisely, estimation of the expected relative
   distance between the fitted model and the unrecognized authentic
   molecular pathogenic mechanism that actually generated the observed
   data. On the whole, as the cost function of the parameter estimation
   for linear regression model, mean squared error (MSE) is enough to
   calculate the residual variance. However, the cost from the model
   complexity that might also influences the performance should be taken
   into consideration as well. Akaike Information Criterion (AIC), taking
   the place of MSE, was thereby selected to assess the residual variance
   and model complexity at once. As the expected residual variance
   declines with rising parameter numbers for inadequate model
   complexities, there should be a minimum around the correct parameter
   number after repeated coordination [[85]52]. Meanwhile, owing to
   computational efficiency, computing the AIC statistics for all possible
   combinations is impracticable. Stepwise methods are developed to
   decreasing the complexity of exhaustive searching [[86]52]. Finally,
   while reaching the minimum of AIC value, we could acquire the real
   number of interactions or regulations for each protein (gene) one by
   one in the candidate HPI-GEN. Those insignificant regulations or
   interactions out of the real number determined by AIC should be pruned
   to obtain the real HPI-GEN.
   For each model composing PPIN, the AIC values of the
   [MATH: ith :MATH]
   host and the
   [MATH: qth :MATH]
   pathogen protein can be defined individually as the following
   equations:
   The
   [MATH: ith :MATH]
   host protein:
   [MATH: AICiHP(Ni,Vi)=log(<
   mn>1Yi(PiH−ΦiHPθ^iHP)T(PiH−ΦiHPθ^iHP))+2(Ni+Vi)Yi=log[(σiHP)2 ]+2(
   Ni+V<
   mi>i)Yi :MATH]
   (31)
   The
   [MATH: qth :MATH]
   pathogen protein:
   [MATH: AICqPP(Nq,Vq)=log(<
   mn>1Yq(PqP−ΦqPPθ^qPP)T(PqP−ΦqPPθ^qPP))+2(Nq+Vq)Yq=log[(σqPP)2 ]+2(
   Nq+V<
   mi>q)Yq :MATH]
   (32)
   for
   [MATH:
   i=1,…,I :MATH]
   and
   [MATH:
   q=1,…,Q :MATH]
   , where
   [MATH:
   AICi
   HP(N
   i,Vi
   ) :MATH]
   with the model complexity
   [MATH:
   Ni+V
   i :MATH]
   and
   [MATH:
   AICq
   PP(N
   q,Vq
   ) :MATH]
   with the model complexity
   [MATH:
   Nq+V
   q :MATH]
   denote the
   [MATH: ith :MATH]
   host and the
   [MATH: qth :MATH]
   pathogen protein in PPI model, respectively;
   [MATH: (σiHP)2 :MATH]
   and
   [MATH: (σqPP)2 :MATH]
   signify the covariance of estimated residual error between
   [MATH:
   PiH :MATH]
   and
   [MATH: ΦiHPθ^iHP :MATH]
   and between
   [MATH:
   PqP :MATH]
   and
   [MATH: ΦqPPθ^qPP :MATH]
   , respectively; and
   [MATH: θ^iHP :MATH]
   and
   [MATH: θ^qPP :MATH]
   represent the estimated parameters acquired from the solutions of the
   parameter estimation problems in (13) and (14), respectively. Suppose
   [MATH:
   Ni*,
   Vi* :MATH]
   and
   [MATH:
   Nq*,
   Vq* :MATH]
   could minimize
   [MATH:
   AICi
   HP(N
   i,Vi
   ) :MATH]
   and
   [MATH:
   AICq
   PP(N
   q,Vq
   ) :MATH]
   in (31) and (32), respectively. Then,
   [MATH:
   Ni*,
   Vi* :MATH]
   and
   [MATH:
   Nq*,
   Vq* :MATH]
   denote the real number of PPIs in the ith host protein and the qth
   pathogen protein. The insignificant PPIs out of these numbers should be
   pruned as false positives from candidate PPIs to obtain real PPIs in
   HPI-GEN.
   For each model comprising GRN, the AIC values of the jth host gene, the
   uth pathogen gene, the
   [MATH: μth :MATH]
   host miRNA, and the
   [MATH: ℓth :MATH]
   host lncRNA can be defined individually as following equations:
   The jth host gene:
   [MATH: AICjHG(Tj+Mj+Lj)<
   mtd
   columnalign="left">=log(<
   mn>1Yj(GjH−ΦjHGθ^jHG)T(GjH−ΦjHGθ^jHG))+2(Tj+Mj+
   Lj)<
   msub>Yj
   mtd>=log[(σjHG)2 ]+2(
   Tj+M<
   mi>j+Lj)
   Yj :MATH]
   (33)
   The uth pathogen gene:
   [MATH: AICuPG(Tu+Mu+Lu)<
   mtd
   columnalign="left">=log(<
   mn>1Yu(GuP−ΦuPGθ^uPG)T(GuP−ΦuPGθ^uPG))+2(Tu+Mu+
   Lu)<
   msub>Yu
   mtd>=log[(σuPG)2 ]+2(
   Tu+M<
   mi>u+Lu)
   Yu :MATH]
   (34)
   The
   [MATH: μth :MATH]
   host miRNA:
   [MATH: AICμHM(Tμ+Mμ+Lμ)<
   mtd
   columnalign="left">=log(<
   mn>1Yμ(MμH−ΦμHMθ^μHM)T(MμH−ΦμHMθ^μHM))+2(Tμ+Mμ+
   Lμ)<
   msub>Yμ
   mtd>=log[(σμHM)2 ]+2(
   Tμ+M<
   mi>μ+Lμ)
   Yμ :MATH]
   (35)
   The
   [MATH: ℓth :MATH]
   host lncRNA:
   [MATH: AICℓHL(Tℓ+Mℓ+Lℓ)<
   mtd
   columnalign="left">=log(<
   mn>1Yℓ(LℓH−ΦℓHLθ^ℓHL)T(LℓH−ΦℓHLθ^ℓHL))+2(Tℓ+Mℓ+
   Lℓ)<
   msub>Yℓ
   mtd>=log[(σℓHL)2 ]+2(
   Tℓ+M<
   mo>ℓ+Lℓ)
   Yℓ :MATH]
   (36)
   for
   [MATH:
   j=1,…,J, q=1
   ,…,Q, μ=1,…,M :MATH]
   , and
   [MATH:
   ℓ=1,…,L :MATH]
   where
   [MATH:
   AICj
   HG(T
   j+Mj+<
   msub>Lj), AICuP
   G(T<
   mi>u+Mu+Lu), AICμH<
   /mi>M(Tμ+Mμ+Lμ)
   :MATH]
   and
   [MATH:
   AICℓ
   HL(T
   ℓ+Mℓ+<
   msub>Lℓ)
   :MATH]
   with the model complexity
   [MATH:
   Tj+M
   j+Lj,<
   mtext> Tu+M
   mi>u+Lu, Tμ+
   Mμ+Lμ :MATH]
   and
   [MATH:
   Tℓ+M
   ℓ+Lℓ :MATH]
   denote the AIC values of the jth host gene, the uth pathogen gene, the
   [MATH: μth :MATH]
   host miRNA, and the
   [MATH: ℓth :MATH]
   host lncRNA in GRN model, respectively;
   [MATH: (σjHG)2, (σuPG)2, (σμHM)2 :MATH]
   and
   [MATH: (σℓHL)2 :MATH]
   signify the covariance of estimated residual error between
   [MATH:
   GjH :MATH]
   and
   [MATH: ΦjHGθ^jHG, :MATH]
   [MATH:
   GuP :MATH]
   and
   [MATH: ΦuPGθ^uPG, :MATH]
   [MATH:
   MμH :MATH]
   and
   [MATH: ΦμHMθ^μHM, :MATH]
   and
   [MATH:
   LℓH :MATH]
   and
   [MATH: ΦℓHLθ^ℓHL, :MATH]
   respectively; and
   [MATH: θ^jHG, θ^uPG, θ^μHM :MATH]
   and
   [MATH: θ^ℓHL :MATH]
   represent the estimated parameters acquired from the solutions of the
   parameter estimation problems in (27)–(30), respectively. Similarly,
   the real numbers of regulation by TF, miRNA, lncRNA are obtained by
   minimizing the corresponding AICs in (33)–(36). The insignificant
   regulations out of these real numbers should be pruned as false
   positives to obtain the real regulations by TF, miRNA, lncRNA in
   HPI-GEN.
   So far, through a sequence of systematic identification processes, the
   construction for real HPI-GENs is broadly accomplished by pruning those
   insignificant interactions and regulations out of the corresponding
   AIC. Unfortunately, due to the excessively enormous network
   architecture, we could hardly extract the core pathways to investigate
   the pathogenesis during HBV infection, let alone screening of candidate
   proteins as new targets for drug development. We thereby extracted the
   core network from the real HPI-GEN via applying the principal network
   projection (PNP) method.
2.7. Extracting Core Network Structure from the Real HPI-GEN by Using PNP
Approach
   Prior to core network extraction from the real HPI-GEN through applying
   PNP method, it is essential to integrate the network estimated
   parameters from PPIN and GRN into a system matrix A as follows,
   [MATH: A= [AHP
   ,HPAHP
   ,PP 00APP
   ,HPAPP
   ,PP00AHG
   ,HP 0AHG
   ,HMAHG
   ,HLAHM
   ,HP 0AHM
   ,HMAHM
   ,HLAHL
   ,HP 0AHL
   ,HMAHL
   ,HLAPG
   ,HP 0APG
   ,HMAPG
   ,HL] <
   mo>=[ a^11H
   ⋯a^1IH⋮a^inH⋮a^I1H⋯a^IIHb^11H
   ⋯b^1QH⋮b^ivH⋮b^I1H⋯b^IQH0⋯
   mo>0⋮
   0⋮0<
   mo>⋯00⋯
   mo>0⋮
   0⋮0<
   mo>⋯0<
   mtr columnalign="left"> a^11P
   ⋯a^1IP⋮a^qnP⋮a^Q1P⋯a^QIPb^11P
   ⋯b^1QP⋮b^qvP⋮b^Q1P⋯b^QQP0⋯
   mo>0⋮
   0⋮0<
   mo>⋯0 0
   mtd>⋯0⋮0⋮0<
   /mn>⋯0 c^11H
   ⋯c^1IH⋮c^jτH⋮c^I1H⋯c^IIH0⋯
   mo>0⋮
   0⋮0<
   mo>⋯0−e^11H
   ⋯−<
   msubsup>e^1MH⋮−e^jμH⋮−e^I1H⋯−e^IMHd^11H
   ⋯d^1LH⋮d^jℓH⋮d^I1H⋯d^ILH
    f^11H
   ⋯f^1IH⋮f^μτH⋮f^M1H⋯f^MIH0⋯
   mo>0⋮
   0⋮0<
   mo>⋯0−k11H⋯−k1<
   mi>MH⋮<
   /mo>−kμr
   H⋮−kM1H⋯−kMMH
   mi>h^11H
   ⋯h^1LH⋮h^μℓH⋮h^M1H⋯h^MLH
    x^11H
   ⋯x^1IH⋮x^ℓτH⋮x^L1H⋯x^LIH 0
   mtd>⋯0⋮0⋮0<
   /mn>⋯0−z^11H
   ⋯−<
   msubsup>z^1MH⋮−z^ℓμH⋮−z^L1H⋯−z^LMHy^11H
   ⋯y^1LH⋮y^ℓoH⋮y^L1H⋯y^LLH
    c^11P
   ⋯c^1IP⋮c^uτP⋮c^Q1P⋯c^QIP 0
   mtd>⋯0⋮0⋮0<
   /mn>⋯0−e^11P
   ⋯−<
   msubsup>e^1MP⋮−e^uμP⋮−e^Q1P⋯−e^QMPd^11P
   ⋯d^1LP⋮d^uℓP⋮d^Q1P⋯d^QLP
   ]∈ℝ(2I+2Q+L+M)×(I+Q+L+M)
   :MATH]
   where HP, PP, HG, HM, HL and PG denote the host protein, pathogen
   protein, host gene, host miRNA, host lncRNA and pathogen gene,
   respectively;
   [MATH: a^inH, b^ivH :MATH]
   and
   [MATH: a^qnP, b^qvP :MATH]
   mentioned in (1) and (2) could be acquired in
   [MATH: θ^iHP :MATH]
   and
   [MATH: θ^qPP :MATH]
   by resolving the parameter estimation problems in (13) and (14) and
   pruning false positives by AIC method in (31) and (32), respectively;
   [MATH: {c^jτH, −e^jμH, d^jℓH} :MATH]
   ,
   [MATH: {f^μτH, −kμrH,
    h^μℓH} :MATH]
   ,
   [MATH: {x^ℓτH, −z^ℓμH, y^ℓoH} :MATH]
   , and
   [MATH: {c^uτP,−e^uμP,d^uℓP} :MATH]
   mentioned in (3)–(6) could be acquired in
   [MATH: θ^jHG, θ^uPG, θ^μHM :MATH]
   and
   [MATH: θ^ℓHL :MATH]
   by resolving the parameter estimation problems in (27)–(30) and pruning
   false positives by AIC method in (33)–(36) respectively. Note that
   since the regulations from pathogen proteins weren’t taken into
   consideration, the corresponding parameters in matrix A are padded with
   zeros.
   Thereafter, we extract the core components of HPI-GEN by PNP approach,
   a principal network analysis method for dimensionality reduction based
   on network structure projection technique, that is, projecting matrix A
   to its principal singular vectors space. Accordingly, the combined
   network matrix A can be denoted by singular value decomposition form
   below,
   [MATH:
   A=USVT :MATH]
   (37)
   where
   [MATH: U∈ℝ(2I+2Q+L+M)×(I+Q+L+M), V∈ℝ(I+Q+L+M)×(I+Q+L+M), and S=diag(σ1,<
   /mo>…,σs,
   …σI+Q+L+M)
   :MATH]
   is a diagonal matrix of
   [MATH:
   σ1,…,<
   /mo>σs,…σI+Q+L+M :MATH]
   which includes the I + Q + L + M singular values of the matrix A in
   descending order, i.e.,
   [MATH:
   σ1≥…≥<
   /mo>σs≥…≥σI+Q+L+M≥0
   :MATH]
   .
   [MATH:
   S=[σ10⋯0⋯
   00σ<
   /mi>2⋯0<
   /mtd>⋯0⋮⋮⋱⋮
   ⋱⋮00⋯σs⋯0⋮⋮⋱<
   /mtd>⋮⋱⋮00⋯
   0⋯σI+Q+L+M
   mi>0
   0⋯0⋯
   mtd>0⋮⋮⋱⋮⋱<
   mtd>⋮00⋯0⋯<
   mn>0]
   :MATH]
   Also, we define the expression fraction
   [MATH: Ew
   :MATH]
   of the interaction ability of the proteins and the transcriptional
   regulatory ability of the genes, lncRNAs, miRNAs as the normalization
   of singular value form,
   [MATH:
   Ew=σw2∑w=1I+Q+L<
   mo>+Mσw
   mi>2
   :MATH]
   (38)
   According to the diminishing property of the singular values, under the
   basic premise of sustaining system energy from the whole network
   structure, we pick out the minimum X such that
   [MATH:
   Σw=1XEw≥0
   .85 :MATH]
   , i.e., the vector space spanned by the orthonormal bases composed of
   the principal singular vectors corresponding to the top X singular
   values contains 85% energy of HPI-GEN structure.
   Afterwards, we define and apply the projection value of each node,
   including gene, miRNA, lncRNA, and protein in the real HPI-GEN to the
   vector space spanned by the orthonormal basis composed of the principal
   singular vectors corresponding to the top X singular values as below,
   [MATH: ProjR
   (Arow<
   /mi>,i,V<
   msup>T*)=[∑k=1X(Arow<
   /mi>,ivk
   )2<
   mo>]1/2 Proj<
   mi>C(Acol<
   /mi>,j,U*
   mo>)=[∑k=1X(Acol,jTu<
   mi>k)2<
   mo>]1/2 :MATH]
   (39)
   where
   [MATH:
   Arow,
   mo>i :MATH]
   denotes the ith row vector of A for
   [MATH:
   i = <
   mn>1,…,2I+
   2Q+L+M :MATH]
   ;
   [MATH:
   Acol,
   mo>j :MATH]
   signifies jth column vector of A for
   [MATH:
   j = <
   mn>1,…,I+Q
   +L+M :MATH]
   ;
   [MATH:
   VT* :MATH]
   ; and
   [MATH: U*
   :MATH]
   represent the vector spaces spanned by the orthonormal basis
   [MATH:
   {v1,
   v2,…,vk} :MATH]
   and
   [MATH:
   {u1,
   u2,…,uk} :MATH]
   [MATH:
   for k=1,…,X
   :MATH]
   , respectively;
   [MATH:
   ProjR(Arow<
   /mi>,i,V<
   msup>T*) :MATH]
   and
   [MATH:
   ProjC(Acol<
   /mi>,j,U*
   mo>) :MATH]
   indicate the projection value from
   [MATH:
   Arow,
   mo>i :MATH]
   and
   [MATH:
   Acol,
   mo>j :MATH]
   to
   [MATH:
   VT* :MATH]
   and
   [MATH: U*
   :MATH]
   vector spaces, respectively.
   While the projection value
   [MATH:
   ProjR(Arow<
   /mi>,i,V<
   msup>T*) :MATH]
   or
   [MATH:
   ProjC(Acol<
   /mi>,j,U*
   mo>) :MATH]
   in (39) approaches zero, it intimates that the corresponding node i or
   j isn’t the pivotal factor or independent to the core network extracted
   via the PNP procedure. Conversely, the larger projection value it gets,
   the greater the node contributes to the core network.
   Eventually, we select the first 3000 components on the top of the list
   with higher energy in (39) as the core host-pathogen network from real
   HPI-GEN of HBV infection by ranking the projection value of each node
   and subsequently uploading those nodes into DAVID to obtain the KEGG
   pathways as shown in Table 3. Taking KEGG pathways as reference, we
   systematically identify and investigate the molecular pathogenic
   mechanism of HBV infection from the core HPI-GEN as shown in Figure 4
   and select the significant biomarkers for the further novel drug
   discovery procedure.
2.8. Deep Learning-Based Drug-Target Interaction Prediction
2.8.1. Data Preparation
   Following the identification of promising pathogenic biomarkers, a
   potent drug-target interaction (DTI) model to predict the interactions
   between the identified biomarkers and their corresponding molecule
   drugs is crucial for discovering favorable molecular compounds for the
   therapeutic treatment. Thus, we constructed a deep learning-based DTI
   model to collect viable interactions to identify available drugs for
   the selected biomarkers. Meanwhile, since most patients in need of
   attention and treatment are infected with chronic hepatitis, the
   highest priority was assigned to efficacious drugs with lower toxicity
   that would not cause irreparable harm to health.
   First, we integrated databases from UniProt [[87]53], DrugBank
   [[88]54], ChEMBL [[89]55], Pubchem [[90]56], and BindingDB [[91]57] to
   assemble applicable drug-target interactions. As feature descriptors
   are simple numerical vectors designed to delineate the complicated
   information of objects, they have been widely utilized to describe the
   genomic sequences and chemical properties of molecules such as the
   characteristics from 2D, 3D spectrum of structure, molecular weight,
   predictive values of LogP, etc. of late [[92]58,[93]59]. In view of
   this property, we employed both the off-the-shelf build-in molecular
   and protein descriptor functions of python package pyBioMed to
   transform the features from each drug and its target among the
   previously collected drug-target interactions into descriptor under
   python 2.7 environment, respectively. These features include molecular
   dockings such as amino acid composition and dipeptide composition. For
   more details about the descriptor transformation, readers could access
   the documents of pyBioMed. Thereafter, we united the descriptors of
   both drugs and their relevant targets into a matrix as the training
   dataset to train the DTI model. The vector
   [MATH:
   vdrug
   mi>-target
   mrow> :MATH]
   corresponding to the descriptor of each drug-target pair is given
   below:
   [MATH:
   vdrug
   mi>-target=[D,T]=[d1,d2
   ,⋯,dI,t1,t2,⋯tJ] :MATH]
   (40)
   where
   [MATH:
   vdrug
   mi>-target
   mrow> :MATH]
   denotes the vector of the descriptor for each the drug-target pair
   comprising two parts. The former part D is the descriptor of the drug
   and
   [MATH: di
   :MATH]
   represents the
   [MATH: ith :MATH]
   drug feature; the latter portion T is the descriptor of the target and
   [MATH: tj
   :MATH]
   indicates the
   [MATH: jth :MATH]
   target characteristic; I is the total number of drug features; J is the
   total number of target features.
   Before performing drug-target interaction prediction, to keep our DTI
   model from inferior performance arising from between-class imbalance
   and distracting variation, some data preprocessing procedures are
   requisite. The entire preprocessing procedure in our work encompasses
   down sampling, data partitioning, feature scaling, and dimensionality
   reduction.
     * Down Sampling:
   There are two categories of drug-target interaction in our samples:
   16,000,000 samples for the unknown interactions (negative instances)
   and 60,000 samples for the known interactions (positive instances).
   Though deep learning methods scale well with the quantity of data and
   can often leverage extremely large datasets for good performance,
   imbalanced class distribution often causes the model to be overwhelmed
   by the large class and ignore the minority one. So as to mitigate the
   effect arising from class imbalance, we reduced the number of majority
   samples (negative instances) to 60,000 which will allow the model to
   learn from both classes equally.
     * Data Partitioning:
   After the down sampling procedure, we partition the data (the
   reconstructed matrix of drug-target pairs) into two sets, three-fourth
   for training and one-fourth for testing.
     * Feature Scaling:
   Since the variables of the features in each drug-target pair are
   measured at different scales, they do not contribute equally to the
   model fitting and might end up with creating a bias. To deal with this
   potential problem, a feature-wise scaling is usually implemented prior
   to model fitting. As powerful techniques of feature scaling, Min-max
   scaling (normalization) and Standardization are both widely used in
   data analysis. However, despite possessing the capability to shift and
   rescale the data into a limited range of values, normalization is
   sensitive to the outliers [[94]60]; in contrast, standardization
   maintains useful information about outliers and makes the model less
   sensitive to them [[95]61]. Thus, we apply Standardization on each
   feature and the corresponding mathematical formulation is shown as
   follows:
   [MATH: di*=di−μiσi
   mi>, ∀
   mo>i=1,…,Itj*=tj−ωjδj
   msub>, ∀j=1,…,J :MATH]
   (41)
   where
   [MATH: di
   :MATH]
   and
   [MATH:
   di* :MATH]
   represent the
   [MATH: ith :MATH]
   feature of the drug before and after Standardization, respectively;
   [MATH: μi
   :MATH]
   and
   [MATH: σi
   :MATH]
   signify the mean and standard deviation of the
   [MATH: ith :MATH]
   drug feature;
   [MATH: tj
   :MATH]
   and
   [MATH:
   tj* :MATH]
   indicate the jth feature of the target before and after
   Standardization, respectively;
   [MATH: ωj
   :MATH]
   and
   [MATH: δj
   :MATH]
   denote the mean and standard deviation of the
   [MATH: ith :MATH]
   target feature; I is the total number of features in the drug while J
   is that of the target.
     * Dimensionality Reduction:
   Since the high-dimensional patterns in samples might augment the number
   of neurons in the network and elevate the computational complexity for
   training, we adopted principal components analysis (PCA) to distill 692
   features from original 1014 features of the samples. One could retain
   as much information of interested [[96]62,[97]63].
   Note that it is illegal and useless to carry out data preprocessing to
   the testing set solely relying on the characteristics in testing data
   in that we would never get any information from the features of testing
   data until the training phase is over. Hence, aside from the
   aforementioned transformation performed on the training data sets, we
   also provide the identical transformation on the testing data during
   the process of feature scaling and dimensionality reduction. More
   exactly, we extracted the mean and standard deviation variables of each
   feature in the training data set to standardize the corresponding
   feature in the testing data. Likewise, the transformation matrix for
   executing PCA of the features in training data was concurrently used to
   reduce the dimension of the features in testing data.
   So far, the training data for tuning the network parameters of the deep
   learning-based DTI model and the testing data for evaluating the model
   performance are ready. Nonetheless, if learned merely on the features
   from the training set, the hyperparameters would always choose the
   maximum possible model capacity, leading to overfitting [[98]63]. As a
   result, we randomly split out one-tenth of the training data to
   estimate the generalization error during training, allowing for the
   hyperparameters to be updated in every epoch. In addition, Early
   Stopping method was also employed to specify an arbitrarily large
   number of training epochs and stop training once improving the model’s
   fit to the training data comes at the expense of increased
   generalization error. Eventually, through setting the optimizer as Adam
   [[99]64] and learning rate = 0.001, we then trained our DTI model for
   40 epochs with 100 samples in each mini-batch.
2.8.2. Parameters Tuning Process Based on Deep Learning Algorithm
   In the network architecture, each layer can be simplified into a
   function below:
   [MATH: h^n=δ(wTxn<
   /msub>+b) :MATH]
   (42)
   where
   [MATH: xn
   :MATH]
   and
   [MATH: h^n :MATH]
   indicate the input and output vectors corresponding to the descriptor
   of the nth drug-target pair, respectively;
   [MATH: δ :MATH]
   denotes the activation function (ReLU for each hidden layer and Sigmoid
   for the output layer); and the vectors w and b are given as follows:
   [MATH: w =[
   w1w2⋮
   we], b=[b1
   b2⋮be
   ] :MATH]
   (43)
   where the weights parameters
   [MATH:
   w1,w
   2,…we<
   /mrow> :MATH]
   in w and bias parameters
   [MATH:
   b1,b
   2,…be<
   /mrow> :MATH]
   in b are free variables that capture the model’s representation of the
   data in each layer and are learned from each sample.
   To evaluate the model performance, the predicted output is compared
   with the true label to compute a loss for the current set of model
   weights. Since drug-target interaction prediction issue is a binary
   classification problem, binary cross-entropy is chosen to calculate the
   loss of each iteration. The summation of the loss
   [MATH: L(y,y^) :MATH]
   for totally N samples is given by the expression as below:
   [MATH: L(y,y^)=−1N<
   /mfrac>∑n=1N(yn∗l
   og(y^n)+(1−yn
   msub>)∗log(1−y^n))=1N
   ∑n=1NCn
   msup>(yn,y^n) :MATH]
   (44)
   where
   [MATH: Cn(yn,y^n) :MATH]
   is loss of the nth sample calculated by binary cross-entropy, in which
   [MATH: yn
   :MATH]
   is the nth actual label (1 or 0) and
   [MATH: y^n :MATH]
   is the nth predicted probability distribution.
   During the learning process, the parameter set
   [MATH: θ :MATH]
   (46) in each layer should achieve the minimization of the objective
   function (47) to obtain the optimal network parameters
   [MATH: θ*
   :MATH]
   . Accordingly, backward propagation learning algorithm (48) is designed
   to update the network parameters of both weights (
   [MATH:
   w1,w
   2,…,we :MATH]
   ) and bias (
   [MATH:
   b1,b
   2,…,be :MATH]
   ).
   [MATH:
   θ=[w1⋮we
   mrow>b1⋮be]
   mrow> :MATH]
   (45)
   [MATH:
   θ*=argminθL(θ) :MATH]
   (46)
   [MATH:
   θi=θ
   i−1−η<
   mo>∇L(θi−1<
   /mn>) :MATH]
   (47)
   where i is the ith iteration of the learning process;
   [MATH: η :MATH]
   is the learning rate (set as 0.001); and
   [MATH: ∇L(θi−1<
   /mn>) :MATH]
   is the gradient of
   [MATH: L(θi−1<
   /mn>) :MATH]
   given as follows:
   [MATH: ∇L(θi−1<
   /mn>)=[∂L(θi−1<
   /mn>)∂w1⋮∂L(θi−1<
   /mn>)∂we
   ∂L(θi−1<
   /mn>)∂b1⋮∂L(θi−1<
   /mn>)∂be] :MATH]
   (48)
   Consequently, by back-propagating a corrective error signal through the
   network, weighted connections between neurons in the DTI model are
   iteratively adjusted and assessed on the basis of the drug-target pairs
   in the training and the validation sets.
2.8.3. Measurement of Prediction Quality
   Assessing a model is quite tricky. It isn’t reliable to evaluate model
   performance (training accuracy of the final epoch) merely based on one
   specific validation set, in that the accuracy obtained from one
   validation set can be very different to that from another. Thus, we
   took advantage of a general technique named 10-fold cross validation to
   avoid the possible bias. That is, the original samples for training is
   randomly partitioned into 10 equal size subsamples; one is retained as
   the validation data for testing the model, and the remaining 9
   subsamples are used as training data. Through repeatedly training and
   validating the models 10 times with different partitions, we then
   compute an average score over the rounds to give an estimate of the
   model’s predictive performance. Afterwards, the model parameters with
   different validation errors were applied to the testing data for
   generating the final test accuracy and loss.
   Aside from that, to compare the effectiveness of our DTI model with
   that of other state-of-the-art ML-based approaches, we adopted AUC-ROC
   curve [[100]65] to tell how good a specific model can distinguish two
   classes, that is, whether or not a drug interacts with a target. ROC
   represents the relationship between benefit (true positive rate) and
   cost (false positive rate), while AUC is a value in the range of 0 to 1
   that indicates the degree of separability between classes. The higher
   AUC we obtain, the more accurate outcome we will get from the model.
   The formulas for AUC-ROC curve are shown as below:
   [MATH:
   True Positive
    Rate = Rec
   all/Sensitivity 
   = TPTP+FN
   mrow> :MATH]
   (49)
   [MATH:
   Specificity =
   TNTN+
   FP :MATH]
   (50)
   [MATH:
   False Positive Rate = 1−Specificity = FPTN+FP
   :MATH]
   (51)
   where True Positive (TP) means that the actual value is positive and is
   judged correctly; False Positive (FP) shows that the actual value is
   positive but is judged by mistake; True Negative (TN) indicates the
   actual value is negative and is judged accurately; False Negative (FN)
   represents the actual value is negative but is judged in error.
3. Results
3.1. Overview of Systems Medicine Discovery Procedure
   Though a great advance has been made to design and evaluate medicine
   based on the application of diversified drug repositioning approaches,
   it still takes a large amount of time and vigor if without a potent
   avenue to select the promising biomarkers and drugs for the clinical
   trials. In this study, we proposed a systematic biology method
   containing two sections including potential biomarkers identification
   and novel drugs discovery.
   The overall flowchart of the proposed systems biology procedure is
   shown in [101]Figure 1. On one hand, we applied a systematic
   identification analysis procedure to the constructed candidate HPI-GEN,
   considering both accuracy and model complexity with the help of the
   information from the two-side real-time profile RNA-seq. Note that the
   total nodes and edges respectively given in [102]Table 1 and [103]Table
   2 of the real HPI-GEN in [104]Figure 2 abstracted from the candidate
   HPI-GEN has considerably shrunk in comparison with that of the
   candidate HPI-GEN, which indicates that the false-positives were
   eliminated during the identification process. Supported with the PNP
   approach, we then identified the core HPI-GEN in [105]Figure 3 by
   selecting top-ranked 3000 nodes with significant projection values that
   could reflect 85% of the real HPI-GEN. The higher the projection value
   is, the stronger regulatory capacity it possesses in the host/pathogen
   cross-talk mechanism during HBV infection. In the meantime, given
   identified KEGG pathways derived through uploading nodes in core
   HPI-GEN into DAVID in [106]Table 3, we selected the core signaling
   pathways so as to explore the underlying etiologic mechanism in HBV
   infection as shown in [107]Figure 4, involving microenvironmental
   factors by surveying relevant papers and research.
Figure 1.
   [108]Figure 1
   [109]Open in a new tab
   Flowchart of applying systems biology method to construct the candidate
   HPI-GEN, real HPI-GEN, core HPI-GEN, and core signaling pathways during
   the infection of HBV to identify the potential biomarker for later drug
   discovery repurposing. The grey blocks in the shape of a cloud indicate
   the online databases for constructing candidate HPI-GEN and later drug
   repositioning; the white diamond blocks are the corresponding data
   types of each database; the orange diamond blocks signify candidate
   protein-protein interaction network (PPIN) and gene regelation network
   (GRN); the blue blocks represent the input information including the
   raw RNA-seq data for system identification and the collected articles
   for BioBERT to augment the interspecies interactions and regulations in
   PPIN and GRN, respectively; the red blocks denote the models comprising
   BioBERT and stochastic regression models utilized in the systems
   biology approach; the white rectangle blocks indicate the systematic
   methods applied to constructing the candidate HPI-GEN and extracting
   core HPI-GEN; the purple blocks imply the drug discovery and
   repositioning procedure proposed afterwards; and the yellow rounded
   rectangular blocks are the real HPI-GEN and core HPI-GEN during the
   infection of HBV.
Table 1.
   The total number of nodes in candidate interspecies host/pathogen
   interspecies genetic and epigenetic interaction network (HPI-GEN) and
   identified HPI-GEN during Hepatitis B Virus (HBV) infection.
       Node      Candidate HPI-GEN Real HPI-GEN
    LncRNA node         143            137
    MiRNA node          922            922
      TF node          2477            1325
   Receptor node       2105            2105
    Virus node           4              4
   Protein node       13,465          12,258
    Total node        19,116          16,751
   [110]Open in a new tab
Table 2.
   The total number of edges in candidate interspecies HPI-GEN and
   identified interspecies HPI-GEN during HBV infection.
         Edge        Candidate HPI-GEN Real HPI-GEN
     LncRNA-LncRNA           0              0
     LncRNA-miRNA            0              0
       LncRNA-TF            156             9
    LncRNA-Receptor          2              0
    LncRNA-Protein          90              8
     LncRNA-Virus            8              1
     MiRNA-LncRNA           510            303
      MiRNA-MiRNA           96              86
       MiRNA-TF           42,913           7909
    MiRNA-Receptor        26,888           9694
     MiRNA-Protein        146,144         65,561
      MiRNA-Virus           39              15
       TF-LncRNA            231            157
       TF-MiRNA            1439            944
         TF-TF            33,426           8061
      TF-Receptor         14,948           6500
      TF-Protein          76,441          39,574
       TF-Virus             54              38
    Receptor-LncRNA         26              15
    Receptor-MiRNA          143            102
      Receptor-TF          2364            644
   Receptor-Receptor       1490            693
   Receptor-Protein        7755            4560
     Virus-LncRNA            5              3
      Virus-MiRNA           109             80
       Virus-TF             509             77
    Virus-Receptor          164             72
     Virus-Protein          682            276
         PPIs            4,039,657       906,428
      Total edge         4,396,289      1,051,810
   [111]Open in a new tab
Figure 2.
   [112]Figure 2
   [113]Open in a new tab
   Real HPI-GEN between host and pathogen under infection of HBV. The real
   HPI-GEN describes the cross-talk interaction and regulation between
   host and pathogen after system identification analysis to prune false
   positives from candidate HPI-GEN. The purple lines denote PPIs; the
   green, yellow, blue, and red lines indicate the regulations by
   Transcriptional factors (TFs), lncRNAs, miRNAs, and Virus proteins
   respectively; and the total numbers of lncRNAs, miRNAs, proteins,
   receptors, Virus proteins, and TFs in the network are 137, 922, 12,258,
   2015, 4, 1325, respectively.
Figure 3.
   [114]Figure 3
   [115]Open in a new tab
   Core HPI-GEN under infection of HBV. The core HPI-GEN denotes the
   cross-talk interactions between host and pathogen after performing PNP
   to extract the top 3000 nodes from the real HPI-GEN. The purple lines
   denote PPIs; the green, yellow, blue, and red lines indicate the
   regulations by TFs, lncRNAs, miRNAs, and Virus proteins, respectively;
   and the total numbers of lncRNAs, miRNAs, proteins, receptors, Virus
   proteins, and TFs in the network are 6, 262, 2086, 332, 4, 312,
   respectively.
Table 3.
   The pathway enrichment analysis of proteins by applying the DAVID in
   core HPI-GEN during HBV infection.
              Term            Numbers    p-Value
      MicroRNAs in cancer       103   8.6 × 10^−18
       Pathways in cancer       108   8.7 × 10^−10
     FoxO signaling pathway     49    5.6 × 10^−9
      Viral carcinogenesis      63    6.3 × 10^−8
          Hepatitis B           47    7.3 × 10^−7
           Cell cycle           39    1.6 × 10^−5
     MAPK signaling pathway     64    6.2 × 10^−5
     TNF signaling pathway      33    1.3 × 10^−4
     Ras signaling pathway      55    6.2 × 10^−4
   PI3K-Akt signaling pathway   77    7.6 × 10^−4
   [116]Open in a new tab
Figure 4.
   [117]Figure 4
   [118]Open in a new tab
   The core host/pathogen cross-talk signaling pathways during HBV
   infection. This is a network obtained from the core HPI-GEN by the
   annotation of KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways,
   which represents an intracellular pathogenic mechanism during the
   invasion of host hepatocyte by HBV. The solid black lines indicate the
   regulation (with an arrowhead) and interaction between nodes. The solid
   lines in different colors except black represent the regulations of the
   corresponding cellular functions; among them, the lines with an
   arrowhead denote the activation of cellular functions while the lines
   with a circular head indicate the inhibition of cellular functions.
   Furthermore, for the nodes of downstream target genes, the blocks with
   an upward pointing arrow signify a rising expression; comparatively,
   the blocks with a downward pointing arrow signify a dropping expression
   under HBV infection.
   On the other hand, after determining the biomarkers from signaling
   pathways of the core HPI-GEN as shown in [119]Table 4, we proposed a
   deep learning-based drug-target interaction (DTI) method to predict the
   potential interactions (dockings) between the identified biomarkers and
   their corresponding drugs for further candidate drugs discovery. The
   outcome of the DTI model is presented in a form of probability. The
   higher probability analyzed by the DTI model, the connection between
   the selected biomarker and the predicted drug is more likely to exist.
   However, given the enormous number of candidate drugs, the general drug
   specifications such as regulation ability, toxicity, and sensitivity
   from multiple databases were taken into consideration to reduce the
   search space and ensure the safety and efficacy in clinical trials. The
   pertinent flow-process diagram of the systems molecular drug design
   procedure is shown in [120]Figure 5.
Table 4.
   The drug targets identified for HBV infection.
      Disease                      Drug Targets
   HBV infection AKT1, NFKBIA, EIF4EBP1, HDAC6, STAT1, STAT3, TP53
   [121]Open in a new tab
Figure 5.
   [122]Figure 5
   [123]Open in a new tab
   The flowchart of the proposed systems drug discovery and repositioning
   procedure. According to the potential biomarkers identified in
   [124]Figure 4, we propose a systems drug discovery and repositioning
   method for figuring out promising multiple-molecule drug to alleviate
   HBV infection. First, we collected data in respect to drug-target
   interaction from various online databases. Then, we turned each drug
   and target into descriptors and assembled them into a matrix of
   drug-target pairs. To keep our model from poor performance, data
   preprocessing prior to training is indispensable. Accordingly, we
   adopted data preprocessing approaches including down sampling, data
   partitioning, feature scaling, and dimensionality reduction (For more
   details, readers can refer to [125]Section 2.8.1 in Materials and
   Methods) based on the property of the data. After training the Deep
   learning-based drug-target interaction (DTI) model shown in [126]Figure
   6 with the constructed training data, we evaluated the model
   performance (as shown in [127]Table 5 and [128]Figure 7) and compared
   the model efficacy of DTI with that of other traditional machine
   learning (ML)-based methods (as shown in [129]Figure 8). Since there is
   only a neuron in the output layer which delivers the probability of
   whether the relation (docking) between a drug and a target exists, the
   DTI model enabled us to stress the importance on specific interactions
   (with output score approaching 1) and to identify candidate drugs.
   Next, concerning the availability of the predicted drugs (candidate
   drugs) via general criteria for drug design specifications such as
   regulation ability, toxicity, and sensitivity obtained from databases,
   we further filtered promising drugs that modulate multiple molecular
   targets and possess the efficacy with lower dosages. After all,
   low-dose drugs are more attainable and cause less harm to patients. In
   this regard, a set of multi-target drugs with appropriate toxicity for
   clinical investigations were selected as a multiple-molecule drug to
   overcome HBV infection.
3.2. Extracting Core Signaling Pathways from Identified HPI-GEN and Core
HPI-GEN in HBV Infection
   For the purpose of analyzing the molecular mechanism under HBV
   infection, extracting core signaling pathways from candidate HPI-GEN is
   critical. The process to construct candidate HPI-GEN and identify real
   HPI-GEN as well as core HPI-GEN has been shown in the flowchart of
   [130]Figure 1.
   Thereafter, we identified the real HPI-GEN via parameter identification
   concerning the cost of model complexity under the auspices of
   corresponding RNA-seq data to prune false positives from candidate
   HPI-GEN (refer to [131]Section 2.4, [132]Section 2.5, [133]Section 2.6
   for more details). The total numbers of the nodes of proteins,
   Transcription factors (TFs), miRNAs, lncRNAs, and Virus proteins as
   well as the edges of their interactions and regulations for the
   candidate HPI-GEN and identified real HPI-GEN are given in [134]Table 1
   and [135]Table 2, respectively. In addition, through employing the
   network visualizing software Cytoscape, the network of real HPI-GEN
   under HBV infection is shown in [136]Figure 2.
   Since the immense complexity of real HPI-GEN limits the possibility to
   efficiently investigate the pathogenic mechanism under HBV infection,
   applying principal network projection (PNP) and subsequently
   abstracting the top 3000 nodes with significant projection values on
   the 85% pivotal network structures to distill core HPI-GEN are
   essential. Meanwhile, the results of core HPI-GEN under the infection
   of HBV is shown in [137]Figure 3.
   Moreover, we employed DAVID Bioinformatics Resources version 6.8 to
   analyze the enrichments of relative pathways and specific cellular
   functions of core HPI-GEN as shown in [138]Table 3.
   Through the multifaceted assessment of the core HPI-GEN based on
   existing studies and the annotation of KEGG signaling pathways, we
   could attain the core signaling pathways associated with
   microenvironmental factors including cytokines, chemokines, etc., for
   the pathogenesis of HBV infection as represented in [139]Figure 4.
   Then, on the basis of the investigating the consequence of the
   systematic pathogenic molecular mechanism, we could identify the
   potential biomarkers holding great promise for the development of
   therapeutic targets.
3.3. Analysis of Core Interspecies Cross-Talk Pathways to Investigate
Host/Pathogen Offensive/Defensive Mechanism during HBV Infection
   Based on the core host/pathogen cross-talk signaling pathways in
   [140]Figure 4, several identified processes leading to dysfunction of
   antiviral machinery are investigated to afford a stepwise understanding
   of the host/pathogen offensive and defensive mechanism under HBV
   infection.
3.3.1. Inception of HBV Infection
   Hepatitis B Virus (HBV) has a 3.2-kb circular and partially
   double-stranded DNA genome that contains four genes named S, C, P, and
   X genes coding for the envelope proteins, the core protein and a
   related protein termed precore protein, the viral DNA polymerase, and
   the multifunctional regulatory protein, respectively [[141]66,[142]67].
   The journey of HBV virions starts from attaching to host hepatocytes
   through heparan sulfate proteoglycans (SDC2 and HSPG2) and promptly
   transfers to sodium acid cotransporter (SLC10A1), initiating the HBV
   life cycle. Once entering the cells, the viral nucleocapsid containing
   the partially double-stranded DNA, known as the relaxed circular DNA
   (rcDNA), will be released into the cytoplasm and transported into the
   nucleus. The plus strand of the rcDNA is then repaired and completed to
   generate the covalently closed circular (cccDNA), which later forms the
   HBV minichromosome and is transcribed into RNAs for viral replication
   [[143]68].
3.3.2. Liver Microenvironment and Immune Pathogenesis under HBV Infection
   Inflammatory and innate immune responses impact the pathogenesis of
   numerous diseases and participate in the activation of common
   inflammatory mediators and regulatory pathways. On top of the Toll
   pathways that could be directly stimulated by the viral particles, in
   response to the foreign invasions, leukocytes, e.g., neutrophils and
   macrophages that phagocytose and kill pathogens, might simultaneously
   coordinate additional host responses by synthesizing a wide range of
   inflammatory mediators, such as TNF-α, IL-6, IFN-α, and IFN-γ
   [[144]69]. These cytokines and chemokines provoke the secretion of
   antiviral effectors and a series of immunoregulatory mechanisms against
   HBV infection. In general, cellular and molecular events and
   interactions would effectively mitigate impending injury or infection
   during inflammatory responses [[145]70]. However, along with the
   frequent recurrence of inflammation, once imbalances of the
   microenvironment occur, it might lead to abnormal epigenetic variations
   and even immune dysregulation [[146]71], which consequently incurs
   viral relapse.
3.3.3. Toll Pathway as the First Line of Defense against Infection
   Toll-like receptors (TLRs) have long been recognized as the first line
   of antiviral immunity in that they initiate intracellular signaling
   pathways to induce antiviral mediators, such as interferons (IFNs) and
   other cytokines [[147]72]. Upon the recognition of foreign invaders
   (HBV-associated molecular patterns), TLRs elicited phosphorylation of
   IKKβ (IKBKB) through several signaling transduction proteins, including
   MyD88, IRAK1, TRAF6, TAB1, and TAK1. IκBα (NFKBIA) is a downstream
   protein of IKKβ keeping NF-κB sequestered in an inactive state in
   cytoplasm [[148]73]. When phosphorylated by IKKβ, IκBα dissociated from
   NF-κB, resulting in the nuclear translocation of NF-κB and the
   induction of its several targets, in which the Interleukin 6 (IL6), the
   interferon-alpha receptor 1 (IFNAR1), the interferon-gamma receptor 2
   (IFNGR2), and BCL3 were recognized as shown in [149]Figure 4. Among
   them, IL-6 is the pleiotropic cytokine encoded by IL6 gene that
   catalyzes immune reaction and inflammatory responses [[150]74,[151]75];
   IFNAR1 is the receptor that binds type I interferons and activates the
   JAK-STAT signaling pathway, along with MAPK, PI3K, and Akt signaling
   pathways associated with multiple cellular functions, such as
   inflammation and immune response regulation [[152]76]; IFNGR2 is a
   receptor that binds to interferon-γ (IFN-γ), which is the immune
   interferon produced predominantly by natural killer (NK) and natural
   killer T (NKT) cells in response to viral infection [[153]77]; and BCL3
   is a protein that inhibits DNA damage-induced apoptosis and acts as a
   transcriptional coregulator of NF-κB [[154]78,[155]79].
   In addition to stimulation of the downstream NF-κB pathway, the
   phosphorylated IKKβ also activated the pathway of mammalian target of
   rapamycin (mTOR) by triggering the disruption of TSC1/TSC2 complex. In
   general, TSC1 stabilizes TSC2 through direct binding, thereby
   preventing TSC2 from ubiquitination and degradation. Once IKKβ
   phosphorylates TSC1 incurring collapse of its connection with TSC2, it
   obliquely leads to the stabilization of RhebGTP, which mediates
   signaling to mTOR [[156]80,[157]81]. As for the substrates of mTOR,
   among all the identified downstream proteins in the core host/pathogen
   signaling pathways as shown in [158]Figure 4, the translational
   repressor 4EBP1 (EIF4EBP1) stood out to be a pivotal constituent. It is
   well acknowledged that when the 4EBP1 protein is hyperphosphorylated by
   mTOR kinase, it can’t bind and hijack the eIF4E factor, i.e., a general
   initiator for translational regulation [[159]82,[160]83]. Concurrently,
   it is reported that the released eIF4E might form a complex with the 5′
   stem-loop structure of the HBV genome, thus ensuring viral replication
   [[161]84].
3.3.4. TNF-α-Stimulated Signaling Pathways
   TNF-α is a multifunctional cytokine that plays a principal role in cell
   proliferation, apoptosis, inflammation, and immune response. From the
   core host/pathogen cross-talk signaling pathways in [162]Figure 4, as
   soon as TNF-α bound to TNF receptors (TNFRs), it contributed to the
   phosphorylation of Akt via both TNFR2/TRAF2/PIK3CB and
   TNFR1/TRADD/TRAF2/PIK3CB signaling transductions. Akt is an effector of
   Class I PI3K triggering multiple downstream pathways and has been shown
   to be a critical mediator of cell proliferation and survival in a
   variety of cell types [[163]85]. In [164]Figure 4, a few pathways of
   Akt were incorporated into the core host/pathogen cross-talk network.
   First of all, the activated Akt inhibited the induction of autophagy
   through transactivating mTOR, i.e., impeding GTPase-activating protein
   complex TSC1-TSC2 as previously described. It is worth noting that TSC2
   rather than TSC1 was directly phosphorylated by Akt, which destabilized
   TSC2 and disrupted its affinity with TSC1 [[165]86]. Secondly, Akt also
   enhanced MDM2-mediated ubiquitination leading to p53 (TP53) degradation
   [[166]87] and catalyzed the nuclear localization and stabilization of
   BCL3 by phosphorylation [[167]79], where BCL3 is the aforementioned
   transcription coregulator having been proven to interact with NF-κB so
   as to facilitate its transcriptional ability. Moreover, it even caused
   nuclear exclusion and cellular dysfunctions of FoxO proteins through
   phosphorylation and repression [[168]88]. Therefore, as the identified
   target genes of FoxO proteins in the core host/pathogen cross-talk
   pathways, BNIP3, BMI1, SOD2, and cyclin-dependent kinase inhibitor 1
   (CDKN1A) encoding p21, might probably be down-regulated. Within, BNIP3
   is a pro-apoptotic factor that induces cell death through interacting
   with anti-apoptotic proteins [[169]89]; BMI1 is a critical epigenetic
   repressor that functions through chromatin remodeling and plays a
   central role in DNA damage repair [[170]90,[171]91]; and SOD2 is the
   antioxidant defense enzyme that participates in apoptotic signaling and
   oxidative stress regulation [[172]92]. However, it is well documented
   that FoxO1/FoxO3 could rapidly relocalize into the nucleus in response
   to STAT3 activation by IL-6, and the accumulation of FoxO1/FoxO3 in
   nuclei coincided with elevated expression of CDKN1A and other genes
   [[173]93]. This phenomenon could interpret the high expression of the
   target genes BNIP3, BMI1, SOD2, and CDKN1A, as well as the promotion of
   viral replication, since activator FOXO1 has also been found to bind to
   HBV DNA and activate its transcription [[174]94].
   Aside from the activation of PI3K/Akt pathway, in response to cytokine
   signals, TRAF2 also recruited TAK1/TAB3 complex and IKK complex, which
   is composed of IKKα, IKKβ, and IKKγ (IKBKG), bringing about the
   autophosphorylation-dependent activation of TAK1 [[175]95] and the
   activation of its substrates, such as IKKβ and MKK7. MKK7 then
   upregulated the transcriptional activity of c-JUN and ATF2 via
   activation of the JNK1. c-Jun is an AP-1 family transcription factor
   involved in the regulation of cell death and survival [[176]96] as well
   as inflammation and innate immune response [[177]97]. In [178]Figure 4,
   the induction of c-Jun is discovered to be associated with the
   overexpression of the proinflammatory cytokine receptor IL18R1
   essential for IL18 mediated signal transduction [[179]98], the ligand
   PD-L1 (CD274) regulating the cellular immune responses to prevent
   inflammation [[180]99], and the major cyclins CCND1 controlling the
   progression of a cell through the G1 phase by conjugating CDK4/6
   enzymes required for the synthesis of cell cycle [[181]100]. Besides,
   c-Jun not only modulated the expression of BCL3 by serving as either
   the coactivator of NF-κB or the activator of BCL3 gene to reinforce
   cell survival [[182]101,[183]102], but also induced miR-221 to
   interfere with the regulation ability of the histone deacetylase 6
   (HDAC6), which has been reported to both suppress NF-κB activation and
   control autophagosome maturation essential for ubiquitin-selective
   quality-control autophagy [[184]103]. Apart from that, c-Jun could even
   impact the modulation of Fas expression, which sparked off the loss of
   Fas-ligand (FasL)-mediated apoptosis by means of down-regulating the
   activity of p53.
3.3.5. TAK-STAT Signaling Pathways
   The Janus kinase (JAK)-signal transducer and activator of transcription
   (STAT) signaling pathway is a chain of interactions between proteins
   involved in various physiological processes, including immune function,
   cell growth, and cell death [[185]104]. The activation of STAT3 was
   mainly provoked by JAK1 while JAK1 was brought in the proximity of
   receptors including the Glycoprotein 130 (IL6ST) and the gamma
   interferon receptor 1 (IFNGR1), which receive the signal from IL-6 and
   IFN-γ, respectively. However, the activation of STAT1 primarily stemmed
   from the phosphorylation of TYK2, a member of JAK family that was
   previously described by other research groups [[186]105], by the alpha
   interferon receptor 1 (IFNAR1) residing in IFN-α stimulation. In spite
   of forming homodimers or heterodimers and translocating to the nucleus
   where they regulated the transcription of target genes, STAT3 and STAT1
   played different even in part opposite roles in the process of pathogen
   infection. STAT3 constitutive activity is essential for cell survival
   and the extent of inflammation. Among the identified STAT3 targets in
   the host/pathogen cross-talk signaling pathway in [187]Figure 4, an
   anti-inflammatory and anti-apoptosis factor referred to as PD-L1 and
   the major cyclin CCND1 involved in proliferation and cell cycle
   progression as mentioned previously were included. Additionally, STAT3
   could also inhibit apoptosis by suppressing Fas transcription and p53
   expression.
   In contrast, STAT1 promoted apoptosis by inducing the expression of
   CXCL10 and strengthened the antiviral ability through modulating the
   expression of GBP1. Amongst these two downstream targets, CXCL10 is a
   ligand for the receptor CXCR3 involved in chemotaxis, induction of
   apoptosis, and regulation of cell growth associated with a variety of
   human diseases including Hepatitis C [[188]106] and Endotheliitis
   [[189]107]. GBP1 is a binding protein which has been shown to provide
   broad host protection against different pathogen classes through
   expediting oxidative killing and delivering antimicrobial peptides to
   autophagolysosomes [[190]108]. Furthermore, STAT1 also negatively
   regulated the cell cycle though inducing the expression of the CDK
   inhibitors CDKN1A, a target of transcription factors FoxOs as stated
   previously.
   Albeit STAT3 was directly activated by JAK1, it could also be
   indirectly triggered through the cytoplasmic signaling cascade of
   SHC/ERK pathway composed of signaling proteins, i.e., SHC1, GRB2, and
   SOS1, and the classical MAPK/ERK pathway comprised of signaling
   proteins, i.e., HRAS, RAF1, MEK1, and ERK1. Except inducing the
   activation of STAT3, MAPK/ERK pathway also contributed to the
   MSK1-mediated phosphorylation of c-Jun and ATF2, the inhibition of
   FoxO3 nuclear translocation, and most importantly, the regulation of
   eIF4E activity having been elucidated chiefly via phosphorylation on
   serine 209 of eIF4E by other research groups [[191]83,[192]109], which
   ultimately proliferated viral replication. More exactly, this was
   carried out by mitogen-activated kinase2 (MNK2), the downstream kinase
   of ERK1.
3.3.6. TNFs-Induced Apoptotic Pathways
   As a component of the cell response to viral infection, apoptosis
   mediated by the sequential activation of caspases is indispensable for
   aborting the production and release of progeny virus. While diverse
   signaling pathways inducing apoptosis have been found, caspase-mediated
   proteolysis of downstream substrates is still a critical element of the
   execution pathway common to all forms of apoptosis [[193]110]. And
   among all the avenues conducive to apoptosis progression, the pathways
   contributing to the modulation of caspase-3 (CASP3), a frequently
   activated death protease that catalyzes the specific cleavage of many
   integral cellular proteins, occupied a key position and a majority of
   them began with the activation of FADD in [194]Figure 4.
   Induction of FADD, the upstream activating protein of caspase-10
   (CASP10), was mediated by not only the bifurcated TNFR1-TRADD signaling
   cascades but also the interaction with the FasL receptor (FAS) and the
   TNF-related apoptosis-inducing ligand receptor (DR5). Upon stimulation,
   FADD in turn recruited the initiator caspase caspase-10 to form the
   death-inducing signaling complex, which leads to activation of a
   caspase cascade and eventual cell-death. Having been shown to activate
   the executioner caspase-3 through direct processing, caspase-10 is
   described to be functional-independent to caspase-8 (CASP8) in
   initiating Fas- and tumor necrosis factor-related ligand-mediated
   apoptosis [[195]111,[196]112]. Based on the cleavage potential of
   caspase-3 associated with the dismantling of the cell and the formation
   of apoptotic bodies, the identified critical targets including Beclin-1
   (BECN1), DNA-PK (PRKDC), and the DNA fragmentation factor subunit alpha
   (DFFA) were specifically cleaved by caspase-3 with subsequent loss of
   its original capability. Among them, BECN1 is the component of the
   phosphatidylinositol-3-kinase (PI3K) complex that induces the
   appearance of autophagosome [[197]113]; PRKDC is the nuclear
   serine/threonine protein kinase that takes part in degraded DNA repair
   [[198]114]; and DFFA is a subunit of DFF. Further, according to
   previous study, whilst DFFA is cleaved, the DNA fragmentation factor
   subunit beta (DFFB) would dissociate from the cleaved fragments of DFFA
   and trigger both DNA fragmentation and chromatin condensation during
   apoptosis [[199]115].
   Additionally, it is worthwhile to note that Akt, including AKT1 and
   AKT2, has been reported to interact with X-linked inhibitor of
   apoptosis protein (XIAP) and protect it from ubiquitination and
   degradation [[200]116]. Once phosphorylated by Akt, the XIAP would
   exploit its second baculovirus IAP repeat domain (BIR2) to inhibit the
   apoptotic executioner caspase-3 [[201]117]. We believe that this
   identified episode carries a foreshadowing for the succeeding
   virus-induced malfunctions of programmed cell death.
3.3.7. Virus Proteins-Induced Pathogenic Mechanism
   Proteins encoded by the viral genomes, especially HBx, a 154 amino acid
   gene product of the X gene, have emerged as promiscuous transactivators
   activating the transcription of host genes by interacting directly with
   nuclear transcription factors or by obliquely activating various signal
   transduction pathways in the cytoplasm and have been proven to be
   potent epigenetic modifying factors in hepatocytes.
   HBx regulated the expression of BCL3 to subvert apoptotic signaling
   pathways via both synergistically cooperating with p300 (EP300) to
   enhance c-Jun transcriptional activity and activating NF-κB by either
   directly interacting with NF-κB for augmenting its binding ability to
   its target DNA sequence or promoting phosphorylation of IκB as shown in
   [202]Figure 4. In addition, apoptosis inhibition could also be induced
   by dysfunction of caspase-3 through enhancing the ability of Akt, an
   upstream kinase of caspase-3 as stated earlier, by HBx. Once caspase-3
   lost its function, in response to signal from damaged DNA, DNA-PKs, the
   substrate of caspase-3, would act in concert with XRCC4 and a number of
   tightly coupled proteins to repair DNA. Further, it was suggested that
   the intense activity or high expression of XRCC4 might be beneficial to
   the promotion of HBV cccRNA formation and thereby facilitate the viral
   replication [[203]118]. Since Hepatitis B Virus supremely requires
   components from the host cell to complete its replication cycles,
   merely one developed avenue to hijack host signaling pathways can’t
   meet its demand. In addition to the significant translation initiation
   factor eIF4E induced by HBx through Akt/mTOR pathway as just mentioned,
   eIF4E was activated via the MAPK/ERK pathway as HBx could also trigger
   Ras and its downstream signaling pathways. Hence, forming a node of
   convergence of the PI3K/Akt/mTOR and Ras/Raf/MAPK signaling pathways,
   the regulation of eIF4E is clearly consequential for HBx to promote
   viral replication. Additionally, overexpression of cyclin G1 (CCNG1)
   resulted from the loss of microRNA-122 (a liver-specific miRNA) induced
   by HBx could also augment virus proliferation. It was reported that
   CCNG1 specifically interacts with p53 and blocks its specific binding
   to HBV enhancer elements, which simultaneously abrogates p53-mediated
   inhibition of HBV transcription and releases cell proliferation from
   G1/S arrest [[204]119].
   p53 acts as a sequence-specific DNA binding protein that activates the
   transcription of substantial target genes associating with the
   regulation of cell cycle progression, apoptosis, DNA repair, and
   senescence. In [205]Figure 4, HBx interacted with p53 both directly and
   indirectly to inhibit its DNA consensus sequence binding and
   transcriptional activity, which has been reported to efficiently block
   p53-mediated cellular functions [[206]120,[207]121]. Apart from that,
   HBx also inhibited p53 by conjugating c-Jun and STAT3 as well as
   keeping MDM2, the principal cellular antagonist of p53 acting to limit
   the p53 growth-suppressive function [[208]122], from ubiquitin-directed
   self-degradation.
   As for the targets of p53, apoptosis-associated proteins, such as the
   proapoptotic protein Bax, which could permeabilize mitochondria and
   engage the apoptotic program giving rise to activation of caspase-3
   [[209]123,[210]124], the apoptosis antigen Fas triggering the Fas
   signaling pathway as mentioned earlier, and the DDIT4 negatively
   regulating cell growth, proliferation, and survival via inhibition of
   the activity of mTOR [[211]125] were identified in our core
   host/pathogen cross-talk signaling pathways in [212]Figure 4. Further,
   having been reported to participate in the generation of ROS and
   p53-dependent DNA damage [[213]126], DDIT4 was negative-regulated by
   the HBx-induced upregulation of microRNA-221 to promote aberrant
   proliferation. Such significant inverse correlation between DDIT4 mRNA
   levels and miR-221 expression has been observed in a mouse model of
   liver cancer as well [[214]127]. Besides, the direct suppression of
   histone deacetylases 6 (HDAC6) by HBx-induced miR-221 would result in
   autophagosome maturation failure; nevertheless, HBx yet also bound to
   Vps34 (PI3KC3) to enhance its activity, inducing the autophagic
   response crucial for the formation of autophagosomes as well as the
   intensification of viral replication.
   STAT1 and STAT3 are both critical elements in viral replication,
   despite acting as distinct roles from each other. Functional
   characterization of miR-338-3p indicated that miR-338-3p inhibited cell
   proliferation by inducing cell cycle arrest at the G1/S phase. The
   identified downregulation of miR-338-3p and tyrosine phosphorylation of
   STAT3 triggered by HBx might cause CCND1 overexpression beneficial to
   virus cell cycle progression [[215]128]. In addition, forced low
   expression of miR-338 by HBx also directly led to the increased level
   of autophagy. This process was mainly through the up-regulation of LC3
   by the released transcription factor ATF2, which has been identified as
   a direct target of miR-338 [[216]129]. Furthermore, instead of FoxO3,
   HBx appeared to assist FoxO1 in counteracting the reduction of its
   nuclear relocalization arising from Akt phosphorylation by inducing
   STAT3 activation, which is conducive to the transcription of cccDNA as
   previously described. In contrast, since interferons (IFNs) activated
   STATs via phosphorylation [[217]130], once recruited by HBx, histone
   acetyltransferase (HAT) CBP (CEBBP) might dynamically regulate STAT1
   acetylation to counteract IFN-catalyzed STAT1 phosphorylation, nuclear
   translocation, DNA binding, and target gene regulation. On the other
   hand, STAT1 nuclear translocation induced by IFN-α might also be
   deprived by HBV polymerase [[218]131]. Previous studies demonstrated
   that miR-122 regulates type I IFN expression through both
   down-regulating SOCS1 and SOCS3 to enhance interferon-mediated
   suppression of HBV [[219]132,[220]133]. Nonetheless, it appears that
   merely the regulation of SOCS1 by miR-122 was identified in [221]Figure
   4. Finally, as a target gene of FoxOs and STAT1, CDKN1A overexpression
   was mediated by HBx-induced upregulation of miR-29a. Serving as a
   direct upstream of HDAC4, miR-29a could thereby alleviate the
   deacetylation of CDKN1A through HDAC4 inhibition.
3.4. Drugs Discovery and Repositioning Based on Selected Biomarkers for HBV
   Through reviewing the cellular dysfunctions triggered by the core
   host/pathogen cross-talk interspecies signaling pathways under HBV
   infection shown in [222]Figure 4, we selected the significant
   biomarkers including AKT1, NFKBIA, EIF4EBP1, HDAC6, STAT1, STAT3, and
   TP53 as drug targets shown in [223]Table 4. However, there is still a
   long way to the clinical trials on account of the intrinsic
   difficulties for drug design and development. To this end, we proposed
   a systematic drug discovery and repositioning approach to uncover the
   multiple-molecule drug for therapeutic treatment based on the deep
   learning and data mining technique as shown in [224]Figure 5.
3.4.1. Deep Learning-Based Drug-Target Interaction Prediction Model
   To effectively predict whether the identified targets interact with
   some existing drugs, we provide a deep learning-based DTI model as
   shown in [225]Figure 6. It is a fully-connected neural network
   containing an input layer and five layers with weights; four hidden
   layers (512, 256, 128, and 64 neurons in each layer) and an output
   layer (1 neuron). The intermediate hidden layers are applied with
   rectified linear unit (ReLU) activation function that thresholds
   negative signals to 0 and passes through positive signal. This type of
   activation function allows faster learning in comparison with
   alternatives, e.g., sigmoid or tanh unit [[226]134]. In contrast, the
   output of the last layer is fed to a sigmoid function which produces a
   likelihood probability (0,1) score of the corresponding predicted
   docking where a higher value symbolizes a more reliable drug-target
   interaction. In addition, to further reducing overfitting, we
   incorporated a dropout layer [[227]135] with a probability of 0.5
   following the output of each hidden layer.
Figure 6.
   [228]Figure 6
   [229]Open in a new tab
   Configuration of the deep learning-based DTI model. v is the vector of
   each drug-target pair for the input layer;
   [MATH: I :MATH]
   is the total number of drug features; and
   [MATH: J :MATH]
   is the total number of target features.
   For avoiding a poor performance due to assessing the model with a data
   set unfairly sampled (outliers or other anomalies are presented in one
   set), we introduced 10-fold cross validation measure to authenticate
   the learning effectiveness as shown in [230]Table 5 and the
   corresponding learning curves that plot the learning performance of DTI
   model over experience and time are also presented in [231]Figure 7.
   Note that since the model has been diagnosed well-fit and continuous
   training will likely lead to an overfit, it automatically stops
   learning at the epoch of 70. Subsequently, via calculating the loss and
   accuracy of each DTI model yields on the test set, we received an
   average accuracy of 92.6 (%) with the standard deviation of 0.294 (%).
Table 5.
   Model performance (10-fold cross validation).
   Validation Loss Validation Accuracy (%) Testing Loss Testing Accuracy
   (%)
   1 0.254831 92.09 0.236721 92.28
   2 0.237422 91.89 0.2264 92.04
   3 0.228304 92.35 0.226482 92.49
   4 0.227555 92.62 0.228102 92.6
   5 0.223151 92.55 0.218528 92.74
   6 0.223817 93.04 0.227157 92.77
   7 0.239133 92.39 0.225822 92.58
   8 0.235536 92.93 0.226848 92.8
   9 0.22345 93.1 0.218475 93.11
   10 0.235327 92.21 0.219082 92.64
   Average 0.232853 92.517 0.225362 92.605
   Standard deviation 0.00983 0.409527 0.005559 0.294099
   [232]Open in a new tab
Figure 7.
   [233]Figure 7
   [234]Open in a new tab
   Training and Validation Learning curves (10-fold cross validation).
   “-x-” lines in different colors denote the training accuracy and loss,
   while “-o-” lines in different colors represent the validation accuracy
   and loss in (a,b), respectively. The bold lines in red and blue
   indicate the model’s average loss and accuracy of training and
   validation in (a,b), respectively. Moreover, since stopping the
   training of the neural network early before it has overfitted the
   training dataset can reduce overfitting and improve the generalization
   of the DTI model, Early Stopping approach is additionally employed to
   automatically stop the learning process at the epoch of 70.
   Moreover, with the increasement of experimental data, numerous machine
   learning approaches such as Random Forest (RF) [[235]136], Support
   Vector Machine (SVM) [[236]137], Nearest Neighbor [[237]138], etc. have
   been applied to predict drug-target interactions. Therefore, to
   evaluate the model performance, we contrast our DTI model with a few of
   them, which are RF, Nearest Neighbor, and SVM. By adopting the
   visualization tool, Receiver Operating characteristic (ROC), we can
   clearly find that DTI outperforms other traditional binary classifiers
   as presented in [238]Figure 8.
Figure 8.
   [239]Figure 8
   [240]Open in a new tab
   The Receiver Operating Characteristic (ROC) curve of models based on
   different methods. The accuracy of each method is calculated by the
   area under the corresponding ROC curve (AUC) (Readers can refer to
   [241]Section 2.8.3 for further illustration), where the DTI model could
   achieve an accuracy of 0.99, which is higher than that of Random Forest
   (RF) (accuracy of 0.70), Nearest Neighbor (accuracy of 0.67), and
   Support Vector Machine (SVM) (accuracy of 0.57).
3.4.2. Systems Discovery and Design of the Multiple-Molecule Drug for HBV
Infection
   On the basis of the pre-trained DTI model, we could evaluate the
   docking ability between the identified biomarkers and the available
   drugs. The larger predicted result we obtain, the higher probability
   that the specific drug-target pair will have interaction. Namely, the
   candidate drugs would be sifted out owing to holding higher probability
   (with output score approaching 1) to interact with the identified
   biomarkers. However, the exploration of promising drugs from a large
   quantity of the candidates (predicted drugs) is still arduous.
   Accordingly, in order to narrow down the scope of the possible options
   and improve the reliability of the ultimate selected drug combination,
   general criteria for drug design specifications including regulation
   ability, toxicity, and sensitivity were applied to further filter out
   unavailable or inappropriate medications.
   Under the criteria of the pharmacological properties, the regulation
   ability value for each drug was extracted from the LINCS L1000 level 5
   dataset consisting of results from 978 genes of 75 cell lines treated
   with 19,811 small molecular compounds. By referring to LINCS L1000, we
   can examine whether a specific gene was up-regulated (positive values)
   or down-regulated (negative values) after being treated with a small
   molecular compound. Screening through a positive-regulation of NFKBIA,
   EIF4EBP1, HDAC6, STAT1, and TP53 and the negative-regulation of AKT1
   and STAT3, the remaining drugs with agreeable regulation ability for
   the treatment of HBV-infected patients are shown in [242]Table 6.
Table 6.
   The candidate multi-target drugs for the identified biomarkers.
   AKT1
   Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
   (EC50, nM)
   ribavirin −1.5283 1.9876 −1.18823
   tacrolimus −0.5897 2.7541 −1.50335
   sorafenib −0.3125 2.7885 −0.43586
   tenofovir −0.6152 2.955 −0.63588
   meloxicam −0.2726 3.4619 −0.68734
   NFKBIA
   Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
   (EC50, nM)
   chlortalidone 0.4056 1.8623 0.05198
   busulfan 2.501 2.3207 −0.44631
   rifaximin 0.681 2.6259 0.03563
   sorafenib 3.5635 2.7885 −0.43586
   meloxicam 3.2603 3.4619 −0.68734
   EIF4EBP1
   Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
   (EC50, nM)
   chlortalidone 0.4153 1.8623 0.05198
   busulfan 0.7491 2.3207 −0.44631
   rifaximin 1.7372 2.6259 0.03563
   sorafenib 0.4733 2.7885 −0.43586
   dactinomycin 0.2923 4.3767 0.00307
   HDAC6
   Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
   (EC50, nM)
   argatroban 0.5717 1.5976 −0.01405
   chlortalidone 0.6495 1.8623 0.05198
   entecavir 0.5388 2.3879 0.03615
   nutlin-3 0.1749 2.529 −0.6695
   meloxicam 0.1047 3.4619 −0.68734
   STAT1
   Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
   (EC50, nM)
   chlortalidone 0.234 1.8623 0.05198
   ribavirin 0.982 1.9876 −1.18823
   zafirlukast 0.2615 2.5723 −0.24097
   sorafenib 0.5799 2.7885 −0.43586
   hydroflumethiazide 1.5238 3.1299 0.3949
   STAT3
   Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
   (EC50, nM)
   argatroban −1.2902 1.5976 −0.01405
   ribavirin −0.8934 1.9876 −1.18823
   lamivudine −0.3132 2.1348 −1.33248
   zafirlukast −0.5053 2.5723 −0.24097
   tenofovir −1.5088 2.955 −0.63588
   hydroflumethiazide −0.7348 3.1299 0.3949
   meloxicam −1.6727 3.4619 −0.68734
   TP53
   Drug Regulation ability (L1000) Toxicity (LD50, mol/kg) Sensitivity
   (EC50, nM)
   acetylcysteine 1.0705 1.294 −1.2170
   busulfan 2.2886 2.3207 −0.44631
   nutlin-3 1.6295 2.529 −0.6695
   sorafenib 0.5683 2.7885 −0.43586
   calcitriol 0.4321 5.1352 0.28577
   [243]Open in a new tab
   Subsequently, to further narrow down the possibilities for drug
   combinations, the drug toxicity specification (LD50) recorded in
   DrugBank as well as the drug sensitivity specification (EC50) obtained
   from the PRISM dataset, which is a dataset comprising sensitivity
   values of 4518 drugs tested across 578 human cell lines, were also
   adopted. Being the numeric index of lethality, median lethal dose
   (LD50) plays a pivotal role in drug safety evaluation [[244]139]. The
   drug with lower LD50 possesses higher toxicity, and thus it is more
   detrimental. On the other hand, half maximal effective concentration
   (EC50) is used as a measurement of concentration, which is often
   adopted to evaluate the suitability and the efficacy of drugs, where
   lower EC50 indicates the stronger inducibility, i.e., higher
   sensitivity. Based on these concepts, the drug combination comprising
   Sorafenib, Nutlin-3, and Tenofovir as shown in [245]Table 7 was then
   designed as the multiple-molecule drug to rejuvenate the dysfunctions
   in the pathogenic mechanism under HBV infection with minor collateral
   tissue damage.
Table 7.
   Potential multiple-molecule drug and the corresponding target genes for
   HBV infection therapy.
   Targets AKT1 NFKBIA EIF4EBP1 HDAC6 STAT1 STAT3 TP53
   Drugs
   Sorafenib O O O O O
   Nutlin-3 O O
   Tenofovir O O
   Chemical structures of multi-target drugs
   Sorafenib Nutlin-3 Tenofovir
   graphic file with name biomedicines-08-00320-i001.jpg graphic file with
   name biomedicines-08-00320-i002.jpg graphic file with name
   biomedicines-08-00320-i003.jpg
   [246]Open in a new tab
   O indicates the docking between each multi-target drug composing the
   multiple-molecule drug and their corresponding drug target.
   Among them, Sorafenib is a multikinase inhibitor drug having been
   approved for several tumor types, including Hepatocellular Carcinoma
   (HCC) [[247]140]. Although prominent effects of Sorafenib for apoptosis
   induction were recognized, several limitations have been indicated by
   numerous medical studies upon applying Sorafenib to various cancers as
   a single agent. Therefore, the use of Sorafenib in combination with
   other targeted agents appears to be the main strategy for combatting
   drug resistance and has been widely explored with promising results
   [[248]141]. Falling under the category of small molecule inhibitor,
   Nutlin-3 is conducive to the restoration of p53 pro-apoptotic ability,
   leading to the activation of cell cycle arrest and apoptosis pathways
   [[249]142]. Though it is a novel antitumor compound, which hasn’t been
   clinically approved, the synergistic effect of Nutlin-3 combined with
   other agents contributes to diminishing of the dose required in
   monotherapy and decreasing the occurrence of adverse drug events
   [[250]143,[251]144]. Being a potent nucleotide analogue
   reverse-transcriptase inhibitor, Tenofovir is highly active against
   human immunodeficiency virus (HIV). In spite of its clinical use
   associated with the risk of kidney injury [[252]145], it has been shown
   highly effective to patients who have never undergone an antiretroviral
   therapy [[253]146]. Owing to its low toxicity, trifle side effects, and
   subtle drug resistance compared to other antivirals, Tenofovir is
   thereby frequently used as a member of combined drugs against HBV to
   ensure stronger antiviral activity and a more favorable safety profile
   [[254]147,[255]148]. It is worth noting that while therapy with
   Tenofovir is commonly used to treat Hepatitis B, life-long therapy is
   needed, which often increases the risk of relapse. In this respect, the
   combination of other viable drugs with Tenofovir may have a chance to
   improve the therapeutic effect and shorten the course of therapy.
   Taken together, by administering the proposed multiple-molecule drug,
   an up-regulation of NFKBIA, EIF4EBP1, HDAC6, STAT1, and TP53
   accompanied by the down-regulation of AKT1 and STAT3 can plausibly be
   attained, yielding encouraging results for the treatment of
   HBV-infected patients.
4. Discussion
   Traditional drugs commonly used for treatment in patients afflicted
   with HBV are interferon and nucleoside analogues. However, the low
   therapeutic response of patients to interferon and the inevitable
   development of drug resistance to nucleoside analogues render the
   clinical treatment challenging [[256]149]. In addition, serious toxic
   accumulation of the nucleoside analogues during long-term therapy for
   HBV infection is nonnegligible as well. Therefore, great efforts have
   been placed into seeking highly specific ligands affecting single
   target to alleviate HBV replication and to treat HBV-related
   complications. Yet, in view of the prohibitive development costs and
   timeline of a newly designed drug as well as the frequent failure of
   monotherapy due to drug resistance, growing interest has centered on
   discovering effective combinations of drugs for HBV treatment.
   In this study, we develop a medicine discovery and repositioning
   procedure in terms of system identification approaches via two-side
   RNA-seq data and deep learning-based framework considering drug design
   specifications to find a possible drug combination for the treatment of
   HBV infection. Through investigation of core HPI-GEN annotated with
   KEGG pathways, the core signaling pathways responsible for the
   abnormally regulated cellular functions were extracted and unified in
   [257]Figure 4. However, distinct cooperation between these pathways
   sparks diversified implication. Consequently, further discussion of the
   integrated effect is conducive to the selection of potential biomarkers
   in relation to the remedy of HBV infection.
4.1. Apoptosis in HBV Infection
   As a fundamental and highly regulated biological process, apoptosis is
   a form of programmed cell death that participates in cell resistance to
   viral infection. Although mounting evidence shows that HBx is uniquely
   endowed with the capability to regulate apoptosis, its dual role as an
   apoptotic mediator raises the difficulty to elucidate the pathological
   mechanism [[258]150].
   Caspase-3 is the typical hallmark of apoptosis [[259]110]. Numerous
   studies have shown that HBx abrogates the activity of caspase-3
   [[260]151,[261]152,[262]153], resulting in resistance to assorted
   apoptotic stimuli. In [263]Figure 4, HBx-mediated inhibition activity
   of caspase-3 was chiefly through both activating XIAP, the inhibitor of
   caspase-3, and directly or obliquely regulating transcription factors
   such as NF-κB and p53 to affect the expression of their downstream
   anti-apoptotic target, e.g., BCL3 and Bax. Further, even if previous
   researches did not specifically indicate the paramount downstream
   anti-apoptotic genes of NF-κB [[264]154,[265]155], BCL3, which hampers
   DNA damage-induced apoptosis and serves as a transcriptional
   coregulator of NF-κB [[266]78,[267]79], counted for NF-κB -mediated
   anti-apoptosis a lot in the pathogenic mechanism. Furthermore,
   inhibition of caspase-3 not only abrogated apoptotic killing triggered
   by the pathways in response to TNF-α, FasL, and TRAIL stimulation, but
   also contributed to the activation of autophagy, leading towards
   alleviated apoptosis coming from caspase-3 loss of its function to
   cleave Beclin-1.
   Asides from that, Fas, indispensable for the FasL-induced pathway that
   gives rise to the activation of caspase-3, was downregulated during HBV
   infection. Negative-expression of Fas might impede the sustainable
   anti-viral response of death ligand stimulus-specific apoptotic
   pathway. In contrast, as a target gene relevant to execution and
   completion of apoptosis, the overexpressed CDKN1A triggered by
   HBx-induced HDAC4 inhibition was concordant with the earlier research
   that HBx can relieve a block on CDKN1A expression and prolong G1→S
   transition in human hepatoma cells [[268]156,[269]157]. Even though
   p21, the protein encoded by CDKN1A, is treated as a cyclin-dependent
   kinase inhibitor capable of inhibiting all cyclin/CDK complexes, it
   often promotes the assembly of type-D cyclins with CDK4 and CDK6 and
   arrests cell cycle progression in the G1 phase [[270]156,[271]158],
   which not only protects cells against apoptosis but also enhances the
   chance of tumorigenesis.
4.2. Autophagy in HBV Infection
   Autophagy is a catabolic mechanism known to engulf long-lived proteins
   and damaged organelles via a lysosomal degradative pathway. Previous
   studies have suggested that HBV can hijack components in the autophagic
   regulatory pathways to promote its survival through augmentation of
   autophagosomes formation and inhibition of autolysosome maturation
   [[272]159,[273]160], which indicates the possibility of targeting the
   autophagic pathway for the treatment of Hepatitis B.
   HBx-catalyzed autophagic response, which is crucial for the formation
   of autophagosomes as well as intensification of viral replication,
   resided mainly in the increased activity of the Beclin-1/Vps34 complex
   accompanied by the unchanged mTOR activity and positive regulation of
   LC3 during HBV infection, which has also been reported in several
   research [[274]159,[275]160]. The phagophores undergo nucleation and
   elongation with Beclin-1/Vps34 complex and LC3 lipidation to form
   double-membraned autophagosomes [[276]161], and the autophagic
   membranes generated during infection are used for viral envelopment
   [[277]162]. However, rather than Beclin-1, only PI3KC3 was activated by
   the direct interconnection with HBx, which is in agreement with the
   study that the need for HBx to also induce the expression of Beclin-1
   is redundant [[278]163]. Moreover, although emerging study revealed
   that HBx efficiently impedes autophagic degradation by disturbing
   lysosomal acidification relevant to its degradative capacity without
   influencing on the fusion of autophagosomes and lysosomes [[279]160],
   HBx-induced reduction of HDAC6 regulation ability might lead to
   autophagosome maturation failure [[280]103]. Despite the necessity of
   further validation through biological experiments, this discovery
   provides possible mechanistic insight into the overall observation of
   HBV infection. Firstly, the intrinsic ability of HDAC6 to deacetylate
   NF-κB, having been detected to hamper the invasion of cancer
   [[281]164,[282]165], was destructed by HBx through transrepression,
   which might gradually create a potent microenvironment favorable to the
   malignant transformation of cells. Meanwhile, the negative-regulation
   of HDAC6 in human HCCs and consequential loss of its tumor suppression
   ability supported by previous research [[283]166] apparently advocate
   the hypothesis.
4.3. Inflammation and Innate Immune Response in HBV Infection
   Inflammation and innate immune response represent a highly coordinated
   process aimed at fighting infection or tissue injury. When cells are
   subject to foreign invasion, the inflammatory response can ensure
   successful resolution of the condition and the restoration of tissue
   homeostasis. However, inappropriately controlled natural defense
   mechanisms will eventually lead to the progression of chronic diseases
   [[284]167]. In general, innate immunity counts for a great deal in
   governing infection right after contact with the pathogen to limit the
   spread of the virus [[285]168]. However, the exploitation of immune and
   inflammatory signaling pathways enables viruses to subvert antiviral
   immunity and replicate in the hostile environment [[286]169].
   Of all the proteins involved in the activation and execution of
   inflammation response, NF-κB stands out as being crucial for this
   process in diverse metazoan organisms. During HBV infection,
   NF-κB-dependent transcription accelerated by HBx not only triggered the
   anti-apoptosis mechanism in favor of viral persistence as mentioned
   earlier but also augmented the inflammatory response leading to
   hepatitis and cell transformation in accordance with the indication of
   previous study [[287]170]. Concurrently, emerging research has also
   uncovered that the dysregulated continual synthesis of IL-6, an
   identified target of NF-κB, plays a pathological effect on chronic
   inflammation and autoimmunity [[288]167], which obliquely emphasizes
   the magnitude of the correlation between abnormally regulated NF-κB and
   excessive inflammation. Besides, regarding viral immunoregulation, even
   though it is contradictory that HBx on one hand sensitized cells to
   inflammatory stimuli, but on the other positive-regulated PD-L1 through
   transactivation of c-Jun and activation of STAT3, the findings are
   consistent with the idea that production of PD-L1 transcripts pertained
   positively to the intensity of liver inflammation to prohibit and
   thereby evade the host immune response [[289]171,[290]172].
   Moreover, unlike other proteins in their family which typically share
   common characteristics and functions, STAT1 and STAT3 poles apart and
   the mutual functional antagonism of them in T-cell-induced inflammation
   has also been elucidated [[291]173]. Despite both serving as the target
   of miR-122, SOCS1 was regulated by miR-122 and thus transactivated by
   HBx in the identified mechanism during HBV infection, while SOCS3 did
   not. Therefore, we suggest that the abrogation of STATs activity by HBx
   to attenuate the interferon-mediated suppression of infection depends
   on STAT1 instead of STAT3. Meanwhile, blockades of IL-6- and
   IFNs-stimulated immune signaling pathways by viral proteins were
   through STAT1 rather than STAT3, which also supports the idea from
   another perspective. In addition, activation of STAT3 mediated by HBx
   also bolstered its unshakeable status on maintaining viral persistence.
   Likewise, inactivation of STAT1 mediated by HBV proteins counteracted
   its effects on the provocation of the innate antiviral system. Such
   phenomena have also been revealed in the evidence of numerous studies
   [[292]174]. It is worth noting that the reported capacity of HBx to
   block STAT1 nuclear import by means of affecting its methylation status
   was partially through the communication with CBP in the identified
   host/pathogen cross-talk. In such a way, that STAT1 lost its ability to
   block cell cycle progression and hinder hepatoma cell transformation in
   a course of prolonged inflammatory injury. This led to continuous
   destruction and regeneration of hepatocytes, increasing the chances of
   genetic alterations.
4.4. Discovery of Potential Drug Combination
   Based on the significant biomarkers including AKT1, NFKBIA, EIF4EBP1,
   HDAC6, STAT1, STAT3, and TP53, the combination of agreeable
   multi-target drugs, i.e., Sorafenib, Nutlin-3, and Tenofovir, was
   eventually recommended as a potential treatment for later clinical
   researches of the multi-drug therapy treating HBV-infected patients.
   Meanwhile, to gain deeper insight into the discovery of promising
   drugs, in silico profiling using deep learning-based DTI model was
   performed to predict interactions (dockings) between available drugs
   and the identified biomarkers. Additionally, deciding whether a drug is
   ready for clinical trials greatly pertains to preclinical studies that
   yield preliminary efficacy, toxicity, safety, etc., information.
   Therefore, to further narrow down the scope and improve the reliability
   of the predicted drugs, the general pharmacological specifications for
   drug design including regulation ability, toxicity, and sensitivity
   were additionally adopted to evaluate the efficacy.
   However, the entire process of moving a drug to clinical trials is
   actually a long way to go. Better preclinical preparation and
   assessment can be beneficial to the approval of a new medicine or drug
   combination. Considering the common reasons for withdrawal of an
   approved drug, e.g., hepatotoxicity and adverse events as well as the
   safety and tolerability assessment of a drug such as HLA (human
   leukocyte antigen) test, is conducive to better finding an agreeable
   option for drug combination.
   Repositioning drugs to treat with both common and rare disease is
   gradually becoming an attractive proposition. Taking advantage of drug
   repositioning avenues provides a faster discovery and translation of
   clinically relevant drug combinations. Furthermore, aided with the
   expediting development of computational technology and low-cost
   sequencing approaches, mounting publicly accessible data will certainly
   open up a broad spectrum of drug discovery and repositioning.
   Consequently, by systematically integrating more extensive data and
   information for both drugs and targets, a more appropriate drug
   combination out of the scope of original medical indication can be
   unveiled to bring new hope to HBV therapeutics.
5. Conclusions
   The advent of the genomic eras has presented researchers with a myriad
   of high throughput biological data, which spark the interests in
   diverse biology applications. In this study, to investigate the
   pathogenetic mechanism under HBV infection, we constructed candidate
   host/pathogen interspecies genetic and epigenetic interaction network
   (HPI-GEN) by big data mining. Then, with the help of the two-side
   RNA-seq data, system identification strategies were applied to trim the
   false positives from the candidate HPI-GEN to obtain the real HPI-GEN.
   Thereafter, based on the extraction of PNP and the annotation of KEGG
   pathways, interspecies cross-talk signaling pathways are investigated
   from the real HPI-GEN for pathogenic mechanism under HBV infection to
   identify significant biomarkers. Moreover, in order to discover
   promising drugs for the identified drug targets, we trained a deep
   learning-based DTI model to predict possible drug-target interactions.
   Eventually, with the consideration of general drug design
   specifications including regulation ability, toxicity, and sensitivity,
   a combination of multi-target drugs as a potential multiple-molecule
   drug was selected to abate HBV infection. It is worth pointing out that
   although only pathogenesis for HBV infection was investigated in this
   work, our unique workflow can also be utilized on a wide variety of
   disease in view of systems biology, holding utility to aid in
   diversified drug discovery and repurposing process. Meanwhile, along
   with the development of NGS (Next-Generation Sequencing) technology,
   additionally integrating analysis from available genomics data into our
   pipeline can reinforce better downstream analysis and perform more
   exact biological interpretation. As more extensive genomic and
   pharmacological data being considered, the proposed pipeline could help
   reveal a more exhaustive host/pathogen offensive and defensive
   mechanism under HBV infection and facilitate the development of optimal
   therapy ensuring greater therapeutic outcome.
Appendix A
   In order to avoid overfitting in the parameter estimation process, the
   cubic spline interpolation method was employed to interpolate 4 time
   points into 1440 points. Even if the cubic interpolation couldn’t
   increase the information for the parameter estimation, it could indeed
   prevent the parameters estimation in Equations (13) and (14) and
   (27)–(30) from overfitting during the identification process.
Author Contributions
   Conceptualization, S.C. and B.-S.C.; Methodology, S.C. and B.-S.C.;
   Software, S.C. and B.-S.C.; Validation S.C. and B.-S.C.; Formal
   Analysis, S.C. and B.-S.C.; Investigation, S.C. and B.-S.C.; Data
   Curation, S.C.; Writing-original Draft Preparation, S.C.;
   Writing-Review and Editing, S.C., L.H.-C.W., and B.-S.C.;
   Visualization, S.C.; Supervision, B.-S.C.; Funding Acquisition, B.-S.C.
   All authors have read and agreed to the published version of the
   manuscript.
Funding
   This research was funded by Ministry of Science and Technology grant
   number MOST 107-2221-E-007-112-MY3.
Conflicts of Interest
   The authors declare no conflict of interest.
References