Abstract

   Heme is an iron ion-containing molecule found within hemoproteins such
   as hemoglobin and cytochromes that participates in diverse biological
   processes. Although excessive heme has been implicated in several
   diseases including malaria, sepsis, ischemia-reperfusion, and
   disseminated intravascular coagulation, little is known about its
   regulatory and signaling functions. This limited understanding is
   in part due to the lack of curated pathway resources for heme cell
   biology. Here, we present two resources aimed at exploiting this
   unexplored information to model heme biology. The first resource is a
   terminology covering heme-specific terms not yet included in standard
   controlled vocabularies. Using this terminology, we curated and modeled
   the second resource, a mechanistic knowledge graph representing the
   heme interactome based on a corpus of 46 scientific articles.
   Finally, we demonstrated the utility of these resources by
   investigating the role of heme in the Toll-like receptor signaling
   pathway. Our analysis proposed a series of crosstalk events that could
   explain the role of heme in activating the TLR4 signaling pathway. In
   summary, the presented work opens the door for the scientific community
   to explore the published knowledge on heme biology.

   Keywords: heme, hemolytic disorders, signaling pathways, knowledge
   graphs, biological expression language

Introduction

   Heme is an iron ion-coordinating porphyrin derivative essential to
   aerobic organisms ([39]Zhang, 2011). It plays a crucial role as a
   prosthetic group in hemoproteins involved in several biological
   processes such as electron transport, oxygen transfer, and catalysis
   ([40]Smith and Warren, 2009; [41]Zhang, 2011; [42]Kühl and Imhof, 2014;
   [43]Poulos, 2014). Besides its indispensable role in hemoproteins, it
   can act as a damage-associated molecular pattern leading to oxidative
   injury, inflammation, and consequently, organ dysfunction ([44]Jeney,
   2002; [45]Wagener et al., 2003; [46]Dutra and Bozza, 2014). Plasma
   scavengers such as haptoglobin and hemopexin bind hemoglobin and heme,
   respectively, thus keeping the concentration of labile heme low
   ([47]Smith and McCulloh, 2015). However, at high concentrations of
   hemoglobin and, consequently, heme, these scavenging
   proteins get saturated, resulting in the accumulation of biologically
   available heme ([48]Soares and Bozza, 2016). With respect to hemolytic
   diseases, the formation of labile heme at harmful concentrations has
   been a subject of research for some years now ([49]Roumenina et al.,
   2016; [50]Soares and Bozza, 2016; [51]Gouveia et al., 2017).

   Biomedical literature is an immense source of heterogeneous data that
   are dispersed throughout hundreds of journals. Furthermore, the
   majority of the results are scattered and published as unstructured
   free-text, or at best, presented in tables and cartoons representing
   the experimental study or biological processes and pathways. These
   shortcomings, combined with the exponential growth of biomedical
   literature, prevent the healthcare community and individual researchers
   from being aware of all the available information and knowledge in the
   literature. With the introduction of new technologies and experimental
   techniques, researchers have made significant advances in understanding
   heme and its role in the pathogenesis of numerous hemolytic
   diseases such as sepsis ([52]Larsen et al., 2010;
   [53]Effenberger-Neidnicht and Hartmann, 2018), malaria ([54]Ferreira et
   al., 2008; [55]Dey et al., 2012), and β-thalassemia ([56]Vinchi et al.,
   2013; [57]Conran, 2014; [58]Garcia-Santos et al., 2017). In these
   diseases, large amounts of heme are released from ruptured erythrocytes
   and can potentially wreak havoc ([59]Tolosano et al., 2010). Thus, it
   is crucial to develop new strategies that capture and exploit the vast
   amount of literature knowledge surrounding heme to better understand
   its mechanistic role in hemolytic disorders.

   Networks that formalize biological knowledge can be used by clinicians
   as research and information retrieval tools, by biologists to propose
   in vitro and in vivo experiments, and by bioinformaticians to analyze
   high-throughput -omics experiments ([60]Catlett et al., 2013; [61]Ali
   et al., 2019). Further, they can be readily semantically integrated
   with databases and other systems biology resources to improve their
   ability to accomplish each of these tasks ([62]Hoyt et al., 2018).
   However, enabling this semantic integration requires organizing and
   formalizing the knowledge using specific vocabularies and ontologies.
   Although this endeavor involves significant curation efforts, it is key
   to the success of the subsequent modeling steps. Therefore, in
   practice, knowledge-based disease modeling approaches have been
   conducted only for major disorders such as cancer ([63]Kuperstein et
   al., 2015) or neurodegenerative disorders ([64]Mizuno et al., 2012;
   [65]Fujita et al., 2014). In summary, while the scarcity of mechanistic
   information and the necessary amount of curation often impede launching
   the aforementioned approaches, modeling and mining literature knowledge
   provide a holistic picture of the field of interest. Furthermore, the
   underlying models derived from such approaches have a broad range of
   applications including hypothesis generation, predictive modeling and
   drug discovery.

   Here, we present two resources aimed at assembling mechanistic
   knowledge surrounding the metabolism, biological functions, and
   pathology of heme in the context of selected hemolytic disorders. The
   first resource is a terminology formalizing heme-specific terms that
   have until now not been covered by other standard controlled
   vocabularies. The second resource is a heme knowledge graph (HemeKG),
   that is, a network comprising more than 700 nodes and more than 3,000
   interactions. It was generated from 46 selected articles as a first
   attempt at modeling the knowledge available in more than 20,000
   heme-related publications. Finally, we demonstrate both
   resources by analyzing the crosstalk between heme biology and the TLR4
   signaling pathway. The results of this analysis suggest that labile
   heme, acting as an extracellular signaling molecule through TLR4,
   induces cytokine and chemokine production.
   However, the underlying molecular mechanism and individual pathway
   effectors are not fully understood and need further exploration.

Materials and Methods

   This section describes the methodology used to generate the mechanistic
   knowledge graph and its supporting terminology. Subsequently, it
   outlines the approach followed to conduct the pathway crosstalk
   analysis. A schematic diagram of the methodology is presented in
   [66]Figure 1.

FIGURE 1.

   The workflow used to generate the supporting terminology and HemeKG.
   The first step involves the selection of relevant scientific
   literature. Next, evidence from this selected corpus is extracted and
   translated into BEL to generate a computable knowledge assembly model,
   HemeKG. In parallel to the modeling task, a terminology was built to
   support knowledge extraction from articles about the heme molecule.
   Finally, HemeKG can be used for numerous tasks such as hypothesis
   generation, predictive modeling and drug discovery.

Knowledge Modeling

   In order to identify recently published articles (i.e., published in
   the last 10 years) describing the role of heme in hemolytic disorders,
   PubMed was queried with the following: (“heme” AND “hemolysis”) OR
   (“heme” AND “thrombosis”) OR (“heme” AND “inflammation”) AND
   (“2009”[Date – Publication]: “3000”[Date – Publication]). The resulting
   3,108 articles were manually filtered by removing articles that were
   deemed too general or lacked a biochemical focus, as judged by expert
   opinion. After this filtering step, 6 reviews and 40 original research
   articles were selected for knowledge extraction and modeling. Knowledge
   was manually extracted and curated from this selected corpus using the
   official Biological Expression Language (BEL) curation guidelines from
   [69]http://openbel.org/language/version_2.0/bel_specification_version_2.0.html
   and [70]http://language.bel.bio as well as additional
   guidelines from [71]https://github.com/pharmacome/curation.
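
   The literature query described above can be assembled programmatically,
   which makes it easier to rerun or extend with further terms. The
   following is a minimal sketch and not the procedure used in this work:
   the function name build_query and the TOPICS list are hypothetical, and
   the additional outer parentheses around the topic clauses are an
   assumption added to make explicit that the date filter applies to all
   three term pairs. Only the search terms and the date range are taken
   from the text.

```python
from typing import Sequence

# Search terms taken from the query in the text; the list name is hypothetical.
TOPICS = ["hemolysis", "thrombosis", "inflammation"]


def build_query(anchor: str, topics: Sequence[str], start_year: int,
                end_year: int = 3000) -> str:
    """Join `anchor AND topic` pairs with OR, then restrict by publication date."""
    pairs = [f'("{anchor}" AND "{topic}")' for topic in topics]
    topic_clause = " OR ".join(pairs)
    date_clause = (
        f'("{start_year}"[Date - Publication] : "{end_year}"[Date - Publication])'
    )
    # Assumption: the outer parentheses make the scope of the date filter
    # explicit; they do not appear in the query as printed in the text.
    return f"({topic_clause}) AND {date_clause}"


query = build_query("heme", TOPICS, 2009)
print(query)
```

   The resulting string can be pasted into the PubMed search box. Adding a
   further term to TOPICS extends the query without the parentheses having
   to be rebalanced by hand.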

   Evidence from the selected corpus was manually translated into BEL
   statements together with their contextual information (e.g., cell type,
   tissue and dosage information). For instance, the evidence
   “Heme/iron-mediated oxidative modification of LDL can cause endothelial
   cytotoxicity and – at sublethal doses – the expression of
   stress-response genes” ([72]Nagy et al., 2010) corresponds to the
   following BEL statement:
     * SET Cell = “endothelial cell”
     * a(CHEBI:“oxidised LDL”) pos bp(MESH:“Cytotoxicity, Immunologic”).
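
   To illustrate how such a statement and its context annotation fit
   together, the example above can be represented as a small data
   structure and rendered back into BEL script lines. This is a minimal
   sketch under stated assumptions: the Term and Statement classes are
   hypothetical and are not part of any BEL tooling, and "pos" is the
   short form of the positiveCorrelation relation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Term:
    """A BEL term such as a(CHEBI:"oxidised LDL") (hypothetical helper class)."""
    function: str   # BEL function: "a" = abundance, "bp" = biologicalProcess
    namespace: str  # controlled vocabulary, e.g. "CHEBI" or "MESH"
    name: str       # entry name within that namespace

    def to_bel(self) -> str:
        return f'{self.function}({self.namespace}:"{self.name}")'


@dataclass
class Statement:
    """Subject, relation, and object plus contextual annotations (hypothetical)."""
    subject: Term
    relation: str
    obj: Term
    annotations: Dict[str, str] = field(default_factory=dict)

    def to_bel_lines(self) -> List[str]:
        # Annotations are emitted first as SET lines, followed by the statement.
        lines = [f'SET {key} = "{value}"' for key, value in self.annotations.items()]
        lines.append(f"{self.subject.to_bel()} {self.relation} {self.obj.to_bel()}")
        return lines


statement = Statement(
    subject=Term("a", "CHEBI", "oxidised LDL"),
    relation="pos",  # short form of positiveCorrelation
    obj=Term("bp", "MESH", "Cytotoxicity, Immunologic"),
    annotations={"Cell": "endothelial cell"},
)
bel_lines = statement.to_bel_lines()
print("\n".join(bel_lines))
```

   Keeping the annotation separate from the subject, relation, and object
   mirrors how the cell type qualifies the statement without being part of
   the relationship itself.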

Generation of a Supporting Terminology

   During curation, a terminology was generated to standardize the
   domain-specific terms encountered in articles related to the heme
   molecule. The aim of the
   terminology is to catalog and harmonize terms not present in other
   controlled vocabularies such as ChEBI ([73]Degtyarenko et al., 2007)
   for chemicals, or Gene Ontology [GO; ([74]Ashburner et al., 2000)] and
   Medical Subject Headings [MeSH; ([75]Rogers, 1963)] for pathologies.
   Thus, each term was checked by two experts in the field assisted by the
   Ontology Lookup Service [OLS; ([76]Cote et al., 2010)] to avoid
   duplicates with other terminologies or ontologies. In addition, we
   required that each entry include the following metadata: an
   identifier, a label, a definition, an example of usage in a sentence,
   and references to articles in which it was described. Furthermore, a