Abstract

Addressing the complex relationship between public health and environmental exposure requires multiple types and sources of data. An important source of chemical data derives from high-throughput screening (HTS) efforts, such as the Tox21/ToxCast program, which aim to identify chemical hazard using primarily in vitro assays to probe toxicity. While most of these assays target specific genes, assessing the disease-relevance of these assays remains challenging. Integration with additional data sets may help to resolve these questions by providing broader context for individual assay results. The Comparative Toxicogenomics Database (CTD), a publicly available database that builds networks of chemical, gene, and disease information from manually curated literature sources, offers a promising solution for contextual integration with HTS data. Here, we tested the value of integrating data across Tox21/ToxCast and CTD by linking elements common to both databases (i.e., assays, genes, and chemicals). Using polymarcine and Parkinson’s disease as a case study, we found that their union significantly increased chemical-gene associations and disease-pathway coverage. Integration also enabled new disease associations to be made with HTS assays, expanding coverage of chemical-gene data associated with diseases. We demonstrate how integration enables development of predictive adverse outcome pathways using 4-nonylphenol, branched, as an example. Thus, we demonstrate enhancements to each data source through database integration, including scenarios where HTS data can efficiently probe chemical space that may be understudied in the literature, as well as how CTD can add biological context to those results.

Keywords: Data integration, ToxCast, CTD, Disease, Disease pathways

1. Introduction

There are more than 80,000 chemicals registered for use in the United States, with an estimated 2,000 more introduced each year [1].
The majority of these have not been adequately tested for their human health effects, even though the etiology of many chronic diseases involves interactions between environmental factors, including chemicals, and the genes and pathways that modulate physiological processes [2–4]. To address this challenge, high-throughput screening (HTS) efforts like the Toxicology in the 21st Century (Tox21) federal research collaboration [5] have been developed to automate in vitro biological assays and to evaluate efficiently the activity of a large number of chemicals on a range of cellular processes. Members of the Tox21 collaboration seek to enhance the predictive capacity of toxicology studies and thereby improve efforts to protect human health and the environment. The goals of Tox21 include developing and improving models that predict biological responses to chemicals, identifying mechanisms of action that warrant further investigation, and prioritizing chemicals for further toxicological evaluation. Using in vitro assays in a systematic, large-scale operation characterizes specific molecular endpoints more clearly than traditional animal toxicology studies do. A major effort targeted at chemical prioritization is the Toxicity Forecaster (ToxCast) program [6–8], which is an ongoing, multiphase component of the U.S. Environmental Protection Agency’s (EPA) contributions to Tox21. ToxCast enables prioritization and profiling of chemicals of regulatory interest by their AC50/LEC (half-maximal activity concentration/lowest effective concentration) values or by mapping assay results onto canonical biochemical or physiological pathways by way of implicated genes. However, there is an urgent need to understand these data in a broader biological context, including their alignment with human disease or exposure implications. One way to accomplish this is to develop evidence-based associations between HTS results and broader biological resources.
With the exponential growth in environmental health data, new databases and tools have been developed to enable analysis of disconnected datasets [3]. The Comparative Toxicogenomics Database (CTD) [9–11] is one such publicly available database, developed with the goal of advancing understanding of how environmental exposures affect human health. CTD accomplishes this goal by manually curating chemical-gene, chemical-phenotype, chemical-disease, and gene-disease relationships, as well as exposure data, from the biomedical literature. These data are integrated with functional and pathway data to inform hypotheses about the etiologies underlying environmentally influenced diseases. CTD also includes manually curated chemical-phenotype relationships for identifying pre-disease biomarkers associated with experimental and real-world environmental exposures [10–12]. Like many databases, the Tox21/ToxCast collaborative effort and CTD share the objective of better characterizing the role of chemicals in human health outcomes. Where Tox21/ToxCast assays generate evidence of a chemical affecting gene activity within an in vitro context, CTD curates evidence of chemical associations with genes, proteins, or disease from diverse sources (e.g., model organisms, human populations, in vitro). Establishing methods for integrating Tox21/ToxCast results with curated data in CTD represents an initial step toward addressing well-known, long-term challenges facing the Tox21/ToxCast effort [4]. These challenges include how to extrapolate HTS data to human health by correlating perturbation of genes, proteins, or cellular-based phenotypes to human disease. Integration of HTS results could also address information gaps in CTD owing to the comparatively narrow range of chemicals reported in the literature [12,13]. By integrating HTS and environmental health data, the respective enhancements to each resource can be analyzed in a human health context.
Here, we describe integration of HTS data with a broader environmental health resource using Tox21/ToxCast data and CTD. Through this data integration, we demonstrate the expansion of chemical-gene coverage. Using chemical-gene interactions associated with diseases and pathways in CTD as a case study, we describe the respective enhancements to each resource and demonstrate how this data integration can be used to identify new chemical-pathway and chemical-disease associations. We assess changes in coverage of chemical-gene information for diseases and pathways in CTD to quantify the value of this database integration and demonstrate how these integrative techniques can advance discovery through development of predictive chemical-pathway-disease frameworks.

2. Materials and methods

2.1. CTD data

CTD data from the October 5, 2018 release were downloaded as .CSV files from the CTD website (http://ctdbase.org/downloads). The disease vocabulary (MEDIC) contained 36 MEDIC-Slim categories encompassing 5,361 specific diseases [14]. The chemical-gene interactions file included 12,984 chemicals and 46,755 genes and proteins with data recorded from 87,428 curated references [15].
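
As an illustration of how summary counts such as those above might be derived from a downloaded chemical-gene interactions file, the following Python sketch tallies unique chemicals, genes, and references. It is a minimal sketch and not the procedure used in this study. The column names, the `#` comment prefix, and the pipe-delimited reference field are assumptions modeled on the CTD download format and should be checked against the header of the actual file. The function name `summarize_chem_gene` and the sample rows are hypothetical.

```python
import csv
import io

# Assumed column layout, modeled on CTD's chemical-gene interactions download.
# Verify against the "# Fields:" header of the file actually in use.
ASSUMED_COLUMNS = [
    "ChemicalName", "ChemicalID", "CasRN", "GeneSymbol", "GeneID",
    "GeneForms", "Organism", "OrganismID", "Interaction",
    "InteractionActions", "PubMedIDs",
]


def summarize_chem_gene(handle, columns=ASSUMED_COLUMNS):
    """Count unique chemicals, genes, and references in a CTD-style CSV."""
    # Comment and header lines are assumed to start with '#'.
    data_lines = (line for line in handle if not line.startswith("#"))
    chemicals, genes, references = set(), set(), set()
    for row in csv.DictReader(data_lines, fieldnames=columns):
        chemicals.add(row["ChemicalID"])
        genes.add(row["GeneID"])
        # Multiple reference IDs are assumed to be pipe-delimited.
        references.update(p for p in row["PubMedIDs"].split("|") if p)
    return {
        "chemicals": len(chemicals),
        "genes": len(genes),
        "references": len(references),
    }


# Toy rows for illustration only; these are not real CTD records.
SAMPLE = """# Fields:
# ChemicalName,ChemicalID,...
chemA,C001,,GENE1,101,protein,Homo sapiens,9606,chemA affects GENE1,affects^expression,111|222
chemA,C001,,GENE2,102,mRNA,Homo sapiens,9606,chemA affects GENE2,affects^expression,222
chemB,C002,,GENE1,101,protein,Mus musculus,10090,chemB affects GENE1,affects^activity,333
"""

summary = summarize_chem_gene(io.StringIO(SAMPLE))
print(summary)
```

For a gzip-compressed download, the same function could be given a text-mode handle from `gzip.open(path, "rt")`, where `path` is the local file location. Counting by identifier rather than by name avoids double-counting chemicals or genes that appear under synonyms.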