Abstract

Background

   Acute kidney injury (AKI) caused by drug and toxicant ingestion is a
   serious clinical condition associated with high mortality rates. We
   currently lack detailed knowledge of the underlying molecular
   mechanisms and biological networks associated with AKI. In this study,
   we carried out gene co-expression analyses using DrugMatrix—a large
   toxicogenomics database with gene expression data from rats exposed to
   diverse chemicals—and identified gene modules associated with kidney
   injury to probe the molecular-level details of this disease.

Results

   We generated a comprehensive set of gene co-expression modules by using
   the Iterative Signature Algorithm and found distinct clusters of
   modules that shared genes and were associated with similar chemical
   exposure conditions. We identified two module clusters that showed
   specificity for kidney injury in that they 1) were activated by
   chemical exposures causing kidney injury, 2) were not activated by
   other chemical exposures, and 3) contained known AKI-relevant genes
   such as Havcr1, Clu, and Tff3. We used the genes in these AKI-relevant
   module clusters to develop a signature of 30 genes that could assess
   the potential of a chemical to cause kidney injury well before injury
   actually occurs. We integrated AKI-relevant module cluster genes with
   protein-protein interaction networks and identified the involvement of
   immunoproteasomes in AKI. To identify biological networks and processes
   linked to Havcr1, we determined genes within the modules that
   frequently co-express with Havcr1, including Cd44, Plk2, Mdm2, Hnmt,
   Macrod1, and Gtpbp4. We verified this procedure by showing that
   randomized data did not identify Havcr1 co-expression genes and that
   excluding up to 10 % of the data caused only minimal degradation of the
   gene set. Finally, by using an external dataset from a rat kidney
   ischemic study, we showed that the frequently co-expressed genes of
   Havcr1 behaved similarly in a model of non-chemically induced kidney
   injury.

Conclusions

   Our study demonstrated that co-expression modules and co-expressed
   genes contain rich information for generating novel biomarker
   hypotheses and constructing mechanism-based molecular networks
   associated with kidney injury.

Electronic supplementary material

   The online version of this article (doi:10.1186/s12864-016-3143-y)
   contains supplementary material, which is available to authorized
   users.

   Keywords: Acute kidney injury, Toxicogenomics, Kidney co-expression
   modules, Gene signature, Havcr1, KIM-1, Frequently co-expressed genes,
   AKI pathways, Immunoproteasome, Cd44 ectodomain, AKI networks

Background

   Acute kidney injury (AKI) is a clinically relevant disorder associated
   with high rates of morbidity and mortality [[31]1]. It reportedly
   occurs in ~20 % of hospitalized patients and in 30–60 % of critically
   ill intensive care patients [[32]2, [33]3], increases mortality in
   military-relevant burn causalities [[34]4], and often progresses to
   chronic kidney disease [[35]3]. In the United States, the annual costs
   for hospital-acquired AKI are estimated to be greater than $10 billion
   [[36]5], and recent epidemiological studies show a trend towards
   increasing occurrence of AKI [[37]6–[38]8]. The lack of suitable
   biomarkers is a major hurdle in timely diagnosis of AKI, especially
   because drug-induced AKI is often reversible as long as drug use is
   discontinued.

   Functional markers, such as serum creatinine, blood-urea-nitrogen, and
   the volume of urine output, are currently used to diagnose AKI [[39]2,
   [40]9]. These markers have low sensitivity, lack specificity, result in
   delayed diagnosis, and hence, contribute to poor clinical outcomes
   [[41]2, [42]10]. Developing suitable pre-clinical markers that could
   aid in earlier identification of AKI will also help reduce the cost and
   time associated with advanced drug development [[43]11]. The Predictive
   Safety Testing Consortium recently addressed this issue and developed
   the first U.S. Food and Drug Administration (FDA)-approved AKI
   biomarker panel for use in pre-clinical studies [[44]12]. While this is
   a major advance in the field, our current knowledge of molecular
   mechanisms and networks associated with AKI remains incomplete
   [[45]13], and the search for new clinical AKI biomarkers is ongoing
   [[46]9, [47]14]. Thus, identifying new molecular-level insights of
   kidney injury will help us to not only understand the disease process
   but also to identify new candidate biomarkers.

   Extraction of network and pathway information by using in vivo gene
   expression datasets from thousands of chemical exposures and disease
   conditions can provide detailed insight into molecular injury
   mechanisms and identify the corresponding mechanism-based biomarkers
   [[48]15–[49]20]. The diversity and complexity of the in vivo response
   require specialized techniques to extract interpretable biological
   information. Here, we used the Iterative Signature Algorithm (ISA) to
   generate gene co-expression modules associated with AKI [[50]21,
   [51]22]. Co-expressed genes are hypothesized to participate in
   biological processes and pathways that are linked together, though not
   necessarily through gene co-regulation. The bi-clustering methodology
   identifies gene modules that are clustered together under a subset of
   conditions, such that modules that can be specifically associated with
   kidney disease conditions can be hypothesized to be linked to kidney
   injury mechanisms. In this formulation, individual genes can
   participate in more than one module, because the same gene will respond
   differently to different stimuli, leading to co-expression with
   different sets of genes [[52]23]. This is consistent with the concept
   of a molecular toxicity pathway, in which a limited number of pathways
   are differentially activated in response to different injury
   conditions.

   We investigated the DrugMatrix toxicogenomics database, which contains
   chemically induced gene expression changes and associated clinical
   chemistry and organ-specific histopathology endpoints in male Sprague
   Dawley rats [[53]24, [54]25]. The DrugMatrix kidney Affymetrix dataset
   has ~60 million in vivo gene expression data points associated with
   diverse chemical exposures. We have previously used DrugMatrix liver
   data to identify modules and networks associated with liver fibrosis
   and experimentally validated the identified liver fibrosis gene
   signature [[55]26–[56]28]. Fielden et al. utilized part of the
   DrugMatrix kidney data to develop a predictive gene signature for
   kidney injury [[57]29], but to our best knowledge, there are no reports
   on identifying co-expression modules associated with kidney injury from
   toxicogenomic data. Thus, our goal in this study was to identify
   co-expression modules associated with AKI. Our study was based on two
   major hypotheses: 1) a bi-clustering approach should be able to
   identify toxicity-relevant co-expression modules in an unsupervised
   manner, and 2) genes consistently co-expressed with known biomarker
   candidates should identify biological networks associated with kidney
   injuries. We identified AKI-relevant modules that were activated in
   chemical exposure conditions causing kidney injury and which contained
   well-known AKI-relevant genes such as Havcr1, Clu, and Tff3. We used
   the genes in these modules to 1) generate a signature for predicting
   kidney injury that performed better than well-known AKI biomarkers, and
   2) identify pathways and networks of extracellular matrix
   (ECM)-receptor interactions, glutathione metabolism, and p53 signaling
   pathways associated with kidney injury. Our network analysis identified
   the involvement of immunoproteasomes in AKI. To expand on the knowledge
   and pathways associated with the well-known AKI gene Havcr1 [also known
   as the kidney injury molecule-1 (KIM1)], we identified genes that were
   frequently co-expressed with Havcr1 under conditions that cause kidney
   injury. We also confirmed that the expression pattern of these genes
   was present in an independent dataset probing rat kidney injuries due
   to ischemia, not chemical exposures. These results illustrate the
   potential of our approach to identify molecular networks associated
   with toxic injury and as a potential source for biomarkers.

Methods

Data collection and preprocessing

   We used data from DrugMatrix, a publicly available toxicogenomics
   database that contains matched gene expression and histopathology data
   from Sprague Dawley rats after exposure to a range of chemicals at
   different doses and time intervals [[58]24]. We downloaded the
   DrugMatrix data from the National Institute of Environmental Health
   Sciences [[59]30] server and focused on the kidney data generated using
   the Affymetrix rat 230 2.0 GeneChip array. We followed the
   pre-processing protocol as described in our earlier study [[60]26,
   [61]31]. Briefly, we used the R/BioConductor package affy to perform
   robust multi-array average quantile normalization and the BioConductor
   package ArrayQualityMetrics to assess the quality of the microarray
   data [[62]32–[63]34]. We renormalized the data after removing 97 arrays
   that failed on at least one of the three statistical tests in
   ArrayQualityMetrics and were thus identified as outliers. We used the
   MAS5calls function in the affy package to obtain the “Present/Absent”
   calls for each probe set and removed the probe sets that were found to
   be “Absent” in all replicates across all chemical exposures [[64]31].

   We used the BioConductor genefilter package to remove genes without
   EntrezIDs or with low variance across chemical exposures [[65]35].
   After calculating the average intensity across the replicates of a
   chemical exposure condition, we computed log-ratios for each gene
   between treatments and their corresponding controls. In the current
   analysis, we focused on chemical exposures with greater than 1-day
   time-points. This resulted in a final log-ratio matrix of 9,222 genes
   and 220 chemical exposure conditions.

Generation of gene co-expression modules and module clusters

   We used the ISA implemented in the R/BioConductor package eisa to
   generate gene co-expression modules [[66]36]. The three key parameters
   of this algorithm are 1) starter seeds, 2) gene threshold, and 3)
   condition threshold. We used the R hierarchical clustering package
   Hclust along with the dynamicTreeCut package and generated 216 gene
   clusters [[67]37]. In line with our previous work, random gene sets
   were added to the hierarchical gene clusters and expanded to ~15,000
   gene sets [[68]28]. We used both the hierarchical gene clusters and
   expanded gene sets as the starter seeds.

   The ISA uses gene and condition threshold parameters for module
   generation. These two parameters affect the size and stringency of the
   modules; i.e., the higher their values, the smaller and more highly
   correlated are the modules, whereas the smaller their values, the
   larger and less-correlated the modules [[69]23]. We varied the
   parameter combination from 2.0 to 4.0 in increments of 0.5, and used 25
   combinations of these two parameters to generate the modules; i.e., for
   a gene threshold of 2.0, we analyzed the modules at condition
   thresholds of 2.0, 2.5, 3.0, 3.5, and 4.0. For each threshold
   combination, we normalized the log-ratio matrix, generated modules,
   filtered redundant modules, and ensured module robustness by using the
   ISAnormalize, ISAiterate, ISAunique, and ISAFilterRobust functions,
   respectively. We filtered out gene modules with more than 200 genes and
   an intra-module correlation of <0.4. Finally, we merged the modules
   generated by using all threshold combinations and removed any redundant
   modules with the ISAunique function. Using this procedure, we generated
   137 gene co-expression modules. Additional file [70]1: Script S1
   provides the R script used to generate the ISA modules.

   Overlap of genes and chemical exposures is permitted in the
   co-expression modules generated using the process above. Quantifying
   the level of overlap allowed us to group similar modules together.
   Toward this end, we calculated the overlap of gene and chemical
   exposures in modules, using the Dice coefficient [[71]38, [72]39].
   Equation [73]1 shows how we calculated the module overlap score
   (OS[A,B]) between two modules A and B.
   [MATH: <msub><mi mathvariant="normal">OS</mi><mrow><mi
   mathvariant="normal">A</mi><mo>,</mo><mi
   mathvariant="normal">B</mi></mrow></msub><mo>=</mo><mfenced close=")"
   open="("><mrow><mn>2</mn><mo>*</mo><mfrac><mrow><mo>∣</mo><mi
   mathvariant="normal">G</mi><mfenced close=")" open="("><mi
   mathvariant="normal">A</mi></mfenced><mspace
   width="0.25em"></mspace><mo>∩</mo><mspace width="0.25em"></mspace><mi
   mathvariant="normal">G</mi><mfenced close=")" open="("><mi
   mathvariant="normal">B</mi></mfenced><mo>∣</mo></mrow><mrow><mo>∣</mo><
   mi mathvariant="normal">G</mi><mfenced close=")" open="("><mi
   mathvariant="normal">A</mi></mfenced><mo>∣</mo><mspace
   width="0.25em"></mspace><mo>+</mo><mspace
   width="0.25em"></mspace><mo>∣</mo><mi
   mathvariant="normal">G</mi><mfenced close=")" open="("><mi
   mathvariant="normal">B</mi></mfenced><mo>∣</mo></mrow></mfrac></mrow></
   mfenced><mo>+</mo><mfenced close=")"
   open="("><mrow><mn>2</mn><mo>*</mo><mfrac><mrow><mo>∣</mo><mi
   mathvariant="normal">E</mi><mfenced close=")" open="("><mi
   mathvariant="normal">A</mi></mfenced><mspace
   width="0.25em"></mspace><mo>∩</mo><mspace width="0.25em"></mspace><mi
   mathvariant="normal">E</mi><mfenced close=")" open="("><mi
   mathvariant="normal">B</mi></mfenced><mo>∣</mo></mrow><mrow><mo>∣</mo><
   mi mathvariant="normal">E</mi><mfenced close=")" open="("><mi
   mathvariant="normal">A</mi></mfenced><mo>∣</mo><mspace
   width="0.25em"></mspace><mo>+</mo><mspace
   width="0.25em"></mspace><mo>∣</mo><mi
   mathvariant="normal">E</mi><mfenced close=")" open="("><mi
   mathvariant="normal">B</mi></mfenced><mo>∣</mo></mrow></mfrac></mrow></
   mfenced> :MATH]
   1

   Here, |G(A)| represents the count of genes in module A; |G(B)| that of
   genes in module B; |E(A)| that of chemical exposures in module A;
   |E(B)| that of chemical exposures in module B; |G(A)∩G(B)| that of
   genes in common between modules A and B; and |E(A)∩E(B)| that of
   chemical exposures in common between modules A and B. The overlap score
   varies between zero and two, where zero represents no overlap and two
   represents complete overlap between the two modules. We created a
   module overlap score matrix by calculating the module overlap scores
   between all 137 co-expression modules and performing hierarchical
   clustering by using the Hclust package in R. The generated dendrograms
   were cut at a specific height (h = 0.5), which resulted in 16 final
   module clusters.

Module cluster characterization

   We used activation scores and enrichment of known AKI-relevant genes to
   identify AKI-relevant module clusters. We followed the procedure
   described in our earlier work to calculate the module cluster
   activation scores [[74]26]. Briefly, we first normalized the values in
   the log-ratio matrix of each gene across 220 chemical exposures by
   converting them to Z-scores. The Z-score of gene i under chemical
   exposure condition j is given by
   [MATH: <msub><mi mathvariant="normal">Z</mi><mrow><mi>i</mi><mo
   mathvariant="italic">,</mo><mi>j</mi></mrow></msub><mspace
   width="0.25em"></mspace><mo mathvariant="italic">=</mo><mspace
   width="0.25em"></mspace><mfrac><mrow><msub><mi>X</mi><mrow><mi>i</mi><m
   o mathvariant="italic">,</mo><mi>j</mi></mrow></msub><mspace
   width="0.25em"></mspace><mo mathvariant="italic">-</mo><mspace
   width="0.75em"></mspace><msub><mi>μ</mi><mi>i</mi></msub></mrow><msub><
   mi>σ</mi><mi>i</mi></msub></mfrac><mtext>,</mtext> :MATH]
   2

   Where X [i,j] is the log-ratio value for gene i under chemical exposure
   condition j; μ [i] is the average log ratio for gene i across all 220
   chemical exposures; and σ [i] is the standard deviation of the log
   ratio for gene i across all 220 chemical exposures. Next, we defined
   eight phenotypes, two of which were histopathological phenotypes based
   on available kidney histopathological data, and the remaining six of
   which were chemical exposure classes based on the
   pharmacological/toxicological class of the chemicals. The
   histopathological phenotypes were kidney-cortex, tubule, necrosis (P1),
   and kidney-tubule regeneration (P2). In these two cases, we chose a
   chemical exposure to be representative of the phenotype if all of its
   replicates had a histopathology score of ≥2 for the given phenotype.

   We defined six chemical exposure classes (hepatotoxicants,
   fluoroquinolone antibiotics, epidermal growth factor receptor kinase
   inhibitors, estrogen modulators, statins, and fibrates) based on the
   pharmacological class, and chose the chemical exposures with the
   highest dose for each chemical within a chemical exposure class as
   representatives. We provide the chemical exposures used to define each
   phenotype/chemical exposure class in Additional file [75]2: Table S1.
   The activation score A [m,p] of module cluster m associated with
   phenotype p is the mean absolute value of the Z-score for all genes i
   in module cluster m across all conditions j in phenotype p and is given
   by
   [MATH: <msub><mi>A</mi><mrow><mi>m</mi><mo
   mathvariant="italic">,</mo><mi>p</mi></mrow></msub><mo>=</mo><mfrac><mn
   >1</mn><mrow><msub><mi>N</mi><mi>m</mi></msub><msub><mi>N</mi><mi>p</mi
   ></msub></mrow></mfrac><mstyle displaystyle="true"><msubsup><mo
   stretchy="true">∑</mo><mrow><mi>i</mi><mo>∈</mo><mi>m</mi></mrow><msub>
   <mi>N</mi><mi>m</mi></msub></msubsup><mrow><mstyle
   displaystyle="true"><msubsup><mo stretchy="true">∑</mo><mi>j</mi><mi
   mathvariant="italic">Np</mi></msubsup><mrow><mfenced close="|"
   open="|"><msub><mi>Z</mi><mrow><mi>i</mi><mo
   mathvariant="italic">,</mo><mi>j</mi></mrow></msub></mfenced></mrow></m
   style></mrow></mstyle><mspace width="0.25em"></mspace><mtext>,</mtext>
   :MATH]
   3

   Where N [m] is the number of genes associated with module cluster m; N
   [p] is the number of chemical exposures associated with phenotype p;
   and Z[i,j] is the Z-score of gene i under chemical exposure condition j
   associated with phenotype p.

   We collected 57 genes with direct evidence of association with AKI from
   the Comparative Toxicogenomics Database (CTD) [[76]40], 25 of which
   mapped to the co-expression modules. We used Fisher’s exact test to
   calculate the enrichment of these genes in each module cluster. In this
   process, we identified two module clusters (MC7 and MC11) that were
   activated in kidney injury conditions and enriched with known
   AKI-relevant genes. We refer to these two module clusters as the
   AKI-relevant module clusters and their component genes as the
   AKI-relevant gene set.

Generation of gene signature to predict kidney injury

   We utilized the AKI-relevant gene set and generated gene signatures to
   predict the future occurrence of kidney injury, using the R package
   randomForest [[77]41]. Fielden et al. reported chemical exposures that
   did not show histopathological kidney injury at 5-day exposure but did
   show kidney injury at 29-day exposure [[78]29]. Our training set, based
   on these definitions, consisted of 14 nephrotoxic chemical exposures at
   early time points and 30 non-nephrotoxic chemical exposures [[79]29].
   We used the AKI-relevant gene set and developed a random forest-based
   classifier model, using 1,001 trees and default settings. To analyze
   model performance, we used the standard model evaluation parameters,
   such as sensitivity, specificity, and area under the curve (AUC) for
   the receiver-operator characteristic (ROC) curve. The ROCR package was
   used to generate the AUC-ROC curve [[80]42]. We generated the final
   model by using the top 30 genes identified in the initial run. We
   separately created models with the Havcr1 and Clu genes, using the same
   parameters as those above. We utilized an independent external dataset
   from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation
   System [TG-GATEs] to evaluate the 30-gene signature [[81]43]. We
   processed 4- and 8-day high dose kidney exposures from TG-GATEs, using
   the same procedure as that described above for processing the
   DrugMatrix dataset.

Functional enrichment analysis

   We utilized the AKI-relevant gene set and carried out functional
   enrichment analyses of Kyoto Encyclopedia of Genes and Genomes
   (KEGG)/Reactome pathways and Gene Ontology-Biological Process (GO-BP)
   terms. We used the BioConductor package clusterProfiler with default
   settings for KEGG and GO-BP enrichment analysis [[82]44], and the
   REVIGO webserver with the default semantic similarity measure (SimRel)
   to cluster and summarize the enriched GO-BP terms [[83]45]. We employed
   the ReactomePA package for Reactome pathway enrichment analysis
   [[84]46].

Protein-protein interaction network analysis

   We converted the rat Affymetrix probe IDs to human gene IDs using the
   BioConductor/R packages annotate and biomaRt [[85]47]. The R script
   used for the conversion of probe IDs to human gene IDs is provided in
   additional information (Additional file [86]3: Script S2). We used the
   high-confidence human protein-protein interaction (PPI) network with
   14,862 unique nodes generated by using interaction detection based on
   the shuffling approach [[87]48]. We used Cytoscape 3.2.1 to map the
   AKI-relevant gene set to the PPI network, and extracted the connected
   component as an AKI-relevant sub-network (AKI-SN) [[88]49]. We first
   analyzed whether the AKI-SN was obtained by random chance using by two
   statistical significance tests [[89]26, [90]50]. Our null hypothesis
   was that the observed number of nodes (AKI-SN[nodes]) and edges
   (AKI-SN[edges]) in AKI-SN are obtained by random chance. In the first
   analysis, we randomly selected 158 proteins from the human PPI network
   1,000 times and counted the number of nodes (R[nodes]) and edges
   (R[edges]) of the largest connected component. We calculated the number
   of times that R[nodes] ≥ AKI-SN[nodes] (N[randnode]). Similarly, we
   computed the number of times that R[edges] ≥ AKI-SN[edges]
   (N[randedge]). We then calculated the probability of randomly obtaining
   a sub-network with a number of nodes comparable to that of AKI-SN by
   P = N[randnode] /1,000, and the probability of randomly obtaining a
   sub-network with a number of edges comparable to that of AKI-SN by
   P = N[randedges] /1,000. In the second analysis, we scrambled the human
   PPI network while preserving the average node degree and mapped AKI-SN
   nodes to the randomized network. This process was also repeated 1,000
   times. As in the first test, we extracted the largest connected
   component, counted the number of nodes and edges, and calculated the
   probability of obtaining AKI-SN parameters by random chance.

   We next computed the properties of the network, such as its degree and
   betweenness centrality. In the PPI network, nodes represent proteins
   and edges represent the interactions/connections between them. The
   degree represents the number of interactions associated with the
   protein. Proteins with a large degree are known as hub proteins
   [[91]51]. Earlier studies have shown that hub proteins tend to be the
   essential or key protein in the network [[92]52]. The betweenness
   centrality (also known as traffic) is a measure of the number of
   shortest paths through the node and represents the capacity of the node
   to communicate with other components of the network [[93]51]. We used
   the NetworkAnalyzer option in Cytoscape 3.2.1 to compute the degree and
   betweenness centrality of the AKI-SN [[94]53]. We used the MCODE
   program in Cytoscape to identify the most inter-connected nodes in the
   AKI-SN [[95]54].

   The Ingenuity Pathways Analysis (IPA; QiagenBioinformatics, Redwood
   City, CA) software package was used to determine the gene-function
   relationships and sub-anatomical location of the genes and proteins
   identified in the sub-networks. The genes identified in the sub-network
   analysis were associated with the disease or function annotations
   filtered by the search term “renal” in order to determine the
   gene-to-function relationships in discrete anatomical locations of the
   kidney. Only PubMed identifications (PMIDs) with significant
   associations curated by IPA were included in the anatomical mapping.

Identification of frequently co-expressed genes

   Of the 137 gene co-expression modules, 21 contained the Havcr1 gene. We
   created a correlation matrix for each of the 21 modules and selected
   the 20 % of genes that were most correlated with Havcr1. We counted the
   number of occurrences of each gene across the list of the 21 most
   correlated gene sets, and genes that occurred more than twice were
   denoted as “frequently co-expressed genes.” We tested whether these
   frequently co-expressed genes could be obtained by random chance. For
   this, we shuffled the gene labels in the log-ratio matrix and carried
   out the entire analysis from the ISA module generation to identify
   frequently co-expressed genes. Subsequently, we analyzed whether the
   procedure was robust with respect to noise. We repeated the entire
   analysis excluding either 5 % of the chemical exposures or 10 % of
   chemical exposures and calculated the genes frequently co-expressed
   with Havcr1.

External validation

   We further evaluated the relevance of frequently co-expressed genes
   with Havcr1, using an external dataset ([96]GSE58438) collected from
   Gene Expression Omnibus (GEO). In this dataset, ischemic kidney injury
   was produced by clamping the renal artery [[97]55]. This dataset
   contains five control replicates and five replicates of animals
   collected at 1 and 5 days after ischemic kidney injury. We matched the
   frequently co-expressed genes and calculated the Spearman correlation
   between the average log-ratios derived from chemical exposures that
   produced kidney injury in the DrugMatrix database and those derived
   from this external dataset.

Results and discussion

Identification of AKI-relevant modules

   Figure [98]1 shows the overall workflow used in this study. We
   pre-processed DrugMatrix kidney toxicogenomic data and obtained a final
   matrix of 9,222 genes arrayed across 220 chemical exposure conditions.
   We constructed co-expression modules by using the ISA approach and
   systematically varying the parameters of gene and condition thresholds
   from 2.0 to 4.0 in increments of 0.5 (n = 25 different threshold
   combinations; Fig. [99]2). We also performed a separate analysis
   merging all modules generated from each threshold combination, removing
   identical or redundant modules. The merged set resulted in the highest
   percentage of modules enriched with at least one GO term for the
   bi-clustering analysis (Fig. [100]2b). Based on this analysis, we used
   the merged set containing 137 gene co-expression modules in our study
   (see Additional file [101]4: Table S2 and Additional file [102]5: Table
   S3 for a detailed list of the genes and chemical exposures,
   respectively, associated with each of the 137 modules). This approach
   of using the combination of all threshold parameters affords the
   advantages of being unsupervised and non-parametric, and overcomes
   potential biases inherent in using a single parameter value.

Fig. 1.

   Fig. 1
   [103]Open in a new tab

   Workflow used in this study to mine kidney toxicogenomic data

Fig. 2.

   Fig. 2
   [104]Open in a new tab

   Iterative signature algorithm (ISA) parameter selection. a Number of
   modules generated for different combinations of gene and sample
   thresholds and for the merged results. b Percentage of modules enriched
   with gene ontology (GO) terms for different threshold values and merged
   results. “All,” merged result from all threshold combinations; %M[GO],
   the percentage of modules enriched with GO terms; N[m], the number of
   modules generated

   The co-expression modules differ from conventional clusters by allowing
   the same genes and chemical exposures to be present in multiple
   modules. We quantified this overlap between module genes and chemical
   exposures by using the Dice coefficient (Eq. [105]1). We clustered the
   resultant module overlap score matrix, allowing us to condense the
   original 137 modules into 16 distinct module clusters with high
   intra-module correlations and low inter-module correlations
   (Fig. [106]3a). We identified AKI-relevant modules by evaluating the
   module cluster activation across two different kidney histopathologies
   (P1-P2) and six pharmacological classes of drug or toxicant exposure
   that did not produce kidney pathology (C1-C6) (Table [107]1). Module
   clusters 7 and 11 showed specificity for kidney injury, being activated
   in kidney injury phenotypes (i.e., P1 [kidney cortical necrosis] or P2
   [kidney tubular regeneration]) but not after exposure to drugs or
   toxicants unassociated with kidney injury (i.e., C1-C6). Module cluster
   10 was excluded from further analysis because although it was activated
   in kidney injury phenotypes P1-P2, it was also non-specifically
   activated by compounds not associated with kidney injury (i.e.,
   hepatotoxicants [C1] and statins [C4]). We further analyzed module
   clusters 7 and 11 to verify that they contained genes recognized in the
   literature to be associated with AKI. We calculated the enrichment of
   the AKI-associated genes identified in CTD and mapped them to the
   module clusters (Table [108]2). AKI-relevant genes, including Havcr1,
   Clu, and Tff3, mapped to module clusters 7 and 11 but not to other
   clusters. Thus, the AKI-relevant gene sets that comprise these modules
   are activated in conditions producing kidney pathology, specific to
   kidney injury, and enriched in genes associated with AKI (see
   Additional file [109]6: Table S4 for the 679 AKI-relevant genes along
   with their log-ratio values in the three chemical exposures that
   produced the kidney necrosis phenotype (P1)). Taken together, these
   data suggest that both modules are relevant to AKI.

Fig. 3.

   Fig. 3
   [110]Open in a new tab

   a Heat map view of modules clustered based on the module overlap score.
   b Activation of module clusters (MC1-16) for different phenotypes
   (P1-2) and chemical classes (C1-6). P1, chemical exposures that cause
   kidney-cortex, tubule, necrosis; P2, chemical exposures that cause
   kidney-tubule, regeneration; C1, chemical exposure known to cause
   hepatotoxicity; C2, fluoroquinolone antibiotics; C3, epithelial growth
   factor receptor kinase inhibitors; C4, estrogen receptor modulators;
   C5, high dose of statin drugs; C6, high dose of anti-lipidemic drugs
   (fibrates)

Table 1.

   Phenotypes and chemical exposure classes used in the analysis of module
   cluster activation
   Name                        Class
   P1^a Kidney-cortex, tubule, necrosis
   P2^a Kidney-tubule, regeneration
   C1   Hepatotoxicants
   C2   Fluoroquinolone antibiotics
   C3   Epithelial growth factor receptor kinase inhibitors
   C4   Estrogen receptor modulators
   C5   Statins
   C6   Fibrates
   [111]Open in a new tab

   ^aP1 and P2 were defined based on histopathology data

Table 2.

   Enrichment of known acute kidney injury (AKI)-relevant genes in module
   clusters
   Module cluster No. of genes in module cluster No. of AKI genes p-value
   AKI-relevant genes
   1 184 0 1.00 -
   2 164 1 0.67 CYP2D6
   3 217 7 0.0008 ALB, AMBP, CYP2C19, CYP2C9, CYP2D6, FABP1, GSTP1
   4 181 0 1.00 -
   5 70 0 1.00 -
   6 286 1 0.88 PPARG
   7 273 7 0.002 B2M, CD44, GPNMB, G6PD, GSTP1, HAVCR1, TFF3
   8 197 1 0.78 HEXB
   9 320 0 1.00 -
   10 136 0 1.00 -
   11 517 10 0.002 A2M, CLU, CD44, GPNMB, GAS6, HAVCR1, LCN2, SPP1,
   TNFRSF12A, TFF3
   12 217 0 1.00 -
   13 509 2 0.89 AMN, OCLN
   14 509 5 0.34 A2M, EPO, GSTM2, LCN2, SPP1
   15 379 0 1.00 -
   16 181 0 1.00 -
   [112]Open in a new tab

   To our best knowledge, this study is the first module-based analysis of
   a large public repository of kidney toxicogenomic data. Our
   unsupervised approach is an advantage in that, unlike standard
   differential gene expression analysis, co-expression modules are not
   associated a priori with any injury phenotype. The modules we generated
   represent a compendium of gene co-expression patterns in rat kidney
   tissue after exposure to diverse toxicants (Additional file [113]4:
   Table S2). We used disease-relevant genes post hoc to prioritize
   modules rather than following previously published methods that use
   known disease-relevant genes to guide the co-expression analysis a
   priori [[114]56]. Our module activation calculations and enrichment
   analyses did not influence the gene/condition composition of the
   modules because they were performed after we generated the modules. As
   such, the module genes, i.e., genes co-expressed with known
   AKI-relevant genes, can be hypothesized to participate in biological
   processes/disease mechanisms relevant to kidney-specific injuries. When
   we analyzed the chemical exposures that define the modules in
   AKI-relevant module clusters, the results included well-known
   nephrotoxic chemicals/drugs (e.g., cisplatin, lead-II-acetate,
   calcitriol, cholecalciferol, netilmicin, vancomycin, and oxaliplatin;
   Additional file [115]7: Table S5) [[116]5]. We used the AKI-relevant
   gene set to carry out two separate analyses: 1) identification of gene
   signatures by using a supervised classification approach; and 2)
   identification of the pathways and networks associated with AKI.

Generation of predictive gene signature for AKI

   We used the AKI-relevant gene set along with the random forest approach
   to generate gene signatures that could predict the future onset of AKI.
   Although the nephrotoxicants in our training data did not show any
   kidney injury at 3 to 5 days of exposure, they are known to produce
   kidney injury after 28 days of exposure [[117]29]. Thus, the task is to
   use the data at early time points to predict the probability of injury
   before the injury actually occurs. We used the random forest approach
   to predict the probability of later injury on the basis of the data at
   early time points. The random forest approach is an ensemble-based
   decision-tree model that has been widely used in the analysis of omics
   data [[118]57–[119]60]. The main advantages of this approach are its
   ability to 1) handle problems with small sample sizes and a large
   number of variables and 2) provide a measure of variable importance
   [[120]57]. In the random forest approach, each tree is generated with a
   random subset of the data and a prediction is made for the left-out
   data that are not used in the tree generation. This out-of-bag (OOB)
   testing provides an error estimate and is equivalent to
   cross-validation analysis [[121]60]. Our random forest approach
   identified the top 30 genes that provide the best classification
   accuracy with the variable importance measure (i.e., the mean decrease
   in accuracy; see Additional file [122]8: Table S6).

   Next, we used these 30 genes to develop a classification model. The
   final model had a sensitivity of 86 %, specificity of 93 %, AUC of
   0.91, and OOB error estimate of 9.1 % (Fig. [123]4). This performance
   estimate is overly optimistic, because the 30-gene signature was
   evaluated by using the same set as that used in gene prioritization
   [[124]43]. We compared the performance of this signature with known
   biomarkers by developing separate models, using just the Havcr1 or Clu
   gene. In ROC curve comparisons, the 30-gene signature model had better
   predictive potential than models that used only the Havcr1 or Clu gene
   (Fig. [125]4). Importantly, our gene signature included both genes
   previously associated with kidney injury (e.g., Havcr1) and potentially
   new biomarkers of kidney injury (e.g., Irf6). The model with Havcr1
   alone had a sensitivity of 57 %, specificity of 80 %, AUC of 0.64, and
   OOB error estimate of 27.3 %. The model with Clu alone had a
   sensitivity of 50 %, specificity of 66.7 %, AUC of 0.59, and OOB error
   estimate of 38.7 %. The 30-gene signature model had a sensitivity of
   83 %, specificity of 75 %, and accuracy of 79 % in classifying an
   independent external dataset of 38 chemical exposures with 3 days of
   exposure duration (see Additional file [126]9: Table S7). Thus, overall
   the model shows good performance in predicting the future onset of
   kidney injury. We observed a decrease in performance at later time
   points after the injury was well-established (see Additional file
   [127]9: Table S7). Overall, our results are comparable to those of a
   prior study reporting that a 35-gene signature predicted kidney injury
   by using a different array platform. However, 13 of 35 of the genes in
   that study could not be identified and were labeled only as EST
   [[128]29].

Fig. 4.

   Fig. 4
   [129]Open in a new tab

   Receiver Operator Characteristics for the 30-gene signature, Havcr1,
   and Clu. The model uses early (1–5 days) transcription data to predict
   the future onset of kidney injury (at 28 days). The true positive rate
   is the rate of true predictions divided by true predictions and false
   positives; the false positive rate is the rate of false predictions
   divided by the false predictions and false negatives. The diagonal line
   indicates random predictions

   Besides Havcr1, the literature includes reports of associations between
   the genes in our signature and AKI. Guca2a, a gene that codes for
   guanylate cyclase 2a, was upregulated in gentamicin-induced kidney
   toxicity [[130]61]. The protein encoded by gene Ugt2b7 [uridine
   diphosphate glucuronosyltransferase (UGT)], one of the two most
   abundantly expressed renal UGTs, is known to play a significant role in
   glucuronidation of drugs and endogenous mediators associated with
   inflammation [[131]62]. The gene Ly96 (lymphocyte antigen 96) encodes a
   protein that interacts with toll-like receptor 4 (TLR4), a key mediator
   in nephrotoxicity. Tlr4 -/- mice are less prone to ischemic kidney
   injury than are wild-type littermates [[132]63, [133]64]. TLRs induce
   the expression of PVR (also known as CD155), another gene present in
   our gene signature [[134]65]. TLR4 signaling can lead to activation of
   interferon regulatory factors (IRFs) [[135]65]. IRF1 promotes
   inflammation after ischemic AKI [[136]66]. Furthermore, the Map4k4 gene
   in our signature encodes a kinase involved in the TNF-signaling
   pathway, which is also known to play a role in kidney injury [[137]67].
   Irf6 in our signature is functionally related to TLR4 signaling but has
   not previously been associated with AKI; it may represent a novel
   biomarker of kidney injury. Thus, the literature provides supporting
   evidence that all of the genes in our signature are relevant to
   predicting kidney injury.

Functional enrichment analysis

   We identified the pathways associated with the AKI-relevant gene set,
   using KEGG and Reactome pathway enrichment analysis (Tables [138]3 and
   [139]4, respectively). The enriched KEGG pathways of ECM-receptor
   interactions [[140]68, [141]69], cell adhesion, and focal adhesion
   pathways represent known disease processes of AKI associated with loss
   of attachment to the basement membrane, changes in actin cytoskeletal
   structure, and loss of cell-cell contacts due to redistribution of cell
   adhesion molecules and integrins [[142]70]. The p53 signaling pathway
   plays a key role in cisplatin-induced kidney damage [[143]71]. The bile
   secretion pathway mapped many transporters associated with kidney
   injury (e.g., Slc21a4 and Abcc2 [[144]72]). Glutathione depletion is
   consistent with kidney injury [[145]73], and the retinoic
   acid-inducible gene-1 (RIG-1) signaling pathway modulates immune and
   inflammatory responses in renal diseases [[146]74].

Table 3.

   KEGG pathway enrichment for the acute kidney injury (AKI)-relevant gene
   set
                     Pathway                    Count p-value^a
   ECM^b-receptor interaction                   14    9.10^−5
   Glutathione metabolism                       11    4.10^−4
   Cell adhesion molecules                      16    0.012
   Bile secretion                               10    0.012
   Cytosolic DNA-sensing pathway                7     0.026
   Cysteine and methionine metabolism           6     0.026
   p53 signaling pathway                        9     0.026
   RIG-I^c-like receptor signaling pathway      8     0.029
   Toxoplasmosis                                12    0.039
   Amoebiasis                                   10    0.039
   Focal adhesion                               15    0.046
   Metabolism of xenobiotics by cytochrome P450 8     0.046
   [147]Open in a new tab

   ^a p-value after Benjamini-Hochberg multi-test correction,
   ^bExtracellular matrix

   ^cRetinoic acid-inducible gene-1

Table 4.

   Reactome pathway enrichment for the acute kidney injury (AKI)-relevant
   gene set
                   Pathway                 Count p-value^a
   Extracellular matrix organization       21      0.001
   Degradation of the extracellular matrix 11      0.014
   Cell junction organization              9       0.025
   [148]Open in a new tab

   ^a p-value after Benjamini-Hochberg multi-test correction

   We mapped the individual genes in the AKI-relevant gene set to the KEGG
   pathways listed in Table [149]3 (Fig. [150]5). Individual genes
   reported in AKI are evident in this network (e.g., Spp1, Cd44, Gstp1,
   Gstm1, Icam1, Mdm2, Abcc2, and Ccl5 [[151]75–[152]78]). Of note,
   several genes associated with programmed cell death in AKI [[153]5]
   mapped to related pathways: the apoptosis-related genes Fas, Casp3, and
   Casp8 mapped to the p53 signaling pathway; Ripk3, a key mediator of
   necroptosis, mapped to the cytosolic DNA-sensing pathway; and Tnfrsf1a,
   a gene associated with inflammation and apoptosis, mapped to the
   toxoplasmosis pathway. Similarly, the enriched GO-BP terms for the
   AKI-relevant gene set identified AKI-related biological processes, such
   as the immune response, glutathione metabolism, and cell killing
   (Table [154]5). These results suggest that co-expression module-based
   analysis offers richer information than conventional differential gene
   expression analysis. Pathway enrichment analysis of the differentially
   expressed genes in chemical exposures that produce kidney injury (P1
   and P2) could only identify a limited number of pathways and missed
   pathways related to ECM receptor interaction, cell adhesion, the
   RIG-1-like receptor signaling pathway, and toxoplasmosis (see
   Additional file [155]10: Table S8). Moreover, the number of genes that
   mapped to AKI-relevant pathways was higher in co-expression
   module-based analysis than in differential gene expression analysis.

Fig. 5.

   Fig. 5
   [156]Open in a new tab

   Genes in the acute kidney injury (AKI)-relevant gene set mapped to the
   enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The
   color of the gene nodes indicates the log[2]-fold change and the
   connecting lines represent membership in the KEGG pathways (P1-P12)
   listed in Table [157]3

Table 5.

   Gene Ontology (GO)-biological process (BP) term enrichment for acute
   kidney injury (AKI)-relevant gene set^a
                          Pathway                         log[10] p-value
   Immune response                                             −5.01
   Defense response                                            −4.47
   Response to virus                                           −4.02
   Immune system process                                       −3.57
   Response to external stimulus                               −3.57
   Response to stimulus                                        −3.57
   Response to biotic stimulus                                 −3.50
   Multi-organism process                                      −3.35
   Response to cytokine                                        −3.32
   Regulation of viral process                                 −3.17
   Response to interferon-beta                                 −2.89
   Response to organic substance                               −2.79
   Viral genome replication                                    −2.74
   Single-organism process                                     −2.74
   Response to chemical                                        −2.67
   Regulation of symbiosis                                     −2.48
   Cellular response to interferon-beta                        −2.41
   Response to interferon-alpha                                −2.39
   Glutathione metabolic process                               −2.89
   Nucleobase-containing small molecule metabolic process      −2.08
   Positive regulation of cell-killing                         −2.43
   Cell-killing                                                −2.12
   [158]Open in a new tab

   ^aEnriched GO-BP terms were clustered and summarized by using REVIGO
   webserver [[159]45]

PPI network analysis

   We carried out integrated analysis of gene expression data with PPI
   networks to generate disease-specific networks and gain new insights.
   We mapped the AKI-relevant gene set to the human high-confidence PPI
   network and extracted the connected component as a sub-network (AKI-SN)
   with 158 nodes and 196 edges, organized by cellular location
   (Fig. [160]6). We performed statistical significance tests based on
   random sampling and permutation to confirm that AKI-SN could not have
   been generated by random chance (Additional file [161]11: Figure S1).
   We calculated the statistical (or topological) properties of degree and
   betweenness centrality for all 158 proteins in AKI-SN (Additional file
   [162]12: Table S9), and ranked the proteins in the network according to
   these two properties (Additional file [163]13: Table S10). In network
   analysis, hub proteins represent proteins with a large degree (i.e.,
   with the highest number of connections) and are hypothesized to play a
   key role in the network. Betweenness centrality (also known as traffic)
   represents the capacity of a protein node to facilitate interactions
   among members of the network, and is used to identify “communication
   hotspots” in the network [[164]51]. The hub nodes with more than 5
   connections were, in order of decreasing connectivity, protein products
   of the following genes: Isg15, Fn1, Anxa7, Actn1, Casp8, Ar, A2m,
   Casp3, Stat3, Lck, Cdkn1a, Vim, Ccr1, and Flna (see orange stars in
   Fig. [165]6). Many of these proteins are involved in AKI through a
   number of different biological processes, including the induction of
   apoptosis through Casp8 and Casp3 for nephrotoxic drugs [[166]79],
   obstruction of tubular lumen by Fn1 [[167]80], and dysregulation of
   cytosolic calcium levels linked to Anxa7 in connection with acute
   tubular necrosis [[168]81–[169]83]. The topmost hub node (n = 14
   connections in the AKI-SN) was Isg15 (interferon-stimulated gene 15),
   associated with apoptosis [[170]84] and nephrotoxicant exposure in fish
   orthologs [[171]85, [172]86]. Combined with this limited evidence in
   the literature, our network analysis predicts that Isg15 may be a key
   AKI-related gene. We ranked the AKI-SN proteins according to
   betweenness centrality and identified non-hub nodes such as Clu, Cd44,
   and Gsn (see the green stars in Fig. [173]6). The protein products of
   Clu and Gsn (gelsolin) are well-documented biomarkers of AKI [[174]80,
   [175]87]. The protein product of Cd44 is a cell-surface glycoprotein
   receptor and acts as an adhesion molecule; Cd44 and its ligand Spp1
   (osteopontin) are upregulated after ischemic kidney injury [[176]88].
   Our network analysis identified this node as a critical component of
   the AKI-SN despite the limited number of connections. In the AKI-SN, we
   captured the interaction between Cd44 and Spp1 (both upregulated). In
   addition to hub and high-traffic nodes, other AKI-associated genes
   mapped to AKI-SN (e.g., Lgals3, Mdm2, S100A11, S100A10, SDC1, and A2M
   [[177]80, [178]89, [179]90] and the chemokine genes Ccl2, Ccl5, and
   Ccl23) [[180]91]).

Fig. 6.

   Fig. 6
   [181]Open in a new tab

   Acute kidney injury (AKI)-relevant human protein-protein interaction
   sub-network. The protein nodes are distributed according to cellular
   localization. The size of the node represents the number of connections
   in the sub-network. The nodes are colored according to the average
   log[2] fold-change ratio in chemical exposures that cause kidney
   necrosis. Proteins encoded by genes with average log[2] fold-change
   ratios greater than 0.6 are shown in red. Proteins encoded by genes
   with average log[2] fold-change ratios between 0.6 and −0.6 are shown
   in grey. Proteins encoded by genes with average log[2] fold-change
   ratios less than −0.6 are shown in green. Orange stars denote hub
   proteins with >5 connections, green stars denote non-hub proteins with
   a high betweenness centrality (>0.09). The red star and dotted circle
   identify the highest interconnected region of the network associated
   with the immunoproteasome

   Next, we identified the highly interconnected regions in AKI-SN
   (Additional file [182]14: Figure S2). The most interconnected regions
   in disease-specific networks have been shown to capture
   disease-associated candidate genes [[183]92]. The most interconnected
   region in AKI-SN contained the protein products of genes Psmb8 (LMP7),
   Psmb9 (LMP2), Psmb10 (MECL1), and Tap1, which are involved in antigen
   processing and presentation (see the red star and circle in
   Fig. [184]6). The proteins of Psmb8, Psmb9, and Psmb10 form the
   immunoproteasome—an alternative form of the normal constitutive
   proteasome—which is induced by interferon (IFN)-γ and TNF-α [[185]93].
   Immunoproteasomes are more efficient than constitutive proteasomes at
   eliminating damaged cellular proteins under severe stress or other
   pathological conditions, and are considered as promising drug targets
   in certain cancers and auto-immune diseases [[186]93, [187]94].
   Interferon regulatory factor-1 (IRF1) is a transcription factor that
   regulates the IFN-γ-mediated upregulation of immunoproteasome sub-units
   [[188]95]. The protein product of the gene Irf1 mapped to the AKI-SN
   (located in the nucleus region of Fig. [189]6), and was upregulated
   together with the immunoproteasome sub-units present in the network. As
   immunoproteasomes are promising drug targets, we further evaluated the
   expression of immunoproteasome sub-units across all 220 chemical
   exposures. We found that immunoproteasomes were upregulated by
   confirmed nephrotoxicants (e.g., lead-II-acetate, lead-IV-acetate,
   netilmicin, cisplatin, vancomycin, cholecalciferol, neomycin,
   gentamicin, and 2-amino-nitro-phenol; Additional file [190]15: Table
   S11). Analysis of chemical exposures associated with downregulation of
   immunoproteasome gene expression showed an interesting class effect:
   Anticancer drugs such as methotrexate as well as anthracycline
   anticancer agents, such as doxorubicin, daunorubicin, and epirubicin,
   were associated with diminished expression of genes related to
   immunoproteasomes.

   The AKI-SN also contained regulatory elements of immunoproteasomes
   related to the TLR and TNF signaling pathways. The Irf-class of
   transcription factors are master regulators of TLRs and RIG-1 signaling
   pathways [[191]96]. Genes of this class (e.g., Irf1, Irf6, and Irf9)
   mapped to the AKI-SN network in the nucleus. Components SPP1, LBP, and
   CCL5 of the TLR signaling pathway [[192]97], located in the
   extracellular region and plasma membrane, were also upregulated. The
   presence of the TNF signaling pathway [[193]67] in our network revealed
   a sequential connection between TNFRSF1A in the plasma membrane (marked
   with a blue star in Fig. [194]6) and BIRC3, RIPK3, and CASP3 in the
   cytosol. The bulk of these components showed strong upregulation,
   indicative of a strong and persistent immune and inflammatory response
   to the chemical insults. Our results are concordant with earlier
   studies that have documented potential immunoproteasome involvement in
   patients with IgA nephropathy [[195]98] and patients rejecting renal
   transplants [[196]99].

Anatomical location

   The Predictive Safety Testing Nephrotoxicity Working Group recently
   published criteria for a renal safety biomarker. One of the key
   characteristics is the ability of the biomarker to accurately localize
   damage to discrete anatomical units within the kidney architecture
   [[197]10]. A literature review determined which potential biomarkers in
   our AKI-SN of 158 genes have been mapped to specific anatomical
   locations in the kidney (Fig. [198]7; Additional file [199]16: Table
   S12). The genes Spp1, Lgals3, Fn1, CD44, and Bmp4 have been associated
   with the proximal tubular region of the kidney by microarray analysis
   and mouse gene knockout studies with experimentally induced acute
   kidney injury [[200]100, [201]101]. Genes associated with glomerular
   injury included Ccl2, Cdkn1a, Clu, Mmp14, Spp1, and Tnfrsf1a.
   [[202]102–[203]107] (see Additional file [204]16: Table S12 for
   complete results of the IPA gene-to-function literature analysis).

Fig. 7.

   Fig. 7
   [205]Open in a new tab

   Acute kidney injury-subnetwork genes associated with different
   anatomical regions of the kidney

Frequently co-expressed genes with Havcr1

   We used our module-based analysis to identify network constituents of
   the FDA-qualified pre-clinical biomarker of AKI, Havcr1 (commonly known
   as kidney injury molecule-1 or KIM-1). Modules were defined by using a
   variety of chemical exposures, and genes that co-expressed with one
   particular query gene under diverse chemical exposures helped to
   identify new genes that participate in the same pathway or biological
   process. Thus, we identified 33 genes that frequently co-expressed with
   Havcr1, i.e., the “Havcr1-co-expression gene set” (Fig. [206]8). Genes
   including Cd44, Anxa2, Anxa7, and Mdm2, which have been documented in
   AKI [[207]101], were upregulated. We first ascertained that the
   Havcr1-co-expression gene set could not be identified from shuffled,
   randomized data (see [208]Methods). We further analyzed the robustness
   of the gene set by leaving out either 5 % or 10 % of the chemical
   exposures and rerunning the analysis. In the leave-out 5 % analysis,
   Il24 dropped out of the gene set and A3galt was identified the least
   number of times. In the leave-out 10 % analysis, Cadps2 dropped out of
   the gene set, and Il24 and A3galt were identified most infrequently.
   Overall, our analysis showed that the set of 33 genes that frequently
   co-expressed with Havcr1 was significantly different from that expected
   by chance and robust with respect to its composition. Although most
   genes were unaffected, genes such as Il24, A3galt, and Cadps2 could be
   de-prioritized. Our literature review confirmed that many genes in the
   Havcr1-co-expression gene set are associated with kidney injury. Of
   particular note was the potential for cross-regulation between Cd44 and
   Havcr1. Cd44 is not detectable in normal kidneys, but is expressed in
   the proximal tubule after acute ischemic injury [[209]64]. Cd44 is a
   cell adhesion molecule involved in biological processes similar to
   Havcr1, including the ability to phagocytose apoptotic cells
   [[210]108], and its downregulation by NF-κB through the PI3K pathway.
   With Cd44, Macrod1 in the network is also an essential gene for NF-κB
   activation [[211]109]. Cd44 mediates both phagocytosis of apoptotic
   cells [[212]110] and anti-apoptotic signaling through the PI3K pathway
   [[213]111], including PKC activation and influx of extracellular
   calcium after proteolytic cleavage of its ectodomain [[214]112]. The
   evidence for possible cross-regulation between Cd44 and Havcr1,
   including analysis of cleaved ectodomain fragments of Cd44 as urinary
   marker of AKI, is a potentially testable hypothesis in future studies.

Fig. 8.

   Fig. 8
   [215]Open in a new tab

   Frequently co-expressed genes with Havcr1. The size of the node
   represents the number of times the gene was co-expressed with Havcr1 in
   the modules. The nodes are colored according to the average log[2]
   fold-change ratio in chemical exposures causing kidney necrosis.
   Proteins encoded by genes with average log[2] fold-change ratios
   greater than 0.6 are shown in red. Proteins encoded by genes with
   average log[2] fold-change ratios between 0.6 and −0.6 are shown in
   grey. Proteins encoded by genes with average log[2] fold-change ratios
   less than −0.6 are shown in green

   We used an external rat kidney ischemic injury dataset from GEO to
   compare expression levels for the Havcr1-co-expression gene set across
   data sets. The external rat kidney ischemic injury dataset correlated
   strongly with the Drug-Matrix fold-changes, showing Spearman
   correlation coefficients (r [s]) of 0.75 and 0.72 at 1 and 5 days after
   injury, respectively (Fig. [216]9).

Fig. 9.

   Fig. 9
   [217]Open in a new tab

   Scatterplots of log[2] fold-change ratio for the genes in the
   Havcr1-co-expression gene set from the DrugMatrix data (x-axis) and an
   external rat kidney ischemic injury data set ([218]GSE58438) (y-axis)
   at a) 1 day and b) 5 days after ischemic injury. The gene set is highly
   correlated in both data sets at both 1 and 5 days after ischemic
   injury, indicating a similar response to both chemically and
   non-chemically induced kidney injuries

Conclusions

   Publicly available toxicogenomic datasets are valuable resources that
   help in data mining and identifying networks, mechanisms, and
   biomarkers associated with a disease. In this analysis, we studied the
   rat kidney toxicogenomic data from DrugMatrix and identified
   co-expression modules associated with kidney injury. Our work provides
   a compendium of kidney co-expression modules and is the first such
   analysis of kidney toxicogenomic data. We used the co-expression
   modules to identify a 30-gene signature that predicted the future onset
   of kidney injury from early gene transcription data. Although some of
   the genes in our signature are associated with the mechanism of AKI, we
   also identified genes such as Irf6 as potentially novel candidates for
   AKI. Systems-level analyses identified pathways and networks associated
   with AKI. We identified AKI-relevant pathways, such as ECM receptor
   interaction and the RIG-1 signaling pathway, and showed that
   co-expression module-based approaches can identify additional
   information not obtainable by standard differential gene expression
   analysis. We identified an AKI-relevant protein interaction sub-network
   that mapped many known genes involved in AKI. Our network analysis
   revealed the involvement of immunoproteasomes in AKI and identified new
   genes, such as Isg15 and Anxa7, not previously associated with this
   disease. Finally, we used our co-expression modules to identify the
   frequently co-expressed genes with known biomarker Havcr1. Overall, our
   analyses show the potential utility of using co-expression modules in
   characterizing molecular mechanisms involved in AKI and identifying
   novel mechanism-based biomarker candidates.

Acknowledgements