Abstract

Importance

   Clinical decisions and immunosuppression dosing in kidney
   transplantation rely on transplant biopsy tissue histology even though
   histology has low specificity, sensitivity, and reproducibility for
   rejection diagnosis. The inclusion of stable allografts in mechanistic
   and clinical studies is vital to provide a normal, noninjured
   comparative group for all interrogative studies on understanding
   allograft injury.

Objective

   To refine the definition of a stable allograft as one that is
   clinically, histologically, and molecularly quiescent using publicly
   available transcriptomics data.

Design, Setting, and Participants

   In this prognostic study, the National Center for Biotechnology
   Information Gene Expression Omnibus was used to search for microarray
   gene expression data from kidney transplant tissues, resulting in 38
   studies from January 1, 2017, to December 31, 2018. The diagnostic
   annotations included 510 acute rejection (AR) samples, 1154
   histologically stable (hSTA) samples, and 609 normal samples. Raw
   fluorescence intensity data were downloaded and preprocessed followed
   by data set merging and batch correction.

Main Outcomes and Measures

   The primary measure was the area under the receiver operating
   characteristic curve from a set of feature-selected genes and cell
   types for distinguishing AR from normal kidney tissue.

Results

   Within the 28 data sets, the feature selection procedure identified a
   set of 6 genes (KLF4, CENPJ, KLF2, PPP1R15A, FOSB, TNFAIP3) (area under
   the curve [AUC], 0.98) and 5 immune cell types (CD4^+ T-cell central
   memory [Tcm], CD4^+ T-cell effector memory [Tem], CD8^+ Tem, natural
   killer [NK] cells, and Type 1 T helper [T[H]1] cells) (AUC, 0.92) that
   were combined into 1 composite Instability Score (InstaScore) (AUC,
   0.99). The InstaScore was applied to the hSTA samples: 626 of 1154
   (54%) were found to be immune quiescent and redefined as histologically
   and molecularly stable (hSTA/mSTA); 528 of 1154 (46%) were found to
   have molecular evidence of rejection (hSTA/mAR) and should not have
   been classified as stable allografts. Validation on an independent
   cohort of 6-month protocol biopsy samples in December 2019 showed
   that hSTA/mAR samples had a significant change in graft function
   (r = 0.52, P < .001) and graft loss at 5-year follow-up (r = 0.17). A
   drop of 10 mL/min/1.73 m^2 in estimated glomerular filtration rate was
   estimated as the threshold for an allograft transitioning from
   hSTA/mSTA to hSTA/mAR.

Conclusions and Relevance

   The results of this prognostic study suggest that the InstaScore could
   provide an important adjunct for comprehensive and highly quantitative
   phenotyping of protocol kidney transplant biopsy samples and could be
   integrated into clinical care for accurate estimation of subsequent
   patient clinical outcomes.

Introduction

   Breakthroughs in surgical approaches and development of newer
   generations of immunosuppressive drugs have resulted in reduction of
   clinical allograft acute rejection (AR) and improvements in life
   expectancy and quality of life for kidney transplant recipients.^[31]1
   Nevertheless, there remains a burden of subclinical AR that is present
   only at a molecular level, is not associated with an alteration in
   graft function, and is often not accompanied by changes in graft
   histology.^[32]2,[33]3,[34]4,[35]5,[36]6,[37]7,[38]8 In addition, the
   significant discrepancies (19%-55%) among pathologists for histologic
   phenotyping^[39]9,[40]10 result in a lack of consistency in
   interpreting an allograft as rejected,^[41]11,[42]12 not rejected, or
   stable,^[43]7,[44]9,[45]10,[46]13 thereby introducing bias in the
   interrogative mechanistic studies on allograft pathology. Furthermore,
   there is a failure to uncover the molecular biologic diversity in the
   histologic definition of a stable allograft. This bias is further
   amplified during interrogation of kidney transplant biopsy samples
   across different pathologists and investigators in public data sets.

   In this study, we have aggregated, to our knowledge, the largest public
   data set for human kidney transplantation to date: 2273 kidney tissue
   microarray samples from 28 publicly available normal and transplant
   kidney tissue data sets^[47]14 in Gene Expression Omnibus,^[48]15 a
   public genomics data repository, to investigate the molecular diversity
   of stable allografts.^[49]16,[50]17,[51]18,[52]19 We proposed that for
   accurate definition of a stable allograft, the sample must be
   associated with (1) stable clinical function, (2) normal kidney
   histology without AR (histologically stable [hSTA]), and (3) absence of a
   transcriptional signature of AR (molecularly stable [mSTA]).
   Recognizing the previously discussed variabilities in allograft
   histology interpretation, we expected that some of the labeled stable
   samples in these data sets (that only use the first 2 criteria listed
   above) would have inherent molecular variability. Our analysis has
   resulted in the generation of a histology-independent composite gene
   and cell-specific computational Instability Score (the InstaScore) to
   discern molecular rejection in hSTA allografts, classifying clinically
   and histologically stable samples either as truly stable, ie,
   histologically and molecularly stable (hSTA/mSTA), or as falsely
   stable, ie, with molecular rejection (hSTA/mAR). Thus, our prognostic study
   proposes an approach to recognize immunologic heterogeneity in hSTA
   kidney allografts.

Methods

Data Collection

   For this prognostic study, we carried out a comprehensive search for
   publicly available microarray data at the National Center for
   Biotechnology Information Gene Expression Omnibus database^[53]15 for
   kidney transplant biopsy samples from January 1, 2017, to December 31,
   2018. Any public, deidentified data available as open access were not
   subject to local institutional review board requirements or patient
   consent as allowed under the Common Rule. For any private data used, we
   obtained the approval of the institutional review board of the
   University of California, San Francisco, and written informed consent
   from all patients. After stringent data quality control procedures
   (eMethods in the [54]Supplement), the final data set consisted of 28
   studies with 2273 samples. Their diagnostic annotations included 510 AR
   samples (including antibody-mediated rejection, T-cell–mediated
   rejection, AR, AR with chronic allograft nephropathy, borderline
   rejection, borderline rejection and chronic allograft nephropathy, and
   mixed rejection), 1154 stable samples, and 609 normal samples (ie,
   biopsy conducted before organ transplant). The summary for the
   collected studies is represented in eTable 1 in the [55]Supplement.
   This study adhered to the Preferred Reporting Items for Systematic
   Reviews and Meta-analyses ([56]PRISMA), Standards for Reporting of
   Diagnostic Accuracy ([57]STARD), and Transparent Reporting of a
   Multivariable Prediction Model for Individual Prognosis or Diagnosis
   ([58]TRIPOD) reporting guidelines.

Data Processing and Normalization

   Raw fluorescence intensity data were downloaded and preprocessed
   depending on the microarray platform. The data processing included
   background correction, log2 transformation, quantile normalization, and
   probe to gene mapping using R language, version 3.5.1^[59]20 (R
   Foundation) (eMethods and eFigure 1A in the [60]Supplement). To perform
   a meta-analysis, we merged all the studies and corrected for potential
   batch effects using the ComBat^[61]21 approach (eFigure 2 in the
   [62]Supplement); other batch correction approaches were also evaluated
   (eMethods in the [63]Supplement).
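
   As an illustration only, the log2 transformation and quantile
   normalization steps named above can be sketched in a few lines. This
   is a minimal Python sketch, not the R pipeline used in the study; the
   function names, the genes × samples list layout, and the offset of 1
   (to avoid taking the logarithm of zero) are assumptions made for the
   example.

```python
import math


def log2_transform(matrix, offset=1.0):
    """Log2-transform a genes x samples matrix (offset is an assumed guard against log2(0))."""
    return [[math.log2(value + offset) for value in row] for row in matrix]


def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution of values."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across all samples.
    sorted_columns = [sorted(column) for column in columns]
    reference = [
        sum(column[k] for column in sorted_columns) / n_samples
        for k in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, column in enumerate(columns):
        # Rank genes within the sample; ties are broken by gene order (a simplification).
        order = sorted(range(n_genes), key=lambda g: column[g])
        for rank, g in enumerate(order):
            normalized[g][s] = reference[rank]
    return normalized
```

   After normalization, each sample contains exactly the same set of
   values, so differences between samples reflect only the ranking of
   genes within each sample.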

Statistical Analysis

   To identify differentially expressed genes, we used the Significance
   Analysis of Microarrays,^[64]22 as implemented in the siggenes
   package.^[65]23 We controlled the false discovery rate^[66]24 with the
   Benjamini-Hochberg^[67]25 method for multiple testing correction
   (P < .05 and fold change [FC] > 1.5).
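
   For readers unfamiliar with the Benjamini-Hochberg adjustment, the
   following minimal Python sketch shows the general procedure combined
   with a fold change filter. It is not the siggenes implementation; the
   function names are hypothetical, and treating downregulated genes
   symmetrically (FC below 1/1.5) is an assumption, since the text states
   only FC > 1.5.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(p_values, fold_changes, alpha=0.05, min_fc=1.5):
    """Indices of genes passing both the adjusted P value and fold change cutoffs."""
    adjusted = benjamini_hochberg(p_values)
    return [
        i
        for i, fc in enumerate(fold_changes)
        # Assumption: a downregulated gene passes when its FC is below 1 / min_fc.
        if adjusted[i] < alpha and (fc > min_fc or fc < 1.0 / min_fc)
    ]
```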

Pathway Analysis

   We leveraged the Gene Ontology database using the gene set enrichment
   analysis with clusterProfiler^[68]26 to perform functional annotations
   for the significantly upregulated and downregulated genes with a false
   discovery rate less than 0.05. For the gene network analysis, we used
   the STRING protein-protein association networks database.^[69]27

Cell Type Enrichment Analysis

   To estimate the presence of certain cell types in biopsy samples, we
   used the cell type enrichment tool xCell.^[70]28 xCell leverages gene
   expression data from microarray or RNA-sequencing experiments to estimate
   the presence of up to 64 immune and stromal cell types in a mixture. We
   focused on 34 immune-related and 11 nonimmune cell types (eTable 3 in
   the [71]Supplement) that were manually selected as relevant to the
   transplant injury process. The enrichment scores for each cell type
   were used to compare AR and normal samples by performing the
   nonparametric 2-sample Mann-Whitney-Wilcoxon statistical test. The P
   values were adjusted using the Benjamini-Hochberg method (P < .05).
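
   The 2-sample comparison of enrichment scores can be illustrated with
   a minimal Python sketch of the Mann-Whitney U statistic. The sketch
   uses midranks for ties and a plain normal approximation for the
   2-sided P value, without tie or continuity correction; these
   simplifications and the function names are assumptions, and a
   statistical package would normally be used in practice.

```python
import math


def midranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U statistic of group_a, approximate 2-sided P value)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value
```

   Applied per cell type to the AR and normal enrichment scores, the
   resulting P values would then be adjusted across cell types.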

Feature Selection Procedure

   In order to select the most important features in distinguishing AR vs
   normal samples, first, the data were split into training and testing
   sets in the ratio 80:20. All feature selection steps were performed on
   the training set with benchmarking on the testing set. Among the
   significant features, we searched for features correlated with the
   outcome (r > 0.75 × max[r]). Next, we applied the recursive feature
   elimination technique with the random forest (RF) model using
   caret.^[72]29 We used a 5-fold cross-validation technique with 100
   repeats and benchmarked a model by computing the area under the
   receiver operating characteristic (AUROC) curve, and the results were
   reported with both AUROC and precision-recall area under the curve
   (AUCPR). To minimize possible bias from the random data split and to
   avoid model overfitting, a tolerance of 1% was introduced into the
   feature selection mechanism, ie, the algorithm chose the model with
   the smallest number of features whose performance was no worse than
   99% of that of the best model. A final set of selected features was
   benchmarked by
   applying the RF model to the testing set. The R package feseR^[73]30
   was adopted and modified for the implementation of the parallel
   computations.
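
   The correlation filter and the 1% tolerance rule can be made concrete
   with a minimal Python sketch. It is not the feseR or caret code; the
   function names are hypothetical, the use of the absolute value of r is
   an assumption (the text does not say how negative correlations were
   handled), and any AUROC values passed in are placeholders rather than
   study results.

```python
def correlation_filter(correlations, fraction=0.75):
    """Keep features whose |r| with the outcome exceeds fraction x the largest |r|."""
    cutoff = fraction * max(abs(r) for r in correlations.values())
    return [name for name, r in correlations.items() if abs(r) > cutoff]


def pick_with_tolerance(auroc_by_size, tolerance=0.01):
    """Smallest feature count whose AUROC is at least (1 - tolerance) x the best AUROC."""
    best = max(auroc_by_size.values())
    eligible = [
        size
        for size, auroc in auroc_by_size.items()
        if auroc >= (1.0 - tolerance) * best
    ]
    return min(eligible)


# Hypothetical cross-validated AUROC values keyed by number of features.
example_curve = {20: 0.985, 10: 0.984, 6: 0.980, 3: 0.950}
chosen_size = pick_with_tolerance(example_curve)
```

   With the placeholder values above, the rule prefers the 6-feature
   model over the 20-feature model because its AUROC falls within 1% of
   the best value.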

Instability Score and hSTA Subphenotyping

   The method of subphenotyping hSTA samples was based on the features
   selected in the AR vs normal analysis, which were then used to score
   the hSTA samples. The hSTA samples were then identified as mAR or mSTA. We
   denoted this split as hSTA/mAR and hSTA/mSTA, respectively.

   Based on gene expression and cell type enrichment data, the feature
   selection procedure was performed to find sets of genes and cell types
   highly associated with AR. Next, with Z-scaled features, we built a
   logistic regression model and, using model coefficients, created a
   linear score function, the InstaScore:
   InstaScore = 0.596
      + 2.096 × KLF4 + 2.534 × CENPJ + 0.311 × KLF2
      + 1.447 × PPP1R15A − 1.633 × FOSB + 0.268 × TNFAIP3
      + 2.249 × natural killer (NK) cells
      + 0.542 × CD4^+ T-cell central memory (T[cm]) cells
      + 0.833 × CD4^+ T-cell effector memory (T[em]) cells
      + 0.709 × CD8^+ T[em] cells
      + 0.146 × Type 1 T helper (T[H]1) cells

   Therefore, positive InstaScore values identify AR samples, whereas
   normal samples obtain negative values (eFigure 1B in the
   [74]Supplement). Using this definition, the InstaScores were computed
   for the hSTA samples, and a threshold of zero was applied to split
   them into mAR and mSTA subtypes (eFigure 1C in the
   [75]Supplement). All the code has been uploaded to GitHub.^[76]31
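
   The scoring and thresholding step can be written out directly from
   the coefficients reported above. The following is a minimal Python
   sketch, not the published code; the dictionary keys are labels chosen
   for this example, the inputs are assumed to be already Z-scaled, and
   assigning a score of exactly zero to mSTA is an assumption because the
   text does not specify that case.

```python
INTERCEPT = 0.596

# Coefficients as reported for the InstaScore; keys are illustrative labels.
COEFFICIENTS = {
    "KLF4": 2.096,
    "CENPJ": 2.534,
    "KLF2": 0.311,
    "PPP1R15A": 1.447,
    "FOSB": -1.633,
    "TNFAIP3": 0.268,
    "NK cells": 2.249,
    "CD4+ Tcm": 0.542,
    "CD4+ Tem": 0.833,
    "CD8+ Tem": 0.709,
    "Th1 cells": 0.146,
}


def insta_score(z_features):
    """Linear score from Z-scaled gene expression and cell type enrichment values."""
    return INTERCEPT + sum(
        coefficient * z_features[name] for name, coefficient in COEFFICIENTS.items()
    )


def classify_hsta(z_features):
    """Split a histologically stable sample by thresholding the score at zero."""
    return "hSTA/mAR" if insta_score(z_features) > 0 else "hSTA/mSTA"
```

   For example, a sample whose 11 Z-scaled features are all −1 scores
   0.596 − 9.502 = −8.906 and would be labeled hSTA/mSTA.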

Results

   From the total 28 data sets, the feature selection procedure identified
   a set of 6 genes (KLF4, CENPJ, KLF2, PPP1R15A, FOSB, and TNFAIP3) (AUC,
   0.98) and 5 immune cell types (CD4^+ Tcm, CD4^+ Tem, CD8^+ Tem, NK
   cells, and T[H]1 cells) (AUC, 0.92) that were combined into 1 composite
   InstaScore (AUC, 0.99). We leveraged all currently publicly available
   kidney biopsy microarray data (eFigure 1A in the [77]Supplement) from
   28 studies with 2273 samples and performed a feature selection
   procedure based on the RF algorithm to identify a subset of genes and
   cell types that best distinguish AR from normal kidney samples. These
   features were combined into the InstaScore (eFigure 1B in the
   [78]Supplement) to reclassify all annotated stable samples (hSTA) by
   their similarity to either the molecular rejection signature
   (hSTA/mAR) or molecular quiescence (hSTA/mSTA) (eFigure 1C in the
   [79]Supplement). The clinical validity and prediction performance of
   the InstaScore were demonstrated on independent data, in which falsely
   classified stable samples (hSTA/mAR) showed significant projected
   differences in graft function and survival compared with the true
   stable samples (hSTA/mSTA).

Differential Gene Expression Analysis for Upregulation of Immune-Related
Pathways in Rejection

   We performed differential gene expression analysis comparing AR with
   normal samples and identified 1509 significantly differentially
   expressed genes including 848 upregulated and 661 downregulated genes
   (eTable 2 in the [80]Supplement). Hierarchical clustering of the
   significant genes based on the Ward clustering technique was then
   performed, and a significant separation between classes was
   found^[81]32 (1119 samples, 1509 genes; P < .001) ([82]Figure 1A).
   Additionally, the principal component analysis and Uniform Manifold
   Approximation and Projection dimensionality reduction confirmed the
   class separation (eFigure 3 in the [83]Supplement). The functional
   annotation of the significant genes found that upregulated genes were
   enriched in the regulation of the immune response, cell aggregation and
   activation, and innate immunity (eFigure 4A in the [84]Supplement). The
   downregulated genes were enriched in metabolic processes (eFigure 4C in
   the [85]Supplement). The network analysis showed connectivity between
   the sets of genes (eFigure 4B and 4D in the [86]Supplement).

Figure 1. Heat Map Plots for Differentially Expressed Genes and Significantly
Enriched Cell Types.

   A, Heat map clustering plot for significant genes from Significance
   Analysis of Microarrays (SAM) of acute rejection (AR) vs normal
   samples. B, Heat map clustering plots for significant cell types from
   the nonparametric Wilcoxon statistical test (Benjamini-Hochberg,
   P < .05) in the analysis of AR vs normal samples. aDC indicates
   activated dendritic cell; cDC, conventional dendritic cell; DC,
   dendritic cell; FC, fold change; HSC, hematopoietic stem cell; M1, M1
   macrophage; MSC, mesenchymal stem cell; NK, natural killer; NKG, X;
   pDC, plasmacytoid dendritic cell; Tcm, T-cell central memory; Tem,
   T-cell effector memory; Tgd, γδ T cell; Th1, Type 1 T helper cell; Th2,
   Type 2 T helper cell; T regs, regulatory T cell.

Cell Type Enrichment Analysis for Immune Cell Types Associated With AR

   To highlight the biologic heterogeneity and to capture signals from
   infiltrating cell type–specific outcomes in injured and stable kidney
   transplants, we performed cell type enrichment analysis. We leveraged
   xCell^[88]28 to focus on 45 cell types (eTable 3 in the [89]Supplement)
   that are relevant for organ transplants. We found 25 cell types (mostly
   lymphoid and myeloid cells) that were significantly (Wilcoxon test,
   Benjamini-Hochberg; 1119 samples; P < .05) enriched in AR and 12 cell
   types (immune, stromal cells, and hematopoietic stem cells) that were
   enriched in normal kidneys ([90]Figure 1B). As seen on the heat map,
   the hierarchical clustering revealed 2 main AR subclusters (510
   samples; P < .001): one was mostly enriched in lymphocytes, NK cells,
   and macrophages, and the other had minimal lymphocyte activation and
   may have represented temporal differences in rejection evolution or
   recovery. We observed that B cells, dendritic cells, macrophages, and T
   cells formed cell type–specific subclusters that suggested the
   coordinated activation of immune cells in the kidney tissues. These
   results are in agreement with previous observations^[91]33 that have
   shown AR subphenotypical splits by gene expression and cell type.
   Unsupervised clustering of hSTA along with AR and normal samples
   exposed their heterogeneity, hinting that some hSTA samples have a
   molecular signal closer to that of AR samples (eFigure 5 in the
   [92]Supplement).

Machine Learning Feature Selection Procedure to Optimize AR Classification

   Following the feature selection procedure (eMethods in the
   [93]Supplement), we dramatically decreased the number of model features
   from all 1509 differentially expressed genes (1) to only 6 pivotal
   upregulated genes (KLF4, CENPJ, KLF2, PPP1R15A, FOSB, and TNFAIP3;
   AUROC, 0.98; AUCPR, 0.99) (eFigures 6A and 7A in the [94]Supplement);
   (2) to genes enriched as zinc finger proteins and expressed mostly in
   CD33^+ myeloid cells; and (3) to 5 cell types from the original set of
   37 significantly enriched cell types: CD4^+ Tcm, CD4^+ Tem, CD8^+ Tem,
   NK cells, and T[H]1 cells, with CD4^+ Tcm having the largest effect size
   in this model (AUROC, 0.92; AUCPR, 0.88) (eFigures 6B and 7B in the
   [95]Supplement).

   The feature selected cell types showed a predominant role for
   infiltration and activation of effector T cells and NK cells in AR, and
   the feature selected genes appeared to have broad cellular functions in
   AR, triggered by mononuclear activation and infiltration and
   collectively driving a variety of functions, such as DNA recognition,
   RNA packaging, transcriptional activation, and regulation of apoptosis.
   Interestingly, although the set of 11 genes in the common rejection
   module previously identified from a cross-organ (kidney, heart, liver,
   and lung) meta-analysis study of transplant rejection^[96]16 was
   enriched in this current analysis, none of them made it to this final
   minimal feature selection set. This finding suggests that the current
   6-gene set might be more specific for the absence of AR in the kidney
   allograft, as the precise definition of a hSTA/mSTA allograft was not
   available in the earlier analysis.

   A generated RF classification model for these 6 genes and 5 cell types,
   internally validated using 5-fold cross-validation with 100 repeats,
   obtained an AUROC of 0.98 (sensitivity, 0.94; specificity, 0.94) for
   the genes alone and an AUROC of 0.92 (sensitivity, 0.85; specificity,
   0.88) for the cell types for identification of a tissue sample with
   histologically confirmed AR ([97]Figure 2A). We further combined the
   feature selected genes and cell types into 1 score value, called the
   InstaScore (eMethods in the [98]Supplement), and were able to perform
   the split into AR and normal samples with a slightly improved AUROC of
   0.99, with sensitivity of 0.95 and specificity of 0.94 ([99]Figures 2B
   and [100]2C), and an AUCPR of 0.99 (eFigure 7C in the [101]Supplement).
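
   The performance measures reported here can be illustrated with a
   minimal Python sketch that computes the AUROC from pairwise score
   comparisons and the sensitivity and specificity at a fixed threshold.
   It is not the caret code used in the study; the function names are
   hypothetical, and labels are assumed to be coded as 1 for AR and 0 for
   normal.

```python
def auroc(scores, labels):
    """Probability that a random AR sample outscores a random normal sample (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, threshold=0.0):
    """Sensitivity and specificity when scores above the threshold are called AR."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

   The pairwise AUROC is quadratic in the number of samples, which is
   acceptable for a data set of this size (510 AR and 609 normal
   samples); a rank-based formula would scale better for larger cohorts.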

Figure 2. Feature Selected Genes and Cell Types and the Instability Score as
Their Combination.

   A, Hierarchical clusterings of acute rejection (AR) and normal samples.
   B, Combined selected features with AR and normal samples. C,
   Instability Score plot for AR and normal samples. NK indicates natural
   killer; Tcm, T-cell central memory; Tem, T-cell effector memory; T[H]1,
   Type 1 T helper cell.

Selected Features to Create a Scoring Function to Carry Out Precision
Subphenotyping of Stable Samples

   We then applied the InstaScore to the 1154 transplant samples that were
   identified by pathologists in each of the data sets as hSTA,
   classifying samples as more similar to normal kidneys or as more
   similar to the rejected kidney allograft group (mAR); hSTA/mSTA
   identified samples with molecular and histologic evidence of no
   rejection, and hSTA/mAR identified histologically stable allografts
   with transcriptional evidence of ongoing molecular rejection. The
   InstaScore identified 528 hSTA grafts (46%) in this study as having mAR
   ([103]Figure 3A), a misclassification rate in line with previously
   reported discrepancies in transplant phenotyping across different
   pathologists.^[104]9,[105]10

Figure 3. Plots of Acute Rejection (AR), Subphenotyped Histologically Stable
(hSTA), and Normal Samples Based on Instability Score Results.

   A, Instability Score plots. B, Heat map of AR and normal samples. C,
   Uniform Manifold Approximation and Projection (UMAP) plot of AR and
   normal samples. mAR indicates molecular rejection; mSTA, molecularly
   stable; NK, natural killer; Tcm, T-cell central memory; Tem, T-cell
   effector memory; Th1, Type 1 T helper cell.

   We represented the scores for each sample as a scatterplot in
   [107]Figures 2C and [108]3A. The InstaScore was able to significantly
   distinguish AR and normal samples (1119 samples; P < .001; [109]Figure
   2C) and distinguish hSTA/mAR and hSTA/mSTA samples (1154 samples;
   P < .001; [110]Figure 3A) by thresholding with zero. The hSTA/mAR
   samples clustered with AR and separately from hSTA/mSTA samples and had
   intermediate scores between normal and AR samples ([111]Figure 3B and
   [112]3C).

Validation of hSTA Subphenotyping Using Clinical Follow-up Data

   In order to assess the functional relevance of the InstaScore by gene
   expression and cell types, we explored its clinical use in an
   independent microarray data set from 67 unique patients with hSTA
   grafts (stable clinical graft function, no donor-specific antibody, and
   no AR) from a randomized clinical trial^[113]34 with transcriptional
   data on serial protocol kidney transplant biopsy samples at 0, 3, 6,
   12, and 24 months^[114]35,[115]36 and with longitudinal functional
   outcomes up to 5 years after initial engraftment. We tested the
   association of the locked InstaScore with the change in estimated
   glomerular filtration rate (eGFR) and with graft loss events over this
   time period. For the cell type infiltration and activation model, the
   correlation with delta eGFR was significant ([116]Figure 4A)
   (r = 0.52; P < .001), whereas the correlation with graft loss events
   was weak and not statistically significant (r = 0.17; P = .26).
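
   A correlation of this kind can be computed as in the following
   minimal Python sketch. The text does not state which correlation
   coefficient was used, so the choice of the Pearson coefficient here is
   an assumption, as are the function and variable names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired values, eg, InstaScore and delta eGFR."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)
```

   The sign of r depends on how delta eGFR is defined: if it is computed
   as follow-up minus baseline, higher scores paired with larger declines
   yield a negative coefficient.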

Figure 4. Validation Plots on the Independent Clinical Data.

   A, Change in estimated glomerular filtration rate (eGFR) after biopsy
   by Instability Score (InstaScore) (r = −0.52, P < .001). B, Change in
   eGFR distributions for predicted histologically stable (hSTA)
   subpopulations by InstaScore (P < .001).

   Using the predicted hSTA subphenotypes, we estimated a delta eGFR
   separating threshold of −10 mL/min/1.73 m^2 at 5-year follow-up
   ([118]Figure 4B, P < .001). Given these results, it appears that the
   InstaScore on the
   6-month protocol biopsy samples could differentiate patients more
   likely to have progressive graft injury and decline in graft function
   over time, even though the 6-month biopsy histology findings, serum
   creatinine, or donor-specific antibodies cannot provide the same
   discriminatory information.

Discussion

   Tissue histology is indispensable for the diagnosis of allograft
   pathology, but its recognized limitations have resulted in the
   incorporation of data inputs from transcriptional and proteomic
   studies. Here, we present, to our knowledge, the first unsupervised
   transcriptional and cell-state framework to map and rephenotype human
   kidney allografts with undiagnosed graft dysfunction. Unlike other
   published studies, by others^[119]37,[120]38,[121]39,[122]40,[123]41
   and by members of our group,^[124]2,[125]33,[126]42,[127]43 that have
   only focused on general transcriptional perturbations in rejection, the
   present study is, to our knowledge, the first development and
   validation of an approach that leverages the statistical power of a
   large public transcriptional data set. Along with the cell type
   enrichment analysis using xCell,^[128]28 we used logistic regression to
   build the InstaScore. By doing so, we reclassified kidney transplant
   biopsy samples, otherwise described as hSTA, into samples that have no
   molecular injury (hSTA/mSTA) and those that are most likely incorrectly
   annotated as stable but have molecular injury similar to AR (hSTA/mAR).

   Approximately half (46%) of the biopsy samples were wrongly annotated
   as stable and reclassified as hSTA/mAR by the InstaScore; these samples
   were found to be scattered across multiple data sets, supporting that
   their presence is not due to failure of histologic characterization at
   any particular transplant program, and highlighting the failure of
   histology to detect relevant molecular inflammation. The InstaScore was
   independently validated for functional relevance,^[129]35 as it
   identified hSTA/mAR 6-month protocol biopsy samples that had a higher
   risk of progressive graft injury and failure at 5 years posttransplant.

   The 6 feature selected genes, KLF4, CENPJ, KLF2, PPP1R15A, FOSB, and
   TNFAIP3, in the InstaScore are biologically relevant in the immune
   response and activation and innate immunity. KLF2, KLF4, and TNFAIP3
   regulate kidney injury.^[130]44 KLF2 is vasoprotective, and KLF4 is
   renoprotective; both genes are highly expressed in the
   endothelium^[131]45,[132]46,[133]47 and are associated with endothelial
   ischemia reperfusion injury in AR.^[134]48 TNFAIP3 has antiapoptotic
   and anti-inflammatory functions and expression in endothelial, myeloid,
   and infiltrating T cells, which results in adverse clinical outcomes in
   AR.^[135]49,[136]50,[137]51 CENPJ functions as a transcriptional
   coactivator in STAT 5 signaling and tumor necrosis factor–induced
   NF-κB–mediated transcription,^[138]52,[139]53 both of which are central
   regulators of inflammation. The phosphatase PPP1R15A is only expressed
   in stressed cells and negatively regulates acute kidney injury via type
   1 interferon;^[140]54 clonal expansion; and memory T-cell, plasma cell
   differentiation, and enhanced B-cell responses.^[141]55 FOSB expression
   is associated with the progression of kidney disease.^[142]56 Thus, all
   InstaScore genes are crucial for endothelial cell integrity and T-cell
   activation and have functional relevance to the kidney and
   rejection.^[143]57,[144]58

   The 5 feature selected cell types, CD4^+ Tcm, CD4^+ Tem, CD8^+ Tem, NK,
   and T[H]1, also relate to rejection biology,^[145]59 together with
   macrophages, NK cells, and B
   cells.^[146]60,[147]61,[148]62,[149]63,[150]64 In the immunologic
   response to the allograft, T cells terminally differentiate and divide
   into Tcm cells and CD8^+ and CD4^+ Tem cells,^[151]65 which produce
   interferon-γ, IL-4, and IL-5 and cytotoxic molecules like granzyme,
   granulysin, and perforin.^[152]66,[153]67 CD4^+ Tcm had the largest
   effect size in the InstaScore, likely because Tcm cells are
   characterized by slow effector function and reactive memory and
   increased response to repeat antigenic stimulation.^[154]68 In the
   hSTA/mAR grafts, these cells are primed to differentiate into Tem with
   low levels of antigen recognition, such as with varying exposure to
   baseline immunosuppression.^[155]69 Hence, identification of the
   hSTA/mAR phenotype in an otherwise clinically and histologically stable
   allograft may be of critical importance to triage allografts at greater
   risk of accelerated temporal immune injury.

Limitations

   Given the design of the study, there are a few inherent limitations.
   First, the publicly available data provided limited access to clinical
   and demographic reports, which could be valuable if incorporated into
   the InstaScore. Second, batch effects had to be controlled for, and we
   compared multiple normalization methods for this purpose. Third, we
   chose the RF model to capture possible nonlinear feature interactions
   to identify the best feature set, although other more complex (eg,
   neural nets) or less complex (eg, elastic net) methods could also be
   considered. Fourth, the study is based on bulk microarray data; more
   precise measurement techniques, eg, single-cell RNA sequencing, might
   be used to better capture finer changes in gene expression and cell
   composition, provide additional validation of the results and, in a
   future study, refine the InstaScore. This may give the InstaScore the
   ability to recognize different types of rejection, which can be
   identified at the molecular level long before they can be detected by
   histology.

Conclusions

   This prognostic study leverages supervised machine learning on the
   largest bulk transcriptional human kidney and kidney transplant data
   set to improve kidney allograft sample phenotyping beyond the
   capabilities of tissue histology alone. In this study, the InstaScore
   revealed a level of biologic diversity within the classification of a
   stable graft not shown by histology alone; based on these findings, the
   InstaScore may provide an immune map to help refine our understanding
   of diverse graft functional states. The InstaScore provides a new tool
   to apply polymerase chain reaction–based analysis of the minimal gene
   set to formalin-fixed, paraffin-embedded kidney allograft biopsy
   samples to identify hSTA/mAR grafts at greater risk for subsequent
   overt rejection and allograft damage. These patients may benefit from
   proactive immunosuppression adjustments to reduce molecular
   inflammation, preserve allograft function, and improve allograft
   survival.

   Supplement.

   eMethods

   eFigure 1. Flow Chart

   eFigure 2. Scatterplots of Gene Expression Data After Data Sets Merging

   eFigure 3. PCA Clustering Plot for Differentially Expressed Genes From
   Analysis of AR vs Normals

   eFigure 4. Pathway Enrichment Analysis of DE Genes

   eFigure 5. Heatmap of Enrichment Scores of Significant Cell Types From
   the AR vs Normal Comparison

   eFigure 6. Plots of Feature Selected Genes and Cell Types for all AR
   and Normal Samples

   eFigure 7. AUROC and AUCPR Plots of Feature Selected Genes, Cell Types
   and InstaScore

   eFigure 8. Combined Benchmark Based on P-Value, Delta Statistic and the
   Percentage of Variability for Batch Correction Methods Tested

   eTable 1. Datasets Collected From Gene Expression Omnibus (GEO)

   eTable 2a. Upregulated Differentially Expressed Genes From SAM Analysis
   of Comparison of Acute Rejection to Normal Kidney Tissues

   eTable 2b. Downregulated Differentially Expressed Genes From SAM
   Analysis of Comparison of Acute Rejection to Normal Kidney Tissues

   eTable 3. Cell Types Considered in Cell Type Enrichment Analysis With
   xCell

   eReferences