Abstract
Sepsis-associated acute kidney injury (SA-AKI) is a life-threatening
complication of sepsis, characterized by high mortality and prolonged
hospitalization. Early diagnosis and effective therapy remain difficult
despite extensive investigation. To address this, we developed an
AI-driven integrative framework that combines a Transformer-based deep
learning model with established machine learning techniques (LASSO,
SVM-RFE, Random Forest and neural networks) to uncover complex,
nonlinear interactions among gene-expression biomarkers. Analysis of
normalized microarray data from GEO ([36]GSE95233 and [37]GSE69063)
identified differentially expressed genes (DEGs), and KEGG/GO
enrichment via clusterProfiler revealed key pathways in immune
response, protein synthesis, and antigen presentation. By integrating
multiple transcriptomic cohorts, we pinpointed 617 SA-AKI-associated
DEGs—21 of which overlapped between sepsis and AKI datasets. Our
Transformer-based classifier ranked five genes (MYL12B, RPL10, PTBP1,
PPIA, and TOMM7) as top diagnostic markers, with AUC values ranging
from 0.9395 to 0.9996 (MYL12B yielding 0.9996). Drug–gene interaction
mining using DGIdb (FDR < 0.05) nominated 19 candidate therapeutics for
SA-AKI. Together, these findings demonstrate that melding deep learning
with classical machine learning not only sharpens early SA-AKI
detection but also systematically uncovers actionable drug targets,
laying groundwork for precision intervention in critical care settings.
Keywords: sepsis, acute kidney injury, transformer, machine learning,
diagnostic model
1. Introduction
Sepsis-associated acute kidney injury (SA-AKI) is a critical
complication of sepsis, carrying substantial mortality risk and
frequently requiring prolonged intensive care [[38]1,[39]2]. Sepsis,
defined as life-threatening organ dysfunction caused by a dysregulated
host response to infection, triggers both inflammatory and
immunosuppressive pathways that contribute to multi-organ failure
[[40]3]. The Kidney Disease: Improving Global Outcomes (KDIGO) criteria define acute kidney
injury (AKI) based on increases in serum creatinine or decreases in
urine output [[41]4]. Specifically, Stage 1 AKI is characterized by an
increase in serum creatinine to 1.5–1.9 times baseline or an increase
of ≥0.3 mg/dL (≥26.5 µmol/L), or a reduction in urine output to <0.5
mL/kg/h for 6–12 h [[42]5]. Despite these criteria, SA-AKI diagnosis
remains challenging due to heterogeneous pathophysiology and delayed
biomarker specificity. Identifying novel biomarkers and integrating
them with machine-learning-driven analytics (e.g., electronic health
records or proteomic/transcriptomic data) could enable automated
diagnosis, prognostication, and personalized therapeutic targeting.
Recent advances in biomarker discovery have led to the identification
of novel candidates, including neutrophil gelatinase-associated
lipocalin (NGAL), kidney injury molecule-1 (KIM-1), and the combination
of tissue inhibitor of metalloproteinases-2 (TIMP-2) and insulin-like
growth factor-binding protein 7 (IGFBP7). For example, NGAL and
cystatin C demonstrate utility in early risk stratification, with
cystatin C (cutoff ≥15.1 mg/L) showing exceptional accuracy in
predicting persistent SA-AKI (AUROC 0.977) [[43]6]. Importantly,
emerging evidence has demonstrated that the inflammatory response
induced by sepsis amplifies kidney damage via mechanisms like cytokine
activation [[44]7]. Pro-inflammatory cytokines such as IL-6 (>35 pg/mL)
and TNF-α (>20 pg/mL) not only amplify systemic inflammation but also
directly compromise glomerular integrity through NF-κB-mediated
endothelial activation, increasing renal vascular permeability. These
mediators orchestrate immunometabolic crosstalk, with IL-6 shown to
suppress renal tubular fatty acid oxidation while TNF-α upregulates
mitochondrial ROS generation, establishing a self-perpetuating cycle of
injury [[45]8,[46]9]. In recent years, the
development of precision medicine has offered new perspectives for
personalized treatment, where understanding the key genetic pathways
involved in SA-AKI is essential to developing more effective
therapeutic strategies with fewer side effects [[47]10]. Precision
medicine advances have revolutionized SA-AKI management by identifying
critical pathogenic pathways and biomarkers, enabling targeted
therapies.
Based on 403 screened studies focusing on AKI prediction models,
machine learning algorithms have been increasingly applied to SA-AKI,
with traditional logistic regression (LR) being the most commonly used
(44% of 25 studies), followed by extreme gradient boosting (XGBoost,
20%), support vector machines (SVMs), light gradient boosting
(LightGBM), recurrent neural network-long short-term memory (RNN-LSTM),
and categorical boosting (CatBoost)
[[48]11,[49]12,[50]13,[51]14,[52]15,[53]16]. These models primarily
focus on the early identification (60%), prognostic prediction (32%),
and subtype identification (8%) of SA-AKI. XGBoost demonstrated
superior performance in multiple studies, achieving AUROCs of
0.821–0.954 for early AKI detection and mortality prediction, while
RNN-LSTM showed exceptional accuracy (AUROC 1.000) in predicting acute
kidney disease progression [[54]17]. Key predictors included serum
creatinine, lactate, blood urea nitrogen, and diabetes mellitus, with
biomarker-enhanced models performing comparably to clinical
variable-based models [[55]18]. Despite these advancements, significant
limitations persist. Over 80% of studies were retrospective and
single-center, predominantly using databases like MIMIC-III/IV,
limiting generalizability [[56]13,[57]16]. While biomarker integration
(e.g., urinary peptides, gene signatures) could enhance
interpretability and pathophysiological relevance, only 16% of studies
incorporated such approaches [[58]19]. Model interpretability remained
challenging, with only 24% employing methods like SHAP or LIME to
address the “black box” nature of machine learning [[59]15].
Additionally, studies neglected long-term outcomes (e.g., CKD
transition) and non-ICU populations, while overreliance on late
biomarkers like serum creatinine hindered early detection
[[60]1,[61]20]. Future research should prioritize multicenter
prospective validation, biomarker-augmented interpretability
frameworks, and clinical integration to improve model robustness and
therapeutic targeting interventions for SA-AKI [[62]19,[63]20,[64]21].
Sepsis-associated acute kidney injury (SA-AKI) remains a critical
challenge in clinical settings, with limited early diagnostic tools and
therapeutic options despite its significant impact on patient outcomes.
In this study, we hypothesize that our machine learning models,
particularly those leveraging a Transformer-based deep learning
approach, can predict SA-AKI with greater accuracy by identifying key
gene biomarkers from complex gene-expression profiles. The objectives
of this research are threefold: (1) to develop an integrated AI-driven
framework combining Transformer-based models with classical machine
learning techniques to detect gene biomarkers associated with SA-AKI,
(2) to assess the diagnostic potential of these biomarkers through
rigorous feature selection and validation, and (3) to explore potential
therapeutic targets by analyzing gene–drug interactions, thereby paving
the way for precision interventions in SA-AKI management.
2. Materials and Methods
2.1. Diagram of Study Flow and Data Collection
Relevant datasets were selected from the NCBI Gene Expression Omnibus
(GEO, [65]https://www.ncbi.nlm.nih.gov/gds/) (accessed on 15 October
2024). Search keywords included “Sepsis” and “Homo sapiens”, and the
data type was set to “Expression profiling by array” to obtain
gene-expression data. After screening, the dataset [66]GSE95233 was
chosen as the primary dataset, and [67]GSE69063 was selected as the
validation dataset. The [68]GSE95233 dataset is based on the [69]GPL570
[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array and
includes 124 samples, consisting of 102 sepsis samples and 22 healthy
control samples. The [70]GSE69063 dataset includes four disease
phenotypes: allergic reaction, sepsis, trauma, and healthy controls. To
meet the research requirements, 57 samples from septic patients and 33
samples from healthy controls were extracted, totaling 90 samples. The
top 1000 genes highly related to acute kidney injury were selected from
the GTEx ([71]https://www.gtexportal.org/home/ (accessed on 15 October
2024)) [[72]22], Malacards ([73]https://www.malacards.org/ (accessed on
15 October 2024)) [[74]23], Phenopedia
([75]https://phgkb.cdc.gov/PHGKB/startPagePhenoPedia.action (accessed
on 15 October 2024)) [[76]24], and DisGeNET ([77]https://disgenet.com/
(accessed on 15 October 2024)) [[78]25] databases (accessed on 24
October 2024) and downloaded for further research analysis [[79]26].
The overall workflow of this study is shown in [80]Figure 1.
Figure 1.
The whole workflow of this study.
2.2. Sepsis and AKI-Related Gene Analysis
Gene-expression data were normalized and annotated. Considering that
multiple probes may map to the same gene symbol in the dataset, these
probes were converted into gene symbols based on the annotation files
of the platform. For probes mapping to the same gene symbol, the
average expression value of these probes was selected as the gene’s
expression level, thus obtaining the expression levels of each gene
across different samples.
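The probe-to-gene averaging described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline; the function name `collapse_probes`, the probe identifiers, and the expression values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expr, probe_to_symbol):
    """Average probe-level rows that map to the same gene symbol.

    probe_expr: dict probe_id -> list of expression values (one per sample)
    probe_to_symbol: dict probe_id -> gene symbol (unannotated probes absent)
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        symbol = probe_to_symbol.get(probe_id)
        if symbol:  # probes without an annotation are dropped
            grouped[symbol].append(values)
    # zip(*rows) walks the grouped rows sample by sample,
    # so each mean is taken across probes within one sample.
    return {
        symbol: [mean(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Toy values for illustration only (not taken from GSE95233).
probe_expr = {
    "probe_a": [8.0, 9.0],
    "probe_b": [10.0, 11.0],
    "probe_c": [5.0, 6.0],
    "probe_d": [1.0, 1.0],  # no annotation, will be skipped
}
probe_to_symbol = {"probe_a": "PPIA", "probe_b": "PPIA", "probe_c": "TOMM7"}
gene_expr = collapse_probes(probe_expr, probe_to_symbol)
```

Averaging is only one possible collapsing rule; it is used here because it is the rule stated in the text.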
Differential expression analysis was performed using the limma package
[[83]27] in R (version 2024.12.1+563). This method employs a linear
model to assess the expression differences of each gene between the
sepsis group and healthy control group. DEGs were defined as genes with
a p-value < 0.05 and a fold change (FC) > 1.2. Using these criteria, genes with
differential expression between the sepsis group and healthy control
group were identified.
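The selection rule can be illustrated with a short filter applied to a table of per-gene statistics. This sketch assumes that the statistics (log2 fold change and p-value) have already been computed, for example by limma, and that the FC threshold is applied symmetrically (FC > 1.2 for upregulation, FC < 1/1.2 for downregulation); both points are assumptions, and the gene names and numbers are placeholders.

```python
import math


def select_degs(results, p_cutoff=0.05, fc_cutoff=1.2):
    """Split genes into up- and downregulated DEGs.

    results: dict gene -> (log2 fold change, p-value)
    The fold-change cutoff is converted to the log2 scale and applied
    in both directions (an assumption, see the lead-in).
    """
    log2_cutoff = math.log2(fc_cutoff)
    up, down = [], []
    for gene, (log2_fc, p_value) in results.items():
        if p_value >= p_cutoff:
            continue  # not significant
        if log2_fc > log2_cutoff:
            up.append(gene)
        elif log2_fc < -log2_cutoff:
            down.append(gene)
    return up, down


# Placeholder statistics for illustration only.
toy_results = {
    "GENE_A": (0.9, 0.001),    # significant, upregulated
    "GENE_B": (-0.5, 0.01),    # significant, downregulated
    "GENE_C": (0.1, 0.0001),   # significant, but change too small
    "GENE_D": (1.5, 0.2),      # large change, not significant
}
up_genes, down_genes = select_degs(toy_results)
```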
The DEGs were intersected with AKI-associated genes from public
databases, and the overlapping genes were termed “disease genes”. These
shared genes indicate a molecular link between sepsis and acute kidney
injury.
2.3. Gene Function Analysis
Gene Ontology (GO) functional analysis, which covers the Cellular
Component (CC), Biological Process (BP), and Molecular Function (MF)
categories, is a bioinformatics tool used to annotate genes by their
functional attributes. KEGG pathway analysis is used to identify the
cellular pathways in which the differentially expressed genes (DEGs)
may be involved. GO and KEGG pathway
enrichment analyses of AKI and septic shock DEGs were conducted using
the R package clusterProfiler [[84]28]. A p-value of <0.05 was
considered statistically significant.
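Over-representation analysis of this kind is typically based on a one-sided hypergeometric test. The sketch below shows that calculation for a single term in plain Python; it is an illustration of the statistic, not a reimplementation of clusterProfiler, and the function name and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: number of genes in the background universe
    n_in_term:    background genes annotated to the GO/KEGG term
    n_selected:   size of the query gene list (e.g., the DEGs)
    n_overlap:    query genes annotated to the term
    """
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts: 10 background genes, 4 in the term, 3 selected, 2 overlapping.
p_toy = enrichment_p_value(10, 4, 3, 2)
```

In practice the p-values for all tested terms would also be adjusted for multiple testing; that step is omitted here.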
2.4. Diagnostic Signature Machine Learning Analysis
To identify key diagnostic genes associated with SA-AKI, machine
learning algorithms were employed to construct multiple classification
models and perform feature selection using the DEGs identified in
previous analyses. The disease-related genes identified in previous
analyses were used as features, and corresponding gene-expression data
were extracted for model construction. The dataset was randomly divided
into a training set (70%) and a test set (30%) to ensure reliable model
training and evaluation.
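A 70/30 split of this kind can be sketched as below. The text states only that the division was random; stratifying by class so that both subsets keep the sepsis/control ratio, and the fixed seed, are assumptions added for illustration.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), split separately within each class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Class sizes follow the GSE95233 description: 102 sepsis, 22 controls.
labels = ["sepsis"] * 102 + ["control"] * 22
train_idx, test_idx = stratified_split(labels)
```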
The glmnet R package [[85]29] was used to implement the LASSO (Least
Absolute Shrinkage and Selection Operator) algorithm for feature
selection. LASSO applies L1 regularization to the model parameters,
causing some coefficients to become zero, thus effectively selecting
the most predictive gene features.
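The way L1 regularization drives coefficients to exactly zero can be seen in a small coordinate-descent sketch. This is a simplified least-squares version with no intercept, minimizing (1/2n)·||y − Xβ||² + λ·||β||₁; it is not glmnet, and the study's actual model family and settings are not specified in the text. The data are toy values.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the least-squares LASSO (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Toy data: y depends only on the first feature (y = 2 * x0).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso_coordinate_descent(X, y, lam=0.5)
```

In this toy case the uninformative second feature receives a coefficient of exactly zero, while the informative one is shrunk from 2 to 1.5.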
The SVM-RFE (Support Vector Machine Recursive Feature Elimination)
algorithm was employed, using the SVM R package [[86]30] for feature
selection. This method recursively trains the SVM model, eliminating
less important features and retaining those that have a greater impact
on classification performance.
The randomForest R package [[87]31] was used to implement the Random
Forest (RF) algorithm. The RF algorithm constructs multiple decision
trees and combines their results through voting, effectively
identifying key gene features and providing an importance score for
each gene.
The nnet R package [[88]32] was applied to implement the artificial
neural network (NNET) algorithm. The NNET model uses a multi-layer
perceptron architecture to capture complex non-linear relationships
between gene features and the target variable.
2.5. Diagnostic Signature Transformer Analysis
Building upon the machine learning approaches, the study also utilized
the Transformer model to further enhance model performance and capture
higher-order feature interactions. Unlike traditional machine learning
algorithms, the Transformer model employs self-attention mechanisms to
learn complex relationships between features, enabling it to capture
dependencies that might be difficult to detect through conventional
methods. The Transformer model, implemented using the
transformer-pytorch library, was trained using the entire dataset,
which included continuous features. This approach provided a deeper
understanding of the genetic interactions involved in SA-AKI and
facilitated more accurate identification of important diagnostic
biomarkers.
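The self-attention operation referred to above can be illustrated with scaled dot-product attention over a handful of feature tokens. This sketch is plain Python rather than the transformer-pytorch implementation used in the study, and it omits the learned query, key, and value projections (the raw token vectors are used for all three), which is a simplification. The token values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values))
             for c in range(len(values[0]))]
        )
    return outputs, weights


# Three hypothetical gene tokens with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` shows how strongly one token attends to every other token, which is how pairwise feature dependencies enter the model.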
For each of the four machine learning models (LASSO, SVM-RFE, RF, and
NNET), the top 12 most important genes were selected. The intersection
of these top genes from all four methods was then determined. These
intersecting genes were subsequently input into the Transformer model,
which prioritized the five most important genes for SA-AKI diagnosis.
This comprehensive process led to the identification of a robust set of
key biomarkers, enhancing the accuracy of SA-AKI diagnosis.
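The consensus step (top-ranked genes per model, then their intersection) can be sketched as follows. The sketch assumes that a higher score means a more important gene; if importance is reported on a scale where smaller is better, the sort order would need to be reversed. The model names, gene names, and scores are placeholders, and k is reduced to 2 so the toy example stays small.

```python
def consensus_top_k(importance_by_model, k=12):
    """Intersect the top-k genes of every model.

    importance_by_model: dict model name -> dict gene -> importance score
    (higher score = more important, an assumption of this sketch).
    """
    top_sets = []
    for scores in importance_by_model.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_sets.append(set(ranked[:k]))
    return sorted(set.intersection(*top_sets))


# Placeholder importance scores for illustration only.
toy_importance = {
    "lasso": {"G1": 0.9, "G2": 0.5, "G3": 0.1},
    "svm_rfe": {"G1": 0.7, "G2": 0.2, "G3": 0.6},
    "rf": {"G1": 0.4, "G2": 0.1, "G3": 0.8},
    "nnet": {"G1": 0.8, "G2": 0.9, "G3": 0.1},
}
consensus = consensus_top_k(toy_importance, k=2)
```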
2.6. Model Construction and Evaluation
After selecting the key feature genes associated with SA-AKI, this
study proceeded to construct diagnostic models based on the selected
genes. These models aimed to distinguish between sepsis, acute kidney
injury (AKI), and healthy control groups using the expression data of
the selected genes.
To assess the diagnostic capability of the models, we calculated the
AUC (Area Under the Curve) for the model and each feature gene. The AUC
is a commonly used metric to evaluate the performance of binary
classification models, with values closer to 1 indicating better
diagnostic performance [[89]33]. The AUC values for each gene were
compared, and the accuracy and stability of the models were evaluated
based on the combined results. To validate the clinical applicability
of the constructed diagnostic models, we utilized the publicly
available dataset [90]GSE69063 as a validation set. This allowed us to
assess the performance of the models on an independent sample set. By
analyzing the prediction results from the validation set, we further
confirmed the model’s ability to distinguish between sepsis and healthy
controls, as well as between sepsis and AKI [[91]34,[92]35,[93]36].
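For reference, the AUC of a single marker or model score can be computed directly from its definition as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as one half). The following is a minimal sketch with toy scores, not the evaluation code used in the study.

```python
def auc_score(scores, labels):
    """AUC via pairwise comparison of positive (1) and negative (0) samples."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores: one positive sample (0.4) ranks below one negative sample (0.6).
toy_auc = auc_score([0.9, 0.4, 0.6, 0.3], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of the size used here.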
2.7. Drug Target Prediction
To identify potential drug targets, particularly for the treatment of
SA-AKI, this study retrieved drug–gene interaction information related
to the feature genes from the Drug Gene Interaction Database (DGIdb,
[94]https://www.dgidb.org/ (accessed on 15 October 2024)) [[95]37].
We extracted the key feature genes and performed drug–gene interaction
searches using the DGIdb database. DGIdb is a comprehensive resource
for drug–gene information, offering detailed data on known and
potential drugs, including drug–gene interactions, known targets, and
mechanisms of drug action. These drugs include those already approved
for other diseases as well as those showing potential efficacy in
preclinical studies. Through drug–gene interaction analysis, we
identified potential drugs targeting the key genes associated with
sepsis and acute kidney injury, providing new directions for clinical
treatment [[96]38].
3. Results
3.1. Analysis of Genes Associated with Sepsis and AKI
Through differentially expressed gene (DEG) analysis of the
[97]GSE95233 dataset, we first outlined the gene screening process
associated with both sepsis and AKI ([98]Figure 2A). UMAP (Uniform
Manifold Approximation and Projection) analysis was then used to
visualize the distribution of samples ([99]Figure 2B), revealing a
clear separation between sepsis and control groups without obvious
outliers. A total of 617 DEGs were identified, including 248
upregulated and 369 downregulated genes ([100]Figure 2C). The top 30
DEGs were further visualized using a heatmap to illustrate their
expression patterns across samples ([101]Figure 2D). Venn analysis
showed that 21 genes were shared between sepsis DEGs and key
AKI-related genes, defined as “disease genes” ([102]Figure 2E). The
expression profiles of these 21 disease genes between sepsis and
control groups were subsequently displayed in a heatmap ([103]Figure
2F).
Figure 2.
Analysis of genes associated with sepsis and AKI. (A) Identification
process of key genes in sepsis and AKI. (B) UMAP plot of the
[106]GSE95233 Dataset. The UMAP analysis displays the distribution of
sepsis (red circles) and control (blue circles) samples, with the
x-axis representing UMAP1 and the y-axis representing UMAP2. The clear
separation between the two groups indicates significant differences in
gene-expression profiles. (C) Distribution of differentially expressed
genes between the sepsis and control groups. The x-axis represents log2
fold change, and the y-axis represents −log10 FDR. Red dots indicate
upregulated genes (248), and green dots indicate downregulated genes
(369), with the threshold set at |log2FoldChange| > 1 and FDR < 0.05.
(D) Heatmap of the top 30 DEGs between the sepsis and control groups.
The x-axis represents samples (left: sepsis group; right: control
group), and the y-axis lists the gene names of the top 30 DEGs. Red
indicates upregulation, and blue indicates downregulation, with color
intensity reflecting the magnitude of expression changes. (E)
Overlapping genes between sepsis DEGs and key AKI genes, termed
“disease genes”. The Venn diagram shows the overlap between sepsis DEGs
(617 genes) and key AKI-related genes, identifying 21 shared genes
defined as “disease genes”. (F) Heatmap of the 21 disease genes between
the sepsis and control groups. The x-axis represents samples (left:
sepsis group; right: control group), and the y-axis lists the 21
disease genes. Red indicates upregulation, and blue indicates
downregulation, highlighting the expression differences between the two
groups.
3.2. Gene Function Analysis
KEGG analysis showed that DEGs in SA-AKI were mainly enriched in
pathways like “ribosome”, “antigen processing and presentation”, and
“proteasome”. Other pathways such as “oxidative amino acid metabolism”
and “toxoplasmosis” were also significantly enriched, suggesting
involvement in immune response, protein degradation, and metabolic
abnormalities ([107]Figure 3A). GO enrichment analysis revealed that
DEGs were enriched in “cytoplasmic translation”, “viral process”, and
“protein insertion into mitochondrial outer membrane” in biological
processes. In Molecular Function, genes were enriched in “structural
constituent of ribosome” and “T cell receptor binding”. Cellular
Component analysis showed enrichment in “outer mitochondrial membrane
transport complex” and “cytoplasmic ribosomal subunit” ([108]Figure
3B). Metascape network analysis classified 21 disease genes into
functions related to “Nop56p-related pre-rRNA complex”, “response to
virus”, “mitochondrial transport”, and “negative regulation of
apoptotic signaling”, indicating their role in immune response,
cellular stress, and cell death regulation ([109]Figure 3C). Cell type
signature analysis identified “adult kidney C9 thin ascending limb” and
“adult kidney C10 thin ascending limb” cells as key in kidney repair
and immune responses, with certain gut and lung cells also implicated
([110]Figure 3D).
Figure 3.
Enrichment analysis of disease genes. (A) KEGG pathway enrichment
analysis of disease genes. (B) Top 10 GO terms, including Biological
Process (BP), Cellular Component (CC), and Molecular Function (MF) of
disease genes. (C) Enrichment network of disease genes obtained using
the Metascape database, represented by cluster membership
relationships. (D) Summary of enrichment analysis in cell type
signatures.
3.3. Diagnostic Signature Identification Analysis
To identify diagnostic biomarkers for SA-AKI, four machine learning
algorithms were applied: LASSO, SVM-RFE, Random Forest (RF), and neural
network (NNET). The optimal λ parameter for LASSO regression was
determined using cross-validation
([113]Figure 4A). The performance of each algorithm during training was
assessed through the reverse cumulative distribution of residuals
([114]Figure 4B). Each algorithm selected the top 12 most informative
genes based on their importance scores, as measured by dropout loss
(where a smaller value indicates greater contribution to the model’s
predictive power). This cutoff of 12 genes was chosen because an
analysis of the dropout loss distribution across all models revealed a
significant decline in predictive value beyond the 12th gene
([115]Table S1). A sensitivity analysis, testing configurations of 10,
12, and 15 genes, further confirmed that 12 genes optimized model
performance, as assessed by the Area Under the Curve (AUC) on the
validation set [116]GSE69063 ([117]Figure S1). The genes shared by the
top-12 lists of all four methods were then identified ([118]Figure 4C).
These candidate genes were
subsequently used to construct a diagnostic model based on a
Transformer architecture, and SHAP (Shapley Additive Explanations)
analysis was performed to assess feature importance. Five genes with
the highest SHAP values were identified as key diagnostic markers
([119]Figure 4D). The expression levels of these five genes in sepsis
and control groups are shown in [120]Figure 4E, highlighting their
differential expression and diagnostic relevance.
Figure 4.
Analysis for identifying diagnostic signature. (A) Gene selection for
diagnostic model construction via LASSO regression. The x-axis
represents the λ value, and the y-axis represents cross-validation
error. (B) Reverse
cumulative distribution of residuals during the training process of the
four machine learning models. (C) Overlapping genes selected by LASSO
regression, SVM algorithm, random forest model and the artificial
neural network. (D) SHAP values for feature importance in Transformer
model for diagnosis. (E) Expression levels of the five selected feature
genes in the sepsis and control groups (**** indicates p < 0.0001).
3.4. Diagnostic Model Validation
The five feature genes showed good diagnostic value in distinguishing
sepsis from healthy samples ([123]Table S1). Our model reached an AUC of
1.0, and the individual genes reached AUCs of 0.9996 (MYL12B), 0.9902
(RPL10), 0.9890 (PTBP1), 0.9662 (PPIA), and 0.9395 (TOMM7)
([124]Figure 5A). A nomogram
was constructed based on the five feature genes, providing a visual
representation of the diagnostic model. This nomogram facilitates the
prediction of sepsis and AKI by summing the individual contributions of
each gene ([125]Figure 5B). A diagnostic model constructed based on
these five genes was validated using the independent dataset
[126]GSE69063. This external validation yielded an AUC of 0.973
([127]Figure 5C). The corresponding ROC curve confirmed the model’s
ability to accurately distinguish SA-AKI samples from healthy kidney
samples, supporting the potential clinical utility of the identified
biomarkers.
Figure 5.
Validation of the model for the diagnosis of sepsis and AKI. (A) ROC
curves of the diagnostic signal for each signature gene in the
[130]GSE95233 dataset. (B) Nomogram constructed based on the 8
feature genes. (C) ROC curves of diagnostic signals showing the
sensitivity and specificity of our models based on [131]GSE69063.
3.5. Drug–Gene Interaction Analysis
To explore potential drug–gene interactions relevant to SA-AKI, we
queried the DGIdb database using the 12 key genes identified by our
machine learning models as critical to SA-AKI pathogenesis. This
analysis identified 19 drugs with documented interactions with these
genes, based on interaction confidence scores (>0.7) and established
pharmacological data ([132]Figure 6). These drugs include various
immunosuppressants, antiviral agents, and some anticancer drugs,
suggesting that they may play significant roles in regulating immune
responses, inflammation, and apoptosis. After excluding the unapproved
drugs from the initial list, 11 drugs remained, which may serve as
potential treatments for sepsis and acute kidney injury ([133]Table
S3). Notably, some of these drugs, such as cyclosporine (associated
with the PPIA gene), are well-documented nephrotoxins capable of
causing AKI [[134]39,[135]40]. For example, cyclosporine interacts with
PPIA (cyclophilin A), a protein involved in inflammatory pathways
[[136]40], yet its nephrotoxic profile, characterized by acute renal
vasoconstriction and chronic structural damage [[137]41], precludes its
consideration as a therapeutic option for SA-AKI. Similarly,
fluorouracil, linked to PTBP1 (an RNA-binding protein), emerged due to
its regulatory effects on gene expression, though its clinical use is
primarily in oncology and it has been associated with nephrotoxicity
[[138]42,[139]43]. These associations, derived from computational
analysis using machine learning models, highlight the complexity of
drug–gene interactions, where such models can uncover connections
reflecting both potential biological relevance and adverse effects,
depending on the clinical context [[140]44,[141]45].
Figure 6.
Drug–gene interaction diagram. Blue circles represent signature genes,
and yellow circles represent drugs.
4. Discussion
This study introduces an innovative approach to identifying key
diagnostic biomarkers for SA-AKI by integrating gene-expression
analysis with advanced machine learning techniques, including the
application of Transformer-based models in conjunction with traditional
machine learning methods. Through a comprehensive multi-step process,
we successfully identified pivotal feature genes that differentiate
SA-AKI from healthy controls.
SA-AKI was interrogated via differential expression analysis, which
uncovered 617 DEGs—including 248 upregulated and 369 downregulated
genes. Subsequent Venn analysis identified 21 genes common to both
sepsis and acute kidney injury, positioning them as potential
biomarkers for SA-AKI. KEGG and GO pathway enrichment analyses further
underscored the key biological processes involved: KEGG analysis
demonstrated significant enrichment in immune response, protein
synthesis, and antigen presentation pathways, while GO analysis
revealed that these genes principally participate in immune response,
cellular translation, and viral processes—pathways intimately linked to
the pathogenesis of SA-AKI. To refine these biomarkers, an integrative
feature selection strategy employing four machine learning algorithms
(LASSO, SVM-RFE, RF, and NNET) was implemented, yielding eight
consensus genes. Further enhancing model performance, a
Transformer-based deep learning model was incorporated to capture
complex nonlinear interactions, ultimately prioritizing five key genes
as critical diagnostic biomarkers. The resultant diagnostic framework
exhibited excellent performance in the validation set, affirming its
reliability and predictive power. Moreover, drug–gene interaction
mining via DGIdb prioritized 19 therapeutic candidates, offering novel
avenues for intervention in SA-AKI.
The results of this study align with findings in the existing
literature while also offering new perspectives. Key genes we
identified, such as RPL10, MYL12B, and PPIA, have been shown to play
significant roles in the mechanisms associated with sepsis and AKI.
Specifically, RPL10, as part of the ribosome, plays a crucial role in
translation elongation and the maturation of ribosomal subunits
[[144]46], a function that is supported by our findings linking immune
response and protein synthesis pathways. Similarly, MYL12B is involved
in regulating dynamic changes in the cytoskeleton and influences the
cell division process by modulating the activity of myosin II
[[145]47], which correlates with the relationship between cell division
and immune regulation observed in our study. PPIA promotes inflammation
and cell signaling, playing a role in kidney injury [[146]40], a
finding supported by this study, suggesting its potential as a
therapeutic target for sepsis and AKI.
In this study, we introduce a novel AI-driven framework that
synergistically integrates Transformer models with machine learning
algorithms (LASSO, SVM-RFE, Random Forest, and NNET) to identify
diagnostic biomarkers and therapeutic targets for SA-AKI. This
innovative methodology not only offers fresh insights into disease
mechanisms but also combines differential gene-expression analysis,
KEGG/GO pathway enrichment, machine-learning-based feature selection,
and drug–gene interaction analysis to provide a comprehensive view of
SA-AKI. The identification of five key genes and 19 potential
therapeutic candidates furnishes valuable references for early