Abstract

   Sepsis represents a significant global health challenge, necessitating
   early detection and effective treatment for improved outcomes. While
   traditional inflammatory markers facilitate the diagnosis of sepsis,
   the aspect of immune suppression remains poorly addressed. This study
   aimed to identify critical immune-related genes (IIRGs) associated with
   sepsis through genomic analysis and machine learning techniques,
   thereby enhancing diagnostic and treatment response predictions.
   Analyses of two extensive datasets were conducted, identifying
   significant immune genes using the ESTIMATE algorithm, Weighted Gene
   Correlation Network Analysis (WGCNA), and five machine learning
   methods. Prediction models were constructed and validated using six
   machine learning algorithms, achieving high accuracy (AUC > 0.75).
   Eleven key IIRGs were identified as active in immune pathways, such as
   the JAK-STAT signaling pathway, and were significantly correlated with
   immune cell infiltration in sepsis. Additionally, drug sensitivity
   analysis indicated that IIRGs correlated with responses to anticancer
   drugs. These results underscore the potential of these genes in
   enhancing sepsis diagnosis and treatment, highlighting the imperative
   for further validation across diverse populations.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-025-93010-8.

   Keywords: Sepsis, Immune-related genes, Machine learning, Diagnostic
   framework, Therapeutic targets

   Subject terms: Computational biology and bioinformatics, Genetics,
   Biomarkers, Diseases

Introduction

   Sepsis, a life-threatening condition triggered by a dysregulated immune
   response to infection, continues to pose a significan global health
   challenge due to its high prevalence and mortality rates. A 2020 Lancet
   study titled “Global, regional, and national sepsis incidence and
   mortality, 1990–2017: analysis for the Global Burden of Disease Study”
   reported approximately 48.9 million cases and 11 million related deaths
   worldwide in 2017^[30]1. These alarming figures significantly exceed
   previous estimates, underscoring the urgent need to address sepsis as a
   primary cause of death and a critical public health issue. Rapid and
   accurate diagnosis, coupled with effective treatment strategies, are
   crucial for enhancing patient survival.

   Sepsis arises from a chaotic immune response involving both innate and
   adaptive systems, leading to excessive immune activation, widespread
   inflammation, and tissue damage that may result in organ failure.
   Current research focuses on the identification of biomarkers to enhance
   the early diagnosis and prognosis of sepsis^[31]2–[32]4. Commonly
   investigated biomarkers include C-reactive protein (CRP), Procalcitonin
   (PCT), Interleukin-6 (IL-6), Tumor Necrosis Factor-α (TNF-α), IL-1β,
   soluble Triggering Receptor Expressed on Myeloid cells-1 (sTREM-1),
   endothelial markers, and MicroRNA. Although these biomarkers show
   promise, they lack the accuracy required for dependable clinical use.
   Further research and refinement are necessary to develop markers that
   provide both sensitivity and specificity for sepsis
   diagnosis^[33]4,[34]5.

   As medical technology evolves, machine learning (ML) models^[35]6,[36]7
   and related technologies are becoming pivotal tools for disease
   detection and prediction. Jiang et al.^[37]8 identified diagnostic
   genes and molecular mechanisms of Alzheimer’s disease using ML
   algorithms. Wang et al.^[38]9 applied deep learning to predict the
   association between circRNA and diseases, while other
   researchers^[39]10,[40]11 have utilized artificial intelligence to
   identify tumor-related biomarkers. These data-driven algorithms process
   vast quantities of clinical and biological data, detecting complex
   patterns that may elude human experts^[41]12. ML models have proven to
   offer earlier and more accurate detection of sepsis than traditional
   methods, facilitating prompt interventions and potentially enhancing
   patient outcomes^[42]13. Notably, algorithms such as Random Forest (RF)
   and eXtreme Gradient Boosting (XGBoost) have shown exceptional efficacy
   in predicting sepsis onset. One model, SepsisFinder, has demonstrated
   the ability to detect sepsis earlier than conventional models such as
   NEWS2 and GBDT under comparable sensitivity settings^[43]14,[44]15. ML
   models are also being explored for their capability to predict
   sepsis-related mortality, with a systematic review and
   meta-analysis^[45]16 confirming their significant potential in this
   domain. For instance, a focused meta-analysis in ICU settings revealed
   that RF and XGBoost were particularly effective in predicting sepsis
   outcomes. These findings emphasize the crucial role of ML in advancing
   sepsis diagnosis and treatment.

   Our study aims to extend these developments by integrating diverse ML
   algorithms with extensive patient datasets, to further enhance the
   accuracy and reliability of sepsis prediction. We are particularly
   interested in IIRGs and are developing multi-classifier ML models to
   improve diagnostic precision. In our preprint (DOI:
   10.21203/rs.3.rs-4306022/v1), we explore the use of ML-derived
   biomarkers from immune sources and examine the role of immune cells in
   sepsis pathogenesis. By assessing the significance of each gene, we aim
   to elucidate their mechanisms and associations with drug
   responsiveness. This approach is designed to advance early diagnosis,
   tailor treatment strategies, and ultimately improve patient care in
   sepsis management.

Materials and methods

Data download

   Utilizing the R package GEOquery (version 2.68.0)^[46]17, expression
   data were retrieved from the GEO database for datasets [47]GSE154918
   (n = 105) and [48]GSE134347 (n = 298)^[49]18. The [50]GSE154918 dataset
   consists of gene expression profiles from 56 patients diagnosed with
   sepsis and 49 healthy controls, totaling 105 samples, and was used as
   the validation dataset for this project. This dataset enables
   comprehensive comparison of gene expression patterns between patients
   with sepsis and healthy individuals. Similarly, the [51]GSE134347
   dataset includes gene expression data from 215 sepsis patients and 83
   healthy controls, totaling 298 samples, and served as the training
   dataset for this project. This dataset provides extensive resources for
   analyzing the genetic basis of sepsis, contrasting the expression
   profiles of affected individuals with those of healthy donors.

   The data platform for dataset [52]GSE154918 was [53]GPL20301 Illumina
   HiSeq 4000 (Homo sapiens); the data platform for dataset [54]GSE134347
   was [55]GPL17586 Affymetrix Human Transcriptome Array 2. All samples in
   dataset [56]GSE154918 and [57]GSE134347 were corrected for batch
   effects prior to further analysis. See Table [58]S1 for specific
   dataset information.

   To identify IIRGs, the ImmPort database was utilized, available at
   [[59]https://www.immport.org/home]. ImmPort serves as a critical
   resource in immunology, providing a platform for the aggregation,
   organization, and dissemination of research data. It supports the life
   sciences research community by facilitating the archival and exchange
   of scientific data through advanced information technology. This
   database offers a robust repository for both research and clinical
   data, ensuring long-term, sustainable storage. From this database, a
   list of 1,509 unique IIRGs was meticulously compiled and
   cross-verified. For detailed information on these genes, refer to
   Supplementary Table [60]S2.

   During the data integration process, the issue of missing data was
   carefully addressed to minimize its potential impact on analysis
   results. Gene expression data were standardized prior to analysis to
   ensure comparability among different samples.

   For batch effects, the R package asva was employed to process the data
   and eliminate batch effects between datasets, ensuring uniform data
   distribution and allowing research findings to more accurately reflect
   real biological differences.

Estimation

   The Estimation (Estimation of STromal and Immune cells in MAlignant
   Tumor tissues using Expression data) algorithm is utilized to assess
   the purity of tumor samples. Developed by MD Anderson Cancer, it
   requires only a simple gene expression matrix to infer the levels of
   immune cells, stromal cells, and tumor purity. For this study, the
   sepsis dataset [61]GSE134347 was input into the ESTIMATE algorithm to
   calculate the immune score, stroma score, ESTIMATE score, and tumor
   purity for each sample. The differences in these four scores between
   sepsis and non-sepsis subgroups were tested using the Wilcoxon test n.
   P < 0.05 was considered statistically significant.

WGCNA algorithm was used to identify disease-related genes in the sepsis
dataset

   To isolate Sepsis-Related Genes (SRGs), the initial step involved
   applying the WGCNA^[62]19. WGCNA aims to identify modules of
   co-expressed genes, elucidate the relationship between gene networks
   and immunity, and pinpoint key genes within these networks. The
   ‘pickSoftThreshold’ function was used to determine the optimal soft
   threshold, which was set at 5, facilitating the construction of
   scale-free networks. Topological matrices were generated, followed by
   hierarchical clustering. Setting a minimum gene count of 50 for each
   module, we dynamically sliced and identified gene modules, computed
   module Eigengenes, and used these Eigengenes to establish inter-module
   correlations and perform further hierarchical clustering. Modules with
   correlations above 0.25 were merged, resulting in a total of 22
   distinct modules. The relationship between these modules and clinical
   features was analyzed using Pearson or Spearman correlation analysis.

   To identify immune-related differentially expressed genes (IRDEGs)
   associated with sepsis, the SRGs identified through WGCNA were
   intersected with the IIRGs from the ImmPort database. To visualize
   these genes, the R package ‘pheatmap’ (version 1.0.12) was employed to
   create an expression heatmap.

Screening for important IIRGs in the sepsis dataset

   To refine the identification of important genes within IRDEGs, five
   prevalent ML algorithms were employed: Elastic Net, LASSO regression,
   RF, Boruta, and XGBoost decision trees. Elastic Net, a linear
   regression model, incorporates both L1 and L2 norm regularization in
   its training. LASSO regression introduces a penalty term to reduce
   overfitting and enhance generalizability, implemented using the
   ‘glmnet’ R package. The outcomes of LASSO regression are visually
   represented through diagnostic model diagrams and variable trace plots,
   demonstrating their effectiveness in identifying important genes.

   RF uses ensemble learning to integrate multiple decision trees,
   handling nonlinear relationships and complex interactions effectively.
   It is robust, performing well with missing data and noise, and
   evaluates the importance of all genes comprehensively. RF aggregates
   predictions from multiple trees, with the final decision derived
   through majority vote, executed via the ‘caret’ package.

   Boruta, a feature selection method gaining popularity, identifies all
   features correlated with the dependent variable, regardless of their
   impact on a specific model’s cost function. This method ensures that no
   important gene features are missed and is suitable for feature
   screening purposes, especially when complete correlation rather than
   specific model adaptation is of concern. The ‘Boruta’ package was
   applied to achieve this.

   XGBoost, a gradient boosting algorithm, builds its model iteratively,
   each time adding a CART tree that fits the residual differences from
   the previous trees’ predictions. XGBoost is particularly suitable for
   processing large-scale data. It optimizes each iteration through the
   forward distribution algorithm to improve the accuracy and robustness
   of the model. In our analysis, XGBoost effectively identified complex
   gene interactions, thereby improving the classification performance of
   the final mode, facilitated by the ‘xgboost’ package. Additionally,
   each ML algorithm was fine-tuned using Cross-Validation (CV) for
   hyperparameter optimization, ensuring model performance enhancement. To
   bolster robustness, the optimization was repeated ten times for each
   resampling, each time with a different random seed.

   When the five different ML models were used to predict the dataset of
   sepsis samples, they all demonstrated high predictive performance (such
   as AUC, C-index, and F1-score), indicating that the features we
   screened are effective and possess good predictive ability in practical
   applications. The complementarity between the five ML methods helps to
   identify more important feature genes, which showed consistent
   importance in different models. This ensured that the features selected
   were not only important but also highly stable, improving the
   robustness and accuracy of the final analysis.

   Ultimately, to achieve stable results, genes identified by all five ML
   algorithms were consolidated as the final set of important IIRGs for
   our ensuing predictive model development.

The diagnostic model of IIRGs in the sepsis dataset

   In our effort to develop a sophisticated sepsis response classification
   model, IIRGs were trained using six prevalent ML algorithms: Naive
   Bayes (NB), Conditional Inference RF (cforest), LogitBoost (an advanced
   form of logistic regression), Gradient Boosting Machine (GBM), Model
   Averaged Neural Network (avNNet), and Penalized Discriminant Analysis
   (pda). For all these algorithms, cross-validation (CV) was meticulously
   employed for hyperparameter tuning, aiming to enhance the model’s
   performance and accuracy. To ensure the robustness of our models, this
   optimization process was diligently repeated ten times, with a unique
   random seed for each iteration of resampling.

   Following the construction of classifiers using these diverse
   algorithmic models, a thorough analysis was conducted through the
   validation dataset [63]GSE154918. This step was crucial in determining
   the most effective algorithm in terms of classification performance
   within the validation dataset. Subsequently, the algorithm that
   demonstrated the best classification efficacy was selected for the
   final assembly of our sepsis prediction model. This approach
   underscores our commitment to precision and reliability in developing a
   model adept at predicting sepsis with high accuracy.

Enrichment analysis

   The utilization of GO analysis is a prevalent method for conducting
   comprehensive functional enrichment studies. This analysis encompasses
   three key areas: Molecular Function (MF), Biological Process (BP) and
   Cellular Component (CC)^[64]20. Additionally, the KEGG database is
   extensively employed for its vast repository of information on genomes,
   biological pathways, diseases, and pharmaceuticals^[65]21. To explore
   the potential mechanisms underlying the actions of the identified
   crucial IIRGs, the ‘clusterProfiler’ package in R (version 4.8.3) was
   leveraged. This package facilitated our detailed exploration through GO
   annotation analysis and KEGG pathway enrichment analysis. In our study,
   a False Discovery Rate (FDR) threshold of less than 0.05 was set as the
   benchmark for statistical significance, ensuring the reliability and
   relevance of our findings.

CIBERSORT

   CIBERSORT, accessible at [[66]https://cibersortx.stanford.edu/],
   utilizes linear support vector regression, a sophisticated statistical
   technique in ML^[67]22. Available as both an R package and a web-based
   application, it excels in deconvolving expression matrices of various
   human immune cell subtypes. The tool effectively evaluates the
   infiltration status of immune cells in sequenced samples, using a gene
   expression signature set that represents 22 distinct immune cell
   subtypes. In our study, we used the CIBERSORT algorithm to assess
   immune cell infiltration in a composite dataset of tumor samples. We
   also employed the Wilcoxon test to analyze variations in immune cell
   infiltration among sepsis and non-sepsis subgroups, setting a P value
   of less than 0.05 as the threshold for statistical significance.

ssGSEA immunoinfiltration analysis

   The single-sample gene set enrichment analysis (ssGSEA) algorithm was
   implemented to precisely quantify the relative abundance of immune cell
   infiltration^[68]23. Initially, labels were assigned to various
   infiltrated immune cell types, including Activated CD8 T cell, Gamma
   delta T cell, Natural killer cell, and Regulatory T cell. The ssGSEA
   analysis calculated enrichment scores, which indicated the relative
   abundance of each immune cell type within individual samples. For
   graphical representation, the ggplot2 package (version 3.4.2) was used
   to illustrate the distribution patterns in sepsis and control groups.
   Additionally, the Wilcoxon test was employed to determine the
   differences in immune cell infiltration between the sepsis and
   non-sepsis subgroups, establishing a P-value threshold of less than
   0.05 to denote statistical significance.

Correlation analysis of IIRGs and immune infiltration in the sepsis dataset

   To further explore the role of IIRGs in sepsis, our study extended its
   analysis to the correlation between the expression of IIRGs and immune
   infiltration in sepsis patients, specifically within the sepsis dataset
   [69]GSE134347. We also investigated the relationship between the
   expression of IIRGs and immune checkpoints in these patients. For this
   correlation analysis, Pearson correlation analysis was our primary
   analytical method. The ‘ggcorrplot’ R package (version 0.1.4.1) was
   utilized to create detailed correlation loop diagrams, enhancing our
   understanding of the interactions and potential impact of these IIRGs
   in the context of sepsis and endometriosis.

PPI network analysis (STRING)

   The STRING database, known for its extensive mapping of both
   established and speculative Protein–Protein Interaction (PPI)^[70]24,
   was utilized as a key resource in our study. We used this database to
   construct a PPI network for the essential genes identified. The
   parameters for this construction were carefully set at coefficients of
   0.4, 0.7, and 0.9 to ensure optimal specificity and relevance. Data
   from the STRING database were exported and visualized using
   Cytoscape^[71]25, a sophisticated tool for complex network analysis and
   visualization. Additionally, to explore the central components of this
   network, the CytoHubba plug-in^[72]26 was employed for an in-depth
   analysis of the hub genes. This approach enabled the unraveling of the
   complex web of interactions among key proteins and the identification
   of pivotal genes.

Drug sensitivity analysis

   The complex genomic alterations in various cancers significantly
   influence clinical treatment responses, often acting as reliable
   biomarkers for drug efficacy. In this context, the Genomics of Drug
   Sensitivity in Cancer (GDSC) database (accessible at
   [73]www.cancerRxgene.org) is recognized as the most extensive publicly
   available resource, providing valuable insights into drug sensitivity
   and molecular indicators of drug response in cancer cells. Utilizing
   the pRRophetic algorithm, we predicted the sensitivity of patients,
   grouped by different clinical variables, to prevalent anti-cancer drugs
   or small molecular compounds. This prediction was based on the analysis
   of expression matrices from the dataset and involved calculating IC50
   values. Comparative group comparison graphs were utilized to present
   our findings effectively, allowing a clear and detailed visual
   representation of the results and facilitating a better understanding
   of drug response dynamics in cancer treatments.

Statistical analysis

   Data processing and statistical analysis in our study were meticulously
   conducted using R software (available at
   [[74]https://www.r-project.org/], version 4.0.2). For continuous
   variables across two groups, statistical significance was determined
   for variables with normal distribution through the independent
   Student’s t-test. For variables not adhering to normal distribution,
   the Mann–Whitney U test (also known as the Wilcoxon rank-sum test) was
   used to analyze differences. For categorical variables, the Chi-square
   test or Fisher’s exact test was relied upon to evaluate statistical
   significance between two groups. In all instances, statistical P values
   were considered from a bilateral perspective, with a threshold of
   P < 0.05 established to denote statistical significance. In this study,
   P values were not adjusted for multiple comparisons, reflecting more
   accurately the significance level of each test^[75]27. When
   constructing the model, we controlled for relevant variables such as
   clinical characteristics and minimized the influence of confounding
   factors, ensuring the reliability of the results. This rigorous
   approach ensures a comprehensive and accurate assessment of the data,
   adhering to the highest standards of statistical analysis.

Results

ESTIMATE score distribution between patients with sepsis and those without
sepsis

   According our technology roadmap (Fig. [76]S1), the ESTIMATE algorithm
   was initially employed to calculate four immune-related metrics for
   sepsis: immune score, stroma score, ESTIMATE score, and tumor
   purity^[77]28. These scores were compared between sepsis and non-sepsis
   patient groups. As depicted in the heatmap of Fig. [78]S2-A, a notable
   disparity was observed in the distribution of these four immune-related
   scores between sepsis patients and normal sample. Notably, sepsis
   patients exhibited elevated stroma scores (Anova test, P < 2.2e−16,
   Fig. [79]S2-B) and tumor purity (Anova test, P = 7.9e−09,
   Fig. [80]S2-C), while showing reduced ESTIMATE scores (Anova test,
   P = 7.9e−09, Fig. [81]S2-D) and immune scores (Anova test, P < 2.2e−16,
   Fig. [82]S2-E). This analysis provides valuable insights into the
   immunological landscape of sepsis, highlighting significant differences
   in immune and stromal components.

Identification and correlation analysis of IIRG modules in sepsis

   A WGCNA was conducted on the sepsis dataset to identify gene modules
   associated with sepsis immunity. The analysis revealed that the optimal
   soft threshold was 5, achieving the lowest mean connectivity
   (Fig. [83]S3-A,B). Figure [84]S3-C displayed the gene clustering
   numbers, with various modules differentiated by distinct colors.
   Subsequent steps involved identifying gene modules in relation to the
   four immune-related scores. The correlation heatmap between different
   gene modules and the four immune-related scores is shown in
   Fig. [85]S3-C, revealing intriguing correlations; for instance, matrix
   scores exhibited the strongest association with the darkgreen module.
   Conversely, immune scores were most closely linked with the grey
   module. Both the ESTIMATE score and tumor purity showed the highest
   correlation with the black module. These findings offer a nuanced
   understanding of the relationships between specific gene modules and
   key immune-related scores in the context of sepsis.

   In our analysis of gene clustering within different color-coded
   modules, a notable similarity in expression patterns among genes
   grouped in the same colored module was observed (Fig. [86]1A). Further
   exploration into the inter-modular relationships revealed relatively
   low correlation levels between different modules, as illustrated in
   Fig. [87]1B. We then focused on detailing the heatmap of correlations
   between these diverse colored modules and sepsis. This analysis brought
   to light that the darkmagenta module demonstrated the most significant
   negative correlation with sepsis (r = − 0.78), whereas the brown module
   exhibited the most substantial positive correlation with the disease
   (r = 0.7), as shown in Fig. [88]1C. Subsequently, detailed scatter
   plots were presented illustrating the correlation between the brown
   module and its associated genes. This scatter plot analysis revealed a
   significant correlation (P < 0.05, Fig. [89]1D). Based on these
   findings, the genes within the brown module were ultimately selected as
   the final identified IIRGs. This decision was grounded in the strong
   correlation these genes exhibited with sepsis, underscoring their
   potential importance in understanding the disease’s immunological
   aspects.

Fig. 1.

   [90]Fig. 1
   [91]Open in a new tab

   Correlation analysis of the most relevant modules in the sepsis
   dataset. (A) Clustering network diagram between the genes of different
   color modules; (B) heat maps of correlations between different modules,
   with blue representing high correlations and purple representing low
   correlations; (C) heat maps of correlation between different colors and
   sepsis, with red representing positive correlation and blue
   representing negative correlation; And (D) scatter plots of
   correlations in the brown gene module.

Expression differences and biological pathway enrichment analysis of IIRGs in
sepsis

   To enhance our understanding of IIRGs in sepsis, we first sourced a set
   from the ImmPort database. These were intersected with disease-related
   genes identified through WGCNA, and the intersection was depicted in a
   Venn diagram (Fig. [92]2A). This process led to the identification of
   108 IRDEGs specifically expressed in the context of sepsis (Table
   [93]S3). To compare their expression patterns visually, we utilized the
   R-package ‘pheatmap’ to create a heatmap. Displayed in Fig. [94]2B,
   this heatmap showed distinct expression patterns of these 108 genes
   between sepsis and non-sepsis patients. Furthermore, our analysis
   revealed that genes significantly overexpressed in sepsis patients were
   predominantly enriched in biological pathways such as Osteoclast
   Differentiation, B Cell Receptor Signaling Pathway, Th17 Cell
   Differentiation, and T Cell Receptor Signaling Pathway. Conversely,
   genes markedly upregulated in healthy patients showed significant
   enrichment in the Chemokine Signaling Pathway, Th17 Cell
   Differentiation, JAK-STAT Signaling Pathway, PD-L1 Expression, and the
   PD-1 Checkpoint Pathway in Cancer, among other pathways. These findings
   offer crucial insights into the distinct immunological landscapes
   characterizing sepsis patients compared to healthy individuals.

Fig. 2.

   [95]Fig. 2
   [96]Open in a new tab

   Identification of IRDEGs in the sepsis dataset. (A) Intersection
   diagram of disease-related genes and immune-related genes identified by
   WGCNA; and (B) heat maps of expression of sepsis associated immune
   genes between sepsis and normal patients. Blue represents high
   expression and purple represents low expression. IRDEGs, immune-related
   differentially expressed genes; WGCNA, weighted gene correlation
   network analysis.

Functional enrichment analysis of IRDEGs reveals

   To elucidate the potential molecular mechanisms underpinning the
   IRDEGs, we performed both GO and KEGG functional enrichment analyses on
   the 108 IRDEGs. The KEGG analysis identified significant enrichment of
   these genes in pathways associated with cancer immunity. Notable
   pathways include Osteoclast Differentiation, Th17 Cell Differentiation,
   Cytokine–Cytokine Receptor Interaction, B Cell Receptor Signaling
   Pathway, T Cell Receptor Signaling Pathway, and the JAK-STAT Signaling
   Pathway, as shown in Fig. [97]S4-A and Table [98]S4. Furthermore, the
   GO functional enrichment analysis highlighted their significant roles
   in critical biological processes. These include the Cytokine-Mediated
   Signaling Pathway, Positive Regulation of Cytokine Production,
   Leukocyte Mediated Immunity, Immune Receptor Activity, Growth Factor
   Receptor Binding, and the T Cell Receptor Complex, as depicted in
   Fig. [99]S4-B and Table [100]S5. These findings provide insight into
   the roles of these 108 IRDEGs, particularly in contributing to key
   immune pathways and processes.

IIRG features identified by ML algorithm and displayed in a Venn diagram

   To further identify critical features in IRDEGs, we utilized five
   common ML algorithms: Elastic Net, LASSO regression, RF, Boruta, and
   XGBoost decision trees. LASSO regression identified 53 important
   genetic features (Fig. [101]3A); Elastic network identified 38
   important genetic features (Fig. [102]3B); RF identified 108 important
   genetic features (Fig. [103]3C); 61 important gene features were
   identified by Boruta algorithm (Fig. [104]3D); and XGBoost identified
   20 important genetic features (Fig. [105]3E). As illustrated in the
   Venn diagram (Fig. [106]3F), the five ML algorithms collectively
   identified 11 IIRGs, which we subsequently designated as marker genes.

Fig. 3.

   [107]Fig. 3
   [108]Open in a new tab

   Screening important immune-related features in the sepsis dataset. (A)
   LASSO regression screening for important features; (B) elastic network
   screening important features; (C) RF screening important features; (D)
   Boruta screening for important features; (E) XGBoost algorithm filters
   important features. LASSO, Least absolute shrinkage and selection
   operator.

High-performance sepsis prediction model developed using six ML algorithms

   Subsequently, we employed six different ML algorithms to develop sepsis
   prediction models. The results demonstrated that all six prediction
   models achieved high AUC values (Fig. [109]4A), and the models’ C-index
   and F1-scores were also high (Fig. [110]4B), indicating that the
   prediction model we developed exhibits high prediction performance.

Fig. 4.

   [111]Fig. 4
   [112]Open in a new tab

   Construction of sepsis prediction model. (A) ROC curve of sepsis
   prediction model constructed by different machine learning algorithms;
   and (B) C-index and F1-score of sepsis prediction models built using
   different machine learning algorithms. ROC, receiver operating
   characteristic.

Independent dataset validates ML algorithm model for sepsis prediction

   The predictive performance of our model was validated using an
   independent sepsis dataset. In the [113]GSE154918 dataset, models
   constructed using six different ML algorithms demonstrated relatively
   high AUC values, all greater than 0.75, with the pda model reaching
   0.901 (Fig. [114]5A, Table [115]S6). However, these models showed
   relatively low C-index and F1-scores (Fig. [116]5B), suggesting that
   our model provides good and stable predictive performance for sepsis.

Fig. 5.

   [117]Fig. 5
   [118]Open in a new tab

   Verifying the sepsis prediction model in the [119]GSE154918 dataset.
   (A) ROC curve of sepsis prediction model constructed by different
   machine learning algorithms in [120]GSE154918 dataset; (B) C-index and
   F1-score of sepsis prediction models constructed using different
   machine learning algorithms in the [121]GSE154918 dataset. ROC,
   receiver operating characteristic.

Importance and contribution of genetic features in different models

   The contribution of 11 IIRGs was investigated across various models. In
   the NB model, gene MAPK14 was identified as the most contributive to
   sample prediction (Fig. [122]S5a-A). In the LogitBoost model, gene IL10
   made the largest contribution (Fig. [123]S5a-B); in the GBM model, gene
   IL21R was the most influential (Fig. [124]S5a-C); in the cforest model,
   gene MAPK14 again played a significant role (Fig. [125]S5a-D). In the
   avNNet model, gene JAK2 was the most contributive (Fig. [126]S5a-E).
   For the pda model, gene MAPK14 was found to be the most influential
   (Fig. [127]S5a-F).

   In the NB model, gene JAK2 made the largest contribution
   (Fig. [128]S5b-A). In the LogitBoost model, gene SOCS3 was the most
   contributive (Fig. [129]S5b-B); in the GBM model, gene NCR3 played the
   most significant role (Fig. [130]S5b-C). In the cforest model, gene
   MAPK14 was identified as the most influential (Fig. [131]S5b-D). In the
   avNNet model, gene JAK2 made the largest contribution
   (Fig. [132]S5b-E). In the pda model, gene MAPK14 was the most
   significant contributor (Fig. [133]S5b-F).

Relationship analysis between IIRGs and immune infiltration

   To explore the association between the 11 IIRGs and immune infiltration
   in sepsis, we utilized CIBERSORT (Table [134]S7) and ssGSEA (Table
   [135]S8) methodologies. The CIBERSORT analysis revealed significant
   differences in immune cell profiles between sepsis patients and healthy
   controls (Fig. [136]6A). Specifically, immune cells such as T cells CD4
   memory resting, T cells CD8, NK cells resting, B cells naive, and T
   cells CD4 naive were predominantly observed in healthy individuals,
   whereas Neutrophils, T cells regulatory (Tregs), Macrophages M0, and
   Monocytes were notably more abundant in sepsis patients.

Fig. 6.

   [137]Fig. 6
   [138]Open in a new tab

   Analysis of immune infiltration in sepsis and healthy patients in the
   sepsis dataset. (A) Distribution of immune cell infiltration between
   sepsis and non-sepsis patients in CIBERSORT algorithm; and (B)
   distribution of immune cell infiltration between sepsis and non-sepsis
   patients in ssGSEA algorithm, with purple representing high
   infiltration and blue representing low infiltration.ssGSEA,
   single-sample gene-set enrichment analysis.

   The ssGSEA analysis indicated marked disparities in immune infiltration
   between sepsis patients and healthy individuals (Fig. [139]6B). In
   sepsis patients, immune cells such as Neutrophils, Th2 cells,
   Macrophages, Mast cells, iDC, DC, Th17 cells, Treg cells, and aDC were
   significantly enriched. Conversely, immune cells like NK CD56bright
   cells, TNK cells, Th1 cells, NK CD56dim cells, and Tfh cells were more
   prevalent in healthy patients. These findings provide valuable insights
   into the immune landscape of sepsis, highlighting the distinct immune
   cell profiles in sepsis patients compared to healthy individuals and
   emphasizing the potential roles of these 11 IIRGs in modulating immune
   responses in sepsis.

   In our study, we explored the correlation between the expression of 11
   IIRGs and the levels of immune infiltration in patients, visualizing
   these relationships through correlation circles. Within the CIBERSORT
   analysis, a significant correlation was observed between the expression
   of these 11 IIRGs and various immune cells, notably T cell CD8, T cells
   CD4 memory resting, Macrophages M0, and Neutrophils (Fig. [140]S6-A).
   Similarly, the ssGSEA analysis showed a strong correlation between the
   expression of these genes and immune cells such as T cell CD8, B cells,
   Cytotoxic cells, Macrophages, NK cells, and T cells (Fig. [141]S6-B).

Analysis of the relationship between IIRGs and immune checkpoints

   Furthermore, we investigated the interaction between the expression of
   these 11 IIRGs and immune checkpoint genes. Initially, the expression
   patterns of immune checkpoint genes were compared between sepsis
   patients and healthy samples. This comparison revealed a notable
   difference in the expression of these genes between the two groups,
   suggesting a potential impact of immune factors on the progression of
   sepsis (Fig. [142]7A). Subsequent analysis focused on the correlation
   between the expression of these 11 IIRGs and immune checkpoint genes,
   again represented through correlation circles. Notably, genes such as
   IL21R, NCR3, and TRAV30 from our characteristic set exhibited a
   significant positive correlation with the expressions of most immune
   checkpoint genes (Pearson correlation analysis, P < 0.05), including
   HLA-DPB1, HLA-DPA1, HLA-DQB2, HLA-DRB1, HLA-DQA1 (Fig. [143]7B).
   Conversely, the other IIRGs in the study often displayed a negative
   correlation with the expression levels of a similar spectrum of immune
   checkpoint-related genes (Pearson correlation analysis, P < 0.05), as
   indicated in Fig. [144]7B. These results underscore the intricate
   relationships between IIRGs and immune checkpoints, providing critical
   insights into their influential roles within the immune responses
   characteristic of sepsis.

Fig. 7.

   [145]Fig. 7
   [146]Open in a new tab

   Correlation analysis between IIRGs and immune checkpoint related genes
   in the sepsis dataset. (A) Heat maps of the expression of immune
   checkpoint related genes in the sepsis training dataset in patients
   with sepsis and patients without sepsis, with blue representing low
   expression and purple representing high expression. (B) The correlation
   between the expression of IIRGs in the sepsis training dataset and
   immune checkpoint-related genes, blue represents negative correlation
   and purple represents positive correlation. IIRGs, important
   immune-related genes.

Protein interaction network analysis of important immunity-related genes

   The interactions among the 11 identified IIRGs were analyzed within the
   STRING database (Fig. [147]S7), revealing that these genes exhibited
   strong interactions with each other. Notably, the genes JAK2 and IL10
   showed a high degree of connectivity within the network, indicating
   their strong interactions with other genes.

Association analysis of IIRGs and drug sensitivity

   To investigate the relationship between the 11 IIRGs and drug
   sensitivity, the pRRophetic package was used to calculate the IC50
   values for 14 drugs from the [148]GSE134347 sepsis dataset in the CCLE
   database. The analysis revealed that, with the exception of PD.0325901,
   PF2341066, and PHA.665752, the IC50 values of the remaining drugs were
   significantly different between sepsis and healthy patients (P < 0.05,
   Fig. [149]8A), suggesting a distinct difference in drug efficacy
   between these groups. Additionally, a correlation analysis was
   conducted between the 11 IIRGs and the IC50 of drug sensitivity,
   showing significant associations (Fig. [150]8B). For example, the
   expression of GRB2 gene was strongly and positively correlated with the
   IC50 of Erlotinib (r = 0.51, P < 0.05), yet it displayed a negative
   correlation with the IC50 of PD.0325901 (r = − 0.61, P < 0.05).

Fig. 8.

   [151]Fig. 8
   [152]Open in a new tab

   Drug sensitivity analysis of the sepsis dataset. (A) Box diagram of
   drug IC50 distribution between sepsis patients and healthy patients, ns
   representing P ≥ 0.05, * representing P < 0.05, *** representing
   P < 0.001. (B) Correlation circle diagram of 11 IIRGs and drug IC50,
   purple represents positive correlation, blue represents negative
   correlation, * represents P < 0.05, the darker the color, the stronger
   the correlation. IIRGs, important immune-related genes.

Discussion

   In a comprehensive study spanning 2005 to 2014 across 27 academic
   hospitals, there was observed a significant increase in septic shock
   incidences^[153]29,[154]30, rising from 12.8 to 18.6 cases per 1,000
   hospital admissions, while mortality rates decreased slightly from 55
   to 51%^[155]31. This upward trend is attributed to
   factors^[156]32–[157]35 such as aging populations, increased
   immunosuppression, and the prevalence of multi-drug resistant
   infections, underscoring the ongoing challenge of sepsis as a critical
   global health concern^[158]13,[159]36. Despite traditional inflammatory
   markers being crucial in diagnosing various sepsis types, a significant
   gap in research remains regarding immune exhaustion in septic
   patients^[160]37,[161]38, which could result in either under-treatment
   or overtreatment^[162]39–[163]41.

   In response to these challenges, our team has developed an innovative
   multi-biomarker model using ML techniques. This model successfully
   identifies 11 key IIRGs., enhancing the ability to detect sepsis and
   predict drug treatment responsiveness. This advancement not only
   facilitates the early identification of septic patients but also aids
   in evaluating their immune status, thereby laying the groundwork for
   more tailored and precise treatment approaches. Additionally, our
   research highlights the potential effectiveness of immune checkpoint
   blockade therapy in treating sepsis, marking a significant stride in
   the field.

   In our preprint (DOI: 10.21203/rs.3.rs-4306022/v1), RNA-seq data from
   [164]GSE154918 was utilized and sepsis-related disease expression genes
   were identified through WGCNA. By intersecting these genes with IIRGs
   obtained from the ImmPort database
   ([165]https://www.immport.org/shared/home), we pinpointed 108
   immune-related disease expression genes linked to sepsis. Using five
   different commonly used ML algorithms for CV, we identified 11 key
   IIRGs. Further, we leveraged six prevalent ML methods to sift through
   these genes, aiming to find the algorithm with the best classification
   performance in our validation dataset, leading to the development of a
   new predictive model. This model demonstrated precision in identifying
   septic patients, thereby validating the predictive value of these 11
   IIRGs in sepsis. The genes identified are GRB2, IL21R, NCR3, TRAV30,
   IL10, PDGFC, APOBEC3A, MAPK14, JAK2, SOCS3, and PLSCR1. A comprehensive
   literature review revealed that six of these genes—GRB2, IL21R, NCR3,
   TRAV30, PDGFC, and PLSCR1—had not been previously associated with
   sepsis. In-depth exploration of the roles of these genes in sepsis
   poses a significant challenge^[166]42. However, such research could
   potentially reveal novel therapeutic targets or disease mechanisms,
   offering fresh perspectives and theoretical foundations for developing
   new treatments for sepsis.

   The IIRGs we identified are involved in multiple key immune signaling
   pathways, such as cytokine signaling pathways, T cell receptor
   signaling pathways, and B cell receptor signaling pathways. Cytokines
   play a crucial regulatory role in the inflammatory responses associated
   with sepsis. These pathways influence the activation and
   differentiation of immune cells, thus regulating the immune status in
   patients with sepsis. For example, Th17 cells enhance the body’s
   pathogen clearance capability by promoting the activation of other
   immune cells through the secretion of cytokines such as IL-17^[167]43.

   These IIRGs potentially play a significant role in regulating immunity
   and inflammation. Among the identified IIRGs, certain genes exhibit
   critical biological functions. IL10, an anti-inflammatory cytokine, may
   shield the host by dampening inflammatory responses in sepsis and
   preventing a persistent inflammatory state^[168]44. The expression of
   IL-10 is inversely associated with the prognosis of sepsis patients,
   indicating its protective role in the progression of the disease,
   although excessive IL-10 can lead to immune escape and foster
   infection^[169]45. In our constructed PPI network, IL10 shows a high
   degree of connectivity, and a significant correlation exists between
   IL-10 and MAPK14, along with related signaling pathways. Research has
   identified MAPK14 as a key regulator of IL-10^[170]46, thus marking it
   as a potential sepsis treatment target. Additionally, another study
   recognized MAPK14 as crucial within a new mitochondrial-related gene
   signature for diagnosing sepsis^[171]47. This study established a
   diagnostic model featuring MAPK14, underscoring its diagnostic
   significance, which aligns with our findings. Our PPI network analysis
   further confirms the linkage of MAPK14 with pathways involving PDGFRB,
   SOS2, IL-10, and GRB2, highlighting MAPK14 as a promising target for
   sepsis therapy.

   JAK2 is implicated in the immune response by mediating cytokine
   signaling^[172]48. Its abnormal expression in sepsis may result in an
   inadequate immune response to infection and diminish the capacity to
   eliminate pathogens. Additionally, JAK2 is thought to regulate
   leukocyte differentiation and function, potentially affecting the
   inflammatory microenvironment.

   Through our PPI network analysis, a high degree of connectivity was
   observed between JAK2 and GRB2 within a network composed of 11 IIRGs.
   However, research on the interaction between JAK2 and GRB2 has
   predominantly been focused in oncology, with limited exploration in the
   context of sepsis.

   The GRB2 protein is pivotal^[173]49 in various critical biological
   processes, including cell growth, proliferation, metabolism, embryonic
   development, and the differentiation of cancer cells. It plays a
   crucial role in the signal transduction of immune cells^[174]50, with
   the function of Gab2 being essential in the progression of various
   cancers, often associated with thrombosis and inflammation. However,
   the specific role of GRB2 in sepsis has not been highlighted.
   Therefore, investigating the interaction between GRB2 and JAK2 in
   sepsis could lead to groundbreaking discoveries, providing new
   perspectives and strategies for treating sepsis.

   NCR3, a natural killer (NK) cell receptor, shows upregulation in
   sepsis, indicating the significance of NK cells in controlling
   infection and clearing tumor cells^[175]51. However, the activation of
   NCR3 may also promote systemic inflammatory responses, and its dual
   role in sepsis warrants further exploration^[176]52.

   Through KEGG enrichment analysis, it was found that these IIRGs are
   predominantly involved in cytokine signaling pathways, T cell receptor
   signaling pathways, etc., and are closely associated with inflammation
   and immune regulation. Sepsis patients often experience severe cytokine
   storms, and the abnormal expression of related genes may be a critical
   factor leading to imbalanced immune responses. The immunosuppressive
   characteristics of sepsis are linked to the dysfunction of T cells and
   B cells^[177]53. A deeper exploration of the role of IIRGs in the
   activation, proliferation, and functional regulation of T cells and B
   cells will aid in understanding the immune escape mechanisms in sepsis.

   Alterations in the expression levels of key immune infiltration genes
   play a critical role in the onset, progression, and treatment of
   sepsis^[178]54. Utilizing the CIBERSORT algorithm, we analyzed the
   relationship between the expression of these pivotal genes and immune
   cell infiltration in septic patients. Our findings indicated a
   significant negative correlation between MAPK14, JAK2, SOCS3, and
   PLSCR1 with T cell CD8 immune cells, whereas IL21R and NCR3 were
   positively correlated with T cell CD8. These results are crucial for
   understanding the immune response characteristics in septic patients.
   Additionally, SOCS3, MAPK14, and IL-10 showed a positive correlation
   with Macrophages M0. Analysis of these key genes and their impact on
   immune infiltration using the ssGSEA algorithm revealed that genes like
   MAPK14, JAK2, SOCS3, PLSCR1, IL-10, and PDGFC were positively
   correlated with Macrophages, while IL21R and NCR3 demonstrated negative
   correlations with Macrophages and positive correlations with various NK
   and T cells, reflecting their diverse roles in immune regulation.
   Further research into these IIRGs, especially those not previously
   directly linked to sepsis such as IL21R, NCR3, PLSCR1, is deemed highly
   significant^[179]55.

   Sepsis and cancer share many pathophysiological characteristics, with
   immune suppression mechanisms in both involving dysfunctions in myeloid
   and lymphoid cells, ultimately leading to impaired antibacterial
   phagocytosis and antitumor cytotoxicity^[180]56. Recent studies have
   suggested that antitumor drugs might alleviate lung damage during acute
   sepsis^[181]57, and the occurrence of sepsis might also reduce the risk
   of certain cancers^[182]58. Exploring the immune-related aspects
   between sepsis and cancer represents a novel area of investigation. Our
   study discovered that these 11 IIGs are closely linked with immune
   checkpoints, crucial in regulating the immune system, particularly in
   tumor immune evasion^[183]59. This suggests the presence of tumor-like
   immune escape mechanisms in sepsis^[184]60–[185]62. Previous research
   has indicated that anti-PD-L1 peptides could improve survival rates in
   mice infected with fungi^[186]62, pointing to a role for immune
   checkpoint pathways in immune suppression during the later stages of
   sepsis. Through KEGG functional enrichment analysis, pathways related
   to cancer immunity were found to be significantly enriched in sepsis,
   suggesting a potential overlap in immune regulation between these two
   diseases. Surprisingly, we also discovered that the expression of these
   IIGs is closely associated with the sensitivity and resistance to
   antitumor drugs, with the GRB2 gene showing a significant positive
   correlation with paclitaxel^[187]63. While studies have demonstrated
   that the GRB2 gene could increase sensitivity to paclitaxel in
   non-small cell lung cancer, its role in enhancing sensitivity to
   paclitaxel through the activation of the MAPK pathway in sepsis is yet
   to be reported. This study not only validates the efficacy of IIRGs as
   biomarkers for diagnosing sepsis but also reveals their significant
   correlation with genes related to immune checkpoints. Modulating the
   expression or function of these genes could help restore normal immune
   responses or improve drug responsiveness in patients with sepsis,
   potentially increasing survival rates. These findings might pave the
   way for new potential therapeutic targets in sepsis treatment, laying
   the groundwork for the development of more effective drug interventions
   for this disease.

   While the expression of our constructed 11 characteristic genes shows
   good and stable predictive performance in identifying sepsis, closely
   correlating with patients’ immune infiltration status and drug
   sensitivity, several limitations must be acknowledged and addressed in
   subsequent research. Firstly, our study primarily utilized datasets
   from [188]GSE154918 (n = 105) and [189]GSE134347 (n = 298). Although
   the sample sizes of these datasets are relatively large, they may not
   adequately represent all populations and clinical scenarios,
   potentially limiting the generalizability of our results. Despite
   existing literature^[190]64,[191]65 supporting the reliability of these
   datasets in sepsis research, future studies should include more
   datasets to enhance the applicability of the results. Secondly, despite
   our model’s demonstrated good predictive performance, given the nature
   of sepsis as a potentially fatal disease, additional evaluation
   indicators are required to assess the model and improve the accuracy
   and scientific rigor of the study. Additionally, although the model
   showed high accuracy and applicability across multiple datasets, it has
   not yet been tested on clinical samples, which could limit its
   practical clinical use. These genes still need experimental
   verification or clinical evaluation to refine the accuracy and
   scientific rigor of the study. In the future, we will consider relevant
   experimental validation or clinical evaluations to enhance the
   reliability and comprehensiveness of our research results. Moreover,
   our data analysis was based on data from public databases, which might
   contain errors or missing information. Without strict quality control
   measures, the reliability of our analysis could be compromised.
   Therefore, more prospective and mechanistic studies are needed to
   further validate and refine these findings.

   In our study, we identified 11 important genes associated with sepsis,
   and the expression patterns of these genes can serve as potential
   biomarkers for early diagnosis and disease stratification. In the
   future, standardized detection methods could be developed to quickly
   assess the expression of these biomarkers in clinical samples. This
   approach could enable doctors to identify sepsis patients early based
   on gene expression characteristics, thereby initiating appropriate
   treatment promptly.

   Our study established a variety of ML models (such as conditional
   inference RF, naive Bayes, etc.) and verified their high predictive
   performance in sepsis classification. These models can be further
   refined with actual clinical data to optimize parameters and algorithms
   to suit specific patient groups. By integrating these predictive models
   with clinical information systems, doctors can swiftly calculate the
   sepsis risk score upon hospital admission and formulate personalized
   treatment plans based on this.

   In the future, to effectively integrate these biomarkers and predictive
   models into clinical workflows, preclinical and clinical validation,
   along with multidisciplinary collaboration and education, are essential
   to enhance their application in sepsis management.

   Sepsis is a complex and rapidly evolving condition, and the
   introduction of ML tools risks leading to an over-reliance on models by
   medical staff^[192]66. While AI can analyze vast amounts of data and
   identify potential patterns, the diagnostic process should still
   prioritize the doctor’s clinical intuition and judgment. For instance,
   if the model inaccurately assesses the patient’s status, it may lead to
   unnecessary treatments or delay the correct diagnosis^[193]67. Medical
   staff must maintain a balance between AI recommendations and clinical
   experience, and strengthen the verification and cross-checking of model
   results to ensure that patients receive the most appropriate
   treatment^[194]68.

   The integration of AI tools in medical practice necessitates a clear
   ethical framework, which involves not only the ethical relationships
   between the model’s developers, medical providers, and patients but
   also the broader social impact. The development of comprehensive
   ethical guidelines can assist medical institutions in addressing
   ethical challenges that may arise when deploying these tools^[195]69.

Conclusion

   This investigation has successfully identified 11 IIRGs linked to
   sepsis through the application of genomic analysis combined with
   machine learning techniques. These genes, including GRB2, IL21R, and
   NCR3, are identified as potential biomarkers for early sepsis
   diagnosis, offering profound insights into the immune processes that
   underpin the condition. The predictive models developed from these
   genes have shown high levels of accuracy and stability, highlighting
   their potential applicability in clinical settings to improve detection
   and treatment of sepsis. Moreover, functional enrichment and immune
   infiltration analyses underscore the pivotal role these genes play in
   modulating immune responses within sepsis contexts. Furthermore, a
   notable correlation between these IIRGs and drug sensitivity highlights
   the prospects for personalized medicine, suggesting customized
   treatment options for sepsis patients. This research emphasizes the
   significance of discerning immune-related genetic markers in sepsis,
   laying down a theoretical framework for the formulation of novel
   diagnostic and therapeutic avenues that confront a significant global
   health challenge. Future research should focus on validating these
   results across varied populations and delving deeper into the
   mechanistic functions of these genes to optimize their clinical
   efficacy in managing sepsis.

Electronic supplementary material

   Below is the link to the electronic supplementary material.
   [196]Supplementary Material 1^ (959.5KB, tif)
   [197]Supplementary Material 2^ (2.8MB, tif)
   [198]Supplementary Material 3^ (1MB, tif)
   [199]Supplementary Material 4^ (454.5KB, tif)
   [200]Supplementary Material 5^ (559KB, tif)
   [201]Supplementary Material 6^ (470.6KB, tif)
   [202]Supplementary Material 7^ (1.1MB, tif)
   [203]Supplementary Material 8^ (2.1MB, tif)
   [204]Supplementary Material 9^ (15.5KB, docx)
   [205]Supplementary Material 10^ (498.5KB, xls)
   [206]Supplementary Material 11^ (17.1KB, docx)
   [207]Supplementary Material 12^ (18.5KB, docx)
   [208]Supplementary Material 13^ (18.6KB, docx)
   [209]Supplementary Material 14^ (25KB, xls)
   [210]Supplementary Material 15^ (101.5KB, xls)
   [211]Supplementary Material 16^ (141.5KB, xls)
   [212]Supplementary Material 17^ (17.3KB, docx)

Acknowledgements