Abstract

   This study aimed to identify crosstalk genes and explore their
   potential roles in type 2 diabetes (T2D) and Sjögren’s syndrome (SS)
   using bioinformatic analysis. We analyzed multiple publicly available
   gene expression datasets and screened 16 crosstalk genes. Consequently,
   genes that may play significant roles in disease processes were
   identified. Thereafter, we used gene set variation analysis to assess
   the differences in gene sets among various samples. LASSO regression
   analysis was performed to determine the optimal diagnostic genes, and
   predictive models for T2D and SS were constructed. The classification
   accuracy of these models was evaluated using receiver operating
   characteristic curves. Among 16 identified crosstalk genes, 11 showed
   significant differences in expression. These genes were significantly
   enriched in biological processes. The predictive model generated from
   ALDH6A1 and IL11RA demonstrated good classification accuracy for T2D
   and SS samples. Immune infiltration analysis revealed significant
   differences in specific immune cell types between the disease and
   control groups, demonstrating a significant correlation with the
   identified hub genes, highlighting their potential involvement in T2D
   and SS pathophysiology. This study revealed the crucial role of
   specific immune cells in T2D and SS pathology, providing new insights
   into both conditions and the potential targets for future immunotherapy
   design.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-025-14044-6.

   Keywords: Type 2 diabetes, Sjögren’s syndrome, Crosstalk genes, Immune
   cells, Bioinformatics analysis

   Subject terms: Biomarkers, Endocrinology, Rheumatology

Introduction

   Type 2 Diabetes (T2D) and Sjögren’s Syndrome (SS) are two major chronic
   diseases that pose significant threats to human health worldwide. T2D
   is a chronic metabolic disorder characterized by high blood glucose
   levels, with common symptoms including dry mouth, excessive thirst, and
   frequent urination^[36]1. It is typically accompanied by severe
   complications such as cardiovascular disease and kidney damage,
   resulting in substantial economic burdens for patients and
   society^[37]2–[38]4. SS is an autoimmune disease primarily
   characterized by impaired function of the salivary and lacrimal glands,
   primarily manifesting as discomforting symptoms such as dry mouth and
   dry eyes^[39]5. Epidemiological studies indicate that both diseases
   have high incidence and prevalence rates, severely affecting the
   quality of life of affected individuals.

   Although there is symptom overlap between T2D and SS in clinical
   settings, the common pathophysiological mechanisms underlying both
   conditions have not yet been clearly elucidated. Recent studies suggest
   that inflammatory responses^[40]6 and metabolic imbalances^[41]7 play
   significant roles in the development of both diseases, indicating
   potential cross-talk and mutual influence in their pathogenesis.
   However, existing literature regarding the molecular associations
   between T2D and SS, as well as their potential signaling pathways,
   remains relatively limited; moreover, a systematic investigation of the
   molecular interaction networks and key regulatory genes between both
   conditions has not been conducted^[42]8–[43]14.

   In this context, an in-depth exploration of the molecular associations
   between T2D and SS holds significant scientific importance. First, it
   will aid in understanding the shared pathological basis of complex
   diseases; second, it will provide a theoretical foundation for
   discovering new diagnostic biomarkers and targeted therapeutic
   strategies. Therefore, in this study, we used various bioinformatics
   methods (including weighted gene co-expression network analysis
   [WGCNA]^[44]15), based on multiple publicly available gene expression
   datasets, to (1) systematically identify and characterize the key
   interactive genes common to T2D and SS and (2) analyze their biological
   functions and related signaling pathways. By revealing the molecular
   interaction mechanisms between these two diseases, we aim to provide
   new theoretical foundations and molecular targets for personalized
   diagnosis, treatment, and precise interventions.

Results

Technology roadmap

   The technology roadmap is present in Fig. [45]S1.

WGCNA

   WGCNA was performed on the genes in the [46]GSE18732 dataset to
   evaluate the co-expression module of the T2D dataset. First, the
   scale-free fit index (Fig. [47]1a) under different soft thresholds was
   calculated and displayed to render the constructed network more
   consistent with the scale-free topology. The results showed that at a
   fitting index of 0.80, the minimum soft threshold was 3, which is the
   optimal soft threshold for constructing a scale-free network. A
   co-expression network was constructed based on the optimal soft
   threshold, and the genes were clustered and labeled with grouping
   information using a clustering tree (Fig. [48]1b). The results showed
   that using a mergeCutHeight threshold of 0.2, the genes were clustered
   into the following 16 modules: MEmidnightblue, MEpink, MEblack,
   MEpurple, MEgreenyellow, MEblue, MEyellow, MEmagenta, MEgreen,
   MEsalmon, MElightcyan, MEred, MEtan, MEturquoise, MEbrown, and MEcyan.
   Thereafter, the genes were clustered, and the relationships between the
   genes and merged modules were visualized (Fig. [49]1c). Next, the
   correlation between all genes in the 16 modules and the T2D and control
   samples was determined according to the expression patterns of module
   genes (Fig. [50]1d). Finally, the modules MEmidnightblue and MEtan with
   p < 0.05, |r|> 0.30 were selected.

Fig. 1.

   [51]Fig. 1
   [52]Open in a new tab

   Weighted gene co-expression network analysis (WGCNA) for type 2
   diabetes (T2D) dataset. (a) Determination of optimal soft-thresholding
   power. Left panel: Scale-free topology fit index (y-axis, left) and
   mean connectivity (y-axis, right) plotted against soft-thresholding
   power (x-axis). Red line indicates the selected soft threshold power
   (β = 3). Right panel: Mean network connectivity (y-axis) at different
   soft-thresholding powers (x-axis). (b) Gene cluster dendrogram and
   module assignment. Genes clustered by topological overlap (y-axis);
   colored horizontal bars below denote co-expression modules identified
   by dynamic tree cutting. (c) Gene clustering results. Top: Hierarchical
   clustering dendrogram based on topological overlap dissimilarity
   (y-axis). Bottom: Color-coded assignment of genes to modules (y-axis).
   (d) Module-trait associations. Correlation analysis between gene module
   eigengenes (rows) and clinical traits (columns: T2D status). Color
   scale represents Pearson correlation coefficient (r)
   values: Red = positive correlation, Purple = negative correlation.
   Correlation strength: |r|< 0.3 (weak/none), 0.3–0.5 (weak), 0.5–0.8
   (moderate), > 0.8 (strong).

   To evaluate the co-expression modules of the SS dataset, WGCNA was
   performed on the genes in the SS dataset, [53]GSE40611. First, the
   scale-free fitting index (Fig. [54]2a) at different soft thresholds was
   calculated and displayed to ensure that the constructed network was
   more consistent with the scale-free topology. The results showed that
   at a fitting index of 0.80, the minimum soft threshold that conformed
   to the construction of the scale-free network was 12. Furthermore, a
   co-expression network was constructed based on the optimal soft
   threshold, and the genes were clustered and labeled with grouping
   information using a clustering tree (Fig. [55]2b). The results showed
   that at a screening criterion of 0.2, the genes were clustered into the
   following 16 modules: MElightcyan, MElightgreen, MEgrey60, MEpink,
   MEtan, MEmidnightblue, MElightyellow, MEpurple, MEcyan, MEgreen,
   MEsalmon, MEblue, MEblack, MEbrown, MEroyalblue, and MEgrey.
   Thereafter, the genes were clustered and the relationships between the
   genes and merged modules visualized (Fig. [56]2c). Next, the
   correlation between all genes in the 16 modules and the SS and control
   samples was obtained according to the expression patterns of the module
   genes (Fig. [57]2d). Finally, the modules MElightyellow, MEpurple,
   MEblue, MEblack, and MEbrown with p < 0.05 and |r|> 0.30 were selected.

Fig. 2.

   [58]Fig. 2
   [59]Open in a new tab

   Weighted gene co-expression network analysis (WGCNA) for Sjogren’s
   syndrome (SS) dataset. (a) Determination of optimal soft-thresholding
   power. Left panel: Scale-free topology fit index (y-axis, left) and
   mean connectivity (y-axis, right) plotted against soft-thresholding
   power (x-axis). Red line indicates the selected soft threshold power
   (β = 12). Right panel: Mean network connectivity (y-axis) at different
   soft-thresholding powers (x-axis). (b) Gene cluster dendrogram and
   module assignment. Genes clustered by topological overlap (y-axis);
   colored horizontal bars below denote co-expression modules identified
   by dynamic tree cutting. (c) Gene clustering results. Top: Hierarchical
   clustering dendrogram based on topological overlap dissimilarity
   (y-axis). Bottom: Color-coded assignment of genes to modules (y-axis).
   (d) Module-trait associations. Correlation analysis between gene module
   eigengenes (rows) and clinical traits (columns: SS status (Control or
   SS)). Color scale represents Pearson correlation coefficient (r)
   values: Red = positive correlation, Brown = negative correlation.
   Correlation strength: |r|< 0.3 (weak/none), 0.3–0.5 (weak), 0.5–0.8
   (moderate), > 0.8 (strong).

Crosstalk genes for T2D and SS

   We performed a differential analysis of the T2D dataset [60]GSE18732
   and SS dataset [61]GSE40611, and a volcano plot was constructed
   (Fig. [62]3a, b). A total of 885 differentially expressed genes (DEGs)
   between T2D and control samples in the [63]GSE18732 dataset met the
   |logFC|> 0 and p < 0.05 threshold; under this threshold, 451 genes were
   upregulated (logFC > 0 and p < 0.05) and 434 genes downregulated
   (logFC < 0 and p < 0.05). In the [64]GSE40611 SS dataset, 3326 DEGs met
   the |logFC|> 0 and p < 0.05 threshold between the SS and control
   samples. Under this threshold, 1526 genes were upregulated (logFC > 0
   and p < 0.05) and 1800 genes downregulated (logFC < 0 and p < 0.05).

Fig. 3.

   [65]Fig. 3
   [66]Open in a new tab

   Differential gene expression and crosstalk gene analysis. (a) Volcano
   plot of T2D versus Control differential expression. Type 2 diabetes
   (T2D) dataset showing differentially expressed genes (DEGs). X-axis:
   Log ~ 2 ~ -fold change (log ~ 2 ~ FC); Y-axis: − log ~ 10 ~ (adjusted
   P value). Colored dots: Red = up-regulated DEGs (log ~ 2 ~ FC > 0,
   adj. P < 0.05); Blue = down-regulated DEGs (log ~ 2 ~ FC < 0,
   adj. P < 0.05); Gray = non-significant genes. (b) Volcano plot of SS
   versus Control differential expression. Sjögren’s syndrome (SS) dataset
   DEGs. Axes and color scheme identical to panel a. (c) Venn diagram of
   DEG-module overlaps. Intersections between DEGs and WGCNA module genes.
   Sets defined: Mistyrose circle = T2D DEGs; Lavender circle = SS
   DEGs; Silver circle = T2D module genes; Pink circle = SS module genes.
   Overlapping regions indicate crosstalk genes shared between datasets
   and analysis methods. (d) and (e) Crosstalk gene expression
   heatmaps. Normalized expression z-scores across samples. Columns:
   Samples ([67]GSE18732 in d, [68]GSE40611 in (e); Rows: Crosstalk
   genes; Color scale: Red = high expression, Blue = low expression (scale
   bar shown). Sample groups labeled: Control (c) versus Disease (T2D/SS).
   (f) and (g) Crosstalk gene expression group comparisons. Violin plots
   showing expression distribution between groups. Y-axis: Normalized
   expression value; X-axis: Sample groups (Control vs. Disease). Central
   line: Median; Box: Interquartile range; Whiskers: 1.5 × IQR. Dataset
   source: [69]GSE18732 (f), [70]GSE40611 (g).

   The intersection of the DEGs and module genes of the two datasets was
   obtained and Venn diagram was constructed to obtain the crosstalk genes
   of the two datasets (Fig. [71]3c). In the present study, “crosstalk
   genes” are defined as the intersection of DEGs and WGCNA-derived module
   genes that are associated with both T2D and SS; these are genes that
   are differentially expressed and located within the key co-expression
   modules in both diseases. The following 16 crosstalk genes were
   obtained: AK1, ALDH6A1, CHCHD10, CKB, EFEMP2, IL11RA, IL15, LARP6,
   LYSMD2, MSRB1, MYADM, NR3C1, SH3RF2, TMEM9, TXLNB, and UBE2Q2. In
   addition, heat maps were designed to show crosstalk gene expression in
   the [72]GSE18732 and [73]GSE40611 datasets (Fig. [74]3d, e).

   Furthermore, we used the Mann–Whitney U test to screen crosstalk genes
   and construct group comparison plots (Fig. [75]3f, g), which showed
   that the following 11 genes were significantly different in the
   [76]GSE18732 and [77]GSE40611 datasets: AK1, ALDH6A1, CHCHD10, CKB,
   IL11RA, IL15, LARP6, LYSMD2, SH3RF2, TMEM9, and UBE2Q2.

Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway
enrichment analysis

   The biological processes (BPs), cellular components (CCs), molecular
   functions (MFs), and biological pathways (KEGG) of the 11 crosstalk
   genes were further explored using gene ontology GO and KEGG pathway
   enrichment analyses, and the specific results are shown in Table
   [78]S2. The 11 crosstalk genes were mainly enriched in BPs such as ATP
   metabolic process, regulation of protein catabolic processes,
   cytokine-mediated signaling pathway, and other BPs; CCs such as
   multivesicular body membranes, integral components of mitochondrial
   inner membranes, and intrinsic components of mitochondrial inner
   membranes; MFs such as ubiquitin-like protein transferase activity,
   ubiquitin-like protein transferase activity, sequence-specific mRNA
   binding, and other MFs. In addition, it was enriched in biological
   pathways (KEGG) such as the JAK-STAT signaling pathway,
   cytokine-cytokine receptor interaction, thiamine metabolism, and other
   biological pathways. The results of the analyses were visualized using
   a bubble diagram (Fig. [79]4a).

Fig. 4.

   [80]Fig. 4
   [81]Open in a new tab

   Functional enrichment analysis of crosstalk genes based on Gene
   Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). (a)
   Enrichment bubble plot. Significantly enriched GO terms (Biological
   Process [BP], Cellular Component [CC], Molecular Function [MF]) and
   KEGG pathways. X-axis: Gene ratio (Gene counts/Background
   genes); Y-axis: Enriched term name; Bubble size: Number of associated
   genes; Bubble color: − log ~ 10 ~ (adjusted P value) gradient
   (red = low P value [high significance], black = high P value [low
   significance]). Filter criteria: P < 0.05, False Discovery Rate
   (FDR) < 0.25. (b) BP enrichment network. Circular visualization of
   Biological Process (BP) terms and associated crosstalk genes. Red
   nodes: GO-BP terms; Black nodes: Genes; Gray lines: Gene-term
   associations. (c) CC enrichment network. Cellular Component (CC)
   term-gene interactions. Node and line schemes identical to panel b. (d)
   MF enrichment network. Molecular Function (MF) term-gene relationships.
   Node and line schemes identical to panel b. (e) KEGG pathway
   network. Kyoto Encyclopedia of Genes and Genomes pathway-gene
   interactions. Red nodes: KEGG pathways; Black nodes: Genes; Gray lines:
   Gene-pathway associations.

   A loop network diagram of BP, MF, CC, and biological pathway (KEGG) was
   constructed based on the results of the GO and KEGG pathway enrichment
   analyses (Fig. [82]4b–e). The lines indicate the respective molecules
   and annotations of the respective entries; a greater node size
   indicates a higher number of molecules encompassed by the entries.

Gene set enrichment analysis (GSEA)

   GSEA was used to examine the expression and BPs of all genes in the T2D
   dataset [83]GSE18732 to determine the effect of their expression levels
   on T2D. Associations between the affected CCs and their MFs (Fig.
   [84]S2a) are shown in Table [85]S3. The results showed that all genes
   in the [86]GSE18732 dataset were significantly enriched in T2D (Fig.
   [87]S2b), the insulin pathway (Fig. [88]S2c), the MAPK signaling
   pathway (Fig. [89]S2d), the IL6 pathway (Fig. [90]S2e), the TGF beta
   signaling pathway (Fig. [91]S2f), T2D (Fig. [92]S2g), and other related
   biological functions and signaling pathways.

   GSEA was used to analyze the expression and BPs of all genes in the
   [93]GSE40611 dataset to determine the effect of their expression levels
   on SS. The relationships between the affected CCs and their MFs (Fig.
   [94]S3a) are shown in Table [95]S4. The results showed that all genes
   in the [96]GSE40611 dataset were significantly enriched in the negative
   regulation of the PI3K AKT network (Fig. [97]S3b), the IL18 signaling
   pathway (Fig. [98]S3c), immune system diseases (Fig. [99]S3d), Notch
   signaling (Fig. [100]S3e), the inflammatory response pathway (Fig.
   [101]S3f), the JAK STAT signaling pathway (Fig. [102]S3g), and other
   related biological functions and signaling pathways.

Gene set variation analysis (GSVA)

   To investigate the disparity in the h.all.v7.4. symbols.gmt gene set
   between the T2D and control samples within the T2D dataset
   [103]GSE18732, GSVA was performed for all genes in the dataset.
   Detailed information is provided in Table [104]S5. The five most
   significant pathways (p < 0.05) were screened and visualized using a
   heat map and bar graph (Fig. [105]S4a, b). The five pathways included
   oxidative phosphorylation, KRAS signaling dn, the Hedgehog signaling
   pathway, Wnt/beta-catenin signaling, and G2M checkpoint.

   To investigate the disparity between h.all.v7.4. symbols.gmt gene set
   in the SS and control samples within the SS dataset [106]GSE40611, GSVA
   was performed for all genes in the dataset. Detailed information is
   provided in Table [107]S6. The following five most significant pathways
   (p < 0.05) were screened and visualized using a heat map and bar graph
   (Fig. [108]S5a, b): interferon alpha response, interferon gamma
   response, allograft rejection, IL-6/JAK/STAT3 signaling, and KRAS
   signaling up.

Identification of diagnostic genes

   To identify the best diagnostic genes among the crosstalk genes of the
   two diseases, Least Absolute Shrinkage and Selection Operator (LASSO)
   regression analysis was performed, and LASSO regression model and
   variable trajectory maps were constructed for visualization. The
   results showed four best diagnostic genes (Fig. [109]5a, b) in the T2D
   dataset [110]GSE18732 (ALDH6A1, IL11RA, IL15 and LARP6) and five best
   diagnostic genes (Fig. [111]5c, d) in the SS dataset (AK1, ALDH6A1,
   IL11RA, LYSMD2, and SH3RF2). The best diagnostic genes of the two
   datasets were intersected and a Venn diagram (Fig. [112]5e) was
   constructed, and the following two diagnostic genes were obtained:
   ALDH6A1 and IL11RA. The group comparison diagram of the [113]GSE18732
   (Fig. [114]5f) and [115]GSE40611 (Fig. [116]5g) datasets showed that
   ALDH6A1 and IL11RA expressions were significantly different between the
   disease and control groups in both datasets.

Fig. 5.

   [117]Fig. 5
   [118]Open in a new tab

   Diagnostic biomarker screening via LASSO (Least Absolute Shrinkage and
   Selection Operator) regression. (a) and (b) Diagnostic model (a) and
   variable trajectory plots (b) of the LASSO regression model for the T2D
   dataset [119]GSE18732. (c) and (d) Diagnostic model (c) and variable
   trajectory plots (d) of LASSO regression model for the SS dataset
   [120]GSE40611. (e) Diagnostic gene intersection. Venn diagram: Shared
   biomarkers between T2D and SS. Silver circle: T2D diagnostic genes
   ([121]GSE18732, n = 4). Pink circle: SS diagnostic genes
   ([122]GSE40611, n = 5). Overlap (darkgray): Crosstalk diagnostic genes
   (ALDH6A1, IL11RA). (f) and (g) Expression validation. (f) [123]GSE18732
   (T2D): Boxplot: ALDH6A1/IL11RA expression, Y-axis: Normalized
   expression value, X-axis: Sample groups [Control vs. T2D], Boxplot
   elements: Center line = median, box = interquartile range (IQR),
   whiskers = 1.5 × IQR, Black stars: P < 0.01 (vs. Control). (g)
   [124]GSE40611 (SS): Configuration identical to f with SS cohort.
   Abbreviations: T2D = Type 2 Diabetes; SS = Sjögren’s syndrome;
   IQR = Interquartile range.

Design of the prediction model

   Based on two diagnostic genes (ALDH6A1 and IL11RA) and univariate and
   multivariate binary logistic regression, we constructed predictive
   models for the T2D and SS datasets. To further validate the predictive
   model in the T2D dataset, a nomogram demonstrating the relationship
   between these two genes was constructed (Fig. [125]6a). The results
   showed that ALDH6A1 expression had a significantly higher utility than
   IL11RA expression in the T2D prediction model. The clinical utility of
   the T2D prediction model was evaluated using decision curve analysis
   (DCA) (Fig. [126]6b). The results demonstrated that, within a specific
   range, the model’s line was stable at a higher level than that of all
   positive and negative values. Moreover, the model had a greater net
   benefit and superior effect. In addition, the R package pROC was used
   to construct the receiver operating characteristic (ROC) curve using
   the RiskScore in the T2D dataset. The ROC curve (Fig. [127]6c) showed
   that the RiskScore of the T2D prediction model in the T2D dataset had a
   certain accuracy (area under the curve [AUC] > 0.7) in the
   classification of T2D and control samples. The RiskScore was calculated
   using the following formula:
   graphic file with name d33e708.gif

Fig. 6.

   [128]Fig. 6
   [129]Open in a new tab

   Construction and validation of diagnostic models for Type 2 Diabetes
   (T2D) and Sjögren’s syndrome (SS). (a) T2D diagnostic nomogram
   ([130]GSE18732). Predicts T2D risk based on two biomarkers. Points
   scale (top axis): Score for each gene expression level; Total points
   (middle axis): Summed scores; Risk probability (bottom axis): Predicted
   T2D probability. Gray vertical lines: Projection path for sample
   calculation. (b) Decision curve analysis (DCA) for T2D model. X-axis:
   Threshold probability; Y-axis: Net benefit; Darkslateblue curve:
   Diagnostic model; Mistyrose line: “Treat all” strategy; Darksalmon
   line: “Treat none” strategy. (c) and (d) ROC curves for T2D validation:
   (c) T2D dataset ([131]GSE18732) evaluation: X-axis:
   1—Specificity; Y-axis: Sensitivity; Blue curve: Diagnostic model
   (AUC = 0.705, 95% CI 0.597–0.813), (d) Independent T2D cohort
   ([132]GSE29221) validation: Components identical to (c) (AUC = 0.764,
   95% CI 0.562–0.966), (e) SS diagnostic nomogram
   ([133]GSE40611). Configuration identical to a with SS risk
   prediction. Pink vertical lines: Sample calculation path. (f) DCA for
   SS model. Component scheme identical to (b). (g) and (h) ROC curves for
   SS validation: (g) SS dataset ([134]GSE40611) evaluation: Red curve:
   Diagnostic model (AUC = 0.807.X, 95% CI 0.657–0.958), (h) Independent
   SS cohort ([135]GSE154926) validation: Components identical to
   g (AUC = 0.738, 95% CI 0.540–0.936).

   Using the RiskScore formula, we calculated the RiskScore of the T2D
   dataset [136]GSE29221 and plotted a diagnostic ROC curve
   (Fig. [137]6d). Furthermore, the RiskScore of the [138]GSE29221 T2D
   prediction model had a level of accuracy (AUC > 0.7) in classifying the
   T2D and control samples.

   To further validate the predictive value of [139]GSE40611 in the SS
   dataset, a nomogram showing the relationship between the two genes was
   constructed (Fig. [140]6e). The results showed that IL11RA expression
   level was significantly higher than ALDH6A1 expression level in the SS
   prediction model. DCA was used to evaluate and present the clinical
   utility of the prediction model for SS (Fig. [141]6f). The outcomes
   indicated that, within a specific range, the model’s line was
   consistently higher than that of all positive and all negative, with a
   greater net benefit and superior effect. In addition, the R package
   pROC was used to construct the ROC curve using RiskScore in the SS
   dataset. The ROC curve showed (Fig. [142]6g) that the RiskScore of the
   SS prediction model had a level of accuracy (AUC > 0.7) in the
   classification of SS and control samples. The RiskScore was calculated
   using the following formula:
   graphic file with name d33e788.gif

   Based on the RiskScore formula for the SS prediction model, we
   calculated the RiskScore for the SS dataset [143]GSE154926 and plotted
   a diagnostic ROC curve (Fig. [144]6h). The results showed that the
   RiskScore of the [145]GSE154926 SS diagnostic model was accurate in
   classifying SS and control samples (AUC > 0.7).

Construction of the protein–protein interaction (PPI) network and screening
of hub genes

   We used the diagnostic genes ALDH6A1 and IL11RA as hub genes and
   constructed an interaction network between the two hub genes and
   functionally similar genes using the GeneMANIA website (Fig. [146]S6a).
   The lines with different colors represent co-expression and share
   information such as protein domains.

   Thereafter, we then obtained miRNAs associated with hub genes using the
   ENCORI database and Transcription factors (TFs) combined with hub genes
   through the ChIPBase database. The miRNA-mRNA-TF regulatory network was
   constructed and visualized using Cytoscape software (Fig. [147]S6b). In
   the network, there were two hub genes, 12 TFs, and nine miRNAs.

Immune infiltration analysis between disease and control samples

   Using the gene expression matrix of T2D samples from the [148]GSE29221
   dataset, we quantified the immune infiltration abundance of 28 immune
   cell subtypes in both T2D and control groups through single-sample GSEA
   (ssGSEA). First, the differences in the expression of infiltrating
   immune cells among the different groups were presented through a group
   comparison plot. The group comparison diagram (Fig. [149]7a) showed
   three immune cells with statistical significance (p < 0.05):
   macrophages, natural killer T cells, and neutrophils.

Fig. 7.

   [150]Fig. 7
   [151]Open in a new tab

   Immune microenvironment analysis via single-sample Gene Set Enrichment
   Analysis (ssGSEA) in Type 2 Diabetes (T2D). (a) Immune cell
   infiltration comparison. Differential abundance of 28 immune cell types
   between T2D (n = 12) and Control (n = 12) samples in
   [152]GSE29221. Y-axis: ssGSEA enrichment score; X-axis: Immune cell
   types; Boxplot colors: Crimson = T2D group, Darkslategray = Control
   group; Statistical indicators: *** = P < 0.001,
   ** = P < 0.01, * = P < 0.05, ns = P ≥ 0.05. (b) Immune cell correlation
   network. Pairwise Spearman correlations between immune
   cells. Rows/Columns: Immune cell types; Color scale: Crimson = Positive
   correlation (r > 0), Darkslategray = Negative correlation
   (r < 0); Color intensity: Correlation strength (|r|). (c) Hub
   gene-immune cell correlations. Bubble plot showing relationships
   between immune cells (X-axis) and hub genes (Y-axis). Bubble size:
   |Correlation coefficient (r)|; Bubble color: Crimson = Positive
   correlation, Darkslategray = Negative correlation.

   A correlation heat map was used to show the correlation results of the
   infiltration levels of the three immune cells in the T2D dataset
   (Fig. [153]7b). The results showed a positive correlation between the
   infiltration levels of the three immune cells.

   Finally, the correlation between hub genes and immune cell infiltration
   levels was demonstrated using a correlation bubble plot (Fig. [154]7c).
   IL11RA showed the strongest positive correlation with macrophages
   (r = 0.390).

   The expression matrix of SS samples from the SS dataset [155]GSE154926
   was used to compute the immune infiltration levels of 28 immune cells
   in SS and control samples using the ssGSEA algorithm. Initially, the
   expression differences of the infiltration levels of immune cells in
   distinct groups were presented using a group comparison plot. The group
   comparison diagram (Fig. [156]8a) indicated that the following immune
   cell types showed statistical significance (p < 0.05): central memory
   CD8 + T cells, effector memory CD4 + T cells, and immature dendritic
   cells.

Fig. 8.

   [157]Fig. 8
   [158]Open in a new tab

   Immune microenvironment analysis via single-sample Gene Set Enrichment
   Analysis (ssGSEA) in Sjögren’s syndrome (SS). (a) Immune cell abundance
   comparison. Differential infiltration of 24 immune cell types between
   SS (n = 42) and Control (n = 7) samples in [159]GSE154926. Y-axis:
   ssGSEA enrichment score; X-axis: Immune cell types; Boxplot colors:
   Peru = SS group, Mediumturquoise = Control group; Statistical
   indicators: *** = P< 0.001, ** = P < 0.01, * = P < 0.05, ns = P ≥ 0.05.
   (b) Immune cell correlation network. Pairwise Spearman correlations
   between infiltrating immune cells. Rows/Columns: Immune cell
   types; Color gradient: Peru = Positive correlation (r > 0),
   Mediumturquoise = Negative correlation (r < 0); Color intensity:
   Absolute correlation strength (|r|). (c) Hub gene-immune cell
   relationships. Bubble plot showing interactions between immune cells
   (X-axis) and hub genes (Y-axis). Bubble size: |Spearman correlation
   coefficient (r)|; Bubble color: Peru = Positive correlation,
   Mediumturquoise = Negative correlation.

   The correlation results of the three immune cell infiltration levels in
   the immune infiltration analysis of the SS dataset [160]GSE154926 were
   presented in a correlation heat map (Fig. [161]8b). The results showed
   a positive correlation between the three immune cell infiltration
   levels.

   Finally, the correlation between hub genes and immune cell infiltration
   levels was determined using a correlation bubble plot (Fig. [162]8c).
   The results of the correlation bubble plot showed that ALDH6A1 had the
   strongest positive correlation with immature dendritic cells
   (r = 0.438).

Discussion

   In recent years, T2D and SS have attracted considerable attention owing
   to their clinical significance. T2D, a metabolic disorder, presently
   affects approximately 463 million people globally and the figure is
   expected to reach 700 million by 2045^[163]16. Its complications, such
   as cardiovascular disease and kidney failure^[164]2,[165]3, place a
   significant economic burden on global healthcare. Meanwhile, SS
   primarily impacts women, affecting 0.54.8% of the population. SS causes
   lacrimal and salivary gland dysfunctions, resulting in chronic dryness
   and fatigue^[166]17. Recent research has increasingly focused on the
   interaction between T2D and SS; however, substantial gaps persist in
   our understanding of the molecular mechanisms between these two chronic
   diseases.

   Therefore, the present study used advanced multivariate statistical
   techniques, including GSVA and LASSO regression analyses, to conduct a
   comprehensive bioinformatics analysis of samples from the [167]GSE18732
   and [168]GSE40611 datasets, which are related to T2D and SS,
   respectively. This large-scale systematic exploration enhanced the
   reliability and generalizability of the study findings. By performing
   an intersection analysis of the diagnostic genes for both diseases, we
   identified common candidate biomarkers that offer a novel perspective
   for cross-disease research.

   The primary results of this study indicate that ALDH6A1 and IL11RA are
   common and important diagnostic genes for T2D and SS. This finding
   uncovers potential molecular connections between the two disorders and
   provides a basis for selecting cross-feature biomarkers. IL11RA, as the
   primary receptor for IL11, forms a signaling complex with gp130 that
   activates the JAK-STAT pathway, thereby affecting endocrine and immune
   responses. In T2D, IL11 signaling is associated with inflammation in
   metabolic syndrome; in SS, it affects immune cell functionality and
   tissue inflammation. Furthermore, IL11 modulates energy metabolism in
   adipocytes and myocytes, thereby affecting triglyceride metabolism,
   blood glucose levels, and insulin sensitivity. ALDH6A1, an aldehyde
   dehydrogenase, plays a critical role in amino acid metabolism and
   oxidative stress regulation. In addition, it helps remove harmful
   aldehydes, reducing oxidative stress-related cellular
   damage^[169]18,[170]19. In both T2D and SS, oxidative stress correlates
   with cellular apoptosis and tissue damage. ALDH6A1 regulates immune
   responses in SS, and its dysfunction may exacerbate autoimmune
   inflammation. Notably, when constructing predictive models, the
   expression level of ALDH6A1 was significantly higher than that of
   IL11RA, indicating that ALDH6A1 plays a more vital role in T2D
   pathogenesis. In addition, IL11RA expression was strongly correlated
   with macrophage infiltration, providing new insights into the pathology
   of SS. The effectiveness of these two genes in classifying T2D and SS
   was validated using a constructed predictive model and risk score
   assessment. The AUC value of the ROC curve exceeded 0.7, suggesting
   that the model had good classification performance. The innovation of
   this study lies in its comprehensive analysis and data mining, which
   uncovered the molecular mechanisms and clinical relevance of T2D and SS
   in multiple dimensions. These results enrich the biological context of
   these two conditions and provide guidance for future studies. Future
   studies should validate the functions of ALDH6A1 and IL11RA using
   animal experiments or clinical samples, thereby laying the foundation
   for the clinical application of these biomarkers.

   In the present study, we conducted a comprehensive analysis of DEGs
   associated with T2D and SS. We identified 885 DEGs in the T2D dataset,
   including 451 upregulated and 434 downregulated genes, whereas the SS
   dataset revealed 3326 DEGs, including 1526 upregulated and 1800
   downregulated genes. These significant differences in gene expression
   highlight the complex molecular underpinnings of both conditions and
   lay a crucial foundation for exploring their pathological mechanisms.
   Identifying DEGs is of the utmost importance for understanding the
   changes in T2D- and SS-related biological pathways. Notably, many genes
   upregulated in T2D are associated with insulin signaling and metabolic
   pathways, such as the PI3K-AKT signaling pathway, which plays an
   integral role in glucose metabolism and insulin response. Conversely,
   downregulated genes affect cellular stress response mechanisms, thereby
   worsening insulin resistance. In SS, the upregulated genes likely
   reflect enhanced immune responses and inflammation. This study
   corroborates previous findings that genes upregulated in insulin
   resistance and inflammation are often related^[171]20, supporting the
   reliability of our results.

   Furthermore, our pathway enrichment analysis of the DEGs revealed
   critical signaling pathways such as the NF-κB pathway, which is
   essential for immune regulation. The presence of batch effects can
   introduce biases in differential expression characteristics, which
   complicate biological interpretation. Therefore, we used normalization
   techniques and principal component analysis to ensure that the observed
   gene expression differences accurately reflected biological signals.

   In our study, we identified 16 crosstalk genes, 11 of which showed
   significant differences, including AK1, ALDH6A1, and CHCHD10. These
   genes share common pathogenic pathways under T2D and SS, indicating
   their potential in dual therapeutic strategies. The JAK-STAT pathway
   has previously been associated with inflammation and immune responses
   under autoimmune conditions^[172]21; this study provides novel insights
   into the interplay between these pathways.

   Notably, our findings represent the first recognition of the exact
   roles of crosstalk genes within these processes. Validation using
   multicenter clinical samples is recommended to assess their diagnostic
   efficacy across diverse patient populations, thereby underscoring the
   significance of these results in enhancing our understanding of the
   interactions between T2D and SS and presenting new opportunities for
   therapeutic interventions.

   This study introduced a predictive model based on ALDH6A1 and IL11RA.
   This model demonstrated strong classification accuracy in
   distinguishing T2D from SS, with an AUC > 0.7. However, ethnicity, sex,
   and age variations must be considered to ensure the generalizability of
   the model. This requires external validation and adjustments for
   subgroup-specific differences in gene expression.

   In addition, the present study found significant changes in immune
   cells in T2D, including macrophages, natural killer T cells, and
   neutrophils. In addition, notable shifts in central memory CD8 + T
   cells were observed in the SS group. These findings indicate the
   critical role of these cells in disease development and provide
   insights into immunotherapy. Macrophages play a significant role in the
   onset and development of metabolic diseases, particularly in T2D. T2D
   is characterized by insulin resistance and chronic low-grade
   inflammation, which are closely related to the functional and
   phenotypic changes in macrophages. Studies have revealed that
   macrophages polarize to the M1 phenotype in the T2D environment,
   promoting inflammatory responses that lead to tissue damage and insulin
   resistance.

   IL-11 is a pro-inflammatory cytokine that may play a critical role in
   T2D progression by binding to the IL-11RA. Its overexpression can lead
   to sustained activation of macrophages, exacerbating inflammation and
   affecting pancreatic β-cell function, resulting in inadequate insulin
   secretion. Studies have found that IL-11RA expression is significantly
   upregulated in macrophages from patients with T2D and animal models,
   indicating its important regulatory role in the inflammatory
   process^[173]22,[174]23. Therefore, adjusting macrophage polarization
   may offer a novel treatment for T2D. Moreover, the activation of
   natural killer T cells and modulation of neutrophils can target insulin
   resistance and provide therapeutic opportunities.

   This study provides support for the use of immune infiltration data in
   precision medicine. The study suggests the need for large-scale
   sequencing and single-cell transcriptomic analyses to create specific
   immune cell profiles for T2D and SS. In addition, it proposes the
   design of targeted immunotherapy strategies and establishment of
   dynamic monitoring systems for immune function assessment to aid
   personalized clinical decisions.

   Overall, although the predictive model based on ALDH6A1 and IL11RA
   provides promising avenues for early screening of T2D and SS, further
   research is essential for broader applicability and optimization, along
   with careful consideration of ethical issues, to protect patient rights
   and promote sustainable model use. With ongoing refinement, this model
   holds substantial potential for enhancing clinical diagnostic
   efficiency and patient management, thereby advancing immunotherapy and
   therapeutic research on these diseases through improved understanding
   of immune cell changes.

   Although this study reveals important molecular mechanisms between T2D
   and SS through systematic bioinformatics analysis, it is essential to
   acknowledge several limitations. First, the limited sample size may
   have affected the statistical power and reliability of the results,
   thereby restricting the generalizability of the study’s conclusions.
   Second, this research is based on publicly available databases and
   multi-cohort data, primarily focusing on molecular-level associations
   and mechanism exploration, without including experimental validation.
   Owing to practical constraints such as research timeline and available
   resources, relevant functional validation through animal experiments or
   clinical samples could not be completed in this study. This limitation
   partially affects the biological significance of the findings and the
   direct confirmation of their clinical application value. We aim to
   conduct in-depth functional and mechanistic validations targeting key
   genes and molecular pathways by integrating animal experiments and
   clinical samples in subsequent research to further enhance the
   scientific rigor and clinical translational potential of our study.
   Finally, because the data were sourced from specific public databases,
   there may be inherent biases; subsequent research should broaden the
   sample scope and integrate multiple datasets to improve the reliability
   and generalizability of the analysis results.

   Through multilevel bioinformatics analysis, this study identified 16
   genes involved in crosstalk (a phenomenon in which genes interact and
   affect their functions), including AK1, ALDH6A1, and CHCHD10. Among
   these 16 genes, 11 showed significant differences in expression.
   Crosstalk genes are abundant in various biological processes and
   pathways, such as ATP metabolic processes and the JAK-STAT signaling
   pathway. A predictive model based on ALDH6A1 and IL11RA showed good
   classification accuracy for differentiating T2D from SS, with an AUC
   value > 0.7. Our research emphasizes the role of specific immune cells
   in disease progression and offers a new understanding of the
   pathophysiology of these two conditions. These findings not only
   improve the comprehension of T2D and SS but also provide potential
   targets for the design of future immunotherapy. IL-11 interacts with
   macrophages through IL-11RA, thereby affecting inflammatory response.
   The elevated expression of IL-11RA in T2D and SS may exacerbate
   inflammation. Thus, targeting IL-11RA represents a promising
   therapeutic strategy, potentially achieved through monoclonal
   antibodies that block IL-11 from binding to IL-11RA, thereby reducing
   inflammation and improving insulin sensitivity. The effects of IL-11
   involve the JAK-STAT signaling pathway. JAK inhibitors, such as
   tofacitinib, have shown efficacy in various autoimmune diseases,
   suggesting their potential utility in T2D and SS. The PI3K-AKT pathway
   closely relates to insulin signaling, and IL-11 may affect insulin
   resistance through this pathway. Therefore, targeting this pathway with
   PI3K inhibitors may improve the pathophysiology of T2D. Dysfunction of
   ALDH6A1 in immature dendritic cells is related to the onset of SS.
   Enhancing ALDH6A1 expression or activity can promote the maturation and
   immune regulatory functions of these dendritic cells, thereby improving
   autoimmune responses. Gene editing technologies, such as CRISPR/Cas9,
   can be used to upregulate ALDH6A1 expression, providing a novel
   therapeutic target for SS.

   Future studies should validate these results by using larger and more
   diverse samples to increase the external validity of the research. In
   addition, exploring the clinical application of these findings, such as
   leveraging immune infiltration information to promote personalized
   treatment, is an important direction for further research. Through such
   explorations, we anticipate achieving significant advancements in the
   management of these diseases and improving the quality of life of
   patients.

Methods

Data download

   We used the R package GEOquery^[175]24 (Version 2.70.0) to download the
   T2D ([176]GSE18732 and [177]GSE29221) and SS datasets ([178]GSE40611
   and [179]GSE154926) from the gene expression omnibus (GEO)
   database^[180]25 ([181]https://www.ncbi.nlm.nih.gov/geo/). The
   [182]GSE18732, [183]GSE29221, [184]GSE40611, and [185]GSE154926
   datasets contained data from Homo sapiens. Specifically, the
   [186]GSE18732 dataset was on skeletal muscle, microarray platform
   [187]GPL9486; the [188]GSE29221 dataset was on skeletal muscle,
   microarray platform [189]GPL6947. The tissue in the [190]GSE40611
   dataset was the parotid gland and the chip platform was [191]GPL570.
   The tissue in the [192]GSE154926 dataset was a minor salivary gland and
   the chip platform was [193]GPL18573. The [194]GSE18732 dataset
   contained 45 T2D and 47 control samples; the [195]GSE29221 dataset
   contained 12 T2D and 12 control samples. The [196]GSE40611 dataset
   contained 17 SS and 18 control samples; the [197]GSE154926 dataset
   contained 42 SS and seven control samples. These samples were all
   included in the analysis, and the specific details of the datasets are
   presented in Table [198]S1.

Differential analysis

   Based on the sample grouping of the T2D dataset, the samples were
   categorized into T2D and control samples. The R package limma^[199]26
   (Version 3.54.2) was used to analyze the differences in genes between
   the T2D and control samples. We set the thresholds for DEGs to
   |logFC|> 0 and p < 0.05. Genes with logFC > 0 and p < 0.05 were
   considered upregulated DEGs; meanwhile, those with logFC < 0 and
   p < 0.05 were considered downregulated DEGs. The results of the
   differential analysis were displayed by generating a volcano plot using
   the R package ggplot2 (Version 3.4.4).

   Furthermore, limma was used for the differential analysis of genes in
   the SS and control specimens. The threshold of DEGs was consistent with
   that of the T2D dataset.

WGCNA

   WGCNA^[200]15 is a systems biology method for describing gene
   association patterns between different samples and identifying gene
   sets with highly synergistic changes. In addition, the method
   identifies candidate biomarker genes or therapeutic targets by
   evaluating intramodular connectivity and module-trait correlations.
   WGCNA was performed using the WGCNA package of R software^[201]27
   (Version 1.72–5). First, the correlation coefficient between any two
   genes was determined, and the weighted value of the correlation
   coefficient was used to ensure adherence of the connections between the
   genes in the network to the scale-free network. Subsequently,
   hierarchical clustering trees were established based on the correlation
   coefficients between genes. Distinct branches of the cluster tree
   signified different gene modules, with diverse colors denoting
   different modules; thereafter, the module significance was computed.
   The scale-free fitting index was 0.80; the soft threshold values were
   the best soft threshold values; the combined shear height of the
   modules was set to 0.2; and the minimum distance was set to 0.2. The
   minimum module gene number for the WGCNA in [202]GSE18732 was set at
   60; the minimum number of module genes for WGCNA in the [203]GSE40611
   dataset was set to 100. Afterward, the correlation between the two
   groups of disease and control samples and the different modules were
   measured, and the genes in each module were recorded. Finally, all
   modules with p < 0.05 and |r|> 0.30 were retained, and the genes in the
   modules were regarded as module genes.

GO and KEGG pathway enrichment analyses

   GO analysis^[204]28 is commonly used for large-scale functional
   enrichment studies, including BPs, CCs, and MFs. Meanwhile, the
   KEGG^[205]29 is a widely used database that stores information on
   genomes, biological pathways, diseases, and drugs. We used the R
   package clusterProfiler^[206]30 (Version 4.4.4) for GO and genetic
   pathways (KEGG) enrichment analyses; item selection criteria for the
   value p < 0.05.

GSEA

   GSEA^[207]31 is used to assess predefined gene sets and analyze trends
   in gene distribution related to phenotypic correlations. This approach
   facilitates the identification of the contribution of specific gene
   sets to the observed phenotypes. In the present study, the genes in the
   [208]GSE18732 dataset were first ranked according to their logFC
   values; thereafter, the genes were evaluated using GSEA via the R
   package clusterProfiler^[209]30 (Version 4.4.4). The parameters
   employed in the GSEA were as follows: seed number, 2025; number of
   calculations, 1000, minimum number of genes in each gene set, 10; and
   maximum number of genes in each gene set, 500. Through Molecular
   Signatures Database (MSigDB)^[210]32. For c2 gene set cp. All. V2022.1.
   Hs. Symbols. The GMT [all Canonical Pathways] (3050) was used for GSEA
   and the screening criteria was p < 0.05.

GSVA

   GSVA^[211]33 is a nonparametric and unsupervised analysis approach. It
   is primarily used to assess the enrichment outcomes of gene sets in a
   microarray nuclear transcriptome by transforming the expression matrix
   of gene sets of different samples into an expression matrix of gene
   sets between samples. This allows the evaluation of the enrichment of
   distinct pathways in various samples. The h.all.v7.4.symbols.gmt gene
   set was obtained from the MSigDB^[212]32, and GSVA was performed on all
   genes in the dataset to compute the functional enrichment disparities
   between the two sample groups. The selection criterion for GSVA was
   p < 0.05. Five pathways with the most significant p-values were
   selected for presentation.

LASSO regression analysis

   LASSO regression analysis is based on linear regression analysis with
   the incorporation of a penalty term (lambda × absolute value of the
   slope). This curtails overfitting of the model and enhances its
   generalization capability. In the present study, the outcomes of the
   LASSO regression analysis were presented visually using diagnostic
   model graphs and variable trajectory diagrams.

Construction of the prediction model

   A generalized linear model was used for univariate and multivariate
   binary logistic regression analyses. The screening criterion of
   univariate binary logistic regression was p < 0.05, and genes that met
   the screening criterion were included in multivariate binary logistic
   regression for model construction. Thereafter, the R package rms
   (version 6.4–0) was used to construct a nomogram based on the results
   of multivariate logistic regression analysis, and a DCA plot was used
   to evaluate the diagnostic model.

The PPI network

   The GeneMANIA database^[213]34 ([214]https://genemania.org/) is used to
   generate hypotheses regarding gene function, analyze gene lists, and
   prioritize genes for functional analyses. Given a list of query genes,
   GeneMANIA identifies functionally similar genes by using a large set of
   genomic and proteomic data. In this mode, each functional genomic
   dataset is weighted according to the predicted query value. Besides,
   GeneMANIA has been used to predict gene function. Given a query gene,
   GeneMANIA identifies genes that are likely to share functions based on
   how the gene interacts with it. We predicted functionally similar hub
   genes using the GeneMANIA online website to construct a PPI network.

Construction of a regulatory network

   miRNAs play a significant role in the modulation of biological
   development and evolution. Moreover, they regulate a variety of target
   genes, and the same target gene is regulated by multiple miRNAs. To
   analyze the relationship between hub genes and miRNAs, hub
   genes-related miRNAs were obtained from the ENCORI database^[215]35.

   TFs control gene expression by interacting with hub genes at the
   posttranscriptional stage. We retrieved TFs from the ChIPBase
   database^[216]36 ([217]http://rna.sysu.edu.cn/chipbase/) to analyze
   their regulatory effect on hub genes.

Immune infiltration analysis

   ssGSEA^[218]37 measures the relative abundance of each immune cell
   infiltrate. Initially, each infiltrating immune cell type was marked,
   including activated CD8 + T cells, activated dendritic cells,
   gamma-delta T cells, natural killer cells, and various human immune
   cell subtypes such as regulatory T cells. Subsequently, the enrichment
   scores computed using ssGSEA were used to determine the relative
   abundance of immune cell infiltration in each sample, thereby obtaining
   the immune cell infiltration matrix. Afterward, R package ggplot2
   (Version 3.4.4) was used to construct group comparison maps to
   illustrate the expression disparities in immune cells between the two
   groups. Subsequently, immune cells with notable differences between the
   two groups were selected for subsequent analysis, and the correlation
   between immune cells was calculated using the Spearman algorithm. The
   correlation between hub genes and immune cells was calculated using
   Spearman’s algorithm. The R package ggplot2 (version 3.4.4) was used to
   construct a correlation bubble plot to illustrate the correlation
   analysis results for the hub genes and immune cells.

Statistical analysis

   All data processing and analyses in this study were conducted using R
   software (Version 4.2.2). In the comparison of continuous variables
   between the two groups, the statistical significance of normally
   distributed variables was evaluated using the independent Student’s t
   test, except otherwise stated. The Mann–Whitney U test (Wilcoxon
   rank-sum test) was used to assess the disparities among variables that
   did not follow a normal distribution. The Kruskal–Wallis test was used
   to compare three or more groups. Spearman’s correlation analysis was
   used to calculate the correlation coefficients between the different
   molecules. All statistical p values were two-sided if not specified,
   and p < 0.05 was considered statistical significance.

Supplementary Information

   Below is the link to the electronic supplementary material.
   [219]Supplementary Material 1^ (1.4MB, pdf)

Acknowledgements