Abstract Background Pancreatic cellular heterogeneity is fundamental to systemic metabolic regulation, yet its pathological remodeling in diabetes remains poorly characterized. Methods We integrated single-cell RNA sequencing with machine learning frameworks to decode pancreatic heterogeneity. Novel tools included PanSubPred (two-stage feature selection/XGBoost classifier) for multi-lineage annotation and PSC-Stat (XGBoost/Gini optimization) for stellate cell activation analysis. Results By establishing PanSubPred, we systematically decoded pancreatic cellular diversity, identifying 64 cell-type-specific markers (38 novel) that maintained cross-dataset accuracy (AUC > 0.970) even after excluding known canonical markers. Building on this annotation precision, we developed PSC-Stat to quantify stellate cell activation dynamics, revealing their progressive activation from diabetes to pancreatic cancer (activated/quiescent ratio: control: 1.44 ± 1.02, diabetes: 4.72 ± 4.01, pancreatic cancer: 18.67 ± 18.70). Diabetes reorganized intercellular communication into ductal-centric hubs via FGF7-FGFR2/3, EFNB3-EPHB2/4/6 and EFNA5-EPHA2 axes, from which we derived a 15-gene signature for diabetic ductal cells (AUC = 0.846). Beta cell heterogeneity analysis uncovered diabetes-associated depletion of mature insulin-secretory clusters (INS + NKX6-1+), expansion of immature (CD81 + RBP4+) and endoplasmic reticulum stress-adapted subtypes (DDIT3 + HSPA5+). Moreover, non-beta lineages exhibited parallel dysfunction: acinar cells shifted toward inflammatory states (CCL2 + CXCL17+), while ductal cells adopted secretory phenotypes (MUC1 + CFTR+). Conclusions This study presents a machine learning-based single-cell framework that systematically maps pancreatic cellular alterations in diabetes. The identified novel signatures, stellate activation dynamics, and beta cell maturation trajectories may serve as potential targets for diabetic management and pancreatic cancer risk stratification. Graphical abstract [40]graphic file with name 12933_2025_2865_Figa_HTML.jpg Supplementary Information The online version contains supplementary material available at 10.1186/s12933-025-02865-8. Keywords: Pancreatic cellular heterogeneity, Single-cell transcriptomics, Machine learning, Type 2 diabetes, Stellate cell activation, Beta cell dysfunction Research in context What is already known about this subject? * Pancreatic cellular heterogeneity underpins systemic metabolic regulation and orchestrates glucose homeostasis. * The communication network among pancreatic cells is crucial for maintaining normal function. * Beta cell mass declines in T2D, with beta cells displaying heterogeneity between mature and immature subpopulations. What is the key question? * How does pancreatic molecular heterogeneity, spanning multi-lineage cellular states, intercellular crosstalk, and β cell maturation gradients, orchestrate metabolic homeostasis, and how do these systems collapse in T2D? What are the new findings? * AI-empowered cellular cartography: We developed PanSubPred and PSC-Stat, frameworks enabling high-fidelity annotation of pancreatic lineages (64 markers, 38 novel) and stellate cell activation. * Diabetes-driven communication breakdown: T2D collapses global intercellular signaling but strengthens ductal-centric hubs via certain pathways, forming a 15-gene novel signature. * Multi-lineage pancreatic dysfunction: T2D drives divergent transitions across β and non-β cells. Decline in mature β cell populations is observed in T2D, coupled with an increase in immature β cells. How might this impact on clinical practice in the foreseeable future? * The AI-driven PanSubPred/PSC-Stat platform accelerates single-cell multi-lineage exploration, potentially guiding the development of new therapeutic approaches. Introduction The human pancreas, a central hub for systemic metabolic regulation, maintains nutrient homeostasis through its dual exocrine and endocrine functions [[41]1]. The exocrine pancreas secretes digestive enzymes including proteases, lipases, and amylases, to break down meals into absorbable small molecules like amino acids, fatty acids, and monosaccharides for intestinal absorption [[42]2, [43]3]. The endocrine function, orchestrated by dispersed islet cells, dynamically regulates nutrient storage and utilization through hormones like insulin and glucagon. Insulin drives the conversion of glucose into glycogen or fat for storage [[44]4], whereas glucagon not only participates in glucose production, but also modulates circulating fatty acids and amino acids [[45]5], and it has been shown to play an important role in the regulation of hepatic steatosis, which is closely linked to insulin resistance in patients with type 2 diabetes (T2D) [[46]6–[47]10]. Pancreatic function critically depends on a highly heterogeneous cellular network. Recent studies reveal that classically defined cell types in pancreas exhibit significant subpopulation divergence. For instance, beta cells can be subdivided into functionally distinct subpopulations based on insulin secretion capacity, functional identity, or cell maturity [[48]11–[49]13]. Molecular profiling with multi-omics approaches, including single-cell/single-nucleus transcriptomics or single-cell chromatin accessibility, has provided deep insights into beta cell heterogeneity [[50]14–[51]17]. Functionally, this heterogeneity enables adaptive responses to different metabolic demands: high-secreting beta cell clusters maintain insulin level, while stress-resilient subtypes may facilitate recovery and regeneration during periods of metabolic stress [[52]18]. Segerstolpe et al. identified a highly proliferative alpha cell subset potentially involved in pancreatic self-renewal mechanisms [[53]19]. Pancreatic stellate cells (PSCs), a special stromal cell, exhibit two phenotypes: quiescent (qPSC) and activated (aPSC) [[54]20]. Quiescent PSCs, characterized by vitamin A-rich lipid droplets, locate at the peri-acini or around vascular cells, while aPSCs lose these droplets and produce substantial extracellular matrix proteins, playing a pivotal role in remodeling the tumor microenvironment to support pancreatic ductal adenocarcinoma (PDAC) [[55]21, [56]22]. Additionally, pancreatic acinar and ductal cells also display molecularly distinct subpopulations [[57]19, [58]21]. Such cellular heterogeneity complicates the molecular studies of human pancreases. Single-cell RNA sequencing (scRNA-seq) analysis enables transcriptomic profiling of individual cells within heterogeneous tissues [[59]23–[60]25], offering deep insights into pancreatic cellular diversity. While existing studies have mapped cell-type-specific molecular profiles in both non-diabetic (ND) and diabetic individuals [[61]21, [62]26, [63]27], discrepancies in marker genes, especially in the exocrine cells of the pancreas, hinder cross-dataset comparisons and in-depth functional analyses of subpopulations. Machine learning algorithms now provide a breakthrough: supervised probabilistic prediction frameworks integrate multi-source data to build robust classification models, facilitating cell annotation and discovery of novel cell markers [[64]28, [65]29]. Nevertheless, although pancreatic cellular heterogeneity (e.g., distinct subpopulations [[66]19, [67]21]) is well-documented, its functional contribution to T2D pathogenesis remains poorly characterized. In this study, we integrate high-resolution scRNA-seq data with machine learning to systematically map the heterogeneity of human pancreatic exocrine and endocrine cells. We developed PanSubPred, a novel transcriptomics-based pancreatic cell classifier, which enabled high-precision multi-lineage annotation and identified 64 cell-type-specific markers, including 38 novel ones. Leveraging this accurate annotation, we quantified stellate cell activation dynamics using PSC-Stat, revealing a progressive activation continuum from diabetes to PDAC. Intercellular communication analysis uncovered diabetes-driven reorganization into ductal-centric signaling hubs and a novel 15-gene signature. Furthermore, we investigated dynamic changes in subpopulation heterogeneity across pancreatic cell types, including beta and non-beta cells, collectively linking multi-lineage dysregulation to T2D progression. Methods Data curation We retrieved the publicly available human pancreatic datasets from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) and ArrayExpress (Table [68]S1) [[69]16, [70]19, [71]21, [72]30–[73]34]. Donors previously diagnosed with T2D or with an HbA1c ≥ 6.5 were classified as T2D, while those without a previous T2D diagnosis and with an HbA1c ≤ 5.6 were classified as ND [[74]35]. To construct PanSubPred, the dataset from Baron et al. (containing 6749 cells) was randomly partitioned into a training set (4723 cells, 70%) and an independent internal testing set (2026 cells, 30%) at a 7:3 ratio. Additionally, T2D cells from the Baron et al. dataset were also used as an internal test set, while datasets from Segerstolpe et al., Enge et al., and Ngara et al. served as external validation cohorts (Table [75]S2). The PSC phenotype classifier (PSC-Stat) was developed with 400 pancreatic stellate cells (168 qPSCs and 232 aPSCs) from Baron et al., split into a training set (280 cells, 70%) and a internal testing set (120 cells, 30%). Diabetic-status cells from Baron et al. were also used as an internal test set, while datasets from Fang et al., Kang et al., and PDAC data from Steele et al. served as external validation cohorts (Table [76]S7). Ductal cell profiles extracted from Fang et al. were randomly divided into training (80%) and independent testing (20%) sets (Table [77]S9). Raw data processing and quality control For the raw reads from ArrayExpress (accession number: E-MTAB-5061) and GEO (accession number: [78]GSE81608), sequence quality was assessed using FastQC (v.0.12.1) and MultiQC (v.1.22.2). Fastp (v.0.23.4) was then employed to remove adapters and low-quality sequences. After quality control, cells with less than 85% of bases exceeding a Q30 quality score were excluded. Specifically, 196 low-quality cells were filtered out from E-MTAB-5061 and 13 cells were removed from [79]GSE81608. Then, purity-filtered reads were aligned to the human GRCh38 reference genome using STAR (v.2.7.11a) with default parameters, and the number of read counts per gene was quantified with HTSeq (v.2.0.2). Aligned and preprocessed expression count table data from [80]GSE84133, [81]GSE81547, [82]GSE153855, [83]GSE101207, [84]GSE217837, and [85]GSE155698 were downloaded directly from NCBI GEO. The Seurat (v4.1.1) [[86]36] R package was used for data preprocessing, including filtering low-quality cells, normalization, scaling, and visualization of marker genes. The Seurat object was filtered to include only genes expressed in three or more cells and exclude low-quality cells (cells with genes < 200 and > 5,000, mitochondrial gene percentage > 25%). Gene expression was normalized for each cell by library size and log-transformed using a size factor of 10,000 molecules per cell. Dimensionality reduction was performed by principal components analysis (PCA), and visualized with Uniform Manifold Approximation and Projection (UMAP). Cell clusters were identified using FindClusters function (based on the top 15 principal components), with the Louvain algorithm [[87]37] implemented in Seurat. Feature selection and model construction for machine learning analysis In this study, to select features associated with the target variables, several commonly used feature selection algorithms were applied to calculate the importance scores and rankings of all features, including maximal information coefficient (MIC), analysis of variance (ANOVA), F-score, principal component analysis (PCA), gradient boosting decision tree (GBDT), and gini impurity [[88]38, [89]39]. Features were ranked according to their importance scores, and then the incremental feature selection (IFS) strategy was employed to identify the local optimal feature subset containing the top k features. In addition, several machine learning classification algorithms, namely support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and logistic regression (LR) were included in this study. To evaluate the baseline performance of candidate algorithms, all classifiers were implemented using their default parameters. After obtaining the optimal model, hyperparameter tuning was implemented using the GridSearchCV function with 5-fold cross-validation (CV) to optimize the parameters for each classifier. For the multi-class classification problem of the pancreas cell subpopulation classifier, the one-vs-rest strategy was employed to the binary classification algorithm. Reproducing existing cell-type prediction pipelines Four established cell-type annotation pipelines were reproduced for benchmarking against PanSubPred: caSTLe [[90]40], scID [[91]41], scPred [[92]42], and CellTypist [[93]43]. All pipelines were trained on the Baron dataset using the same training strategy as PanSubPred, including performance evaluation across two internal test cohorts: Baron test set and Baron T2D data, and three external test data: Segerstolpe external data, Enge external data, and Ngara external data. The caSTLe [[94]40] implementation followed its original GitHub repository ([95]https://github.com/yuvallb/CaSTLe), with Baron data as the source domain and all test cohorts as target domain. scID [[96]41] and scPred [[97]42] were reproduced using code from their respective repositories ([98]https://github.com/BatadaLab/scID and [99]https://github.com/powellgenomicslab/scPred), maintaining equivalent training-testing splits and preprocessing steps as PanSubPred. We incorporated two distinct aspects of using CellTypist for automated cell type annotation and benchmarking. First, we utilized the existing pre-trained model provided by the developers, which was trained on healthy adult human pancreatic islet cell data. For this, we employed the “Adult_Human_PancreaticIslet” model through the CellTypist online platform ([100]https://www.celltypist.org/) by sequentially uploading the test datasets in CSV format and selecting this pre-trained logistic regression–based model for annotation. Second, to ensure a fair comparison at the method level, we retrained CellTypist using our unified reference training dataset, thereby generating a new model based on the same logistic regression framework, which was implemented using the celltypist.train function in the celltypist Python package (v1.6.3). This retrained model was then applied to various internal and external test datasets to evaluate its performance relative to other methods under the same training conditions. For the other comparative methods (caSTLe, scID, and scPred), we similarly initialized and trained each tool using our reference training dataset according to their respective frameworks. This consistent training approach ensured that all methods learned from the same reference data, allowing for a fair and unbiased benchmarking of their cell type annotation performance on the test datasets. Performance evaluation of machine learning model Serval classic metrics, including accuracy, precision, recall, and F1 measure were used to evaluate the model performance [[101]44], which were defined as follows. graphic file with name d33e578.gif 1 where TP, FP, TN, and FN denote the number of true positives, false positives, true negatives and false negatives, respectively. Additionally, the area under the receiver operating characteristic (ROC) curve (AUC) was also utilized to quantify the prediction performance of the model. Pathway enrichment analysis and pathway scoring Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed using the clusterProfiler [[102]45] (v4.7.1.3) R package. GO enrichment analyses were conducted with the enrichGO function, and KEGG pathway enrichment was carried out using the enrichKEGG function, both with default settings. Enriched terms were ordered by statistical significance, and terms with a p-value < 0.05 was considered statistically significant. Pathway activity quantification at single-cell resolution was achieved through implementation of the Seurat AddModuleScore function, which calculates cell scores based on specific gene sets. This approach enabled us to quantify the activity of particular pathways or gene sets at the single-cell level, offering deeper insights into the functional significance of differentially expressed genes. Pancreas intercellular communication analysis Pancreatic cell-cell communication networks were systematically analyzed using the CellCall R package (v1.0.7) [[103]46], the CellChat R package (v2.2.0) [[104]47], and the cellphonedb python package (v5.0.1) [[105]48]. The CellCall analysis included inferring intercellular communication scores, exploring pathway activity, and predicting the activity scores of downstream transcription factors (TFs). Two main metrics were utilized to quantify intercellular communication: (1) communication frequency: the total number of binary presence/absence of interaction between any two of the seven pancreatic cell types (e.g., beta-alpha and delta-alpha in ND) and (2) communication strength, defined as communication scores calculated for each cell-cell pair using the computational framework from [[106]46]. Pancreas cell subpopulation analysis Three different data were used to explored pancreatic heterogeneity. Initially, a separate Seurat object was created for each dataset, and low-quality cells were removed from each dataset individually. Subsequently, the three datasets were further normalized, merged, and batch-corrected using the merge functions from the Seurat (v4.1.1) package and RunHarmony functions from the Harmony (v.0.1.0) package [[107]49]. Downstream clustering analysis was then performed using the harmony-corrected components. The FindNeighbors function of Seurat with the first 20 principal components was used for the nearest-neighbor graph construction. For each cluster of subset cells, the FindAllMarkers function of Seurat package was utilized to find the differentially expressed positive genes (DEPGs) (> 25% cell detection; p.adj < 0.05 and logFC > 1). The pathway analysis was performed to explore the underlying biological functions of these DEPGs from different pancreatic cell clusters. Pseudotime trajectory and RNA velocity analysis Pseudotime analysis was performed using the Monocle R package (v2.26.0) [[108]50]. The workflow included converting Seurat objects to Monocle objects while preserving highly variable genes and cell-type annotation information. Trajectory inference was conducted using Monocle’s DDRTree method for dimensionality reduction and trajectory visualization, enabling the exploration of cell differentiation trajectories. To perform RNA velocity analysis, we retrieved cellular identifiers, dimensionally reduced embeddings (PCA/UMAP), and associated metadata from the primary Seurat object. Spliced/unspliced counts were generated using Velocyto (v0.17.16) [[109]51] through alignment against the GRCh38 human reference genome. These matrices were then integrated with annotated single-cell transcriptomes via scVelo (v0.3.3) [[110]52] for dynamical modeling. Employing a stochastic inference framework, we computed RNA velocity vectors and projected them onto the embedding to map potential lineage trajectories among cell subclusters. Results Development and validation of PanSubPred for accurate pancreatic cell subtype identification To systematically identify novel and robust cell-type-specific marker genes across seven major pancreatic cell populations, including both endocrine and exocrine lineages, we developed PanSubPred, an integrated computational framework tailored for pancreatic cell subtype annotation. Details on the development and validation of PanSubPred are provided in the Supplementary Text [111]S1 (Fig. [112]1). Using cell labels predicted by PanSubPred, we successfully distinguished cell subpopulations with distinct clustering patterns (Fig. [113]1B). Among the 64 marker genes, most were exocrine-related (Fig. [114]1C and D). Notably, 26 genes were previously reported markers, while 38 are novel candidates (Table [115]S6). Fig. 1. [116]Fig. 1 [117]Open in a new tab Construction and characterization of PanSubPred. A Summary of the PanSubPred method. B Clustering of gene expression profiles of pancreatic islet cells, using cell labels predicted by PanSubPred. C The heatmap of 64 genes in seven pancreatic cell types reveals the cell-type-specific genes (left). The normalized expression level for CELA3A, GC and COL6A2 genes for each cell projected onto UMAP coordinates (right). D The dot plot of 64 genes in different pancreatic cell subpopulations reveals the cell-type-specific genes. E Pathway enrichment analysis of 64 pancreas marker genes. Pathways were ordered by statistical significance (p-value). The Black dashed line represents the statistical significance threshold (p-value < 0.05) PSC-Stat decodes stellate cell activation dynamics across the T2D-PDAC pathological spectrum PSCs are central to pancreatic fibro-inflammatory remodeling, where their transition from qPSCs to aPSCs drives disease progression in both diabetes and pancreatic cancer [[118]22, [119]53]. Precise discrimination of these functionally distinct phenotypes is critical for identifying therapeutic targets against pathological stromal reprogramming. Among the 64 pancreatic cell marker genes identified in PanSubPred, we found that 14 of the 17 stellate-cell-specific genes, including both known markers and newly identified candidates, were differentially expressed between qPSCs and aPSCs (Fig. [120]2A). This distinct expression pattern suggested that these genes could serve as potential biomarkers for distinguishing between the two PSC phenotypes. To test this hypothesis, we developed PSC-Stat, a dedicated computational framework for PSC phenotype classification (Fig. [121]2B). During model construction, we trained and compared three machine learning models (SVM, RF, and XGBoost), with the XGBoost-based model achieving the best performance (AUC = 0.956) in 5-fold CV (Fig. [122]2C). To further optimize the model, we applied three feature selection algorithms, MIC, ANOVA, and Gini, to evaluate the 17 stellate-cell-specific genes. The Gini method identified the top 15 genes as the optimal feature set based on their importance scores (Fig. [123]2D and E). Notably, MMP14 ranked as the top-priority gene in feature selection, followed by TIMP1 and CRLF1, suggesting their pivotal roles in pancreatic stellate cell activation dynamics (Fig. [124]2E). This refined gene panel not only improved model performance, but also highlighted molecular features that may play a role in the transition from qPSCs to aPSCs, offering biologically meaningful clues into the activation process. Fig. 2. [125]Fig. 2 [126]Open in a new tab Construction and evaluation of PSC-Stat. A The violin plot of 17 stellate-cell-specific genes showed that 14 genes of them were differentially expressed between qPSCs and aPSCs. The statistical significance was as such: ns: p-value > 0.05; *: p-value < 0.05; **: p-value < 0.01; ***: p-value < 0.001; ****: p-value < 0.0001, two-sided Wilcoxon rank-sum test. B Summary of the PSC-Stat framework. C The performance comparison of three classifier algorithms-based models with all 17 genes on 5-fold CV of training set. D The IFS curve of the 17 genes from analysis of Gini, ANOVA, and MIC based on the XGBoost algorithm with 5-fold CV. E The importance scores of the top 15 genes based on gini impurity. F The ROC curve of PSC-Stat for different datasets. G Fang scRNA‑seq data projected on the Azimuth reference (top) and the corresponding cell type annotation prediction score displayed on dimensional reduction plot (bottom). H aPSC/qPSC ratio difference between ND (n = 9) and T2D (n = 4) individuals. two-sided Student’s t-test. I aPSC/qPSC ratio difference among ND (n = 9), T2D (n = 4) and PDAC (n = 14) individuals. two-sided Student’s t-test The optimized PSC-Stat achieved outstanding performance, with an AUC of 0.963 in 5-fold CV and 0.978 on the independent test set (Fig. [127]2F), demonstrating its robustness and reliability. Notably, it accurately distinguished qPSCs/aPSCs in the T2D status with an AUC of 0.992 (Fig. [128]2F). To further assess the generalizability of PSC-stat, we validated it on three external independent cohorts: Fang (healthy and T2D individuals) [[129]32], Kang (healthy donors) [[130]16], and Steele (PDAC and adjacent tissues) [[131]33] (Fig. [132]S5A–C). Since these datasets lacked explicit PSC activation labels, we projected the scRNA-seq data onto a publicly available Azimuth integrated human pancreas reference to infer stellate cell phenotypes and obtain corresponding prediction scores (Fig. [133]2G, [134]S5D and E) [[135]16]. Cells with high-confidence (prediction score > 0.9) were retained for model evaluation. PSC-stat achieved consistent performance across all cohorts (Fig. [136]2F: AUC: Fang = 0.886, Kang = 0.835, Steele = 0.881), underscoring its robust predictive accuracy across diverse disease contexts. Together, these results validate PSC-stat as a powerful tool for precise classification of PSC state, providing a valuable resource for studying stellate cell dynamics in pancreatic physiology and pathology. PSC-Stat is freely available at [137]http://lin-group.cn/server/PSC-Stat/index.html. To investigate the pathological relationship between T2D and PDAC, we analyzed stellate cell activation dynamics across disease states. Strikingly, the aPSC/qPSC ratio exhibited a stepwise increase from healthy controls (ND) to T2D and further to PDAC (Fig. [138]2H-I, S5F, aPSC/qPSC ratio: ND: 1.44 ± 1.02, T2D: 4.72 ± 4.01, PDAC: 18.67 ± 18.70). This cascade activation pattern aligns with clinical observations: epidemiological studies show that new-onset diabetes (NOD) patients have a 5- to 8-fold increased risk of being diagnosed with PDAC within 1 to 3 years [[139]54, [140]55], and approximately 50% of PDAC cases present with diabetic phenotypes at diagnosis [[141]56]. Our findings provide single-cell evidence that progressive stellate cell activation may serve as a pathological nexus linking T2D to PDAC development. These results establish a cellular dynamics framework for understanding the T2D-PDAC continuum and suggest that monitoring aPSC ratios could serve as a novel biomarker for early PDAC detection in diabetic populations. Rewired intercellular communication in T2D highlights ductal-centric interaction hubs and a 15-gene diagnostic signature Recent studies have highlighted that coordinated intercellular communication is essential for maintaining pancreatic function and glucose homeostasis [[142]57]. To investigate how diabetes perturbs this coordination, we systematically compared cell-cell interaction networks between ND and T2D samples using CellCall [[143]46], a computational framework that integrates ligand-receptor (L-R) interactions and downstream TF activities through KEGG pathway-based L-R-TF axes. This method infers potential communication events by calculating the L2-norm of L-R co-expression (normalized via softmax function) combined with TF activity scores derived from regulon-target gene enrichment analysis. In normal pancreatic tissue, intercellular communication networks primarily centered on exocrine cells, with stellate and ductal cells acting as central signal integrators that coordinate signals from both endocrine and exocrine populations (Fig. [144]3A). In contrast, T2D patients exhibited significant remodeling of these networks: global interaction frequency (i.e., the number of cell-cell interactions) declined (e.g., complete loss of stellate (sender)-stellate (receiver), delta-alpha, and gamma-stellate interactions etc.), and signal strength (i.e., the communication score calculated by CellCall for each cell-cell interaction) between most cell pairs decreased, particularly the communication targeting stellate cells. This shifted the communication hub toward ductal cells, forming a new pattern where ductal cells become the primary signal recipients (Fig. [145]3A). To further validate this finding, we conducted comparative analyses using two additional cell-cell communication inference tools, CellChat and CellphoneDB. CellChat analysis confirmed that, compared to the ND group, both the number and the overall strength of inferred interactions were reduced in T2D (Fig. [146]S6A). Examining interaction numbers and strengths consistently showed that in the ND group, the majority of cell-cell interactions were organized around stellate and ductal cells serving as core signal integrators (Fig. [147]S6B, D). In contrast, in T2D samples, the communication network shifted to a ductal cell-centered pattern (Fig. [148]S6C, E). We further used CellChat’s comparative pipeline to visualize the differences in interaction numbers and strengths between ND and T2D using circle diagrams. The results once again supported the observation that in T2D, interactions directed toward ductal cells were notably enhanced, particularly signals from beta cells to ductal cells and from delta to ductal cells (Fig. [149]S6F and G). Additionally, we applied CellphoneDB as a second independent method to identify statistically significant ligand–receptor interactions in both ND and T2D groups, considering interactions with p-values < 0.05 as significant. Consistent with the previous analyses, we found that the number of significant interactions in T2D was lower than in ND. Moreover, in the ND group, significant interactions were predominantly concentrated around stellate and ductal cells, whereas in T2D they were mainly distributed in stellate and ductal cells, with ductal cells emerging as a key hub (Fig. [150]S6H-I). Fig. 3. [151]Fig. 3 [152]Open in a new tab Analysis of cell-cell communication among pancreatic cells in ND and T2D. A Circle plot of intercellular communication patterns among seven pancreatic cell subsets in ND (left) and T2D (right), respectively. The outermost ring colors represent the seven distinct cell types. The inner ring colors indicate whether the cell is acting as a sender (ligand-expressing, in red) or a receiver (receptor-expressing, in blue). Each arc represents an inferred communication from a ligand-expressing cell to a receptor-expressing cell, representing the inferred signal directionality. The color of the arc corresponds to the sender cell type, and the color gradient from light to dark reflects the strength of the communication score between the interacting cell pairs. B Sankey plot showing the ligand–receptor–transcription factor (L–R–TF) signaling cascade from beta cells (sender) to ductal cells (receiver) under T2D conditions, in which ligands are expressed in beta cells and both receptors and TFs are in ductal cells. The three columns from left to right represent ligands (sender), receptors (receiver), and TFs, respectively. And the coloring of the genes is only intended to improve readability. Each complete stream indicates a distinct signaling pathway involving a L–R–TF signaling. The color of the stream on the left and right sides are consistent with ligand and receptor respectively, allowing the flow of information through the pathway to be visually tracked. C The communication network from other cells to ductal cells in T2D patients. Select ligand-receptor axes with a communication score greater than 0.5 for visualization. D Venn diagram showing the overlap of target genes associated with 6 key TFs in ductal cells (TGs) and two disease-specific pathways (PGs). E Venn diagram showing the shared genes in ductal cells between the Fang et al. (discovery) and Segerstolpe et al. (validation) datasets. F The performance comparison of four classifier algorithms-based models with the shared 77 genes on training set. G The IFS curve of gene selection using LR and four feature selection methods on training set. The black dotted line represents the top 15 genes selected by GBDT. H The importance scores of the top 15 genes selected by GBDT. I The ROC curve of the model with the top 15 genes on training, internal testing and external validating Segerstolpe dataset Interestingly, despite the global attenuation of intercellular signaling, beta-ductal and delta-ductal communication scores were enhanced in T2D (Fig. [153]3A), potentially driven by ligand-receptor axes such as FGF7-FGFR2/3, EFNB3-EPHB2/4/6, and EFNA5-EPHA2 (Fig. [154]3B and C, Fig. [155]S7A–C). Further analysis of these ligand-receptor-TF axes in T2D ductal cells identified six key TFs: ETS1, JUN, MYC, NFKB1, NFKBIA, and ABL1 (Fig. [156]3B, Fig. [157]S7D–F). Additionally, CellCall pathway analysis revealed T2D-specific activation of insulin signaling and axon guidance pathways, predominantly enriched in beta-ductal and delta-ductal interactions (Fig. [158]S7G-H), suggesting their roles in metabolic stress adaptation and neural remodeling during diabetes progression. To test whether these transcriptional and pathway alterations could serve as potential T2D-specific signatures, we developed a machine learning framework integrating TF-associated genes and pathway components. From the 1,472 target genes (TGs) of the six key TFs in ductal cells (Table [159]S8A) and 304 genes from T2D-enriched pathways (PGs) (Table [160]S8B), we identified 1,729 candidate genes, including 47 shared genes (Fig. [161]3D). We then extracted their expression profiles from two independent ductal cell datasets, covering both ND and T2D samples. Due to the inherent sparsity in single-cell data, many genes are not detected in most cells; therefore, we excluded genes not expressed in more than 70% of cells. After filtering, 78 and 1,451 genes remained in the two datasets, with 77 genes shared (Fig. [162]3E). The downstream analyses focused on this robust set of 77 genes. The larger dataset was used as a discovery cohort, randomly split into training (80%) and testing (20%) sets (Table [163]S9). We evaluated multiple algorithms and found that LR-based model achieved the best performance (Fig. [164]3F). Using MIC, ANOVA, Gini, and GBDT for feature selection combined with LR, we selected the inflection point with top 15 GBDT-ranked genes as the optimal feature set (Fig. [165]3G-H). Parameter tuning based on these 15 genes yielded a model with good discrimination in both the training and internal test sets (Fig. [166]3I). External validation on the Segerstolpe dataset further confirmed its robustness (AUC = 0.846; Fig. [167]3I). These results indicate that integrating communication-related transcriptional hubs and pathway perturbations is a promising strategy for identifying T2D biomarkers mechanistically linked to disease progression. The consistent high AUC values across datasets support the notion that altered intercellular communication is a core feature of T2D and offers important insights into its pathogenesis. Among the 15 identified genes, the core gene RPL3 (Fig. [168]3H, ranked first in feature importance) is a downstream target of the transcription factor MYC. In addition, RPS4X, RPL13, RPL36, and ACTG1, part of this 15-gene panel, are also regulated by MYC. Studies have shown that MYC, a known recovery gene associated with apoptosis, also acts as a key regulator of cell proliferation, and its altered expression levels are linked to the development of T2D [[169]58]. Furthermore, four of the 15 genes are downstream targets of ABL1, and another four are regulated by NFKB1. Notably, seven out of the 15 genes, including ACTB (ranked third), S100A6 (ranked fourth), and KRT19 (ranked fifth), are downstream targets of the transcription factor ETS1 (Fig. [170]3H, Table [171]S8). This suggests that ETS1 may serve as a central hub in regulating the transcriptional network of pancreatic ductal cells, further supporting our hypothesis that communication-regulated transcription factors and their target genes can provide new molecular evidence and potential intervention targets for T2D progression. Collectively, these genes, orchestrated by TFs such as MYC, ETS1, and NFKB1, form a ductal cell-centered communication network, offering a new set of potential molecular biomarkers for diabetes. A representative mature beta cell subset associated with insulin secretion and mitochondrial metabolism is reduced in T2D Beta cell heterogeneity has been well-documented in both humans and mice, with distinct subpopulations identified based on functional, morphological, molecular, and maturation characteristics [[172]11–[173]13]. However, the relationship between beta-cell heterogeneity and beta-cell failure in T2D, as well as the specific roles of individual beta subpopulations in diabetes pathogenesis, remains poorly understood. To address this, we integrated single-cell transcriptomic data from Xin, Baron, and Segerstolpe with ND and T2D individuals. After quality control to remove low-quality cells, we constructed a joint embedding space consisting of 18 distinct clusters, irrespective of data source, donor origin, and disease status (Fig. [174]S8A). Upon annotating the major pancreatic cell subpopulations (Fig. [175]S8B–D), we extracted beta cells and performed Louvain clustering, identifying three distinct clusters that were present in both ND and T2D (Fig. [176]4A-B). Fig. 4. [177]Fig. 4 [178]Open in a new tab scRNA-seq analysis reveals changes in beta cell heterogeneity promoted by T2D. A Unsupervised clustering of single-cell transcriptome visualized with UMAP analysis. Data represent pancreatic islet cells from ND (n = 6,284 cells pooled from 19 donors) or T2D (n = 3,083 cells pooled from 9 donors). B Projection of beta cells (n = 2,793) from both ND and T2D using UMAP analysis. C Normalized expression values for key beta cell maturation, immaturity, insulin secretion, endoplasmic reticulum (ER) stress and ER-associated degradation (ERAD) compared among different beta cell clusters. D Cell scores for selected gene sets, including beta cell maturity, immaturity, insulin secretion, beta cell development, beta cell proliferation and response to ER stress. The central error bars in the violin plot indicate the mean and standard deviation (mean ± sd) of the data distribution. E Enrichment of GO terms ordered by statistical significance in cluster 0 beta enriched genes. Dashed line represents p-value < 0.05. F Frequency of beta cluster cells in ND and T2D donors, with cluster 0 (ND n = 19, T2D n = 9), cluster 1 (ND n = 7, T2D n = 7), and cluster 2 (ND n = 13, T2D n = 5). Data shown are the mean ± s.e.m. *p value < 0.05, two-sided Wilcoxon rank-sum test. G Significant differential enriched genes in each beta cluster of ND and T2D individuals. H Summarized map of beta cell heterogeneity and loss. Tp: transcriptional. I RNA-velocity analysis of ND and T2D INS/GCG, alpha-/beta-cells. Black streamline arrows represent predicted direction of cell state change and trajectories. Larger red arrows represent overall velocity for each area of the UMAP We conducted molecular profiling of the three distinct beta cell clusters (Fig. [179]S8D-E), identifying three distinct beta cell subpopulations: cluster 0 representing mature beta cells, cluster 1 corresponding to immature or dedifferentiated beta cells, and cluster 2 characterized by elevated expression of endoplasmic reticulum (ER) stress-related genes. The detailed molecular signatures of each cluster are discussed in Supplementary Text [180]S2 (Fig. [181]4C-E). To obtain a global landscape of beta cell heterogeneity, we compared the proportion differences of beta cells between ND and T2D individuals. We identified a significant reduction in mature beta cells (cluster 0) and concomitant expansion of immature beta cells (cluster 1) in T2D pancreases, accompanied by an increased trend of ER stress-associated beta cell subpopulations, although this did not reach statistical significance (Fig. [182]4F). These observations suggest that the progression of diabetes may be associated with a gradual loss of beta cell function and the accumulation of immature beta cells, reflecting the decline in insulin secretion capacity and changes in beta cell responsiveness. Consistent with our findings, Hrovatin et al. [[183]59] constructed a comprehensive single-cell atlas of mouse islets spanning developmental and diabetic states, and revealed that beta cells in T2D genetic models exhibit downregulation of genes associated with maturity and insulin secretion (e.g., Ucn3, G6pc2), alongside upregulation of ER stress activity and enrichment of immature-markers. This shift in gene expression programs aligns well with the heterogeneity we observed in the distribution of beta cell subpopulations in human T2D. Further analysis revealed that in T2D, subpopulations representing mature beta cells (cluster 0) and less mature cells (cluster 1) exhibited more differentially expressed genes compared to ND (Fig. [184]4G), suggesting that diabetes progression may amplify the molecular differences between these subpopulations, increasing transcriptional output and noise. Such heightened transcriptional output resembles the dysregulated gene expression observed in aging beta cells reported by Shrestha et al. [[185]60], potentially explaining the recurrent identification of ER stress-linked beta cell subtypes. Regarding the potential mechanisms of beta cell loss, we hypothesize that cells initially activate compensatory responses, such as ER-associated degradation (ERAD), to alleviate ER stress by degrading misfolded proteins. If stress is resolved, cells restore homeostasis and resume normal insulin secretion. However, persistent ER stress that cannot be mitigated may eventually trigger apoptotic pathways, leading to beta cell loss (Fig. [186]4H). Molecular and functional heterogeneity of non-beta pancreatic cells in T2D Beyond beta cells, significant molecular heterogeneity exists across pancreatic cell populations. Sub-clustering of acinar cells identified two functionally distinct subsets (Fig. [187]5A): cluster 0 exhibited high expression of digestive enzyme-related genes (e.g., CELA2A, CELA3A, CELA2B), while cluster 1 was enriched for immune and inflammatory markers, such as CD74, CCL2, CXCL17 (Fig. [188]5B), consistent with prior reports by Segerstolpe et al. [[189]19]. Notably, beyond immune and inflammatory signatures, subset 1 acinar cells also displayed distinct metabolic dysregulation, characterized by oxidative stress, xenobiotic metabolism, and hormonal modulation (Fig. [190]5D). Functional characterization identified cluster 0 as the dominant mature acinar population (70.3% in healthy controls), primarily responsible for normal function of acinar cell. In contrast, cluster 1 adopted a multifunctional state involving inflammatory modulation, metabolic stress adaptation. Critically, the proportional expansion of cluster 1 in T2D individuals suggests its potential role in exacerbating pancreatic dysfunction through aberrant metabolic-immune crosstalk (Fig. [191]5C). Fig. 5. [192]Fig. 5 [193]Open in a new tab Functional and molecular heterogeneity in non-beta pancreatic cells. A Projection of two acinar subpopulations from ND and T2D using UMAP analysis. B The differentially expressed genes between acinar clusters 0 and 1 indicate enrichment patterns, with upregulated genes representing those enriched in cluster 0 and downregulated genes corresponding to those enriched in cluster 1. Significance threshold: adjusted p-value < 0.05 and|logFC| >1. C Bar plot showing the proportion of each acinar 0/1 subpopulation in ND and T2D. Each condition is color-coded as indicated. D Enrichment of GO terms and associated genes significantly enriched in acinar clusters 0 and 1. E Projection of two ductal subpopulations from ND and T2D using UMAP analysis. F Bar plot showing the proportion of each ductal 0/1 subpopulation in ND and T2D. Each condition is color-coded as indicated. G Genes with significantly differential expression between subclusters of ductal subsets, with upregulated genes representing those enriched in cluster 0 and downregulated genes corresponding to those enriched in cluster 1. Significance threshold: adjusted p-value < 0.05 and|logFC| >1. H Pathway enrichment analysis of genes significantly enriched in ductal cell clusters 0 and 1. Dashed line represents p-value < 0.05. I Projection of three alpha subpopulations from ND and T2D using UMAP analysis. J Normalized expression values for key alpha cell identity/function, mitochondrial-associated, and cell proliferation-associated genes compared among different alpha cell clusters. K Comparison of GO enrichment analysis for genes enriched in alpha cell clusters 0 and 2. Dashed line represents p-value < 0.05 Sub-clustering analysis of ductal cells also identified two distinct subpopulations (Fig. [194]5E). Cluster 0 predominated in ND pancreases (70.5%), whereas subset 1 was markedly expanded in T2D individuals (Fig. [195]5F: 70.4%). Transcriptomic profiling revealed ductal 0 cell was enriched for microtubule-associated genes (including TUBA1C, TUBA1B, TUBB, TUBB4B etc.), involved in cytoskeletal organization and mitotic cell cycle regulation, while cluster 1 was enriched for genes associated with bile, gastric, pancreatic, and salivary secretion pathways (Fig. [196]5G and H). Consistent with our findings, Baron et al. previously reported ductal cell heterogeneity, identifying two major subpopulations characterized by high expression of MUC1 and CFTR, respectively [[197]21]. In our study, both MUC1 and CFTR were significantly enriched in cluster 1 (Fig. [198]5G). We conducted a subpopulation analysis of pancreatic alpha cells and identified three distinct alpha cell subtypes (Fig. [199]5I). The predominant cluster 0 (79.4% of alpha cells) exhibited marked enrichment of alpha cell identity genes (Fig. [200]5J: ARX, LOXL4, FEV), aligning with the electrophysiologically active mature alpha cell subtype reported by Camunas-Soler et al. [[201]61]. In stark contrast, the minor cluster 2 (0.5%) showed pronounced cell cycle gene activation, including CDK1, MKI67, CENPF etc. Consistent with findings from Segerstolpe et al. [[202]19], this rare alpha-cell subset may represent a highly proliferative population contributing to pancreatic tissue self-renewal. Additionally, cluster 1 (20.1%) demonstrated unique enrichment of mitochondrial-encoded genes, suggesting metabolic reprogramming may underlie its functional specialization (Fig. [203]5J). Functional enrichment analysis confirmed that cluster 0 was associated with biological functions related to mature alpha-cell activity, including hormone secretion/transport/response and signal release. Meanwhile, cluster 2 was predominantly enriched in pathways related to cell division, proliferation, and cell cycle regulation (Fig. [204]5K). Similar to the heterogeneity observed in beta-cell maturation states, we speculate that alpha cell subpopulations may maintain pancreatic homeostasis through a mature-proliferative duality, suggesting a conserved mechanism for pancreatic endocrine homeostasis. Discussion In this study, we developed PanSubPred, a novel pancreatic cell annotation tool based on human pancreatic scRNA-seq data, for accurate identification of endocrine and exocrine cell subtypes. The innovation of this tool is twofold. First, our machine learning framework, integrating multi-source data, successfully addresses cross-dataset generalization challenges, establishing a highly robust classification model. Second, the integration of complementary feature selection algorithms with ensemble learning strategies effectively captures both linear and nonlinear gene expression patterns, enabling robust marker discovery. Performance evaluation demonstrates that PanSubPred maintains encouraging classification accuracy across all pancreatic cell types, notably showing robustness in cross-datasets. Systematic analysis of canonical markers versus novel candidates revealed an interesting biological insight: While the six established markers (INS/GCG/SST/PPY/PRSS1/KRT19) contribute to satisfactory classification accuracy, the remaining 58 genes maintain 98.3–100% discriminative power. This finding holds dual significance. Methodologically, it validates our framework’s ability to extract subtle biological signals beyond canonical markers. Biologically, it suggests pancreatic cell identity may be regulated through coordinated expression programs involving both well-established markers and novel genes, opening new avenues for discovering diagnostic biomarkers and therapeutic targets. The characteristic co-expression patterns among these genes also indicate their participation in cell-type-specific regulatory networks. Beyond the six classical markers (INS/GCG/SST/PPY/PRSS1/KRT19), the gene panel used in PanSubPred includes 20 previously reported potential markers and 38 novel cell-specific markers identified in this study (Table [205]S6); the latter may provide new molecular tools for enrichment, purification, and functional analysis of specific pancreatic cell subpopulations. Moreover, we also recognize that for diseases closely related to pancreatic function, such as diabetes, the choice of reference atlas (e.g., using healthy vs. diseased samples) during cell annotation may influence the accuracy of label transfer in single-cell analyses. Most existing reference atlases or pre-trained models are constructed based on healthy tissue samples; however, in disease states like diabetes, the transcriptional features of pancreatic cells may undergo significant changes, such as beta cell dysfunction and abnormal dedifferentiation. These alterations pose challenges for accurate label transfer. Therefore, PanSubPred places particular emphasis on robust annotation and generalization across multi-source data and different disease states, aiming to minimize potential biases caused by pathological conditions. In the future, developing mixed or stratified reference atlases that incorporate both healthy and different disease-state samples may further enhance the accuracy and stability of cell subtype annotation, providing more reliable tools to support single-cell studies of complex diseases. Significant heterogeneity has been identified within classically defined pancreatic cell populations. Specifically, PSCs exhibit two activation-dependent subtypes: activated (aPSC) and quiescent (qPSC). Leveraging PSC-specific markers, we developed PSC-Stat, a novel tool for dynamically resolving PSC activation states. Notably, our analysis revealed an upregulated aPSC/qPSC ratio in both T2D and PDAC compared to controls. In T2D, persistent aPSC activation drives excessive extracellular matrix deposition, accelerating pancreatic fibrosis, which may be an important mechanism of beta cell failure [[206]62–[207]64]. More importantly, the progressive activation of aPSCs from ND to T2D and T2D to PDAC highlights a potential pathological link between diabetes and PDAC. This aligns with clinical evidence of an elevated PDAC risk in NOD patients [[208]54, [209]55], further supporting the notion that diabetes serves as the early warning for PDAC. Mechanistically, T2D-induced aPSC activation may remodel the pancreatic stromal microenvironment through secretion of pro-fibrotic factors and pro-inflammatory mediators, ultimately promoting PDAC pathogenesis [[210]65]. These findings suggest that monitoring PSC activation states via PSC-Stat could facilitate the development of targeted interventions against aPSCs. Such strategies could not only mitigate pancreatic fibrosis but also improve the islet microenvironment, offering a dual regulatory approach for both T2D and PDAC progression. Beta cell heterogeneity is a typical representation of pancreatic islet cell heterogeneity. To reproduce previously reported beta cell molecular subpopulations, we employed a classical heterogeneity analysis framework: identifying subpopulation structures through k-nearest neighbors (kNN)-based clustering, screening molecular differences via Wilcoxon rank-sum test, and characterizing functional features/cellular identities through biological pathway enrichment analysis [[211]11, [212]21]. In integrated human islet single-cell data, we identified a more mature, metabolically and functionally active beta cell subpopulation, as reported by Rubio-Navarro et al. in mouse beta cell studies [[213]11], which accounted for 77.8% of beta cells in ND and was significantly reduced in T2D. In contrast, an immature/proliferative/dedifferentiated beta cell subpopulation exhibited expansion in T2D. This maturity gradient-based classification of beta cells aligns with Bader et al.‘s findings in mice, where the human ortholog CFAP126 of murine Fltp serves as a mature marker for beta cell [[214]12]. Crucially, the well-documented beta cell loss in chronic T2D [[215]66] is reinterpreted through dual pathological trajectories in our study: selective depletion of functional beta cell pools concurrent with compensatory expansion of immature/transcriptionally unstable/stress-resilient subpopulations. This imbalance likely reflects an islet compensation-decompensation continuum that ultimately exacerbates beta cell failure in T2D. Furthermore, we associated the accumulation of the ER stress-associated subpopulation with the activation of beta cell apoptosis pathways, proposing a “stress-exhaustion” model that mechanistically explains the recurrent identification of ER stress-associated beta cell subtypes across different studies [[216]21]. In summary, our findings highlight that diabetes is characterized not only by changes in the proportions of beta cell subpopulations but also by profound disruptions in gene expression across these clusters. As the disease progresses, beta cells in diabetic patients show more marked instability and heterogeneity at the transcriptional level, which may underlie the gradual loss of pancreatic function. Comparative analysis of pancreatic cell communication networks between ND and T2D highlight significant topological reorganization. We observed a shift from a stellate-/ductal- central signal integrators in ND to a ductal-centric interaction hub in T2D, where ductal cells serve as the primary integrators of signals from multiple cell types. We propose that the transcriptional reprogramming of ductal cells in T2D exacerbates beta cell dysfunction through metabolic-inflammatory crosstalk. Specifically, NFKBIA, a key mediator of the beta-ductal communication cascade, serves as a crucial immune and inflammatory regulator whose expression has been associated with reduced beta-cell exocytosis in T2D [[217]61]. Meanwhile, NF-κB activation (via NFKB1/NFKBIA) in ductal cells drives the secretion of pro-inflammatory mediators such as IL-6 and CXCL8 [[218]67], which are known to induce beta-cell apoptosis and impair glucose-stimulated insulin secretion [[219]68, [220]69]. Collectively, these findings position ductal cells as potential active orchestrators of beta-cell failure in T2D, linking transcriptional rewiring to metabolic stress and inflammatory niche formation. Furthermore, through systematic analysis of signal transduction axes (receptor-ligand-TF-pathway-target gene), we identified a 15-gene signature in T2D ductal cells, which exhibited high AUC values across independent cohorts, demonstrating the utility of communication-driven analyses in potential markers discovery. Among these features, RPL3, RPS4X, RPL13, and RPL36 are ribosomal protein-related and are all downstream targets of MYC. MYC, a popular target in cancer research, orchestrates 3D chromatin structure by forming CTCF-mediated loops in cancer [[221]70]. Though the role of MYC in diabetic chromatin remodeling remains unexplored, emerging evidence suggests its potential to reconfigure enhancer-promoter interactions of metabolic genes under hyperglycemic stress. For instance, MYC interacts with lncRNA CCAT1 [[222]71], a regulator of glucose metabolism [[223]72]. These observations position MYC as a plausible mediator of 3D genomic changes in diabetic ductal cells, warranting validation through HiChIP or ChIA-PET assays. Additionally, KRT19 and KRT7 are established ductal cell markers, while S100A6 and S100A10, both members of the S100 family, contribute to intracellular calcium signaling. Aberrant expression of S100A6 has been linked to T2D and pancreatic cancer, and it has been proposed as a potential diagnostic marker for PDAC [[224]73, [225]74]. Although these pancreatic ductal cell profile-based diagnostic markers have robust performance, we acknowledge the limited clinical feasibility of pancreatic biopsy as a routine diagnostic approach for diabetes. The invasive nature and associated risks of biopsy procedures, render this method impractical for widespread clinical adoption. Nonetheless, we believe that this communication-based perspective provides novel mechanistic insights into disease progression and may help identify novel promising targets for therapeutic intervention. Conclusion This study establishes that pathological remodeling of pancreatic cellular heterogeneity is central to T2D progression. Through machine learning-enhanced single-cell analytics, we developed PanSubPred and PSC-stat frameworks to resolve seven pancreatic lineages and stellate cell activation states, identifying 38 novel markers and a diabetes-/PDAC-associated aPSC activation cascade. T2D rewired intercellular communication toward ductal hubs while depleting mature beta cells and expanding dysfunctional subtypes (immature CD81+/ER-stressed DDIT3+). Analysis of the variations of pancreas cellular heterogeneity in disease progression could improve our understanding of the disease molecular mechanism and might help to develop more potential anti-diabetic therapeutic schedules. Electronic supplementary material Below is the link to the electronic supplementary material. [226]Supplementary Material 1^ (63.2KB, xlsx) [227]Supplementary Material 2^ (5.1MB, docx) Acknowledgements