Abstract Pancreatic ductal adenocarcinoma (PDAC) presents significant clinical challenges owing to its dense stroma and complex tumor microenvironment (TME). In this study, large-scale single-cell transcriptomics and spatial transcriptomics (ST) were integrated to dissect the heterogeneity of fibroblasts and their crosstalk with epithelial cells, with a focus on key ligand-receptor interactions. Eight distinct fibroblast subpopulations were identified, among which extracellular matrix (ECM)-remodeling fibroblasts were particularly enriched in tumor tissues and associated with poor prognosis. ECM-remodeling fibroblasts were located at the terminal stage of the fibroblast pseudotime trajectory, and SOX11 was identified as a key transcription factor in this subpopulation. Further analyses revealed that ECM-remodeling fibroblasts can interact with epithelial cells through the POSTN-ITGAV/ITGB5 ligand-receptor axis, a critical pathway that promotes tumor progression. Clinical analyses demonstrated a strong correlation between POSTN expression and poor prognosis in patients with PDAC. Mechanistically, POSTN interacts with integrin ITGAV/ITGB5 on tumor cells, activating the PI3K/AKT/β-catenin pathway and promoting epithelial-mesenchymal transition (EMT) phenotype. Pharmacological inhibition of the POSTN-integrin axis partially reversed these malignant traits, highlighting its potential as a therapeutic target. This study provides new insights into fibroblast heterogeneity and its role in PDAC progression, emphasizing the POSTN-ITGAV/ITGB5 axis as a promising target for therapeutic interventions. Keywords: Pancreatic ductal adenocarcinoma, Cancer-associated fibroblasts, Single-cell RNA sequencing, Heterogeneity, Stroma-tumor crosstalk, POSTN Introduction Pancreatic ductal adenocarcinoma (PDAC) is one of the most intractable cancers worldwide [47]^1. Despite advancements in diagnostics and treatments over recent decades, the 5-year survival rate remains below 9%.[48]^1 Surgical resection, the primary curative option, is feasible in only 10-15% of patients, and most suffer from metastasis owing to the lack of effective therapies [49]^2. This poor prognosis is largely attributed to the extensive stromal components and tumor heterogeneity in PDAC. The complex and intricate interactions between stromal components and heterogeneous cancer cells significantly hinder our understanding of tumor progression and metastasis. Improving clinical outcomes demands a detailed understanding of stromal heterogeneity and its crosstalk with epithelial cells. Cancer-associated fibroblasts (CAFs) constitute the predominant component of the PDAC stroma and play a pivotal role in tumor progression. The heterogeneity of CAFs is manifested across diverse levels, from cellular level to microenvironmental and regional level, all of which contribute to their functional diversity [50]^3. Recent single-cell transcriptomic technologies have enabled the characterization of CAFs across various tumors [51]^4^-[52]^6, including PDAC [53]^7^-[54]^9. While most studies have focused on descriptive profiling, detailed investigations into CAF-tumor cell crosstalk and clinically relevant targets remain to be further explored. Furthermore, the small sample sizes in these studies have impeded comprehensive understanding of CAFs and cancer cell subpopulations, particularly rare subtypes. Therefore, large-scale integrated single-cell transcriptomic datasets are urgently required to unravel CAF heterogeneity and complex interactions. CAFs primarily interact with epithelial cells via ligand-receptor interactions, either through soluble ligands binding to receptors or direct cell-cell adhesions [55]^10. Such ligand-receptor communication is deeply involved in tumorigenesis, tumor progression, and therapy resistance [56]^11. For instance, in PDAC, CAF-derived thrombospondin 2 (THBS2) drives tumor progression via integrin αvβ3/CD36-mediated signaling [57]^12. Similarly, in esophageal squamous cell carcinoma (ESCC), CAF-derived collagen I induces radioresistance via the collagen I-integrin axis [58]^13. Understanding the key ligand-receptor interactions in CAF-tumor crosstalk holds significant potential for advancing therapeutic development. However, mapping these signaling pathways at a single-cell level and validating their clinical significance with patient samples and functional assays still remains to be further explored in PDAC research. This study integrated 66 single-cell transcriptome samples as a discovery cohort supplemented with a partial external validation cohort and spatial transcriptomics (ST) for spatial context. This approach allowed us to delineate the heterogeneity of CAFs and epithelial cells in PDAC. We identified the POSTN-ITGAV/ITGB5 axis as a key mediator in PDAC progression. Clinical analyses revealed a correlation between CAF-derived POSTN and poor outcomes. Mechanistically, functional validation revealed that ECM-remodeling CAF-derived POSTN promotes differentiation of PDAC cells into the epithelial-mesenchymal transition (EMT) phenotype through activating the PI3K/AKT/β-catenin signaling pathway via integrin αvβ5. These findings pave the way for novel therapeutic strategies targeting this signaling axis in PDAC. Methods and Materials scRNA data collection and basic analysis A discovery cohort of 66 single-cell RNA sequencing (scRNA-seq) samples was assembled from three publicly available datasets ([59]GSE205013[60]^9, [61]GSE197177[62]^14, and CRA001160[63]^7), with clinical information obtained from the corresponding studies ([64]Table S1). Data were analyzed using the Seurat (v5.0.2) R package. Cells were filtered based on the following criteria: (1) 300-8,000 detected genes, (2) UMI count < 40,000, (3) mitochondrial gene percentage < 10%, and (4) hemoglobin gene percentage < 1%. Samples with fewer than 500 cells were excluded from analysis. The retained 66 samples were merged, normalized, and scaled. The top 2000 HVGs were selected for principal component analysis (PCA), with 30 principal components selected for subsequent clustering. Batch effects were corrected using Harmony (v1.2.1). Clustering was performed using the FindNeighbors and FindClusters functions (resolution = 0.5), and clusters were visualized using uniform manifold approximation and projection (UMAP). Cell annotation was conducted manually, based on marker genes from previous studies. Fibroblast subclusters were re-analyzed following the same pipeline. Additionally, a validation cohort of 44 scRNA-seq samples ([65]GSE155698[66]^15 and CRA001160) was incorporated and analyzed using the same workflow ([67]Table S2). Spatial transcriptomic (ST) data collection and cell-type estimation ST data for PDAC samples were obtained from [68]GSE235315[69]^16 ([70]Table S3). Individual ST samples were pre-processed using the Seurat. To evaluate the spatial distribution of clusters identified in the scRNA-seq discovery cohort, ST and scRNA-seq expression matrices were integrated using CellTrek (v0.0.94) [71]^17, incorporating spatial coordinate. Functional enrichment analysis of single-cell subclusters Gene Ontology (GO) enrichment analysis for marker genes in each subcluster was performed using clusterProfiler (v4.6.0), with pathways considered significantly enriched at adjusted P < 0.05. Gene set variation analysis (GSVA) on hallmark pathways from MSigDB (v2023.2.Hs) was performed using the GSVA package (v1.34.0) to calculate GSVA scores. Trajectory analysis Fibroblast differentiation trajectories were inferred using Monocle2 (v1.3.4) [72]^18 with default parameters. Trends were further validated using Monocle3. Epithelial cell trajectories were constructed using Monocle3 to outline evolutionary trends. Regulatory network inference of single-cell subclusters Regulatory networks and regulon activities were analyzed using pySCENIC (v0.12.1). Transcription factor (TF) activity was assessed by calculating the AUC of genes regulated by each TF, and the regulon-specific score (RSS) for each TF was also computed. Identification of epithelial expression programs We applied non-negative matrix factorization (NMF) to epithelial cells from each sample to identify programs that captured cellular heterogeneity. These methods were similar to those used in previous studies [73]^19. Fourteen high-quality meta-programs (MPs) were identified, with shared cluster genes, completing a list of 50 genes. The MPs were named based on GO enrichment results. Subclustering and annotation of epithelial cell subclusters Epithelial cells were subclustered using the Seurat pipeline, with NMF replacing PCA for dimensionality reduction to identify subclusters based on gene expression modules. Each cell was scored using the above MP geneset via the “AddModuleScore” function. These scores, combined with the sample origin, facilitated the annotation of epithelial cell states and detailed subcluster assessments. Copy number variation (CNV) estimation in epithelial cells CNVs in epithelial cells were inferred using inferCNV (v1.1.3) with default parameters, using control pancreatic epithelial cells as references. CNVs in epithelial cells were analyzed, and the cells were