Abstract Background The tumor microenvironment (TME) in lung adenocarcinoma (LUAD) influences tumor progression and immunosuppressive phenotypes through cell communication. We aimed to decipher cellular communication and molecular patterns in LUAD. Methods We analyzed scRNA-seq data from LUAD patients in multiple cohorts, revealing complex cell communication networks within the TME. Using cell chat analysis and COSmap technology, we inferred LUAD's spatial organization. Employing the NMF algorithm and survival screening, we identified a cell communication interactions (CCIs) model and validated it across various datasets. Results We uncovered intricate cell communication interactions within the TME, identifying three LUAD patient subtypes with distinct prognosis, clinical characteristics, mutation status, expression patterns, and immune infiltration. Our CCI model exhibited robust performance in prognosis and immunotherapy response prediction. Several potential therapeutic targets and agents for high CCI score patients with immunosuppressive profiles were identified. Machine learning algorithms pinpointed the novel candidate gene ITGB1 and validated its role in LUAD tumor phenotype in vitro. Conclusion Our study elucidates molecular patterns and cell communication interactions in LUAD as effective biomarkers and predictors of immunotherapy response. Targeting cell communication interactions offers novel avenues for LUAD immunotherapy and prognostic evaluations, with ITGB1 emerging as a promising therapeutic target. Keywords: Lung adenocarcinoma, Tumor immune microenvironment, ScRNA-seq, Immunotherapy, Prognosis, ITGB1 Graphical abstract Image 1 [45]Open in a new tab Highlights Highlights of this research. * • The diverse landscape of cell communication interactions in TME of lung adenocarcinoma. * • The commonalities from the crosstalk of cell communication interaction (CCI). * • Comprehensive multi-omics analysis reveals the CCI pattern's vital role. * • CCI signature may facilitate identifying immunosuppression status, informing clinical decisions, and improving outcomes. * • Identification of a novel candidate gene ITGB1 in LUAD. 1. Introduction Lung cancer is the most common type of malignancy and one of the leading causes of cancer-associated death worldwide [[46]1]. Lung adenocarcinoma (LUAD) accounts for over 80 % of cases [[47]2], the incidence of which continues to increase. Despite major advances in the multi-disciplinary management of LUAD, including surgery, chemotherapy, radiotherapy, and targeted therapy, the 5-year survival rate for patients with LUAD remains grim [[48]3]. Immunotherapy, as an emerging treatment modality, has been adopted for improving prognosis [[49]4], but quite a few patients do not respond to anti-PD-1/PD-L1 immune checkpoint inhibitors (ICI) [[50]1]. What is worse, some cases were reported to suffer from hyperprogressive disease following the administration of ICI [[51]5,[52]6]. Thereby, the development of individualized treatment for LUAD patients is warranted [[53]7]. Due to the rapid development of the concepts of precision medicine and individualized therapy, the investigation of predictive and monitoring markers is never-ending. Therefore, many histology and molecular biomarkers have gradually become deeply integrated into the guidance of treatment. As to immunotherapy, biomarkers such as tumor mutation burden, and PD-L1 TPS (Tumor cell Proportion Score) are under clinical practice. However, the current predictive biomarkers of benefit from ICI are still limited and unprecise due to the spatially and temporally variable interactions [[54]8]. The preferred strategy to improve the predictive biomarker performance of ICI is combining different markers to reduce the presumed risk of a single one. The interaction between tumor cells and the surrounding fibroblasts, endothelials, and immune cells in the tumor microenvironment is crucial for the progression and the response to treatment in lung cancer. Insight into cell communication networks provides new clues and novel biomarkers to mold processes that contribute to the LUAD tumor microenvironment [[55]9]. Hence, the comprehensive analysis of the cell communication interactions in LUAD facilitates the exploitation of novel biomarkers to predict the treatment response and prognosis. The past decade has witnessed the emergence of multiple molecular prognostic markers based on RNA-seq due to the advances in sequencing technology. Compared to bulk RNA-seq, the single-cell RNA sequencing (scRNA-seq) technology could reveal gene expression at single-cell level [[56]10]. With the maturity of this technology, scRNA-seq facilitates the exploration of intracellular communication and helps uncover potential value in the complicated tumor microenvironment in tumor progression. In this article, we identified cell annotations by integrating scRNA-seq data from our institute and other databases. Subsequently, we conducted the global and putative intracellular communication analysis. Through examining immune interactions in the TME, a novel molecular phenotype, and mapping of the immune landscape was developed. The LUAD classifications based on the Cell Communication Interactions (CCI) showed marked heterogeneity in transcriptomic features, immune infiltration, clinical characteristics, and genomic alterations. Subcutaneously, a CCI score model was established and performed well in predicting prognosis and response to immunotherapy in LUAD patients. Finally, we identified a potential therapeutic target for LUAD, ITGB1, and initially validated its tumor phenotype. 2. Methods and materials 2.1. ScRNA-seq and bulk RNA-seq data collection Twenty-three primary LUAD patients who had received surgical resection in the Department of Thoracic Surgery in Zhongshan Hospital, Fudan University (FDZSH) were included for scRNA-seq. The diagnosis of LUAD was confirmed by at least two qualified pathologists. We also included other two public datasets, one from ArraryExpress (Accession numbers E-MTAB-6149 and E-MTAB-6653) and the other from Human Cell Atlas Data Coordination Platform (Accession number PRJEB31843). The RNA-seq gene expression, copy number variation (CNV), somatic number variation (SNV) data, and clinical characteristics of The Cancer Genome Atlas (TCGA) – LUAD cohort were retrieved from the UCSC Xena ([57]https://gdc.xenahubs.net). The samples were filtered by deleting the replicate samples and the Formalin-fixed paraffin-embedding (FFPE) sequencing samples under the guidance of the broad institute ([58]https://confluence.broadinstitute.org/display/GDAC/FAQ#FAQsampleTy pesQWhatTCGAsampletypesareFirehosepipelinesexecutedupon). The Schematic diagram of this research was shown in [59]Fig. 1. This study was approved by the Ethics Committee on Human Research of the Zhongshan Hospital, Fudan University. Informed consent has been obtained from each participant. Other data from the public database did not require ethical consent. Fig. 1. [60]Fig. 1 [61]Open in a new tab The landscape of sc-RNA seq data. (A) The UAMP plot of different cell clusters. (B) The distribution of cells from different sources and the sample distribution of 26 cell clusters. (C) The integration of different scRNA-seq data. (D) Heatmap of the top three marker genes for each cluster. (E) The curated cell annotation of LUAD scRNA-seq data. (F) KEGG pathway enrichment analysis of marker genes for different cell types. UMAP, Uniform Manifold Approximation, and Projection; LUAD, lung adenocarcinoma; KEGG, Kyoto Encyclopedia of Genes and Genomes. 2.2. The procedure of scRNA-seq data analysis The Preparation for scRNA-seq data followed the protocol for the 10x Genomics Chromium Single Cell platform and our previously published study demonstrated the details of tissue processing, single-cell suspension, single-cell capture, library preparation, and sequencing [[62]11]. As We followed the Seurat v4[[63]12] guidelines for the routine procedure. Firstly, utilizing the Seurat R package create Seurat object for each sample based on the output raw data files from Cellranger v3.0 pipeline. Secondly, we conducted the quality control procedure by setting the threshold as follows: 500 < nFeature <10000, 1000 < nCount <50000, mitogene percent <15 % and genes expressed in less than 10 cells were filtered out after cell quality control. The SCTransform function was performed separately in each sample to regress out the mitochondrial genes and cell cycle scores. After PCA dimension reduction, the harmony [[64]13] R package was utilized for removing the batch effect. Cell clustering was based on harmony dimensionality reduction using the first 32 PCs and a resolution value of 0.5. Cell annotation was identified manually with the assistance of auto-annotated tools SingleR [[65]14] and scType [[66]15]. The canonical markers were collected from several previous studies [[67][16], [68][17], [69][18]]. The marker genes of each cluster were conducted by the function FindAllMarkers with default parameters. The Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of markers genes of each cell type was performed using the R package clusterProfiler [[70]19]. 2.3. Cell communication analysis and COSMAP cell spatial organization To infer cell-cell communication networks, we implemented the Cellchat [[71]20](v1.4.0) R package for a systematic analysis. Cellchat has a database involving interactions among ligands, receptors, and their cofactors that accurately represent known heteromeric molecular complexes. Cellchat identified differentially over-expressed ligands and receptors for each cell group, quantifies communications between two cell groups mediated by these signaling genes, associates each interaction with a probability value, and significantly identifies the interaction probability under the randomly permutes statistic test, the visualization of different ligand-receptor (LR) interactions was also performed by Cellchat. According to the Cellular Spatial Organization mapper (CSOmap) computational tool reported by Zhang et al. [[72]21], we utilized the LR interactions from cellChat to recapitulate the spatial organization of multiple cell clusters. 2.4. The CCI screening and subclasses identification The cell communication interactions were screened first under the cellchat analysis. The CCI score was calculated according to the reports of previous studies20–22. In short, the cross-talk score of CCI in bulk RNA-seq was calculated based on the average expression of interaction genes. We used Pearson correlation analysis (p < 0.05 and correlation coefficient >0.3) and univariate Cox regression analysis for correlational and prognostic screening of CCI. An unsupervised machine learning methodology was implemented by employing the consensus clustering algorithm. The R package NMF [[73]22] was utilized to execute the non-negative matrix factorization (NMF) clustering procedure. The optimal value of k, signifying the number of subclasses, was determined by scrutinizing the cophenetic plot, a graphical depiction of the magnitude of the correlation coefficient. 2.5. LASSO algorithm and CCI score model construction LASSO is a regression methodology that is widely employed to analyze data with high dimensions and strong interrelationships [[74]23]. In this study, we harnessed the R package glmnet [[75]24] to perform LASSO-Cox regression analysis, leveraging the built-in cross-validation function to optimize the L1 regularization parameter lambda for selecting candidate features. Next, the R caret package ([76]https://cran.r-project.org/package=caret) was utilized to construct a classification model for the TCGA-LUAD cohort and evaluate the efficacy of machine learning classifiers. 2.6. Differential gene expression, immune infiltration, and mutational analysis The differential gene expression analysis between different subclasses was performed using the R package limma [[77]25] with FDR<0.05 and Log[2] (Fold Change) > 1 unless specified otherwise. The immune infiltration and cell composition estimation of TCGA patients was based on the TIDE [[78]26] and TIMER2 [[79]27]. The distribution of mutation frequency and status was illustrated using the MAF file, which was further analyzed with the R maftools package [[80]28]. Tumor mutational burden (TMB) was calculated as the number of somatic base substitutions or indels per megabase (Mb) of the coding region target territory of the test (currently, 1.11 Mb). The genetic abnormality scores including the number of segments, fraction altered, homologous recombination defect (HRD), and Aneuploidy score were obtained from a previous research [[81]29]. 2.7. Feature selection methods We employed a comprehensive approach integrating three distinct machine learning methods for the feature selection process. The Elastic Net model was implemented using the R package 'glmnet,' while Boruta feature selection, facilitated by the R package 'Boruta,' was employed to identify significant contributors to the overall survival. We conducted both univariate and multivariate Cox's proportional hazards regression analyses. Variables exhibiting statistical significance in the univariate analysis were subsequently integrated into the multivariate analysis. The outcome was visualized using a forest plot, highlighting the significant variables in the multivariate analysis. Written informed consent was obtained from all patients' representatives. 2.8. Specimens and western blotting We collected six LUAD patients who underwent surgical treatment at the Zhongshan hospital, Fudan University in June 2023. The diagnosis of lung adenocarcinoma was confirmed in each case by histopathological analysis. Briefly, the western blotting was performed using antibodies against ITGB1(1:1000, Cat No. 12594-1-AP, Proteintech) and beta actin(1:3000, AF0003, Beyotime). 2.9. Cell culture and reagents The human LUAD cell lines A549 and H358 was purchased by Chinese Academy of Science Cell Bank and cultured in Dulbecco's Modified Eagle's Medium (DMEM Beyotime, CN) with 10 % Fetal bovine serum (FBS, Evergreen, Zhejiang, China) and 1 % antibiotics in a humidified atmosphere of 5 % CO2 at 37 °C. The qPCR primer for ITGB1: 5′- CCTACTTCTGCACGATGTGATG -3′(forward), 5′- CCTTTGCTACGGTTGGTTACATT -3′(reverse). 2.10. Cell proliferation, transwell cell migration, and scratch assays Cell proliferation capacity was assessed using the EdU cell proliferation kit (C0078S, Beyotime). The procedure was carried out according to the kit's instructions. The working concentration of EdU was 10 μM. Transwell migration assays were conducted using the upper chamber with LUAD cells seeded in serum-free DMEM and the lower chamber with 20 % FBS DMEM. The incubation time for A549 and H358 cells was 24 h for 10,000 cells and 48 h for 20,000 cells, respectively. After incubation, the upper chamber was fixed with 4 % paraformaldehyde and stained with 0.1 % crystal violet. Migration was quantified using ImageJ software. Scratch assays were performed in a 6-well plate where the cells were seeded at a density of 80 % and allowed to grow until reaching confluence. Then, a cross-shaped scratch was made with a tip, and the DMEM was replaced with serum-free DMEM for the next 24 h. The plate was photographed every 8 h. 2.11. Statistical analysis The statistical analysis was conducted using the R software (Version 4.2.1), a powerful and widely used tool for data analysis in the scientific community. Categorical variables were compared using the chi-square test or Fisher's exact test, as appropriate. The normality of the data was assessed using the Shapiro-Wilk test, and the unpaired t-test or Mann-Whitney test was utilized to compare two groups depending on the results of the normality test. To investigate the correlations between all profiles, Spearman's nonparametric analysis was performed. Survival analysis was conducted using the Log-rank test, and the univariate and multivariate Cox proportional hazards regression was carried out using the stepwise method, all within the R language. The nomogram construction, validation, and calibration were conducted and plotted using the R rms and Hmisc packages. For all tests, a two-sided p-value <0.05 was considered significant. Furthermore, One asterisk represents p < 0.05, two asterisks represent p < 0.01, and three asterisks represent p < 0.001, while ns was used to indicate results that were not statistically significant. 3. Results 3.1. The landscape of integrated single-cell transcriptome of LUAD patients The landscape of scRNA-seq in LUAD patients showed a high cellular diversity under the application of integrated analysis of multi-datasets from different institutes. First, routine quality control for divided scRNA-seq datasets was performed. The quality control procedure and the threshold of pMT, pHB, nCount, and nFeatures were shown in [82]Fig. 2. [83]Fig. 1A–C demonstrated the successful integration of the 25 clusters after the removal of the batch effect. Each cluster's differential genes expression was illustrated in [84]Fig. 1D, based on which we identified the cell annotation of different clusters ([85]Fig. 1, [86]Fig. 3). Furthermore, the compared KEGG functional enrichment analysis of all the annotated cell clusters was performed ([87]Fig. 1F). Fig. 2. [88]Fig. 2 [89]Open in a new tab The cell communication interactions and spatial organization of LUAD. (A) The cell communication interactions between different cell types. (B) Network overview for the cell communication interactions between different cell types. (C) Detailed network of cell communication interactions among all cell types. (D) The spatial organization (angle = 165) in the pseudo-space inferred by CSOmap (left). The cross-section of z = 0,-10,10 of the pseudo-space. The color of the dots represents cell density (right). (E) The intensity and specificity of statistically significant cell communication interactions between different cell types. Fig. 3. [90]Fig. 3 [91]Open in a new tab The subtypes based on the cell communication patterns through NMF analysis. (A) Venn diagram of CCIs with significant correlation and prognostic importance. (B) The cophenetic correlation coefficient for k (2–6) of the NMF clustering using expression levels of 102 screened CCIs. (C) Heatmap clustering of TCGA-LUAD datasets when consensus (k) = 3. (D) The difference in survival among different molecular subtypes in the TCGA-LUAD cohort. (E) Comparison of different molecular subtypes with clinical information. The horizontal axis represents the different subtypes of samples while the vertical axis represents the proportion of clinical information in the corresponding subtype sample. Different colors represent different clinical characteristics. CCI, cell communication interaction; NMF, non-negative matrix factorization; TCGA, The Cancer Genome Atlas. 3.2. The comprehensive analysis of intercellular communication networks and inferred cell spatial organization Intercellular communication networks demonstrated the diverse interactions between different cell types in the tumor microenvironment of LUAD. [92]Fig. 2A showed that cancer cells had close cellular communication with other cell types. Among them, fibroblasts had the most interactions and the strongest interaction strength among other cells, while interactions with mast cells are lower in the tumor environment ([93]Fig. 2B and C). Based on the reconstructed spatial relationships by CSOmap, we found that the interactions between cancer cells, endothelial cells, myeloid cells, and fibroblasts tend to locate in the center of tumor spatial organization. In addition, T cells, B cells, and NK cells are more likely to distribute outside the tumor core. We also observed that some malignant epithelial cells clustered in the spatial center of the tumor, while others were separated from the interaction center ([94]Fig. 2D). Based on the Cellchat analysis, we obtained the inferred significant cell-cell communication interactions at the level of ligands/receptors in [95]Fig. 2E, cell communication interactions that differ significantly between cell clusters can be referred to [96]Table S1. Among the notable cell communication interactions, those involving the HLA molecule family are predominantly engaged in crosstalk between immune cells and other cellular components within the tumor microenvironment. Additionally, the importance of interactions such as OCLN-OCLN, MDK-SDC1, and CDH5 underscores the significant role of stromal cells in tumor progression and immune modulation. 3.3. The cell communication interactions identified three molecular subtypes in LUAD Cell communication interactions vary greatly in different cell clusters, which contribute to the distinct pathways’ activation or functional change and thus affect the tumor progression. Hence, we screened the significant cell communication interactions with strong relevance and prognostic value and 102 pairs of genes were identified as candidates ([97]Fig. 3A). Subsequently, 493 LUAD samples from the TCGA dataset were classified by NMF consensus clustering based on the expression profiles of the interactions aforementioned. The cophenetic correlation coefficient plot ([98]Fig. 3B) suggests that the optimal k value was determined as three. The basis heatmap of consensus clustering ([99]Fig. 3C) and t-SNE plot illustrates clear and crisp boundaries, indicating robust clustering for the samples. The prognosis of LUAD patients differed among the three clusters (p < 0.001), with a shortest median survival time (MST) for C1 (MST: 33.30 [23.40, 39.90] months) than C2 (MST: 40.97 [33.17, 50.03] months) and C3 (MST: 59.27 [50.53, NA] months) ([100]Fig. 3D). 3.4. Comparison of clinical characteristics among three subclasses The clinical characteristics of these three subclasses were investigated. The chi-square test in [101]Fig. 3E showed that the C1 subclass with the worst clinical outcome was correlated to higher T stage (p = 0.008) and overall stage III and IV (p = 0.006), while the C3 subclass including more young nonsmoker female patients. Differences in clinical features among the three subclasses showed a similar trend. That is, the proportion of patients aged <65 years in the C3 subclass is higher than in the C1 and C2 subclass, and the proportion of female patients increased sequentially in the order of C1, C2, and C3. 3.5. The dissection of genomic, molecular, and immune infiltration differences underlying the communication-related subclasses Investigation of the tumor genomic landscape and transcriptome profiling was related to the tumor progression. The genes with high mutation frequency across LUAD subclasses and the associated oncogenic pathways [[102]30] are visualized in [103]Fig. 4A (The statistical analysis details are shown in [104]Table S2). C3 subclass had a significantly higher mutation frequency of EGFR (20.5 %) than C1 (4.55 %) and C2 (13.29 %), while the alteration of KEAP1 and STK11 in C1 was supremely higher ([105]Fig. 4). We then observed significant differences in HRD score, TMB, and other copy number alterations. Specifically, C2 group showed highest HRD score (32.5[30.2,34.8]) than C1(24.9 [21.8; 28.0]) and C3(20.9 [19.2; 22.7]). The mutation burden load of C1 and C2 was 2.53 [2.05; 3.02] and 2.95 [2.56; 3.35] while patients in C3 subclass bear lower than 2 tumor mutation loads ([106]Fig. 4B). According to the OLCR algorithm estimation, we calculated the stemness of these three subclasses. Notably, [107]Fig. 4C showed a continuous decrease trend in the stemness score in groups C1 to C3 (p < 0.001). Fig. 4. [108]Fig. 4 [109]Open in a new tab The dissection of genomic, molecular, and immune infiltration differences. (A) The landscape of mutational genes and associated oncogenic pathways among three subclasses. (B–C) The mutational events (B) and stemness score (C) among the three subclasses. (D) Heatmap of biological pathways among the three subclasses. (E) The comparison of the immune score, stromal score, ESTIMATE score, and tumor purity among the three subclasses. (F) Expression patterns of 11 immune checkpoints among the three subclasses. (G) The abundance of 22 immune cells among the three subclasses generated by the CIBERSORT algorithm. The specific differential expression genes were identified after all three comparisons among three subclasses, which include 172 genes for C1, 99 genes for C2, and 231 genes for C3 ([110]Table S3). The GO enrichment analysis of the specific genes was shown in [111]Fig. 5A and [112]Table S4. Signature genes in C1 and C3 subtypes are enriched in immune biological processes including IgG binding, immune receptor activity, and chemokine receptor binding, while the extracellular matrix-related processes are significantly enriched in the C2 subclass. Furthermore, the KEGG and Disease Ontology (DO) enrichment analysis of these genes showed similar trends ([113]Fig. 5B and C). GSEA analysis was applied to calculate the enrichment score of each sample and identified the pathway's differences among the three subclasses. As shown in [114]Fig. 4D, the enrichment score of immune response-related pathways and diverse metabolism pathways was both higher in the C3 subclass. Fig. 5. [115]Fig. 5 [116]Open in a new tab Construction of the CCI risk model. (A) The LASSO-Cox algorithm screening procedure showed the coefficients (left) and partial likelihood deviance (right) of CCIs at different lambda values, and the optimal lambda value was determined in the left plot. (B) The Kaplan-Meier survival curve of the LUAD patients with high or low CCI scores. (C) The time-dependent receiver operating curve (ROC) of the CCI model. (D) The result of univariate and multivariate Cox regression analysis. (E) The nomogram plot of the CCI score model. (F) The calibration plot of the nomogram. ESTIMATE algorithm output the immune score, stromal score, estimate score, and tumor purity to indicate the degree of immune cell infiltration within tumor tissue. It is found that the C3 subclass had the highest ESTIMATE score and lowest tumor purity ([117]Fig. 4E). The expression of 11 immune checkpoint genes showed the same trend, demonstrating the differences in the immune microenvironment of the three subclasses ([118]Fig. 4F). Utilizing the CIBERSORT algorithm, we assessed the proportions of 22 immune cell types and found that subclass C3 were enriched among other cell types, except for memory B cells, T cell CD4 memory resting, and NK cell resting ([119]Fig. 4G). 3.6. Construction of the CCI score prognostic model Since the subclasses based on the interactions' pattern showed distinct characteristics in multi-omics analysis, to build an advantageous model for clinical use, we carried out the LASSO-Cox analysis on the TCGA-LUAD cohort as the training dataset. The optimal lambda value was determined as 0.0599 with the smallest partial likelihood of deviance ([120]Fig. 5A) and the formula was described as follows: The CCI Score = 0.039 * ANGPTL4_ITGA5_ITGB1 + 0.208 * ANGPTL4_SDC4 + 0.165 * MDK_ITGA6_ITGB1 + 0.295 * ADM_CALCRL + 0.154 * LAMB1_ITGA6_ITGB4 - 0.126 * ALCAM_CD6 - 0.258 * PTPRC_MRC1. The expression level of different interactions was calculated as the average expression of interactions’ component genes in terms of Log[2] (TPM + 1). The LUAD patients were divided into high and low groups according to the average value of the CCI score and the K-M survival plot showed that the high-CCI score group was significantly related to poorer prognosis and shorter MST ([121]Fig. 5B). The area under curve (AUC) value of 1-, 3-, and 5-year time-dependent ROC curve was 0.7, 0.71, and 0.7, respectively ([122]Fig. 5C). The correlation between CCI score and clinical characteristics were visualized in [123]Fig. 6. The forest plot ([124]Fig. 5D) showed the results of univariate and multivariate Cox analysis, which demonstrated that the CCI score is an independent prognostic factor. The nomogram ([125]Fig. 5E) was built by the significant variables including the CCI Score, the averaged concordance index (C-index) of which was 0.710 (95 % CI: 0.66–0.76), indicating the satisfying discrimination power of the model. The calibration curve plot showed the predictive accuracy of the CCI Score model ([126]Fig. 5F). Collectively, our model has a robust and accurate prediction for the prognosis. Fig. 6. [127]Fig. 6 [128]Open in a new tab External validation of the CCI score model in the GEO cohort. (A) The Kaplan-Meier survival curve of the LUAD patients with high or low CCI scores in the GEO cohort. (B) The time-dependent ROC of the CCI model in the GEO cohort. (C) The proportion of immune cells from the GEO cohort generated by the CIBERSORT algorithm. (D) The relevance between CCI score and immune cell infiltration based on Pearson correlation analysis. (E–F) The TME-related signature analysis between low CCI and high CCI score groups. 3.7. Validation of the CCI score model on GEO cohorts We selected [129]GSE31210 dataset as the independent external cohort and the expression of CCI was quantified by the mean of Log2 (TPM+1). The Kaplan-Meier survival analysis showed the CCI score was correlated with poor prognosis (p = 0.00027, [130]Fig. 6A). Time-ROC plot demonstrated the great performance of the CCI score model in the GEO dataset ([131]Fig. 6B). Furthermore, we explored the immune infiltration of different CCI score groups in the validation dataset ([132]Fig. 6C). Similar to the results in the TCGA cohort, the CCI score was negatively related to the abundance of NK cells activated, but positively correlated with macrophages, fibroblast, and neutrophils ([133]Fig. 6D, details in [134]Table S5). Additionally, the higher CCI score was associated with the hypoxia signature and immune exclusion ([135]Fig. 6E and F). 3.8. The predictive value of response to immunotherapy of the CCI score model Since the CCI score is closely related to the tumor immune microenvironment, we examined its ability to predict the response to immunotherapy. First, we used the TIDE and ImmuneCellAI methods to analyze the tumor immune dysfunction and exclusion level and deduced their immunotherapy response rate in the TCGA cohort ([136]Fig. 7). Subsequently, we found that patients who benefit from the anti-PD1 immunotherapy harbored a lower CCI score in the IMvigor210 cohort ([137]Fig. 7A). Accordingly, [138]Fig. 7B demonstrated that patients in the low CCI score group were more likely to respond to immunotherapy, and as a result, they had a better prognosis ([139]Fig. 7C). Furthermore, we validated the CCI score model for predicting immune responses in patients who received immunotherapy in 8 public cancer cohorts using logistic regression methods and all of them indicated the great performance of the model ([140]Fig. 7D&[141]Table S6). Fig. 7. [142]Fig. 7 [143]Open in a new tab The immunological prediction of the CCI score and the drug estimation. (A) Comparison of the CCI scores between patients with different responses to anti-PD1 immunotherapy in the IMvigor210 cohort. (B) Comparison of the response to anti-PD1 immunotherapy between the patients with high or low CCI scores in the IMvigor210 cohort. (C) The Kaplan-Meier survival curve of the patients with high or low CCI scores in the IMvigor210 cohort. (D) Logistic regression of 8 public immunotherapy-related cohorts. (E–F) The results of Spearman's correlation and differential drug response analysis of CTRP (E) and PRISM-derived (F) compounds. 3.9. Identification of potential therapeutic agents for LUAD patients with high CCI scores The CTRP and PRISM datasets contain gene expression and drug sensitivity profiles of hundreds of compounds that can be used to build predictive models for drug response. After purification by the ISOpure algorithm, we conducted the drug sensitivity prediction analysis based on the gene expression data of LUAD patients. We identified compounds with lower estimated AUC values in the high CCI score group (Log[2]FC > 0.10) and a negative correlation coefficient (Spearman's R < −0.30 for CTRP and −0.35 for PRISM). These analyses screened one CTRP-derived compound (paclitaxel) and six PRISM-derived compounds (obatoclax, tivantinib, ganetespib, NVP-AUY922, temsirolimus, and ispinesib), indicating that the LUAD patients harbored a high CCI score may benefit from these agents ([144]Fig. 7E and F). 3.10. Mining of potential therapeutic targets by machine learning feature selection and validation in vitro Since the CCI model construction identified several hub genes in LUAD, we next screened the candidates using the three different feature selection algorithms: Botura machine learning method, Elastic Net methods, and multivariate cox analysis. The intersection of the features were IGTB1, MRC1, and ANGTL4([145]Fig. 8). After the literature searching, we identified ITGB1 as the potential therapeutic targets and validated its higher expression in tumor than normal tissues of LUAD specimens ([146]Fig. 8A). We next validated the knockdown effects of ITGB1 in LUAD cells ([147]Fig. 8B), The EdU incorporation assays showed that A549 and H358 cells with silenced ITGB1 expression had less positive cells than control([148]Fig. 8C). The migration assays and scratch assay further validated that the knockdown of ITGB1 suppresses the migration of LUAD cells.([149]Fig. 8D–E). Thus, our findings demonstrated the correlation between ITGB's expression and tumor proliferation of LUAD, indicating the potential of tumor therapy of ITGB1. Fig. 8. [150]Fig. 8 [151]Open in a new tab The Validation of ITGB1's tumor phenotype in vitro. (A) The expression level of ITGB1 in tumor and normal tissue of LUAD patients. (The original, uncropped Western blot images are provided in the Supplementary file.) (B) RT-qPCR showed the knockdown effects of ITGB1 in LUAD cells. (C) Cell Proliferation in A549 and H358 Cells with ITGB1 Knockdown. (D) Cell Migration in A549 and H358 Cells with ITGB1 Knockdown. (E) Scratch assays in A549 and H358 cells with ITGB1 knockdown. 4. Discussion In this study, we integrated multiple experimental scRNA-seq datasets to fully dissect the cell communication networks of TME in LUAD patients. The identified key cell communication interactions contributed to the novel molecular phenotype along with the bulk RNA-seq data. Comprehensive analyses were also conducted, including gene expression profile analysis, genomic alteration landscape, and immune infiltration. Then, a CCI score model was built through LASSO-Cox screening, which could be used for the prediction of prognosis and immunotherapy response. Finally, we screened ITGB1 as the novel therapeutic target by machine learning feature selection and validated its tumor phenotype by experiments. The TME in LUAD is comprised of various types of cells and the communication interactions between these cells and tumor cells exert crucial effects on tumorigenesis, cancer progression, and response to treatment [[152]31]. He et al. analyzed the substantial heterogeneity in early-stage LUAD patients with EGFR mutations and implicated the complex cell communication interactions among tumor cells, stromal cells, and immune cells in the TME [[153]32]. Chen et al. also revealed the cellular diversity, molecular complexity, and various cell communications in different stages of LUAD [[154]11]. Zhang et al. investigated the heterogeneous immune cell populations and networks in LUAD patients harboring EGFR mutations with brain metastasis by spatial RNA-seq [[155]33]. These results suggested that the intracellular crosstalk networks based on the scRNA-seq could be used to reveal the TME heterogeneity, molecular classification, and detection of potential biomarkers. Our research utilized the cellchat tools to analyze the cell communication network based on the integrated scRNA-seq data and further investigated the spatial organization, which revealed that tumor cells and surrounding mesenchymal cells tend to locate in the tumor core, while immune cells distributed in the peripheral region of pseudo-space. In Ren's research, the CSOmap successfully recapitulates the spatial organization of TME for multiple cancers based on the scRNA-seq data [[156]21]. Keren et al. demonstrated the spatial enrichment analysis showed that the TME organization in breast cancer has immune mixed and compartmentalized tumors, suggesting the physical separation of predominantly immune regions and predominantly tumor regions [[157]34], Cao et al. found that the intercellular communication network-based scRNA-seq can be used to classify papillary thyroid carcinoma at the molecular level and predict prognosis and treatment response [[158]35]. Ghoshdastider also developed the deconvolution algorithm to estimate the relative crosstalk score in the bulk RNA-seq data of TCGA [[159]36]. We identified several cell communication interactions and stratified LUAD patients into three subtypes using the cell communication interactions by NMF classification. As a tight junction protein, OCLN-OCLN has emerged as a potential therapeutic target across various cancer types [[160][37], [161][38], [162][39]]. Sagnak's [[163]39] study highlights its transcriptomic therapeutic potential in prostate cancer, while Pudova's research [[164]40] delves into the association between OCLN-represented cell polarity and patient prognosis in micropapillary carcinoma. Furthermore, Moorthi's work [[165]41] discusses the occurrence of the OCLN-RASGRF1 fusion in RAS wild-type lung cancer and the possible therapeutic implications of MEK inhibitors for this specific subset of patients. These studies, along with our findings, underscore the necessity for deeper investigation into the role of these ligand-receptor pairs in cancer biology. Further investigation of different subtypes revealed that the C1 subclass harbored the shortest survival time, higher tumor mutation burden, and lower immune infiltration. The C1 subtype has few EGFR mutations, which is a crucial driver gene mutation for LUAD patients [[166]42]. Besides the lack of driver gene mutation, the C1 subtype displayed the desert-like immune phenotype with a dismal prognosis, while the expression pattern of the C3 subtype may be more inclined to have an active immune response. Next, we constructed a CCI score model based on the screened cell communication interactions and validated the clinical value of the CCI risk score. Pan et al. discovered that the LR score model has a good performance in predicting patients’ response to anti-PD1 therapy in breast cancer [[167]43]. Cao and Liu identified several ligand-receptor-related genes to establish the predictive model in thyroid cancer [[168]35] and renal cancer, respectively [[169]44]. We first expanded the concept of cell communication interactions from ligand-receptor pairs to more than two cascade genes cross talk in silicon analysis and established the CCI score model. Recent studies suggested that the application of immunotherapy in combination with conventional treatment provides a new strategy for NSCLC patients suffering from refractory tumors [[170][45], [171][46], [172][47]]. However, we found that the high CCI score patients might respond inadequately to immunotherapies. Furthermore, we identified several potential agents for high CCI score LUAD patients, such as Paclitaxel, a widely-used chemotherapeutic agent for treating lung cancer [[173]48,[174]49]. Liu et al. reported that ICIs in combination with paclitaxel are more effective in patients with immunosuppression status [[175]50]. Obatoclax, a B cell chronic lymphocytic leukemia/lymphoma 2 (Bcl-2) family antagonist, was reported to be used in combination with docetaxel in relapsed NSCLC patients [[176]51,[177]52]. In our analysis, ITGB1 has been screened and validated as playing an important role in tumor proliferation and migration in LUAD. ITGB1, known as the integrin subunit beta-1, plays a pivotal role in various cellular processes, including cell adhesion, migration, proliferation, and differentiation. Its regulatory functions are exerted through multiple signaling pathways that have been extensively studied. One of the key pathways involves the focal adhesion kinase (FAK) pathway, where ITGB1 engagement leads to FAK activation, resulting in the initiation of cascades that promote cell survival and cytoskeletal remodeling. Additionally, ITGB1 is integral to the phosphoinositide 3-kinase (PI3K)/Akt pathway, which supports cell growth and resistance to apoptosis [[178]53]. The interplay between ITGB1 and these pathways is critical during tumor progression, where it influences cell survival and metastatic potential [[179]54]. Previous researchers have consistently reported that ITGB1 exerts oncogenic effects in various solid cancers [[180][55], [181][56], [182][57]]. Chen et al. [[183]58] also demonstrated the significance of ligand-receptor interactions such as SELPLG-ITGB2 and ITGB4 in the prognosis of LUAD, which aligns with our research findings. Moreover, both Group Li [[184]57], Group Shi, and wang et al. [[185]59,[186]60] have reported that ITGB1 contributes to enhanced resistance to radiotherapy and chemotherapy. These findings provide valuable insights and potential avenues for investigating novel therapeutic agents for the treatment of LUAD. Our research provided a new perspective on understanding the cross-talk of cell communications in TME and built a novel CCI score model superior to other approaches in risk stratification and personalized treatment prediction. However, there are still some limitations of this work. Firstly, the scRNA-seq data only provided the expression profiles. Due to the lack of information about cell communication and spatial proximity among cells, additional studies at a spatial scale are needed to further validate the putative spatial states and complex interactions. Secondly, the conclusions of this study were generated from bioinformatic analysis, and therefore, further experimental and clinical validation is warranted. 5. Conclusion In summary, this study uncovered the complex cell communication interactions in the TME of LUAD patients by cell communication and putative spatial analysis. The three subtypes of LUAD patients considerably varied in prognosis, clinical characteristics, mutation status, expression pattern, and immune infiltration. A CCI score model was constructed, which exerts great performance in the prediction of prognosis and response to immunotherapy through internal and external validation. Finally, we identified ITGB1 as therapeutic target for LUAD and agents for high CCI score patients with potential immunosuppression status, aiming to shed new light on precision medicine. Ethics approval and consent to participate This study was approved by the Ethics Committee on Human Research of the Zhongshan Hospital, Fudan University(ethics approval number: B2021-088(2); ethics approval date: 2021-4-30). Informed consent has been obtained from each participant at the time of hospital. Consent for publication Not applicable. Funding This study was funded by the Shanghai Sailing Program (NO. 22YF1407500). Data availability statement Data adopted in this study are available in TCGA ([187]http://portal.gdc.cancer.gov/) and UCSC Xena Browser ([188]http://xena.ucsc.edu/). Other data can be directed to the corresponding authors upon reasonable request. CRediT authorship contribution statement Xing Jin: Writing – review & editing, Writing – original draft. Zhengyang Hu: Writing – original draft. Jiacheng Yin: Validation, Methodology, Investigation. Guangyao Shan: Writing – review & editing, Resources, Data curation. Mengnan Zhao: Methodology, Formal analysis, Data curation. Zhenyu Liao: Visualization, Resources. Jiaqi Liang: Writing – review & editing, Visualization, Data curation. Guoshu Bi: Writing – review & editing, Validation, Methodology. Ye Cheng: Visualization, Software. Junjie Xi: Supervision, Project administration. Zhencong Chen: Supervision, Project administration, Funding acquisition, Conceptualization. Miao Lin: Supervision, Project administration. Declaration of competing interest The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:Zhencong Chen reports financial support was provided by Shanghai Sailing Program (NO. 22YF1407500). If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgments