Abstract

Background

   The tumor microenvironment (TME) in lung adenocarcinoma (LUAD)
   influences tumor progression and immunosuppressive phenotypes through
   cell communication. We aimed to decipher cellular communication and
   molecular patterns in LUAD.

Methods

   We analyzed scRNA-seq data from LUAD patients in multiple cohorts,
   revealing complex cell communication networks within the TME. Using
   cell chat analysis and COSmap technology, we inferred LUAD's spatial
   organization. Employing the NMF algorithm and survival screening, we
   identified a cell communication interactions (CCIs) model and validated
   it across various datasets.

Results

   We uncovered intricate cell communication interactions within the TME,
   identifying three LUAD patient subtypes with distinct prognosis,
   clinical characteristics, mutation status, expression patterns, and
   immune infiltration. Our CCI model exhibited robust performance in
   prognosis and immunotherapy response prediction. Several potential
   therapeutic targets and agents for high CCI score patients with
   immunosuppressive profiles were identified. Machine learning algorithms
   pinpointed the novel candidate gene ITGB1 and validated its role in
   LUAD tumor phenotype in vitro.

Conclusion

   Our study elucidates molecular patterns and cell communication
   interactions in LUAD as effective biomarkers and predictors of
   immunotherapy response. Targeting cell communication interactions
   offers novel avenues for LUAD immunotherapy and prognostic evaluations,
   with ITGB1 emerging as a promising therapeutic target.

   Keywords: Lung adenocarcinoma, Tumor immune microenvironment,
   ScRNA-seq, Immunotherapy, Prognosis, ITGB1

Graphical abstract

   Image 1
   [45]Open in a new tab

Highlights

   Highlights of this research.
     * •
       The diverse landscape of cell communication interactions in TME of
       lung adenocarcinoma.
     * •
       The commonalities from the crosstalk of cell communication
       interaction (CCI).
     * •
       Comprehensive multi-omics analysis reveals the CCI pattern's vital
       role.
     * •
       CCI signature may facilitate identifying immunosuppression status,
       informing clinical decisions, and improving outcomes.
     * •
       Identification of a novel candidate gene ITGB1 in LUAD.

1. Introduction

   Lung cancer is the most common type of malignancy and one of the
   leading causes of cancer-associated death worldwide [[46]1]. Lung
   adenocarcinoma (LUAD) accounts for over 80 % of cases [[47]2], the
   incidence of which continues to increase. Despite major advances in the
   multi-disciplinary management of LUAD, including surgery, chemotherapy,
   radiotherapy, and targeted therapy, the 5-year survival rate for
   patients with LUAD remains grim [[48]3]. Immunotherapy, as an emerging
   treatment modality, has been adopted for improving prognosis [[49]4],
   but quite a few patients do not respond to anti-PD-1/PD-L1 immune
   checkpoint inhibitors (ICI) [[50]1]. What is worse, some cases were
   reported to suffer from hyperprogressive disease following the
   administration of ICI [[51]5,[52]6]. Thereby, the development of
   individualized treatment for LUAD patients is warranted [[53]7].

   Due to the rapid development of the concepts of precision medicine and
   individualized therapy, the investigation of predictive and monitoring
   markers is never-ending. Therefore, many histology and molecular
   biomarkers have gradually become deeply integrated into the guidance of
   treatment. As to immunotherapy, biomarkers such as tumor mutation
   burden, and PD-L1 TPS (Tumor cell Proportion Score) are under clinical
   practice. However, the current predictive biomarkers of benefit from
   ICI are still limited and unprecise due to the spatially and temporally
   variable interactions [[54]8]. The preferred strategy to improve the
   predictive biomarker performance of ICI is combining different markers
   to reduce the presumed risk of a single one.

   The interaction between tumor cells and the surrounding fibroblasts,
   endothelials, and immune cells in the tumor microenvironment is crucial
   for the progression and the response to treatment in lung cancer.
   Insight into cell communication networks provides new clues and novel
   biomarkers to mold processes that contribute to the LUAD tumor
   microenvironment [[55]9]. Hence, the comprehensive analysis of the cell
   communication interactions in LUAD facilitates the exploitation of
   novel biomarkers to predict the treatment response and prognosis.

   The past decade has witnessed the emergence of multiple molecular
   prognostic markers based on RNA-seq due to the advances in sequencing
   technology. Compared to bulk RNA-seq, the single-cell RNA sequencing
   (scRNA-seq) technology could reveal gene expression at single-cell
   level [[56]10]. With the maturity of this technology, scRNA-seq
   facilitates the exploration of intracellular communication and helps
   uncover potential value in the complicated tumor microenvironment in
   tumor progression.

   In this article, we identified cell annotations by integrating
   scRNA-seq data from our institute and other databases. Subsequently, we
   conducted the global and putative intracellular communication analysis.
   Through examining immune interactions in the TME, a novel molecular
   phenotype, and mapping of the immune landscape was developed. The LUAD
   classifications based on the Cell Communication Interactions (CCI)
   showed marked heterogeneity in transcriptomic features, immune
   infiltration, clinical characteristics, and genomic alterations.
   Subcutaneously, a CCI score model was established and performed well in
   predicting prognosis and response to immunotherapy in LUAD patients.
   Finally, we identified a potential therapeutic target for LUAD, ITGB1,
   and initially validated its tumor phenotype.

2. Methods and materials

2.1. ScRNA-seq and bulk RNA-seq data collection

   Twenty-three primary LUAD patients who had received surgical resection
   in the Department of Thoracic Surgery in Zhongshan Hospital, Fudan
   University (FDZSH) were included for scRNA-seq. The diagnosis of LUAD
   was confirmed by at least two qualified pathologists. We also included
   other two public datasets, one from ArraryExpress (Accession numbers
   E-MTAB-6149 and E-MTAB-6653) and the other from Human Cell Atlas Data
   Coordination Platform (Accession number PRJEB31843). The RNA-seq gene
   expression, copy number variation (CNV), somatic number variation (SNV)
   data, and clinical characteristics of The Cancer Genome Atlas (TCGA) –
   LUAD cohort were retrieved from the UCSC Xena
   ([57]https://gdc.xenahubs.net). The samples were filtered by deleting
   the replicate samples and the Formalin-fixed paraffin-embedding (FFPE)
   sequencing samples under the guidance of the broad institute
   ([58]https://confluence.broadinstitute.org/display/GDAC/FAQ#FAQsampleTy
   pesQWhatTCGAsampletypesareFirehosepipelinesexecutedupon). The Schematic
   diagram of this research was shown in [59]Fig. 1. This study was
   approved by the Ethics Committee on Human Research of the Zhongshan
   Hospital, Fudan University. Informed consent has been obtained from
   each participant. Other data from the public database did not require
   ethical consent.

Fig. 1.

   [60]Fig. 1
   [61]Open in a new tab

   The landscape of sc-RNA seq data.

   (A) The UAMP plot of different cell clusters. (B) The distribution of
   cells from different sources and the sample distribution of 26 cell
   clusters. (C) The integration of different scRNA-seq data. (D) Heatmap
   of the top three marker genes for each cluster. (E) The curated cell
   annotation of LUAD scRNA-seq data. (F) KEGG pathway enrichment analysis
   of marker genes for different cell types. UMAP, Uniform Manifold
   Approximation, and Projection; LUAD, lung adenocarcinoma; KEGG, Kyoto
   Encyclopedia of Genes and Genomes.

2.2. The procedure of scRNA-seq data analysis

   The Preparation for scRNA-seq data followed the protocol for the 10x
   Genomics Chromium Single Cell platform and our previously published
   study demonstrated the details of tissue processing, single-cell
   suspension, single-cell capture, library preparation, and sequencing
   [[62]11]. As We followed the Seurat v4[[63]12] guidelines for the
   routine procedure. Firstly, utilizing the Seurat R package create
   Seurat object for each sample based on the output raw data files from
   Cellranger v3.0 pipeline. Secondly, we conducted the quality control
   procedure by setting the threshold as follows: 500 < nFeature <10000,
   1000 < nCount <50000, mitogene percent <15 % and genes expressed in
   less than 10 cells were filtered out after cell quality control. The
   SCTransform function was performed separately in each sample to regress
   out the mitochondrial genes and cell cycle scores. After PCA dimension
   reduction, the harmony [[64]13] R package was utilized for removing the
   batch effect. Cell clustering was based on harmony dimensionality
   reduction using the first 32 PCs and a resolution value of 0.5. Cell
   annotation was identified manually with the assistance of
   auto-annotated tools SingleR [[65]14] and scType [[66]15]. The
   canonical markers were collected from several previous studies
   [[67][16], [68][17], [69][18]]. The marker genes of each cluster were
   conducted by the function FindAllMarkers with default parameters. The
   Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of
   markers genes of each cell type was performed using the R package
   clusterProfiler [[70]19].

2.3. Cell communication analysis and COSMAP cell spatial organization

   To infer cell-cell communication networks, we implemented the Cellchat
   [[71]20](v1.4.0) R package for a systematic analysis. Cellchat has a
   database involving interactions among ligands, receptors, and their
   cofactors that accurately represent known heteromeric molecular
   complexes. Cellchat identified differentially over-expressed ligands
   and receptors for each cell group, quantifies communications between
   two cell groups mediated by these signaling genes, associates each
   interaction with a probability value, and significantly identifies the
   interaction probability under the randomly permutes statistic test, the
   visualization of different ligand-receptor (LR) interactions was also
   performed by Cellchat. According to the Cellular Spatial Organization
   mapper (CSOmap) computational tool reported by Zhang et al. [[72]21],
   we utilized the LR interactions from cellChat to recapitulate the
   spatial organization of multiple cell clusters.

2.4. The CCI screening and subclasses identification

   The cell communication interactions were screened first under the
   cellchat analysis. The CCI score was calculated according to the
   reports of previous studies20–22. In short, the cross-talk score of CCI
   in bulk RNA-seq was calculated based on the average expression of
   interaction genes. We used Pearson correlation analysis (p < 0.05 and
   correlation coefficient >0.3) and univariate Cox regression analysis
   for correlational and prognostic screening of CCI. An unsupervised
   machine learning methodology was implemented by employing the consensus
   clustering algorithm. The R package NMF [[73]22] was utilized to
   execute the non-negative matrix factorization (NMF) clustering
   procedure. The optimal value of k, signifying the number of subclasses,
   was determined by scrutinizing the cophenetic plot, a graphical
   depiction of the magnitude of the correlation coefficient.

2.5. LASSO algorithm and CCI score model construction

   LASSO is a regression methodology that is widely employed to analyze
   data with high dimensions and strong interrelationships [[74]23]. In
   this study, we harnessed the R package glmnet [[75]24] to perform
   LASSO-Cox regression analysis, leveraging the built-in cross-validation
   function to optimize the L1 regularization parameter lambda for
   selecting candidate features. Next, the R caret package
   ([76]https://cran.r-project.org/package=caret) was utilized to
   construct a classification model for the TCGA-LUAD cohort and evaluate
   the efficacy of machine learning classifiers.

2.6. Differential gene expression, immune infiltration, and mutational
analysis

   The differential gene expression analysis between different subclasses
   was performed using the R package limma [[77]25] with FDR<0.05 and
   Log[2] (Fold Change) > 1 unless specified otherwise. The immune
   infiltration and cell composition estimation of TCGA patients was based
   on the TIDE [[78]26] and TIMER2 [[79]27]. The distribution of mutation
   frequency and status was illustrated using the MAF file, which was
   further analyzed with the R maftools package [[80]28]. Tumor mutational
   burden (TMB) was calculated as the number of somatic base substitutions
   or indels per megabase (Mb) of the coding region target territory of
   the test (currently, 1.11 Mb). The genetic abnormality scores including
   the number of segments, fraction altered, homologous recombination
   defect (HRD), and Aneuploidy score were obtained from a previous
   research [[81]29].

2.7. Feature selection methods

   We employed a comprehensive approach integrating three distinct machine
   learning methods for the feature selection process. The Elastic Net
   model was implemented using the R package 'glmnet,' while Boruta
   feature selection, facilitated by the R package 'Boruta,' was employed
   to identify significant contributors to the overall survival. We
   conducted both univariate and multivariate Cox's proportional hazards
   regression analyses. Variables exhibiting statistical significance in
   the univariate analysis were subsequently integrated into the
   multivariate analysis. The outcome was visualized using a forest plot,
   highlighting the significant variables in the multivariate analysis.
   Written informed consent was obtained from all patients'
   representatives.

2.8. Specimens and western blotting

   We collected six LUAD patients who underwent surgical treatment at the
   Zhongshan hospital, Fudan University in June 2023. The diagnosis of
   lung adenocarcinoma was confirmed in each case by histopathological
   analysis. Briefly, the western blotting was performed using antibodies
   against ITGB1(1:1000, Cat No. 12594-1-AP, Proteintech) and beta
   actin(1:3000, AF0003, Beyotime).

2.9. Cell culture and reagents

   The human LUAD cell lines A549 and H358 was purchased by Chinese
   Academy of Science Cell Bank and cultured in Dulbecco's Modified
   Eagle's Medium (DMEM Beyotime, CN) with 10 % Fetal bovine serum (FBS,
   Evergreen, Zhejiang, China) and 1 % antibiotics in a humidified
   atmosphere of 5 % CO2 at 37 °C. The qPCR primer for ITGB1: 5′-
   CCTACTTCTGCACGATGTGATG -3′(forward), 5′- CCTTTGCTACGGTTGGTTACATT
   -3′(reverse).

2.10. Cell proliferation, transwell cell migration, and scratch assays

   Cell proliferation capacity was assessed using the EdU cell
   proliferation kit (C0078S, Beyotime). The procedure was carried out
   according to the kit's instructions. The working concentration of EdU
   was 10 μM. Transwell migration assays were conducted using the upper
   chamber with LUAD cells seeded in serum-free DMEM and the lower chamber
   with 20 % FBS DMEM. The incubation time for A549 and H358 cells was
   24 h for 10,000 cells and 48 h for 20,000 cells, respectively. After
   incubation, the upper chamber was fixed with 4 % paraformaldehyde and
   stained with 0.1 % crystal violet. Migration was quantified using
   ImageJ software. Scratch assays were performed in a 6-well plate where
   the cells were seeded at a density of 80 % and allowed to grow until
   reaching confluence. Then, a cross-shaped scratch was made with a tip,
   and the DMEM was replaced with serum-free DMEM for the next 24 h. The
   plate was photographed every 8 h.

2.11. Statistical analysis

   The statistical analysis was conducted using the R software (Version
   4.2.1), a powerful and widely used tool for data analysis in the
   scientific community. Categorical variables were compared using the
   chi-square test or Fisher's exact test, as appropriate. The normality
   of the data was assessed using the Shapiro-Wilk test, and the unpaired
   t-test or Mann-Whitney test was utilized to compare two groups
   depending on the results of the normality test. To investigate the
   correlations between all profiles, Spearman's nonparametric analysis
   was performed. Survival analysis was conducted using the Log-rank test,
   and the univariate and multivariate Cox proportional hazards regression
   was carried out using the stepwise method, all within the R language.
   The nomogram construction, validation, and calibration were conducted
   and plotted using the R rms and Hmisc packages. For all tests, a
   two-sided p-value <0.05 was considered significant. Furthermore, One
   asterisk represents p < 0.05, two asterisks represent p < 0.01, and
   three asterisks represent p < 0.001, while ns was used to indicate
   results that were not statistically significant.

3. Results

3.1. The landscape of integrated single-cell transcriptome of LUAD patients

   The landscape of scRNA-seq in LUAD patients showed a high cellular
   diversity under the application of integrated analysis of
   multi-datasets from different institutes. First, routine quality
   control for divided scRNA-seq datasets was performed. The quality
   control procedure and the threshold of pMT, pHB, nCount, and nFeatures
   were shown in [82]Fig. 2. [83]Fig. 1A–C demonstrated the successful
   integration of the 25 clusters after the removal of the batch effect.
   Each cluster's differential genes expression was illustrated in
   [84]Fig. 1D, based on which we identified the cell annotation of
   different clusters ([85]Fig. 1, [86]Fig. 3). Furthermore, the compared
   KEGG functional enrichment analysis of all the annotated cell clusters
   was performed ([87]Fig. 1F).

Fig. 2.

   [88]Fig. 2
   [89]Open in a new tab

   The cell communication interactions and spatial organization of LUAD.

   (A) The cell communication interactions between different cell types.
   (B) Network overview for the cell communication interactions between
   different cell types. (C) Detailed network of cell communication
   interactions among all cell types. (D) The spatial organization
   (angle = 165) in the pseudo-space inferred by CSOmap (left). The
   cross-section of z = 0,-10,10 of the pseudo-space. The color of the
   dots represents cell density (right). (E) The intensity and specificity
   of statistically significant cell communication interactions between
   different cell types.

Fig. 3.

   [90]Fig. 3
   [91]Open in a new tab

   The subtypes based on the cell communication patterns through NMF
   analysis.

   (A) Venn diagram of CCIs with significant correlation and prognostic
   importance. (B) The cophenetic correlation coefficient for k (2–6) of
   the NMF clustering using expression levels of 102 screened CCIs. (C)
   Heatmap clustering of TCGA-LUAD datasets when consensus (k) = 3. (D)
   The difference in survival among different molecular subtypes in the
   TCGA-LUAD cohort. (E) Comparison of different molecular subtypes with
   clinical information. The horizontal axis represents the different
   subtypes of samples while the vertical axis represents the proportion
   of clinical information in the corresponding subtype sample. Different
   colors represent different clinical characteristics. CCI, cell
   communication interaction; NMF, non-negative matrix factorization;
   TCGA, The Cancer Genome Atlas.

3.2. The comprehensive analysis of intercellular communication networks and
inferred cell spatial organization

   Intercellular communication networks demonstrated the diverse
   interactions between different cell types in the tumor microenvironment
   of LUAD. [92]Fig. 2A showed that cancer cells had close cellular
   communication with other cell types. Among them, fibroblasts had the
   most interactions and the strongest interaction strength among other
   cells, while interactions with mast cells are lower in the tumor
   environment ([93]Fig. 2B and C). Based on the reconstructed spatial
   relationships by CSOmap, we found that the interactions between cancer
   cells, endothelial cells, myeloid cells, and fibroblasts tend to locate
   in the center of tumor spatial organization. In addition, T cells, B
   cells, and NK cells are more likely to distribute outside the tumor
   core. We also observed that some malignant epithelial cells clustered
   in the spatial center of the tumor, while others were separated from
   the interaction center ([94]Fig. 2D). Based on the Cellchat analysis,
   we obtained the inferred significant cell-cell communication
   interactions at the level of ligands/receptors in [95]Fig. 2E, cell
   communication interactions that differ significantly between cell
   clusters can be referred to [96]Table S1. Among the notable cell
   communication interactions, those involving the HLA molecule family are
   predominantly engaged in crosstalk between immune cells and other
   cellular components within the tumor microenvironment. Additionally,
   the importance of interactions such as OCLN-OCLN, MDK-SDC1, and CDH5
   underscores the significant role of stromal cells in tumor progression
   and immune modulation.

3.3. The cell communication interactions identified three molecular subtypes
in LUAD

   Cell communication interactions vary greatly in different cell
   clusters, which contribute to the distinct pathways’ activation or
   functional change and thus affect the tumor progression. Hence, we
   screened the significant cell communication interactions with strong
   relevance and prognostic value and 102 pairs of genes were identified
   as candidates ([97]Fig. 3A). Subsequently, 493 LUAD samples from the
   TCGA dataset were classified by NMF consensus clustering based on the
   expression profiles of the interactions aforementioned. The cophenetic
   correlation coefficient plot ([98]Fig. 3B) suggests that the optimal k
   value was determined as three. The basis heatmap of consensus
   clustering ([99]Fig. 3C) and t-SNE plot illustrates clear and crisp
   boundaries, indicating robust clustering for the samples. The prognosis
   of LUAD patients differed among the three clusters (p < 0.001), with a
   shortest median survival time (MST) for C1 (MST: 33.30 [23.40, 39.90]
   months) than C2 (MST: 40.97 [33.17, 50.03] months) and C3 (MST: 59.27
   [50.53, NA] months) ([100]Fig. 3D).

3.4. Comparison of clinical characteristics among three subclasses

   The clinical characteristics of these three subclasses were
   investigated. The chi-square test in [101]Fig. 3E showed that the C1
   subclass with the worst clinical outcome was correlated to higher T
   stage (p = 0.008) and overall stage III and IV (p = 0.006), while the
   C3 subclass including more young nonsmoker female patients. Differences
   in clinical features among the three subclasses showed a similar trend.
   That is, the proportion of patients aged <65 years in the C3 subclass
   is higher than in the C1 and C2 subclass, and the proportion of female
   patients increased sequentially in the order of C1, C2, and C3.

3.5. The dissection of genomic, molecular, and immune infiltration
differences underlying the communication-related subclasses

   Investigation of the tumor genomic landscape and transcriptome
   profiling was related to the tumor progression. The genes with high
   mutation frequency across LUAD subclasses and the associated oncogenic
   pathways [[102]30] are visualized in [103]Fig. 4A (The statistical
   analysis details are shown in [104]Table S2). C3 subclass had a
   significantly higher mutation frequency of EGFR (20.5 %) than C1
   (4.55 %) and C2 (13.29 %), while the alteration of KEAP1 and STK11 in
   C1 was supremely higher ([105]Fig. 4). We then observed significant
   differences in HRD score, TMB, and other copy number alterations.
   Specifically, C2 group showed highest HRD score (32.5[30.2,34.8]) than
   C1(24.9 [21.8; 28.0]) and C3(20.9 [19.2; 22.7]). The mutation burden
   load of C1 and C2 was 2.53 [2.05; 3.02] and 2.95 [2.56; 3.35] while
   patients in C3 subclass bear lower than 2 tumor mutation loads
   ([106]Fig. 4B). According to the OLCR algorithm estimation, we
   calculated the stemness of these three subclasses. Notably, [107]Fig.
   4C showed a continuous decrease trend in the stemness score in groups
   C1 to C3 (p < 0.001).

Fig. 4.

   [108]Fig. 4
   [109]Open in a new tab

   The dissection of genomic, molecular, and immune infiltration
   differences.

   (A) The landscape of mutational genes and associated oncogenic pathways
   among three subclasses. (B–C) The mutational events (B) and stemness
   score (C) among the three subclasses. (D) Heatmap of biological
   pathways among the three subclasses. (E) The comparison of the immune
   score, stromal score, ESTIMATE score, and tumor purity among the three
   subclasses. (F) Expression patterns of 11 immune checkpoints among the
   three subclasses. (G) The abundance of 22 immune cells among the three
   subclasses generated by the CIBERSORT algorithm.

   The specific differential expression genes were identified after all
   three comparisons among three subclasses, which include 172 genes for
   C1, 99 genes for C2, and 231 genes for C3 ([110]Table S3). The GO
   enrichment analysis of the specific genes was shown in [111]Fig. 5A and
   [112]Table S4. Signature genes in C1 and C3 subtypes are enriched in
   immune biological processes including IgG binding, immune receptor
   activity, and chemokine receptor binding, while the extracellular
   matrix-related processes are significantly enriched in the C2 subclass.
   Furthermore, the KEGG and Disease Ontology (DO) enrichment analysis of
   these genes showed similar trends ([113]Fig. 5B and C). GSEA analysis
   was applied to calculate the enrichment score of each sample and
   identified the pathway's differences among the three subclasses. As
   shown in [114]Fig. 4D, the enrichment score of immune response-related
   pathways and diverse metabolism pathways was both higher in the C3
   subclass.

Fig. 5.

   [115]Fig. 5
   [116]Open in a new tab

   Construction of the CCI risk model.

   (A) The LASSO-Cox algorithm screening procedure showed the coefficients
   (left) and partial likelihood deviance (right) of CCIs at different
   lambda values, and the optimal lambda value was determined in the left
   plot. (B) The Kaplan-Meier survival curve of the LUAD patients with
   high or low CCI scores. (C) The time-dependent receiver operating curve
   (ROC) of the CCI model. (D) The result of univariate and multivariate
   Cox regression analysis. (E) The nomogram plot of the CCI score model.
   (F) The calibration plot of the nomogram.

   ESTIMATE algorithm output the immune score, stromal score, estimate
   score, and tumor purity to indicate the degree of immune cell
   infiltration within tumor tissue. It is found that the C3 subclass had
   the highest ESTIMATE score and lowest tumor purity ([117]Fig. 4E). The
   expression of 11 immune checkpoint genes showed the same trend,
   demonstrating the differences in the immune microenvironment of the
   three subclasses ([118]Fig. 4F). Utilizing the CIBERSORT algorithm, we
   assessed the proportions of 22 immune cell types and found that
   subclass C3 were enriched among other cell types, except for memory B
   cells, T cell CD4 memory resting, and NK cell resting ([119]Fig. 4G).

3.6. Construction of the CCI score prognostic model

   Since the subclasses based on the interactions' pattern showed distinct
   characteristics in multi-omics analysis, to build an advantageous model
   for clinical use, we carried out the LASSO-Cox analysis on the
   TCGA-LUAD cohort as the training dataset. The optimal lambda value was
   determined as 0.0599 with the smallest partial likelihood of deviance
   ([120]Fig. 5A) and the formula was described as follows: The CCI
   Score = 0.039 * ANGPTL4_ITGA5_ITGB1 + 0.208 * ANGPTL4_SDC4 + 0.165 *
   MDK_ITGA6_ITGB1 + 0.295 * ADM_CALCRL + 0.154 * LAMB1_ITGA6_ITGB4 -
   0.126 * ALCAM_CD6 - 0.258 * PTPRC_MRC1. The expression level of
   different interactions was calculated as the average expression of
   interactions’ component genes in terms of Log[2] (TPM + 1).

   The LUAD patients were divided into high and low groups according to
   the average value of the CCI score and the K-M survival plot showed
   that the high-CCI score group was significantly related to poorer
   prognosis and shorter MST ([121]Fig. 5B). The area under curve (AUC)
   value of 1-, 3-, and 5-year time-dependent ROC curve was 0.7, 0.71, and
   0.7, respectively ([122]Fig. 5C). The correlation between CCI score and
   clinical characteristics were visualized in [123]Fig. 6. The forest
   plot ([124]Fig. 5D) showed the results of univariate and multivariate
   Cox analysis, which demonstrated that the CCI score is an independent
   prognostic factor. The nomogram ([125]Fig. 5E) was built by the
   significant variables including the CCI Score, the averaged concordance
   index (C-index) of which was 0.710 (95 % CI: 0.66–0.76), indicating the
   satisfying discrimination power of the model. The calibration curve
   plot showed the predictive accuracy of the CCI Score model ([126]Fig.
   5F). Collectively, our model has a robust and accurate prediction for
   the prognosis.

Fig. 6.

   [127]Fig. 6
   [128]Open in a new tab

   External validation of the CCI score model in the GEO cohort.

   (A) The Kaplan-Meier survival curve of the LUAD patients with high or
   low CCI scores in the GEO cohort. (B) The time-dependent ROC of the CCI
   model in the GEO cohort. (C) The proportion of immune cells from the
   GEO cohort generated by the CIBERSORT algorithm. (D) The relevance
   between CCI score and immune cell infiltration based on Pearson
   correlation analysis. (E–F) The TME-related signature analysis between
   low CCI and high CCI score groups.

3.7. Validation of the CCI score model on GEO cohorts

   We selected [129]GSE31210 dataset as the independent external cohort
   and the expression of CCI was quantified by the mean of Log2 (TPM+1).
   The Kaplan-Meier survival analysis showed the CCI score was correlated
   with poor prognosis (p = 0.00027, [130]Fig. 6A). Time-ROC plot
   demonstrated the great performance of the CCI score model in the GEO
   dataset ([131]Fig. 6B). Furthermore, we explored the immune
   infiltration of different CCI score groups in the validation dataset
   ([132]Fig. 6C). Similar to the results in the TCGA cohort, the CCI
   score was negatively related to the abundance of NK cells activated,
   but positively correlated with macrophages, fibroblast, and neutrophils
   ([133]Fig. 6D, details in [134]Table S5). Additionally, the higher CCI
   score was associated with the hypoxia signature and immune exclusion
   ([135]Fig. 6E and F).

3.8. The predictive value of response to immunotherapy of the CCI score model

   Since the CCI score is closely related to the tumor immune
   microenvironment, we examined its ability to predict the response to
   immunotherapy. First, we used the TIDE and ImmuneCellAI methods to
   analyze the tumor immune dysfunction and exclusion level and deduced
   their immunotherapy response rate in the TCGA cohort ([136]Fig. 7).
   Subsequently, we found that patients who benefit from the anti-PD1
   immunotherapy harbored a lower CCI score in the IMvigor210 cohort
   ([137]Fig. 7A). Accordingly, [138]Fig. 7B demonstrated that patients in
   the low CCI score group were more likely to respond to immunotherapy,
   and as a result, they had a better prognosis ([139]Fig. 7C).
   Furthermore, we validated the CCI score model for predicting immune
   responses in patients who received immunotherapy in 8 public cancer
   cohorts using logistic regression methods and all of them indicated the
   great performance of the model ([140]Fig. 7D&[141]Table S6).

Fig. 7.

   [142]Fig. 7
   [143]Open in a new tab

   The immunological prediction of the CCI score and the drug estimation.

   (A) Comparison of the CCI scores between patients with different
   responses to anti-PD1 immunotherapy in the IMvigor210 cohort. (B)
   Comparison of the response to anti-PD1 immunotherapy between the
   patients with high or low CCI scores in the IMvigor210 cohort. (C) The
   Kaplan-Meier survival curve of the patients with high or low CCI scores
   in the IMvigor210 cohort. (D) Logistic regression of 8 public
   immunotherapy-related cohorts. (E–F) The results of Spearman's
   correlation and differential drug response analysis of CTRP (E) and
   PRISM-derived (F) compounds.

3.9. Identification of potential therapeutic agents for LUAD patients with
high CCI scores

   The CTRP and PRISM datasets contain gene expression and drug
   sensitivity profiles of hundreds of compounds that can be used to build
   predictive models for drug response. After purification by the ISOpure
   algorithm, we conducted the drug sensitivity prediction analysis based
   on the gene expression data of LUAD patients. We identified compounds
   with lower estimated AUC values in the high CCI score group
   (Log[2]FC > 0.10) and a negative correlation coefficient (Spearman's
   R < −0.30 for CTRP and −0.35 for PRISM). These analyses screened one
   CTRP-derived compound (paclitaxel) and six PRISM-derived compounds
   (obatoclax, tivantinib, ganetespib, NVP-AUY922, temsirolimus, and
   ispinesib), indicating that the LUAD patients harbored a high CCI score
   may benefit from these agents ([144]Fig. 7E and F).

3.10. Mining of potential therapeutic targets by machine learning feature
selection and validation in vitro

   Since the CCI model construction identified several hub genes in LUAD,
   we next screened the candidates using the three different feature
   selection algorithms: Botura machine learning method, Elastic Net
   methods, and multivariate cox analysis. The intersection of the
   features were IGTB1, MRC1, and ANGTL4([145]Fig. 8). After the
   literature searching, we identified ITGB1 as the potential therapeutic
   targets and validated its higher expression in tumor than normal
   tissues of LUAD specimens ([146]Fig. 8A). We next validated the
   knockdown effects of ITGB1 in LUAD cells ([147]Fig. 8B), The EdU
   incorporation assays showed that A549 and H358 cells with silenced
   ITGB1 expression had less positive cells than control([148]Fig. 8C).
   The migration assays and scratch assay further validated that the
   knockdown of ITGB1 suppresses the migration of LUAD cells.([149]Fig.
   8D–E). Thus, our findings demonstrated the correlation between ITGB's
   expression and tumor proliferation of LUAD, indicating the potential of
   tumor therapy of ITGB1.

Fig. 8.

   [150]Fig. 8
   [151]Open in a new tab

   The Validation of ITGB1's tumor phenotype in vitro.

   (A) The expression level of ITGB1 in tumor and normal tissue of LUAD
   patients. (The original, uncropped Western blot images are provided in
   the Supplementary file.) (B) RT-qPCR showed the knockdown effects of
   ITGB1 in LUAD cells. (C) Cell Proliferation in A549 and H358 Cells with
   ITGB1 Knockdown. (D) Cell Migration in A549 and H358 Cells with ITGB1
   Knockdown. (E) Scratch assays in A549 and H358 cells with ITGB1
   knockdown.

4. Discussion

   In this study, we integrated multiple experimental scRNA-seq datasets
   to fully dissect the cell communication networks of TME in LUAD
   patients. The identified key cell communication interactions
   contributed to the novel molecular phenotype along with the bulk
   RNA-seq data. Comprehensive analyses were also conducted, including
   gene expression profile analysis, genomic alteration landscape, and
   immune infiltration. Then, a CCI score model was built through
   LASSO-Cox screening, which could be used for the prediction of
   prognosis and immunotherapy response. Finally, we screened ITGB1 as the
   novel therapeutic target by machine learning feature selection and
   validated its tumor phenotype by experiments.

   The TME in LUAD is comprised of various types of cells and the
   communication interactions between these cells and tumor cells exert
   crucial effects on tumorigenesis, cancer progression, and response to
   treatment [[152]31]. He et al. analyzed the substantial heterogeneity
   in early-stage LUAD patients with EGFR mutations and implicated the
   complex cell communication interactions among tumor cells, stromal
   cells, and immune cells in the TME [[153]32]. Chen et al. also revealed
   the cellular diversity, molecular complexity, and various cell
   communications in different stages of LUAD [[154]11]. Zhang et al.
   investigated the heterogeneous immune cell populations and networks in
   LUAD patients harboring EGFR mutations with brain metastasis by spatial
   RNA-seq [[155]33]. These results suggested that the intracellular
   crosstalk networks based on the scRNA-seq could be used to reveal the
   TME heterogeneity, molecular classification, and detection of potential
   biomarkers.

   Our research utilized the cellchat tools to analyze the cell
   communication network based on the integrated scRNA-seq data and
   further investigated the spatial organization, which revealed that
   tumor cells and surrounding mesenchymal cells tend to locate in the
   tumor core, while immune cells distributed in the peripheral region of
   pseudo-space. In Ren's research, the CSOmap successfully recapitulates
   the spatial organization of TME for multiple cancers based on the
   scRNA-seq data [[156]21]. Keren et al. demonstrated the spatial
   enrichment analysis showed that the TME organization in breast cancer
   has immune mixed and compartmentalized tumors, suggesting the physical
   separation of predominantly immune regions and predominantly tumor
   regions [[157]34], Cao et al. found that the intercellular
   communication network-based scRNA-seq can be used to classify papillary
   thyroid carcinoma at the molecular level and predict prognosis and
   treatment response [[158]35]. Ghoshdastider also developed the
   deconvolution algorithm to estimate the relative crosstalk score in the
   bulk RNA-seq data of TCGA [[159]36]. We identified several cell
   communication interactions and stratified LUAD patients into three
   subtypes using the cell communication interactions by NMF
   classification. As a tight junction protein, OCLN-OCLN has emerged as a
   potential therapeutic target across various cancer types [[160][37],
   [161][38], [162][39]]. Sagnak's [[163]39] study highlights its
   transcriptomic therapeutic potential in prostate cancer, while Pudova's
   research [[164]40] delves into the association between OCLN-represented
   cell polarity and patient prognosis in micropapillary carcinoma.
   Furthermore, Moorthi's work [[165]41] discusses the occurrence of the
   OCLN-RASGRF1 fusion in RAS wild-type lung cancer and the possible
   therapeutic implications of MEK inhibitors for this specific subset of
   patients. These studies, along with our findings, underscore the
   necessity for deeper investigation into the role of these
   ligand-receptor pairs in cancer biology.

   Further investigation of different subtypes revealed that the C1
   subclass harbored the shortest survival time, higher tumor mutation
   burden, and lower immune infiltration. The C1 subtype has few EGFR
   mutations, which is a crucial driver gene mutation for LUAD patients
   [[166]42]. Besides the lack of driver gene mutation, the C1 subtype
   displayed the desert-like immune phenotype with a dismal prognosis,
   while the expression pattern of the C3 subtype may be more inclined to
   have an active immune response.

   Next, we constructed a CCI score model based on the screened cell
   communication interactions and validated the clinical value of the CCI
   risk score. Pan et al. discovered that the LR score model has a good
   performance in predicting patients’ response to anti-PD1 therapy in
   breast cancer [[167]43]. Cao and Liu identified several
   ligand-receptor-related genes to establish the predictive model in
   thyroid cancer [[168]35] and renal cancer, respectively [[169]44]. We
   first expanded the concept of cell communication interactions from
   ligand-receptor pairs to more than two cascade genes cross talk in
   silicon analysis and established the CCI score model.

   Recent studies suggested that the application of immunotherapy in
   combination with conventional treatment provides a new strategy for
   NSCLC patients suffering from refractory tumors [[170][45], [171][46],
   [172][47]]. However, we found that the high CCI score patients might
   respond inadequately to immunotherapies. Furthermore, we identified
   several potential agents for high CCI score LUAD patients, such as
   Paclitaxel, a widely-used chemotherapeutic agent for treating lung
   cancer [[173]48,[174]49]. Liu et al. reported that ICIs in combination
   with paclitaxel are more effective in patients with immunosuppression
   status [[175]50]. Obatoclax, a B cell chronic lymphocytic
   leukemia/lymphoma 2 (Bcl-2) family antagonist, was reported to be used
   in combination with docetaxel in relapsed NSCLC patients
   [[176]51,[177]52].

   In our analysis, ITGB1 has been screened and validated as playing an
   important role in tumor proliferation and migration in LUAD. ITGB1,
   known as the integrin subunit beta-1, plays a pivotal role in various
   cellular processes, including cell adhesion, migration, proliferation,
   and differentiation. Its regulatory functions are exerted through
   multiple signaling pathways that have been extensively studied. One of
   the key pathways involves the focal adhesion kinase (FAK) pathway,
   where ITGB1 engagement leads to FAK activation, resulting in the
   initiation of cascades that promote cell survival and cytoskeletal
   remodeling. Additionally, ITGB1 is integral to the phosphoinositide
   3-kinase (PI3K)/Akt pathway, which supports cell growth and resistance
   to apoptosis [[178]53]. The interplay between ITGB1 and these pathways
   is critical during tumor progression, where it influences cell survival
   and metastatic potential [[179]54]. Previous researchers have
   consistently reported that ITGB1 exerts oncogenic effects in various
   solid cancers [[180][55], [181][56], [182][57]]. Chen et al. [[183]58]
   also demonstrated the significance of ligand-receptor interactions such
   as SELPLG-ITGB2 and ITGB4 in the prognosis of LUAD, which aligns with
   our research findings. Moreover, both Group Li [[184]57], Group Shi,
   and wang et al. [[185]59,[186]60] have reported that ITGB1 contributes
   to enhanced resistance to radiotherapy and chemotherapy. These findings
   provide valuable insights and potential avenues for investigating novel
   therapeutic agents for the treatment of LUAD.

   Our research provided a new perspective on understanding the cross-talk
   of cell communications in TME and built a novel CCI score model
   superior to other approaches in risk stratification and personalized
   treatment prediction. However, there are still some limitations of this
   work. Firstly, the scRNA-seq data only provided the expression
   profiles. Due to the lack of information about cell communication and
   spatial proximity among cells, additional studies at a spatial scale
   are needed to further validate the putative spatial states and complex
   interactions. Secondly, the conclusions of this study were generated
   from bioinformatic analysis, and therefore, further experimental and
   clinical validation is warranted.

5. Conclusion

   In summary, this study uncovered the complex cell communication
   interactions in the TME of LUAD patients by cell communication and
   putative spatial analysis. The three subtypes of LUAD patients
   considerably varied in prognosis, clinical characteristics, mutation
   status, expression pattern, and immune infiltration. A CCI score model
   was constructed, which exerts great performance in the prediction of
   prognosis and response to immunotherapy through internal and external
   validation. Finally, we identified ITGB1 as therapeutic target for LUAD
   and agents for high CCI score patients with potential immunosuppression
   status, aiming to shed new light on precision medicine.

Ethics approval and consent to participate

   This study was approved by the Ethics Committee on Human Research of
   the Zhongshan Hospital, Fudan University(ethics approval number:
   B2021-088(2); ethics approval date: 2021-4-30). Informed consent has
   been obtained from each participant at the time of hospital.

Consent for publication

   Not applicable.

Funding

   This study was funded by the Shanghai Sailing Program (NO.
   22YF1407500).

Data availability statement

   Data adopted in this study are available in TCGA
   ([187]http://portal.gdc.cancer.gov/) and UCSC Xena Browser
   ([188]http://xena.ucsc.edu/). Other data can be directed to the
   corresponding authors upon reasonable request.

CRediT authorship contribution statement

   Xing Jin: Writing – review & editing, Writing – original draft.
   Zhengyang Hu: Writing – original draft. Jiacheng Yin: Validation,
   Methodology, Investigation. Guangyao Shan: Writing – review & editing,
   Resources, Data curation. Mengnan Zhao: Methodology, Formal analysis,
   Data curation. Zhenyu Liao: Visualization, Resources. Jiaqi Liang:
   Writing – review & editing, Visualization, Data curation. Guoshu Bi:
   Writing – review & editing, Validation, Methodology. Ye Cheng:
   Visualization, Software. Junjie Xi: Supervision, Project
   administration. Zhencong Chen: Supervision, Project administration,
   Funding acquisition, Conceptualization. Miao Lin: Supervision, Project
   administration.

Declaration of competing interest

   The authors declare the following financial interests/personal
   relationships which may be considered as potential competing
   interests:Zhencong Chen reports financial support was provided by
   Shanghai Sailing Program (NO. 22YF1407500). If there are other authors,
   they declare that they have no known competing financial interests or
   personal relationships that could have appeared to influence the work
   reported in this paper.

Acknowledgments