Graphical abstract
[Figure: graphical abstract, file fx1.jpg]

Highlights
• Single-cell analysis uncovers immune ecosystems in early-stage LUAD
• Immune detection of GGO-associated neoantigens in lung cancer
• Activated CXCL13^+CD8^+ T cells recognize GGO-derived neoantigens at the early stage
• Immune modulation of lymphatic endothelial cells conditions LUAD progression

The immune microenvironment in early-stage lung cancer remains elusive. Deng et al. unveil six immune ecotypes in early-stage lung adenocarcinoma, showing early GGO neoantigen detection by CD8^+ T cells versus solid nodule suppression, highlighting paths for early therapeutic interventions.

Introduction
Lung adenocarcinoma (LUAD), the predominant subtype of lung cancer, has a meager 5-year survival rate of 50% for patients in advanced stages despite contemporary first-line therapies.[90]^1 The advent of low-dose computed tomography (CT) has dramatically improved survival rates by enabling early detection, particularly in the form of ground-glass opacity (GGO). Indeed, a retrospective study of stage I LUAD patients who underwent surgery underlines the significance of early detection through low-dose CT, revealing a 5-year overall survival rate of 94.9%.[91]^2 GGOs, evident on chest CT, manifest as hazy opacity that does not obscure the underlying bronchi and pulmonary vascular structures.[92]^3 While GGOs typically exhibit low malignant potential, they can persist for years,[93]^4 with nodular GGOs suggesting early-stage malignancy if they enlarge or develop a solid component over time.[94]^5^,[95]^6 In contrast, some patients initially present with a completely solid pattern, indicating distinct routes of malignant progression.
Whether the manifestation of early development of lung malignancy can originate from different patterns and the underlying biological mechanisms that would be involved if this were true remain unclear. The histological examination of early-stage LUAD with GGO features reveals thickened alveolar walls with atypical cuboidal pneumocytes, observed in preinvasive lesions such as atypical adenomatous hyperplasia, adenocarcinoma in situ, minimally invasive adenocarcinoma, or lepidic-predominant invasive adenocarcinoma (LPA),[96]^7 showing almost 100% 5-year disease-free survival after surgery.[97]^8 Notably, GGOs associated with LPA pose a higher risk of metastasis and disease progression.[98]^9 These histological features of LUAD do not consistently correlate with mutational profiling, complicating our understanding of GGO progression. EGFR mutations are more frequent in smaller peripheral adenocarcinoma with a diameter less than 3 cm and GGO ratio over 50%.[99]^10 However, no significant correlation exists for KRAS mutations or ALK rearrangements.[100]^11^,[101]^12^,[102]^13 As a result, researchers have focused on understanding the biological characteristics of the microenvironment within GGOs. Recently, the advent of single-cell RNA sequencing (scRNA-seq) has enabled several studies on the early stages of lung malignancies.[103]^14^,[104]^15 For example, one study revealed metabolic reprogramming exhibited by malignant cells in subsolid GGO lesions and dominant enrichment of natural killer T (NKT) cells in the subsolid GGO microenvironment.[105]^14 Another study identified an early-stage lung cancer activation module (LCAM) by using scRNA-seq data of 35 LUAD lesions.[106]^15 LCAM-high lesions contain high abundance of PD1^+CXCL13^+ T cells, IgG^+ plasma cells, and SPP1^+ macrophages and respond better to anti-PD1 immunotherapy independent of the tumor mutation burden (TMB). 
Although such studies provided extensive insights into clinically relevant cellular heterogeneity associated with early-stage LUAD, we are still far from understanding the cooperation among subpopulations of cells within the malignancies. Growing evidence shows that malignant cells and their neighbors behave as communities, and increasing attention is now being directed toward the behavior of diverse cell types as an ecosystem.[107]^16 Here, we integrated EcoTyper, a machine learning framework, to delineate a high-resolution atlas of multicellular ecotypes depicting the progression of LUAD from GGO toward advanced stages. We generated and analyzed both scRNA-seq and whole-exome sequencing (WES) data from 55 patient samples and identified 42 cell states consisting of malignant cells and other associated cell types within the GGO microenvironment. We uncovered six multicellular ecotypes that extend beyond previous knowledge of GGO-associated malignancies, in which early detection of tumor-associated neoantigens represented a main barrier to malignant progression. We further showed that computationally predicted EGFR-mutant-related neoantigens could be detected in an additional cohort by using the human leukocyte antigen (HLA) immunopeptidome and that the corresponding peptides could potentially induce T cell-specific reactivity. These six ecotypes thus may participate in the progression of early-stage LUAD in a cross-community-dependent manner and have significant potential for therapeutic interventions.

Results

Discovery of cell states and multicellular ecotype portraits of GGO and advanced LUAD
To reveal the multicellular ecotypes and distinct cell states present within both early-stage GGO lesions and advanced LUADs, we utilized EcoTyper, a unified chassis for scRNA-seq datasets.[108]^17 Our analysis was based on scRNA-seq data obtained from 55 surgical biopsies of patients ([109]Figure 1A; [110]Table S1).

Figure 1.
Characterization of GGO and solid lung adenocarcinoma cell states and LMEs
(A) Schematic depicting the analytical framework and stratifications of patients based on the radiological patterns. Yellow dashed lines highlight the malignant lesions. Scale bars, 1 cm (identical for all the CT images).
(B) Overview of clinical characteristics of primary untreated individuals with lung adenocarcinoma.
(C) The t-SNE (t-distributed stochastic neighbor embedding) by major cell partitions (left), tissue types (middle), or specimens from different patients (right).
(D) Lung multicellular ecotypes (LMEs) detected in untreated individuals with lung adenocarcinoma. Top: LME compositions are depicted as network diagrams. The width of each edge represents the Jaccard index across tumor samples. Bottom: cell state compositions in the corresponding LMEs.
(E) Characteristics of LMEs in the discovery cohort. Top: proportions of major cell types (averaged and scaled). Bottom: mutation frequencies of driver genes.
(F) The fraction of LMEs in different radiological clinical stages.
Endo, endothelium; Epi, epithelium; Fib, fibroblast; NK, natural killer cell; DC, dendritic cell; PMN, polymorphonuclear cell (mainly neutrophil in our datasets); Mo/Mφ, monocyte and macrophage.

We conducted droplet-based scRNA-seq using 10X Genomics on a total of 35 GGO samples from untreated patients and 14 solid nodule LUAD samples from treatment-naive patients ([113]Figure S1A; [114]Table S1). The GGO samples were categorized into six different patterns based on their radiological features, including pure GGO (pGGO, n = 6, Hounsfield unit [HU] < −500), high-density GGO (dGGO, n = 6, with a density measurement HU between −500 and −300) ([115]Figures 1A and [116]S1B), GGO with 25% solid component (GGO25, n = 4), GGO with 50% solid component (GGO50, n = 6), GGO with 75% solid component (GGO75, n = 7), and GGO with 100% solid component (GGO100, n = 6).
The solid nodule subgroup was further divided based on size and lymph node metastasis, including 1–2 cm solid nodule (Solid1, n = 4), 3–5 cm solid nodule (Solid3, n = 3), and solid nodule with regional lymph node metastases (SolidN, n = 7). Among these samples, 47 were also subjected to WES analysis ([117]Figure 1B). This classification is clinically relevant and covers most of the radiological heterogeneity observed in outpatients ([118]Figure 1B). We also included 6 normal lung tissue samples from patients with benign lung diseases (pulmonary bullae or sequestration) to ensure the specificity of our analyses ([119]Table S1). We pooled all cells together, performed batch correction ([120]Figure S1C), and corrected for read depth and mitochondrial read counts ([121]Table S2). Our scRNA-seq analysis yielded 33,336 unique transcripts from a total of 462,784 single cells ([122]Figure 1C). Of these, 294,507 cells (63.64%) were from GGO samples, 127,204 cells (27.49%) were from solid nodules, and the remaining 41,073 cells (8.88%) were from the normal samples ([123]Figures 1C and [124]S1D). Cells were then subjected to unsupervised clustering using Seurat,[125]^18 which led to the identification of 11 major cell subtypes, including epithelial cells, endothelial cells, fibroblasts, B cells, CD4^+ lymphocytes, CD8^+ lymphocytes, natural killer (NK) cells, dendritic cells (DCs), mast cells, polymorphonuclear leukocytes (PMNs), and monocytes/macrophages ([126]Figures 1C and [127]S1D; [128]Table S2). There were no significant differences in cell subtype distributions among radiologically classified samples, although a trend toward a smaller myeloid cell compartment was observed in advanced solid nodules ([129]Figures S1E and S1F). To elucidate the cell states of each cell type in patient-derived samples, we applied non-negative matrix factorization to the gene expression profiles of the 11 major cell types.
This allowed us to reconstruct a weighted combination of discrete transcriptional matrices that represent specific cell states within each cell type (see [130]STAR Methods). We chose the number of cell states per cell type to maximize sensitivity of cell state discovery and ensure the stability of clustering results ([131]Figure S2A). Using this approach, we identified 42 distinct cellular states among different GGOs and advanced solid nodular samples ([132]Table S2). These included (1) three B cell states; (2) five CD4^+ lymphocyte states, which covered a TIGIT-expressing regulatory T cell state, a naive CD4 T cell state, a CXCL13-expressing CD4 T cell state, and a helper T cell state; (3) four CD8^+ lymphocyte states, including a GZMB^+ effector CD8 T cell state, a PD1^+CXCL13^+ exhausted CD8^+ T cell state, and a transitional CD8 T cell state; (4) two major DC states, including a CCR7-expressing DC and a CD1A^+ conventional DC state; (5) five endothelial cell states, including a tumor-educated endothelial cell state, a tip-like endothelial cell state, a stalk-like endothelial cell state, a scavenging endothelial cell state, a lymphatic endothelial cell state, an artery endothelial cell state, and an alveolar type I endothelial cell state; (6) three epithelial cell states representing two alveolar-derived adenocarcinoma cell states; (7) four fibroblast cell states, including pericytes, collagen-expressing fibroblast cell states, and myofibroblasts; (8) five monocyte/macrophage cell states, including a SPP1^+ macrophage cell state, tissue-resident macrophages, monocytes, early-state macrophages, and DC-like monocytes; (9) four NK cell states, including an NKT cell state and a cytotoxic XCL1^+ NK cell state[133]^19; (10) four mast cell states; and (11) three major PMN cell states ([134]Figure S2B). Ecotype-determined cell states were also validated in an independent in-house cohort comprising 76 samples sequenced by bulk RNA sequencing (RNA-seq) ([135]Figure S2C).
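The cell-state discovery step described above can be illustrated with a small sketch. This is not the EcoTyper implementation: it swaps in the textbook Frobenius-norm multiplicative-update form of non-negative matrix factorization, written in plain Python, and runs it on a made-up 4-gene by 6-cell matrix. The function names (`nmf`, `assign_states`), the toy data, and the rule of assigning each cell to the state with the largest coefficient are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Sum of squared differences between V and the reconstruction W x H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, n_states, n_iter=300, seed=0, eps=1e-9):
    """Factor a non-negative genes x cells matrix V into W (genes x states)
    and H (states x cells) using multiplicative updates."""
    rng = random.Random(seed)
    n_genes, n_cells = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_states)] for _ in range(n_genes)]
    h = [[rng.random() + eps for _ in range(n_cells)] for _ in range(n_states)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cells)]
             for k in range(n_states)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_states)]
             for i in range(n_genes)]
    return w, h


def assign_states(h):
    """Assign each cell (a column of H) to the state with the largest weight.
    This assignment rule is an assumption of the sketch."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


# Hypothetical toy matrix: genes 0-1 are expressed in cells 0-2,
# genes 2-3 are expressed in cells 3-5.
toy = [
    [5, 4, 6, 0, 0, 0],
    [4, 5, 5, 0, 0, 0],
    [0, 0, 0, 6, 5, 4],
    [0, 0, 0, 5, 6, 5],
]
w, h = nmf(toy, n_states=2)
states = assign_states(h)
print(states, round(squared_error(toy, w, h), 3))
```

In this sketch the columns of `w` play the role of state-specific expression programs and the rows of `h` give each cell's weight on each state. Choosing `n_states` is the step the text describes as balancing sensitivity against clustering stability; the sketch fixes it at 2 only because the toy matrix was built with two blocks.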
We then clustered these cell states into communities by maximizing the co-association patterns,[136]^17 revealing six lung multicellular ecotypes (LMEs) ([137]Figure 1D). The LMEs varied considerably in their numbers of elemental cell states. For example, LME03 contained the most heterogeneous cell states among all the ecotypes, in which epithelial cells harbored mostly EGFR mutations, histone methyltransferase KMT2D mutations, and tumor suppressor TP53 mutations ([138]Figure 1E). We observed a significant increase in LME01, LME02, and LME04 and a decrease in LME03 and LME05 accompanying the progression from normal lung tissue to advanced solid nodules ([139]Figure 1F); similar trends in LME changes were also observed in our bulk RNA-seq dataset ([140]Figures S2D and S2E). Overall, our data revealed extensive multicellular ecotypes in GGO and advanced solid nodules, which may indicate different stages of LUAD development.

Malignant cell states in GGO and advanced LUAD
GGOs can remain dormant for decades but eventually progress toward advanced solid nodules as the proportion of the solid part increases.[141]^20^,[142]^21 However, in some cases, patients are diagnosed with a pure solid nodule without any preceding GGO pattern. The differential molecular features of these radiological patterns and their respective cellular progression routes remain unknown. To gain insight into their underlying molecular characteristics, we first focused on the malignant cell population in our patient cohort. We identified malignant cells by inferring large-scale copy-number variations (CNVs) using their normal immune, endothelial, and fibroblast cells as references ([143]Figures 2A and [144]S3A). To