Abstract

   Lung adenocarcinoma featured as mixed ground-glass opacity (mGGO)
   doubled its volume half of the time in comparison with that featured as
   pure ground-glass opacity (pGGO). The mechanisms underlying the
   heterogeneous appearance of mGGO remain elusive. In this study, we
   macro-dissected the solid (S) components and ground-glass (GG)
   components of mGGO and performed single-cell sequencing analyses of six
   paired components from three mGGO patients. A total of 19,391
   single-cell profiles were taken into analysis, and the data of each
   patient were analyzed independently to obtain a common alteration.
   Cancer cells and macrophages were the dominant cell types in the S and
   GG components, respectively. Cancer cells in the S components, which
   showed relatively malignant phenotypes, were likely to originate from
   both the GG and S components and monitor the surrounding tumor
   microenvironment (TME) through an intricate cell interaction network.
   SPP1 ^hi macrophages were enriched in the S components and showed
   increased activity of chemoattraction, while macrophages in the GG
   components displayed an active antimicrobial process with a higher
   stress-induced state. In addition, the CD47–SIRPA axis was demonstrated
   to be critical in the maintenance of the GG components. Taken together,
   our study unraveled the alterations of cell components and
   transcriptomic features between different components in mGGOs.

   Keywords: ground-glass opacity, non-small cell lung cancer, intra-tumor
   heterogeneity, single-cell RNA sequencing, macrophage

Introduction

   For early-stage lung adenocarcinoma (LUAD), one of the radiological
   features could be pulmonary ground-glass (GG) opacities (GGOs), which
   are usually managed clinically based on their sizes, locations, growth
   rates, and percentages of the solid (S) components ([41]1–[42]3). Based
   on whether existing the S components, GGOs are usually classified as
   pure GGOs (pGGOs), which are entirely composed of the GG components,
   and mixed GGOs (mGGOs), containing both the GG and S components.
   According to epidemiological investigations, it is now generally
   accepted that lung cancers featured as mGGOs imply a higher likelihood
   of disease progression in comparison with those featured as pGGOs
   ([43]4, [44]5). Several clinical trials have shown that the increasing
   ratio of the S components leads to poor prognosis ([45]6, [46]7).
   Furthermore, a recent study found limited response of GGO to the
   PD-1/PD-L1 therapy, suggesting different immune microenvironments
   between GGO and advanced lung cancer ([47]8). Therefore, the mechanism
   underlying the specific features of GGO should be studied, and novel
   strategies to manage the disease should be proposed.

   Radiological features of mGGOs have been largely investigated, as well
   as common genetic alterations of GGOs were also characterized using
   bulk sequencing. The most frequent mutation identified in GGOs in Asia
   is EGFR mutations, while mutations in common lung cancer driver genes
   such as TP53, ALK, and KRAS are only sporadically found in GGOs
   ([48]9–[49]13). This indicates that different molecular features could
   be found in GGOs. However, limited studies have been performed in
   understanding the development of GGOs, especially interpreted distinct
   growth patterns among the S and GG components within mGGOs. One of the
   reasons is that malignant cells in GGOs are hard to investigate by
   using traditional bulk sequencing techniques due to their limited
   fractions in the lesions. Furthermore, the crosstalk between malignant
   cells and immune cells is also essential in the progression of
   neoplasia ([50]14). To decipher intricate single-cell profiles,
   single-cell RNA sequencing (scRNA-seq) was introduced.

   In recent years, the number of scRNA-seq research on GGOs has exploded.
   Lu et al. found higher heterogeneity in clonal architectures of solid
   LUAD tumors compared to pGGO by scRNA-seq ([51]15). Another study also
   demonstrated the differences among solid LUAD tumors, mGGOs, and normal
   lung tissues and depicted the distribution of the various cell types in
   mGGOs ([52]16). These prior studies have provided valuable information
   to outline the overall molecular characteristics of GGOs. However, the
   molecular-level differences between the S and GG components, and
   whether there is a developmental relationship between the two types of
   components remain unclear.

   From real-world clinical observations, the different proportions
   between the S and GG components, or the consolidation-to-tumor ratio
   (CTR) in mGGOs, implied distinct fate of disease progression ([53]17,
   [54]18). From a pathological point of view, the S portion of mGGOs
   exhibits infiltrative characteristics and thus is considered more
   malignant than the GG components. However, the factors driving these
   infiltrative characteristics are currently not well defined. In this
   study, we focused on the intra-nodule heterogeneity of mGGO nodules and
   attempted to identify the key molecular mechanisms underlying the
   transformation from early pre-infiltrative lesions to minimally
   invasive adenocarcinoma (MIA) or invasive adenocarcinoma. In this
   regard, we performed scRNA-seq on paired GG and S component tissues
   from three patients with pulmonary mGGO nodules of similar sizes in
   which the GG and S portions were clearly demarcated and investigated
   potential mechanisms underlying the progression of mGGOs.

   In this study, we depicted distinct cell atlas and transcriptomic
   features of different components in mGGOs. The dominant cell types
   within the lesions, cancer cells, and macrophages were extensively
   discussed. Coupled non-negative matrix factorization (cNMF) was used to
   decipher common and unique changes in cancer cells and macrophages in
   different components. Furthermore, the evolution of cancer cells was
   analyzed using two different ways of trajectory analysis. Furthermore,
   we defined 4 component-enriched macrophage subsets from 12 macrophage
   subclusters in mGGO samples. Finally, alterations of comprehensive cell
   networks were described, and region-specific intercellular interactions
   were explored. To eliminate inter-patient heterogeneity, most of the
   results were obtained after intersecting data from separate analyses in
   a patient-independent way. Taken together, our study explored a new
   strategy to investigate human tumor samples and provided a
   comprehensive transcriptomic overview of the GG and S components of
   mGGOs for the first time.

Material and Methods

Patients and Tissue Samples

   Surgical tissue samples were collected from three patients who
   underwent surgical resections of mGGOs at the Second Xiangya Hospital
   of Central South Hospital from October 2019 to April 2020. The
   inclusion criteria were as follows: 1) the radiological manifestation
   was mGGO, 2) the solid area was relatively flat and clear demarcated
   from the GG area under high-resolution CT (HRCT), 3) patients were
   disease free except for lung nodules, 4) patients had not received any
   antitumor therapy before the surgery, and 5) pathological examination
   suggested invasive LUAD. Detailed clinical information is shown in
   [55]Supplementary Table 1 . This study was approved by the Ethics
   Committee Board of the Second Xiangya Hospital of Central South
   Hospital (2020084). Written informed consent was obtained from all
   participants included in this study.

Tissue Dissociation and Single-Cell RNA Sequencing Sample Preparation

   Paired patient tissue samples of the S and GG components from the same
   mGGOs were collected during surgeries. An incision along the largest
   cross-section of the mGGO nodule was performed, and naked-eye
   identification and CT were used to map the focal areas of the S and GG
   components. Dissected areas were identified by at least two first-line
   thoracic surgeons based on CT images and tissue appearance. The GG
   areas usually appeared grayish white, while the solid areas usually
   appear dark gray or waxy white, with visible blood vessels and fine
   bronchial perforations. To avoid mixtures between different focal
   areas, the GG components distal to the nodule were taken, and it was
   ensured that the dissected solid focal areas were smaller than their
   actual sizes.

   The samples were stored at 4°C in the GEXSCOPE™ Tissue Preservation
   Solution (Singleron Biotechnologies, Nanjing, China) and processed on
   ice within 72 h. After samples were washed 3 times with Hanks’ Balanced
   Salt Solution (HBSS) and minced into 1–2-mm-thick pieces, 2 ml of
   GEXSCOPE™ Tissue Dissociation Solution (Singleron Biotechnologies,
   Nanjing, China) was added to each sample and allowed for digestion at
   37°C for 15 min to dissociate the tissue into single cells. Then, the
   samples were filtrated using 40-μm sterile strainers, followed by
   centrifugation at 1,000 rpm for 5 min. After the centrifugation, the
   precipitation was resuspended in 1 ml of phosphate-buffered saline
   (PBS; HyClone, Marlborough, MA, USA). Subsequently, 2 ml of GEXSCOPE™
   Red Blood Cell Lysis Buffer (Singleron Biotechnologies, Nanjing, China)
   was added to the cell suspension and incubated at 25°C for 10 min to
   remove red blood cells. After that, the mixture was centrifuged at 500
   × g for 5 min, and the precipitation was resuspended in PBS. Finally,
   trypan blue (Millipore Sigma, Merck KGaA, Darmstadt, Germany) was used
   to stain the samples, and the cell viability was evaluated under a
   phase-contrast light microscope (Nikon, Tokyo, Japan).

Tissue Density Calculation

   Tissue density was obtained by the equation below after calculating the
   wet weight of dissociated tissue (including samples excluded due to
   non-compliance with the requirements such as pathological typing):

   Tissue density = tissue weight (mg)/[number of dissociated cells/10
   ([56]4)]

   This indicated the weight of tissue occupied per 10,000 cells.

Single-Cell RNA Sequencing and the Primary Analysis of Sequencing Data

   GEXSCOPE™ Single Cell RNA Library Kit Tissue (Singleron
   Biotechnologies, Nanjing, China) was used to barcode single cells,
   capture mRNA from isolated single cells, and generate cDNA libraries
   for scRNA-seq. Then, individual libraries were diluted to 4 ng/μl and
   pooled for sequencing on a HiSeq X platform (Illumina, San Diego, CA,
   USA) with 150-bp paired-end reads.

   Low-quality reads and adaptor sequences were first removed with fastQC
   and fastp to generate clean reads ([57]19). Subsequently, clean reads
   were mapped to the reference genome GRCh38 (Ensembl version 92 gene
   annotation) with STAR ([58]20). After that, expression matrix files
   were generated based on gene counts and unique molecular identifier
   (UMI) counts that were acquired by featureCounts software ([59]21).

Quality Control, Dimension Reduction, and Clustering of the Single-Cell RNA
Sequencing Data

   Raw reads from each patient were processed to generate gene expression
   profiles using a celescope1.3.0 pipeline. Briefly, after filtering read
   1 without poly T tails, valid cell barcode and UMI were extracted.
   Adapters and poly A tails were trimmed (fastp V1) before aligning read
   2 to GRCh38 with ensemble version 92 gene annotation (fastp 2.5.3a and
   featureCounts 1.6.2) ([60]21). Reads with the same cell barcode, UMI,
   and gene were grouped together to calculate the number of UMIs per gene
   per cell. The UMI count tables of each cellular barcode were used for
   further analyses. Before further analyses, cells were filtered by UMI
   counts, gene counts, and the mitochondrial content ratio. Only cells
   with UMI counts below 30,000, gene counts between 200 and 5,000, and
   the mitochondrial content below or equal to 50% were retained. After
   filtering dimension reduction and clustering were applied using Seurat
   v3.1.2 ([61]22). Specifically, NormalizeData and ScaleData functions
   were utilized for the normalization and scaling of all gene
   expressions. Then, the expression matrix of each patient was
   integrated, principal component analysis (PCA) was performed with the
   top 2,000 variable genes that were selected by FindVariableFeautres(),
   and the top 20 principal components were used to separate cells into
   multiple clusters with FindClusters(). After that, the Uniform Manifold
   Approximation and Projection (UMAP) algorithm was applied to visualize
   cells in a two-dimensional space.

Differentially Expressed Gene Analysis

   To identify differentially expressed genes (DEGs) of each cluster,
   Seurat v3.1.2 FindMarkers() was utilized to select genes based on
   Wilcoxon likelihood-ratio test with default parameters. DEGs were
   defined as genes that are expressed in more than 10% of the cells in a
   cluster and with an average log[2](Fold Change) of greater than 0.25.

Cell Type Annotation

   The cell type of each cluster was annotated based on the expression of
   canonical markers found in the SynEcoSys database (Singleron
   Biotechnology). Seurat v3.1.2 DoHeatmap(), DotPlot(), and Vlnplot()
   were used to generate heatmaps, dot plots, and violin plots to display
   the expression of the markers used to identify different cell types,
   respectively.

Single-Cell RNA Sequencing-Based Copy Number Alteration Detection

   The copy number alterations (CNAs) in cancer cells were detected with
   InferCNV package ([62]23), with non-malignant immune cells as
   baselines. Genes expressed in more than 3 cells were sorted based on
   their loci on each chromosome. The ceiling of the relative expression
   values was set as 1.5 SDs from the residual-normalized expression
   values, and the relative expression values were centered at 1. The
   relative expression on each chromosome was smoothened using a slide
   window size of 101 genes to remove the effect of gene-specific
   expression.

Pathway Enrichment Analysis

   To investigate the potential functions of DEGs, the “clusterProfiler” R
   package version ([63]24) was used to perform Gene Ontology (GO), Kyoto
   Encyclopedia of Genes and Genomes (KEGG), and Reactome analysis.
   Adjusted p (p_adj) values of less than 0.05 were used to define
   significantly enriched pathways. Referred GO gene sets included
   molecular function (MF), biological process (BP), and cellular
   component (CC) categories. Gene set enrichment analysis (GSEA) was also
   performed in the macrophage subclusters Mac3 and Mac4.

Coupled Non-Negative Matrix Factorization Analysis

   With the use of the cNMF algorithm, the genes of targeted cell types
   were first filtered, and the number of meta-programs was confirmed
   according to the statistical stability and the prediction error rate.
   Then, the meta-programs were extracted, and the score of each program
   was calculated for each cell.

   Also, the Jaccard similarity coefficient was used to compare the
   transcriptional similarity between the meta-programs and particular
   cell types. The Jaccard similarity coefficient was calculated using the
   top 100 marker genes of each cell type and the top 100 genes of each
   meta-program.

Trajectory Analysis

   To map the differentiation/conversion of particular cell types,
   pseudotime trajectory analysis was performed with Monocle2 ([64]25). To
   construct the trajectory, Seurat v3.1.2 FindVairableFeatures() was used
   to select highly variable genes from clusters, and dimension reduction
   was performed with DDRTree(). Finally, the trajectory was visualized by
   plot_cell_trajectory().

RNA Velocity

   For the RNA velocity analysis, the BAM files containing cancer cells
   from each patient were analyzed with the velocyto ([65]26) and the
   scVelo ([66]27) in python with default parameters. The reference genome
   used was GRCh38 (Ensembl version 92 gene annotation). To ensure
   visualization consistency, “runUMAP” function of Seurat was applied to
   plot the corresponding cell populations.

Single-Cell Entropy Analysis

   To evaluate the stemness of cells, the entropy of gene expression was
   calculated based on single-cell expression profiles with SLICE (version
   0.99.0) ([67]28). ERCC spike-ins and ribosomal genes were removed, and
   SLICE object was created to perform the bootstrap calculation of
   single-cell gene entropy values using the getEntropy() function.

Violin Plots of Differentially Expressed Genes in Focus Subclusters of Cancer
Cells

   Violin plots were utilized to demonstrate the enhanced expression of
   genes in specific subpopulations after the re-clustering of cancer
   cells. The focus subcluster was defined as the dominant subcluster with
   the highest proportion in the cancer cells of each patient, which
   included the C3 subcluster for the P01 cancer cells, the C6 subcluster
   for the P02 cancer cells, and the C4 subcluster for the P03 cancer
   cells. The gene expressions in the focus subcluster and other
   subclusters were visualized in violin plots using VlnPlot() function.

Cell–Cell Interaction Analysis

   To analyze the cell–cell interaction, the CellPhoneDB ([68]29) was
   utilized based on known receptor–ligand pairs. To calculate the null
   distribution of average ligand–receptor expression levels in the
   interacting clusters, the cluster labels of all cells were randomly
   permuted 1,000 times. The threshold of cells expressing within each
   cluster to 0.1 was then set, and the significant interaction pairs
   whose p-value <0.05 was visualized with the plot function embedded in
   the CellphoneDB.

Immunofluorescence

   All tissues were fixed in 4% formaldehyde, then embedded in paraffin,
   and cut into sections of 5-µm thickness for staining with H&E and
   immunostaining. After being deparaffinized and rehydrated and antigen
   was retrieved and blocked by standard techniques, sections were then
   incubated overnight at 4°C with primary antibodies at the following
   dilutions: CD68 (1:100, Servicebio, Gent, Belgium), NAPSA (1:100,
   ABclonal, Woburn, MA, USA), CD206 (1:3,000, Servicebio), G3BP2 (1:100,
   Affinity Biosciences, Cincinnati, OH, USA), SIRPA (1:50, SAB, Nanjing,
   China), HLA-F (1:8000, ProteinTech, Chicago, IL, USA), CD47 (1:100,
   HUABIO, Woburn, MA, USA), CD44 (1:100, Bioss, Woburn, MA, USA), and
   SPP1 (1:100, HUABIO).

Results

Distinct Pathological Features of Ground-Glass and Solid Components in Same
Mixed Ground-Glass Opacities

   We enrolled five non-smoking mGGO patients with clear GG/S margin. By
   synchronizing the identification of CT and tissue section appearance,
   different components of mGGO were able to be distinguished among three
   of them, followed by scRNA-seq ([69] Supplementary Figure 1A and
   [70]Supplementary Table 1 ).

   The images of the mGGOs and their pathological sections revealed
   different features in the GG and S component areas. The GG components
   are focal nodular areas of alveolar epithelial cells in a lepidic
   growth pattern, thickened alveolar walls with an enlarged
   air-containing cavity, and infiltrated immune cells ([71] Supplementary
   Figure 1B ). The S components could be relatively consistently
   characterized by the absence of organized fibrous structures and
   obvious alveolar tissues and presented large areas of disorganized
   fibrous necrosis, third lymph nodes, tumor cell infiltration, and
   malignant gland tissue ([72] Supplementary Figure 1C ). Tissues were
   dissected and then digested into single-cell solution, and then single
   cells were counted. Tissue density was continuous in GG regions,
   whereas S areas were distinct ([73] Supplementary Figure 1D ).

A Single-Cell Landscape of Paired Ground-Glass and Solid Components From
Mixed Ground-Glass Opacities

   To characterize the cellular dynamics in mGGO components, a total of
   19,391 isolated single cells (8,635 from the GG components and 10,392
   from the S components) were obtained from three patients diagnosed with
   Invasive Adenocarcinoma (IAC) and were subjected to UMAP clustering
   analysis ([74] Figure 1A ). All cells were classified into nine major
   cell types ([75] Figure 1B ). Based on the expression of
   cell-type-annotation markers from the SynEcoSys database, we found that
   the cells are comprised mainly of clusters of cancer cells (GG
   components, 975; S components, 4,629), macrophages (GG components,
   5,888; S components, 3,705), dendritic cells (DCs) (GG components, 487;
   S components, 602), T cells (GG components, 115; S components, 363), B
   cells (GG components, 50; S components, 3), plasma cells (GG
   components, 116; S components, 240), mast cells (GG components, 68; S
   components, 124), club cells (GG components, 257; S components, 27),
   and ciliated cells (GG components, 134; S components, 51) ([76]
   Figures 1C, D ). Moreover, we identified a cluster annotated as the
   proliferating myeloid cells, which were composed of macrophages (GG
   components, 327; S components, 450) and DCs (GG components, 51; S
   components, 102) in proliferation, and the proportions of the S
   components were slightly higher than those of GG areas ([77] Figure 1C
   and [78]Supplementary Figure 2A ).

Figure 1.

   [79]Figure 1
   [80]Open in a new tab

   Sampling, sequencing, and clustering from mGGO. (A) Sampling and
   sequencing workflow. (B) Cell atlas of mGGO shown by UMAP; each dot
   corresponds to a single cell, and clusters are labeled with names and
   colored. (C) Differences of cluster proportion between S component and
   GG component of mGGO; chi-squared test was used for analysis. (D)
   Bubble plot shows mean expression level of marker genes (x-axis) for
   each cell type (y-axis); dot size, proportion of cells expressed the
   gene; dot color, average expression level among cells. mGGO, mixed
   ground-glass opacity; UMAP, Uniform Manifold Approximation and
   Projection; S, solid; GG, ground glass. * P<0.05, ** P<0.01, ****
   P<0.0001, ns non-significance.

   The most abundant cells of mGGOs were observed to be macrophages and
   cancer cells, occupying nearly 50% of the cells in the GG and S
   component samples, respectively ([81] Figure 1C and [82]Supplementary
   Figure 2B ), and this was further confirmed by immunofluorescence
   staining ([83] Supplementary Figure 2C ). Moreover, we confirmed that
   DCs were the third major cell type, dominated by conventional DCs
   (cDCs). No significant differences were observed in the numbers of cDC
   subtypes between the S and GG components. The proportion of lymphocytes
   was about 5% in all samples, dominated by CD8+ T cells and plasma
   cells, indicating activated adaptive immune responses ([84]
   Supplementary Figure 2D ). In addition, we found discrepancies in cell
   composition between the GG and S component samples. In the S
   components, the epithelial cell types were largely composed of cancer
   cells, while a certain amount of normal club cells and ciliated cells
   were retained in the GG components, which were consistent with the
   previous report ([85]15). These cellular compositions demonstrated both
   the common and distinct characteristics between the GG and S
   components.

   Cancer cells and macrophages were the dominating cell types
   constituting the mGGOs, and we hypothesized that the activities of
   these two cell types were key to unraveling the mechanisms underlying
   the development of mGGOs.

Cancer Cells From the Solid Components and Ground-Glass Components Exhibit
Different Transcriptional Features

   Cancer cells were identified with canonical markers of LUAD and
   alveolar epithelial cells (e.g., SFTPB, SFTPC, EPCAM, NAPSA, and
   NKX2-1). A total of 4,905 cancer cells (4,040 originated from the S
   component samples and 865 initiated from the GG component samples) were
   clustered into three distinct subsets, each of which consisted almost
   exclusively of cancer cells with single-patient origin. Within each
   subgroup, there was a more pronounced difference between cancer cells
   of the S component versus GG origin ([86] Figure 2A ). This suggests a
   predominance of tumor heterogeneity between patients, while cancer
   cells from different radiological components exhibit milder
   intratumoral heterogeneity. Cancer cells in the S components
   upregulated SOX4 and CEACAM5 and downregulated pro-inflammatory factors
   such as SCGB1A1 and SFTPC in comparison with cancer cells in the GG
   components ([87] Figure 2B ). The Reactome pathway enrichment analysis
   showed that the interleukin-related pathway, receptor tyrosine kinase
   (RTK) signaling, VEGFR2-mediated proliferation regulatory signals, and
   interferon alpha/beta signaling pathway were upregulated in cancer
   cells located in the S components. Moreover, the MAPK pathway, TNF
   signaling pathway, and apoptosis-related pathway were also enriched by
   KEGG pathway analysis ([88] Supplementary Figure 3A ). In comparison
   with cancer cells in the GG components, downregulated pathways include
   interferon gamma signaling pathway, surfactant metabolism, and
   lysosome-related pathway ([89] Supplementary Figure 3B ). The copy
   number variations (CNVs) were higher in cancer cells relative to immune
   cells ([90] Figure 2C ). Cancer cells were classified into 8 sub-clones
   based on the CNVs. Each patient harbored unique sub-clones, while some
   sub-clones (clones 1, 2, 3, and 8) were shared by cancer cells in
   different components of one individual. To obtain transcriptional
   signatures of cancer cells from different samples, we performed the
   cNMF consensus clustering of RNA-seq data and identified seven
   meta-programs of cancer cells. The meta-program scores were
   heterogeneous in S component samples, while similar patterns among the
   GG components were observed, especially the meta-program 7 ([91]
   Figures 2D, E ). The meta-program 7 gene signature included genes
   encoding surfactant-associated proteins A1, A2, B, C, and D (SFTPA1,
   SFTPA2, SFTPB, SFTPC, and SFTPD), cell adhesion factors, lysosomal
   homeostasis-related genes, and factors regulating cell growth cycle or
   apoptosis. We found that meta-program 7 included genes of extracellular
   matrix receptor proteins, such as integrin superfamily members
   including ITGB6 and ITGA2, as well as CD44 and TNC ([92] Supplementary
   Table 2 ), indicating that it might influence tumor progression by
   mediating signaling of immune modulators and regulating bindings and
   adhesion to collagen fibers ([93]30, [94]31). In a nutshell, these data
   imply that the cancer cells are more heterogeneous with high expression
   levels of cancer biomarkers in the S component samples, and it might be
   flawed to find a uniform molecular signature for these cancer cells.
   The cancer cells in the S component samples were more heterogeneous and
   showed more obvious expressions of cancer biomarkers. The cancer cells
   in the GG component samples, however, highly expressed
   surfactant-associated proteins and exhibited stress-related features.

Figure 2.

   [95]Figure 2
   [96]Open in a new tab

   Distinct transcriptome signatures of cancer cells in different
   components. (A) The UMAP visualization of cancer cells: the left graph
   shows that they are clustered into three subclusters, and the right
   graph labels these cells by sample sources. (B) The volcano plot
   reveals differentially expressed genes in S component and GG component.
   Threshold has been set as 10^−5 for p_val_adj and 0.75 for avg_logFC.
   (C) Hierarchical clustering of the chromosomal gene expression pattern
   separating cancer cells from non-malignant immune cell clusters. Each
   row represents a sample, and each column represents a single cell. (D)
   The UMAP visualization shows the expression level of gene sets within
   each meta-program clustered by the cNMF in cancer cell clusters. (E)
   The heatmap shows the expression level of gene sets within each
   meta-program (row) by sample (column). The red box marks GG
   component-specific meta-program. UMAP, Uniform Manifold Approximation
   and Projection; S, solid; GG, ground glass; cNMF, coupled non-negative
   matrix factorization.

Cancer Cells in the Solid and Ground-Glass Components Differ in Heterogeneity
and Stemness, But Both May Contribute to the Tumor Development

   To investigate the grade of cancer cells, we explored alveolar type I
   (AT1) and alveolar type II (AT2) cells with annotated markers,
   including PDPN, AGER, ABCA3, and SFTP gene families ([97]32, [98]33)
   ([99] Figure 3A ). The AT1 markers PDPN and AGER were sporadically
   expressed, indicating a low ratio of AT1-like cells. AT2 markers SFTPB
   and ABCA3 were simultaneously expressed in a large proportion of cancer
   cells. Most cancer cells in the GG components expressed SFTPA1 and
   SFTPA2, although cancer cells in the S components of different
   individuals had diverse distribution patterns. SFTPC is expressed in a
   higher proportion of cancer cells in the GG component states. To sum
   up, AT1-like cells were rarely found in mGGO samples. AT2-like cells
   varied among patients. Interestingly, we found that the majority of
   cancer cells from P01-GG expressed all AT2 markers, which strongly
   indicated their origin from AT2 cells.

Figure 3.

   [100]Figure 3
   [101]Open in a new tab

   Origin of cancer cells in solid part of mGGO. (A) The gene plot of
   published AT1 (AGER and PDPN) and AT2 (ABCA3, SFTPA1, SFTPA2, SFTPB,
   and SFTPC) markers expressed in cancer cells; red dots are cells that
   expressed these genes. (B) Subclustering of cancer cells by each
   patient; each color represents a subcluster of cancer cells, and the
   proportion of each subcluster is displayed on the bar graph on the
   right. (C) The violin plot reveals differential expression of
   surfactant family between ground-glass (GG) and solid (S) components
   growth privilege cancer subcluster (labeled as focus) and other
   clusters (labeled as other). Wilcoxon rank-sum test was used for
   analysis. (D) The UMAP visualization of cancer subclusters trajectory
   used monocle by sample. Each subcluster is plotted individually, and
   the red box marks the GG/S component growth privilege cancer
   subcluster. (E) The PAGA embedded with RNA velocity reveals the
   possible trajectory of cancer subclusters. Subclusters highlighted with
   darker node backgrounds are the assuming origin cluster based on RNA
   velocity. mGGO, mixed ground-glass opacity; UMAP, Uniform Manifold
   Approximation and Projection.

   Taking the consideration of CNVs where cancer cells between different
   components of mGGO shared similar structural genomic rearrangements
   within each patient, the homogenous genesis of the GG and S components
   was suggested. To explore whether there was a developmental
   relationship between the S and GG components, we clustered the cancer
   cells from individuals ([102] Figure 3B ). We discovered that cancer
   cells seemed less heterogeneous and that each patient possessed a
   dominant cancer cell subcluster (P01-C3, P02-C6, and P03-C4) in the GG
   component. In contrast, in the S component samples, cancer cells
   demonstrated a mixed composition of multiple subclusters without any
   dominant subpopulation, which was consistent with the pathological
   features of LUAD ([103]34).

   We then analyzed the DEGs of the dominant subclusters in the GG
   component samples. Similar to the overall transcriptional profile of
   cancer cells in the GG component samples, surfactant-associated
   proteins (SFTPA1, SFTPA2, SFTPB, SFTPC, and SFTPD) were highly
   expressed, and the expression level was higher than that of other
   cancer cell subclusters of the GG component samples ([104] Figure 3C ).

   To investigate the developmental relationship between cancer cells from
   different pathological areas of mGGOs, we conducted the trajectory
   analysis of cancer cells from each of the three patients. The dominant
   subclusters of cancer cells in the GG components were generally located
   at one end of the trajectory, while the distributions of cancer cells
   were dispersed in the S components ([105] Figure 3D ). This indicates
   that most of the cancer cells in the GG components were in a similar
   state of the putative evolutionary process, whereas cancer cells in the
   S component were scattered along the trajectory and showed branching
   evolutionary architectures. Furthermore, we also mapped the
   developmental trajectories with the RNA velocity analysis for cancer
   cells in each patient, so as to infer the cancer cell subcluster in the
   initial state ([106]27). The results showed that the initial state
   cancer cell subclusters in two of the three patients (P02 and P03) were
   the dominant subclusters of the GG components, while the dominant
   cancer cell subcluster (P01-C2) in the GG component of patient P01 was
   the second to the initial state ([107] Figure 3E ). Moreover,
   tumor-promoting genes such as HMGB3 and IGFBP2 were highly expressed in
   P01-C2, which implied a poorly differentiated subpopulation of
   malignant cancer cells ([108]35). In addition, the stemness of cancer
   cells in the S component samples was slightly higher than that in the
   GG components ([109] Figure S3C ). Taken together, our results suggest
   that the cancer cells could originate from either GG foci cancer cells
   or solid foci cancer cells.

Diversity of Macrophages Within Different Components in Mixed Ground-Glass
Opacity

   According to the expression of annotated marker genes LYZ, CD68, MRC1,
   and MARCO, we identified a total of 9,593 macrophages ([110] Figure 1D
   ). Further, we identified two macrophage types with canonical marker
   gene expression (IL-1B, IL-6, and TNF for M1 macrophages, and CD163,
   MRC1, and FN1 for M2 macrophages), and we confirmed that M2 macrophages
   were predominant in the mGGOs, and a small portion of cells expressed
   both M1 and M2 markers (referred as the M1–M2 type) ([111] Figures 4A,
   B ). We noticed that the proportion of M1_M2 macrophage was higher in
   the S components ([112] Figure 4C ).

Figure 4.

   [113]Figure 4
   [114]Open in a new tab

   Diversity of macrophages within different components in mGGO. (A) The
   UMAP of M1_M2 macrophages (blue) and M2 macrophages (red) distribution.
   (B) The UMAP shows expression patterns of M1 macrophage markers IL-1B,
   IL-6, and TNF and M2 macrophage markers CD163, FN1, and MRC1. (C)
   Proportions of M1_M2 macrophage (blue) and M2 macrophage (red) in
   different samples. (D) The UMAP visualized 12 macrophage subclusters.
   (E) The heatmap reveals distinct transcription patterns among
   macrophage subclusters; each row represents a gene, and each column
   represents a macrophage. (F) Proportions of 12 macrophage subclusters
   (x-axis) reveal distinct distributions between GG component (gray) and
   S component (red) in mGGO. (G) Proportions of macrophage subclusters in
   different samples. mGGO, mixed ground-glass opacity; UMAP, Uniform
   Manifold Approximation and Projection; GG, ground glass; S, solid.

   Graph-based clustering gave rise to 12 macrophage subclusters ([115]
   Figures 4D, E ). Mac1–Mac6 were the most abundant subclusters within
   the samples ([116] Figure 4F ). Among these clusters, Mac1-INHBA ^hi,
   which upregulated FABP and CD52, and Mac2-FOLR2 ^hi, with upregulation
   of CCL13, showed diverse distribution in the S/GG components among
   different individuals ([117] Figure 4F ). Mac3-SPP1 ^hi and Mac4-LYZ
   ^hi-SPN ^lo were enriched in the S and GG components, respectively.
   Mac6-IL-1B ^hi and Mac5 also showed similar but weaker distribution
   bias in different components ([118] Figure 4G ). Interestingly, the
   transcription features of Mac4 and Mac5 were similar. For example, they
   both upregulated LYZ and AZU1, suggesting their common roles in the
   antimicrobial process. However, Mac3-SPP1^hi was characterized by a
   high expression of CCL3, CCL4, CXCL2, and CXCL8, and so did Mac6-IL-1B
   ^hi, with upregulation of CXCL3, indicating an active function of
   chemoattraction and recruitment of inflammatory cells ([119]36).
   Macrophages with high expression of SPP1 were also found to be one of
   the common phenotypes in lung cancer ([120]37). Taken together, these
   results depict the distinct distribution of macrophage subclusters
   within different components of mGGO.

   We conducted the KEGG, GO, and Reactome pathway analyses on macrophage
   subclusters. No shared pathways were found between the Mac3 and Mac6
   group and the Mac4 and Mac5 group, while both groups upregulated a few
   pathways in common within-group ([121] Figure 5A ). IL-4, IL-10, and
   IL-13 signaling, along with significantly elevated tumor-associated
   transcription factor abnormalities, upregulation of IL-17 and Hif-1a
   pathways, and glucose metabolism pathway, was enriched in Mac3 ([122]
   Figures 5B, C and [123]Figure S4A ), which pointed to M2
   characteristics. Furthermore, AGE-RAGE and Toll-like receptor (TLR)
   signaling pathways were also upregulated, accompanied by the
   upregulation of viral oncogenic pathways ([124] Figure 5B ), which
   pointed to M1 characteristics. Such transitional transcriptional
   features with both M1 and M2 characteristics could also be found in
   another subcluster, Mac6, which was also enriched in the S component
   samples.

Figure 5.

   [125]Figure 5
   [126]Open in a new tab

   Pathway enrichment of S/GG component-dominant macrophage subclusters.
   (A) The Venn diagram shows overlapping significantly upregulated
   pathways in Mac3, Mac6, Mac4, and Mac5, (B, C) The bubble plot shows
   significantly enriched KEGG and Reactome pathways in Mac3, Mac6, Mac4,
   and Mac5. The colors of bubbles represent the values of significance,
   and the sizes represent the number of genes enriched in the pathway.
   (D) The GSEA using pre-ranked differentially expressed genes between
   Mac3 _S component versus Mac4 _GG component were compared with hallmark
   gene sets. The pre-ranked gene set was defined by function of the
   FindMarkers in Seurat. The enrichment of the oxidative phosphorylation
   pathway and glycolysis pathway is shown on the graph. S, solid; GG,
   ground glass; KEGG, Kyoto Encyclopedia of Genes and Genomes; GSEA, gene
   set enrichment analysis.

   Furthermore, although glucose metabolism was upregulated in Mac3, we
   found that the expression level of genes that related to oxidative
   phosphorylation was higher in Mac4 cells than Mac3 cells, which were
   mainly derived from the GG and S component samples, respectively ([127]
   Figure 5D ). The metabolic requirement of cell proliferation is
   different in normal cells compared to cancer cells ([128]38), and
   recent studies have found that in the tumor microenvironment (TME),
   myeloid cells have the highest glucose uptake rate ([129]39).
   Therefore, these data indicate a possible shift in the glucose uptake
   and utilization during the early-stage tumorigenesis of lung cancer.

   Taken together, our data depicted various ecosystems of macrophages in
   the mGGO lesions and distinct features of macrophages enriched in
   different components of mGGO.

The Macrophages in the Solid Component Samples Upregulated NF-κB-Related
Signaling

   Next, we performed the cNMF clustering analysis and identified 8
   meta-programs of macrophages ([130] Figures 6A, B ). Meta-program 7 had
   relatively high scores in Mac3 and Mac6. Interestingly, the
   meta-program 7 scores were higher in all M1–M2 state subclusters (Mac3,
   Mac6, Mac9, and Mac12). The meta-program 7 gene signature mainly
   consisted of heat shock protein family genes, GPCR signaling
   pathway-related genes, inflammatory cytokine genes, and tumor
   suppressor genes ([131] Figure 6B and [132]Supplementary Table 3 ). We
   noticed that NFκB repressing factors such as NFκBIA and NFκBIZ were
   also included in this meta-program. We compared the enrichment of the
   NFκB pathway and the pathways downstream of NFκB in the Mac3 cells from
   the S component samples and the Mac4 cells from the GG components using
   the GSEA and found that these pathways were all enriched in the S
   component Mac3 macrophages ([133] Figure 6C ). Therefore, we speculate
   that the NFκB repressing factors in the S component Mac3 macrophages
   are negative-feedback regulations caused by the overactivation of NFκB.
   In addition, genes related to AP-1 and Bcl-2, such as FOS, JUN, and
   BCL2A1, were also enriched in the meta-program 7 ([134] Figure 6B ).
   These genes are essential regulators of the direction of macrophage
   polarization ([135]40, [136]41). Taken together, these features
   demonstrate the association of the atypical polarization feature of
   macrophages with the upregulation of meta-program 7, especially in the
   S components of mGGOs.

Figure 6.

   [137]Figure 6
   [138]Open in a new tab

   cNMF and IFA reveal different signatures of macrophages between GG
   component and S component. (A) The UMAP shows meta-program of
   macrophage genes clustered by the cNMF; the shades of the colors
   represent proportions of cells expressing the meta-program. (B) The
   heatmap reveals the expression level of meta-program (row) in each
   sample (row); GG dominant meta-program (up) and S dominant meta-program
   (down) are marked with red boxes. We also list M1-like genes, M2-like
   genes, and enriched gene sets with similar functions in each
   highlighted meta-program. (C) The GSEA using pre-ranked differentially
   expressed genes between Mac3 _S component and Mac4 _GG component were
   compared with hallmark gene sets, Biocarta gene sets, and GPCR gene
   sets. The pre-ranked gene set was defined by function of the
   FindMarkers in Seurat. We show the enrichment of NF-κB pathway (top
   left), NF-κB-initiated TNF-alpha signaling pathway (bottom left), and
   GPCR pathway (right). (D) G3BP2 expression in nlung, GG-like, and
   S-like tissue. The “Whole Tissue” panel shows the overall region of an
   “mGGO looking” area with IFA and H&E stains. A typical field of nlung,
   GG-like, and S-like was selected using green, pink, and yellow boxes,
   respectively. Scale bars, 500 μm. The zoomed-in version of each field
   is below the panels (row). Each panel, from left to right, shows the
   image of DAPI (blue) + G3BP2 (pink), DAPI (blue) + CD206 (red), and H&E
   stains. Scale bars, 100 μm. (E) IFA shows the high expression level of
   G3BP2 (red) in lepidic growth cancer cells and macrophages in the
   alveolar cavity (yellow arrow) compared with those infiltrated into
   stromal cells (white arrow). cNMF, coupled non-negative matrix
   factorization; IFA, immunofluorescence assay; GG, ground glass; S,
   solid; UMAP, Uniform Manifold Approximation and Projection; GSEA, gene
   set enrichment analysis.

The Macrophages in the Ground-Glass Component Samples Show a Gain of Function
for Dealing With Infection and Cell Mutation

   Similarly, we also investigated the macrophage subclusters of the GG
   component samples. In subcluster Mac4, the pathway enrichment analyses
   showed that the pathways related to complement cascade, ZAP-70
   signaling pathway, and MHC class II antigen presentation were
   upregulated ([139] Supplementary Figure 5C ). This suggests that Mac4
   is probably involved in clearing pathogens/antigens and presenting them
   to the adaptive immune system. Meanwhile, in Mac1 and Mac5, which
   constituted a large proportion of macrophages in the GG components,
   with similar features, ROBO/Slit signaling pathway, nonsense-mediated
   mRNA degradation pathway (NMD), and aberrant translation pathway
   associated with viral infection ([140] Figure 5C and [141]Supplementary
   Figure 4A ). Upregulation of the NMD pathway was associated with the
   sensation of cellular stress ([142]42) and enhancement of antitumor
   immune surveillance ([143]43). In addition, both the NMD and ROBO/slit
   pathways are associated with viral infections ([144]44), indicating
   that the development of GGOs is probably associated with infections.
   Interestingly, the cNMF analysis of macrophages showed that the
   subclusters highly represented in the GG components (C1, C4, C5, C10,
   and C12; [145]Figure 4F ) were all highly associated with the
   meta-program 3 ([146] Figures 6A, B ). This gene set contained a large
   number of ribosomal protein genes as well as genes involved in RNA
   processing and maturation, suggesting active translational activities
   in macrophages in the GG components ([147] Figure 6B ). Due to the
   enrichment of the NMD and the active RNA processing and
   translation-related functions in pathway analysis and cNMF clustering,
   we speculate that macrophages from the GG components may orchestrate
   pathways to disrupt protein translation. We examined the expression of
   G3BP2, which encodes core components of stress granules (SGs). We found
   that the G3BP2 level was significantly elevated in the GG components of
   mGGOs compared with that of normal lung tissue, and the expression
   level was higher in the air-end tissues than in the stromal end tissues
   of mGGOs ([148] Figures 6D, E ). The formation of SGs has been
   discovered during tumorigenesis and can be induced by viral infections
   as well as the transcriptional/translational burden caused by mutations
   ([149]45, [150]46). In a nutshell, macrophages in the GG components
   were under the highest level of stress in comparison with normal lung
   tissues and the S components.

Cancer Cell-Related Interactions Suggest Different Tumor Development Patterns
in Different Mixed Ground-Glass Opacity Components

   Due to the obvious difference in gene expression patterns among cancer
   cells in different mGGO components, we hypothesize that such
   discrepancies could lead to distinct interactions between cancer cells
   and the TME. We investigated cancer cell-specific interaction pairs in
   different mGGO components by CellphoneDB ([151] Figure 7A ). The S
   component samples were enriched with angiogenesis-related interactions,
   with high expression of VEGFA and FLT1 (encoding VEGFR1) in cancer
   cells and macrophages, respectively. An in vitro study has verified
   that M2-type macrophages highly expressing FLT1 assist epithelial cells
   in rapid angiogenesis ([152]47). Furthermore, HLA-F was upregulated in
   cancer cells from the S components, which was shown to be correlated
   with the upregulation of LILRB1/LILRB2 in macrophages ([153] Figure 7B
   ). In addition, compared to cancer cells in the GG components, cancer
   cells in the S component samples exhibited more significant potential
   interactions with DCs, mainly via adhesion molecules such as ICAM1 and
   F11R. At the same time, the adhesion molecule of CD58 that interacted
   with T-cell receptor CD2 was highly expressed in cancer cells,
   indicating higher immunogenicity in the S component samples, which
   activated T cells for its cytotoxicity ([154]48). The highly expressed
   TNF in macrophages in the S components was associated with upregulated
   DAG1 in cancer cells, which regulated cell growth and apoptosis
   ([155]49). Taken together, the interactions between cancer cells and
   immune cells showed an intricate regulation network in the S
   components, including immune-suppressive interactions such as
   HLA-F_LILRB1/LILRB2 ( [156]50, [157]51) and RPS19_C5AR1 ( [158]52) and
   immune-activating interactions such as DAG1_TNF ( [159]53) and CD58_CD2
   ( [160]54). Also, immune cells especially macrophages are more likely
   to promote cancer development in S areas through highly specific
   interaction pairs such as SORT1_GRN ( [161]55) and PGRMC2_CCL4L2 (
   [162]56).

Figure 7.

   [163]Figure 7
   [164]Open in a new tab

   Cancer cell-related interactions suggest different tumor development
   patterns in different mGGO components. (A) The bubble plot of cancer
   cell-related ligand–receptor pairs reveals S and GG preferences. The