Abstract Brain metastasis occurs in approximately 30% of patients with lung adenocarcinoma (LUAD) and is closely associated with poor prognosis, recurrence, and death. However, dynamic gene regulation and molecular mechanism driving LUAD progression remain poorly understood. In this study, we performed a comprehensive single-cell transcriptome analysis using data from normal, early stage, advanced stage, and brain metastasis LUAD. Our single-cell-level analysis reveals the cellular composition heterogeneity at different stages during LUAD progression. We identified stage-specific risk genes that could contribute to LUAD progression and metastasis by reprogramming immune-related and metabolic-related functions. We constructed an early advanced metastatic dysregulated network and revealed the dynamic changes in gene regulations during LUAD progression. We identified 6 early advanced (HLA-DRB1, HLA-DQB1, SFTPB, SFTPC, PLA2G1B, and FOLR1), 8 advanced metastasis (RPS15, RPS11, RPL13A, RPS24, HLA-DRB5, LYPLA1, KCNJ15, and PSMA3), and 2 common risk genes in different stages (SFTPD and HLA-DRA) as prognostic markers in LUAD. Particularly, decreased expression of HLA-DRA, HLA-DRB1, HLA-DQB1, and HLA-DRB5 refer poor prognosis in LUAD by controlling antigen processing and presentation and T cell activation. Increased expression of PSMA3 and LYPLA1 refer poor prognosis by reprogramming fatty acid metabolism and RNA catabolic process. Our findings will help further understanding the pathobiology of brain metastases in LUAD. Key words: MT: Bioinformatics, lung adenocarcinoma, cancer dynamic network, tumor heterogeneity, biological function, prognosis Graphical abstract graphic file with name fx1.jpg [51]Open in a new tab __________________________________________________________________ Wang et al. provided an integrative approach to characterize dynamic dysregulated networks and identified stage-specific risk genes during LUAD progression from early stage, advanced stage to metastatic stage. They reported 16 stage-specific risk genes as prognostic biomarkers, which could contribute to LUAD progression and metastasis by reprogramming immune-related and metabolic-related functions. Introduction Lung cancer is the cancer with the fastest increasing morbidity and mortality, and it is also one of the malignant tumors threatening the health and life of the population. The 2 main subtypes of lung cancer are small cell lung cancer and non-small cell lung cancer (NSCLC),[52]^1 histologically divided into lung adenocarcinoma (LUAD), squamous cell carcinoma, and large cell carcinoma, among which LUAD is the most common histological subtype.[53]^2 If detected early, surgical resection of NSCLC has a good prognosis.[54]^3 However, approximately 75% of patients have advanced disease (stage III/IV) at the time of diagnosis.[55]^4 Because of the mutation of cancer cells, the invasion degree is increased.[56]^5^,[57]^6 The cells separate from the primary tumor and travel through blood vessels to other organs. Brain metastasis is the leading cause of prognosis, recurrence, and death in NSCLC patients, of whom approximately 20%–40% eventually develop brain metastasis.[58]^7 Advanced cancer and brain metastasis can shorten overall survival and decrease quality of life. However, the spec ific aspects of the different stages of LUAD progression remain poorly understood. It has become an urgent task to study the molecular characteristics of different stages of LUAD progression and further analyze the molecular mechanisms of cancer in search of new therapeutic targets. Cancer is a multicellular community composed of malignant epithelial cells and different types of non-malignant immune and stromal cells, which show dynamic interactions.[59]^8 Numerous studies have shown significant tumor heterogeneity of both genetic and cellular compositions between primary and metastatic tumors.[60]^9 Advanced tumors consist of subclones with various genetic alterations and functional roles.[61]^10 Tumor heterogeneity and the complex cellular architecture play a crucial role in cancer progression and response to treatment.[62]^11 However, the dynamic molecular features that characterize the contributions of tumor heterogeneity to malignant progression, metastasis, and poor survival are largely unknown. Many different cell types in our bodies express unique transcriptomes. Traditional bulk sequencing can only provide an average expression signal for a collection of cells, whereas single-cell RNA sequencing (scRNA-seq) isolates all cells in a tissue sample and sequences each cell in turn. scRNA-seq can be used to carry out non-biased high-throughput studies with the smallest samples, measure gene expression levels more accurately, and have greater sensitivity in quantifying rare mutations and transcripts.[63]^12 There is increasing evidence that gene expression is heterogeneous, even in similar cell types.[64]^13^,[65]^14 scRNA-seq can reveal the complicated and rare cells, adjust the relationship between genes, and reveal the trajectory in the development of different cell lineages.[66]^15 In summary, we integrated scRNA-seq datasets from lung tissues, early stage, advanced stage, and brain metastasis LUAD tissues to characterize dynamic dysregulated networks, identify stage-specific risk genes and explores their prognostic value during LUAD tumor progression and metastasis ([67]Figure 1). This study revealed the dynamic molecular characteristics and functions of LUAD in the progression of early advanced brain metastasis, providing an important theoretical basis for the molecular mechanism research and targeted therapy of LUAD. Figure 1. [68]Figure 1 [69]Open in a new tab Molecular mechanism and functional analysis of different stages of LUAD progression Results Identification of cellular components at different stages of LUAD progression To explore the changes in cell composition during the progression of LUAD, we obtained scRNA-seq data from different tissue sources, including normal lung tissue (normal lung), early primary LUAD tissue (early LUAD), advanced primary LUAD tissue (advanced LUAD), and brain metastasis tissue (metastatic LUAD) ([70]Figure 2A). We cataloged the tissue cells into 9 cell lineages based on the expression levels of canonical marker genes[71]^16 ([72]Figure 2B, [73]Table S1). These include epithelial cells, stromal cells (fibroblasts, endothelial cells), immune cells (T lymphocytes, natural killer [NK] cells, B lymphocytes, myeloid cells, and mast cells), and oligodendrocytes. Except for oligodendrocytes, which were only present in metastatic LUAD, other cell types were recognized in normal lung, early LUAD, advanced LUAD, and metastatic LUAD ([74]Figures 2C–2E). Compared with normal lung tissue, T lymphocytes and B lymphocytes were significantly enriched and NK cells were decreased in primary LUAD tissues (early LUAD and advanced LUAD, respectively). The results indicated that the adaptive immune process of the body was gradually activated with the occurrence and development of cancer. In addition, compared with normal lung tissue, it was found that the percentage of epithelial cells increased significantly in primary LUAD tissue and metastatic LUAD, which were 8.0%, 18.9%, and 45.4%, respectively ([75]Figure 2F). The results indicated that the tumor microenvironment showed high cellular heterogeneity and dynamically changing after LUAD progression and metastasis. By analyzing the changes in cell composition caused by the occurrence, development, and invasion of LUAD, it was found that the components of different cell types were significantly different in the process of normal primary LUAD brain metastasis, revealing the difference of tissue-specific cell population. The signature genes of all cell types showed stable expression patterns, and each classic marker gene was only expressed in the corresponding cell types identified by us, independent of the stage of cancer development ([76]Figure 2G). These results indicate that there are significant differences in the proportion of cell components in different stages of LUAD progression. Figure 2. [77]Figure 2 [78]Open in a new tab Identification of cellular components of scRNA-seq data from different stage of LUAD progression (A) Overview of single-cell data from different tissues obtained from GEO. (B) tSNE map of 129,277 single cells. (C–E) tSNE plots of single cells in different tissues, colored by cell types. (F) The proportion of each cell types in different tissues. (G) Bubble plots of the average expression levels of classic marker genes in nine cell types from different tissue sources. Heterogeneity of tumor cells in the pathological process of early stage, advanced stage to brain metastasis of LUAD To characterize the heterogeneity of tumor cells and identify stage-specific risk genes in LUAD tissues, we identified the malignant tumor cells in the early stage, advanced stage, and brain metastasis LUAD ([79]Figures 3A and 3B). Using stromal cells (fibroblasts and endothelial cells) and immune cells (T cells, NK cells, B cells, myeloid cells, and mast cells) as reference cells, inferCNV analysis was applied to infer large-scale copy number variations based on the scRNA-seq data. We identified 2,071, 4,103, and 7,448 LUAD malignant epithelial cells in early stage (early MECs), advanced stage (advanced MECs), and brain metastatic stage (metastatic MECs), respectively. We divided epithelial cells in lung tissue into 4 subpopulations based on canonical marker gene expression, namely, alveolar type I cells (AGER), alveolar type II cells (SFTPC/LAMP3), Club cells (SCGB1A1), and ciliated cells (FOXJ1/RFX2) ([80]Figures 3C and 3D). Alveolar cell type II is the cell type that is enriched in normal lung tissue ([81]Figure 3D). Alveolar type II cells are components of the pulmonary epithelium and serve as progenitor cells of alveolar type I cells to promote the repair and regeneration of the alveolar epithelium.[82]^17 A trajectory analysis was performed on epithelial cells in normal lung tissue and early MECs to assess inter-cellular heterogeneity ([83]Figure 4A). We found that alveolar type I cells were mainly located in state 4, alveolar type II cells were mainly located in state 5, ciliated cells were located in state 3, and state 3 was almost full of ciliated cells. Most of the early MECs were in state 1, and most of the cells in state 1 also belonged to the early MECs ([84]Figure 4B). By comparing early MECs with normal epithelial cells in cell state 1, we identified 133 normal-early risk genes, including 78 significantly upregulated genes and 55 significantly downregulated genes (fold change [FC] > 2, false discovery rate [FDR] <0.05). Functional analysis showed that normal-early risk genes were mainly enriched in respiratory gaseous exchange by respiratory system, antigen processing and presentation and T cell activation ([85]Figure S1A). Downregulated genes in early MECs were mainly enriched in respiratory gaseous exchange by respiratory system and tissue homeostasis ([86]Figure 4C). The significantly upregulated genes in early MECs were mainly enriched in immune-related functions, such as leukocyte activation, T cell activation, negative regulation of endopeptidase activity, and negative regulation of peptidase activity. Protease activity has been reported to be pivotal in executing antigen receptor responses and lymphocyte function.[87]^18 The dysregulation of protease activity and immune-related genes were related with cancer risk.[88]^19^,[89]^20 Figure 3. [90]Figure 3 [91]Open in a new tab Identification of MECs at different stages of LUAD progression (A, B) Heatmap showing large-scale CNV of epithelial cells in primary LUAD (A) and metastatic LUAD (B). (C) tSNE plots of normal epithelial subclusters, colored by cell types. Alveolar type I cells (AT1), alveolar type II cells (AT2), club cells (Club), and ciliated cells (Ciliated). (D) The bar chart details the proportions of different cell types. Figure 4. [92]Figure 4 [93]Open in a new tab Pseudotemporal trajectory analysis revealing heterogeneity of tumor cells in different stages of LUAD (A) Developmental trajectory representation of normal epithelial cells and early MECs along inferred pseudotime by Monocle. Each point corresponds to a single cell, colored according to cell type (top) and cell state (bottom). (B, E, and H) Major cell state composition of each cell type (top) and cell types distribution in each cell state (bottom). (C, F, and I) The bubble plots show significantly enriched biological processes based on the analysis of the DEGs in normal-early LUAD (C), early-advanced LUAD (F), and advanced-metastatic LUAD (I). (D) Developmental trajectory of early MECs and advanced MECs. (G) Developmental trajectory of advanced MECs and metastatic MECs. Trajectory analysis was performed on early MECs and advanced MECs to assess inter-cellular heterogeneity ([94]Figure 4D). The results showed that the early MECs were mainly located in cell state 2, and most of the cell types in cell state 2 were early MECs ([95]Figure 4E). Advanced MECs were mainly located in cell state 1, and most of the cells in cell state 1 were also advanced MECs. Therefore, cell state 2 is the specific cell state in the early stage of LUAD, and cell state 1 is the specific cell state in the advanced stage of LUAD. By comparing advanced MECs with early MECs cells, we identified 151 early advanced risk genes. Early advanced risk genes were mainly enriched in immune-related functions, such as antigen processing and presentation, humoral immune response, and T cell activation ([96]Figure S1B). Furthermore, we identified 124 DEGs in state 1, 33 DEGs in state 2, and 283 DEGs in state 3. We found that the DEGs of cell state 1 are mainly associated with neutrophil degranulation. The DEGs of cell state 2 were mainly related to regulation of endogenous peptidase activity, regulation of peptidase activity, cell chemotaxis, and cell-cell adhesion. The DEGs of cell state 3 were mainly related to cellular detoxification, response to hydrogen peroxide, and cell response to toxic substances ([97]Figure 4F). State 1 is a specific branch of advanced MECs in the early advanced trajectory, which is significantly enriched in neutrophil degranulation. Neutrophils have been reported to play an important role in cancer occurrence and progression.[98]^21 Granule protein released by neutrophil degranulation is associated with tumor progression and may promote the development of LUAD.[99]^22 Intercellular heterogeneity was evaluated by a trajectory analysis of advanced MECs and metastatic MECs ([100]Figure 4G). The results showed that advanced MECs were mainly located in cell state 2, and most of the cell types in cell state 2 were advanced MECs. Metastatic MECs were mainly located in cell state 3, and most of the cells in cell state 3 were also metastatic MECs ([101]Figure 4H). Therefore, cell state 2 is the specific cell state of advanced stage of LUAD, and cell state 3 is the specific cell state of brain metastasis of LUAD. By comparing advanced MECs with metastatic MECs cells, we identified 279 advanced metastatic risk genes. Functional analysis showed that advanced metastatic risk genes significantly influenced immune response-related and metabolic-related functions, such as antigen processing and presentation, T cell activation, fatty acid metabolic process, RNA catabolic process, and response to hypoxia ([102]Figure S1C). For example, LYPLA1 was significantly upregulated from advanced to metastatic LUAD, which is involved in fatty acid metabolic processes. Recent studies have reported that reprogramming fatty acid metabolism could promote lymph node metastasis in cervical cancer.[103]^23 Furthermore, we identified 83 DEGs in cell state 1, 73 DEGs in cell state 2, and 97 DEGs in cell state 3. We found that DEGs in cell state 1 and cell state 2 are mainly related to regulation of endogenous peptidase activity, regulation of peptidase activity, and protein targeting to membrane. The DEGs of cell state 3 are mainly related to the regulation of exogenous signaling pathways through death domain receptors, positive regulation of cell substrate adhesion, and platelet degranulation ([104]Figure 4I). State 3 is a unique branch of metastatic MECs in the advanced metastatic trajectory, which is significantly enriched in biological functions such as platelet degranulation. The interaction between cancer cells and activated platelets has been reported to promote the metastatic behavior of tumor cells.[105]^24^,[106]^25^,[107]^26 Metastatic MECs in state 3 may contribute to brain metastasis of LUAD through modulation of platelet degranulation. Our results demonstrated differences in transcriptional trajectories, revealing different cell expression patterns at different stages of LUAD progression. Construction and functional analysis of early advanced metastatic dynamic dysregulated network in LUAD To characterize the dynamic changes during LUAD progression at the single-cell level, we constructed early advanced and advanced metastatic dysregulated networks by integrating interaction relationships mediated by risk genes ([108]Figures 5A and 5B). The early advanced dysregulated network involved 151 risk genes and 83 dysregulated interactions. The advanced metastatic dysregulated network involved 279 risk genes and 570 dysregulated interactions. Then, an early advanced metastatic dynamic dysregulated network was constructed by comparing early advanced and advanced metastatic dysregulated interactions ([109]Figure 5C). As compared with early advanced dysregulated network, the advanced metastatic dysregulated network gained interactions 567 (73.6%) and lost 80 interactions (10.4%). These results showed the dynamic rewiring of network connections during different stages of LUAD progression. Figure 5. [110]Figure 5 [111]Open in a new tab Construction and functional analysis of early advanced metastatic dynamic dysregulated network (A) Early advanced dysregulated network. (B) Advanced metastatic dysregulated network. (C) Dynamic network of early advanced brain metastasis in LUAD. In the network, the greenyellow node is early advanced risk gene, the red node is advanced metastatic risk gene, and the blue nodes is early advanced metastatic risk gene. (D) Key modules identified in the network by MCODE. (E) Histogram of the proportion of different types of genes in key modules. (F) Significant enrichment of the GO terms for module genes. Filled colors from blue to red represent adjusted p values. We used the MCODE method to cluster the closely related nodes in the dynamic dysregulated network of early advanced brain metastasis, obtaining a total of 11 significant modules ([112]Figure 5D). We found that module 7 contained only DEGs from advanced LUAD to brain metastasis, and module 11 contained only DEGs from the early stage to the advanced stage ([113]Figure 5E). Functional enrichment analysis was performed for genes in the module ([114]Figure 5F). The results show that module 1 is mainly composed of RPL and RPS family genes, and plays an important role in tumor cell growth and proliferation. Module 2 correlated with antigen processing and presentation, consisting of both MHC-I (HLA-A, HLA-B, and HLA-C) and MHC-II (HLA-DRA, HLA-DBQ1, HLA-DRB1, and HLA-DRB5) molecules. These genes are downregulated in LUAD progression, indicating increased tumor immune escape and decreased T cell recognition and activation, further confirming the decreased immunogenicity of cancer cells. Module 3 correlated with cellular responses, in which chemical signals bind to the corresponding receptors to induce events within the cell that ultimately change its behavior. Module 4 correlated with the regulation of nerve cell maturation, and most of module 4 genes are DEGs from advanced LUAD to brain metastasis. Module 5 correlated with the function of normal epithelial cells, such as gas exchange through the respiratory system and tissue homeostasis, suggesting that epithelial cells still function as normal epithelial cells during the progression of LUAD. Module 8 correlated with ATP synthesis, oxygen electron transport chains, and oxidative phosphorylation. Module 9 correlated with the importation of inorganic ions through the plasma membrane, and the transmembrane transport of inorganic ions is closely related to a variety of life processes. Later, DAVID was used to enrich the KEGG pathway of module genes and identify the sub-pathway where the module genes were located, allowing us to reconstruct the risk pathway dynamically related to early advanced brain metastasis of LUAD ([115]Figure 6). In many pathological conditions, such as during autoimmune, cancer and infection, major histocompatibility complex (MHC) provides peptides to T cells leading to the initiation of adaptive immune responses.[116]^27 The results showed that MHC-I (HLA-A, HLA-B, and HLA-C) and B2M were significantly enriched in the upstream of the T cell receptor signaling pathway to regulate CD8 cell killing target cells and regulate the activity of natural killer cells. MHC-II (HLA-DQB1, HLA-DRB1, HLA-DRB5, and HLA-DRA) molecules are significantly enriched in the upstream of the T cell receptor signaling pathway and regulate the secretion of cytokines by CD4 T cells and the activation of other immune cells. In this study, low expression of MHC-I (HLA-A, HLA-B, and HLA-C) molecules and MHC-II (HLA-DQB1, HLA-DRB1, HLA-DRB5, and HLA-DRA) molecules inhibited antigen processing and presentation, indicating increased tumor immune escape and thus promoting tumor progression. Dysregulation of the cell cycle leads to uncontrolled cell proliferation, one of the molecular hallmarks of tumorigenesis. The cell cycle is a well-organized mechanism and is tightly controlled by regulatory molecules such as cyclins, cyclin-dependent kinases and their inhibitors. Cyclin D1 (CCND1), a key regulatory protein, controls the transition from G1 to S phase during cell division.[117]^28 In this study, CCND1 is highly expressed in the progression of LUAD and is enriched in the downstream of the pathway, which can promote cell cycle progression and tumor development. Ferroptosis is a novel iron-dependent programmed cell death mode.[118]^29 It plays a key role in inhibiting tumorigenesis by removing cells deficient in key nutrients or damaged by infection or environmental stress in the environment.[119]^30 Ferroptosis-protection genes[120]^31 (AKR1C1, AKR1C2, and AKR1C3) are highly expressed in advanced metastatic LUAD and are significantly enriched in the upstream of chemical carcinogenesis-reactive oxygen pathway, promoting the progression of lung cancer. GNB2, JUN, and FOS are significantly enriched in MAPK signaling pathway, and JUN and FOS are enriched in the downstream of the pathway, promoting cancer cell proliferation, angiogenesis, and cancer metastasis. These results showed reveals the dynamic changes in gene regulation and function during different stages of LUAD progression. Figure 6. [121]Figure 6 [122]Open in a new tab Stage-specific risk genes were associated with pathways contributing to LUAD progression Stage-specific risk genes affect multiple important biological pathways. Filled colors represent stage-specific risk genes, with greenyellow for early advanced risk genes, red for advanced metastatic risk genes, and blue for early advanced metastatic risk genes. Identification of prognostic markers in the dynamic dysregulated network at different stages of LUAD progression To identify potential prognostic markers at different stages of LUAD progression, we performed survival analysis using 88 module genes based on the GEPIA2 database. As a result, we identified 16 stage-specific risk genes as prognostic markers of LUAD, including 6 early advanced risk genes (HLA-DRB1, HLA-DQB1, SFTPB, SFTPC, PLA2G1B, and FOLR1) and 8 advanced metastatic risk genes (RPS15, RPS11, RPL13A, RPS24, HLA-DRB5, LYPLA1, KCNJ15, and PSMA3), 1 common risk gene SFTPD in normal early and advanced metastatic stages, and 1 common risk gene HLA-DRA in in normal early, early advanced, and advanced metastatic stages ([123]Figures 7 and [124]8). For example, HLA-DRB1, SFTPC, and FOLR1 showed significantly decreased expressions from the early to the advanced stage of LUAD ([125]Figures 7 and [126]8). Decreased expression of HLA-DRB1 (p = 0.01), SFTPC (p = 3.0e-04), and FOLR1 (p = 0.02) were associated with poorer prognosis in LUAD ([127]Figure 7). HLA-DRB1 can bind to receptors on CD4^+ T cells to regulate the differentiation and proliferation of cytotoxic T lymphocytes and other lymphocyte subsets.[128]^32^,[129]^33 Previous studies have shown that lower expression of HLA-DRB1 in patients with advanced stage than early stage of NSCLC.[130]^34 Previous studies have demonstrated that the loss of SFTPC is more likely to be a marker associated with advanced lung tumors. The downregulation of SFTPC can promote cell proliferation and predict a poor survival rate in LUAD.[131]^35 FOLR1 was reported to be decreased in advanced p-stage than in early p-stage of NSCLC and its higher expression is associated with a better prognosis in early stage NSCLC.[132]^36 We found that RPS15, RPS11, RPL13A, and RPS24 were significantly downregulated from advanced to metastatic LUAD. These genes are involved in the EIF2 signaling pathway that related to angiogenesis and tumor metastasis.[133]^37 Patients with the lower expression of RPS15, RPS11, RPL13A, and RPS24 genes had a better prognosis. HLA-DRB5 is downregulated in the progression of brain metastasis, which is relevant to the development of cancer and metastatic progression,[134]^38 and the low expression of HLA-DRB5 is associated with poor prognosis. We found that LYPLA1, KCNJ15, and PSMA3 were significantly upregulated from advanced to metastatic LUAD. High expression of LYPLA1 and PSMA3 were associated with a poor prognosis in LUAD. LYPLA1 has been reported to be significantly differentially expressed between early and late recurrence groups in colorectal cancer.[135]^39 Inhibition of LYPLA1 expression can inhibit cell proliferation and migration in NSCLC.[136]^40 KCNJ15 has been reported to be associated with the epithelial-mesenchymal transition.[137]^41 PSMA3 is involved in some metabolic pathways and processes, and plays an important role in the occurrence, development, and metastasis of cancer.[138]^42 The expression of HLA-DRA demonstrated downregulation from normal tissue, early stage to advanced stage, and lowest in brain metastasis of LUAD ([139]Figure 8). A lower expression of HLA-DRA in LUAD tumors was associated with reduced patient survival (log-rank test, p = 1.5e-3) ([140]Figure 7). Recent studies have shown that reduced HLA-DRA correlated with a lower immune enrichment score of CD4^+ T cells[141]^43 and could predict the response to anti-programmed death-1 immunotherapy in NSCLC.[142]^44 Furthermore, we examined the association of these prognostic genes with the drug sensitivity profiles of NSCLC chemotherapeutic drugs in 135 NSCLC cell lines. We found that prognostic genes showed significant differential expression in mRNA and protein level in sensitive vs. resistant NSCLC cell lines (2-sample t-tests; p < 0.05). For example, HLA-DRA mRNA expression was associated with sensitivity to gemcitabine and HLA-DRB1 mRNA expression was associated with resistance to carboplatin. The protein expression of HLA-DRA was associated with resistance to paclitaxel and sensitivity to gefitinib.[143]^45 Altogether, stage-specific risk genes could serve as prognostic markers and play important roles in different stages of LUAD progression. Figure 7. [144]Figure 7 [145]Open in a new tab Identification of stage-specific prognostic genes during LUAD metastatic progression Comparison of overall survival among patients with high (red) or low (blue) expression levels of risk genes by Kaplan–Meier analysis (with log rank values) in the cohort of LUAD cancer patients from TCGA. Figure 8. [146]Figure 8 [147]Open in a new tab The dynamic changes in different stages of LUAD progression Dynamic changes in tumor microenvironment cell composition (top), dysregulated biological functions (middle), and stage-specific risk genes (bottom) in different stages of normal lung, early LUAD, advanced LUAD, and metastatic LUAD. Discussion LUAD is the most common subtype of lung cancer, which has a poor prognosis. In recent years, increasing research progress on LUAD have been achieved by facilitating the analysis of single-cell transcriptome sequencing. For example, single-cell transcriptomic studies revealed substantial heterogeneity in cellular composition of the tumor microenvironment[148]^8^,[149]^46^,[150]^47 and molecular properties in primary and metastatic LUAD.[151]^48^,[152]^49 Liu et al. demonstrated intra-patient and intra-tumor heterogeneity in the regulation of pathways related to LUAD tumor progression.[153]^9 Many studies have characterized the cell differentiation trajectory and topological architecture of LUAD evolution.[154]^50^,[155]^51 Several studies have started to characterize the molecular and cellular dynamics at different stages of LUAD progression.[156]^16^,[157]^50^,[158]^52^,[159]^53^,[160]^54 However, the dynamic gene regulation and molecular mechanism driving the progression of LUAD from early stage, advanced stage to metastatic stage have not been comprehensively elucidated. Therefore, we provided an integrative approach to characterize dynamic dysregulated networks and identify stage-specific risk genes during LUAD progression from early stage, advanced stage, to metastatic stage. First, a trajectory analysis of MECs at different stages showed the increased transcriptional heterogeneity during LUAD progression and metastasis. We also identified LUAD stage-specific risk genes by comparing the tumor cell populations from different LUAD stage. Next, we constructed and comprehensively analyzed the early advanced metastatic dynamic dysregulated network in LUAD. We identified several functional modules, which involved in many distinct cancer-related biological functions and pathways. Previous studies have shown that various signaling pathways are associated with the occurrence and metastasis of NSCLC, including MAPK signaling pathway[161]^55^,[162]^56 and T cell receptor signaling pathway,[163]^57^,[164]^58^,[165]^59 which was also confirmed by our study. Finally, we identified 6 early advanced risk genes, 8 advanced metastatic risk genes, and 2 common risk genes in different stages, which can be used as prognostic biomarkers in LUAD. For example, SFTPB showed significantly decreased expressions from early to advanced stage of LUAD. Decreased expression of SFTPB was associated with poorer prognosis in LUAD (p = 3.0e-04) ([166]Figure 7). SFTPB has been reported to be used as potential blood biomarkers for distinguishing LUAD from lung squamous cell carcinoma.[167]^60 These results highlight a crucial role of stage-specific risk genes in the development of LUAD. In summary, we integrated scRNA-seq datasets from lung tissues, early stage, advanced stage, and brain metastasis LUAD tissues to characterize dynamic dysregulated networks, identify stage-specific risk genes and explores their prognostic value during LUAD tumor progression and metastasis. Our results reveal tumor heterogeneity in cellular composition, gene expression and biological functions at different stages of LUAD progression. We also report 16 stage-specific risk genes that could serve as prognostic biomarkers for LUAD ([168]Figure 8). This study provides an in-depth analysis of the dynamic molecular characteristics and functions in the progression of early-advanced-brain metastasis in LUAD, providing an important theoretical basis for the molecular mechanism research and targeted therapy of LUAD. Materials and methods scRNA-seq data preprocessing Single-cell transcriptome data of LUAD were obtained based on GEO database (GEO: [169]GSE131907).[170]^16 The sequencing data contained 208,506 cells from 58 LUAD samples from 44 patients. These cells were derived from normal lung tissue, lymph nodes, primary tumor tissue, brain metastasis, and pleural effusion. In this study, cells from normal lung tissue (normal lung), early primary LUAD tissue (early LUAD), advanced primary LUAD tissue (advanced LUAD), and brain metastasis tissue (metastatic LUAD) were selected, including 42,995 cells from normal lung tissue, 45,149 cells from early primary LUAD tissue, 12,073 cells from advanced primary LUAD tissue, and 29,060 cells from brain metastasis. scRNA-seq data obtained from GEO were preprocessed according to different tissue sources using R package Seurat ([171]https://satijalab.org/seurat/),[172]^61 which is widely used in preprocessing, dimension reduction, and cell clustering for scRNA-seq data. R package Seurat was used to calculate the percentage of mitochondrial reads. Cells that had unique feature counts higher than 5,000 or less than 200 or a single cell with more than 10% mitochondrial genes were filtered out. After moving the low-quality cells, the filtered single cell data was normalized by the NormalizeData function. Unsupervised reduction and clustering We used the RunPCA function of the Seurat R package to determine the principal components (PCs) of cells at different stages of LUAD, and used the ElbowPlot function to visualize PCs. Furthermore, cells with similar expressions were clustered by the FindClusters function, and the shared nearest neighbor algorithm was used to identify cell clusters. Finally, the identity of each cluster was identified through t-distributed random neighborhood embedding visualization of cell clusters, combined with the expression values of known marker genes of various cell clusters. Copy number variation was inferred based on scRNA-seq data Cancer cells were identified by calculating single-cell copy number variations (CNVs) using the inferCNV algorithm,[173]^62 which is used to explore the expression intensity of genes across positions of tumor genomes in comparison to a set of reference normal cells to determine somatic changes.[174]^63 First, non-malignant cells, including stromal cells and immune cells, were selected as reference cells to estimate the CNV of malignant cells. For epithelial cells treated by inferCNV, we selected 2 parameters to summarize CNV signals[175]^16: (1) mean squares: the mean squares of estimated CNV signals across all genomic locations; (2) copy number correlation coefficient: the Pearson correlation coefficient between CNV signal and the average CNV signal of the top 5% of cells. Epithelial cells with abnormal CNV signals (mean square of >0.02 and copy number correlation coefficient of >0.4) were defined as malignant tumor cells. Single-cell trajectory construction and analysis Monocle is an R package for analyzing scRNA-seq, which can arrange cells in simulated chronological order to show their developmental trajectories such as cell differentiation and other biological processes.[176]^64 In our study, Monocle was used to construct cell trajectories by integrating epithelial cells from normal lung tissues and MECs from inferCNV inferred early primary LUAD, advanced primary LUAD, and brain metastasis tissues. The first step is to select genes that provide important information for defining trajectory progression, also known as feature selection. Then, the above gene sets were used to construct the data trajectory. We took the UMI counts of the gene-cell matrix as input to Monocle, and then its newCellDataSet function was called to create an object with the parameter expressionFamily = negbinomial.size. Then the differentialGeneTest function is used to calculate the differentially expressed genes (DEGs) of each cell type as the gene set to define the progression of the trajectory, and dimension reduction and cell ordering are carried out. Finally, the dynamic trajectory of epithelial cells in LUAD progression is deduced. Trajectory analysis will reveal the heterogeneity in gene expression and biological functions among MECs at different stages of LUAD, which is termed as different cell expression patterns. Differential expression analysis and functional analysis The FindMarkers function of the Seurat package was used to identify DEGs with the statistical threshold of FC of more than 2 and FDR of less than 0.05. R package clusterprofiler[177]^65 and DAVID ([178]https://david.ncifcrf.gov/) to gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis, respectively. Identification of stage-specific risk genes Genes that were significantly differentially expressed normal relative to early MECS with state 1 in the trajectory were defined as normal early risk genes. Genes that were significantly differentially expressed in state 1 relative to state 2 in the trajectory of early advanced LUAD were defined as early advanced risk genes. Genes that were significantly differentially expressed in state 3 relative to state 2 in the trajectory of advanced metastatic LUAD were defined as advanced metastatic risk genes. The common DEGs between early advanced stage and advanced metastatic stage were defined as early advanced metastatic risk genes. Construct a dynamic dysregulated network of early advanced brain metastasis of LUAD The early advanced dysregulated network is constructed by integrating interaction relationships mediated by early advanced risk genes. The advanced metastatic dysregulated network is constructed by integrating interaction relationships mediated by advanced metastatic risk genes. The early advanced metastatic dynamic dysregulated network is constructed by integrating early advanced and advanced metastatic dysregulated networks. The interaction relationships are obtained from the STRING database,[179]^66 and Cytoscape[180]^67 software was used to visualize the network. Identification of key modules in the dynamic dysregulated network Based on the dynamic network of early-advanced-brain metastasis, we used molecular complex detection (MCODE) (a plugin in Cytoscape) to identify the most significant modules in the network, and the parameters of MCODE are as follows: degree cutoff = 2, node score cutoff = 0.2, K-core = 2, and max depth = 100. Survival analysis We used the "Survival Analysis" component of GEPIA2[181]^68 ([182]http://gepia2.cancer-pku.cn/#index), which can be used to assess the relationship between gene expression and cancer prognosis based on gene expression levels and to evaluate hypotheses using log-rank tests, to detect OS (overall Survival) of risk genes in TCGA. The thresholds for distinguishing low expression group and high expression group were defined as cutoff-high (50%) and cutoff-low (50%), respectively, and genes with p-values of less than 0.05 were selected as prognostic markers. Data availability statement All data and codes in this study are available under proper request; please contact [183]wangli@hrbmu.edu.cn. Acknowledgments