Abstract
Lung cancer has a high incidence rate, and lung squamous cell carcinoma (LUSC) accounts for about 30% of lung cancers. Because accurate diagnostic tools for early-stage LUSC are lacking and treatment options are limited, clinical outcomes for LUSC remain unsatisfactory. We used single-cell sequencing analysis, bioinformatics analysis, and Mendelian randomization analysis to look for risk factors that can predict the onset of LUSC and the prognosis of LUSC patients. This identified the “star” molecule Fatty Acid Binding Protein 4 (FABP4) as a mediator of macrophage infiltration in the LUSC tumor microenvironment. Overall, FABP4 may play an important role in the future prediction and treatment of LUSC. To our knowledge, this study is the first to clarify the role of FABP4 in the occurrence and prognosis of LUSC through single-cell sequencing analysis, bioinformatics analysis, and Mendelian randomization analysis, providing important evidence for the future prevention and treatment of LUSC.
Keywords: Single-cell RNA sequencing, Bioinformatics analysis, Mendelian randomization analysis, Fatty acid binding protein 4, Lung squamous cell carcinoma
Introduction
Mendelian randomization (MR) analysis is an analytical technique for investigating the relationship between exposure factors and outcomes while reducing confounding, reverse causality, and a variety of biases [1]. MR is an application of instrumental variable analysis aimed at testing causal hypotheses in non-experimental data. In MR analysis, genetic variants (usually single nucleotide polymorphisms, SNPs) are used as instrumental variables for the risk factor of interest. The MR principle rests on Mendel's laws of inheritance: when DNA is passed from parents to offspring during gamete formation, alleles segregate randomly and assort independently of one another.
This is analogous to the random allocation of treatment in randomized controlled trials (RCTs), which aims to generate groups with similar clinical characteristics and thereby reduce the risk of confounding. Single-cell RNA sequencing is a widely used method for investigating cellular heterogeneity and functional diversity in tissues [2–6]. Lung cancer is one of the leading causes of cancer mortality in both China and the United States, and its prevalence has gradually increased [7, 8]. Squamous cell carcinoma is currently the second most common subtype of lung cancer by incidence [9], and its most common cause is smoking [10]. The most common treatments for lung squamous cell carcinoma include surgery, radiation therapy, chemotherapy, and immunotherapy [11]. Because lung squamous cell carcinoma progresses rapidly, there is an urgent need for early identification and new treatment options.
Fatty acid binding protein 4 (FABP4) is one of the most prevalent members of the intracellular fatty acid-binding protein (FABP) family. It binds hydrophobic ligands to regulate lipid trafficking and is associated with the progression of many malignancies. Lipid uptake and tumor growth are closely linked. For example, exogenous FABP4 has been shown to enhance fatty acid uptake and accelerate tumor cell growth in prostate and breast cancer by activating the PI3K/AKT and MAPK pathways [12, 13]. Preventing fatty acid uptake is therefore considered an essential factor in preventing tumor cell metastasis. In ovarian cancer, miR-409-3p directly regulates FABP4 and affects disease spread [14]. When tumor cells take up excess fatty acids, they can form cytoplasmic lipid droplets that make them more resistant to chemotherapy and radiation [15, 16].
Animal studies have shown that FABP4 prevents the depletion of intracellular lipid droplets, whereas inhibiting FABP4 increases the sensitivity of tumor cells to carboplatin [17]. FABP4 also contributes to the tumor microenvironment (TME), which typically consists of immune cells, cancer cells, endothelial cells, and adipocytes. In ovarian cancer, the interaction between adipocytes and ovarian cancer cells allows lipids to be transferred directly from adipocytes to the cancer cells, which promotes tumor growth both in vitro and in vivo. Ovarian cancer cells at the adipocyte-tumor cell interface were found to express FABP4 [18, 19]. In breast cancer, FABP4 expression promotes tumor progression through regulation of IL-6/STAT3 signaling [20]. In addition, in tumor-associated macrophages FABP4 ubiquitinates ATPB and thereby inhibits the NF-κB/IL-1 pathway, promoting the growth of neuroblastoma [21]. In this study, we used FABP4 as the exposure and lung squamous cell carcinoma as the outcome. Through Mendelian randomization analysis, we found that FABP4 is a risk factor for the occurrence of lung squamous cell carcinoma. In addition, we used single-cell RNA sequencing and bioinformatics analysis to evaluate the role of FABP4 in the development of lung squamous cell carcinoma and its impact on the immune microenvironment. These findings help clarify the functional significance of FABP4 in lung squamous cell carcinoma and may support more precise approaches to treating the disease.
Materials and methods
Mendelian randomization analysis
To explore the hypothesized causal relationship between the exposure and outcomes, we used the R packages “TwoSampleMR” and “GSMR”, treating P < 0.05 as a suggestive association. We applied five main MR methods: MR-Egger, weighted median, inverse variance weighted (IVW), weighted mode, and simple mode. SNPs associated with the exposure at a suggestive threshold ($P < 5 \times 10^{-6}$) were selected as instrumental variables. This threshold is less stringent than the conventional genome-wide significance level of $5 \times 10^{-8}$. The F-statistic of each instrument was estimated as $F = \beta^2/\mathrm{SE}^2$, and only instruments with F > 10 were retained. Heterogeneity was assessed with the IVW and MR-Egger methods, with P < 0.05 indicating heterogeneity. The MR-Egger intercept test was used to detect horizontal pleiotropy, with P < 0.05 indicating pleiotropy. Finally, a leave-one-out sensitivity analysis was used to determine the influence of each SNP on the MR results; any outlying SNPs would be removed and the analysis rerun.
Source of transcriptome and single-cell RNA sequencing data
We downloaded RNA sequencing data and clinical information for lung squamous cell carcinoma from the TCGA database and from the GEO database (GSE73403). The scRNA-seq data of the LUSC tumor (GSM3304009) and normal (GSM3304010) samples of GSE117570 were downloaded from the GEO database. The Cancer Genome Atlas (TCGA) database: (https://www.cancer.gov/ccg/). Gene Expression Omnibus (GEO) database: (http://www.ncbi.nlm.nih.gov/geo/).
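The instrument filter and the IVW, heterogeneity, MR-Egger intercept, and leave-one-out steps described in the Mendelian randomization methods above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the TwoSampleMR or GSMR implementation. The `Snp` record and all function names are hypothetical. Outcome effects are assumed to be on the log-odds scale, and IVW uses fixed-effect first-order weights. Standard errors and P-values for the Egger intercept are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical summary-statistics record for one instrument (SNP)."""
    beta_exp: float  # SNP effect on the exposure (e.g. FABP4 level)
    se_exp: float
    p_exp: float     # P-value of the SNP-exposure association
    beta_out: float  # SNP effect on the outcome (assumed log-odds scale)
    se_out: float


def select_instruments(snps, p_thresh=5e-6, f_min=10.0):
    """Keep SNPs below the exposure P threshold whose F = beta^2 / SE^2 exceeds f_min."""
    return [s for s in snps
            if s.p_exp < p_thresh and (s.beta_exp / s.se_exp) ** 2 > f_min]


def ivw(snps):
    """Fixed-effect IVW estimate (first-order weights) and its standard error."""
    num = sum(s.beta_exp * s.beta_out / s.se_out ** 2 for s in snps)
    den = sum(s.beta_exp ** 2 / s.se_out ** 2 for s in snps)
    return num / den, math.sqrt(1.0 / den)


def cochran_q(snps):
    """Cochran's Q around the IVW estimate, with n - 1 degrees of freedom."""
    est, _ = ivw(snps)
    q = sum((s.beta_out - est * s.beta_exp) ** 2 / s.se_out ** 2 for s in snps)
    return q, len(snps) - 1


def egger(snps):
    """MR-Egger point estimates (intercept, slope) by weighted least squares.

    SNPs are first oriented so that every beta_exp is positive; weights are
    1 / se_out^2. A non-zero intercept suggests directional pleiotropy.
    """
    xs, ys, ws = [], [], []
    for s in snps:
        sign = 1.0 if s.beta_exp >= 0 else -1.0
        xs.append(sign * s.beta_exp)
        ys.append(sign * s.beta_out)
        ws.append(1.0 / s.se_out ** 2)
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leave_one_out(snps):
    """IVW estimate and SE with each SNP removed in turn."""
    return [ivw(snps[:i] + snps[i + 1:]) for i in range(len(snps))]
```

Consider four hypothetical SNPs whose outcome effects are exactly 0.3 times their exposure effects. For these, `ivw` returns 0.3, Cochran's Q is 0, the Egger intercept is 0, and every leave-one-out estimate stays at 0.3. This is the pattern a sensitivity analysis without outliers should show.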
Preprocessing of single-cell RNA sequencing data
The single-cell sequencing data of lung squamous cell carcinoma were obtained from GSE117570 in the GEO database. The scRNA-seq data were converted into Seurat objects using the R package Seurat [22]. Cells were retained if they met the following criteria: gene count between 200 and 4500, RNA count between 1000 and 20,000, and mitochondrial RNA percentage of at most 10%. The data were then normalized with the NormalizeData function of Seurat.
Clustering and annotation of differentially expressed genes in single-cell RNA sequencing data
We used t-distributed stochastic neighbor embedding (t-SNE) [23] for unsupervised clustering and unbiased two-dimensional visualization of cell subpopulations. Cell subpopulations were annotated using the "SingleR" package [24]. The "FindAllMarkers" function was used to compare differentially expressed genes between clusters, with the screening criteria |log2(fold change)| > 0.5 and adjusted P < 0.05.
Cell communication and cell trajectory analysis
We used CellPhoneDB to analyze communication between cell subpopulations. To infer the developmental trajectory of each cell type, we performed pseudotemporal analysis using the R package Monocle 3, and the results were clustered and visualized with UMAP. CellPhoneDB: (https://www.cellphonedb.org/). Monocle 3: (https://cole-trapnell-lab.github.io/monocle3).
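The cell-level QC filter, the normalization step, and the ligand-receptor scoring idea described above can be illustrated in plain Python. This is a simplified sketch, not Seurat or CellPhoneDB code, and it rests on several assumptions:

- each cell is a dict of counts keyed by gene symbol;
- mitochondrial genes carry an "MT-" prefix;
- the gene and count bounds are strict;
- normalization follows Seurat's default LogNormalize scheme (natural log of 1 + count / total × 10,000);
- the communication score is the mean of ligand expression in the sender cluster and receptor expression in the receiver cluster, compared against randomly shuffled cluster labels.

The 10% expression cut-off, the permutation count, and all function and gene names are hypothetical.

```python
import math
import random
import statistics

# QC bounds from the Methods; strict gene/count bounds are an assumption.
QC = {"min_genes": 200, "max_genes": 4500, "min_umis": 1000,
      "max_umis": 20000, "max_pct_mt": 10.0}


def qc_metrics(counts):
    """Return (genes detected, total counts, % mitochondrial counts) for one cell."""
    n_umis = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mt = sum(c for g, c in counts.items() if g.upper().startswith("MT-"))
    return n_genes, n_umis, (100.0 * mt / n_umis if n_umis else 0.0)


def passes_qc(counts, qc=QC):
    """True if a cell satisfies all gene-count, RNA-count, and mitochondrial limits."""
    n_genes, n_umis, pct_mt = qc_metrics(counts)
    return (qc["min_genes"] < n_genes < qc["max_genes"]
            and qc["min_umis"] < n_umis < qc["max_umis"]
            and pct_mt <= qc["max_pct_mt"])


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style scaling: ln(1 + count / total * scale_factor)."""
    total = sum(counts.values())
    return {g: math.log1p(c / total * scale_factor) for g, c in counts.items()}


def _cluster_stats(cells, labels, cluster, gene):
    """Mean expression of `gene` in `cluster` and the fraction of its cells expressing it."""
    vals = [cell.get(gene, 0.0) for cell, lab in zip(cells, labels) if lab == cluster]
    if not vals:
        return 0.0, 0.0
    return statistics.fmean(vals), sum(v > 0 for v in vals) / len(vals)


def lr_score(cells, labels, ligand, receptor, sender, receiver, min_frac=0.1):
    """Mean of ligand expression in the sender and receptor expression in the receiver."""
    lig_mean, lig_frac = _cluster_stats(cells, labels, sender, ligand)
    rec_mean, rec_frac = _cluster_stats(cells, labels, receiver, receptor)
    if lig_frac < min_frac or rec_frac < min_frac:
        return 0.0  # too few expressing cells: treat the pair as not interacting
    return (lig_mean + rec_mean) / 2


def lr_permutation_p(cells, labels, ligand, receptor, sender, receiver,
                     n_perm=1000, seed=0):
    """Observed score and the fraction of label permutations scoring at least as high."""
    rng = random.Random(seed)
    observed = lr_score(cells, labels, ligand, receptor, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lr_score(cells, shuffled, ligand, receptor, sender, receiver) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Shuffling cluster labels keeps each cell's expression profile intact while destroying its cluster assignment. A ligand-receptor pair therefore only scores a small P-value when it is specifically concentrated in the sender and receiver clusters.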
Differential gene screening and pathway enrichment analysis
Differential expression analysis was performed with the limma package, using the screening criteria |log2(fold change)| > 1.5 and adjusted P < 0.05. KEGG and GO pathway enrichment analyses were performed on the differentially expressed genes. KEGG: (https://www.kegg.jp/). GO: (https://geneontology.org/).
Survival analysis
Survival analysis was performed with the "survfit" function of the R survival package, and Cox regression analysis was used to assess the significance of differences between groups.
Drug sensitivity analysis
We used "oncoPredict" to predict the drug sensitivity of the FABP4 high-expression and low-expression groups.
Pan-cancer analysis
We analyzed the differential expression of the FABP4 gene between cancer and normal tissues in the TIMER 2.0 database. We also downloaded a standardized pan-cancer dataset from the UCSC database. We used it to analyze the differential expression of FABP4 across clinical stages of other tumors and its impact on survival in other cancers. TIMER 2.0 database: (http://timer.cistrome.org/). UCSC database: (https://xenabrowser.net/).
Statistical analysis
Data (presented as means ± SDs) were analyzed using GraphPad Prism 9, and statistical significance was tested using one-way ANOVA. P < 0.05 was considered statistically significant. Significance levels are indicated as *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001; non-significant differences are indicated as "ns".
Results
Single-cell subpopulation clustering and annotation
We obtained single-cell sequencing data of lung squamous cell carcinoma and normal lung tissue from GSE117570.
After quality control, normalization, and dimensionality reduction by PCA, all cells were grouped into 8 clusters by t-SNE clustering. Using known marker genes, these clusters were annotated as 6 cell types: epithelial cells, endothelial cells, T cells, B cells, monocytes, and macrophages (Fig. 1A, B). Finally, we analyzed the characteristic genes and differentially expressed genes of the six cell types (Fig. 1C, D).
Fig. 1. Cell clustering and annotation (A, B). The main expressed genes and differentially expressed genes (C, D) in six types of cells
Single-cell subpopulation marker gene annotation
We identified marker genes of epithelial cells, endothelial cells, T cells, B cells, monocytes, and macrophages, and analyzed their expression in these six cell types. KRT6A, CAPS, BPIFA1, and SFTPC are marker genes of epithelial cells and are highly expressed in epithelial cells. The remaining marker genes are G0S2 for monocytes, FABP4 for macrophages, CD69 for T cells, MGP for endothelial cells, and IGLC2 for B cells. Each is highly expressed in its corresponding cell type (Fig. 2A–R).
Fig. 2. Marker genes of six types of cells and their expression in each type of cell (A–R)
Univariate and multivariate regression analysis of the marker genes
We combined the marker genes of lung squamous cell carcinoma (KRT6A, G0S2, CAPS, FABP4, BPIFA1, CD69, MGP, SFTPC, and IGLC2). We then performed univariate regression analysis based on the survival time and status of lung squamous cell carcinoma patients. Genes with P < 0.05 in the univariate analysis were then entered into a multivariate regression analysis.
Combining the univariate and multivariate regression analyses, we found that FABP4 is an independent prognostic factor for lung squamous cell carcinoma patients (Fig. 3A, B). In both the GSE73403 and TCGA datasets, lung squamous cell carcinoma patients with high FABP4 expression had shorter survival than those with low FABP4 expression (Fig. 3C, D).
Fig. 3. Univariate and multivariate regression analysis of marker genes (A, B). The effect of FABP4 expression on the survival of lung squamous cell carcinoma (C, D)
Pathway enrichment analysis and drug sensitivity analysis
We used the R package limma (version 3.40.6) to identify differentially expressed genes between the FABP4 high-expression and low-expression groups (Fig. 4A). The top 50 differentially expressed genes are shown in a heatmap (Fig. 4B). GO and KEGG enrichment analysis showed that these genes are enriched in pathways such as cell mitosis, DNA replication, cytokines, growth factors, immunoglobulins, cell cycle, and ECM-receptor interaction (Fig. 4C, D). Finally, we used oncoPredict to predict drug sensitivity in the FABP4 high-expression and low-expression groups. Patients with low FABP4 expression showed high sensitivity to Selumetinib and Compound, but low sensitivity to Vincristine and Parbendazole (Fig. 4E–J).
Fig. 4. Volcano plot of differentially expressed genes between high and low FABP4 groups of patients (A). The top 50 differentially expressed genes (B) in patients with high and low FABP4 levels. GO and KEGG pathway analysis of differentially expressed genes (C, D).
Predictive analysis of drug sensitivity in patients with high and low FABP4 expression (E–J)
Immune cell infiltration analysis
We used CIBERSORT to analyze the infiltration of 22 immune cell types in lung squamous cell carcinoma. In both the TCGA and GSE73403 lung squamous cell carcinoma cohorts, M2 macrophage infiltration was weaker in patients with high FABP4 expression than in those with low FABP4 expression (Fig. 5A, B).
Fig. 5. Immune cell infiltration analysis (A, B)
Cell trajectory and cell communication analysis
To further characterize cell differentiation in the immune microenvironment, we inferred cell differentiation trajectories in lung squamous cell carcinoma patients. Monocytes showed some differentiation toward macrophages and T cells. However, the trajectories of epithelial cells, endothelial cells, B cells, monocytes, T cells, and macrophages lie far apart, suggesting that there may be no developmental relationship among these cells (Fig. 6A, B). We then analyzed the dynamic expression of FABP4 along the trajectories and found that it remains highly expressed in macrophages during differentiation. Early in the trajectory, FABP4 was highly expressed in macrophages and showed an upward trend. As differentiation progressed, its expression gradually decreased, ending at low levels in epithelial cells, monocytes, T cells, and B cells (Fig. 6C, D). Finally, cell communication analysis showed a close interaction between macrophages and endothelial cells and a chemotactic interaction between macrophages and monocytes (Fig. 6E). Fig. 6F shows the expression of co-inhibitory, co-stimulatory, and chemokine genes among epithelial cells, endothelial cells, B cells, T cells, monocytes, and macrophages.
Fig. 6. Analysis of cell differentiation trajectory (A, B).
The expression levels and dynamic changes of FABP4 in six types of cells (C, D). Cell communication analysis between 6 types of cells (E). The expression of co-inhibitory, co-stimulatory, and chemokine genes among six types of cells (F)
Pan-cancer analysis
Analysis of the TIMER 2.0 database showed that FABP4 expression was lower in most tumors than in the corresponding normal tissues (Fig. 7A). We also downloaded a standardized pan-cancer dataset from the UCSC database and used R (version 3.6.4) to calculate expression differences between clinical stages for each tumor. Significant differences were observed in 10 tumor types: CESC, BRCA, STES, KIPAN, STAD, KIRC, THYM, THCA, OV, and BLCA. In addition, we fitted Cox proportional hazards regression models with the coxph function of the R package survival (version 3.2-7) to analyze the relationship between gene expression and prognosis in each tumor. The log-rank test was used to assess prognostic significance. High FABP4 expression was associated with poor prognosis in CESC, STES, HNSC, COAD, COADREAD, THCA, and READ, whereas low expression was associated with poor prognosis in GBMLGG, KIRC, and LIHC (Fig. 7B, C).
Fig. 7. Differences in FABP4 between tumor tissue and normal tissue in pan-cancer (A). The impact of FABP4 on stage and survival in pan-cancer (B, C)
Exploring the causal relationship between FABP4 and lung squamous cell carcinoma through two-sample Mendelian randomization analysis
We performed MR analysis of the causal relationship between exposure and outcome using five main MR methods (MR-Egger, weighted median, inverse variance weighted, weighted mode, and simple mode).
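Of the five MR methods listed above, the weighted median is the least self-explanatory. It orders the per-SNP Wald ratios and takes the point at which half of the total inverse-variance weight is reached, so it remains consistent even if up to half of the weight comes from invalid instruments. The sketch below follows that interpolation scheme with first-order ratio standard errors (an assumption; bootstrap standard errors are omitted). It also adds helpers for moving between a log-odds estimate and the odds-ratio scale used to report MR results. All function names are hypothetical, and this is not the TwoSampleMR source.

```python
import math


def weighted_median(beta_exp, beta_out, se_out):
    """Weighted median MR estimate from per-SNP Wald ratios (needs >= 2 SNPs)."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    # First-order SE of each ratio is se_out / |beta_exp|; weights are 1 / SE^2.
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    r = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    # Standardized cumulative weight at the midpoint of each ordered SNP.
    cum, running = [], 0.0
    for wi in w:
        running += wi
        cum.append((running - 0.5 * wi) / total)
    below = max(j for j, c in enumerate(cum) if c < 0.5)
    # Linear interpolation between the two ratios that bracket the 50% point.
    frac = (0.5 - cum[below]) / (cum[below + 1] - cum[below])
    return r[below] + (r[below + 1] - r[below]) * frac


def log_or_summary(beta, se, z_crit=1.959964):
    """Odds ratio, 95% CI bounds, and two-sided normal P-value for a log-odds estimate."""
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return math.exp(beta), math.exp(beta - z_crit * se), math.exp(beta + z_crit * se), p


def se_from_or_ci(lower, upper, z_crit=1.959964):
    """Log-scale SE implied by a reported 95% CI for an odds ratio."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)
```

With equal weights and ratios of 0.1, 0.2, 0.3, and 5.0, the weighted median is 0.25, so the single extreme ratio barely moves it. As a consistency check on the reported IVW result (OR 1.31, 95% CI 1.09–1.57), `se_from_or_ci(1.09, 1.57)` gives a log-scale SE of about 0.093. Passing that SE with `math.log(1.31)` to `log_or_summary` gives a two-sided P of roughly 0.004, in line with the reported value.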
Using IVW, the main method for evaluating causal effects, we found that FABP4 is a risk factor for lung squamous cell carcinoma (OR [95% CI] 1.31 [1.09–1.57], P = 0.004) (Fig. 8A). The forest plot and scatter plot likewise indicate that FABP4 is a risk factor for lung squamous cell carcinoma (Fig. 8B, C). To evaluate the reliability of the results, we used the leave-one-out method and a funnel plot. In the sensitivity analysis, systematically removing each SNP and rerunning the MR analysis revealed no significant bias in the observed causal relationship. The funnel plot also showed no asymmetry, indicating no evidence of bias in this study (Fig. 8D, E). We used MR-Egger and IVW for heterogeneity testing and the MR-Egger intercept for horizontal pleiotropy testing. Neither heterogeneity nor horizontal pleiotropy was detected (Fig. 8F). Finally, we performed a reverse two-sample MR analysis with lung squamous cell carcinoma as the exposure and FABP4 as the outcome, and found no causal effect of lung squamous cell carcinoma on FABP4. In other words, the finding that FABP4 is a risk factor for lung squamous cell carcinoma is not explained by reverse causality (Fig. 8G).
Fig. 8. Two-sample Mendelian randomization analysis of FABP4 and lung squamous cell carcinoma (A). Forest plot, scatter plot, leave-one-out plot, and funnel plot (B–E). Heterogeneity analysis and horizontal pleiotropy analysis (F).
Reverse Mendelian randomization analysis (G)
Exploring the causal relationship between FABP4 and benign lung tumors and other subtypes of lung cancer through two-sample Mendelian randomization analysis
To determine whether FABP4 is causally related only to lung squamous cell carcinoma, we used two-sample MR analysis to examine its relationship with benign lung tumors and other lung cancer subtypes. Using IVW, the main method for evaluating causal effects, we found no causal relationship between FABP4 and benign lung tumors, lung adenocarcinoma, or small cell lung cancer (Fig. 9A–C).
Fig. 9. Two-sample Mendelian randomization analysis (A–C) of FABP4 in benign lung tumors, lung adenocarcinoma, and small cell lung cancer
Discussion
Mendelian randomization studies based on existing large-scale GWAS data are faster and cheaper to conduct. They can probe potential causal relationships between modifiable risk factors and diseases, relationships that would otherwise require large sample sizes and long-term follow-up to establish. However, one drawback of the MR design is that it can only be applied to risk factors with suitable genetic instruments. Genetic variants typically explain only a small proportion of the variation in most risk factors, which may lead to low statistical power and a risk of false-negative MR results. Clinical data are therefore needed to validate our conclusions. We report that FABP4 can serve as a risk factor for the development of lung squamous cell carcinoma. It also predicts the prognosis of lung squamous cell carcinoma patients and marks macrophage infiltration in the immune microenvironment of lung squamous cell carcinoma. To our knowledge, the impact of FABP4 on lung squamous cell carcinoma has not been explored before.
The only related studies in lung cancer report that SIRT5 promotes the progression of non-small cell lung cancer by reducing FABP4 acetylation. They also report that increased expression of FABP3 and FABP4 is associated with poor prognosis in non-small cell lung cancer [25, 26]. Exploring the effect of FABP4 on lung squamous cell carcinoma is therefore of considerable research value. FABP4 is a member of the intracellular lipid-binding protein family. It binds hydrophobic ligands to regulate the transport of extracellular lipids and plays an important role in intracellular metabolic pathways [27–29]. FABP4 is abnormally expressed in many types of cancer and activates multiple oncogenic signaling pathways by promoting lipid transport [30]. In addition, FABP4 plays an important role in immune cell infiltration in the tumor immune microenvironment, especially macrophage infiltration [21–33]. In this study, we found that FABP4 is a marker of macrophage infiltration in the immune microenvironment of lung squamous cell carcinoma. We also used CIBERSORT to analyze the infiltration of 22 immune cell types in the high and low FABP4 groups of lung squamous cell carcinoma patients in TCGA and GSE73403. This showed that M2 macrophage infiltration was more pronounced in patients with high FABP4 expression. Existing studies have shown that M2 macrophage infiltration promotes tumor development [34–36], which may explain why patients with high FABP4 expression have shorter survival than those with low FABP4 expression.
Because there are currently no methods to predict the occurrence of lung squamous cell carcinoma, most patients are diagnosed at intermediate or advanced stages. This leads to a poor prognosis, with a 5-year survival rate of only about 15% [37, 38]. Mendelian randomization analysis offers a way to explore possible exposure factors for the occurrence of lung squamous cell carcinoma. Previous MR studies of such exposure factors have reported the following:

- airflow limitation may be an independent predictive factor for lung squamous cell carcinoma [39];
- gastroesophageal reflux disease is positively correlated with its incidence [40];
- idiopathic pulmonary fibrosis is an independent risk factor that may increase its risk [41];
- genetically predicted hypothyroidism has a protective causal effect against it [42].

In this study, we found that FABP4 is a risk factor for lung squamous cell carcinoma but showed no causal relationship with benign lung tumors, lung adenocarcinoma, or small cell lung cancer. This suggests that the causal effect of FABP4 may be specific to lung squamous cell carcinoma. Research on the anti-tumor effects of FABP4 inhibitors is still limited to a few tumor types. For example, the FABP4 small-molecule inhibitor BMS-309403 can enhance the sensitivity of ovarian cancer cells to carboplatin [17]. Inhibiting lipid transfer in the TME with FABP4 inhibitors, or genetically or pharmacologically targeting SCD1, reduces tumor regeneration [43]. FABP4 inhibitors can also inhibit the metastasis of cholangiocarcinoma cells [44].
Conclusions
In summary, by integrating Mendelian randomization, bioinformatics, and single-cell RNA sequencing analysis, we identified the “star” molecule fatty acid binding protein 4 (FABP4). This finding may provide guidance for more targeted treatment of lung squamous cell carcinoma in the future.
Limitation
We built the prognostic model using only data from lung squamous cell carcinoma patients in public databases. We did not validate it in an independently collected cohort of lung squamous cell carcinoma patients. In addition, owing to limitations of the analytical models and experimental conditions, we did not perform corresponding in vitro or in vivo validation.
Acknowledgements