Abstract

Objectives: Sm proteins (SNRPB/D1/D2/D3/E/F/G), which are involved in pre-mRNA splicing, have previously been implicated in the tumorigenesis of several cancers. However, their specific role in lung adenocarcinoma (LUAD) remains unclear. This study aimed to characterize the abnormal expression and mutations of the genes for Sm proteins and to assess their potential as therapeutic targets through integrated bioinformatics analysis.

Methods: We explored the expression patterns and prognostic value of the genes for Sm proteins in LUAD using TCGA, GEO, UALCAN, Oncomine, Metascape, DAVID 6.8, and Kaplan-Meier Plotter, and confirmed their independent prognostic value by univariate and multivariate Cox regression analysis. Their expression patterns were validated by RT-qPCR. Gene mutations and co-expression of the genes for Sm proteins were analyzed with the cBioPortal database. The PPI network for Sm proteins in LUAD was visualized with STRING and Cytoscape. The correlations between the genes for Sm proteins and immune infiltration were analyzed using the “GSVA” R package.

Results: The genes for Sm proteins were upregulated in both LUAD tissues and LUAD cell lines. Moreover, high mRNA levels of these genes were strongly associated with short survival time in LUAD. Their expression correlated positively with the infiltration of Th2 cells, but negatively with the infiltration of mast cells, Th1 cells, and NK cells. Importantly, Cox regression analysis showed that high expression of SNRPD1/E/F/G was an independent risk factor for overall survival in LUAD.

Conclusion: Our study showed that SNRPD1/E/F/G could independently predict the prognosis of LUAD and were correlated with immune infiltration. This report also lays the foundation for further exploration of SNRPD1/E/F/G as potential therapeutic targets in LUAD.
Keywords: Sm proteins, prognostic biomarkers, immune infiltration, target therapy, lung adenocarcinoma

Introduction

Lung cancer is one of the most common malignant tumors worldwide and the leading cause of cancer mortality ([36]Sung et al., 2021). Lung adenocarcinoma (LUAD) is the most common subtype of non-small cell lung cancer (NSCLC) ([37]Anonymous, 2015). LUAD is characterized by a lack of early clinical symptoms, a high rate of distant metastasis, and drug resistance, all of which pose serious challenges to clinical treatment. Currently, treatment methods for LUAD mainly include surgical resection, radiotherapy, chemotherapy, immunotherapy, and molecular targeted therapy ([38]Reck and Rabe, 2017). The best treatment for lung cancer is surgical resection at an early stage. However, the early symptoms of the disease are not obvious and are easily overlooked, which leads to late diagnosis. Treatment options for advanced lung adenocarcinoma are limited; molecular targeted therapy and immunotherapy are promising choices. However, because effective molecular targets are lacking, most drugs remain ineffective in LUAD patients, whose 5-year survival rate is just 15% ([39]Chen et al., 2016). Hence, effective and reliable biomarkers are needed to identify patients with a poor prognosis and to guide treatment strategies.

The spliceosome is a ribonucleoprotein (RNP) with a complex ring-shaped structure; it mainly consists of small nuclear ribonucleoproteins (snRNPs) and splices pre-mRNA into mature mRNA. Each snRNP is composed of seven Sm proteins (SNRPB, SNRPD1, SNRPD2, SNRPD3, SNRPE, SNRPF, and SNRPG) and an eponymous small nuclear RNA (snRNA) ([40]Chari et al., 2008). Accurate splicing is essential for normal cellular functions such as cell proliferation, apoptosis, migration, and invasion.
Sm proteins, the antigens recognized by anti-Sm antibodies, are significant diagnostic biomarkers in autoimmune diseases such as systemic lupus erythematosus (SLE). Furthermore, prior work has shown that aberrant expression of the genes for Sm proteins is related to several human cancers, including cervical cancer ([41]Zhu et al., 2020), glioblastoma ([42]Correa et al., 2016), breast cancer ([43]Dai et al., 2021), and hepatocellular carcinoma ([44]Zhan et al., 2020). To date, few studies have investigated the connection between abnormal expression of the genes for Sm proteins and LUAD. For instance, one report ([45]Liu et al., 2019) showed that SNRPB down-regulation inhibited the growth and metastasis of NSCLC cells via RAB26 down-regulation. In the study by [46]Valles et al. (2012), SNRPB and SNRPE were shown to be associated with poor survival in LUAD patients. However, the precise functional roles of Sm proteins in LUAD remain unclear. Our study aimed to characterize the abnormal expression and mutations of the genes for Sm proteins and to assess their potential as therapeutic targets via integrated bioinformatics analysis, which may be helpful for the treatment of LUAD patients.

Materials and Methods

Data Source and Flow Chart

The [47]GSE40791 dataset, generated on the [48]GPL570 platform ([49]Zhang et al., 2012), was downloaded from the Gene Expression Omnibus (GEO) database ([50]https://www.ncbi.nlm.nih.gov/geo/) ([51]Clough and Barrett, 2016). [52]GSE40791 included 100 non-tumor lung tissues and 94 lung adenocarcinoma tissues (N = 100, T = 94). The TCGA-LUAD data (N = 59, T = 535) were downloaded from the TCGA database ([53]https://portal.gdc.cancer.gov) ([54]Tomczak et al., 2015). A flow chart of the study procedure is shown in [55]Figure 1.
The characteristics of the 535 patients, including age, gender, smoking history, TNM stage, pathologic stage, primary therapy outcome, residual tumor, and anatomic neoplasm subdivision, are presented in [56]Supplementary Table S1.

FIGURE 1. Flow chart of the present study.

Comparison of Genes for Sm Proteins Expression Levels Between LUAD and Corresponding Normal Tissue

The Oncomine database ([59]https://www.oncomine.org) ([60]Rhodes et al., 2004) was used to obtain mRNA expression data for the genes for Sm proteins in various cancers, and these data were analyzed with Student’s t-test. The thresholds were defined as follows: p-value < 0.001, fold change > 2, and gene rank within the top 10%. Furthermore, we compared the expression levels of the genes for Sm proteins between lung adenocarcinoma samples and normal samples in [61]GSE40791. Moreover, the transcription levels of these genes were validated in the GEPIA 2 database ([62]http://gepia2.cancer-pku.cn) ([63]Tang et al., 2017), which includes tumor and normal samples from TCGA and the Genotype-Tissue Expression Project (GTEx) database ([64]https://gtexportal.org/home/), and in the TCGA-LUAD data analyzed with R software.

Cell Culture and RT-qPCR

The BEAS-2B cell line (a human bronchial epithelial cell line) was purchased from the American Type Culture Collection (Manassas, VA, United States), and the A549 cell line was purchased from the Type Culture Collection of the Chinese Academy of Sciences, Shanghai, China. Both cell lines were cultured in DMEM with 10% FBS, penicillin (50 U/ml), and streptomycin (50 U/ml). The cells were incubated at 37°C with 5% CO2. Total RNA was extracted from these cells using TRIzol (Thermo, United States), and first-strand complementary DNA (cDNA) was synthesized from total RNA using the GoScript Reverse Transcription System (Promega, United States).
RT-qPCR was conducted on an AriaMx Real-Time PCR machine (Agilent Technologies, United States) with TB Green Premix Ex Taq II (Takara Bio, Japan). The primer sequences are shown in [65]Supplementary Table S2. The RT-qPCR cycling conditions were 3 min at 95°C, followed by 40 cycles of 15 s at 95°C and 60 s at 60°C. Data were normalized to the housekeeping gene GAPDH. Relative gene expression was calculated by the 2^−ΔΔCt method ([66]Livak and Schmittgen, 2001).

Correlations Between Genes for Sm Proteins and Clinicopathological Parameters

The transcription levels of the genes for Sm proteins in LUAD patients stratified by TP53 mutation status were analyzed in the lung adenocarcinoma dataset of the UALCAN database ([67]http://ualcan.path.uab.edu) ([69]Chandrashekar et al., 2017). Moreover, we explored the correlation between the expression of the genes for Sm proteins and tumor stage via the “Expression DIY” module of GEPIA 2, and selected 50 neighboring genes related to the genes for Sm proteins using the “Similar Genes Detection” module.

Survival Analysis

The Kaplan-Meier plotter ([70]http://kmplot.com) ([71]Gyorffy et al., 2013; [72]Nagy et al., 2021), which includes mRNA expression data and patients’ clinical information from GEO and TCGA, is an online tool for verifying the prognostic value of genes in several cancers. The overall survival (OS), first progression (FP), and post-progression survival (PPS) curves of the genes for Sm proteins in LUAD were generated with the Kaplan-Meier plotter. We also used the Kaplan-Meier plotter to explore whether dysregulation of the genes for Sm proteins affected the OS of LUAD patients with a smoking history. In addition, univariate and multivariate Cox regression analyses of the TCGA-LUAD data were performed using the “survival” and “survminer” R packages.
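For readers unfamiliar with the survival curves described above, the following is a minimal Python sketch of the Kaplan-Meier product-limit estimator. It is an illustration only and not the code used in this study, which relied on the Kaplan-Meier plotter and on R packages. The function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times  : follow-up time of each patient (any consistent unit)
    events : 1 if the event (e.g. death) was observed at that time,
             0 if the patient was censored at that time
    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one event was observed.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # all patients leaving the risk set at t
    at_risk = len(times)        # patients still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data invented for illustration; not taken from TCGA or GEO.
    toy_times = [5, 8, 12, 12, 20, 25]
    toy_events = [1, 0, 1, 1, 0, 1]
    for t, s in kaplan_meier(toy_times, toy_events):
        print(f"t = {t:>2}  S(t) = {s:.3f}")
```

In practice, one such curve would be estimated for each expression group (for example, high versus low expression of a given gene), and the groups would then be compared with the log-rank test.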
Gene Mutations and Co-Expression Analysis

The lung adenocarcinoma dataset (TCGA, Firehose Legacy), which includes data from 230 complete samples of 586 patients, was selected in the cBioPortal database ([73]http://www.cbioportal.org) ([75]Cerami et al., 2012; [76]Gao et al., 2013) and used to visualize the gene mutation map, expression heatmap, and co-expression map of the genes for Sm proteins. The z-score threshold was set to ±1.8.

Construction of the Protein-Protein Interaction Network and Selection of Hub Genes

STRING ([77]https://string-db.org/) ([78]Szklarczyk et al., 2019) was used to construct the protein-protein interaction (PPI) network between the seven genes for Sm proteins and their 50 frequently neighboring genes. All PPI pairs with a combined score of >0.4 were extracted. Then, we used the “CytoHubba” plugin (v0.1) ([79]Chin et al., 2014) of Cytoscape (v3.8.2) ([80]https://cytoscape.org) ([81]Otasek et al., 2019) to identify hub genes in that PPI network. Furthermore, the online tool Metascape ([82]http://metascape.org) ([83]Zhou et al., 2019) was used to analyze the MCODE components of that PPI network.

GO and KEGG Enrichment Analyses

Functional enrichment analyses, including gene ontology (GO) analysis, comprising cellular component (CC), molecular function (MF), and biological process (BP), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, were performed with DAVID 6.8 ([84]https://david.ncifcrf.gov) ([85]Huang da et al., 2009). Subsequently, the enriched GO and KEGG terms were visualized using the “clusterProfiler” package ([86]Yu et al., 2012) in R software. Moreover, we presented the network of enriched terms, colored by cluster ID, using Metascape.

Associations Between Genes for Sm Proteins and Immune Infiltration

Finally, we explored the associations between the genes for Sm proteins and 24 immune cell types in TCGA-LUAD.
We investigated dendritic cells (DC), activated dendritic cells (aDC), immature dendritic cells (iDC), plasmacytoid dendritic cells (pDC), B cells, CD8^+ T cells, cytotoxic T cells, T cells, T helper cells, T helper 1 (Th1) cells, Th17 cells, Th2 cells, T central memory (Tcm) cells, T effector memory (Tem) cells, T follicular helper (Tfh) cells, T gamma delta (Tgd) cells, regulatory T cells (Treg), eosinophils, macrophages, mast cells, neutrophils, natural killer (NK) cells, NK CD56^bright cells, and NK CD56^dim cells ([87]Bindea et al., 2013). These associations were assessed using the single-sample Gene Set Enrichment Analysis (ssGSEA) algorithm of the R package “GSVA” ([88]Hanzelmann et al., 2013), and lollipop charts were produced using the R package “ggplot2.”

Statistical Analysis

One-way analysis of variance (ANOVA) in the GEPIA 2 database, the log-rank test in Kaplan-Meier survival analysis, and Cox proportional hazards models in univariate and multivariate analyses were used for statistical analysis. R software (v4.0.2) ([89]http://www.r-project.org) was used for analysis in this study. RT-qPCR data were analyzed in GraphPad Prism (v9.0.2) (San Diego, CA, United States) and presented as the mean ± S.D. Student’s t-test was used for statistical comparisons between data pairs where appropriate. *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001 were considered to represent statistical significance.

Results

The mRNA Levels of Sm Proteins Were Significantly Increased in Lung Cancer Tissues and LUAD Cell Lines

As shown in [90]Figure 2A and [91]Table 1, the mRNA levels of SNRPB/D1/D2/E/G were higher in lung cancer than in non-cancer tissues. Consistent with this, the mRNA levels of the genes for Sm proteins were upregulated in LUAD compared to normal lung tissues ([92]Figure 2B). Moreover, the relative mRNA levels of all genes for Sm proteins were higher in the A549 cell line than in the normal bronchial epithelial cell line BEAS-2B ([93]Figure 2C).
The transcription levels of the genes for Sm proteins were also considerably increased in the GEPIA 2 data (TCGA-LUAD and GTEx), the TCGA-LUAD data, and the paired LUAD data ([94]Figures 3A–C), in agreement with the above results.

FIGURE 2. The transcription levels of genes for Sm proteins in lung cancer tissues and lung cell lines. (A) Expression of the genes for Sm proteins in different human cancers (ONCOMINE). (B) Box plots of the expression of the genes for Sm proteins in lung cancer and normal tissues in [97]GSE40791. (C) Bar charts of the mRNA levels of the genes for Sm proteins in the BEAS-2B and A549 cell lines. Student’s t-test was performed to assess statistical significance. A p value of <0.05 was regarded as statistically significant.

TABLE 1. The mRNA levels of Sm proteins were significantly higher in lung cancer than in normal lung tissues (ONCOMINE).

Lung cancer VS normal lung tissue Fold change p-value T-test References