Abstract

   Background: Karyopherin alpha (KPNA), a nuclear transporter, has been
   implicated in the development as well as the progression of many types
   of malignancies. Immune homeostasis is a multilevel system which
   regulated by multiple factors. However, the functional significance of
   the KPNA family in the pathogenesis of lung adenocarcinoma (LUAD) and
   the impact of immune homeostasis are not well characterized.

   Methods: In this study, by integrating the TCGA-LUAD database and
   Masked Somatic Mutation, we first conducted an investigation on the
   expression levels and mutation status of the KPNA family in patients
   with LUAD. Then, we constructed a prognostic model based on clinical
   features and the expression of the KPNA family. We performed functional
   enrichment analysis and constructed a regulatory network utilizing the
   differential genes in high-and low-risk groups. Lastly, we performed
   immune infiltration analysis using CIBERSORT.

   Results: Analysis of TCGA datasets revealed differential expression of
   the KPNA family in LUAD. Kaplan-Meier survival analyses indicated that
   the high expression of KPNA2 and KPNA4 were predictive of inferior
   overall survival (OS). In addition, we constructed a prognostic model
   incorporating clinical factors and the expression level of KPNA4 and
   KPNA5, which accurately predicted 1-year, 3-years, and 5-years survival
   outcomes. Patients in the high-risk group showed a poor prognosis.
   Functional enrichment analysis exhibited remarkable enrichment of
   transcriptional dysregulation in the high-risk group. On the other
   hand, gene set enrichment analysis (GSEA) displayed enrichment of cell
   cycle checkpoints as well as cell cycle mitotic in the high-risk group.
   Finally, analysis of immune infiltration revealed significant
   differences between the high-and low-risk groups. Further, the
   high-risk group was more prone to immune evasion while the inflammatory
   response was strongly associated with the low-risk group.

   Conclusions: the KPNA family-based prognostic model reflects many
   biological aspects of LUAD and provides potential targets for precision
   therapy in LUAD.

   Keywords: lung adenocarcinoma, the KPNA family, immune homeostasis,
   biomarker, potential target

Introduction

   Lung cancer is among the most prevalent tumors and contributes to about
   21% of all cancer-related fatalities ([38]Siegel et al., 2022).
   Non-small cell lung cancer (NSCLC) is the most common subtype of lung
   cancer that represents at least 85% of all cases of lung cancer.
   Histologically, NSCLC can be categorized into three types, namely,
   large cell carcinoma, lung squamous cell carcinoma (LUSC), and lung
   adenocarcinoma (LUAD), ([39]Ko et al., 2018; [40]Majem et al., 2020).
   Currently, the principal treatment modalities for lung cancer include
   targeted therapy, chemotherapy, radiotherapy, surgery, and
   immunotherapy ([41]Catania et al., 2021). Due to the highly malignant
   nature of lung cancer, 5-year survival rates of patients with stage I
   to IIIA range from 14 to 49%, and those for stage IIIB to IV disease
   are <5% ([42]Ko et al., 2018). LUAD is the most common subtype of lung
   cancer, accounting for approximately −40% of all cases ([43]Yin et al.,
   2019). The 5-years overall survival (OS) rate of patients with LUAD is
   less than 20% ([44]Wu et al., 2021). Therefore, exploration of the
   pathogenetic mechanism of LUAD and identification of potential
   therapeutic targets is a key research imperative.

   Karyopherin alpha (KPNA) are nuclear transporters (NTRs) that consist
   of a cluster of basic amino acids, which selectively through the
   nuclear pore complex (NPC) ([45]Hazawa et al., 2020; [46]Miyamoto et
   al., 2020). NPC is composed of 30 nucleoporin (NUP) proteins, which is
   the sole channel between the nucleus and the cytoplasm ([47]Hazawa et
   al., 2020). Active transport of proteins from the cytoplasm to the
   nucleus through NPC usually requires a carrier molecule that identifies
   the transport signal on the cargo, which is called nuclear localization
   signal (NLS) ([48]Miyamoto et al., 2016). The classical mechanism of
   the passage of proteins into the nucleus is as follows: cargoes usually
   possess NLS that is initially detected by KPNA and then exhibits
   interaction with karyopherin b1 (KPNB1), and the created trimeric
   complex diffuses into the nucleus through NPC ([49]Myat et al., 2018).
   The main role of KPNA in nucleocytoplasmic transport is to function as
   adaptor molecules that carry protein cargoes carrying NLS and
   Karyopherin beta (KPNB) from the cytoplasm to the nucleus ([50]Miyamoto
   et al., 2016). In addition to its function in mediating
   nucleocytoplasmic transport, KPNA also has non-transport functions such
   as lamin polymerization, nuclear membrane formation, spindle assembly,
   protein degradation, cytoplasmic retention, cell surface function, gene
   expression, and mRNA-related function ([51]Miyamoto et al., 2016). In
   addition, KPNA is increasingly recognized to have a central in cancer
   growth and progression ([52]Wang et al., 2012; [53]Xu et al., 2021).

   The human type the KPNA family consists of seven subtypes, KPNA1,
   KPNA2, KPNA3, KPNA4, KPNA5, KPNA6, and KPNA7 ([54]Miyamoto et al.,
   2016), and these subtypes exhibit 42–86% homology to one another
   ([55]Oostdyk et al., 2019). The KPNA family can be further divided into
   three subfamilies based on sequence homology: α1, α2, and α3. The α1
   subfamily comprises three members, KPNA1, KPNA5, and KPNA6. α2
   subfamily comprises two members, KPNA2 and KPNA7. α3 subfamily
   comprises two members, KPNA3 and KPNA4 ([56]Miyamoto et al., 2016;
   [57]Myat et al., 2018). KPNA1 was the founding member of the α1
   subfamily. The α2 and α3 subfamilies are known to have evolved through
   duplication of the founding KPNA, and to have developed cell and
   tissue-specific roles which facilitate development and differentiation
   in higher eukaryotes ([58]Oostdyk et al., 2019). Aberrant expression of
   the KPNA family has been detected in multiple cancers, which was
   related to poor prognosis. For example, a study identified high KPNA1
   expression in breast cancer, which was associated with poor overall
   survival (OS) ([59]Tsoi et al., 2021). High KPNA2 expression in
   melanoma was linked to poor OS and disease-free survival (DFS)
   ([60]Yang et al., 2020). High expression of KPNA2 has been identified
   in ovarian carcinoma and cervical cancer, which was associated with
   poor prognosis ([61]Cui et al., 2021; [62]Wang et al., 2021). High
   KPNA4 expression in liver cancer was shown to be associated with poor
   OS in patients ([63]Xu et al., 2021).

   The KPNA family plays varied roles in different types of malignancies.
   For example, KPNA1 was shown to modulate the nuclear import of NCOR2
   splicing variant [64]BQ323636.1 and thus promote tamoxifen resistance
   in breast cancer ([65]Tsoi et al., 2021). The expression of KPNA2 in
   ovarian carcinoma can promote epithelial-mesenchymal transition (EMT),
   migration, and invasion. The expression of KPNA2 in colorectal cancer
   tissue was correlated with stage, differentiation status, and
   metastasis. Overexpression of KPNA2 indicated a poor prognosis in
   patients ([66]Han and Wang, 2020). KPNA3 was shown to confer sorafenib
   resistance via TWIST-regulated EMT in advanced liver cancer ([67]Hu et
   al., 2019). The expression of KPNA4 in prostate cancer was shown to
   promote metastasis through miR-708-KPNA4-TNF axes ([68]Yang et al.,
   2017), and KPNA4 was found to enhance cancer cell proliferation and
   cisplatin resistance in cutaneous squamous cell carcinoma ([69]Zhang et
   al., 2019). KPNA5, KPNA6, and KPNA1 binding regions can promote the
   proliferation of breast cancer cells ([70]Kim et al., 2015). KPNA7
   promotes cell growth and anchorage-independent growth, and reduces
   autophagy of pancreatic cancer cells ([71]Laurila et al., 2014).
   Previous studies have reported overexpression of KPNA4 in LUAD and
   identified it as a potential key driver of the malignant phenotype
   ([72]Hu et al., 2020). Nonetheless, the functional role and underlying
   mechanism of the KPNA family in LUAD are poorly understood.

   In this study, we used the TCGA-LUAD database and Masked Somatic
   Mutation to evaluate the expression, mutation status, and prognostic
   value of the KPNA family in LUAD. We built a prognostic model for
   individuals on the basis of the clinical features and the expression of
   the KPNA family and analyzed the differences in mutational signature in
   the two risk groups. Next, we did a differential expression analysis,
   Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment
   analysis, and Gene Ontology (GO) enrichment analysis in the two risk
   groups. Finally, we performed the analysis of immune infiltration in
   these groups. This is the first investigation to examine the function
   of the KPNA family in LUAD, as per our best knowledge. Our findings may
   avail both potential biomarkers and therapeutic targets against LUAD.

Materials and methods

Data acquisition and pretreatment

   TCGA-LUAD expression profile data were acquired from UCSC Xena
   ([73]http://xena.ucsc.edu/); the downloaded data type was count, and
   the count values were transformed to transcript per million (TPM)
   values in advance. Transcriptomic data from 594 patients in TCGA-LUAD,
   535 tumor samples, and 59 normal samples were included in the current
   analysis. In addition, we selected “Masked Somatic Mutation” data as
   the somatic mutation data (n = 561) of LUAD patients from TCGA GDC
   ([74]https://portal.gdc.cancer.gov/), processed these data using
   VarScan, and performed an analysis of somatic mutation using the
   maftools R package ([75]Mayakonda et al., 2018). The copy number
   information (n = 531) of patients in TCGA-LUAD was downloaded in UCSC
   Xena, which assessed gene copy number variation (CNV).

   In this analysis, we used the clinical information of 594 patients from
   TCGA-LUAD, including age, sex, survival status, and TNM stage. We
   matched patient IDs in the clinical database with the transcriptomic
   data as well as somatic mutation data above and removed samples with
   unavailable transcriptomic data and somatic mutation data.

   The KPNA family (KPNA1, KPNA2, KPNA3, KPNA4, KPNA5, KPNA6, and KPNA7)
   expression profiles, mutation data, and CNV data were extracted via R
   languages for subsequent analysis.

Differential expression analyses

   Based on information in the TCGA-LUAD datasets, we divided the samples
   into tumor samples and normal samples and screened out differentially
   expressed genes (DEGs) utilizing the DESeq2 package. The screening
   criteria were log2 (fold change) > 1.0 and p-value < 0.05 ([76]Love et
   al., 2014). Subsequently, differential expression analysis was
   performed using the DESeq2 package to determine the expression profiles
   of low-and high-risk groups. The screening criteria were log2 (fold
   change) > 2.0 and adj. p-value < 0.05. Volcano plots were plotted using
   package ggplot2, heat maps were drawn using package pheatmap to
   demonstrate the differential gene expression.

Establishment of the prognostic model

   Kaplan-Meier method in conjunction with the log-rank test was utilized
   for survival analysis to establish the link between high/low expression
   of the KPNA family genes and OS.

   To determine the predictive power of the KPNA family for the prognosis
   of LUAD individuals, we performed univariate Cox regression analysis,
   LASSO regression analysis, and multivariate Cox regression analysis
   based on the TCGA-LUAD to identify independent prognostic factors, and
   created a prognostic model. First, univariate Cox proportional
   regression analysis was utilized to investigate the link between the
   expression levels of genes in the KPNA family and OS; genes with an
   adjusted p-value < 0.1 were retained. Subsequently, to eliminate the
   effect of multicollinearity, we used the LASSO algorithm to screen
   meaningful variables in univariate Cox regression analysis. Then we
   performed a stepwise regression analysis using multivariate Cox
   regression to discover independent prognostic factors. Finally,
   optimized gene expression and correlation estimated Cox regression
   coefficients were taken into consideration to generate a risk score
   formula: risk score = (exp-Gene1*coef-Gene1) +
   (exp-Gene2*coef-Gene2)+……+(exp-Gene*coef-Gene).

   The participants were then classified into the aforementioned two risk
   groups as per the given risk score. Kaplan-Meier analysis and log-rank
   test were performed to compare OS in the two groups applying the
   survival package. Additionally, receiver operating characteristic (ROC)
   curve analysis evaluated the survival predictive value of the risk
   score. The area under ROC curves (AUC) values were derived utilizing
   the R package timeROC.

   After detection of independent prognostic factors, we combined clinical
   information such as age, sex, stage, and other factors to establish a
   nomogram for prognostic assessment of LUAD patients. In particular, we
   evaluated the prognostic outcomes at 1, 3, and 5 years,
   correspondingly. The reliability of the model was assessed by plotting
   the calibration curve.

Construct functional enrichment analysis and regulatory network

   We did GO enrichment analysis as well as KEGG pathway enrichment
   analysis of the differentially expressed genes of two risk groups
   utilizing the clusterProfiler R package and R package GOPlot ([77]Ogata
   et al., 1999; [78]Ashburner et al., 2000; [79]Yu et al., 2012). GSEA
   was instrumental in developing the gene expression matrix with
   clusterProfiler R package; “c2. cp.all.v7.0. symbols” was chosen as a
   reference gene set. In addition, false discovery rate (FDR) < 0.25 with
   p < 0.05 denotes substantial enrichment ([80]Suarez-Farinas et al.,
   2010). Based on the “c2. cp.all.v7.0. symbols” gene set, we utilized
   the R package Gene set variation analysis (GSVA) on the basis of the
   gene expression matrix for each sample, calculated the related pathway
   scores, and generated the Heat maps using the ssGSEA method
   ([81]Hänzelmann et al., 2013).

   Using the STRING protein-protein interactions database, we evaluated
   the link between the hub genes and their interactions and exported the
   results; core genes were thoroughly screened with the CytoHubba Plugin
   in Cytoscape ([82]Chin et al., 2014).

   In addition, hub genes-miRNA regulation analysis and transcription
   factors-target genes regulatory network analysis were performed with
   NetworkAnalyst ([83]http://www.networkanalyst.ca/NetworkAnalyst).
   Results were finally exported from Networkanalyst, and miRNA-hub genes
   and transcription factors-hub genes regulatory network plotted using
   Cytoscape software.

Analysis of immune cell infiltration

   We performed deconvolution with transcriptome matrix using the
   CIBERSORT algorithm (which is premised on the linear support vector
   regression principle) and assessed the cellular composition and the
   abundance of immune cells in the mixed infiltrate ([84]Newman et al.,
   2015). Gene expression matrices data were uploaded onto the CIBERSORT,
   and after filtering the outputs (p-value < 0.05), we obtained the
   matrix of infiltrating immune cells. Bar graphs were plotted using R
   package ggplot2 to demonstrate the distributions of 22 types of
   infiltrating immune cells in every sample. In addition, we studied the
   correlation of two risk groups with immune and inflammation by
   extracting HLA family-related genes (MHC class I and II) and
   complement-related genes.

Statistical analysis

   The R software (version 4.0.2) performed all the analyses and data
   processing. Between-group variations with respect to normally
   distributed continuous variables were investigated with the aid of the
   Student’s t-test, whereas those with respect to non-normally
   distributed variables were investigated utilizing the Mann-Whitney U
   test (Wilcoxon’s rank-sum test). Additionally, for between-group
   differences with respect to categorical variables, the Chi-squared test
   or Fisher exact test was used. Correlation between different genes was
   assessed using Spearman correlation analysis. Kaplan-Meier survival
   analyses were done through the utilization of the R package survival
   and the between-group differences in survival outcomes were assessed
   using the log‐rank test. Univariate as well as multivariate Cox
   regression analyses were utilized to ascertain the independent
   prognostic factors. Two-sided p values < 0.05 denoted statistical
   significance for all analyses.

Results

Aberration of the KPNA family in TCGA-LUAD

   First, we extracted the KPNA family from the TCGA-LUAD datasets, which
   included KPNA1, KPNA2, KPNA3, KPNA4, KPNA5, KPNA6, and KPNA7, and the
   details are shown in [85]Supplementary Table S1. We plotted the
   heatmaps of the KPNA family and found a non-uniform trend in their
   expression with no significant correlations between them ([86]Figures
   1A,B). We identified differential expression of KPNA2, KPNA3, KPNA5,
   KPNA6, and KPNA7. Compared with normal tissue, KPNA2, KPNA6, and KPNA7
   were highly expressed in LUAD, while KPNA3 and KPNA5 expression were
   decreased in LUAD ([87]Figure 1C). Subsequently, we plotted ROC curves,
   which clearly showed the discriminative value of these genes in
   differentiating between tumor samples and non-tumor samples. The AUC
   values of KPNA2, KPNA3, KPNA5, and KPNA7 were >0.7, which indicated a
   promising discriminating ability. In addition, we did Kaplan-Meier
   survival analysis to identify genes that affect the prognosis in LUAD.
   The expression of KPNA2 and KPNA4 was found to affect the OS of LUAD
   individuals, and the patients with high expression of KPNA2 and KPNA4
   showed a much worse prognosis ([88]Figures 1D–J).

FIGURE 1.

   [89]FIGURE 1
   [90]Open in a new tab

   Expression patterns of the KPNA family in TCGA-LUAD (A) Heat maps of
   gene expressions of the KPNA family (B) Heat map of gene-gene
   correlations in the KPNA family (C) Boxplots of the KPNA family genes
   between the normal and tumor tissues (D–J) ROC curve showing group
   differences and the Kapla-Meier curves showing survival differences. *
   represents p < 0.05; ** represents p < 0.01; *** represents p < 0.001;
   ns represents no significant difference (p > 0.05).

   The panorama of gene mutations was displayed in TCGA-LUAD datasets;
   missense mutations accounted for the majority of mutations,
   single-nucleotide polymorphisms (SNPs) occurred more frequently than
   deletions or insertions, and C>A was most frequently identified in
   single nucleotide variants (SNVs) among patients with LUAD
   ([91]Supplementary Figures S1A,B). Subsequently, we extracted the KPNA
   family information and analyzed the mutational signatures. The
   frequency of overall the KPNA family mutations was low, and the
   mutation types were primarily missense mutations ([92]Figure 2A). We
   plotted the lollipop diagrams according to mutational signatures
   ([93]Figures 2B–G). In addition, we analyzed CNV changes according to
   the information on the CNV of the KPNA family. As shown in [94]Figure
   2H, the copy number amplifications of KPNA1, KPNA2, KPNA4, KPNA6, and
   KPNA7 in total samples were higher than the copy number deletions, but
   the copy number amplifications of KPNA3 and KPNA5 were lower than the
   copy number deletions.

FIGURE 2.

   [95]FIGURE 2
   [96]Open in a new tab

   Mutations and copy number variations (CNV) of the KPNA family (A)
   Mutation frequency of the KPNA family (B–G) Mutated sites of the KPNA
   family (H) Copy number alterations of the KPNA family; red dots
   indicate that amplifications are greater than deletions while blue dots
   indicate that deletions are greater than amplifications.

Creation of prognostic model based on the KPNA family

   We conducted a univariate Cox regression analysis to detect the KPNA
   family genes linked to the prognosis of LUAD patients. Four genes were
   discovered to be linked to survival. To further screen the genes
   associated with prognosis, we screened the genes using LASSO regression
   analysis and Cox regression analysis and eventually identified KPNA4
   and KPNA5 as independent prognostic factors ([97]Figures 3A–C). As per
   their expression values and regression coefficients, we derived the
   risk score for LUAD specimens and plotted the heatmaps to visualize the
   distribution of samples in the two risk groups ([98]Figure 3D). We
   conducted a survival analysis of LUAD individuals utilizing their risk
   score-based grouping; the findings affirmed that patients in the
   high-risk group experienced a poor prognosis ([99]Figure 4A). ROC curve
   analysis indicated good predictive efficacy of risk score-based
   grouping for 1-year, 3-years, and 5-years survival outcomes (1-year AUC
   = 0.615, 3-years AUC = 0.645, 5-years AUC = 0.629) ([100]Figure 4B).

FIGURE 3.

   [101]FIGURE 3
   [102]Open in a new tab

   Independent prognostic factors of the KPNA family (A) Forest plot of
   univariate Cox regression analysis of the KPNA family (B) Lasso
   regression model of the KPNA family (C) Forest plot of multivariate Cox
   regression analysis of the KPNA family (D) Calculated risk score and
   the heat maps of risk factors based on the findings of multivariate Cox
   regression analysis.

FIGURE 4.

   [103]FIGURE 4
   [104]Open in a new tab

   Evaluation of prognostic model and correlation analysis of clinical
   features (A) Analysis of prognosis in the two risk groups (B) ROC
   curves displaying the predictive value of the models for 1-year,
   3-year, and 5-year survival outcomes (C,D) Prognostic nomogram and
   calibration curves according to clinical factors and the expression of
   KPNA4 and KPNA5 (E–I) Differences in clinical features between the two
   risk groups.

   Subsequently, we constructed a nomogram incorporating age, sex,
   clinical stage, and the expression level of KPNA4 and KPNA5 for
   prognostic assessment of LUAD patients ([105]Figure 4C). Through
   calibration curves, we found that the prognostic model for 1-year,
   3-years, and 5-years had high reliability ([106]Figure 4D).
   Additionally, we performed risk stratification based on different
   factors including age, sex, clinical stage, survival status, and immune
   subtypes. The results affirmed that there were no remarkable
   differences between the two risk groups with respect to age or sex;
   however, there were substantial differences between the two risk groups
   in terms of clinical stage and immune subtypes ([107]Figures 4E–I).

Comparison of tumor mutation burden and microsatellite instability utilizing
risk score

   We further compared the mutational signatures between the two groups
   utilizing the risk score. There were no remarkable differences in MSI
   scores between the two risk groups, but the high-risk group had greater
   TMB scores in contrast to the low-risk group ([108]Figures 5A,B).
   Subsequently, we analyzed the top 30 mutant genes of the two risk
   groups and ascertained variations in genetic mutations between them
   ([109]Figure 5C).

FIGURE 5.

   [110]FIGURE 5
   [111]Open in a new tab

   Differences in mutation signatures between high-and low-risk groups (A)
   Differences in MSI score between high- and low-risk groups (B)
   Differences in TMB score between high-and low-risk groups (C)
   Differences in the top 30 mutant genes between high-and low-risk
   groups. *represents p < 0.05; ** represents p < 0.01; *** represents p
   < 0.001; ns represents no significant difference (p > 0.05).

Differential expression analysis and functional enrichment analysis of
high-and low-risk groups

   According to the low-and high-risk groups, we did a differential
   analysis of all genes within the expression profiles in the TCGA cohort
   using the volcano plots and heat maps ([112]Figures 6A,B). Pathway
   enrichment analysis, as well as GO enrichment analysis, were performed
   on DEGs separately ([113]Supplementary Tables S2, S3). GO enrichment
   analysis included molecular function (MF), biological process (BP), and
   cellular component (CC). The key DEGs enriched the following principal
   biological processes: epithelium development, cornification, tissue
   development, and morphogenesis of a branching epithelium, morphogenesis
   of a branching structure; the principal aggregation of cellular
   components was as follows: extracellular region, cornified envelope,
   and chromatin. The principal enriched molecular functions were as
   follows: DNA-binding transcription factor activity, sequence-specific
   double-stranded DNA binding, and amino acid sodium symporter activity
   ([114]Figures 6C,D). The pathway enrichment was mainly enriched in
   Neuroactive ligand-receptor interaction, Salivary secretion, Galactose
   metabolism, Vascular smooth muscle contraction, and Transcriptional
   dysregulation in cancers ([115]Figures 6E,F).

FIGURE 6.

   [116]FIGURE 6
   [117]Open in a new tab

   Differential expression analysis and functional enrichment analysis of
   the two risk groups (A,B) Heat map and volcano plot of differential
   expression in high-and low-risk groups (C) GO analysis of the two risk
   groups. Outer circle, GO term; cylindrical of inner circle, number of
   enriched genes; yellow, BP (Biological Process); blue, MF (Molecular
   Function); green, CC (Cellular Component) (D) Top 20 of BP, MF, CC (E)
   KEGG analysis of high-and low-risk groups. Outer circle, the number of
   KEGG pathways; inner circle, number of enriched genes; yellow,
   metabolism; blue, organismal systems; green, human diseases; purple,
   environmental information processing (F) Top 20 of KEGG pathways
   enrichment.

   Subsequently, we constructed PPI networks by STRING databases to
   identify the hub genes and reveal their potential interactions. First
   of all, we built protein interaction networks by DEGs and the minimum
   score of interactions was set to 0.7 ([118]Supplementary Figure S2A).
   We additionally determined the most relevant genes in the PPI networks
   by the Cytohubba plugin and 15 genes were regarded as hub genes:
   SPANXD, MAGEA4, MAGEC1, SPANXC, CTAG2, MAGEA10, CT45A1, MAGEA1, MAGEA1,
   MAGEC2, SPRR2D, KRT6A, KRT14, CASP14, and SPRR2E ([119]Supplementary
   Figure S2B).

   We also predicted the potential miRNAs which regulate the 15 hub genes
   by the Networkanalyst databases; the final subnetwork contained 49
   nodes (i.e., miRNA) and 11 seeds (i.e., matched hub genes)
   ([120]Supplementary Figure S2C). Similarly, we obtained the
   transcription factors-hub genes regulatory networks based on the JASPAR
   databases, the final contained 14 seeds (i.e., hub genes) and 46 nodes
   (i.e., transcription factors) ([121]Supplementary Figure S2D).

   Subsequently, we carried out GSEA between the two risk groups to
   identify remarkably enriched pathways (p-value < 0.05)
   ([122]Supplementary Table S4). The GSEA results showed enrichment of
   cell cycle checkpoints, cell cycle mitotic, retinoblastoma gene in
   cancer, mitotic metaphase, and anaphase in the high-risk group. CD22
   mediated BCR regulation, heme scavenging from plasma, asthma, and
   antigen activates B cell receptor BCR resulting in the generation of
   second messengers were enriched in the low-risk group ([123]Figures
   7A,B). GSVA findings ascertained that there were variations in a total
   of six gene sets between the two risk groups, according to the
   screening of the hallmark gene sets, for example, angiogenesis, apical
   surface, and apical junction ([124]Figure 7C).

FIGURE 7.

   [125]FIGURE 7
   [126]Open in a new tab

   GSVA and GSEA analysis of high- and low-risk groups (A) Main enriched
   pathways in the high-risk group (B) Main enriched pathways in the
   low-risk group (C) Differential gene sets based on the hallmark gene
   sets.

Analysis of immune infiltration in the high-and low-risk groups

   After ranking based on the risk score, the immune cell infiltration for
   each sample in the TCGA LUAD is shown in the bar graphs. The
   infiltration scores and correlation analysis between the 22 immune
   cells were obtained by the CIBERSORT algorithm, respectively
   ([127]Figures 8A,B). We further evaluated the differences in immune
   cell infiltrates in the two risk groups. As shown in [128]Figure 8C,
   the infiltration scores for naive B cells, plasma cells, CD4^+ T cells
   memory resting T cells, and resting dendritic cells were lower in the
   high-risk group than in the low-risk group; however, the infiltration
   scores for CD8^+ T cells and M0 Macrophages were greater in the
   high-risk group. We computed the correlation of the expression level of
   KPNA4 and KPNA5 and various types of immune cells by Spearman’s
   correlation analysis ([129]Supplementary Figures S3, S4). Additionally,
   we combined the genes related to immunity and inflammation (for
   example, HLA family and complement-related genes), and analyzed the
   differences in the two risk groups. We found that the MHC-II family was
   decreased in the high-risk group, and the main function of the MHC-II
   gene is antigen-presenting. This suggested that the antigen-presenting
   function might be affected in the high-risk group ([130]Figures 8D,E).
   Additionally, there were variations of complement-related genes in both
   groups, which illustrated a close association with inflammation
   ([131]Figures 8F,G).

FIGURE 8.

   [132]FIGURE 8
   [133]Open in a new tab

   Immune infiltration in the two risk groups (A) The panorama of 22
   immune cell infiltrates (B) Correlation analyses of 22 immune cell
   types (C) Differences in the immune cell infiltration between high-and
   low-risk groups (D,E) Differential expression of HLA gene family
   between the two risk groups (F,G) Differential expression of
   complement-related genes in the two risk groups. * represents p < 0.05;
   ** represents p < 0.01; *** represents p < 0.001; ns represents no
   significant difference (p > 0.05).

Discussion

   Due to its highly malignant nature and a paucity of methods for early
   diagnosis, LUAD is linked to high incidence as well as mortality rates.
   Therefore, recognition of particular principal molecular pathways and
   extensively sensitive, reliable biomarkers is required to improve the
   early diagnosis and survival outcomes of LUAD patients. Previous
   investigations have demonstrated the relationship of the KPNA family
   genes with tumor progression ([134]Wang et al., 2012; [135]Xu et al.,
   2021). However, there is a lack of in-depth characterization of the
   role of the KPNA family in LUAD. This is the first investigation to
   develop a prognostic model premised on the expression of the KPNA
   family genes, as per our best knowledge. Enrichment analysis revealed
   the involvement of the KPNA family in transcription, cell cycle, immune
   infiltration, and inflammatory response, which are tumor-related
   processes. Thus, our findings may be useful in the development of
   future investigations to determine patient prognosis and to recognize
   candidate therapeutic targets in LUAD individuals.

   We explored the connection between the KPNA family expression and the
   OS of patients. High expression of KPNA2 and KPNA4 were predictive of
   inferior OS. KPNA4 has previously been identified as a tumor promoter
   gene in some cancers ([136]Wang et al., 2015). For example, high
   expression of KPNA4 in cutaneous squamous cell carcinoma was discovered
   to enhance cancer cell proliferation as well as cisplatin resistance
   ([137]Zhang et al., 2019). Inhibition of KPNA4 attenuated prostate
   cancer metastasis ([138]Yang et al., 2017). Regulating upstream
   modulators facilitates angiogenesis as well as progression in lung
   cancer by targeting the miR‐340‐5p/KPNA4 axis ([139]Li et al., 2020). A
   previous study identified overexpression of KPNA2 in NSCLC, and KPNA2
   was identified as a potential biomarker for NSCLC ([140]Wang et al.,
   2011). These studies support our conclusions that KPNA2 and KPNA4 may
   be useful prognostic biomarkers for LUAD patients.

   KEGG enrichment analysis showed transcriptional dysregulation in
   cancers enriched with DEGs in the high-risk group. Transcription
   factors serve as a group of sequence-specific binding proteins that can
   activate or suppress transcription through transactivation or
   transrepression domains. Transcription factors have been linked to the
   pathogenesis of a variety of human diseases (including cancers); these
   account for approximately 20% of all oncogenes identified so far
   ([141]Lambert et al., 2018). Previous literature reports have displayed
   the involvement of transcription factors in regulating cell
   proliferation, differentiation, apoptosis, and their remarkable
   function in the onset and development of tumors ([142]Sever and Brugge,
   2015). Dysregulation of principal transcriptional modulators not only
   defines the cancer phenotype but is important for its development
   ([143]Gonda and Ramsay, 2015). Our results suggest that the KPNA family
   may influence the transcriptional dysregulation in LUAD. Therefore, it
   is important to study the mechanism of transcriptional dysregulation of
   the KPNA family in LUAD.

   In this study, we found that cell cycle checkpoints and cell cycle
   mitotic were enriched in the high-risk group. Cell cycle checkpoints
   are biochemical signaling mechanisms that detect DNA damage or
   chromosomal dysfunction and trigger a series of sophisticated cellular
   repair responses ([144]Wu et al., 2005). Typically, cell cycle
   checkpoints are disrupted in most malignancies and serve a vital
   function in maintaining genomic integrity and inactivating checkpoint
   genes ([145]Zheng et al., 2010). In previous research, impaired
   function of cell cycle checkpoints was found to raise the risk of lung
   cancer ([146]Wu et al., 2005). Mitosis is the critical stage of the
   cell cycle, involving the passage of one of the sister chromatids to
   each of the daughter cells. Therefore, precise regulation of mitosis is
   essential for the maintenance of chromosome stability in human cells
   ([147]Pines, 2006). Aberrant mitotic progression leads to chromosomal
   missegregation, contributing to carcinogenesis ([148]Kops et al., 2005;
   [149]Holland and Cleveland, 2009; [150]Schvartzman et al., 2010). Our
   study identified significant enrichment of these two pathways in the
   high-risk group, which additionally validated the accuracy of the risk
   prediction model constructed in this study.

   The tumor microenvironment (TME) is a heterogeneous system consisting
   of immune cells, cancer cells, and an extracellular matrix
   ([151]Hoadley et al., 2014; [152]Warrick et al., 2019). The roles for
   immune homeostasis similar to a buffering system. While the immune
   system is constantly stimulated and dampened, the system is maintained
   at a relatively stable steady state ([153]da Gama Duarte et al., 2018).
   In this study, the infiltration scores for naive B cells, plasma cells,
   CD4^+ T cells memory resting T cells, and resting dendritic cells were
   lowered in the high-risk groups than in the low-risk groups, but the
   infiltration scores for CD8^+ T cells, M0 Macrophages were elevated in
   the high-risk group. This could lead to different responses to
   immunotherapies in the two risk groups. The purpose of immunotherapy is
   to alter the environment, and thereby, the equilibrium of the response.
   Therefore, the sensitivity of immunotherapy in the two risk groups also
   remains unexplored.

   Immune evasion is a significant feature of cancer, and inhibition of
   HLA gene levels may lead to attenuated antigen presentation,
   facilitating immune evasion ([154]McGranahan et al., 2017). HLA family
   genes were decreased in the high-risk group, which suggests that the
   high-risk group was more prone to immune evasion and thus have a worse
   prognosis. These results are consistent with our survival analysis.
   Additionally, we studied the expression of inflammation-related genes
   in the two risk groups and captured the down-regulation of
   complement-related genes in the high-risk group. These findings suggest
   that inflammation was strongly associated with the low-risk group.

   This is the first-ever report on the association of the KPNA family
   expression with survival outcomes of patients with LUAD. Therefore, the
   KPNA family may potentially serve as a novel prognostic biomarker in
   patients with LUAD and provide novel targets for LUAD immunotherapy.
   However, this was bioinformatics research and most of the findings were
   generated from public databases and bioinformatics analysis. Further in
   vitro and in vivo experiments are required to validate our findings.

   In conclusion, we found that KPNA2 and KPNA4 are potential prognostic
   markers. We created a prognostic model on the basis of the expression
   level of the KPNA family, which was shown to accurately predict
   prognosis. This prognostic model reflects many aspects of LUAD biology
   and provides new insights into precision therapy for LUAD. In the
   future, a lot of basic experiments need to be carried out to validate
   the applicability and accuracy of this model.

Acknowledgments