Abstract

Background

   The highest rate of cancer-related deaths worldwide is from lung
   adenocarcinoma (LUAD) annually. Metabolism was associated with
   tumorigenesis and cancer development. Metabolic-related genes may be
   important biomarkers and metabolic therapeutic targets for LUAD.

Materials and Methods

   In this study, the gleaned cohort included LUAD RNA-SEQ data from the
   Cancer Genome Atlas (TCGA) and corresponding clinical data (n = 445).
   The training cohort was utilized to model construction, and data from
   the Gene Expression Omnibus (GEO, [38]GSE30219 cohort, n = 83; GEO,
   [39]GSE72094, n = 393) were regarded as a testing cohort and utilized
   for validation. First, we used a lasso-penalized Cox regression
   analysis to build a new metabolic-related signature for predicting the
   prognosis of LUAD patients. Next, we verified the metabolic gene model
   by survival analysis, C-index, receiver operating characteristic (ROC)
   analysis. Univariate and multivariate Cox regression analyses were
   utilized to verify the gene signature as an independent prognostic
   factor. Finally, we constructed a nomogram and performed gene set
   enrichment analysis to facilitate subsequent clinical applications and
   molecular mechanism analysis.

Result

   Patients with higher risk scores showed significantly associated with
   poorer survival. We also verified the signature can work as an
   independent prognostic factor for LUAD survival. The nomogram showed
   better clinical application performance for LUAD patient prognostic
   prediction. Finally, KEGG and GO pathways enrichment analyses suggested
   several especially enriched pathways, which may be helpful for us
   investigative the underlying mechanisms.

   Keywords: Metabolic, Prognostic, Dignature, Lung adenocarcinoma

Introduction

   Lung cancer (LC) is one of the most common cancers worldwide and the
   main cause of cancer-related mortality ([40]Bray et al., 2018;
   [41]Torre et al., 2015). Non-small cell lung cancer (NSCLC) accounts
   for 85% of all LCs. The 5-year survival rate after the diagnosis of LC
   is 15.6% ([42]Nanavaty, Alvarez & Alberts, 2014). In NSCLC, lung
   adenocarcinoma (LUAD) is the major histological subtype ([43]Balzer et
   al., 2018), and the recurrence rate and mortality rate remain high
   despite recent advances in surgical methods, neoadjuvant therapies, and
   immunotherapies.

   As bioinformatics advances in oncology research, researchers can
   utilize access public resources from multiple public databases such as
   The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO),
   as well as Surveillance and Epidemiology and End Results (SEER)
   ([44]Doll, Rademaker & Sosa, 2018; [45]Li et al., 2018; [46]Liu et al.,
   2019). Bioinformatics has contributed to determining the prognosis and
   treatment of LC ([47]Parikh, 2019). There have been numerous studies on
   gene prognosis models that could contribute to the selection of LC
   treatment methods and the prediction of survival after LC surgery; for
   example, a prognostic signature containing six genes (RRAGB, RSPH9,
   RPS6KL1, RXFP1, RRM2, and RTL) to evaluate the prognosis of NSCLC
   patients ([48]Xie & Xie, 2019). In another article on prognostic
   characteristics of LUAD, a prognostic model based on 20 genes was
   developed to predict patient overall survival (OS) ([49]Zhao, Li &
   Tian, 2018). These prognostic signatures all have better clinical
   application performance.

   Metabolic changes in LC are the key to diagnosis, and metabolic
   remodelling is a critical factor in tumorigenesis and development
   ([50]Chen et al., 2019b). Metabolic remodelling not only provides
   substances and energy for the survival and proliferation of tumour
   cells but also protects tumour cells so that they can survive,
   proliferate, and metastasize in harsh microenvironments ([51]Hensley et
   al., 2016). Therefore, changes in metabolism affect tumour prognosis
   and treatment effects ([52]Chang, Fang & Gu, 2020; [53]Cruz-Bermúdez et
   al., 2019). To explore the correlation between metabolic genes and the
   prognosis of LUAD patients, we utilized the TCGA-LUAD database to build
   a prognostic signature of multiple metabolic-related genes and
   validated it in GEO data sets for LUAD patients. We conducted this
   study, and our findings suggested that metabolic-related gene
   signatures may be a prognostic marker for LUAD patients.

Materials and Methods

Data collections

   First, we got clinical information for patients with LUAD from TCGA
   ([54]https://portal.gdc.cancer.gov/). It included 497 LUAD patients
   with mRNA expression profiles and clinical follow up information was
   available for our study. The number of obtainable clinical cases for
   the selected subjects was 445 after removing 52 patient samples from
   the study due to a lack of clinical information (such as survival time,
   T stage, N stage, and so on) or survival time less than 30 days
   (avoiding non-cancer-related death samples). A total of 445 LUAD
   patients and their information were utilized to build a training cohort
   for identifying prognostic metabolic-related genes and building a
   prognostic risk model. Next, we downloaded the LUAD gene expression
   data from GEO ([55]https://www.ncbi.nlm.nih.gov/geo/) in two accessed
   datasets [56]GSE30219 and [57]GSE72094. Removing other cancer
   pathological types, such as lung squamous cell carcinoma, 83 LUAD
   samples, and 393 LUAD samples were utilized to build a testing cohort
   for validating the prognostic value of the TCGA-LUAD prognostic risk
   model (up to April 01, 2020).

Identification of metabolic-related genes in TCGA-LUAD

   First, we obtained 944 hub metabolism-related genes from the
   intersection of the MSigDB database
   ([58]https://www.gsea-msigdb.org/gsea/msigdb) and TCGA-LUAD. Then, a
   Wilcoxon signed-rank test was performed on normal and cancer tissues in
   the training cohort by “limma” R package (—log FC—>  0.5; FC: fold
   change; a false discovery rate (FDR) P <  0.05) ([59]Diboun et al.,
   2006). The heatmap was plotted by the ”pheatmap” R package and we
   obtained 336 metabolic-related differentially expressed genes. Second,
   after univariate Cox regression analysis, 59 metabolic genes were
   retained (P <  0.05) by using the method that the correlation between
   expression values of metabolic genes and survival of samples in the
   training cohort. Last, we performed lasso-penalized Cox regression
   analysis to identify more important metabolic genes for OS prediction
   through the “glmnet, survival” R package ([60]Zhang et al., 2019). We
   obtained twenty-three metabolic-related genes for risk model building.
   The three-step screening method was robust and performed via Perl
   ([61]https://www.perl.org/) and R (version 3.6.1).

Building the prognostic metabolic gene signature

   To construct the prognostic model, we utilized lasso-penalized Cox
   regression analysis to select the prognostic metabolic-related gene
   ([62]Tibshirani, 1997). We obtained a risk score for each patient by
   their coefficient. Risk score= (Coef AKR1A 1 × expression of AKR1A1) +
   (Coef NT5E × expression of NT5E) + ……(Coef TYMS × expression of TYMS)
   ([63]Liu et al., 2020). R software packages “survival” and “survminer”
   were used to calculate the optimal cut-off value for risk scores and
   plot Kaplan–Meier survival curves ([64]Chan et al., 2018). Using the
   median as a point of differentiation, we differentiated patients into
   two groups: high-risk and low-risk. The R package “survivalROC” was
   used to plot time-dependent ROC curves for predicting the diagnostic
   value ([65]Heagerty, Lumley & Pepe, 2000). The concordance index
   (C-index) was used to evaluate the predictive ability of the risk
   model.

Verification of the prognostic signature as an independent risk factor and
correlation analysis between the clinical characteristics and risk scores

   Patients with complete information on the corresponding clinical data
   were available for univariate and multivariate analysis. P <  0.05
   symbolizes statistically significant ([66]Liu et al., 2020). We
   performing the Student’s t-test to verify the correlation between
   clinical characteristics and risk scores.

Construction and verification of the predictive nomogram

   The nomogram was built by the “rms” R package according to training
   cohort data ([67]Iasonos et al., 2008). In our study, the tumour-node
   metastasis (TNM) model and the prognostic signature were integrated
   into the predictive nomogram. We performed the calibration plot and
   C-index to investigate the predictive ability of the nomogram. The
   calibration plot was used to assess whether the numerical value of the
   predicted value of the model and the probability of the occurrence of
   the ending event were consistent ([68]Fenlon et al., 2018). We used
   C-index to assess the predictive ability of the nomogram. It estimates
   the probability that the predicted result is consistent with the actual
   observed result. We compared the TNM model, prognostic model, and the
   nomogram model through ROC analysis and C-index ([69]Liu et al., 2020).
   Next, we verified the nomogram by C-index, ROC analysis, and
   calibration plot in the testing cohort data.

KEGG and GO pathways enrichment analyses

   To study the biological role of mRNA markers in LUAD patients, we
   utilized the Kyoto Encyclopedia of Genes and Genomics (KEGG) and Gene
   Ontology (GO) pathway enrichment analysis to explore which pathways the
   differentially expressed genes were mainly enriched in (up to April 11,
   2020). Gene Set Enrichment Analysis (GESA)
   ([70]https://www.gsea-msigdb.org/gsea/index.jsp) was utilized to find
   enriched terms in the training cohort or testing cohort
   ([71]Subramanian et al., 2005). We choose “c2. cp. kegg. v6.2. symbols.
   gmt. gene sets” as a reference gene set from the MSigDB database
   ([72]https://www.gsea-msigdb.org/gsea/msigdb/). P <  0.05, FDR q-value
   <  0.25, and normalized enrichment score —NES— ≥1 suggested
   statistically significant. We plotted the results by “ggplot2,
   gridExtra, grid, plyr” R package. All operations are carried out in
   GSEA_4.0.3. GO pathway enrichment analysis of metabolic genes was
   performed by “clusterProfiler, org.Hs.eg.db, plot, ggplot2” in R p
   ackage ([73]Pathan et al., 2015).

Statistical analyses

   logFC: logarithmic value of FC; positive/negative logFC indicates the
   logarithmic foldness of upregulation/downregulation; —log FC—>0.5
   indicates multiple differences in the gene expression greater than 0.5
   between normal tissues and cancer tissues. A coefficient is a number
   that expresses a measurement of a particular quality of a substance or
   object under specified conditions ([74]Bøvelstad et al., 2007). The
   risk score was calculated according to the formula:
   [MATH: <mstyle displaystyle="true"><mi
   mathvariant="normal">Risk score</mi><mfenced separators="" open="("
   close=")"><mrow><mi
   mathvariant="normal">patients</mi></mrow></mfenced><mo>=</mo><munder><m
   row><mo mathsize="big" movablelimits="false">
   ∑</mo></mrow><mrow><mi>n</mi></mrow></munder><mfenced separators=""
   open="(" close=")"><mrow><mi
   mathvariant="normal">coefficient</mi><mfenced separators="" open="("
   close=")"><mrow><mi
   mathvariant="normal">mRNAn</mi></mrow></mfenced><mo>∗</mo><mi
   mathvariant="normal">expression (mRNAn)</mi></mrow></mfenced><mo>.</mo>
   </mstyle> :MATH]

   C-index had a lower accuracy from 0.50–0.70, medium accuracy between
   0.71–0.90, and higher accuracy when greater than 0.90 degrees ([75]Kim,
   Schaubel & Mccullough, 2018). The Area Under Curve (AUC) value of the
   time-dependent ROC lay in the range of 0.5–0.9 viewed as statistically
   significant. All statistical analyses were conducted through R 3.6.1.

Results

Construction of the prognostic signature from the training cohort

   The workflow of the study was shown in [76]Fig. 1. In our study, 445
   LUAD patients ([77]Table S1, clinical information in [78]Table S1) from
   the TCGA data set were assigned to the training sample cohort; 83 LUAD
   patients ([79]Table S3, clinical information in [80]Table S4) and 393
   LUAD patients ([81]Table S5, clinical information in [82]Table S6) from
   the GEO data set were assigned to the testing sample cohort for batch
   processing. After the Wilcoxon signed-rank test was applied to the
   training set, we obtained 336 meaningful metabolic genes ([83]Fig. 2,
   [84]Table S7). From the 336 meaningful metabolic genes, we got 59 mRNAs
   which were considered to be significantly associated with OS in LUAD
   patients, 42 high-risk genes, and 17 low-risk genes ([85]Fig. 3,
   [86]Table S8). Finally, lasso-penalized Cox analysis identified 23
   genes (AKR1A1, NT5E, PTGIS, GMPS, MBOAT1, ADCY9, B4GALT1, MAOB, INPP4B,
   NEU1, ALDOA, ENTPD2, GNPNAT1, GSTA3, PKM, HK3, ALDH2, AK2, LDHA, CHPT1,
   SMS, CTPS2, and TYMS) to construct the prognostic model ([87]Table 1).

Figure 1. Workflow of the study.

   [88]Figure 1
   [89]Open in a new tab

Figure 2. Heatmap and volcano map of the differentially expressed genes in
normal and tumour tissues from TCGA-LUAD.

   [90]Figure 2
   [91]Open in a new tab

   (A) Heatmap of the differentially expressed genes in normal and tumour
   tissues from TCGA-LUAD; (B) Volcano map of the differentially expressed
   genes in normal and tumour tissues from TCGA-LUAD. In the volcano map,
   red: genes upregulated in tumour groups; yellow: genes downregulated in
   the tumour group; black: no differentially expressed genes in the
   tumour group.

Figure 3. Forest map of the fifty-nine mRNAs (red: high-risk genes; yellow:
low-risk genes).

   [92]Figure 3
   [93]Open in a new tab

Table 1. The 23-gene signatures screened by their coefficients.

    Gene   Coefficient  HR   HR.95L HR.95H    P
   AKR1A1  −0.005      0.985 0.973  0.996  0.01
   NT5E    0.003       1.01  1.003  1.017  0.007
   PTGIS   0.01        1.007 1.003  1.011  0.001
   GMPS    0.003       1.075 1.039  1.113  4.17E−05
   MBOAT1  −0.025      0.956 0.92   0.994  0.023
   ADCY9   −0.009      0.908 0.847  0.975  0.008
   B4GALT1 0.002       1.011 1.006  1.016  3.05E−06
   MAOB    −0.019      0.968 0.938  0.998  0.039
   INPP4B  0.046       1.122 1.033  1.219  0.006
   NEU1    −0.003      0.986 0.972  0.999  0.034
   ALDOA   0.001       1.002 1      1.003  0.006
   ENTPD2  0.065       1.119 1.062  1.179  2.25E−05
   GNPNAT1 0.028       1.045 1.029  1.061  2.86E−08
   GSTA3   −0.084      0.708 0.502  0.997  0.048
   PKM     0.001       1.004 1.002  1.007  0
   HK3     −0.008      0.946 0.904  0.99   0.016
   ALDH2   −0.001      0.98  0.966  0.993  0.003
   AK2     −0.013      0.978 0.96   0.995  0.014
   LDHA    0.002       1.005 1.003  1.006  4.20E−09
   CHPT1   0.024       1.042 1.008  1.076  0.014
   SMS     0.001       1.013 1.007  1.02   0
   CTPS2   0.052       1.079 1.016  1.146  0.014
   TYMS    0.016       1.028 1.014  1.044  0.0002
   [94]Open in a new tab

   Notes.

   HR
          hazard ratio

The twenty-three metabolic genes signature and predictability assessment in
the training cohort

   Finding that there is a significant and independent correlation between
   the expression of twenty-three prognostic mRNAs and OS, we believed
   that combining the 23 metabolic genes to form a twenty-three gene
   signature could predict a patient’s prognosis. According to the optimal
   cut-off of 1.53, we classified training cohort samples into two groups:
   a high-risk group (n = 222) and a low-risk group (n = 223) ([95]Table
   S9). The OS rate between the two risk groups was significantly
   different (P = 5.543e−10) ([96]Fig. 4A).

Figure 4. Kaplan–Meier analysis, time-dependent ROC analysis, and risk score
analysis for the twenty-three gene signature in the TCGA-LUAD cohort.

   [97]Figure 4
   [98]Open in a new tab

   (A) Kaplan–Meier curve of the twenty-three gene signature in the TCGA
   cohort; (B) time-dependent ROC analysis of the twenty-three gene
   signature in the TCGA cohort; (C–E) a heatmap of mRNA expression of the
   twenty-three gene signature, and risk curves in the TCGA cohort.

   The AUC was 0.798, 0.747, and 0.734 for the 1, 3, and 5-year OS,
   separately ([99]Fig. 4B). We plotted patients’ risk curves in the
   training cohort and analysed their distribution in [100]Figs.
   4D–[101]4E, and the heat map reveals the prognostic mRNA expression
   patterns between two distinct prognostic patient groups ([102]Fig. 4C).

Validation of the twenty-three metabolic gene signatures

   To test the robustness of the prognostic signature, according to the
   risk score cut-off of the training cohort, [103]GSE30219 was structured
   into a high-risk group (n = 25) and a low-risk group (n = 58)
   ([104]Table S10); and [105]GSE72094 was structured into a high-risk
   group (n = 196) and a low-risk group (n = 197) ([106]Table S11). In the
   [107]GSE30219 cohort, the Kaplan–Meier survival curves of the
   prognostic signature have a statistically significant difference in the
   two predicted risk groups ([108]Fig. 5, P = 1.176e−02). The AUC was
   0.694, 0.645, and 0.637 for the 1, 3, and 5-year OS, separately
   ([109]Fig. 5B). The risk curves and expression of twenty-three
   metabolic genes in the [110]GSE30219 cohort were shown in [111]Figs.
   5C–[112]5E Fig 5C-E. And in the [113]GSE72094 cohort, Kaplan–Meier
   survival curves of the prognostic signature has a statistically
   significant difference in the two predicted risk groups ([114]Fig. 6,
   P = 1.417e−10). The AUC was 0.695, 0.725, and 0.742 for the 1, 3, and
   5-year OS, separately ([115]Fig. 6B). The risk curves and expression of
   the twenty-three metabolic genes in the [116]GSE30219 cohort were shown
   in [117]Figs. 6C–[118]6E. We also compared our results with the two
   published gene signature studies ([119]Xie & Xie, 2019; [120]Zhao, Li &
   Tian, 2018) and found that the C-index results were better than their
   signatures ([121]Table 2).

Figure 5. Kaplan–Meier analysis, time-dependent ROC analysis, and risk score
analysis for the twenty-three gene signature in the [122]GSE30219 cohort.

   [123]Figure 5
   [124]Open in a new tab

   (A) Kaplan–Meier curve of the twenty-three gene signature in the
   [125]GSE30219 cohort; (B) time-dependent ROC analysis of the
   twenty-three gene signature in the [126]GSE30219 cohort; (C–E) a
   heatmap of mRNA expression of the twenty-three gene signature, and risk
   curves in the [127]GSE30219 cohort.

Figure 6. Kaplan–Meier analysis, time-dependent ROC analysis, and risk score
analysis for the twenty-three gene signature in the [128]GSE72094 cohort.

   [129]Figure 6
   [130]Open in a new tab

   (A) Kaplan–Meier curve of the twenty-three gene signature in the
   [131]GSE72094 cohort; (B) time-dependent ROC analysis of the
   twenty-three gene signature in the [132]GSE72094 cohort; (C–E) a
   heatmap of mRNA expression of the twenty-three gene signature, and risk
   curves in the [133]GSE72094 cohort.

Table 2. Comparison of the twenty-three gene prognostic signature to the two
published prognostic signatures.

   Studies TCGA cohort [134]GSE30219 cohort [135]GSE72094 cohort
   HR (95% CI) P C-index HR (95% CI) P C-index HR (95% CI) P C-index
   Present study 4.55(3.487∼5.946) 2.00E−16 0.730 3.80(1.977∼7.308)
   6.25E−05 0.682 1.74(1.514∼1.99) 2.96E−15 0.709
   23-gene signature
   Zhao, K.et al. 1.10(1.073∼1.123) 6.20E−16 0.697 1.98(1.683∼2.319)
   1.72E−19 0.682 1.53(1.39∼1.672) 2.17E−15 0.704
   20-gene signature
   Xie, H.et al. 1.29(1.167∼1.428) 7.45E−07 0.632 2.34(1.825∼2.996)
   1.87E−11 0.668 1.82 (1.506∼2.206) 7.10E−10 0.660
   6-gene signature
   [136]Open in a new tab

   Notes.

   HR
          hazard ratio

   CI
          confidence interval

   Taken together, these results indicated a better predictive performance
   about our prognostic signature.

Correlation analysis between the prognostic signature and clinical
characteristics

   445 patients with their information from TCGA-LUAD cohort were utilized
   for the correlation analysis. Being male (P = 0.017) and TNM stage (P
   < 0.01) have a significant correlation with a higher risk score.
   Samples with higher T, N, and M grading were also significantly
   correlated with a higher risk score ([137]Table 3).

Table 3. Correlation analysis of the clinical characteristics and the
twenty-three gene signature in LUAD.

   Characteristics TCGA training cohort (n = 445) [138]GSE30219 testing
   cohort (n = 83) [139]GSE72094 testing cohort (n = 393)
   n Mean (rick score) SD t P n Mean (rick score) SD t P n Mean (rick
   score) SD t P
   Age(years)
   <68 247 0.765 0.64 −0.164 0.87 23 0.543 0.346 −0.088 0.931 153 1.253
   0.93 −0.524 0.600
   ≥68 198 0.775 0.663 60 0.55 0.272 240 1.305 1.04
   Gender
   Female 244 0.702 0.611 −2.393 0.017 19 0.535 0.357 −0.183 0.856 219
   1.203 1.012 −1.850 0.065
   Male 201 0.851 0.686 64 0.552 0.274 174 1.389 0.972
   TNM stage
   Stage I+II 348 0.68 0.584 −4.822 <0.001 60 0.456 0.235 −6.598 <0.001
   320 1.166 0.842 −3.771 <0.001
   Stage III+IV 97 1.083 0.765 23 0.875 0.239 73 1.805 1.392
   T
   T1+2 387 0.716 0.595 −3.5 0.001 73 0.528 0.299 −2.479 0.024 – – – – –
   T3+4 58 1.125 0.86 10 0.697 0.185 – – –
   N
   N0 292 0.71 0.639 −2.629 0.009 71 0.502 0.276 −4.03 0.001 – – – – –
   N1-3 153 0.881 0.657 12 0.817 0.245 – – –
   M
   M0 403 0.742 0.64 −2.577 0.013 67 0.468 0.241 −5.959 <0.001 – – – – –
   M1 42 1.028 0.689 16 0.881 0.251 – – –
   [140]Open in a new tab

   Notes.

   SD
          standard deviation

   TNM
          tumornode metastasis

   83 patients with their information from [141]GSE30219 cohort were
   utilized for the correlation analysis. TNM stage was significantly
   associated with a higher risk score (P <  0.001). Samples with higher
   T, N, and M grading were also significantly correlated with a higher
   risk score (P <  0.01) ([142]Table 3). And 393 patients with their
   information from [143]GSE72094 cohort were utilized for the correlation
   analysis. TNM stage was significantly correlated with a higher risk
   score (P <  0.001) ([144]Table 3).

Validation of the independent prognostic factor

   We analysed 445 patients, with a median age of 68, grouping them by
   gender ([145]Table 4). The results identified that our prognostic
   signature was an independent OS prognostic indicator ([146]Figs. 7A and
   [147]7B). We also verified that the prognostic signature can serve as
   an independent prognostic indicator in the [148]GSE30219 cohort
   ([149]Figs. 7C and [150]7D) and [151]GSE72094 cohort ([152]Figs. 7E and
   [153]7F). The multivariate Cox analysis indicating that the prognostic
   signature was significantly associated with OS in each cohort when
   adjusted for the TNM stage ([154]Table 4). And stratification analysis
   showed that high-risk group was significantly correlated with a poorer
   OS ([155]Figs. 8A–[156]8F). However, in the [157]GSE30219 testing
   cohort, patients in the high-risk group from TNM stages III and IV show
   no significantly correlated with OS ([158]Fig. 8D).

Table 4. Univariate and multivariate Cox regression analysis of overall
survival in each cohort.

   Variables Univariate analysis Multivariate analysis
   HR HR.95L HR.95H P HR HR.95L HR.95H P
   TCGA training cohort (n = 445)
   Age(≥68/ <68) 0.999 0.98 1.018 0.922 1.007 0.987 1.028 0.491
   Gender (female/male) 1.006 0.692 1.462 0.976 0.936 0.638 1.373 0.736
   TNM stage (I+II/III+IV) 1.659 1.404 1.961 3.01E−09 2.677 0.994 2.831
   0.043
   T 1.592 1.278 1.982 3.27E−05 1.009 0.788 1.291 0.944
   N 1.798 1.463 2.209 2.34E−08 1.142 0.714 1.826 0.58
   M 1.798 0.985 3.283 0.056 0.466 0.116 1.871 0.281
   Risk Score 3.89 2.925 5.174 9.79E−21 3.699 2.717 5.036 9.80E−17
   [159]GSE30219 testing cohort (n = 83)
   Age(≥68/ <68) 1.037 1.023 1.052 3.22E−07 1.031 0.994 1.07 0.102
   Gender(female/male) 1.815 1.129 2.917 0.014 1.126 0.491 2.586 0.779
   TNM stage(I+II/III+IV) 2.719 0.762 3.03 2.35E−03 2.409 1.014 2.882
   0.044
   T 1.663 1.448 1.911 7.10E−13 1.361 1.126 1.644 0.001
   N 1.777 1.51 2.091 4.36E−12 1.358 1.087 1.698 0.007
   M 2.856 1.17 6.97 0.021 2.357 0.958 5.797 0.062
   Risk Score 3.726 1.429 9.714 0.007 2.26 1.168 4.374 0.016
   [160]GSE72094 testing cohort (n = 393)
   Age(≥68/<68) 1.007 0.988 1.027 0.479 0.999 0.978 1.019 0.889
   Gender(female/male) 1.547 1.065 2.246 0.022 1.487 1.011 2.189 0.044
   TNM stage 1.625 1.360 1.941 <0.001 1.607 1.333 1.938 <0.001
   Risk Score 1.736 1.514 1.990 <0.001 1.646 1.431 1.894 <0.001
   [161]Open in a new tab

   Notes.

   HR
          hazard ratio

   TNM
          tumornode metastasis

Figure 7. Forrest plot of the univariate (yellow) and multivariate (red) Cox
regression analysis in the each cohort.

   [162]Figure 7
   [163]Open in a new tab

   (A–B) Forrest plot of the univariate (yellow) and multivariate (red)
   Cox regression analysis in the TCGA-LUAD cohort; (C–D) Forrest plot of
   the univariate (yellow) and multivariate (red) Cox regression analysis
   in the [164]GSE30219 cohort; (E–F) Forrest plot of the univariate
   (yellow) and multivariate (red) Cox regression analysis in the
   [165]GSE72094 cohort.

Figure 8. The Kaplan–Meier curve for the twenty-three gene signature in LUAD.

   [166]Figure 8
   [167]Open in a new tab

   (A–B) The Kaplan–Meier curve showed that the survival of patients was
   significantly poorer in the high-risk group in TNM stages I and II/TNM
   stages III and IV of the TCGA-LUAD cohort; (C–D) the Kaplan–Meier curve
   showed that the survival of patients was significantly poorer in the
   high-risk group in TNM stages I and II/TNM stages III and IV of the
   [168]GSE30219 cohort. However, there was no statistical significance in
   TNM stages III and IV; (E–F) the Kaplan–Meier curve showed that the
   survival of patients was significantly poorer in the high-risk group in
   TNM stages I and II/TNM stages III and IV of the [169]GSE72094 cohort.

Construction and verification of the predictive nomogram

   We built the nomogram by including the independent prognostic roles
   ([170]Fig. 9A). Calibration plots verified the performance of the
   nomogram ([171]Fig. 9B). The C-index of the TNM model, prognostic
   signature, and nomogram model were 0.654, 0.730, and 0.793, separately
   ([172]Table 5). The AUC was 0.838, 0.785, and 0.779 for the 1, 3, and
   5-year OS, respectively. The nomogram model showed a better AUC
   predicting 1, 3, and 5-year OS than the prognostic model in the
   training cohort ([173]Figs. 9C–[174]9E).

Figure 9. Building the nomogram for predicting the overall survival of LUAD
patients from the TCGA cohort.

   [175]Figure 9
   [176]Open in a new tab

   (A) The nomogram plot was built based on all prognostic factors in the
   TCGA-LUAD; (B) the calibration plot for internal validation of the
   nomogram; (C–E) the time-dependent ROC curves of the nomograms compared
   for 1-, 3-, and 5-year overall survival in TCGA-LUAD.

Table 5. Comparison of the nomogram model with the TNM model and prognostic
model.

   Cohort Nomogram model TNM model Prognostic model
   HR (95% CI) P C-index HR (95% CI) P C-index HR (95% CI) P C-index
   TCGA cohort 2.719(2.269∼3.259) 2.00E−16 0.793 2.957(2.015∼4.339)
   3.03E−08 0.654 4.55(3.487∼5.946) 2.00E−16 0.730
   [177]GSE30219 cohort 2.958(2.189∼3.375) 7.55E−16 0.684
   1.267(0.889∼1.803) 0.016 0.619 3.80(1.977∼7.308) 6.25E−05 0.682
   [178]GSE72094 cohort 2.718(2.181∼3.388) 7.55E−16 0.713 2.606
   (1.736∼3.914) 3.85E−06 0.579 1.74(1.514∼1.99) 2.96E−15 0.709
   [179]Open in a new tab

   Notes.

   TNM
          tumornode metastasis

   CI
          confidence interval

   Next, we verified the clinical application of the nomogram in the
   [180]GSE30219 and [181]GSE72094 cohorts. In the [182]GSE30219 and
   [183]GSE72094 testing cohorts, calibration plots verified the
   performance of the nomogram ([184]Figs. 10A and [185]10E). In the
   [186]GSE30219 testing cohort, the C-index of the TNM model, prognostic
   signature, and nomogram model were 0.612, 0.682, and 0.684 ([187]Table
   5). The AUC was 0.731, 0.686, and 0.722 for the 1, 3, and 5-year OS,
   separately ([188]Figs. 10B–[189]10D). In the [190]GSE72094 testing
   cohort, the C-index of the TNM model, prognostic signature, and
   nomogram model were 0.579, 0.709, and 0.713, separately ([191]Table 5).
   The AUC was 0.708, 0.794, and 0.771 for the 1, 3, and 5-year OS,
   separately ([192]Figs. 10F–[193]10H). The nomogram model identified the
   better AUC predicting 1, 3, and 5-year OS than the prognostic signature
   in both testing cohorts.

Figure 10. Validation of the nomogram in the [194]GSE30219 and [195]GSE72094
cohorts.

   [196]Figure 10
   [197]Open in a new tab

   (A) The calibration plot for validation of the nomogram in the
   [198]GSE30219 cohort; (B–D) the time-dependent ROC curves of the
   nomograms compared for 1-, 3-, and 5-year overall survival in the
   [199]GSE30219 cohort; (E) the calibration plot for validation of the
   nomogram in the [200]GSE72094 cohort; (F–H) the time-dependent ROC
   curves of the nomograms compared for 1-, 3-, and 5-year overall
   survival in [201]GSE30219 cohort.

   Taken together, the nomogram model increased the predicting ability of
   the prognostic signature. These results indicated the better predictive
   performance of the nomogram model.

KEGG and GO pathways enrichment analyses

   KEGG enrichment analyses suggesting that a majority of the
   metabolism-related pathways such as the metabolism of fatty acid,
   arachidonic acid, glycerophospholipid, alpha-linolenic acid, and
   pyrimidine were associated with the low-risk group, while the cell
   cycle, mismatch repair, ubiquitin-mediated proteolysis, and p53
   signalling pathways were associated with the high-risk group ([202]Fig.
   11A, [203]Table 6, [204]Table S12). Besides, we performed GO pathway
   enrichment analysis on 23 genes. These 23 genes were statistically
   significant in metabolic-related pathways, such as nucleotide
   biosynthetic processes, secretory granule lumen, and monosaccharide
   binding in Biological Processes (BP), cellular component (CC), and
   molecular function (MF) ([205]Fig. 11B, [206]Table S13).

Figure 11. KEGG and GO enrichment pathway analysis.

   [207]Figure 11
   [208]Open in a new tab

   (A) KEGG enrichment pathway analysis of twenty-three metabolic-related
   genes obtained by the lasso Cox model: five representative KEGG
   pathways in high-risk patients. five representative KEGG pathways in
   low-risk patients. (B) GO enrichment pathway analysis of twenty-three
   metabolic-related genes obtained by the lasso cox model.

Table 6. The SIZE, ES, NES. NOM p-val, and FDR q-val of the KEGG enrichment
pathways.

   SIZE ES NES NOM P FDR q-valve
   Low risk group enrich pathways
   KEGG_ASTHMA 28 −0.802 −2.039 <0.01 0.040
   KEGG_ARACHIDONIC_ACID_METABOLISM 58 −0.558 −1.956 0.002 0.057
   KEGG_GLYCEROPHOSPHOLIPID_METABOLISM 77 −0.435 −1.739 0.008 0.117
   KEGG_ALDOSTERONE_REGULATED_SODIUM_ REABSORPTION 42 −0.494 −1.702 0.010
   0.135
   KEGG_ALPHA_LINOLENIC_ACID_METABOLISM 19 −0.568 −1.672 0.019 0.136
   KEGG_FATTY_ACID_METABOLISM 42 −0.517 −1.635 0.048 0.138
   High risk group enrich pathways
   KEGG_CELL_CYCLE 124 0.746 2.384 <0.01 0.001
   KEGG_PYRIMIDINE_METABOLISM 98 0.620 2.160 <0.01 0.008
   KEGG_MISMATCH_REPAIR 23 0.852 2.149 <0.01 0.005
   KEGG_UBIQUITIN_MEDIATED_PROTEOLYSIS 134 0.541 2.144 <0.01 0.004
   KEGG_P53_SIGNALING_PATHWAY 68 0.546 2.100 0.002 0.006
   [209]Open in a new tab

   Notes.

   ES
          enrichment score

   NES
          normalized enrichment score

   NOM P
          nominal p value

   FDR q-valve
          false discovery rate

Discussion

   Lung cancer has the highest mortality rate among cancer-related
   diseases worldwide ([210]Bray et al., 2018). LUAD is the principal type
   of LC, with a percentage of more than half of morbidity and mortality
   in this group of patients ([211]Jemal et al., 2017). With the
   increasing exploration of cancer metabolic heterogeneity, metabolic
   genes can work as a prognostic signature for LUAD. Identification of
   metabolism-related gene preferences and dependence mechanisms in tumour