Abstract

   Abnormal lipid metabolism is closely related to the malignant
   biological behavior of tumor cells. Such abnormal lipid metabolism
   provides energy for rapid proliferation, and certain genes related to
   lipid metabolism encode important components of tumor signaling
   pathways. In this study, we analyzed pancreatic cancer datasets from
   The Cancer Genome Atlas and searched for prognostic genes related to
   lipid metabolism in the Molecular Signature Database. A risk score
   model was built and verified using the [32]GSE57495 dataset and
   International Cancer Genome Consortium dataset. Four molecular subtypes
   and 4249 differentially expressed genes (DEGs) were identified. The
   DEGs obtained by Weighted Gene Coexpression Network Construction
   analysis were intersected with 4249 DEGs to obtain a total of 1340
   DEGs. The final prognosis model included CA8, CEP55, GNB3 and SGSM2,
   and these had a significant effect on overall survival. The area under
   the curve at 1, 3 and 5 years was 0.72, 0.79 and 0.87, respectively.
   These same results were obtained using the validation cohort. Survival
   analysis data showed that the model could stratify the prognosis of
   patients with different clinical characteristics, and the model has
   clinical independence. Functional analysis indicated that the model is
   associated with multiple cancer‐related pathways. Compared with
   published models, our model has a higher C‐index and greater risk
   value. In summary, this four‐gene signature is an independent risk
   factor for pancreatic cancer survival and may be an effective
   prognostic indicator.

   Keywords: four‐gene signature, pancreatic cancer, prognosis, TCGA,
   WGCNA
     __________________________________________________________________

   Abnormal lipid metabolism is closely related to the malignant
   biological behavior of tumor cells. In this study, we analyzed
   pancreatic cancer datasets from The Cancer Genome Atlas and searched
   for prognostic genes related to lipid metabolism in the Molecular
   Signature Database. Compared with published models, our model has a
   higher C‐index and greater risk value. This four‐gene signature is an
   independent risk factor for pancreatic cancer survival and may be an
   effective prognostic indicator.

   graphic file with name FEB4-11-3153-g010.jpg
     __________________________________________________________________

Abbreviations

   AUC
          area under the receiver operating characteristic curve

   CI
          confidence interval

   DEG
          differentially expressed gene

   FDR
          false discovery rate

   GEO
          Gene Expression Omnibus

   GO
          Gene Ontology

   GSVA
          Gene Set Variation Analysis

   HR
          hazard ratio

   ICGC
          International Cancer Genome Consortium

   KEGG
          Kyoto Encyclopedia of Genes and Genomes

   KM
          Kaplan–Meier

   Lasso
          least absolute shrinkage and selection operator

   NMF
          non‐negative matrix clustering algorithm

   OS
          overall survival

   RNA‐seq
          RNA sequencing

   ROC
          receiver operating characteristic

   ssGSEA
          single‐sample gene set enrichment analysis

   TCGA
          The Cancer Genome Atlas

   TIMER
          Tumor Immune Estimation Resource

   WGCNA
          Weighted Gene Coexpression Network Construction

   As a malignancy of the digestive system, pancreatic cancer is one of
   the most aggressive malignancies in the world. In recent years,
   pancreatic cancer morbidity and mortality rates have steadily
   increased, with an overall 5‐year survival rate of about 8% for
   patients [[33]1]. Despite the rapid development of diagnosis and
   treatment of pancreatic cancer over the past 20 years, the mortality
   rate of patients remains high [[34]2, [35]3, [36]4]. The combination of
   tumor markers and imaging helps to diagnose the disease in a timely and
   accurate manner [[37]5]; however, due to the insidious nature of the
   disease and the lack of early clinical signs, it is not easy to
   diagnose. It has been reported that about 50% of patients have
   confirmed that the cancer has metastasized [[38]6, [39]7]. Therefore,
   revealing the molecular mechanisms of pancreatic cancer progression and
   developing corresponding targeted therapies are critical to improving
   pancreatic cancer outcomes.

   Lipids play an important function in maintaining normal cell function
   and homeostasis; they are not only an important part of the cell
   membrane but also provide precursors for important molecules needed in
   the growth and differentiation pathways [[40]8, [41]9]. Intracellular
   lipids come from two sources: one is food intake, and the other is
   lipid synthesis from scratch by hepatocytes and cells in need. Normal
   cells in the body acquire lipids mainly from diet and rarely from lipid
   synthesis from scratch. For most normal cells, lipids that meet
   cellular needs rarely are synthesized from scratch because of slow cell
   growth [[42]10, [43]11]. However, Medes et al. [[44]12] found in the
   1950s that tumor cells synthesize fatty acids primarily by synthesizing
   them from scratch. In tumor cells, on the one hand, lipids and
   cholesterol are often activated to meet the needs of tumor cells that
   are rapidly proliferating, and on the other hand, lipids alter the
   properties of biofilms and protect cells from oxidative damage from
   internal and external sources [[45]13]. Lipogenesis is an important
   feature of rapid malignancy growth [[46]14]. Normal cell lipid
   synthesis from scratch is rare, and about 90% of fatty acids are
   synthesized from scratch in tumor cells [[47]12]. Activated lipid
   scratch synthesis was found to be associated with poorer prognosis and
   shorter disease‐free survival in tumor patients [[48]15, [49]16]. At
   the molecular level, increased lipid synthesis from scratch in tumors
   is often accompanied by increased lipid synthase and enhanced activity
   [[50]17, [51]18]. Thus, aberrant activation of lipid synthesis from
   scratch is a common feature of tumor cells. In addition, lipid
   metabolic reprogramming that promotes increased lipogenesis is
   associated with the abnormal development and progression of pancreatic
   adenocarcinoma [[52]19]. Mammalian target of rapamycin complex 2
   (mTORC2) stimulates the synthesis of sphingomyelin (glucoceramide) and
   glycerophospholipids (cardiolipin) to promote tumor progression
   [[53]20]. Studies have shown that mTORC1 stimulates the synthesis of
   fatty acids and sterols by regulating the expression of SREBP1c, a
   major adipogenic transcription factor [[54]20, [55]21, [56]22, [57]23].
   The active form of SREBP1c is sensitive to proteasomal degradation but
   can enter the nucleus and participate in its transcriptional targets,
   including its own gene promoter and the promoter of the major enzyme
   encoding fatty acid synthesis [[58]24]. However, a deeper understanding
   of lipid metabolism‐related genes in the prognosis and treatment of
   pancreatic cancer is needed.

   In this study, lipid metabolism‐related gene expression in pancreatic
   cancer was analyzed to identify key genes that could predict patient
   prognosis. A differentially expressed gene (DEG) analysis, Weighted
   Gene Coexpression Network Construction Analysis (WGCNA) and Cox
   proportional risk model were used to finally construct a signature
   based on the expression of several key genes as a prognostic signature
   for pancreatic cancer. This prognostic model can be used as an
   effective tool to predict the prognosis of patients with pancreatic
   cancer. These findings will also help identify new therapeutic targets
   for pancreatic cancer.

Material and methods

Expression spectral data and preprocessing

   Human lipid metabolics‐related pathways were downloaded from Molecular
   Signature Database v7.0 [[59]25], and a total of 776 genes related to
   lipid metabolism were sorted out from the six lipid metabolic pathways
   in Table [60]1. Pancreatic cancer RNA sequencing (RNA‐seq) expression
   data and corresponding clinical follow‐up data were obtained from the
   public database The Cancer Genome Atlas (TCGA)
   ([61]https://portal.gdc.cancer.gov/) [[62]26], which contained RNA‐seq
   data of 182 patients and clinical information of 171 patients on
   December 3, 2019. [63]GSE57495 is a microarray dataset from the Gene
   Expression Omnibus (GEO) database
   ([64]http://www.ncbi.nlm.nih.gov/geo/) [[65]27], containing expression
   profile data and clinical sample information from 62 patients with
   early pancreatic cancer. The International Cancer Genome Consortium
   (ICGC) validation dataset included 257 patients with pancreatic cancer
   with expression profile data and clinical follow‐up information. For
   TCGA dataset, (a) samples without clinical data and overall survival
   (OS) <30 days were removed, (b) normal tissue sample data were removed,
   (c) genes with fragments per kilobase of exon per million of zero in
   half of the samples were removed, and (d) the expression profile of
   genes related to lipid metabolism was preserved. For GEO datasets, (a)
   normal tissue sample data were removed, (b) OS data from months was
   converted to days, (c) samples with OS <30 days were removed, (d) Chip
   probes were mapped to the human gene SYMBOL using Bioconductor package,
   and (e) the expression profile of genes related to lipid metabolism was
   preserved. For the ICGC dataset, (a) sample data without survival
   status were removed, (b) samples with OS <30 days were removed, and (c)
   the expression profile of genes related to lipid metabolism was
   preserved. The [66]GSE57495 and ICGC datasets were considered as the
   validation dataset. The clinical data information is shown in
   Table [67]2. The workflow chart is shown in Fig. [68]1.

Table 1.

   Pathways involved in lipid metabolism.
   Pathways Database Gene count
   Regulation of lipid metabolism by peroxisome proliferator‐activated
   receptor alpha Reactome 119
   Metabolism of lipids Reactome 738
   Sphingolipid metabolism Reactome 89
   Transcriptional regulation of white adipocyte differentiation Reactome
   84
   Glycerophospholipid metabolism KEGG 77
   Fatty acid metabolism Reactome 177
   Sum 1,284 (unique: 776)
   [69]Open in a new tab

Table 2.

   The clinical information of four datasets.
   Characteristics TCGA set Training set P value [70]GSE57495 set ICGC set
   Age (years)
   <65                78         71       1.00   –                103
   ≥65                93         83              –                154
   Progression‐free survival
   Alive              80         69       1.00   21               151
   Dead               91         85              41               106
   Sex
   Female             78         72       1.00   –                120
   Male               93         82              –                137
   pathologic_T
   T1                 7          7        0.09   –                –
   T2                 21         19              –                –
   T3                138        123              –                –
   T4/TX              4          4               –                –
   pathologic_N
   N1                119        106       0.21   –                –
   N0/NX              51         47              –                –
   pathologic_M
   MX                 90         81       0.199  –                –
   M0/M1              81         72              –                –
   Tumor stage
   Stage I            19         18       0.125  –                –
   Stage II          142        127              –                –
   Stage III          3          3               –                –
   Stage IV           3          3               –                –
   pathologic_G
   G1                 28       33324      0.241  –                –
   G2                 92         84              –                –
   G3                 47         43              –                –
   G4                 2          2               –                –
   Total             171        154         –    62               257
   [71]Open in a new tab

Fig. 1.

   Fig. 1
   [72]Open in a new tab

   Workflow chart. FC, fold change.

Identification of prognostic genes

   The expression profile of 776 lipid metabolism genes was extracted from
   TCGA dataset. However, 15 genes were not found. Furthermore, we keep
   genes that are not zero in more than half of the samples. As a result,
   730 genes were used for subsequent analysis. Next, univariate Cox
   analysis of coxph function in r package was performed to obtain genes
   related to prognosis of pancreatic cancer with P < 0.05.

Identification of molecular subtypes

   Cluster analysis of pancreatic cancer samples was performed by
   nonnegative matrix clustering algorithm (NMF), and the standard
   ‘brunet’ was selected by NMF method for 50 iterations. The clustering
   number k was set as 2–10, and the average contour width of the common
   member matrix was determined through the r package ‘NMF’. The minimum
   member of each subclass was set as 10. According to the cophenetic,
   dispersion and silhouette index were used to determine the optimal
   clustering number.

Difference of tumor‐infiltrating immune cells in molecular subtypes

   Six types (B_cell, CD4_Tcell, CD8_Tcell, neutrophil, macrophage cell
   and dendritic cell) of tumor‐infiltrating immune cell were retrieved
   from Tumor Immune Estimation Resource
   ([73]https://cistrome.shinyapps.io/timer/) [[74]28]. Immunity, matrix
   score, and tumor purity of each sample were calculated in r package
   estimate. These indicators were compared on molecular subtypes.

Identification of DEGs

   r package differentially expressed Seq2 (DESeq2) [[75]29] was applied
   to calculate the DEGs in molecular subtypes with a false discovery rate
   (FDR) <0.05 and |log[2]FC| > 1.

WGCNA

   Based on expression profiles of DEG, the WGCNA coexpression algorithm
   was used to mine the coexpression module using the r package WGCNA
   ([76]http://www.r‐project.org/) [[77]30]. First, the appropriate soft
   threshold is determined by approximate scale‐free topology criteria.
   The adjacency matrix was transformed into a topological matrix, and the
   genes were clustered using average‐linkage hierarchical clustering.
   Lastly, the dynamic tree cut method was used to determine module
   eigengenes, at least 30 coexpressed genes.

Functional enrichment and pathway enrichment analysis

   Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes
   and Genomes (KEGG) pathway analysis were performed for DEG based on
   ‘WebGestaltR’ [[78]31] in r. FDR < 0.05 was defined as significant.

Construction of a prognostic risk model based on differentially coexpressed
genes

   First, 90% of samples were randomly selected from the preprocessed 171
   TCGA samples as the training set for model construction. To avoid the
   random allocation bias affecting the stability of subsequent modeling,
   we repeatedly sampled 100 samples with replacement in advance to ensure
   that the randomly selected samples were consistent with all samples in
   age, stage and TNM staging. Univariate Cox regression analysis for OS
   was performed to identify prognostic DEGs with P < 0.05 using survival
   coxph function in r. Lasso (least absolute shrinkage and selection
   operator) Cox regression analysis was performed to find characteristic
   genes using r package glmnet. Subsequently, the multivariate Cox
   proportional hazards regression model was used to build a prognostic
   model in the training group. The risk formula was as follows:
   RiskScore[4 ]= −0.0666 × CA8 + 0.0413 × CEP55 − 0.2189 × GNB3 − 0.0339 
   × SGSM2. Next, the Kaplan–Meier (KM) survival curve was used to compare
   prognosis between the low‐ and high‐risk groups, which were classified
   by the median risk score as the cutoff value in all patients. The
   receiver operating characteristic (ROC) curve was applied to assess
   diagnostic accuracy through comparing the areas under the ROC curves
   (AUCs) using timeROC package in the training and validation groups.

Gene set enrichment analysis

   The r software package Gene Set Variation Analysis (GSVA) [[79]32] was
   used for single‐sample gene set enrichment analysis (ssGSEA), and the
   function with correlation >0.45 was selected.

Advantages of genetic signatures

   To identify the independence of four gene signatures, we used
   univariate and multivariate Cox regression to analyze the relationship
   among age, sex, pathological stage T, stage N, stage M, tumor stage,
   grade and RiskScore with prognosis. Next, by referring to the
   literature, we selected four prognostic risk models, 15‐gene signature
   (Chen) [[80]33], 7‐gene signature (Cheng) [[81]34], 5‐gene signature
   (Raman) [[82]35] and 9‐gene signature (Wu) [[83]36], for comparison
   with our 4‐gene model. ROC curve and KM survival curve of four models
   were drawn in TCGA dataset. Furthermore, we compared the four models
   with the restricted mean survival (RMS) using r language RMS [[84]37]
   and standardized net benefit between four models and four gene
   signatures.

Results

Identification of four molecular subtypes of pancreatic cancer

   Univariate Cox survival analysis of lipid metabolism genes using coxph
   revealed 189 genes associated with the prognosis of pancreatic cancer.
   Pancreatic cancer samples were clustered by NMF algorithm, and the
   optimal number of clusters was determined to be four based on
   cophenetic, dispersion and silhouette metrics (Fig. [85]2A,B). The
   expression of lipid metabolism‐related genes showed that the expression
   of C2 genes was lower than that of C1, C3 and C4 genes (Fig. [86]2C).
   Analysis of the prognostic relationship among the four subtypes showed
   that C1 had the worst prognosis and a significant difference
   [P < 0.0001; hazard ratio (HR), 2.264 (1.582–4.352)], and C2 had the
   best prognosis (Figs [87]2D and [88]S1). Subsequently, we compared the
   clinical characteristics of the four subtypes and found significant
   differences in T, TNM, cancer grade and age (Fig. [89]S2). The immune
   evaluation of the four subtypes showed significant differences in
   immune scores and stroma (Fig. [90]S3).

Fig. 2.

   Fig. 2
   [91]Open in a new tab

   Identification of molecular subtypes of pancreatic cancer. (A)
   Consensus map of NMF clustering. (B) The distribution of cophenetic,
   rss and dispersion with rank = 2–1. (C) Cluster heatmap of 740 lipid
   metabolism genes. (D) KM prognostic survival curve of molecular
   subtype.

Identification of differential expression genes and functional analysis

   DESeq2 was conducted to calculate the DEGs between C1 and C2, C3 and C4
   molecular subtypes; a total of 4249 were obtained (Fig. [92]3). Next,
   according to the expression profile of coding genes, the WGCNA
   coexpression algorithm was used to mine the coexpressed coding genes
   and coexpression modules, and hierarchical clustering analysis was
   performed on the samples to show that there were no outlier samples
   (Fig. [93]4A). A soft threshold of 10 was selected (Fig. [94]4B,C).
   Genes were clustered using the averages‐linkage hierarchy clustering
   method, and 14 modules were obtained by setting height = 0.25,
   deepSplit = 2 and minModuleSize = 30 (Fig. [95]4D). The correlation of
   each module with sex, age, ethnicity and clusters 1, 2, 3 and 4 was
   further analyzed. The results showed that the modules significantly
   related to clusters 1, 2, 3 and 4 were magenta, green, brown and
   turquoise, respectively (Fig. [96]4E–H). Finally, 4249 DEGs and genes
   with significant coexpression modules were intersected to obtain 1340
   differentially expressed cogenes. To determine whether differential
   genes are related to pancreatic cancer function, we performed GO
   molecular function and KEGG function enrichment analysis of 1340
   differential genes using r software package WebGestaltR
   ([97]https://www.r‐project.org/help.html). The significant pathways
   enriched by KEGG are related to insulin secretion and dopaminergic
   synapse pathways (Fig. [98]5A). GO enriched 233 GO cellular component
   (CC) (Fig. [99]5B), 195 GO molecular function (MF) (Fig. [100]5C) and
   977 GO biological process (BP) (Fig. [101]5D).

Fig. 3.

   Fig. 3
   [102]Open in a new tab

   Identification of DEG. (A) Volcano diagram of DEG in C1 subtype. (B)
   Volcano diagram of DEG in C2 subtype. (C) Volcano diagram of DEG in C3
   subtype. (D) Volcano diagram of DEG in C4 subtype.

Fig. 4.

   Fig. 4
   [103]Open in a new tab

   WGCNA. (A) Cluster analysis of samples. (B) Analysis of network
   topology for various soft‐thresholding powers. (C) Gene dendrogram and
   module colors. (D) Correlation between 14 modules and clinical
   phenotype. (E) Gene significance (y axis) versus module membership (x
   axis) plotted for magenta module in TCGA dataset. (F) Gene significance
   (y axis) versus module membership (x axis) plotted for green module in
   TCGA dataset. (G) Gene significance (y axis) versus module membership
   (x axis) plotted for brown module in TCGA dataset. (H) Gene
   significance (y axis) versus module membership (x axis) plotted for
   turquoise module in TCGA dataset.

Fig. 5.

   Fig. 5
   [104]Open in a new tab

   Functional analysis of differentially coexpressed genes. (A) The DEGs
   were enriched to the top 20 enriched by KEGG. (B) The DEGs were
   enriched to the top 20 enriched by GO CC. (C) The DEGs were enriched to
   the top 20 enriched by GO MF. (D) The DEGs were enriched to the top 20
   enriched by GO BP. The color from red to blue represents the
   significance of P value; the darker the red the smaller is the P value,
   and the dot size represents the number of genes enriched into the
   pathway (the higher the number, the larger the dot).

Construction and risk prediction of Gene Signature four‐gene signature

   Ninety percent of the 171 TCGA samples were randomly selected as the
   training set for model construction. Univariate Cox proportional hazard
   analysis was conducted for the expression profile of each
   differentially expressed cogene, and r package survival coxph function
   was used to obtain 369 genes with significant prognostic differences.
   To further narrow the range of genes and construct the prognostic model
   with high accuracy, we used r software package glmnet for Lasso Cox
   regression analysis. First, the analysis of the change trajectory of
   each independent variable shows that as the lambda gradually increases,
   the number of independent variable coefficients tending to zero also
   gradually increases (Fig. [105]6A). Then the confidence interval (CI)
   under each lambda is analyzed, and the model reaches the optimal value
   when lambda = 0.1753479 (Fig. [106]6B). Four genes were selected at
   lambda = 0.1753479 as target genes (Table [107]3). The four‐mRNA
   signature formula is as follows:\def\mybox{\vrule depth ‐0.5mm height
   4mm width 8mm}

Fig. 6.

   Fig. 6
   [108]Open in a new tab

   Construction of risk model. (A) The CI for each lambda. (B) The
   trajectory of each independent variable, the log of lambda on the
   horizontal axis and the coefficient on the vertical axis. (C) Risk
   score, survival time and survival status, and expression of four genes
   in the training set. (D) ROC curve and AUC of four‐gene signature in
   the training set. (E) KM survival curve of four‐gene signature in the
   training set.

Table 3.

   Four‐mRNA signature.
   Symbol Coefficient HR     z‐score  P value Low 95% CI High 95% CI
   CA8     −0.06663     −2   0.045465 0.9355    0.8764     0.9987
   CEP55    0.04133   3.639  0.000274 1.0422    1.0193     1.0657
   GNB3    −0.21887   −1.397 0.162309 0.8034    0.591      1.0921
   SGSM2   −0.03386    −1.6  0.10963  0.9667    0.9274     1.0077
   [109]Open in a new tab

   RiskScore4 = −0.0666 × CA8 + 0.0413 × CEP55 − 0.2189 × GNB3 − 0.0339 × 
   SGSM2.

   The RiskScore of each sample is calculated according to the expression
   level of the sample, and the RiskScore of the sample is plotted. The
   survival time of the samples with high RiskScore was significantly
   lower than that with low RiskScore. The gene expression changes with
   the increase of risk value showed that CEP55 was a risk factor, whereas
   CA8, GNB3 and SGSM2 were protective factors (Fig. [110]6C). ROC
   analysis of RiskScore by r software package timeROC showed that AUC of
   1, 3 and 5 years was >0.70 (Fig. [111]6D). Finally, we carried out
   z‐score for RiskScore and divided the samples with z‐score‐based
   RiskScore greater than zero into the high‐risk group and the samples
   with less than zero into the low‐risk group. KM prognostic analysis
   showed a significant difference between the two groups (Fig. [112]6E).

Robustness of four‐gene signature

   To determine the robustness of the model, we used the whole TCGA
   dataset, [113]GSE57495 dataset and ICGC verification set as the
   verification dataset, and the same model and coefficient as the
   training set were adopted. Similarly, the high RiskScore sample had a
   worse prognostic capability, CEP55 was a risk factor, and CA8, GNB3 and
   SGSM2 were protective factors (Fig. [114]7A,D,G). ROC analysis showed
   that the model had high AUC (Fig. [115]7B,E,H). The results of the KM
   curve showed that there were significant marginal differences between
   the two groups (Fig. [116]7C,F,I). Moreover, we obtained three
   additional pancreatic cancer datasets from the GEO database,
   [117]GSE28735, [118]GSE62452, and [119]GSE85916, and we used the same
   method to score the risk for each patient in these three cohorts.
   First, we performed ROC analysis on the [120]GSE28735 dataset. Because
   of the short follow‐up period, it was not possible to calculate the
   5‐year AUC, in which the AUC of 1 and 3 years reached more than 0.78
   (Fig. [121]S4A), and there was a significant prognostic difference
   between the high‐ and low‐risk groups (Fig. [122]S4D). ROC analysis in
   the [123]GSE62452 dataset showed the highest AUC of 0.77 for 3 years
   (Fig. [124]S4B), with significant prognostic differences between the
   high‐ and low‐risk groups (Fig. [125]S4E). ROC analysis in the
   [126]GSE85916 dataset showed the highest 5‐year AUC of 0.84
   (Fig. [127]S4C), with a significant prognostic difference between the
   high‐ and low‐risk groups (Fig. [128]S4F).

Fig. 7.

   Fig. 7
   [129]Open in a new tab

   Robustness of risk model. (A, D, G) Risk score, survival time and
   survival status, and expression of four genes in whole TCGA dataset,
   [130]GSE57495 dataset and ICGC verification set. (B, E, H) ROC curve
   and AUC of four‐gene signature in whole TCGA dataset, [131]GSE57495
   dataset and ICGC verification set. (C, F, I) KM survival curve of
   four‐gene signature in whole TCGA dataset, [132]GSE57495 dataset and
   ICGC verification set.

Prognostic analysis of risk models and clinical features

   Survival analysis showed that only age, N stage and OS were
   significantly correlated in the TCGA training set sample (P < 0.05),
   and TNM stage presented significant margin (P = 0.05464; Fig. [133]8).
   It was further found that four‐mRNA signature could distinguish the
   young and old groups, female, stage I + II, T1 + T2 and T3 + T4
   patients from high‐ and low‐risk groups (P < 0.05; Fig. [134]9). These
   data further illustrate that our model still has good predictive
   ability in different clinical signs.

Fig. 8.

   Fig. 8
   [135]Open in a new tab

   Prognostic analysis of clinical characteristics and risk models. (A) KM
   prognosis curve in age samples. (B) KM prognosis curve in sex samples.
   (C) KM prognosis curve in T stage samples. (D) KM prognosis curve in N
   stage samples. (E) KM prognosis curve in M stage samples. (F) KM
   prognosis curve in grade samples. (G) KM prognosis curve in TNM stage
   samples.

Fig. 9.

   Fig. 9
   [136]Open in a new tab

   Prognostic analysis of clinical characteristics and risk models. (A) KM
   prognosis curve in young samples (age ≤ 65 years). (B) KM prognosis
   curve in old samples (age > 65 years). (C) KM prognosis curve in female
   samples. (D) KM prognosis curve in male samples. (E) KM prognosis curve
   in T3 stage samples. (F) KM prognosis curve in T1 + T2 + T4 stage
   samples. (G) KM prognosis curve in N1 stage samples. (H) KM prognosis
   curve in N0 + NX stage samples. (I) KM prognosis curve in MX stage
   samples. (J) KM prognosis curve in M0 + M1 stage samples. (K) KM
   prognosis curve in stage II samples. (L) KM prognosis curve in stage
   I + II + IV samples. (M) KM prognosis curve in G1 + G2 stage samples.
   (N) KM prognosis curve in G3 + G4 + G5 stage samples. (O) KM prognosis
   curve in C1 + C2 samples. (P) KM prognosis curve in C3 + C4 samples.

Clinical independence and regulatory pathway of four‐mRNA signature

   To identify the independence of the four‐mRNA signature model in
   clinical applications, we used univariate and multivariate Cox
   regression analysis to analyze relevant HR, 95% CI of HR and P value in
   the clinical information carried by the whole TCGA data. In TCGA
   dataset, the univariate COX regression analysis found that sex, T3, T4
   versus T1/T2, stage III versus stage I, II and IV, and RiskScore are
   significantly associated with survival, but the corresponding
   multivariate Cox regression analysis found that age, stage of N and
   risk score (HR, 3.606; 95% CI: 1.659–7.839; P = 0.007) were
   significantly associated with survival (Fig. [137]10A,B). The earlier
   conditions indicate that our model four‐mRNA signature has good
   predictive performance in clinical application value. To observe the
   relationship between risk scores of different samples and biological
   function, we used the r software package GSVA for ssGSEA analysis. The
   function with a correlation >0.45 was selected, from which it can be
   seen that most of them are negatively correlated with the risk score of
   the sample, while a few are positively correlated with the risk score
   of the sample (Fig. [138]10C). Cluster analysis results showed that
   among the 17 pathways, KEGG_P53_SIGNALING_PATHWAY,
   KEGG_SYSTEMIC_LUPUS_ERYTHE MATOSUS, KEGG_CELL_CYCLE and other
   metabolic‐related pathways increased with the increase of RiskScore,
   and KEGG_pentose_phosphoate_pathway declined with the increase of
   RiskScore (Fig. [139]10D). This also suggests that the dysfunction
   of these pathways is closely related to tumor development.

Fig. 10.

   Fig. 10
   [140]Open in a new tab

   Clinical independence and regulatory pathway of four‐mRNA signature.
   (A) Univariate Cox regression analysis of four‐mRNA signature. (B)
   Multivariate Cox regression analysis of four‐mRNA signature. (C)
   Clustering of correlation coefficients between KEGG pathways and
   RiskScore with a correlation >0.45. (D) The change of ssGSEA score in
   each sample with the increase of risk score in the KEGG pathway. The
   horizontal axis represents the sample, and the risk score increases
   from left to right.

Advantages of risk models

   Four prognosis‐related risk models, 15‐gene signature (Chen), 7‐gene
   signature (Cheng), 5‐gene signature (Raman) and 9‐gene signature (Wu),
   were selected and compared with our four‐gene model. To make the models
   comparable, we calculated the risk scores of each pancreatic cancer
   sample in TCGA using the same method. Samples were divided into the
   risk‐H and risk‐L groups according to the median risk score, and the KM
   prognosis curve showed that there were significant differences in OS
   prognosis of samples from the four models in the risk‐H and risk‐L
   groups (P < 0.05; Fig. [141]11A–D). The ROC analysis results of the
   model showed that the prediction effect of the four models was worse
   than that of the four‐gene signature models (Fig. [142]11E–H). The RMS
   curve was further drawn using the r language RMS package, indicating
   that the AUC of the four gene models was higher than that of the four
   models (Fig. [143]11I). Curves of risk coefficients also show that the
   model is relatively high (Fig. [144]11J).

Fig. 11.

   Fig. 11
   [145]Open in a new tab

   The superiority of the risk model. (A–D) KM prognosis curve of four
   models. (E–H) AUC of 1, 3, 5 and 3 years in four models. (I) RMS curves
   of five models. (J) Risk coefficient curves of five models.

Discussion

   Pancreatic cancer is by default the king of all cancers, placing a
   heavy burden on families and the world [[146]38]. Accurate prediction
   of the prognosis of pancreatic cancer is important for the choice of
   treatment and the improvement of the prognosis. In this study, four
   independent datasets were used to identify DEGs associated with lipid
   metabolism between pancreatic cancer and normal cervical tissue. A
   total of 189 DEGs were identified, and four molecular isoforms were
   identified based on lipid metabolism genes combined with WGCNA to
   finally identify four gene signatures.

   The prognostic value of various gene expression profiles has been
   studied in pancreatic cancer over the past decade. For example,
   Demirkol Canli et al. [[147]39] identified a gene signature composed of
   20 prognostic genes (PPS20), indicating OS and event‐free survival of
   pancreatic cancer. Wolfe et al. [[148]40] developed a four‐miRNA
   molecular signature that is associated with risk for local‐regional
   recurrence and OS after pancreatic cancer resection. Meng et al.
   [[149]41] constructed a novel eight‐mRNA signature to predict the
   prognosis of PAAD patients by applying ESTIMATE scoring to
   RNA‐seq‐based transcriptome data. Chen et al. [[150]33] developed a
   prognostic 15‐gene signature to know OS by analyzing microarray data
   from 63 patients with early pancreatic ductal adenocarcinoma (PDAC)
   (stages IB, IIA and IIB) in the Moffitt cohort. Although these reports
   are promising, the proposed genome is either too large or traditionally
   genetically composed. Therefore, focusing on key biological processes,
   such as lipid metabolism, which have a significant impact on cancer
   occurrence and progression, may introduce potential biomarkers for
   pancreatic cancer screening, as well as new treatment strategies and
   targets.

   Abnormalities in signaling pathways are one of the important advances
   in cancer. With this in mind, abnormal activity of energy metabolism,
   such as lipid metabolism, has a unique role in cancer development, and
   its expression and active state have become of interest to researchers
   for screening and therapeutic inventions [[151]42, [152]43]. Based on
   this, we used a bioinformatics approach to mine the correlation between
   lipid gene status and prognostic prediction in patients with pancreatic
   cancer. We found significant aberrant expression of CEP55, CA8, GNB3
   and SGSM2 as the key dysregulated metabolic factors within the study
   population. Overexpression of CEP55 activates p21 and enhances the
   cell‐cycle transition. Also, CEP55 upregulation promotes PANC cell
   aggressiveness via activating pancreatic cancer [[153]44]. Many studies
   have shown that CA8 is associated with poor prognosis in a variety of
   tumors, [[154]45, [155]46, [156]47], but it has not been reported in
   pancreatic cancer. GNB3 was reported to influence development of
   metastasis in low‐grade tumors [[157]48]. SGSM2 downregulation promoted
   estrogen receptor‐positive breast cancer cell migration via modulating
   cell adhesion and cytoskeleton dynamics [[158]49]. These published
   reports reinforce the potential of these genes as a comprehensive
   prognosis. We screened prognostic genes from lipid metabolism‐related
   genes and divided four molecular subtypes to select four genes that are
   likely to be involved in lipid metabolic processes, although these
   genes have not been studied in depth in lipid metabolism studies. On
   this basis, we suggest that four gene signatures are likely to serve as
   prognostic biological indicators of pancreatic cancer, and that these
   genes may be involved in important lipid metabolism processes.

   Inevitably, there are some limitations in the research, and we hope to
   address these in future work. First, although three study cohorts were
   included in this study, our findings should be confirmed in a separate
   cohort. Second, the prognostic value of lipid metabolism genes was
   studied using gene microarrays, and this single assay should also be
   validated by other methods, such as real‐time quantitative RT‐PCR.
   Third, the majority of genes in our prognostic model have not been
   reported in studies related to lipid metabolism. Their specific
   clinical significance, biological function and potential mechanism of
   action should be studied in further experiments. In conclusion, more
   experimental evidence is needed to determine the function of these
   genes in pancreatic cancer.

Conclusions

   Our study found that four lipid metabolism‐related genes were
   significantly associated with prognosis in patients with pancreatic
   cancer; therefore, the four‐gene signature with some
   clinicopathological characteristics could be a useful biomarker for the
   prognosis of pancreatic cancer.

Conflict of interest

   The authors declare no conflict of interest.

Author contributions

   YY and HW conceived and designed the research. YQ drafted the
   manuscript and agreed to be accountable for all aspects of the work in
   ensuring that questions related to the accuracy or integrity of any
   part of the work are appropriately investigated and resolved. ZC
   contributed to date acquisition. YS analyzed data. YY interpreted data.
   HW revised the manuscript for important intellectual content. All
   authors approved the final version to be published.

Supporting information

   Fig. S1. KM prognosis curves of four molecular subtypes. (A) KM curve
   between C1 and C2 molecular subtypes. (B) KM curve between C1 and C3
   molecular subtypes. (C) KM curve between C1 and C4 molecular subtypes.
   (D) KM curve between C2 and C3 molecular subtypes. (E) KM curve between
   C2 and C4 molecular subtypes. (F) KM curve between C3 and C4 molecular
   subtypes.

   Fig. S2. Comparison of clinical characteristics of molecular subtypes.
   (A) Sample distribution of T stages in four subtypes. (B) Sample
   distribution of N stages in four subtypes. (C) Sample distribution of M
   stages in four subtypes. (D) Sample distribution of TNM stages in four
   subtypes. (E) Sample distribution of tumor stages in four subtypes. (F)
   Sample distribution of sex stages in four subtypes. (G) Sample
   distribution of age stages in four subtypes.

   Fig. S3. Comparison of immune characteristics in molecular subtypes.
   (A) B cell score between molecular subtypes. (B) CD4 cell score between
   molecular subtypes. (C) CD8 cell score between molecular subtypes. (D)
   Neutrophil cell score between molecular subtypes. (E) Macrophage cell
   score between molecular subtypes. (F) Dendritic cell score between
   molecular subtypes. (G) Stromal score between molecular subtypes. (H)
   Est_Immune score between molecular subtypes. (I) ESTIMATE score between
   molecular subtypes.

   Fig. S4. Prognostic ability of four‐gene signature. (A, D) ROC curve
   and KM curve of RiskScore in [159]GSE28735 dataset. (B, E) ROC curve
   and KM curve of RiskScore in [160]GSE62452 dataset. (C, F) ROC curve
   and KM curve of RiskScore in [161]GSE85916 dataset.
   [162]Click here for additional data file.^ (2.9MB, zip)

Contributor Information

   Yan Qin, Email: qin.yan@zs-hospital.sh.cn.

   Hao Wang, Email: wh91351@126.com.

Data Accessibility

   The analyzed datasets generated during the study are available from the
   corresponding author on reasonable request.

References