Abstract
Background: Lung adenocarcinoma (LUAD) is a sex-biased and easily
metastatic malignant disease. A signature based on 5 long non-coding
RNAs (lncRNAs) has been established to promote the overall survival
(OS) prediction effect on LUAD.
Methods: The RNA expression profiles of LUAD patients were obtained
from The Cancer Genome Atlas. OS-associated lncRNAs were identified
based on the differential expression analysis between LUAD and normal
samples followed by survival analysis, univariate and multivariate Cox
proportional hazards regression analyses. OS-associated lncRNA with sex
dimorphism was determined based on the analysis of expression between
males and females. Functional enrichment analysis of the Gene Ontology
(GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG)
pathways was performed to explore the possible mechanisms of 5-lncRNA
signatures.
Results: A 5-lncRNA signature (composed of [31]AC068228.1, SATB2-AS1,
LINC01843, [32]AC026355.1, and [33]AL606489.1) was found to be
effective in predicting high-risk LUAD patients as well as applicable
to female and male subgroups and <65-year and ≥65-year age subgroups.
The forecasted effect of the 5-lncRNA signature was more efficient and
stable than the TNM stage and other clinical risk factors (such as sex
and age). Functional enrichment analysis revealed that the mRNA
co-expressed with these five OS-related lncRNAs was associated with RNA
regulation within the nucleus. [34]AL606489.1 demonstrated a sexual
dimorphism that may be associated with microtubule activity.
Conclusion: Our 5-lncRNA signature could efficaciously predict the OS
of LUAD patients. [35]AL606489.1 demonstrated gender dimorphism, which
provides a new direction for mechanistic studies on sexual dimorphism.
Keywords: long noncoding RNAs (lncRNAs), The Cancer Genome Atlas
(TCGA), gender dimorphism, prognostic prediction, lung adenocarcinoma
(LUAD)
Introduction
Lung cancer is the leading cause of death in cancer patients across the
world ([36]Bade and Cruz, 2020). Approximately 20% of the lung cancer
cases are accounted for by small-cell lung cancer, while the remaining
80% are accounted for by non-small-cell lung cancer ([37]Leung et al.,
2016). Lung adenocarcinoma (LUAD) and lung squamous cell carcinoma are
the most common subtypes of non-small-cell lung cancer ([38]Herbst et
al., 2018). LUAD shifts earlier than lung squamous cell carcinoma
([39]Chen et al., 2017); therefore, it is very important to have
effective early diagnostic methods for LUAD. Epidemiological studies
have revealed that LUAD varies between males and females, with the
highest incidence of occurrence among never-smokers and women
([40]Couraud et al., 2012). Meanwhile, several other factors affect
lung cancer ([41]Paz-Ares et al., 2018). In addition to the well-known
risk factors such as tobacco, a close association of genetic variants
has been demonstrated in multiple studies with the risk of lung cancer
([42]Li and Hemminki, 2004; [43]Musolf et al., 2016; [44]Cheng et al.,
2019).
With the widespread development of the human genome program
([45]Collins et al., 2003), several genes have been highlighted as
being probably related to the onset of lung cancer, including AKT
([46]Hyman et al., 2017), BRD4 ([47]Zhang et al., 2021), FGFR1
([48]Yuan et al., 2017), BRAF ([49]Lokhandwala et al., 2019), MET
([50]Wu et al., 2020), PIK3CA ([51]Wang et al., 2020), and EGFR
([52]Zhao et al., 2021). However, the current reports on the long
non-coding RNA (lncRNA) remain inadequate. LncRNA refers to any
polyadenylated RNA of length >200 bp ([53]Quinn and Chang, 2016), which
forms a transcript of a large portion of the eukaryotic genome
([54]Jathar et al., 2017). In recent years, lncRNA has received
continuous attention from researchers for its important role in the
eukaryotic gene expression and genome remodeling ([55]Cech and Steitz,
2014; [56]Fungtammasan et al., 2015), including from the tumor
perspective ([57]Hauptman and Glavač, 2013). Up to 37,595 non-coding
genes ([58]Snyder et al., 2020) have been identified, according to the
latest data from the ENCODE Project Consortium 2018 ([59]Davis et al.,
2018). Clearly, the number of current studies on lncRNA are
insufficient compared to this large number of genes.
In numerous lncRNA-related studies, lncRNA has been widely reported as
a biomarker in the diseases of multiple systems ([60]Zhang et al.,
2018; [61]Yu et al., 2019). Ideal biomarkers not only facilitate the
early diagnosis of disease ([62]Xu et al., 2020) but also predict
patient prognosis ([63]Chao and Zhou, 2019) as well as become potential
drug therapeutic targets ([64]Tamang et al., 2019). In the field of
LUAD, the effect of lncRNA as a biomarker on tumor cells has been
explored in terms of immunity ([65]Li et al., 2020), ferroptosis
([66]Lu et al., 2021), and cell pyroptosis ([67]Li et al., 2018).
However, for LUAD, as a typical sex-biased ([68]Yuan et al., 2016)
malignancy, no investigation has yet explored the possible mechanisms
of sex-biased differences in LUAD through the biomarker role of lncRNA.
The occurrence of cancer is affected by gender differences, which is a
consistent finding in the field of cancer epidemiology ([69]Dorak and
Karpuzoglu, 2012). Liu et al. analyzed the sex differences in lncRNAs
across different cancers and found that LINC 00263 acts as an oncogene
associated with men and estrogens; these findings may help explore the
differential gene regulatory mechanisms in sex-specific cancers
([70]Liu et al., 2020).
In conclusion, this study identified the prognostic models of LUAD
through information mining from public databases and explored the
possible mechanisms of sex differences in LUAD. Alternatively, as The
Cancer Genome Atlas (TCGA) contains the most extensive lncRNA
expression matrix ([71]Tomczak et al., 2015), we prefer to conduct
experiments in the TCGA database.
Materials and methods
Data sources
The lncRNA and mRNA expression dataset in the FPKM format as well as
the clinical characters for 535 LUAD patients and 59 normal patients
were directly downloaded from the TCGA
([72]https://portal.gdc.cancer.gov/), updated until 5 December 2021.
GEO database ([73]https://www.ncbi.nlm.nih.gov/geo/) was used to
perform external validation.
Isolation of differentially expressed lncRNA
DELs between the LUAD and normal samples were isolated from all lncRNAs
using the R software. The p-value of each lncRNA in the LUAD and normal
samples was calculated by the rank-sum test, and the p-values were
rectified by the False Discovery Rate (FDR) method. Only the lncRNAs
with adjusted p-values < 0.05 and log2 | fold change | values > 2 were
defined as differentially expressed lncRNAs. Volcano plot and heatmaps
were visualized by the “plot” function and the “heatmap” package of the
R, respectively.
Isolation of overall survival-related lncRNAs in LUAD patients
First, we removed LUAD patients with OS < 0 days. Next, we employed
univariate Cox proportional hazards regression (CPHR) analysis and
Kaplan-Meier analysis to assess the presence of any significant
correlations between the expression of each DELs and the OS of LUAD
patients. Only lncRNAs with p < 0.01 from both the analyses were
considered with a logical agreement in expression and prognostic effect
and selected as the candidate OS-related lncRNAs. Then, half of the
patients were randomly assigned as the “primary dataset” after removing
patients with incomplete clinical information; the original complete
dataset was called the “entire dataset”. In addition to randomization,
the criteria for grouping included no statistical differences in the
clinical characteristics between the “primary” and “entire datasets. In
order to fit the prediction model with the best-prediction effect,
multivariate CPHR analysis (stepwise model) of candidate OS-associated
lncRNAs was performed with the R software in the “primary dataset”. To
ensure the goodness of fitting and to avoid overfitting, the Akaike
information criterion (AIC) was computed, and the prediction model with
the lowest AIC was considered as the most ideal. LncRNAs included in
the best prediction model were selected as OS-related lncRNAs.
Calculation and evaluation of the OS-related lncRNA signature
We determined the coefficients for each lncRNA by another multivariate
CPHR analysis in the “primary dataset”. Until this point, we confirmed
a risk score formula with the expressions of the OS-related lncRNAs as
the independent variables and weighted by the regression coefficients
corresponding to the lncRNAs. The risk scoring formula used is given
below:
[MATH:
Risk Score=β1×Expression ge<
/mi>ne1+β2×Expression ge<
/mi>ne2+⋯+βn×Expression ge<
/mi>nen :MATH]
where βi correspond to the correlation coefficient.
To determine whether the OS-related lncRNA signature was an independent
predictor of OS, we applied both univariate and multivariate CPHR
analyses of OS-related lncRNA signature and the routine clinical risk
factors (such as sex, age, TNM stage, tumor stage, lymph node
metastasis, and distant metastasis) in the LUAD patients. Next, we
assessed whether the predictive effect of the OS-related lncRNA
signature on OS was independent of the routine clinical risk factors by
stratified analysis. Meanwhile, to evaluate the prognostic effect of
the lncRNA-based classifiers across different time ranges, we plotted
the time-dependent receiver operating characteristic (ROC) curves and
then calculated the area under the time-dependent ROC curve (AUC)
values for each dataset. Finally, the predictive effects of the
5-lncRNA classifier and the classifiers based on the other clinical
risk factors were compared by AUC.
Identification of OS-related lncRNAs with gender dimorphism
Whether the OS-related lncRNA was differentially expressed between the
male and female patients was determined by the rank-sum test using p <
0.05 as the significance threshold. In both the male and female groups,
the patients were assigned into two groups of high or low expression
bounded by the median expression of an OS-related lncRNA, and the
Kaplan–Meier curve was applied to analyze whether there were
differences in survival time between the high and low expression
groups. The lncRNA was considered to be with gender dimorphism if an
OS-related lncRNA was differentially expressed in males and females
while showing different prognostic association in males and females.
Functional enrichment analyses with co-expressed mRNA
The co-expression degree of OS-related lncRNAs and mRNA was determined
by Pearson’s correlational analysis. The mRNAs with a positive
correlation coefficient >0.5 with OS-related lncRNAs were employed in
the next step of enrichment analysis. The “cluster profile” package in
R software was used for the functional enrichment analysis using the
Gene Ontology (GO) terms and Kyoto Encyclopedia of Gene and Genomes
(KEGG) pathways ([74]Kanehisa and Goto, 2000; [75]Kanehisa, 2019;
[76]Kanehisa et al., 2021), with p < 0.01 set as a significance
threshold.
Statistical analysis
For the survival analysis, the survival curves were plotted by the
Kaplan-Meier method, and the differential p-values were calculated by
the log-rank test. The t-test was used to compare the presence of any
significant differences between the “primary” and “entire datasets”.
Unless otherwise specified, p < 0.05 was considered to indicate a
statistical difference.
Results
Candidate OS-related lncRNAs in LUAD patients
The flow chart illustrated in [77]Figure 1 shows the overall design of
this study and some of the main results. After data collation, we
obtained the expression data of 14,142 lncRNAs and 19,658 mRNAs for 535
LUAD samples and 59 normal samples from the TCGA-LUAD database. Through
statistical comparison, 1,223 DELs in tumor samples and normal samples
were identified with a log2 | fold change |> 2 and adjusted p < 0.05.
Of these 1,223 DELs, 1,044 lncRNA were upregulated and 179 were
downregulated in the LUAD patients. Next, volcano plots and heatmaps of
the differential genes were drawn using the “plot” function and the
“pheatmap” package in the R software, the results of which are
illustrated in [78]Figures 2A,B.
FIGURE 1.
[79]FIGURE 1
[80]Open in a new tab
Flowchart depicting the study protocol.
FIGURE 2.
[81]FIGURE 2
[82]Open in a new tab
Volcano plot and heatmap of lncRNAs. (A) Volcano plot of 1,223 lncRNAs
in the LUAD samples. Yellow dots represent 1,044 upregulated lncRNAs,
while blue dots represent 179 downregulated lncRNAs. (B) Heatmap of
1,223 lncRNAs expression levels in LUAD samples from the TCGA-LUAD
project. N = normal samples, T = LUAD samples.
After the exclusion of 45 LUAD samples with incomplete survival data,
490 LUAD samples were finally enrolled in the study. In these 490
samples, 1,223 DELs were analyzed by the Kaplan-Meier method and
univariate CPHR analysis, where OS served as the dependent variable and
lncRNA expression as the independent variable. The results of the
univariate CPHR analysis are depicted in [83]Supplementary Table S1,
and a total of 15 lncRNAs were found to be statistically significantly
associated with OS in LUAD patients (all p < 0.01). Of this 15 lncRNA,
the high expression of 13 lncRNAs (namely, LINC02081, [84]AC010343.3,
LINC02086, [85]AC068228.1, [86]AC022784.1, SATB2-AS1, [87]AL138789.1,
LINC01843, LINC00519, [88]AL606489.1, DEPDC1-AS1, [89]AC087588.2, and
FAM83A-AS1) was associated with a shorter OS. In contrast, the high
expression of [90]AC026355.1 and [91]AL031600.2 was associated with a
higher OS. Moreover, as shown in [92]Supplementary Figure S1, the
results of the Kaplan-Meier analysis conformed to those of the
univariate CPHR analyses. To this point, 15 lncRNAs with some
correlation between the gene expression volume and prognosis were
included as the candidate OS-related lncRNAs.
Identification and evaluation of an OS-related lncRNA signature to predict
the OS
After removing 11 samples without complete clinical features (such as
TNM stage or age), 479 LUAD samples formed the “entire dataset”, of
which 239 groups were randomly selected as the “primary dataset”. The
differential analysis revealed no statistical differences in the
baseline clinical risk factors and OS between the “entire” and “primary
datasets” (all p > 0.05; [93]Table 1).
TABLE 1.
Baseline clinical characteristics and OS between the “entire dataset”
and the “primary dataset”.
Character Primary dataset Entire dataset p-value(t)
n = 239 n = 479
Age (year) 0.35
≥65 145 (60.67%) 266 (55.53%)
<65 94 (39.33%) 213 (44.47%)
Gender 0.80
Female 135 (56.49%) 260 (54.28%)
Male 104 (43.51%) 219 (45.72%)
TNM stage 0.68
I 130 (54.39%) 259 (54.07%)
II-IV 109 (45.61%) 220 (45.93%)
Tumor stage 0.80
TX 2 (0.84%) 3 (0.63%)
T1-T2 212 (88.70%) 415 (86.64%)
T3-T4 25 (10.46%) 61 (12.73%)
Lymph node metastasis 0.78
NX 5 (2.09%) 9 (1.89%)
No 152 (63.60%) 311 (64.92%)
Yes 82 (34.31%) 159 (33.19%)
Distant metastasis 0.80
MX 72 (30.13%) 139 (29.02%)
No 156 (65.27%) 316 (65.97%)
Yes 11 (4.60%) 24 (5.01%)
OS (days) 0.85
Average 773 761
Median 582 557
[94]Open in a new tab
Candidate prognosis lncRNAs were further screened by multivariate-CPHR
analysis (stepwise model) in the “primary dataset” using AIC to avoid
overfitting. Five OS-related lncRNAs were picked with the largest fit
and the lowest AIC values ([95]Table 2), namely, is [96]AC068228.1,
SATB2-AS1, LINC01843, [97]AC026355.1, and [98]AL606489.1. Next, these 5
OS-related lncRNAs and their risk coefficients were integrated into the
predictive signature to obtain a risk scoring using the following
formula:
[MATH:
Risk Score=0.4537×expression AC0682
28.1+2.1929×expression SATB2–
AS1+0.0824×expression LINC01843<
/mrow>+–0.3963×expression AC026355.1
+0.1383×expression AL606489.1
:MATH]
TABLE 2.
Five OS-related lncRNAs in the “primary dataset”.
lncRNA Coefficients HR (95% CI) p-value
[99]AC068228.1 0.4537 1.5741 (1.2184–2.0337) <0.001
SATB2-AS1 2.1929 8.9619 (2.2255–36.0894) 0.002
LINC01843 0.0824 1.0859 (1.0109–1.1665) 0.023
[100]AC026355.1 −0.3963 0.6727 (0.5325–0.8500) <0.001
[101]AL606489.1 0.1383 1.1483 (0.9741–1.3537) 0.099
[102]Open in a new tab
Abbreviations: HR, hazard ratio; CI, confidence interval.
Next, we computed the risk score for LUAD patients in the “primary
dataset” according to the 5 lncRNA signatures. Using the median risk
score (0.09382005) as the cut-off value, 239 LUAD patients were
classified into high- (n = 119) or low- (n = 120) risk groups. The risk
score distributions, OS status, and the 5 lncRNA expression profiles in
the “primary datasets” are depicted in [103]Figure 3 (A–C). OS-related
lncRNAs expression heatmaps revealed that the 4 upregulated lncRNA
(i.e., [104]AC068228.1, SATB2-AS1, LINC01843, and [105]AL606489.1)
demonstrated higher expression levels in the high-risk group, and the
[106]AC026355.1 expression levels were lower in the high-risk groups.
As shown in [107]Figure 3D, the Kaplan–Meier curve obviously showed
that the OS time in the high-risk group was less than that in the
low-risk group (p = 1.071E-04, log-rank test). Subsequently, in the
“primary dataset”, as shown in [108]Figures 3E–G, the AUC of the
time-dependent ROC curve was 0.768 at 1 year, 0.668 at 3 years, and
0.702 at 5 years.
FIGURE 3.
[109]FIGURE 3
[110]Open in a new tab
Assessment of the 5-lncRNA signature for predicting OS of LUAD in the
“primary dataset”. (A) The risk score distribution in the “primary
dataset”. (B) The OS status in the “primary dataset”. (C) The
OS-related lncRNAs expression heatmaps of the 5-lncRNA signature in the
“primary dataset”. (D) Kaplan–Meier curves comparing OS between the
high-risk groups (n = 119) and low-risk groups (n = 120) in the
“primary dataset”. Blue- and red-shaded sections indicate the
confidence intervals for survival. Listed below the curve is the number
of patients being at risk. (E) Time-dependent ROC curve based on
5-lncRNA signature predicting 1 year-OS in the “primary dataset”. (F)
Time-dependent ROC curve based on 5-lncRNA signature predicting
3 years-OS in the “primary dataset”. (G) Time-dependent ROC curve based
on 5-lncRNA signature predicting 5 years-OS in the “primary dataset”.
To verify the prediction of 5-lncRNA signatures obtained from the
“primary dataset”, we applied 5-lncRNA signatures to the “entire
dataset” (n = 479). Similarly, 479 patients were classified into the
high-risk (n = 244) and low-risk (n = 235) groups according to the
median risk score in the “primary dataset”. The risk score
distributions, OS status, and the 5 lncRNA expression profiles in the
“entire dataset” are depicted in [111]Figures 4A–C. The results from
the “entire dataset” are consistent with those from the “primary
dataset”. Meanwhile, the Kaplan–Meier curve ([112]Figure 4D) showed
that the OS in the high-risk group (n = 244) was significantly shorter
than that in the low-risk group (n = 235) (p = 5.587E-07, log-rank
test). As shown in [113]Figures 4E–G, the AUC of the time-dependent ROC
curve was 0.738 at 1 year, 0.661 at 3 years, and 0.709 at 5 years. The
5-lncRNA signature showed a good prediction performance both in the
“primary dataset” and the “entire dataset” of the LUAD patients. The
prediction results of 5-lncRNA signature in “primary dataset” and the
“entire dataset” were shown in [114]Supplementary Table S2.
FIGURE 4.
[115]FIGURE 4
[116]Open in a new tab
Assessment of the 5-lncRNA signature in the “entire dataset”. (A) The
risk score distribution in the “entire dataset”. (B) The OS status in
the “entire dataset”. (C) The OS-related lncRNAs expression heatmaps of
the 5-lncRNA signature in the “entire dataset”. (D) The Kaplan–Meier
curves comparing OS between the high-risk groups (n = 244) and the
low-risk groups (n = 235) in the “entire dataset”. Blue- and red-shaded
sections indicate the confidence intervals for survival. The number of
patients at risk is listed below the curve. (E) Time-dependent ROC
curve based on 5-lncRNA signature predicting 1 year-OS in the “entire
dataset”. (F) Time-dependent ROC curve based on 5-lncRNA signature
predicting 3 years-OS in the “entire dataset”. (G) Time-dependent ROC
curve based on 5-lncRNA signature predicting the 5 years-OS in the
“entire dataset”.
The prognostic effect of the 5-lncRNA signature as an independent prognostic
factor in LUAD patients.
Next, to examine whether the prognostic performance of the 5-lncRNA
features was independent of other conventional clinical risk factors,
we performed multivariate CPHR analyses. The hazard ratio (HR) in the
“entire dataset” ([117]Table 3) was 1.085 (p < 0.001, 95% CI =
1.052–1.118), and in the “primary dataset” ([118]Supplementary Table
S3) was 1.065 (p < 0.001, 95% CI = 1.028–1.102). The abovementioned
data indicates that these 5 lncRNA signatures could independently
predict the prognosis of LUAD patients as an independent prognostic
factor for LUAD.
TABLE 3.
Univariate and multivariate Cox proportional hazards regression
analyses results of 5-lncRNA signature and other clinical risk factors
in the “entire dataset”.
Characteristic Univariate analysis Multivariate analysis
HR (95%CI) p-value HR (95%CI) p-value
Age 1.009 (0.992–1.025) 0.266 1.016 (1.000–1.033) 0.049
Gender (female vs. male) 1.009 (0.992–1.026) 0.855 0.923 (0.666–1.278)
0.631
TNM stage (I-Ⅳ) 1.659 (1.430–1.924) <0.001 1.422 (1.138–1.776) 0.001
Tumor stage (T1-T4) 1.495 (1.223–1.828) <0.001 1.149 (0.929–1.421)
0.199
Lymph node metastasis (N0-N3) 1.715 (1.435–2.049) <0.001 1.221
(0.952–1.565) 0.114
Risk score 1.101 (1.070–1.133) <0.001 1.085 (1.052–1.118) <0.001
[119]Open in a new tab
Notes: Bold values indicate statistical significance (p < 0.05).
Abbreviations: HR, hazard ratio; CI, confidence interval.
To validate the scope of applicability of the risk score prediction, we
conducted a stratified analysis of the “entire dataset”. First,
considering the number of people, 479 LUAD patients were classified
into stage I (n = 259; [120]Figure 5A) and stage Ⅱ–Ⅳ (n = 220;
[121]Figure 5B) based on the TNM stage. Each subgroup was classified as
the high-risk and low-risk groups and then Kaplan–Meier curves were
accordingly plotted. Second, 479 patients were classified into no (n =
311, [122]Figure 5C) or yes (n = 159, [123]Figure 5D) subgroups
according to the absence or presence of lymphoid tract metastasis,
respectively. Next, 479 patients with LUAD were assigned into male (n =
219, [124]Figure 5E) and female subgroups (n = 260, [125]Figure 5F).
Then, 479 patients with LUAD were assigned to the age ≥ 65 years (n =
266, [126]Figure 5G) and <65 years subgroups (n = 213, [127]Figure 5H).
Finally, we noted that, in all subgroups, the survival time was
significantly lower in the high-risk group than that in the low-risk
groups, albeit it was not statistically significant in the female (p =
0.08) subgroup. To further validate the association between OS and
5-lncRNA, GEO database ([128]GSE3141 and [129]GSE19188) was used to
perform external validation. The Kaplan-Meier curves for OS associated
with the SATB2-AS1 expression were shown in [130]Supplementary Figure
S2 ([131]GSE3141: p = 0.6312, [132]GSE19188: p = 0.0914, [133]GSE3141 +
[134]GSE19188: p = 0.1322). It is a pity that all three statistics did
not show a significant effect. However, SATB2-AS1 still showed a clear
trend towards promoting oncogenes, which is consistent with our
findings in the TCGA database. All case ID involved in this study were
shown in [135]Supplementary Table S4.
FIGURE 5.
[136]FIGURE 5
[137]Open in a new tab
Stratified analysis of the 5-lncRNA signature in LUAD patients. (A)
Kaplan-Meier analysis of patients in the stage I subgroup, (B) stage
Ⅱ–IV subgroup, (C) without lymph node metastasis subgroup, (D) with
lymph node metastasis subgroup, (E) male subgroup, (F) female subgroup,
(G) age ≥65 years subgroups, and (H) age <65 years subgroups. The
differences between the two risk groups were assessed by two-sided
log-rank tests.
Five-lncRNA signature-based signature has a better survival prediction effect
than other clinical characters
We employed the time-dependent ROC curves to compare the predictive
effects of different prognostic factors using the AUC as a comparison
indicator. As shown in [138]Figure 6, the stable predictive performance
of the 5-lncRNA signature is more outstanding than the conventional
clinical characters such as the TNM stage, and are efficient to predict
the prognosis of LUAD patients.
FIGURE 6.
[139]FIGURE 6
[140]Open in a new tab
The prognostic value of the 5-lncRNA signature in comparison with other
clinical factors. Time-dependent ROC curve analysis of the 5-lncRNA
signature for predicting (A) 1 year-OS, (B) 3 years-OS, and (C)
5 years-OS in the “primary dataset”. Time-dependent ROC curve analysis
of the 5-lncRNA signature for predicting (D) 1 year-OS, (E) 3 years-OS,
and (F) 5 years-OS in the “entire dataset”.
[141]AL606489.1, an OS-related lncRNAs, demonstrating gender dimorphism
Among the 5 OS-related genes, [142]AL606489.1, SATB2-AS1 and
[143]AC068228.1 was differentially expressed between males and females
([144]Figures 7A–C. This significant difference was not shown in
LINC01843 (p = 0.5833) and [145]AC026355.1 (p = 0.5177), as shown in
[146]Supplementary Figures S3A,B. The Kaplan–Meier curves for the OS
related with [147]AL606489.1 expression in males (low = 109, high =
110) and females (low = 130, high = 130) are depicted in [148]Figures
7D,G respectively. For SATB2-AS1, the Kaplan–Meier curves in males and
females are depicted in [149]Figures 7E,H respectively. For
[150]AC068228.1, the Kaplan–Meier curves in males and females are
depicted in [151]Figures 7F,I respectively. In males, the high
expression of [152]AL606489.1, SATB2-AS1 or [153]AC068228.1 was
associated with the shorter OS. In females, the high expression of
SATB2-AS1 or [154]AC068228.1 was associated with the shorter OS.
Dissimilarly, the high expression of [155]AL606489.1 in females was not
significantly associated with the OS (p = 0.2704). Finally, to verify
whether this discrepancy was attributable to [156]AL606489.1
association with the gender, we noted no significant difference in the
overall survival between males and females by the Kaplan-Meier analysis
([157]Supplementary Figure S3).
FIGURE 7.
[158]FIGURE 7
[159]Open in a new tab
The expression of [160]AL606489.1, SATB2-AS1 and [161]AC068228.1 in
LUAD. (A) Differentially expressed [162]AL606489.1 between 260 female
and 219 male tumor samples. (B) Differentially expressed SATB2-AS1
between 260 female and 219 male tumor samples. (C) Differentially
expressed [163]AC068228.1 between 260 female and 219 male tumor
samples. Kaplan–Meier curves for OS associated with the [164]AL606489.1
expression in (D) male and (G) female. Kaplan–Meier curves for OS
associated with the SATB2-AS1 expression in (E) male and (H) female.
Kaplan–Meier curves for OS associated with the [165]AC068228.1
expression in (F) male and (I) female.
Functional characteristics of 5 OS-related lncRNAs
To determine the possible function of 5 OS-related lncRNAs in the
tumorigenic development of LUAD tumors, we conducted an function
enrichment analysis on mRNAs co-expressed with OS-associated lncRNAs in
490 LUAD samples. The levels of the 928 mRNA expressions were
positively associated with the level of at least one OS-related lncRNA
(co-expression coefficient >0.50). The GO analysis indicated that these
co-expressed mRNAs were enriched in 52 GO terms ([166]Supplementary
Table S5). These GO terms were mainly enriched in regulating the mRNA
metabolic processes, RNA splicing, and ubiquitin-specific protease
activity ([167]Figure 8A). Similar findings were obtained from the KEGG
pathway enrichment analysis ([168]Figure 8B), such as the
ubiquitin-mediated proteolysis pathway. Therefore, the characteristics
of 5-lncRNA mainly affected the gene expression within the nucleus and
may be related to cell cycle regulation.
FIGURE 8.
[169]FIGURE 8
[170]Open in a new tab
GO and KEGG functional enrichment analysis of the mRNA co-expressed
with 5 OS-related lncRNA. (A) GO enrichment analysis. (B) KEGG
enrichment analysis.
Discission
LUAD is one of the most widely diagnosed subtypes of lung cancer
([171]Fong et al., 1999). Owing to the unknown pathogenesis and
unsatisfactory treatment effect, the mortality of LUAD patients remains
high ([172]Jiang et al., 2019). In recent years, lncRNA has been
applied as a potential tumor marker with promising research progress in
LUAD ([173]Li et al., 2014).
In this study, both univariate and multivariate CPHR analyses were
performed to establish a 5-lncRNA signature. This model showed high
accuracy in both the “entire” and “primary datasets”. In contrast, our
prognostic model outperformed the other prognostic features. Risk
stratification analysis suggested that our prediction model applied to
different subgroups. Finally, we employed GO and KEGG to detect the
biological function of our predictive model. Our results seemingly
explored how these 5 OS-related lncRNAs are involved in tumor
progression. Finally, the lncRNA [174]AL606489.1 showed a possible
association with sex dimorphism.
Our prognostic model consisted of 5 LncRNAs, 4 (i.e., [175]AC068228.1,
SATB2-AS1, [176]AC026355.1, [177]AL606489.1) of which have been
previously reported to be related to the prognosis of LUAD. For
instance, SATB2-AS1 has been reported to promote tumor cell growth in
osteosarcoma ([178]Liu et al., 2017), and NSCLC ([179]Wu et al., 2021).
However, in colorectal cancer ([180]Xu et al., 2019), SATB2-AS1 has the
effect of inhibiting tumor cell metastasis. Similar to our result,
[181]AC026355.1 was reported to be an immune-related gene with tumor
suppressor effects by Li et al. ([182]Li et al., 2020) In past studies,
[183]AL606489.1 has been reported to be associated with autophagy
([184]Liu et al., 2021), ferroptosis ([185]Guo et al., 2021),
cuproptosis ([186]Mo et al., 2022) and pyroptosis ([187]Li et al.,
2018; [188]Song et al., 2021) processes in LUAD tumor cells. LINC 01843
was first shown to be associated with LUAD progression. These reports
provide a new direction for gene sequence studies in LUAD.
In the GO and KEGG analysis results, the mRNAs co-expressed with
prognostic-associated lncRNAs were associated with processing and RNA
transport in the nucleus, such as in the regulation of mRNA metabolic
process and the regulation of RNA splicing. Past studies have
demonstrated that one of the prognostic-related genes, SATB2-AS1, acts
as a miR-299-3p sponge, promoted the development of NSCLC. The
underlying mechanism is the promotion of tumor cell proliferation, cell
cycle progression, and survival ([189]Wu et al., 2021). Thus, the
results of GO and KEGG seem to appropriately reflect the place of
action that was associated with prognosis, lncRNA affects the
prognostic effect in patients with LUAD.
In the risk stratification analysis, this predictive model showed a
slightly better performance in male patients (p < 0.05) than in female
patients (p = 0.08), which prompted us to further explore the reasons
for this discrepancy.
In our study, [190]AL606489.1 was highly expressed in males relative to
that in females. Moreover, on the premise that there is no significant
difference in the prognosis between males and females with LUAD,
[191]AL606489.1 exhibited high levels of OS association in male
patients, while showing no significant OS association in female
patients. Therefore, we suggest that [192]AL606489.1 demonstrates a
gender dimorphism in terms of the prognostic effects in patients with
LUAD. Meanwhile, this difference of [193]AL606489.1 expression in
females compared to males may be why the 5-lncRNA signature did not
show significance in females in [194]Figure 5F.
A person’s gender is one of the key factors affecting the occurrence
and development of cancer throughout his or her lifetime. In addition
to the sex-specificity of ovarian cancer in women and prostate cancer
in men, several tumors are associated with a significant sex bias in
terms of incidence ([195]Li et al., 2018), metastatic ([196]Kim et al.,
2020), prognosis ([197]Song et al., 2021), and therapeutic efficacy
([198]Freudenstein et al., 2020). As the attention to gender
differences has increased, gender dimorphism has been mentioned in
increasing studies ([199]Yuan et al., 2016).
In LUAD, sex bias is also associated with patients’ acquired behavior.
For instance, Henschke et al. reported that women smoking was
associated with a higher risk of lung cancer compared to men smoking,
but after diagnosis of lung cancer, they had better survival rates
([200]Henschke et al., 2006). The difference in prognosis between male
and female patients may be related to natural differences in hormone
levels. Multiple studies have demonstrated that sexual dimorphism may
be due to differences in the estrogen content between men and women,
which develops into different prognostic effects between male and
female patients with LUAD. For example, LncRNA LINC00263 has been
implicated as an oncogene in men and estrogen by Liu et al. ([201]Liu
et al., 2020). However, the specific role of lncRNA in sex dimorphism
has not been well studied. In the present case, [202]AL606489.1 can
hence be a breakthrough.
In our study, the action mechanism of [203]AL606489.1 was explored by
co-expression analyses. In the co-expression analysis, [204]AL606489.1
was found to be highly correlated with the sarcolemmal
membrane-associated protein (SLMAP) expression (correlation coefficient
= 0.64) ([205]Supplementary Table S5). A subform of the SLMAP has been
reported to be a component of the microtubule (Mt) tissue center
([206]Guzzo et al., 2004). Mts is an important therapeutic target for
tumor cells ([207]Dumontet and Jordan, 2010). Clinically, some
compounds that break Mt dynamics are also some of the most effective
chemotherapeutics for cancer, such as vincristine alkaloids and taxanes
([208]Checchi et al., 2003). Similarly, the mt-targeted drugs (MTDs)
form a major family of anticancer drugs with anti-mitotic and
antiangiogenic properties that inhibit tumor progression, mainly by
changing the Mt dynamics of the tumor and endothelial cells ([209]Bhat
and Setaluri, 2007). However, there are no reliable markers that can be
used for the prediction of the development of cancer sensitivity and
resistance during treatment. In this study, [210]AL606489.1 was found
to be highly co-expressed with SLMAP and highly correlated with LUAD
prognosis, indicating its potential as a reliable marker.
Alternatively, the differential expression of [211]AL606489.1 in males
and females may be responsible for the clinical emergence of
sex-differential efficacy of anticancer drugs that disrupt the Mt
dynamics ([212]Moore et al., 2003).
The limitations of the present study include the lack of external
validation considering that the most lncRNAs required in this study
were inaccessible in the GEO database. Second, as RNA testing in the
TCGA database is constantly updated, this study is slightly
sample-limited. Finally, the preliminary conclusion that
[213]AL606489.1 demonstrates sexual dimorphism, as derived in this
study, needs to be further validated through in vitro and in vivo
biological experiments, if the external conditions support it.
Conclusion
Our 5-lncRNA signature (composed of [214]AC068228.1, SATB2-AS1,
LINC01843, [215]AC026355.1, and [216]AL606489.1) could effectively
predict the OS of LUAD patients, indicating its positive role in early
screening and prognosis prediction of LUAD. Moreover, [217]AL606489.1
demonstrated gender dimorphism, thereby providing a new direction for
mechanistic studies on sexual dimorphism.
Acknowledgments