Abstract

Background

   The most common histological subtypes of esophageal cancer are squamous
   cell carcinoma (ESCC) and adenocarcinoma (EAC). It has been
   demonstrated that non-marginal differences in gene expression and
   somatic alternation exist between these two subtypes; consequently,
   biomarkers that have prognostic values for them are expected to be
   distinct. In contrast, laryngeal squamous cell cancer (LSCC) has a
   better prognosis than hypopharyngeal squamous cell carcinoma (HSCC).
   Likewise, subtype-specific prognostic signatures may exist for LSCC and
   HSCC. Long non-coding RNAs (lncRNAs) hold promise for identifying
   prognostic signatures for a variety of cancers including esophageal
   cancer and head and neck squamous cell carcinoma (HNSCC).

Methods

   In this study, we applied a novel feature selection method capable of
   identifying specific prognostic signatures uniquely for each subtype –
   the Cox-filter method – to The Cancer Genome Atlas esophageal cancer
   and HSNCC RNA-Seq data, with the objectives of constructing
   subtype-specific prognostic lncRNA expression signatures for esophageal
   cancer and HNSCC.

Results

   By incorporating biological relevancy information, the lncRNA lists
   identified by the Cox-filter method were further refined. The resulting
   signatures include genes that are highly related to cancer, such as H19
   and NEAT1, which possess perfect prognostic values for esophageal
   cancer and HNSCC, respectively.

Conclusions

   The Cox-filter method is indeed a handy tool to identify
   subtype-specific prognostic lncRNA signatures. We anticipate the method
   will gain wider applications.

   Keywords: Long non-coding RNA (lncRNA), Prognostic signature, Head and
   neck squamous cell carcinoma (HNSCC), Esophageal cancer, Cox regression
   model

Background

   Esophageal cancer is a cancer of the esophagus, the hollow tube that
   carries foods and liquids from throat to stomach. The causes of
   esophageal cancer are unclear, but it is commonly believed that both
   environmental and genetic factors play roles in its initiation and
   progression [[31]1]. For instance, smoking, heavy alcohol consumption,
   obesity, and damage to the esophagus from acid reflux (Barrett
   esophagus) are thought to increase the risk of developing esophageal
   cancer, while, the tendency of familial aggregation for esophageal
   cancer suggests that genetic components are of crucial importance. The
   most common histological subtypes of esophageal cancer are squamous
   cell carcinoma (ESCC) and adenocarcinoma (EAC). As far as prognosis is
   concerned, no evidence suggests any substantial difference between
   these two subtypes. Nevertheless, a study by The Cancer Genome Atlas
   research group [[32]2] has demonstrated that non-marginal differences
   with regard to gene expression and somatic alteration exist between
   ESCC and EAC. Consequently, biomarkers that hold prognostic value for
   these two subtypes are expected to be distinct, at least to some
   extent.

   Head and neck squamous cell carcinoma (HNSCC) develops in mucous
   membranes of the mouth, nose and throat. Hypopharyngeal squamous cell
   carcinoma (HSCC), which originates in mucosa of the hypopharynx and
   accounts for approximately 3% of HNSCC cases, has one of the poorest
   prognoses among HNSCC patients [[33]3]. Laryngeal squamous cell cancer
   (LSCC) accounts for relatively more HNSCC cases and has a better
   prognosis compared to HSCC even though the initial sites of these two
   diseases are anatomically very close. LSCC originates in the larynx,
   whereas HSCC originates in the lower part of the throat near the larynx
   (i.e., the hypopharynx). Therefore, finding molecular markers that can
   distinguish between the two subtypes is crucial for survival
   prediction.

   Long non-coding RNAs (lncRNAs) are a class of RNA molecules that have a
   length of more than 200 nucleotides and are without protein-coding
   capacity [[34]4]. Therefore, lncRNAs have previously been regarded as
   transcriptional “junk.” Nowadays, paramount investigations have
   demonstrated that lncRNAs can serve as novel biomarkers and therapeutic
   targets in complex diseases such as cancer. Identification of lncRNA
   signatures is in demand and usually requires the help of a feature
   selection method. The primary aims of feature selection are to reduce
   the number of features (e.g., genes or metabolites) under consideration
   to a manageable size, thus speeding up the learning process and
   facilitating biological interpretation and experimental validation
   [[35]5].

   Applying feature selection to lncRNA (vs mRNA) data might achieve
   better model parsimony because mRNA-based studies obtain signatures
   with a limited number of genes, and because the expression levels of
   lncRNAs are usually lower than those of mRNAs (thus less differentially
   expressed lncRNAs can be identified). Studies that aim to identify
   lncRNA signatures for esophageal cancers and HNSCC have increased
   dramatically. For example, studies by Cao et al. [[36]6], Wang et al.
   [[37]7] and Yao et al. [[38]8] specifically aimed to identify lncRNA
   expression signatures with prognostic value for HNSCC patients, while
   several studies [[39]9–[40]12] identified relevant lncRNA signatures
   for esophageal cancer. Nevertheless, those studies usually considered
   HNSCC or esophageal cancer as a whole or only focused on one specific
   subtype.

   In this study, we applied a novel feature selection method – the
   Cox-filter method [[41]13] – to the cancer genome atlas (TCGA)
   esophageal cancer and HNSCC RNA-Seq data, with the objectives of
   constructing subtype-specific prognostic lncRNA expression signatures
   for EC and HNSCC. Precision medicine for those patients will only be
   possible once subtype-specific prognostic signatures become available.

Materials and methods

Experimental data

   The lncRNA expression values, namely, FPKM (fragments per kilo-bases
   per million) for HNSCC were retrieved from the TANRIC (The Atlas of
   ncRNA in Cancer) database [[42]14], version 1.0.6
   ([43]https://www.tanric.org/), which was last updated on 07/29/2015.
   Then the corresponding clinical information was retrieved from the the
   Genomic Data Commons ([44]https://gdc.cancer.gov) by matching the
   barcode IDs of samples in the TANRIC database [[45]14] with those in
   the TCGA database. Patients without information on overall survival
   (OS), age, gender, pathological tumor stage and histological subtype
   were discarded. Only patients with LSCC and HSCC were retained for
   analysis. If the sum of FPKM values of lncRNA expression across all
   samples (LSCC and HSCC patients combined) was < 4, they were deleted.
   Finally, log 2 transformations on (FPKM counts + 1) were carried out,
   providing a better approximation to a normal distribution.

   For the esophageal cancer study, both the expression profiles (RNA-Seq
   data) of TCGA ESCA cohort and clinical information such as overall
   survival time were downloaded from the Genomic Data Commons.
   Subsequently, the lncRNAs were collected by mapping the Ensemble IDs of
   RNA-Seq data to those in the TANRIC database [[46]14] (given that the
   ESCA cohort is not included in the TANRIC database) so that expression
   profiles of lncRNAs were obtained.

   The ratio of LSCC and HSCC is extremely high (89:6) while that for ESCC
   to EAC is very close to 1 (81:83), which represents the two extreme
   cases (huge imbalance of sample ratios versus perfect balance of sample
   ratios). Hence, using these two datasets, it is possible to examine the
   influence of subgroup size imbalance on the performance of a feature
   selection algorithm. The demographical characteristics of these two
   datasets are presented in Table [47]1.

Table 1.

   Characteristics of head and neck squamous cell carcinoma and esophageal
   cancer data
   Patients (#) Deaths (#) Median survival time (days) p-value (log-rank
   test)
   Esophageal cancer
    Squamous Cell Carcinoma [ESCC] 81 29 763
    Adenocarcinoma [EAC] 83 38 801 0.721
   Head and Neck Squamous Cell Carcinoma [HNSCC]
    Laryngeal [LSCC] 89 25 1838
    Hypopharyngeal [HSCC] 6 2 – 0.839
   [48]Open in a new tab

   The log-rank tests indicated histological subtype but had no prognostic
   value for esophageal cancer or HNSCC. For esophageal cancer, this is
   consistent with previous results. For HNSCC the discrepancy may be
   attributable to the small sample size of HSCC subtype

Statistical methods

   The Cox-filter method proposed by Tian et al. [[49]13] screens genes
   one by one according to the significance level of the corresponding
   coefficients in a Cox model. Under the two-class cases (the model can
   easily be extended to multiple-class cases), the corresponding Cox
   model may be written as,
   [MATH: <msub><mi>λ</mi><mi>ijg</mi></msub><mfenced close=")"
   open="("><mi
   mathvariant="normal">t</mi></mfenced><mo>=</mo><msub><mi>λ</mi><mrow><m
   n>0</mn><mi mathvariant="normal">g</mi></mrow></msub><mfenced close=")"
   open="("><mi mathvariant="normal">t</mi></mfenced><mo>exp</mo><mo
   stretchy="true">(</mo><msub><mi
   mathvariant="normal">β</mi><mtext>1g</mtext></msub><msub><mi
   mathvariant="normal">I</mi><mi
   mathvariant="normal">i</mi></msub><mfenced close=")" open="("><mrow><mi
   mathvariant="normal">j</mi><mo>=</mo><msub><mi
   mathvariant="normal">c</mi><mn>2</mn></msub></mrow></mfenced><mo>+</mo>
   <msub><mi mathvariant="normal">β</mi><mtext>2g</mtext></msub><msub><mi
   mathvariant="normal">X</mi><mi>ijg</mi></msub><mo>+</mo><mfenced
   close=")" open="("><mrow><msub><mi
   mathvariant="normal">β</mi><mtext>1g</mtext></msub><msub><mi
   mathvariant="normal">I</mi><mi
   mathvariant="normal">i</mi></msub><mfenced close=")" open="("><mrow><mi
   mathvariant="normal">j</mi><mo>=</mo><msub><mi
   mathvariant="normal">c</mi><mn>2</mn></msub></mrow></mfenced><mo>×</mo>
   <msub><mi
   mathvariant="normal">X</mi><mi>ijg</mi></msub></mrow></mfenced> :MATH]

   Tian et al. [[50]13] provided a detailed description of the definitions
   of parameters (i.e., βs and λs) and a graphical illustration of all
   possible scenarios; those details are not presented here. For the
   current study, the features under consideration are lncRNAs,
   subtype-specific prognostic lncRNAs were those for which either β[2g]
   or (β[2g] + β[3g]) is significantly different from zero. More
   specifically, β[2g] ≠ 0 implies that lncRNA g has a prognostic value
   for subgroup c[1] while (β[2g] + β[3g]) ≠ 0 implies lncRNA g has a
   prognostic value for subgroup c[2]. Therefore, β[2g] and β[3g] are the
   parameters of interest and their significance levels determine if
   subtype-specific lncRNAs exist.

Statistical language and packages

   All statistical analyses were carried out in the R language, version
   3.5 ([51]www.r-project.org).

Results

   By applying the Cox-filter model to esophageal cancer data and setting
   the cutoff of adjusted p-values for these linear coefficients at 0.05,
   we identified 200 lncRNAs that have prognostic values for EAC and 96
   for ESCC. Among them, there were 46 overlaps. We searched the GeneCards
   database for their biological relevance. For EAC, after removing 19
   genes that are not be recognized by the GeneCards database
   ([52]www.genecards.org), 58 lncRNAs were indicated to be directly
   related to cancers. For ESCC, 19 lncRNAs are unrecognizable as well.
   Among the remaining 77 lncRNAs, 27 of them were directly related to
   cancers. A Venn-diagram (Fig. [53]1) was made and the gene symbols were
   given, stratified by EAC-specific lncRNAs, ESCC-specific lncRNAs and
   overlapped lncRNAs between two subtypes. Among these unique 74 lncRNAs,
   44 were regarded as being differentially expressed between cancer
   tissues and normal tissues.

Fig. 1.

   [54]Fig. 1
   [55]Open in a new tab

   Venn-diagram illustrating EAC- and ESCC-specific prognostic lncRNAs.
   Gene symbols of microRNAs that were misclassified as lncRNAs are
   crossed out. EAC: esophageal adenocarcinoma; ESCC: esophageal squamous
   cell carcinoma

   For HNSCC, using a cutoff of 0.05 for adjusted p-values the Cox-filter
   method identified 126 LSCC lncRNAs (20 non-identifiable in the
   GeneCards database) and 89 HSCC lncRNAs (30 of which are
   non-identifiable in the GeneCards database). Fifty-six were directly
   related to cancers for LSCC and 16 for HSCC. Among these lncRNAs, 6
   lncRNAs were shared by these two subtypes, and 44 lncRNAs were regarded
   as being differentially expressed between cancer tissues and normal
   tissues. Figure [56]2 presents gene symbols of those lncRNAs. From the
   gene symbols given in Figs. [57]1 and [58]2, we observed several
   microRNAs (e.g., MIR146A and MIR 296) that were mistakenly recognized
   as lncRNAs by the TANRIC database. Since TANRIC has not been updated
   since its initiation, it is natural to expect such errors. In the
   following results, those microRNAs were removed manually.

Fig. 2.

   [59]Fig. 2
   [60]Open in a new tab

   Venn-diagram illustrating LSCC-specific prognostic lncRNAs and
   HSCC-specific prognostic lncRNAs. Gene symbols of microRNAs that were
   misclassified as lncRNAs are crossed out. LSCC: laryngeal squamous cell
   cancer; HSCC: hypopharyngeal squamous cell cancer

Discussion

   In this study, Pvt1 oncogene (PVT1) with a confidence score of 25.4 is
   ranked on the second place for the EAC-specific prognostic lncRNAs.
   Based on the strategy of competitive endogenous RNA (ceRNA) networks
   [[61]15], overexpression of PVT1 correlates with a poor prognosis
   [[62]16] or a fast tumor progression [[63]17] in esophageal cancer
   patients or in ESCC [[64]18] In this study, PVT1 was indicated as an
   EAC-specific lncRNA since it does not belong to the intersection set
   between lncRNAs for these two subtypes.

   CDKN2B antisense RNA 1 (CDKN2B-AS1), also known as ANRIL, was on the
   top of this list (i.e., cancer related EAC-specific prognostic
   lncRNAs), however, only three studies [[65]19–[66]21] have addressed
   its association with esophageal cancer. While the first two studies
   explored the association between CDKN2B-AS1 and esophageal cancer by
   way of genetic mutations, the third did so from the prospective of
   expression level. Other than esophageal cancer, CDKN2B-AS1 had been
   linked to a variety of cancer types such as acute lymphoblastic
   leukemia [[67]22], gastric cancer [[68]20, [69]23] and hepatocellular
   carcinoma (HCC) [[70]24]. For other top-ranked lncRNAs, Yoon et al.
   [[71]25] have demonstrated that LUCAT1 was over-expressed in tumor
   issues compared to paired normal tissues and may promote carcinogenesis
   of ESCC. Another recent study [[72]26] has shown that up-regulation of
   CBR3-AS1 promoted cell proliferation and was positively correlated with
   pathologic stages of ESCC. Lastly, despite the absence of literature
   suggesting that TP53TG plays any role in the development and
   progression of esophageal cancer, this lncRNA can suppress tumor growth
   and is of importance for the correct response of P53 to DNA damage
   [[73]27]. In addition, the association of TP53TG with other cancer
   types such as glioma and lung caner has been reported in previous
   studies.

   Besides the lower prevalence of lncRNA studies on EAC, another possible
   explanation for the links of top-5-ranked lncRNAs with ESCC instead of
   EAC is that racial disparities of ESCC between Asian and Caucasian
   populations existed at the molecular level [[74]28]. Then, it is
   natural to observe a link between PVT1 and ESCC during the literature
   mining considering those studies were all carried out in East Asia. In
   contrast, our work is based on the TCGA RNA-Seq data in which most
   patients are Whites.

   On the other hand, for the top 5 directly-related-to-cancer lncRNAs for
   the ESCC, only two studies provided experimental supports on the
   association of HULC [[75]29] and EGOT [[76]30] with esophageal cancer.
   For the remaining three lncRNAs – LINC01089, TUSC8 and CAHM — the
   LncRNADisease2 database [[77]31] used computational methods and
   predicted they are associated with gastric cancer. Even though the
   identified lncRNAs are related to a variety of cancers, more focus on
   their correlations with ESCC and EAC are in demand. The expression
   levels of those 10 lncRNAs were compared between ESCC and EAC, between
   esophageal cancer tissues and normal tissues using Wilcoxon tests.
   Among them, 6 (4 were specific for EAC, 1 for ESCC and 1 shared by both
   subtypes) had a corresponding p-value < 0.05 and may be considered as
   the differentially expressed lncRNAs between EAC and ESCC (Fig. [78]3).
   All these 6 lncRNAs except CAHM had corresponding Wilcoxon test
   p-values < 0.05 in the comparison of tumor tissues and normal tissues
   as well (Fig. [79]3). Nevertheless, as shown in Fig. [80]4a, these 10
   lncRNAs hold very limited discriminative capacity to separate EAC from
   ESCC. In contrast, they can predict the prognosis status perfectly. In
   Fig. [81]4b, Kaplan-Meier curves were plotted for high-risk and
   low-risk groups (stratified according to the estimated risk scores of
   the multivariate Cox-regression model with these 10 lncRNAs as
   covariates), and then a log-rank test was performed to compare these
   survival curves. From Fig. [82]4b, we observed that within each
   subtype, the difference between the high-risk and low-risk groups was
   significant while within each risk group (between subtype), the
   difference was less or not significant. This result is expectable given
   that the outcomes (i.e., dependent variables) considered in the
   segmentation of subtypes and prognosis prediction are distinct.

Fig. 3.

   [83]Fig. 3
   [84]Open in a new tab

   Box-plots illustrating the expression levels of 6 differentially
   expressed lncRNAs between EAC and ESCC (which have a Wilcoxon test
   p-value < 0.05). Among them, 5 lncRNAs may be regarded as
   differentially expressed lncRNAs between esophageal cancer and normal
   controls (which have a corresponding Wilcoxon test p-value < 0.05 as
   well). EAC: esophageal adenocarcinoma; ESCC: esophageal squamous cell
   carcinoma

Fig. 4.

   [85]Fig. 4
   [86]Open in a new tab

   Discriminative value and prognostic value of the top 10
   directly-related-to-cancer lncRNAs identified by the Cox-filter method
   for the esophageal cancer application. a Heat-map. b Kaplan-Meier
   curves. Based on the risk scores calculated using a multivariate Cox
   regression model, the samples were divided into a high- and low-risk of
   death groups. From these two plots, it was observed that while the
   lncRNAs possessed little information for segmentation of EAC and ESCC,
   they can distinguish the high- and low-risk groups perfectly well. In
   the Kaplan-Meier plot the log-rank p-value was also given. EAC:
   esophageal adenocarcinoma; ESCC: esophageal squamous cell carcinoma;
   LR: low-risk group; HR: high-risk group

   Among the overlapped 11 lncRNAs, in addition to that CAHM was
   experimentally validated to be associated with colorectal cancer by a
   qPCR study [[87]32] and astrocytoma [[88]33] by a microarray study,
   TMEM51-AS1 was with chromophobe renal cell carcinoma [[89]34] and liver
   cancer [[90]35] by qPCR studies, RAD51-AS1 was with only ovarian
   epithelial cancer [[91]36], RNF139-AS1 was with only astrocytoma
   [[92]37] and LINC01089 with breast cancer [[93]38] by qPCR and
   astrocytoma [[94]33] by a microarray study, all except DSE and SPPL2B
   (which is not recorded on LncRNADisease2 database) were predicted to be
   correlated with a variety of cancers such as gastric cancer by the
   LncRNADisease2 database. Further studies are warranted to investigate
   the roles that the identified lncRNAs (including overlapped ones and
   unique-to-subtype ones by integrating the Cox-filter method and
   biological relevancy) may play during the development and progression
   of esophageal cancer.

   For LSCC prognostic lncRNAs, H19, MALAT1, NEAT1, CYTOR and SNHG12 were
   ranked as the first five of this directly-related-to-cancer list. For
   HSCC, TERC, PCAT1, CYTOR, LINC01234 and LINC00958 made to the list. H19
   is a well-known oncogene and acts as a driving force in a variety of
   cancers. For HNSCC specifically, a study by Guan et al. [[95]39]
   demonstrated that overexpression of H19 is associated with tumor
   recurrence and poor prognosis by performing an experiment including 62
   HNSCC patients (46 with LSCC and 14 with HSCC). A very recent study
   [[96]40] also showed that the expression level of H19 was higher in
   patients with metastasized (vs non-metastasized) tongue squamous cell
   carcinoma, and was higher in tumor cells than normal squamous cells.

   MALAT1 was found to be overexpressed in tumor tissues of oral squamous
   cell carcinoma (OSCC) patients by a real-time PCR experiment carried
   out by Zhou et al. [[97]41]. Chang et al. [[98]42] showed that
   inhibition of MALAT1 can prevent OSCC proliferation whereas its
   overexpression can promote OSCC. According to the ceRNA network, MALAT1
   is a microRNA sponge of miR-125b of which STAT3 is predicted as a
   binding target. In addition, two studies [[99]43, [100]44] provided
   experimental supports for the association of MALAT1 and tongue squamous
   cell carcinoma. Using qRT-PCR, Wang et al. [[101]45] examined and
   compared the expression level of NEAT1 in LSCC and adjacent
   non-neoplastic tissues and showed that NEAT1 was significantly
   over-expressed in LSCC. Hence, they concluded that “NEAT1 plays an
   oncogenic role in the tumorigenesis of LSCC.”

   CYTOR, also known as LINC00152, was proved experimentally to be
   associated with progression and prognosis of tongue squamous cell
   carcinoma [[102]46] and HNSCC [[103]47]. Using TCGA RNA-Seq data and
   some bioinformatics tools, Guo et al. [[104]48] identified CYTOR as an
   HNSCC-associated lncRNA and determined that its expression is
   positively correlated with lymph node metastasis and risk of death.
   Subsequently, its function was explored by cell-based experiments which
   suggested that CYTOR inhibited cell apoptosis after the treatment with
   chemotherapeutic drug diamminedichloroplatinum (DDP). Furthermore,
   acting as the microRNA sponge of miR-19-5p that combines with the 3’UTR
   region of WWP1, overexpression of SNHG12 may promote proliferation and
   invasion of LSCC [[105]49]. In our analysis, CYTOR was shared by both
   LSCC and HSCC subtypes.

   Even though no experimental evidence or computational prediction links
   TERC with HNSCC in the LncRNADisease2.0 database [[106]31], literature
   mining in the PubMed database identified several studies to support
   their association. For LSCC specifically, Liu et al. [[107]50] detected
   TERC gene amplification in precancerous and cancerous tissues using
   fluorescent in situ hybridization. In a recent study [[108]51], the
   expression values of PCAT1 in paired HNSCC tissues and adjacent
   non-tumor tissues were measured using qRT-PCR. The results showed that
   PCAT1 was over-expressed in the tumor tissues, which consisted with the
   results given by the online bioinformatics tool, GEPIA
   ([109]http://gepia.cancer-pku.cn). In addition, that study also proved
   that after the knockdown of PCAT1, p38 MAPK and apoptosis
   signal-regulating kinase 1 which induced Caspase 9 and PART mediated
   apoptosis were activated.

   For the last two HSCC-specific lncRNAs, namely, LINC01234 and
   LINC00958, no evidence has been found to link them with HNSCC in either
   the LncRNADisease2 database (experimentally or computationally) or the
   PubMed literature search. Both of these genes overlapped the HSCC and
   LSCC subtypes. Likewise, for the final 3 overlapped lncRNAs, no support
   for a link with HNSCC can be found. Further studies are warranted.

   Among these 9 unique directly-related-to-cancer lncRNAs, only CYTOR and
   SNHG12 have Wilcoxon test p-values < 0.05 (Fig. [110]5) and may be
   loosely regarded as differentially expressed genes between LSCC and
   HSCC subtypes, and between cancer tissues and normal tissues. The small
   sample size of HSCC in this analysis may explain the results to some
   degree. Similar to the results of esophageal cancer application, while
   these 9 lncRNAs cannot distinguish LSCC and HSCC, they do have
   prognostic value for predicting the risk of death for HNSCC patients
   (here, LSCC and HSCC were examined together given there were only 6
   HSCC patients in this study). Corresponding heat-map and Kaplan-Meier
   curves are presented in Fig. [111]6. Lastly, the regulated mRNAs by the
   identified lncRNAs were retrieved from the lncRNADisease 2.0 database
   [[112]31] and the pathway enrichment analysis was carried out using the
   String database [[113]52]. The enriched GO terms and KEGG pathways for
   these four subtypes are presented in Table [114]2, from which we
   observe that no overlaps among these four subtypes occur.

Fig. 5.

   [115]Fig. 5
   [116]Open in a new tab

   Box-plots illustrating the expression levels of 2 differentially
   expressed lncRNAs between LSCC and HSCC (which have a Wilcoxon test
   p-value < 0.05). Because the sample size of HSCC is very small, only
   two lncRNAs barely made the significance level of 0.05, which were
   differentially expressed lncRNAs between cancer tissues and normal
   tissues as well. LSCC: laryngeal squamous cell cancer; HSCC:
   hypopharyngeal squamous cell cancer

Fig. 6.

   [117]Fig. 6
   [118]Open in a new tab

   Discriminative value and prognostic value of 9 top
   directly-related-to-cancer lncRNAs identified by the Cox-filter method
   for the head and neck cancer. a Heat-map of these lncRNAs. b
   Kaplan-Meier curves of these lncRNAs. While these lncRNAs possessed
   little information for segmentation of HSCC and LSCC, they can
   distinguish the high- and low-risk of death groups perfectly well. In
   the Kaplan-Meier plot the log-rank p-value is also given. Since the
   number of HSCC patients included in this study is very small, the
   log-rank test was based on two groups instead of four groups. LSCC:
   laryngeal squamous cell cancer; HSCC: hypopharyngeal squamous cell
   cancer; LR: low-risk group; HR: high-risk group

Table 2.

   Enriched pathway analysis for the mRNAs regulated by selected lncRNAs
   GO-BP GO-CC GO-MF KEGG
   LSCC – – – –
   HSCC peptide cross-linking, skin development, epidermis development,
   keratinization, epithelium development, cornification cornified
   envelope
   ESCC – nuclear envelope, organelle envelope, organelle,
   membrane-bounded organelle, intracellular membrane-bounded organelle,
   organelle part, intracellular organelle part, organelle membrane,
   intracellular organelle, nuclear part – –
   EAC Nephron tubule development – – Nitrogen metabolism
   [119]Open in a new tab

   no enriched pathways. LSCC laryngeal squamous cell carcinoma, HSCC
   hypopharyngeal squamous cell carcinoma, ESCC esophageal squamous cell
   carcinoma, EAC esophageal adenocarcinoma, GO-BP gene ontology
   biological process category, GO-CC gene ontology cellular component
   category, GO-MF gene ontology molecular function category, KEGG Kyoto
   encyclopedia of genes and genomes pathways

Conclusions

   The Cox-filter method is among the first efforts to develop feature
   selection algorithms capable of identifying prognostic genes
   specifically for different subtypes. When applied to gene expression
   profiles, it achieved satisfactory performance. In this study, we show
   that this method is applicable to lncRNA expression profiles, as
   illustrated by the two real-world applications in which the Cox-filter
   method identified many lncRNAs with meaningful implication with cancer.
   The ratio of the two distinct subtypes in these applications represent
   extreme cases: one with good balance case and one with bad balance. The
   Cox-filter method can easily deal with the first case. In the second
   case, it can still estimate the significance level of lncRNAs in minor
   subtypes by borrowing some information from the dominant subtype.
   Therefore, the Cox-filter method is a handy tool to construct
   subtype-specific prognostic lncRNA signatures, indeed.

   The big drawback of the Cox-filter method is inclusion of many false
   positives in the final models. To address this drawback, several
   extensions that incorporate biological information and prioritize genes
   with high connectivity levels have been proposed [[120]53, [121]54].
   When applying to lncRNA profiles, the issue is still apparent and thus
   needs to be addressed as well. However, those extensions cannot be
   adopted to the lncRNA expression profiles directly because the
   biological pathway information was retrieved from a knowledgebase such
   as String [[122]52] or HPRD [[123]55], which focus on mRNAs (protein
   coding genes). A statistical model (e.g., the WGCNA method [[124]56]
   with the capacity of constructing co-expression networks/modules is
   needed before implementing such Cox-filter extensions. Nevertheless, by
   combining biological relevancy information from the GeneCards database,
   we further refined the lncRNA lists identified by the Cox-filter
   method, and the resulting lncRNA signatures have been demonstrated to
   possess perfect prognostic value.

Acknowledgements