Abstract Background Long non-coding RNAs (lncRNAs) are importantly involved in the initiation and progression of non-small cell lung cancer (NSCLC). However, the classification and mechanisms of lncRNAs remain largely elusive. Aim Hence, we addressed this through bioinformatics analysis. Methods and results We utilized microarray technology to analyze lncRNAs and mRNAs in twenty paired NSCLC tumor tissues and adjacent normal tissues. Gene set enrichment analysis, Kyoto Encyclopedia of Genes and Genomes, and Gene Ontology were conducted to discern the biological functions of identified differentially expressed transcripts. Additionally, networks of lncRNA-mRNA co-expression, including cis-regulation, lncRNA-transcription factor (TF)-mRNA, trans-regulation, and lncRNA-miRNA-mRNA interactions were explored. Furthermore, the study examined differentially expressed transcripts and their prognostic values in a large RNA-seq dataset of 1016 NSCLC tumors and normal tissues extracted from the Cancer Genome Atlas (TCGA). The analysis revealed 391 lncRNAs and 344 mRNAs with differential expression in NSCLC tumor tissues compared to adjacent normal tissues. Subsequently, 43,557 co-expressed lncRNA-mRNA pairs were identified, including 27 lncRNA-mRNA pairs in cis, 9 lncRNA-TF-mRNA networks, 34 lncRNA-mRNA pairs in trans, and 8701 lncRNA-miRNA-mRNA competing endogenous RNA (ceRNA) networks. Notably, these lncRNAs were found to be involved in immune-related pathways. Six significant transcripts, including NTF4, PTPRD-AS, ITGA11, HID1-AS1, RASGRF2-AS1, and TBX2-AS1, were identified within the ceRNA network and trans-regulation. Conclusion This study brings important insights into the regulatory roles of lncRNAs in NSCLC, providing a fresh perspective on lncRNA research in tumor biology. Keywords: Non-small cell lung cancer, Long non-coding RNAs, cis-Regulation, trans-Regulation, Tumorigenesis Highlights * • Genome-wide profiling of long non-coding RNAs (lncRNAs) in NSCLC datasets. * •Cis- and trans-lncRNAs were identified in NSCLC. * •Both cis- and trans-lncRNAs regulate key immune-related pathways in NSCLC. 1. Introduction Lung cancer is one of the most prevalent cancer types and is a leading cause of cancer-related mortality [[35]1]. In 2022, lung cancer leads to approximately 870,982 new cases and 505,618 deaths in China [[36]2]. Over 80% of lung cancer have been diagnosed as non-small cell lung cancer (NSCLC), representing a significant and serious subtype of the disease [[37]3]. Multiple treatment strategies have been developed for cancer management, aimed at enhancing the five-year survival (OS) rates of patients with non-small cell lung cancer (NSCLC) over the past decades [[38][4], [39][5], [40][6]]. Despite advancements in treatment strategies, the five-year overall survival (OS) rate for non-small cell lung cancer (NSCLC) is still low, ranging from 10% to 20% according to the Global Surveillance of Trends in Cancer Survival 2000–2014 (CONCORD-3) [[41]7]. These findings underscore the need to explore additional diagnostic and therapeutic biomarkers for NSCLC, thereby significantly enhancing the clinical outcomes for patients with this disease. In recent years, numerous studies highlighted the pivotal roles of lncRNAs in cancer, demonstrating their regulation of tumor initiation and progression, and their correlation with the prognosis and survival outcomes of patients [[42][8], [43][9], [44][10]]. Especially, lncRNA pro-transition associated RNA (PTAR) can be an oncogene in NSCLC by increasing cell proliferation, migration, and invasion [[45]11]. While lncRNAs have been implicated in various oncogenic processes, fetal-lethal non-coding developmental regulatory RNA (FENDRR) acts as a tumor suppressor in NSCLC, inhibiting tumor cell growth, migration, and invasion [[46]12]. Although numerous lncRNAs exert vital roles in NSCLC, the focus regarding attention on the mechanism of action is their RNA products in the past years. Recent research indicates that not only the products of RNA species exert important roles, but also the lncRNA locus separately exerts regulatory functions [[47]13,[48]14]. About 10%–20% of human transcripts are identified as protein-coding RNAs, and 80%–90% of transcripts are non-protein-coding RNAs (non-coding RNAs) [[49]15]. LncRNAs, a subclass of non-coding RNAs characterized by transcripts longer than 200 nucleotides, do not encode proteins but possess extensive regulatory capabilities, influencing a broad spectrum of cellular processes [[50]16,[51]17]. LncRNAs are heterogeneous molecules in various cancers due to their diverse locus and function [[52]18]. LncRNAs are broadly classified into different categories according to relative genomic position to the protein-coding RNAs, including sense, antisense, intronic, intergenic, and bidirectional [[53]16,[54]19,[55]20]. Indeed, lncRNAs play multifaceted roles in both cis and trans regulation, organization of nuclear, and gene modulation, sometimes even encoding small proteins themselves [[56][21], [57][22], [58][23]]. It has been challenging to understand the biological function of several lncRNAs in regulating biological and pathological processes mediated by the lncRNAs themselves. However, to date, our understanding of the expression and regulatory mechanisms of lncRNAs in NSCLC remains limited. In recent years, advancements in next-generation sequencing (NGS) have enabled the confirmation of numerous biomarkers and underlying mechanisms in lung cancer through bioinformatics analysis [[59][24], [60][25], [61][26], [62][27]]. This study aims to identify the landscape of lncRNA expression and explore the function of lncRNAs in NSCLC using bioinformatics analysis. We identified lncRNAs and protein-coding transcripts that showed differential expressions based on microarray analysis. The cis- and trans-regulatory lncRNAs and their regulated biological pathways were then explored in NSCLC. Importantly, TBX2-AS1, NTF4, PTPRD-AS, ITGA11, RASGRF2-AS1, and HID1-AS1 were identified as potential therapeutic biomarkers for NSCLC. 2. Material and methods 2.1. Specimen Three tumor samples and normal tissues were collected from patients diagnosed with NSCLC. The cohort consisted of two females and one male, with a median age of 64 years (50–70). All patients have signed informed consent for their inclusion in this study. Importantly, samples were procured before any pre-operative radiation or chemotherapy treatments. After the pathological examination by three pathologists, the samples were saved in liquid nitrogen. Our research has been supported by the Ethics Committee of the First Hospital of Kunming Medical University. 2.2. RNA extraction, cDNA library preparation, and microarray analysis Total RNA was isolated from fresh NSCLC tumor tissues and normal tissues using TRzol reagent (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer's protocol. The concentration and quality of RNA were assessed using the NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), and the integrity of RNA was evaluated using an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) according to the manufacturer's instructions. Subsequently, purified single-strand RNA was reverse transcribed to generate double-strand cDNA and labeled with Cyanine 3-CTP (Cy3). The Cy3-labeled cDNA was then hybridized onto the Agilent Human ceRNA Microarray 2019 (4∗180k, Design ID:086188). Finally, the hybridized slides were scanned using the G2505C Agilent Scanner (Agilent Technologies). 2.3. Quality control and data standardization After screening the original images of slides, the raw data was extracted using the Feature Extraction software (version 10.7.1.1, Agilent Technologies). Subsequently, the raw data were normalized and analyzed using the R software package. The coefficient of variable (CV) value of each sample was used to the stability of the signal detected by the chip repeat probe. The correlation between samples was used to measure the correlation between these samples. Principal component analysis (PCA) was used to reveal the distribution of the main information of each sample. 2.4. Differential analysis for lncRNAs and mRNAs LncRNAs and mRNAs with different expressions were identified using the student's t-test [[63]28], the threshold adjusted as log2(Fold change, FC)≥2 and P ≤ 0.05. The volcano plots and heatmaps were drawn to show the significantly differential expressed lncRNAs and mRNAs. 2.5. Biofunction analysis of DE-mRNAs Gene Ontology (GO) analysis [[64]29,[65]30], encompassing biological process (BP), cellular component (CC), and molecular function (MF), was conducted using bioinformatics tools available at OECloud ([66]https://cloud.oebiotech.cn). The significance of each GO term was assessed utilizing the Hypergeometric distribution algorithm, as previously described [[67]31]. KEGG pathway enrichment analysis was carried out using the Hypergeometric distribution algorithm based on the KEGG database [[68]32]. 2.6. Correlation analysis for co-expression lncRNA-mRNA pairs The correlation between lncRNAs and mRNAs with differential expressions (DE-mRNAs) was evaluated using the Pearson correlation analysis. LncRNA-mRNA pairs were revealed based on a correlation coefficient greater than 0.8 and P < 0.05. The lncRNA-mRNA pairs were visualized using the Circos tool [[69]32]. 2.7. Cis- and trans-regulation analysis Furthermore, we explored the regulatory manners of lncRNAs according to the distances between the protein-coding transcripts and lncRNAs. First, we investigated the potential cis-regulation of lncRNAs on their neighboring protein-coding transcripts, focusing on those located within a 100 kb distance from the lncRNA locus. The neighboring protein-coding transcripts of lncRNAs were screened using the FEELnc tool [[70]33], and the cis-regulatory lncRNA-mRNA pairs were identified by overlapping with the above lncRNA-mRNA pairs. Second, we explored the trans-regulatory lncRNAs on protein-coding transcripts using RIsearch-2.0 according to the filter criteria, the direct binding sites of lncRNAs sequences on protein-coding transcript sequences more than 10 bases, and the binding free emery less than 100 [[71]34]. The trans-regulatory lncRNA-mRNA pairs were screened by overlapping with the above lncRNA-mRNA pairs and visualized using a network tool [[72]35]. Third, lncRNAs recruited TFs to the specific DNA sequence region (such as promoters) to initiate transcriptional activity, or multiple TFs bind to a lncRNA to regulate molecular mechanisms. We analyzed transcription factor (TF) binding motifs within the regulatory regions of lncRNAs, specifically within a range of ± 1 kb from the lncRNA start sites. Potential TFs binding to lncRNAs were predicted using JASPAR [[73]36]. We constructed the lncRNA-TF-mRNA ternary regulatory network based on data from the Gene Transcription Regulation Database (GTRD) [[74]37] and the above lncRNA-mRNA pairs, and the network was visualized network tool [[75]35]. 2.8. Pathway and network enrichment analysis The protein-coding transcripts of the cis-, trans-regulatory lncRNA-mRNA pairs, and ceRNA network were annotated and enriched via the clusterProfiler package. 2.9. TCGA data analysis The RNA-seq dataset was obtained from 1016 NSCLC tumors and normal tissues from the Cancer Genome Atlas (TCGA) database, accessible at [76]https://www.cancer.gov/about-nci/organization/ccg/research/structur al-genomics/tcga. This dataset includes 515 samples of TCGA-LUAD (lung adenocarcinoma) and 501 samples of TCGA-LUSC (lung squamous cell carcinoma). The relationship between significant transcripts and clinical phenotypes, including age, gender, stages, TNM, lymph node count information, and the relationship between expression of the transcripts and survival were investigated. 3. Results 3.1. Identification of the DE-lncRNAs and DE-mRNAs in NSCLC In this study, a Microarray analysis was carried out to investigate the transcriptome alteration in NSLCLC. After quality control and data standardization ([77]Fig. S1), 391 lncRNAs were identified, which were preferably distributed in Chr1, Chr2, Chr6, Chr10, Chr11, and Chr17 ([78]Fig. S2A). More than 70 % of lncRNAs are longer than 500 bp ([79]Fig. S2B). We found that the highest percentage of lncRNAs located in 1–3 exons ([80]Fig. S2C). The types of lncRNAs were identified as exon type (27.55 %), intron type (23.14 %), and intergenic region type (49.31 %) ([81]Figs. S2D–E). Furthermore, 304 DE-lncRNAs (145 upregulated and 159 downregulated) were identified [log2(FC) > 2 and P < 0.05] ([82]Fig. 1A–C, [83]Table S1). Similarly, a total of 344 DE-mRNAs (232 upregulated and 112 downregulated) were identified [log2(FC) > 2 and P < 0.05] ([84]Fig. 1D–F, [85]Table S2) in NSCLC tumor tissues compared to that in normal tissues. Fig. 1. [86]Fig. 1 [87]Open in a new tab Identification of the DE-lncRNAs and DE-mRNAs in NSCLC. (A). The volcano plots illustrated the different distributions of the DE-lncRNAs. Red points represented upregulated lncRNAs. Point represented downregulated lncRNAs. (B). The scatter plots exhibited the expression levels of DE-lncRNAs between NSCLC tumor and adjacent normal tissues. Red points represented upregulated lncRNAs. Point represented downregulated lncRNAs. (C). Heatmap of the unsupervised hierarchical clustering showing the distinction of DE-lncRNAs between NSCLC tumor and adjacent normal tissues. (D). The volcano plots exhibited the distribution of DE-mRNAs. Red points represented upregulated mRNAs. Point represented downregulated mRNAs. (E). The scatter plots exhibited the expression levels of DE-mRNAs between NSCLC tumor and adjacent normal tissues. Red points represented upregulated mRNAs. Point represented downregulated mRNAs. (F). Heatmap of the unsupervised hierarchical clustering showing the distinction of DE-mRNAs between NSCLC tumor and adjacent normal tissues. (For interpretation of the references to color in this figure legend, the reader is referred to