Abstract Hyaluronan (HA) accumulation in clear cell renal cell carcinoma (ccRCC) is associated with poor prognosis; however, its biology and role in tumorigenesis are unknown. RNA sequencing of 48 HA-positive and 48 HA-negative formalin-fixed paraffin-embedded (FFPE) samples was performed to identify differentially expressed genes (DEG). The DEGs were subjected to pathway and gene enrichment analyses. The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) data and DEGs were used for the cluster analysis. In total, 129 DEGs were identified. HA-positive tumors exhibited enhanced expression of genes related to extracellular matrix (ECM) organization and ECM receptor interaction pathways. Gene set enrichment analysis showed that epithelial–mesenchymal transition-associated genes were highly enriched in the HA-positive phenotype. A protein–protein interaction network was constructed, and 17 hub genes were discovered. Heatmap analysis of TCGA-KIRC data identified two prognostic clusters corresponding to HA-positive and HA-negative phenotypes. These clusters were used to verify the expression levels and conduct survival analysis of the hub genes, 11 of which were linked to poor prognosis. These findings enhance our understanding of hyaluronan in ccRCC. Subject terms: Medical research, Urology, Renal cancer, Prognostic markers Introduction Global Cancer Statistics showed that approximately 430,000 new cases of kidney cancer (KC) were diagnosed in 2020, accounting for 2.2% of all human malignancies. It contributed to 180,000 deaths worldwide, making it the most lethal urological malignancy worldwide^[34]1, and the incidence of KC is increasing^[35]2. Renal cell carcinoma (RCC) is the most common type of kidney cancer, accounting for > 90% of primary kidney tumors^[36]3. The three most common histological subtypes of RCC are clear cell RCC (ccRCC), papillary RCC, and chromophobe RCC. Recently, more entities with molecularly defined pathogenesis have been identified^[37]4. RCC is often diagnosed incidentally, and one-third of the patients present with metastatic disease. Twenty percent of patients who undergo surgery for a primary tumor later develop metastases. Despite recent advances in systemic therapies, the prognosis of metastatic disease remains dismal^[38]5,[39]6. Therefore, it is imperative to identify new biomarkers for disease detection, prognostication, and treatment. Hyaluronan (HA) is a ubiquitous large glycosaminoglycan (GAG) found in the extracellular matrix (ECM), where it forms a pericellular coat surrounding cells and functions as a cushion^[40]7. It is composed of a variable number of repeating disaccharide units of N-acetyl-glucosamine (GlcNAc) and glucuronic acid (GlcUA), with an average molecular mass ranging from 1000 to 8000 kD^[41]8. The turnover of hyaluronan is rapid, and one-third of the hyaluronan mass undergoes turnover daily^[42]9. Hyaluronan is synthesized by the hyaluronan synthase enzymes HAS1, HAS2, and HAS3^[43]10. Degradation is mediated mainly by the family of hyaluronidases (HYAL 1-4, PH20, CEMIP, and TMEM2)^[44]11–[45]13. In addition to RCC, increased HA content has been associated with worse outcomes and more aggressive tumor growth in several human malignancies, including breast cancer, colon carcinoma, gastric carcinoma, thyroid cancer, pancreatic cancer, lung adenocarcinoma, lymphoma, hepatocellular carcinomas, and gliomas^[46]14–[47]23. It is postulated that HA acts as a barrier that shields tumor cells from insults, such as therapeutic agents and the immune system, and could serve as a potential target for anticancer drugs^[48]24. The use of PEGylated human hyaluronidase (PEGPH20) has shown promising efficacy in sensitizing pancreatic cancer cells to radiotherapy and in improving the efficacy of anti-PD-1 therapy^[49]25,[50]26. In normal human kidneys, most HA is produced in the renal medulla, while the renal cortex, from which renal cell carcinomas usually arise, is hyaluronan poor^[51]27. Increased cortical HA content is associated with various non-neoplastic kidney diseases/conditions such as acute kidney injury, chronic kidney disease, allograft, and diabetic nephropathy^[52]28–[53]31. To date, reports on HA in RCCs are limited. In our previous study, we showed that increased cellular hyaluronan conveys poor prognosis in patients with ccRCC^[54]14. In addition, a higher hyaluronan content was associated with a higher tumor grade. The individual molecules associated with HA have been studied on gene expression levels. Chi et al.^[55]32 showed that expression of HAS1 was increased in RCC tissue compared with adjacent normal tissue while HYAL1 expression was lower in ccRCC than in normal renal tissue^[56]32. Cai et al. found that HAS1-3 mRNA expression was higher in human ccRCC tissues than in adjacent normal tissues^[57]33. However, only HAS3 protein expression was higher. In conclusion, the results of expression studies are inconsistent, and the expression levels of different HA family proteins might not necessarily reflect overall HA levels. Therefore, to better understand the biological background of hyaluronan accumulation in ccRCC, we performed RNA sequencing of previously found hyaluronan-positive and-negative tumor cohorts. The aim of this study was to investigate the differences in RNA expression profiles and to find new potential hyaluronan-associated molecules. Materials and methods Patients and sample selection A research flowchart of this study is shown in Fig. [58]1. Formalin-fixed and paraffin-embedded (FFPE) tissue samples from patients who underwent surgery for ccRCC in the period 2000–2013 at Kuopio University Hospital were collected from the Biobank of Eastern Finland. The study (Hyaluronan in Renal Cell Carcinoma, HARCC) was approved by the Ethics Committee of the Northern Savo Hospital District (379/2016, November 1st, 2016). The diagnostic samples were processed and diagnosed according to the routine protocol in the Department of Clinical Pathology. Hyaluronan staining and evaluation were performed as described by Jokelainen et al.^[59]14. We selected 48 hyaluronan-positive and 48 hyaluronan-negative tumor samples for RNA sequencing on the basis of tumor grade, sarcomatoid change, sex, survival, and metastasis status (Table [60]1). Three 1-mm-wide tissue cores were punched from each representative tumor block. Figure 1. [61]Figure 1 [62]Open in a new tab Research flowchart. The dashed line represents in silico analysis using the Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) data. HA Hyaluronan, ccRCC clear cell renal cell carcinoma, DEG differentially expressed gene, GO Gene Ontology, KEGG Kyoto Encyclopedia of Genes and Genomes, GSEA Gene Set Enrichment Analysis, STRING Search Tool for the Retrieval of Interacting Genes/Proteins, TRRUST Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining, NOJAH NOt Just Another Heatmap. Table 1. Characteristics of 96 renal cell carcinoma tissue samples. Hyaluronan-positive N (%) Hyaluronan-negative N (%) Samples 48 48 Sex Male 31 (64.6) 24 (50) Female 17 (35.4) 24 (50) Age (mean, range) 61.8 (41–82) 67.0 (36–86) WHO/ISUP grade 1 2 (4.2) 8 (16.7) 2 22 (45.8) 22 (45.8) 3 10 (20.8) 9 (18.8) 4 14 (29.2) 9 (18.8) Sarcomatoid change No 41 (85.4) 43 (89.6) Yes 7 (14.6) 5 (10.4) Disease-related death No 28 (58.6) 35 (72.9) Yes 20 (41.2) 13 (27.1) Clinical stage I 16 (33.3) 24 (50.0) II 9 (18.8) 8 (16.7) III 12 (25.0) 7 (14.6) IV 11 (22.9) 9 (18.8) Metastasis at diagnosis M0 37 (77.1) 39 (81.3) M1 11 (22.9) 9 (18.7) [63]Open in a new tab Next-generation sequencing RNA was isolated by use of an RNeasy FFPE Kit (Qiagen), and deparaffinization was performed using 640–750 µL deparaffinization solution from the kit with 60–90 min incubation at 56 °C. RNA was eluted with 2′ 14 µL of RNase-free water. rRNA was removed by use of a QIASeq FastSelect (rRNA HMR, Qiagen), using 1 µg RNA or as much as could be handled by the kit (max 10 µL). RNA-sequencing libraries were constructed with a TruSeq Stranded mRNA Library Prep kit (Illumina) and 0.3 µL adapters with 30 PCR cycles. Barcodes were optimized by use of BARCOSEL software ([64]http://ekhidna2.biocenter.helsinki.fi/barcosel/)^[65]34. Sequencing was performed using an Illumina Novaseq 6000 instrument. Adapter sequences, low-quality bases (q = 25), and short sequences (m = 30) were first trimmed using cutadapt (v.4.1)^[66]35 ([67]https://cutadapt.readthedocs.io/en/stable/). The fourteenth (p14) patch release for the GRCh38 reference assembly and annotation was downloaded from [68]https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001 405.40_GRCh38.p14/. The STAR aligner (v2.7.9a_2021-06-25) ([69]https://code.google.com/archive/p/rna-star/) with default parameters was used to map reads against the reference sequence^[70]36. Sorting and indexing of the alignment files were performed using Samtools (v.1.10) ([71]https://www.htslib.org/)^[72]37. Quality control and differentially expressed gene analysis A significant portion of the mapped reads was concentrated on only a few genes in each sample, which necessitated an additional filtering step using Samtools (v. 1.16.1) to mark duplicated reads for removal using the no-multi-dup option. Gene counts were computed by use of R (v.4.1.1) ([73]https://www.r-project.org/) and Rsubread (v.2.6.4) ([74]https://bioconductor.org/packages/release/bioc/html/Rsubread.html) with multi-mapping and multi-overlapping reads removed^[75]38,[76]39. In addition to the read-level quality control (QC) detailed in section "[77]next-generation sequencing", further QC steps were performed according to the recommendations specified by Liu et al.^[78]40 for FFPE RNA-seq count data^[79]40. As FFPE samples were used, the total number of mapped reads in all samples was generally low, as expected. Therefore, the quality metric used for filtering the samples was the median sample-sample Spearman correlation after count normalization. Samples with a median correlation below 0.75 were removed, leaving 36 hyaluronan-positive and 41 hyaluronan-negative samples. Differential expression analysis was performed using edgeR (v.3.34.1) ([80]https://bioconductor.org/packages/release/bioc/html/edgeR.html)^[8 1]41. Only protein-coding genes and genes with mean counts above 1 across all samples were included, leaving 10,633 genes for consideration. Count normalization was performed using the trimmed mean M-value method in edgeR. The results we plotted using the ggplot2 package (v.3.4.2) ([82]https://ggplot2.tidyverse.org)^[83]42. Gene Ontology and pathway enrichment analysis Gene Ontology (GO) is a bioinformatics database established to provide simple annotation of gene products^[84]43,[85]44. GO terms include biological processes (BP), cellular components (CC), and molecular functions (MF) of gene products. The Kyoto Encyclopedia of Genes and Genomes (KEGG) and REACTOME are free online databases containing information on biological pathways, molecular interactions, and reactions^[86]45–[87]48. Database analysis was performed to investigate the molecular function of the identified DEGs using ToppGene ([88]https://toppgene.cchmc.org/enrichment.jsp)^[89]49. ToppGene is a bioinformatics portal for gene-list enrichment analysis. The cutoff value for the false discovery rate (FDR) was set at p < 0.05. The Benjamini–Hochberg procedure was used to account for multiple testing. Results were plotted by SRplot ([90]https://www.bioinformatics.com.cn/srplot), an online platform for data analysis and visualization^[91]50. Gene set enrichment analysis Gene Set Enrichment Analysis (GSEA) was performed to examine the gene expression profiles of hyaluronan-positive and hyaluronan-negative samples which had passed quality control steps^[92]51,[93]52. The hyaluronan-positive phenotype was compared with the hyaluronan-negative phenotype. The trimmed mean of M values (TMM)-normalized count data and phenotype data were uploaded to GSEA software (build v.4.3.2.) ([94]https://www.gsea-msigdb.org/gsea/index.jsp) and Human MSigDB h.all.v2023.1.Hs.symbols hallmark gene set was chosen. The number of permutations was set to 1000. All other run parameters were maintained at their default values. Protein-to-protein network construction and subnetworks The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) is a biological database of known and predicted protein–protein interactions (PPI) ([95]https://string-db.org/)^[96]53. This includes the interactions derived from experiments and computationally predicted interactions. STRING (v.11.5) was used to predict interactions between the DEGs. Interactions with a combined score > 0.4 were considered statistically significant. The PPI network provided by STRING was imported into and visualized in Cytoscape (v.3.9.1.) ([97]http://www.cytoscape.org)^[98]54. The Cytoscape plug-in MCODE (v.2.0.2) was used to identify highly interconnected regions in the network^[99]55. Settings used were as follows: Cluster Finding Haircut, Node Score Cutoff = 0.2, K-Core = 2, Max. Depth = 100. Hub genes discovery and analysis Hubgene analysis was performed using the CytoHubba (v.0.1) plug-in in Cytoscape^[100]56. Seven common algorithms (MCC, MNC, Degree, Closeness, Radiality, Stress, and EPC) were used to identify hub genes, and the UpSet intersection plot (by SRPlot) was used to identify common genes. The hub genes were then exported to GeneMANIA ([101]http://www.genemania.org/), a predictor software used to identify other genes related to a set of input genes and internal associations in the gene sets^[102]57. Default methods were used to calculate connection weights. Lastly, we used Transcriptional Regulatory Relationships Unraveled by Sentence-based Text Mining (TRRUST) (v.2) to predict transcription factors (TFs) of hub genes ([103]https://www.grnpedia.org/trrust/)^[104]58. TFs with adjusted P-value < 0.05 were considered significant. Subsequently, we used the TCGA-KIRC dataset to examine the expression of these TFs^[105]59. In-silico TCGA heatmap and cluster analysis To test the gene signature identified by DEG analysis with another dataset, the TCGA-KIRC RNA expression dataset was downloaded via the Bioconductor package TCGABiolinks (v.2.26.0) ([106]https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks. html) using R (v.4.2.2) and RStudio (2022.12.0 + 353)^[107]38,[108]60,[109]61. Phenotype data concerning tumor grade and patient survival were downloaded from the same source. Data from 537 ccRCC samples were collected after removing duplicates. The count values were TMM-normalized and log2(count + 1)-transformed. Genome-Wide Heatmap (GWH) analysis was performed using the NOJAH (NOt Just Another Heatmap) (v.1) interactive tool ([110]https://github.com/bbisr-shinyapps/NOJAH/)^[111]62. A heatmap was plotted, using the set of 129 DEGs. The values selected for hierarchical clustering analysis parameters were the Z-score method for data normalization type, Euclidean for the distance method, and ward.D2 for the clustering method. Tumor phenotype data were combined into a heatmap, using NOJAH. Furthermore, the methylation status of TCGA samples, as described by Ricketts et al.^[112]63, was combined with the heatmap^[113]63. The observed distribution of tumor grade, patient survival, and methylation cluster within each heatmap cluster was tested using the chi-square test. TCGA GSEA analysis GSEA analysis was performed on TCGA-KIRC data using the newly discovered HA-positive and HA-negative phenotypes. TMM-normalized expression and phenotype data were uploaded to GSEA software (build v.4.3.2.) and Human MSigDB h.all.v2023.1.Hs.symbols hallmark gene set was chosen. The number of permutations was set to 1000. All other run parameters were maintained at their default values. Hub gene expression analysis in silico The mRNA expression levels of the identified hub genes were investigated, using the TCGA-KIRC dataset. RNA-seq data from 72 healthy renal tissues were downloaded using the TCGAbiolinks package (v.2.26.0) in R (v.4.2.2). TCGA samples identified belonging to “HA-negative” and “HA-positive” gene expression clusters were compared against each other and to normal renal tissue. The expression levels of the hub genes were TPM-normalized and log2-transformed for each cohort. The Mann–Whitney U-test was used to compare each group with the other two groups, and the expression levels were box-plotted using R package ggpubr (v.0.6.0) ([114]https://rpkgs.datanovia.com/ggpubr/)^[115]64. Prognostic value of hub genes We used the Kaplan–Meier Plotter online database ([116]http://kmplot.com/analysis/) containing TCGA-KIRC data from 530 patients to analyze the prognosis of hub genes^[117]65. Kaplan–Meier estimators were plotted, and hazard ratios were calculated for overall survival (OS) and disease-free survival (DFS). The samples were stratified into low- and high-expression groups based on the median cut-off (50%). Statistical significance was set at p < 0.05 and HR > 1.0 were considered significant. Ethical considerations The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Northern Savo Hospital District (379/2016, November 1st, 2016). Results DEG identification DEG analysis identified 129 differentially expressed genes between the hyaluronan-positive and hyaluronan-negative groups (Fig. [118]2). The full DEG list can be found in Supplementary Table S1. Of these genes, 97 were upregulated and 32 were downregulated in the hyaluronan-positive group compared with those in the hyaluronan-negative group. Only protein-coding genes were included, and the FDR was set at < 0.05. There were 53 genes with |log2 fold-change|≥ 1. Figure 2. [119]Figure 2 [120]Open in a new tab Volcano plot of differentially expressed gene (DEG) analysis. The red horizontal line represents a false discovery rate (FDR) level of 0.05. Genes with a log2 fold-change > 0 were overexpressed, and those with values < 0 were under-expressed. Functional analysis of DEGs The ToppGene portal was used to perform GO, KEGG, and REACTOME enrichment analyses on the biological functions and pathways of the 129 DEGs identified. Gene ontology (GO) analysis revealed that these genes were enriched in 82 biological processes. Of these “tube development”, “cell adhesion”, and “extracellular matrix organization” were the most statistically significant. Of the 31 statistically significant cellular compartments (CC), “the external encapsulating structure”, “extracellular matrix”, and “collagen-containing extracellular matrix” were the most enriched. The most enriched molecular functions (N = 21) were “signaling receptor binding”, “peptidase inhibitor activity”, and “endopeptidase inhibitor activity” (Fig. [121]3). In terms of KEGG enrichment analysis, three pathways, “ECM receptor interaction” (FDR = 3.07E − 4), “glycosaminoglycan biosynthesis chondroitin sulfate” (FDR = 1.72E − 3) and “complement and coagulation cascades” (FDR = 4.0E − 2), were statistically significantly enriched. In the REACTOME pathway analysis, the top three enriched pathways were “extracellular matrix organization” (FDR = 1.36E − 8), “collagen formation” (p = 7.68E − 6), and “regulation of insulin-like growth factor transport and uptake by insulin-like growth factor-binding proteins” (7.77E − 6) (Fig. [122]3). Figure 3. [123]Figure 3 [124]Open in a new tab (A) ToppGene results of Gene Ontology (GO) and pathway analyses. The bars show the top three most significantly enriched GO terms from each subontology. (B) Bubble plot showing the 15 most significantly enriched pathways in the KEGG and REACTOME pathways. BP biological process, CC cellular compartment, MF molecular function, KEGG Kyoto Encyclopedia of Genes and Genomes. Gene set enrichment analysis of DEGs GSEA analysis showed that 8 gene sets were significantly enriched in the hyaluronan-positive phenotype at FDR < 0.25 and nominal p-value < 0.05. Four gene sets were significantly enriched at FDR < 0.25 and nominal p-value < 0.01. No gene sets were significantly enriched in the hyaluronan-negative phenotype. The pathways with the highest normalized enrichment scores (NES) in the hyaluronan-positive phenotype were epithelial-mesenchymal transition (NES 1.72, FDR = 0.034), coagulation (NES 1.67, FDR = 0.033), P53 pathway (NES 1.57, FDR = 0.075), apoptosis (NES 1.48, FDR = 0.153), MTORC1 signaling (NES 1.48, FDR = 0.127), apical surface (NES 1.46, FDR = 0.126), apical junction (NES 1.45, FDR = 0.125), and KRAS signaling up (NES 1.36, FDR = 0.166). The enrichment plots are shown in Fig. [125]4. Figure 4. [126]Figure 4 [127]Open in a new tab Enrichment plots of Gene Set Enrichment Analysis (GSEA) results of the comparison between HA-positive and HA-negative groups, using the hallmark gene set. NES normalized enrichment score, FDR false discovery rate. Protein-to-protein interaction network and subnetworks A PPI network was constructed from 129 DEGs, using a minimum interaction score of 0.4. The network contained 127 nodes and 172 edges (PPI enrichment, p < 1.0E − 16). Fifty of the nodes were singletons with no connection to other nodes. The PPI network is shown in Supplementary Figure S1. The MCODE plug-in identified five closely connected subnetworks from the PPI network; the highly connected regions are shown in Supplementary Table S2. These networks contained 27 unique genes. ToppGene GO revealed that these genes were mostly associated with the molecular functions “endopeptidase inhibitor activity” and “peptidase inhibitor activity”, the biological processes “locomotion and cartilage development”, and the cellular components “extracellular matrix and external encapsulating structure”. The most enriched REACTOME pathway was “extracellular matrix organization” and the most enriched KEGG pathway was “glycosaminoglycan biosynthesis-chondroitin sulfate”. Hub gene discovery and analysis Using seven ranking algorithms of Cytoscape’s CytoHubba plug-in, we calculated the top 20 hub genes (Supplementary Table S3). The intersection of these results revealed 17 common hub genes: ANXA2, CD44, COL6A3, DCN, ENO2, GAPDH, HSP90B1, LOX, LRP1, MMP7, NCAM1, P4HB, SERPINE1, SERPINH1, SPP1, TGFBI, and TIMP1 (Fig. [128]5A). The full names and functions of the genes are listed in Table [129]2. All the common hub genes were overexpressed in HA-positive tumors compared with HA-negative tumors. The most enriched GO ontologies, as well as the KEGG and REACTOME pathways, did not differ significantly from those of the DEGs (Supplementary Dataset). Figure 5. [130]Figure 5 [131]Open in a new tab Overlapping hub genes and co-expression networks of hub genes. (A) UpSet intersection plot showing the seven algorithms used to identify 17 overlapping hub genes. (B) Hub genes (inner circles) and 20 co-expressed genes. Circle diameter correspond to score assigned to each gene (GeneMANIA). Table 2. 17 hub genes. Gene symbol Gene function References