Abstract

   Hyaluronan (HA) accumulation in clear cell renal cell carcinoma (ccRCC)
   is associated with poor prognosis; however, its biology and role in
   tumorigenesis are unknown. RNA sequencing of 48 HA-positive and 48
   HA-negative formalin-fixed paraffin-embedded (FFPE) samples was
   performed to identify differentially expressed genes (DEG). The DEGs
   were subjected to pathway and gene enrichment analyses. The Cancer
   Genome Atlas Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) data and
   DEGs were used for the cluster analysis. In total, 129 DEGs were
   identified. HA-positive tumors exhibited enhanced expression of genes
   related to extracellular matrix (ECM) organization and ECM receptor
   interaction pathways. Gene set enrichment analysis showed that
   epithelial–mesenchymal transition-associated genes were highly enriched
   in the HA-positive phenotype. A protein–protein interaction network was
   constructed, and 17 hub genes were discovered. Heatmap analysis of
   TCGA-KIRC data identified two prognostic clusters corresponding to
   HA-positive and HA-negative phenotypes. These clusters were used to
   verify the expression levels and conduct survival analysis of the hub
   genes, 11 of which were linked to poor prognosis. These findings
   enhance our understanding of hyaluronan in ccRCC.

   Subject terms: Medical research, Urology, Renal cancer, Prognostic
   markers

Introduction

   Global Cancer Statistics showed that approximately 430,000 new cases of
   kidney cancer (KC) were diagnosed in 2020, accounting for 2.2% of all
   human malignancies. It contributed to 180,000 deaths worldwide, making
   it the most lethal urological malignancy worldwide^[34]1, and the
   incidence of KC is increasing^[35]2. Renal cell carcinoma (RCC) is the
   most common type of kidney cancer, accounting for > 90% of primary
   kidney tumors^[36]3. The three most common histological subtypes of RCC
   are clear cell RCC (ccRCC), papillary RCC, and chromophobe RCC.
   Recently, more entities with molecularly defined pathogenesis have been
   identified^[37]4. RCC is often diagnosed incidentally, and one-third of
   the patients present with metastatic disease. Twenty percent of
   patients who undergo surgery for a primary tumor later develop
   metastases. Despite recent advances in systemic therapies, the
   prognosis of metastatic disease remains dismal^[38]5,[39]6. Therefore,
   it is imperative to identify new biomarkers for disease detection,
   prognostication, and treatment.

   Hyaluronan (HA) is a ubiquitous large glycosaminoglycan (GAG) found in
   the extracellular matrix (ECM), where it forms a pericellular coat
   surrounding cells and functions as a cushion^[40]7. It is composed of a
   variable number of repeating disaccharide units of N-acetyl-glucosamine
   (GlcNAc) and glucuronic acid (GlcUA), with an average molecular mass
   ranging from 1000 to 8000 kD^[41]8. The turnover of hyaluronan is
   rapid, and one-third of the hyaluronan mass undergoes turnover
   daily^[42]9. Hyaluronan is synthesized by the hyaluronan synthase
   enzymes HAS1, HAS2, and HAS3^[43]10. Degradation is mediated mainly by
   the family of hyaluronidases (HYAL 1-4, PH20, CEMIP, and
   TMEM2)^[44]11–[45]13. In addition to RCC, increased HA content has been
   associated with worse outcomes and more aggressive tumor growth in
   several human malignancies, including breast cancer, colon carcinoma,
   gastric carcinoma, thyroid cancer, pancreatic cancer, lung
   adenocarcinoma, lymphoma, hepatocellular carcinomas, and
   gliomas^[46]14–[47]23. It is postulated that HA acts as a barrier that
   shields tumor cells from insults, such as therapeutic agents and the
   immune system, and could serve as a potential target for anticancer
   drugs^[48]24. The use of PEGylated human hyaluronidase (PEGPH20) has
   shown promising efficacy in sensitizing pancreatic cancer cells to
   radiotherapy and in improving the efficacy of anti-PD-1
   therapy^[49]25,[50]26.

   In normal human kidneys, most HA is produced in the renal medulla,
   while the renal cortex, from which renal cell carcinomas usually arise,
   is hyaluronan poor^[51]27. Increased cortical HA content is associated
   with various non-neoplastic kidney diseases/conditions such as acute
   kidney injury, chronic kidney disease, allograft, and diabetic
   nephropathy^[52]28–[53]31. To date, reports on HA in RCCs are limited.
   In our previous study, we showed that increased cellular hyaluronan
   conveys poor prognosis in patients with ccRCC^[54]14. In addition, a
   higher hyaluronan content was associated with a higher tumor grade. The
   individual molecules associated with HA have been studied on gene
   expression levels. Chi et al.^[55]32 showed that expression of HAS1 was
   increased in RCC tissue compared with adjacent normal tissue while
   HYAL1 expression was lower in ccRCC than in normal renal tissue^[56]32.
   Cai et al. found that HAS1-3 mRNA expression was higher in human ccRCC
   tissues than in adjacent normal tissues^[57]33. However, only HAS3
   protein expression was higher. In conclusion, the results of expression
   studies are inconsistent, and the expression levels of different HA
   family proteins might not necessarily reflect overall HA levels.
   Therefore, to better understand the biological background of hyaluronan
   accumulation in ccRCC, we performed RNA sequencing of previously found
   hyaluronan-positive and-negative tumor cohorts. The aim of this study
   was to investigate the differences in RNA expression profiles and to
   find new potential hyaluronan-associated molecules.

Materials and methods

Patients and sample selection

   A research flowchart of this study is shown in Fig. [58]1.
   Formalin-fixed and paraffin-embedded (FFPE) tissue samples from
   patients who underwent surgery for ccRCC in the period 2000–2013 at
   Kuopio University Hospital were collected from the Biobank of Eastern
   Finland. The study (Hyaluronan in Renal Cell Carcinoma, HARCC) was
   approved by the Ethics Committee of the Northern Savo Hospital District
   (379/2016, November 1st, 2016). The diagnostic samples were processed
   and diagnosed according to the routine protocol in the Department of
   Clinical Pathology. Hyaluronan staining and evaluation were performed
   as described by Jokelainen et al.^[59]14. We selected 48
   hyaluronan-positive and 48 hyaluronan-negative tumor samples for RNA
   sequencing on the basis of tumor grade, sarcomatoid change, sex,
   survival, and metastasis status (Table [60]1). Three 1-mm-wide tissue
   cores were punched from each representative tumor block.

Figure 1.

   [61]Figure 1
   [62]Open in a new tab

   Research flowchart. The dashed line represents in silico analysis using
   the Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCGA-KIRC)
   data. HA Hyaluronan, ccRCC clear cell renal cell carcinoma, DEG
   differentially expressed gene, GO Gene Ontology, KEGG Kyoto
   Encyclopedia of Genes and Genomes, GSEA Gene Set Enrichment Analysis,
   STRING Search Tool for the Retrieval of Interacting Genes/Proteins,
   TRRUST Transcriptional Regulatory Relationships Unraveled by
   Sentence-based Text mining, NOJAH NOt Just Another Heatmap.

Table 1.

   Characteristics of 96 renal cell carcinoma tissue samples.
                     Hyaluronan-positive N (%)   Hyaluronan-negative N (%)
   Samples           48                          48
   Sex
    Male             31   (64.6)                 24   (50)
    Female           17   (35.4)                 24   (50)
   Age (mean, range) 61.8 (41–82)                67.0 (36–86)
   WHO/ISUP grade
    1                2    (4.2)                  8    (16.7)
    2                22   (45.8)                 22   (45.8)
    3                10   (20.8)                 9    (18.8)
    4                14   (29.2)                 9    (18.8)
   Sarcomatoid change
    No               41   (85.4)                 43   (89.6)
    Yes              7    (14.6)                 5    (10.4)
   Disease-related death
    No               28   (58.6)                 35   (72.9)
    Yes              20   (41.2)                 13   (27.1)
   Clinical stage
    I                16   (33.3)                 24   (50.0)
    II               9    (18.8)                 8    (16.7)
    III              12   (25.0)                 7    (14.6)
    IV               11   (22.9)                 9    (18.8)
   Metastasis at diagnosis
    M0               37   (77.1)                 39   (81.3)
    M1               11   (22.9)                 9    (18.7)
   [63]Open in a new tab

Next-generation sequencing

   RNA was isolated by use of an RNeasy FFPE Kit (Qiagen), and
   deparaffinization was performed using 640–750 µL deparaffinization
   solution from the kit with 60–90 min incubation at 56 °C. RNA was
   eluted with 2′ 14 µL of RNase-free water. rRNA was removed by use of a
   QIASeq FastSelect (rRNA HMR, Qiagen), using 1 µg RNA or as much as
   could be handled by the kit (max 10 µL). RNA-sequencing libraries were
   constructed with a TruSeq Stranded mRNA Library Prep kit (Illumina) and
   0.3 µL adapters with 30 PCR cycles. Barcodes were optimized by use of
   BARCOSEL software
   ([64]http://ekhidna2.biocenter.helsinki.fi/barcosel/)^[65]34.
   Sequencing was performed using an Illumina Novaseq 6000 instrument.
   Adapter sequences, low-quality bases (q = 25), and short sequences
   (m = 30) were first trimmed using cutadapt (v.4.1)^[66]35
   ([67]https://cutadapt.readthedocs.io/en/stable/). The fourteenth (p14)
   patch release for the GRCh38 reference assembly and annotation was
   downloaded from
   [68]https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001
   405.40_GRCh38.p14/. The STAR aligner (v2.7.9a_2021-06-25)
   ([69]https://code.google.com/archive/p/rna-star/) with default
   parameters was used to map reads against the reference sequence^[70]36.
   Sorting and indexing of the alignment files were performed using
   Samtools (v.1.10) ([71]https://www.htslib.org/)^[72]37.

Quality control and differentially expressed gene analysis

   A significant portion of the mapped reads was concentrated on only a
   few genes in each sample, which necessitated an additional filtering
   step using Samtools (v. 1.16.1) to mark duplicated reads for removal
   using the no-multi-dup option. Gene counts were computed by use of R
   (v.4.1.1) ([73]https://www.r-project.org/) and Rsubread (v.2.6.4)
   ([74]https://bioconductor.org/packages/release/bioc/html/Rsubread.html)
   with multi-mapping and multi-overlapping reads removed^[75]38,[76]39.
   In addition to the read-level quality control (QC) detailed in section
   "[77]next-generation sequencing", further QC steps were performed
   according to the recommendations specified by Liu et al.^[78]40 for
   FFPE RNA-seq count data^[79]40. As FFPE samples were used, the total
   number of mapped reads in all samples was generally low, as expected.
   Therefore, the quality metric used for filtering the samples was the
   median sample-sample Spearman correlation after count normalization.
   Samples with a median correlation below 0.75 were removed, leaving 36
   hyaluronan-positive and 41 hyaluronan-negative samples. Differential
   expression analysis was performed using edgeR (v.3.34.1)
   ([80]https://bioconductor.org/packages/release/bioc/html/edgeR.html)^[8
   1]41. Only protein-coding genes and genes with mean counts above 1
   across all samples were included, leaving 10,633 genes for
   consideration. Count normalization was performed using the trimmed mean
   M-value method in edgeR. The results we plotted using the ggplot2
   package (v.3.4.2) ([82]https://ggplot2.tidyverse.org)^[83]42.

Gene Ontology and pathway enrichment analysis

   Gene Ontology (GO) is a bioinformatics database established to provide
   simple annotation of gene products^[84]43,[85]44. GO terms include
   biological processes (BP), cellular components (CC), and molecular
   functions (MF) of gene products. The Kyoto Encyclopedia of Genes and
   Genomes (KEGG) and REACTOME are free online databases containing
   information on biological pathways, molecular interactions, and
   reactions^[86]45–[87]48. Database analysis was performed to investigate
   the molecular function of the identified DEGs using ToppGene
   ([88]https://toppgene.cchmc.org/enrichment.jsp)^[89]49. ToppGene is a
   bioinformatics portal for gene-list enrichment analysis. The cutoff
   value for the false discovery rate (FDR) was set at p < 0.05. The
   Benjamini–Hochberg procedure was used to account for multiple testing.
   Results were plotted by SRplot
   ([90]https://www.bioinformatics.com.cn/srplot), an online platform for
   data analysis and visualization^[91]50.

Gene set enrichment analysis

   Gene Set Enrichment Analysis (GSEA) was performed to examine the gene
   expression profiles of hyaluronan-positive and hyaluronan-negative
   samples which had passed quality control steps^[92]51,[93]52. The
   hyaluronan-positive phenotype was compared with the hyaluronan-negative
   phenotype. The trimmed mean of M values (TMM)-normalized count data and
   phenotype data were uploaded to GSEA software (build v.4.3.2.)
   ([94]https://www.gsea-msigdb.org/gsea/index.jsp) and Human MSigDB
   h.all.v2023.1.Hs.symbols hallmark gene set was chosen. The number of
   permutations was set to 1000. All other run parameters were maintained
   at their default values.

Protein-to-protein network construction and subnetworks

   The Search Tool for the Retrieval of Interacting Genes/Proteins
   (STRING) is a biological database of known and predicted
   protein–protein interactions (PPI) ([95]https://string-db.org/)^[96]53.
   This includes the interactions derived from experiments and
   computationally predicted interactions. STRING (v.11.5) was used to
   predict interactions between the DEGs. Interactions with a combined
   score > 0.4 were considered statistically significant. The PPI network
   provided by STRING was imported into and visualized in Cytoscape
   (v.3.9.1.) ([97]http://www.cytoscape.org)^[98]54. The Cytoscape plug-in
   MCODE (v.2.0.2) was used to identify highly interconnected regions in
   the network^[99]55. Settings used were as follows: Cluster Finding
   Haircut, Node Score Cutoff = 0.2, K-Core = 2, Max. Depth = 100.

Hub genes discovery and analysis

   Hubgene analysis was performed using the CytoHubba (v.0.1) plug-in in
   Cytoscape^[100]56. Seven common algorithms (MCC, MNC, Degree,
   Closeness, Radiality, Stress, and EPC) were used to identify hub genes,
   and the UpSet intersection plot (by SRPlot) was used to identify common
   genes. The hub genes were then exported to GeneMANIA
   ([101]http://www.genemania.org/), a predictor software used to identify
   other genes related to a set of input genes and internal associations
   in the gene sets^[102]57. Default methods were used to calculate
   connection weights. Lastly, we used Transcriptional Regulatory
   Relationships Unraveled by Sentence-based Text Mining (TRRUST) (v.2) to
   predict transcription factors (TFs) of hub genes
   ([103]https://www.grnpedia.org/trrust/)^[104]58. TFs with adjusted
   P-value < 0.05 were considered significant. Subsequently, we used the
   TCGA-KIRC dataset to examine the expression of these TFs^[105]59.

In-silico TCGA heatmap and cluster analysis

   To test the gene signature identified by DEG analysis with another
   dataset, the TCGA-KIRC RNA expression dataset was downloaded via the
   Bioconductor package TCGABiolinks (v.2.26.0)
   ([106]https://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.
   html) using R (v.4.2.2) and RStudio
   (2022.12.0 + 353)^[107]38,[108]60,[109]61. Phenotype data concerning
   tumor grade and patient survival were downloaded from the same source.
   Data from 537 ccRCC samples were collected after removing duplicates.
   The count values were TMM-normalized and log2(count + 1)-transformed.
   Genome-Wide Heatmap (GWH) analysis was performed using the NOJAH (NOt
   Just Another Heatmap) (v.1) interactive tool
   ([110]https://github.com/bbisr-shinyapps/NOJAH/)^[111]62. A heatmap was
   plotted, using the set of 129 DEGs. The values selected for
   hierarchical clustering analysis parameters were the Z-score method for
   data normalization type, Euclidean for the distance method, and ward.D2
   for the clustering method. Tumor phenotype data were combined into a
   heatmap, using NOJAH. Furthermore, the methylation status of TCGA
   samples, as described by Ricketts et al.^[112]63, was combined with the
   heatmap^[113]63. The observed distribution of tumor grade, patient
   survival, and methylation cluster within each heatmap cluster was
   tested using the chi-square test.

TCGA GSEA analysis

   GSEA analysis was performed on TCGA-KIRC data using the newly
   discovered HA-positive and HA-negative phenotypes. TMM-normalized
   expression and phenotype data were uploaded to GSEA software (build
   v.4.3.2.) and Human MSigDB h.all.v2023.1.Hs.symbols hallmark gene set
   was chosen. The number of permutations was set to 1000. All other run
   parameters were maintained at their default values.

Hub gene expression analysis in silico

   The mRNA expression levels of the identified hub genes were
   investigated, using the TCGA-KIRC dataset. RNA-seq data from 72 healthy
   renal tissues were downloaded using the TCGAbiolinks package (v.2.26.0)
   in R (v.4.2.2). TCGA samples identified belonging to “HA-negative” and
   “HA-positive” gene expression clusters were compared against each other
   and to normal renal tissue. The expression levels of the hub genes were
   TPM-normalized and log2-transformed for each cohort. The Mann–Whitney
   U-test was used to compare each group with the other two groups, and
   the expression levels were box-plotted using R package ggpubr (v.0.6.0)
   ([114]https://rpkgs.datanovia.com/ggpubr/)^[115]64.

Prognostic value of hub genes

   We used the Kaplan–Meier Plotter online database
   ([116]http://kmplot.com/analysis/) containing TCGA-KIRC data from 530
   patients to analyze the prognosis of hub genes^[117]65. Kaplan–Meier
   estimators were plotted, and hazard ratios were calculated for overall
   survival (OS) and disease-free survival (DFS). The samples were
   stratified into low- and high-expression groups based on the median
   cut-off (50%). Statistical significance was set at p < 0.05 and
   HR > 1.0 were considered significant.

Ethical considerations

   The study was conducted in accordance with the Declaration of Helsinki
   and approved by the Ethics Committee of the Northern Savo Hospital
   District (379/2016, November 1st, 2016).

Results

DEG identification

   DEG analysis identified 129 differentially expressed genes between the
   hyaluronan-positive and hyaluronan-negative groups (Fig. [118]2). The
   full DEG list can be found in Supplementary Table S1. Of these genes,
   97 were upregulated and 32 were downregulated in the
   hyaluronan-positive group compared with those in the
   hyaluronan-negative group. Only protein-coding genes were included, and
   the FDR was set at < 0.05. There were 53 genes with |log2
   fold-change|≥ 1.

Figure 2.

   [119]Figure 2
   [120]Open in a new tab

   Volcano plot of differentially expressed gene (DEG) analysis. The red
   horizontal line represents a false discovery rate (FDR) level of 0.05.
   Genes with a log2 fold-change > 0 were overexpressed, and those with
   values < 0 were under-expressed.

Functional analysis of DEGs

   The ToppGene portal was used to perform GO, KEGG, and REACTOME
   enrichment analyses on the biological functions and pathways of the 129
   DEGs identified. Gene ontology (GO) analysis revealed that these genes
   were enriched in 82 biological processes. Of these “tube development”,
   “cell adhesion”, and “extracellular matrix organization” were the most
   statistically significant. Of the 31 statistically significant cellular
   compartments (CC), “the external encapsulating structure”,
   “extracellular matrix”, and “collagen-containing extracellular matrix”
   were the most enriched. The most enriched molecular functions (N = 21)
   were “signaling receptor binding”, “peptidase inhibitor activity”, and
   “endopeptidase inhibitor activity” (Fig. [121]3). In terms of KEGG
   enrichment analysis, three pathways, “ECM receptor interaction”
   (FDR = 3.07E − 4), “glycosaminoglycan biosynthesis chondroitin sulfate”
   (FDR = 1.72E − 3) and “complement and coagulation cascades”
   (FDR = 4.0E − 2), were statistically significantly enriched. In the
   REACTOME pathway analysis, the top three enriched pathways were
   “extracellular matrix organization” (FDR = 1.36E − 8), “collagen
   formation” (p = 7.68E − 6), and “regulation of insulin-like growth
   factor transport and uptake by insulin-like growth factor-binding
   proteins” (7.77E − 6) (Fig. [122]3).

Figure 3.

   [123]Figure 3
   [124]Open in a new tab

   (A) ToppGene results of Gene Ontology (GO) and pathway analyses. The
   bars show the top three most significantly enriched GO terms from each
   subontology. (B) Bubble plot showing the 15 most significantly enriched
   pathways in the KEGG and REACTOME pathways. BP biological process, CC
   cellular compartment, MF molecular function, KEGG Kyoto Encyclopedia of
   Genes and Genomes.

Gene set enrichment analysis of DEGs

   GSEA analysis showed that 8 gene sets were significantly enriched in
   the hyaluronan-positive phenotype at FDR < 0.25 and nominal
   p-value < 0.05. Four gene sets were significantly enriched at
   FDR < 0.25 and nominal p-value < 0.01. No gene sets were significantly
   enriched in the hyaluronan-negative phenotype. The pathways with the
   highest normalized enrichment scores (NES) in the hyaluronan-positive
   phenotype were epithelial-mesenchymal transition (NES 1.72,
   FDR = 0.034), coagulation (NES 1.67, FDR = 0.033), P53 pathway (NES
   1.57, FDR = 0.075), apoptosis (NES 1.48, FDR = 0.153), MTORC1 signaling
   (NES 1.48, FDR = 0.127), apical surface (NES 1.46, FDR = 0.126), apical
   junction (NES 1.45, FDR = 0.125), and KRAS signaling up (NES 1.36,
   FDR = 0.166). The enrichment plots are shown in Fig. [125]4.

Figure 4.

   [126]Figure 4
   [127]Open in a new tab

   Enrichment plots of Gene Set Enrichment Analysis (GSEA) results of the
   comparison between HA-positive and HA-negative groups, using the
   hallmark gene set. NES normalized enrichment score, FDR false discovery
   rate.

Protein-to-protein interaction network and subnetworks

   A PPI network was constructed from 129 DEGs, using a minimum
   interaction score of 0.4. The network contained 127 nodes and 172 edges
   (PPI enrichment, p < 1.0E − 16). Fifty of the nodes were singletons
   with no connection to other nodes. The PPI network is shown in
   Supplementary Figure S1. The MCODE plug-in identified five closely
   connected subnetworks from the PPI network; the highly connected
   regions are shown in Supplementary Table S2. These networks contained
   27 unique genes. ToppGene GO revealed that these genes were mostly
   associated with the molecular functions “endopeptidase inhibitor
   activity” and “peptidase inhibitor activity”, the biological processes
   “locomotion and cartilage development”, and the cellular components
   “extracellular matrix and external encapsulating structure”. The most
   enriched REACTOME pathway was “extracellular matrix organization” and
   the most enriched KEGG pathway was “glycosaminoglycan
   biosynthesis-chondroitin sulfate”.

Hub gene discovery and analysis

   Using seven ranking algorithms of Cytoscape’s CytoHubba plug-in, we
   calculated the top 20 hub genes (Supplementary Table S3). The
   intersection of these results revealed 17 common hub genes: ANXA2,
   CD44, COL6A3, DCN, ENO2, GAPDH, HSP90B1, LOX, LRP1, MMP7, NCAM1, P4HB,
   SERPINE1, SERPINH1, SPP1, TGFBI, and TIMP1 (Fig. [128]5A). The full
   names and functions of the genes are listed in Table [129]2. All the
   common hub genes were overexpressed in HA-positive tumors compared with
   HA-negative tumors. The most enriched GO ontologies, as well as the
   KEGG and REACTOME pathways, did not differ significantly from those of
   the DEGs (Supplementary Dataset).

Figure 5.

   [130]Figure 5
   [131]Open in a new tab

   Overlapping hub genes and co-expression networks of hub genes. (A)
   UpSet intersection plot showing the seven algorithms used to identify
   17 overlapping hub genes. (B) Hub genes (inner circles) and 20
   co-expressed genes. Circle diameter correspond to score assigned to
   each gene (GeneMANIA).

Table 2.

   17 hub genes.
   Gene symbol Gene function References