Abstract

   Recent research emphasised the indispensable role of histone
   lactylation in the activation of hepatic stellate cells. The VHL
   mutation is extremely common in clear cell renal cell carcinoma, which
   normally causes a metabolic shift in cancer cells and increases lactate
   production, eventually creating a lactate-enriched tumour
   microenvironment. Cancer-associated fibroblasts (CAFs) promote tumour
   progression, which is also vital in clear cell renal cell carcinoma.
   Therefore, this study investigated histone lactylation in CAFs and its
   impact on patient survival. Multiomics technology was employed to
   determine the role of histone lactylation-related genes in the
   evolution of CAFs which correlated with the function and molecular
   signatures of CAFs. The results suggested that TIMP1 was the hub gene
   of histone lactylation-related genes in clear cell renal cell
   carcinoma.

   Keywords: Histone lactylation, Cancer-associated fibroblasts, Tumour
   microenvironment, TIMP1

Highlights

     * •
       This work provided a novel angle into the connection of histone
       lactylation and the evolution of CAFs.
     * •
       This work defined three histone lactylation stages of CAFs, which
       had distinct molecular and functioning modalities.
     * •
       Based on multi-omics data, this work illustrated the distribution
       and molecular patterns of high-CAF infiltrated area.
     * •
       Using machine learning methods, this work defined TIMP1 as the hub
       gene in the histone lactylation related genes.

1. Introduction

   In 2022, kidney and renal pelvis cancer comprised 5 per cent of
   estimated new cases, ranking sixth among all cancer types [[49]1].
   Clear cell renal cell carcinoma (ccRCC) is the most common type of
   renal cell carcinoma, accounting for more than 70 per cent of RCC. VHL
   mutations occur in most ccRCC cases and induce cancer cells to undergo
   metabolic reprogramming to an aerobic glycolysis phenotype
   [[50]2,[51]3], generating a lactate-enriched tumour microenvironment
   (TME).

   A recent study disclosed the significance of histone lactylation in
   hepatic stellate cell activation and identified 1126 H3K18la-marked
   genes [[52]4]. In addition, Yang et al. elucidated that the coupling of
   inactive VHL and histone lactylation induced platelet-derived growth
   factor receptor β (PDGFRβ)promoting tumour progression, and inhibition
   of histone lactylation and PDGFRβ impeded tumour growth and improved
   therapeutic efficacy in ccRCC [[53]5]. Since the concept of histone
   lactylation was proposed [[54]6], its importance has been corroborated
   in numerous cancer types including ocular melanoma [[55]7], prostate
   cancer [[56]8], liver cancer [[57]9] and ccRCC [[58]10]. However, few
   studies have investigated the role of histone lactylation in
   cancer-associated fibroblasts (CAFs) in ccRCC. Considering their great
   value in shaping the TME [[59]11,[60]12] and assisting tumour
   resistance to immunotherapy [[61]13,[62]14], it is necessary to
   investigate the plasticity of CAFs to histone lactylation.

   This study employed multiomic technologies to investigate the
   significant role of histone lactylation-related genes in the evolution
   of CAFs. The evolutionary process of CAFs with histone
   lactylation-related genes was explored using single-cell RNA sequencing
   (scRNA-seq) data and three stages were defined based on the gene
   expression trajectories. The distribution modalities and gene
   expression patterns were explored in different molecular clusters,
   where common patterns of histone lactylation-related genes were
   observed in the high myoCAF infiltrated region. Moreover, patients in
   the TCGA-KIRC cohort were clustered into high and low histone
   lactylation subgroups with significant differences in clinical status
   and immune signatures. Finally, TIMP1 was identified as the hub gene,
   and it alone or together with its receptors had significant value in
   predicting patient overall survival which was gender-related.

2. Methods

2.1. Data acquisition

   The bulk RNA sequencing results and clinical data of TCGA-KIRC (tumour:
   541, normal: 72) were downloaded from the Cancer Genome Atlas (TCGA)
   database ([63]https://www.cancer.gov/tcga) with the E-MTAB-1980 cohort
   containing 101 ccRCC samples utilised for validation. Notably, the raw
   count format data of TCGA-KIRC was used for the detection of
   differentially expressed genes (DEGs), while the transcripts per
   kilobase million (TPM) format data was utilised for other analyses. For
   genes with multiple sequencing results, the highest RNA expression was
   utilised.

   The single-cell RNA sequencing (scRNA-seq) data, single-cell Assay for
   Transposase-Accessible Chromatin using sequencing (scATAC-seq) data and
   Spatial Transcriptomics (ST) data can be accessed in [64]GSE207493
   [[65]15], [66]GSE210042 [[67]10] and [68]GSE175540 [[69]16]. The
   immunomodulatory genes were generated and revised from previous
   research [[70]17].

2.2. Differentially expressed gene extraction

   The Wilcoxon test and R package Limma [[71]18] were used to detect
   differentially expressed histone lactylation-related genes in tumour
   and normal tissues, with the criteria set as the absolute logFC greater
   or equal to 0.2 and a false discovery rate (FDR) lower than 0.05.
   Random forest was performed using the R package randomForest [[72]19],
   and the uniCox and Least absolute shrinkage and selection operator
   (LASSO) was based on the R package glmnet [[73][20], [74][21],
   [75][22]].

2.3. Calculation of enrichment scores

   Genes used to calculate the histone lactylation score were obtained
   from the study by Rho et al. [[76]4] which identified 1126 genes marked
   by H3K18la in the activation of hepatic stellate cells. These gene
   symbols were transformed to yield 1118 genes with corresponding gene
   symbols, referred to as the histone lactylation-related gene (HLRG)
   list, which was used to calculate the histone lactylation score (see
   Supplemental materials). The normal fibroblast and CAF-related
   signatures were generated from R package IOBR [[77]23] including
   Normal.Fibroblast, ecm.myCAF, TGFb.myCAF, wound.myCAF, detox.iCAF,
   IL.iCAF and IFNG.iCAF. The histone lactylation scores and CAF-related
   signature scores in scRNA-seq and ST data were calculated using
   AddModuleScore based on R package Seurat [[78][24], [79][25], [80][26],
   [81][27]], while the histone lactylation scores were calculated using
   Gene Set Variation Analysis (GSVA) in the bulk RNA sequencing data
   based on the 380 upregulated genes in tumour tissue according to the R
   package GSVA [[82]28]. The estimated CAF-related signature scores in
   bulk RNA sequencing data were obtained using R package IOBR based on
   the single sample gene set enrichment analysis (ssGSEA) algorithm.

2.4. Survival analysis

   The prognostic analysis was implemented using the R packages survival
   and survminer [[83]29]. The statistically significant criterion was set
   as p-value <0.05 and the surv_cutpoint was used to calculate the
   optimal cutoff value. A nomogram was constructed based on the R package
   nomogramFormula which was drawn using DynNom [[84]30].

2.5. Correlation analysis

   The correlation analysis was conducted using the R package ggplot2
   [[85]31] based on spearman correlation method, and the statistically
   significant criterion was set as p-value <0.05.

2.6. Processing of the single-cell RNA sequencing data

   The R package Seurat [[86][24], [87][25], [88][26], [89][27]] was used
   to filter out low-quality cells in [90]GSE210038, retaining cells with
   feature counts <3500 and a percentage of mitochondrial reads <20 %.
   Then, FindIntegrationAnchors (with the reduction parameter set as
   “rpca”) and the IntegrateData command were used for data integration
   and batch effect removal before the RunPCA and RunUMAP commands were
   used for dimensionality reduction. The 30 most remarkable principal
   components and the top 2000 highly variable genes were filtered and
   clustered using the FindNeighbors and FindClusters commands with a
   resolution of 0.8. The canonical marker genes to identify clusters were
   obtained from the literature. For the re-clustering of MSCs, the
   resolution parameter was set to 0.1 to avoid excessive distinction.

   Notably, for scRNA-seq data of [91]GSE207493, the raw data was first
   filtered with the following criteria: 1) after clustering, the space
   between each cell cluster should be distinct; 2) the absolute cell
   number of ACTA2 positive clusters should be greater than 200; 3) the
   cell proportion of ACTA2 positive clusters should be greater than 0.07.
   Finally, 9 of 19 samples were selected with a comparatively sufficient
   number of MSCs for further study ([92]Fig. S1). The filtering criteria
   were set as a feature count <7500, a unique molecular identifier (UMI)
   count ranging from 1500 to 10000 and a percentage of mitochondrial
   reads <10 %. The integration and dimensionality reduction methods were
   consistent with [93]GSE210038 and the resolution parameter was set to
   0.6.

2.7. Evaluation of tissue distribution preferences