Abstract

Background

   Cervical cancer is the fourth most common cancer in women globally, and
   the main cause of the disease has been found to be ongoing HPV
   infection. Cervical cancer remains the primary cause of cancer-related
   death despite major improvements in screening and treatment approaches,
   especially in low- and middle-income nations. Therefore, it is crucial
   to investigate the tumor microenvironment in advanced cervical cancer
   in order to identify possible treatment targets.

Materials and methods

   In order to better understand malignant cervical cancer epithelial
   cells (EPCs), this study used bulk RNA-seq data from UCSC in
   conjunction with single-cell RNA sequencing data from the ArrayExpress
   database. After putting quality control procedures into place, cell
   type identification and clustering analysis using the Seurat software
   were carried out. To clarify functional pathways, enrichment analysis
   and differential gene expression were carried out. The CIBERSORT and
   ESTIMATE R packages were used to evaluate the immune microenvironment
   characteristics, and univariate and multivariate Cox regression
   analyses were used to extract prognostic features. Furthermore,
   assessments of drug sensitivity and functional enrichment were carried
   out.

Results

   Eight cell types were identified, with EPCs showing high proliferative
   and stemness features. Five EPC subpopulations were defined, with C1
   NNMT+ CAEPCs driving tumor differentiation. A NNMT CAEPCs Risk Score
   (NCRS) model was developed, revealing a correlation between elevated
   NCRS scores and adverse patient outcomes characterized by immune
   evasion. In vitro experiments validated that the prognostic gene PLOD2
   significantly enhances proliferation, migration, and invasion of
   cervical cancer cells.

Conclusion

   This investigation delineated eight cell types and five subpopulations
   of malignant EPCs in cervical cancer, establishing the C1 NNMT+ CAEPCs
   as a crucial therapeutic target. The NCRS model demonstrated its
   prognostic capability, indicating that higher scores are associated
   with poorer clinical outcomes. The validation of PLOD2 as a prognostic
   gene highlights its therapeutic potential, underscoring the critical
   need for integrating immunotherapy and targeted treatment strategies to
   enhance diagnostic and therapeutic approaches in cervical cancer.

   Keywords: cervical cancer, single-cell RNA sequencing, tumor
   microenvironment, prognostic model, immune evasion, therapeutic targets

Introduction

   Cervical cancer ranks as the fourth most prevalent malignancy among
   women globally, with incidence and mortality rates of 13.1% and 6.1%,
   respectively ([43]1, [44]2). It continues to be the primary cause of
   cancer-related mortality in nations with few resources ([45]3),
   primarily due to persistent infection with HPV. Although significant
   reductions in cervical cancer rates have been achieved in certain
   developed regions through HPV vaccination, effective screening
   programs, and advancements in diagnostic and adjuvant therapies
   ([46]4), a study from 2020 revealed that nearly 90% of cervical cancer
   fatalities occur in low- and middle-income countries ([47]5). Current
   treatment modalities encompass conventional chemotherapy regimens
   utilizing cisplatin and paclitaxel, radiotherapy ([48]6), targeted
   therapies like bevacizumab, and immunotherapies employing immune
   checkpoint inhibitors ([49]7). The dearth of treatment choices for
   cervical cancer that is advanced, recurring, or metastatic emphasizes
   the critical need for additional research into the tumor
   microenvironment in order to identify novel therapeutic targets.

   An increasing number of studies have highlighted the critical role of
   the tumor microenvironment (TME) in understanding tumor progression and
   drug resistance ([50]8). Exploring the TME in different types of tumors
   holds significant clinical value ([51]9–[52]11). Techniques such as
   high-throughput sequencing and single-cell RNA sequencing (scRNA-seq)
   have enabled researchers to deeply analyze cellular heterogeneity,
   immune evasion mechanisms, and the tumor’s response to therapy within
   the TME ([53]12). ScRNA-seq has transformed our understanding of tumor
   heterogeneity, particularly enhancing the exploration of the TME
   ([54]13). It has also significantly advanced our understanding of the
   mechanisms and evolutionary pathways underlying tumor development
   ([55]14), For example, single-cell sequencing analysis has been widely
   used in breast cancer research to analyze intercellular communication
   within the TME and explore key factors that influence tumor
   heterogeneity ([56]15–[57]18), scRNA-seq has been widely employed to
   investigate cervical cancer heterogeneity ([58]19, [59]20). Our
   research focuses on malignant epithelial cells in cervical cancer, as
   chronic HPV infection primarily targets the squamous epithelium, and
   HPV’s life cycle is intricately tied to the differentiation of host
   epithelial cells ([60]21).

   This study sought to explore the variability of particular cell types
   in order to clarify the reasons behind the advancement of cervical
   cancer. Using scRNA-seq, we performed a thorough investigation of
   malignant cervical cancer EPCs and created a unique predictive model to
   find possible treatment targets. To confirm our findings, we also
   looked at the tumor immune microenvironment and ran cellular tests. Our
   study offers new perspectives on potential cervical cancer treatment
   approaches.

Methods

Acquisition of cervical cancer scRNA-seq data

   The ArrayExpress database provided the scRNA-seq data for cervical
   cancer used in this investigation (accession number E-MTAB-12305)
   ([61]22). Additionally, bulk RNA-seq data, which included information
   on somatic mutations and clinical variables (such as age, race, tumor
   stage, and survival time), were retrieved from the University of
   California, Santa Cruz (UCSC, [62]https://xena.ucsc.edu/).

ScRNA-seq data quality control and cell type identification

   The “Seurat” R package (version 4.3.0) was used to transform the
   scRNA-seq data into Seurat objects ([63]23). To identify and exclude
   doublet cells, the “DoubletFinder” R package (version 2.0.3) ([64]24)
   was applied. Subsequent filtering was conducted based on specific
   criteria: features between 300 and 6,000 (nFeature), counts ranging
   from 500 to 50,000 (nCount), and fewer than 5% of genes associated with
   red blood cells. Additionally, cells with mitochondrial gene expression
   levels higher than 25% of the total were not included in the analysis.

   The filtered cell data was then normalized using the “NormalizeData”
   function, and the “FindVariableFeatures” function ([65]25–[66]27) was
   utilized to identify the top 2,000 highly variable genes. The
   “ScaleData” function was then used to normalize the data ([67]28,
   [68]29). The “CellCycleScoring” function was utilized to evaluate cell
   cycle phases, followed by dimensionality reduction via “RunPCA,” where
   the first 30 principal components (PCs) were selected for subsequent
   analysis. To mitigate batch effects across samples, the “harmony” R
   package (v0.1.1) was implemented ([69]30, [70]31). The “FindNeighbors”
   and “FindClusters” projects were used to cluster cells, and the
   annotation of the clusters was done using previously identified marker
   genes. Visualization of the clusters was performed using Uniform
   Manifold Approximation and Projection (UMAP) ([71]32, [72]33).

InferCNV analysis

   To differentiate tumor cells from non-tumor cells, we utilized
   single-cell sequencing data and chromosomal sorting, employing the CNV
   algorithm via the “InferCNV” R package ([73]34). EPCs exhibiting
   heightened copy number variations were classified as Cancer-Associated
   EPCs (CAEPCs). Additionally, we conducted InferCNV analysis on the
   identified tumor subpopulations.

Identification and enrichment of DEGs in CAEPC subpopulations

   Differentially expressed genes (DEGs) within CAEPC subpopulations were
   identified using the “FindAllMarkers” function from the Seurat package.
   We further employed the Ro/e algorithm to evaluate tissue-specific
   expression patterns and cell cycle preferences within these