Graphical abstract graphic file with name fx1.jpg [24]Open in a new tab Highlights * • Steps for evaluating immune levels of each cell * • Instructions to detect abnormal cancer signaling pathways * • Guidance on visualizing and analyzing tumor cell signaling * • Techniques for identifying central genes related to immunity __________________________________________________________________ Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics. __________________________________________________________________ Malignant tumor cells are typically more active in terms of metabolism and signal transduction compared to immune and normal cells. Here, we present a protocol to evaluate immune levels, abnormal metabolism, and signaling pathways in tumor tissue based on single-cell sequencing based on patient data obtained from the GEO database. We describe steps for tumor immune microenvironment (TIME)-based evaluation, tumor purity assessment, and identification of abnormal signal transduction and metabolic pathways. We then detail procedures for screening hub genes. Before you begin The ESTIMATE (Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data) score is a method used to infer the proportion of stromal and immune cells in tumor samples using gene expression data.[25]^2 This score provides insight into the tumor microenvironment, which is crucial for understanding cancer progression and how it interacts with the body’s immune system.[26]^3 The ESTIMATE score is derived from the expression of genes that are representative of the presence of stromal and immune cells in the tumor tissue. Given the common tendency of malignant tumor cells to exhibit immune evasion and lower immune levels,[27]^4^,[28]^5^,[29]^6^,[30]^7 this protocol is also applicable for evaluating abnormal signaling pathways and metabolic pathways in such cells. The following section outlines six key steps. * 1. Analysis of immune levels and tumor purity. * 2. Cell grouping based on immune levels. * 3. Enrichment of signaling pathways. * 4. Assessment of signaling pathway enrichment levels. * 5. Analysis of abnormal signaling pathways. * 6. Selection of hub genes. Download and install R, Rtools, RStudio and Git Inline graphic Timing: ∼1 h * 7. The analysis in this protocol is conducted using R language, and you can download the appropriate version of R programming language based on the computer’s operating system. Generally speaking, this protocol is not limited by the R language version or computer operating system. + a. Visit the following link to obtain the R language: [31]https://cran.r-project.org/. Note: The current pipeline was performed using R version 4.3.2. * 8. Rtools includes a set of tools and dependencies, the most important of which is the GNU Compiler Collection (GCC). GCC is a powerful compiler that supports multiple programming languages, including C, C++, and Fortran. In R programming, it is sometimes necessary to write extension packages using these languages. Rtools provides tools such as GCC, which makes compiling these extension packages on computer simple and efficient. + a. Visit the following link to obtain the Rtools: [32]https://cran.r-project.org/bin/windows/Rtools/. Note: The current pipeline was performed using Rtools43. * 9. RStudio is an integrated development environment (IDE) for R. Due to the extensive amount of code involved in this protocol, we recommend installing RStudio for easier code modification and plotting. + a. Visit the following link to obtain the RStudio: [33]https://rstudio.com/products/rstudio/. * 10. Git is an open-source software that can be used for various operations on GitHub, such as creating, modifying, uploading, cloning, and more. + a. Visit the following link to obtain the Git: [34]https://git-scm.com/download. Download required packages in RStudio Inline graphic Timing: ∼1 h * 11. If this is your first-time installing R language and RStudio, please install the R packages used in this method and their dependent environments, as most of the R packages involved are related to biology. You can use the BiocManager software to install them. Use the following command for installation: BiocManager::install() or install.packages(). + a. Install the R packages involved in this protocol. if (!requireNamespace("BiocManager", quietly = TRUE)); install.packages("BiocManager"); BiocManager::install(c("ggplot2","data.table","pheatmap","clus terProfiler","enrichplot" ,"stringr","KEGGREST","devtools", "ggvenn", "gridExtra","SingleR","celldex")) + b. To install the estimate R packages. library(utils); rforge <- "http://r-forge.r-project.org"; install.packages("estimate", repos=rforge, dependencies=TRUE) + c. To install the SeuratObject (version = “4.1.3”) and Seurat (version = “4.2.1”) R packages, we strongly recommend using the following code, as the latest versions of SeuratObject and Seurat encounter errors when merging multiple single-cell datasets: o i. # Uninstall the installed Seurat and SeuratObject R package. remove.packages(c("SeuratObject","Seurat")) #Reinstall the SeuratObject (version="4.1.3") and Seurat (version="4.2.1") R packages. library(devtools) install_version("SeuratObject", version = "4.1.3",repos = "http://cran.us.r-project.org") install_version("Seurat", version = "4.2.1",repos = "http://cran.us.r-project.org") o ii. Alternatively, you can use the following code for installation: # Uninstall the installed Seurat and SeuratObject R package. remove.packages(c("SeuratObject","Seurat")) #Reinstall the SeuratObject (version="4.1.3") and Seurat (version="4.2.1") R packages. packageurl1 <-[35]http://cran.us.r-project.org/src/contri b/Archive/SeuratObject/SeuratObject_4.1.3.tar.gz install.packages(packageurl1, repos=NULL, type="source") packageurl2 <-[36]http://cran.us.r-project.org/src/contri b/Archive/Seurat/Seurat_4.2.1.tar.gz install.packages(packageurl2, repos=NULL, type="source") + d. Import the R packages used in this protocol. { library(ggplot2) library(data.table) library(estimate) library(pheatmap) library(clusterProfiler) library(enrichplot) library(stringr) library(KEGGREST) library(devtools) library(ggvenn) library(SingleR) library(celldex)} Dataset selection Inline graphic Timing: ∼2 h * 12. The data selection can be determined based on your specific needs. One approach is to obtain data from public databases, while another is to prepare your own single-cell sequencing data. This method is also suitable for the integration analysis of multiple single-cell sequencing data. The data preprocessing steps include the code for data merging. It is crucial to pay attention to preparing the data in the Seurat object format that can be recognized by Seurat, and it should also include the annotation of cell types for the specific cells you intend to analyze. + a. This protocol does not include cell type annotation for single-cell sequencing. Please refer to the following options for annotating cell types in advance. o i. To use SingleR,[37]^8 refer to the usage instructions for this software available at the following link: [38]https://www.bioconductor.org/packages/release/bioc/vi gnettes/SingleR/inst/doc/SingleR.html. o ii. To use Cellassign,[39]^9 refer to the usage instructions for this software available at the following link: [40]https://github.com/irrationone/cellassign. o iii. To use Celaref.,[41]^10 refer to the usage instructions for this software available at the following link: [42]https://bioconductor.org/packages/release/bioc/vignet tes/celaref/inst/doc/celaref_doco.html. o iv. To use scCATCH,[43]^11 refer to the usage instructions for this software available at the following link: [44]https://github.com/ZJUFanLab/scCATCH. + b. If you want to annotate malignant tumors, you can use the inferCNV R package. o i. To use inferCNV,[45]^12 refer to the usage instructions for this software available at the following link: [46]https://github.com/broadinstitute/inferCNV. Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Deposited data __________________________________________________________________ Raw code GitHub repository [47]https://github.com/weiyubai/TIME_ASP The code version in this protocol TIME-ASP V1.0[48]^13 [49]https://zenodo.org/doi/10.5281/zenodo.11000276 [50]GSE125449 Ma et al.[51]^14 [52]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125449 __________________________________________________________________ Software and algorithms __________________________________________________________________ R software R Core Team[53]^15 [54]https://www.r-project.org/ RStudio software RStudio Team[55]^16 [56]https://rstudio.com/ Rtools software R Core Team[57]^15 [58]https://cran.r-project.org/bin/windows/Rtools/ Git software Open source [59]https://git-scm.com data.table package Dowle et al.[60]^17 [61]https://cloud.r-project.org/web/packages/data.table/index.html ESTIMATE package Yoshihara et al.[62]^2 [63]https://bioinformatics.mdanderson.org/public-software/estimate/ pheatmap package Kolde et al.[64]^18 [65]https://cran.r-project.org/web/packages/pheatmap/index.html clusterProfiler package Yu et al.[66]^19 [67]https://bioconductor.org/packages/release/bioc/html/clusterProfiler .html stringr package Wickham et al.[68]^20 [69]https://cran.r-project.org/web/packages/stringr/index.html KEGGREST package Tenenbaum et al.[70]^21 [71]https://www.bioconductor.org/packages/release/bioc/vignettes/KEGGRE ST/inst/doc/KEGGREST-vignette.html devtools package Wickham et al.[72]^22 [73]https://cran.r-project.org/web/packages/devtools/index.html SingleR package Lun et al.[74]^8 [75]https://www.bioconductor.org/packages/release/bioc/vignettes/Single R/inst/doc/SingleR.html cellassign package Zhang et al.[76]^9 [77]https://github.com/irrationone/cellassign Celaref package Williams et al.[78]^23 [79]https://bioconductor.org/packages/release/bioc/vignettes/celaref/in st/doc/celaref_doco.html scCATCH package Shao et al.[80]^11 [81]https://github.com/ZJUFanLab/scCATCH ggvenn package Yan et al.[82]^24 [83]https://github.com/yanlinlin82/ggvenn Seurat package Butler et al.[84]^25 [85]http://cran.us.r-project.org/src/contrib/Archive/Seurat/Seurat_4.2. 1.tar.gz SeuratObject package Satija et al.[86]^25 [87]http://cran.us.r-project.org/src/contrib/Archive/SeuratObject/Seura tObject_4.1.3.tar.gz gridExtra package Auguie et al.[88]^26 [89]https://cran.r-project.org/web/packages/gridExtra/index.html celldex package Aran et al.[90]^8 [91]https://bioconductor.org/packages/release/data/experiment/html/cell dex.html GOplot package Walter et al.[92]^27 [93]https://wencke.github.io/ __________________________________________________________________ Other __________________________________________________________________ Computing platform: this protocol runs on a Windows system with 1 TB of memory, powered by an Intel Core i7-8550U processor, quad-core, with a maximum turbo frequency of 1.99 GHz, and equipped with 16 GB of RAM. Windows 11 Home edition N/A [94]Open in a new tab Materials and equipment This protocol can be applied to analyze single-cell sequencing data from your own laboratory as well as publicly available databases. It is important to note that cell types need to be annotated in advance. The protocol does not perform normalization and scaling steps for scRNA data. If it is necessary to organize FASTA files into expression matrices, it may require a lot of computing power and require the use of servers or computer clusters. Step-by-step method details Clone project and raw code from GitHub Inline graphic Timing: ∼30 min * 1. Clone the TIME-ASP repository from GitHub. + a. Double-click to open the ‘Git CMD.exe’. + b. Create a folder for storing the project at an absolute path using the ‘mkdir’ command. > mkdir D:\TIME-ASP + c. Set the working directory to the absolute path established in the previous step. > cd /d D:\TIME-ASP + d. Clone the TIME-ASP project from GitHub. > git clone[95]https://github.com/weiyubai/TIME_ASP.git Note: All data for this analysis workflow is included therein. The raw code is located in the ‘code’ folder and named according to the corresponding step. ‘D:\TIME-ASP' represents the absolute path designated for storing the project files. ‘TIME-ASP’ serves as the folder name for storage, and you can customize it as per your requirements. Prepare input data Inline graphic Timing: ∼2 h Set Rstudio working path. * 2. Set the RStudio working directory to the project folder. > setwd("D:/TIME_ASP ") Note: If you wish to temporarily halt your program, the 'save' function can be employed to preserve the environment variables. Subsequently, the 'load' function allows for the reloading of these variables prior to the next execution. For instance, to store variables and execute subsequent code, you could utilize `save(list = ls(Sc_data,p1,p2,p3,p4), file = “./my_data.Rdata”)`, followed by reloading the saved environment variables with `load(“./my_data.Rdata”)`. Notably, the variables that require saving are listed in `ls()`and 'file = ' is the path where the environment variables are saved. * 3. Prepare Seurat object format single-cell sequencing data of interest. Option 1: User’s own or public database single-cell sequencing data. + a. Put the three files ‘barcodes.tsv’, ‘genes.tsv’, and ‘matrix.mtx’ into the 'folder' folder. + b. Build Seurat object. > Sc_data = CreateSeuratObject(counts = Read10X(folder) Note: This step takes the 10× Genomics data format as an example. For other data formats, please refer to the data import help in the Seurat R package. + c. Calculate the proportion of mitochondrial genes. + d. Quality control. > Sc_data[["percent.mt"]] <- PercentageFeatureSet(Sc_data, pattern = "ˆMT-") # The variable in this step is defined as "Sc_data" in order to connect with the analysis steps of subsequent sample data. o i. Filter cells with UMI (Unique Molecular Identifier) numbers greater than 2500. o ii. Filter cells with UMI numbers less than 200. o iii. Filter cells with a percentage of mitochondrial genes greater than 5% of the total number of genes > Sc_data <- subset(Sc_data, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) + e. Data standardization. > Sc_data <- NormalizeData(Sc_data, normalization.method = "LogNormalize", scale.factor = 10000) Note: The default data standardization method used is LogNormalize, where the total expression level of each cell is standardized to 10000, and then log is taken as the logarithm; the results are stored in Sc_data[“RNA”] @ data. + f. Perform cell type annotation and linear dimensionality reduction. > Sc_data <- FindVariableFeatures(Sc_data, selection.method = "vst", nfeatures = 2000) > all.genes <- rownames(Sc_data) > Sc_data <- ScaleData(Sc_data, features = all.genes) > Sc_data <- ScaleData(Sc_data, vars.to.regress = "percent.mt") #Using SingleR for cell annotation > library(SingleR) > library(celldex) > hpca.se <- HumanPrimaryCellAtlasData() > lihcSingleR <- GetAssayData(Sc_data, slot = "data") > lihc.hesc <- SingleR(test=lihcSingleR, ref = hpca.se, labels =hpca.se$label.main) > table(lihc.hesc$labels, Sc_data $seurat_clusters) > Sc_data @meta.data$labels <- lihc.hesc$labels > Sc_data <- RunPCA(Sc_data, features = VariableFeatures(object = Sc_data)) > Sc_data<- RunTSNE(Sc_data, dims = 1:10) > DimPlot(Sc_data, group.by ="labels",reduction = "tsne",label = TRUE) > ggsave("ScType.pdf",width = 4,height = 2.5) >save(Sc_data,file="./Sc_data.Rdata) #Save single-cell sequencing expression matrix. Option 2: Using sample data. + g. Download the [96]GSE125449 dataset from the public database o i. Open the “Project” folder and create a new folder named “set”. o ii. Navigate to [97]https://www.ncbi.nlm.nih.gov/. o iii. Select ‘GEO DataSets’ search term. o iv. Enter GEO ID (e.g., [98]GSE125449) in the input field and select ‘Search’ option. o v. Click to enter the first project. o vi. Download the files of the set1 series into the working directory ‘project/set/set1’, and download the files of the set2 series into the working directory ‘project/data /set2’. Note: We have already downloaded the data and uploaded it to the ‘set’ folder in the GitHub TIME-ASP repository. You don’t need to download it again here if you have cloned the project directly from GitHub. + h. Build Seurat object with [99]GSE125449. > rm(list=ls()) #Clear variables > set.seed(123456) # Set random seeds > library(Seurat) > library(ggplot2) > library(data.table) > samples=list.files('./set') > sceList = lapply(samples, function(pro){ folder=file.path('./set', pro) sce=CreateSeuratObject(counts = Read10X(folder), project = pro) return(sce) }) #file.path(a,b) means the path: a/b > sceList <- lapply(sceList, FUN = function(x) { x <- NormalizeData(x) x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)}) #Build a list based on two datasets > sceall = merge(sceList[[1]], y = sceList[[2]], add.cell.ids = samples, merge.data = TRUE) # Merge data> sample1 <- read.table(gzfile("./set/set1/samples.txt.gz"),sep = "\t",header = T) #Import sample information from dataset1 > rownames(sample1) <-sample1$Cell.Barcode > sample2 <- read.table(gzfile("./set/set2/samples.txt.gz"),sep = "\t",header = T) #Import sample information from dataset2. > rownames(sample2) <-sample2$Cell.Barcode > sample <- rbind(sample1,sample2) > sceall@meta.data$sample <- sample$Sample > sceall@meta.data$Type <- sample$Type #Import cell annotation information > Idents(sceall) <- sceall@meta.data$sample > Sc_data <-subset(sceall,idents = c("S16_P10_LCP18","S02_P01_LCP21","S10_P05_LCP23","S07_P02_LCP 28","S12_P07_LCP30","S21_P13_LCP37","S15_P09_LCP38","S351_P10_ LCP34","S364_P21_LCP65"),invert = FALSE) #Filter data for specified samples (hepatocellular carcinoma) Note: Please be mindful that file names should have the prefixes removed. For instance, use ‘barcodes.tsv’, ‘features.tsv’, and ‘matrix.mtx’ rather than ‘GSE125449_Set1_barcodes.tsv’, ‘GSE125449_Set1_genes.tsv’, and ‘GSE125449_Set1_matrix.mtx’. Run single-cell sequencing data preparation program, display single-cell classification status ([100]Figure 1). > Sc_data[["percent.mt"]] <- PercentageFeatureSet(Sc_data, pattern = "ˆMT-") > Sc_data <- subset(Sc_data, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) > Sc_data <- NormalizeData(Sc_data, normalization.method = "RC", scale.factor = 10000) > Sc_data <- FindVariableFeatures(Sc_data, selection.method = "vst", nfeatures = 2000) > all.genes <- rownames(Sc_data) > Sc_data <- ScaleData(Sc_data, features = all.genes) > Sc_data <- ScaleData(Sc_data, vars.to.regress = "percent.mt") > Sc_data <- RunPCA(Sc_data, features = VariableFeatures(object = Sc_data)) > Sc_data<- RunTSNE(Sc_data, dims = 1:10) > DimPlot(Sc_data, group.by = "Type",reduction = "tsne",pt.size=0.5,label = FALSE) > ggsave("output/ScType.pdf",width = 4,height = 2.5) Note: A file named ‘ScType.pdf’ will be generated in the ‘output’ folder. If you need to eliminate batch effects from your data, you can utilize the Integration method from the Seurat package or the harmony[101]^28 package. This step uses data from liver cancer patients in the [102]GSE125449 dataset in the GEO database, with patient numbers listed as follows: “S16-P10-LCP18”, “S02-P01-LCP21”, “S10-P05-LCP23”, “S07-P02-LCP28”, “S12-007-LCP30”, “S21P13-LCP37”, “S15-P09-LCP38”, “S351-P10-LCP34”, “S364-P21-LCP65“ Figure 1. [103]Figure 1 [104]Open in a new tab Detection of clustering and annotation status in input single-cell sequencing data This data comes from Bai et al. 2023, Heliyon. Calculate immune score and tumor purity Inline graphic Timing: ∼1 h Here, we describe the steps for calculating immune score, stromal score, tumor purity score, and estimate score based on the ESTIMATE R package. * 4. Compute the immune score, tumor purity score, stromal score, and ESTIMATE score. > proj <- Sc_data > source("./code/AddModuleScore.r") Note: Evaluate the scRNA-seq obtained from the previous step and generate a heatmap (illustrating the overall abundance of different cell types in this evaluation, [105]Figure 2) and a RidgePlot (depicting the proportions of different cell types in this evaluation, [106]Figure 3). All raw data of the scoring will be saved in the ‘output’ folder and named as ‘estimate_score.csv’. Optional: If you encounter an error, please execute the following code line by line. Set ESTIMATE path. + a. Run the ESTIMATE R package to score each cell. >library("Seurat") > library("estimate") > DefaultAssay(proj) <- "RNA" > myfilterCommonGenes(input.f= as.matrix(proj@assays$RNA@data), output.f=paste0("matrixgenes.gct"), id="GeneSymbol") > estimateScore(paste0("matrixgenes.gct"),paste0("estimate_score .gct"), platform= "affymetrix") > estimate_score <- read.table(paste0("estimate_score.gct"),header = F,skip=2,stringsAsFactors = FALSE) > estimate_score <- t(estimate_score) > rownames(estimate_score) <- estimate_score[,1] > colnames(estimate_score) <- estimate_score[2,] > estimate_score <- estimate_score[-c(1:2),] > estimate_score[1:4,1:4] > estimate_score <- as.data.frame(estimate_score) > proj@meta.data$Stromal <- as.numeric(estimate_score$StromalScore) > proj@meta.data$Immune <- as.numeric(estimate_score$ImmuneScore) > proj@meta.data$TumorPurity <- as.numeric(estimate_score$TumorPurity) > proj@meta.data$ESTIMATE <- as.numeric(estimate_score$ESTIMATEScore) + b. View the score distribution of different types of cancer cells. > p1<- RidgePlot(proj,features = "Stromal",group.by="Type") > p2<- RidgePlot(proj,features = "Immune",group.by="Type") > p3<- RidgePlot(proj,features = "TumorPurity",group.by="Type") > p4<- RidgePlot(proj,features = "ESTIMATE",group.by="Type") > library(gridExtra) pdf(file= "./output/RidgePlot.pdf",height=4, width=10 ) > grid.arrange(p1,p2,p3,p4,ncol = 2 ) > dev.off() + c. View the score distribution for each cancer cell. > mydata1<- FetchData(proj,vars = c("tSNE_1","tSNE_2","Stromal","Immune","TumorPurity","ESTIMATE ")) > a1 <- ggplot(mydata1,aes(x = tSNE_1,y =tSNE_2,colour = TumorPurity))+ geom_point(size = 1)+scale_color_gradientn(values = seq(0,1,0.2),colours = c('#0425D4',"#0085FF","#00B9D1","#66D404","#EF6909","red","#74 120C"))+theme_bw() > a2 <- ggplot(mydata1,aes(x = tSNE_1,y =tSNE_2,colour = Stromal))+ geom_point(size = 1)+scale_color_gradientn(values = seq(0,1,0.2), colours = c('#0425D4',"#0085FF","#00B9D1","#66D404","#EF6909","red","#74 120C"))+theme_bw() > a3 <- ggplot(mydata1,aes(x = tSNE_1,y =tSNE_2,colour = Immune))+ geom_point(size = 1)+scale_color_gradientn(values = seq(0,1,0.2), colours = c('#0425D4',"#0085FF","#00B9D1","#66D404","#EF6909","red","#74 120C"))+theme_bw() > a4 <- ggplot(mydata1,aes(x = tSNE_1,y =tSNE_2,colour = ESTIMATE))+ geom_point(size = 1)+scale_color_gradientn(values = seq(0,1,0.2), colours = c('#0425D4',"#0085FF","#00B9D1","#66D404","#EF6909","red","#74 120C"))+theme_bw() > pdf(file= "./output/HeatMap.pdf",height=4, width=7 ) grid.arrange(a1,a2,a3,a4,ncol = 2 ) > dev.off() > write.csv(estimate_score,"./output/estimate_score.csv") Inline graphic CRITICAL: In this step, you can use the single-cell sequencing data of your interest. When executing 'proj <- Sc_data', replace 'Sc_data' with the name of the single-cell sequencing data you want to analyze. ‘proj@assays $ RNA@data’ represents the gene expression matrix within the single-cell sequencing dataset. This matrix is crucial for performing immune scoring on cells, please make sure to follow the previous prompts correctly. This method is based on single sample gene set enrichment analysis (ssGSEA) algorithm.[107]^2 This function computes stromal, immune, and ESTIMATE scores using gene-level expression data. For Affymetrix platform data, tumor purity is derived from ESTIMATE scores by applying non-linear squares methods to TCGA Affymetrix expression data (n = 995). The meanings of each score are as follows: Stromal Score, Numeric scalar specifying the presence of stromal cells in tumor tissue. Immune Score, Numeric scalar specifying the level of infiltrating immune cells in tumor tissue. ESTIMATE Score, Numeric scalar specifying tumor cellularity. Tumor Purity, Numeric scalar specifying ESTIMATE-based tumor purity, with values in the range [0, 1]. Figure 2. [108]Figure 2 [109]Open in a new tab Assessment of tumor purity and the immune microenvironment of tumors This data comes from Bai et al. 2023, Heliyon. (A) Tumor Purity score of scRNA-seq data. (B) Stromal score of scRNA-seq data. (C) Immune score of scRNA-seq data. (D) ESTIMATE score of scRNA-seq data. Figure 3. [110]Figure 3 [111]Open in a new tab Mountain map of the distribution of stromal score, tumor purity score, immune score, and ESTIMATE score in various cells This data comes from Bai et al. 2023, Heliyon. (A) The proportion of Stromal score in various types of cells. (B) The proportion of Tumor Purity score in various types of cells. (C) The proportion of Immune score in various types of cells. (D) The proportion of ESTIMATE score in various types of cells. Grouping all cells based on immune scores Inline graphic Timing: ∼1 h Here, we describe the steps of grouping all cancer cells based on immune scores. Grouping is performed based on the calculated immune scores from the previous step (or based on tumor purity), and the FindMarkers() function is used for differential analysis ([112]Table S1). * 5. Identification of key genes related to the tumor immune microenvironment. >source("./code/TIME.r") Note: The first column represents the gene symbol, the second column “avg_logFC” indicates the average expression fold change in logarithm between the two groups. A positive value indicates higher expression of the gene in the first group. The third column “pct.1” represents the percentage of cells in which the gene is detected in the first group, and the fourth column “pct.2” represents the percentage of cells in which the gene is detected in the second group. The fifth column “p_val_adj” indicates the adjusted p-value based on the Bonferroni correction using all genes in the dataset. Optional: If you encounter an error, please execute the following code line by line. > library(pheatmap) > group <- ifelse(proj@meta.data$Immune >= median(proj@meta.data$Immune),'Immune_high','Immune_low') table(group) > proj@meta.data$group <- group > diff_dat <- FindMarkers(proj,ident.1="Immune_high",ident.2="Immune_low", group.by='group') > diff1 <- diff_dat[diff_dat$p_val_adj<0.05 & abs(diff_dat$avg_log2FC)>=1,] > write.csv(diff1,"./output/diff1.csv") Note: If you want to use tumor purity as the grouping condition, please open the ‘Estimate.r’ file and modify the grouping code as follows: > group <- ifelse(proj@meta.data$TumorPurity>=median(proj@meta.data$ TumorPurity),' TumorPurity_high',' TumorPurity_low') Signal pathway enrichment Inline graphic Timing: ∼30 min Here, we describe the steps of pathway enrichment using differentially expressed genes obtained through immune scoring grouping. * 6. Enrichment analysis of pathways can be performed using two alternative methods at this step. Option 1: Perform KEGG pathway enrichment using 'clusterprofiler'. Running ‘source(Kegg.r)’ will generate a file named ‘kegg.csv’ in the ‘output’ folder, which represents the KEGG pathway enrichment results ([113]Table S2). > source(Kegg.r) Optional: If you encounter an error, please execute the following code line by line. > library(clusterProfiler) > library(GOplot) > gene<-bitr(rownames(diff1),fromType = 'SYMBOL',toType = 'ENTREZID',OrgDb = 'org.Hs.eg.db') > KEGG<-enrichKEGG( gene$ENTREZID, organism = "hsa",minGSSize =5, maxGSSize =500, keyType = "kegg", use_internal_data = FALSE ) > barplot(KEGG,showCategory = 10,title = 'KEGG Pathway')+scale_color_gradient(low="green",high = "red") > write.csv(KEGG,"./output/kegg.csv") Note: If the gene species is not human, please modify the 'organism' parameter in the Kegg.r code according to the abbreviation of the queried species. Please refer to '[114]https://www.genome.jp/kegg/catalog/org_list.html' for the abbreviations of the specific species. For detailed explanations of other additional parameters, please refer to the usage guide of the 'clusterProfiler' R package. Option 2: Perform KEGG pathway enrichment analysis using the online tool David. + a. Open the official website of David ([115]https://david.ncifcrf.gov/). + b. The “Start Analysis” option requires input of three sets of data before beginning the analysis, gene symbols, identifiers, and list type. + c. Paste the gene symbols from the “diff.csv” file obtained in Step 5 into the ‘Upload Gene List’ input box. + d. Select the 'Step2: OFFICAL_GENE_SYMBOL' option in the Select Identifier dropdown menu. + e. Enter ‘Homo sapiens’ in ‘Step 2a: Select species’. + f. Select the ‘Gene List’ option in the ‘Step 3: List Type’ dropdown menu. + g. Click the 'Submit List' button in 'Step 4: Submit List'. + h. Click on the ‘Functional Annotation Chart’ option on the ‘Analysis Wizard’ page. + i. Click on 'Clear All'. + j. Click on ‘Pathways’ and then select ‘KEGG_PATHWAY’. + k. Click on the 'Functional Annotation Chart' option. + l. Click on the 'Download File' option on the popped-up page, then copy all the signal pathways and paste them in Excel as Unicode file type. Option 3: Perform KEGG pathway enrichment analysis using the online tool Enrichr. + m. Open the official website of Enrichr ([116]https://maayanlab.cloud/Enrichr/). + n. Paste a list of Entrez gene symbols in the text box of the Input data option, one per line. Entrez gene symbols can be found in the output file [117]Table S1 from step 5. + o. Click “Submit” and then select the “Pathways” option. + p. Select “KEGG 2021 Human” and click “table” to download the signal pathway enrichment results. + q. Copy all the signal pathways and paste them in Excel as Unicode file type. Note: This step is optional and suitable for users who are not familiar with the usage of the ‘clusterProfiler’ R package. Evaluation of signaling pathway enrichment level Inline graphic Timing: ∼1 h Here, we describe the steps of quantitatively analyzing signal pathways using the 'AddModuleScore' function. * 7. Computational analysis of pathway expression across different cell types. + a. Specify the identifier for the signal pathway that needs to be computed. Using ‘Cholesterol metabolism’ as an example, input the identifier for Cholesterol metabolism. >ID = "hsa04979" Note: KEGG ID can be queried from the following two websites: [118]https://www.kegg.jp/ Or [119]http://lmmd.ecust.edu.cn/netinfer/get_results.php?id=6248 + b. Running the calculation program will generate a file in Seurat object format named Inscore, and the calculated scores will be named after the first word of the signal pathway, contained within the Inscore file. > source("./code/Chose_pathway.r") Optional: If you encounter an error, please execute the following code line by line. o i. Calculate the enrichment fraction of the hsa04979 signaling pathway. > library(KEGGREST) > library(Seurat) > gsInfo = keggGet(ID)[[1]] > names(gsInfo) > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04979 <- list(geneSet$`Cholesterol metabolism - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04979, ctrl = 10, name = "hsa04979") > Sc_data@meta.data$hsa04979 <- as.numeric(Inscore$hsa049791) o ii. Integrate the calculated scores of the signal pathways into the single-cell sequencing data. > Sc_data@meta.data$hsa04979 <- as.numeric(Inscore$hsa049791) Note: The ‘AddModuleScore()’ function defaults to adding a numerical sequence after the name, so it is important to add the number ‘1’ after the pathway name here. o iii. Generate a single battery heat map for hsa04979 ([120]Figure 4) and create the necessary data for drawing, named ‘mydata1’ here. > library(ggplot2) > mydata1<- FetchData(Sc_data,vars = c("tSNE_1","tSNE_2","hsa04979")) > ggplot(mydata1,aes(x = tSNE_1, y =tSNE_2, colour = hsa04979)) + geom_point(size = 1) + scale_color_gradientn(values = seq(0,1,0.2),colours = c("#0425D4","#0085FF","#00B9D1","#66D404","#EF6909","red" ,"#74120C")) + theme_bw() Note: Draw a heatmap of the hsa04979 signaling pathway in single-cell sequencing data. You can change the color of the heatmap based on the colors parameter, but it is recommended not to modify other parameters to maintain consistency with the previous single-cell clustering map. * 8. Display multiple signal pathways in the form of heatmaps, and repeat steps 6a–6d. Here, we will use signal pathways such as hsa05200, hsa04510, hsa05208, hsa04932, hsa04151, hsa05205, hsa04610, hsa00190, hsa04512, hsa04979, hsa04066, and hsa04668 as examples to draw a heat map ([121]Figure 5). >source("./code/All_pathways.r") Optional: If you encounter an error, please execute the following code line by line. + a. Calculate the expression abundance of hsa04979. > gsInfo = keggGet('hsa04979')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04979 <- list(geneSet$`Cholesterol metabolism - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04979, ctrl = 10, name = "hsa04979") > Sc_data@meta.data$hsa04979 <- as.numeric(Inscore$hsa049791) #Proteoglycans in cancer > gsInfo = keggGet('hsa05205')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa05205 <- list(geneSet$`Proteoglycans in cancer - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa05205, ctrl = 10, name = "hsa05205") > Sc_data@meta.data$hsa05205 <- as.numeric(Inscore$hsa052051) + b. Calculate the expression abundance of hsa00190. > gsInfo = keggGet('hsa00190')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa00190 <- list(geneSet$`Oxidative phosphorylation - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa00190, ctrl = 10, name = "hsa00190") > Sc_data@meta.data$hsa00190 <- as.numeric(Inscore$hsa001901) + c. Calculate the expression abundance of hsa04668. > gsInfo = keggGet('hsa04668')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04668 <- list(geneSet$`TNF signaling pathway - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04668, ctrl = 10, name = "hsa04668") > Sc_data@meta.data$hsa04668 <- as.numeric(Inscore$hsa046681) + d. Calculate the expression abundance of hsa052081. > gsInfo = keggGet('hsa05208')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) names(geneSet) = gsInfo$NAME > hsa05208 <- list(geneSet$`Chemical carcinogenesis - reactive oxygen species - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa05208, ctrl = 10, name = "hsa05208") > Sc_data@meta.data$hsa05208 <- as.numeric(Inscore$hsa052081) + e. Calculate the expression abundance of hsa05200. > gsInfo = keggGet('hsa05200')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa05200 <- list(geneSet$`Pathways in cancer - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa05200, ctrl = 10, name = "hsa05200") > Sc_data@meta.data$hsa05200 <- as.numeric(Inscore$hsa052001) + f. Calculate the expression abundance of hsa04510. > gsInfo = keggGet('hsa04510')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04510 <- list(geneSet$`Focal adhesion - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04510, ctrl = 10, name = "hsa04510") > Sc_data@meta.data$hsa04510 <- as.numeric(Inscore$hsa045101) + g. Calculate the expression abundance of hsa04932. > gsInfo = keggGet('hsa04932')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04932 <- list(geneSet$`Non-alcoholic fatty liver disease - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04932, ctrl = 10, name = "hsa04932") > Sc_data@meta.data$hsa04932 <- as.numeric(Inscore$hsa049321) + h. Calculate the expression abundance of hsa04151. > gsInfo = keggGet('hsa04151')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04151 <- list(geneSet$`PI3K-Akt signaling pathway - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04151, ctrl = 10, name = "hsa04151") > Sc_data@meta.data$hsa04151 <- as.numeric(Inscore$hsa041511) + i. Calculate the expression abundance of hsa04610. > gsInfo = keggGet('hsa04610')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04610 <- list(geneSet$`Complement and coagulation cascades - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04610, ctrl = 10, name = "hsa04610") > Sc_data@meta.data$hsa04610 <- as.numeric(Inscore$hsa046101) + j. Calculate the expression abundance of hsa04512. > gsInfo = keggGet('hsa04512')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04512 <- list(geneSet$`ECM-receptor interaction - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04512, ctrl = 10, name = "hsa04512") > Sc_data@meta.data$hsa04512 <- as.numeric(Inscore$hsa045121) + k. Calculate the expression abundance of hsa04066. > gsInfo = keggGet('hsa04066')[[1]] > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04066 <- list(geneSet$`HIF-1 signaling pathway - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04066, ctrl = 10, name = "hsa04066") > Sc_data@meta.data$hsa04066 <- as.numeric(Inscore$hsa040661) + l. Organize heat map input data. > mydata1<- FetchData(Sc_data,vars = c("Type","sample","hsa05200", "hsa04510", "hsa05208", "hsa04932", "hsa04151", "hsa05205", "hsa04610", "hsa00190", "hsa04512", "hsa04979", "hsa04066", "hsa04668")) > ann_col <- data.frame(mydata1[,1:2]) > ann_col$id <-rownames(ann_col) > ann_col <- ann_col[order(ann_col[,1]),] > ac <- data.frame(ann_col$Type) > rownames(ac) <- rownames(ann_col) > colnames(ac) <- c("cell type") > ac$sample <- ann_col$sample > mydata <- mydata1[row.names(ac),3:14] > mydata <- t(mydata) + m. Draw a signal pathway heatmap. > library(pheatmap) > pdf(file= "./output/KEGG_HeatMap.pdf",height=5.7, width=5 ) > pheatmap(mydata, show_rownames = T, show_colnames = F,cluster_cols = FALSE, cluster_rows = F,fontsize=6, annotation_col = ac) > dev.off() Note: In this step, the analysis process from a to k remains exactly the same, except for the different IDs of the signaling pathways. If you need to analyze a specific signaling pathway, simply modify the ID of the pathway. For example, if you need to analyze the ‘Adherens junction signaling pathway (ID hsa04520): > gsInfo = keggGet(‘hsa04520’)[[1]]; > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x) x[1]) > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)]) > names(geneSet) = gsInfo$NAME > hsa04520 = list(geneSet$`Adherens junction signaling - Homo sapiens (human)`) > Inscore <- AddModuleScore(Sc_data, features = hsa04520, ctrl = 10, name = "hsa0452") > Sc_data@meta.data$hsa04520 <- as.numeric(Inscore$hsa045201)# It should be noted that the algorithm for this line of code will default to adding a sequence number 1 after the ID number. Figure 4. [122]Figure 4 [123]Open in a new tab The expression abundance of the hsa04979 signaling pathway This data comes from Bai et al. 2023, Heliyon. Figure 5. [124]Figure 5 [125]Open in a new tab Assessment of expression abundance for all signaling pathways This data comes from Bai et al. 2023, Heliyon. Differential analysis of abnormal signaling pathways Inline graphic Timing: ∼30 min Here, we describe the steps of screening differentially expressed genes based on abnormal signaling pathway grouping. * 9. Calculate differential genes in single-cell sequencing data by grouping a certain signaling pathway. For example, using the ‘hsa04979’ signaling pathway, all cells are divided into two groups: high and low. > source("./code/Aps.r") Optional: If you encounter an error, please execute the following code line by line. > group1 <- ifelse(Sc_data@meta.data$hsa04979>=median(Sc_data@meta.data$hsa04979),' hsa04979_high','hsa04979_low') > table(group1) > Sc_data@meta.data$group1 <- group1 > diff_dat1 <- FindMarkers(Sc_data,ident.1="hsa04979_high",ident.2="hsa04979_low", group.by='group1') > diff2 <- diff_dat1[diff_dat1$p_val_adj<0.05 & abs(diff_dat1$avg_log2FC)>=1,] > write.csv(diff2,"./output/diff2.csv") Note: After running this code, a file named ‘diff2.csv’ will be generated in the output directory, containing the list of genes with |log2FC|>=1 and p value < 0.05 ([126]Table S3). The file format is the same as that generated in Step 4. The first column represents the gene symbol, the second column ‘avg_logFC’ indicates the average expression fold change in logarithm between the two groups. A positive value indicates higher expression of the gene in the first group. The third column ‘pct.1’ represents the percentage of cells in which the gene is detected in the first group, and the fourth column ‘pct.2’ represents the percentage of cells in which the gene is detected in the second group. The fifth column ‘p_val_adj’ indicates the adjusted p-value based on the bonferroni correction using all genes in the dataset. Hub gene screening Inline graphic Timing: ∼30 min Here, we describe the steps for screening hub genes based on differential analysis of immune and abnormal signaling pathways. * 10. Identify and overlap the differential genes associated with the tumor immune microenvironment and the abnormal signaling pathways for screening ([127]Table S4), and visually represent this information using a Venn diagram ([128]Figure 6). > source("./code/hub_genes.r") Optional: If you encounter an error, please execute the following code line by line. > diff1 <- read.csv("./output/diff1.csv",header = T,row.names = 1) > diff2 <- read.csv("./output/diff2.csv",header = T,row.names = 1) > TIME_DEGs <- rownames(diff1) > Chose_pathway_DEGs <- rownames(diff2) > overlapping_Degs <- intersect(TIME_DEGs,Chose_pathway_DEGs) > library(ggvenn) > x <- list(TIME_DEGs=TIME_DEGs,Chose_pathway_DEGs=Chose_pathway_DEGs) > ggvenn(x,show_percentage = F,text_size = 4,set_name_size = 3,stroke_color = "white",fill_color = c("#DF8F4499","#00A1D599"), set_name_color = c("#DF8F4499","#00A1D599")) > ggsave(file= "./output/vn.pdf",height=4, width=4 ) > write.csv(overlapping_Degs,"output/Hub_genes.csv") Figure 6. [129]Figure 6 [130]Open in a new tab Identification of hub genes associated with the tumor immune microenvironment and aberrant signaling pathways This data comes from Bai et al. 2023, Heliyon. Expected outcomes [131]Figure 1 presents the cell types that need to be analyzed, including cell annotations and clustering information. In order to understand the immune microenvironment of various cancer cell types, we conducted immunological and tumor purity scoring for all cell types ([132]Figure 2), as well as the proportion of each score in different cell types ([133]Figure 3). The colors represent the magnitude of the different score values, with shades closer to red indicating higher scores and shades closer to blue indicating lower scores. Here, we can observe that within malignant tumors, the immune score is the lowest, while the tumor purity score is the highest, indicating the accuracy of our analysis results. We observed a remarkably low immune score in malignant tumor cells. Therefore, based on this, we grouped all cell types according to their immune scores, allowing us to more accurately identify which cells and genes have lower immune levels. The results are saved in the supplementary material file diff1.csv. At the same time, by conducting pathway enrichment analysis on the genes showing significant differences in the tumor immune microenvironment, we can identify the signaling pathways that are abnormally expressed in malignant cells ([134]Figure 5). Subsequently, to further confirm the high expression of the identified signaling pathway (hsa04979) in malignant cells, we performed pathway scoring for all cells and generated a heatmap. From the heatmap, it is clear that the hsa04979 signaling pathway is abnormally highly expressed in malignant cells ([135]Figure 4). Similarly, by grouping genes according to the hsa04979 signaling pathway, we can obtain the genes that impact the hsa04979 signaling pathway ([136]Table S3). Finally, in order to identify potentially valuable genes for research that impact both the tumor immune microenvironment and certain aberrant signaling pathways, we obtained the associated hub genes by taking the intersection of the two ([137]Figure 4; [138]Table S4). Quantification and statistical analysis We evaluated the immune score and tumor purity using the ESTIMATE R package, and calculated differentially expressed genes using the ‘FindMarker()’ function in the Seurat R package. These details are described in Step 5 of the Step-by-step method section. Limitations This protocol does not have any specific requirements regarding the operating system; however, it was developed and tested on a Windows system. We cannot guarantee its error-free operation on other systems. In theory, it should be able to run smoothly, but there might be unforeseen issues on different platforms. Troubleshooting It is crucial to consider the critical statements and notes accompanying each step outlined in the method details. Problem 1 There are no available input files for Seurat, only the raw data. (NGS data or FASTA data) Additional processing of the raw data is required, such as quality control, genome alignment, read counting, etc. which are not covered in this protocol. Potential solution For quality control of the raw data, FastQC, MultiQC and Trim Galore can be used as reference. * • FastQC: [139]https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ * • MultiQC: [140]https://multiqc.info/ * • Trim Galore: [141]https://www.bioinformatics.babraham.ac.uk/projects/trim_galore / For genome alignment and read count statistics, specialized processing methods are provided by Cell Ranger (commercial software from 10× Genomics), alevin, kallisto|bustools, STARsolo and alevin-fry. * • Cell Ranger: [142]https://support.10xgenomics.com/single-cell-gene-expression/so ftware/pipelines/latest/what-is-cell-ranger * • alevin :[143]https://alevin-fry.readthedocs.io/en/latest/ * • kallisto|bustools :[144]https://bustools.github.io/BUS_notebooks_R/velocity.html Problem 2 When running code, the output is not expected (i.e., error messages in the console or unexpected features in the drawing). Potential solution In order to ensure the proper execution of the code, it is necessary to confirm the correctness of syntax, R package version, input data format, and the saved path, such as. * • Commas, quotation marks, parentheses, and other punctuation marks must be in English input format. * • Please ensure that your data is saved in the 'project/data' directory. Problem 3 Only 10× Genomics analyzed data (barcodes, features, and matrix files) were available, but cell types were not annotated. Potential solution You can use software such as SingleR, CellAssign, Celaref, scCATCH, etc. for cell annotation. Please refer to ‘[145]before you begin’ and ‘[146]dataset selection’. Problem 4 There are common cell type annotations, but there are no annotations for malignant cell types. Potential solution You can utilize the inferCNV R package for annotation purposes, even in the absence of a strong CNV profile. You can directly execute this protocol because malignant tumor cells often accompany lower immune levels. This protocol is based on immune scoring to group all cells and screen for abnormal signaling pathways, thus not impeding the progress of the analysis workflow. Problem 5 The following error is displayed: Error in as.matrix(proj@assays$RNA@data): The slot name ‘data’ does not exist in the ‘Assay5’ class object. Potential solution The protocol employs Seurat version 4.2.1, and subsequent versions may lead to potential errors. This arises from the merging of two single-cell sequencing data in the provided example data. In versions beyond 4.2.1, the merging process will generate a single list file, while the expression matrices will be separated, inadvertently causing errors. Please refer to the instructions in '[147]before you begin'. Resource availability Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Weiyu Bai (925605397@qq.com). Technical contact For technical issues related to this protocol, please contact Weiyu Bai (925605397@qq.com). Materials availability The materials, data, software, and code involved in this agreement have been listed in the [148]key resources table. Data and code availability The single-cell sequencing data in this protocol can be obtained from the following link: [149]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125449. The raw codes in this protocol can be obtained from the following link: [150]https://github.com/weiyubai/TIME_ASP and [151]https://zenodo.org/doi/10.5281/zenodo.11000276. Acknowledgments