Graphical abstract

   graphic file with name fx1.jpg
   [24]Open in a new tab

Highlights

     * •
       Steps for evaluating immune levels of each cell
     * •
       Instructions to detect abnormal cancer signaling pathways
     * •
       Guidance on visualizing and analyzing tumor cell signaling
     * •
       Techniques for identifying central genes related to immunity
     __________________________________________________________________

   Publisher’s note: Undertaking any experimental protocol requires
   adherence to local institutional guidelines for laboratory safety and
   ethics.
     __________________________________________________________________

   Malignant tumor cells are typically more active in terms of metabolism
   and signal transduction compared to immune and normal cells. Here, we
   present a protocol to evaluate immune levels, abnormal metabolism, and
   signaling pathways in tumor tissue based on single-cell sequencing
   based on patient data obtained from the GEO database. We describe steps
   for tumor immune microenvironment (TIME)-based evaluation, tumor purity
   assessment, and identification of abnormal signal transduction and
   metabolic pathways. We then detail procedures for screening hub genes.

Before you begin

   The ESTIMATE (Estimation of Stromal and Immune cells in Malignant Tumor
   tissues using Expression data) score is a method used to infer the
   proportion of stromal and immune cells in tumor samples using gene
   expression data.[25]^2 This score provides insight into the tumor
   microenvironment, which is crucial for understanding cancer progression
   and how it interacts with the body’s immune system.[26]^3 The ESTIMATE
   score is derived from the expression of genes that are representative
   of the presence of stromal and immune cells in the tumor tissue. Given
   the common tendency of malignant tumor cells to exhibit immune evasion
   and lower immune levels,[27]^4^,[28]^5^,[29]^6^,[30]^7 this protocol is
   also applicable for evaluating abnormal signaling pathways and
   metabolic pathways in such cells. The following section outlines six
   key steps.
     * 1.
       Analysis of immune levels and tumor purity.
     * 2.
       Cell grouping based on immune levels.
     * 3.
       Enrichment of signaling pathways.
     * 4.
       Assessment of signaling pathway enrichment levels.
     * 5.
       Analysis of abnormal signaling pathways.
     * 6.
       Selection of hub genes.

Download and install R, Rtools, RStudio and Git

     Inline graphic Timing: ∼1 h

     * 7.
       The analysis in this protocol is conducted using R language, and
       you can download the appropriate version of R programming language
       based on the computer’s operating system. Generally speaking, this
       protocol is not limited by the R language version or computer
       operating system.
          + a.
            Visit the following link to obtain the R language:
            [31]https://cran.r-project.org/.

     Note: The current pipeline was performed using R version 4.3.2.

     * 8.
       Rtools includes a set of tools and dependencies, the most important
       of which is the GNU Compiler Collection (GCC). GCC is a powerful
       compiler that supports multiple programming languages, including C,
       C++, and Fortran. In R programming, it is sometimes necessary to
       write extension packages using these languages. Rtools provides
       tools such as GCC, which makes compiling these extension packages
       on computer simple and efficient.
          + a.
            Visit the following link to obtain the Rtools:
            [32]https://cran.r-project.org/bin/windows/Rtools/.

     Note: The current pipeline was performed using Rtools43.

     * 9.
       RStudio is an integrated development environment (IDE) for R. Due
       to the extensive amount of code involved in this protocol, we
       recommend installing RStudio for easier code modification and
       plotting.
          + a.
            Visit the following link to obtain the RStudio:
            [33]https://rstudio.com/products/rstudio/.
     * 10.
       Git is an open-source software that can be used for various
       operations on GitHub, such as creating, modifying, uploading,
       cloning, and more.
          + a.
            Visit the following link to obtain the Git:
            [34]https://git-scm.com/download.

Download required packages in RStudio

     Inline graphic Timing: ∼1 h

     * 11.
       If this is your first-time installing R language and RStudio,
       please install the R packages used in this method and their
       dependent environments, as most of the R packages involved are
       related to biology. You can use the BiocManager software to install
       them. Use the following command for installation:
       BiocManager::install() or install.packages().
          + a.
            Install the R packages involved in this protocol.
            if (!requireNamespace("BiocManager", quietly = TRUE));
            install.packages("BiocManager");
            BiocManager::install(c("ggplot2","data.table","pheatmap","clus
            terProfiler","enrichplot" ,"stringr","KEGGREST","devtools",
            "ggvenn", "gridExtra","SingleR","celldex"))
          + b.
            To install the estimate R packages.
            library(utils);
            rforge <- "http://r-forge.r-project.org";
            install.packages("estimate", repos=rforge, dependencies=TRUE)
          + c.
            To install the SeuratObject (version = “4.1.3”) and Seurat
            (version = “4.2.1”) R packages, we strongly recommend using
            the following code, as the latest versions of SeuratObject and
            Seurat encounter errors when merging multiple single-cell
            datasets:
               o i.
                 # Uninstall the installed Seurat and SeuratObject R
                 package.
                 remove.packages(c("SeuratObject","Seurat"))
                 #Reinstall the SeuratObject (version="4.1.3") and Seurat
                 (version="4.2.1") R packages.
                 library(devtools)
                 install_version("SeuratObject", version = "4.1.3",repos =
                 "http://cran.us.r-project.org")
                 install_version("Seurat", version = "4.2.1",repos =
                 "http://cran.us.r-project.org")
               o ii.
                 Alternatively, you can use the following code for
                 installation:
                 # Uninstall the installed Seurat and SeuratObject R
                 package.
                 remove.packages(c("SeuratObject","Seurat"))
                 #Reinstall the SeuratObject (version="4.1.3") and Seurat
                 (version="4.2.1") R packages.
                 packageurl1 <-[35]http://cran.us.r-project.org/src/contri
                 b/Archive/SeuratObject/SeuratObject_4.1.3.tar.gz
                 install.packages(packageurl1, repos=NULL, type="source")
                 packageurl2 <-[36]http://cran.us.r-project.org/src/contri
                 b/Archive/Seurat/Seurat_4.2.1.tar.gz
                 install.packages(packageurl2, repos=NULL, type="source")
          + d.
            Import the R packages used in this protocol.
            {
            library(ggplot2)
            library(data.table)
            library(estimate)
            library(pheatmap)
            library(clusterProfiler)
            library(enrichplot)
            library(stringr)
            library(KEGGREST)
            library(devtools)
            library(ggvenn)
            library(SingleR)
            library(celldex)}

Dataset selection

     Inline graphic Timing: ∼2 h

     * 12.
       The data selection can be determined based on your specific needs.
       One approach is to obtain data from public databases, while another
       is to prepare your own single-cell sequencing data. This method is
       also suitable for the integration analysis of multiple single-cell
       sequencing data. The data preprocessing steps include the code for
       data merging. It is crucial to pay attention to preparing the data
       in the Seurat object format that can be recognized by Seurat, and
       it should also include the annotation of cell types for the
       specific cells you intend to analyze.
          + a.
            This protocol does not include cell type annotation for
            single-cell sequencing. Please refer to the following options
            for annotating cell types in advance.
               o i.
                 To use SingleR,[37]^8 refer to the usage instructions for
                 this software available at the following link:
                 [38]https://www.bioconductor.org/packages/release/bioc/vi
                 gnettes/SingleR/inst/doc/SingleR.html.
               o ii.
                 To use Cellassign,[39]^9 refer to the usage instructions
                 for this software available at the following link:
                 [40]https://github.com/irrationone/cellassign.
               o iii.
                 To use Celaref.,[41]^10 refer to the usage instructions
                 for this software available at the following link:
                 [42]https://bioconductor.org/packages/release/bioc/vignet
                 tes/celaref/inst/doc/celaref_doco.html.
               o iv.
                 To use scCATCH,[43]^11 refer to the usage instructions
                 for this software available at the following link:
                 [44]https://github.com/ZJUFanLab/scCATCH.
          + b.
            If you want to annotate malignant tumors, you can use the
            inferCNV R package.
               o i.
                 To use inferCNV,[45]^12 refer to the usage instructions
                 for this software available at the following link:
                 [46]https://github.com/broadinstitute/inferCNV.

Key resources table

   REAGENT or RESOURCE SOURCE IDENTIFIER
   Deposited data
     __________________________________________________________________

   Raw code GitHub repository [47]https://github.com/weiyubai/TIME_ASP
   The code version in this protocol TIME-ASP V1.0[48]^13
   [49]https://zenodo.org/doi/10.5281/zenodo.11000276
   [50]GSE125449 Ma et al.[51]^14
   [52]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125449
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   R software R Core Team[53]^15 [54]https://www.r-project.org/
   RStudio software RStudio Team[55]^16 [56]https://rstudio.com/
   Rtools software R Core Team[57]^15
   [58]https://cran.r-project.org/bin/windows/Rtools/
   Git software Open source [59]https://git-scm.com
   data.table package Dowle et al.[60]^17
   [61]https://cloud.r-project.org/web/packages/data.table/index.html
   ESTIMATE package Yoshihara et al.[62]^2
   [63]https://bioinformatics.mdanderson.org/public-software/estimate/
   pheatmap package Kolde et al.[64]^18
   [65]https://cran.r-project.org/web/packages/pheatmap/index.html
   clusterProfiler package Yu et al.[66]^19
   [67]https://bioconductor.org/packages/release/bioc/html/clusterProfiler
   .html
   stringr package Wickham et al.[68]^20
   [69]https://cran.r-project.org/web/packages/stringr/index.html
   KEGGREST package Tenenbaum et al.[70]^21
   [71]https://www.bioconductor.org/packages/release/bioc/vignettes/KEGGRE
   ST/inst/doc/KEGGREST-vignette.html
   devtools package Wickham et al.[72]^22
   [73]https://cran.r-project.org/web/packages/devtools/index.html
   SingleR package Lun et al.[74]^8
   [75]https://www.bioconductor.org/packages/release/bioc/vignettes/Single
   R/inst/doc/SingleR.html
   cellassign package Zhang et al.[76]^9
   [77]https://github.com/irrationone/cellassign
   Celaref package Williams et al.[78]^23
   [79]https://bioconductor.org/packages/release/bioc/vignettes/celaref/in
   st/doc/celaref_doco.html
   scCATCH package Shao et al.[80]^11
   [81]https://github.com/ZJUFanLab/scCATCH
   ggvenn package Yan et al.[82]^24
   [83]https://github.com/yanlinlin82/ggvenn
   Seurat package Butler et al.[84]^25
   [85]http://cran.us.r-project.org/src/contrib/Archive/Seurat/Seurat_4.2.
   1.tar.gz
   SeuratObject package Satija et al.[86]^25
   [87]http://cran.us.r-project.org/src/contrib/Archive/SeuratObject/Seura
   tObject_4.1.3.tar.gz
   gridExtra package Auguie et al.[88]^26
   [89]https://cran.r-project.org/web/packages/gridExtra/index.html
   celldex package Aran et al.[90]^8
   [91]https://bioconductor.org/packages/release/data/experiment/html/cell
   dex.html
   GOplot package Walter et al.[92]^27 [93]https://wencke.github.io/
     __________________________________________________________________

   Other
     __________________________________________________________________

   Computing platform: this protocol runs on a Windows system with 1 TB of
   memory, powered by an Intel Core i7-8550U processor, quad-core, with a
   maximum turbo frequency of 1.99 GHz, and equipped with 16 GB of RAM.
   Windows 11 Home edition N/A
   [94]Open in a new tab

Materials and equipment

   This protocol can be applied to analyze single-cell sequencing data
   from your own laboratory as well as publicly available databases. It is
   important to note that cell types need to be annotated in advance. The
   protocol does not perform normalization and scaling steps for scRNA
   data. If it is necessary to organize FASTA files into expression
   matrices, it may require a lot of computing power and require the use
   of servers or computer clusters.

Step-by-step method details

Clone project and raw code from GitHub

     Inline graphic Timing: ∼30 min

     * 1.
       Clone the TIME-ASP repository from GitHub.
          + a.
            Double-click to open the ‘Git CMD.exe’.
          + b.
            Create a folder for storing the project at an absolute path
            using the ‘mkdir’ command.
            > mkdir D:\TIME-ASP
          + c.
            Set the working directory to the absolute path established in
            the previous step.
            > cd /d D:\TIME-ASP
          + d.
            Clone the TIME-ASP project from GitHub.
            > git clone[95]https://github.com/weiyubai/TIME_ASP.git

     Note: All data for this analysis workflow is included therein. The
     raw code is located in the ‘code’ folder and named according to the
     corresponding step. ‘D:\TIME-ASP' represents the absolute path
     designated for storing the project files. ‘TIME-ASP’ serves as the
     folder name for storage, and you can customize it as per your
     requirements.

Prepare input data

     Inline graphic Timing: ∼2 h

   Set Rstudio working path.
     * 2.
       Set the RStudio working directory to the project folder.

   > setwd("D:/TIME_ASP ")

     Note: If you wish to temporarily halt your program, the 'save'
     function can be employed to preserve the environment variables.
     Subsequently, the 'load' function allows for the reloading of these
     variables prior to the next execution. For instance, to store
     variables and execute subsequent code, you could utilize
     `save(list = ls(Sc_data,p1,p2,p3,p4), file = “./my_data.Rdata”)`,
     followed by reloading the saved environment variables with
     `load(“./my_data.Rdata”)`. Notably, the variables that require
     saving are listed in `ls()`and 'file = ' is the path where the
     environment variables are saved.

     * 3.
       Prepare Seurat object format single-cell sequencing data of
       interest.
       Option 1: User’s own or public database single-cell sequencing
       data.
          + a.
            Put the three files ‘barcodes.tsv’, ‘genes.tsv’, and
            ‘matrix.mtx’ into the 'folder' folder.
          + b.
            Build Seurat object.
            > Sc_data = CreateSeuratObject(counts = Read10X(folder)

     Note: This step takes the 10× Genomics data format as an example.
     For other data formats, please refer to the data import help in the
     Seurat R package.
          + c.
            Calculate the proportion of mitochondrial genes.
          + d.
            Quality control.
            > Sc_data[["percent.mt"]] <- PercentageFeatureSet(Sc_data,
            pattern = "ˆMT-")
            # The variable in this step is defined as "Sc_data" in order
            to connect with the analysis steps of subsequent sample data.
               o i.
                 Filter cells with UMI (Unique Molecular Identifier)
                 numbers greater than 2500.
               o ii.
                 Filter cells with UMI numbers less than 200.
               o iii.
                 Filter cells with a percentage of mitochondrial genes
                 greater than 5% of the total number of genes
                 > Sc_data <- subset(Sc_data, subset =
                 nFeature_RNA > 200 & nFeature_RNA < 2500 &
                 percent.mt < 5)
          + e.
            Data standardization.
            > Sc_data <- NormalizeData(Sc_data, normalization.method =
            "LogNormalize", scale.factor = 10000)

     Note: The default data standardization method used is LogNormalize,
     where the total expression level of each cell is standardized to
     10000, and then log is taken as the logarithm; the results are
     stored in Sc_data[“RNA”] @ data.
          + f.
            Perform cell type annotation and linear dimensionality
            reduction.
            > Sc_data <- FindVariableFeatures(Sc_data, selection.method =
            "vst", nfeatures = 2000)
            > all.genes <- rownames(Sc_data)
            > Sc_data <- ScaleData(Sc_data, features = all.genes)
            > Sc_data <- ScaleData(Sc_data, vars.to.regress =
            "percent.mt")
            #Using SingleR for cell annotation
            > library(SingleR)
            > library(celldex)
            > hpca.se <- HumanPrimaryCellAtlasData()
            > lihcSingleR <- GetAssayData(Sc_data, slot = "data")
            > lihc.hesc <- SingleR(test=lihcSingleR, ref = hpca.se,
            labels =hpca.se$label.main)
            > table(lihc.hesc$labels, Sc_data $seurat_clusters)
            > Sc_data @meta.data$labels <- lihc.hesc$labels
            > Sc_data <- RunPCA(Sc_data, features =
            VariableFeatures(object = Sc_data))
            > Sc_data<- RunTSNE(Sc_data, dims = 1:10)
            > DimPlot(Sc_data, group.by ="labels",reduction =
            "tsne",label = TRUE)
            > ggsave("ScType.pdf",width = 4,height = 2.5)
            >save(Sc_data,file="./Sc_data.Rdata)
            #Save single-cell sequencing expression matrix.
            Option 2: Using sample data.
          + g.
            Download the [96]GSE125449 dataset from the public database
               o i.
                 Open the “Project” folder and create a new folder named
                 “set”.
               o ii.
                 Navigate to [97]https://www.ncbi.nlm.nih.gov/.
               o iii.
                 Select ‘GEO DataSets’ search term.
               o iv.
                 Enter GEO ID (e.g., [98]GSE125449) in the input field and
                 select ‘Search’ option.
               o v.
                 Click to enter the first project.
               o vi.
                 Download the files of the set1 series into the working
                 directory ‘project/set/set1’, and download the files of
                 the set2 series into the working directory ‘project/data
                 /set2’.

     Note: We have already downloaded the data and uploaded it to the
     ‘set’ folder in the GitHub TIME-ASP repository. You don’t need to
     download it again here if you have cloned the project directly from
     GitHub.
          + h.
            Build Seurat object with [99]GSE125449.
            > rm(list=ls())
            #Clear variables
            > set.seed(123456)
            # Set random seeds
            > library(Seurat)
            > library(ggplot2)
            > library(data.table)
            > samples=list.files('./set')
            > sceList = lapply(samples, function(pro){
            folder=file.path('./set', pro)
            sce=CreateSeuratObject(counts = Read10X(folder),
            project = pro)
            return(sce)
            })
            #file.path(a,b) means the path: a/b
            > sceList <- lapply(sceList, FUN = function(x) {
            x <- NormalizeData(x)
            x <- FindVariableFeatures(x, selection.method = "vst",
            nfeatures = 2000)})
            #Build a list based on two datasets
            > sceall = merge(sceList[[1]], y = sceList[[2]],
            add.cell.ids = samples, merge.data = TRUE)
            # Merge data> sample1 <-
            read.table(gzfile("./set/set1/samples.txt.gz"),sep =
            "\t",header = T)
            #Import sample information from dataset1
            > rownames(sample1) <-sample1$Cell.Barcode
            > sample2 <-
            read.table(gzfile("./set/set2/samples.txt.gz"),sep =
            "\t",header = T)
            #Import sample information from dataset2.
            > rownames(sample2) <-sample2$Cell.Barcode
            > sample <- rbind(sample1,sample2)
            > sceall@meta.data$sample <- sample$Sample
            > sceall@meta.data$Type <- sample$Type
            #Import cell annotation information
            > Idents(sceall) <- sceall@meta.data$sample
            > Sc_data <-subset(sceall,idents =
            c("S16_P10_LCP18","S02_P01_LCP21","S10_P05_LCP23","S07_P02_LCP
            28","S12_P07_LCP30","S21_P13_LCP37","S15_P09_LCP38","S351_P10_
            LCP34","S364_P21_LCP65"),invert = FALSE)
            #Filter data for specified samples (hepatocellular carcinoma)

     Note: Please be mindful that file names should have the prefixes
     removed. For instance, use ‘barcodes.tsv’, ‘features.tsv’, and
     ‘matrix.mtx’ rather than ‘GSE125449_Set1_barcodes.tsv’,
     ‘GSE125449_Set1_genes.tsv’, and ‘GSE125449_Set1_matrix.mtx’.
            Run single-cell sequencing data preparation program, display
            single-cell classification status ([100]Figure 1).
            > Sc_data[["percent.mt"]] <- PercentageFeatureSet(Sc_data,
            pattern = "ˆMT-")
            > Sc_data <- subset(Sc_data, subset = nFeature_RNA > 200 &
            nFeature_RNA < 2500 & percent.mt < 5)
            > Sc_data <- NormalizeData(Sc_data, normalization.method =
            "RC", scale.factor = 10000)
            > Sc_data <- FindVariableFeatures(Sc_data, selection.method =
            "vst", nfeatures = 2000)
            > all.genes <- rownames(Sc_data)
            > Sc_data <- ScaleData(Sc_data, features = all.genes)
            > Sc_data <- ScaleData(Sc_data, vars.to.regress =
            "percent.mt")
            > Sc_data <- RunPCA(Sc_data, features =
            VariableFeatures(object = Sc_data))
            > Sc_data<- RunTSNE(Sc_data, dims = 1:10)
            > DimPlot(Sc_data, group.by = "Type",reduction =
            "tsne",pt.size=0.5,label = FALSE)
            > ggsave("output/ScType.pdf",width = 4,height = 2.5)

     Note: A file named ‘ScType.pdf’ will be generated in the ‘output’
     folder. If you need to eliminate batch effects from your data, you
     can utilize the Integration method from the Seurat package or the
     harmony[101]^28 package. This step uses data from liver cancer
     patients in the [102]GSE125449 dataset in the GEO database, with
     patient numbers listed as follows: “S16-P10-LCP18”, “S02-P01-LCP21”,
     “S10-P05-LCP23”, “S07-P02-LCP28”, “S12-007-LCP30”, “S21P13-LCP37”,
     “S15-P09-LCP38”, “S351-P10-LCP34”, “S364-P21-LCP65“

Figure 1.

   [103]Figure 1
   [104]Open in a new tab

   Detection of clustering and annotation status in input single-cell
   sequencing data

   This data comes from Bai et al. 2023, Heliyon.

Calculate immune score and tumor purity

     Inline graphic Timing: ∼1 h

   Here, we describe the steps for calculating immune score, stromal
   score, tumor purity score, and estimate score based on the ESTIMATE R
   package.
     * 4.
       Compute the immune score, tumor purity score, stromal score, and
       ESTIMATE score.
       > proj <- Sc_data
       > source("./code/AddModuleScore.r")

     Note: Evaluate the scRNA-seq obtained from the previous step and
     generate a heatmap (illustrating the overall abundance of different
     cell types in this evaluation, [105]Figure 2) and a RidgePlot
     (depicting the proportions of different cell types in this
     evaluation, [106]Figure 3). All raw data of the scoring will be
     saved in the ‘output’ folder and named as ‘estimate_score.csv’.

     Optional: If you encounter an error, please execute the following
     code line by line.
       Set ESTIMATE path.
          + a.
            Run the ESTIMATE R package to score each cell.
            >library("Seurat")
            > library("estimate")
            > DefaultAssay(proj) <- "RNA"
            > myfilterCommonGenes(input.f=
            as.matrix(proj@assays$RNA@data),
            output.f=paste0("matrixgenes.gct"), id="GeneSymbol")
            >
            estimateScore(paste0("matrixgenes.gct"),paste0("estimate_score
            .gct"), platform= "affymetrix")
            > estimate_score <-
            read.table(paste0("estimate_score.gct"),header =
            F,skip=2,stringsAsFactors = FALSE)
            > estimate_score <- t(estimate_score)
            > rownames(estimate_score) <- estimate_score[,1]
            > colnames(estimate_score) <- estimate_score[2,]
            > estimate_score <- estimate_score[-c(1:2),]
            > estimate_score[1:4,1:4]
            > estimate_score <- as.data.frame(estimate_score)
            > proj@meta.data$Stromal <-
            as.numeric(estimate_score$StromalScore)
            > proj@meta.data$Immune <-
            as.numeric(estimate_score$ImmuneScore)
            > proj@meta.data$TumorPurity <-
            as.numeric(estimate_score$TumorPurity)
            > proj@meta.data$ESTIMATE <-
            as.numeric(estimate_score$ESTIMATEScore)
          + b.
            View the score distribution of different types of cancer
            cells.
            > p1<- RidgePlot(proj,features = "Stromal",group.by="Type")
            > p2<- RidgePlot(proj,features = "Immune",group.by="Type")
            > p3<- RidgePlot(proj,features =
            "TumorPurity",group.by="Type")
            > p4<- RidgePlot(proj,features = "ESTIMATE",group.by="Type")
            > library(gridExtra)
            pdf(file= "./output/RidgePlot.pdf",height=4, width=10 )
            > grid.arrange(p1,p2,p3,p4,ncol = 2 )
            > dev.off()
          + c.
            View the score distribution for each cancer cell.
            > mydata1<- FetchData(proj,vars =
            c("tSNE_1","tSNE_2","Stromal","Immune","TumorPurity","ESTIMATE
            "))
            > a1 <- ggplot(mydata1,aes(x = tSNE_1,y =tSNE_2,colour =
            TumorPurity))+
            geom_point(size = 1)+scale_color_gradientn(values =
            seq(0,1,0.2),colours =
            c('#0425D4',"#0085FF","#00B9D1","#66D404","#EF6909","red","#74
            120C"))+theme_bw()
            > a2 <- ggplot(mydata1,aes(x = tSNE_1,y =tSNE_2,colour =
            Stromal))+
            geom_point(size = 1)+scale_color_gradientn(values =
            seq(0,1,0.2), colours =
            c('#0425D4',"#0085FF","#00B9D1","#66D404","#EF6909","red","#74
            120C"))+theme_bw()
            > a3 <- ggplot(mydata1,aes(x = tSNE_1,y =tSNE_2,colour =
            Immune))+
            geom_point(size = 1)+scale_color_gradientn(values =
            seq(0,1,0.2), colours =
            c('#0425D4',"#0085FF","#00B9D1","#66D404","#EF6909","red","#74
            120C"))+theme_bw()
            > a4 <- ggplot(mydata1,aes(x = tSNE_1,y =tSNE_2,colour =
            ESTIMATE))+
            geom_point(size = 1)+scale_color_gradientn(values =
            seq(0,1,0.2), colours =
            c('#0425D4',"#0085FF","#00B9D1","#66D404","#EF6909","red","#74
            120C"))+theme_bw()
            > pdf(file= "./output/HeatMap.pdf",height=4, width=7 )
            grid.arrange(a1,a2,a3,a4,ncol = 2 )
            > dev.off()
            > write.csv(estimate_score,"./output/estimate_score.csv")

     Inline graphic CRITICAL: In this step, you can use the single-cell
     sequencing data of your interest. When executing 'proj <- Sc_data',
     replace 'Sc_data' with the name of the single-cell sequencing data
     you want to analyze. ‘proj@assays $ RNA@data’ represents the gene
     expression matrix within the single-cell sequencing dataset. This
     matrix is crucial for performing immune scoring on cells, please
     make sure to follow the previous prompts correctly. This method is
     based on single sample gene set enrichment analysis (ssGSEA)
     algorithm.[107]^2 This function computes stromal, immune, and
     ESTIMATE scores using gene-level expression data. For Affymetrix
     platform data, tumor purity is derived from ESTIMATE scores by
     applying non-linear squares methods to TCGA Affymetrix expression
     data (n = 995). The meanings of each score are as follows:
            Stromal Score, Numeric scalar specifying the presence of
            stromal cells in tumor tissue. Immune Score, Numeric scalar
            specifying the level of infiltrating immune cells in tumor
            tissue. ESTIMATE Score, Numeric scalar specifying tumor
            cellularity.
            Tumor Purity, Numeric scalar specifying ESTIMATE-based tumor
            purity, with values in the range [0, 1].

Figure 2.

   [108]Figure 2
   [109]Open in a new tab

   Assessment of tumor purity and the immune microenvironment of tumors

   This data comes from Bai et al. 2023, Heliyon.

   (A) Tumor Purity score of scRNA-seq data.

   (B) Stromal score of scRNA-seq data.

   (C) Immune score of scRNA-seq data.

   (D) ESTIMATE score of scRNA-seq data.

Figure 3.

   [110]Figure 3
   [111]Open in a new tab

   Mountain map of the distribution of stromal score, tumor purity score,
   immune score, and ESTIMATE score in various cells

   This data comes from Bai et al. 2023, Heliyon.

   (A) The proportion of Stromal score in various types of cells.

   (B) The proportion of Tumor Purity score in various types of cells.

   (C) The proportion of Immune score in various types of cells.

   (D) The proportion of ESTIMATE score in various types of cells.

Grouping all cells based on immune scores

     Inline graphic Timing: ∼1 h

   Here, we describe the steps of grouping all cancer cells based on
   immune scores. Grouping is performed based on the calculated immune
   scores from the previous step (or based on tumor purity), and the
   FindMarkers() function is used for differential analysis
   ([112]Table S1).
     * 5.
       Identification of key genes related to the tumor immune
       microenvironment.

   >source("./code/TIME.r")

     Note: The first column represents the gene symbol, the second column
     “avg_logFC” indicates the average expression fold change in
     logarithm between the two groups. A positive value indicates higher
     expression of the gene in the first group. The third column “pct.1”
     represents the percentage of cells in which the gene is detected in
     the first group, and the fourth column “pct.2” represents the
     percentage of cells in which the gene is detected in the second
     group. The fifth column “p_val_adj” indicates the adjusted p-value
     based on the Bonferroni correction using all genes in the dataset.

     Optional: If you encounter an error, please execute the following
     code line by line.

   > library(pheatmap)

   > group <- ifelse(proj@meta.data$Immune >=
   median(proj@meta.data$Immune),'Immune_high','Immune_low')

   table(group)

   > proj@meta.data$group <- group

   > diff_dat <-
   FindMarkers(proj,ident.1="Immune_high",ident.2="Immune_low",

            group.by='group')

   > diff1 <- diff_dat[diff_dat$p_val_adj<0.05 &
   abs(diff_dat$avg_log2FC)>=1,]

   > write.csv(diff1,"./output/diff1.csv")

     Note: If you want to use tumor purity as the grouping condition,
     please open the ‘Estimate.r’ file and modify the grouping code as
     follows:

   > group <- ifelse(proj@meta.data$TumorPurity>=median(proj@meta.data$
   TumorPurity),' TumorPurity_high',' TumorPurity_low')

Signal pathway enrichment

     Inline graphic Timing: ∼30 min

   Here, we describe the steps of pathway enrichment using differentially
   expressed genes obtained through immune scoring grouping.
     * 6.
       Enrichment analysis of pathways can be performed using two
       alternative methods at this step.
       Option 1: Perform KEGG pathway enrichment using 'clusterprofiler'.
       Running ‘source(Kegg.r)’ will generate a file named ‘kegg.csv’ in
       the ‘output’ folder, which represents the KEGG pathway enrichment
       results ([113]Table S2).
       > source(Kegg.r)

     Optional: If you encounter an error, please execute the following
     code line by line.
       > library(clusterProfiler)
       > library(GOplot)
       > gene<-bitr(rownames(diff1),fromType = 'SYMBOL',toType =
       'ENTREZID',OrgDb = 'org.Hs.eg.db')
       > KEGG<-enrichKEGG(
        gene$ENTREZID,
        organism = "hsa",minGSSize =5,
        maxGSSize =500,
        keyType = "kegg",
        use_internal_data = FALSE
       )
       > barplot(KEGG,showCategory = 10,title = 'KEGG
       Pathway')+scale_color_gradient(low="green",high = "red")
       > write.csv(KEGG,"./output/kegg.csv")

     Note: If the gene species is not human, please modify the 'organism'
     parameter in the Kegg.r code according to the abbreviation of the
     queried species. Please refer to
     '[114]https://www.genome.jp/kegg/catalog/org_list.html' for the
     abbreviations of the specific species. For detailed explanations of
     other additional parameters, please refer to the usage guide of the
     'clusterProfiler' R package.
       Option 2: Perform KEGG pathway enrichment analysis using the online
       tool David.
          + a.
            Open the official website of David
            ([115]https://david.ncifcrf.gov/).
          + b.
            The “Start Analysis” option requires input of three sets of
            data before beginning the analysis, gene symbols, identifiers,
            and list type.
          + c.
            Paste the gene symbols from the “diff.csv” file obtained in
            Step 5 into the ‘Upload Gene List’ input box.
          + d.
            Select the 'Step2: OFFICAL_GENE_SYMBOL' option in the Select
            Identifier dropdown menu.
          + e.
            Enter ‘Homo sapiens’ in ‘Step 2a: Select species’.
          + f.
            Select the ‘Gene List’ option in the ‘Step 3: List Type’
            dropdown menu.
          + g.
            Click the 'Submit List' button in 'Step 4: Submit List'.
          + h.
            Click on the ‘Functional Annotation Chart’ option on the
            ‘Analysis Wizard’ page.
          + i.
            Click on 'Clear All'.
          + j.
            Click on ‘Pathways’ and then select ‘KEGG_PATHWAY’.
          + k.
            Click on the 'Functional Annotation Chart' option.
          + l.
            Click on the 'Download File' option on the popped-up page,
            then copy all the signal pathways and paste them in Excel as
            Unicode file type.
            Option 3: Perform KEGG pathway enrichment analysis using the
            online tool Enrichr.
          + m.
            Open the official website of Enrichr
            ([116]https://maayanlab.cloud/Enrichr/).
          + n.
            Paste a list of Entrez gene symbols in the text box of the
            Input data option, one per line. Entrez gene symbols can be
            found in the output file [117]Table S1 from step 5.
          + o.
            Click “Submit” and then select the “Pathways” option.
          + p.
            Select “KEGG 2021 Human” and click “table” to download the
            signal pathway enrichment results.
          + q.
            Copy all the signal pathways and paste them in Excel as
            Unicode file type.

     Note: This step is optional and suitable for users who are not
     familiar with the usage of the ‘clusterProfiler’ R package.

Evaluation of signaling pathway enrichment level

     Inline graphic Timing: ∼1 h

   Here, we describe the steps of quantitatively analyzing signal pathways
   using the 'AddModuleScore' function.
     * 7.
       Computational analysis of pathway expression across different cell
       types.
          + a.
            Specify the identifier for the signal pathway that needs to be
            computed. Using ‘Cholesterol metabolism’ as an example, input
            the identifier for Cholesterol metabolism.
            >ID = "hsa04979"

     Note: KEGG ID can be queried from the following two websites:
     [118]https://www.kegg.jp/ Or
     [119]http://lmmd.ecust.edu.cn/netinfer/get_results.php?id=6248
          + b.
            Running the calculation program will generate a file in Seurat
            object format named Inscore, and the calculated scores will be
            named after the first word of the signal pathway, contained
            within the Inscore file.
            > source("./code/Chose_pathway.r")

     Optional: If you encounter an error, please execute the following
     code line by line.
               o i.
                 Calculate the enrichment fraction of the hsa04979
                 signaling pathway.
                 > library(KEGGREST)
                 > library(Seurat)
                 > gsInfo = keggGet(ID)[[1]]
                 > names(gsInfo)
                 > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"),
                 function(x) x[1])
                 > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw),
                 2)])
                 > names(geneSet) = gsInfo$NAME
                 > hsa04979 <- list(geneSet$`Cholesterol metabolism - Homo
                 sapiens (human)`)
                 > Inscore <- AddModuleScore(Sc_data,
                        features = hsa04979,
                        ctrl = 10,
                        name = "hsa04979")
                 > Sc_data@meta.data$hsa04979 <-
                 as.numeric(Inscore$hsa049791)
               o ii.
                 Integrate the calculated scores of the signal pathways
                 into the single-cell sequencing data.
                 > Sc_data@meta.data$hsa04979 <-
                 as.numeric(Inscore$hsa049791)

     Note: The ‘AddModuleScore()’ function defaults to adding a numerical
     sequence after the name, so it is important to add the number ‘1’
     after the pathway name here.
               o iii.
                 Generate a single battery heat map for hsa04979
                 ([120]Figure 4) and create the necessary data for
                 drawing, named ‘mydata1’ here.
                 > library(ggplot2)
                 > mydata1<- FetchData(Sc_data,vars =
                 c("tSNE_1","tSNE_2","hsa04979"))
                 > ggplot(mydata1,aes(x = tSNE_1, y =tSNE_2, colour =
                 hsa04979)) + geom_point(size = 1) +
                 scale_color_gradientn(values = seq(0,1,0.2),colours =
                 c("#0425D4","#0085FF","#00B9D1","#66D404","#EF6909","red"
                 ,"#74120C")) + theme_bw()

     Note: Draw a heatmap of the hsa04979 signaling pathway in
     single-cell sequencing data. You can change the color of the heatmap
     based on the colors parameter, but it is recommended not to modify
     other parameters to maintain consistency with the previous
     single-cell clustering map.
     * 8.
       Display multiple signal pathways in the form of heatmaps, and
       repeat steps 6a–6d. Here, we will use signal pathways such as
       hsa05200, hsa04510, hsa05208, hsa04932, hsa04151, hsa05205,
       hsa04610, hsa00190, hsa04512, hsa04979, hsa04066, and hsa04668 as
       examples to draw a heat map ([121]Figure 5).
       >source("./code/All_pathways.r")

     Optional: If you encounter an error, please execute the following
     code line by line.
          + a.
            Calculate the expression abundance of hsa04979.
            > gsInfo = keggGet('hsa04979')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04979 <- list(geneSet$`Cholesterol metabolism - Homo
            sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04979,
            ctrl = 10, name = "hsa04979")
            > Sc_data@meta.data$hsa04979 <- as.numeric(Inscore$hsa049791)
            #Proteoglycans in cancer
            > gsInfo = keggGet('hsa05205')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa05205 <- list(geneSet$`Proteoglycans in cancer - Homo
            sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa05205,
            ctrl = 10, name = "hsa05205")
            > Sc_data@meta.data$hsa05205 <- as.numeric(Inscore$hsa052051)
          + b.
            Calculate the expression abundance of hsa00190.
            > gsInfo = keggGet('hsa00190')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa00190 <- list(geneSet$`Oxidative phosphorylation - Homo
            sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa00190,
            ctrl = 10, name = "hsa00190")
            > Sc_data@meta.data$hsa00190 <- as.numeric(Inscore$hsa001901)
          + c.
            Calculate the expression abundance of hsa04668.
            > gsInfo = keggGet('hsa04668')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04668 <- list(geneSet$`TNF signaling pathway - Homo
            sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04668,
            ctrl = 10, name = "hsa04668")
            > Sc_data@meta.data$hsa04668 <- as.numeric(Inscore$hsa046681)
          + d.
            Calculate the expression abundance of hsa052081.
            > gsInfo = keggGet('hsa05208')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            names(geneSet) = gsInfo$NAME
            > hsa05208 <- list(geneSet$`Chemical carcinogenesis - reactive
            oxygen species - Homo sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa05208,
            ctrl = 10, name = "hsa05208")
            > Sc_data@meta.data$hsa05208 <- as.numeric(Inscore$hsa052081)
          + e.
            Calculate the expression abundance of hsa05200.
            > gsInfo = keggGet('hsa05200')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa05200 <- list(geneSet$`Pathways in cancer - Homo sapiens
            (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa05200,
            ctrl = 10, name = "hsa05200")
            > Sc_data@meta.data$hsa05200 <- as.numeric(Inscore$hsa052001)
          + f.
            Calculate the expression abundance of hsa04510.
            > gsInfo = keggGet('hsa04510')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04510 <- list(geneSet$`Focal adhesion - Homo sapiens
            (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04510,
            ctrl = 10, name = "hsa04510")
            > Sc_data@meta.data$hsa04510 <- as.numeric(Inscore$hsa045101)
          + g.
            Calculate the expression abundance of hsa04932.
            > gsInfo = keggGet('hsa04932')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04932 <- list(geneSet$`Non-alcoholic fatty liver disease
            - Homo sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04932,
            ctrl = 10, name = "hsa04932")
            > Sc_data@meta.data$hsa04932 <- as.numeric(Inscore$hsa049321)
          + h.
            Calculate the expression abundance of hsa04151.
            > gsInfo = keggGet('hsa04151')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04151 <- list(geneSet$`PI3K-Akt signaling pathway - Homo
            sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04151,
            ctrl = 10, name = "hsa04151")
            > Sc_data@meta.data$hsa04151 <- as.numeric(Inscore$hsa041511)
          + i.
            Calculate the expression abundance of hsa04610.
            > gsInfo = keggGet('hsa04610')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04610 <- list(geneSet$`Complement and coagulation
            cascades - Homo sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04610,
            ctrl = 10, name = "hsa04610")
            > Sc_data@meta.data$hsa04610 <- as.numeric(Inscore$hsa046101)
          + j.
            Calculate the expression abundance of hsa04512.
            > gsInfo = keggGet('hsa04512')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04512 <- list(geneSet$`ECM-receptor interaction - Homo
            sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04512,
            ctrl = 10, name = "hsa04512")
            > Sc_data@meta.data$hsa04512 <- as.numeric(Inscore$hsa045121)
          + k.
            Calculate the expression abundance of hsa04066.
            > gsInfo = keggGet('hsa04066')[[1]]
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04066 <- list(geneSet$`HIF-1 signaling pathway - Homo
            sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04066,
            ctrl = 10, name = "hsa04066")
            > Sc_data@meta.data$hsa04066 <- as.numeric(Inscore$hsa040661)
          + l.
            Organize heat map input data.
            > mydata1<- FetchData(Sc_data,vars =
            c("Type","sample","hsa05200", "hsa04510", "hsa05208",
            "hsa04932", "hsa04151", "hsa05205", "hsa04610", "hsa00190",
            "hsa04512", "hsa04979", "hsa04066", "hsa04668"))
            > ann_col <- data.frame(mydata1[,1:2])
            > ann_col$id <-rownames(ann_col)
            > ann_col <- ann_col[order(ann_col[,1]),]
            > ac <- data.frame(ann_col$Type)
            > rownames(ac) <- rownames(ann_col)
            > colnames(ac) <- c("cell type")
            > ac$sample <- ann_col$sample
            > mydata <- mydata1[row.names(ac),3:14]
            > mydata <- t(mydata)
          + m.
            Draw a signal pathway heatmap.
            > library(pheatmap)
            > pdf(file= "./output/KEGG_HeatMap.pdf",height=5.7, width=5 )
            > pheatmap(mydata, show_rownames = T, show_colnames =
            F,cluster_cols = FALSE, cluster_rows = F,fontsize=6,
            annotation_col = ac)
            > dev.off()

     Note: In this step, the analysis process from a to k remains exactly
     the same, except for the different IDs of the signaling pathways. If
     you need to analyze a specific signaling pathway, simply modify the
     ID of the pathway. For example, if you need to analyze the ‘Adherens
     junction signaling pathway (ID hsa04520):
            > gsInfo = keggGet(‘hsa04520’)[[1]];
            > geneSetRaw = sapply(strsplit(gsInfo$GENE, ";"), function(x)
            x[1])
            > geneSet = list(geneSetRaw[seq(2, length(geneSetRaw), 2)])
            > names(geneSet) = gsInfo$NAME
            > hsa04520 = list(geneSet$`Adherens junction signaling - Homo
            sapiens (human)`)
            > Inscore <- AddModuleScore(Sc_data, features = hsa04520,
            ctrl = 10, name = "hsa0452")
            > Sc_data@meta.data$hsa04520 <- as.numeric(Inscore$hsa045201)#
            It should be noted that the algorithm for this line of code
            will default to adding a sequence number 1 after the ID
            number.

Figure 4.

   [122]Figure 4
   [123]Open in a new tab

   The expression abundance of the hsa04979 signaling pathway

   This data comes from Bai et al. 2023, Heliyon.

Figure 5.

   [124]Figure 5
   [125]Open in a new tab

   Assessment of expression abundance for all signaling pathways

   This data comes from Bai et al. 2023, Heliyon.

Differential analysis of abnormal signaling pathways

     Inline graphic Timing: ∼30 min

   Here, we describe the steps of screening differentially expressed genes
   based on abnormal signaling pathway grouping.
     * 9.
       Calculate differential genes in single-cell sequencing data by
       grouping a certain signaling pathway. For example, using the
       ‘hsa04979’ signaling pathway, all cells are divided into two
       groups: high and low.

   > source("./code/Aps.r")

     Optional: If you encounter an error, please execute the following
     code line by line.

   > group1 <-
   ifelse(Sc_data@meta.data$hsa04979>=median(Sc_data@meta.data$hsa04979),'
   hsa04979_high','hsa04979_low')

   > table(group1)

   > Sc_data@meta.data$group1 <- group1

   > diff_dat1 <-
   FindMarkers(Sc_data,ident.1="hsa04979_high",ident.2="hsa04979_low",

           group.by='group1')

   > diff2 <- diff_dat1[diff_dat1$p_val_adj<0.05 &
   abs(diff_dat1$avg_log2FC)>=1,]

   > write.csv(diff2,"./output/diff2.csv")

     Note: After running this code, a file named ‘diff2.csv’ will be
     generated in the output directory, containing the list of genes with
     |log2FC|>=1 and p value < 0.05 ([126]Table S3). The file format is
     the same as that generated in Step 4. The first column represents
     the gene symbol, the second column ‘avg_logFC’ indicates the average
     expression fold change in logarithm between the two groups. A
     positive value indicates higher expression of the gene in the first
     group. The third column ‘pct.1’ represents the percentage of cells
     in which the gene is detected in the first group, and the fourth
     column ‘pct.2’ represents the percentage of cells in which the gene
     is detected in the second group. The fifth column ‘p_val_adj’
     indicates the adjusted p-value based on the bonferroni correction
     using all genes in the dataset.

Hub gene screening

     Inline graphic Timing: ∼30 min

   Here, we describe the steps for screening hub genes based on
   differential analysis of immune and abnormal signaling pathways.
     * 10.
       Identify and overlap the differential genes associated with the
       tumor immune microenvironment and the abnormal signaling pathways
       for screening ([127]Table S4), and visually represent this
       information using a Venn diagram ([128]Figure 6).

   > source("./code/hub_genes.r")

     Optional: If you encounter an error, please execute the following
     code line by line.

   > diff1 <- read.csv("./output/diff1.csv",header = T,row.names = 1)

   > diff2 <- read.csv("./output/diff2.csv",header = T,row.names = 1)

   > TIME_DEGs <- rownames(diff1)

   > Chose_pathway_DEGs <- rownames(diff2)

   > overlapping_Degs <- intersect(TIME_DEGs,Chose_pathway_DEGs)

   > library(ggvenn)

   > x <- list(TIME_DEGs=TIME_DEGs,Chose_pathway_DEGs=Chose_pathway_DEGs)

   > ggvenn(x,show_percentage = F,text_size = 4,set_name_size =
   3,stroke_color = "white",fill_color = c("#DF8F4499","#00A1D599"),
   set_name_color = c("#DF8F4499","#00A1D599"))

   > ggsave(file= "./output/vn.pdf",height=4, width=4 )

   > write.csv(overlapping_Degs,"output/Hub_genes.csv")

Figure 6.

   [129]Figure 6
   [130]Open in a new tab

   Identification of hub genes associated with the tumor immune
   microenvironment and aberrant signaling pathways

   This data comes from Bai et al. 2023, Heliyon.

Expected outcomes

   [131]Figure 1 presents the cell types that need to be analyzed,
   including cell annotations and clustering information. In order to
   understand the immune microenvironment of various cancer cell types, we
   conducted immunological and tumor purity scoring for all cell types
   ([132]Figure 2), as well as the proportion of each score in different
   cell types ([133]Figure 3). The colors represent the magnitude of the
   different score values, with shades closer to red indicating higher
   scores and shades closer to blue indicating lower scores. Here, we can
   observe that within malignant tumors, the immune score is the lowest,
   while the tumor purity score is the highest, indicating the accuracy of
   our analysis results.

   We observed a remarkably low immune score in malignant tumor cells.
   Therefore, based on this, we grouped all cell types according to their
   immune scores, allowing us to more accurately identify which cells and
   genes have lower immune levels. The results are saved in the
   supplementary material file diff1.csv. At the same time, by conducting
   pathway enrichment analysis on the genes showing significant
   differences in the tumor immune microenvironment, we can identify the
   signaling pathways that are abnormally expressed in malignant cells
   ([134]Figure 5). Subsequently, to further confirm the high expression
   of the identified signaling pathway (hsa04979) in malignant cells, we
   performed pathway scoring for all cells and generated a heatmap. From
   the heatmap, it is clear that the hsa04979 signaling pathway is
   abnormally highly expressed in malignant cells ([135]Figure 4).
   Similarly, by grouping genes according to the hsa04979 signaling
   pathway, we can obtain the genes that impact the hsa04979 signaling
   pathway ([136]Table S3).

   Finally, in order to identify potentially valuable genes for research
   that impact both the tumor immune microenvironment and certain aberrant
   signaling pathways, we obtained the associated hub genes by taking the
   intersection of the two ([137]Figure 4; [138]Table S4).

Quantification and statistical analysis

   We evaluated the immune score and tumor purity using the ESTIMATE R
   package, and calculated differentially expressed genes using the
   ‘FindMarker()’ function in the Seurat R package. These details are
   described in Step 5 of the Step-by-step method section.

Limitations

   This protocol does not have any specific requirements regarding the
   operating system; however, it was developed and tested on a Windows
   system. We cannot guarantee its error-free operation on other systems.
   In theory, it should be able to run smoothly, but there might be
   unforeseen issues on different platforms.

Troubleshooting

   It is crucial to consider the critical statements and notes
   accompanying each step outlined in the method details.

Problem 1

   There are no available input files for Seurat, only the raw data. (NGS
   data or FASTA data) Additional processing of the raw data is required,
   such as quality control, genome alignment, read counting, etc. which
   are not covered in this protocol.

Potential solution

   For quality control of the raw data, FastQC, MultiQC and Trim Galore
   can be used as reference.
     * •
       FastQC:
       [139]https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
     * •
       MultiQC: [140]https://multiqc.info/
     * •
       Trim Galore:
       [141]https://www.bioinformatics.babraham.ac.uk/projects/trim_galore
       /

   For genome alignment and read count statistics, specialized processing
   methods are provided by Cell Ranger (commercial software from 10×
   Genomics), alevin, kallisto|bustools, STARsolo and alevin-fry.
     * •
       Cell Ranger:
       [142]https://support.10xgenomics.com/single-cell-gene-expression/so
       ftware/pipelines/latest/what-is-cell-ranger
     * •
       alevin :[143]https://alevin-fry.readthedocs.io/en/latest/
     * •
       kallisto|bustools
       :[144]https://bustools.github.io/BUS_notebooks_R/velocity.html

Problem 2

   When running code, the output is not expected (i.e., error messages in
   the console or unexpected features in the drawing).

Potential solution

   In order to ensure the proper execution of the code, it is necessary to
   confirm the correctness of syntax, R package version, input data
   format, and the saved path, such as.
     * •
       Commas, quotation marks, parentheses, and other punctuation marks
       must be in English input format.
     * •
       Please ensure that your data is saved in the 'project/data'
       directory.

Problem 3

   Only 10× Genomics analyzed data (barcodes, features, and matrix files)
   were available, but cell types were not annotated.

Potential solution

   You can use software such as SingleR, CellAssign, Celaref, scCATCH,
   etc. for cell annotation. Please refer to ‘[145]before you begin’ and
   ‘[146]dataset selection’.

Problem 4

   There are common cell type annotations, but there are no annotations
   for malignant cell types.

Potential solution

   You can utilize the inferCNV R package for annotation purposes, even in
   the absence of a strong CNV profile. You can directly execute this
   protocol because malignant tumor cells often accompany lower immune
   levels. This protocol is based on immune scoring to group all cells and
   screen for abnormal signaling pathways, thus not impeding the progress
   of the analysis workflow.

Problem 5

   The following error is displayed: Error in
   as.matrix(proj@assays$RNA@data):

   The slot name ‘data’ does not exist in the ‘Assay5’ class object.

Potential solution

   The protocol employs Seurat version 4.2.1, and subsequent versions may
   lead to potential errors. This arises from the merging of two
   single-cell sequencing data in the provided example data. In versions
   beyond 4.2.1, the merging process will generate a single list file,
   while the expression matrices will be separated, inadvertently causing
   errors. Please refer to the instructions in '[147]before you begin'.

Resource availability

Lead contact

   Further information and requests for resources and reagents should be
   directed to and will be fulfilled by the lead contact, Weiyu Bai
   (925605397@qq.com).

Technical contact

   For technical issues related to this protocol, please contact Weiyu Bai
   (925605397@qq.com).

Materials availability

   The materials, data, software, and code involved in this agreement have
   been listed in the [148]key resources table.

Data and code availability

   The single-cell sequencing data in this protocol can be obtained from
   the following link:
   [149]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125449. The
   raw codes in this protocol can be obtained from the following link:
   [150]https://github.com/weiyubai/TIME_ASP and
   [151]https://zenodo.org/doi/10.5281/zenodo.11000276.

Acknowledgments