Abstract

Background

   Single-cell sequencing approaches allow gene expression to be measured
   at the single-cell level, providing opportunities and challenges to
   study the aetiology of complex diseases, including cancer.

Methods

   Based on single-cell gene and lncRNA expression levels, we proposed a
   computational framework for cell type identification that fully
   considers cell dropout characteristics. First, we defined the dropout
   features of the cells and identified the dropout clusters. Second, we
   constructed a differential co-expression network and identified
   differential modules. Finally, we identified cell types based on the
   differential modules.

Results

   The method was applied to single-cell melanoma data, and eight cell
   types were identified. Enrichment analysis of the candidate cell marker
   genes for the two key cell types showed that both key cell types were
   closely related to the physiological activities of the major
   histocompatibility complex (MHC); one key cell type was associated with
   mitosis-related activities, and the other with pathways related to ten
   diseases.

Conclusions

   Through identification and analysis of key melanoma-related cell types,
   we explored the molecular mechanism of melanoma, providing insight into
   melanoma research. Moreover, the candidate cell markers for the two key
   cell types are potential therapeutic targets for melanoma.

   Keywords: Single-cell sequencing, Melanoma, Cell type, Cell marker,
   lncRNA

Background

   Melanoma is a malignant tumor that develops from melanocytes and is
   considered a multifactorial disease caused by the interaction between
   genetic susceptibility factors and environmental exposure [[35]1,
   [36]2]. Although the incidence of many cancers is declining, the
   incidence of melanoma is increasing [[37]3, [38]4]. The prognosis of
   melanoma is proportionate to the depth of the tumor, which increases
   with time; thus, melanoma must be identified, detected, and treated in
   a timely manner [[39]1]. Schomberg et al. [[40]5] used RNA sequencing
   (RNA-seq) to profile luteolin-induced differentially expressed genes
   (DEGs) in 4 melanoma cell lines and found that luteolin-mediated growth
   inhibition may be mediated in melanoma cells through simultaneous
   action on multiple pathways. Mahata [[41]6] proposed a clustering
   method to explore the subtypes of melanoma and breast cancer. Klinke et
   al. [[42]7] developed an unsupervised feature extraction and selection
   strategy to capture functional plasticity separately tailored to breast
   cancer and melanoma.

   The limitations of bulk RNA-seq data are that the molecular expression
   in a single cell is masked, and the cell heterogeneity in a sample is
   ignored. With the development of single-cell RNA sequencing (scRNA-seq)
   technology, scRNA-seq data analysis has been widely used in the study
   of different biological tissues, revealing the meanings of differential
   gene expression between cells [[43]8–[44]10] and researchers have begun
   to decipher the functional states of cancer cells at the single-cell
   level [[45]11–[46]14]. Various methods related to the life sciences
   have been applied in cancer research and have led to discoveries in
   cancer evolution, metastasis, treatment resistance and the tumor
   microenvironment [[47]15, [48]16].

   Compared with next generation RNA-seq data, there are more noise data
   and more dropouts in scRNA-seq data. There are several reasons for the
   dropout phenomenon [[49]17]. Firstly, transcripts do not exist, so zero
   is an accurate representation of the state of a cell; secondly, the
   depth of sequencing is low, despite the existence of transcripts, it
   has not been reported. Thirdly, as part of library preparation,
   transcripts were not captured or failed to amplify. Some methods were
   proposed for imputing zeros. Lin et al. [[50]18] introduced the
   Clustering through Imputation and Dimensionality Reduction(CIDR), which
   used a novel but very simple implicit imputation approach in a
   principled way in order to mitigate the impact of dropout values in
   scRNA-seq data. van Dijk et al. [[51]19] developed the Markov
   affinity-based graph imputation of cells(MAGIC), to share information
   between similar cells through data diffusion to denoise the cell count
   matrix and fill in missing transcripts. Li et al. [[52]20] introduced
   the scImpute to impute the dropout values in scRNA-seq data. Instead of
   eliminating the influence of dropout values to improve clustering
   accuracy, we attempted to amplify the influence of dropout values to
   explore the molecular mechanisms of melanoma.

   In this study, we fully considered the characteristics of scRNA-seq
   data and identified a variety of cell types in melanoma cancer cells
   and a series of candidate cell markers for various cell types based on
   scRNA-seq gene and lncRNA expression data. Furthermore, by evaluation
   of each cell type, we identified and analyzed the key cell types
   associated with melanoma and then revealed the pathogenesis of
   melanoma, providing new insight into its diagnosis and prognosis.

Methods

   In this paper, considering the characteristics of scRNA-seq data, we
   proposed a framework for cell type identification (Fig. [53]1). The
   framework consists of three parts: identification of dropout clusters,
   construction of the differential co-expression network, and
   identification of cell types and candidate cell markers.

Fig. 1.

   [54]Fig. 1
   [55]Open in a new tab

   Schematic illustration of the scRNA-seq data-based cell type
   identification framework. (I) Identification of dropout clusters.
   First, the dropout information was extracted from the gene expression
   profile, and the dropout feature matrix was constructed. Then, the
   dropout distances between cells were calculated, and DBSCAN was
   performed to identify dropout clusters of cells. (II) Construction of
   the differential co-expression network. We performed differential
   analysis of the dropout rates and molecule expression levels and
   constructed a differential co-expression network based on the
   correlations between differential molecules. (III) Identification of
   cell types and candidate cell markers. We performed MCL on the
   differential co-expression network to identify differential modules. We
   then used these differential modules as new features to identify the
   cell types and further identified candidate cell markers for each cell
   type

Molecular expression datasets

   The expression profiles used in this experiment were extracted from the
   EXP0072 dataset in CancerSEA
   ([56]http://biocc.hrbmu.edu.cn/CancerSEA/goDownload) [[57]21], which
   was collated from the expression files from the GEO dataset
   [58]GSE81383 [[59]22]. scRNA-seq was applied to profile the
   transcriptomes of 307 single cells cultured from three biopsies of
   three different patients, who had BRAF/NRAS wild type, BRAF mutant/NRAS
   wild type and BRAF wild/NRAS mutant type metastatic melanoma. The
   expression profiles contain the expression values for 18,938
   protein-coding genes and 15,626 lncRNAs.

Known melanoma-related biomolecules

   To analyze the correlation between each cell type pair, we collated
   known melanoma-related biomolecules from multiple public databases and
   published research results.

   From the OMIM catalogue ([60]https://omim.org/) [[61]23], the COSMIC
   database ([62]https://cancer.sanger.ac.uk/cosmic) [[63]24] and a
   published study by Bailey et al. [[64]25], we obtained 539, 63 and 24
   known melanoma-related genes, respectively. A total of 580 known
   melanoma-related genes were obtained.

   From the Lnc2Cancer 2.0 ([65]http://www.bio-bigdata.net/lnc2cancer/)
   [[66]26], LncRNADisease v2.0 ([67]http://www.rnanut.net/lncrnadisease/)
   [[68]27] and NONCODE v5.0 ([69]http://www.noncode.org/) [[70]28]
   data-bases, we obtained 24, 2,712 and 14 known melanoma-related
   lncRNAs, respectively. A total of 2,719 known melanoma-related lncRNAs
   were obtained.

   In addition, we obtained a cell marker entity associated with melanoma
   from the CellMarker ([71]http://biocc.hrbmu.edu.cn/CellMarker/)
   [[72]29] database. This entry comprises data from cancer cells in
   peripheral blood and contains four cell marker genes, MITF, MLANA, PMEL
   and TYR. Schelker et al. [[73]30] used scRNA-seq data to identify nine
   main cell types, and the four above mentioned genes (MITF, MLANA, PMEL
   and TYR) were used as cell markers for melanoma cell types.

Data preprocessing

   We used the EXP0072 dataset in CancerSEA [[74]24], which contains the
   expression profiles of 18,938 protein-coding genes and 15,626 lncRNAs
   from 307 melanoma cancer cells.

   First, we filtered the genes and lncRNAs with expression values of
   greater than 0 in fewer than 3 cells or average normalized expression
   values of less than 10^–5, fitted the normal distribution to the genes
   (or lncRNAs) and cells, and removed cells in which significantly few
   genes were detected (FDR < 0.05). The resulting expression profile
   consisted of 307 cells, 15,488 genes and 8,524 lncRNAs.

   Next, feature selection was performed for the genes and lncRNAs with
   non-log-transformed expression data using the M3Drop method in the
   M3Drop package [[75]31].

Feature selection

   Cell clustering is dependent on the selection of genes, and traditional
   methods generally use the most variable genes as features. An important
   characteristic of scRNA-seq data is the high dropout rate, which
   usually accounts for more than half of the values in the expression
   matrix, and M3Drop [[76]31] selects genes based on the dropout
   property.

   M3Drop is based on the Michaelis–Menten equation, which is used to
   represent enzymatic reactions to fit the relationship between the
   average expression value and the dropout rate, as shown in (1):
   [MATH: <mrow><msub><mi>P</mi><mrow><mi
   mathvariant="italic">dropout</mi></mrow></msub><mo>=</mo><mn>1</mn><mo>
   -</mo><mi>S</mi><mo stretchy="false">/</mo><mrow><mo
   stretchy="false">(</mo><mi>K</mi><mo>+</mo><mi>S</mi><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   1

   where S is the average expression level of the gene in all cells, K is
   the Michaelis constant, and P[dropout] is the ratio of the dropout
   value of the gene to the expression of the gene in all cells, i.e., the
   dropout rate of the gene.

   The parameter K in the Michaelis–Menten equation was used to calculate
   the specific K[j] for gene j, and the global KM of all genes was fitted
   by the z-test. Then, the significant genes were selected as the result
   of feature selection by multiple testing corrections. The equation for
   calculating K[j] is shown as (2):
   [MATH: <mrow><msub><mi>K</mi><mi>j</mi></msub><mo>=</mo><mrow><mo
   stretchy="false">(</mo><msub><mi>P</mi><mi>j</mi></msub><mrow></mrow><m
   o>∗</mo><msub><mi>S</mi><mi>j</mi></msub><mo
   stretchy="false">)</mo></mrow><mo stretchy="false">/</mo><mrow><mo
   stretchy="false">(</mo><mn>1</mn><mo>-</mo><msub><mi>P</mi><mi>j</mi></
   msub><mo stretchy="false">)</mo></mrow></mrow> :MATH]
   2

   where P[j], S[j] and K[j] are the corresponding P[dropout], S and K
   values for gene j in (1).

   Identification of dropout clusters.

   To fully explore the dropout information in the single-cell expression
   data, we fuzzed specific gene expression values and highlighted dropout
   values. According to whether the gene expression in the cell was a
   dropout value, we binarized the gene expression data. In detail, all
   expression values greater than 0 in the gene expression profile were
   recorded as 1; otherwise, as 0. The matrix generated by binarizing the
   gene expression values was called the dropout feature matrix.

   Next, we defined the dropout distance between cells based on the
   dropout features. There are a large number of zeros in scRNA-seq data,
   so we applied Manhattan distance to measure the distance between cells
   to avoid bias. Firstly, we calculated the Manhattan distance between
   each cell pair and then used the z-score to normalize the Manhattan
   distance to obtain the dropout distance between the cells.

   Then, based on the dropout distance between the cells, we clustered
   cells with density-based spatial application of applications with noise
   (DBSCAN) [[77]32], which can identify clusters with various shapes and
   sizes and effectively identify noise in cells. Before clustering, we
   visualized the data distribution and found it was based on the density
   distribution, and then compared to some other clustering algorithms(for
   example, SC3 and pcaReduce), DBSCAN is sensitive to noisy data while
   SC3 and pcaReduce often mistake noise for true structure, which means
   DBSCAN is more appropriate for our dataset[[78]17]. We then defined
   dropout clusters as cell clusters obtained by cluster analysis based on
   the dropout distance.

   Two hyperparameters should be determined in DBSCAN, one is the field
   radius eps that defines the field range, and the other is the minimum
   field point MinPts required for the sample to be defined as the core
   point. The parameter selection method is as follows:

   Step 1 initialize MinPts as M[i] and calculate the M[i] distance range
   R for all samples. Given R and step size, say 0.001, calculate eps and
   the number of clusters k, and retain the eps that maximizes the
   silhouette coefficient [[79]33] and set it as e[i].

   Step 2 assign e[i] to eps and calculated the maximized the silhouette
   coefficient[[80]33] of MinPts, marked as M[i].

   Step 3 update M[i] and repeat Step 1 and Step 2. The parameters that
   maximized the silhouette coefficient[[81]33] are set to be the input of
   DBSCAN.

Construction of the differential co-expression network

   To analyze the differences among dropout clusters, for each dropout
   cluster, we divided all the cells into two groups—cells belonging to
   the cluster and the remaining cells—and performed differential analysis
   of genes and lncRNAs from two aspects: the dropout rate and molecule
   expression value.

   Differential dropout analysis. We calculated the dropout rate for all
   gene/lncRNA expression values for each cell group. The genes and
   lncRNAs with a difference in the dropout rate of greater than 50%
   between the two groups of cells were defined as differentially dropout
   genes (DDGs) and differentially dropout lncRNAs (DDLRs).

   Differential expression analysis. We calculated the fold changes in the
   gene/lncRNA expression values between the two groups of cells and
   selected genes and lncRNAs with a fold change of greater than two
   (|logFC|> 1, p value < 0.05). Then, we defined these genes and lncRNAs
   as differentially expressed genes (DEGs) and differentially expressed
   lncRNAs (DELRs).

   Herein, we refer to DDGs and DEGs as differential genes, to DDLRs and
   DELRs as differential lncRNAs, and to the collective set of
   differential genes and differential lncRNAs as differential molecules.

   Then, we calculated the Spearman correlation coefficient (SCC) between
   each pair of differential molecules and selected strong correlations
   with |SCC|> 0.4 and p value < 0.05. The differential molecules with
   strong correlations constituted the differential co-expression network.
   The absolute values of the Spearman correlation coefficients were used
   as the edge weights.

Identification of cell types and candidate cell markers

   We used the Markov clustering algorithm (MCL) [[82]34] to cluster the
   differential co-expression network and identify molecule modules, which
   we call differential modules herein. MCL is a graph-based, rapidly
   scalable unsupervised clustering algorithm, and it simulates a random
   flow to discover the communities in the network. Then, we calculated
   the average expression value of every differential module as a new
   feature of the cells. Herein, we call this value the differential
   module feature of cells.

   According to the differential module features, we calculated the
   Manhattan distance between cell pairs and normalized it by the z-score.
   Then, we applied DBSCAN to cluster the cells, and each cluster was
   considered a cell type.

   For each cell type, we calculated the fold change in the expression
   level of each gene and lncRNA between cells in that cell type and the
   remaining cells and selected genes and lncRNAs with a significant
   difference of at least a fourfold change (|logFC|> 2, p value < 0.05)
   as the candidate cell marker genes and candidate cell marker lncRNAs
   for that cell type. Herein, the candidate cell marker genes and lncRNAs
   are collectively referred to as candidate cell markers for a cell type.

Results

Feature selection

   The results of data preprocessing are shown in Fig. [83]2(A) and (B).
   Finally, 3,454 genes and 966 lncRNAs were selected for further
   analysis. We also performed log transformation on the expression data
   with a base of 2 and an offset of 1.

Fig. 2.

   [84]Fig. 2
   [85]Open in a new tab

   Results of feature selection using M3Drop. The solid blue lines
   indicate the Michaelis–Menten curves fit to all genes or lncRNAs, and
   the orange dots indicate the significant genes or lncRNAs with an FDR
   of < 0.01 in the hypothesis test. A Results of feature selection for
   genes. B Results of feature selection for lncRNAs

Identification of dropout clusters and differential molecules

   The dropout distances between every pair of cells were calculated
   according to the dropout feature matrix, and the cells were analyzed
   with DBSCAN [[86]32]. Four cell dropout clusters that individually
   contained 27, 8, 51 and 45 cells. The result of the dropout clusters
   was a transition to get the final cell types.

   Furthermore, differential dropout analysis and differential expression
   analysis were performed on different dropout clusters, and the number
   of differential molecules identified is shown in Table [87]1. Since
   some DEGS, DDGs, DELRs and DDLRs may belong to multiple clusters, Sum
   represents the total number of four clusters’ molecules. More
   differential expressed genes than differential expressed lncRNAs were
   identified, and the differences in the expression levels were generally
   greater than the differences in the dropout rates. Differential
   analysis of all dropout clusters identified 1950 differential expressed
   genes and 209 differential expressed lncRNAs. The number 1950 means the
   total number of DEGs and DELRs after deduplication. Similarly, the
   number 209 means the total number of DDGs and DDLRs after
   deduplication.

Table 1.

   Number of differential molecules in the dropout clusters, including the
   number of differential expressed genes/lncRNAs in each dropout cluster
   and the number of differential molecules in all dropout clusters
   Dropout cluster                      1   2  3    4    Sum
   Differential expressed genes   DEGs  805 6  1185 1067 1940
                                  DDGs  17  46 80   143  263
   Differential expressed lncRNAs DELRs 5   2  119  146  207
                                  DDLRs 0   4  7    7    16
   [88]Open in a new tab

Analysis of the differential co-expression network

   For the 2159 differential expressed molecules (1950 differential
   expressed genes and 209 expressed differential lncRNAs) obtained from
   the differential analysis, the Spearman correlation coefficient between
   each pair of differential molecules was calculated with cut-off
   criteria of |SCC|> 0.4 and p value < 0.05. We obtained 48,940 strong
   correlations among differential expressed molecules, specifically,
   17,908 positive correlations and 31,032 negative correlations. The
   resulting differential co-expression network was an undirected and
   weighted network consisting of 892 nodes and 48,940 edges.

   The differential co-expression network was a scale-free network with a
   power law node degree distribution. The protein-coding gene HLA-DRA was
   a hub node in the network, with a degree of 484. GeneCards [[89]35]
   shows that HLA-DRA is a protein-coding gene whose main function is to
   bind to peptides produced by antigens in the endocytosis of
   antigen-presenting cells (APCs) and display them on the cell surface
   for recognition by CD4+ T cells.

   The edge weights in the differential co-expression network were
   calculated as the absolute values of the Spearman correlation
   coefficients. Among the connected nodes, the protein-coding gene
   PHACTR1 had the strongest correlation with the lncRNA AL008729.2, with
   the Spearman correlation coefficient of 0.96.

Identification of differential modules and cell types

   The differential molecules in the differential co-expression network
   were further divided by MCL with the inflation parameter set at 2.5.
   Twenty differential modules were identified. For each identified
   differential module, we extracted the sub-network from the differential
   co-expression network (Fig. [90]3).

Fig. 3.

   [91]Fig. 3
   [92]Open in a new tab

   Structure of subnetworks of differential modules. Each connected
   network is a differential module, except for the module marked by *,
   which includes a separate node

   Then, we calculated the differential module features of cells and used
   DBSCAN to identify cell types. A total of eight cell types were
   identified and are denoted by the letters A–H. The number of cells of
   each cell type is shown in Table [93]2. Cell type B contained
   significantly more cells than the other seven cell types, each of which
   contained a relatively small number of cells. In addition, 100 cells
   did not belong to any cell type; we then combined all of these cells
   into a distinct cell type named cell type 0.

Table 2.

   Number of cells in each cell type
   Cell type       A  B   C D E  F  G  H
   Number of cells 10 121 8 7 11 22 22 6
   [94]Open in a new tab

Analysis of cell similarity in cell types

   The Spearman correlation coefficient between each pair of cells in the
   same cell type was calculated according to the expression values of
   genes and lncRNAs to indicate the similarity of the expression patterns
   between the two cells. The boxplot of the similarity between cells in
   each cell type is shown in Fig. [95]4.

Fig. 4.

   [96]Fig. 4
   [97]Open in a new tab

   Similarity of molecular expression patterns in each cell type. The
   Spearman correlation coefficients for the gene and lncRNA expression
   levels in all cells in each cell type were calculated, and all
   correlation coefficients were significant

   These results showed that the expression patterns of any two cells in
   the same cell type were positively correlated with significant p
   values. Figure [98]4 indicates that the average similarity of cells in
   all cell types except cell type B was significantly higher than that in
   cell type 0. In particular, in cell types A, C, D, and E, the lowest
   correlation coefficient between two cells was greater than 0.4, and the
   correlation coefficients between all cells in cell types G and H were
   greater than 0.3. In addition, in cell type 0, the correlation
   coefficients between cells had a large span and a low average value,
   consistent with the experimental results indicating that the cells did
   not belong to the same cell type.

Differential analysis between cell types

   Furthermore, the R package Limma [[99]36] was used to calculate the
   fold changes in expression levels, and the significantly differential
   genes and lncRNAs with at least a fourfold change in expression were
   selected as candidate cell markers. The number of candidate cell
   markers obtained are shown in Table [100]3. More than 200 candidate
   cell markers were identified for cell types A, E, G, and H, indicating
   significant differences between cells in these cell types and other
   cell types.

Table 3.

   Number of candidate cell markers for each cell type, including
   candidate cell marker genes and candidate cell marker lncRNAs
   Cell type                               0 A   B  C  D   E   F  G   H
   Number of candidate cell marker genes   0 213 79 20 96  207 92 216 224
   Number of candidate cell marker lncRNAs 0 23  5  7  15  37  0  6   38
   Number of candidate cell markers        0 236 84 27 111 244 92 222 262
   [101]Open in a new tab

   We analyzed regulation directions of the candidate cell markers for
   each cell type. In cell types A, C, and D, all candidate cell markers
   were upregulated (logFC > 0). In cell type E, 3 candidate cell marker
   genes were downregulated (logFC < 0), and the remaining candidate cell
   markers were upregulated. In cell type F, only one candidate cell
   marker gene was upregulated, and the rest were downregulated. Candidate
   cell markers in other cell types were upregulated and downregulated in
   different patterns.

   In addition, in cell type 0, no significant similarity was observed
   among the expression patterns of the cells, and no candidate cell
   markers were found, which verified the reliability of the results for
   the eight identified cell types.

Prediction of candidate cell markers in melanoma

   To verify the relationships between the cell types and melanoma, we
   collated 580 known melanoma-related genes and 2719 known
   melanoma-related lncRNAs from multiple databases and published research
   results and compared them with candidate cell markers in each cell type
   (see the Materials and Methods for details). The results are shown in
   Fig. [102]5.

Fig. 5.

   [103]Fig. 5
   [104]Open in a new tab

   Relationships between candidate cell markers for each cell type and
   known melanoma-related biomolecules. A Relationships between candidate
   cell marker genes and known melanoma-related genes, including the
   numbers and percentages of known melanoma-related genes. B
   Relationships between candidate cell marker lncRNAs and known
   melanoma-related lncRNAs, including the numbers and percentages of
   known melanoma-related lncRNAs

   Figure [105]5(A) indicates that except for cell type C, which had only
   20 candidate cell marker genes, all cell types had known
   melanoma-related genes among the candidate cell marker genes. The
   candidate cell markers for cell types H and G included 29 and 28 known
   melanoma-related genes, respectively. In addition, in cell types B, G,
   and H, known melanoma-related genes accounted for more than 10% of all
   candidate cell marker genes.

   Figure [106]5(B) indicates that 9, 8, and 5 candidate cell marker
   lncRNAs in cell types E, H, and A, respectively, were known
   melanoma-related lncRNAs and that known melanoma-related lncRNAs
   accounted for the largest percentage of candidate cell marker lncRNAs
   in cell type G (33.33%).

   The above analysis of the correlations between eight cell types and
   melanoma indicated that candidate cell markers in cell types A, B, E,
   G, and H, especially cell types G and H, were strongly related to
   melanoma.

Analysis of known melanoma cell markers

   Four known melanoma cell marker genes, MITF, MLANA, PMEL and TYR, were
   obtained from CellMarker [[107]29]. Then, we investigated whether these
   four known melanoma cell markers appeared among the candidate cell
   markers for each cell type.

   All four known melanoma cell markers were candidate cell marker genes
   for cell type H, and MLANA, PMEL, and TYR were candidate cell marker
   genes for cell type G. In addition, the protein-coding gene PMEL was a
   candidate cell marker gene for five cell types (all cell types except
   C, D, and F). PMEL plays a central role in the biogenesis of
   melanosomes and participates in the maturation of melanosomes from
   stage I to stage II [[108]37, [109]38]. According to GeneCards, PMEL is
   associated with the incidence of various melanomas, such as skin
   melanoma, gallbladder melanoma, and melanoma in congenital melanocytic
   nevus.

   PMEL, also known as premelanosome protein gene, participates in the
   maturation process of melanosomes from phase I to phase II, and plays a
   central role in melanogenesis [[110]37, [111]38]. MITF plays a role in
   multiple activity levels that determine the fate of melanoma cells.
   Melanoma cells that highly express MITF can differentiate or
   proliferate. Stem cell-like or invasive potential can cause low MITF
   activity. And long-term MITF inhibition will drive cell senescence
   [[112]39]. MITF up- or down-regulation modulates MLANA expression in
   parallel directions at both mRNA and protein levels. As a target gene
   for melanocyte restriction, MLANA may provide an opportunity to study
   whether their melanocyte restriction expression is produced by the
   unique activity of the MITF melanocyte isotype, or whether other
   transcription factors may contribute (together with MITF) give
   melanocyte-specific expression [[113]40]. TYR, TYRP1 and downstream
   enzymes metabolize tyrosine to melanin [[114]41].

   These results showed that the candidate cell markers for cell types G
   and H were closely related to the known melanoma cell markers and that
   other unknown candidate cell markers in these two cell types could be
   potential driver biomolecules for melanoma.

Enrichment analysis of key cell types

   The previous analysis of the eight cell types indicated that cell types
   G and H were highly correlated with melanoma and that the cells
   belonging to these cell types had similar expression patterns and were
   significantly different from the cells not belonging to these cell
   types. We defined these two cell types as the key cell types associated
   with melanoma and further used the R package clusterProfiler [[115]42]
   to perform Gene Ontology (GO) functional and Kyoto Encyclopedia of
   Genes and Genomes (KEGG) pathway enrichment analyses of their candidate
   cell marker genes to explore the pathogenesis of melanoma.

   We performed GO functional enrichment analysis, including analysis of
   biological process (BP), molecular function (MF) and cellular component
   (CC) terms, and focused on GO terms with an adjusted p-value (p.adjust)
   of < 0.01. We obtained 113 and 95 GO terms related to cell types G and
   H, respectively; the top three GO terms in the BP, MF and CC aspects
   are shown in Tables [116]4 and [117]5. Comparison of the results
   revealed that cell type G was related to the mitotic process of cells,
   while cell type H was more strongly related to activities of the major
   histocompatibility complex (MHC).

Table 4.

   Gene Ontology (GO) functional enrichment analysis of candidate cell
   marker genes for cell type G. The top 3 GO terms in the biological
   process (BP), molecular function (MF) and cellular component (CC)
   aspects are listed (p.adjust < 0.01)
   Aspect GO ID      Descriptions                         p.adjust Count
          GO:0140014 Mitotic nuclear division             7.11E−06 17
   BP     GO:0048285 Organelle fission                    2.83E−05 20
          GO:0000280 Nuclear division                     2.83E−05 19
          GO:0023026 MHC class II protein complex binding 0.001115 4
   MF     GO:0023023 MHC protein complex binding          0.005092 4
          GO:0050786 RAGE receptor binding                0.006432 3
          GO:0042613 MHC class II protein complex         1.47E−05 6
   CC     GO:0000793 Condensed chromosome                 2.83E−05 14
          GO:0098687 Chromosomal region                   3.37E−05 17
   [118]Open in a new tab

Table 5.

   Gene Ontology (GO) functional enrichment analysis of candidate cell
   marker genes for cell type H. The top 3 GO terms in the biological
   process (BP), molecular function (MF) and cellular component (CC)
   aspects are listed (p.adjust < 0.01)
   Aspect GO ID Descriptions p.adjust Count
   GO:0140014 Mitotic nuclear division 7.59E−06 17
   BP GO:0019886 Antigen processing and presentation of exogenous peptide
   antigen via MHC class II 7.83E−06 11
   GO:0002495 Antigen processing and presentation of peptide antigen via
   MHC class II 8.00E−06 11
   GO:0023026 MHC class II protein complex binding 0.000107 5
   MF GO:0023023 MHC protein complex binding 0.000724 5
   GO:0042605 Peptide antigen binding 0.001458 5
   GO:0042613 MHC class II protein complex 6.05E−07 7
   CC GO:0042611 MHC protein complex 7.81E−06 7
   GO:0030669 Clathrin-coated endocytic vesicle membrane 1.06E−05 8
   [119]Open in a new tab

   In addition, some enriched GO terms in the two key cell types were
   associated with melanin and melanosomes (p.adjust < 0.05), as shown in
   Table [120]6, supporting the reliability of the identification of the
   two key cell types.

Table 6.

   The GO terms related to melanin and melanosomes in the GO enrichment
   (p.adjust < 0.05). "–" means that this term did not appear in the
   enrichment results for this cell type
   Aspects GO ID Descriptions p.adjust in cell type G p.adjust in cell
   type H
   BP GO:0042438 Melanin biosynthetic process 2.83E−05 0.000325
   BP GO:0006582 Melanin metabolic process 2.83E−05 0.000404
   CC GO:0033162 Melanosome membrane 4.27E−05 0.001092
   CC GO:0042470 Melanosome 0.000143 0.001458
   BP GO:0030318 Melanocyte differentiation 0.005577 0.049285
   BP GO:0032402 Melanosome transport 0.03374 –
   BP GO:0032401 Establishment of melanosome localization 0.03712 –
   BP GO:0032400 Melanosome localization 0.044045 –
   BP GO:0032438 Melanosome organization 0.044045 0.047491
   [121]Open in a new tab

   KEGG pathway enrichment analysis of candidate cell marker genes in the
   two cell types are shown in Fig. [122]6(A) and (B). A total of 12 and
   18 significantly enriched pathways (p.adjust < 0.01) were related to
   cell types G and H, respectively.

Fig. 6.

   [123]Fig. 6
   [124]Open in a new tab

   KEGG pathway enrichment analysis of candidate cell marker genes for the
   two key cell types (p.adjust < 0.01). A Cell type G. B Cell type H

   The two cell types shared 10 enrichment pathways, many of which were
   related to immune response processes, such as antigen processing and
   presentation, allograft rejection, and autoimmune thyroid disease. Cell
   type G was related to DNA replication and mismatch repair pathways.
   Cell type H was related to melanogenesis and ten diseases, namely,
   graft-versus-host disease, rheumatoid arthritis, type I diabetes
   mellitus, asthma, leishmaniosis, viral myocarditis, autoimmune thyroid
   disease, hypertrophic cardiomyopathy, malaria, and dilated
   cardiomyopathy.

   The functional and pathway enrichment analysis indicated that the two
   key cell types were strongly related to the biological activities of
   melanin and melanosomes. In addition, candidate cell marker genes for
   cell type G were significantly enriched in mitosis-related biological
   activities, and cell type H was associated with the occurrence of ten
   diseases.

Discussion

   scRNA-seq technology allows researchers to study biomolecules at the
   single-cell level so that the molecular mechanisms of some complex
   diseases, such as cancer, can be studied and analyzed at a single-cell
   resolution.

   In this paper, considering the characteristics of the scRNA-seq data,
   we proposed a framework for cell type identification and applied it to
   a single-cell melanoma dataset. Two key cell types related to melanoma
   were identified, and the molecular mechanisms of melanoma were analyzed
   at the single-cell level.

   First, making full use of the dropout information in the gene
   expression data, we identified four different dropout clusters and
   found that the expression levels of the protein-coding gene REXO2
   differed significantly among the four dropout clusters.

   Then, MCL was performed on the differential co-expression network, and
   20 differential modules were identified. Then, eight cell types were
   identified by using the differential modules as new cell features. Our
   analysis identified strong correlations among cells in each cell type,
   with similar expression patterns, and revealed significant differences
   among cells of different cell types. In addition, we found that each
   cell type showed a different extent of association with melanoma.

   Finally, we defined cell types G and H as the key cell types associated
   with melanoma and found that both of these key cell types were related
   to melanosomes and melanin and were highly correlated with the
   biological activities of MHC molecules. In addition, cell type G was
   related to cell mitosis, and cell type H was related to ten diseases.

   In summary, by identifying cell types of melanoma cancer cells and
   further analyzing all cell types, we distinguished two key cell types
   that are highly related to melanoma, providing a key insight for the
   future direction of melanoma research. In addition, candidate cell
   markers for the two key cell types can be focused on as potential
   therapeutic targets for melanoma. Furthermore, the computational
   framework proposed in this paper is not limited to melanoma and can be
   extended to the pathological study of other cancers or complex
   diseases.

Conclusion

   We proposed a computational framework for cell type identification that
   fully considers cell dropout characteristics. This method was applied
   to single-cell RNA-seq data of melanoma, and eight cell types were
   identified. Enrichment analysis of the candidate cell marker genes for
   the two key cell types showed that both key cell types were closely
   related to the physiological activities of the MHC. Through
   identification and analysis of key melanoma-related cell types, we
   explored the molecular mechanism of melanoma, providing insight into
   melanoma research. Moreover, the candidate cell markers for the two key
   cell types are potential therapeutic targets for melanoma.

Acknowledgements