Highlights

     * Includes an intuitive graphical user interface for interactive
       analysis of scRNA-seq data
     * Allows non-computational users to analyze scRNA-seq data with
       end-to-end workflows
     * Provides interoperability between tools across different
       programming environments
     * Produces HTML reports for reproducibility and easy sharing of
       results

The bigger picture

   Single-cell data can be used to understand complex biological systems.
   However, many single-cell analysis tools can only be used by trained
   computational biologists and are scattered across different programming
   languages. The Single Cell Toolkit (SCTK) is a software package that
   brings together many different tools in one place and allows
   non-computational users to analyze their own data using a graphical
   user interface. Overall, SCTK gives computational and non-computational
   researchers the ability to access a wide variety of single-cell tools
   to perform complex analysis workflows.
     __________________________________________________________________

   The Single Cell Toolkit (SCTK) is a software package that gives
   computational and non-computational researchers the ability to utilize
   a wide variety of tools and complex workflows for single-cell analysis.

Introduction

   Single-cell RNA sequencing (scRNA-seq) is a molecular assay that can
   quantify the levels of mRNA transcripts for each gene in individual
   cells. This approach can be used to gain insights into cellular
   heterogeneity not previously possible with “bulk” transcriptomic
   assays.[50]^1^,[51]^2 Profiling the transcriptome of individual cells
   has revealed novel cell subpopulations in normal tissues and cell
   states associated with the pathogenesis of complex diseases.[52]^3 A
   large number of tools and software packages are available to perform
   different steps of scRNA-seq data analysis. However, these tools are
   spread across different programming environments and rely on different
   data structures for input of data or output of results. As the
   interoperability for tools between platforms is lacking, users
   generally have to choose a single analysis workflow or spend
   considerable effort manually converting data between environments
   running different tools and integrating results.[53]^4 Moreover, many
   researchers without strong computational backgrounds are generating
   scRNA-seq data but do not have the necessary training for analysis and
   interpretation.

   Currently, there are limited options for frameworks that allow for
   interoperability of tools across environments and contain a graphical
   user interface (GUI) for non-computational users to perform flexible
   end-to-end analysis.[54]^5^,[55]^6^,[56]^7^,[57]^8^,[58]^9 While some
   web applications are available for the analysis of scRNA-seq data,
   there are no online tools that can import data from a variety of
   formats, perform comprehensive quality control and filtering, run
   flexible clustering and trajectory workflows, and apply a series of
   downstream analysis and visualization tools within an interactive
   interface amenable to users without a strong programming background. We
   have previously developed the Single Cell Toolkit (SCTK), which is
   implemented in the R/Bioconductor package singleCellTK for
   comprehensive importing and quality control of scRNA-seq data.[59]^10
   SCTK2 greatly expands our previous package[60]^10 with a variety of
   tools for normalization, integration, dimensionality reduction,
   clustering, trajectory analysis, cell-type labeling, pathway scoring,
   and visualization. SCTK2 facilitates interoperability between workflows
   and programming environments by integrating tools from Seurat,
   Bioconductor, and the Python-based Scanpy package. All of the
   end-to-end analysis workflows are accessible using a “point-and-click”
   GUI powered by a Shiny app to enable users without programming skills
   to analyze their own data. The R/Shiny app is available online at
   [61]https://sctk.bu.edu and gives users the ability to analyze their
   own data without access to substantial computational resources. When
   compared with existing tools, the SCTK2 framework offers more options
   for analysis, interactive visualization, and generation of HTML reports
   for reproducibility.

Results

Overview of the general framework

   singleCellTK (SCTK) is an R package that provides a uniform interface
   to popular scRNA-seq tools and workflows for quality control,
   clustering or trajectory analysis, and visualization. SCTK2 gives users
   the opportunity to seamlessly run different tools from different
   packages and environments during different stages of the analysis.
   Tools can be run by computational users in the R[62]^11 console, by
   non-computational users with an interactive GUI developed in
   R/Shiny,[63]^12 or with HTML reports generated with Rmarkdown. SCTK2
   utilizes multiple Bioconductor Experiment objects such as the
   SingleCellExperiment (SCE) as the primary data container for storing
   expression matrices, reduced dimensional representations, cell and
   feature annotations, and other tool outputs.[64]^13^,[65]^14

Flexible and comprehensive workflows for scRNA-seq analysis

   The SCTK2 workflow can be divided into three major
   components: (1) importing, quality control, and filtering, (2)
   normalization, dimensionality reduction, and clustering, and (3)
   various downstream analyses and visualizations for exploring biological
   patterns of the cell clusters ([66]Figure 1). For the first component,
   we have included the ability to import data from 11 different
   preprocessing tools or file formats including Seurat objects or SCE
   objects from R and AnnData objects from Python. SCTK2 generates
   standard quality control (QC) metrics such as the total number of
   counts, the features detected per cell, or the mitochondrial percentage
   using the scater R package.[67]^15 Doublet detection can be performed
   with 3 different tools from R and 1 tool from Python. Ambient RNA
   quantification and removal can be performed with DecontX[68]^16 or
   SoupX.[69]^17 For filtering, users can choose to exclude cells or genes
   based on one or a combination of QC metrics produced by the various QC
   tools.

Figure 1.

   Overview of analysis workflows available in SCTK2.

   Analysis of scRNA-seq data can be divided into three major parts:
   importing and quality control (QC), clustering workflows, and
   downstream analysis. For importing and QC (top), SCTK2 can import data
   from many different upstream preprocessing tools and formats. A variety
   of metrics for general QC, empty drop detection, doublet detection, and
   ambient RNA quantification can be calculated and displayed for each
   sample. For clustering workflows (middle), SCTK2 provides an à la carte
   workflow that allows users to pick and choose different tools at each
   step for normalization, batch correction, or integration,
   dimensionality reduction, and clustering. For downstream analysis
   (bottom), SCTK2 provides access to additional tools and analyses for
   differential expression, cell-type labeling, pathway analysis, and
   trajectory analysis. Overall, the toolkit provides a wide variety of
   methods for each part of the analysis workflow.

   The major steps for the clustering workflows include normalization,
   selection of highly variable genes (HVGs), dimensionality reduction
   such as principal-component analysis (PCA), clustering, and 2D
   embedding such as uniform manifold approximation and projection (UMAP).
   SCTK2 provides an “à la carte” workflow, which allows users to pick and
   choose different tools at each step ([72]Figure 1). Users also have the
   option to perform batch correction or integration after normalization
   with 8 tools including 6 from R and 2 from Python. SCTK2 also provides
   access to several “curated workflows” that allow users to select from
   specific tools or functions in predefined workflows from other packages
   ([73]Figure 2). Curated workflows include those from the
   Seurat[74]^18^,[75]^19^,[76]^20^,[77]^21 and celda[78]^22 packages in R
   and the scanpy[79]^23 package in Python. All three curated workflows
   can be used to cluster cells and produce 2D embeddings, whereas celda
   can also be used to cluster genes into co-expression modules.

Figure 2.

   Overview of curated analysis workflows

   In addition to the à la carte clustering workflow, SCTK2 provides
   access to workflows from the R packages Seurat and celda as well as the
   Python package scanpy. Users can recapitulate the analysis, results,
   and plots from each package all while using the common and unified
   interface in SCTK2 without having to know the underlying commands from
   each package. Functions for normalization, variable feature selection,
   dimensionality reduction, and clustering are available from the Seurat
   and scanpy workflows. Celda can be used to group cells into clusters
   and genes into modules.

   Downstream analyses after clustering include finding markers for cell
   clusters using differential expression (DE); DE analysis between
   user-specified conditions; automated cell-type labeling with
   SingleR[82]^24; pathway enrichment analysis with gene set variation
   analysis (GSVA),[83]^25 variance-adjusted Mahalanobis (VAM),[84]^26 or
   Enrichr[85]^27^,[86]^28; and trajectory analysis with TSCAN[87]^29
   ([88]Figure 1). All of these analyses can be applied after the à la
   carte or curated workflows. DE analysis can be performed with the
   Wilcoxon rank-sum test, MAST,[89]^30 limma,[90]^31 ANOVA, or
   DESeq2[91]^32 and visualized with heatmaps or volcano plots. The
   expression of individual genes can be displayed on 2D embeddings,
   violin plots, or boxplots. Finally, results from SCTK2 can be exported
   as flat text files (e.g., MTX, TXT, CSV), an SCE object, a Seurat
   object, or an AnnData[92]^23^,[93]^33 object to allow for further
   analysis and integration with other tools.

Interactive analysis with the SCTK2 GUI

   Users without a strong programming background can analyze scRNA-seq
   data with the interactive GUI built with Shiny and available at
   [94]https://sctk.bu.edu ([95]Figure 3A). The major steps in the à la
   carte analysis are accessible via the menus in the top navigation bar.
   Within each major section, parameters to run tools can be selected in
   the left panel, and results will be displayed in the right panel. Many
   plots can be customized with additional options such as the choice of
   the embedding in a scatterplot or choosing to color the points by a
   particular metric or label. A “next steps” panel provides a
   “wizard”-like guide by suggesting links to the most common next steps.
   The curated workflows can be run using a “vertical tab” walkthrough
   format ([96]Figure 3B). SCTK2 also has a general visualization tab
   called the “Cell Viewer,” which supports functionality to generate and
   visualize custom scatterplots, bar plots, and violin plots for
   user-selected genes or gene sets. Additionally, a generic heatmap
   plotting tab can be used to visualize the expression levels of multiple
   features from an expression matrix along with a variety of cell or
   feature annotations. The majority of plots are made interactive with
   the plotly[97]^34 package and can be highlighted, cropped, zoomed in
   on, and saved in various formats.

Figure 3.

   Interactive analysis of scRNA-seq data with a graphical user interface
   (GUI)

   SCTK2 allows non-computational users to analyze scRNA-seq data using an
   interactive GUI built with R/Shiny that can be hosted on a web server.

   (A) (1) The menu bar allows the users to navigate through the main
   sections including data importing, QC, the à la carte clustering
   workflow, and downstream analysis. (2) Within each major section,
   parameters to run tools can be selected in the left panel. (3) Results
   and plots will be displayed in the right panel. (4) Many plots can be
   customized with additional options such as changing the color of points
   to reflect different phenotypes. (5) A “next steps” panel provides a
   “wizard”-like guide by suggesting links to the recommended next steps.

   (B) The curated workflows for Seurat, celda, and scanpy can be used to
   run a series of predefined steps using vertical tabs. (1) Curated
   workflows can be selected from the top navigation menu bar. The Seurat
   curated workflow is shown as an example. (2) Steps for normalization,
   feature selection, dimensionality reduction, clustering, 2D embedding,
   and finding markers can be selected and run using the vertical tabs.
   (3) Within each major section, parameters to run tools can be selected
   in the left panel, and (4) results and plots will be displayed in the
   right panel. (5) Within the Seurat curated workflow, an extra section
   is given for exploring expression of features using UMAPs, heatmaps,
   and violin plots.

Reproducible and sharable analysis with HTML reports

   SCTK2 can generate HTML reports for QC tools, DE results, differential
   abundance (DA) results, and the curated workflows. These reporting
   tools can be used to plot and share a previously run analysis or start
   a new analysis workflow de novo with user-specified parameters. The
   output of these functions is a comprehensive HTML report that describes
   the input data, run parameters, and results with the standard
   visualizations. These reports provide reproducibility and offer a quick
   and easy way to explore and share the results of an individual analysis
   or whole workflow. For example, the “Seurat run” and “Seurat results”
   reports allow users to recapitulate the entire Seurat curated workflow
   ([100]Figure 4). Users can select and review each step of the workflow
   using the content menu on the left. Each section contains a description
   for that step of the workflow along with the plot or results that were
   produced by that step. Code used to produce the plots can also be
   viewed. For example, the “clustering” section shows different choices
   of the “resolution” parameter in different tabs to allow users to
   explore different sets of cluster labels. The “Seurat results” section
   allows users to view heatmaps and UMAPs of the marker genes for each
   cluster. These reports can be generated with a single command and thus
   streamline the process of generating sharable figures and tables along
   with descriptions.

Figure 4.

   Facilitating reproducibility and sharing of results with HTML reports

   SCTK2 provides the ability to generate HTML reports for several
   individual analyses or entire workflows to enable reproducibility and
   facilitate sharing of results. An HTML report for clustering of PBMC
   data with Seurat is shown as an example. (1) Different steps that were
   run in the workflow can be selected with the content menu on the left
   of the report. (2) In each section, a description of the step or tool
   and the selected parameters are shown at the top, and (3) the code used
   to produce the plot can be expanded. (4) The results and plots are
   shown on the right side. The “clustering” section shows different
   choices of the “resolution” parameter in different tabs to allow users
   to easily explore different sets of cluster labels.

Benchmarking

   We benchmarked the ability of SCTK2 to analyze four datasets of
   different sizes. Two datasets of peripheral blood mononuclear cells
   (PBMCs) were obtained from 10× Genomics that contained 5,419 (pbmc6k)
   and 68,579 cells (pbmc68k). Two more datasets of immune cells were
   obtained from the “1M Immune Cells” project from the Human Cell Atlas
   that contained 100,000 (immune100k) and 300,000 cells (immune300k). The
   workflow consisted of the following steps: importing data from sparse
   matrix files, generating QC metrics, filtering, normalization, variable
   feature selection, dimension reduction, 2D embedding, clustering, and
   marker detection. We recorded the RAM usage for the SCE object after
   each step ([103]Figure 5A), the peak RAM allocation that was used
   during each step ([104]Figure 5B), and the time elapsed during each
   step ([105]Figure 5C). The largest RAM usage for the SCE object was
   6.23 GB and occurred after the marker detection step for the largest
   dataset. The largest peak RAM usage was 16.65 GB and occurred during
   the importing step of the largest dataset. The longest time
   elapsed was 80.46 min and occurred during the marker detection step
   for the largest dataset. These results demonstrate that the SCTK2 GUI
   deployed on a server with typical memory availability (e.g., 64 GB) can
   be used to analyze many standard single-cell datasets for several users
   at a time.

Figure 5.

   Benchmarking of RAM and CPU usage for datasets of different sizes

   RAM allocation and elapsed time were benchmarked for four datasets
   (pbmc6k, pbmc68k, immune100k, and immune300k) using a
   Bioconductor-based analysis workflow.

   (A) The RAM usage for the output SCE object after each step is shown
   for each dataset.

   (B) The peak RAM usage during each step is displayed for each dataset.
   (C) The time elapsed during each step is displayed for each dataset.
   The left part zooms in on the y axis of the right part.

Comparison with other tools with GUI for scRNA-seq analysis

   Some other tools and packages are available that provide a GUI for
   scRNA-seq data analysis. We compared the availability of supported
   methods between SCTK2 and Pegasus,[108]^5 ASAP,[109]^6^,[110]^7
   BingleSeq,[111]^8 and CReSCENT[112]^9 ([113]Table S1). Generally, SCTK2
   supports more methods and options for the various stages of a typical
   scRNA-seq analysis. Particularly, SCTK2 has more options for importing
   from different data sources and supports more QC algorithms. Similar to
   SCTK2, several methods and workflows are available in Pegasus. However,
   the GUI in Pegasus is only available via Jupyter Notebooks in the Terra
   Cloud platform, and non-computational users need to have access to a
   Cloud account and a Terra workspace before they can fully utilize this
   tool. Options for ASAP that are not in SCTK2 include voom and DESeq2
   for normalization, M3Drop for variable feature detection, and Seurat
   Leiden, hierarchical, and SC3 methods for clustering. BingleSeq has
   Monocle for trajectory analysis and dot plots for visualization.
   Lastly, CReSCENT has dot plots for visualization. With respect to
   trajectory analyses, SCTK2 uses TSCAN, while Pegasus supports diffusion
   maps and BingleSeq includes Monocle.

Discussion

   SCTK2 provides an intuitive and easy-to-use GUI that integrates a
   variety of widely used methods into a single end-to-end workflow.
   Instead of having to switch between different graphic-based tools or
   learning a programming language to run a method that utilizes specific
   data structures, users can use the “point-and-click” GUI to access
   existing analysis methods for scRNA-seq data. Features available in the
   GUI include the ability to import scRNA-seq data from a variety of
   formats; import and edit annotations for genes and cells; run QC
   analysis and apply filters; and perform normalization, dimensionality
   reduction, clustering, DE, pathway analysis, trajectory analysis, and
   interactive visualization. The ability to easily generate comprehensive
   HTML reports enables quick sharing between collaborators and
   reproducibility of results. As a large number of tools have been
   developed for scRNA-seq analysis, we prioritized tools that have
   been widely used and have stable code bases in standard repositories.
   We also performed benchmarking using datasets with different numbers of
   cells to demonstrate that our platform can analyze small, medium, and
   large datasets in a reasonable amount of time. In the future, the
   singleCellTK package will be updated to utilize the
   MultiAssayExperiment and ExperimentSubset packages to store and
   manipulate both multimodal data and subsets of existing datasets using
   a single underlying object and from the same interactive interface.
   Overall, these features make SCTK2 a convenient toolkit for the
   analysis of scRNA-seq data regardless of a person’s programming
   background.

Experimental procedures

Comprehensive importing

   SCTK2 enables importing data from the following preprocessing tools:
   CellRanger,[114]^35 Optimus, DropEst,[115]^36
   BUStools,[116]^37^,[117]^38 Seqc,[118]^39 STARSolo,[119]^40^,[120]^41
   and Alevin.[121]^42^,[122]^43 In all cases, SCTK2 parses the standard
   output directory structure from the preprocessing tools and
   automatically identifies the count files to import. These functions
   also support importing of count matrices stored in plain text files
   (e.g., MTX, CSV, and TSV formats), of SCE objects saved in an RDS file,
   and AnnData objects saved in a h5ad file. The Shiny GUI allows users to
   specify the location of files for multiple samples on their local
   device. The data for these samples are uploaded and combined into a
   single SCE object to use across analyses.
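
   As a rough illustration of what importing a plain-text count matrix
   involves, the Python sketch below parses the Matrix Market (MTX)
   coordinate format into a dense genes-by-cells matrix. The function name
   parse_mtx and the toy values are hypothetical and are not part of
   SCTK2; a real importer would keep the matrix sparse and attach gene and
   barcode names.

```python
def parse_mtx(text):
    """Parse Matrix Market coordinate text into a dense list-of-lists matrix."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header")
    # Drop the header and any comment lines, which start with '%'.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    n_rows, n_cols, n_entries = (int(x) for x in body[0].split())
    dense = [[0] * n_cols for _ in range(n_rows)]
    for ln in body[1:1 + n_entries]:
        i, j, value = ln.split()
        # MTX indices are 1-based: row = gene, column = cell.
        dense[int(i) - 1][int(j) - 1] = int(value)
    return dense


# Toy input: 3 genes x 2 cells with 3 nonzero entries (hypothetical values).
example = """%%MatrixMarket matrix coordinate integer general
% toy data
3 2 3
1 1 5
2 2 7
3 1 1
"""
matrix = parse_mtx(example)
```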

QC and filtering

   Performing comprehensive QC is necessary to remove poor-quality cells
   for downstream analysis of scRNA-seq data. Within droplet-based
   scRNA-seq data, droplets containing cells must be differentiated from
   empty droplets. Therefore, assessment of the data is required, for
   which various QC algorithms have been developed. In SCTK2, we support
   EmptyDrops[123]^44 and BarcodeRank[124]^45 tools in R for detection of
   empty droplets. General QC metrics include the total number of counts,
   the number of features detected, and the percentage of mitochondrial
   reads. Tools for doublet detection include Scrublet[125]^46 from Python
   and, from R, scDblFinder[126]^47; cxds, bcds, and a hybrid of cxds and
   bcds from scds[127]^48; and doubletFinder.[128]^49 Tools for detection
   and removal of ambient RNA include decontX[129]^16 and SoupX.[130]^17 The
   metrics computed from these algorithms can be visualized on a 2D
   embedding or violin plot. Based on these metrics, users can filter the
   cells by selecting an appropriate metric and a cutoff value. The
   filtered data are stored in a separate SCE object and can be utilized
   in all subsequent analyses.
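
   The general QC metrics and the cutoff-based filtering described above
   can be sketched in a few lines of Python. This is only an illustration
   under stated assumptions: the function names, the "MT-" gene prefix
   used to flag mitochondrial genes, and the cutoff values are all
   hypothetical and do not reflect SCTK2 defaults.

```python
# Toy count matrix: rows are genes, columns are cells (hypothetical values).
genes = ["MT-CO1", "MT-ND1", "CD3E", "MS4A1", "LYZ"]
counts = [
    [5, 40, 0],   # MT-CO1
    [5, 20, 1],   # MT-ND1
    [30, 10, 0],  # CD3E
    [0, 20, 0],   # MS4A1
    [60, 10, 2],  # LYZ
]


def per_cell_qc(counts, genes, mito_prefix="MT-"):
    """Return total counts, detected features, and mito percent per cell."""
    metrics = []
    for column in zip(*counts):  # one column per cell
        total = sum(column)
        detected = sum(1 for c in column if c > 0)
        mito = sum(c for g, c in zip(genes, column) if g.startswith(mito_prefix))
        mito_percent = 100.0 * mito / total if total > 0 else 0.0
        metrics.append(
            {"total": total, "detected": detected, "mito_percent": mito_percent}
        )
    return metrics


def filter_cells(metrics, min_total=10, max_mito_percent=20.0):
    """Return indices of cells passing both cutoffs (illustrative values)."""
    return [
        j for j, m in enumerate(metrics)
        if m["total"] >= min_total and m["mito_percent"] <= max_mito_percent
    ]


qc = per_cell_qc(counts, genes)
kept = filter_cells(qc)
```

   In this toy example, only the first cell passes: the second has a high
   mitochondrial percentage and the third has too few counts.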

À la carte analysis workflow

   The à la carte analysis workflow comprises the main interface and the
   functions of the toolkit that let users pick different methods and
   options for the various steps of the analysis workflow
   including normalization, batch correction or integration, feature
   selection, dimensionality reduction, 2D embedding, and clustering.

Normalization

   SCTK2 offers a convenient way to normalize data for downstream analysis
   using a number of methods available through the toolkit. Normalization
   methods available with the toolkit include “LogNormalize,” “CLR,” “RC”
   and “SCTransform” from the Seurat R package, and “logNormCounts” and
   “CPM” from the scater R package. Additional transformation options are
   available for users including “log,” “log1p,” trimming of data assays,
   and Z score scaling.
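
   To make the arithmetic behind some of these options concrete, the
   Python sketch below shows counts per million (CPM), a log
   normalization in the style of Seurat's “LogNormalize” (counts divided
   by the cell total, multiplied by a scale factor, then log1p
   transformed), and Z score scaling. The function names are hypothetical
   and the code operates on a single cell or gene vector; it is not the
   toolkit's implementation.

```python
import math


def cpm(cell_counts):
    """Counts per million for one cell."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0] * len(cell_counts)
    return [c * 1e6 / total for c in cell_counts]


def log_normalize(cell_counts, scale_factor=10000.0):
    """Scale counts by the cell total and scale factor, then apply log1p."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0] * len(cell_counts)
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


def z_score(values):
    """Center and scale one gene's values using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return [0.0] * n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


cell = [1, 3, 6]  # hypothetical counts for three genes in one cell
cell_cpm = cpm(cell)
cell_log = log_normalize(cell)
scaled = z_score([1.0, 2.0, 3.0])
```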

Batch correction and integration

   SCTK2 provides access to methods for batch correction and integration
   of samples from R packages including Batchelor (MNN),[131]^50 ComBat
   (sva),[132]^51^,[133]^52 limma,[134]^31 scMerge,[135]^53 Seurat, and
   ZINBWaVE,[136]^54 as well as Python packages including BBKNN[137]^55
   and Scanorama.[138]^56 These methods accept various types of input
   expression matrices (e.g., raw counts or log-normalized counts) and
   generate either a new corrected expression matrix or a low-dimensional
   representation of the integrated data.

Feature selection

   Several methods are available to compute and select the most variable
   features to use in the downstream analysis. Feature selection methods
   available with the toolkit include “vst,” “mean.var.plot,” and
   “dispersion” from the Seurat R package and “modelGeneVar” from the scran
   R package.[139]^57 The top variable genes can be visualized through the
   toolkit in a scatterplot of the genes or features using the
   mean-to-variance or mean-to-dispersion plot depending upon the
   algorithm used.
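
   The idea of ranking features by variability can be illustrated with a
   deliberately simplified Python sketch that scores each gene by its
   dispersion (variance divided by mean) and keeps the top n. This is an
   assumption-laden toy: the methods named above add steps such as mean
   binning or variance modeling that are omitted here, and the function
   names and gene labels are hypothetical.

```python
def mean_and_dispersion(values):
    """Return the mean and the variance-to-mean ratio for one gene."""
    n = len(values)
    mean = sum(values) / n
    if n < 2 or mean == 0:
        return mean, 0.0
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance / mean


def top_variable_features(expr, n_top):
    """Rank genes by dispersion (ties broken by name) and keep the top n."""
    scored = [(mean_and_dispersion(values)[1], gene) for gene, values in expr.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [gene for _, gene in scored[:n_top]]


# Hypothetical expression values for three genes across four cells.
expr = {
    "GENE_A": [1, 1, 1, 1],  # constant, dispersion 0
    "GENE_B": [0, 0, 4, 4],  # strongly variable
    "GENE_C": [1, 2, 1, 2],  # mildly variable
}
selected = top_variable_features(expr, n_top=2)
```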

Dimensionality reduction and 2D embedding

   The toolkit provides access to both PCA and ICA (independent-component
   analysis) algorithms from multiple packages for reducing the expression
   matrices into reduced dimensions. PCA is available from both the scater
   and Seurat R packages, while ICA is available only from Seurat. Reduced
   dimensions computed from these methods can be
   visualized through various plots including component plot, elbow plot,
   jackstraw plot, and heatmaps. 2D embedding methods available with the
   toolkit include “tSNE” and “UMAP” from the Seurat package, “tSNE” from
   the Rtsne package, and “UMAP” from the scater package. The results
   computed from these methods can also be visualized using a 2D
   scatterplot.

Clustering

   Graph-based clustering methods available within SCTK2 include
   “Walktrap,”[140]^58 “Louvain,”[141]^59 “infomap,”[142]^60
   “fastGreedy,”[143]^61 and “labelProp”[144]^62 from the scran R package
   or “Louvain,” “multilevel,”[145]^63 or “SLM”[146]^64 from the Seurat R
   package. Additionally, K-means methods can be run using
   “Hartigan-Wong,” “Lloyd,” or “MacQueen” algorithms from the stats R
   package.
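
   For readers unfamiliar with K-means, the Python sketch below
   implements the basic Lloyd iteration: assign each point to its nearest
   center, recompute each center as the mean of its members, and repeat
   until the centers stop changing. The function names, the fixed starting
   centers, and the toy 2D points (standing in for cells in a reduced
   dimensional space) are hypothetical; the stats R package implementation
   differs in initialization and other details.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, initial_centers, max_iter=100):
    """Basic Lloyd iteration; returns (labels, centers)."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep an empty cluster in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Two well-separated toy groups of cells in a 2D embedding.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = lloyd_kmeans(points, initial_centers=[(0, 0), (10, 10)])
```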

Curated workflows

   SCTK2 provides access to Seurat, Scanpy, and Celda analysis workflows
   through a streamlined and guided interface. Seurat is a widely used R
   package that implements various methods for processing and clustering
   of scRNA-seq data. Similarly, Scanpy is a Python package that also
   provides methods for analyzing scRNA-seq data. Celda is an R package
   that performs co-clustering of genes into modules and cells into
   subpopulations. In the SCTK2 GUI, all the steps of the Seurat, Scanpy,
   and Celda workflows can be run in a “step-by-step” fashion with the
   “vertical” layout. These curated workflows allow new or beginner users
   to quickly run an exploratory analysis of single-cell data without
   having to try too many combinations of parameters or tools.

DE and marker selection

   The toolkit supports group-versus-group DE analysis using one of five
   implemented methods: the Wilcoxon rank-sum test, MAST, limma, DESeq2,
   or ANOVA. Alternatively, users can also use the DE methods in a
   “find marker” analysis to identify the top marker genes for each group
   of cells against all the other cells. The results for both approaches
   can be viewed through tables that display the top differentially
   expressed genes or marker genes along with the metrics computed by the
   selected method.
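
   As an illustration of the simplest of these methods, the Python sketch
   below computes a Wilcoxon rank-sum (Mann-Whitney U) statistic for one
   gene between two groups of cells, with a two-sided p value from the
   normal approximation. It is a minimal sketch: tie and continuity
   corrections are omitted, the function names and expression values are
   hypothetical, and it is not the code path used by the toolkit.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(group1, group2):
    """Return (U statistic for group1, two-sided normal-approximation p value)."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    rank_sum = sum(ranks[:n1])
    u = rank_sum - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Hypothetical expression of one gene in two groups of cells.
u_stat, p_value = rank_sum_test([5, 6, 7], [1, 2, 3])
```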

Cell-type labeling

   Cell-type labeling from a reference can be performed with the SingleR
   package in R. SingleR works by comparing the expression profile of each
   single cell to an annotated reference dataset and labels each cell with
   a cell type of the highest likelihood. SingleR can also label clusters
   of cells instead of individual cells. The cell-type assignments of
   clusters or individual cells can be visualized on a 2D embedding in the
   same fashion as labels from de novo clustering algorithms.
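
   Reference-based labeling can be illustrated with a much simplified
   Python sketch that assigns a cell the label of the reference profile
   with which it has the highest Spearman rank correlation. This is an
   assumption for illustration only: SingleR's full procedure includes
   marker gene selection and fine-tuning steps that are omitted here, and
   the function names, labels, and expression values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def label_cell(cell, reference):
    """Return the best-correlated reference label and its score."""
    scores = {label: spearman(cell, profile) for label, profile in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference profiles over the same four genes.
reference = {"T cell": [9, 8, 1, 0], "B cell": [0, 1, 9, 8]}
label, score = label_cell([5, 7, 0, 1], reference)
```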

Pathway analysis

   Custom gene sets can be imported by the users or automatically
   downloaded from the MSigDB[147]^65 using the R package msigdb. Methods
   for scoring the levels of a gene set in each individual cell include
   VAM[148]^26 and GSVA in R.[149]^25 The scores for gene sets can be used
   in a DE analysis to compare different cell annotations such as cell
   type or experimental condition. The distribution of gene set scores can
   be visualized using violin plots. The EnrichR R
   package[150]^27^,[151]^28 can be used to determine if sets of genes are
   enriched for biological pathways in curated databases such as
   KEGG,[152]^66 GO,[153]^67 and MSigDB.
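
   The notion of a gene list being enriched for a pathway can be made
   concrete with a generic over-representation test. The Python sketch
   below computes a one-sided hypergeometric p value for the overlap
   between a selected gene list and a pathway gene set. It is a textbook
   calculation offered as an assumption, not a reproduction of Enrichr's
   own scoring, and the function name and gene identifiers are
   hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric probability of an overlap >= n_overlap."""
    upper = min(n_in_set, n_selected)
    favorable = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_universe, n_selected)


# Hypothetical universe of 10 genes, a 5-gene pathway, and a 5-gene hit list.
universe = {f"G{i}" for i in range(1, 11)}
pathway = {"G1", "G2", "G3", "G4", "G5"}
selected = {"G1", "G2", "G3", "G4", "G5"}
p_value = enrichment_p_value(
    len(universe), len(pathway), len(selected), len(pathway & selected)
)
```

   Here every selected gene falls in the pathway, so the p value is the
   probability of drawing exactly that set, 1 in 252.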

Trajectory analysis

   Cell trajectories can be constructed by building a cluster-based minimum
   spanning tree (MST) and estimating pseudotime on the paths with the
   TSCAN R package.[154]^29 Based on the trajectory, SCTK2 also provides
   TSCAN methods to test features that are differentially expressed on a
   path or between paths. The pseudotime value or the expression of DE
   features can be visualized on a 2D embedding with the MST projected and
   overlaid on it.
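
   The cluster-based MST idea can be sketched in Python as follows:
   compute one centroid per cluster, then connect the centroids with
   Prim's algorithm. Pseudotime estimation along the resulting paths is
   not shown. The function names, cluster labels, and coordinates are
   hypothetical, and TSCAN's implementation differs in detail.

```python
import math


def cluster_centroids(points, labels):
    """Average the coordinates of the points in each cluster."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(dim) / len(members) for dim in zip(*members))
        for label, members in groups.items()
    }


def minimum_spanning_tree(centers):
    """Prim's algorithm over cluster centroids; returns a list of edges."""
    names = sorted(centers)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Shortest edge from the current tree to any cluster outside it.
        _, a, b = min(
            (math.dist(centers[a], centers[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((a, b))
        in_tree.add(b)
    return edges


# Toy cells in a 2D embedding with hypothetical cluster labels.
points = [(0, 0), (0, 2), (4, 0), (4, 2), (20, 0), (20, 2)]
labels = ["c1", "c1", "c2", "c2", "c3", "c3"]
centers = cluster_centroids(points, labels)
tree = minimum_spanning_tree(centers)
```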

Benchmarking

   The pbmc6k and pbmc68k datasets were obtained using the
   importExampleData() function, which utilized the TENxPBMCData package
   (v.1.12.0) and the ExperimentHub package (v.2.2.1) to retrieve the
   data. The immune100k and immune300k datasets were retrieved and
   downsampled from the Human Cell Atlas Portal. All datasets were
   exported to MTX format. The workflow that was benchmarked included
   steps for (1) importing the data from an MTX file using the
   importFromFiles() function; (2) calculation of general QC metrics using
   the runPerCellQC() function; (3) normalization using the
   runNormalization() function with the “logNormCounts” method; (4) calculation of
   variable features using the runFeatureSelection() function with the
   “modelGeneVar” method; (5) dimensionality reduction using the
   runDimReduce() function with the “scaterPCA” method; (6) UMAP embedding
   using the runDimReduce() function with the “scaterUMAP” method; (7)
   clustering using the runScranSNN() function with the “Louvain” method;
   and (8) a differential gene expression analysis using the
   runFindMarker() function with the “Wilcox” method. For each of the
   steps, we used the peakRAM() function from the peakRAM package
   (v.1.0.2) to record the RAM used by the SCE object after the completion
   of each step and the peak RAM allocation used during each step, as well
   as the time elapsed for each step. All the analyses were performed on
   an x86_64 Linux cluster node, configured with 404 GB RAM and an Intel
   Xeon 2.80 GHz CPU with 32 cores.
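
   The per-step measurement strategy can be mimicked in Python with the
   standard library, as in the sketch below, which records elapsed time
   and peak traced memory for each step of a toy workflow. This is an
   analogy rather than the benchmark code: tracemalloc only tracks memory
   allocated by the Python interpreter, and the helper profile_step and
   the two stand-in workflow steps are hypothetical.

```python
import time
import tracemalloc


def profile_step(name, func, *args):
    """Run one step and record its elapsed time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    record = {"step": name, "elapsed_sec": elapsed, "peak_bytes": peak_bytes}
    return result, record


# Hypothetical stand-ins for workflow steps.
def make_counts(n_genes, n_cells):
    return [[(g * c) % 7 for c in range(n_cells)] for g in range(n_genes)]


def cell_totals(counts):
    return [sum(column) for column in zip(*counts)]


records = []
counts, record = profile_step("import", make_counts, 200, 50)
records.append(record)
totals, record = profile_step("qc", cell_totals, counts)
records.append(record)
```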

Data analysis

   All figures in the results were generated by analyzing the PBMC and
   immune datasets with the singleCellTK package v.2.8.1. Code to
   reproduce the analysis can be found at
   [155]https://github.com/campbio-manuscripts/SCTK2.

Resource availability

Lead contact

   Further information and requests for resources and reagents should be
   directed to and will be fulfilled by the lead contact, Joshua D.
   Campbell ([156]camp@bu.edu).

Materials availability

   This study did not generate new unique materials.

Acknowledgments