1 Introduction

1.1 Background and context

This report presents a reproducible analysis pipeline for a single-cell RNA-seq (scRNA-seq) dataset. The workflow is adapted from existing analysis scripts (Ziemann Lab, sc_macrophage repository) and materials from Workshop 6 of the bioinformatics workshop series (Ziemann Lab, 2025). Differential expression was performed using established Bioconductor workflows (Love et al., 2014). Pathway enrichment analysis was performed using the fgsea Bioconductor package, which implements a fast preranked gene set enrichment algorithm (Sergushichev, 2016). Gene sets were obtained from the Reactome pathway database (Jassal et al., 2020).

In my current work, scRNA-seq is used to investigate cellular and transcriptional dynamics of HIV infection in primary macrophages. The dataset analysed here includes samples from multiple donors and experimental batches, with cells stratified by HIV/GFP expression status. The complexity of this experimental design necessitates a robust and reusable analysis pipeline to enable consistent data processing, exploration, and presentation.

1.2 Purpose of the analysis

The primary aim of this analysis is to identify transcriptional and pathway-level differences between monocyte-derived macrophages (MDM) and alveolar monocyte–derived macrophages (AlvMDM). In addition, the analysis seeks to characterise gene expression and pathway changes associated with HIV infection. Identifying key differentially expressed genes and enriched pathways provides insight into biological mechanisms underlying macrophage identity and infection-driven responses. For the purpose of this analysis we will be investigating transcriptional differences between mock- and productively-infected (referred to as ‘active’ for simplicity) MDM.

1.3 Overview of the analysis workflow

The pipeline performs the following key steps:
1. Import, checking, and cleaning of scRNA-seq data
2. Basic quality control to assess read counts and mitochondrial content
3. Principal component analysis to quantify within and between group variation
4. Pseudobulk differential expression analysis
5. Visualisation of key results, including gene- and pathway-level summaries

Note: data acquisition and demultiplexing prior to step 1 are not performed in this script see https://github.com/markziemann/sc_macrophage/blob/main/scanalysis.Rmd for SingleCellExperiment (SCE) object generation. To summarise; Multiplexed 10x Genomics scRNA-seq libraries were generated from mock and HIV-infected macrophages (MDM, N=4) and alveolar macrophages (AlvMDM, N=3), with pooled donors and hashtag oligonucleotides (HTOs) where applicable. Raw FASTQ files were aligned to a combined human-HIV reference using Cell Ranger to produce filtered gene expression matrices and whitelisted cell barcodes. HTO counts were quantified with CITE-seq, and donor identities were assigned via SNP-based demultiplexing using cellSNP-lite and vireo. Low-quality or ambiguous cells were filtered, and experimental conditions and donor identities were validated with HTO signals and HIV transcript abundance. This outputs a SingleCellExperiment (SCE) object with gene expression, HTO counts, donor assignments, and QC metadata.

1.4 Relevance to current and future work

This analysis is directly relevant to my current research, where scRNA-seq datasets are being generated across multiple experimental conditions. The development of a reproducible pipeline enables consistent re-analysis as new samples are added and supports the efficient generation of figures and tables for reports, presentations, and future publications.

2 How to run this analysis

This report is written as an R Markdown document and should be rendered from top to bottom.

Requirements:
- R (≥ 4.2; tested with R 4.3.3)
- Internet access (for downloading the Reactome GMT file)
- The file sce.Rds downloaded from: https://ziemann-lab.net/public/anna/macrophage_may2025/sce.Rds

To run:
1. Place sce.Rds in the same directory as this Rmd file.
2. Open the Rmd in RStudio.
3. Click Knit to render the HTML report.

3 Set up

Load packages

library(SingleCellExperiment)
library(dplyr)
library(scran)
library(scater)
library(DESeq2)
library(gplots)
library(kableExtra)
library(Matrix)
library(fgsea)
library(irlba)
library(ggplot2)
library(patchwork)
library(BiocParallel)

4 Data loading and quality control (QC)

4.1 Data import

The scRNA-seq dataset was loaded into a SingleCellExperiment object.

sce <- readRDS("sce.Rds")
sce
## class: SingleCellExperiment 
## dim: 36622 24311 
## metadata(0):
## assays(2): counts logcounts
## rownames(36622): HIV-LTRR HIV-LTRU5 ... AC007325.4 AC007325.2
## rowData names(0):
## colnames(24311): mdm_mock1|AAACGAATCACATACG mdm_mock1|AAACGCTCATCAGCGC
##   ... react6|TTTGTTGTCTGAACGT react6|TTTGTTGTCTTGGCTC
## colData names(13): orig.ident nCount_RNA ... monaco_fine_pruned_labels
##   ident
## reducedDimNames(3): PCA HARMONY UMAP
## mainExpName: RNA
## altExpNames(0):

4.2 Per-cell QC metrics

# Remove old QC columns if present
qc_cols <- grep("^subsets_Mt|^sum$|^detected$|^total$", colnames(colData(sce)), value = TRUE)
if(length(qc_cols) > 0) {
  colData(sce)[, qc_cols] <- NULL
}

# Identify mitochondrial genes
is.mito <- grepl("^MT-", rownames(sce))

# Add per-cell QC metrics
sce <- addPerCellQC(sce, subsets = list(Mt = is.mito), assay.type = "counts")

# Create data frame for plotting
df <- as.data.frame(colData(sce))

4.3 QC Visualisation

Quality control metrics were visualised to assess gene detection rates and mitochondrial content across samples.

Note: alv preffix represents the AlvMDM cell type. Bystander, latent, and reactivated/‘react’ cell types are not being used in this example but exist in this dataset.

4.4 Cell and gene filtering

low_lib       <- sce$sum < 200 # to remove empty droplets / low sequencng depth
high_lib      <- sce$sum > 100000 # to remove doublets

low_detected  <- sce$detected < 1500 # to remove low-quality cells and low numbers of detected genes, would consider making this more lenient if analysing latent cells
high_detected <- sce$detected > 8000  # remove very complex/doublet-like cells

high_mito     <- sce$subsets_Mt_percent > 15  # fixed cutoff (standard)

#combine all filters
discard_cells <- low_lib | high_lib | low_detected | high_detected | high_mito

#subset
sce_sub <- sce[, !discard_cells]

counts_mat <- counts(sce_sub)

# Gene detection and abundance
genes_detected <- rowSums(counts_mat > 0)
genes_counts   <- rowSums(counts_mat)

# Keep informative genes
keep_genes <- genes_detected >= 4 & genes_counts >= 20

sce_sub <- sce_sub[keep_genes, ]

# Compute library size per cell if not already present
if(!"sum" %in% colnames(colData(sce_sub))) {
  sce$sum <- colSums(counts(sce_sub))
}

# Now compute total UMIs per demultiplexed sample
library_sizes <- tapply(
  sce_sub$sum,
  sce_sub$sample,
  sum
)

The numbers refer to different donors within the dataset.

5 Normalisation and feature selection

# Deconvolution-based size factors (robust for scRNA-seq)
register(MulticoreParam(2))  # use 2 cores
time_taken_deconvol <- system.time({
sce_sub <- computeSumFactors(sce_sub, BPPARAM=MulticoreParam(2)) # can take a while so use parallel computing
})
time_taken_deconvol # to check how long it takes to optimise fastest computation
sce_sub <- logNormCounts(sce_sub)

# Model gene variance
dec <- modelGeneVar(sce_sub)

# Select top 2000 HVGs
top_hvgs <- getTopHVGs(dec, n = 2000)

length(top_hvgs)

# Subset logcounts to HVGs for PCA
log_cpm <- logcounts(sce_sub)[top_hvgs, ]

# Transpose & scale
x_scaled <- scale(t(logcounts(sce_sub)[top_hvgs, ]))

6 Dimensionality reduction

6.1 Principal component analysis (PCA)

##    user  system elapsed 
##   1.956   0.246   2.215

Principal component analysis revealed separation between samples consistent with infection status, indicating structured transcriptional variation.

7 Differential expression analysis

7.1 Pseudobulk differential expression

# Exclude HIV genes
host_genes <- !grepl("^HIV-", rownames(counts(sce_sub)))
sce_host <- sce_sub[host_genes, ]

# Subset metadata & counts to your 8 groups of interest
keep_groups <- c("mdm_mock1","mdm_mock2","mdm_mock3","mdm_mock4",
                 "mdm_active1","mdm_active2","mdm_active3","mdm_active4")

meta <- as.data.frame(colData(sce_host))

meta_sub <- meta[meta$sample %in% keep_groups, , drop=FALSE]
counts_sub <- counts(sce_host)[, rownames(meta_sub)]
meta_sub$sample <- factor(meta_sub$sample)

# Pseudobulk aggregation (manual, similar to your rowsum)
pb_mat <- t(rowsum(t(counts_sub), group = meta_sub$sample))

dim(pb_mat)
## [1] 23268     8
summary(pb_mat)
##   mdm_active1       mdm_active2         mdm_active3         mdm_active4       
##  Min.   :      0   Min.   :      0.0   Min.   :      0.0   Min.   :      0.0  
##  1st Qu.:      4   1st Qu.:      3.0   1st Qu.:      1.0   1st Qu.:      2.0  
##  Median :     47   Median :     35.0   Median :     19.0   Median :     22.0  
##  Mean   :   1230   Mean   :    889.3   Mean   :    469.9   Mean   :    586.5  
##  3rd Qu.:    426   3rd Qu.:    322.0   3rd Qu.:    176.0   3rd Qu.:    211.0  
##  Max.   :4383127   Max.   :2637039.0   Max.   :1762057.0   Max.   :1675093.0  
##    mdm_mock1         mdm_mock2           mdm_mock3          mdm_mock4        
##  Min.   :      0   Min.   :      0.0   Min.   :     0.0   Min.   :      0.0  
##  1st Qu.:      4   1st Qu.:      3.0   1st Qu.:     0.0   1st Qu.:      3.0  
##  Median :     47   Median :     35.0   Median :     8.0   Median :     35.0  
##  Mean   :   1215   Mean   :    831.8   Mean   :   204.8   Mean   :    885.4  
##  3rd Qu.:    431   3rd Qu.:    322.0   3rd Qu.:    76.0   3rd Qu.:    335.0  
##  Max.   :2831345   Max.   :1343859.0   Max.   :506522.0   Max.   :1647158.0
# DESeq2 metadata
colData_pb <- data.frame(row.names = colnames(pb_mat), 
                         sample = factor(gsub("[0-9]+$", "", colnames(pb_mat)),
                         levels = c("mdm_mock", "mdm_active"))) # Set mock as reference


dds <- DESeqDataSetFromMatrix(countData = pb_mat, colData = colData_pb, design = ~ sample)
dds <- DESeq(dds)
res <- results(dds)

# Variance stabilising transform for heatmaps
vsd <- vst(dds, blind = FALSE)

# Combine DE results & normalized counts for ranking / heatmaps
dge <- cbind(as.data.frame(res), assay(vsd))
dge <- dge[order(dge$pvalue), ]

8 Visualisation of differential expression results

Differentially expressed genes were visualised using a heatmap of the top ranked genes across experimental groups.

9 Pathway enrichment analysis

# Ranked gene list for fgsea
stat <- res$stat
names(stat) <- rownames(res)
stat[is.na(stat)] <- 0

# Download GMT file if missing
GMT <- "ReactomePathways_2025-10-15.gmt"
if (!file.exists(GMT)) {
  download.file("https://ziemann-lab.net/public/bioinfo_workshop/ReactomePathways_2025-10-15.gmt", destfile=GMT)
}
gs <- gmtPathways(GMT)

# Run fgsea
fres <- fgsea(pathways=gs, stats=stat, minSize=5, nproc=4)
##   |                                                                              |                                                                      |   0%  |                                                                              |                                                                      |   1%  |                                                                              |=                                                                     |   1%  |                                                                              |=                                                                     |   2%  |                                                                              |==                                                                    |   2%  |                                                                              |==                                                                    |   3%  |                                                                              |===                                                                   |   4%  |                                                                              |===                                                                   |   5%  |                                                                              |====                                                                  |   6%  |                                                                              |=====                                                                 |   7%  |                                                                              |=====                                                                 |   8%  |                                                                              |======                                                                |   8%  |                                                                              |======                                                                |   9%  |                                                                              |=======                                                               |   9%  |                                                                              |=======                                                               |  10%  |                                                                              |========                                                              |  11%  |                                                                              |========                                                              |  12%  |                                                                              |=========                                                             |  12%  |                                                                              |=========                                                             |  13%  |                                                                              |==========                                                            |  14%  |                                                                              |==========                                                            |  15%  |                                                                              |===========                                                           |  15%  |                                                                              |===========                                                           |  16%  |                                                                              |============                                                          |  17%  |                                                                              |============                                                          |  18%  |                                                                              |=============                                                         |  18%  |                                                                              |=============                                                         |  19%  |                                                                              |==============                                                        |  19%  |                                                                              |==============                                                        |  20%  |                                                                              |===============                                                       |  21%  |                                                                              |===============                                                       |  22%  |                                                                              |================                                                      |  23%  |                                                                              |=================                                                     |  24%  |                                                                              |=================                                                     |  25%  |                                                                              |==================                                                    |  25%  |                                                                              |==================                                                    |  26%  |                                                                              |===================                                                   |  27%  |                                                                              |===================                                                   |  28%  |                                                                              |====================                                                  |  28%  |                                                                              |====================                                                  |  29%  |                                                                              |=====================                                                 |  30%  |                                                                              |======================                                                |  31%  |                                                                              |======================                                                |  32%  |                                                                              |=======================                                               |  33%  |                                                                              |========================                                              |  34%  |                                                                              |========================                                              |  35%  |                                                                              |=========================                                             |  35%  |                                                                              |=========================                                             |  36%  |                                                                              |==========================                                            |  36%  |                                                                              |==========================                                            |  37%  |                                                                              |==========================                                            |  38%  |                                                                              |===========================                                           |  38%  |                                                                              |===========================                                           |  39%  |                                                                              |============================                                          |  40%  |                                                                              |=============================                                         |  41%  |                                                                              |=============================                                         |  42%  |                                                                              |==============================                                        |  43%  |                                                                              |===============================                                       |  44%  |                                                                              |===============================                                       |  45%  |                                                                              |================================                                      |  45%  |                                                                              |================================                                      |  46%  |                                                                              |=================================                                     |  47%  |                                                                              |=================================                                     |  48%  |                                                                              |==================================                                    |  48%  |                                                                              |==================================                                    |  49%  |                                                                              |===================================                                   |  50%  |                                                                              |====================================                                  |  51%  |                                                                              |====================================                                  |  52%  |                                                                              |=====================================                                 |  52%  |                                                                              |=====================================                                 |  53%  |                                                                              |======================================                                |  54%  |                                                                              |======================================                                |  55%  |                                                                              |=======================================                               |  55%  |                                                                              |=======================================                               |  56%  |                                                                              |========================================                              |  57%  |                                                                              |=========================================                             |  58%  |                                                                              |=========================================                             |  59%  |                                                                              |==========================================                            |  60%  |                                                                              |===========================================                           |  61%  |                                                                              |===========================================                           |  62%  |                                                                              |============================================                          |  62%  |                                                                              |============================================                          |  63%  |                                                                              |============================================                          |  64%  |                                                                              |=============================================                         |  64%  |                                                                              |=============================================                         |  65%  |                                                                              |==============================================                        |  65%  |                                                                              |==============================================                        |  66%  |                                                                              |===============================================                       |  67%  |                                                                              |================================================                      |  68%  |                                                                              |================================================                      |  69%  |                                                                              |=================================================                     |  70%  |                                                                              |==================================================                    |  71%  |                                                                              |==================================================                    |  72%  |                                                                              |===================================================                   |  72%  |                                                                              |===================================================                   |  73%  |                                                                              |====================================================                  |  74%  |                                                                              |====================================================                  |  75%  |                                                                              |=====================================================                 |  75%  |                                                                              |=====================================================                 |  76%  |                                                                              |======================================================                |  77%  |                                                                              |=======================================================               |  78%  |                                                                              |=======================================================               |  79%  |                                                                              |========================================================              |  80%  |                                                                              |========================================================              |  81%  |                                                                              |=========================================================             |  81%  |                                                                              |=========================================================             |  82%  |                                                                              |==========================================================            |  82%  |                                                                              |==========================================================            |  83%  |                                                                              |===========================================================           |  84%  |                                                                              |===========================================================           |  85%  |                                                                              |============================================================          |  85%  |                                                                              |============================================================          |  86%  |                                                                              |=============================================================         |  87%  |                                                                              |=============================================================         |  88%  |                                                                              |==============================================================        |  88%  |                                                                              |==============================================================        |  89%  |                                                                              |===============================================================       |  90%  |                                                                              |===============================================================       |  91%  |                                                                              |================================================================      |  91%  |                                                                              |================================================================      |  92%  |                                                                              |=================================================================     |  92%  |                                                                              |=================================================================     |  93%  |                                                                              |==================================================================    |  94%  |                                                                              |===================================================================   |  95%  |                                                                              |===================================================================   |  96%  |                                                                              |====================================================================  |  97%  |                                                                              |====================================================================  |  98%  |                                                                              |===================================================================== |  98%  |                                                                              |===================================================================== |  99%  |                                                                              |======================================================================|  99%  |                                                                              |======================================================================| 100%
fsig <- subset(fres, padj<0.05)
## png 
##   2

Overall, the differential expression and pathway enrichment analyses reveal clear transcriptional differences between mock and HIV-active MDMs, with multiple immune-related pathways showing significant modulation, consistent with an infection-driven response.

10 Reflection on workshop series

The bioinformatics workshop series provided a structured progression from basic data skills to advanced analysis workflows that significantly broadened my understanding of programming and its application to biological data analysis. Early sessions on the Linux command line, R data structures, and R Markdown reports were particularly useful in developing a reproducible workflow for single-cell RNA-seq data. Intermediate R programming skills and hands-on omics analysis directly informed the pipeline presented here. Concepts such as differential expression, PCA, and pathway enrichment, although introduced in the context of bulk RNA-seq, were directly adaptable to my current work. The later sessions on workflow management, HPC usage, machine learning, and proteomics analysis, while not directly applied here, offered useful context and highlighted the versatility of programming tools in addressing diverse biological questions. Overall, the workshops increased my confidence in structuring reproducible, shareable analyses and adapting example code to new biological questions.

The series also strengthened my confidence in independently structuring analyses, documenting code, and producing reproducible results. I now feel equipped to tackle lab-based questions and generate analyses that can be shared or revisited as new data are added.

11 Feedback on the workshops

The workshops covered a wide range of topics, which sometimes made it challenging to consolidate learning before moving to the next focus. Nevertheless, this structure ultimately encouraged adaptability, prompting me to actively integrate and revisit ideas. Exposure to a wide range of tools and approaches has inspired me to explore novel methods and incorporate them into existing pipelines, enhancing both analytical depth and workflow efficiency. If possible for future iterations, having a few consecutive sessions on a similar theme could help participants reinforce key concepts and deepen understanding. Additionally, larger attendance or opportunities for peer interaction, such as in-person buddy systems, could support progress checks and collaborative learning if it is available.

12 References

Ziemann M. sc_macrophage: Single-cell macrophage RNA-seq analysis workflow. GitHub repository. https://github.com/markziemann/sc_macrophage.

Ziemann Lab (2024). Bioinformatics Workshop 6: RNA-seq analysis. https://ziemann-lab.net/public/bioinfo_workshop/workshop6.html.

Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology.

Sergushichev A. fgsea: Fast Gene Set Enrichment Analysis. Bioconductor package. 2016. https://bioconductor.org/packages/fgsea

Jassal B, et al. The Reactome pathway knowledgebase. Nucleic Acids Research. 2020;48(D1):D498–D503.

13 Session Information

For reproducibility.

sessionInfo()
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0  LAPACK version 3.10.0
## 
## locale:
## [1] C
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] BiocParallel_1.43.3         patchwork_1.3.2            
##  [3] irlba_2.3.5.1               fgsea_1.35.2               
##  [5] Matrix_1.7-4                kableExtra_1.4.0           
##  [7] gplots_3.2.0                DESeq2_1.49.2              
##  [9] scater_1.37.0               ggplot2_4.0.1              
## [11] scran_1.38.0                scuttle_1.19.0             
## [13] dplyr_1.1.4                 SingleCellExperiment_1.31.0
## [15] SummarizedExperiment_1.39.0 Biobase_2.69.0             
## [17] GenomicRanges_1.61.0        GenomeInfoDb_1.45.4        
## [19] IRanges_2.43.0              S4Vectors_0.47.0           
## [21] BiocGenerics_0.55.0         generics_0.1.4             
## [23] MatrixGenerics_1.21.0       matrixStats_1.5.0          
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-9        gridExtra_2.3       rlang_1.1.6        
##  [4] magrittr_2.0.4      compiler_4.5.2      systemfonts_1.3.1  
##  [7] vctrs_0.6.5         stringr_1.6.0       pkgconfig_2.0.3    
## [10] crayon_1.5.3        fastmap_1.2.0       XVector_0.49.0     
## [13] labeling_0.4.3      caTools_1.18.3      rmarkdown_2.30     
## [16] UCSC.utils_1.5.0    ggbeeswarm_0.7.2    xfun_0.54          
## [19] bluster_1.20.0      cachem_1.1.0        beachmat_2.25.1    
## [22] jsonlite_2.0.0      DelayedArray_0.35.1 parallel_4.5.2     
## [25] cluster_2.1.8.1     R6_2.6.1            bslib_0.9.0        
## [28] stringi_1.8.7       RColorBrewer_1.1-3  limma_3.65.1       
## [31] jquerylib_0.1.4     Rcpp_1.1.0          knitr_1.50         
## [34] igraph_2.2.1        tidyselect_1.2.1    rstudioapi_0.17.1  
## [37] dichromat_2.0-0.1   abind_1.4-8         yaml_2.3.10        
## [40] viridis_0.6.5       codetools_0.2-20    lattice_0.22-7     
## [43] tibble_3.3.0        withr_3.0.2         S7_0.2.1           
## [46] evaluate_1.0.5      xml2_1.5.0          pillar_1.11.1      
## [49] KernSmooth_2.23-26  scales_1.4.0        gtools_3.9.5       
## [52] glue_1.8.0          metapod_1.18.0      tools_4.5.2        
## [55] BiocNeighbors_2.3.1 data.table_1.17.8   ScaledMatrix_1.17.0
## [58] locfit_1.5-9.12     fastmatch_1.1-6     cowplot_1.2.0      
## [61] grid_4.5.2          edgeR_4.7.2         beeswarm_0.4.0     
## [64] BiocSingular_1.25.0 vipor_0.4.7         cli_3.6.5          
## [67] rsvd_1.0.5          textshaping_1.0.4   S4Arrays_1.9.1     
## [70] viridisLite_0.4.2   svglite_2.2.2       gtable_0.3.6       
## [73] sass_0.4.10         digest_0.6.39       SparseArray_1.9.0  
## [76] ggrepel_0.9.6       dqrng_0.4.1         farver_2.1.2       
## [79] htmltools_0.5.8.1   lifecycle_1.0.4     httr_1.4.7         
## [82] statmod_1.5.1