Single cell transcriptomes of macrophages with HIV 2024

Introduction

Ee are interested in comparing DEGs between the following comparison groups:

Latently- vs productively-infected cells (groups 3 vs 4).
Latently-infected vs bystander cells (2 vs 3).
Productively-infected vs mock (4 vs 1)
Mock vs bystander (1 vs 2).

As discussed, I think we will do these comparisons separately for both MDM and AlvMDM samples initially. Then, it would be of interest to compare DEGs for MDM vs AlvMDM for each of the four infection groups.

suppressPackageStartupMessages({
  library("kableExtra")
  library("ggplot2")
  library("plyr")
  library("Seurat")
  library("hdf5r")
  library("SingleCellExperiment")
  library("parallel")
  library("stringi")
  library("beeswarm")
  library("muscat")
  library("DESeq2")
  library("mitch")
  library("harmony")
  library("celldex")
  library("SingleR")
  library("limma")
  library("gplots")
})

Load data

Sample	Patient	Group
mock0	MDMP236	mock
mock1	MDMV236	mock
mock2	MDMO236	mock
mock3	MDMP239	mock
mock4	AlvP239	mock
mock5	AlvM239	mock
mock6	AlvN239	mock
active0	MDMP236	active
active1	MDMV236	active
active2	MDMO236	active
active3	MDMP239	active
active4	AlvP239	active
active5	AlvM239	active
active6	AlvN239	active
latent0	MDMP236	latent
latent1	MDMV236	latent
latent2	MDMO236	latent
latent3	MDMP239	latent
latent4	AlvP239	latent
latent5	AlvM239	latent
latent6	AlvN239	latent
bystander0	MDMP236	bystander
bystander1	MDMV236	bystander
bystander2	MDMO236	bystander
bystander3	MDMP239	bystander
bystander4	AlvP239	bystander
bystander5	AlvM239	bystander
bystander6	AlvN239	bystander
react6	AlvN239	reactivated

exclude react6

ss <- read.table("samplesheet.tsv",header=TRUE,row.names=1)

ss %>% kbl(caption="sample sheet") %>% kable_paper("hover", full_width = F)

sample sheet
	Patient	Group
mock0	MDMP236	mock
mock1	MDMV236	mock
mock2	MDMO236	mock
mock3	MDMP239	mock
mock4	AlvP239	mock
mock5	AlvM239	mock
mock6	AlvN239	mock
active0	MDMP236	active
active1	MDMV236	active
active2	MDMO236	active
active3	MDMP239	active
active4	AlvP239	active
active5	AlvM239	active
active6	AlvN239	active
latent0	MDMP236	latent
latent1	MDMV236	latent
latent2	MDMO236	latent
latent3	MDMP239	latent
latent4	AlvP239	latent
latent5	AlvM239	latent
latent6	AlvN239	latent
bystander0	MDMP236	bystander
bystander1	MDMV236	bystander
bystander2	MDMO236	bystander
bystander3	MDMP239	bystander
bystander4	AlvP239	bystander
bystander5	AlvM239	bystander
bystander6	AlvN239	bystander
react6	AlvN239	reactivated

mylist <- readRDS("macrophage_counts_nofilter.rds")

Make single cell experiment object

comb <- do.call(cbind,mylist)
sce <- SingleCellExperiment(list(counts=comb))
sce

## class: SingleCellExperiment 
## dim: 36603 28971 
## metadata(0):
## assays(1): counts
## rownames(36603): gene-HIV1-1 gene-HIV1-2 ... AC007325.4 AC007325.2
## rowData names(0):
## colnames(28971): mdm_mock1|AAACGAATCACATACG mdm_mock1|AAACGCTCATCAGCGC
##   ... react6|TTTGTTGTCTGAACGT react6|TTTGTTGTCTTGGCTC
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

Normalise data

cellmetadata <- data.frame(colnames(comb) ,sapply(strsplit(colnames(comb),"\\|"),"[[",1))
colnames(cellmetadata) <- c("cell","sample")
comb <- CreateSeuratObject(comb, project = "mac", assay = "RNA", meta.data = cellmetadata)
comb <- NormalizeData(comb)

## Normalizing layer: counts

comb <- FindVariableFeatures(comb, selection.method = "vst", nfeatures = 2000)

## Finding variable features for layer counts

comb <- ScaleData(comb)

## Centering and scaling data matrix

PCA and Cluster

comb <- RunPCA(comb, features = VariableFeatures(object = comb))

## PC_ 1 
## Positive:  MT-ATP8, MTRNR2L1, FABP4, AC105402.3, FABP3, gene-HIV1-2, HAMP, C1orf56, PVALB, AL136097.2 
##     GAPDH, BIRC7, UTS2, S100A9, C1QC, MT1G, MT1H, AC025154.2, CARD16, COX7A1 
##     BEX3, MT1A, AL450311.1, PPP1R1A, NUPR1, LILRA1, MT2A, CCL23, FXYD2, CCL15 
## Negative:  ARL15, NEAT1, DOCK3, EXOC4, MALAT1, RASAL2, FTX, LRMDA, DPYD, PLXDC2 
##     JMJD1C, TMEM117, FHIT, ZEB2, MITF, DENND4C, ATG7, FRMD4B, VPS13B, ZNF438 
##     MAML2, COP1, TPRG1, FMNL2, ATXN1, JARID2, ASAP1, ELMO1, FNDC3B, TCF12 
## PC_ 2 
## Positive:  CYSTM1, FAH, CD164, FDX1, GDF15, PSAT1, ATP6V1D, SAT1, TXN, BCAT1 
##     CCPG1, SLAMF9, CSTA, CTSL, HES2, HEBP2, S100A10, NUPR1, PHGDH, STMN1 
##     TCEA1, GARS, MARCO, SCD, RILPL2, SNHG5, RHOQ, CD48, PLIN2, SNHG32 
## Negative:  SPRED1, RCBTB2, KCNMA1, GCLC, FGL2, HLA-DRA, MRC1, HLA-DRB1, RTN1, SLCO2B1 
##     HLA-DPA1, AC020656.1, CD74, HLA-DPB1, PDGFC, LINC02345, DPYD, MERTK, C1QA, CCDC102B 
##     STAC, ATP8B4, NFATC3, RNF128, VAMP5, MX1, ATG7, TGFBI, XYLT1, CRIP1 
## PC_ 3 
## Positive:  NCAPH, CRABP2, TM4SF19, GAL, DUSP2, RGCC, CHI3L1, CCL22, TMEM114, ACAT2 
##     RGS20, TRIM54, LINC01010, LINC02244, TCTEX1D2, MGST1, GPC4, NUMB, CCND1, SYNGR1 
##     CLU, POLE4, SERTAD2, AC092353.2, SLC20A1, IL1RN, ZNF366, AC005280.2, OCSTAMP, GCLC 
## Negative:  MARCKS, CD14, TGFBI, BTG1, MS4A6A, HLA-DQA1, FPR3, CTSC, CD163, MPEG1 
##     MEF2C, AIF1, TIMP1, HLA-DPB1, C1QC, HLA-DQB1, SELENOP, RNASE1, NAMPT, HLA-DQA2 
##     TCF4, EPB41L3, ARL4C, NUPR1, ALDH2, SAMSN1, ZNF331, HIF1A, MS4A7, C1QA 
## PC_ 4 
## Positive:  MT-ATP8, MTRNR2L1, XIST, PDE4D, AL035446.2, AC073359.2, CCDC26, AL365295.1, CLVS1, PRKCE 
##     EPHB1, RAD51B, C5orf17, LINC01135, PLEKHA5, LINC02320, LY86-AS1, STX18-AS1, FHIT, LINC01473 
##     DMGDH, FTX, AC079142.1, PKD1L1, SLC22A15, AF117829.1, GUCY2F, SH3RF3, STOX2, MIR646HG 
## Negative:  HLA-DRB1, CD74, CTSB, HSP90B1, TUBA1B, GAPDH, HLA-DPA1, HLA-DRA, ACTG1, IFI30 
##     RGCC, AIF1, HSPA5, CYP1B1, HLA-DPB1, C1QA, C15orf48, LYZ, CA2, S100A4 
##     LGMN, TUBB, FBP1, TUBA1A, MMP9, RNASE6, HLA-DQB1, CHI3L1, CCL3, TXN 
## PC_ 5 
## Positive:  PTGDS, LINC02244, MMP9, MGST1, LYZ, RGCC, IFI30, CHI3L1, CLU, GCLC 
##     GCHFR, CRABP2, FABP3, VAMP5, TMEM176B, HLA-DRB1, SYNGR1, CTSB, NCAPH, ISG15 
##     CD74, NCF1, SOD2, SAT1, FGL2, HSPA5, CDKN2A, MX1, CLEC12A, S100A10 
## Negative:  TYMS, PCLAF, TK1, MKI67, MYBL2, CENPM, CLSPN, RRM2, BIRC5, SHCBP1 
##     CEP55, DIAPH3, CDK1, CENPK, PRC1, CENPF, ESCO2, HELLS, NUSAP1, TOP2A 
##     NCAPG, FAM111B, ANLN, TPX2, CCNA2, KIF11, ASPM, CENPU, MAD2L1, HMMR

comb <- RunHarmony(comb,"sample")

## Transposing data matrix

## Initializing state using k-means centroids initialization

## Harmony 1/10

## Harmony 2/10

## Harmony 3/10

## Harmony 4/10

## Harmony converged after 4 iterations

DimHeatmap(comb, dims = 1:6, cells = 500, balanced = TRUE)

ElbowPlot(comb)

comb <- JackStraw(comb, num.replicate = 100)
comb <- FindNeighbors(comb, dims = 1:10)

## Computing nearest neighbor graph

## Computing SNN

comb <- FindClusters(comb, resolution = 0.5)

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 28971
## Number of edges: 896422
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8952
## Number of communities: 12
## Elapsed time: 5 seconds

UMAP

comb <- RunUMAP(comb, dims = 1:10)

## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session

## 13:43:51 UMAP embedding parameters a = 0.9922 b = 1.112

## 13:43:51 Read 28971 rows and found 10 numeric columns

## 13:43:51 Using Annoy for neighbor search, n_neighbors = 30

## 13:43:51 Building Annoy index with metric = cosine, n_trees = 50

## 0%   10   20   30   40   50   60   70   80   90   100%

## [----|----|----|----|----|----|----|----|----|----|

## **************************************************|
## 13:43:53 Writing NN index file to temp file /tmp/RtmpuopyiY/file2952ea3bb493d7
## 13:43:53 Searching Annoy index using 1 thread, search_k = 3000
## 13:44:02 Annoy recall = 100%
## 13:44:03 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:44:05 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:44:05 Commencing optimization for 200 epochs, with 1186670 positive edges
## 13:44:13 Optimization finished

DimPlot(comb, reduction = "umap")

Assign names to clusters

ADGRE1, CCR2, CD169, CX3CR1, CD206, CD163, LYVE1, CD9, TREM2

HLA-DP, HLA-DM, HLA-DOA, HLA-DOB, HLA-DQ, and HLA-DR.

message("macrophage markers")

## macrophage markers

FeaturePlot(comb, features = c("ADGRE1", "CCR2", "SIGLEC1", "CX3CR1", "MRC1", "CD163", "LYVE1", "CD9", "TREM2"))

DimPlot(comb, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()

That’s pretty useless. Let’s use celldex pkg to annotate cells and get the macs.

ref <- celldex::MonacoImmuneData()

DefaultAssay(comb) <- "RNA"
comb2 <- as.SingleCellExperiment(comb)

lc <- logcounts(comb2)

pred_imm_broad <- SingleR(test=comb2, ref=ref,
                          labels=ref$label.main)

head(pred_imm_broad)

## DataFrame with 6 rows and 4 columns
##                                                    scores      labels
##                                                  <matrix> <character>
## mdm_mock1|AAACGAATCACATACG 0.307826:0.325650:0.169064:...   Monocytes
## mdm_mock1|AAACGCTCATCAGCGC 0.311008:0.281528:0.189016:...   Monocytes
## mdm_mock1|AAACGCTGTCGAGTGA 0.318344:0.277003:0.170895:...   Monocytes
## mdm_mock1|AAAGAACGTGCAACAG 0.214292:0.190767:0.121117:...   Monocytes
## mdm_mock1|AAAGGTAAGCCATATC 0.290416:0.291088:0.155001:...   Monocytes
## mdm_mock1|AAAGGTAGTTTCCAAG 0.292140:0.281397:0.184535:...   Monocytes
##                            delta.next pruned.labels
##                             <numeric>   <character>
## mdm_mock1|AAACGAATCACATACG  0.1786319     Monocytes
## mdm_mock1|AAACGCTCATCAGCGC  0.0338547     Monocytes
## mdm_mock1|AAACGCTGTCGAGTGA  0.1301308     Monocytes
## mdm_mock1|AAAGAACGTGCAACAG  0.1116322            NA
## mdm_mock1|AAAGGTAAGCCATATC  0.1794308     Monocytes
## mdm_mock1|AAAGGTAGTTTCCAAG  0.0952702     Monocytes

table(pred_imm_broad$pruned.labels)

## 
##       Basophils Dendritic cells       Monocytes     Neutrophils        NK cells 
##               4             657           24154              14               4 
##     Progenitors 
##               2

cellmetadata$label <- pred_imm_broad$pruned.labels

pred_imm_fine <- SingleR(test=comb2, ref=ref,
                          labels=ref$label.fine)
head(pred_imm_fine)

## DataFrame with 6 rows and 4 columns
##                                                     scores              labels
##                                                   <matrix>         <character>
## mdm_mock1|AAACGAATCACATACG 0.1836411:0.485683:0.206442:... Classical monocytes
## mdm_mock1|AAACGCTCATCAGCGC 0.1961234:0.430705:0.226843:... Classical monocytes
## mdm_mock1|AAACGCTGTCGAGTGA 0.1718613:0.442610:0.188544:... Classical monocytes
## mdm_mock1|AAAGAACGTGCAACAG 0.0943609:0.264157:0.120656:... Classical monocytes
## mdm_mock1|AAAGGTAAGCCATATC 0.1563980:0.415082:0.167965:... Classical monocytes
## mdm_mock1|AAAGGTAGTTTCCAAG 0.1857623:0.432131:0.207144:... Classical monocytes
##                            delta.next       pruned.labels
##                             <numeric>         <character>
## mdm_mock1|AAACGAATCACATACG  0.0626269 Classical monocytes
## mdm_mock1|AAACGCTCATCAGCGC  0.1150706 Classical monocytes
## mdm_mock1|AAACGCTGTCGAGTGA  0.0651352 Classical monocytes
## mdm_mock1|AAAGAACGTGCAACAG  0.0648711 Classical monocytes
## mdm_mock1|AAAGGTAAGCCATATC  0.1073274 Classical monocytes
## mdm_mock1|AAAGGTAGTTTCCAAG  0.1521533 Classical monocytes

table(pred_imm_fine$pruned.labels)

## 
##           Classical monocytes        Intermediate monocytes 
##                         21583                          3569 
##         Low-density basophils       Low-density neutrophils 
##                             2                            21 
##       Myeloid dendritic cells                 Naive B cells 
##                           461                             1 
##          Natural killer cells       Non classical monocytes 
##                             3                            64 
##            Non-Vd2 gd T cells  Plasmacytoid dendritic cells 
##                             1                            35 
##              Progenitor cells Terminal effector CD4 T cells 
##                             4                             1

cellmetadata$finelabel <- pred_imm_fine$pruned.labels

col_pal <- c('#e31a1c', '#ff7f00', "#999900", '#cc00ff', '#1f78b4', '#fdbf6f',
             '#33a02c', '#fb9a99', "#a6cee3", "#cc6699", "#b2df8a", "#99004d", "#66ff99",
             "#669999", "#006600", "#9966ff", "#cc9900", "#e6ccff", "#3399ff", "#ff66cc",
             "#ffcc66", "#003399")

annot_df <- data.frame(
  barcodes = rownames(pred_imm_broad),
  monaco_broad_annotation = pred_imm_broad$labels,
  monaco_broad_pruned_labels = pred_imm_broad$pruned.labels,
  monaco_fine_annotation = pred_imm_fine$labels,
  monaco_fine_pruned_labels = pred_imm_fine$pruned.labels
)

meta_inf <- comb@meta.data
meta_inf$cell_barcode <- colnames(comb)

meta_inf <- meta_inf %>% dplyr::left_join(y = annot_df,
                                          by = c("cell_barcode" = "barcodes"))
rownames(meta_inf) <- colnames(lc)

comb@meta.data <- meta_inf

DimPlot(comb, label=TRUE, group.by = "monaco_broad_annotation", reduction = "umap",
  cols = col_pal, pt.size = 0.5) + ggtitle("Annotation With the Monaco Reference Database")

DimPlot(comb, label=TRUE, group.by = "monaco_fine_annotation", reduction = "umap",
  cols = col_pal, pt.size = 0.5) + ggtitle("Annotation With the Monaco Reference Database")

Make MDM object

mdmlist <- mylist[grep("mdm",names(mylist))]
comb1 <- do.call(cbind,mdmlist)
sce1 <- SingleCellExperiment(list(counts=comb1))
sce1

## class: SingleCellExperiment 
## dim: 36603 12633 
## metadata(0):
## assays(1): counts
## rownames(36603): gene-HIV1-1 gene-HIV1-2 ... AC007325.4 AC007325.2
## rowData names(0):
## colnames(12633): mdm_mock1|AAACGAATCACATACG mdm_mock1|AAACGCTCATCAGCGC
##   ... mdm_bystander3|TTTCGATCACCTAAAC mdm_bystander3|TTTGACTGTATGTGTC
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

cellmetadata1 <- data.frame(colnames(comb1) ,sapply(strsplit(colnames(comb1),"\\|"),"[[",1))
colnames(cellmetadata1) <- c("cell","sample")
comb1 <- CreateSeuratObject(comb1, project = "mac", assay = "RNA", meta.data = cellmetadata1)
comb1 <- NormalizeData(comb1)

## Normalizing layer: counts

comb1 <- FindVariableFeatures(comb1, selection.method = "vst", nfeatures = 2000)

## Finding variable features for layer counts

comb1 <- ScaleData(comb1)

## Centering and scaling data matrix

comb1 <- RunPCA(comb1, features = VariableFeatures(object = comb1))

## PC_ 1 
## Positive:  FABP4, MT-ATP8, FABP3, GAL, TXN, gene-HIV1-2, PRDX6, STMN1, NUPR1, S100A10 
##     UTS2, CYSTM1, LILRA1, GAPDH, SLAMF9, NMB, S100A9, GSTM3, LDHB, SEL1L2 
##     S100P, GDF15, APOC2, HAMP, UCHL1, MT1G, SLC35F1, COX5B, MMP12, OCIAD2 
## Negative:  ARL15, NEAT1, DPYD, EXOC4, FTX, ZEB2, PLXDC2, LRMDA, DOCK3, RASAL2 
##     ATG7, JMJD1C, TRIO, MALAT1, MAML2, FHIT, VPS13B, TMEM117, MITF, ELMO1 
##     TCF12, COP1, JARID2, DENND4C, ATXN1, MEF2A, ARHGAP15, DOCK4, FMNL2, MED13L 
## PC_ 2 
## Positive:  TM4SF19, GPC4, CYSTM1, SPP1, ANO5, CD164, TXNRD1, MGST1, FNIP2, TXN 
##     C15orf48, ACAT2, FABP3, S100A10, CHI3L1, PRDX6, RETREG1, PSD3, RGCC, SCD 
##     COX5B, GDF15, TGM2, RGS20, CRABP2, MREG, CCL22, CSF1, GAL, HES2 
## Negative:  TGFBI, HLA-DPB1, HLA-DRA, MS4A6A, HLA-DPA1, HLA-DQB1, HLA-DQA1, CEBPD, MPEG1, C1QA 
##     ST8SIA4, FPR3, CD74, RNASE1, C1QC, CD163, TCF4, EPB41L3, CD14, LILRB2 
##     HLA-DRB5, ARL4C, PDE4B, MS4A7, SIPA1L1, AIF1, SEMA4A, FOS, HLA-DQA2, HLA-DOA 
## PC_ 3 
## Positive:  GSN, CYP1B1, GCLC, S100A4, LPL, DUSP2, TIMP3, CALR, LINC02345, NCAPH 
##     CRIP1, OCSTAMP, CAMK1, ID2, ACTB, LHFPL2, FAIM2, STAC, MRC1, SPRED1 
##     IGSF6, RGCC, CA2, CCND2, AC020656.1, RCBTB2, MNDA, IL1RN, LINC02408, PLCL1 
## Negative:  NUPR1, SAT1, XIST, PLEKHA5, PSAT1, CCPG1, BTG1, G0S2, STMN1, GDF15 
##     CLGN, SUPV3L1, MARCKS, C5orf17, CCDC26, PHGDH, PRKCE, NAMPT, GRB10, S100P 
##     HES2, RETREG1, AC073359.2, BCAT1, REV3L, BEX2, MAML3, AL035446.2, NIBAN1, DTNB 
## PC_ 4 
## Positive:  MT-ATP8, OSBP2, AC067751.1, TMEM117, TPRG1, AC092353.2, LINC02320, AC008591.1, LINC02408, KCNJ1 
##     NFATC3, PKD1L1, CT69, CPEB2, NOS1AP, PLCL1, AL136317.2, MICAL3, LY86-AS1, TTC28 
##     DNM3, ZNF366, XYLT1, DOCK3, ST5, ZFPM2, VWA8, MCU, AC093865.1, PDE4D 
## Negative:  IFI30, SRGN, CTSL, GAPDH, SAT1, BLVRB, LGMN, MARCKS, S100A10, ANXA1 
##     ACTG1, FUCA1, TXN, HSP90B1, PLIN2, TUBB, GLIPR1, HLA-C, CD164, ACTB 
##     CD74, TUBA1B, CD36, LDHB, TUBA1C, PLEK, HLA-DRB1, PRDX6, TMEM176A, HLA-DPA1 
## PC_ 5 
## Positive:  SLC35F1, CCL7, gene-HIV1-1, CSF1, TPM4, CCL3, gene-HIV1-2, ACTB, MSMO1, SQLE 
##     ACTG1, TCTEX1D2, LINC01091, INSIG1, MADD, PCSK6, PHLDA1, SPOCD1, C3, MACC1 
##     ATP13A3, TNFRSF9, CCL2, IL1RN, DHCR24, FMN1, B4GALT5, TM4SF19, LIMA1, PALM2-AKAP2 
## Negative:  PTGDS, CLU, LINC02244, SYNGR1, BX664727.3, HLA-C, COX5B, NCAPH, RNASE6, MGST1 
##     RAB6B, CSRP2, FOLR2, CCPG1, RARRES1, LINC01010, AKR7A2, CPE, MEIKIN, AL136317.2 
##     PEBP4, CAMP, LYZ, RCBTB2, AC015660.2, VAMP5, MT1E, SOD2, MT2A, MT1X

comb1 <- RunHarmony(comb1,"sample")

## Transposing data matrix

## Initializing state using k-means centroids initialization

## Harmony 1/10

## Harmony 2/10

## Harmony 3/10

## Harmony 4/10

## Harmony 5/10

## Harmony converged after 5 iterations

DimHeatmap(comb1, dims = 1:6, cells = 500, balanced = TRUE)

ElbowPlot(comb1)

comb1 <- JackStraw(comb1, num.replicate = 100)
comb1 <- FindNeighbors(comb1, dims = 1:10)

## Computing nearest neighbor graph

## Computing SNN

comb1 <- FindClusters(comb1, resolution = 0.5)

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 12633
## Number of edges: 400608
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8898
## Number of communities: 13
## Elapsed time: 1 seconds

comb1 <- RunUMAP(comb1, dims = 1:10)

## 13:49:44 UMAP embedding parameters a = 0.9922 b = 1.112

## 13:49:44 Read 12633 rows and found 10 numeric columns

## 13:49:44 Using Annoy for neighbor search, n_neighbors = 30

## 13:49:44 Building Annoy index with metric = cosine, n_trees = 50

## 0%   10   20   30   40   50   60   70   80   90   100%

## [----|----|----|----|----|----|----|----|----|----|

## **************************************************|
## 13:49:45 Writing NN index file to temp file /tmp/RtmpuopyiY/file2952ea25b5fc8d
## 13:49:45 Searching Annoy index using 1 thread, search_k = 3000
## 13:49:49 Annoy recall = 100%
## 13:49:50 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:49:51 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:49:52 Commencing optimization for 200 epochs, with 511998 positive edges
## 13:49:56 Optimization finished

DimPlot(comb1, reduction = "umap")

message("macrophage markers")

## macrophage markers

FeaturePlot(comb1, features = c("ADGRE1", "CCR2", "SIGLEC1", "CX3CR1", "MRC1", "CD163", "LYVE1", "CD9", "TREM2"))

DimPlot(comb1, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()

ref <- celldex::MonacoImmuneData()
DefaultAssay(comb1) <- "RNA"
comb21 <- as.SingleCellExperiment(comb1)
lc1 <- logcounts(comb21)
pred_imm_broad1 <- SingleR(test=comb21, ref=ref,labels=ref$label.main)
head(pred_imm_broad1)

## DataFrame with 6 rows and 4 columns
##                                                    scores      labels
##                                                  <matrix> <character>
## mdm_mock1|AAACGAATCACATACG 0.307826:0.325650:0.169064:...   Monocytes
## mdm_mock1|AAACGCTCATCAGCGC 0.311008:0.281528:0.189016:...   Monocytes
## mdm_mock1|AAACGCTGTCGAGTGA 0.318344:0.277003:0.170895:...   Monocytes
## mdm_mock1|AAAGAACGTGCAACAG 0.214292:0.190767:0.121117:...   Monocytes
## mdm_mock1|AAAGGTAAGCCATATC 0.290416:0.291088:0.155001:...   Monocytes
## mdm_mock1|AAAGGTAGTTTCCAAG 0.292140:0.281397:0.184535:...   Monocytes
##                            delta.next pruned.labels
##                             <numeric>   <character>
## mdm_mock1|AAACGAATCACATACG  0.1786319     Monocytes
## mdm_mock1|AAACGCTCATCAGCGC  0.0338547     Monocytes
## mdm_mock1|AAACGCTGTCGAGTGA  0.1301308     Monocytes
## mdm_mock1|AAAGAACGTGCAACAG  0.1116322            NA
## mdm_mock1|AAAGGTAAGCCATATC  0.1794308     Monocytes
## mdm_mock1|AAAGGTAGTTTCCAAG  0.0952702     Monocytes

table(pred_imm_broad1$pruned.labels)

## 
##       Basophils Dendritic cells       Monocytes     Neutrophils 
##               2             249           10174               2

cellmetadata1$label <- pred_imm_broad1$pruned.labels
pred_imm_fine1 <- SingleR(test=comb21, ref=ref, labels=ref$label.fine)
head(pred_imm_fine1)

## DataFrame with 6 rows and 4 columns
##                                                     scores              labels
##                                                   <matrix>         <character>
## mdm_mock1|AAACGAATCACATACG 0.1836411:0.485683:0.206442:... Classical monocytes
## mdm_mock1|AAACGCTCATCAGCGC 0.1961234:0.430705:0.226843:... Classical monocytes
## mdm_mock1|AAACGCTGTCGAGTGA 0.1718613:0.442610:0.188544:... Classical monocytes
## mdm_mock1|AAAGAACGTGCAACAG 0.0943609:0.264157:0.120656:... Classical monocytes
## mdm_mock1|AAAGGTAAGCCATATC 0.1563980:0.415082:0.167965:... Classical monocytes
## mdm_mock1|AAAGGTAGTTTCCAAG 0.1857623:0.432131:0.207144:... Classical monocytes
##                            delta.next       pruned.labels
##                             <numeric>         <character>
## mdm_mock1|AAACGAATCACATACG  0.0626269 Classical monocytes
## mdm_mock1|AAACGCTCATCAGCGC  0.1150706 Classical monocytes
## mdm_mock1|AAACGCTGTCGAGTGA  0.0651352 Classical monocytes
## mdm_mock1|AAAGAACGTGCAACAG  0.0648711 Classical monocytes
## mdm_mock1|AAAGGTAAGCCATATC  0.1073274 Classical monocytes
## mdm_mock1|AAAGGTAGTTTCCAAG  0.1521533 Classical monocytes

table(pred_imm_fine1$pruned.labels)

## 
##          Classical monocytes       Intermediate monocytes 
##                         9388                         1390 
##      Low-density neutrophils      Myeloid dendritic cells 
##                            3                          159 
##      Non classical monocytes Plasmacytoid dendritic cells 
##                           23                           15

cellmetadata1$finelabel <- pred_imm_fine1$pruned.labels
col_pal <- c('#e31a1c', '#ff7f00', "#999900", '#cc00ff', '#1f78b4', '#fdbf6f',
             '#33a02c', '#fb9a99', "#a6cee3", "#cc6699", "#b2df8a", "#99004d", "#66ff99",
             "#669999", "#006600", "#9966ff", "#cc9900", "#e6ccff", "#3399ff", "#ff66cc",
             "#ffcc66", "#003399")
annot_df1 <- data.frame(
  barcodes = rownames(pred_imm_broad1),
  monaco_broad_annotation = pred_imm_broad1$labels,
  monaco_broad_pruned_labels = pred_imm_broad1$pruned.labels,
  monaco_fine_annotation = pred_imm_fine1$labels,
  monaco_fine_pruned_labels = pred_imm_fine1$pruned.labels)

meta_inf1 <- comb1@meta.data
meta_inf1$cell_barcode <- colnames(comb1)
meta_inf1 <- meta_inf1 %>% dplyr::left_join(y = annot_df1, by = c("cell_barcode" = "barcodes"))
rownames(meta_inf1) <- colnames(lc1)
comb1@meta.data <- meta_inf1
DimPlot(comb1, label=TRUE, group.by = "monaco_broad_annotation", reduction = "umap",
  cols = col_pal, pt.size = 0.5) + ggtitle("Annotation With the Monaco Reference Database")

DimPlot(comb1, label=TRUE, group.by = "monaco_fine_annotation", reduction = "umap",
  cols = col_pal, pt.size = 0.5) + ggtitle("Annotation With the Monaco Reference Database")

message("extract mono")

## extract mono

mono <- comb1[,which(meta_inf1$monaco_broad_annotation == "Monocytes")]
mono_metainf1 <- meta_inf1[which(meta_inf1$monaco_broad_annotation == "Monocytes"),]
mono_metainf1 <- mono_metainf1[grep("monocytes",mono_metainf1$monaco_fine_pruned_labels),]
mono <- mono[,which(colnames(mono) %in% rownames(mono_metainf1))]
mono <- FindVariableFeatures(mono, selection.method = "vst", nfeatures = 2000)

## Finding variable features for layer counts

mono <- RunPCA(mono, features = VariableFeatures(object = mono))

## Warning in PrepDR5(object = object, features = features, layer = layer, : The
## following features were not available: MT-ND5, MT-ND2, MT-ND1, AL358779.1,
## TSPAN8, FAH, PFN1, TCEA1, GLUL, EPB41L1, DDIT3, EPHA6, TUBB4B, TFRC, CNIH3,
## RABGEF1, AL669970.1, ECE2, SNHG5, MCTP1, RHOF, FRMD4A, FARP1, RERE, MTRNR2L1,
## RAP1GDS1, GPAT3, SETD2, HSP90AA1, STRBP, BCL2A1, EDIL3, NPTX1, RCAN1, RREB1,
## CFD, DZIP1, PEAK1, ACP5, BIN2, CSTA, PPARGC1B, ATP6V1D.

## PC_ 1 
## Positive:  S100A10, GAPDH, TXN, FABP3, PRDX6, CYSTM1, FABP4, LDHB, COX5B, C15orf48 
##     BLVRB, S100A9, PSME2, SLAMF9, STMN1, GDF15, SPP1, NMB, LILRA1, GAL 
##     CTSL, TUBA1B, HAMP, UCHL1, CARD16, ANXA1, NUPR1, LILRA2, TUBA1A, GSTM4 
## Negative:  ARL15, FTX, DPYD, EXOC4, VPS13B, LRMDA, DOCK3, NEAT1, ELMO1, RASAL2 
##     ATG7, JMJD1C, TRIO, TMEM117, ZEB2, MAML2, FHIT, DOCK4, MALAT1, TCF12 
##     COP1, SPIDR, ATP9B, ARHGAP15, MED13L, ZSWIM6, MBD5, RAD51B, TPRG1, TBC1D5 
## PC_ 2 
## Positive:  TM4SF19, ANO5, GPC4, FNIP2, SPP1, TXNRD1, PSD3, CYSTM1, RETREG1, RGS20 
##     CD164, TCTEX1D2, MGST1, ACAT2, CCDC26, CCL22, MREG, SLC28A3, FABP3, SCD 
##     LINC01135, C15orf48, TXN, GDF15, GAL, HES2, CHI3L1, NIBAN1, COL8A2, RGCC 
## Negative:  TGFBI, HLA-DPB1, HLA-DRA, HLA-DPA1, CD74, HLA-DQB1, MS4A6A, HLA-DQA1, C1QA, CEBPD 
##     C1QC, MPEG1, FPR3, AIF1, CD163, MS4A7, CD14, HLA-DRB5, ST8SIA4, LILRB2 
##     RNASE1, TCF4, HLA-DRB1, FOS, EPB41L3, ARL4C, HLA-DQA2, TIMP1, SEMA4A, PDE4B 
## PC_ 3 
## Positive:  NUPR1, SAT1, BTG1, CCPG1, MARCKS, PSAT1, G0S2, XIST, STMN1, PLEKHA5 
##     GDF15, SUPV3L1, CLGN, NAMPT, PHGDH, ADAMDEC1, HES2, S100P, BCAT1, CXCR4 
##     SDS, C5orf17, SLAMF9, REV3L, RETREG1, CCDC26, GRB10, PRKCE, BEX2, CD14 
## Negative:  GCLC, GSN, CYP1B1, DUSP2, LINC02345, S100A4, TIMP3, LPL, NCAPH, OCSTAMP 
##     CRIP1, STAC, CAMK1, FAIM2, CALR, SPRED1, ID2, LHFPL2, MRC1, LINC02408 
##     PLCL1, RCBTB2, HIVEP3, ACTB, CA2, RNF128, AC020656.1, CCND2, IGSF6, RGCC 
## PC_ 4 
## Positive:  MT-ATP8, PTGDS, AC008591.1, LY86-AS1, AL136317.2, RCBTB2, KCNJ1, PKD1L1, BX664727.3, AC067751.1 
##     EPHB1, LINC01010, MT1G, CLU, TMEM117, PDE4D, OSBP2, LINC02320, LINC02244, AC015660.2 
##     XYLT1, MT1H, CT69, KLRD1, LINC02408, CC2D2B, MT1X, AP000331.1, SPON2, KCNMA1 
## Negative:  ACTG1, ACTB, TPM4, TUBA1B, GAPDH, TUBB, GLIPR1, CSF1, CD36, CCL3 
##     CTSL, PLEK, LGMN, LDHA, MGLL, HSP90B1, TUBA1C, MARCKS, CALR, INSIG1 
##     MSMO1, SLC35F1, C15orf48, ANXA1, PHLDA1, BLVRB, CD164, TMEM176A, DUSP6, IFI30 
## PC_ 5 
## Positive:  CSF1, ACTB, TCTEX1D2, MSMO1, gene-HIV1-1, CCL3, gene-HIV1-2, TPM4, ACTG1, TM4SF19 
##     MADD, SPOCD1, CCL7, SLC35F1, PCSK6, UGCG, PHLDA1, PPARG, SPRED1, ATP13A3 
##     ANK2, IL1RN, CD36, CBLB, SQLE, TM4SF19-AS1, B4GALT5, TNFRSF9, NRP1, TIMP3 
## Negative:  TYMS, PCLAF, TK1, MKI67, MYBL2, CEP55, BIRC5, CENPM, DIAPH3, CENPK 
##     RRM2, KIF11, PRC1, CDK1, SHCBP1, CLSPN, CENPF, ANLN, RAD51AP1, NUSAP1 
##     CENPU, KIF4A, ESCO2, TOP2A, TPX2, ASPM, CIT, MAD2L1, FAM111B, KNL1

DimHeatmap(mono, dims = 1:2, cells = 500, balanced = TRUE)

DimHeatmap(mono, dims = 3:4, cells = 500, balanced = TRUE)

ElbowPlot(mono)

mono <- FindNeighbors(mono, dims = 1:4)

## Computing nearest neighbor graph
## Computing SNN

mono <- FindClusters(mono, algorithm = 3, resolution = 0.3, verbose = FALSE)
mono <- RunUMAP(mono, dims = 1:4)

## 13:50:52 UMAP embedding parameters a = 0.9922 b = 1.112
## 13:50:52 Read 10758 rows and found 4 numeric columns
## 13:50:52 Using Annoy for neighbor search, n_neighbors = 30
## 13:50:52 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 13:50:53 Writing NN index file to temp file /tmp/RtmpuopyiY/file2952ea6a14858b
## 13:50:53 Searching Annoy index using 1 thread, search_k = 3000
## 13:50:57 Annoy recall = 100%
## 13:50:58 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:51:01 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:51:01 Commencing optimization for 200 epochs, with 364062 positive edges
## 13:51:04 Optimization finished

DimPlot(mono, reduction = "umap", label=TRUE)

DimPlot(mono, group.by="monaco_fine_annotation" , reduction = "umap", label=TRUE)

DimPlot(mono, group.by="sample" , reduction = "umap", label=TRUE)

Make Alv object

alvlist <- mylist[grep("alv",names(mylist))]

comb1 <- do.call(cbind,alvlist)
sce1 <- SingleCellExperiment(list(counts=comb1))
sce1

## class: SingleCellExperiment 
## dim: 36603 13240 
## metadata(0):
## assays(1): counts
## rownames(36603): gene-HIV1-1 gene-HIV1-2 ... AC007325.4 AC007325.2
## rowData names(0):
## colnames(13240): alv_mock1|AAACCCAGTGCTGCAC alv_mock1|AAAGAACCATTGAGGG
##   ... alv_bystander4|TTTCGATGTGAGCAGT alv_bystander4|TTTGATCTCGGCTTGG
## colData names(0):
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

cellmetadata1 <- data.frame(colnames(comb1) ,sapply(strsplit(colnames(comb1),"\\|"),"[[",1))
colnames(cellmetadata1) <- c("cell","sample")
comb1 <- CreateSeuratObject(comb1, project = "mac", assay = "RNA", meta.data = cellmetadata1)
comb1 <- NormalizeData(comb1)

## Normalizing layer: counts

comb1 <- FindVariableFeatures(comb1, selection.method = "vst", nfeatures = 2000)

## Finding variable features for layer counts

comb1 <- ScaleData(comb1)

## Centering and scaling data matrix

comb1 <- RunPCA(comb1, features = VariableFeatures(object = comb1))

## PC_ 1 
## Positive:  FABP4, gene-HIV1-2, NUPR1, TIMP1, S100A8, STMN1, VSIG4, CAMP, RBP1, LILRA1 
##     ORM1, PRG2, HLA-DOA, CD79A, CCL5, PDPN, VMO1, MDK, LGALS2, UTS2 
##     AC078850.1, AC026369.3, RBP4, MS4A6E, PTGES, HCFC1-AS1, PPBP, CXCL10, FXYD2, LINC01467 
## Negative:  DOCK3, RASAL2, ARL15, MALAT1, NEAT1, ASAP1, PLXDC2, LRMDA, TMEM117, DPYD 
##     EXOC4, MITF, ATG7, FTX, FRMD4B, JMJD1C, FMNL2, ZNF438, UBE2E2, FHIT 
##     MAML2, LPP, CHD9, DENND4C, TPRG1, ZEB2, LRCH1, RAPGEF1, ELMO1, MEF2A 
## PC_ 2 
## Positive:  CYSTM1, CSTB, TM4SF19, GAL, ATP6V1D, FDX1, GM2A, CCL22, TGM2, ACAT2 
##     LGALS1, SCD, CD164, ELOC, FAH, PRDX6, CSF1, RHOF, GAPLINC, FCMR 
##     CYTOR, C15orf48, UPP1, GOLGA7B, CIR1, TNFRSF12A, NCAPH, SNHG32, DCSTAMP, BCAT1 
## Negative:  TRPS1, SAMSN1, SLC8A1, PDGFC, AL356124.1, TGFBI, CAMK1D, SPRED1, SLC9A9, ARHGAP15 
##     RTN1, RCBTB2, ATP8B4, MRC1, FCHSD2, CD14, XYLT1, FOS, HLA-DPB1, ME1 
##     C1QC, TCF7L2, AIF1, MS4A6A, MAML3, VAMP5, C1QA, DLEU2, TCF12, DIAPH2 
## PC_ 3 
## Positive:  KCNJ1, C2orf92, LINC02320, AC067751.1, AC015660.2, NCAPH, RGS20, LINC01800, AL136317.2, CCL22 
##     XIST, BX664727.3, LINC01010, AP000331.1, CT69, MICAL3, GDA, AC012668.3, ANO5, TRIM54 
##     AC008591.1, NOS1AP, AC092353.2, PDE4D, TCTEX1D2, LY86-AS1, TDRD3, MIR646HG, OSBP2, SLC25A48 
## Negative:  AIF1, IFI30, CTSZ, MARCKS, CD74, HLA-DRA, HLA-DPA1, LGMN, HLA-DRB1, C1QA 
##     HLA-DPB1, CTSL, CTSB, ACTB, FPR3, C1QC, BLVRB, ACTG1, HLA-DQB1, FUCA1 
##     TUBA1B, CTSC, HLA-DQA1, GAPDH, FCGR3A, IL4I1, TUBB, C1QB, FBP1, CD163 
## PC_ 4 
## Positive:  FMN1, MARCO, SLC11A1, XIST, SNCA, HAMP, S100A9, FRMD4A, CCDC26, BCAT1 
##     ME3, CLMP, SASH1, TDRD9, TTC39B, PHACTR1, SLC16A10, DLEU1, UGCG, AC023282.1 
##     PLEKHA5, SEL1L2, SLA, AL035446.2, MSR1, OSBP2, GYPC, DNAJC6, AP005262.2, B4GALT5 
## Negative:  PTGDS, GCLC, CRIP1, CLU, MX1, MMP9, TMEM176A, ALOX5AP, TUBA1A, FGL2 
##     PLEK, LINC02244, VAMP5, IFIT3, IFI6, MMP7, KCNMA1, DUSP2, RGCC, MGST1 
##     MX2, RNF128, SYNGR1, ISG15, NCAPH, IFIT1, C1QTNF4, MIF, FAIM2, HLA-DRB1 
## PC_ 5 
## Positive:  CFD, CTSZ, IFI30, SAT1, CTSB, FUCA1, LYZ, GCHFR, CD74, AC020656.1 
##     HLA-DRB1, CSTB, IFI6, CHI3L1, ATP1B1, C15orf48, GPX3, RARRES1, TFRC, HLA-DQB1 
##     HLA-DRB5, MS4A7, GM2A, RGCC, HLA-DRA, MMP9, CA2, HLA-DPB1, SELENOP, RASSF4 
## Negative:  TYMS, PCLAF, TK1, CLSPN, MKI67, CENPM, RRM2, DIAPH3, MYBL2, BIRC5 
##     CDK1, SHCBP1, ESCO2, CEP55, NCAPG, CENPK, FAM111B, NUSAP1, TOP2A, CENPF 
##     HELLS, ATAD2, PRC1, CCNA2, HMMR, ASPM, TPX2, KIF11, MCM10, DTL

comb1 <- RunHarmony(comb1,"sample")

## Transposing data matrix

## Initializing state using k-means centroids initialization

## Harmony 1/10

## Harmony 2/10

## Harmony 3/10

## Harmony 4/10

## Harmony converged after 4 iterations

DimHeatmap(comb1, dims = 1:6, cells = 500, balanced = TRUE)

ElbowPlot(comb1)

comb1 <- JackStraw(comb1, num.replicate = 100)
comb1 <- FindNeighbors(comb1, dims = 1:10)

## Computing nearest neighbor graph

## Computing SNN

comb1 <- FindClusters(comb1, resolution = 0.5)

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 13240
## Number of edges: 416149
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8777
## Number of communities: 11
## Elapsed time: 1 seconds

comb1 <- RunUMAP(comb1, dims = 1:10)

## 13:55:41 UMAP embedding parameters a = 0.9922 b = 1.112

## 13:55:41 Read 13240 rows and found 10 numeric columns

## 13:55:41 Using Annoy for neighbor search, n_neighbors = 30

## 13:55:41 Building Annoy index with metric = cosine, n_trees = 50

## 0%   10   20   30   40   50   60   70   80   90   100%

## [----|----|----|----|----|----|----|----|----|----|

## **************************************************|
## 13:55:42 Writing NN index file to temp file /tmp/RtmpuopyiY/file2952ea5e26fe47
## 13:55:42 Searching Annoy index using 1 thread, search_k = 3000
## 13:55:46 Annoy recall = 100%
## 13:55:47 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:55:50 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:55:50 Commencing optimization for 200 epochs, with 539610 positive edges
## 13:55:54 Optimization finished

DimPlot(comb1, reduction = "umap")

message("macrophage markers")

## macrophage markers

FeaturePlot(comb1, features = c("ADGRE1", "CCR2", "SIGLEC1", "CX3CR1", "MRC1", "CD163", "LYVE1", "CD9", "TREM2"))

DimPlot(comb1, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()

ref <- celldex::MonacoImmuneData()
DefaultAssay(comb1) <- "RNA"
comb21 <- as.SingleCellExperiment(comb1)
lc1 <- logcounts(comb21)
pred_imm_broad1 <- SingleR(test=comb21, ref=ref,labels=ref$label.main)
head(pred_imm_broad1)

## DataFrame with 6 rows and 4 columns
##                                                      scores          labels
##                                                    <matrix>     <character>
## alv_mock1|AAACCCAGTGCTGCAC 0.273681:0.2723954:0.1612672:...       Monocytes
## alv_mock1|AAAGAACCATTGAGGG 0.125532:0.0955108:0.0727378:... Dendritic cells
## alv_mock1|AAAGAACTCTCATGCC 0.093777:0.0888391:0.0837904:... Dendritic cells
## alv_mock1|AAAGGATAGCATGAAT 0.318637:0.3074635:0.1919116:...       Monocytes
## alv_mock1|AAAGGATAGTCAGGGT 0.294499:0.2756627:0.1829148:...       Monocytes
## alv_mock1|AAAGGATAGTTCCGGC 0.293979:0.2941562:0.1826853:...       Monocytes
##                            delta.next   pruned.labels
##                             <numeric>     <character>
## alv_mock1|AAACCCAGTGCTGCAC 0.13448850       Monocytes
## alv_mock1|AAAGAACCATTGAGGG 0.11133884 Dendritic cells
## alv_mock1|AAAGAACTCTCATGCC 0.00571364 Dendritic cells
## alv_mock1|AAAGGATAGCATGAAT 0.10466784       Monocytes
## alv_mock1|AAAGGATAGTCAGGGT 0.13985529       Monocytes
## alv_mock1|AAAGGATAGTTCCGGC 0.12840710       Monocytes

table(pred_imm_broad1$pruned.labels)

## 
##       Basophils Dendritic cells       Monocytes     Neutrophils        NK cells 
##               2             329           11305              10               3 
##     Progenitors 
##               1

cellmetadata1$label <- pred_imm_broad1$pruned.labels
pred_imm_fine1 <- SingleR(test=comb21, ref=ref, labels=ref$label.fine)
head(pred_imm_fine1)

## DataFrame with 6 rows and 4 columns
##                                                      scores
##                                                    <matrix>
## alv_mock1|AAACCCAGTGCTGCAC 0.1630294:0.398305:0.1767129:...
## alv_mock1|AAAGAACCATTGAGGG 0.0671221:0.151726:0.0730234:...
## alv_mock1|AAAGAACTCTCATGCC 0.0614206:0.101479:0.0761397:...
## alv_mock1|AAAGGATAGCATGAAT 0.1803723:0.435098:0.2088530:...
## alv_mock1|AAAGGATAGTCAGGGT 0.1703657:0.388786:0.1868922:...
## alv_mock1|AAAGGATAGTTCCGGC 0.1664892:0.422481:0.1894544:...
##                                            labels  delta.next
##                                       <character>   <numeric>
## alv_mock1|AAACCCAGTGCTGCAC    Classical monocytes 1.43376e-01
## alv_mock1|AAAGAACCATTGAGGG Myeloid dendritic ce.. 2.22045e-16
## alv_mock1|AAAGAACTCTCATGCC Myeloid dendritic ce.. 5.33208e-03
## alv_mock1|AAAGGATAGCATGAAT    Classical monocytes 1.21257e-01
## alv_mock1|AAAGGATAGTCAGGGT    Classical monocytes 5.02055e-02
## alv_mock1|AAAGGATAGTTCCGGC    Classical monocytes 9.94518e-02
##                                     pruned.labels
##                                       <character>
## alv_mock1|AAACCCAGTGCTGCAC    Classical monocytes
## alv_mock1|AAAGAACCATTGAGGG Myeloid dendritic ce..
## alv_mock1|AAAGAACTCTCATGCC Myeloid dendritic ce..
## alv_mock1|AAAGGATAGCATGAAT    Classical monocytes
## alv_mock1|AAAGGATAGTCAGGGT    Classical monocytes
## alv_mock1|AAAGGATAGTTCCGGC    Classical monocytes

table(pred_imm_fine1$pruned.labels)

## 
##           Classical monocytes        Intermediate monocytes 
##                          9996                          1550 
##         Low-density basophils       Low-density neutrophils 
##                             1                            14 
##       Myeloid dendritic cells          Natural killer cells 
##                           246                             3 
##       Non classical monocytes  Plasmacytoid dendritic cells 
##                            32                             9 
##              Progenitor cells Terminal effector CD4 T cells 
##                             4                             1

cellmetadata1$finelabel <- pred_imm_fine1$pruned.labels
col_pal <- c('#e31a1c', '#ff7f00', "#999900", '#cc00ff', '#1f78b4', '#fdbf6f',
             '#33a02c', '#fb9a99', "#a6cee3", "#cc6699", "#b2df8a", "#99004d", "#66ff99",
             "#669999", "#006600", "#9966ff", "#cc9900", "#e6ccff", "#3399ff", "#ff66cc",
             "#ffcc66", "#003399")
annot_df1 <- data.frame(
  barcodes = rownames(pred_imm_broad1),
  monaco_broad_annotation = pred_imm_broad1$labels,
  monaco_broad_pruned_labels = pred_imm_broad1$pruned.labels,
  monaco_fine_annotation = pred_imm_fine1$labels,
  monaco_fine_pruned_labels = pred_imm_fine1$pruned.labels)

meta_inf1 <- comb1@meta.data
meta_inf1$cell_barcode <- colnames(comb1)
meta_inf1 <- meta_inf1 %>% dplyr::left_join(y = annot_df1, by = c("cell_barcode" = "barcodes"))
rownames(meta_inf1) <- colnames(lc1)
comb1@meta.data <- meta_inf1
DimPlot(comb1, label=TRUE, group.by = "monaco_broad_annotation", reduction = "umap",
  cols = col_pal, pt.size = 0.5) + ggtitle("Annotation With the Monaco Reference Database")

DimPlot(comb1, label=TRUE, group.by = "monaco_fine_annotation", reduction = "umap",
  cols = col_pal, pt.size = 0.5) + ggtitle("Annotation With the Monaco Reference Database")

message("extract mono")

## extract mono

mono <- comb1[,which(meta_inf1$monaco_broad_annotation == "Monocytes")]
mono_metainf1 <- meta_inf1[which(meta_inf1$monaco_broad_annotation == "Monocytes"),]
mono_metainf1 <- mono_metainf1[grep("monocytes",mono_metainf1$monaco_fine_pruned_labels),]
mono <- mono[,which(colnames(mono) %in% rownames(mono_metainf1))]
mono <- FindVariableFeatures(mono, selection.method = "vst", nfeatures = 2000)

## Finding variable features for layer counts

mono <- RunPCA(mono, features = VariableFeatures(object = mono))

## Warning in PrepDR5(object = object, features = features, layer = layer, : The
## following features were not available: IGSF6, RPS20, TRIO, MOSPD1, DOP1B,
## SNX10, AC008443.9, SDC2, SCFD1, MCTP1, CRABP2, QKI, HTRA4, S100A6, ALDH1A1,
## ASPH, NABP1, C11orf45, MGAT5, ARHGEF3, CDKAL1, ZFAND3, PTPRJ, LGALS3, HMOX1,
## AC092422.1, RP1L1, UBASH3B, PPP1R12B, RALGAPA2, AEBP2, C1orf21, DPH6, PLA2G7,
## DOCK2, LPAR1, AFF1, ABCA1, MB21D2, ATP6V1H, PITPNC1, ATP6V0D2, DMXL1, MARCH1,
## RNLS, ZPLD1, SLC22A15, ZNF609, SPIRE1, CALM3, CHAC1, SH3PXD2B, SLC4A1.

## PC_ 1 
## Positive:  GAPDH, LGALS1, MIF, CSTB, BLVRB, TUBA1B, IFI30, PRDX6, FABP4, TMEM176A 
##     TXN, FAH, TUBA1A, CD74, EMP3, HLA-DPA1, TIMP1, ACTB, S100A9, NUPR1 
##     MMP9, HLA-DRB1, CFD, C15orf48, HLA-DRA, ELOC, AIF1, H2AFZ, PTGDS, LILRA1 
## Negative:  DOCK3, ARL15, MALAT1, TMEM117, RASAL2, FTX, EXOC4, FHIT, LRMDA, DPYD 
##     TPRG1, VPS13B, JMJD1C, NEAT1, ELMO1, ATG7, MAML2, ZNF438, COP1, MITF 
##     TTC28, FMNL2, VTI1A, MED13L, ERC1, ATP9B, ASAP1, DENND4C, SPIDR, RAD51B 
## PC_ 2 
## Positive:  GAL, CCL22, TM4SF19, FDX1, CYSTM1, ATP6V1D, TGM2, ACAT2, SCD, CIR1 
##     GM2A, RHOF, CSF1, NCAPH, IARS, GOLGA7B, DUSP13, BCAT1, CSTB, CSRP2 
##     FAH, SNHG32, SLC20A1, FCMR, CD164, GAPLINC, TCTEX1D2, EPB41L1, SH3BP5, NMB 
## Negative:  HLA-DPA1, HLA-DPB1, HLA-DRA, AIF1, CD74, C1QA, TGFBI, SAMSN1, FOS, C1QC 
##     CD14, MRC1, CTSC, SLC8A1, VAMP5, MS4A6A, HLA-DRB1, TRPS1, SLC9A9, FPR3 
##     CLEC4A, PDGFC, LYZ, SLCO2B1, FGL2, C1QB, RTN1, CAMK1D, CLEC7A, SPRED1 
## PC_ 3 
## Positive:  MARCKS, LGMN, TPM4, CTSL, AIF1, FPR3, HLA-DQA1, CTSZ, HAMP, CD36 
##     ACTG1, CD163, C1QC, HLA-DQB1, GYPC, CCL3, IL18BP, BLVRB, STMN1, SMS 
##     FMN1, MARCO, C1QA, FUCA1, PLAU, IFI30, FABP4, FCGR3A, COLEC12, S100A9 
## Negative:  PTGDS, LINC02244, CLU, LINC01010, NCAPH, CRIP1, AC015660.2, LINC01800, SYNGR1, AC067751.1 
##     GCLC, KCNMA1, RCBTB2, C1QTNF4, KCNJ1, DUSP2, SPON2, NRCAM, SERTAD2, C2orf92 
##     MGST1, FGL2, TRIM54, CT69, RNF128, MX1, NCF1, NFATC3, RGS20, IFIT3 
## PC_ 4 
## Positive:  XIST, JAML, HLA-DRB5, QPCT, SLC11A1, GCHFR, MARCO, PAX8-AS1, S100A9, CCDC26 
##     AC020656.1, SASH1, MSR1, MS4A7, AL136317.2, GPX3, C22orf34, AL035446.2, TMTC1, FRMD4A 
##     OSBP2, RARRES1, MIR646HG, TDRD9, ST6GAL1, DLEU1, SLC6A16, NMB, GPRIN3, CLMP 
## Negative:  TYMS, PCLAF, CLSPN, TK1, CENPM, RRM2, DIAPH3, MYBL2, MKI67, SHCBP1 
##     CDK1, BIRC5, CEP55, HELLS, ESCO2, FAM111B, NCAPG, NUSAP1, HMMR, TOP2A 
##     KIF11, ATAD2, CENPF, PRC1, CENPK, TPX2, CENPU, MCM10, CCNA2, ASPM 
## PC_ 5 
## Positive:  TYMS, MKI67, PCLAF, TK1, JAML, RRM2, CDK1, CLSPN, DIAPH3, CENPK 
##     CENPM, MYBL2, XIST, BIRC5, SHCBP1, ESCO2, RAD51AP1, TOP2A, NCAPG, CENPF 
##     CCNA2, FAM111B, ATAD2, CEP55, NUSAP1, HLA-DRB5, ASPM, ANLN, PRC1, DTL 
## Negative:  TMEM176A, PLEK, LINC01091, MIF, IL1RN, ACTG1, TUBA1A, CYTOR, MYL9, ACTB 
##     LINC00278, MMP19, MGLL, PHLDA1, ADAMDEC1, TTTY14, CSF1, AC006001.2, GLIPR1, TEX14 
##     PRDX6, RABGEF1, HLA-DPA1, LCP1, PPM1H, CCND1, HDAC9, DPYSL3, TIMP3, HLA-DRA

DimHeatmap(mono, dims = 1:2, cells = 500, balanced = TRUE)

DimHeatmap(mono, dims = 3:4, cells = 500, balanced = TRUE)

ElbowPlot(mono)

mono <- FindNeighbors(mono, dims = 1:4)

## Computing nearest neighbor graph
## Computing SNN

mono <- FindClusters(mono, algorithm = 3, resolution = 0.3, verbose = FALSE)
mono <- RunUMAP(mono, dims = 1:4)

## 13:56:49 UMAP embedding parameters a = 0.9922 b = 1.112
## 13:56:49 Read 11565 rows and found 4 numeric columns
## 13:56:49 Using Annoy for neighbor search, n_neighbors = 30
## 13:56:49 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 13:56:50 Writing NN index file to temp file /tmp/RtmpuopyiY/file2952ea29100cdd
## 13:56:50 Searching Annoy index using 1 thread, search_k = 3000
## 13:56:54 Annoy recall = 100%
## 13:56:55 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:56:57 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:56:57 Commencing optimization for 200 epochs, with 384128 positive edges
## 13:57:01 Optimization finished

DimPlot(mono, reduction = "umap", label=TRUE)

DimPlot(mono, group.by="monaco_fine_annotation" , reduction = "umap", label=TRUE)

DimPlot(mono, group.by="sample" , reduction = "umap", label=TRUE)

Extract monocytes MDM only

mono <- comb[,which(meta_inf$monaco_broad_annotation == "Monocytes")]
mono_metainf <- meta_inf[which(meta_inf$monaco_broad_annotation == "Monocytes"),]
mono_metainf1 <- mono_metainf[grep("monocytes",mono_metainf$monaco_fine_pruned_labels),]
mono <- mono[,which(colnames(mono) %in% rownames(mono_metainf))]
mono <- FindVariableFeatures(mono, selection.method = "vst", nfeatures = 2000)

## Finding variable features for layer counts

mono <- RunPCA(mono, features = VariableFeatures(object = mono))

## Warning in PrepDR5(object = object, features = features, layer = layer, : The
## following features were not available: LINC02739, ARHGAP10, AL157702.2, DUSP1,
## RPS6KA2, CEP85L, CAMK1, OTOA, ANKRD44, RSPO3, TMCC1.

## PC_ 1 
## Positive:  MT-ATP8, MTRNR2L1, FABP4, FABP3, GAPDH, HAMP, S100A9, AC105402.3, C1orf56, PVALB 
##     AL136097.2, TUBA1B, CARD16, FAH, BIRC7, C1QC, UTS2, gene-HIV1-2, MT2A, NUPR1 
##     BLVRB, BEX3, LILRA1, GCHFR, AC025154.2, MT1G, TXN, GSTP1, PRDX6, FXYD2 
## Negative:  ARL15, DOCK3, NEAT1, EXOC4, FTX, MALAT1, RASAL2, LRMDA, DPYD, JMJD1C 
##     TMEM117, FHIT, PLXDC2, VPS13B, MITF, ZEB2, DENND4C, MAML2, ATG7, ZNF438 
##     TPRG1, COP1, FRMD4B, ELMO1, JARID2, ATXN1, FMNL2, TCF12, ERC1, ARID1B 
## PC_ 2 
## Positive:  CYSTM1, FAH, GDF15, PSAT1, CD164, FDX1, ATP6V1D, CCPG1, BCAT1, SAT1 
##     SLAMF9, TXN, PHGDH, HES2, CSTA, HEBP2, NUPR1, STMN1, TCEA1, CTSL 
##     GARS, MARCO, RETREG1, S100A10, RILPL2, CLGN, SNHG5, SCD, RHOQ, SNHG32 
## Negative:  SPRED1, RCBTB2, KCNMA1, GCLC, HLA-DRB1, HLA-DRA, FGL2, CD74, HLA-DPA1, MRC1 
##     SLCO2B1, AC020656.1, HLA-DPB1, C1QA, RTN1, LINC02345, STAC, MERTK, PDGFC, VAMP5 
##     CCDC102B, DPYD, RNF128, MX1, ATP8B4, TGFBI, CRIP1, NFATC3, HLA-DRB5, ATG7 
## PC_ 3 
## Positive:  NCAPH, CRABP2, TM4SF19, GAL, DUSP2, RGCC, CHI3L1, CCL22, TMEM114, ACAT2 
##     TRIM54, RGS20, LINC01010, LINC02244, MGST1, TCTEX1D2, GPC4, NUMB, CCND1, SYNGR1 
##     CLU, SERTAD2, AC092353.2, POLE4, SLC20A1, IL1RN, ZNF366, AC005280.2, OCSTAMP, GCLC 
## Negative:  MARCKS, CD14, TGFBI, BTG1, MS4A6A, HLA-DQA1, FPR3, CTSC, CD163, MPEG1 
##     MEF2C, AIF1, TIMP1, HLA-DPB1, C1QC, HLA-DQB1, SELENOP, RNASE1, HLA-DQA2, NAMPT 
##     TCF4, EPB41L3, ARL4C, NUPR1, ALDH2, SAMSN1, ZNF331, HIF1A, MS4A7, SEMA4A 
## PC_ 4 
## Positive:  MT-ATP8, MTRNR2L1, XIST, PDE4D, AL035446.2, AC073359.2, AL365295.1, CCDC26, EPHB1, LY86-AS1 
##     CLVS1, RAD51B, PRKCE, C5orf17, LINC02320, FTX, DMGDH, LINC01473, LINC01135, FHIT 
##     AC008591.1, PLEKHA5, MIR646HG, BCAS3, PKD1L1, AF117829.1, LINC02649, ZBTB20, STX18-AS1, AC079142.1 
## Negative:  HLA-DRB1, CTSB, TUBA1B, HSP90B1, CD74, GAPDH, ACTG1, HLA-DPA1, HLA-DRA, IFI30 
##     RGCC, HSPA5, AIF1, C15orf48, C1QA, LGMN, TUBB, CYP1B1, HLA-DPB1, FBP1 
##     CA2, CCL3, TUBA1A, HLA-DQB1, S100A4, RNASE6, LYZ, TXN, MMP9, TNFSF13 
## PC_ 5 
## Positive:  RGCC, MMP9, IFI30, CTSB, PTGDS, LINC02244, MGST1, HLA-DRB1, CHI3L1, LYZ 
##     CD74, C15orf48, HSPA5, FABP3, GCLC, GCHFR, CRABP2, HLA-DPA1, TXN, HLA-DRA 
##     ISG15, S100A10, NCAPH, TMEM176B, CDKN2A, RALA, HSP90B1, SAT1, CLU, VAMP5 
## Negative:  TYMS, PCLAF, TK1, MKI67, MYBL2, CENPM, RRM2, BIRC5, CLSPN, CEP55 
##     SHCBP1, DIAPH3, CDK1, CENPK, PRC1, CENPF, ESCO2, NUSAP1, TOP2A, HELLS 
##     NCAPG, ANLN, FAM111B, TPX2, CCNA2, KIF11, ASPM, CENPU, MAD2L1, HMMR

DimHeatmap(mono, dims = 1:2, cells = 500, balanced = TRUE)

DimHeatmap(mono, dims = 3:4, cells = 500, balanced = TRUE)

ElbowPlot(mono)

mono <- FindNeighbors(mono, dims = 1:4)

## Computing nearest neighbor graph

## Computing SNN

mono <- FindClusters(mono, algorithm = 3, resolution = 0.3, verbose = FALSE)
mono <- RunUMAP(mono, dims = 1:4)

## 13:57:54 UMAP embedding parameters a = 0.9922 b = 1.112

## 13:57:54 Read 28289 rows and found 4 numeric columns

## 13:57:54 Using Annoy for neighbor search, n_neighbors = 30

## 13:57:54 Building Annoy index with metric = cosine, n_trees = 50

## 0%   10   20   30   40   50   60   70   80   90   100%

## [----|----|----|----|----|----|----|----|----|----|

## **************************************************|
## 13:57:56 Writing NN index file to temp file /tmp/RtmpuopyiY/file2952ea74d9bf5e
## 13:57:57 Searching Annoy index using 1 thread, search_k = 3000
## 13:58:05 Annoy recall = 100%
## 13:58:06 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:58:08 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:58:08 Commencing optimization for 200 epochs, with 936166 positive edges
## 13:58:15 Optimization finished

DimPlot(mono, reduction = "umap", label=TRUE)

DimPlot(mono, group.by="monaco_fine_annotation" , reduction = "umap", label=TRUE)

DimPlot(mono, group.by="sample" , reduction = "umap", label=TRUE)

Cell counts

Most cells are classified as monocytes.

cc <- table(meta_inf[,c("sample","monaco_broad_annotation")])

cc %>% kbl(caption="cell counts") %>% kable_paper("hover", full_width = F)

cell counts
	Basophils	Dendritic cells	Monocytes	Neutrophils	NK cells	Progenitors
alv_active1	1	34	908	2	0	0
alv_active2	0	8	613	0	0	0
alv_active3	1	28	684	0	0	0
alv_bystander1	0	29	1028	2	0	0
alv_bystander2	0	20	1100	0	1	1
alv_bystander3	0	12	491	1	0	0
alv_bystander4	0	20	558	3	0	0
alv_latent1	0	12	740	0	0	0
alv_latent2	0	30	1905	1	0	0
alv_latent3	1	26	1804	0	0	0
alv_latent4	0	52	1823	0	0	0
alv_mock1	0	18	956	0	0	0
alv_mock2	0	19	785	0	0	0
alv_mock3	0	47	1304	1	2	0
mdm_active1	0	17	863	0	0	0
mdm_active2	0	3	476	0	0	0
mdm_active3	0	13	408	0	0	0
mdm_active4	0	4	432	0	0	0
mdm_bystander1	0	48	1446	0	0	0
mdm_bystander2	0	31	1303	0	0	0
mdm_bystander3	0	47	490	3	0	0
mdm_latent1	1	17	1154	0	0	0
mdm_latent2	0	11	1240	0	0	0
mdm_latent3	0	11	337	0	0	0
mdm_mock1	0	6	790	0	0	0
mdm_mock2	0	7	652	0	0	0
mdm_mock3	0	4	173	0	0	0
mdm_mock4	0	4	811	0	0	0
react6	1	79	3015	1	1	1

tcc <- t(cc)

pctcc <- apply(tcc,2,function(x) { x/sum(x)*100} )

pctcc %>% kbl(caption="cell proportions") %>% kable_paper("hover", full_width = F)

cell proportions
	alv_active1	alv_active2	alv_active3	alv_bystander1	alv_bystander2	alv_bystander3	alv_bystander4	alv_latent1	alv_latent2	alv_latent3	alv_latent4	alv_mock1	alv_mock2	alv_mock3	mdm_active1	mdm_active2	mdm_active3	mdm_active4	mdm_bystander1	mdm_bystander2	mdm_bystander3	mdm_latent1	mdm_latent2	mdm_latent3	mdm_mock1	mdm_mock2	mdm_mock3	mdm_mock4	react6
Basophils	0.1058201	0.000000	0.1402525	0.0000000	0.0000000	0.0000000	0.0000000	0.000000	0.0000000	0.054615	0.000000	0.000000	0.000000	0.0000000	0.000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.0000000	0.0853242	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.0000000	0.0322789
Dendritic cells	3.5978836	1.288245	3.9270687	2.7384325	1.7825312	2.3809524	3.4423408	1.595745	1.5495868	1.419989	2.773333	1.848049	2.363184	3.4711965	1.931818	0.6263048	3.087886	0.9174312	3.212851	2.323838	8.7037037	1.4505119	0.8792966	3.160919	0.7537688	1.062215	2.259887	0.4907975	2.5500323
Monocytes	96.0846561	98.711755	95.9326788	97.0727101	98.0392157	97.4206349	96.0413081	98.404255	98.3987603	98.525396	97.226667	98.151951	97.636816	96.3072378	98.068182	99.3736952	96.912114	99.0825688	96.787149	97.676162	90.7407407	98.4641638	99.1207034	96.839080	99.2462312	98.937785	97.740113	99.5092025	97.3208522
Neutrophils	0.2116402	0.000000	0.0000000	0.1888574	0.0000000	0.1984127	0.5163511	0.000000	0.0516529	0.000000	0.000000	0.000000	0.000000	0.0738552	0.000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.5555556	0.0000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.0000000	0.0322789
NK cells	0.0000000	0.000000	0.0000000	0.0000000	0.0891266	0.0000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.000000	0.000000	0.1477105	0.000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.0000000	0.0000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.0000000	0.0322789
Progenitors	0.0000000	0.000000	0.0000000	0.0000000	0.0891266	0.0000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.000000	0.000000	0.0000000	0.000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.0000000	0.0000000	0.0000000	0.000000	0.0000000	0.000000	0.000000	0.0000000	0.0322789

Focus on MOCK vs ACTIVE - exclude latent and bystander

focus <- meta_inf[grep("latent",rownames(meta_inf),invert=TRUE),]
focus <- focus[grep("bystander",rownames(focus),invert=TRUE),]
focus_mdm <- focus[grep("mdm",rownames(focus)),]
focus_alv <- focus[grep("alv",rownames(focus)),]

mono_focus_mdm <- mono[,which(colnames(mono) %in% rownames(focus_mdm))]
mono_focus_alv <- mono[,which(colnames(mono) %in% rownames(focus_alv))]

# mono_focus_mdm
mono_focus_mdm <- FindVariableFeatures(mono_focus_mdm, selection.method = "vst", nfeatures = 2000)

## Finding variable features for layer counts

mono_focus_mdm <- RunPCA(mono_focus_mdm, features = VariableFeatures(object = mono_focus_mdm))

## Warning: The following 45 features requested have zero variance; running
## reduction without them: AC073359.1, KCNIP1, AC093916.1, PLK1, NAALADL2, ANK2,
## PIWIL2, MDN1, CCL5, AC013287.1, LINC01937, FANCM, MT-ND1, FGL2, RABGAP1L,
## TNFRSF4, AHNAK2, LINC02221, LRRK2, L3MBTL4, ZC3H12C, LINGO1, RANBP17, BCL2,
## CREB5, CTHRC1, AC023194.3, VIPR1, DIAPH2, TESC, SLC44A5, SFMBT2, RRAS2,
## ARHGAP26, AC024901.1, LINC00910, ARRDC4, RIIAD1, AL121820.1, AC048382.1,
## AC004448.3, SUCLG2-AS1, LINC01605, BDNF-AS, C22orf24

## Warning in PrepDR5(object = object, features = features, layer = layer, : The
## following features were not available: HIST1H2BJ, OASL, ENG, CENPA, HIST1H3J,
## NCALD, SSBP3, PCBP3, AL355388.1, LINC00885, AL358944.1, SCAI, IFT80, C5AR2,
## KLF12, AL009177.1, SLC2A14, LINC02739, LINC01320, TVP23A, CCND2, ZFPM2-AS1,
## AC108134.2, TTK, PPP6R2, TNRC6C, STXBP1, AC106793.1, HCAR2, RBM6, AC064805.2,
## PNPLA7, GPR155, ANKRD34B, KLRG1, SETD2, AL021917.1, AC015660.2, ZBTB41, EIF2B3,
## NUP93, AGPAT5, AC025262.1, LINC02853, ESR2, CLPX, ZZEF1, MARCH1, FAM118B,
## CWC22, RASIP1, ATP6V0A2, AL353759.1, SH3GL1, TMEM212-AS1, ASTL, PKIA-AS1,
## LINC02585, PCA3, P2RY12, RPS4Y1, MT-ND5, AL357873.1, OBI1-AS1, AC007364.1,
## SF3B3, CHRNB4, TMEM220-AS1, PARD3, GYG2, AL157911.1, AC034213.1, CCNB2, AURKC,
## RAP1GDS1, NR2F1-AS1, DUSP16, IFT140, AC072022.2, AC078923.1, HLA-C, BTBD2,
## MT-ND2, AP000462.1, FAM184A, ZNF276, AC133550.2, AC093766.1, MAP4K1,
## AC108879.1, SRRM2-AS1, RORA, CFAP52, LPL, DOK5, CLIC5, SIGLEC10, CREM,
## LINC01054, LRRC9, CLEC4C, GRM7, PSME2, STAT5B, AC117383.1, WWP1, TTC21B-AS1,
## NYX, AC099552.1, SNAP25-AS1, FHOD3, NLRP4, ZNF385D-AS1, ACTR3C, LDHA, PDIA4,
## AC100849.2, CU638689.5, RGS18, DAPK2, JAKMIP3, MRC2, CCDC7, AP000446.1,
## MAP3K15, DPEP3, SETBP1, AL592295.3, LINC02015, AC092490.1, PACS1, MASP2, CDK19,
## AC089983.1, ANK3, AC240274.1, NELL2, APCDD1L, P2RY13, AL360015.1, AC105001.1,
## MMP28, FDXACB1, RAB10, SRGN, AC104459.1, ARC, CPLX1, AC007091.1, SAMD11,
## GAS1RR, SF3B1, AC013799.1, AL031658.1, CRISPLD2, AC114977.1, AC007389.1, OR3A2,
## AL589740.1, AHR, IFI6, GFRA2, MIR155HG, RASSF4, AC117473.1, CARD11, AL132775.1,
## TUBB3, KIF21A, TPTE2, AC016074.2, GTSF1, AC019117.1, AL391807.1, TEX15,
## AC005393.1, ADRA2B, MANF, UBASH3B, PLEK, AC005264.1, CEP112, ZDHHC17,
## AC011509.2, CACNA2D3, AC084809.2, CD93, AC007128.2, AC073569.1, NR2F2-AS1,
## PRRT2, AC026333.3, LINC01600, TSPAN8, Z99758.1, GNG4, CFAP69, AC011365.1,
## BCL2A1, P3H2, CEP135, TFG, JAKMIP2-AS1, AL049828.1, AC137770.1, DHX8, U62317.4,
## MT-CO3, IGSF6, DHCR24, TMEM45A, COX5B, SULF2, TRIM37, CPE, AC096570.1, SRGAP3,
## SESN3, LINC01098, TACC3, MPDZ, GAL3ST4, AP000829.1, BMPR1B, LINC00237,
## LINC00649, STPG2, MKX, CORIN, SNHG12, RTKN2, PRTG, BCAR3, OSBPL6, HIST1H2BG,
## LINC01917, LINC01855, MLLT3, LNX1, PAX8-AS1, SIGLEC7, MAPK8, HIST1H2AC,
## AL136418.1, EPHA6, MCF2L-AS1, DPF1, AC006427.2, AC079062.1, DARS, FKBP1B,
## LINC01353, Z94721.1, ABCA13, PODN, MT-ATP6, SOX15, ACMSD, KCNMB4, TASP1,
## AC021231.1, VGLL3, HIST2H2BE, CHRM3, AC087683.3, CREG2, PPFIA2, LINC01891,
## TRIM71, AC124017.1, AC008115.2, AC009315.1, SCAPER, LINC02112, CNDP1,
## AC073475.1, AKR7A2, RAP2C-AS1, HCAR3, TMCC1, AC021086.1, RUFY2, RFX7, TMEM71,
## MCCC2, TUBB4B, LINC01191, MNAT1, MRVI1, SOX21-AS1, CKMT1A, ATF3, KDM6A, CD72,
## AL031710.2, AL031005.2, KIAA0825, MAP1A, AC010834.2, DNAH3, ZBTB46, AC090630.1,
## PPP1R12B, AC129803.1, GSN, PDE11A, AL096794.1, PTPRB, ZHX2, PCP4L1, LINC00958,
## MNDA, PTPRG-AS1, KCNJ1, GAK, GLIPR1, RALGAPA2, COL8A2, LYPLAL1-DT, TNS3, MREG,
## RHOF, ITGA4, C11orf45, AL356379.2, AC011476.3, TUBB4A, LGALSL, SCFD2, ALDH1A1,
## SPEG, BFSP2, AC019330.1, AC063923.2, ECE2, AF131216.1, AC103726.2, PROCR,
## MARCH3, AEBP2, AC246817.1, SPIB, POF1B, ANGPTL4, COL28A1, AL157834.1, ACTB,
## PRKG2, FILIP1L, RFX3, GMDS, SOCS3, SPTLC3, PAN3, OXSR1, LINC00571, AC097654.1,
## AC092811.1, UBE2T, SPAG7, MCM9, MAFB, LINC00407, LILRB2, NEGR1, PLA2G5, MAPK13,
## KIAA1841, TRIO, AC079584.1, LINC02851, NETO2, CYP4F22, CLDN4, TMEM131L, RNF180,
## SLC26A7, AL365295.2, AC098588.3, C3orf79, ADORA2B, SPOCK3, RNF24, GRK3, NPRL3,
## TMEM72-AS1, SGO1, ENPEP, ITM2A, BUB1, AC114763.1, PCLO, XKR9, AL136298.1,
## RAPGEF1, ST3GAL6, GLRX, LRRIQ4, AP002793.1, LINC00987, PIP4K2C, CDT1,
## AL137076.1, AL158839.1, AL589765.6, AC096639.1, AC114810.1, AL133243.1,
## AC009229.3, AC012358.1, AC063944.3, AC092902.6, AC128709.2, AC021220.2,
## ADAMTS3, AC093801.1, AC116351.1, LINC02149, AC113386.1, AC011374.3, LINC02571,
## Z84484.1, MRAP2, AL357992.1, AL078582.1, AC002480.2, ITPRID1, DEFB136,
## AC084026.1, CALB1, AF178030.1, AF235103.1, AL353764.1, TMEM246-AS1, PPP3R2,
## AL353803.1, AL731559.1, AL121748.1, MTRNR2L7, LINC02625, AC016816.1, SYCE1,
## OR10A4, LINC02751, SAA4, AC013714.1, OR8J1, AC024940.1, LINC01479, AC012464.3,
## AC073863.1, C1QTNF9-AS1, SMAD9-IT1, LINC00563, AL161717.1, CLYBL-AS2,
## AL442125.2, KLHL33, CMA1, AL163953.1, LINC02280, AC091167.5, BAIAP3,
## AL133297.1, AC012186.2, AC092378.1, AC129507.2, RAI1-AS1, AC007952.6,
## AC243773.2, MIR4527HG, AC090409.1, AC105094.2, OR7E24, ANGPTL8, AL021396.1,
## NUDT11, LINC00266-4P, CFD, TFRC, TRAF2, PTPN13, CALR, NKAIN1, STAG1, ATP6V0D2,
## ATP13A3, EXD2, ZC4H2, E2F1, HERPUD1, KIF20B, CHM, ADIRF, BACH1, COL27A1,
## ADAMTS10, CDKAL1, LINC01762, AL645929.2, MMD2, NIPAL2, NFKB1, ENTPD1, LY96,
## ZFP36L1, SH3D19, ITGA9, MGAT5, ARRDC3-AS1, CD200R1, AC092134.3, LGALS3, PKP2,
## PFKFB3, SEMA6D, COBL, MYADM, ELANE, TNIK, TIMP4, FAM135A, AHCYL2, FAM110B,
## PRDM1, LINC02398, LINC02798, STUM, MDH1, KL, DUSP5, LINC01572, MTUS1, GADD45G,
## CACNA2D2, RALYL, PRH1, CASC2, GNG7, CASP1, SPATA5, DOCK10, PLBD1, LINC02476,
## AC083836.1, PCNX2, MCF2, LINC00511, SIRPD, MOCOS, AL158068.2, AC117465.1,
## AC007391.2, AC073050.1, DRD5, RPS6KA6, FAM189A2, SLC41A2, CFAP43, ST3GAL3,
## MT-CYB, FANCC, AC007405.3, EDA, PMAIP1, PRORP, ERBB4, TRAF1, UFL1-AS1, DYM,
## MFSD4B, CARMIL3, AC097515.1, NCAPD3, PAFAH1B1, AL445584.2, CALM3, NPTX2,
## LAMC1-AS1, AC099782.2, WWTR1-AS1, AC104785.1, AL357054.5, AC010719.1,
## AC073210.3, AP005436.3, AP002884.1, LUM, AC063943.1, PLUT, AC011939.3, NPIPB8,
## AC092127.1, AC104423.1, L1CAM, SCLT1, MATN1, AC020743.2, AC021546.1, PRSS3,
## LUZP4, LINC02457, MCTP2, CRYM, PKN2, CAMK2D, SLC7A8, IGF2R, AFF1, ITPR2,
## AC024084.1, AC096992.3, STMN2.

## PC_ 1 
## Positive:  SLC35F1, PRDX6, TUBA1B, CRABP2, TUBA1A, GAL, MT-ATP8, C15orf48, FABP3, DUSP2 
##     LILRA1, PTGDS, MYL9, HPGDS, MT2A, UCHL1, CRIP1, MMP7, CCNA1, MT1E 
##     NCAPH, S100A4, LINC02244, CA2, SEL1L2, AC105402.3, CCL7, RGCC, RNF128, TMEM114 
## Negative:  FMN1, NEAT1, FHIT, RAD51B, ARL15, MALAT1, SNTB1, MBD5, AL035446.2, FTX 
##     EXOC4, FNDC3B, VTI1A, PDE4D, FRMD4B, GMDS-DT, COP1, JMJD1C, TBC1D8, SLC8A1 
##     SLC22A15, DENND1A, DENND4C, DOCK3, CBLB, DOCK4, VPS13B, LRMDA, SBF2, COG5 
## PC_ 2 
## Positive:  GDF15, PSAT1, B4GALT5, TM4SF19, PHGDH, FDX1, SLAMF9, SNCA, TCEA1, S100A10 
##     TXN, BEX2, OCIAD2, PHLDA1, RAB38, UGCG, TPM4, SUPV3L1, DRAXIN, RETREG1 
##     NMRK2, HES2, NMB, CCPG1, CTSL, SCD, CLGN, FABP4, gene-HIV1-2, MARCO 
## Negative:  TGFBI, HLA-DRA, CEBPD, HLA-DPB1, CD74, HLA-DPA1, HLA-DRB1, SIPA1L1, MS4A6A, LYZ 
##     EPB41L3, SSBP2, ST8SIA4, AOAH, PTGDS, HLA-DQB1, KCNMA1, RGS2, HLA-DRB5, SAMSN1 
##     RNASE1, FOS, PDE4B, C1QA, TCF4, MS4A7, VMO1, RCBTB2, FPR3, RNASE6 
## PC_ 3 
## Positive:  BTG1, G0S2, MARCKS, CD14, ARL4C, CLEC4E, TGFBI, CXCR4, SDS, FUCA1 
##     SEMA4A, TIMP1, HIF1A, VMO1, STMN1, NUPR1, MEF2C, MS4A6A, RNASE1, P2RY8 
##     TCF4, OLFML2B, CLEC4A, PDK4, NAMPT, HES4, BLVRB, LGMN, MPEG1, FPR3 
## Negative:  NCAPH, CRABP2, RGCC, LINC01010, LINC02244, CHI3L1, GCLC, DUSP2, TMEM114, NUMB 
##     RGS20, LRCH1, PTGDS, AC092353.2, S100A4, MICAL3, ASAP1, CYP1B1, AC005280.2, DOCK3 
##     SERTAD2, CRIP1, RASAL2, TIMP3, TMEM117, NOS1AP, TCTEX1D2, SLC28A3, MYO1E, GPC4 
## PC_ 4 
## Positive:  AC078850.1, TUBA1B, ACTG1, SLC35F1, INSIG1, HSP90B1, TUBB, TMEM176A, CCL7, FABP4 
##     TPM4, MGLL, TUBA1A, LGMN, CD36, C3, CD300LB, TIMP3, FBP1, IL18BP 
##     TUBA1C, PHLDA1, CSF1, CADM1, LINC01091, CTSL, TMEM176B, HLA-DRB1, ALCAM, IL4I1 
## Negative:  MT-ATP8, HES2, BEX2, PDE4D, CLGN, S100P, PSAT1, CCPG1, NIBAN1, C5orf17 
##     NUPR1, CPD, PHGDH, XIST, ST6GALNAC3, RETREG1, EPHB1, HES4, MT1G, LINC02154 
##     SUPV3L1, PECR, CLVS1, CEP70, ALKAL2, TDRD3, ME1, AL035446.2, DMGDH, G0S2 
## PC_ 5 
## Positive:  TYMS, PCLAF, MKI67, CENPF, CEP55, BIRC5, PRC1, NUSAP1, KIF4A, CDKN3 
##     DLGAP5, CENPM, CDK1, TK1, HMMR, TPX2, CENPK, ASPM, PTTG1, ANLN 
##     RAD51AP1, CCNA2, MAD2L1, RRM2, CIT, BUB1B, DIAPH3, SHCBP1, MYBL2, NCAPG 
## Negative:  gene-HIV1-2, gene-HIV1-1, CCL7, AC078850.1, PHLDA1, LINC01091, TNFRSF9, TCTEX1D2, C3, CSF1 
##     SLC35F1, MADD, CD36, IL1RN, MACC1, SPOCD1, AC005062.1, DPYSL3, INSIG1, LIMA1 
##     TIMP3, TPM2, TPM4, CCL3, B4GALT5, CD300LB, TM4SF19, PCSK6, SDS, CADM1

DimHeatmap(mono_focus_mdm, dims = 1:2, cells = 500, balanced = TRUE)

DimHeatmap(mono_focus_mdm, dims = 3:4, cells = 500, balanced = TRUE)

ElbowPlot(mono_focus_mdm)

mono_focus_mdm <- FindNeighbors(mono_focus_mdm, dims = 1:4)

## Computing nearest neighbor graph

## Computing SNN

mono_focus_mdm <- FindClusters(mono_focus_mdm, algorithm = 3, resolution = 0.3, verbose = FALSE)
mono_focus_mdm <- RunUMAP(mono_focus_mdm, dims = 1:4)

## 13:58:28 UMAP embedding parameters a = 0.9922 b = 1.112

## 13:58:28 Read 4605 rows and found 4 numeric columns

## 13:58:28 Using Annoy for neighbor search, n_neighbors = 30

## 13:58:28 Building Annoy index with metric = cosine, n_trees = 50

## 0%   10   20   30   40   50   60   70   80   90   100%

## [----|----|----|----|----|----|----|----|----|----|

## **************************************************|
## 13:58:28 Writing NN index file to temp file /tmp/RtmpuopyiY/file2952ea732d5a26
## 13:58:28 Searching Annoy index using 1 thread, search_k = 3000
## 13:58:30 Annoy recall = 100%
## 13:58:30 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:58:32 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:58:32 Commencing optimization for 500 epochs, with 162990 positive edges
## 13:58:35 Optimization finished

DimPlot(mono_focus_mdm, reduction = "umap", label=TRUE)

DimPlot(mono_focus_mdm, group.by="monaco_fine_annotation" , reduction = "umap", label=TRUE)

DimPlot(mono_focus_mdm, group.by="sample" , reduction = "umap", label=TRUE)

# mono_focus_alv
mono_focus_alv <- FindVariableFeatures(mono_focus_alv, selection.method = "vst", nfeatures = 2000)

## Finding variable features for layer counts

mono_focus_alv <- RunPCA(mono_focus_alv, features = VariableFeatures(object = mono_focus_alv))

## Warning: The following 40 features requested have zero variance; running
## reduction without them: F13A1, TMEM163, FCGBP, MARCKS, TIMP3, LINC01954, S100P,
## TACSTD2, FAM9B, SLC17A6, LINC00636, MAML2, GAPDH, MND1, SMS, CD226, AC010834.2,
## CCDC57, CD5L, SEMA3D, MFSD11, LINC01320, ELMO1, SLC6A7, AC092813.1, CLSTN2,
## CDH12, AC007381.1, EYA2, SLC23A2, LINC02777, ANKRD7, LINC01198, TSGA13, IGSF6,
## LINC01948, HMSD, FOLR2, CACNB4, GRB10

## Warning in PrepDR5(object = object, features = features, layer = layer, : The
## following features were not available: STEAP1B, NUP210L, CPNE8, AC083837.1,
## AP002075.1, HCAR2, CD28, AL096794.1, PPP1R1C, SFMBT1, IQCA1, ACVR2A, SLC6A16,
## SPINK6, TENM4, IFI6, EFCAB7, NPTX2, C11orf49, IL6, DNAH12, COL25A1,
## SLC7A14-AS1, LINC02068, DNAJC9, HIST1H2BG, AC076968.2, GNLY, LINC02269,
## LINC01605, EMP3, MS4A5, DSG2, INO80, ACTN2, AC207130.1, BANK1, BTNL8,
## LINC00350, AC092821.3, AC013391.2, KIF6, ARMC8, SMCHD1, TEK, ADAM22, EXO1,
## AC135050.3, MIF, COL27A1, GCH1, PLAAT3, CCDC85A, MSRB3, ERAP2, CNTNAP2, TRIM2,
## SCFD1, AC084740.1, AHCTF1, LINC02073, MIR4300HG, CTSW, MARCKSL1, CH25H, BRMS1L,
## MNDA, PMAIP1, AL353596.1, RAD51C, TUBB4A, GIGYF2, LINC00299, GPRC5D, SVEP1,
## LINC02789, KIAA1755, NEURL3, RFX3-AS1, OR10G3, USP10, MRPS6, ZC3H8, PTPN2,
## LINC01344, ASAP3, SHANK2, DYNC1H1, RFTN2, XDH, YJEFN3, SPOCK1, AC007785.1,
## MECP2, CR1, LPCAT1, TNFAIP3, AC019131.1, LINC01862, AC104596.1, AC079313.2,
## RPH3AL, NEK4, SH3TC2, URB1, LINC00511, STK32B, CFAP74, AC110992.1, HSF2BP,
## SPACA3, XPO5, RLF, PPEF1, AC084809.2, EML2-AS1, FBXO4, AC009292.2, ZPLD1,
## MBOAT4, NIPAL2, POU6F2, AC005264.1, PTX3, LGALS1, AC125421.1, COL4A5,
## SMAD1-AS2, DZIP1, HNRNPM, LINC02466, RGS8, LINC00842, NDRG1, INKA2-AS1, WDR90,
## TEPP, SOX6, ABCB1, TIMELESS, SYT12, LINC01999, SRL, ZSCAN32, PLEK, STS, NRG1,
## FCRLB, DCSTAMP, SLC44A5, ADCY3, PMPCA, CNR2, ZNF157, AL135926.2, STXBP6, CCNC,
## FO393415.3, IGSF23, DDN-AS1, NCMAP, LMNB1-DT, LINC01414, CSTB, PAX8-AS1, RFX8,
## CFD, GNRHR, FSD1, CLDN18, MID2, AC073569.2, GTF2IRD2, ABRAXAS2, WIPF3, HRH2,
## MIR2052HG, AKAP6, GATA2, AC011346.1, AC107081.2, AC041005.1, NNMT, HLTF-AS1,
## C2orf72, PLXDC1, PTPRB, HCAR3, TMEM37, PHACTR1, CTSZ, IL17RB, IGF2R, ANOS1,
## DACH1, AL662884.3, AC006441.4, RELN, C22orf34, ERI2, NCR3, PLCH1, EAF2, CKAP5,
## AOC1, UNC5D, COBL, PDE1C, AL592295.3, AC068228.3, SLC2A5, SVOP, AC022035.1,
## PSME2, GOLGA7B, AC004949.1, ANGPT1, AC009435.1, RGS7, AC008415.1, TNR,
## AC097518.2, CFAP57, CNTNAP5, SHCBP1L, TNC, AC007100.1, AC005753.2, NBAT1,
## AC109454.3, NIPAL4, AC025430.1, TPSAB1, SPRY4-AS1, ENTPD1, RAB7B, STPG2, MTFMT,
## NDRG2, ABLIM1, ADM, CDK6, GRID2, KIAA0825, UST, HSPA1B, ULBP2, LDHA,
## AL354733.2, MRC2, DHRS9, AC008609.1, PRDM14, AL161646.1, KRTAP10-4, LINC00378,
## Z99758.1, SHROOM4, AC010997.3, CTXND1, AL513166.1, BPIFC, MEI4, HIP1, GALNT14,
## RHOF, ZNF385D-AS1, FUT2, DENND2A, NEIL3, LGALS3, KLB, OASL, NSG2, CYP4F22,
## LINC01358, VIPR1, GPRIN3, GAPLINC, KIF21A, NRIP3, MIR3142HG, AC116563.1, KDR,
## AL049828.1, SLC4A1, AC110491.2, NRCAM, CRIM1, FOXM1, ZSCAN5A, ARHGAP6, GINS2,
## SCIN, LINC00973, AC073091.4, SH3TC2-DT, LINGO2, CHODL, SERINC2, GRM7, CHDH,
## ITGA2, PARD3, TROAP, APOM, RXFP1, GLIPR1, POU2F2, LINC01643, MEP1A, NES,
## PCDH15, MSH4, AL033530.1, LINC01900, AC092957.1, AC015908.2, SFTPD-AS1,
## AC006460.1, LINC00571, PRKCG, FAXC, ATP1B2, AC093307.1, IAPP, IKZF3, GPX3, SLA,
## YLPM1, CLEC7A, CLPB, RBL1, BRCA1, CRIPT, AC108066.1, CDHR3, AL109930.1, ASMT,
## AIM2, AC011893.1, ABCG2, AC117453.1, CHAC1, SLC22A2, NCAM2, HMOX1, TMEM45A,
## AC104041.1, AC093898.1, DPYS, PCLO, ELOC, LIPG, FAF1, SAMD12, MCAM, LIN28B-AS1,
## CHD5, LINC00519, AF274853.1, AC016831.7, MREG, MOSPD1, SIK2, CD1E, AL359220.1,
## ID2, AC008443.9, AMPD1, PRRX2, AC124852.1, SLC23A1, GRIK5, AC099489.1, ZNF365,
## TNFRSF11A, AC002454.1, AC093010.2, UPK1A-AS1, AC092718.1, LCP1, DHCR24, RHOD,
## PARN, AC007529.2, LINC01924, AC007402.1, ARL9, TGFB3, MAS1, SEMA6D, UBE2E2,
## FHAD1, TBC1D24, AC084816.1, GRHL2, AL360015.1, TDRD9, AC024901.1, SRGAP3,
## MEIS1, XKR9, AC079742.1, LINC02698, ING3, GNAI1, AC246817.1, RNF212,
## AL713998.1, ANKH, SLC35F3, MX2, LINC02109, AC130456.2, AHCYL2, AL136456.1,
## KIAA1217, SPSB1, SLC4A8, AC006333.1, ZAR1L, RBPJL, AC026391.1, ERBB4,
## LINC02752, PLTP, SNAI3, OPRD1, LINC02621, TMEM233, AC018618.1, FBLN1,
## AL355981.1, TMEM213, AC087897.2, DEGS2, WDFY4, AC099560.1, TNIK, SLC9A2,
## LINC01572, AL591845.1, GLT1D1, RASL10A, RALGAPA2, GALNTL6, COL8A2, GLUL, MAP1A,
## SEPTIN4-AS1, ST3GAL6, LINC02742, RNF144A, PTP4A2, LGR4, LINC01800, KDM7A,
## CNGA4, BX004807.1, DNAH2, ELOVL5, AC010343.3, LINC02805, GNGT1, FRRS1, MB21D2,
## CAMK1, CPEB2-DT, CEP126, CSMD2, AP001496.1, FBXW7, EXOSC10, ELF5, NANOS1,
## RP1L1, CFI, PROSER2-AS1, AC009107.1, TEKT1, AC113137.1, SOX5, STXBP5L, SPOCK3,
## TMEM132C, INPP5J, CLDN4, MARCH1, IL3RA, TNFRSF12A, S100A6, FBXO43, PPP1R14C,
## AP000424.1, EOGT, AC079163.2, AC087280.2, GLCCI1, PDLIM4, SKA3, STUM, C1orf143,
## CD96, ASTN2, GM2A, GRAMD2A, LINC01933, ACTB, LINC00894, DNAJC1, LIMCH1, HECW1,
## NR6A1, AL645465.1, HIST2H2BE, AL805961.1, AC239860.4, SCG2, AC021134.1,
## AC137810.1, AC145146.1, AL591519.1, AC108860.2, AL157702.2, AC068305.2, OTOGL,
## AC090515.6, LHCGR, ZBED9, AL162411.1, CLSTN3, AMPD3, HIST1H2AC, RASSF4,
## TMEM131L, C11orf45, SH3PXD2B, GALNT18, CDT1, INPP4B, SPOPL, FA2H, SLC30A8,
## FAM92B, AC016587.1, BICD1, ZNF431, FCMR, LPAR1, AL353595.1, CIDEC, STMND1,
## SSPO, NLRP7, GPAT3, WDR54, MYO16-AS1, AC005808.1, NUDT10, AC096531.2,
## LINC01276, AL354811.2, RBM11, PDE3A, TRMT5, JAML, AC019197.1, CORO1A, SERPINA1,
## KCNJ1, EMP1, LMO4, CDC5L, AC055855.2, BTG3-AS1, RBPJ, SYNE1, KCP, GRXCR2, EML4,
## AP001636.3, SCFD2, LINC01169, GABRR3, AC090099.1, ROBO4, AC092131.1, TPPP3,
## ATP13A3, TYW1B, HIST1H1C, AC005280.1, ITSN2, SLC25A23, FABP6, MPDZ, AC011476.3,
## EFNA2, AEBP2, AP000812.1, AC009315.1, HTRA4, ATP1B1, AC021851.2, AC092746.1,
## AHRR, BTBD11, MARCH3, AL122003.2, PKIB, MIR155HG, FRMD6-AS1, AC104809.2, BARD1,
## ADD2, RNF217-AS1, MS4A4A, AC079793.1, AL731563.2, AC092801.1, AC068587.4,
## AC007262.2, MED13L, LAMA3, NME5, LMCD1-AS1, LINC02224, AXL, ZNF543, ZNF385B,
## SH3BP5, SNX10, LINC02733, PLAT, HPN, RFX7, AL592431.2, ERLEC1, CHD9, SRPX2,
## KCNF1, ABCA6, AC097487.1, KCNN2, AGMO, DMRT1.

## PC_ 1 
## Positive:  NUPR1, FABP4, STMN1, PSAT1, PHGDH, CLGN, G0S2, IFITM3, BTG1, ALDH2 
##     DDIT4, CAMP, BLVRB, ADAMDEC1, TRIB3, GDF15, BEX2, AC073359.1, BEX3, CD14 
##     DNAJC5B, CARD16, CLEC4A, RAB6B, TIMP1, FAH, GYPC, FXYD2, RBP1, CES1 
## Negative:  RASAL2, DOCK3, TMEM117, AC092353.2, DPYD, ASAP1, LINC01374, PLXDC2, LRMDA, CPEB2 
##     ATG7, FMNL2, LRCH1, NEAT1, MITF, ARL15, TPRG1, ATXN1, EXOC4, NUMB 
##     DENND1B, MALAT1, PPARG, VWA8, FHIT, MYO1E, RABGAP1L, MEF2A, JARID2, PDE3B 
## PC_ 2 
## Positive:  gene-HIV1-1, CCL22, IL6R-AS1, GAL, gene-HIV1-2, AL157912.1, SLC20A1, DUSP13, IL1RN, TRIM54 
##     CYTOR, MYL9, CSF1, POLE4, MIR4435-2HG, CIR1, TM4SF19, BIN2, NMRK2, GPC3 
##     ACAT2, ATP6V1D, C3, SCD, NCAPH, TCTEX1D2, LAYN, UPP1, C15orf48, TNFSF14 
## Negative:  SLC8A1, MRC1, AL356124.1, TRPS1, SAMSN1, RCBTB2, PDGFC, FCGR3A, FCHSD2, SPRED1 
##     ME1, ATP8B4, CTSC, NRP1, CAMK1D, XYLT1, LYZ, AIF1, ARHGAP15, CCDC102B 
##     DOCK4, SLCO2B1, TGFBI, HLA-DPA1, C1QA, FOS, TCF7L2, TMEM236, HLA-DRA, HLA-DPB1 
## PC_ 3 
## Positive:  CTSL, MARCO, LGMN, BCAT1, TPM4, SAT1, HAMP, BLVRB, SLC11A1, CTSB 
##     TXN, FMN1, CCL3, CTSC, STMN1, MSR1, NUPR1, FABP4, S100A9, SNCA 
##     FDX1, PLAU, FCGR3A, CD164, UGCG, FPR3, CD36, FAH, GYPC, SLAMF9 
## Negative:  AC067751.1, CRIP1, CLU, MMP7, DUSP2, KCNMA1, PTGDS, LINC02408, TRIM54, NCAPH 
##     SERTAD2, C2orf92, GCLC, RNF128, KCNQ5, ZNF366, C1QTNF4, ST5, KCNA2, NFATC3 
##     IL1RN, AC092944.1, RCBTB2, DPH6-DT, TMEM176B, MSI2, LINC00910, NOS1AP, FGL2, RGS20 
## PC_ 4 
## Positive:  XIST, LINC02320, CCDC26, AC008591.1, OSBP2, MIR646HG, LINC01340, AL136317.2, LINC01708, SKAP1 
##     ZFPM2, KCNMB2-AS1, LIX1-AS1, AC068413.1, SEL1L2, PKD1L1, LINC01374, AC012668.3, LINC00607, CLMP 
##     AP005262.2, AL365295.1, UBA6-AS1, LINC01135, LINC00923, SASH1, STOX2, PLXNA2, AC098829.1, AC079142.1 
## Negative:  TUBA1A, IL1RN, TUBA1B, ALOX5AP, ACTG1, MMP7, SLC35F1, MMP9, TMEM176B, TUBB 
##     TMEM176A, HLA-DRB1, RGCC, CD74, CTSB, GCLC, HLA-DRA, CRIP1, HLA-DPA1, PTGDS 
##     LINC00910, RNF128, AC007952.4, TEX14, AIF1, S100A4, DUSP2, ALCAM, H2AFZ, STAC 
## PC_ 5 
## Positive:  gene-HIV1-1, gene-HIV1-2, CTSB, AIF1, IL6R-AS1, SPRED1, MRC1, C1QA, IL1RN, STAC 
##     AL157912.1, LINC02345, SLCO2B1, FPR3, HLA-DRA, CCL2, LGMN, ISG15, ALCAM, C1QB 
##     C15orf48, CD74, RTN1, HLA-DRB1, IL4I1, HLA-DPA1, CCL7, MMP19, CTSC, TNFSF13B 
## Negative:  CLSPN, TYMS, TK1, MKI67, PCLAF, DIAPH3, SHCBP1, RRM2, CENPK, FAM111B 
##     HELLS, ATAD2, CDK1, MYBL2, NCAPG, CDCA2, ESCO2, CCNA2, TOP2A, BIRC5 
##     KIF11, CENPF, POLQ, CENPM, HMMR, WDR76, CIT, DTL, KNL1, NUSAP1

DimHeatmap(mono_focus_alv, dims = 1:2, cells = 500, balanced = TRUE)

DimHeatmap(mono_focus_alv, dims = 3:4, cells = 500, balanced = TRUE)

ElbowPlot(mono_focus_alv)

mono_focus_alv <- FindNeighbors(mono_focus_alv, dims = 1:4)

## Computing nearest neighbor graph
## Computing SNN

mono_focus_alv <- FindClusters(mono_focus_alv, algorithm = 3, resolution = 0.3, verbose = FALSE)
mono_focus_alv <- RunUMAP(mono_focus_alv, dims = 1:4)

## 13:58:44 UMAP embedding parameters a = 0.9922 b = 1.112
## 13:58:44 Read 5250 rows and found 4 numeric columns
## 13:58:44 Using Annoy for neighbor search, n_neighbors = 30
## 13:58:44 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 13:58:45 Writing NN index file to temp file /tmp/RtmpuopyiY/file2952ea3ead0f57
## 13:58:45 Searching Annoy index using 1 thread, search_k = 3000
## 13:58:46 Annoy recall = 100%
## 13:58:47 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:58:49 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:58:49 Commencing optimization for 500 epochs, with 185554 positive edges
## 13:58:52 Optimization finished

DimPlot(mono_focus_alv, reduction = "umap", label=TRUE)

DimPlot(mono_focus_alv, group.by="monaco_fine_annotation" , reduction = "umap", label=TRUE)

DimPlot(mono_focus_alv, group.by="sample" , reduction = "umap", label=TRUE)

Differential expression

We are going to use muscat for pseudobulk analysis. First need to convert seurat obj to singlecellexperiment object. Then summarise counts to pseudobulk level.

sce <- Seurat::as.SingleCellExperiment(comb, assay = "RNA")

head(colData(sce),2)

## DataFrame with 2 rows and 13 columns
##                            orig.ident nCount_RNA nFeature_RNA
##                              <factor>  <numeric>    <integer>
## mdm_mock1|AAACGAATCACATACG        mac      72710         7097
## mdm_mock1|AAACGCTCATCAGCGC        mac      49092         6276
##                                              cell      sample RNA_snn_res.0.5
##                                       <character> <character>        <factor>
## mdm_mock1|AAACGAATCACATACG mdm_mock1|AAACGAATCA..   mdm_mock1              3 
## mdm_mock1|AAACGCTCATCAGCGC mdm_mock1|AAACGCTCAT..   mdm_mock1              11
##                            seurat_clusters           cell_barcode
##                                   <factor>            <character>
## mdm_mock1|AAACGAATCACATACG              3  mdm_mock1|AAACGAATCA..
## mdm_mock1|AAACGCTCATCAGCGC              11 mdm_mock1|AAACGCTCAT..
##                            monaco_broad_annotation monaco_broad_pruned_labels
##                                        <character>                <character>
## mdm_mock1|AAACGAATCACATACG               Monocytes                  Monocytes
## mdm_mock1|AAACGCTCATCAGCGC               Monocytes                  Monocytes
##                            monaco_fine_annotation monaco_fine_pruned_labels
##                                       <character>               <character>
## mdm_mock1|AAACGAATCACATACG    Classical monocytes       Classical monocytes
## mdm_mock1|AAACGCTCATCAGCGC    Classical monocytes       Classical monocytes
##                               ident
##                            <factor>
## mdm_mock1|AAACGAATCACATACG       3 
## mdm_mock1|AAACGCTCATCAGCGC       11

colnames(colData(sce))

##  [1] "orig.ident"                 "nCount_RNA"                
##  [3] "nFeature_RNA"               "cell"                      
##  [5] "sample"                     "RNA_snn_res.0.5"           
##  [7] "seurat_clusters"            "cell_barcode"              
##  [9] "monaco_broad_annotation"    "monaco_broad_pruned_labels"
## [11] "monaco_fine_annotation"     "monaco_fine_pruned_labels" 
## [13] "ident"

#muscat library
pb <- aggregateData(sce,
    assay = "counts", fun = "sum",
    by = c("monaco_broad_annotation", "sample"))

# one sheet per subpopulation
assayNames(pb)

## [1] "Basophils"       "Dendritic cells" "Monocytes"       "Neutrophils"    
## [5] "NK cells"        "Progenitors"

t(head(assay(pb)))

##                gene-HIV1-1 gene-HIV1-2 MIR1302-2HG FAM138A OR4F5 AL627309.1
## alv_active1              2          29           0       0     0          0
## alv_active2              0           0           0       0     0          0
## alv_active3              0           2           0       0     0          0
## alv_bystander1           0           0           0       0     0          0
## alv_bystander2           0           0           0       0     0          0
## alv_bystander3           0           0           0       0     0          0
## alv_bystander4           0           0           0       0     0          0
## alv_latent1              0           0           0       0     0          0
## alv_latent2              0           0           0       0     0          0
## alv_latent3              0          15           0       0     0          0
## alv_latent4              0           0           0       0     0          0
## alv_mock1                0           0           0       0     0          0
## alv_mock2                0           0           0       0     0          0
## alv_mock3                0           0           0       0     0          0
## mdm_active1              0           0           0       0     0          0
## mdm_active2              0           0           0       0     0          0
## mdm_active3              0           0           0       0     0          0
## mdm_active4              0           0           0       0     0          0
## mdm_bystander1           0           0           0       0     0          0
## mdm_bystander2           0           0           0       0     0          0
## mdm_bystander3           0           0           0       0     0          0
## mdm_latent1              0           2           0       0     0          0
## mdm_latent2              0           0           0       0     0          0
## mdm_latent3              0           0           0       0     0          0
## mdm_mock1                0           0           0       0     0          0
## mdm_mock2                0           0           0       0     0          0
## mdm_mock3                0           0           0       0     0          0
## mdm_mock4                0           0           0       0     0          0
## react6                   0           0           0       0     0          0

plotMDS(assays(pb)[[3]], main="Monocytes" )

par(mfrow=c(2,3))

dump <- lapply(1:length(assays(pb)) , function(i) {
  cellname=names(assays(pb))[i]
  plotMDS(assays(pb)[[i]],cex=1,pch=19,main=paste(cellname),labels=colnames(assays(pb)[[1]]))
})

par(mfrow=c(1,1))

DE Prep

pbmono <- assays(pb)[["Monocytes"]]

head(pbmono)

##             alv_active1 alv_active2 alv_active3 alv_bystander1 alv_bystander2
## gene-HIV1-1       29173       20670       14188              0              0
## gene-HIV1-2      120841      105818      103519              0              0
## MIR1302-2HG           0           0           0              0              0
## FAM138A               0           0           0              0              0
## OR4F5                 0           0           0              0              0
## AL627309.1           16          11          27             20             38
##             alv_bystander3 alv_bystander4 alv_latent1 alv_latent2 alv_latent3
## gene-HIV1-1              0              0         380        1777        1654
## gene-HIV1-2              0              0        2126       12238       13801
## MIR1302-2HG              0              0           0           1           0
## FAM138A                  0              0           0           0           0
## OR4F5                    0              0           0           0           0
## AL627309.1               9             14           6          65          59
##             alv_latent4 alv_mock1 alv_mock2 alv_mock3 mdm_active1 mdm_active2
## gene-HIV1-1        2215        92       115      1038       11089        7725
## gene-HIV1-2       26684       727      1653      6685       65635       42374
## MIR1302-2HG           0         0         0         0           0           0
## FAM138A               0         0         0         0           0           0
## OR4F5                 0         0         0         0           0           0
## AL627309.1           76        46        26        52          54          37
##             mdm_active3 mdm_active4 mdm_bystander1 mdm_bystander2
## gene-HIV1-1        4907       11852              0              0
## gene-HIV1-2       73030       35142              0              0
## MIR1302-2HG           0           0              0              0
## FAM138A               0           0              0              0
## OR4F5                 0           0              0              0
## AL627309.1           30           5             65             51
##             mdm_bystander3 mdm_latent1 mdm_latent2 mdm_latent3 mdm_mock1
## gene-HIV1-1              0         649         799         156        37
## gene-HIV1-2              6        6540        7091        3290       542
## MIR1302-2HG              0           0           0           0         0
## FAM138A                  0           0           0           0         0
## OR4F5                    0           0           0           0         0
## AL627309.1              19          72          78          28        61
##             mdm_mock2 mdm_mock3 mdm_mock4 react6
## gene-HIV1-1        67         3       141    895
## gene-HIV1-2       517       100      1078   7528
## MIR1302-2HG         0         0         0      0
## FAM138A             0         0         0      0
## OR4F5               0         0         0      0
## AL627309.1         30         6        15     37

dim(pbmono)

## [1] 36603    29

hiv <- pbmono[1:2,]
pbmono <- pbmono[3:nrow(pbmono),]

Gene sets

Gene ontology.

#go <- gmt_import("c5.go.v2023.2.Hs.symbols.gmt")
#names(go) <- gsub("_"," ",names(go))

#wget https://ziemann-lab.net/public/tmp/go_2024-11.gmt
go <- gmt_import("go_2024-11.gmt")

DE1 Latently- vs productively-infected cells (groups 3 vs 4).

MDM group.

pbmdm <- pbmono[,grep("mdm",colnames(pbmono))]

pb1m <- pbmdm[,c(grep("active",colnames(pbmdm)),grep("latent",colnames(pbmdm)))]

head(pb1m)

##             mdm_active1 mdm_active2 mdm_active3 mdm_active4 mdm_latent1
## MIR1302-2HG           0           0           0           0           0
## FAM138A               0           0           0           0           0
## OR4F5                 0           0           0           0           0
## AL627309.1           54          37          30           5          72
## AL627309.3            0           2           0           0           0
## AL627309.2            0           0           0           0           0
##             mdm_latent2 mdm_latent3
## MIR1302-2HG           0           0
## FAM138A               0           0
## OR4F5                 0           0
## AL627309.1           78          28
## AL627309.3            2           2
## AL627309.2            0           0

pb1mf <- pb1m[which(rowMeans(pb1m)>=10),]
head(pb1mf)

##            mdm_active1 mdm_active2 mdm_active3 mdm_active4 mdm_latent1
## AL627309.1          54          37          30           5          72
## AL627309.5          15          14          15           6          35
## LINC01409          129         114         104          27         153
## LINC01128          262         164         167          94         215
## LINC00115           22           6           7           4          23
## FAM41C              49          39          63          68          62
##            mdm_latent2 mdm_latent3
## AL627309.1          78          28
## AL627309.5          49           8
## LINC01409          221          60
## LINC01128          202          87
## LINC00115           10           4
## FAM41C              54          21

colSums(pb1mf)

## mdm_active1 mdm_active2 mdm_active3 mdm_active4 mdm_latent1 mdm_latent2 
##    30843590    22855559    26196614    13879736    35238765    39384235 
## mdm_latent3 
##    15058506

des1m <- as.data.frame(grepl("active",colnames(pb1mf)))
colnames(des1m) <- "case"

plot(cmdscale(dist(t(pb1mf))), xlab="Coordinate 1", ylab="Coordinate 2",
  type = "p",pch=19,col="gray",cex=2)

text(cmdscale(dist(t(pb1mf))), labels=colnames(pb1mf) )

dds <- DESeqDataSetFromMatrix(countData = pb1mf , colData = des1m, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de1mf <- de
write.table(de1mf,"de1mf.tsv",sep="\t")

nrow(subset(de1mf,padj<0.05 & log2FoldChange>0))

## [1] 111

nrow(subset(de1mf,padj<0.05 & log2FoldChange<0))

## [1] 232

head(subset(de1mf,padj<0.05 & log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in active MDM cells") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in active MDM cells
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
LRRC25	389.10200	1.2420866	0.2192739	5.664542	0.0e+00	0.0000173
NMRK2	1221.29447	1.8111399	0.3328557	5.441216	1.0e-07	0.0000385
PHACTR1	363.65716	1.4735762	0.2807122	5.249421	2.0e-07	0.0000797
IL7R	18.77671	4.1477458	0.8171411	5.075923	4.0e-07	0.0001699
DRAXIN	478.82311	1.0070479	0.2036374	4.945299	8.0e-07	0.0002897
IL6R-AS1	79.77999	2.5938871	0.5445190	4.763630	1.9e-06	0.0005362
UBE2J1	2818.57491	0.4811650	0.1023425	4.701518	2.6e-06	0.0006742
HBEGF	408.02932	0.8614193	0.1855476	4.642578	3.4e-06	0.0008299
NDFIP2	270.63116	0.9089767	0.1982830	4.584240	4.6e-06	0.0010361
DIRC3	378.78002	1.1228802	0.2496282	4.498211	6.9e-06	0.0014864

head(subset(de1mf,padj<0.05 & log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top downregulated genes in active MDM cells") %>%
  kable_paper("hover", full_width = F)

Top downregulated genes in active MDM cells
	baseMean	log2FoldChange	lfcSE	stat	padj
RCBTB2	819.53562	-1.473879	0.1941941	-7.589725	0.00e+00
TGFBI	461.99975	-2.449123	0.3481134	-7.035418	0.00e+00
AL031123.1	525.20294	-0.995166	0.1419224	-7.012041	0.00e+00
HIST1H1C	223.23751	-2.154687	0.3283867	-6.561431	2.00e-07
PTGDS	28566.30351	-2.275352	0.3586803	-6.343678	6.00e-07
EVI2A	562.82781	-1.145883	0.1812752	-6.321233	6.00e-07
MAP1A	95.16198	-2.073547	0.3294914	-6.293175	6.00e-07
FLRT2	303.44509	-1.893853	0.3124055	-6.062163	2.40e-06
IGF1R	384.50752	-1.003939	0.1694912	-5.923255	4.90e-06
CFD	3010.50393	-1.551909	0.2685750	-5.778309	1.06e-05

m1m <- mitch_import(de,DEtype="deseq2",joinType="full")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 15002

## Note: no. genes in output = 15002

## Note: estimated proportion of input genes in output = 1

mres1m <- mitch_calc(m1m,genesets=go,minsetsize=5,cores=4,priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

res <- mres1m$enrichment_result
mitchtbl <- mres1m$enrichment_result
goid <- sapply(strsplit(mitchtbl$set," "),"[[",1)
mysplit <- strsplit(mitchtbl$set," ")
mysplit <- lapply(mysplit, function(x) { x[2:length(x)] } )
godescription <- unlist(lapply(mysplit, function(x) { paste(x,collapse=" ") } ))
em <- data.frame(goid,godescription,mitchtbl$pANOVA,mitchtbl$p.adjustANOVA,sign(mitchtbl$s.dist),mitchtbl$s.dist)
colnames(em) <- c("GO.ID","Description","p.Val","FDR","Phenotype","ES")
write.table(em,"de1mf_em.tsv",row.names = FALSE, quote=FALSE,sep="\t")

res <- subset(res,p.adjustANOVA<0.05)
resup <- subset(res,s.dist>0)
resdn <- subset(res,s.dist<0)
s <- c(head(resup$s.dist,10), head(resdn$s.dist,10))
names(s) <- c(head(resup$set,10),head(resdn$set,10))
s <- s[order(s)]
cols <- gsub("1","red",gsub("-1","blue",as.character(sign(s))))
par(mar=c(5,27,3,1))
barplot(abs(s),las=1,horiz=TRUE,col=cols,xlab="ES",cex.names=0.8,main="")

if (! file.exists("MDM_latent_vs_active.html") ) {
  mitch_report(mres1m,outfile="MDM_latent_vs_active.html")
}

Alv cells.

pbalv <- pbmono[,grep("alv",colnames(pbmono))]

pb1a <- pbalv[,c(grep("active",colnames(pbalv)),grep("latent",colnames(pbalv)))]
head(pb1a)

##             alv_active1 alv_active2 alv_active3 alv_latent1 alv_latent2
## MIR1302-2HG           0           0           0           0           1
## FAM138A               0           0           0           0           0
## OR4F5                 0           0           0           0           0
## AL627309.1           16          11          27           6          65
## AL627309.3            0           0           0           0           1
## AL627309.2            0           0           0           0           0
##             alv_latent3 alv_latent4
## MIR1302-2HG           0           0
## FAM138A               0           0
## OR4F5                 0           0
## AL627309.1           59          76
## AL627309.3            1           0
## AL627309.2            0           1

pb1af <- pb1a[which(rowMeans(pb1a)>=10),]
head(pb1af)

##            alv_active1 alv_active2 alv_active3 alv_latent1 alv_latent2
## AL627309.1          16          11          27           6          65
## AL627309.5          26          23          15          18         108
## LINC01409           87         151          92          34         133
## LINC01128          293         284         219         102         299
## LINC00115           10           7          10           5          14
## FAM41C             136         117          84          80         123
##            alv_latent3 alv_latent4
## AL627309.1          59          76
## AL627309.5          90          67
## LINC01409          306         198
## LINC01128          413         348
## LINC00115           27          21
## FAM41C             134         150

colSums(pb1af)

## alv_active1 alv_active2 alv_active3 alv_latent1 alv_latent2 alv_latent3 
##    30059559    28510626    24033240    16645696    43686559    56858514 
## alv_latent4 
##    52271268

des1a <- as.data.frame(grepl("active",colnames(pb1af)))
colnames(des1a) <- "case"

plot(cmdscale(dist(t(pb1af))), xlab="Coordinate 1", ylab="Coordinate 2",
  type = "p",pch=19,col="gray",cex=2)

text(cmdscale(dist(t(pb1af))), labels=colnames(pb1af) )

dds <- DESeqDataSetFromMatrix(countData = pb1af , colData = des1a, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de1af <- de
write.table(de1af,"de1af.tsv",sep="\t")

nrow(subset(de1af,padj<0.05 & log2FoldChange>0))

## [1] 360

nrow(subset(de1af,padj<0.05 & log2FoldChange<0))

## [1] 259

head(subset(de1af,padj<0.05 & log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in active Alv cells") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in active Alv cells
	baseMean	log2FoldChange	lfcSE	stat
LAYN	646.06412	1.949699	0.1837933	10.608112
IL6R-AS1	496.13994	3.348015	0.3195846	10.476147
AL157912.1	876.82015	2.266161	0.2172491	10.431163
CDKN1A	6332.25123	1.338772	0.1368473	9.782967
DNAJA4	337.39117	1.595379	0.1654746	9.641232
TNFSF15	504.64619	2.273073	0.2526606	8.996549
TCTEX1D2	5924.67714	1.301338	0.1452066	8.961980
IGFL2	135.86673	4.178248	0.4907575	8.513876
MID2	153.58862	2.106117	0.2485700	8.472936
CLIC6	76.68185	2.633839	0.3172527	8.302021

head(subset(de1af,padj<0.05 & log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in active Alv cells") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in active Alv cells
	baseMean	log2FoldChange	lfcSE	stat	padj
CEBPD	942.22837	-1.5655586	0.1606184	-9.747070	0e+00
LYZ	58220.50593	-1.1385910	0.1231063	-9.248847	0e+00
H1FX	1153.38317	-1.2336404	0.1591139	-7.753189	0e+00
P2RY11	202.29568	-1.2342577	0.1696748	-7.274256	0e+00
JAKMIP2	75.00681	-3.9911471	0.5561287	-7.176662	0e+00
IFNGR2	2572.40169	-0.6681364	0.0973720	-6.861687	0e+00
CSF1R	652.75459	-1.7013376	0.2517604	-6.757765	0e+00
SLCO3A1	245.66623	-1.7535308	0.2756341	-6.361806	1e-07
CD44	5417.83243	-0.8186934	0.1309255	-6.253126	2e-07
GCA	1488.45196	-0.9167194	0.1478308	-6.201139	3e-07

m1a <- mitch_import(de,DEtype="deseq2",joinType="full")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 16484

## Note: no. genes in output = 16484

## Note: estimated proportion of input genes in output = 1

mres1a <- mitch_calc(m1a,genesets=go,minsetsize=5,cores=4,priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

res <- mres1a$enrichment_result

mitchtbl <- mres1a$enrichment_result
goid <- sapply(strsplit(mitchtbl$set," "),"[[",1)
mysplit <- strsplit(mitchtbl$set," ")
mysplit <- lapply(mysplit, function(x) { x[2:length(x)] } )
godescription <- unlist(lapply(mysplit, function(x) { paste(x,collapse=" ") } ))
em <- data.frame(goid,godescription,mitchtbl$pANOVA,mitchtbl$p.adjustANOVA,sign(mitchtbl$s.dist),mitchtbl$s.dist)
colnames(em) <- c("GO.ID","Description","p.Val","FDR","Phenotype","ES")
write.table(em,"de1af_em.tsv",row.names = FALSE, quote=FALSE,sep="\t")

res <- subset(res,p.adjustANOVA<0.05)
resup <- subset(res,s.dist>0)
resdn <- subset(res,s.dist<0)
s <- c(head(resup$s.dist,10), head(resdn$s.dist,10))
names(s) <- c(head(resup$set,10),head(resdn$set,10))
s <- s[order(s)]
cols <- gsub("1","red",gsub("-1","blue",as.character(sign(s))))
par(mar=c(5,27,3,1))
barplot(abs(s),las=1,horiz=TRUE,col=cols,xlab="ES",cex.names=0.8,main="")

if (! file.exists("Alv_latent_vs_active.html") ) {
  mitch_report(mres1a,outfile="Alv_latent_vs_active.html")
}

Combined.

mm1 <- merge(m1a,m1m,by=0)

head(mm1)

##   Row.names        x.x        x.y
## 1      A1BG  1.0451580  0.1247177
## 2  A1BG-AS1 -0.6726339 -1.1348606
## 3       A2M  0.5824820 -0.5012939
## 4   A2M-AS1 -0.3548217 -1.0457070
## 5 A2ML1-AS1 -1.6525887  0.7798403
## 6    A4GALT  3.8981565 -0.7614755

rownames(mm1) <- mm1[,1]
mm1[,1]=NULL
colnames(mm1) <- c("Alv","MDM")
plot(mm1)
mylm <- lm(mm1)
abline(mylm,col="red",lty=2,lwd=3)

summary(mylm)

## 
## Call:
## lm(formula = mm1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.0314 -0.8127 -0.0676  0.7340 10.1779 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.032175   0.010989  -2.928  0.00342 ** 
## MDM          0.442823   0.008061  54.936  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.334 on 14743 degrees of freedom
## Multiple R-squared:  0.1699, Adjusted R-squared:  0.1699 
## F-statistic:  3018 on 1 and 14743 DF,  p-value: < 2.2e-16

cor.test(mm1$Alv,mm1$MDM)

## 
##  Pearson's product-moment correlation
## 
## data:  mm1$Alv and mm1$MDM
## t = 54.936, df = 14743, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3987295 0.4255273
## sample estimates:
##       cor 
## 0.4122176

mm1r <- as.data.frame(apply(mm1,2,rank))
plot(mm1r,cex=0.3)
mylm <- lm(mm1r)
abline(mylm,col="red",lty=2,lwd=3)

summary(mylm)

## 
## Call:
## lm(formula = mm1r)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9874.1 -3236.5   -26.6  3193.4  9902.6 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.479e+03  6.449e+01   69.45   <2e-16 ***
## MDM         3.925e-01  7.575e-03   51.82   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3915 on 14743 degrees of freedom
## Multiple R-squared:  0.1541, Adjusted R-squared:  0.154 
## F-statistic:  2685 on 1 and 14743 DF,  p-value: < 2.2e-16

cor.test(mm1r$Alv,mm1r$MDM)

## 
##  Pearson's product-moment correlation
## 
## data:  mm1r$Alv and mm1r$MDM
## t = 51.821, df = 14743, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3787906 0.4060997
## sample estimates:
##       cor 
## 0.3925316

DE2 Latently-infected vs bystander cells

MDM group.

pb2m <- pbmdm[,c(grep("bystander",colnames(pbmdm)),grep("latent",colnames(pbmdm)))]
head(pb2m)

##             mdm_bystander1 mdm_bystander2 mdm_bystander3 mdm_latent1
## MIR1302-2HG              0              0              0           0
## FAM138A                  0              0              0           0
## OR4F5                    0              0              0           0
## AL627309.1              65             51             19          72
## AL627309.3               2              4              0           0
## AL627309.2               0              0              0           0
##             mdm_latent2 mdm_latent3
## MIR1302-2HG           0           0
## FAM138A               0           0
## OR4F5                 0           0
## AL627309.1           78          28
## AL627309.3            2           2
## AL627309.2            0           0

pb2mf <- pb2m[which(rowMeans(pb2m)>=10),]
head(pb2mf)

##            mdm_bystander1 mdm_bystander2 mdm_bystander3 mdm_latent1 mdm_latent2
## AL627309.1             65             51             19          72          78
## AL627309.5             33             35              8          35          49
## LINC01409             136            246             63         153         221
## LINC01128             216            208            111         215         202
## LINC00115              18             24              7          23          10
## FAM41C                 43             69             19          62          54
##            mdm_latent3
## AL627309.1          28
## AL627309.5           8
## LINC01409           60
## LINC01128           87
## LINC00115            4
## FAM41C              21

colSums(pb2mf)

## mdm_bystander1 mdm_bystander2 mdm_bystander3    mdm_latent1    mdm_latent2 
##       40549303       36895352       16235047       35242448       39389780 
##    mdm_latent3 
##       15060675

des2m <- as.data.frame(grepl("latent",colnames(pb2mf)))
colnames(des2m) <- "case"

plot(cmdscale(dist(t(pb2mf))), xlab="Coordinate 1", ylab="Coordinate 2",
  type = "p",pch=19,col="gray",cex=2)

text(cmdscale(dist(t(pb2mf))), labels=colnames(pb2mf) )

dds <- DESeqDataSetFromMatrix(countData = pb2mf , colData = des2m, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de2mf <- de
#write.table(de2mf,"de2mf.tsv",sep="\t")

nrow(subset(de2mf,padj<0.05 & log2FoldChange>0))

## [1] 0

nrow(subset(de2mf,padj<0.05 & log2FoldChange<0))

## [1] 0

head(subset(de2mf,log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in latent compared to bystander (MDM unpaired)") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in latent compared to bystander (MDM unpaired)
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
ZFPM2	815.724490	1.0134336	0.3505885	2.890664	0.0038443	0.9997657
AC015660.1	13.134808	1.7822471	0.6580113	2.708536	0.0067581	0.9997657
PAEP	8.352154	3.2437035	1.2087877	2.683435	0.0072870	0.9997657
MAP1A	161.014052	0.5807312	0.2246662	2.584862	0.0097418	0.9997657
GRIP1	22.096324	1.3653641	0.5493578	2.485382	0.0129412	0.9997657
AL603840.1	64.653599	0.6972954	0.2851908	2.445013	0.0144847	0.9997657
LINC02257	59.540802	1.0232779	0.4203183	2.434531	0.0149111	0.9997657
FAM83G	16.640778	1.2031135	0.5112349	2.353348	0.0186052	0.9997657
LINC02479	31.481351	1.0513244	0.4468126	2.352943	0.0186255	0.9997657
PTH2R	9.387033	1.9614365	0.8560226	2.291337	0.0219439	0.9997657

head(subset(de2mf,log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top downregulated genes in latent compared to bystander (MDM unpaired)") %>%
  kable_paper("hover", full_width = F)

Top downregulated genes in latent compared to bystander (MDM unpaired)
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
HSPA5	3129.492289	-0.5464083	0.2136635	-2.557330	0.0105479	0.9997657
AKR1C3	49.148938	-0.8102721	0.3304780	-2.451818	0.0142136	0.9997657
PLCB1	87.139407	-2.1553080	0.8985866	-2.398553	0.0164600	0.9997657
ZNF93	235.615217	-0.4680771	0.2074674	-2.256148	0.0240614	0.9997657
AC092142.1	9.750122	-1.6296531	0.8050444	-2.024302	0.0429391	0.9997657
ACTG1	17039.224704	-0.2770873	0.1382883	-2.003692	0.0451031	0.9997657
ZNF821	226.318656	-0.3641906	0.1830925	-1.989107	0.0466894	0.9997657
SMIM15-AS1	38.595801	-0.7115038	0.3610366	-1.970725	0.0487553	0.9997657
SYN2	12.534778	-1.2709728	0.6477603	-1.962104	0.0497504	0.9997657
PARTICL	19.357134	-1.0880757	0.5595985	-1.944386	0.0518489	0.9997657

des2m$sample <- rep(1:3,2)

dds <- DESeqDataSetFromMatrix(countData = pb2mf , colData = des2m, design = ~ sample + case)

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de2mf <- de
write.table(de2mf,"de2mf.tsv",sep="\t")

nrow(subset(de2mf,padj<0.05 & log2FoldChange>0))

## [1] 0

nrow(subset(de2mf,padj<0.05 & log2FoldChange<0))

## [1] 0

head(subset(de2mf,log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in latent compared to bystander (MDM paired)") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in latent compared to bystander (MDM paired)
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
ZFPM2	815.724490	1.0325654	0.2720485	3.795519	0.0001473	0.9999834
IL1RN	254.031816	0.7031503	0.1985602	3.541246	0.0003982	0.9999834
PAEP	8.352154	3.4470829	1.0303317	3.345605	0.0008210	0.9999834
MAP1A	161.014052	0.5736414	0.1906963	3.008141	0.0026285	0.9999834
IGFBP6	155.859355	0.6138135	0.2083494	2.946078	0.0032183	0.9999834
AC015660.1	13.134808	1.7526778	0.6688347	2.620495	0.0087802	0.9999834
GRIP1	22.096324	1.3575888	0.5660267	2.398454	0.0164645	0.9999834
PRKN	1022.174743	0.2546727	0.1065789	2.389522	0.0168703	0.9999834
LINC02257	59.540802	0.9626412	0.4034090	2.386266	0.0170204	0.9999834
NEO1	23.796659	1.1376615	0.4798766	2.370738	0.0177526	0.9999834

head(subset(de2mf,log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top downregulated genes in latent compared to bystander (MDM paired)") %>%
  kable_paper("hover", full_width = F)

Top downregulated genes in latent compared to bystander (MDM paired)
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
HSPA5	3129.492289	-0.5271919	0.1925378	-2.738121	0.0061791	0.9999834
VSIG4	137.115837	-0.5384120	0.2116862	-2.543444	0.0109766	0.9999834
ZNF93	235.615217	-0.4639859	0.1972344	-2.352459	0.0186497	0.9999834
AKR1C3	49.148938	-0.8055583	0.3468676	-2.322380	0.0202125	0.9999834
ACTG1	17039.224704	-0.2776672	0.1205940	-2.302496	0.0213072	0.9999834
SOWAHD	243.662376	-0.4059598	0.1797295	-2.258726	0.0239004	0.9999834
PLCB1	87.139407	-1.5516291	0.7104441	-2.184027	0.0289603	0.9999834
AC092142.1	9.750122	-1.7293643	0.8532186	-2.026871	0.0426756	0.9999834
ZNF821	226.318656	-0.3630509	0.1824288	-1.990097	0.0465803	0.9999834
CYTL1	716.355028	-0.4856587	0.2455353	-1.977959	0.0479334	0.9999834

m2m <- mitch_import(de,DEtype="deseq2",joinType="full")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 15300

## Note: no. genes in output = 15300

## Note: estimated proportion of input genes in output = 1

mres2m <- mitch_calc(m2m,genesets=go,minsetsize=5,cores=4,priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

res <- mres2m$enrichment_result

mitchtbl <- mres2m$enrichment_result
goid <- sapply(strsplit(mitchtbl$set," "),"[[",1)
mysplit <- strsplit(mitchtbl$set," ")
mysplit <- lapply(mysplit, function(x) { x[2:length(x)] } )
godescription <- unlist(lapply(mysplit, function(x) { paste(x,collapse=" ") } ))
em <- data.frame(goid,godescription,mitchtbl$pANOVA,mitchtbl$p.adjustANOVA,sign(mitchtbl$s.dist),mitchtbl$s.dist)
colnames(em) <- c("GO.ID","Description","p.Val","FDR","Phenotype","ES")
write.table(em,"de2mf_em.tsv",row.names = FALSE, quote=FALSE,sep="\t")

res <- subset(res,p.adjustANOVA<0.05)
resup <- subset(res,s.dist>0)
resdn <- subset(res,s.dist<0)
s <- c(head(resup$s.dist,10), head(resdn$s.dist,10))
names(s) <- c(head(resup$set,10),head(resdn$set,10))
s <- s[order(s)]
cols <- gsub("1","red",gsub("-1","blue",as.character(sign(s))))
par(mar=c(5,27,3,1))
barplot(abs(s),las=1,horiz=TRUE,col=cols,xlab="ES",cex.names=0.8,main="")

if (! file.exists("MDM_bystander_vs_latent.html") ) {
  mitch_report(mres2m,outfile="MDM_bystander_vs_latent.html")
}

Alv cells.

pb2a <- pbalv[,c(grep("bystander",colnames(pbalv)),grep("latent",colnames(pbalv)))]
head(pb2a)

##             alv_bystander1 alv_bystander2 alv_bystander3 alv_bystander4
## MIR1302-2HG              0              0              0              0
## FAM138A                  0              0              0              0
## OR4F5                    0              0              0              0
## AL627309.1              20             38              9             14
## AL627309.3               0              1              0              0
## AL627309.2               0              0              0              0
##             alv_latent1 alv_latent2 alv_latent3 alv_latent4
## MIR1302-2HG           0           1           0           0
## FAM138A               0           0           0           0
## OR4F5                 0           0           0           0
## AL627309.1            6          65          59          76
## AL627309.3            0           1           1           0
## AL627309.2            0           0           0           1

pb2af <- pb2a[which(rowMeans(pb2a)>=10),]
head(pb2af)

##            alv_bystander1 alv_bystander2 alv_bystander3 alv_bystander4
## AL627309.1             20             38              9             14
## AL627309.5             27             51             24             12
## LINC01409              58             83             75             73
## LINC01128             141            162            109            105
## LINC00115               1              7              5              2
## FAM41C                 82             60             27             37
##            alv_latent1 alv_latent2 alv_latent3 alv_latent4
## AL627309.1           6          65          59          76
## AL627309.5          18         108          90          67
## LINC01409           34         133         306         198
## LINC01128          102         299         413         348
## LINC00115            5          14          27          21
## FAM41C              80         123         134         150

colSums(pb2af)

## alv_bystander1 alv_bystander2 alv_bystander3 alv_bystander4    alv_latent1 
##       20821170       22496343       14124595       13510489       16643623 
##    alv_latent2    alv_latent3    alv_latent4 
##       43678024       56851234       52263981

des2a <- as.data.frame(grepl("latent",colnames(pb2af)))
colnames(des2a) <- "case"

plot(cmdscale(dist(t(pb2af))), xlab="Coordinate 1", ylab="Coordinate 2",
  type = "p",pch=19,col="gray",cex=2)

text(cmdscale(dist(t(pb2af))), labels=colnames(pb2af) )

dds <- DESeqDataSetFromMatrix(countData = pb2af , colData = des2a, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de2af <- de
#write.table(de2af,"de2af.tsv",sep="\t")

nrow(subset(de2af,padj<0.05 & log2FoldChange>0))

## [1] 0

nrow(subset(de2af,padj<0.05 & log2FoldChange<0))

## [1] 0

head(subset(de2af, log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in latent compared to bystander (Alv unpaired)") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in latent compared to bystander (Alv unpaired)
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
IGFL2	6.831276	4.1473419	1.1419919	3.631674	0.0002816	0.9999275
IL6R-AS1	59.471366	0.9936364	0.3155831	3.148573	0.0016407	0.9999275
GUCY1A2	57.478671	1.1652601	0.3867694	3.012803	0.0025885	0.9999275
CLIC6	12.514507	1.3804460	0.5392894	2.559750	0.0104748	0.9999275
IL7R	45.847087	1.0647822	0.4599362	2.315065	0.0206094	0.9999275
AC131254.1	20.634480	1.5538947	0.6839264	2.272020	0.0230853	0.9999275
CCL17	34.574222	1.5455779	0.6967303	2.218330	0.0265323	0.9999275
TLCD5	102.898040	0.5295161	0.2426812	2.181941	0.0291139	0.9999275
CXCL10	28.584967	4.3629570	2.0025608	2.178689	0.0293548	0.9999275
AC113404.1	13.483464	1.8436836	0.8522499	2.163313	0.0305171	0.9999275

head(subset(de2af, log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top downregulated genes in latent compared to bystander (Alv unpaired)") %>%
  kable_paper("hover", full_width = F)

Top downregulated genes in latent compared to bystander (Alv unpaired)
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
PBK	15.68054	-1.5165527	0.5250931	-2.888160	0.0038750	0.9999275
AL162586.1	23.37148	-0.8275908	0.3459676	-2.392105	0.0167521	0.9999275
CENPF	110.08618	-1.0023498	0.4539709	-2.207960	0.0272470	0.9999275
TOP2A	100.15582	-0.9579134	0.4374134	-2.189950	0.0285279	0.9999275
GTSE1	39.83766	-0.9404875	0.4642714	-2.025728	0.0427927	0.9999275
CCNA2	54.74930	-0.7658193	0.3795303	-2.017808	0.0436113	0.9999275
CDCA3	13.65163	-1.0069050	0.5029028	-2.002186	0.0452647	0.9999275
ASPM	52.29393	-1.0870301	0.5442275	-1.997382	0.0457837	0.9999275
HJURP	12.85119	-1.4352668	0.7272403	-1.973580	0.0484295	0.9999275
AC016590.1	10.67140	-0.9703872	0.5041304	-1.924873	0.0542452	0.9999275

des2a$sample <- rep(1:4,2)

dds <- DESeqDataSetFromMatrix(countData = pb2af , colData = des2a, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de2af <- de
write.table(de2af,"de2af.tsv",sep="\t")

nrow(subset(de2af,padj<0.05 & log2FoldChange>0))

## [1] 0

nrow(subset(de2af,padj<0.05 & log2FoldChange<0))

## [1] 0

head(subset(de2af, log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in latent compared to bystander (Alv paired)") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in latent compared to bystander (Alv paired)
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
IGFL2	6.831276	4.1473419	1.1419919	3.631674	0.0002816	0.9999275
IL6R-AS1	59.471366	0.9936364	0.3155831	3.148573	0.0016407	0.9999275
GUCY1A2	57.478671	1.1652601	0.3867694	3.012803	0.0025885	0.9999275
CLIC6	12.514507	1.3804460	0.5392894	2.559750	0.0104748	0.9999275
IL7R	45.847087	1.0647822	0.4599362	2.315065	0.0206094	0.9999275
AC131254.1	20.634480	1.5538947	0.6839264	2.272020	0.0230853	0.9999275
CCL17	34.574222	1.5455779	0.6967303	2.218330	0.0265323	0.9999275
TLCD5	102.898040	0.5295161	0.2426812	2.181941	0.0291139	0.9999275
CXCL10	28.584967	4.3629570	2.0025608	2.178689	0.0293548	0.9999275
AC113404.1	13.483464	1.8436836	0.8522499	2.163313	0.0305171	0.9999275

head(subset(de2af, log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top downregulated genes in latent compared to bystander (Alv paired)") %>%
  kable_paper("hover", full_width = F)

Top downregulated genes in latent compared to bystander (Alv paired)
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
PBK	15.68054	-1.5165527	0.5250931	-2.888160	0.0038750	0.9999275
AL162586.1	23.37148	-0.8275908	0.3459676	-2.392105	0.0167521	0.9999275
CENPF	110.08618	-1.0023498	0.4539709	-2.207960	0.0272470	0.9999275
TOP2A	100.15582	-0.9579134	0.4374134	-2.189950	0.0285279	0.9999275
GTSE1	39.83766	-0.9404875	0.4642714	-2.025728	0.0427927	0.9999275
CCNA2	54.74930	-0.7658193	0.3795303	-2.017808	0.0436113	0.9999275
CDCA3	13.65163	-1.0069050	0.5029028	-2.002186	0.0452647	0.9999275
ASPM	52.29393	-1.0870301	0.5442275	-1.997382	0.0457837	0.9999275
HJURP	12.85119	-1.4352668	0.7272403	-1.973580	0.0484295	0.9999275
AC016590.1	10.67140	-0.9703872	0.5041304	-1.924873	0.0542452	0.9999275

m2a <- mitch_import(de,DEtype="deseq2",joinType="full")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 15956

## Note: no. genes in output = 15956

## Note: estimated proportion of input genes in output = 1

mres2a <- mitch_calc(m2a,genesets=go,minsetsize=5,cores=4,priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

res <- mres2a$enrichment_result

mitchtbl <- mres2a$enrichment_result
goid <- sapply(strsplit(mitchtbl$set," "),"[[",1)
mysplit <- strsplit(mitchtbl$set," ")
mysplit <- lapply(mysplit, function(x) { x[2:length(x)] } )
godescription <- unlist(lapply(mysplit, function(x) { paste(x,collapse=" ") } ))
em <- data.frame(goid,godescription,mitchtbl$pANOVA,mitchtbl$p.adjustANOVA,sign(mitchtbl$s.dist),mitchtbl$s.dist)
colnames(em) <- c("GO.ID","Description","p.Val","FDR","Phenotype","ES")
write.table(em,"de2af_em.tsv",row.names = FALSE, quote=FALSE,sep="\t")

res <- subset(res,p.adjustANOVA<0.05)
resup <- subset(res,s.dist>0)
resdn <- subset(res,s.dist<0)
s <- c(head(resup$s.dist,10), head(resdn$s.dist,10))
names(s) <- c(head(resup$set,10),head(resdn$set,10))
s <- s[order(s)]
cols <- gsub("1","red",gsub("-1","blue",as.character(sign(s))))
par(mar=c(5,27,3,1))
barplot(abs(s),las=1,horiz=TRUE,col=cols,xlab="ES",cex.names=0.8,main="")

if (! file.exists("Alv_bystander_vs_latent.html") ) {
  mitch_report(mres2a,outfile="Alv_bystander_vs_latent.html")
}

Combined.

mm2 <- merge(m2a,m2m,by=0)

head(mm2)

##   Row.names          x.x        x.y
## 1      A1BG  0.144282200 -0.4751495
## 2  A1BG-AS1  0.428994559  0.6099293
## 3       A2M -0.252894005  0.4934169
## 4   A2M-AS1  0.161704527 -0.2630109
## 5 A2ML1-AS1 -0.647095488 -0.3808451
## 6      AAAS -0.009550646 -0.4414420

rownames(mm2) <- mm2[,1]
mm2[,1]=NULL
colnames(mm2) <- c("Alv","MDM")
plot(mm2)
mylm <- lm(mm2)
abline(mylm,col="red",lty=2,lwd=3)

summary(mylm)

## 
## Call:
## lm(formula = mm2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3098 -0.2762 -0.0051  0.2754  3.6223 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.004641   0.003933   -1.18    0.238    
## MDM          0.082903   0.006446   12.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4781 on 14792 degrees of freedom
## Multiple R-squared:  0.01106,    Adjusted R-squared:  0.01099 
## F-statistic: 165.4 on 1 and 14792 DF,  p-value: < 2.2e-16

cor.test(mm2$Alv,mm2$MDM)

## 
##  Pearson's product-moment correlation
## 
## data:  mm2$Alv and mm2$MDM
## t = 12.861, df = 14792, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08919709 0.12106940
## sample estimates:
##       cor 
## 0.1051602

mm2r <- as.data.frame(apply(mm2,2,rank))
plot(mm2r,cex=0.3)
mylm <- lm(mm2r)
abline(mylm,col="red",lty=2,lwd=3)

summary(mylm)

## 
## Call:
## lm(formula = mm2r)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8172.8 -3619.7    46.7  3659.7  8139.7 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.600e+03  6.982e+01   94.53   <2e-16 ***
## MDM         1.078e-01  8.174e-03   13.19   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4246 on 14792 degrees of freedom
## Multiple R-squared:  0.01162,    Adjusted R-squared:  0.01155 
## F-statistic: 173.9 on 1 and 14792 DF,  p-value: < 2.2e-16

cor.test(mm2r$Alv,mm2r$MDM)

## 
##  Pearson's product-moment correlation
## 
## data:  mm2r$Alv and mm2r$MDM
## t = 13.188, df = 14792, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.09184355 0.12369775
## sample estimates:
##       cor 
## 0.1077983

DE3 Active vs mock

MDM group.

pb3m <- pbmdm[,c(grep("active",colnames(pbmdm)),grep("mock",colnames(pbmdm)))]

head(pb3m)

##             mdm_active1 mdm_active2 mdm_active3 mdm_active4 mdm_mock1 mdm_mock2
## MIR1302-2HG           0           0           0           0         0         0
## FAM138A               0           0           0           0         0         0
## OR4F5                 0           0           0           0         0         0
## AL627309.1           54          37          30           5        61        30
## AL627309.3            0           2           0           0         0         0
## AL627309.2            0           0           0           0         0         0
##             mdm_mock3 mdm_mock4
## MIR1302-2HG         0         0
## FAM138A             0         0
## OR4F5               0         0
## AL627309.1          6        15
## AL627309.3          0         0
## AL627309.2          0         0

pb3mf <- pb3m[which(rowMeans(pb3m)>=10),]
head(pb3mf)

##            mdm_active1 mdm_active2 mdm_active3 mdm_active4 mdm_mock1 mdm_mock2
## AL627309.1          54          37          30           5        61        30
## AL627309.5          15          14          15           6        17        27
## LINC01409          129         114         104          27       119       116
## LINC01128          262         164         167          94       173       118
## FAM41C              49          39          63          68        61        25
## NOC2L              183         126         246         155       185       153
##            mdm_mock3 mdm_mock4
## AL627309.1         6        15
## AL627309.5         1        16
## LINC01409         33        57
## LINC01128         57       159
## FAM41C            14        85
## NOC2L             75       221

colSums(pb3mf)

## mdm_active1 mdm_active2 mdm_active3 mdm_active4   mdm_mock1   mdm_mock2 
##    30838055    22850851    26192488    13877828    29112212    21615267 
##   mdm_mock3   mdm_mock4 
##     7549870    20729914

des3m <- as.data.frame(grepl("active",colnames(pb3mf)))
colnames(des3m) <- "case"

plot(cmdscale(dist(t(pb3mf))), xlab="Coordinate 1", ylab="Coordinate 2",
  type = "p",pch=19,col="gray",cex=2)

text(cmdscale(dist(t(pb3mf))), labels=colnames(pb3mf) )

dds <- DESeqDataSetFromMatrix(countData = pb3mf , colData = des3m, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de3mf <- de
write.table(de3mf,"de3mf.tsv",sep="\t")

nrow(subset(de3mf,padj<0.05 & log2FoldChange>0))

## [1] 102

nrow(subset(de3mf,padj<0.05 & log2FoldChange<0))

## [1] 220

head(subset(de3mf,padj<0.05 & log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in active MDM cells compared to mock") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in active MDM cells compared to mock
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
PLAAT3	35.54458	2.6018024	0.4350913	5.979899	0.0e+00	0.0000031
LYRM1	207.51046	0.9010836	0.1550569	5.811308	0.0e+00	0.0000060
FAM241B	98.31075	1.4123422	0.2609457	5.412399	1.0e-07	0.0000431
CDKN1A	2611.64337	1.1118551	0.2071428	5.367578	1.0e-07	0.0000484
GRIP1	33.91692	2.0231034	0.3825670	5.288233	1.0e-07	0.0000641
DDB2	470.91041	0.6805458	0.1292762	5.264279	1.0e-07	0.0000706
KCNMB2-AS1	215.01389	1.5174273	0.2887412	5.255319	1.0e-07	0.0000716
FAS	177.02499	0.9989147	0.1966111	5.080661	4.0e-07	0.0001505
MDM2	3009.65137	1.0736852	0.2190317	4.901962	9.0e-07	0.0003209
SFXN1	174.71064	0.8474226	0.1751359	4.838657	1.3e-06	0.0004321

head(subset(de1mf,padj<0.05 & log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top downregulated genes in active MDM cells compared to mock") %>%
  kable_paper("hover", full_width = F)

Top downregulated genes in active MDM cells compared to mock
	baseMean	log2FoldChange	lfcSE	stat	padj
RCBTB2	819.53562	-1.473879	0.1941941	-7.589725	0.00e+00
TGFBI	461.99975	-2.449123	0.3481134	-7.035418	0.00e+00
AL031123.1	525.20294	-0.995166	0.1419224	-7.012041	0.00e+00
HIST1H1C	223.23751	-2.154687	0.3283867	-6.561431	2.00e-07
PTGDS	28566.30351	-2.275352	0.3586803	-6.343678	6.00e-07
EVI2A	562.82781	-1.145883	0.1812752	-6.321233	6.00e-07
MAP1A	95.16198	-2.073547	0.3294914	-6.293175	6.00e-07
FLRT2	303.44509	-1.893853	0.3124055	-6.062163	2.40e-06
IGF1R	384.50752	-1.003939	0.1694912	-5.923255	4.90e-06
CFD	3010.50393	-1.551909	0.2685750	-5.778309	1.06e-05

m3m <- mitch_import(de,DEtype="deseq2",joinType="full")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 14592

## Note: no. genes in output = 14592

## Note: estimated proportion of input genes in output = 1

mres3m <- mitch_calc(m3m,genesets=go,minsetsize=5,cores=4,priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

res <- mres3m$enrichment_result

mitchtbl <- mres3m$enrichment_result
goid <- sapply(strsplit(mitchtbl$set," "),"[[",1)
mysplit <- strsplit(mitchtbl$set," ")
mysplit <- lapply(mysplit, function(x) { x[2:length(x)] } )
godescription <- unlist(lapply(mysplit, function(x) { paste(x,collapse=" ") } ))
em <- data.frame(goid,godescription,mitchtbl$pANOVA,mitchtbl$p.adjustANOVA,sign(mitchtbl$s.dist),mitchtbl$s.dist)
colnames(em) <- c("GO.ID","Description","p.Val","FDR","Phenotype","ES")
write.table(em,"de3mf_em.tsv",row.names = FALSE, quote=FALSE,sep="\t")

res <- subset(res,p.adjustANOVA<0.05)
resup <- subset(res,s.dist>0)
resdn <- subset(res,s.dist<0)
s <- c(head(resup$s.dist,10), head(resdn$s.dist,10))
names(s) <- c(head(resup$set,10),head(resdn$set,10))
s <- s[order(s)]
cols <- gsub("1","red",gsub("-1","blue",as.character(sign(s))))
par(mar=c(5,27,3,1))
barplot(abs(s),las=1,horiz=TRUE,col=cols,xlab="ES",cex.names=0.8,main="")

if (! file.exists("MDM_mock_vs_active.html") ) {
  mitch_report(mres3m,outfile="MDM_mock_vs_active.html")
}

pb3a <- pbalv[,c(grep("active",colnames(pbalv)),grep("mock",colnames(pbalv)))]

head(pb3a)

##             alv_active1 alv_active2 alv_active3 alv_mock1 alv_mock2 alv_mock3
## MIR1302-2HG           0           0           0         0         0         0
## FAM138A               0           0           0         0         0         0
## OR4F5                 0           0           0         0         0         0
## AL627309.1           16          11          27        46        26        52
## AL627309.3            0           0           0         0         0         1
## AL627309.2            0           0           0         0         0         0

pb3af <- pb3a[which(rowMeans(pb3a)>=10),]
head(pb3af)

##            alv_active1 alv_active2 alv_active3 alv_mock1 alv_mock2 alv_mock3
## AL627309.1          16          11          27        46        26        52
## AL627309.5          26          23          15        63        29        50
## LINC01409           87         151          92        59       103       140
## LINC01128          293         284         219       176       200       250
## LINC00115           10           7          10        10        12        15
## FAM41C             136         117          84        74        53        98

colSums(pb3af)

## alv_active1 alv_active2 alv_active3   alv_mock1   alv_mock2   alv_mock3 
##    30051410    28505235    24028409    20446755    25545005    34873898

des3a <- as.data.frame(grepl("active",colnames(pb3af)))
colnames(des3a) <- "case"

plot(cmdscale(dist(t(pb3af))), xlab="Coordinate 1", ylab="Coordinate 2",
  type = "p",pch=19,col="gray",cex=2)

text(cmdscale(dist(t(pb3af))), labels=colnames(pb3af) )

dds <- DESeqDataSetFromMatrix(countData = pb3af , colData = des3a, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de3af <- de
write.table(de3af,"de3af.tsv",sep="\t")

nrow(subset(de3af,padj<0.05 & log2FoldChange>0))

## [1] 891

nrow(subset(de3af,padj<0.05 & log2FoldChange<0))

## [1] 653

head(subset(de3af,padj<0.05 & log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in active Alv cells compared to mock") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in active Alv cells compared to mock
	baseMean	log2FoldChange	lfcSE	stat
AL157912.1	754.26670	2.731970	0.2091846	13.060089
LAYN	553.09496	2.261020	0.1871843	12.079108
HES4	382.20667	2.359190	0.2058619	11.460064
CDKN1A	5303.58884	1.555197	0.1361511	11.422583
OCIAD2	353.43173	2.483107	0.2219754	11.186406
SDS	4763.87880	2.107153	0.1953024	10.789184
IL6R-AS1	455.12951	3.520018	0.3307596	10.642225
CCL7	1444.11412	1.947067	0.1938562	10.043870
TNFRSF9	169.65948	2.509770	0.2514923	9.979509
CLIC6	64.19776	4.054497	0.4350349	9.319935

head(subset(de3af,padj<0.05 & log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in active Alv cells compared to mock") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in active Alv cells compared to mock
	baseMean	log2FoldChange	lfcSE	stat
NDRG2	317.0287	-1.8033036	0.1656273	-10.887716
HIST1H1C	791.4543	-1.1579180	0.1105796	-10.471351
CEBPD	686.4608	-1.4769083	0.1637149	-9.021221
ADA2	1791.4889	-1.0408494	0.1158943	-8.981020
CRIM1	388.2713	-1.6778340	0.1948415	-8.611277
LYZ	46011.7884	-1.1761399	0.1385676	-8.487841
RPL10A	8121.1434	-0.8692328	0.1046647	-8.304927
CHST13	459.7071	-1.6239334	0.1970679	-8.240475
TGFBI	395.6418	-2.3302926	0.2874197	-8.107631
PIK3CD	583.8467	-1.0404514	0.1382467	-7.526050

m3a <- mitch_import(de,DEtype="deseq2",joinType="full")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 15748

## Note: no. genes in output = 15748

## Note: estimated proportion of input genes in output = 1

mres3a <- mitch_calc(m3a,genesets=go,minsetsize=5,cores=4,priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

res <- mres3a$enrichment_result

mitchtbl <- mres3a$enrichment_result
goid <- sapply(strsplit(mitchtbl$set," "),"[[",1)
mysplit <- strsplit(mitchtbl$set," ")
mysplit <- lapply(mysplit, function(x) { x[2:length(x)] } )
godescription <- unlist(lapply(mysplit, function(x) { paste(x,collapse=" ") } ))
em <- data.frame(goid,godescription,mitchtbl$pANOVA,mitchtbl$p.adjustANOVA,sign(mitchtbl$s.dist),mitchtbl$s.dist)
colnames(em) <- c("GO.ID","Description","p.Val","FDR","Phenotype","ES")
write.table(em,"de3af_em.tsv",row.names = FALSE, quote=FALSE,sep="\t")

res <- subset(res,p.adjustANOVA<0.05)
resup <- subset(res,s.dist>0)
resdn <- subset(res,s.dist<0)
s <- c(head(resup$s.dist,10), head(resdn$s.dist,10))
names(s) <- c(head(resup$set,10),head(resdn$set,10))
s <- s[order(s)]
cols <- gsub("1","red",gsub("-1","blue",as.character(sign(s))))
par(mar=c(5,27,3,1))
barplot(abs(s),las=1,horiz=TRUE,col=cols,xlab="ES",cex.names=0.8,main="")

if (! file.exists("Alv_mock_vs_active.html") ) {
  mitch_report(mres3a,outfile="Alv_mock_vs_active.html")
}

Combined.

mm3 <- merge(m3a,m3m,by=0)

head(mm3)

##   Row.names        x.x         x.y
## 1      A1BG  1.2688786  0.37303646
## 2  A1BG-AS1 -0.7086023 -1.24450352
## 3       A2M -1.1312069 -0.45366896
## 4 A2ML1-AS1 -0.5102606  0.25614343
## 5    A4GALT  3.2131861  0.03284305
## 6      AAAS  0.8200706 -0.35943212

rownames(mm3) <- mm3[,1]
mm3[,1]=NULL
colnames(mm3) <- c("Alv","MDM")
plot(mm3)
mylm <- lm(mm3)
abline(mylm,col="red",lty=2,lwd=3)

summary(mylm)

## 
## Call:
## lm(formula = mm3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3766 -0.8804 -0.0974  0.7740 11.4780 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.093161   0.011884   7.839 4.85e-15 ***
## MDM         0.839716   0.009027  93.021  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.423 on 14332 degrees of freedom
## Multiple R-squared:  0.3765, Adjusted R-squared:  0.3764 
## F-statistic:  8653 on 1 and 14332 DF,  p-value: < 2.2e-16

cor.test(mm3$Alv,mm3$MDM)

## 
##  Pearson's product-moment correlation
## 
## data:  mm3$Alv and mm3$MDM
## t = 93.021, df = 14332, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6032502 0.6236681
## sample estimates:
##       cor 
## 0.6135617

mm3r <- as.data.frame(apply(mm3,2,rank))
plot(mm3r,cex=0.3)
mylm <- lm(mm3r)
abline(mylm,col="red",lty=2,lwd=3)

summary(mylm)

## 
## Call:
## lm(formula = mm3r)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9964.7 -2495.7   -94.3  2415.6 11009.1 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.776e+03  5.464e+01   50.81   <2e-16 ***
## MDM         6.126e-01  6.602e-03   92.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3271 on 14332 degrees of freedom
## Multiple R-squared:  0.3753, Adjusted R-squared:  0.3753 
## F-statistic:  8611 on 1 and 14332 DF,  p-value: < 2.2e-16

cor.test(mm3r$Alv,mm3r$MDM)

## 
##  Pearson's product-moment correlation
## 
## data:  mm3r$Alv and mm3r$MDM
## t = 92.797, df = 14332, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6023124 0.6227672
## sample estimates:
##       cor 
## 0.6126424

DE4 Mock vs bystander

MDM group.

pb4m <- pbmdm[,c(grep("mock",colnames(pbmdm)),grep("bystander",colnames(pbmdm)))]

head(pb4m)

##             mdm_mock1 mdm_mock2 mdm_mock3 mdm_mock4 mdm_bystander1
## MIR1302-2HG         0         0         0         0              0
## FAM138A             0         0         0         0              0
## OR4F5               0         0         0         0              0
## AL627309.1         61        30         6        15             65
## AL627309.3          0         0         0         0              2
## AL627309.2          0         0         0         0              0
##             mdm_bystander2 mdm_bystander3
## MIR1302-2HG              0              0
## FAM138A                  0              0
## OR4F5                    0              0
## AL627309.1              51             19
## AL627309.3               4              0
## AL627309.2               0              0

pb4mf <- pb4m[which(rowMeans(pb4m)>=10),]
head(pb4mf)

##            mdm_mock1 mdm_mock2 mdm_mock3 mdm_mock4 mdm_bystander1
## AL627309.1        61        30         6        15             65
## AL627309.5        17        27         1        16             33
## LINC01409        119       116        33        57            136
## LINC01128        173       118        57       159            216
## LINC00115         19         6         3         1             18
## FAM41C            61        25        14        85             43
##            mdm_bystander2 mdm_bystander3
## AL627309.1             51             19
## AL627309.5             35              8
## LINC01409             246             63
## LINC01128             208            111
## LINC00115              24              7
## FAM41C                 69             19

colSums(pb4mf)

##      mdm_mock1      mdm_mock2      mdm_mock3      mdm_mock4 mdm_bystander1 
##       29115660       21618680        7551139       20731699       40542425 
## mdm_bystander2 mdm_bystander3 
##       36888573       16232197

des4m <- as.data.frame(grepl("bystander",colnames(pb4mf)))
colnames(des4m) <- "case"

plot(cmdscale(dist(t(pb4mf))), xlab="Coordinate 1", ylab="Coordinate 2",
  type = "p",pch=19,col="gray",cex=2)

text(cmdscale(dist(t(pb4mf))), labels=colnames(pb4mf) )

dds <- DESeqDataSetFromMatrix(countData = pb4mf , colData = des4m, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de4mf <- de
write.table(de4mf,"de4mf.tsv",sep="\t")

nrow(subset(de4mf,padj<0.05 & log2FoldChange>0))

## [1] 1

nrow(subset(de4mf,padj<0.05 & log2FoldChange<0))

## [1] 27

head(subset(de4mf, log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in bystander MDM cells compared to mock") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in bystander MDM cells compared to mock
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
RBP7	3.371428e+02	0.9028624	0.2108153	4.282718	0.0000185	0.0182163
IFI6	1.236179e+04	1.0642894	0.2742746	3.880380	0.0001043	0.0514515
CCL8	9.256934e+00	3.2369960	0.8663313	3.736441	0.0001866	0.0863224
PLAAT3	2.037398e+01	1.5551917	0.4436038	3.505813	0.0004552	0.1727485
ISG15	9.101177e+02	0.7042685	0.2014310	3.496326	0.0004717	0.1745331
TNFAIP6	1.180878e+02	1.3809310	0.4155475	3.323161	0.0008900	0.2533181
PSME2	3.473007e+03	0.3948661	0.1219777	3.237198	0.0012071	0.3370754
HEXIM2	9.105406e+01	0.7473266	0.2461812	3.035676	0.0024000	0.5919922
HES1	7.898897e+01	1.8750551	0.6238906	3.005423	0.0026521	0.6330866
APOE	2.885174e+05	0.3003852	0.1018483	2.949339	0.0031845	0.7141099

head(subset(de4mf, log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top downregulated genes in bystander MDM cells compared to mock") %>%
  kable_paper("hover", full_width = F)

Top downregulated genes in bystander MDM cells compared to mock
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
CDK1	60.01187	-3.338466	0.5884689	-5.673138	0.0e+00	0.0002075
CENPF	126.75646	-2.876893	0.5474866	-5.254728	1.0e-07	0.0010970
CIT	20.94293	-2.835207	0.5566413	-5.093419	4.0e-07	0.0017349
CEP55	46.62027	-2.159231	0.4297160	-5.024785	5.0e-07	0.0018648
CCL4L2	38.47946	-4.011528	0.8118221	-4.941388	8.0e-07	0.0022960
TYMS	130.85145	-2.198421	0.4486049	-4.900574	1.0e-06	0.0023571
UBE2C	70.95719	-3.619243	0.7488856	-4.832838	1.3e-06	0.0027504
RRM2	48.14325	-2.669279	0.5545961	-4.813015	1.5e-06	0.0027504
MKI67	150.04105	-3.068534	0.6451095	-4.756610	2.0e-06	0.0032374
TOP2A	77.58402	-2.914842	0.6184587	-4.713074	2.4e-06	0.0036113

m4m <- mitch_import(de,DEtype="deseq2",joinType="full")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 14825

## Note: no. genes in output = 14825

## Note: estimated proportion of input genes in output = 1

mres4m <- mitch_calc(m4m,genesets=go,minsetsize=5,cores=4,priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

res <- mres4m$enrichment_result

mitchtbl <- mres4m$enrichment_result
goid <- sapply(strsplit(mitchtbl$set," "),"[[",1)
mysplit <- strsplit(mitchtbl$set," ")
mysplit <- lapply(mysplit, function(x) { x[2:length(x)] } )
godescription <- unlist(lapply(mysplit, function(x) { paste(x,collapse=" ") } ))
em <- data.frame(goid,godescription,mitchtbl$pANOVA,mitchtbl$p.adjustANOVA,sign(mitchtbl$s.dist),mitchtbl$s.dist)
colnames(em) <- c("GO.ID","Description","p.Val","FDR","Phenotype","ES")
write.table(em,"de4mf_em.tsv",row.names = FALSE, quote=FALSE,sep="\t")

res <- subset(res,p.adjustANOVA<0.05)
resup <- subset(res,s.dist>0)
resdn <- subset(res,s.dist<0)
s <- c(head(resup$s.dist,10), head(resdn$s.dist,10))
names(s) <- c(head(resup$set,10),head(resdn$set,10))
s <- s[order(s)]
cols <- gsub("1","red",gsub("-1","blue",as.character(sign(s))))
par(mar=c(5,27,3,1))
barplot(abs(s),las=1,horiz=TRUE,col=cols,xlab="ES",cex.names=0.8,main="")

if (! file.exists("MDM_mock_vs_bystander.html") ) {
  mitch_report(mres4m,outfile="MDM_mock_vs_bystander.html")
}

Alv cells.

pb4a <- pbalv[,c(grep("mock",colnames(pbalv)),grep("bystander",colnames(pbalv)))]

head(pb4a)

##             alv_mock1 alv_mock2 alv_mock3 alv_bystander1 alv_bystander2
## MIR1302-2HG         0         0         0              0              0
## FAM138A             0         0         0              0              0
## OR4F5               0         0         0              0              0
## AL627309.1         46        26        52             20             38
## AL627309.3          0         0         1              0              1
## AL627309.2          0         0         0              0              0
##             alv_bystander3 alv_bystander4
## MIR1302-2HG              0              0
## FAM138A                  0              0
## OR4F5                    0              0
## AL627309.1               9             14
## AL627309.3               0              0
## AL627309.2               0              0

pb4af <- pb4a[which(rowMeans(pb4a)>=10),]
head(pb4af)

##            alv_mock1 alv_mock2 alv_mock3 alv_bystander1 alv_bystander2
## AL627309.1        46        26        52             20             38
## AL627309.5        63        29        50             27             51
## LINC01409         59       103       140             58             83
## LINC01128        176       200       250            141            162
## FAM41C            74        53        98             82             60
## SAMD11            19        44        37              1             29
##            alv_bystander3 alv_bystander4
## AL627309.1              9             14
## AL627309.5             24             12
## LINC01409              75             73
## LINC01128             109            105
## FAM41C                 27             37
## SAMD11                 16             24

colSums(pb4af)

##      alv_mock1      alv_mock2      alv_mock3 alv_bystander1 alv_bystander2 
##       20441114       25539265       34865693       20814782       22486763 
## alv_bystander3 alv_bystander4 
##       14120665       13506372

des4a <- as.data.frame(grepl("bystander",colnames(pb4af)))
colnames(des4a) <- "case"

plot(cmdscale(dist(t(pb4af))), xlab="Coordinate 1", ylab="Coordinate 2",
  type = "p",pch=19,col="gray",cex=2)

text(cmdscale(dist(t(pb4af))), labels=colnames(pb4af) )

dds <- DESeqDataSetFromMatrix(countData = pb4af , colData = des4a, design = ~ case)

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
de <- as.data.frame(zz[order(zz$pvalue),])
de4af <- de
write.table(de4af,"de4af.tsv",sep="\t")

nrow(subset(de4af,padj<0.05 & log2FoldChange>0))

## [1] 4

nrow(subset(de4af,padj<0.05 & log2FoldChange<0))

## [1] 0

head(subset(de4af,padj<0.05 & log2FoldChange>0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in latent Alv cells compared to ") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in latent Alv cells compared to
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
IFITM3	396.1914	1.8164917	0.3245822	5.596399	0.0e+00	0.0001983
EPSTI1	421.0338	2.9145326	0.5241180	5.560833	0.0e+00	0.0001983
IFI6	13043.5823	2.2745088	0.4392235	5.178477	2.0e-07	0.0011013
STAT1	2314.6957	0.8931956	0.1992453	4.482893	7.4e-06	0.0271889

head(subset(de4af, log2FoldChange<0),10)[,1:6] %>%
  kbl(caption="Top upregulated genes in active Alv cells") %>%
  kable_paper("hover", full_width = F)

Top upregulated genes in active Alv cells
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
SPP1	40458.66137	-0.4122804	0.0965604	-4.269664	0.0000196	0.0578258
PPBP	21.33519	-6.4579302	1.5515157	-4.162337	0.0000315	0.0775391
F13A1	30.49474	-5.3300819	1.2950413	-4.115762	0.0000386	0.0814198
FCGBP	20.90544	-4.3156170	1.2150996	-3.551657	0.0003828	0.3140987
MTSS1	195.93714	-0.7031042	0.2021971	-3.477320	0.0005065	0.3561808
SCN9A	18.86558	-1.2563392	0.4509299	-2.786108	0.0053345	0.9998790
CTSV	120.56641	-0.9427725	0.3458440	-2.726005	0.0064106	0.9998790
SLC9A3R2	32.00124	-0.9907193	0.3969273	-2.495972	0.0125613	0.9998790
PLD4	55.32855	-0.8244766	0.3314254	-2.487669	0.0128583	0.9998790
BIN2	739.68199	-0.4605689	0.1929669	-2.386776	0.0169968	0.9998790

m4a <- mitch_import(de,DEtype="deseq2",joinType="full")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 15089

## Note: no. genes in output = 15089

## Note: estimated proportion of input genes in output = 1

mres4a <- mitch_calc(m4a,genesets=go,minsetsize=5,cores=4,priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

res <- mres4a$enrichment_result

mitchtbl <- mres4a$enrichment_result
goid <- sapply(strsplit(mitchtbl$set," "),"[[",1)
mysplit <- strsplit(mitchtbl$set," ")
mysplit <- lapply(mysplit, function(x) { x[2:length(x)] } )
godescription <- unlist(lapply(mysplit, function(x) { paste(x,collapse=" ") } ))
em <- data.frame(goid,godescription,mitchtbl$pANOVA,mitchtbl$p.adjustANOVA,sign(mitchtbl$s.dist),mitchtbl$s.dist)
colnames(em) <- c("GO.ID","Description","p.Val","FDR","Phenotype","ES")
write.table(em,"de4af_em.tsv",row.names = FALSE, quote=FALSE,sep="\t")

res <- subset(res,p.adjustANOVA<0.05)
resup <- subset(res,s.dist>0)
resdn <- subset(res,s.dist<0)
s <- c(head(resup$s.dist,10), head(resdn$s.dist,10))
names(s) <- c(head(resup$set,10),head(resdn$set,10))
s <- s[order(s)]
cols <- gsub("1","red",gsub("-1","blue",as.character(sign(s))))
par(mar=c(5,27,3,1))
barplot(abs(s),las=1,horiz=TRUE,col=cols,xlab="ES",cex.names=0.8,main="")

if (! file.exists("Alv_mock_vs_bystander.html") ) {
  mitch_report(mres4a,outfile="Alv_mock_vs_bystander.html")
}

Combined.

mm4 <- merge(m4a,m4m,by=0)

head(mm4)

##   Row.names         x.x         x.y
## 1      A1BG -0.07091926  0.31515557
## 2  A1BG-AS1 -0.45060890 -0.64197143
## 3       A2M -1.45649601 -0.27561959
## 4 A2ML1-AS1  1.53565784 -0.01805401
## 5      AAAS  0.58832021 -0.11873620
## 6      AACS  0.13939707 -0.80459468

rownames(mm4) <- mm4[,1]
mm4[,1]=NULL
colnames(mm4) <- c("Alv","MDM")
plot(mm4)
mylm <- lm(mm4)
abline(mylm,col="red",lty=2,lwd=3)

summary(mylm)

## 
## Call:
## lm(formula = mm4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4069 -0.4651 -0.0270  0.4554  5.4426 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.099212   0.005777  17.175  < 2e-16 ***
## MDM         0.041525   0.006321   6.569 5.24e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6877 on 14276 degrees of freedom
## Multiple R-squared:  0.003014,   Adjusted R-squared:  0.002944 
## F-statistic: 43.15 on 1 and 14276 DF,  p-value: 5.236e-11

cor.test(mm4$Alv,mm4$MDM)

## 
##  Pearson's product-moment correlation
## 
## data:  mm4$Alv and mm4$MDM
## t = 6.5691, df = 14276, p-value = 5.236e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03852895 0.07123594
## sample estimates:
##        cor 
## 0.05489717

mm4r <- as.data.frame(apply(mm4,2,rank))
plot(mm4r,cex=0.3)
mylm <- lm(mm4r)
abline(mylm,col="red",lty=2,lwd=3)

summary(mylm)

## 
## Call:
## lm(formula = mm4r)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7325.0 -3551.3    12.5  3565.9  7363.7 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.886e+03  6.895e+01   99.86  < 2e-16 ***
## MDM         3.554e-02  8.364e-03    4.25 2.16e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4119 on 14276 degrees of freedom
## Multiple R-squared:  0.001263,   Adjusted R-squared:  0.001193 
## F-statistic: 18.06 on 1 and 14276 DF,  p-value: 2.156e-05

cor.test(mm4r$Alv,mm4r$MDM)

## 
##  Pearson's product-moment correlation
## 
## data:  mm4r$Alv and mm4r$MDM
## t = 4.2495, df = 14276, p-value = 2.156e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01915191 0.05191631
## sample estimates:
##        cor 
## 0.03554366

Cross comparison pathway enrichment heatmap

l1 <- list("de1a"=de1af,"de1m"=de1mf,"de2a"=de2af,"de2m"=de2mf,
  "de3a"=de3af,"de3m"=de3mf,"de4a"=de4af,"de4m"=de4mf)

lm <- mitch_import(x=l1,DEtype="deseq2",joinType="inner")

## Note: Mean no. genes in input = 15374.5

## Note: no. genes in output = 13992

## Note: estimated proportion of input genes in output = 0.91

lmres <- mitch_calc(x=lm,genesets=go,minsetsize=5,cores=8,priority="effect")

## Note: Enrichments with large effect sizes may not be 
##             statistically significant.

top <- head(lmres$enrichment_result,50)

top %>%
  kbl(caption="Top pathways across all contrasts") %>%
  kable_paper("hover", full_width = F)

Top pathways across all contrasts
	set	setSize	pMANOVA	s.de1a	s.de1m	s.de2a	s.de2m	s.de3a	s.de3m	s.de4a	s.de4m	p.de1a	p.de1m	p.de2a	p.de2m	p.de3a	p.de3m	p.de4a	p.de4m	s.dist	SD	p.adjustMANOVA
2039	GO:0019773 proteasome core complex, alpha-subunit complex	7	0.0000005	-0.0805455	0.3509168	-0.3864447	-0.8354972	0.2738342	0.7930231	0.8250983	0.8143317	0.7121416	0.1079160	0.0766589	0.0001290	0.2096693	0.0002794	0.0001564	0.0001904	1.739201	0.6141127	0.0000219
2051	GO:0019864 IgG binding	6	0.0003750	-0.8217027	-0.5944039	-0.5871586	-0.4254493	-0.7562563	-0.7475094	0.5545069	0.1266981	0.0004905	0.0116901	0.0127520	0.0711396	0.0013360	0.0015190	0.0186682	0.5910092	1.734502	0.4909392	0.0064502
4609	GO:0070508 cholesterol import	5	0.0017572	0.7392150	0.7706728	-0.2950025	0.3536570	0.7544005	0.6239937	0.5328519	-0.5624508	0.0042016	0.0028406	0.2533459	0.1708785	0.0034841	0.0156782	0.0390812	0.0294088	1.706237	0.5137242	0.0223181
2587	GO:0032395 MHC class II receptor activity	6	0.0000000	-0.8632919	-0.3760904	0.3394108	-0.2494399	-0.3413175	-0.7069689	0.9591020	-0.4419658	0.0002499	0.1106666	0.1499766	0.2900629	0.1477004	0.0027088	0.0000472	0.0608409	1.672020	0.5907155	0.0000008
3683	GO:0045656 negative regulation of monocyte differentiation	5	0.0080738	-0.7485951	-0.3788804	0.3954672	0.5263602	-0.8770859	-0.6009151	-0.5030242	-0.4858368	0.0037441	0.1423596	0.1256973	0.0415310	0.0006820	0.0199687	0.0514372	0.0599376	1.660375	0.5160364	0.0676118
3932	GO:0048245 eosinophil chemotaxis	7	0.0000511	0.6365085	0.7221104	0.5989172	0.4531488	0.6475407	0.7461770	-0.1213034	-0.4145564	0.0035421	0.0009375	0.0060692	0.0378899	0.0030085	0.0006286	0.5784183	0.0575364	1.628893	0.4338978	0.0012065
3271	GO:0042613 MHC class II protein complex	12	0.0000000	-0.8022651	-0.5341440	0.2022890	-0.4797926	-0.3812470	-0.7281354	0.8564139	-0.1764187	0.0000015	0.0013563	0.2250775	0.0040060	0.0222244	0.0000125	0.0000003	0.2900624	1.624857	0.5501007	0.0000000
374	GO:0002503 peptide antigen assembly with MHC class II protein complex	11	0.0000000	-0.7876209	-0.5446808	0.1782874	-0.4826745	-0.3406766	-0.7168105	0.8480145	-0.1272181	0.0000061	0.0017601	0.3059731	0.0055748	0.0504383	0.0000384	0.0000011	0.4651049	1.595925	0.5425071	0.0000000
2244	GO:0030292 protein tyrosine kinase inhibitor activity	5	0.0057415	-0.8815186	-0.6753271	-0.2132409	-0.3907772	-0.8098806	-0.5730035	0.2059770	0.1067992	0.0006404	0.0089188	0.4089930	0.1302467	0.0017106	0.0264981	0.4251371	0.6792243	1.571430	0.4078682	0.0524191
3335	GO:0043009 chordate embryonic development	5	0.0101461	-0.4091657	-0.5252163	-0.3357260	0.7828555	-0.6367198	-0.6649746	-0.4130836	-0.4469722	0.1131168	0.0419759	0.1936174	0.0024320	0.0136779	0.0100225	0.1097073	0.0834973	1.544949	0.4644070	0.0784181
1457	GO:0008541 proteasome regulatory particle, lid subcomplex	8	0.0000735	0.0808066	0.5838995	-0.5467499	-0.6032966	0.2541655	0.7735090	0.6165082	0.5232408	0.6923097	0.0042383	0.0074095	0.0031278	0.2132352	0.0001513	0.0025309	0.0103866	1.525500	0.5309661	0.0016901
922	GO:0006271 DNA strand elongation involved in DNA replication	8	0.0000000	-0.5730478	-0.1114667	-0.7970717	0.5430134	-0.6506543	-0.4310462	-0.1739846	-0.5926952	0.0050053	0.5851546	0.0000944	0.0078245	0.0014379	0.0347670	0.3941930	0.0036967	1.503897	0.4294211	0.0000013
3139	GO:0038094 Fc-gamma receptor signaling pathway	8	0.0010790	-0.7986091	-0.6071582	-0.2996818	-0.2346253	-0.6835491	-0.6791154	0.3622890	0.1619887	0.0000915	0.0029413	0.1422019	0.2505476	0.0008137	0.0008798	0.0760168	0.4276086	1.495729	0.4263174	0.0152976
2619	GO:0032489 regulation of Cdc42 protein signal transduction	5	0.0709729	-0.7237149	-0.5813827	0.2108672	0.0616144	-0.7318653	-0.7415600	-0.4250947	-0.0678773	0.0050696	0.0243674	0.4142279	0.8114408	0.0045948	0.0040827	0.0997590	0.7926912	1.476781	0.3885175	0.2684563
2508	GO:0031726 CCR1 chemokine receptor binding	6	0.0000043	0.5503599	0.5440917	0.6158063	0.4796225	0.5921159	0.4842462	0.2049430	-0.5855856	0.0195697	0.0210045	0.0089970	0.0419115	0.0120166	0.0399744	0.3847119	0.0129935	1.475871	0.4030879	0.0001377
5422	GO:1903238 positive regulation of leukocyte tethering or rolling	6	0.0002540	-0.8001335	-0.6353020	0.3086896	-0.1596835	-0.6827065	-0.6750798	0.0389199	-0.2793508	0.0006879	0.0070412	0.1904322	0.4982287	0.0037791	0.0041876	0.8688866	0.2360749	1.471728	0.4010405	0.0047086
3095	GO:0036150 phosphatidylserine acyl-chain remodeling	7	0.0127078	-0.6422698	-0.6399408	0.2954901	0.1527044	-0.6631697	-0.7263599	-0.4122682	-0.2904643	0.0032536	0.0033675	0.1758352	0.4842095	0.0023776	0.0008743	0.0589281	0.1832994	1.467786	0.3935202	0.0907612
3281	GO:0042719 mitochondrial intermembrane space protein transporter complex	6	0.0051529	0.5746222	0.6736737	0.1501263	0.2599266	0.7952715	0.4638448	0.5239525	-0.3720387	0.0147919	0.0042672	0.5242922	0.2702599	0.0007416	0.0491296	0.0262520	0.1145616	1.461053	0.3697564	0.0488673
510	GO:0004185 serine-type carboxypeptidase activity	5	0.0009746	-0.4496032	-0.7268607	-0.0723672	-0.6067205	-0.2650604	-0.2766712	0.5152070	0.7822264	0.0816953	0.0048813	0.7793224	0.0188027	0.3047414	0.2840446	0.0460434	0.0024517	1.458745	0.5314035	0.0142406
4686	GO:0071139 resolution of DNA recombination intermediates	5	0.0681896	-0.7636663	-0.2820476	-0.2642883	-0.0390791	-0.7269465	-0.7304640	-0.0474870	-0.5688282	0.0031031	0.2747908	0.3061524	0.8797293	0.0048763	0.0046734	0.8541178	0.0276185	1.456695	0.3064802	0.2619653
3655	GO:0045569 TRAIL binding	5	0.0008343	0.5551870	0.2297133	0.4938729	0.1994852	0.8454851	0.5646243	0.1587903	0.6388075	0.0315686	0.3737613	0.0558283	0.4398751	0.0010592	0.0287877	0.5386660	0.0133723	1.453148	0.2430014	0.0124773
4545	GO:0070106 interleukin-27-mediated signaling pathway	8	0.0000000	-0.4510512	-0.4420767	0.2545051	-0.3206701	0.6387300	0.0834883	0.8810605	0.5833631	0.0271709	0.0303805	0.2126241	0.1163138	0.0017570	0.6826460	0.0000159	0.0042736	1.448450	0.5223175	0.0000000
811	GO:0005839 proteasome core complex	18	0.0000000	-0.1652036	0.1315379	-0.3506989	-0.7031948	0.2096274	0.4878823	0.7592911	0.7359064	0.2250778	0.3340891	0.0100090	0.0000002	0.1237123	0.0003392	0.0000000	0.0000001	1.436009	0.5222824	0.0000000
4210	GO:0051383 kinetochore organization	5	0.0104385	-0.4150854	-0.3544577	-0.5133481	0.4800887	-0.4371059	-0.6966612	0.3763352	-0.6336312	0.1079968	0.1699119	0.0468342	0.0630283	0.0905437	0.0069803	0.1450599	0.0141415	1.418260	0.4487812	0.0795739
3113	GO:0036402 proteasome-activating activity	6	0.0023304	0.0461175	0.5064588	-0.4429668	-0.5022165	0.1682158	0.8227513	0.5707612	0.5154917	0.8449226	0.0316941	0.0602586	0.0331510	0.4755556	0.0004824	0.0154756	0.0287740	1.415551	0.4853636	0.0276216
3253	GO:0042555 MCM complex	11	0.0000003	-0.5265913	-0.1052922	-0.2900495	0.1136933	-0.6778875	-0.6640636	-0.4832077	-0.6751565	0.0024940	0.5454647	0.0958158	0.5138801	0.0000989	0.0001368	0.0055226	0.0001055	1.405446	0.2944633	0.0000152
4955	GO:0097100 supercoiled DNA binding	6	0.0066141	-0.7991325	-0.4206588	-0.3615520	-0.1812050	-0.7859288	-0.4602936	0.3706325	0.1649268	0.0006987	0.0743809	0.1251444	0.4421532	0.0008559	0.0508907	0.1159384	0.4842278	1.404723	0.4155287	0.0579291
2620	GO:0032494 response to peptidoglycan	5	0.0699122	-0.6597698	-0.1965968	-0.5409738	0.0590405	-0.6087224	-0.6217917	0.2737828	-0.5850719	0.0106221	0.4465256	0.0361901	0.8191783	0.0184146	0.0160493	0.2891021	0.0234775	1.394460	0.3600882	0.2662736
848	GO:0005942 phosphatidylinositol 3-kinase complex	6	0.0252338	-0.6262930	-0.5649221	0.0452119	0.5048620	-0.5719291	-0.7129272	0.0726679	-0.3550932	0.0078918	0.0165625	0.8479305	0.0322359	0.0152659	0.0024923	0.7579220	0.1320322	1.391061	0.4351368	0.1470388
5171	GO:0140367 antibacterial innate immune response	5	0.0037706	-0.2229356	-0.3465933	0.5975406	0.6729249	-0.0896404	-0.6593122	-0.1035104	-0.7079288	0.3880243	0.1795852	0.0206751	0.0091648	0.7285287	0.0106763	0.6885764	0.0061172	1.390797	0.5129721	0.0397019
2333	GO:0030836 positive regulation of actin filament depolymerization	5	0.0493005	0.2536784	0.8464860	-0.2788732	-0.0843498	0.4178308	0.6617716	0.5427468	-0.3829699	0.3259787	0.0010448	0.2802293	0.7439718	0.1056855	0.0103878	0.0355838	0.1381018	1.385475	0.4521868	0.2183580
495	GO:0004045 aminoacyl-tRNA hydrolase activity	5	0.0358800	0.3955244	0.5036820	-0.2125545	-0.4395653	0.6315150	0.7064417	0.5412597	0.2227926	0.1256426	0.0511330	0.4105026	0.0887439	0.0144670	0.0062254	0.0360917	0.3883287	1.374410	0.4139049	0.1829010
2386	GO:0031048 regulatory ncRNA-mediated heterochromatin formation	7	0.0035132	-0.5724603	-0.2635375	-0.3095255	0.4347209	-0.6489504	-0.5691098	0.1552173	-0.6528117	0.0087217	0.2273159	0.1561927	0.0464143	0.0029459	0.0091227	0.4770470	0.0027805	1.370120	0.4037799	0.0376852
5517	GO:1904948 midbrain dopaminergic neuron differentiation	5	0.0625096	-0.6873096	-0.2812469	-0.5601058	0.0144277	-0.5597340	-0.6172732	0.4095088	-0.3505684	0.0077779	0.2761558	0.0300917	0.9554508	0.0302012	0.0168350	0.1128149	0.1746458	1.360274	0.3749702	0.2510905
903	GO:0006172 ADP biosynthetic process	5	0.0201749	0.6002288	0.3193966	0.5374848	0.5289054	0.6607135	0.6062344	-0.0520340	-0.1110317	0.0201106	0.2161911	0.0374089	0.0405555	0.0105111	0.0188980	0.8403287	0.6672611	1.360154	0.3062642	0.1260674
5238	GO:0140948 histone H3K9 monomethyltransferase activity	5	0.0412565	-0.6338600	-0.6695789	-0.4182026	-0.0031315	-0.7124187	-0.5421177	-0.1074283	-0.0231787	0.0141067	0.0095175	0.1053755	0.9903260	0.0058010	0.0357979	0.6774408	0.9284886	1.355923	0.2999000	0.1953033
1726	GO:0014909 smooth muscle cell migration	5	0.0978603	0.6563952	0.4643598	0.0810610	-0.2764138	0.7266605	0.5528991	0.3417602	0.3892615	0.0110277	0.0721644	0.7536241	0.2844928	0.0048931	0.0322764	0.1857296	0.1317440	1.353336	0.3282024	0.3206772
5368	GO:1902254 negative regulation of intrinsic apoptotic signaling pathway by p53 class mediator	6	0.0042211	0.5770532	0.1011011	0.5082702	0.1512465	0.6712903	0.5179465	0.0758854	0.6892845	0.0143752	0.6680639	0.0310890	0.5212016	0.0044051	0.0280217	0.7475608	0.0034563	1.350628	0.2589746	0.0427940
5577	GO:1990414 replication-born double-strand break repair via sister chromatid exchange	6	0.0119313	-0.3756375	-0.4652986	-0.6112303	0.5056247	-0.6144954	-0.5279089	-0.2128080	-0.3483722	0.1110966	0.0484235	0.0095211	0.0319762	0.0091444	0.0251398	0.3667345	0.1395077	1.345291	0.3648702	0.0868390
4446	GO:0060992 response to fungicide	6	0.0497383	-0.6623767	-0.6846370	-0.0978598	0.2016064	-0.5906383	-0.6910005	-0.1310358	0.0008103	0.0049577	0.0036816	0.6780987	0.3924987	0.0122318	0.0033763	0.5783667	0.9972577	1.342087	0.3625275	0.2189863
3270	GO:0042612 MHC class I protein complex	9	0.0000000	-0.2064332	-0.5217526	-0.1754909	-0.4302923	0.3301231	-0.1629677	0.8378030	0.6505121	0.2836124	0.0067213	0.3620235	0.0254078	0.0863902	0.3972917	0.0000134	0.0007263	1.338435	0.5040533	0.0000001
3149	GO:0038156 interleukin-3-mediated signaling pathway	6	0.0150804	-0.8707279	-0.3100243	0.0131322	-0.1334668	-0.8502312	-0.4150579	0.1197150	-0.0780304	0.0002208	0.1885224	0.9555822	0.5713374	0.0003098	0.0783224	0.6116248	0.7406784	1.337107	0.3762780	0.1024677
491	GO:0004017 adenylate kinase activity	7	0.0024552	0.6206752	0.2577558	0.5443690	0.3783952	0.7255631	0.5853721	0.1314572	0.0739466	0.0044588	0.2376767	0.0126297	0.0830017	0.0008858	0.0073195	0.5470331	0.7347970	1.335308	0.2412153	0.0287593
3245	GO:0042492 gamma-delta T cell differentiation	5	0.0542480	-0.5320798	-0.4154286	-0.6826196	-0.2984915	-0.5092014	-0.5967112	0.2953457	-0.1973976	0.0393660	0.1077057	0.0082079	0.2477752	0.0486396	0.0208520	0.2527941	0.4446762	1.323745	0.3103825	0.2331044
4689	GO:0071162 CMG complex	10	0.0000000	-0.4103276	0.1139322	-0.3473609	0.0933200	-0.5041482	-0.6597196	-0.3704763	-0.7840652	0.0246621	0.5327866	0.0571912	0.6094150	0.0057717	0.0003031	0.0425169	0.0000175	1.323678	0.3214564	0.0000001
2919	GO:0035005 1-phosphatidylinositol-4-phosphate 3-kinase activity	6	0.0243224	-0.0530769	-0.7187664	0.2353067	0.6745078	-0.2369751	-0.6149959	-0.4994757	-0.1865675	0.8218875	0.0022957	0.3182617	0.0042198	0.3148425	0.0090879	0.0341224	0.4287646	1.322283	0.4634371	0.1428913
2040	GO:0019774 proteasome core complex, beta-subunit complex	10	0.0000003	-0.2968674	0.0192676	-0.2923330	-0.7058790	0.1167072	0.2804320	0.7055214	0.6892862	0.1040859	0.9159902	0.1094807	0.0001108	0.5228550	0.1246935	0.0001117	0.0001602	1.318091	0.4933941	0.0000143
5282	GO:1900028 negative regulation of ruffle assembly	5	0.0579430	-0.6502181	-0.4106813	-0.0623865	-0.3339530	-0.5656252	-0.7264603	0.2996926	-0.2793594	0.0118059	0.1117880	0.8091232	0.1959812	0.0285056	0.0049049	0.2458774	0.2793916	1.312400	0.3362544	0.2406799
731	GO:0005658 alpha DNA polymerase:primase complex	5	0.0263270	-0.4710803	-0.1615643	-0.6857368	0.5147208	-0.6494173	-0.3316937	-0.0013012	-0.4492314	0.0681361	0.5315953	0.0079198	0.0462491	0.0119102	0.1990238	0.9959801	0.0819480	1.310254	0.3950033	0.1511033
5375	GO:1902425 positive regulation of attachment of mitotic spindle microtubules to kinetochore	8	0.0000755	-0.2920302	-0.0800915	-0.6364416	0.2548269	-0.3103368	-0.4766161	0.5589960	-0.7132616	0.1526696	0.6948952	0.0018252	0.2120463	0.1285547	0.0195809	0.0061842	0.0004764	1.306285	0.4387112	0.0017228

colfunc <- colorRampPalette(c("blue", "white", "red"))

mx <- top[,grep("^s\\.",colnames(top))]
mx <- mx[,-ncol(mx)]
rownames(mx) <- top$set

heatmap.2(as.matrix(mx),scale="none",trace="none",margins=c(6,25),
  col=colfunc(25),cexRow=0.6,cexCol=0.8)

Session information

For reproducibility.

save.image("scanalysis.Rdata")

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] gplots_3.2.0                limma_3.60.6               
##  [3] SingleR_2.6.0               celldex_1.14.0             
##  [5] harmony_1.2.1               Rcpp_1.0.13-1              
##  [7] ggplot2_3.5.1               kableExtra_1.4.0           
##  [9] vioplot_0.5.0               zoo_1.8-12                 
## [11] sm_2.2-6.0                  mitch_1.19.3               
## [13] DESeq2_1.44.0               muscat_1.18.0              
## [15] beeswarm_0.4.0              stringi_1.8.4              
## [17] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
## [19] Biobase_2.64.0              GenomicRanges_1.56.2       
## [21] GenomeInfoDb_1.40.1         IRanges_2.38.1             
## [23] S4Vectors_0.42.1            BiocGenerics_0.50.0        
## [25] MatrixGenerics_1.16.0       matrixStats_1.4.1          
## [27] hdf5r_1.3.11                Seurat_5.1.0               
## [29] SeuratObject_5.0.2          sp_2.1-4                   
## [31] plyr_1.8.9                 
## 
## loaded via a namespace (and not attached):
##   [1] R.methodsS3_1.8.2         progress_1.2.3           
##   [3] goftest_1.2-3             HDF5Array_1.32.1         
##   [5] Biostrings_2.72.1         vctrs_0.6.5              
##   [7] spatstat.random_3.3-2     digest_0.6.37            
##   [9] png_0.1-8                 corpcor_1.6.10           
##  [11] shape_1.4.6.1             gypsum_1.0.1             
##  [13] ggrepel_0.9.6             echarts4r_0.4.5          
##  [15] deldir_2.0-4              parallelly_1.39.0        
##  [17] MASS_7.3-61               reshape2_1.4.4           
##  [19] httpuv_1.6.15             foreach_1.5.2            
##  [21] withr_3.0.2               xfun_0.49                
##  [23] survival_3.7-0            memoise_2.0.1.9000       
##  [25] ggbeeswarm_0.7.2          systemfonts_1.1.0        
##  [27] GlobalOptions_0.1.2       gtools_3.9.5             
##  [29] pbapply_1.7-2             R.oo_1.27.0              
##  [31] prettyunits_1.2.0         GGally_2.2.1             
##  [33] KEGGREST_1.44.1           promises_1.3.1           
##  [35] httr_1.4.7                globals_0.16.3           
##  [37] fitdistrplus_1.2-1        rhdf5filters_1.16.0      
##  [39] rhdf5_2.48.0              rstudioapi_0.17.1        
##  [41] UCSC.utils_1.0.0          miniUI_0.1.1.1           
##  [43] generics_0.1.3            curl_6.0.1               
##  [45] zlibbioc_1.50.0           ScaledMatrix_1.12.0      
##  [47] polyclip_1.10-7           GenomeInfoDbData_1.2.12  
##  [49] ExperimentHub_2.12.0      SparseArray_1.4.8        
##  [51] xtable_1.8-4              stringr_1.5.1            
##  [53] doParallel_1.0.17         evaluate_1.0.1           
##  [55] S4Arrays_1.4.1            BiocFileCache_2.12.0     
##  [57] hms_1.1.3                 irlba_2.3.5.1            
##  [59] colorspace_2.1-1          filelock_1.0.3           
##  [61] ROCR_1.0-11               reticulate_1.40.0        
##  [63] spatstat.data_3.1-4       magrittr_2.0.3           
##  [65] lmtest_0.9-40             later_1.4.0              
##  [67] viridis_0.6.5             lattice_0.22-6           
##  [69] spatstat.geom_3.3-4       future.apply_1.11.3      
##  [71] scattermore_1.2           scuttle_1.14.0           
##  [73] cowplot_1.1.3             RcppAnnoy_0.0.22         
##  [75] pillar_1.9.0              nlme_3.1-166             
##  [77] iterators_1.0.14          caTools_1.18.3           
##  [79] compiler_4.4.1            beachmat_2.20.0          
##  [81] RSpectra_0.16-2           tensor_1.5               
##  [83] minqa_1.2.8               crayon_1.5.3             
##  [85] abind_1.4-8               scater_1.32.1            
##  [87] blme_1.0-6                locfit_1.5-9.10          
##  [89] bit_4.5.0                 dplyr_1.1.4              
##  [91] codetools_0.2-20          BiocSingular_1.20.0      
##  [93] bslib_0.8.0               alabaster.ranges_1.4.2   
##  [95] GetoptLong_1.0.5          plotly_4.10.4            
##  [97] remaCor_0.0.18            mime_0.12                
##  [99] splines_4.4.1             circlize_0.4.16          
## [101] fastDummies_1.7.4         dbplyr_2.5.0             
## [103] sparseMatrixStats_1.16.0  knitr_1.49               
## [105] blob_1.2.4                utf8_1.2.4               
## [107] clue_0.3-66               BiocVersion_3.19.1       
## [109] lme4_1.1-35.5             listenv_0.9.1            
## [111] DelayedMatrixStats_1.26.0 Rdpack_2.6.2             
## [113] tibble_3.2.1              Matrix_1.7-1             
## [115] statmod_1.5.0             svglite_2.1.3            
## [117] fANCOVA_0.6-1             pkgconfig_2.0.3          
## [119] tools_4.4.1               cachem_1.1.0             
## [121] RhpcBLASctl_0.23-42       rbibutils_2.3            
## [123] RSQLite_2.3.8             viridisLite_0.4.2        
## [125] DBI_1.2.3                 numDeriv_2016.8-1.1      
## [127] fastmap_1.2.0             rmarkdown_2.29           
## [129] scales_1.3.0              grid_4.4.1               
## [131] ica_1.0-3                 broom_1.0.7              
## [133] AnnotationHub_3.12.0      sass_0.4.9               
## [135] patchwork_1.3.0           BiocManager_1.30.25      
## [137] ggstats_0.7.0             dotCall64_1.2            
## [139] RANN_2.6.2                alabaster.schemas_1.4.0  
## [141] farver_2.1.2              reformulas_0.4.0         
## [143] aod_1.3.3                 mgcv_1.9-1               
## [145] yaml_2.3.10               cli_3.6.3                
## [147] purrr_1.0.2               leiden_0.4.3.1           
## [149] lifecycle_1.0.4           uwot_0.2.2               
## [151] glmmTMB_1.1.10            mvtnorm_1.3-2            
## [153] backports_1.5.0           BiocParallel_1.38.0      
## [155] gtable_0.3.6              rjson_0.2.23             
## [157] ggridges_0.5.6            progressr_0.15.1         
## [159] jsonlite_1.8.9            edgeR_4.2.2              
## [161] RcppHNSW_0.6.0            bitops_1.0-9             
## [163] bit64_4.5.2               Rtsne_0.17               
## [165] alabaster.matrix_1.4.2    spatstat.utils_3.1-1     
## [167] BiocNeighbors_1.22.0      alabaster.se_1.4.1       
## [169] jquerylib_0.1.4           spatstat.univar_3.1-1    
## [171] R.utils_2.12.3            pbkrtest_0.5.3           
## [173] lazyeval_0.2.2            alabaster.base_1.4.2     
## [175] shiny_1.9.1               htmltools_0.5.8.1        
## [177] sctransform_0.4.1         rappdirs_0.3.3           
## [179] glue_1.8.0                spam_2.11-0              
## [181] httr2_1.0.7               XVector_0.44.0           
## [183] gridExtra_2.3             EnvStats_3.0.0           
## [185] boot_1.3-31               igraph_2.1.1             
## [187] variancePartition_1.34.0  TMB_1.9.15               
## [189] R6_2.5.1                  tidyr_1.3.1              
## [191] labeling_0.4.3            cluster_2.1.6            
## [193] Rhdf5lib_1.26.0           nloptr_2.1.1             
## [195] DelayedArray_0.30.1       tidyselect_1.2.1         
## [197] vipor_0.4.7               xml2_1.3.6               
## [199] AnnotationDbi_1.66.0      future_1.34.0            
## [201] rsvd_1.0.5                munsell_0.5.1            
## [203] KernSmooth_2.23-24        data.table_1.16.2        
## [205] htmlwidgets_1.6.4         ComplexHeatmap_2.20.0    
## [207] RColorBrewer_1.1-3        rlang_1.1.4              
## [209] spatstat.sparse_3.1-0     spatstat.explore_3.3-3   
## [211] lmerTest_3.1-3            fansi_1.0.6

Single cell transcriptomes of macrophages with HIV 2024 - comparative analysis

Burnet Bioinformatics group

2024-12-19

Introduction

Load data

Make single cell experiment object

Normalise data

PCA and Cluster

UMAP

Assign names to clusters

Make MDM object

Make Alv object

Extract monocytes MDM only

Cell counts

Focus on MOCK vs ACTIVE - exclude latent and bystander

Differential expression

DE Prep

Gene sets

DE1 Latently- vs productively-infected cells (groups 3 vs 4).

DE2 Latently-infected vs bystander cells

DE3 Active vs mock

DE4 Mock vs bystander

Cross comparison pathway enrichment heatmap

Session information