PADDI RNA expression analysis

Introduction

In this report we perform quality control (QC) and differential expression analysis of the PADDI RNA-seq data. We begin with QC.

suppressPackageStartupMessages({
  library("gplots")
  library("reshape2")
  library("WGCNA")
  library("dplyr")
  library("DESeq2")
  library("mitch")
  library("MASS")
  library("eulerr")
  library("beeswarm")
})

MultiQC results

Please have a look at the MultiQC report. Here are a few key points:

Skewer trimming removed only a tiny number of bases, which indicates that the sequence quality is very high.
FastQC counts of unique and duplicate reads indicate that a few samples have <10M unique reads.
Per-sequence GC content showed an unusual profile for two samples. PG1423-EOS R1 and R2 had a GC profile maximum at 40%, which deviates from the mean profile. PG2090-EOS also showed an unusual pattern, with low-GC% sequences underrepresented.
Sequence duplication levels were elevated for some fastq files. The files of concern, with <20% unique reads, are: PG3627-POD1_S86_R1_001, PG3627-POD1_S86_R2_001, PG3609-T0_S317_R1_001, PG2090-EOS_S134_R1_001 and PG2090-EOS_S134_R2_001.
There were two files with overrepresented sequences: PG2090-EOS R1 and R2. The others are okay.
Adapter content was very low, which is good.

The fastq files were also checked with validatefastq-assembly, which looks for signs of file corruption that can occur during large data transfers. No problematic files were detected.

rRNA amount

Ribosomal RNA carryover can be a source of noise. The proportion should be <10%, but a few samples exceeded this, including PG2020-EOS, PG815-EOS, PG1452-EOS and PG702-POD1.

rrna <- read.table("rrna_stats.txt")
rrna <- rrna[,c(1,5)]
rrna$V1 <- sapply(strsplit(rrna$V1,"\\."),"[[",1)
rrna$V5 <- gsub("\\(","",rrna$V5)
rrna$V5 <- gsub("%","",rrna$V5)
rrna$V5 <- as.numeric(rrna$V5)
str(rrna)

## 'data.frame':    319 obs. of  2 variables:
##  $ V1: chr  "3166-POD1_S266_R1_001" "3166-T0_S265_R1_001" "3167-POD1_S268_R1_001" "3167-T0_S267_R1_001" ...
##  $ V5: num  0.57 1.11 0.61 0.93 0.96 0.79 0.7 5.2 1.14 2.83 ...

rrna2 <- rrna[,2]
names(rrna2) <- rrna[,1]

par(mar=c(5,8,3,1))
barplot(rrna2,horiz=TRUE,las=1,cex.names=0.5,main="rRNA carryover")

rrna2 <- rrna2[order(-rrna2)]
barplot(head(rrna2,20),horiz=TRUE,las=1,cex.names=0.6,main="rRNA carryover")
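The 10% cut-off quoted above can also be applied directly rather than read off the barplot. The following is a minimal sketch on invented values; `rrna_pct` is a hypothetical stand-in for the named vector `rrna2`, and `rrna_cutoff` is a name introduced here for illustration.

```r
# Toy stand-in for the named vector `rrna2` built above; values are invented.
rrna_pct <- c("S1-T0" = 0.6, "S2-EOS" = 12.4, "S3-POD1" = 5.2, "S4-EOS" = 31.0)

# Cut-off in percent, taken from the text above.
rrna_cutoff <- 10

# Keep only the samples above the cut-off, worst first.
flagged <- sort(rrna_pct[rrna_pct > rrna_cutoff], decreasing = TRUE)
print(flagged)
```

Running the same two lines on `rrna2` would give the list of samples to review.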

Load the data

tmp <- read.table("3col.tsv.gz",header=FALSE)
x <- as.matrix(acast(tmp, V2~V1, value.var="V3", fun.aggregate = sum))
x <- as.data.frame(x)
accession <- sapply((strsplit(rownames(x),"\\|")),"[[",2)
symbol<-sapply((strsplit(rownames(x),"\\|")),"[[",6)
x$geneid <- paste(accession,symbol)
xx <- aggregate(. ~ geneid,x,sum)
rownames(xx) <- xx$geneid
# Rename "T0R" samples to "T0"; assign to colnames(xx) so the rename takes effect
colnames(xx) <- gsub("T0R","T0",colnames(xx))
xx$geneid = NULL
xx <- round(xx)
xx[1:10,1:6]

##                             3166-POD1 3166-T0 3167-POD1 3167-T0 3171-POD1
## ENSG00000000003.15 TSPAN6           3       1         5       5        23
## ENSG00000000005.6 TNMD              0       0         0       0         0
## ENSG00000000419.14 DPM1           685     577       521     735       811
## ENSG00000000457.14 SCYL3          622     611       550     777       789
## ENSG00000000460.17 C1orf112       181     171       232     263       215
## ENSG00000000938.13 FGR          33797   44344     31524   38959     26402
## ENSG00000000971.16 CFH            106      40        98     183       195
## ENSG00000001036.14 FUCA2         1229     769      1150     868       978
## ENSG00000001084.13 GCLC           944    1085       577     961       908
## ENSG00000001167.15 NFYA          1243    1277      1295    1605      1166
##                             3171-T0
## ENSG00000000003.15 TSPAN6         4
## ENSG00000000005.6 TNMD            1
## ENSG00000000419.14 DPM1         494
## ENSG00000000457.14 SCYL3        575
## ENSG00000000460.17 C1orf112     196
## ENSG00000000938.13 FGR        33751
## ENSG00000000971.16 CFH          130
## ENSG00000001036.14 FUCA2        805
## ENSG00000001084.13 GCLC         798
## ENSG00000001167.15 NFYA        1251
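The loading step above sums transcript-level values to gene level, using the 2nd and 6th pipe-delimited fields of each transcript name as the gene accession and symbol. The sketch below reproduces that idea on a tiny invented table using only base R (`xtabs` in place of `acast` and `aggregate`); the IDs and counts are made up, and the field layout is assumed to match the one used above.

```r
# Toy three-column table: sample, transcript name, estimated count (all invented).
tx_names <- c(
  "ENST0001.1|ENSG0001.1|-|-|GENEA-201|GENEA|1000|protein_coding|",
  "ENST0002.1|ENSG0001.1|-|-|GENEA-202|GENEA|800|protein_coding|",
  "ENST0003.1|ENSG0002.1|-|-|GENEB-201|GENEB|500|protein_coding|"
)
tmp <- data.frame(
  V1 = rep(c("S1", "S2"), each = 3),
  V2 = rep(tx_names, times = 2),
  V3 = c(10.4, 5.3, 7.0, 20.1, 0.0, 3.6),
  stringsAsFactors = FALSE
)

# Field 2 is the gene accession and field 6 the gene symbol.
fields <- strsplit(tmp$V2, "|", fixed = TRUE)
tmp$geneid <- paste(sapply(fields, "[[", 2), sapply(fields, "[[", 6))

# Sum transcript values per gene and sample, then round to whole counts.
gene_counts <- as.data.frame.matrix(round(xtabs(V3 ~ geneid + V1, data = tmp)))
print(gene_counts)
```

Rounding after summation, as in the report, keeps fractional transcript estimates from being lost one transcript at a time.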

Number of reads per sample

Let’s look at the number of reads per sample.

Most samples had 25-30 million assigned reads. Only two samples had fewer than 20 million reads: PG1452-EOS and PG1423-EOS. The maximum read count was about 40 million, for PG7072-EOS.

xxcs <- colSums(xx)
par(mar=c(5,8,3,1))
barplot(xxcs,horiz=TRUE,las=1,main="no. reads per sample")

barplot(head(xxcs[order(xxcs)],20),horiz=TRUE,las=1,main="lowest no. reads per sample")

barplot(head(xxcs[order(-xxcs)],20),horiz=TRUE,las=1,main="highest no. reads per sample")
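Low-depth samples can be listed in a table as well as plotted. This is a small sketch on invented depths; `depth` stands in for `xxcs`, and the 20 million threshold is an assumption borrowed from the description above rather than a fixed rule.

```r
# Toy stand-in for `xxcs` (assigned reads per sample); values are invented.
depth <- c("S1-T0" = 27.5e6, "S2-EOS" = 18.2e6, "S3-POD1" = 40.1e6, "S4-EOS" = 15.9e6)

# Assumed threshold for flagging shallow libraries.
min_depth <- 20e6

low_depth <- names(depth)[depth < min_depth]

# Compact table sorted from shallowest to deepest library.
depth_tbl <- data.frame(
  sample = names(depth),
  million_reads = round(depth / 1e6, 1),
  low = depth < min_depth,
  row.names = NULL
)
print(depth_tbl[order(depth_tbl$million_reads), ])
```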

MDS

Some outliers are apparent.

PG2090-EOS sits at the far left of the chart; this is clearly the effect of rRNA carryover. Other samples towards the left of the chart include PG815-EOS, PG145-EOS and PG702-POD1, all of which have elevated rRNA.

heatmap.2( cor(xx),trace="none",scale="none")

mds <- cmdscale(dist(t(xx)))

par(mar=c(5,5,3,1))
minx <- min(mds[,1])
maxx <- max(mds[,1])
miny <- min(mds[,2])
maxy <- max(mds[,2])

plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
  xlim=c(minx*1.1,maxx*1.1), ylim = c(miny*1.1,maxy*1.1) ,
  type = "p", col="gray", pch=19, cex.axis=1.3,cex.lab=1.3, bty='n')
text(mds, labels=rownames(mds), cex=0.8)

col <- rownames(mds)
col <- sapply(strsplit(col,"-"),"[[",2)
col <- gsub("T0","lightblue",col)
col <- gsub("POD1","orange",col)
col <- gsub("EOS","pink",col)

plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
  xlim=c(minx*1.1,maxx*1.1), ylim = c(miny*1.1,maxy*1.1) , cex=1.5 ,
  type = "p", col=col, pch=19, cex.axis=1.3,cex.lab=1.3, bty='n')
#text(mds, labels=rownames(mds), cex=0.8) 
mtext("blue=T0, orange=POD1, pink=EOS")

Exclude PG2090-EOS and repeat the analysis.

xx <- xx[,grep("PG2090-EOS",colnames(xx),invert=TRUE)]

mds <- cmdscale(dist(t(xx)))

par(mar=c(5,5,3,1))
minx <- min(mds[,1])
maxx <- max(mds[,1])
miny <- min(mds[,2])
maxy <- max(mds[,2])

plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
  xlim=c(minx*1.1,maxx*1.1), ylim = c(miny*1.1,maxy*1.1) ,
  type = "p", col="gray", pch=19, cex.axis=1.3,cex.lab=1.3, bty='n')
text(mds, labels=rownames(mds), cex=0.8)

col <- rownames(mds)
col <- sapply(strsplit(col,"-"),"[[",2)
col <- gsub("T0","lightblue",col)
col <- gsub("POD1","orange",col)
col <- gsub("EOS","pink",col)

plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
  xlim=c(minx*1.1,maxx*1.1), ylim = c(miny*1.1,maxy*1.1) , cex=1.5 ,
  type = "p", col=col, pch=19, cex.axis=1.3,cex.lab=1.3, bty='n')
#text(mds, labels=rownames(mds), cex=0.8) 
mtext("blue=T0, orange=POD1, pink=EOS")

In the MDS plot with PG2090-EOS removed, there appears to be some separation of T0, POD1 and EOS samples. POD1 (orange) are more towards the upper side of the chart and T0 (blue) are toward the bottom right. EOS (pink) are quite spread out.
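The MDS above is computed on raw counts, so differences in sequencing depth contribute to the distances. As a possible cross-check (not the method used in this report), the sketch below runs the same `cmdscale` step on log2 counts per million and assigns timepoint colours through a named lookup vector. The count matrix is simulated, and the names `logcpm` and `pal` are introduced here for illustration.

```r
set.seed(1)

# Simulated genes x samples count matrix with "<patient>-<timepoint>" column names.
sample_names <- c("P1-T0", "P1-POD1", "P1-EOS", "P2-T0", "P2-POD1", "P2-EOS")
counts <- matrix(rpois(600, lambda = 50), nrow = 100,
  dimnames = list(paste0("gene", 1:100), sample_names))

# Library-size normalisation: log2 counts per million with a pseudocount of 1.
cpm <- t(t(counts) / colSums(counts)) * 1e6
logcpm <- log2(cpm + 1)

# Classical MDS on sample-to-sample Euclidean distances.
mds <- cmdscale(dist(t(logcpm)))

# Map the timepoint part of each sample name to a colour via a lookup vector.
pal <- c(T0 = "lightblue", POD1 = "orange", EOS = "pink")
timepoint <- sapply(strsplit(rownames(mds), "-"), "[[", 2)
cols <- unname(pal[timepoint])

plot(mds, xlab = "Coordinate 1", ylab = "Coordinate 2",
  col = cols, pch = 19, cex = 1.5, bty = "n")
mtext("blue=T0, orange=POD1, pink=EOS")
```

A lookup vector avoids chained `gsub` calls and returns `NA` for an unexpected timepoint label, which makes naming problems easier to spot.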

Conclusion

PG2090-EOS suffered rRNA carryover and needs to be re-prepared. The other samples with slightly elevated rRNA are not a problem, as the rRNA content can be corrected for statistically. It is not yet clear how to handle the samples with low numbers of unique reads.

Load patient info

xx <- xx[,order(colnames(xx))]

ss <- read.csv("PADDIgenomicsData.csv")
ss <- ss[order(ss$PG_number),]
colnames(ss)

##  [1] "PG_number"                        "sexD"                            
##  [3] "ageD"                             "weightD"                         
##  [5] "heightD"                          "asaD"                            
##  [7] "ethnicityD"                       "ethnicity_otherD"                
##  [9] "current_smokerD"                  "diabetes_typeD"                  
## [11] "daily_insulinD"                   "oral_hypoglycemicsD"             
## [13] "non_insulin_injectablesD"         "diabetes_yrs_since_diagnosisD"   
## [15] "DM_years"                         "creatinine_preopD"               
## [17] "crp_preopD"                       "crp_preop_typeD"                 
## [19] "crp_preop_naD"                    "hba1c_doneD"                     
## [21] "surgery_typeD"                    "surgery_procedureD"              
## [23] "surgery_dominantD"                "wound_typeOP"                    
## [25] "non_study_dexameth_steriodPOSTOP" "nonstudy_dexameth_steriodD3"     
## [27] "HbA1c"                            "bmi"                             
## [29] "whodas_total_preop"               "revised_whodas_preop"            
## [31] "neut_lymph_ratio_d0"              "neut_lymph_ratio_d1"             
## [33] "neut_lymph_ratio_change_d1"       "neut_lymph_ratio_d2"             
## [35] "neut_lymph_ratio_change_d2"       "neut_lymph_ratio_d1_2"           
## [37] "neut_lymph_ratio_d2_2"            "ab_noninfection"                 
## [39] "risk"                             "risk_cat"                        
## [41] "bmi_cat"                          "asa_cat"                         
## [43] "wound_type_cat"                   "oxygen_quin"                     
## [45] "duration_sx"                      "duration_sx_quin"                
## [47] "anyDex"                           "anyDex_count"                    
## [49] "anyDexMiss"                       "anyDex2"                         
## [51] "treatment_group"                  "deltacrp"                        
## [53] "crp_group"

str(ss)

## 'data.frame':    117 obs. of  53 variables:
##  $ PG_number                       : chr  "3166" "3167" "3171" "3172" ...
##  $ sexD                            : chr  "Male" "Male" "Male" "Male" ...
##  $ ageD                            : int  62 67 61 78 73 77 84 54 70 62 ...
##  $ weightD                         : num  64.5 78.8 71.1 43 83.6 ...
##  $ heightD                         : num  163 169 165 156 171 167 133 155 170 175 ...
##  $ asaD                            : int  2 2 2 2 2 3 3 2 2 2 ...
##  $ ethnicityD                      : chr  "Asian" "Asian" "Asian" "Asian" ...
##  $ ethnicity_otherD                : chr  "" "" "" "" ...
##  $ current_smokerD                 : chr  "No" "No" "No" "No" ...
##  $ diabetes_typeD                  : chr  "" "" "" "" ...
##  $ daily_insulinD                  : chr  "" "" "" "" ...
##  $ oral_hypoglycemicsD             : chr  "" "" "" "" ...
##  $ non_insulin_injectablesD        : chr  "" "" "" "" ...
##  $ diabetes_yrs_since_diagnosisD   : int  NA NA NA NA NA 1 NA NA NA NA ...
##  $ DM_years                        : int  NA NA NA NA NA 1 NA NA NA NA ...
##  $ creatinine_preopD               : int  68 82 82 96 105 90 54 47 109 98 ...
##  $ crp_preopD                      : chr  "2.1" "0.6" "2.7" "1.2" ...
##  $ crp_preop_typeD                 : chr  "CRP" "CRP" "CRP" "CRP" ...
##  $ crp_preop_naD                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ hba1c_doneD                     : chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ surgery_typeD                   : chr  "Laparoscopic assisted low anterior resection of rectum" "Laparoscopic sigmoidectomy" "Laparoscopic assisted anterior resection of rectum" "Robotic assisted laparoscopic radical prostatectomy, pelvic lymph node dissection" ...
##  $ surgery_procedureD              : chr  "None of the above" "None of the above" "None of the above" "None of the above" ...
##  $ surgery_dominantD               : chr  "Gastrointestinal" "Gastrointestinal" "Gastrointestinal" "Urology-renal" ...
##  $ wound_typeOP                    : chr  "Clean / contaminated" "Clean / contaminated" "Clean / contaminated" "Clean / contaminated" ...
##  $ non_study_dexameth_steriodPOSTOP: chr  "No" "No" "No" "No" ...
##  $ nonstudy_dexameth_steriodD3     : chr  "No" "No" "No" "No" ...
##  $ HbA1c                           : num  5.7 6.2 6.2 6.3 6.3 ...
##  $ bmi                             : num  24.3 27.6 26.1 17.7 28.6 ...
##  $ whodas_total_preop              : int  16 12 12 12 12 12 24 14 12 12 ...
##  $ revised_whodas_preop            : int  16 12 12 12 12 12 24 14 12 12 ...
##  $ neut_lymph_ratio_d0             : num  4.3 2.94 2.29 2.93 2.62 ...
##  $ neut_lymph_ratio_d1             : num  13 6.5 7.22 23.2 8.57 ...
##  $ neut_lymph_ratio_change_d1      : num  8.7 3.56 4.93 20.27 5.95 ...
##  $ neut_lymph_ratio_d2             : num  5.92 3.68 3.77 22 NA ...
##  $ neut_lymph_ratio_change_d2      : num  1.623 0.741 1.475 19.071 NA ...
##  $ neut_lymph_ratio_d1_2           : num  13 6.5 7.22 23.2 8.57 ...
##  $ neut_lymph_ratio_d2_2           : num  5.92 3.68 3.77 22 NA ...
##  $ ab_noninfection                 : int  1 1 0 1 1 1 1 1 1 1 ...
##  $ risk                            : int  2 2 2 2 2 5 4 1 2 1 ...
##  $ risk_cat                        : chr  "Moderate" "Moderate" "Moderate" "Moderate" ...
##  $ bmi_cat                         : chr  "Normal [18.5 to <25]" "Overweight [25 to <30]" "Overweight [25 to <30]" "Underweight [BMI<18.5]" ...
##  $ asa_cat                         : chr  "1-2" "1-2" "1-2" "1-2" ...
##  $ wound_type_cat                  : chr  "Contaminated" "Contaminated" "Contaminated" "Contaminated" ...
##  $ oxygen_quin                     : chr  "0.21-0.4" "0.21-0.4" "0.21-0.4" "0.21-0.4" ...
##  $ duration_sx                     : num  2.5 2.67 2.42 3.17 2.5 ...
##  $ duration_sx_quin                : chr  "2.18-2.82" "2.18-2.82" "2.18-2.82" "2.83-3.75" ...
##  $ anyDex                          : chr  "No" "No" "No" "No" ...
##  $ anyDex_count                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ anyDexMiss                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ anyDex2                         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ treatment_group                 : int  1 1 2 2 1 1 2 1 2 1 ...
##  $ deltacrp                        : num  39.3 38.3 49 189.9 7.3 ...
##  $ crp_group                       : int  1 1 1 4 1 1 4 1 4 1 ...

summary(ss)

##   PG_number             sexD                ageD          weightD      
##  Length:117         Length:117         Min.   :25.00   Min.   : 41.00  
##  Class :character   Class :character   1st Qu.:54.00   1st Qu.: 68.50  
##  Mode  :character   Mode  :character   Median :62.00   Median : 82.00  
##                                        Mean   :61.03   Mean   : 84.55  
##                                        3rd Qu.:69.00   3rd Qu.: 95.40  
##                                        Max.   :86.00   Max.   :185.00  
##                                                                        
##     heightD           asaD        ethnicityD        ethnicity_otherD  
##  Min.   :133.0   Min.   :1.000   Length:117         Length:117        
##  1st Qu.:163.0   1st Qu.:2.000   Class :character   Class :character  
##  Median :171.0   Median :2.000   Mode  :character   Mode  :character  
##  Mean   :170.2   Mean   :2.308                                        
##  3rd Qu.:178.0   3rd Qu.:3.000                                        
##  Max.   :193.0   Max.   :4.000                                        
##                                                                       
##  current_smokerD    diabetes_typeD     daily_insulinD     oral_hypoglycemicsD
##  Length:117         Length:117         Length:117         Length:117         
##  Class :character   Class :character   Class :character   Class :character   
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character   
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  non_insulin_injectablesD diabetes_yrs_since_diagnosisD    DM_years     
##  Length:117               Min.   : 1.000                Min.   : 1.000  
##  Class :character         1st Qu.: 1.500                1st Qu.: 1.500  
##  Mode  :character         Median : 7.000                Median : 7.000  
##                           Mean   : 7.467                Mean   : 7.467  
##                           3rd Qu.:11.000                3rd Qu.:11.000  
##                           Max.   :18.000                Max.   :18.000  
##                           NA's   :102                   NA's   :102     
##  creatinine_preopD  crp_preopD        crp_preop_typeD    crp_preop_naD
##  Min.   : 19.0     Length:117         Length:117         Min.   :0    
##  1st Qu.: 66.0     Class :character   Class :character   1st Qu.:0    
##  Median : 76.0     Mode  :character   Mode  :character   Median :0    
##  Mean   : 80.3                                           Mean   :0    
##  3rd Qu.: 91.0                                           3rd Qu.:0    
##  Max.   :177.0                                           Max.   :0    
##  NA's   :10                                                           
##  hba1c_doneD        surgery_typeD      surgery_procedureD surgery_dominantD 
##  Length:117         Length:117         Length:117         Length:117        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  wound_typeOP       non_study_dexameth_steriodPOSTOP
##  Length:117         Length:117                      
##  Class :character   Class :character                
##  Mode  :character   Mode  :character                
##                                                     
##                                                     
##                                                     
##                                                     
##  nonstudy_dexameth_steriodD3     HbA1c             bmi       
##  Length:117                  Min.   : 4.500   Min.   :16.59  
##  Class :character            1st Qu.: 5.200   1st Qu.:24.93  
##  Mode  :character            Median : 5.600   Median :28.07  
##                              Mean   : 5.714   Mean   :29.00  
##                              3rd Qu.: 5.900   3rd Qu.:31.73  
##                              Max.   :10.000   Max.   :72.27  
##                                                              
##  whodas_total_preop revised_whodas_preop neut_lymph_ratio_d0
##  Min.   :12.00      Min.   :12.00        Min.   : 0.5312    
##  1st Qu.:12.00      1st Qu.:12.00        1st Qu.: 1.8254    
##  Median :14.00      Median :14.00        Median : 2.5737    
##  Mean   :16.74      Mean   :16.74        Mean   : 2.8745    
##  3rd Qu.:17.00      3rd Qu.:17.00        3rd Qu.: 3.3338    
##  Max.   :50.00      Max.   :50.00        Max.   :11.0000    
##                                          NA's   :9          
##  neut_lymph_ratio_d1 neut_lymph_ratio_change_d1 neut_lymph_ratio_d2
##  Min.   : 1.375      Min.   :-1.255             Min.   : 0.1235    
##  1st Qu.: 5.132      1st Qu.: 2.610             1st Qu.: 3.7692    
##  Median : 7.353      Median : 4.450             Median : 6.7273    
##  Mean   : 8.882      Mean   : 6.088             Mean   : 8.1589    
##  3rd Qu.:11.627      3rd Qu.: 8.730             3rd Qu.:10.8889    
##  Max.   :44.000      Max.   :39.299             Max.   :25.6042    
##  NA's   :13          NA's   :21                 NA's   :28         
##  neut_lymph_ratio_change_d2 neut_lymph_ratio_d1_2 neut_lymph_ratio_d2_2
##  Min.   :-6.182             Min.   : 1.375        Min.   : 0.1235      
##  1st Qu.: 1.591             1st Qu.: 5.132        1st Qu.: 3.7692      
##  Median : 4.356             Median : 7.353        Median : 6.7273      
##  Mean   : 5.356             Mean   : 8.882        Mean   : 8.1589      
##  3rd Qu.: 7.403             3rd Qu.:11.627        3rd Qu.:10.8889      
##  Max.   :22.776             Max.   :44.000        Max.   :25.6042      
##  NA's   :35                 NA's   :13            NA's   :28           
##  ab_noninfection       risk         risk_cat           bmi_cat         
##  Min.   :0.0000   Min.   :0.000   Length:117         Length:117        
##  1st Qu.:0.0000   1st Qu.:1.000   Class :character   Class :character  
##  Median :0.0000   Median :1.000   Mode  :character   Mode  :character  
##  Mean   :0.4495   Mean   :1.598                                        
##  3rd Qu.:1.0000   3rd Qu.:2.000                                        
##  Max.   :1.0000   Max.   :6.000                                        
##  NA's   :8                                                             
##    asa_cat          wound_type_cat     oxygen_quin         duration_sx     
##  Length:117         Length:117         Length:117         Min.   : 0.6833  
##  Class :character   Class :character   Class :character   1st Qu.: 2.5000  
##  Mode  :character   Mode  :character   Mode  :character   Median : 3.3333  
##                                                           Mean   : 3.9007  
##                                                           3rd Qu.: 4.7667  
##                                                           Max.   :10.6667  
##                                                                            
##  duration_sx_quin      anyDex           anyDex_count      anyDexMiss      
##  Length:117         Length:117         Min.   :0.0000   Min.   :0.000000  
##  Class :character   Class :character   1st Qu.:0.0000   1st Qu.:0.000000  
##  Mode  :character   Mode  :character   Median :0.0000   Median :0.000000  
##                                        Mean   :0.1282   Mean   :0.008547  
##                                        3rd Qu.:0.0000   3rd Qu.:0.000000  
##                                        Max.   :2.0000   Max.   :1.000000  
##                                                                           
##     anyDex2       treatment_group    deltacrp       crp_group    
##  Min.   :0.0000   Min.   :1.000   Min.   :-16.7   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:1.000   1st Qu.: 32.9   1st Qu.:1.000  
##  Median :0.0000   Median :2.000   Median : 49.5   Median :1.000  
##  Mean   :0.1111   Mean   :1.556   Mean   :130.9   Mean   :2.487  
##  3rd Qu.:0.0000   3rd Qu.:2.000   3rd Qu.:221.1   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :2.000   Max.   :359.0   Max.   :4.000  
##

ss1 <- ss

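# ss has no timepoint column at this stage, so key rows by patient ID for now;
# per-timepoint row names are assigned further down.
rownames(ss) <- ss$PG_number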

dim(ss)

## [1] 117  53

ss$ageCS <- scale(ss$ageD)
ss$sexD <- as.numeric(factor(ss$sexD))
ss$ethnicityCAT <- ss$ethnicityD
ss$ethnicityD <- as.numeric(factor(ss$ethnicityD))
ss$current_smokerD <- as.numeric(factor(ss$current_smokerD))
ss$diabetes_typeD <- as.numeric(factor(ss$diabetes_typeD))
ss$daily_insulinD <- as.numeric(factor(ss$daily_insulinD))
ss$oral_hypoglycemicsD <- as.numeric(factor(ss$oral_hypoglycemicsD))
ss$crp_preopD <- as.numeric(gsub("<5","2.5",gsub("<1","0.5",gsub("<1.0","0.5",ss$crp_preopD))))
ss$surgery_dominantD <- as.numeric(factor(ss$surgery_dominantD))
ss$wound_typeOP <- as.numeric(factor(ss$wound_typeOP))
ss$risk_cat <- as.numeric(factor(ss$risk_cat,levels=c("Low","Moderate","High")))
ss$wound_type_cat <- as.numeric(factor(ss$wound_type_cat))
ss$anyDex <- as.numeric(factor(ss$anyDex))

ss$bmi_cat <- as.numeric(factor(ss$bmi_cat,
  levels=c("Underweight [BMI<18.5]","Normal [18.5 to <25]",
  "Overweight [25 to <30]","Obese [30 to <40]","Super obese [40+]")))

ss <- ss[,c("PG_number","sexD","ageD","ageCS","weightD","asaD","heightD","ethnicityCAT","ethnicityD",
  "current_smokerD","diabetes_typeD","daily_insulinD","creatinine_preopD",
  "surgery_dominantD","wound_typeOP","HbA1c","bmi","revised_whodas_preop",
  "neut_lymph_ratio_d0","neut_lymph_ratio_d1","neut_lymph_ratio_d2","ab_noninfection",
  "risk","risk_cat","bmi_cat","wound_type_cat","duration_sx","anyDex","treatment_group",
  "deltacrp","crp_group")]

ss <- ss[order(rownames(ss)),]

ss_t0 <- ss
ss_eos <- ss
ss_pod1 <- ss

ss_t0$timepoint <- "T0"
ss_eos$timepoint <- "EOS"
ss_pod1$timepoint <- "POD1"

rownames(ss_t0) <- paste(ss_t0$PG_number,"T0",sep="-")
rownames(ss_eos) <- paste(ss_eos$PG_number,"EOS",sep="-")
rownames(ss_pod1) <- paste(ss_pod1$PG_number,"POD1",sep="-")

ss <- rbind(ss_t0, ss_eos, ss_pod1)

rownames(ss) <- paste(ss$PG_number,ss$timepoint,sep="-")

xt0 <- xx[,grep("T0",colnames(xx))]
xpod1 <- xx[,grep("POD1",colnames(xx))]
xeos <- xx[,grep("EOS",colnames(xx))]

xt0f <- xt0[rowMeans(xt0)>=10,]
xpod1f <- xpod1[rowMeans(xpod1)>=10,]
xeosf <- xeos[rowMeans(xeos)>=10,]

dim(xt0f)

## [1] 21935   111

dim(xpod1f)

## [1] 21313   109

dim(xeosf)

## [1] 22067    98
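The expression filter used above keeps genes whose mean count across samples is at least 10. A toy illustration with invented counts:

```r
# Three invented genes across three samples.
counts <- data.frame(
  S1 = c(0, 12, 150),
  S2 = c(1, 9, 90),
  S3 = c(0, 15, 120),
  row.names = c("geneA", "geneB", "geneC")
)

# Same rule as above: keep genes with a mean of at least 10 reads per sample.
keep <- rowMeans(counts) >= 10
filtered <- counts[keep, , drop = FALSE]
print(filtered)
```

Here geneA (mean below 1) is removed, while geneB is retained even though one sample has fewer than 10 reads, because the rule is based on the mean.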

ss_t0 <- ss_t0[which(rownames(ss_t0) %in% colnames(xt0)),]
ss_pod1 <- ss_pod1[which(rownames(ss_pod1) %in% colnames(xpod1)),]
ss_eos <- ss_eos[which(rownames(ss_eos) %in% colnames(xeos)),]

colnames(xt0) %in% rownames(ss_t0)

##   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [106] TRUE TRUE TRUE TRUE TRUE TRUE

colnames(xpod1) %in% rownames(ss_pod1)

##   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [106] TRUE TRUE TRUE TRUE

colnames(xeos) %in% rownames(ss_eos)

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

rownames(ss_t0) %in% colnames(xt0)

##   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [106] TRUE TRUE TRUE TRUE TRUE TRUE

rownames(ss_pod1) %in% colnames(xpod1)

##   [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##  [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [106] TRUE TRUE TRUE TRUE

rownames(ss_eos) %in% colnames(xeos)

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
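The `%in%` checks above confirm membership but not order, and downstream tools generally expect the sample sheet rows to be in the same order as the count matrix columns. One way to enforce and assert that in a single step is sketched below on toy objects; `counts` and `sheet` are hypothetical stand-ins for `xt0` and `ss_t0`.

```r
# Toy count matrix whose columns are deliberately not in sheet order.
counts <- matrix(1:6, nrow = 2,
  dimnames = list(c("g1", "g2"), c("P2-T0", "P1-T0", "P3-T0")))

# Toy sample sheet containing one patient (P4) with no sequencing data.
sheet <- data.frame(PG_number = c("P1", "P2", "P3", "P4"),
  ageD = c(60, 70, 55, 80), stringsAsFactors = FALSE)
rownames(sheet) <- paste(sheet$PG_number, "T0", sep = "-")

# Reorder and subset the sheet so row i describes column i of the counts.
sheet <- sheet[match(colnames(counts), rownames(sheet)), ]

# Fail loudly if any sample is missing from the sheet or out of order.
stopifnot(!anyNA(sheet$PG_number))
stopifnot(identical(rownames(sheet), colnames(counts)))
```

A single `stopifnot` call replaces the long vectors of `TRUE` and halts the report if the two objects ever drift apart.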

xxf <- xx[rowMeans(xx)>=10,]
xxf <- xxf[,order(colnames(xxf))]

PCA Analysis

This is a clinical study and each patient has detailed clinical metadata. Not all of these variables will be important to the gene expression profiles. To determine which are, we correlate the first eight principal components (PCs) with the clinical parameters.

TODO: Infection

mx <- xt0f
ss2 <- ss_t0
ss2$ethnicityCAT = ss2$ageCS = NULL
ss2$timepoint = ss2$PG_number = NULL

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
loadings = pca$x

par(cex=0.75, mar = c(6, 8.5, 3, 3))

plot(pca,type="lines",col="blue")

nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-trait relationships @T0: Top principal components"))
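The PC-trait correlation step above can be reproduced in base R to make the calculation explicit. The sketch below uses simulated data and the standard Student t approximation for a Pearson correlation p-value; it is meant as an illustration of the idea, not as a restatement of the WGCNA internals. Note that `pca$x` holds the per-sample PC scores (the object the code above calls `loadings`).

```r
set.seed(42)
n_samples <- 20

# Simulated expression matrix (genes x samples) and two invented traits.
expr <- matrix(rnorm(200 * n_samples), nrow = 200)
traits <- data.frame(age = rnorm(n_samples, 60, 10),
  bmi = rnorm(n_samples, 28, 4))

# PCA on samples; pca$x contains the sample scores on each PC.
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:8]

# Pearson correlation of each PC with each trait (pairwise complete cases).
r <- cor(scores, traits, use = "pairwise.complete.obs")

# Two-sided p-value from the t statistic with n - 2 degrees of freedom.
tstat <- r * sqrt((n_samples - 2) / (1 - r^2))
p <- 2 * pt(abs(tstat), df = n_samples - 2, lower.tail = FALSE)

print(signif(r, 2))
print(signif(p, 1))
```

With eight PCs tested against many traits, a multiple-testing adjustment such as `p.adjust(p, method = "BH")` may be worth considering before interpreting individual cells of the heatmap.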

mx <- xeosf
ss2 <- ss_eos
ss2$ethnicityCAT = ss2$ageCS = NULL
ss2$timepoint = ss2$PG_number =NULL

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
loadings = pca$x
plot(pca,type="lines",col="blue")

nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-trait relationships @EOS: Top principal components"))

## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are above given maximum and will be truncated to
## the maximum.

mx <- xpod1f
ss2 <- ss_pod1
ss2$ethnicityCAT = ss2$ageCS = NULL
ss2$timepoint = ss2$PG_number = NULL

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
loadings = pca$x
plot(pca,type="lines",col="blue")

nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-trait relationships @POD1: Top principal components"))

## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are above given maximum and will be truncated to
## the maximum.
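
The heatmap blocks above all follow one recipe: run a PCA on the samples, correlate the first eight principal component scores with each trait, and attach a p-value to every correlation. Here is a minimal sketch of that recipe in base R, run on simulated data. The function name `pc_trait_cor`, the simulated matrix and the trait names are made up for illustration. The p-value is the standard Student t-test for a Pearson correlation, which should match `corPvalueStudent`, although that is an assumption and has not been checked against WGCNA here. The function also counts how many correlations fall outside `zlim`, which is the condition behind the truncation warning.

```r
# Sketch only: pc_trait_cor() and the simulated inputs are hypothetical.
pc_trait_cor <- function(mx, traits, npc = 8, zlim = c(-0.45, 0.45)) {
  # Samples are the columns of mx, so transpose before prcomp()
  pca <- prcomp(t(mx), center = TRUE, scale. = TRUE)
  scores <- pca$x[, seq_len(npc), drop = FALSE]
  # Pearson correlation of each PC score with each trait
  r <- cor(scores, traits, use = "pairwise.complete.obs")
  # Student t-test p-value; uses the full sample count, so it assumes
  # that no trait values are missing
  n <- ncol(mx)
  tstat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p <- 2 * pt(abs(tstat), df = n - 2, lower.tail = FALSE)
  list(cor = r, p = p,
       n_clipped = sum(r < zlim[1] | r > zlim[2], na.rm = TRUE))
}

set.seed(1)
mx <- matrix(rnorm(200 * 30), nrow = 200,
             dimnames = list(paste0("gene", 1:200), paste0("s", 1:30)))
traits <- data.frame(age = rnorm(30), bmi = rnorm(30))
res <- pc_trait_cor(mx, traits)
dim(res$cor)
res$n_clipped
```

If `n_clipped` is greater than zero, either widen `zlim` or accept that the strongest cells are drawn at the end colour of the scale.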

Now export the same three heatmaps (T0, EOS and POD1) to a PDF.

mx <- xt0f
ss2 <- ss_t0
ss2$ethnicityCAT = ss2$ageCS = NULL
ss2$timepoint = ss2$PG_number = NULL
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
loadings = pca$x

pdf("pca_cor.pdf",height=7,width=7)
par(cex=0.75, mar = c(6, 8.5, 3, 3))

plot(pca,type="lines",col="blue")
nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-trait relationships @T0: Top principal components"))

mx <- xeosf
ss2 <- ss_eos
ss2$ethnicityCAT = ss2$ageCS = NULL
ss2$timepoint = ss2$PG_number = NULL
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
loadings = pca$x
plot(pca,type="lines",col="blue")
nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-trait relationships @EOS: Top principal components"))

## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are above given maximum and will be truncated to
## the maximum.

mx <- xpod1f
ss2 <- ss_pod1
ss2$ethnicityCAT = ss2$ageCS = NULL
ss2$timepoint = ss2$PG_number = NULL
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
loadings = pca$x
plot(pca,type="lines",col="blue")
nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-trait relationships @POD1: Top principal components"))

## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are above given maximum and will be truncated to
## the maximum.

dev.off()

## X11cairo 
##        2

PCA plots

par(mfrow=c(3,3))

#T0
mx <- xt0f
ss2 <- ss_t0
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-T0","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("T0")
text(pca$x[,1:2],labels=labs)
plot(pca$x[,c(1,3)],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("T0")
text(pca$x[,c(1,3)],labels=labs)
XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("T0")
text(pca$x[,2:3],labels=labs)

#EOS
mx <- xeosf
ss2 <- ss_eos
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-EOS","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("EOS")
text(pca$x[,1:2],labels=labs)
plot(pca$x[,c(1,3)],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("EOS")
text(pca$x[,c(1,3)],labels=labs)
XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("EOS")
text(pca$x[,2:3],labels=labs)

#POD1
mx <- xpod1f
ss2 <- ss_pod1
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-POD1","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs)
mtext("POD1")
plot(pca$x[,c(1,3)],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs)
mtext("POD1")
XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs)
mtext("POD1")

dev.off()

## X11cairo 
##        2

pdf("pca_charts.pdf",width=9,height=9)
par(mfrow=c(3,3))
#T0
mx <- xt0f
ss2 <- ss_t0
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-T0","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("T0")
text(pca$x[,1:2],labels=labs)

plot(pca$x[,c(1,3)],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("T0")
text(pca$x[,c(1,3)],labels=labs)

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("T0")
text(pca$x[,2:3],labels=labs)

#EOS
mx <- xeosf
ss2 <- ss_eos
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-EOS","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("EOS")
text(pca$x[,1:2],labels=labs)
plot(pca$x[,c(1,3)],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("EOS")
text(pca$x[,c(1,3)],labels=labs)
XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
mtext("EOS")
text(pca$x[,2:3],labels=labs)

#POD1
mx <- xpod1f
ss2 <- ss_pod1
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-POD1","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs)
mtext("POD1")
plot(pca$x[,c(1,3)],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs)
mtext("POD1")
XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col="gray",pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs)
mtext("POD1")

dev.off()

## X11cairo 
##        2
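
The PCA chart code above repeats the same three panels (PC1 vs PC2, PC1 vs PC3, PC2 vs PC3) for each timepoint. A small helper could remove the repetition. The sketch below is one way to write it, assuming sample names end in a timepoint suffix such as `-T0`. The name `plot_pc_pairs` and the toy matrix are hypothetical. It also puts the percentage of variance explained on the axis labels, which the charts above do not show.

```r
# Sketch only: plot_pc_pairs() is a hypothetical helper, not part of the report.
plot_pc_pairs <- function(mx, label, suffix = paste0("-", label)) {
  pca <- prcomp(t(mx), center = TRUE, scale. = TRUE)
  # Percent of total variance captured by each component
  pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
  labs <- gsub(suffix, "", rownames(pca$x), fixed = TRUE)
  for (pair in list(c(1, 2), c(1, 3), c(2, 3))) {
    xy <- pca$x[, pair]
    # Widen the x range by 10% so that labels at the edges are not cut off
    plot(xy, cex = 2, col = "gray", pch = 19, bty = "n",
         xlim = range(xy[, 1]) * 1.1,
         xlab = sprintf("PC%d (%.1f%%)", pair[1], pct[pair[1]]),
         ylab = sprintf("PC%d (%.1f%%)", pair[2], pct[pair[2]]))
    text(xy, labels = labs)
    mtext(label)
  }
  invisible(pct)
}

set.seed(1)
toy <- matrix(rnorm(100 * 12), nrow = 100,
              dimnames = list(NULL, paste0("PG", 1:12, "-T0")))
pdf(NULL)             # null device, so the example writes no file
par(mfrow = c(1, 3))
pct <- plot_pc_pairs(toy, "T0")
dev.off()
```

With the real data this would be called once per timepoint, for example `plot_pc_pairs(xt0f, "T0")`, inside a single `par(mfrow=c(3,3))` layout.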

Specific PCA charts

PCA charts coloured by key clinical parameters:

wound type
surgical duration
ethnicity
age
sex
treatment group

And ones we didn’t include:

bmi
asaD
smoker
diabetes_typeD
crp

# wound type clean (1) contaminated (2)

mx <- xt0f
ss2 <- ss_t0
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-T0","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_t0$wound_type_cat)
cols <- gsub("2","red",gsub("1","gray",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - wound type")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - wound type")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - wound type")

# surg duration
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_t0$duration_sx, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - surgical duration deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - surgical duration deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - surgical duration deciles")

# Ethnicity Levels [1-4]: Asian, Maori/Polynesian, Other, White/Caucasian
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_t0$ethnicityD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - ethnicity")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - ethnicity")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - ethnicity")

# age
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_t0$ageD, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - age deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - age deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - age deciles")

# sex female=1 male=2
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_t0$sexD)
cols <- gsub("1","pink",gsub("2","lightblue",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - sex: female=pink, male=lightblue")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - sex: female=pink, male=lightblue")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - sex: female=pink, male=lightblue")

# bmi
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_t0$bmi, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - BMI deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - BMI deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - BMI deciles")

# asaD levels 1:4 black,red,green,blue
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_t0$asaD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - asaD")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - asaD")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - asaD")

# Current smoker no, yes
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_t0$current_smokerD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - current smoker")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - current smoker")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - current smoker")

# diabetes
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_t0$diabetes_typeD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - diabetes")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - diabetes")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - diabetes")

# treatment group
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_t0$treatment_group)
cols <- gsub("2","orange",gsub("1","cyan3",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - treatment group")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - treatment group")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - treatment group")

# CRP group (1 = cyan3, 4 = orange)
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_t0$crp_group)
cols <- gsub("4","orange",gsub("1","cyan3",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - CRP group")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - CRP group")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - CRP group")
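
In the charts above, category codes are turned into colours with nested `gsub()` calls, and continuous traits are binned into deciles before indexing a palette. Nested `gsub()` works for single-digit codes but would mangle a code such as "12", and it leaves missing values as `NA`. A named lookup vector is one alternative. The sketch below shows both steps in base R. The helper names and the toy values are invented for illustration, and the rank-based binning is only an approximation of `ntile()`, whose handling of uneven bin sizes may differ.

```r
# Sketch only: the codes, colours and helper names below are illustrative.
# Categorical trait: look colours up by code instead of chaining gsub()
code_cols <- function(codes, palette, na_col = "black") {
  cols <- unname(palette[as.character(codes)])
  cols[is.na(cols)] <- na_col   # missing or unexpected codes get a fallback
  cols
}
wound <- c(1, 2, 2, NA, 1)
wound_cols <- code_cols(wound, c("1" = "gray", "2" = "red"))

# Continuous trait: split into ten rank-based bins, then index a palette
decile_cols <- function(x, colours = c("yellow", "orange", "red")) {
  pal <- colorRampPalette(colours)(10)
  bin <- ceiling(10 * rank(x, ties.method = "first", na.last = "keep") /
                   sum(!is.na(x)))
  pal[bin]
}
duration <- c(35, 120, 60, 240, 90, 45, 180, 75, 150, 30)
duration_cols <- decile_cols(duration)
```

Either vector can be passed straight to the `col` argument of `plot()`.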

The same charts at EOS.

# wound type clean (1) contaminated (2)

mx <- xeosf
ss2 <- ss_eos
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-EOS","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_eos$wound_type_cat)
cols <- gsub("2","red",gsub("1","gray",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - wound type")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - wound type")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - wound type")

# surg duration
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_eos$duration_sx, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - surgical duration deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - surgical duration deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - surgical duration deciles")

# Ethnicity Levels [1-4]: Asian, Maori/Polynesian, Other, White/Caucasian
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_eos$ethnicityD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - ethnicity")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - ethnicity")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - ethnicity")

# age
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_eos$ageD, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - age deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - age deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - age deciles")

# sex female=1 male=2
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_eos$sexD)
cols <- gsub("1","pink",gsub("2","lightblue",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - sex: female=pink, male=lightblue")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - sex: female=pink, male=lightblue")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - sex: female=pink, male=lightblue")

# bmi
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_eos$bmi, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - BMI deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - BMI deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - BMI deciles")

# asaD levels 1:4 black,red,green,blue
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_eos$asaD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - asaD")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - asaD")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - asaD")

# Current smoker no, yes
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_eos$current_smokerD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - current smoker")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - current smoker")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - current smoker")

# diabetes
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_eos$diabetes_typeD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - diabetes")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - diabetes")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - diabetes")

# treatment group
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_eos$treatment_group)
cols <- gsub("2","orange",gsub("1","cyan3",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - treatment group")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - treatment group")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - treatment group")

# CRP group (1 = cyan3, 4 = orange)
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_eos$crp_group)
cols <- gsub("4","orange",gsub("1","cyan3",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - CRP group")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - CRP group")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - CRP group")

The same charts at POD1.

# wound type clean (1) contaminated (2)

mx <- xpod1f
ss2 <- ss_pod1
pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-POD1","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_pod1$wound_type_cat)
cols <- gsub("2","red",gsub("1","gray",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - wound type")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - wound type")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - wound type")

# surg duration
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_pod1$duration_sx, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - surgical duration deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - surgical duration deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - surgical duration deciles")

# Ethnicity Levels [1-4]: Asian, Maori/Polynesian, Other, White/Caucasian
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_pod1$ethnicityD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - ethnicity")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - ethnicity")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - ethnicity")

# age
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_pod1$ageD, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - age deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - age deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - age deciles")

# sex female=1 male=2
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_pod1$sexD)
cols <- gsub("1","pink",gsub("2","lightblue",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - sex: female=pink, male=lightblue")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - sex: female=pink, male=lightblue")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - sex: female=pink, male=lightblue")

# bmi
my_palette <- colorRampPalette(c("yellow", "orange", "red"))(n = 10)
decile <- ntile(ss_pod1$bmi, 10)
mycols <- my_palette[decile]
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
plot(pca$x[,1:2],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - BMI deciles")

plot(pca$x[,c(1,3)],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - BMI deciles")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=mycols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - BMI deciles")

# asaD levels 1:4 black,red,green,blue
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_pod1$asaD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - asaD")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - asaD")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - asaD")

# Current smoker no, yes
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_pod1$current_smokerD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - current smoker")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - current smoker")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - current smoker")

# diabetes
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_pod1$diabetes_typeD)
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - diabetes")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - diabetes")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - diabetes")

# treatment group
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_pod1$treatment_group)
cols <- gsub("2","orange",gsub("1","cyan3",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - treatment group")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - treatment group")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - treatment group")

# CRP group (1 = cyan3, 4 = orange)
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss_pod1$crp_group)
cols <- gsub("4","orange",gsub("1","cyan3",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - CRP group")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - CRP group")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - CRP group")

Specific PCA charts for infection

Load infection data

infec <- read.table("infec.tsv",header=TRUE)
head(infec)

##   PG_number infection30d crp_group
## 1     PG022            0         1
## 2     PG177            0         4
## 3     PG198            1         4
## 4    PG3233            1         1
## 5     PG002            0         4
## 6     PG004            0         1

mx <- xt0f
ss2 <- ss_t0
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 21

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-T0","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss2$infec)
cols <- gsub("1","red",gsub("0","gray",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("T0 - infection")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("T0 - infection")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("T0 - infection")

mx <- xeosf
ss2 <- ss_eos
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 77 21

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-EOS","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss2$infec)
cols <- gsub("1","red",gsub("0","gray",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("EOS - infection")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("EOS - infection")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("EOS - infection")

mx <- xpod1f
ss2 <- ss_pod1
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 19

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)
labs=gsub("-POD1","",rownames(pca$x))
XMIN=min(pca$x[,1])*1.1
XMAX=max(pca$x[,1])*1.1
cols <- as.character(ss2$infec)
cols <- gsub("1","red",gsub("0","gray",cols))
plot(pca$x[,1:2],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,1:2],labels=labs,cex=0.7)
mtext("POD1 - infection")

plot(pca$x[,c(1,3)],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,c(1,3)],labels=labs,cex=0.7)
mtext("POD1 - infection")

XMIN=min(pca$x[,2])*1.1
XMAX=max(pca$x[,2])*1.1
plot(pca$x[,2:3],cex=2,col=cols,pch=19,bty="none", xlim=c(XMIN,XMAX) )
text(pca$x[,2:3],labels=labs,cex=0.7)
mtext("POD1 - infection")
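
The infection status is attached to each sample sheet with `match()`, which looks up the row of `infec` that has the same `PG_number` as each sample. The sketch below shows the same lookup on toy data, including a sample that has no infection record. The data frames `infec_toy` and `ss_toy` are invented for illustration. Unmatched samples come back as `NA`, which `table()` hides by default and `plot()` draws with no colour, so it may be worth checking for them.

```r
# Sketch only: toy data standing in for infec.tsv and the sample sheet.
infec_toy <- data.frame(PG_number = c("PG022", "PG177", "PG198"),
                        infection30d = c(0, 0, 1))
ss_toy <- data.frame(PG_number = c("PG198", "PG022", "PG999"),
                     row.names = c("PG198-T0", "PG022-T0", "PG999-T0"))

# match() returns, for each sample, the row of infec_toy with the same ID
idx <- match(ss_toy$PG_number, infec_toy$PG_number)
ss_toy$infec <- factor(infec_toy[idx, "infection30d"])

# Samples without an infection record are NA; show them in the table
table(ss_toy$infec, useNA = "ifany")
ss_toy[is.na(ss_toy$infec), "PG_number"]

# Give unmatched samples an explicit colour so they still appear on the chart
cols <- ifelse(is.na(ss_toy$infec), "black",
               ifelse(ss_toy$infec == "1", "red", "gray"))
```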

Blood composition

xn <- xx
gt <- as.data.frame(sapply(strsplit(rownames(xn)," "),"[[",2) )
rownames(gt) <- rownames(xx)
colnames(gt) = "genesymbol"
gt$geneID <- rownames(xx)
blood <- read.table("https://raw.githubusercontent.com/giannimonaco/ABIS/master/data/sigmatrixRNAseq.txt")
blood2 <- merge(gt,blood,by.x="genesymbol",by.y=0)
blood2 <- blood2[which(!duplicated(blood2$genesymbol)),]
rownames(blood2) <- blood2$geneID
blood2 <- blood2[,c(3:ncol(blood2))]
genes <- intersect(rownames(xx), rownames(blood2))
dec <- apply(xx[genes, , drop=F], 2, function(x) coef(rlm( as.matrix(blood2[genes,]), x, maxit =100 ))) *100

## Warning in rlm.default(as.matrix(blood2[genes, ]), x, maxit = 100): 'rlm'
## failed to converge in 100 steps
## Warning in rlm.default(as.matrix(blood2[genes, ]), x, maxit = 100): 'rlm'
## failed to converge in 100 steps

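# NOTE: dec is cell types x samples at this point, so dec/colSums(dec) recycles
# the per-sample sums down the columns instead of dividing each sample by its
# own total. The per-sample rescaling to 100% happens when dec2 is normalised.
dec <- t(dec/colSums(dec)*100)
dec <- signif(dec, 3)
# remove negative values: shift each cell type so its minimum across samples is 0
dec2 <- t(apply(dec,2,function(x) { mymin=min(x) ; if (mymin<0) { x + (mymin * -1) } else { x } } ))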
dec2 <- apply(dec2,2,function(x) {x / sum(x) *100} )
colfunc <- colorRampPalette(c("blue", "white", "red"))
heatmap.2( dec2, col=colfunc(25),scale="row",
 trace="none",margins = c(5,5), cexRow=.7, cexCol=.8,  main="cell type abundances")

heatmap.2( dec2, col=colfunc(25),scale="none",
 trace="none",margins = c(5,5), cexRow=.7, cexCol=.8,  main="cell type abundances")

par(mar=c(5,10,3,1))
boxplot(t(dec2[order(rowMeans(dec2)),]),horizontal=TRUE,las=1, xlab="estimated cell proportion (%)")
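
The deconvolution above regresses each sample's expression profile on the ABIS signature matrix with `rlm` and then rescales the coefficients to percentages. Below is a minimal sketch of the same idea on simulated data, assuming three hypothetical cell types and two samples with known proportions. The signature values, noise level and all object names are invented for illustration.

```r
library("MASS")
set.seed(1)

# Hypothetical signature matrix: 50 genes x 3 cell types
sig <- matrix(rexp(50 * 3, rate = 0.1), nrow = 50,
  dimnames = list(paste0("gene", 1:50), c("typeA", "typeB", "typeC")))

# Known mixing proportions for two simulated samples (each column sums to 1)
true_prop <- cbind(s1 = c(0.6, 0.3, 0.1), s2 = c(0.2, 0.5, 0.3))
rownames(true_prop) <- colnames(sig)

# Bulk expression = signature %*% proportions + a little Gaussian noise
bulk <- sig %*% true_prop + matrix(rnorm(50 * 2, sd = 0.5), nrow = 50)

# Robust regression of each sample on the signature matrix (no intercept)
est <- apply(bulk, 2, function(y) coef(rlm(sig, y, maxit = 100)))

# Per-sample normalisation to percentages: divide column j by its own sum
est_pct <- sweep(est, 2, colSums(est), "/") * 100
round(est_pct, 1)
```

On this toy input the estimates should land close to the simulated proportions. Real data are noisier, and coefficients can be negative, which is why the report shifts them before renormalising.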

par(mar = c(5.1, 4.1, 4.1, 2.1))
heatmap.2( cor(dec2),trace="none",scale="none")

heatmap.2( cor(t(dec2)),trace="none",scale="none", margins = c(8,8))

par(mar=c(5,10,3,1))
barplot(apply(dec2,1,sd),horiz=TRUE,las=1,xlab="SD of cell proportions (%)")

which(apply(dec2,1,sd)>4)

##    Monocytes.C             NK   T.CD8.Memory    T.CD4.Naive Neutrophils.LD 
##              1              2              3              4             10

Based on this analysis, we begin by correcting for the five cell types whose estimated proportions have an SD above 4:

Monocytes.C
NK
T.CD8.Memory
T.CD4.Naive
Neutrophils.LD

According to the correlation heatmap, these five cell types are not strongly correlated with one another.
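
The selection rule and the correlation check can also be expressed numerically. The sketch below uses a toy cell type by sample matrix in place of `dec2`. The cell type names, means and spreads are assumptions chosen only to make the example self-contained.

```r
set.seed(7)
# Toy matrix of percentages: 4 hypothetical cell types x 30 samples
props <- rbind(
  typeA = rnorm(30, mean = 40, sd = 10),
  typeB = rnorm(30, mean = 25, sd = 8),
  typeC = rnorm(30, mean = 5, sd = 1),
  typeD = rnorm(30, mean = 2, sd = 0.5))

# Keep cell types whose proportions vary most between samples (SD > 4)
sds <- apply(props, 1, sd)
selected <- names(which(sds > 4))

# Pairwise correlation among the selected cell types across samples
cors <- cor(t(props[selected, , drop = FALSE]))
max_offdiag <- max(abs(cors[upper.tri(cors)]))
selected
max_offdiag
```

A large `max_offdiag` would suggest that two covariates carry overlapping information and could destabilise the model fit.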

Now look at how the cell proportions change over time.

ct0 <- dec2[,grep("-T0",colnames(dec2))]
ceos <- dec2[,grep("-EOS",colnames(dec2))]
cpod1 <- dec2[,grep("-POD1",colnames(dec2))]
par(mar=c(5,10,3,1))
boxplot(t(ct0),horizontal=TRUE,las=1, xlab="estimated cell proportion (%)",main="T0")

boxplot(t(ceos),horizontal=TRUE,las=1, xlab="estimated cell proportion (%)",main="EOS")

boxplot(t(cpod1),horizontal=TRUE,las=1, xlab="estimated cell proportion (%)",main="POD1")

sscell <- as.data.frame(t(dec2))
sscell_t0 <- sscell[grep("-T0",rownames(sscell)),]
sscell_eos <- sscell[grep("-EOS",rownames(sscell)),]
sscell_pod1 <- sscell[grep("-POD1",rownames(sscell)),]

Now look at how the estimated cell type proportions associate with the principal components.

# expression matrices: xt0f xeosf xpod1f
# cell proportion tables: sscell_t0 sscell_eos sscell_pod1

## T0
mx <- xt0f

ss2 <- sscell_t0

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)

# note: pca$x holds the per-sample PC scores (the gene loadings are in pca$rotation)
loadings = pca$x
par(mar = c(5.1, 4.1, 4.1, 2.1))
plot(pca,type="lines",col="blue")

nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-cell relationships @T0: Top principal components"))

## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are below given minimum and will be truncated to
## the minimum.

## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are above given maximum and will be truncated to
## the maximum.

## EOS
mx <- xeosf

ss2 <- sscell_eos

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)

loadings = pca$x

plot(pca,type="lines",col="blue")

nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-cell relationships @EOS: Top principal components"))

## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are below given minimum and will be truncated to
## the minimum.
## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are above given maximum and will be truncated to
## the maximum.

## POD1
mx <- xpod1f

ss2 <- sscell_pod1

pca <- prcomp(t(mx),center = TRUE, scale = TRUE,retx=TRUE)

loadings = pca$x

plot(pca,type="lines",col="blue")

nGenes <- nrow(mx)
nSamples <- ncol(mx)
datTraits <- ss2
moduleTraitCor <- cor(loadings[,1:8], datTraits, use = "p")
moduleTraitPvalue <- corPvalueStudent(moduleTraitCor, nSamples)
textMatrix <- paste(signif(moduleTraitCor, 2), "\n(",
  signif(moduleTraitPvalue, 1), ")", sep = "")

dim(textMatrix) = dim(moduleTraitCor)

labeledHeatmap(Matrix = t(moduleTraitCor),
  xLabels = colnames(loadings)[1:ncol(t(moduleTraitCor))],
  yLabels = names(datTraits), colorLabels = FALSE, colors = blueWhiteRed(6),
  textMatrix = t(textMatrix), setStdMargins = FALSE, cex.text = 0.5,
  cex.lab.y = 0.6, zlim = c(-0.45,0.45),
  main = paste("PCA-cell relationships @POD1: Top principal components"))

## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are below given minimum and will be truncated to
## the minimum.
## Warning in numbers2colors(data, signed, colors = colors, lim = zlim, naColor =
## naColor): Some values of 'x' are above given maximum and will be truncated to
## the maximum.

These heatmaps show that the estimated cell type proportions correlate strongly with the top principal components. Reassuringly, the cell types we selected above are the ones with the strongest associations, so we can correct for their contribution.
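
The p-values printed in the heatmaps come from `corPvalueStudent`, which to our understanding applies the standard Student t-test for a Pearson correlation. The sketch below reproduces that calculation in base R on simulated data. Here `pc_score` and `trait` are stand-ins for one column of `pca$x` and one cell type proportion.

```r
set.seed(3)
n <- 40
pc_score <- rnorm(n)                # stands in for one column of pca$x
trait <- 0.5 * pc_score + rnorm(n)  # stands in for one cell type proportion

r <- cor(pc_score, trait, use = "pairwise.complete.obs")

# Student t statistic for a Pearson correlation, with n - 2 degrees of freedom
t_stat <- r * sqrt((n - 2) / (1 - r^2))
p_val <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# cor.test() reports the same two-sided p-value
c(manual = p_val, cor.test = cor.test(pc_score, trait)$p.value)
```

Each heatmap tests eight components against every cell type, so individual p-values are best read as a ranking rather than as corrected significance calls.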

Differential expression

Specific PCAs for key clinical parameters:

wound type
surgery duration
ethnicity
age
sex

And blood composition:

Monocytes.C
NK
T.CD8.Memory
T.CD4.Naive
Neutrophils.LD

And ones we didn’t include:

bmi
asaD
smoker
diabetes_typeD

TODO:

age data centred and scaled
ethnicity categories unordered
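
The two TODO items could be handled along the following lines. This is a sketch on a toy sample sheet: the raw column names `age` and `ethnicity` and the category labels are assumptions, while `ageCS` and `ethnicityCAT` are the names used in the design formulas later in this report.

```r
# Toy sample sheet; raw column names and values are hypothetical
ss <- data.frame(
  age = c(54, 61, 70, 48, 66, 73),
  ethnicity = c("A", "B", "A", "C", "B", "A"),
  stringsAsFactors = FALSE)

# Centre to mean 0 and scale to SD 1; as.numeric() drops the matrix attributes
ss$ageCS <- as.numeric(scale(ss$age))

# Unordered factor: the model fits one coefficient per non-reference category
ss$ethnicityCAT <- factor(ss$ethnicity, ordered = FALSE)

str(ss)
```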

Overview

CRP group comparisons not stratified for treatment group (inflammation)
Treatment group comparisons not stratified for CRP group (steroid)
CRP group comparisons stratified for treatment group: inflammation and steroid
Treatment group comparisons stratified for CRP group: steroid and inflammation
Sex differences in low CRP group (not stratified for treatment group)
Sex differences in high CRP group (not stratified for treatment group)

CRP group differences not stratified

CRP low vs high at t=0

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 390 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000179593.16 ALOX15B      192.12350     -0.7187508 0.10489179 -6.852308
## ENSG00000141744.4 PNMT           35.64128     -0.4354625 0.09456325 -4.604986
## ENSG00000087116.16 ADAMTS2       96.08857     -0.5387022 0.12313891 -4.374752
## ENSG00000057294.16 PKP2          83.96200     -0.3049742 0.07109219 -4.289842
## ENSG00000279359.1 RP11-36D19.9   12.76771     -0.5030155 0.11986382 -4.196558
## ENSG00000276168.1 RN7SL1        591.11188      0.2489061 0.06119920  4.067147
## ENSG00000063438.20 AHRR          92.23299     -0.4595163 0.11376981 -4.039000
## ENSG00000233916.1 ZDHHC20P1      21.16714     -0.3347389 0.08736544 -3.831480
## ENSG00000189056.15 RELN          17.27434      0.1809771 0.04776302  3.789062
## ENSG00000274012.1 RN7SL2       1037.64399      0.2367552 0.06518141  3.632250
##                                      pvalue         padj
## ENSG00000179593.16 ALOX15B     7.266794e-12 1.593971e-07
## ENSG00000141744.4 PNMT         4.124926e-06 4.524013e-02
## ENSG00000087116.16 ADAMTS2     1.215705e-05 8.888828e-02
## ENSG00000057294.16 PKP2        1.788006e-05 9.804975e-02
## ENSG00000279359.1 RP11-36D19.9 2.710017e-05 1.188885e-01
## ENSG00000276168.1 RN7SL1       4.759225e-05 1.682084e-01
## ENSG00000063438.20 AHRR        5.367947e-05 1.682084e-01
## ENSG00000233916.1 ZDHHC20P1    1.273749e-04 3.492459e-01
## ENSG00000189056.15 RELN        1.512169e-04 3.685491e-01
## ENSG00000274012.1 RN7SL2       2.809608e-04 6.162874e-01

mean(abs(dge$stat))

## [1] 0.7207644
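
DESeq2 prints its message about numeric variables with integer values because `crp_group` is stored as a number, so the model fits a fold change per unit of the code rather than a contrast between two groups. If a group contrast is what is intended, converting the column to a factor makes the reference level explicit. The sketch below shows the difference in the design matrix. The coding used here (1 = low, 2 = high) is an assumption, since the actual coding is not shown in this report.

```r
# Toy sample sheet; the integer codes (1 = low, 2 = high) are hypothetical
ss <- data.frame(crp_group = c(1, 2, 2, 1, 2, 1))

# Numeric coding: one slope column holding the raw codes
mm_num <- model.matrix(~ crp_group, data = ss)

# Factor coding: explicit reference level ("low") and a high-vs-low indicator
ss$crp_group_f <- factor(ss$crp_group, levels = c(1, 2), labels = c("low", "high"))
mm_fac <- model.matrix(~ crp_group_f, data = ss)

colnames(mm_num)
colnames(mm_fac)
```

When the two codes differ by exactly one unit, both codings give the same log2 fold change. With any other spacing, the numeric coding rescales it.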

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 19 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000223609.11 HBD          153.11532      0.6127017 0.13080850  4.683959
## ENSG00000261026.1 CTD-3247F14.2  13.74757     -1.0503473 0.23156953 -4.535775
## ENSG00000206177.7 HBM            45.70050      0.6155113 0.14218666  4.328896
## ENSG00000004939.16 SLC4A1       233.82225      0.4920445 0.12031713  4.089563
## ENSG00000169877.10 AHSP          34.93993      0.6188578 0.15316672  4.040419
## ENSG00000179593.16 ALOX15B      250.75091     -0.3691708 0.09277372 -3.979261
## ENSG00000218052.5 ADAMTS7P4      23.44923      0.2098524 0.05372181  3.906280
## ENSG00000268734.1 CTB-61M7.2     10.82400     -1.4946787 0.38782949 -3.853958
## ENSG00000166947.15 EPB42         37.40076      0.4773174 0.12488402  3.822085
## ENSG00000179388.9 EGR3          270.39768     -0.5197675 0.13823652 -3.759987
##                                       pvalue      padj
## ENSG00000223609.11 HBD          2.813859e-06 0.0617220
## ENSG00000261026.1 CTD-3247F14.2 5.739229e-06 0.0629450
## ENSG00000206177.7 HBM           1.498584e-05 0.1095715
## ENSG00000004939.16 SLC4A1       4.321860e-05 0.2340715
## ENSG00000169877.10 AHSP         5.335572e-05 0.2340715
## ENSG00000179593.16 ALOX15B      6.912983e-05 0.2527272
## ENSG00000218052.5 ADAMTS7P4     9.372795e-05 0.2937032
## ENSG00000268734.1 CTB-61M7.2    1.162234e-04 0.3186701
## ENSG00000166947.15 EPB42        1.323281e-04 0.3225129
## ENSG00000179388.9 EGR3          1.699222e-04 0.3727244

mean(abs(dge$stat))

## [1] 0.7617828

crp_t0 <- dge
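
The message that some rows did not converge in beta suggests raising `maxit` in `nbinomWaldTest`. Because `DESeq()` wraps size factor estimation, dispersion estimation and the Wald test, one way to do this is to run the three steps separately. The sketch below uses DESeq2's simulated example data rather than `mx` and `ss2`, and the value 500 is an arbitrary illustration, not a recommendation.

```r
library("DESeq2")
set.seed(1)

# Simulated counts: 1000 genes, 12 samples, design ~ condition
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# The three steps that DESeq() runs internally, with a larger iteration cap
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds, maxit = 500)  # the default is 100

# Rows flagged FALSE still failed to converge
table(mcols(dds)$betaConv, useNA = "ifany")
```

Rows that remain unconverged after this are often low-count genes and can be inspected or filtered using the `betaConv` column.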

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 10 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange      lfcSE      stat
## ENSG00000223609.11 HBD      153.11532     0.58209593 0.13422143  4.336833
## ENSG00000206177.7 HBM        45.70050     0.55382261 0.14446831  3.833523
## ENSG00000004939.16 SLC4A1   233.82225     0.46687689 0.12375126  3.772704
## ENSG00000132122.12 SPATA6   215.55260    -0.09259567 0.02479843 -3.733933
## ENSG00000169877.10 AHSP      34.93993     0.58801794 0.15784167  3.725366
## ENSG00000076864.20 RAP1GAP   14.58834     0.30003375 0.08110668  3.699248
## ENSG00000181126.13 HLA-V    366.66600    -0.44464770 0.12076431 -3.681946
## ENSG00000218052.5 ADAMTS7P4  23.44923     0.18427003 0.05011221  3.677148
## ENSG00000170153.11 RNF150    16.74377    -0.64212105 0.17646644 -3.638771
## ENSG00000166947.15 EPB42     37.40076     0.45926219 0.12825639  3.580813
##                                   pvalue      padj
## ENSG00000223609.11 HBD      1.445504e-05 0.3170712
## ENSG00000206177.7 HBM       1.263209e-04 0.6289033
## ENSG00000004939.16 SLC4A1   1.614877e-04 0.6289033
## ENSG00000132122.12 SPATA6   1.885127e-04 0.6289033
## ENSG00000169877.10 AHSP     1.950322e-04 0.6289033
## ENSG00000076864.20 RAP1GAP  2.162390e-04 0.6289033
## ENSG00000181126.13 HLA-V    2.314602e-04 0.6289033
## ENSG00000218052.5 ADAMTS7P4 2.358558e-04 0.6289033
## ENSG00000170153.11 RNF150   2.739418e-04 0.6289033
## ENSG00000166947.15 EPB42    3.425263e-04 0.6289033

mean(abs(dge$stat))

## [1] 0.7499593

crp_t0_adj <- dge
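
DESeq2 also notes that numeric covariates with a mean or standard deviation above 5 should be centred and scaled, which applies to the cell type percentages in this model. The sketch below shows one way to do this on a toy data frame. The column names match the design formula, but the values are simulated and the object `ss` is a stand-in for `ss2`.

```r
cell_cols <- c("Monocytes.C", "NK", "T.CD8.Memory", "T.CD4.Naive", "Neutrophils.LD")

set.seed(11)
# Toy percentages standing in for the cell type columns of ss2
ss <- as.data.frame(matrix(runif(10 * 5, min = 0, max = 40), nrow = 10,
  dimnames = list(paste0("S", 1:10), cell_cols)))

# Centre and scale each covariate column to mean 0 and SD 1
ss[cell_cols] <- lapply(ss[cell_cols], function(v) as.numeric(scale(v)))

round(sapply(ss[cell_cols], mean), 10)
sapply(ss[cell_cols], sd)
```

In principle this only changes the units of the covariate coefficients, not the `crp_group` effect. Its main purpose is to help the GLM converge.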

CRP low vs high at EOS

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 118 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange      lfcSE     stat
## ENSG00000139572.4 GPR84          231.01210      0.7893177 0.09543038 8.271137
## ENSG00000113368.12 LMNB1        2665.88768      0.4224482 0.05453011 7.747062
## ENSG00000280091.1 CTC-312O10.3    32.83608      0.5061573 0.06958189 7.274267
## ENSG00000137193.14 PIM1         7966.43548      0.3023359 0.04160685 7.266494
## ENSG00000170525.21 PFKFB3       4788.95198      0.5364125 0.07387475 7.261108
## ENSG00000079385.23 CEACAM1      1095.76169      0.7812428 0.10858770 7.194579
## ENSG00000069399.15 BCL3         3591.85376      0.4579237 0.06480903 7.065740
## ENSG00000184557.4 SOCS3        13113.96513      0.6655088 0.09494900 7.009118
## ENSG00000198019.13 FCGR1B        681.83152      0.5275512 0.07634910 6.909724
## ENSG00000163251.4 FZD5            91.58576      0.4341470 0.06308193 6.882272
##                                      pvalue         padj
## ENSG00000139572.4 GPR84        1.326872e-16 2.871217e-12
## ENSG00000113368.12 LMNB1       9.404335e-15 1.017502e-10
## ENSG00000280091.1 CTC-312O10.3 3.483056e-13 1.661581e-09
## ENSG00000137193.14 PIM1        3.689370e-13 1.661581e-09
## ENSG00000170525.21 PFKFB3      3.839320e-13 1.661581e-09
## ENSG00000079385.23 CEACAM1     6.265387e-13 2.259612e-09
## ENSG00000069399.15 BCL3        1.597623e-12 4.938709e-09
## ENSG00000184557.4 SOCS3        2.398244e-12 6.486949e-09
## ENSG00000198019.13 FCGR1B      4.855988e-12 1.167541e-08
## ENSG00000163251.4 FZD5         5.890533e-12 1.274652e-08

mean(abs(dge$stat))

## [1] 1.485292
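
By default, the `padj` column reported by `results()` is a Benjamini-Hochberg adjustment applied after independent filtering. The sketch below applies the adjustment itself to simulated p-values and counts the genes below a 5% FDR. The p-value mixture and the threshold are illustrative assumptions, and the counts will not match DESeq2's because no independent filtering is applied here.

```r
set.seed(5)
# Toy p-values: 950 null genes plus 50 genes with strong signal
p <- c(runif(950), rbeta(50, 0.05, 10))

# Benjamini-Hochberg adjusted p-values
padj <- p.adjust(p, method = "BH")

# Number of genes passing a 5% false discovery rate threshold
n_sig <- sum(padj < 0.05)
n_sig
```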

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 10 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange      lfcSE     stat
## ENSG00000139572.4 GPR84     231.01210      0.7183062 0.11564967 6.211053
## ENSG00000127954.13 STEAP4  2533.73582      0.5130516 0.09062352 5.661352
## ENSG00000184557.4 SOCS3   13113.96513      0.6216040 0.11307436 5.497303
## ENSG00000176597.12 B3GNT5   367.05640      0.4553917 0.08837595 5.152891
## ENSG00000059804.16 SLC2A3  9795.99656      0.4116953 0.08137334 5.059338
## ENSG00000170525.21 PFKFB3  4788.95198      0.4492633 0.08908047 5.043343
## ENSG00000069399.15 BCL3    3591.85376      0.3933702 0.07815095 5.033467
## ENSG00000121742.19 GJB6      54.89967      0.6668659 0.13279923 5.021610
## ENSG00000113368.12 LMNB1   2665.88768      0.3025021 0.06103820 4.955947
## ENSG00000173281.5 PPP1R3B  1142.35672      0.4439020 0.08968117 4.949780
##                                 pvalue         padj
## ENSG00000139572.4 GPR84   5.263078e-10 1.138877e-05
## ENSG00000127954.13 STEAP4 1.501854e-08 1.624930e-04
## ENSG00000184557.4 SOCS3   3.856442e-08 2.781651e-04
## ENSG00000176597.12 B3GNT5 2.565005e-07 1.385985e-03
## ENSG00000059804.16 SLC2A3 4.207135e-07 1.385985e-03
## ENSG00000170525.21 PFKFB3 4.574686e-07 1.385985e-03
## ENSG00000069399.15 BCL3   4.816882e-07 1.385985e-03
## ENSG00000121742.19 GJB6   5.124025e-07 1.385985e-03
## ENSG00000113368.12 LMNB1  7.197870e-07 1.407299e-03
## ENSG00000173281.5 PPP1R3B 7.429758e-07 1.407299e-03

mean(abs(dge$stat))

## [1] 1.185118

crp_eos <- dge
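
The recurring note about factor levels containing characters other than letters, numbers, '_' and '.' can be silenced by sanitising the levels before building the dataset. The sketch below uses `make.names` on invented labels, since the actual levels of `wound_typeOP` and `ethnicityCAT` are not shown in this report.

```r
# Hypothetical labels containing a slash and a space
x <- c("clean/contaminated", "clean", "dirty wound", "clean")
f <- factor(x)

# Replace unsafe characters in the level names with "."
levels(f) <- make.names(levels(f))
levels(f)
```

It is worth checking afterwards that no two original levels have collapsed onto the same sanitised name.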

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 9 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000197632.9 SERPINB2      260.41250      0.3545761 0.06850137  5.176189
## ENSG00000127954.13 STEAP4      2533.73582      0.2836443 0.05488758  5.167732
## ENSG00000139572.4 GPR84         231.01210      0.4490274 0.08779641  5.114416
## ENSG00000211459.2 MT-RNR1    100058.33839     -0.2858692 0.05843432 -4.892145
## ENSG00000241560.7 ZBTB20-AS1     46.10569      0.2579937 0.05302105  4.865873
## ENSG00000210082.2 MT-RNR2    242496.36564     -0.2401137 0.04976253 -4.825190
## ENSG00000064763.11 FAR2         604.97645     -0.2053606 0.04311803 -4.762754
## ENSG00000135678.12 CPM          575.43286     -0.3101463 0.06641943 -4.669512
## ENSG00000155659.15 VSIG4        356.81388     -0.4006950 0.08583339 -4.668288
## ENSG00000050730.16 TNIP3         55.66088      0.2325018 0.04992622  4.656908
##                                    pvalue        padj
## ENSG00000197632.9 SERPINB2   2.264642e-07 0.002314927
## ENSG00000127954.13 STEAP4    2.369520e-07 0.002314927
## ENSG00000139572.4 GPR84      3.147134e-07 0.002314927
## ENSG00000211459.2 MT-RNR1    9.974275e-07 0.005029176
## ENSG00000241560.7 ZBTB20-AS1 1.139524e-06 0.005029176
## ENSG00000210082.2 MT-RNR2    1.398696e-06 0.005144171
## ENSG00000064763.11 FAR2      1.909682e-06 0.006020136
## ENSG00000135678.12 CPM       3.019165e-06 0.007083372
## ENSG00000155659.15 VSIG4     3.037196e-06 0.007083372
## ENSG00000050730.16 TNIP3     3.209939e-06 0.007083372

mean(abs(dge$stat))

## [1] 0.9721167

crp_eos_adj <- dge
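
The report compares models through `mean(abs(dge$stat))`. A complementary check is to see how well the gene rankings agree between two models. The sketch below does this with a Spearman correlation on toy tables that stand in for `crp_eos` and `crp_eos_adj`. The simulated statistics and the object names `res_a` and `res_b` are assumptions.

```r
set.seed(9)
genes <- paste0("gene", 1:200)

# Toy result tables with gene names as rownames and a Wald statistic column
base_stat <- rnorm(200)
res_a <- data.frame(stat = base_stat, row.names = genes)
res_b <- data.frame(stat = 0.7 * base_stat + rnorm(200, sd = 0.5), row.names = genes)

# Result tables are sorted by p-value, so row order differs between models
res_b <- res_b[sample(nrow(res_b)), , drop = FALSE]

# Align by gene name before comparing
common <- intersect(rownames(res_a), rownames(res_b))
rho <- cor(res_a[common, "stat"], res_b[common, "stat"], method = "spearman")

c(mean_abs_a = mean(abs(res_a$stat)),
  mean_abs_b = mean(abs(res_b$stat)),
  spearman = rho)
```

A high correlation alongside a lower mean absolute statistic would suggest that adjustment dampens the signal without reordering the top genes.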

CRP low vs high at POD1

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 134 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange      lfcSE      stat
## ENSG00000007968.7 E2F2       870.17416      0.4436761 0.03676788 12.066948
## ENSG00000137869.15 CYP19A1    81.93527      0.9960649 0.09264040 10.751949
## ENSG00000163710.9 PCOLCE2     18.25602      1.0744848 0.10096581 10.642066
## ENSG00000104918.8 RETN      1801.19251      0.7757049 0.07614859 10.186726
## ENSG00000132170.24 PPARG     168.63657      0.5429631 0.05405474 10.044690
## ENSG00000145287.11 PLAC8    4671.33188      0.3457873 0.03478431  9.940898
## ENSG00000183578.8 TNFAIP8L3   24.65972      0.7720344 0.07809643  9.885655
## ENSG00000135424.18 ITGA7     436.10324      0.5378278 0.05479987  9.814399
## ENSG00000108950.12 FAM20A   1751.41363      0.6102328 0.06295930  9.692497
## ENSG00000165092.13 ALDH1A1   410.41112     -0.5138028 0.05334920 -9.630936
##                                   pvalue         padj
## ENSG00000007968.7 E2F2      1.578813e-33 3.364925e-29
## ENSG00000137869.15 CYP19A1  5.802268e-27 6.183186e-23
## ENSG00000163710.9 PCOLCE2   1.898717e-26 1.348912e-22
## ENSG00000104918.8 RETN      2.272894e-24 1.211055e-20
## ENSG00000132170.24 PPARG    9.695213e-24 4.132682e-20
## ENSG00000145287.11 PLAC8    2.763251e-23 9.815527e-20
## ENSG00000183578.8 TNFAIP8L3 4.804293e-23 1.462770e-19
## ENSG00000135424.18 ITGA7    9.761766e-23 2.600656e-19
## ENSG00000108950.12 FAM20A   3.244949e-22 7.684400e-19
## ENSG00000165092.13 ALDH1A1  5.918789e-22 1.261472e-18

mean(abs(dge$stat))

## [1] 1.827282

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
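
The note above is triggered by factor levels that contain characters other than letters, numbers, '_' and '.'. It is harmless, but it can be silenced by cleaning the levels before the dataset is built. Here is a minimal sketch using base R `make.names()`. The data frame `ss_demo` and its level names are made up for illustration and are not taken from the sample sheet.

```r
# Hypothetical sample sheet column; the level names are invented for illustration.
ss_demo <- data.frame(
  ethnicityCAT = c("Asian/Pacific", "European", "Asian/Pacific", "Other (mixed)"),
  stringsAsFactors = FALSE
)

# make.names() replaces every unsafe character with '.', so the levels
# become syntactically valid R names.
f <- factor(ss_demo$ethnicityCAT)
levels(f) <- make.names(levels(f))
ss_demo$ethnicityCAT <- f

levels(ss_demo$ethnicityCAT)
```

If two original levels collapse to the same cleaned name they will be merged, so it is worth checking the level count before and after.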

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 21 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
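
The message above suggests a larger `maxit` for `nbinomWaldTest`. `DESeq()` does not expose that argument directly, so one option is to run its three steps separately. The sketch below uses a small simulated dataset from `makeExampleDESeqDataSet()` as a stand-in for `mx` and `ss2`; the object name `dds_demo` and the value `maxit = 500` are illustrative assumptions, not settings used in this report.

```r
suppressPackageStartupMessages(library("DESeq2"))

# Simulated counts stand in for the study data (mx, ss2).
set.seed(42)
dds_demo <- makeExampleDESeqDataSet(n = 200, m = 8)

# DESeq() wraps these three steps; running them one by one exposes maxit.
dds_demo <- estimateSizeFactors(dds_demo)
dds_demo <- estimateDispersions(dds_demo, quiet = TRUE)
dds_demo <- nbinomWaldTest(dds_demo, maxit = 500, quiet = TRUE)

# Tabulate convergence per gene (NA marks rows with all-zero counts).
table(mcols(dds_demo)$betaConv, useNA = "ifany")
```

Rows that still fail to converge can be excluded from downstream tables by filtering on `mcols(dds_demo)$betaConv`.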

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                              baseMean log2FoldChange      lfcSE      stat
## ENSG00000007968.7 E2F2      870.17416      0.3860936 0.04401299  8.772264
## ENSG00000163710.9 PCOLCE2    18.25602      0.9879318 0.11793493  8.376923
## ENSG00000137869.15 CYP19A1   81.93527      0.8122448 0.10657233  7.621535
## ENSG00000104918.8 RETN     1801.19251      0.6489135 0.08536396  7.601727
## ENSG00000132170.24 PPARG    168.63657      0.4789945 0.06318857  7.580398
## ENSG00000135424.18 ITGA7    436.10324      0.4767597 0.06544080  7.285359
## ENSG00000108950.12 FAM20A  1751.41363      0.5260557 0.07302081  7.204190
## ENSG00000169994.19 MYO7B    618.31752      0.3324557 0.04684399  7.097083
## ENSG00000165092.13 ALDH1A1  410.41112     -0.4432919 0.06264590 -7.076151
## ENSG00000116016.14 EPAS1    154.17271      0.3555413 0.05078144  7.001402
##                                  pvalue         padj
## ENSG00000007968.7 E2F2     1.751085e-18 3.732087e-14
## ENSG00000163710.9 PCOLCE2  5.432955e-17 5.789629e-13
## ENSG00000137869.15 CYP19A1 2.506768e-14 1.468447e-10
## ENSG00000104918.8 RETN     2.922041e-14 1.468447e-10
## ENSG00000132170.24 PPARG   3.444956e-14 1.468447e-10
## ENSG00000135424.18 ITGA7   3.208156e-13 1.139591e-09
## ENSG00000108950.12 FAM20A  5.839004e-13 1.777810e-09
## ENSG00000169994.19 MYO7B   1.274176e-12 3.394564e-09
## ENSG00000165092.13 ALDH1A1 1.482132e-12 3.509854e-09
## ENSG00000116016.14 EPAS1   2.534142e-12 5.401016e-09

mean(abs(dge$stat))

## [1] 1.324953

crp_pod1 <- dge

# model with clinical and cell covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.
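
This message first appears once the cell-type covariates enter the design, so those columns are presumably the ones with a large mean or standard deviation. Below is a minimal base R sketch of centring and scaling numeric covariates before the dataset is built. The data frame `covars_demo` holds random numbers standing in for three of the cell-type columns and is not the real `ss2`.

```r
# Random stand-ins for three cell-type covariates (not the study data).
set.seed(1)
covars_demo <- data.frame(
  Monocytes.C    = runif(20, 2, 12),
  NK             = runif(20, 0, 15),
  Neutrophils.LD = runif(20, 30, 80)
)

# Centre each numeric column to mean 0 and scale it to standard deviation 1.
num_cols <- vapply(covars_demo, is.numeric, logical(1))
covars_demo[num_cols] <- lapply(covars_demo[num_cols],
                                function(v) as.numeric(scale(v)))

# Check the result: means close to 0 and standard deviations of 1.
round(rbind(mean = colMeans(covars_demo),
            sd   = apply(covars_demo, 2, sd)), 3)
```

Rescaling changes the units of the covariate coefficients but in principle not the estimate for the term being tested; the main benefit is better GLM convergence.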

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 8 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                              baseMean log2FoldChange      lfcSE      stat
## ENSG00000007968.7 E2F2      870.17416      0.3472345 0.04014188  8.650180
## ENSG00000165092.13 ALDH1A1  410.41112     -0.4939869 0.05929803 -8.330579
## ENSG00000137869.15 CYP19A1   81.93527      0.7605312 0.09261922  8.211375
## ENSG00000132170.24 PPARG    168.63657      0.4271253 0.05248859  8.137489
## ENSG00000163710.9 PCOLCE2    18.25602      0.7905130 0.09924488  7.965277
## ENSG00000108950.12 FAM20A  1751.41363      0.4989988 0.06457599  7.727313
## ENSG00000135424.18 ITGA7    436.10324      0.4381670 0.05874170  7.459216
## ENSG00000116016.14 EPAS1    154.17271      0.2817621 0.03934883  7.160622
## ENSG00000104918.8 RETN     1801.19251      0.5040455 0.07279215  6.924449
## ENSG00000169994.19 MYO7B    618.31752      0.3134418 0.04566546  6.863870
##                                  pvalue         padj
## ENSG00000007968.7 E2F2     5.141826e-18 1.095877e-13
## ENSG00000165092.13 ALDH1A1 8.044826e-17 8.572969e-13
## ENSG00000137869.15 CYP19A1 2.186696e-16 1.553502e-12
## ENSG00000132170.24 PPARG   4.035604e-16 2.150271e-12
## ENSG00000163710.9 PCOLCE2  1.648539e-15 7.027064e-12
## ENSG00000108950.12 FAM20A  1.098407e-14 3.901725e-11
## ENSG00000135424.18 ITGA7   8.703888e-14 2.650085e-10
## ENSG00000116016.14 EPAS1   8.031209e-13 2.139615e-09
## ENSG00000104918.8 RETN     4.376743e-12 1.036461e-08
## ENSG00000169994.19 MYO7B   6.701962e-12 1.428389e-08

mean(abs(dge$stat))

## [1] 1.110871

crp_pod1_adj <- dge

Treatment group differences not stratified

Treatment A vs B at t0

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
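
Here the only design variable is `treatment_group`, so the message tells us it is stored as an integer-valued numeric. With two values one unit apart, the numeric coding gives the same log2 fold change as a two-level factor, but a factor makes the reference level and direction explicit. A minimal sketch follows, assuming the coding 1 = treatment A and 2 = treatment B. That coding is an assumption and should be confirmed against the trial data dictionary.

```r
# Hypothetical sample sheet; the 1/2 coding is assumed, not confirmed.
ss_demo <- data.frame(treatment_group = c(1, 2, 2, 1, 2, 1))

# Convert to a factor with A as the reference level, so that a positive
# log2 fold change would mean higher expression in B than in A.
ss_demo$treatment_group <- factor(ss_demo$treatment_group,
                                  levels = c(1, 2),
                                  labels = c("A", "B"))

levels(ss_demo$treatment_group)
table(ss_demo$treatment_group)
```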

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 364 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000179593.16 ALOX15B     250.75091     -2.4535079 0.3356004 -7.310802
## ENSG00000279359.1 RP11-36D19.9  24.97417     -2.8528835 0.4030002 -7.079112
## ENSG00000141744.4 PNMT          35.64128     -1.7581232 0.2682321 -6.554486
## ENSG00000276085.1 CCL3L1       308.64747     -1.7142384 0.3110313 -5.511466
## ENSG00000057294.16 PKP2         92.33310     -1.2052440 0.2230360 -5.403810
## ENSG00000079215.15 SLC1A3      219.79787     -1.7005517 0.3337533 -5.095236
## ENSG00000164056.11 SPRY1        69.48429     -1.1641127 0.2336978 -4.981274
## ENSG00000233916.1 ZDHHC20P1     21.16714     -1.2626226 0.2544397 -4.962364
## ENSG00000277632.2 CCL3         559.74862     -1.3892331 0.2870354 -4.839937
## ENSG00000122644.13 ARL4A       383.47521     -0.6984909 0.1479189 -4.722121
##                                      pvalue         padj
## ENSG00000179593.16 ALOX15B     2.655536e-13 5.824918e-09
## ENSG00000279359.1 RP11-36D19.9 1.450807e-12 1.591172e-08
## ENSG00000141744.4 PNMT         5.583392e-11 4.082390e-07
## ENSG00000276085.1 CCL3L1       3.558581e-08 1.951437e-04
## ENSG00000057294.16 PKP2        6.523988e-08 2.862073e-04
## ENSG00000079215.15 SLC1A3      3.483078e-07 1.273355e-03
## ENSG00000164056.11 SPRY1       6.316694e-07 1.909446e-03
## ENSG00000233916.1 ZDHHC20P1    6.964017e-07 1.909446e-03
## ENSG00000277632.2 CCL3         1.298804e-06 3.165474e-03
## ENSG00000122644.13 ARL4A       2.333975e-06 5.119575e-03

mean(abs(dge$stat))

## [1] 0.733293

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 14 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                               baseMean log2FoldChange      lfcSE      stat
## ENSG00000123838.11 C4BPA      33.66576      2.4906731 0.51109471  4.873212
## ENSG00000131845.15 ZNF304    418.79903      0.2874110 0.06121095  4.695418
## ENSG00000277632.2 CCL3       559.74862     -1.2496211 0.26775579 -4.667018
## ENSG00000179593.16 ALOX15B   250.75091     -1.0955858 0.23730414 -4.616800
## ENSG00000122644.13 ARL4A     383.47521     -0.6835449 0.15051405 -4.541403
## ENSG00000229807.13 XIST    10626.92285     -1.8128259 0.41367719 -4.382223
## ENSG00000162599.17 NFIA      232.38832     -0.3571616 0.08172420 -4.370328
## ENSG00000276085.1 CCL3L1     308.64747     -1.2489298 0.29873614 -4.180712
## ENSG00000079215.15 SLC1A3    219.79787     -0.9048776 0.21710493 -4.167927
## ENSG00000115306.16 SPTBN1   3986.49830      0.2464913 0.06133022  4.019084
##                                  pvalue       padj
## ENSG00000123838.11 C4BPA   1.097980e-06 0.01722679
## ENSG00000131845.15 ZNF304  2.660615e-06 0.01722679
## ENSG00000277632.2 CCL3     3.056028e-06 0.01722679
## ENSG00000179593.16 ALOX15B 3.897024e-06 0.01722679
## ENSG00000122644.13 ARL4A   5.588117e-06 0.01976182
## ENSG00000229807.13 XIST    1.174742e-05 0.03133750
## ENSG00000162599.17 NFIA    1.240598e-05 0.03133750
## ENSG00000276085.1 CCL3L1   2.905975e-05 0.06039029
## ENSG00000079215.15 SLC1A3  3.073818e-05 0.06039029
## ENSG00000115306.16 SPTBN1  5.842491e-05 0.09853338

mean(abs(dge$stat))

## [1] 0.8180539

avb_t0 <- dge

# model with clinical and cell covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 19 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                               baseMean log2FoldChange      lfcSE      stat
## ENSG00000169429.11 CXCL8    1371.70225     -2.0614320 0.36621890 -5.628961
## ENSG00000131845.15 ZNF304    418.79903      0.2742914 0.05932004  4.623924
## ENSG00000234665.9 LERFS       41.30684     -1.7954846 0.40114817 -4.475864
## ENSG00000122644.13 ARL4A     383.47521     -0.6387885 0.14587971 -4.378872
## ENSG00000104361.10 NIPAL2    763.75731     -0.3251850 0.07817291 -4.159817
## ENSG00000276085.1 CCL3L1     308.64747     -1.2043196 0.29535827 -4.077488
## ENSG00000162599.17 NFIA      232.38832     -0.3313865 0.08168434 -4.056915
## ENSG00000256128.6 LINC00944  120.56609     -0.4234680 0.10605135 -3.993047
## ENSG00000166394.15 CYB5R2     33.12472     -0.5263833 0.13248031 -3.973294
## ENSG00000115306.16 SPTBN1   3986.49830      0.1888577 0.04792220  3.940922
##                                   pvalue         padj
## ENSG00000169429.11 CXCL8    1.812982e-08 0.0003976776
## ENSG00000131845.15 ZNF304   3.765473e-06 0.0412978268
## ENSG00000234665.9 LERFS     7.610294e-06 0.0556439312
## ENSG00000122644.13 ARL4A    1.192954e-05 0.0654186306
## ENSG00000104361.10 NIPAL2   3.185028e-05 0.1397271679
## ENSG00000276085.1 CCL3L1    4.552497e-05 0.1558172013
## ENSG00000162599.17 NFIA     4.972512e-05 0.1558172013
## ENSG00000256128.6 LINC00944 6.522973e-05 0.1618763170
## ENSG00000166394.15 CYB5R2   7.088532e-05 0.1618763170
## ENSG00000115306.16 SPTBN1   8.116910e-05 0.1618763170

mean(abs(dge$stat))

## [1] 0.8369

avb_t0_adj <- dge
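
Both the clinical-covariate result and the cell-adjusted result are now saved, so one simple way to see how much the cell-type adjustment changes the picture is to compare the sets of significant genes. The sketch below uses two toy tables in place of `avb_t0` and `avb_t0_adj`. The helper `sig_genes()`, the toy values and the 5% FDR cutoff are all illustrative assumptions.

```r
# Toy result tables standing in for avb_t0 and avb_t0_adj (values invented).
genes <- paste0("gene", 1:5)
res_demo     <- data.frame(padj = c(0.001,  0.02, 0.20, 0.04, NA),
                           row.names = genes)
res_demo_adj <- data.frame(padj = c(0.0004, 0.30, 0.01, 0.04, 0.60),
                           row.names = genes)

# Return the names of genes below the FDR cutoff; which() drops NA padj values.
sig_genes <- function(d, fdr = 0.05) rownames(d)[which(d$padj < fdr)]

s_unadj <- sig_genes(res_demo)
s_adj   <- sig_genes(res_demo_adj)

# Genes significant in both models, and those lost or gained after adjustment.
shared <- intersect(s_unadj, s_adj)
lost   <- setdiff(s_unadj, s_adj)
gained <- setdiff(s_adj, s_unadj)
list(shared = shared, lost = lost, gained = gained)
```

The same three sets could be passed to `eulerr`, which is already loaded, to draw the overlap.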

Treatment A vs B at EOS

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 129 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000164056.11 SPRY1         165.16703     -2.6495876 0.15023175 -17.63667
## ENSG00000141744.4 PNMT            87.09740     -3.6453219 0.21308129 -17.10766
## ENSG00000048740.18 CELF2       14860.56325     -0.8538206 0.06266085 -13.62606
## ENSG00000279359.1 RP11-36D19.9   103.42684     -3.9721070 0.31107434 -12.76900
## ENSG00000179593.16 ALOX15B       847.15559     -3.1265685 0.24545606 -12.73779
## ENSG00000057294.16 PKP2          172.03315     -2.2928606 0.18394552 -12.46489
## ENSG00000064300.9 NGFR            61.81062     -2.2893270 0.18445275 -12.41145
## ENSG00000196935.9 SRGAP1         329.84060     -1.7220092 0.14799455 -11.63563
## ENSG00000272870.3 SAP30-DT       136.77299     -0.7521312 0.06569743 -11.44841
## ENSG00000145990.11 GFOD1        1933.33604     -1.2571490 0.11028762 -11.39882
##                                      pvalue         padj
## ENSG00000164056.11 SPRY1       1.288377e-69 2.843061e-65
## ENSG00000141744.4 PNMT         1.301255e-65 1.435740e-61
## ENSG00000048740.18 CELF2       2.802994e-42 2.061789e-38
## ENSG00000279359.1 RP11-36D19.9 2.442852e-37 1.347661e-33
## ENSG00000179593.16 ALOX15B     3.645637e-37 1.608966e-33
## ENSG00000057294.16 PKP2        1.160326e-35 4.267485e-32
## ENSG00000064300.9 NGFR         2.265006e-35 7.140270e-32
## ENSG00000196935.9 SRGAP1       2.715924e-31 7.491537e-28
## ENSG00000272870.3 SAP30-DT     2.394942e-30 5.872132e-27
## ENSG00000145990.11 GFOD1       4.238097e-30 9.352208e-27

mean(abs(dge$stat))

## [1] 1.492199

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 9 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000164056.11 SPRY1         165.16703     -2.7059237 0.15638312 -17.30317
## ENSG00000141744.4 PNMT            87.09740     -3.5010599 0.21937506 -15.95924
## ENSG00000279359.1 RP11-36D19.9   103.42684     -4.2902517 0.32298993 -13.28293
## ENSG00000179593.16 ALOX15B       847.15559     -3.2375724 0.25327794 -12.78269
## ENSG00000196935.9 SRGAP1         329.84060     -1.8289813 0.14310723 -12.78050
## ENSG00000048740.18 CELF2       14860.56325     -0.8384829 0.06660418 -12.58904
## ENSG00000057294.16 PKP2          172.03315     -2.2412930 0.18649597 -12.01792
## ENSG00000064300.9 NGFR            61.81062     -2.3210963 0.19383753 -11.97444
## ENSG00000272870.3 SAP30-DT       136.77299     -0.7544104 0.06880368 -10.96468
## ENSG00000145990.11 GFOD1        1933.33604     -1.2370843 0.11592467 -10.67145
##                                      pvalue         padj
## ENSG00000164056.11 SPRY1       4.452072e-67 9.824386e-63
## ENSG00000141744.4 PNMT         2.456924e-57 2.710847e-53
## ENSG00000279359.1 RP11-36D19.9 2.908007e-40 2.139033e-36
## ENSG00000179593.16 ALOX15B     2.048692e-37 9.300053e-34
## ENSG00000196935.9 SRGAP1       2.107231e-37 9.300053e-34
## ENSG00000048740.18 CELF2       2.425963e-36 8.922286e-33
## ENSG00000057294.16 PKP2        2.860952e-33 9.018946e-30
## ENSG00000064300.9 NGFR         4.836794e-33 1.334169e-29
## ENSG00000272870.3 SAP30-DT     5.649914e-28 1.385296e-24
## ENSG00000145990.11 GFOD1       1.384500e-26 3.055175e-23

mean(abs(dge$stat))

## [1] 1.414198

avb_eos <- dge

# model with clinical and cell covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 12 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000164056.11 SPRY1         165.1670     -2.7923982 0.17387073 -16.06020
## ENSG00000179593.16 ALOX15B       847.1556     -3.8150687 0.26013490 -14.66573
## ENSG00000198585.12 NUDT16       5301.4426     -1.3105279 0.09401215 -13.93998
## ENSG00000135678.12 CPM           575.4329     -1.7177297 0.12683010 -13.54355
## ENSG00000111666.11 CHPT1        1197.7668     -1.0307176 0.07716078 -13.35805
## ENSG00000279359.1 RP11-36D19.9   103.4268     -4.1756341 0.32422360 -12.87887
## ENSG00000141744.4 PNMT            87.0974     -3.0318991 0.23852957 -12.71079
## ENSG00000177575.13 CD163       25120.8560     -2.0894785 0.16631212 -12.56360
## ENSG00000171105.14 INSR         1211.6882     -1.3065370 0.10720954 -12.18676
## ENSG00000136478.8 TEX2           944.3836     -0.9718897 0.08209004 -11.83931
##                                      pvalue         padj
## ENSG00000164056.11 SPRY1       4.850104e-58 1.070273e-53
## ENSG00000179593.16 ALOX15B     1.068588e-48 1.179026e-44
## ENSG00000198585.12 NUDT16      3.620213e-44 2.662908e-40
## ENSG00000135678.12 CPM         8.650568e-42 4.772302e-38
## ENSG00000111666.11 CHPT1       1.063116e-40 4.691957e-37
## ENSG00000279359.1 RP11-36D19.9 5.919469e-38 2.177082e-34
## ENSG00000141744.4 PNMT         5.151169e-37 1.623869e-33
## ENSG00000177575.13 CD163       3.347600e-36 9.233935e-33
## ENSG00000171105.14 INSR        3.656660e-34 8.965724e-31
## ENSG00000136478.8 TEX2         2.444439e-32 5.394144e-29

mean(abs(dge$stat))

## [1] 1.213911

avb_eos_adj <- dge

Treatment A vs B at POD1

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 253 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
# rank genes by p-value; columns 7+ hold the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000186081.12 KRT5          14.54120      2.1890736 0.27368008  7.998659
## ENSG00000115414.21 FN1          184.93708     -1.5590144 0.21045312 -7.407894
## ENSG00000155659.15 VSIG4       1506.08672     -2.0082041 0.28041210 -7.161617
## ENSG00000149534.9 MS4A2          83.83329      1.4807287 0.21267882  6.962276
## ENSG00000154269.15 ENPP3         37.09384      1.2813396 0.18714544  6.846758
## ENSG00000131016.17 AKAP12       103.10713      1.4235129 0.21406789  6.649820
## ENSG00000259162.1 RP11-203M5.6   24.74288      1.3334574 0.20408383  6.533871
## ENSG00000140287.11 HDC          496.98643      1.4313408 0.22105777  6.474963
## ENSG00000179348.12 GATA2        687.51559      1.2570611 0.19461993  6.459056
## ENSG00000163050.18 COQ8A       2009.95700     -0.3111471 0.04862536 -6.398865
##                                      pvalue         padj
## ENSG00000186081.12 KRT5        1.257816e-15 2.680784e-11
## ENSG00000115414.21 FN1         1.283207e-13 1.367449e-09
## ENSG00000155659.15 VSIG4       7.973092e-13 5.664350e-09
## ENSG00000149534.9 MS4A2        3.348181e-12 1.783995e-08
## ENSG00000154269.15 ENPP3       7.554214e-12 3.220059e-08
## ENSG00000131016.17 AKAP12      2.934514e-11 1.042388e-07
## ENSG00000259162.1 RP11-203M5.6 6.409120e-11 1.951394e-07
## ENSG00000140287.11 HDC         9.483534e-11 2.494996e-07
## ENSG00000179348.12 GATA2       1.053581e-10 2.494996e-07
## ENSG00000163050.18 COQ8A       1.565362e-10 3.336256e-07
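HDC and GATA2 share the same padj (2.494996e-07) in the table above even though their p-values differ. This is expected with the Benjamini-Hochberg adjustment, the default method in DESeq2's results(): adjusted values are forced to be monotone in the p-value ranking, so a gene can inherit the smaller adjusted value of the gene ranked just below it. A small worked example with made-up p-values (not taken from this dataset), using base R's p.adjust():

```r
# Four illustrative p-values, already sorted in increasing order
p <- c(0.001, 0.008, 0.009, 0.04)
n <- length(p)

# Raw BH values: p * n / rank
raw <- p * n / seq_len(n)

# Enforce monotonicity, working from the largest p-value downwards
manual <- rev(cummin(rev(raw)))

# Same result from the built-in implementation
builtin <- p.adjust(p, method = "BH")
cbind(p, raw, manual, builtin)
```

The second and third values tie at 0.012 because the raw value for the third p-value (0.012) is smaller than that for the second (0.016), and the adjustment carries the smaller value upwards.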

mean(abs(dge$stat))  # mean absolute Wald statistic across genes

## [1] 1.083107

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
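The note above appears when a factor level in the design contains spaces or punctuation other than '_' and '.'. It is harmless, but it repeats at every step that touches the design. One way to avoid it is to pass the levels through base R's make.names() before building the dataset. The level strings below are invented for illustration; the actual levels of ethnicityCAT and wound_typeOP are not shown in this report:

```r
# Hypothetical factor with levels containing a space, a hyphen and a slash
eth <- factor(c("Non-Hispanic White", "Other/Unknown", "Non-Hispanic White"))
levels(eth)

# make.names() replaces characters that are invalid in R names with "."
levels(eth) <- make.names(levels(eth))
levels(eth)
```

If two original levels map to the same cleaned name they would be merged, so it is worth comparing the number of levels before and after.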

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 19 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])  # sort genes by p-value
head(dge[,1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000186081.12 KRT5          14.54120       2.189179 0.2813120  7.782032
## ENSG00000155659.15 VSIG4       1506.08672      -2.046371 0.2808756 -7.285686
## ENSG00000149534.9 MS4A2          83.83329       1.501000 0.2091159  7.177839
## ENSG00000259162.1 RP11-203M5.6   24.74288       1.419827 0.1981389  7.165817
## ENSG00000229961.3 RP11-71G12.1   66.93635       1.279558 0.1908412  6.704834
## ENSG00000154269.15 ENPP3         37.09384       1.266610 0.1901442  6.661315
## ENSG00000131016.17 AKAP12       103.10713       1.405739 0.2170297  6.477170
## ENSG00000179348.12 GATA2        687.51559       1.268771 0.1991819  6.369909
## ENSG00000115414.21 FN1          184.93708      -1.183497 0.1874768 -6.312766
## ENSG00000140287.11 HDC          496.98643       1.423136 0.2259646  6.298047
##                                      pvalue         padj
## ENSG00000186081.12 KRT5        7.136868e-15 1.521081e-10
## ENSG00000155659.15 VSIG4       3.200364e-13 3.410468e-09
## ENSG00000149534.9 MS4A2        7.082214e-13 4.120009e-09
## ENSG00000259162.1 RP11-203M5.6 7.732388e-13 4.120009e-09
## ENSG00000229961.3 RP11-71G12.1 2.016353e-11 8.594904e-08
## ENSG00000154269.15 ENPP3       2.713883e-11 9.640164e-08
## ENSG00000131016.17 AKAP12      9.345891e-11 2.845557e-07
## ENSG00000179348.12 GATA2       1.891400e-10 5.038926e-07
## ENSG00000115414.21 FN1         2.740912e-10 6.424139e-07
## ENSG00000140287.11 HDC         3.014188e-10 6.424139e-07

mean(abs(dge$stat))

## [1] 0.9476336

avb_pod1 <- dge

# model with clinical and cell covariates
# cell covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 7 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
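The message earlier in this output recommends centering and scaling numeric design variables that have a large mean or standard deviation, and the non-convergence notice above is the kind of problem that step is meant to reduce. A minimal sketch with base R's scale() follows; the values are invented for illustration, and only the column names come from the design formula:

```r
# Toy covariates; the numbers are hypothetical
ss_toy <- data.frame(
  Monocytes.C = c(12.1, 8.4, 15.3, 10.0),
  Neutrophils.LD = c(55.2, 61.7, 48.9, 58.3),
  row.names = paste0("s", 1:4))

num_cols <- c("Monocytes.C", "Neutrophils.LD")

# scale() returns a matrix, so convert each column back to a plain numeric vector
ss_toy[num_cols] <- lapply(ss_toy[num_cols], function(x) as.numeric(scale(x)))

round(colMeans(ss_toy[num_cols]), 10)
sapply(ss_toy[num_cols], sd)
```

Rescaling a covariate changes the units of that covariate's own coefficient; in principle the treatment_group coefficient should be unaffected apart from numerical differences in the fit.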

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])  # sort genes by p-value
head(dge[,1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000186081.12 KRT5           14.54120      2.2389705 0.28501584  7.855600
## ENSG00000155659.15 VSIG4        1506.08672     -1.8413863 0.23966497 -7.683168
## ENSG00000259162.1 RP11-203M5.6    24.74288      1.4492545 0.19622316  7.385746
## ENSG00000229961.3 RP11-71G12.1    66.93635      1.3530051 0.19043462  7.104828
## ENSG00000149534.9 MS4A2           83.83329      1.4271747 0.20531024  6.951308
## ENSG00000105426.17 PTPRS         203.16124      0.8200685 0.12225141  6.708049
## ENSG00000131016.17 AKAP12        103.10713      1.4256435 0.21310542  6.689851
## ENSG00000154269.15 ENPP3          37.09384      1.1882697 0.18396236  6.459309
## ENSG00000135218.19 CD36        11489.98755      0.4990621 0.07877094  6.335612
## ENSG00000070915.10 SLC12A3        21.89994      1.1148229 0.17749304  6.280938
##                                      pvalue         padj
## ENSG00000186081.12 KRT5        3.978614e-15 8.479620e-11
## ENSG00000155659.15 VSIG4       1.552015e-14 1.653905e-10
## ENSG00000259162.1 RP11-203M5.6 1.516008e-13 1.077022e-09
## ENSG00000229961.3 RP11-71G12.1 1.204729e-12 6.419095e-09
## ENSG00000149534.9 MS4A2        3.619152e-12 1.542700e-08
## ENSG00000105426.17 PTPRS       1.972438e-11 6.801829e-08
## ENSG00000131016.17 AKAP12      2.233979e-11 6.801829e-08
## ENSG00000154269.15 ENPP3       1.051821e-10 2.802183e-07
## ENSG00000135218.19 CD36        2.364016e-10 5.598251e-07
## ENSG00000070915.10 SLC12A3     3.365353e-10 7.172577e-07

mean(abs(dge$stat))

## [1] 1.031521

avb_pod1_adj <- dge
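The three models fitted above differ only in their design formula: treatment group alone, plus clinical covariates, plus clinical and cell covariates. Since the same trio is reused for every comparison, the formulas can be built once from the covariate names with base R's reformulate(). This is a sketch of one possible arrangement, not the code used to produce this report; the object names clinical, cells and designs are hypothetical:

```r
# Covariate names as they appear in the design formulas of this report
clinical <- c("sexD", "wound_typeOP", "duration_sx", "ethnicityCAT", "ageCS")
cells <- c("Monocytes.C", "NK", "T.CD8.Memory", "T.CD4.Naive", "Neutrophils.LD")

# One-sided formulas; treatment_group is kept last in each design
designs <- list(
  base = reformulate("treatment_group"),
  clinical = reformulate(c(clinical, "treatment_group")),
  clinical_cell = reformulate(c(clinical, cells, "treatment_group")))

designs
```

Keeping treatment_group as the final term matters because results() reports the last variable in the design by default when no contrast is given.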

Treatment group differences stratified
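Each stratified comparison below subsets the sample sheet first and then keeps the count-matrix columns whose names appear in it. That %in% filter keeps the matrix in its own column order and does not drop sample-sheet rows that lack counts, so the pairing relies on the two inputs already agreeing. A slightly more defensive variant, sketched on toy objects (mx_toy, ss_toy and the sample names are hypothetical; the crp_group codes 1 and 4 follow the ones used in this section), aligns both objects on their shared sample names:

```r
# Toy count matrix: 3 genes x 4 samples
mx_toy <- matrix(1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s1", "s2", "s3", "s4")))

# Toy sample sheet in a different order, with one sample (s5) that has no counts
ss_toy <- data.frame(crp_group = c(1, 4, 1, 1), treatment_group = c(1, 2, 2, 1),
  row.names = c("s3", "s2", "s1", "s5"))

# Stratify, then align both objects on the shared sample names
ss_sub <- subset(ss_toy, crp_group == 1)
shared <- intersect(colnames(mx_toy), rownames(ss_sub))
mx_sub <- mx_toy[, shared, drop = FALSE]
ss_sub <- ss_sub[shared, , drop = FALSE]

# Columns of the counts and rows of the sample sheet now match one-to-one
stopifnot(identical(colnames(mx_sub), rownames(ss_sub)))
```

The final stopifnot() is a cheap guard that could precede every call to DESeqDataSetFromMatrix().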

Treatment A vs B at T0 in the CRP low group

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2 <- subset(ss2,crp_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model: treatment group only, no covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 480 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])  # sort genes by p-value
head(dge[,1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000279359.1 RP11-36D19.9  39.871182     -3.3255603 0.6426406 -5.174837
## ENSG00000141744.4 PNMT          51.670695     -2.1905424 0.4563927 -4.799687
## ENSG00000204936.10 CD177       229.544285     -2.5908627 0.5761080 -4.497182
## ENSG00000169429.11 CXCL8       901.340924     -2.4617931 0.5620815 -4.379780
## ENSG00000179593.16 ALOX15B     395.592390     -2.3562217 0.5403176 -4.360808
## ENSG00000122644.13 ARL4A       438.849245     -0.9185857 0.2291829 -4.008090
## ENSG00000115155.19 OTOF        120.892044      1.2138657 0.3046429  3.984553
## ENSG00000258471.2 RP11-84C10.4  14.882458      0.8381249 0.2161208  3.878039
## ENSG00000253230.9 MIR124-1HG     8.779545     -3.3333199 0.8661717 -3.848336
## ENSG00000079215.15 SLC1A3      309.537268     -2.0483519 0.5335749 -3.838921
##                                      pvalue        padj
## ENSG00000279359.1 RP11-36D19.9 2.281100e-07 0.005003594
## ENSG00000141744.4 PNMT         1.589136e-06 0.017428844
## ENSG00000204936.10 CD177       6.885989e-06 0.050348057
## ENSG00000169429.11 CXCL8       1.187994e-05 0.056847985
## ENSG00000179593.16 ALOX15B     1.295828e-05 0.056847985
## ENSG00000122644.13 ARL4A       6.121173e-05 0.211852584
## ENSG00000115155.19 OTOF        6.760739e-05 0.211852584
## ENSG00000258471.2 RP11-84C10.4 1.053019e-04 0.257544790
## ENSG00000253230.9 MIR124-1HG   1.189227e-04 0.257544790
## ENSG00000079215.15 SLC1A3      1.235763e-04 0.257544790

mean(abs(dge$stat))

## [1] 0.7426341

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 14 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])  # sort genes by p-value
head(dge[,1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000123838.11 C4BPA         56.27087      4.1714068 0.7549693  5.525267
## ENSG00000115155.19 OTOF         120.89204      1.3340109 0.2995082  4.454005
## ENSG00000258471.2 RP11-84C10.4   14.88246      0.9535482 0.2237348  4.261957
## ENSG00000234665.9 LERFS          57.91435     -2.2458162 0.5334487 -4.209995
## ENSG00000119922.11 IFIT2       1771.40268      1.5450687 0.3900461  3.961246
## ENSG00000126262.5 FFAR2        1188.14797      1.9522596 0.5036095  3.876534
## ENSG00000185745.10 IFIT1        822.12911      1.4987801 0.3868363  3.874456
## ENSG00000215630.6 GUSBP9        203.39614     -0.6165686 0.1653777 -3.728245
## ENSG00000119917.15 IFIT3       1355.72352      1.3841010 0.3721027  3.719675
## ENSG00000287095.1 CTC-215C12.2   51.19664      0.7011972 0.1942702  3.609391
##                                      pvalue         padj
## ENSG00000123838.11 C4BPA       3.289854e-08 0.0007216295
## ENSG00000115155.19 OTOF        8.428321e-06 0.0924376158
## ENSG00000258471.2 RP11-84C10.4 2.026441e-05 0.1400421942
## ENSG00000234665.9 LERFS        2.553767e-05 0.1400421942
## ENSG00000119922.11 IFIT2       7.455966e-05 0.3270932241
## ENSG00000126262.5 FFAR2        1.059549e-04 0.3348632016
## ENSG00000185745.10 IFIT1       1.068631e-04 0.3348632016
## ENSG00000215630.6 GUSBP9       1.928181e-04 0.4861753379
## ENSG00000119917.15 IFIT3       1.994793e-04 0.4861753379
## ENSG00000287095.1 CTC-215C12.2 3.069169e-04 0.5721559718

mean(abs(dge$stat))

## [1] 0.7451909

avb_crplo_t0 <- dge

# model with clinical and cell covariates
# cell covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 34 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])  # sort genes by p-value
head(dge[,1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000074803.20 SLC12A1       84.81635     -2.4537886 0.56636642 -4.332511
## ENSG00000198794.12 SCAMP5       140.06582      0.6687557 0.16129243  4.146231
## ENSG00000258471.2 RP11-84C10.4   14.88246      0.9213004 0.22556852  4.084348
## ENSG00000146426.19 TIAM2        156.83010      0.2658418 0.06670186  3.985523
## ENSG00000165029.17 ABCA1        721.16202      0.4761514 0.12104587  3.933644
## ENSG00000177191.2 B3GNT8        337.40623     -0.4161721 0.10764360 -3.866204
## ENSG00000215630.6 GUSBP9        203.39614     -0.5856877 0.15250841 -3.840363
## ENSG00000125384.7 PTGER2       3444.28853     -0.3095152 0.08088605 -3.826559
## ENSG00000115155.19 OTOF         120.89204      1.1760123 0.31810878  3.696887
## ENSG00000234665.9 LERFS          57.91435     -1.9741234 0.53705027 -3.675863
##                                      pvalue      padj
## ENSG00000074803.20 SLC12A1     1.474184e-05 0.3231808
## ENSG00000198794.12 SCAMP5      3.379930e-05 0.3231808
## ENSG00000258471.2 RP11-84C10.4 4.420071e-05 0.3231808
## ENSG00000146426.19 TIAM2       6.733154e-05 0.3562985
## ENSG00000165029.17 ABCA1       8.366765e-05 0.3562985
## ENSG00000177191.2 B3GNT8       1.105425e-04 0.3562985
## ENSG00000215630.6 GUSBP9       1.228525e-04 0.3562985
## ENSG00000125384.7 PTGER2       1.299470e-04 0.3562985
## ENSG00000115155.19 OTOF        2.182591e-04 0.5031082
## ENSG00000234665.9 LERFS        2.370464e-04 0.5031082

mean(abs(dge$stat))

## [1] 0.7355129

avb_crplo_t0_adj <- dge
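The mean absolute Wald statistic is used above as a one-number summary of each model. As a rough reference point, if the statistics were exactly standard normal with no real effects, their mean absolute value would be sqrt(2/pi), about 0.798. Wald statistics from a negative binomial GLM are only approximately normal, so this is a yardstick rather than a formal test. The sketch below collects the values printed above for the avb_pod1 and avb_crplo_t0 comparisons and subtracts that reference; the object names null_ref and mean_abs_stat are hypothetical:

```r
# Expected |Z| for a standard normal variable
null_ref <- sqrt(2 / pi)

# Values copied from the mean(abs(dge$stat)) output above
mean_abs_stat <- rbind(
  avb_pod1 = c(base = 1.083107, clinical = 0.9476336, clinical_cell = 1.031521),
  avb_crplo_t0 = c(base = 0.7426341, clinical = 0.7451909, clinical_cell = 0.7355129))

# Positive values lie above the null reference, negative values below it
round(mean_abs_stat - null_ref, 3)
```

On this yardstick the CRP low T0 comparison sits slightly below the reference under all three models, while the avb_pod1 values are above it.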

Treatment A vs B at T0 in the CRP high group

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2 <- subset(ss2,crp_group==4)
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model: treatment group only, no covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 280 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])  # sort genes by p-value
head(dge[,1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000211652.2 IGLV7-43      57.72448      2.1239087 0.4170212  5.093048
## ENSG00000276085.1 CCL3L1       255.05022     -2.1290046 0.4413594 -4.823744
## ENSG00000263711.6 LINC02864     15.37815     -1.0912344 0.2360114 -4.623651
## ENSG00000211655.3 IGLV1-36      15.01384      1.7966484 0.3932588  4.568616
## ENSG00000278920.1 RP3-412A9.17 103.00331     -0.4584785 0.1045193 -4.386546
## ENSG00000211644.3 IGLV1-51     213.01635      1.6268219 0.3714944  4.379130
## ENSG00000211640.4 IGLV6-57      85.08443      1.7610091 0.4123498  4.270669
## ENSG00000211673.2 IGLV3-1      232.41139      1.6806179 0.3957575  4.246585
## ENSG00000211659.2 IGLV3-25     147.02115      1.3601124 0.3264996  4.165740
## ENSG00000203999.9 LINC01270    100.96329     -0.9138279 0.2252296 -4.057317
##                                      pvalue        padj
## ENSG00000211652.2 IGLV7-43     3.523525e-07 0.007728851
## ENSG00000276085.1 CCL3L1       1.408886e-06 0.015451954
## ENSG00000263711.6 LINC02864    3.770448e-06 0.026922780
## ENSG00000211655.3 IGLV1-36     4.909556e-06 0.026922780
## ENSG00000278920.1 RP3-412A9.17 1.151649e-05 0.043560797
## ENSG00000211644.3 IGLV1-51     1.191542e-05 0.043560797
## ENSG00000211640.4 IGLV6-57     1.948878e-05 0.059513437
## ENSG00000211673.2 IGLV3-1      2.170538e-05 0.059513437
## ENSG00000211659.2 IGLV3-25     3.103448e-05 0.075637926
## ENSG00000203999.9 LINC01270    4.963970e-05 0.108884692

mean(abs(dge$stat))

## [1] 0.9391897

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 7 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])  # sort genes by p-value
head(dge[,1:6],10)

##                                   baseMean log2FoldChange     lfcSE      stat
## ENSG00000274611.4 TBC1D3          75.11060     28.5191889 3.4788629  8.197848
## ENSG00000225630.1 MTND2P28       130.06123     -2.3686307 0.4535022 -5.222975
## ENSG00000276085.1 CCL3L1         255.05022     -2.3116418 0.4587641 -5.038847
## ENSG00000211652.2 IGLV7-43        57.72448      2.0854416 0.4313195  4.835027
## ENSG00000278920.1 RP3-412A9.17   103.00331     -0.5045173 0.1058888 -4.764595
## ENSG00000263711.6 LINC02864       15.37815     -1.0913346 0.2490844 -4.381385
## ENSG00000211935.3 IGHV1-3        103.80490      1.9185937 0.4471155  4.291047
## ENSG00000272763.1 RP11-357H14.17  12.74908     -1.7976987 0.4292167 -4.188324
## ENSG00000248801.7 C8orf34-AS1     64.73886     -0.4796651 0.1151577 -4.165288
## ENSG00000211655.3 IGLV1-36        15.01384      1.5932344 0.3874480  4.112124
##                                        pvalue         padj
## ENSG00000274611.4 TBC1D3         2.447286e-16 5.368122e-12
## ENSG00000225630.1 MTND2P28       1.760713e-07 1.931062e-03
## ENSG00000276085.1 CCL3L1         4.683455e-07 3.424386e-03
## ENSG00000211652.2 IGLV7-43       1.331273e-06 7.300367e-03
## ENSG00000278920.1 RP3-412A9.17   1.892338e-06 8.301685e-03
## ENSG00000263711.6 LINC02864      1.179272e-05 4.311222e-02
## ENSG00000211935.3 IGHV1-3        1.778326e-05 5.572511e-02
## ENSG00000272763.1 RP11-357H14.17 2.810225e-05 7.578767e-02
## ENSG00000248801.7 C8orf34-AS1    3.109592e-05 7.578767e-02
## ENSG00000211655.3 IGLV1-36       3.920355e-05 8.599298e-02

mean(abs(dge$stat))

## [1] 0.9639989

avb_crphi_t0 <- dge
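The top-ranked gene in the clinical-covariate model above, TBC1D3, has a log2FoldChange of 28.5, far larger than any other estimate in the table, and it did not appear in the base-model top ten. The outlier-replacement message printed for the base model is also absent from this model's output. Estimates of this magnitude are worth inspecting at the count level before interpretation; whether this one is driven by a few extreme samples has not been checked here. A minimal sketch for flagging such rows, using the first three rows of the table above and an arbitrary cutoff (lfc_cutoff is an illustrative assumption, not a DESeq2 default):

```r
# First three rows of the table above (values copied from the printed output)
top <- data.frame(
  log2FoldChange = c(28.5191889, -2.3686307, -2.3116418),
  lfcSE = c(3.4788629, 0.4535022, 0.4587641),
  row.names = c("ENSG00000274611.4 TBC1D3", "ENSG00000225630.1 MTND2P28",
    "ENSG00000276085.1 CCL3L1"))

# Flag genes whose absolute fold change exceeds the illustrative cutoff
lfc_cutoff <- 10
flagged <- rownames(top)[abs(top$log2FoldChange) > lfc_cutoff]
flagged
```

In the full analysis the same filter could be applied to dge, followed by a look at the raw counts in mx for each flagged gene.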

# model with clinical and cell covariates
# cell covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 40 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])  # sort genes by p-value
head(dge[,1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000278920.1 RP3-412A9.17   103.00331     -0.5815433 0.10376894 -5.604213
## ENSG00000184166.3 OR1D2           43.84299     -0.7606212 0.14772438 -5.148921
## ENSG00000130368.7 MAS1            36.87331     -0.6653173 0.13143995 -5.061759
## ENSG00000243290.3 IGKV1-12        66.84308      1.3286837 0.28374046  4.682743
## ENSG00000100304.13 TTLL12       1016.56775      0.3673839 0.07974214  4.607149
## ENSG00000248801.7 C8orf34-AS1     64.73886     -0.5485503 0.12079641 -4.541115
## ENSG00000261501.1 BBS7-DT         61.88604     -0.6923031 0.15272923 -4.532879
## ENSG00000287671.1 RP11-728E14.5  125.36742     -0.5274317 0.11684974 -4.513760
## ENSG00000142910.16 TINAGL1        22.44874     -0.6820575 0.15340449 -4.446138
## ENSG00000229321.2 AC008269.2      16.95875     -0.7855289 0.17956739 -4.374563
##                                       pvalue         padj
## ENSG00000278920.1 RP3-412A9.17  2.092028e-08 0.0004588863
## ENSG00000184166.3 OR1D2         2.619887e-07 0.0028733615
## ENSG00000130368.7 MAS1          4.154066e-07 0.0030373144
## ENSG00000243290.3 IGKV1-12      2.830609e-06 0.0155223525
## ENSG00000100304.13 TTLL12       4.082279e-06 0.0174625398
## ENSG00000248801.7 C8orf34-AS1   5.595761e-06 0.0174625398
## ENSG00000261501.1 BBS7-DT       5.818510e-06 0.0174625398
## ENSG00000287671.1 RP11-728E14.5 6.368831e-06 0.0174625398
## ENSG00000142910.16 TINAGL1      8.742790e-06 0.0213081227
## ENSG00000229321.2 AC008269.2    1.216759e-05 0.0248308670

mean(abs(dge$stat))

## [1] 1.015081

avb_crphi_t0_adj <- dge

Treatment A vs B at EOS in the CRP low group

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2 <- subset(ss2,crp_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model: treatment group only, no covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
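The message above is triggered because treatment_group is stored as an integer-valued numeric variable. A minimal sketch of the conversion it suggests is shown here, using base R only. The toy sample sheet `ss_toy` and the 1/2 coding of the groups are assumptions for illustration and are not taken from the study data.

```r
# Toy sample sheet; the 1/2 coding of treatment_group is an assumption.
ss_toy <- data.frame(
  treatment_group = c(1, 2, 1, 2, 1, 2),
  row.names = paste0("sample", 1:6)
)

# Numeric coding: a single slope column, read as change per unit increase.
mm_numeric <- model.matrix(~ treatment_group, data = ss_toy)

# Factor coding: the first level is the reference group.
ss_toy$treatment_group <- factor(ss_toy$treatment_group, levels = c(1, 2))
mm_factor <- model.matrix(~ treatment_group, data = ss_toy)

colnames(mm_numeric)
colnames(mm_factor)
```

With only two groups coded one unit apart, the two design columns differ by a constant, so the treatment coefficient should be the same under either coding. The factor form mainly makes the reference level explicit and silences the message.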

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 107 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange      lfcSE       stat
## ENSG00000279359.1 RP11-36D19.9   119.04727     -5.1395504 0.37985433 -13.530320
## ENSG00000141744.4 PNMT           139.75155     -3.9629221 0.29974528 -13.220966
## ENSG00000164056.11 SPRY1         233.24893     -2.7293406 0.22123908 -12.336611
## ENSG00000101187.16 SLCO4A1       173.38666     -2.6729667 0.23016884 -11.613069
## ENSG00000145990.11 GFOD1        2388.12563     -1.6208396 0.15147358 -10.700478
## ENSG00000048740.18 CELF2       17256.47082     -0.9431284 0.08907268 -10.588301
## ENSG00000079215.15 SLC1A3       1233.66208     -3.3548344 0.32004337 -10.482437
## ENSG00000064300.9 NGFR            79.80579     -2.7214417 0.27282216  -9.975149
## ENSG00000057294.16 PKP2          259.99533     -2.5509565 0.28208476  -9.043227
## ENSG00000168807.16 SNTB2        1676.95276     -0.9053557 0.10490101  -8.630572
##                                      pvalue         padj
## ENSG00000279359.1 RP11-36D19.9 1.035724e-41 2.285429e-37
## ENSG00000141744.4 PNMT         6.640567e-40 7.326538e-36
## ENSG00000164056.11 SPRY1       5.752639e-35 4.231258e-31
## ENSG00000101187.16 SLCO4A1     3.536877e-31 1.951118e-27
## ENSG00000145990.11 GFOD1       1.012550e-26 4.468585e-23
## ENSG00000048740.18 CELF2       3.376656e-26 1.241821e-22
## ENSG00000079215.15 SLC1A3      1.040281e-25 3.279263e-22
## ENSG00000064300.9 NGFR         1.958070e-23 5.400845e-20
## ENSG00000057294.16 PKP2        1.521137e-19 3.729489e-16
## ENSG00000168807.16 SNTB2       6.104587e-18 1.347038e-14

mean(abs(dge$stat))

## [1] 1.272032

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 16 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
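The convergence message above suggests a larger maxit value. The sketch below shows one way this could be done, by running the main steps of DESeq() separately so that maxit can be passed to nbinomWaldTest(). It uses simulated counts from makeExampleDESeqDataSet() rather than the study data, and the value maxit = 500 is an arbitrary choice for illustration.

```r
suppressPackageStartupMessages(library("DESeq2"))

# Simulated counts stand in for the real data (assumption for illustration).
set.seed(1)
dds_toy <- makeExampleDESeqDataSet(n = 1000, m = 8)

# Run the main steps of DESeq() one by one so that maxit can be set.
dds_toy <- estimateSizeFactors(dds_toy)
dds_toy <- estimateDispersions(dds_toy)
dds_toy <- nbinomWaldTest(dds_toy, maxit = 500)  # default maxit is 100

# betaConv is TRUE for rows whose coefficient estimates converged.
conv <- mcols(dds_toy)$betaConv
table(conv, useNA = "ifany")

# Optionally keep only the converged rows before calling results().
dds_conv <- dds_toy[which(conv), ]
```

Whether the non-converged rows matter here depends on whether any of them appear among the top-ranked genes, which can be checked by comparing the rows flagged in betaConv against the results table.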

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange      lfcSE       stat
## ENSG00000279359.1 RP11-36D19.9   119.04727     -5.3472996 0.39841130 -13.421556
## ENSG00000141744.4 PNMT           139.75155     -3.7766877 0.30856468 -12.239533
## ENSG00000164056.11 SPRY1         233.24893     -2.8225809 0.23140843 -12.197399
## ENSG00000079215.15 SLC1A3       1233.66208     -3.7771409 0.31036871 -12.169851
## ENSG00000101187.16 SLCO4A1       173.38666     -2.7558923 0.24132532 -11.419822
## ENSG00000048740.18 CELF2       17256.47082     -0.9616907 0.09345649 -10.290251
## ENSG00000145990.11 GFOD1        2388.12563     -1.5620788 0.15651131  -9.980613
## ENSG00000057294.16 PKP2          259.99533     -2.7302298 0.28018872  -9.744253
## ENSG00000064300.9 NGFR            79.80579     -2.5660575 0.28091181  -9.134744
## ENSG00000119138.5 KLF9          2597.08994     -1.0787187 0.11881849  -9.078710
##                                      pvalue         padj
## ENSG00000279359.1 RP11-36D19.9 4.521202e-41 9.976937e-37
## ENSG00000141744.4 PNMT         1.911285e-34 2.108816e-30
## ENSG00000164056.11 SPRY1       3.209131e-34 2.360530e-30
## ENSG00000079215.15 SLC1A3      4.499039e-34 2.482007e-30
## ENSG00000101187.16 SLCO4A1     3.329138e-30 1.469282e-26
## ENSG00000048740.18 CELF2       7.797324e-25 2.867726e-21
## ENSG00000145990.11 GFOD1       1.853188e-23 5.842043e-20
## ENSG00000057294.16 PKP2        1.952091e-22 5.384598e-19
## ENSG00000064300.9 NGFR         6.556155e-20 1.607496e-16
## ENSG00000119138.5 KLF9         1.098688e-19 2.424474e-16

mean(abs(dge$stat))

## [1] 1.214831

avb_crplo_eos <- dge

# model with clinical and cell covariates
# cell covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.
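The message above recommends centering and scaling numeric covariates. A minimal base R sketch of that step follows. The toy values and the choice of columns are assumptions for illustration, since the output does not say which covariate triggered the message.

```r
# Toy covariates; values and column selection are assumptions for illustration.
ss_toy <- data.frame(
  duration_sx = c(35, 120, 64, 210, 95),
  Monocytes.C = c(12.1, 8.4, 15.3, 9.9, 11.0)
)

# Center each column to mean 0 and scale to standard deviation 1.
# as.numeric() drops the matrix attributes that scale() returns.
num_cols <- c("duration_sx", "Monocytes.C")
ss_toy[num_cols] <- lapply(ss_toy[num_cols], function(v) as.numeric(scale(v)))

round(colMeans(ss_toy), 10)
sapply(ss_toy, sd)
```

Scaling changes the units of the covariate coefficients but not the treatment contrast itself, so it is mainly a numerical stability measure.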

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 36 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE       stat
## ENSG00000279359.1 RP11-36D19.9   119.0473      -5.771853 0.4355807 -13.250941
## ENSG00000079215.15 SLC1A3       1233.6621      -4.437439 0.3485794 -12.730066
## ENSG00000164056.11 SPRY1         233.2489      -2.991515 0.2506529 -11.934892
## ENSG00000177575.13 CD163       22464.4744      -2.338915 0.2145180 -10.903116
## ENSG00000198363.18 ASPH         1869.6176      -1.762933 0.1692565 -10.415751
## ENSG00000101187.16 SLCO4A1       173.3867      -2.889167 0.2818293 -10.251477
## ENSG00000179593.16 ALOX15B      1208.5228      -4.514366 0.4436746 -10.174949
## ENSG00000174705.13 SH3PXD2B      449.8765      -2.978486 0.2929394 -10.167583
## ENSG00000111666.11 CHPT1        1299.4517      -1.180981 0.1223822  -9.649940
## ENSG00000135678.12 CPM           757.4190      -2.017817 0.2097880  -9.618365
##                                      pvalue         padj
## ENSG00000279359.1 RP11-36D19.9 4.455820e-40 9.832658e-36
## ENSG00000079215.15 SLC1A3      4.024988e-37 4.440971e-33
## ENSG00000164056.11 SPRY1       7.786033e-33 5.727146e-29
## ENSG00000177575.13 CD163       1.113768e-27 6.144379e-24
## ENSG00000198363.18 ASPH        2.101319e-25 9.273963e-22
## ENSG00000101187.16 SLCO4A1     1.165485e-24 4.286460e-21
## ENSG00000179593.16 ALOX15B     2.565331e-24 7.632111e-21
## ENSG00000174705.13 SH3PXD2B    2.766887e-24 7.632111e-21
## ENSG00000111666.11 CHPT1       4.918461e-22 1.205952e-18
## ENSG00000135678.12 CPM         6.688594e-22 1.475972e-18

mean(abs(dge$stat))

## [1] 1.063154

avb_crplo_eos_adj <- dge

Treatment A vs B EOS in CRP high group

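# EOS counts; sample sheet merged with the cell covariate columns,
# restricted to the CRP high group (crp_group == 4)
mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2 <- subset(ss2,crp_group==4)
mx <- mx[,colnames(mx) %in% rownames(ss2)]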

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 138 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange      lfcSE       stat
## ENSG00000164056.11 SPRY1       108.76416      -2.532611 0.23899712 -10.596828
## ENSG00000141744.4 PNMT          43.59054      -3.129900 0.34146482  -9.166098
## ENSG00000179593.16 ALOX15B     547.79622      -3.162199 0.34916134  -9.056555
## ENSG00000279359.1 RP11-36D19.9  90.34984      -4.142272 0.49438312  -8.378669
## ENSG00000196935.9 SRGAP1       282.89286      -1.854628 0.22365011  -8.292544
## ENSG00000078053.17 AMPH        157.80806      -2.281000 0.27608001  -8.262098
## ENSG00000162599.17 NFIA        308.67237      -1.033152 0.12746020  -8.105685
## ENSG00000272870.3 SAP30-DT     121.11095      -0.790170 0.09793683  -8.068160
## ENSG00000110721.12 CHKA        869.95783      -1.224275 0.15266598  -8.019306
## ENSG00000121578.13 B4GALT4     865.64102      -1.070193 0.13932538  -7.681251
##                                      pvalue         padj
## ENSG00000164056.11 SPRY1       3.082572e-26 6.802313e-22
## ENSG00000141744.4 PNMT         4.904496e-20 5.411376e-16
## ENSG00000179593.16 ALOX15B     1.346361e-19 9.903380e-16
## ENSG00000279359.1 RP11-36D19.9 5.352966e-17 2.953098e-13
## ENSG00000196935.9 SRGAP1       1.108512e-16 4.892306e-13
## ENSG00000078053.17 AMPH        1.431340e-16 5.264228e-13
## ENSG00000162599.17 NFIA        5.244902e-16 1.653418e-12
## ENSG00000272870.3 SAP30-DT     7.136550e-16 1.968528e-12
## ENSG00000110721.12 CHKA        1.063440e-15 2.607436e-12
## ENSG00000121578.13 B4GALT4     1.575425e-14 3.476490e-11

mean(abs(dge$stat))

## [1] 1.198902

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 4 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE       stat
## ENSG00000164056.11 SPRY1        108.76416     -2.6668649 0.24704112 -10.795227
## ENSG00000279359.1 RP11-36D19.9   90.34984     -4.7806957 0.51198611  -9.337550
## ENSG00000141744.4 PNMT           43.59054     -3.2127745 0.34991481  -9.181590
## ENSG00000196935.9 SRGAP1        282.89286     -1.9362951 0.21148877  -9.155545
## ENSG00000179593.16 ALOX15B      547.79622     -3.1108104 0.36495565  -8.523804
## ENSG00000272870.3 SAP30-DT      121.11095     -0.8277754 0.09990025  -8.286020
## ENSG00000198585.12 NUDT16      4719.70533     -1.1852330 0.14956356  -7.924611
## ENSG00000121578.13 B4GALT4      865.64102     -1.0802468 0.13973364  -7.730757
## ENSG00000078053.17 AMPH         157.80806     -2.1961542 0.28426014  -7.725860
## ENSG00000124523.17 SIRT5        897.95933     -0.9857449 0.12942403  -7.616398
##                                      pvalue         padj
## ENSG00000164056.11 SPRY1       3.625670e-27 8.000766e-23
## ENSG00000279359.1 RP11-36D19.9 9.858924e-21 1.087784e-16
## ENSG00000141744.4 PNMT         4.247710e-20 2.983674e-16
## ENSG00000196935.9 SRGAP1       5.408391e-20 2.983674e-16
## ENSG00000179593.16 ALOX15B     1.543970e-17 6.814156e-14
## ENSG00000272870.3 SAP30-DT     1.171015e-16 4.306798e-13
## ENSG00000198585.12 NUDT16      2.288621e-15 7.214715e-12
## ENSG00000121578.13 B4GALT4     1.069090e-14 2.724055e-11
## ENSG00000078053.17 AMPH        1.111002e-14 2.724055e-11
## ENSG00000124523.17 SIRT5       2.608535e-14 5.404484e-11

mean(abs(dge$stat))

## [1] 1.268338

avb_crphi_eos <- dge

# model with clinical and cell covariates
# cell covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 21 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange     lfcSE      stat
## ENSG00000164056.11 SPRY1         108.76416     -2.6240004 0.2866809 -9.153036
## ENSG00000198585.12 NUDT16       4719.70533     -1.1065368 0.1246267 -8.878812
## ENSG00000179593.16 ALOX15B       547.79622     -2.9766733 0.3371682 -8.828453
## ENSG00000141744.4 PNMT            43.59054     -2.7567440 0.3639445 -7.574628
## ENSG00000135678.12 CPM           424.57761     -1.1706712 0.1570310 -7.455033
## ENSG00000136478.8 TEX2           868.89100     -0.8306822 0.1119672 -7.418975
## ENSG00000279359.1 RP11-36D19.9    90.34984     -3.6900821 0.4986320 -7.400412
## ENSG00000196935.9 SRGAP1         282.89286     -1.4946291 0.2031922 -7.355740
## ENSG00000177575.13 CD163       27248.95998     -1.9982460 0.2717813 -7.352403
## ENSG00000111666.11 CHPT1        1111.30704     -0.9224094 0.1255707 -7.345735
##                                      pvalue         padj
## ENSG00000164056.11 SPRY1       5.535555e-20 1.221531e-15
## ENSG00000198585.12 NUDT16      6.757838e-19 7.456261e-15
## ENSG00000179593.16 ALOX15B     1.061335e-18 7.806830e-15
## ENSG00000141744.4 PNMT         3.601587e-14 1.986906e-10
## ENSG00000135678.12 CPM         8.984555e-14 3.965244e-10
## ENSG00000136478.8 TEX2         1.180302e-13 4.279816e-10
## ENSG00000279359.1 RP11-36D19.9 1.357625e-13 4.279816e-10
## ENSG00000196935.9 SRGAP1       1.898722e-13 4.515593e-10
## ENSG00000177575.13 CD163       1.946742e-13 4.515593e-10
## ENSG00000111666.11 CHPT1       2.046310e-13 4.515593e-10

mean(abs(dge$stat))

## [1] 0.9300171

avb_crphi_eos_adj <- dge

Treatment A vs B POD1 in CRP low group

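# POD1 counts; sample sheet merged with the cell covariate columns,
# restricted to the CRP low group (crp_group == 1)
mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2 <- subset(ss2,crp_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]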

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 101 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000186081.12 KRT5          14.54816       2.576551 0.4235145  6.083739
## ENSG00000204044.6 SLC12A5-AS1    25.14097      -3.075518 0.5156713 -5.964106
## ENSG00000152463.15 OLAH          24.53332      -2.873612 0.4851144 -5.923576
## ENSG00000146072.6 TNFRSF21       42.89273       1.040202 0.1856047  5.604397
## ENSG00000259162.1 RP11-203M5.6   26.14874       1.581181 0.2897282  5.457464
## ENSG00000142627.13 EPHA2         19.20075       1.637299 0.3055916  5.357800
## ENSG00000204936.10 CD177        618.22625      -3.037256 0.5784007 -5.251128
## ENSG00000155659.15 VSIG4       1772.93303      -2.147337 0.4164836 -5.155873
## ENSG00000154269.15 ENPP3         32.41553       1.154158 0.2238815  5.155219
## ENSG00000229961.3 RP11-71G12.1   75.23398       1.526810 0.2963257  5.152473
##                                      pvalue         padj
## ENSG00000186081.12 KRT5        1.174117e-09 2.237975e-05
## ENSG00000204044.6 SLC12A5-AS1  2.459774e-09 2.237975e-05
## ENSG00000152463.15 OLAH        3.150155e-09 2.237975e-05
## ENSG00000146072.6 TNFRSF21     2.089813e-08 1.113505e-04
## ENSG00000259162.1 RP11-203M5.6 4.829843e-08 2.058769e-04
## ENSG00000142627.13 EPHA2       8.424125e-08 2.992389e-04
## ENSG00000204936.10 CD177       1.511709e-07 4.602722e-04
## ENSG00000155659.15 VSIG4       2.524517e-07 5.478997e-04
## ENSG00000154269.15 ENPP3       2.533344e-07 5.478997e-04
## ENSG00000229961.3 RP11-71G12.1 2.570730e-07 5.478997e-04

mean(abs(dge$stat))

## [1] 1.083967

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 13 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000155659.15 VSIG4       1772.93303     -2.8430229 0.3779008 -7.523199
## ENSG00000087116.16 ADAMTS2     1436.45625     -3.7809954 0.5047904 -7.490228
## ENSG00000186081.12 KRT5          14.54816      2.8196082 0.4406358  6.398953
## ENSG00000100985.7 MMP9         3694.58444     -2.9906166 0.5378907 -5.559897
## ENSG00000229961.3 RP11-71G12.1   75.23398      1.6244682 0.2922447  5.558589
## ENSG00000259162.1 RP11-203M5.6   26.14874      1.5474793 0.2819920  5.487672
## ENSG00000105223.20 PLD3        5830.74295     -0.5634004 0.1044934 -5.391731
## ENSG00000142627.13 EPHA2         19.20075      1.7165708 0.3199429  5.365241
## ENSG00000149534.9 MS4A2          73.42690      1.3047162 0.2497248  5.224615
## ENSG00000115590.14 IL1R2       1210.70599     -2.4983175 0.4790051 -5.215638
##                                      pvalue         padj
## ENSG00000155659.15 VSIG4       5.345199e-14 4.485859e-10
## ENSG00000087116.16 ADAMTS2     6.875406e-14 4.485859e-10
## ENSG00000186081.12 KRT5        1.564457e-10           NA
## ENSG00000100985.7 MMP9         2.699346e-08 1.174126e-04
## ENSG00000229961.3 RP11-71G12.1 2.719649e-08           NA
## ENSG00000259162.1 RP11-203M5.6 4.072656e-08           NA
## ENSG00000105223.20 PLD3        6.978235e-08 2.276475e-04
## ENSG00000142627.13 EPHA2       8.084104e-08           NA
## ENSG00000149534.9 MS4A2        1.745178e-07           NA
## ENSG00000115590.14 IL1R2       1.831854e-07 4.780772e-04
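The NA entries in the padj column above, which affect the low baseMean genes, most likely reflect DESeq2's independent filtering: genes whose mean normalized count falls below an automatically chosen threshold keep their p-value but get no adjusted p-value. The base R sketch below only illustrates the idea. The gene names, values and the threshold of 100 are made up, and DESeq2 selects its own threshold.

```r
# Made-up p-values and mean counts for four hypothetical genes.
pvalue    <- c(geneA = 5.3e-14, geneB = 1.6e-10, geneC = 2.7e-08, geneD = 0.04)
base_mean <- c(geneA = 1772.9,  geneB = 14.5,    geneC = 3694.6,  geneD = 75.2)

# Hypothetical filter threshold; DESeq2 chooses this value automatically.
threshold <- 100
keep <- base_mean >= threshold

# Adjust only the genes that pass the filter; the rest stay NA.
padj <- setNames(rep(NA_real_, length(pvalue)), names(pvalue))
padj[keep] <- p.adjust(pvalue[keep], method = "BH")

data.frame(base_mean, pvalue, padj)
```

If adjusted p-values are wanted for every gene, results() accepts independentFiltering = FALSE, at the cost of a larger multiple testing burden.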

mean(abs(dge$stat))

## [1] 1.269171

avb_crplo_pod1 <- dge

# model with clinical and cell covariates
# cell covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 31 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000155659.15 VSIG4       1772.93303     -2.0162400 0.33720409 -5.979287
## ENSG00000087116.16 ADAMTS2     1436.45625     -2.5921949 0.44416561 -5.836100
## ENSG00000186081.12 KRT5          14.54816      2.3884544 0.46530924  5.133047
## ENSG00000101004.15 NINL         135.28899     -0.8218318 0.16059437 -5.117438
## ENSG00000229961.3 RP11-71G12.1   75.23398      1.5446697 0.30769579  5.020120
## ENSG00000135218.19 CD36        8887.60274      0.5861706 0.12676967  4.623902
## ENSG00000259162.1 RP11-203M5.6   26.14874      1.3790996 0.29872566  4.616609
## ENSG00000149534.9 MS4A2          73.42690      1.1437686 0.24915410  4.590607
## ENSG00000146072.6 TNFRSF21       42.89273      0.9927942 0.21822908  4.549321
## ENSG00000102524.12 TNFSF13B    1891.40085      0.4415408 0.09736608  4.534853
##                                      pvalue         padj
## ENSG00000155659.15 VSIG4       2.241164e-09 4.776592e-05
## ENSG00000087116.16 ADAMTS2     5.343688e-09 5.694501e-05
## ENSG00000186081.12 KRT5        2.850879e-07 1.650231e-03
## ENSG00000101004.15 NINL        3.097136e-07 1.650231e-03
## ENSG00000229961.3 RP11-71G12.1 5.163929e-07 2.201176e-03
## ENSG00000135218.19 CD36        3.765872e-06 1.177432e-02
## ENSG00000259162.1 RP11-203M5.6 3.900613e-06 1.177432e-02
## ENSG00000149534.9 MS4A2        4.419584e-06 1.177432e-02
## ENSG00000146072.6 TNFRSF21     5.381922e-06 1.228560e-02
## ENSG00000102524.12 TNFSF13B    5.764368e-06 1.228560e-02

mean(abs(dge$stat))

## [1] 1.007265

avb_crplo_pod1_adj <- dge

Treatment A vs B at POD1 in CRP high group

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2 <- subset(ss2,crp_group==4)
mx <- mx[,colnames(mx) %in% rownames(ss2)]
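
The %in% filter above keeps the matching count columns in their original order, so it relies on mx and ss2 already listing samples in the same order. Below is a minimal sketch of an explicit alignment step. The objects mx_demo and ss_demo and their sample names are made up for illustration and are not taken from this dataset.

```r
# Hypothetical toy objects standing in for mx (counts) and ss2 (sample sheet)
mx_demo <- matrix(1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s1", "s2", "s3", "s4")))
ss_demo <- data.frame(crp_group = c(4, 4, 1),
  row.names = c("s3", "s1", "s5"))

# Keep only samples present in both objects, in the sample sheet's order
shared <- intersect(rownames(ss_demo), colnames(mx_demo))
ss_aligned <- ss_demo[shared, , drop = FALSE]
mx_aligned <- mx_demo[, shared, drop = FALSE]

# Fail early if the two objects disagree on sample names or order
stopifnot(identical(colnames(mx_aligned), rownames(ss_aligned)))
```

Subsetting both objects by the same name vector means a sample missing from either side is dropped from both, rather than silently shifting the pairing between counts and metadata.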

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ treatment_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
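
The message above appears because treatment_group is stored as a number, so the model fits a per-unit slope rather than a contrast between named groups. With only the codes 1 and 2 present, that slope corresponds to the difference between the two groups, but an explicit factor makes the reference level visible. This sketch assumes codes 1 and 2 correspond to groups A and B, as the section headings suggest; ss_demo is a made-up sample sheet.

```r
# Hypothetical sample sheet with the numeric coding used in this report
ss_demo <- data.frame(treatment_group = c(1, 2, 2, 1),
  row.names = paste0("s", 1:4))

# Recode as a factor; the first level (A) becomes the reference group
ss_demo$treatment_group <- factor(ss_demo$treatment_group,
  levels = c(1, 2), labels = c("A", "B"))

levels(ss_demo$treatment_group)
table(ss_demo$treatment_group)
```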

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 250 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000131016.17 AKAP12       118.08416       1.904352 0.3589243  5.305720
## ENSG00000149534.9 MS4A2          93.32501       1.866884 0.3696944  5.049802
## ENSG00000186081.12 KRT5          14.48988       2.017019 0.4076553  4.947854
## ENSG00000229961.3 RP11-71G12.1   58.76722       1.339119 0.2735828  4.894748
## ENSG00000140287.11 HDC          571.69806       1.885327 0.3856378  4.888854
## ENSG00000179348.12 GATA2        810.50517       1.608564 0.3351194  4.799972
## ENSG00000158715.6 SLC45A3       258.52371       1.306224 0.2724756  4.793914
## ENSG00000155659.15 VSIG4       1247.57497      -2.125654 0.4456523 -4.769759
## ENSG00000246363.3 LINC02458      30.59379       1.828502 0.3945107  4.634861
## ENSG00000259162.1 RP11-203M5.6   23.31281       1.481681 0.3212551  4.612163
##                                      pvalue        padj
## ENSG00000131016.17 AKAP12      1.122292e-07 0.002252664
## ENSG00000149534.9 MS4A2        4.422678e-07 0.004071602
## ENSG00000186081.12 KRT5        7.503634e-07 0.004071602
## ENSG00000229961.3 RP11-71G12.1 9.843169e-07 0.004071602
## ENSG00000140287.11 HDC         1.014249e-06 0.004071602
## ENSG00000179348.12 GATA2       1.586878e-06 0.004627772
## ENSG00000158715.6 SLC45A3      1.635586e-06 0.004627772
## ENSG00000155659.15 VSIG4       1.844469e-06 0.004627772
## ENSG00000246363.3 LINC02458    3.571771e-06 0.007965842
## ENSG00000259162.1 RP11-203M5.6 3.985011e-06 0.007998713

mean(abs(dge$stat))

## [1] 0.7008559

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
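
The note above is harmless, but it can be avoided by cleaning the factor levels before the dataset is built. Here is a sketch using base R make.names(). The variable name comes from the design formula, but the level strings are invented placeholders, because the real levels are not shown in this report.

```r
# Placeholder levels; the real values of wound_typeOP are not shown here
wound_typeOP <- factor(c("level-two", "level one", "level-two"),
  levels = c("level one", "level-two"))

# Replace characters other than letters, numbers, '_' and '.' with '.'
levels(wound_typeOP) <- make.names(levels(wound_typeOP), unique = TRUE)

levels(wound_typeOP)
```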

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 12 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange     lfcSE      stat
## ENSG00000274611.4 TBC1D3     56.78775     24.6259129 3.4483190  7.141425
## ENSG00000155659.15 VSIG4   1247.57497     -2.6203399 0.4278950 -6.123792
## ENSG00000225630.1 MTND2P28  106.98061     -2.1843250 0.4275721 -5.108671
## ENSG00000078053.17 AMPH     167.98498     -1.4433937 0.2894022 -4.987501
## ENSG00000186081.12 KRT5      14.48988      2.0084378 0.4078153  4.924871
## ENSG00000198794.12 SCAMP5    65.44507      0.9688744 0.1982756  4.886503
## ENSG00000131016.17 AKAP12   118.08416      1.8265393 0.3775147  4.838327
## ENSG00000211935.3 IGHV1-3    64.54582      1.5403262 0.3349438  4.598760
## ENSG00000179348.12 GATA2    810.50517      1.6245124 0.3536185  4.593969
## ENSG00000158715.6 SLC45A3   258.52371      1.3109805 0.2872035  4.564640
##                                  pvalue         padj
## ENSG00000274611.4 TBC1D3   9.236800e-13 1.968639e-08
## ENSG00000155659.15 VSIG4   9.137432e-10 9.737304e-06
## ENSG00000225630.1 MTND2P28 3.244325e-07 2.304877e-03
## ENSG00000078053.17 AMPH    6.116527e-07 3.259039e-03
## ENSG00000186081.12 KRT5    8.441599e-07 3.598316e-03
## ENSG00000198794.12 SCAMP5  1.026427e-06 3.646041e-03
## ENSG00000131016.17 AKAP12  1.309367e-06 3.986649e-03
## ENSG00000211935.3 IGHV1-3  4.250123e-06 1.000198e-02
## ENSG00000179348.12 GATA2   4.348936e-06 1.000198e-02
## ENSG00000158715.6 SLC45A3  5.003516e-06 1.000198e-02

mean(abs(dge$stat))

## [1] 0.6749622

avb_crphi_pod1 <- dge
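
The log2 fold change of about 24.6 for TBC1D3 is far outside the range of the other top genes, and this fit also reported rows that did not converge, so the estimate is worth checking before it is interpreted. Below is a small sketch for flagging such rows. The data frame top_demo copies three rows from the output above, and the cutoff of 10 is an arbitrary assumption.

```r
# Three rows copied from the top table printed above
top_demo <- data.frame(
  log2FoldChange = c(24.6259129, -2.6203399, -2.1843250),
  lfcSE = c(3.4483190, 0.4278950, 0.4275721),
  row.names = c("ENSG00000274611.4 TBC1D3", "ENSG00000155659.15 VSIG4",
    "ENSG00000225630.1 MTND2P28"))

lfc_cutoff <- 10  # arbitrary threshold, an assumption for illustration

# Rows whose absolute log2 fold change exceeds the cutoff
extreme <- top_demo[abs(top_demo$log2FoldChange) > lfc_cutoff, , drop = FALSE]
rownames(extreme)
```

The same filter could be applied to dge in the real analysis, followed by a look at the raw counts of any flagged gene.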

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + treatment_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
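
The message about numeric variables with a mean or standard deviation larger than 5 recommends centring and scaling them. This sketch uses base R scale() with the cell-type column names from the design formula. The values are random and made up (the range 0 to 60 is an assumption, not the real data).

```r
cell_cols <- c("Monocytes.C", "NK", "T.CD8.Memory", "T.CD4.Naive", "Neutrophils.LD")

# Hypothetical covariate table: 6 samples, random values for illustration only
set.seed(1)
ss_demo <- as.data.frame(matrix(runif(6 * 5, 0, 60), nrow = 6,
  dimnames = list(paste0("s", 1:6), cell_cols)))

# Centre each column to mean 0 and scale to standard deviation 1
ss_scaled <- ss_demo
ss_scaled[cell_cols] <- lapply(ss_demo[cell_cols],
  function(x) as.numeric(scale(x)))

round(colMeans(ss_scaled[cell_cols]), 10)
sapply(ss_scaled[cell_cols], sd)
```

Centring and scaling changes the units of the covariate coefficients and the intercept. In principle it should leave the coefficient of the group variable unchanged while helping the fit converge.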

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 27 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange     lfcSE      stat
## ENSG00000155659.15 VSIG4   1247.57497     -2.2950320 0.4103782 -5.592480
## ENSG00000186081.12 KRT5      14.48988      2.2017581 0.4208578  5.231596
## ENSG00000244116.3 IGKV2-28   94.97033      1.5259902 0.2978606  5.123169
## ENSG00000078053.17 AMPH     167.98498     -1.3607277 0.2682920 -5.071816
## ENSG00000198794.12 SCAMP5    65.44507      1.0198080 0.2040033  4.998977
## ENSG00000100453.13 GZMB    1428.85140      0.7408132 0.1523886  4.861341
## ENSG00000111249.14 CUX2      26.61410      1.3377186 0.2819178  4.745066
## ENSG00000132465.12 JCHAIN  1006.89958      1.1477793 0.2442314  4.699557
## ENSG00000211644.3 IGLV1-51  142.58626      1.2413126 0.2645593  4.692001
## ENSG00000211648.2 IGLV1-47  132.22632      1.2555063 0.2699166  4.651460
##                                  pvalue         padj
## ENSG00000155659.15 VSIG4   2.238485e-08 0.0004770883
## ENSG00000186081.12 KRT5    1.680525e-07 0.0017908519
## ENSG00000244116.3 IGKV2-28 3.004422e-07 0.0020995293
## ENSG00000078053.17 AMPH    3.940373e-07 0.0020995293
## ENSG00000198794.12 SCAMP5  5.763528e-07 0.0024567613
## ENSG00000100453.13 GZMB    1.165930e-06 0.0041415789
## ENSG00000111249.14 CUX2    2.084384e-06 0.0063463532
## ENSG00000132465.12 JCHAIN  2.607268e-06 0.0064068377
## ENSG00000211644.3 IGLV1-51 2.705463e-06 0.0064068377
## ENSG00000211648.2 IGLV1-47 3.295937e-06 0.0070246314

mean(abs(dge$stat))

## [1] 0.7716238

avb_crphi_pod1_adj <- dge

CRP Group Comparisons stratified

CRP low vs high at t=0 treatment group A

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2 <- subset(ss2,treatment_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 116 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange      lfcSE      stat
## ENSG00000211640.4 IGLV6-57      73.94116     -0.5012755 0.09503047 -5.274892
## ENSG00000211644.3 IGLV1-51     249.02248     -0.6160909 0.11760002 -5.238867
## ENSG00000211652.2 IGLV7-43      52.51336     -0.6502306 0.12553982 -5.179477
## ENSG00000211655.3 IGLV1-36      15.50197     -0.6064267 0.12419400 -4.882899
## ENSG00000279359.1 RP11-36D19.9  43.68155     -1.1247960 0.24645555 -4.563890
## ENSG00000263711.6 LINC02864     16.11890      0.3756337 0.08415429  4.463631
## ENSG00000211673.2 IGLV3-1      180.11226     -0.4099034 0.09521025 -4.305244
## ENSG00000113790.11 EHHADH       70.48409      0.1809165 0.04324047  4.183961
## ENSG00000087116.16 ADAMTS2     144.89306     -0.8669518 0.21120081 -4.104869
## ENSG00000211649.3 IGLV7-46      63.48682     -0.5888044 0.14456169 -4.073032
##                                      pvalue        padj
## ENSG00000211640.4 IGLV6-57     1.328340e-07 0.001595313
## ENSG00000211644.3 IGLV1-51     1.615655e-07 0.001595313
## ENSG00000211652.2 IGLV7-43     2.225087e-07 0.001595313
## ENSG00000211655.3 IGLV1-36     1.045375e-06 0.005621245
## ENSG00000279359.1 RP11-36D19.9 5.021440e-06 0.021601231
## ENSG00000263711.6 LINC02864    8.058233e-06 0.028887422
## ENSG00000211673.2 IGLV3-1      1.668016e-05 0.051253362
## ENSG00000113790.11 EHHADH      2.864731e-05 0.077021866
## ENSG00000087116.16 ADAMTS2     4.045433e-05 0.096681348
## ENSG00000211649.3 IGLV7-46     4.640502e-05 0.099812567

mean(abs(dge$stat))

## [1] 0.8542119

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 32 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000274611.4 TBC1D3         61.02093     -9.1421319 1.32357018 -6.907176
## ENSG00000278599.6 TBC1D3E        18.70446     -7.7732985 1.32279633 -5.876414
## ENSG00000280035.1 RP11-10J21.2   19.49861      0.4739069 0.10831763  4.375159
## ENSG00000203999.9 LINC01270     118.07792      0.3907644 0.09069852  4.308388
## ENSG00000203814.6 H2BC18         64.74206      0.4615320 0.10714744  4.307448
## ENSG00000211652.2 IGLV7-43       52.51336     -0.6209722 0.14601307 -4.252854
## ENSG00000211655.3 IGLV1-36       15.50197     -0.5985422 0.14468746 -4.136794
## ENSG00000225764.2 P3H2-AS1       10.20911      0.3674466 0.08926867  4.116188
## ENSG00000211679.2 IGLC3         802.12582     -0.6140036 0.14922781 -4.114539
## ENSG00000267303.1 CTD-2369P2.12  14.16784      1.8578040 0.45893943  4.048037
##                                       pvalue         padj
## ENSG00000274611.4 TBC1D3        4.943970e-12 1.084460e-07
## ENSG00000278599.6 TBC1D3E       4.192497e-09 4.598122e-05
## ENSG00000280035.1 RP11-10J21.2  1.213438e-05 7.245086e-02
## ENSG00000203999.9 LINC01270     1.644490e-05 7.245086e-02
## ENSG00000203814.6 H2BC18        1.651490e-05 7.245086e-02
## ENSG00000211652.2 IGLV7-43      2.110636e-05 7.716132e-02
## ENSG00000211655.3 IGLV1-36      3.521917e-05 9.455306e-02
## ENSG00000225764.2 P3H2-AS1      3.851906e-05 9.455306e-02
## ENSG00000211679.2 IGLC3         3.879542e-05 9.455306e-02
## ENSG00000267303.1 CTD-2369P2.12 5.164888e-05 1.132918e-01

mean(abs(dge$stat))

## [1] 0.8946569

crp_t0_a <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 64 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000159189.12 C1QC           45.19243    -0.61040271 0.11320809 -5.391865
## ENSG00000087116.16 ADAMTS2       144.89306    -0.85240097 0.18580458 -4.587621
## ENSG00000211679.2 IGLC3          802.12582    -0.62143112 0.13983515 -4.444027
## ENSG00000173372.17 C1QA          192.01764    -0.39137607 0.09031117 -4.333640
## ENSG00000203999.9 LINC01270      118.07792     0.22311851 0.05197879  4.292491
## ENSG00000173369.17 C1QB           92.28254    -0.43614730 0.10549032 -4.134477
## ENSG00000213614.11 HEXA         3639.45630    -0.06053357 0.01485216 -4.075741
## ENSG00000255446.1 CTD-2531D15.4   17.55993     0.76019150 0.18656840  4.074600
## ENSG00000225764.2 P3H2-AS1        10.20911     0.38295801 0.09524856  4.020618
## ENSG00000211652.2 IGLV7-43        52.51336    -0.56403860 0.14280690 -3.949659
##                                       pvalue        padj
## ENSG00000159189.12 C1QC         6.973031e-08 0.001529534
## ENSG00000087116.16 ADAMTS2      4.483259e-06 0.049170146
## ENSG00000211679.2 IGLC3         8.829069e-06 0.064555213
## ENSG00000173372.17 C1QA         1.466640e-05 0.077509273
## ENSG00000203999.9 LINC01270     1.766794e-05 0.077509273
## ENSG00000173369.17 C1QB         3.557638e-05 0.126382864
## ENSG00000213614.11 HEXA         4.586806e-05 0.126382864
## ENSG00000255446.1 CTD-2531D15.4 4.609359e-05 0.126382864
## ENSG00000225764.2 P3H2-AS1      5.804578e-05 0.141470463
## ENSG00000211652.2 IGLV7-43      7.826251e-05 0.164623107

mean(abs(dge$stat))

## [1] 0.9127956

crp_t0_a_adj <- dge
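
The three designs used for each comparison (base, clinical covariates, clinical plus cell covariates) differ only in their covariate set. As a sketch, they could be generated from character vectors with base R reformulate() rather than typed out each time. The helper make_design is hypothetical and not part of this analysis; the variable names are those used in the formulas above.

```r
clinical <- c("sexD", "wound_typeOP", "duration_sx", "ethnicityCAT", "ageCS")
cells <- c("Monocytes.C", "NK", "T.CD8.Memory", "T.CD4.Naive", "Neutrophils.LD")

# Hypothetical helper: build a one-sided formula with the variable of
# interest placed last, as in the design formulas above
make_design <- function(covariates, contrast_var) {
  reformulate(c(covariates, contrast_var))
}

designs <- list(
  base = make_design(character(0), "crp_group"),
  clinical = make_design(clinical, "crp_group"),
  clinical_cells = make_design(c(clinical, cells), "crp_group"))

designs$clinical
```

Each element of designs is an ordinary formula object, so it could be passed to the design argument in place of a hand-written formula.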

CRP low vs high at t=0 treatment group B

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2 <- subset(ss2,treatment_group==2)
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 698 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange      lfcSE      stat
## ENSG00000274012.1 RN7SL2   1005.235348      0.3958682 0.08669141  4.566406
## ENSG00000276168.1 RN7SL1    552.306545      0.3503249 0.07971834  4.394533
## ENSG00000165029.17 ABCA1    676.864267     -0.2828435 0.07210273 -3.922786
## ENSG00000050767.18 COL23A1   32.027614      0.3304131 0.08555804  3.861860
## ENSG00000134321.13 RSAD2    562.578486     -0.3324054 0.08678497 -3.830218
## ENSG00000183117.19 CSMD1     45.694830     -0.4349344 0.11463824 -3.793973
## ENSG00000160179.19 ABCG1    431.970974     -0.1967463 0.05391060 -3.649492
## ENSG00000170153.11 RNF150     8.709392     -0.6451407 0.18034165 -3.577325
## ENSG00000196565.15 HBG2     259.037625      0.6586983 0.18849443  3.494524
## ENSG00000049247.14 UTS2      39.902866      0.3607631 0.10531292  3.425630
##                                  pvalue      padj
## ENSG00000274012.1 RN7SL2   4.961569e-06 0.1088320
## ENSG00000276168.1 RN7SL1   1.110112e-05 0.1217515
## ENSG00000165029.17 ABCA1   8.753099e-05 0.5419992
## ENSG00000050767.18 COL23A1 1.125272e-04 0.5419992
## ENSG00000134321.13 RSAD2   1.280297e-04 0.5419992
## ENSG00000183117.19 CSMD1   1.482560e-04 0.5419992
## ENSG00000160179.19 ABCG1   2.627594e-04 0.8233755
## ENSG00000170153.11 RNF150  3.471282e-04 0.9483616
## ENSG00000196565.15 HBG2    4.749080e-04 0.9483616
## ENSG00000049247.14 UTS2    6.133744e-04 0.9483616

mean(abs(dge$stat))

## [1] 0.8139089
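
The smallest adjusted p-value in this base model is about 0.11, so no gene passes a 0.05 threshold. Below is a sketch of how the number of significant genes could be tallied. The data frame top_demo holds the first five rows of the output above, and the 0.05 cutoff is a conventional assumption.

```r
# First five rows of the top table printed above (gene symbols as row names)
top_demo <- data.frame(
  pvalue = c(4.961569e-06, 1.110112e-05, 8.753099e-05, 1.125272e-04, 1.280297e-04),
  padj = c(0.1088320, 0.1217515, 0.5419992, 0.5419992, 0.5419992),
  row.names = c("RN7SL2", "RN7SL1", "ABCA1", "COL23A1", "RSAD2"))

# na.rm = TRUE because padj can be NA for genes removed by independent filtering
n_sig <- sum(top_demo$padj < 0.05, na.rm = TRUE)
n_sig
```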

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 10 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000261026.1 CTD-3247F14.2   17.85183     -1.5571374 0.32799079 -4.747503
## ENSG00000078114.19 NEBL           35.44421     -0.8971179 0.19693027 -4.555510
## ENSG00000181126.13 HLA-V         336.08002     -0.6596009 0.14626672 -4.509576
## ENSG00000243224.1 RP5-1157M23.2   44.63609      0.2250026 0.05160220  4.360330
## ENSG00000139287.13 TPH2           42.04591     -0.1966640 0.04960475 -3.964621
## ENSG00000123838.11 C4BPA          52.54006     -1.0745917 0.27120418 -3.962298
## ENSG00000152767.17 FARP1         160.37684     -0.1615427 0.04078672 -3.960669
## ENSG00000254681.6 PKD1P5        2181.95360      0.3720053 0.09477315  3.925218
## ENSG00000234200.2 U82671.8         7.77978     -3.3541119 0.85766640 -3.910742
## ENSG00000119922.11 IFIT2        1651.64063     -0.5395922 0.13801521 -3.909657
##                                       pvalue       padj
## ENSG00000261026.1 CTD-3247F14.2 2.059429e-06 0.04517357
## ENSG00000078114.19 NEBL         5.225851e-06 0.04749455
## ENSG00000181126.13 HLA-V        6.495722e-06 0.04749455
## ENSG00000243224.1 RP5-1157M23.2 1.298667e-05 0.07121566
## ENSG00000139287.13 TPH2         7.351253e-05 0.20273898
## ENSG00000123838.11 C4BPA        7.423193e-05 0.20273898
## ENSG00000152767.17 FARP1        7.474008e-05 0.20273898
## ENSG00000254681.6 PKD1P5        8.665112e-05 0.20273898
## ENSG00000234200.2 U82671.8      9.201305e-05 0.20273898
## ENSG00000119922.11 IFIT2        9.242716e-05 0.20273898

mean(abs(dge$stat))

## [1] 1.048837

crp_t0_b <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
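The "mean or standard deviation larger than 5" message only appears once the cell-type covariates are added to the design, which suggests that one or more of those columns is on a large scale. Below is a minimal sketch of centring and scaling such columns before building the design. The toy values and the choice of columns are assumptions for illustration only, not taken from the real sample sheet.

```r
# Toy cell-type estimates; the values are invented for illustration.
ss_toy <- data.frame(
  Monocytes.C = c(12.1, 8.4, 15.3, 10.2),
  Neutrophils.LD = c(55.0, 62.5, 48.1, 70.3),
  row.names = paste0("s", 1:4)
)

cell_cols <- c("Monocytes.C", "Neutrophils.LD")

# scale() centres each column to mean 0 and rescales it to SD 1;
# as.numeric() drops the matrix attributes that scale() returns.
ss_toy[cell_cols] <- lapply(ss_toy[cell_cols], function(x) as.numeric(scale(x)))

round(colMeans(ss_toy[cell_cols]), 10)
sapply(ss_toy[cell_cols], sd)
```

Rescaling a nuisance covariate changes the units of its own coefficient but should, in principle, leave the crp_group coefficient unchanged apart from any improvement in convergence.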

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 37 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
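The message above suggests a larger maxit for the rows that did not converge. A sketch of how that could look is shown here on a simulated dataset from makeExampleDESeqDataSet(), not on the PADDI counts. The value maxit = 500 and the object names dds_toy, conv and res_conv are assumptions chosen for illustration.

```r
suppressPackageStartupMessages(library("DESeq2"))

set.seed(1)
# Simulated counts: 200 genes, 8 samples, design ~ condition.
dds_toy <- makeExampleDESeqDataSet(n = 200, m = 8)

# Run the DESeq() steps separately so maxit can be passed to the Wald test.
dds_toy <- estimateSizeFactors(dds_toy)
dds_toy <- estimateDispersions(dds_toy)
dds_toy <- nbinomWaldTest(dds_toy, maxit = 500)

# betaConv flags rows whose coefficients converged (NA for all-zero rows).
conv <- mcols(dds_toy)$betaConv
table(conv, useNA = "ifany")

# Optionally keep only the converged rows in the result table.
res_conv <- results(dds_toy)[which(conv), ]
```

Whether to raise maxit or simply drop the non-converged rows is a judgment call; both options are shown so that either can be tried.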

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000074803.20 SLC12A1        35.08453      0.9232535 0.20493435  4.505118
## ENSG00000181126.13 HLA-V         336.08002     -0.6874493 0.15532026 -4.426012
## ENSG00000260447.1 RP11-304L19.2   11.22085      0.5616031 0.13834177  4.059534
## ENSG00000243224.1 RP5-1157M23.2   44.63609      0.1916944 0.04730875  4.051987
## ENSG00000102854.16 MSLN           53.97058     -0.8976728 0.22200779 -4.043429
## ENSG00000274012.1 RN7SL2        1005.23535      0.4319977 0.11522675  3.749110
## ENSG00000154620.6 TMSB4Y          57.87988     -0.3468061 0.09388817 -3.693821
## ENSG00000249021.1 CTC-505O3.3     12.00956     -0.2429091 0.06703903 -3.623398
## ENSG00000275329.1 RP11-83N9.6      9.92069      0.3085148 0.08571373  3.599363
## ENSG00000276168.1 RN7SL1         552.30655      0.3722778 0.10422593  3.571835
##                                       pvalue      padj
## ENSG00000074803.20 SLC12A1      6.633602e-06 0.1052785
## ENSG00000181126.13 HLA-V        9.599131e-06 0.1052785
## ENSG00000260447.1 RP11-304L19.2 4.917078e-05 0.2310855
## ENSG00000243224.1 RP5-1157M23.2 5.078452e-05 0.2310855
## ENSG00000102854.16 MSLN         5.267506e-05 0.2310855
## ENSG00000274012.1 RN7SL2        1.774633e-04 0.6487762
## ENSG00000154620.6 TMSB4Y        2.209090e-04 0.6895471
## ENSG00000249021.1 CTC-505O3.3   2.907582e-04 0.6895471
## ENSG00000275329.1 RP11-83N9.6   3.189981e-04 0.6895471
## ENSG00000276168.1 RN7SL1        3.544892e-04 0.6895471

mean(abs(dge$stat))

## [1] 0.9050121

crp_t0_b_adj <- dge

CRP low vs high at EOS treatment group A

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2 <- subset(ss2,treatment_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]
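The %in% filter above removes count columns that have no row in the sample sheet, but on its own it does not remove sample-sheet rows that lack counts and it does not reorder anything. DESeq2 expects the columns of the count matrix and the rows of colData to be in the same order, so an explicit alignment step can be a useful safeguard. Here is a minimal sketch on toy objects; mx_toy, ss_toy and the sample names are made up.

```r
# Toy count matrix (genes x samples) and sample sheet; all names are made up.
mx_toy <- matrix(1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s3", "s1", "s2", "s4")))
ss_toy <- data.frame(crp_group = c(1, 2, 1), row.names = c("s1", "s2", "s5"))

# Keep only samples present in both objects, in sample-sheet order.
shared <- intersect(rownames(ss_toy), colnames(mx_toy))
ss_toy <- ss_toy[shared, , drop = FALSE]
mx_toy <- mx_toy[, shared, drop = FALSE]

# Fail early if the two objects are still out of step.
stopifnot(identical(colnames(mx_toy), rownames(ss_toy)))
```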

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
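The message above is triggered because crp_group is stored as a number with integer values. If the intention is a two-group comparison, it can be made explicit by converting the column to a factor with a chosen reference level. This is a sketch only: the 1/2 coding and the "low"/"high" labels are assumptions, as the actual coding is not shown in this section.

```r
# Toy sample sheet; the 1/2 coding of crp_group is an assumption.
ss_toy <- data.frame(crp_group = c(1, 2, 2, 1, 2), row.names = paste0("s", 1:5))

# Make the grouping categorical, with the first level as the reference.
ss_toy$crp_group <- factor(ss_toy$crp_group, levels = c(1, 2),
  labels = c("low", "high"))

levels(ss_toy$crp_group)
table(ss_toy$crp_group)
```

For a variable with exactly two values that differ by one, the numeric and factor codings describe the same contrast, so this mainly documents the intent and silences the message.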

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 147 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                              baseMean log2FoldChange      lfcSE       stat
## ENSG00000234200.2 U82671.8   23.74781     -8.4175144 0.77303756 -10.888881
## ENSG00000204936.10 CD177   2664.36071      1.3040566 0.17489685   7.456147
## ENSG00000139572.4 GPR84     194.83068      0.9793090 0.14085127   6.952788
## ENSG00000170525.21 PFKFB3  4648.61993      0.6889378 0.10435049   6.602152
## ENSG00000176597.12 B3GNT5   364.32833      0.6671879 0.10515716   6.344674
## ENSG00000132170.24 PPARG    105.03591      0.8466718 0.13451424   6.294291
## ENSG00000079385.23 CEACAM1 1083.04204      0.9180621 0.14625935   6.276946
## ENSG00000187775.17 DNAH17   603.73830      0.4485171 0.07177030   6.249342
## ENSG00000136634.7 IL10       75.57604      0.7835233 0.12565466   6.235529
## ENSG00000135916.16 ITM2C    660.76621     -0.3402779 0.05494465  -6.193103
##                                  pvalue         padj
## ENSG00000234200.2 U82671.8 1.302299e-27 2.873783e-23
## ENSG00000204936.10 CD177   8.908934e-14 9.829672e-10
## ENSG00000139572.4 GPR84    3.581374e-12 2.634339e-08
## ENSG00000170525.21 PFKFB3  4.052308e-11 2.235557e-07
## ENSG00000176597.12 B3GNT5  2.228974e-10 9.837353e-07
## ENSG00000132170.24 PPARG   3.088083e-10 1.088493e-06
## ENSG00000079385.23 CEACAM1 3.452870e-10 1.088493e-06
## ENSG00000187775.17 DNAH17  4.121860e-10 1.103974e-06
## ENSG00000136634.7 IL10     4.502544e-10 1.103974e-06
## ENSG00000135916.16 ITM2C   5.899107e-10 1.301756e-06

mean(abs(dge$stat))

## [1] 1.38808

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 15 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                    baseMean log2FoldChange      lfcSE      stat
## ENSG00000278599.6 TBC1D3E          21.72131     -9.5091840 1.36712905 -6.955586
## ENSG00000258035.2 RP11-74K11.2     20.51590      0.7829509 0.14913630  5.249902
## ENSG00000076356.7 PLXNA2          257.73317      0.4637443 0.09006613  5.148932
## ENSG00000204936.10 CD177         2664.36071      1.1813741 0.23038336  5.127862
## ENSG00000159339.13 PADI4        13293.00692      0.8773302 0.17145591  5.116943
## ENSG00000283345.1 CTD-3092A11.3    36.24736     -0.8647693 0.17126294 -5.049366
## ENSG00000235750.10 KIAA0040      3047.27341      0.3626779 0.07233427  5.013916
## ENSG00000211966.2 IGHV5-51        190.52981     -0.6341490 0.12802140 -4.953461
## ENSG00000176597.12 B3GNT5         364.32833      0.6767111 0.13768368  4.914969
## ENSG00000203999.9 LINC01270       154.60133      0.5209756 0.10632606  4.899793
##                                       pvalue         padj
## ENSG00000278599.6 TBC1D3E       3.510989e-12 7.597430e-08
## ENSG00000258035.2 RP11-74K11.2  1.521803e-07 1.343898e-03
## ENSG00000076356.7 PLXNA2        2.619744e-07 1.343898e-03
## ENSG00000204936.10 CD177        2.930508e-07 1.343898e-03
## ENSG00000159339.13 PADI4        3.105268e-07 1.343898e-03
## ENSG00000283345.1 CTD-3092A11.3 4.432785e-07 1.598684e-03
## ENSG00000235750.10 KIAA0040     5.333326e-07 1.648683e-03
## ENSG00000211966.2 IGHV5-51      7.290485e-07 1.971985e-03
## ENSG00000176597.12 B3GNT5       8.879630e-07 2.075999e-03
## ENSG00000203999.9 LINC01270     9.593786e-07 2.075999e-03

mean(abs(dge$stat))

## [1] 1.342191

crp_eos_a <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 40 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                    baseMean log2FoldChange      lfcSE      stat
## ENSG00000278599.6 TBC1D3E         21.721315    -10.2553792 1.57048944 -6.530053
## ENSG00000283345.1 CTD-3092A11.3   36.247361     -1.0143138 0.19599438 -5.175219
## ENSG00000282339.1 LLNLF-176F2.1    8.210905     -7.0092066 1.56534067 -4.477752
## ENSG00000228668.1 TRGV5P          68.256402     -0.6121273 0.14417889 -4.245609
## ENSG00000102524.12 TNFSF13B     1626.167180      0.2026085 0.04878493  4.153095
## ENSG00000233937.7 CTC-338M12.4   139.717383     -0.1962152 0.04828903 -4.063350
## ENSG00000087116.16 ADAMTS2       779.297104     -0.8224640 0.21388124 -3.845424
## ENSG00000204001.10 LCN8           74.503582      1.3227176 0.35412841  3.735135
## ENSG00000260271.3 RP1-45N11.1     58.349444      0.2237876 0.05997524  3.731333
## ENSG00000258035.2 RP11-74K11.2    20.515904      0.4374891 0.12062897  3.626734
##                                       pvalue         padj
## ENSG00000278599.6 TBC1D3E       6.574645e-11 1.450827e-06
## ENSG00000283345.1 CTD-3092A11.3 2.276443e-07 2.511713e-03
## ENSG00000282339.1 LLNLF-176F2.1 7.543333e-06 5.548624e-02
## ENSG00000228668.1 TRGV5P        2.180001e-05 1.202652e-01
## ENSG00000102524.12 TNFSF13B     3.280081e-05 1.447631e-01
## ENSG00000233937.7 CTC-338M12.4  4.837345e-05 1.779095e-01
## ENSG00000087116.16 ADAMTS2      1.203443e-04 3.793770e-01
## ENSG00000204001.10 LCN8         1.876142e-04 4.670096e-01
## ENSG00000260271.3 RP1-45N11.1   1.904693e-04 4.670096e-01
## ENSG00000258035.2 RP11-74K11.2  2.870291e-04 5.974092e-01

mean(abs(dge$stat))

## [1] 0.8431272

crp_eos_a_adj <- dge
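With both the clinical and the clinical-plus-cell tables stored, the two models can be compared side by side using the mean absolute Wald statistic reported above together with the number of genes passing an FDR threshold. The helper summarise_dge() is hypothetical, and the toy tables stand in for crp_eos_a and crp_eos_a_adj with made-up values.

```r
# Hypothetical helper: one-line summary of a ranked DESeq2 result table.
summarise_dge <- function(dge, alpha = 0.05) {
  c(n_tested = sum(!is.na(dge$padj)),
    n_sig = sum(dge$padj < alpha, na.rm = TRUE),
    mean_abs_stat = mean(abs(dge$stat), na.rm = TRUE))
}

# Toy tables standing in for crp_eos_a and crp_eos_a_adj (values are made up).
toy_a <- data.frame(stat = c(-6.9, 5.2, 0.4, NA), padj = c(1e-8, 1e-3, 0.8, NA))
toy_a_adj <- data.frame(stat = c(-6.5, 3.6, 0.2, NA), padj = c(1e-6, 0.6, 0.9, NA))

# One row per model, one column per summary value.
summary_tab <- t(sapply(list(clinical = toy_a, clinical_cell = toy_a_adj),
  summarise_dge))
print(summary_tab)
```

On the real objects the call would be along the lines of summarise_dge(crp_eos_a), with alpha set to whichever FDR threshold the analysis uses.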

CRP low vs high at EOS treatment group B

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2 <- subset(ss2,treatment_group==2)
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 128 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000254873.1 RP11-770J1.5    47.54511     -0.5668914 0.10956959 -5.173802
## ENSG00000224370.1 RP11-814E24.3   49.76184      0.4472603 0.09003302  4.967737
## ENSG00000241860.7 RP11-34P13.13 4903.99846      0.3733341 0.07548746  4.945644
## ENSG00000236911.6 RP11-78B10.2    74.74105      0.5093406 0.10317766  4.936539
## ENSG00000238035.8 AC138035.2    1313.25226      0.3740127 0.07824059  4.780290
## ENSG00000280279.1 LINC02887      485.42034      0.3717109 0.07800330  4.765323
## ENSG00000101187.16 SLCO4A1        97.59953      0.5650296 0.11859698  4.764283
## ENSG00000260528.5 FAM157C       3056.40251      0.3665312 0.07752802  4.727726
## ENSG00000230724.9 LINC01001     3269.93917      0.3263093 0.06964405  4.685387
## ENSG00000264769.1 RP11-498C9.12   41.75148      0.2941786 0.06310258  4.661910
##                                       pvalue        padj
## ENSG00000254873.1 RP11-770J1.5  2.293775e-07 0.004386980
## ENSG00000224370.1 RP11-814E24.3 6.773878e-07 0.004386980
## ENSG00000241860.7 RP11-34P13.13 7.589257e-07 0.004386980
## ENSG00000236911.6 RP11-78B10.2  7.952109e-07 0.004386980
## ENSG00000238035.8 AC138035.2    1.750423e-06 0.005974689
## ENSG00000280279.1 LINC02887     1.885513e-06 0.005974689
## ENSG00000101187.16 SLCO4A1      1.895265e-06 0.005974689
## ENSG00000260528.5 FAM157C       2.270486e-06 0.006262853
## ENSG00000230724.9 LINC01001     2.794320e-06 0.006349407
## ENSG00000264769.1 RP11-498C9.12 3.132880e-06 0.006349407
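Several genes in the table above share the same padj even though their p-values differ. This is expected from the Benjamini-Hochberg adjustment, which takes a running minimum over the ranked p-values. The sketch below reproduces the pattern with p.adjust() for the first eight rows. The number of tests (22067) is not stated in the report; it is an assumption back-calculated from the printed values.

```r
# First eight p-values from the table above.
p <- c(2.293775e-07, 6.773878e-07, 7.589257e-07, 7.952109e-07,
       1.750423e-06, 1.885513e-06, 1.895265e-06, 2.270486e-06)

# Assumption: 22067 tests, inferred from the printed padj values.
n_tests <- 22067

# BH: padj_i is the minimum of p_j * n / j over all ranks j >= i.
padj <- p.adjust(p, method = "BH", n = n_tests)
signif(padj, 7)
```

Only the first eight rows are used because the adjusted value of a gene can also be lowered by genes ranked below it, which are not all visible in a top-10 table.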

mean(abs(dge$stat))

## [1] 1.041369

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 6 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000181126.13 HLA-V        396.17882     -0.8065180 0.16968812 -4.752943
## ENSG00000254873.1 RP11-770J1.5   47.54511     -0.5826915 0.13036812 -4.469586
## ENSG00000175874.10 CREG2         15.85760      0.3267149 0.07358462  4.439989
## ENSG00000074803.20 SLC12A1       36.66028      0.8797267 0.20623352  4.265682
## ENSG00000147852.17 VLDLR         60.44142      0.2384517 0.05590948  4.264959
## ENSG00000136274.9 NACAD          13.87223     -0.6254904 0.14828300 -4.218221
## ENSG00000158321.18 AUTS2       1107.33233      0.2789351 0.07162732  3.894256
## ENSG00000241484.10 ARHGAP8       48.02920      0.3161707 0.08197094  3.857107
## ENSG00000179841.8 AKAP5         164.98261      0.2475874 0.06420904  3.855958
## ENSG00000043514.17 TRIT1        438.15726     -0.1127517 0.02930561 -3.847443
##                                      pvalue       padj
## ENSG00000181126.13 HLA-V       2.004768e-06 0.04423922
## ENSG00000254873.1 RP11-770J1.5 7.837117e-06 0.06617416
## ENSG00000175874.10 CREG2       8.996352e-06 0.06617416
## ENSG00000074803.20 SLC12A1     1.992921e-05 0.08824094
## ENSG00000147852.17 VLDLR       1.999387e-05 0.08824094
## ENSG00000136274.9 NACAD        2.462376e-05 0.09056209
## ENSG00000158321.18 AUTS2       9.850051e-05 0.26338539
## ENSG00000241484.10 ARHGAP8     1.147368e-04 0.26338539
## ENSG00000179841.8 AKAP5        1.152772e-04 0.26338539
## ENSG00000043514.17 TRIT1       1.193571e-04 0.26338539

mean(abs(dge$stat))

## [1] 0.8858679

crp_eos_b <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 43 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000074803.20 SLC12A1       36.66028      0.9974386 0.22461306  4.440697
## ENSG00000099139.14 PCSK5       1284.53014      0.3871194 0.09428587  4.105805
## ENSG00000175874.10 CREG2         15.85760      0.2917051 0.07128112  4.092320
## ENSG00000112799.9 LY86          950.19080     -0.1665188 0.04275946 -3.894314
## ENSG00000181126.13 HLA-V        396.17882     -0.7233685 0.18577054 -3.893882
## ENSG00000167680.17 SEMA6B       577.15312      0.6250133 0.16781047  3.724519
## ENSG00000225813.1 AC009299.4     13.97230      0.6532641 0.17553863  3.721484
## ENSG00000264522.6 OTUD7B        213.51620      0.1084653 0.02937036  3.693019
## ENSG00000254873.1 RP11-770J1.5   47.54511     -0.4819888 0.13089433 -3.682274
## ENSG00000188599.17 NPIPP1       219.50741     -0.1879393 0.05163696 -3.639628
##                                      pvalue      padj
## ENSG00000074803.20 SLC12A1     8.966780e-06 0.1978699
## ENSG00000099139.14 PCSK5       4.029098e-05 0.3141451
## ENSG00000175874.10 CREG2       4.270791e-05 0.3141451
## ENSG00000112799.9 LY86         9.847704e-05 0.4353942
## ENSG00000181126.13 HLA-V       9.865279e-05 0.4353942
## ENSG00000167680.17 SEMA6B      1.956880e-04 0.5667857
## ENSG00000225813.1 AC009299.4   1.980558e-04 0.5667857
## ENSG00000264522.6 OTUD7B       2.216079e-04 0.5667857
## ENSG00000254873.1 RP11-770J1.5 2.311629e-04 0.5667857
## ENSG00000188599.17 NPIPP1      2.730319e-04 0.6024996

mean(abs(dge$stat))

## [1] 0.812993

crp_eos_b_adj <- dge

CRP low vs high at POD1 treatment group A

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2 <- subset(ss2,treatment_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 189 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                              baseMean log2FoldChange      lfcSE      stat
## ENSG00000007968.7 E2F2      650.98716      0.3793948 0.06004490  6.318518
## ENSG00000137869.15 CYP19A1   60.07323      0.9765750 0.15547632  6.281181
## ENSG00000157064.11 NMNAT2    41.43058      0.4835623 0.07985053  6.055844
## ENSG00000229647.2 MYOSLID    64.61066      0.3230300 0.05367352  6.018424
## ENSG00000165092.13 ALDH1A1  556.69422     -0.4848466 0.08067903 -6.009574
## ENSG00000145287.11 PLAC8   3561.63102      0.2630043 0.04382192  6.001663
## ENSG00000138821.14 SLC39A8  521.43838      0.2641136 0.04467962  5.911276
## ENSG00000132170.24 PPARG    144.66113      0.5565381 0.09494223  5.861861
## ENSG00000198018.7 ENTPD7    368.85840      0.2161036 0.03790805  5.700732
## ENSG00000116016.14 EPAS1    116.95950      0.3548271 0.06246220  5.680670
##                                  pvalue         padj
## ENSG00000007968.7 E2F2     2.640843e-10 2.955877e-06
## ENSG00000137869.15 CYP19A1 3.360096e-10 2.955877e-06
## ENSG00000157064.11 NMNAT2  1.396832e-09 5.727059e-06
## ENSG00000229647.2 MYOSLID  1.761237e-09 5.727059e-06
## ENSG00000165092.13 ALDH1A1 1.860109e-09 5.727059e-06
## ENSG00000145287.11 PLAC8   1.953072e-09 5.727059e-06
## ENSG00000138821.14 SLC39A8 3.394684e-09 8.532295e-06
## ENSG00000132170.24 PPARG   4.577089e-09 1.006616e-05
## ENSG00000198018.7 ENTPD7   1.192940e-08 2.332066e-05
## ENSG00000116016.14 EPAS1   1.341684e-08 2.360559e-05

mean(abs(dge$stat))

## [1] 1.199973

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 24 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- zz[order(zz$pvalue),]
head(dge[,1:6],10)

##                             baseMean log2FoldChange      lfcSE       stat
## ENSG00000274611.4 TBC1D3    51.46372    -29.9971014 1.33164948 -22.526274
## ENSG00000278599.6 TBC1D3E   16.34998    -24.6438767 1.33044880 -18.522980
## ENSG00000124508.17 BTN2A2 1048.79703     -0.1524054 0.02952679  -5.161599
## ENSG00000215883.11 CYB5RL  377.01565     -0.1494372 0.02926069  -5.107098
## ENSG00000145287.11 PLAC8  3561.63102      0.2386112 0.05025490   4.748019
## ENSG00000116016.14 EPAS1   116.95950      0.3652741 0.07758263   4.708195
## ENSG00000157064.11 NMNAT2   41.43058      0.4652492 0.10224666   4.550263
## ENSG00000132170.24 PPARG   144.66113      0.5281924 0.11718850   4.507203
## ENSG00000128928.10 IVD    1354.85885     -0.1506996 0.03353677  -4.493564
## ENSG00000229647.2 MYOSLID   64.61066      0.2852099 0.06372435   4.475682
##                                  pvalue          padj
## ENSG00000274611.4 TBC1D3  2.294649e-112 3.942437e-108
## ENSG00000278599.6 TBC1D3E  1.347666e-76            NA
## ENSG00000124508.17 BTN2A2  2.448497e-07  1.873552e-03
## ENSG00000215883.11 CYB5RL  3.271437e-07  1.873552e-03
## ENSG00000145287.11 PLAC8   2.054189e-06  8.587756e-03
## ENSG00000116016.14 EPAS1   2.499202e-06  8.587756e-03
## ENSG00000157064.11 NMNAT2  5.357881e-06  1.454044e-02
## ENSG00000132170.24 PPARG   6.568765e-06  1.454044e-02
## ENSG00000128928.10 IVD     7.004101e-06  1.454044e-02
## ENSG00000229647.2 MYOSLID  7.616782e-06  1.454044e-02

mean(abs(dge$stat))

## [1] 0.9984722

crp_pod1_a <- dge
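
The fit above reported 24 rows that did not converge in beta, and the message suggests a larger maxit for nbinomWaldTest. A minimal sketch of how that could be done follows, assuming the single DESeq() call is split into its three steps. It runs on simulated counts from makeExampleDESeqDataSet rather than the PADDI data, and maxit = 500 is an arbitrary illustrative value. Rows with extreme estimates, such as the two TBC1D3 genes above with log2 fold changes near -30, may be worth checking against the betaConv flag.

```r
suppressPackageStartupMessages(library("DESeq2"))

# Simulated counts stand in for mx/ss2 (illustration only)
set.seed(1)
dds_toy <- makeExampleDESeqDataSet(n = 500, m = 8)

# DESeq() wraps these three steps; running them separately exposes maxit
dds_toy <- estimateSizeFactors(dds_toy)
dds_toy <- estimateDispersions(dds_toy, quiet = TRUE)
dds_toy <- nbinomWaldTest(dds_toy, maxit = 500, quiet = TRUE)

# betaConv is TRUE for converged rows and FALSE otherwise;
# it is NA for rows that were not fitted (e.g. all-zero counts)
conv <- mcols(dds_toy)$betaConv
table(conv, useNA = "ifany")

# Keep only rows whose coefficients converged
z_toy  <- results(dds_toy)
z_conv <- z_toy[which(conv), ]
```

Whether to raise maxit or simply drop the non-converged rows is a judgment call; with only a few dozen affected rows out of roughly 20,000, either choice is unlikely to change the overall picture.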

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
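
The message about numeric variables with a mean or standard deviation larger than 5 first appears once the cell-type covariates enter the design. One way to address it is to centre and scale those columns before building the DESeqDataSet. The sketch below uses a hypothetical toy table (the values are invented; only the column names follow the design above) and base R scale().

```r
# Toy sample sheet; values are invented for illustration
ss_toy <- data.frame(
  Monocytes.C    = c(8.1, 12.4, 6.3, 9.9, 15.2, 7.7),
  Neutrophils.LD = c(55.0, 61.2, 48.9, 70.3, 66.1, 52.4),
  crp_group      = c(1, 1, 1, 2, 2, 2)
)

num_cols <- c("Monocytes.C", "Neutrophils.LD")

# Centre to mean 0 and scale to SD 1; as.numeric() drops the matrix attributes
ss_scaled <- ss_toy
ss_scaled[num_cols] <- lapply(ss_toy[num_cols], function(v) as.numeric(scale(v)))

# Check the result: means close to 0, standard deviations equal to 1
round(colMeans(ss_scaled[num_cols]), 10)
sapply(ss_scaled[num_cols], sd)
```

Rescaling a covariate should not change the crp_group estimate itself, only the numerical conditioning of the fit, which may reduce the number of non-converged rows.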

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 43 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange      lfcSE      stat
## ENSG00000157064.11 NMNAT2     41.43058      0.4307897 0.07049745  6.110713
## ENSG00000137869.15 CYP19A1    60.07323      0.7601826 0.12811907  5.933407
## ENSG00000132170.24 PPARG     144.66113      0.4814874 0.08507043  5.659869
## ENSG00000188404.10 SELL    17514.30914      0.2026588 0.03680197  5.506737
## ENSG00000116016.14 EPAS1     116.95950      0.3446736 0.06274336  5.493387
## ENSG00000145287.11 PLAC8    3561.63102      0.2334478 0.04478158  5.213032
## ENSG00000124508.17 BTN2A2   1048.79703     -0.1590625 0.03058358 -5.200911
## ENSG00000148926.10 ADM      1409.50883      0.3668759 0.07139352  5.138785
## ENSG00000121316.11 PLBD1   16469.89344      0.2213555 0.04552841  4.861920
## ENSG00000213694.6 S1PR3     1119.21786     -0.2721949 0.05603887 -4.857252
##                                  pvalue         padj
## ENSG00000157064.11 NMNAT2  9.918731e-10 1.622109e-05
## ENSG00000137869.15 CYP19A1 2.967124e-09 2.426217e-05
## ENSG00000132170.24 PPARG   1.514889e-08 8.258167e-05
## ENSG00000188404.10 SELL    3.655462e-08 1.289671e-04
## ENSG00000116016.14 EPAS1   3.942984e-08 1.289671e-04
## ENSG00000145287.11 PLAC8   1.857793e-07 4.633183e-04
## ENSG00000124508.17 BTN2A2  1.983140e-07 4.633183e-04
## ENSG00000148926.10 ADM     2.765205e-07 5.652770e-04
## ENSG00000121316.11 PLBD1   1.162527e-06 1.835260e-03
## ENSG00000213694.6 S1PR3    1.190260e-06 1.835260e-03

mean(abs(dge$stat))

## [1] 1.026972

crp_pod1_a_adj <- dge

CRP low vs high at POD1 treatment group B

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2 <- subset(ss2,treatment_group==2)
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ crp_group )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
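
DESeq2 flags crp_group as an integer-valued numeric here. With only two values, the slope per unit equals the difference between the two groups, so the estimate is the same, but an explicit factor states the intended comparison and fixes the reference level. A minimal sketch with invented data, using lm() as a stand-in for the DESeq2 GLM; the labels low and high assume that crp_group 1 is the low-CRP group, as in the crp_group==1 subsetting used for the low-CRP sex comparison later in this report.

```r
# Invented data: five samples per CRP group
set.seed(42)
ss_toy <- data.frame(
  crp_group = rep(c(1, 2), each = 5),
  expr      = rnorm(10)
)

# Explicit factor with the low group as reference level
ss_toy$crp_group_f <- factor(ss_toy$crp_group, levels = c(1, 2),
                             labels = c("low", "high"))

# Numeric coding estimates a slope; factor coding estimates high vs low
fit_num <- lm(expr ~ crp_group, data = ss_toy)
fit_fac <- lm(expr ~ crp_group_f, data = ss_toy)

slope_num <- unname(coef(fit_num)["crp_group"])
slope_fac <- unname(coef(fit_fac)["crp_group_fhigh"])
all.equal(slope_num, slope_fac)
```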

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 155 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE     stat
## ENSG00000163710.9 PCOLCE2        25.62512      1.3200438 0.14618904 9.029704
## ENSG00000108950.12 FAM20A      2102.38150      0.6575075 0.07427370 8.852494
## ENSG00000100985.7 MMP9        18630.15953      1.8475668 0.21250458 8.694245
## ENSG00000007968.7 E2F2         1048.78071      0.4485439 0.05226883 8.581478
## ENSG00000137869.15 CYP19A1       99.79869      1.1020306 0.13128996 8.393869
## ENSG00000132170.24 PPARG        188.27369      0.5948070 0.07128398 8.344190
## ENSG00000204044.6 SLC12A5-AS1   114.09553      1.7590874 0.21836871 8.055583
## ENSG00000104918.8 RETN         2356.87387      0.9009685 0.11335111 7.948475
## ENSG00000170439.7 METTL7B       242.42694      0.8427309 0.10841597 7.773125
## ENSG00000135424.18 ITGA7        522.88039      0.5671980 0.07353784 7.713009
##                                     pvalue         padj
## ENSG00000163710.9 PCOLCE2     1.721371e-19 3.668585e-15
## ENSG00000108950.12 FAM20A     8.558475e-19 9.119911e-15
## ENSG00000100985.7 MMP9        3.491438e-18 2.480318e-14
## ENSG00000007968.7 E2F2        9.366156e-18 4.990288e-14
## ENSG00000137869.15 CYP19A1    4.704022e-17 2.005042e-13
## ENSG00000132170.24 PPARG      7.170384e-17 2.546921e-13
## ENSG00000204044.6 SLC12A5-AS1 7.910065e-16 2.408276e-12
## ENSG00000104918.8 RETN        1.888211e-15 5.030195e-12
## ENSG00000170439.7 METTL7B     7.657300e-15 1.813249e-11
## ENSG00000135424.18 ITGA7      1.228856e-14 2.618938e-11

mean(abs(dge$stat))

## [1] 1.485697

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
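
The note about characters other than letters, numbers, '_' and '.' is harmless, but it can be silenced by sanitising the factor levels before the design is built. A sketch with base R make.names(); the level names are hypothetical, since the actual levels of wound_typeOP and ethnicityCAT are not shown here.

```r
# Hypothetical levels containing a slash and a space
ethnicityCAT <- factor(c("Asian/Pacific", "European", "Not stated", "European"))

# make.names() replaces characters that are invalid in R names with "."
levels(ethnicityCAT) <- make.names(levels(ethnicityCAT))
levels(ethnicityCAT)
```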

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 9 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange      lfcSE     stat
## ENSG00000163710.9 PCOLCE2    25.62512      1.2259018 0.16300795 7.520503
## ENSG00000108950.12 FAM20A  2102.38150      0.5855328 0.08329079 7.029983
## ENSG00000007968.7 E2F2     1048.78071      0.3962539 0.05925388 6.687392
## ENSG00000104918.8 RETN     2356.87387      0.6987862 0.11335834 6.164401
## ENSG00000132170.24 PPARG    188.27369      0.4927135 0.08019928 6.143615
## ENSG00000135424.18 ITGA7    522.88039      0.5073466 0.08559086 5.927579
## ENSG00000169994.19 MYO7B    752.03451      0.3289064 0.05770625 5.699667
## ENSG00000170439.7 METTL7B   242.42694      0.7107035 0.12800229 5.552272
## ENSG00000050767.18 COL23A1   59.82685      0.4828160 0.08728568 5.531446
## ENSG00000137869.15 CYP19A1   99.79869      0.8331307 0.15065203 5.530166
##                                  pvalue         padj
## ENSG00000163710.9 PCOLCE2  5.456575e-14 1.095298e-09
## ENSG00000108950.12 FAM20A  2.065586e-12 2.073126e-08
## ENSG00000007968.7 E2F2     2.271825e-11 1.520078e-07
## ENSG00000104918.8 RETN     7.075058e-10 3.238341e-06
## ENSG00000132170.24 PPARG   8.066410e-10 3.238341e-06
## ENSG00000135424.18 ITGA7   3.074338e-09 1.028520e-05
## ENSG00000169994.19 MYO7B   1.200415e-08 3.442275e-05
## ENSG00000170439.7 METTL7B  2.819810e-08 6.421919e-05
## ENSG00000050767.18 COL23A1 3.176018e-08 6.421919e-05
## ENSG00000137869.15 CYP19A1 3.199282e-08 6.421919e-05

mean(abs(dge$stat))

## [1] 1.126178

crp_pod1_b <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 11 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange      lfcSE      stat
## ENSG00000108950.12 FAM20A  2102.38150      0.5299315 0.07604550  6.968611
## ENSG00000163710.9 PCOLCE2    25.62512      0.9580851 0.14903229  6.428708
## ENSG00000132170.24 PPARG    188.27369      0.3979331 0.06842514  5.815597
## ENSG00000007968.7 E2F2     1048.78071      0.3171167 0.05559108  5.704454
## ENSG00000050767.18 COL23A1   59.82685      0.4708113 0.08827758  5.333306
## ENSG00000169994.19 MYO7B    752.03451      0.2899857 0.05465457  5.305789
## ENSG00000165092.13 ALDH1A1  290.37570     -0.4508214 0.08776359 -5.136770
## ENSG00000135424.18 ITGA7    522.88039      0.4244227 0.08275511  5.128659
## ENSG00000101187.16 SLCO4A1   83.64489      0.3936650 0.07837397  5.022906
## ENSG00000104918.8 RETN     2356.87387      0.4949748 0.10112306  4.894776
##                                  pvalue         padj
## ENSG00000108950.12 FAM20A  3.200849e-12 6.425064e-08
## ENSG00000163710.9 PCOLCE2  1.286929e-10 1.291626e-06
## ENSG00000132170.24 PPARG   6.041783e-09 4.042557e-05
## ENSG00000007968.7 E2F2     1.167168e-08 5.857141e-05
## ENSG00000050767.18 COL23A1 9.644083e-08 3.753197e-04
## ENSG00000169994.19 MYO7B   1.121864e-07 3.753197e-04
## ENSG00000165092.13 ALDH1A1 2.795007e-07 7.321965e-04
## ENSG00000135424.18 ITGA7   2.918135e-07 7.321965e-04
## ENSG00000101187.16 SLCO4A1 5.089559e-07 1.135141e-03
## ENSG00000104918.8 RETN     9.841752e-07 1.975535e-03

mean(abs(dge$stat))

## [1] 0.9390948

crp_pod1_b_adj <- dge

Sex differences in low CRP group (not stratified for treatment group)

sexD coding: 1 = female, 2 = male. I confirmed this coding using the expression data.
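
A check of this kind can be sketched as follows, assuming that XIST (expressed from the inactive X chromosome in females) and a Y-linked gene such as RPS4Y1 are used as markers. The counts are invented and the choice of genes is an assumption; the report does not state which genes were inspected.

```r
# Invented counts: rows are marker genes, columns are samples
toy <- matrix(
  c(900, 850,   5,   3,
      2,   4, 400, 380),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("XIST", "RPS4Y1"), paste0("s", 1:4))
)

# Recorded coding: 1 = female, 2 = male
sexD <- c(s1 = 1, s2 = 1, s3 = 2, s4 = 2)

# Call a sample female (1) when XIST exceeds the Y-linked marker, else male (2)
inferred <- ifelse(toy["XIST", ] > toy["RPS4Y1", ], 1, 2)

# Cross-tabulate recorded against inferred sex; off-diagonal cells are mismatches
table(recorded = sexD, inferred = inferred)
```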

T0

No correction for treatment group.

# load chromosome-to-gene table
chr2gene <- read.table("../ref/chr2gene.tsv")
xyg <- subset(chr2gene,V1=="chrX" | V1=="chrY")

mx <- xt0

dim(mx)

## [1] 60649   111

mx <- mx[which(! sapply(strsplit(rownames(mx)," "),"[[",1) %in% xyg$V2),]
dim(mx)

## [1] 57660   111
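
The filter above drops every gene annotated to chrX or chrY (60649 to 57660 rows), so the sex contrast is tested only on genes outside the sex chromosomes. The same steps on a toy matrix, with made-up gene IDs; row names are assumed to follow the "ID symbol" pattern seen in the result tables.

```r
# Made-up annotation: V1 = chromosome, V2 = gene ID
chr2gene_toy <- data.frame(
  V1 = c("chr1", "chrX", "chrY", "chr2"),
  V2 = c("GENE0001.1", "GENE0002.1", "GENE0003.1", "GENE0004.1")
)
xyg_toy <- subset(chr2gene_toy, V1 %in% c("chrX", "chrY"))

# Row names combine gene ID and symbol, separated by a space
mx_toy <- matrix(
  1:8, nrow = 4,
  dimnames = list(paste(chr2gene_toy$V2, c("A", "B", "C", "D")), c("s1", "s2"))
)

# Extract the ID (text before the first space) and drop chrX/chrY genes
gene_id <- vapply(strsplit(rownames(mx_toy), " "), "[[", character(1), 1)
mx_noxy <- mx_toy[!gene_id %in% xyg_toy$V2, , drop = FALSE]
rownames(mx_noxy)
```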

ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2 <- subset(ss2,crp_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]
mx <- mx[which(rowMeans(mx)>10),]
dim(mx)

## [1] 21291    56

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 344 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309      33.64159      2.3288688 0.1992312 11.689275
## ENSG00000223078.1 RNU2-55P       14.21314      2.1704799 0.2226793  9.747112
## ENSG00000287059.1 RP11-14A10.1   32.94434      1.8071869 0.2310635  7.821169
## ENSG00000249036.1 RP11-625I7.1   28.25079     -1.5570247 0.2110902 -7.376110
## ENSG00000280384.1 RP4-695O20.1   17.21499      0.8828052 0.1408053  6.269688
## ENSG00000196415.10 PRTN3        119.82185      2.4998558 0.4519385  5.531407
## ENSG00000247081.8 BAALC-AS1      32.75231      0.7696459 0.1396175  5.512532
## ENSG00000205611.5 LINC01597      73.26068      0.7738234 0.1414218  5.471742
## ENSG00000287763.1 RP11-153P14.1  75.91258     -2.5117536 0.4793555 -5.239855
## ENSG00000164821.5 DEFA4         258.59435      2.1153822 0.4041379  5.234308
##                                       pvalue         padj
## ENSG00000234551.2 LINC01309     1.446156e-31 3.079011e-27
## ENSG00000223078.1 RNU2-55P      1.897902e-22 2.020412e-18
## ENSG00000287059.1 RP11-14A10.1  5.233482e-15 3.714202e-11
## ENSG00000249036.1 RP11-625I7.1  1.629809e-13 8.675068e-10
## ENSG00000280384.1 RP4-695O20.1  3.617720e-10 1.540498e-06
## ENSG00000196415.10 PRTN3        3.176724e-08 1.075830e-04
## ENSG00000247081.8 BAALC-AS1     3.537085e-08 1.075830e-04
## ENSG00000205611.5 LINC01597     4.456328e-08 1.185996e-04
## ENSG00000287763.1 RP11-153P14.1 1.607025e-07 3.525872e-04
## ENSG00000164821.5 DEFA4         1.656039e-07 3.525872e-04

mean(abs(dge$stat))

## [1] 0.8831267

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 11 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000234551.2 LINC01309      33.64159      2.4713038 0.23173499 10.664353
## ENSG00000223078.1 RNU2-55P       14.21314      2.1079567 0.25785314  8.175028
## ENSG00000287059.1 RP11-14A10.1   32.94434      1.7241550 0.26989561  6.388229
## ENSG00000249036.1 RP11-625I7.1   28.25079     -1.4908856 0.24549422 -6.072997
## ENSG00000205611.5 LINC01597      73.26068      0.8616120 0.15206704  5.666001
## ENSG00000280384.1 RP4-695O20.1   17.21499      0.8767754 0.16064316  5.457907
## ENSG00000261795.1 RP11-90P13.1   15.89334     -3.3636765 0.67220055 -5.003978
## ENSG00000184385.2 UMODL1-AS1     14.24879      3.4958582 0.72614584  4.814265
## ENSG00000196415.10 PRTN3        119.82185      2.3118201 0.48338320  4.782583
## ENSG00000128872.10 TMOD2       1918.82328     -0.4642050 0.09895551 -4.691047
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    1.494350e-26 3.181620e-22
## ENSG00000223078.1 RNU2-55P     2.957964e-16 3.148900e-12
## ENSG00000287059.1 RP11-14A10.1 1.678178e-10 1.191003e-06
## ENSG00000249036.1 RP11-625I7.1 1.255450e-09 6.682449e-06
## ENSG00000205611.5 LINC01597    1.461687e-08 6.224154e-05
## ENSG00000280384.1 RP4-695O20.1 4.817810e-08 1.709600e-04
## ENSG00000261795.1 RP11-90P13.1 5.615930e-07 1.708125e-03
## ENSG00000184385.2 UMODL1-AS1   1.477430e-06 3.931995e-03
## ENSG00000196415.10 PRTN3       1.730573e-06 4.093959e-03
## ENSG00000128872.10 TMOD2       2.718100e-06 5.787108e-03

mean(abs(dge$stat))

## [1] 0.9241176

mvf_lo_t0 <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 32 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309     33.64159      2.4518641 0.2491134  9.842363
## ENSG00000223078.1 RNU2-55P      14.21314      2.0931762 0.2679161  7.812805
## ENSG00000249036.1 RP11-625I7.1  28.25079     -1.5651460 0.2478516 -6.314851
## ENSG00000287059.1 RP11-14A10.1  32.94434      1.6366468 0.2682068  6.102182
## ENSG00000205611.5 LINC01597     73.26068      0.7846061 0.1529007  5.131476
## ENSG00000110203.9 FOLR3        799.87269      1.6227754 0.3273100  4.957916
## ENSG00000261795.1 RP11-90P13.1  15.89334     -3.3514359 0.6905475 -4.853302
## ENSG00000196415.10 PRTN3       119.82185      2.4205744 0.5002323  4.838901
## ENSG00000280384.1 RP4-695O20.1  17.21499      0.8158338 0.1693220  4.818240
## ENSG00000165029.17 ABCA1       721.24100     -0.7541062 0.1577748 -4.779636
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    7.395309e-23 1.543993e-18
## ENSG00000223078.1 RNU2-55P     5.592889e-15 5.838417e-11
## ENSG00000249036.1 RP11-625I7.1 2.704224e-10 1.881960e-06
## ENSG00000287059.1 RP11-14A10.1 1.046300e-09 5.461163e-06
## ENSG00000205611.5 LINC01597    2.874783e-07 1.200394e-03
## ENSG00000110203.9 FOLR3        7.125324e-07 2.479375e-03
## ENSG00000261795.1 RP11-90P13.1 1.214223e-06 3.358953e-03
## ENSG00000196415.10 PRTN3       1.305591e-06 3.358953e-03
## ENSG00000280384.1 RP4-695O20.1 1.448304e-06 3.358953e-03
## ENSG00000165029.17 ABCA1       1.756131e-06 3.358953e-03

mean(abs(dge$stat))

## [1] 1.053754

mvf_lo_t0_adj <- dge

dim(subset(mvf_lo_t0,padj<0.05))

## [1] 19 62
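
This count comes from the model with clinical covariates: 19 genes pass padj < 0.05, and the 62 columns are the six result columns plus the VST values for the 56 samples. subset() drops rows whose padj is NA (DESeq2 sets padj to NA for genes removed by independent filtering or flagged as count outliers), which is usually the intended behaviour. A small sketch on invented values that also splits the significant genes by direction:

```r
# Invented results table; only the columns needed here
dge_toy <- data.frame(
  log2FoldChange = c(2.3, -1.5, 0.8, -0.2, 0.1),
  padj           = c(1e-20, 1e-9, 0.03, NA, 0.6),
  row.names      = paste0("gene", 1:5)
)

# Rows with NA padj are excluded by subset()
sig <- subset(dge_toy, padj < 0.05)

n_sig  <- nrow(sig)
n_up   <- sum(sig$log2FoldChange > 0)
n_down <- sum(sig$log2FoldChange < 0)
c(significant = n_sig, up = n_up, down = n_down)
```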

EOS

No correction for treatment group.

# load chromosome-to-gene table
chr2gene <- read.table("../ref/chr2gene.tsv")
xyg <- subset(chr2gene,V1=="chrX" | V1=="chrY")

mx <- xeos

dim(mx)

## [1] 60649    98

mx <- mx[which(! sapply(strsplit(rownames(mx)," "),"[[",1) %in% xyg$V2),]
dim(mx)

## [1] 57660    98

ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2 <- subset(ss2,crp_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]
mx <- mx[which(rowMeans(mx)>10),]
dim(mx)

## [1] 21512    46

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 106 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309    32.07936       2.626331 0.2043337 12.853148
## ENSG00000223078.1 RNU2-55P     16.91390       2.317063 0.2134552 10.855030
## ENSG00000287059.1 RP11-14A10.1 31.85305       2.343386 0.2446115  9.580030
## ENSG00000249036.1 RP11-625I7.1 25.07900      -1.818686 0.1976662 -9.200793
## ENSG00000241111.1 PRICKLE2-AS1 13.60330       1.929835 0.2331559  8.277016
## ENSG00000261618.2 LINC02605    34.62957       1.152047 0.1606795  7.169844
## ENSG00000280384.1 RP4-695O20.1 17.77907       1.061146 0.1746189  6.076926
## ENSG00000279319.1 RP11-693M3.1 16.85722       1.165895 0.2188475  5.327430
## ENSG00000284692.2 RP1-58B11.2  15.70704       1.211100 0.2331978  5.193448
## ENSG00000159212.13 CLIC6       13.32563       1.538270 0.3144323  4.892215
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    8.257972e-38 1.776455e-33
## ENSG00000223078.1 RNU2-55P     1.887450e-27 2.030141e-23
## ENSG00000287059.1 RP11-14A10.1 9.701656e-22 6.956734e-18
## ENSG00000249036.1 RP11-625I7.1 3.553184e-20 1.910902e-16
## ENSG00000241111.1 PRICKLE2-AS1 1.263003e-16 5.433946e-13
## ENSG00000261618.2 LINC02605    7.508315e-13 2.691981e-09
## ENSG00000280384.1 RP4-695O20.1 1.225079e-09 3.764843e-06
## ENSG00000279319.1 RP11-693M3.1 9.961199e-08 2.678566e-04
## ENSG00000284692.2 RP1-58B11.2  2.064350e-07 4.934255e-04
## ENSG00000159212.13 CLIC6       9.970758e-07 2.144910e-03

mean(abs(dge$stat))

## [1] 0.9159988

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309    32.07936       2.589031 0.2311757 11.199412
## ENSG00000223078.1 RNU2-55P     16.91390       2.527264 0.2374445 10.643597
## ENSG00000287059.1 RP11-14A10.1 31.85305       2.572627 0.2805731  9.169186
## ENSG00000241111.1 PRICKLE2-AS1 13.60330       2.013881 0.2721151  7.400843
## ENSG00000249036.1 RP11-625I7.1 25.07900      -1.735733 0.2349588 -7.387393
## ENSG00000261618.2 LINC02605    34.62957       1.170551 0.1797314  6.512778
## ENSG00000184385.2 UMODL1-AS1   59.58609       5.093233 0.9088449  5.604073
## ENSG00000280384.1 RP4-695O20.1 17.77907       1.056001 0.1916923  5.508834
## ENSG00000205611.5 LINC01597    94.98889       1.075723 0.2276478  4.725385
## ENSG00000284692.2 RP1-58B11.2  15.70704       1.226824 0.2615174  4.691173
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    4.104502e-29 8.829604e-25
## ENSG00000223078.1 RNU2-55P     1.867768e-26 2.008971e-22
## ENSG00000287059.1 RP11-14A10.1 4.766060e-20 3.417583e-16
## ENSG00000241111.1 PRICKLE2-AS1 1.353225e-13 6.442204e-10
## ENSG00000249036.1 RP11-625I7.1 1.497351e-13 6.442204e-10
## ENSG00000261618.2 LINC02605    7.377374e-11 2.645034e-07
## ENSG00000184385.2 UMODL1-AS1   2.093725e-08 6.434317e-05
## ENSG00000280384.1 RP4-695O20.1 3.612176e-08 9.713142e-05
## ENSG00000205611.5 LINC01597    2.296802e-06 5.464631e-03
## ENSG00000284692.2 RP1-58B11.2  2.716431e-06 5.464631e-03

mean(abs(dge$stat))

## [1] 1.04124

mvf_lo_eos <- dge

# model with clinical and cell covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
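
DESeq2 suggests centering and scaling numeric covariates with a large mean or standard deviation. A minimal sketch of that step is shown here on a toy data frame. The values are invented, and which columns of `ss2` actually trigger the message is not known from the output above, so the column selection is an assumption.

```r
# Toy covariates: values are made up, column names mirror the design formula.
ss_demo <- data.frame(
  Monocytes.C    = c(12.1, 8.4, 15.3, 10.2),
  NK             = c(3.2, 5.1, 4.4, 2.9),
  Neutrophils.LD = c(55.0, 62.5, 48.1, 70.3)
)

num_cols <- c("Monocytes.C", "NK", "Neutrophils.LD")

# scale() returns a one-column matrix, so convert back to a plain numeric vector.
ss_scaled <- ss_demo
ss_scaled[num_cols] <- lapply(ss_demo[num_cols],
                              function(v) as.numeric(scale(v)))

# Each column now has mean ~0 and standard deviation 1.
round(colMeans(ss_scaled[num_cols]), 10)
sapply(ss_scaled[num_cols], sd)
```

Rescaling changes the units of the covariate coefficients and the intercept. In principle it should leave the sex effect itself unchanged while making the fit easier to converge.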

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 44 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
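
The message above suggests a larger `maxit` for `nbinomWaldTest`. A rough sketch of how that could be tried is given here on simulated counts from `makeExampleDESeqDataSet`, not on the PADDI data. `dds_demo` and `conv` are illustrative names, and `maxit = 500` is an arbitrary choice.

```r
# Sketch only: simulated counts stand in for the report's `mx` and `ss2`.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library("DESeq2"))
  set.seed(1)
  dds_demo <- makeExampleDESeqDataSet(n = 500, m = 8)

  # DESeq() wraps these three steps; running them separately exposes maxit.
  dds_demo <- estimateSizeFactors(dds_demo)
  dds_demo <- estimateDispersions(dds_demo, quiet = TRUE)
  dds_demo <- nbinomWaldTest(dds_demo, maxit = 500, quiet = TRUE)

  # TRUE = converged, FALSE = not converged, NA = row was not tested
  conv <- mcols(dds_demo)$betaConv
  print(table(conv, useNA = "ifany"))
}
```

Rows that still fail to converge can be dropped before interpretation, for example with `dds_demo[which(mcols(dds_demo)$betaConv), ]`. Note that the step-wise call skips the outlier replacement that `DESeq()` performs for some designs.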

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309      32.07936      2.5687275 0.2647720  9.701658
## ENSG00000223078.1 RNU2-55P       16.91390      2.4648245 0.2728490  9.033659
## ENSG00000287059.1 RP11-14A10.1   31.85305      2.4178005 0.2710450  8.920291
## ENSG00000249036.1 RP11-625I7.1   25.07900     -1.8549612 0.2709654 -6.845748
## ENSG00000241111.1 PRICKLE2-AS1   13.60330      1.8163376 0.2918429  6.223682
## ENSG00000261618.2 LINC02605      34.62957      1.1442358 0.2006206  5.703481
## ENSG00000205611.5 LINC01597      94.98889      1.0459231 0.1887536  5.541208
## ENSG00000165617.15 DACT1        178.23996     -1.3978545 0.2563630 -5.452638
## ENSG00000157985.19 AGAP1        505.51722      0.8943585 0.1906237  4.691749
## ENSG00000287763.1 RP11-153P14.1  61.09995     -3.3614571 0.7429108 -4.524712
##                                       pvalue         padj
## ENSG00000234551.2 LINC01309     2.966372e-22 6.257266e-18
## ENSG00000223078.1 RNU2-55P      1.660257e-19 1.751074e-15
## ENSG00000287059.1 RP11-14A10.1  4.650657e-19 3.270032e-15
## ENSG00000249036.1 RP11-625I7.1  7.607712e-12 4.011927e-08
## ENSG00000241111.1 PRICKLE2-AS1  4.856196e-10 2.048732e-06
## ENSG00000261618.2 LINC02605     1.173850e-08 4.126866e-05
## ENSG00000205611.5 LINC01597     3.003928e-08 9.052122e-05
## ENSG00000165617.15 DACT1        4.962809e-08 1.308569e-04
## ENSG00000157985.19 AGAP1        2.708789e-06 6.348799e-03
## ENSG00000287763.1 RP11-153P14.1 6.047789e-06 1.275721e-02

mean(abs(dge$stat))

## [1] 0.8323054

mvf_lo_eos_adj <- dge

dim(subset(mvf_lo_eos,padj<0.05))

## [1] 33 52
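
The first number returned by `dim()` above is the count of genes with padj<0.05. As a small illustration of what that count includes, here is a toy results table (all values invented) showing that `subset()` silently drops genes whose `padj` is NA, and how the significant set can be split by direction. The direction is relative to the coding of `sexD`, which is not shown in this part of the report.

```r
# Toy results table: values are made up for illustration.
dge_demo <- data.frame(
  log2FoldChange = c(2.5, -1.7, 0.3, 1.1, -0.2),
  padj           = c(1e-20, 1e-8, 0.4, NA, 0.03),
  row.names      = paste0("gene", 1:5)
)

# subset() keeps only rows where the condition is TRUE, so NA padj rows are dropped.
sig <- subset(dge_demo, padj < 0.05)
nrow(sig)

# Split the significant genes by the sign of the fold change.
table(ifelse(sig$log2FoldChange > 0, "up", "down"))

# Genes without an adjusted p-value (e.g. removed by independent filtering).
sum(is.na(dge_demo$padj))
```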

POD1

No correction for treatment group.

#load chromosome2gene table
chr2gene <- read.table("../ref/chr2gene.tsv")
xyg <- subset(chr2gene,V1=="chrX" | V1=="chrY")

mx <- xpod1

dim(mx)

## [1] 60649   109

mx <- mx[which(! sapply(strsplit(rownames(mx)," "),"[[",1) %in% xyg$V2),]
dim(mx)

## [1] 57660   109
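
The filter above removes chrX and chrY genes by matching the first space-separated token of each row name against the gene IDs in the chromosome table. A toy version of the same logic is sketched here. The IDs, symbols and counts are invented, and the layout of `chr2gene.tsv` (chromosome in `V1`, gene ID in `V2`) is inferred from the code above.

```r
# Toy chromosome-to-gene table: IDs are made up.
chr2gene_demo <- data.frame(
  V1 = c("chr1", "chrX", "chrY", "chr2"),
  V2 = c("ENSG00000000001.1", "ENSG00000000002.1",
         "ENSG00000000003.1", "ENSG00000000004.1")
)

# Toy count matrix with "<gene ID> <symbol>" row names, as in the report.
mx_demo <- matrix(1:8, nrow = 4,
  dimnames = list(paste(chr2gene_demo$V2, c("GENEA", "GENEB", "GENEC", "GENED")),
                  c("s1", "s2")))

xyg_demo <- subset(chr2gene_demo, V1 %in% c("chrX", "chrY"))

# Take the gene ID (first token) from each row name and keep autosomal genes only.
gene_ids <- sapply(strsplit(rownames(mx_demo), " "), "[[", 1)
mx_auto <- mx_demo[!gene_ids %in% xyg_demo$V2, , drop = FALSE]

rownames(mx_auto)
```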

ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2 <- subset(ss2,crp_group==1)
mx <- mx[,colnames(mx) %in% rownames(ss2)]
mx <- mx[which(rowMeans(mx)>10),]
dim(mx)

## [1] 20659    55
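
The subsetting above keeps the count columns that appear in the sample sheet, but it does not by itself guarantee that columns and sample-sheet rows are in the same order, and DESeq2 expects them to match. A defensive variant is sketched here on toy data, with invented sample names and counts; the `crp_group` coding of 1 and 4 is taken from the code in this report.

```r
# Toy counts with columns deliberately out of order relative to the sample sheet.
mx_demo <- matrix(c(50, 3, 20, 40, 1, 30, 60, 2, 25), nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s3", "s1", "s2")))
ss_demo <- data.frame(crp_group = c(1, 4, 1),
                      row.names = c("s1", "s2", "s3"))

ss_lo <- subset(ss_demo, crp_group == 1)

# Index by name so that counts and sample sheet share the same sample order.
keep  <- intersect(rownames(ss_lo), colnames(mx_demo))
mx_lo <- mx_demo[, keep, drop = FALSE]
ss_lo <- ss_lo[keep, , drop = FALSE]

# Same expression filter as in the report: mean count above 10.
mx_lo <- mx_lo[rowMeans(mx_lo) > 10, , drop = FALSE]

stopifnot(identical(colnames(mx_lo), rownames(ss_lo)))
dim(mx_lo)
```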

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 122 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange     lfcSE       stat
## ENSG00000234551.2 LINC01309     34.12360      2.4651830 0.1761162  13.997477
## ENSG00000249036.1 RP11-625I7.1  21.88654     -1.9476850 0.1672608 -11.644596
## ENSG00000223078.1 RNU2-55P      13.46670      2.4858941 0.2197302  11.313394
## ENSG00000251199.6 RP11-400D2.2  25.27517      1.2521909 0.1550861   8.074167
## ENSG00000241111.1 PRICKLE2-AS1  10.02828      1.7763327 0.2567832   6.917637
## ENSG00000287763.1 RP11-153P14.1 69.51778     -2.6560497 0.4655662  -5.704988
## ENSG00000287059.1 RP11-14A10.1  22.95559      1.3682563 0.2444267   5.597818
## ENSG00000261618.2 LINC02605     31.80415      0.8652806 0.1596389   5.420235
## ENSG00000078114.19 NEBL         33.02077      3.1015394 0.5786268   5.360172
## ENSG00000242741.2 LINC02005     15.47253      0.9387608 0.1777857   5.280294
##                                       pvalue         padj
## ENSG00000234551.2 LINC01309     1.615029e-44 3.336487e-40
## ENSG00000249036.1 RP11-625I7.1  2.444769e-31 2.525325e-27
## ENSG00000223078.1 RNU2-55P      1.126456e-29 7.757148e-26
## ENSG00000251199.6 RP11-400D2.2  6.793902e-16 3.508881e-12
## ENSG00000241111.1 PRICKLE2-AS1  4.592394e-12 1.897485e-08
## ENSG00000287763.1 RP11-153P14.1 1.163512e-08 4.006167e-05
## ENSG00000287059.1 RP11-14A10.1  2.170659e-08 6.406234e-05
## ENSG00000261618.2 LINC02605     5.952074e-08 1.537049e-04
## ENSG00000078114.19 NEBL         8.314272e-08 1.908495e-04
## ENSG00000242741.2 LINC02005     1.289770e-07 2.664537e-04

mean(abs(dge$stat))

## [1] 0.7233669

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 8 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309    34.12360      2.5895969 0.1983491 13.055751
## ENSG00000249036.1 RP11-625I7.1 21.88654     -1.8468544 0.1885919 -9.792862
## ENSG00000223078.1 RNU2-55P     13.46670      2.4049572 0.2521087  9.539366
## ENSG00000251199.6 RP11-400D2.2 25.27517      1.2983374 0.1804491  7.195034
## ENSG00000241111.1 PRICKLE2-AS1 10.02828      2.0733786 0.2923354  7.092465
## ENSG00000261618.2 LINC02605    31.80415      0.9719705 0.1888472  5.146862
## ENSG00000184385.2 UMODL1-AS1   10.70454      4.5617023 0.9045180  5.043241
## ENSG00000205611.5 LINC01597    51.31977      0.8279942 0.1641945  5.042766
## ENSG00000287059.1 RP11-14A10.1 22.95559      1.3892825 0.2782894  4.992223
## ENSG00000242741.2 LINC02005    15.47253      0.9523263 0.1973762  4.824930
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    5.892606e-39 1.217353e-34
## ENSG00000249036.1 RP11-625I7.1 1.208266e-22 1.248078e-18
## ENSG00000223078.1 RNU2-55P     1.437083e-21 9.896229e-18
## ENSG00000251199.6 RP11-400D2.2 6.244523e-13 3.225140e-09
## ENSG00000241111.1 PRICKLE2-AS1 1.317439e-12 5.443395e-09
## ENSG00000261618.2 LINC02605    2.648805e-07 9.120278e-04
## ENSG00000184385.2 UMODL1-AS1   4.577127e-07 1.184926e-03
## ENSG00000205611.5 LINC01597    4.588514e-07 1.184926e-03
## ENSG00000287059.1 RP11-14A10.1 5.968836e-07 1.370113e-03
## ENSG00000242741.2 LINC02005    1.400521e-06 2.893337e-03

mean(abs(dge$stat))

## [1] 0.802444

mvf_lo_pod1 <- dge

# model with clinical and cell covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 24 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE       stat
## ENSG00000234551.2 LINC01309      34.12360      2.6736069 0.2200505  12.149972
## ENSG00000249036.1 RP11-625I7.1   21.88654     -1.9946353 0.1980806 -10.069818
## ENSG00000223078.1 RNU2-55P       13.46670      2.0483749 0.2508718   8.165027
## ENSG00000251199.6 RP11-400D2.2   25.27517      1.2757453 0.2026393   6.295645
## ENSG00000241111.1 PRICKLE2-AS1   10.02828      1.8227198 0.3176453   5.738224
## ENSG00000287763.1 RP11-153P14.1  69.51778     -2.9784871 0.5306885  -5.612496
## ENSG00000287059.1 RP11-14A10.1   22.95559      1.3525084 0.2781883   4.861846
## ENSG00000261618.2 LINC02605      31.80415      0.9841545 0.2160895   4.554384
## ENSG00000165617.15 DACT1        136.99579     -1.0197244 0.2519796  -4.046853
## ENSG00000284692.2 RP1-58B11.2    13.53876      1.2539524 0.3136024   3.998543
##                                       pvalue         padj
## ENSG00000234551.2 LINC01309     5.738502e-34 1.185517e-29
## ENSG00000249036.1 RP11-625I7.1  7.511698e-24 7.759208e-20
## ENSG00000223078.1 RNU2-55P      3.213630e-16 2.213013e-12
## ENSG00000251199.6 RP11-400D2.2  3.061249e-10 1.581059e-06
## ENSG00000241111.1 PRICKLE2-AS1  9.567436e-09 3.953073e-05
## ENSG00000287763.1 RP11-153P14.1 1.994291e-08 6.866676e-05
## ENSG00000287059.1 RP11-14A10.1  1.162964e-06 3.432238e-03
## ENSG00000261618.2 LINC02605     5.253923e-06 1.356760e-02
## ENSG00000165617.15 DACT1        5.191081e-05 1.191584e-01
## ENSG00000284692.2 RP1-58B11.2   6.373373e-05 1.307941e-01

mean(abs(dge$stat))

## [1] 0.8050376

mvf_lo_pod1_adj <- dge

dim(subset(mvf_lo_pod1,padj<0.05))

## [1] 16 61

Sex differences in high CRP group (not stratified for treatment group)

T0

No correction for treatment group.

#load chromosome2gene table
chr2gene <- read.table("../ref/chr2gene.tsv")
xyg <- subset(chr2gene,V1=="chrX" | V1=="chrY")

mx <- xt0

dim(mx)

## [1] 60649   111

mx <- mx[which(! sapply(strsplit(rownames(mx)," "),"[[",1) %in% xyg$V2),]
dim(mx)

## [1] 57660   111

ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2 <- subset(ss2,crp_group==4)
mx <- mx[,colnames(mx) %in% rownames(ss2)]
mx <- mx[which(rowMeans(mx)>10),]
dim(mx)

## [1] 21177    55

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 291 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309     41.09906      2.7475538 0.1703948 16.124634
## ENSG00000249036.1 RP11-625I7.1  22.89777     -1.6712038 0.1890118 -8.841796
## ENSG00000223078.1 RNU2-55P      15.68780      2.0376671 0.2319994  8.783072
## ENSG00000287059.1 RP11-14A10.1  30.25752      1.8377734 0.2273394  8.083833
## ENSG00000251199.6 RP11-400D2.2  34.79657      1.1193965 0.1559826  7.176420
## ENSG00000241111.1 PRICKLE2-AS1  12.71155      1.9977146 0.2785771  7.171138
## ENSG00000282826.2 FRG1CP       568.54614      0.5373063 0.1038772  5.172515
## ENSG00000029534.21 ANK1        308.08237     -0.6663338 0.1370848 -4.860743
## ENSG00000118492.18 ADGB         28.31195      0.6425526 0.1324436  4.851519
## ENSG00000205611.5 LINC01597     72.91149      0.7235389 0.1492394  4.848175
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    1.712703e-58 3.626992e-54
## ENSG00000249036.1 RP11-625I7.1 9.419198e-19 9.973518e-15
## ENSG00000223078.1 RNU2-55P     1.590687e-18 1.122866e-14
## ENSG00000287059.1 RP11-14A10.1 6.276204e-16 3.322779e-12
## ENSG00000251199.6 RP11-400D2.2 7.156033e-13 2.625132e-09
## ENSG00000241111.1 PRICKLE2-AS1 7.437689e-13 2.625132e-09
## ENSG00000282826.2 FRG1CP       2.309639e-07 6.987318e-04
## ENSG00000029534.21 ANK1        1.169460e-06 2.638709e-03
## ENSG00000118492.18 ADGB        1.225196e-06 2.638709e-03
## ENSG00000205611.5 LINC01597    1.246026e-06 2.638709e-03

mean(abs(dge$stat))

## [1] 0.8218259

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 6 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309      41.09906      2.7234360 0.1838595 14.812594
## ENSG00000223078.1 RNU2-55P       15.68780      2.0820131 0.2407429  8.648286
## ENSG00000249036.1 RP11-625I7.1   22.89777     -1.6615975 0.2070612 -8.024668
## ENSG00000287059.1 RP11-14A10.1   30.25752      1.8229839 0.2395041  7.611493
## ENSG00000251199.6 RP11-400D2.2   34.79657      1.1448244 0.1615118  7.088176
## ENSG00000241111.1 PRICKLE2-AS1   12.71155      1.9854604 0.2902379  6.840804
## ENSG00000160789.24 LMNA        1114.84849     -0.7007711 0.1364879 -5.134309
## ENSG00000205611.5 LINC01597      72.91149      0.7420386 0.1454348  5.102207
## ENSG00000259719.6 LINC02284      54.28891     -0.9905106 0.1944177 -5.094755
## ENSG00000163735.7 CXCL5         255.03703     -1.6090605 0.3224991 -4.989348
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    1.214613e-49 2.572185e-45
## ENSG00000223078.1 RNU2-55P     5.227859e-18 5.535519e-14
## ENSG00000249036.1 RP11-625I7.1 1.018008e-15 7.186116e-12
## ENSG00000287059.1 RP11-14A10.1 2.709486e-14 1.434469e-10
## ENSG00000251199.6 RP11-400D2.2 1.358908e-12 5.755518e-09
## ENSG00000241111.1 PRICKLE2-AS1 7.874971e-12 2.779471e-08
## ENSG00000160789.24 LMNA        2.831819e-07 8.216514e-04
## ENSG00000205611.5 LINC01597    3.357159e-07 8.216514e-04
## ENSG00000259719.6 LINC02284    3.491931e-07 8.216514e-04
## ENSG00000163735.7 CXCL5        6.058330e-07 1.282973e-03

mean(abs(dge$stat))

## [1] 0.815457

mvf_hi_t0 <- dge

# model with clinical and cell covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 31 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309      41.09906      2.6846929 0.1911398 14.045707
## ENSG00000249036.1 RP11-625I7.1   22.89777     -1.6363919 0.1700257 -9.624382
## ENSG00000223078.1 RNU2-55P       15.68780      2.1673734 0.2405782  9.009017
## ENSG00000287059.1 RP11-14A10.1   30.25752      1.9148333 0.2435354  7.862649
## ENSG00000241111.1 PRICKLE2-AS1   12.71155      2.1716924 0.2840957  7.644228
## ENSG00000251199.6 RP11-400D2.2   34.79657      1.1927663 0.1641208  7.267611
## ENSG00000160789.24 LMNA        1114.84849     -0.7762545 0.1363735 -5.692123
## ENSG00000259719.6 LINC02284      54.28891     -1.0524868 0.1892147 -5.562395
## ENSG00000154917.11 RAB6B        155.08204     -0.8860746 0.1625378 -5.451499
## ENSG00000119326.15 CTNNAL1       57.45744     -0.9210509 0.1717614 -5.362386
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    8.184765e-45 1.699648e-40
## ENSG00000249036.1 RP11-625I7.1 6.308485e-22 6.550100e-18
## ENSG00000223078.1 RNU2-55P     2.079120e-19 1.439167e-15
## ENSG00000287059.1 RP11-14A10.1 3.760924e-15 1.952483e-11
## ENSG00000241111.1 PRICKLE2-AS1 2.102020e-14 8.730108e-11
## ENSG00000251199.6 RP11-400D2.2 3.659007e-13 1.266382e-09
## ENSG00000160789.24 LMNA        1.254694e-08 3.722140e-05
## ENSG00000259719.6 LINC02284    2.660978e-08 6.907234e-05
## ENSG00000154917.11 RAB6B       4.994700e-08 1.152444e-04
## ENSG00000119326.15 CTNNAL1     8.212991e-08 1.676947e-04

mean(abs(dge$stat))

## [1] 0.9985743

mvf_hi_t0_adj <- dge

dim(subset(mvf_hi_t0,padj<0.05))

## [1] 71 61

EOS

No correction for treatment group.

#load chromosome2gene table
chr2gene <- read.table("../ref/chr2gene.tsv")
xyg <- subset(chr2gene,V1=="chrX" | V1=="chrY")

mx <- xeos

dim(mx)

## [1] 60649    98

mx <- mx[which(! sapply(strsplit(rownames(mx)," "),"[[",1) %in% xyg$V2),]
dim(mx)

## [1] 57660    98

ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2 <- subset(ss2,crp_group==4)
mx <- mx[,colnames(mx) %in% rownames(ss2)]
mx <- mx[which(rowMeans(mx)>10),]
dim(mx)

## [1] 21199    52

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 124 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309     33.33921      2.4403190 0.1856270 13.146361
## ENSG00000287059.1 RP11-14A10.1  29.01227      1.5746345 0.2048465  7.686900
## ENSG00000223078.1 RNU2-55P      15.44212      1.6837808 0.2307396  7.297320
## ENSG00000241111.1 PRICKLE2-AS1  11.99614      1.8361849 0.2656031  6.913266
## ENSG00000249036.1 RP11-625I7.1  20.48683     -1.3690932 0.2206460 -6.204930
## ENSG00000251199.6 RP11-400D2.2  29.81371      1.3818661 0.2242179  6.163049
## ENSG00000142606.16 MMEL1        41.01197      1.2360933 0.2570226  4.809278
## ENSG00000261618.2 LINC02605     38.40968      0.6575011 0.1371833  4.792867
## ENSG00000182263.14 FIGN         14.80449      3.0725635 0.6525952  4.708223
## ENSG00000164821.5 DEFA4        396.33649      2.4120317 0.5537353  4.355929
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    1.785629e-39 3.785355e-35
## ENSG00000287059.1 RP11-14A10.1 1.507434e-14 1.597805e-10
## ENSG00000223078.1 RNU2-55P     2.935551e-13 2.074358e-09
## ENSG00000241111.1 PRICKLE2-AS1 4.736200e-12 2.510068e-08
## ENSG00000249036.1 RP11-625I7.1 5.472112e-10 2.320066e-06
## ENSG00000251199.6 RP11-400D2.2 7.135740e-10 2.521176e-06
## ENSG00000142606.16 MMEL1       1.514764e-06 4.356779e-03
## ENSG00000261618.2 LINC02605    1.644145e-06 4.356779e-03
## ENSG00000182263.14 FIGN        2.498862e-06 5.885930e-03
## ENSG00000164821.5 DEFA4        1.325038e-05 2.808948e-02

mean(abs(dge$stat))

## [1] 0.7524278

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.
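
The message above suggests centering and scaling numeric covariates. The report does not show which variable triggered it, so the sketch below simply assumes that ageCS and duration_sx are numeric and standardises both. The values are invented.

```r
# Toy covariates on their original scales (values are made up).
ss_toy <- data.frame(ageCS = c(45, 60, 72, 58),
                     duration_sx = c(90, 240, 180, 150))

# Centre to mean 0 and scale to standard deviation 1, column by column.
num_cols <- c("ageCS", "duration_sx")
ss_toy[num_cols] <- lapply(ss_toy[num_cols], function(v) as.numeric(scale(v)))

round(colMeans(ss_toy[num_cols]), 10)
sapply(ss_toy[num_cols], sd)
```

After scaling, each coefficient is a log2 fold change per standard deviation of the covariate rather than per original unit.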

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
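
The note above is informational. If cleaner level names are wanted, base R's make.names() can replace unsafe characters with dots. The sketch assumes the offending levels are in ethnicityCAT, which the report does not confirm, and the level names are invented.

```r
# Toy categorical covariate with characters that are unsafe in column names.
ss_toy <- data.frame(
  ethnicityCAT = c("Asian/Pacific", "European", "Not stated", "European"),
  stringsAsFactors = FALSE
)

# make.names() swaps spaces, slashes and similar characters for ".".
ss_toy$ethnicityCAT <- factor(make.names(ss_toy$ethnicityCAT))

levels(ss_toy$ethnicityCAT)
```

It is worth checking that two distinct labels do not collapse to the same sanitised name.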

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309     33.33921      2.3569271 0.1869109 12.609898
## ENSG00000287059.1 RP11-14A10.1  29.01227      1.5794570 0.2151250  7.342044
## ENSG00000223078.1 RNU2-55P      15.44212      1.6681236 0.2412151  6.915503
## ENSG00000241111.1 PRICKLE2-AS1  11.99614      1.8381605 0.2758089  6.664617
## ENSG00000249036.1 RP11-625I7.1  20.48683     -1.4387781 0.2362987 -6.088810
## ENSG00000251199.6 RP11-400D2.2  29.81371      1.4085373 0.2348460  5.997706
## ENSG00000279319.1 RP11-693M3.1  16.33692      1.0589761 0.2254952  4.696226
## ENSG00000261618.2 LINC02605     38.40968      0.6611660 0.1467102  4.506612
## ENSG00000282826.2 FRG1CP       514.33435      0.4766021 0.1125323  4.235247
## ENSG00000142606.16 MMEL1        41.01197      1.0664933 0.2523226  4.226705
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    1.862331e-36 3.947956e-32
## ENSG00000287059.1 RP11-14A10.1 2.103563e-13 2.229672e-09
## ENSG00000223078.1 RNU2-55P     4.662056e-12 3.294364e-08
## ENSG00000241111.1 PRICKLE2-AS1 2.653562e-11 1.406321e-07
## ENSG00000249036.1 RP11-625I7.1 1.137527e-09 4.822889e-06
## ENSG00000251199.6 RP11-400D2.2 2.001239e-09 7.070711e-06
## ENSG00000279319.1 RP11-693M3.1 2.650129e-06 8.025727e-03
## ENSG00000261618.2 LINC02605    6.587084e-06 1.745495e-02
## ENSG00000282826.2 FRG1CP       2.283009e-05 5.027092e-02
## ENSG00000142606.16 MMEL1       2.371382e-05 5.027092e-02

mean(abs(dge$stat))

## [1] 0.7620071

mvf_hi_eos <- dge

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 21 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
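
The message above points to the maxit argument of nbinomWaldTest(). One way to raise it is to run the three steps that DESeq() wraps separately. The sketch uses simulated data from makeExampleDESeqDataSet() as a stand-in for mx and ss2, and maxit = 500 is an arbitrary illustrative choice. It skips the fit if DESeq2 is not installed.

```r
n_unconverged <- NA_integer_  # stays NA if DESeq2 is not installed

if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library("DESeq2"))
  set.seed(1)
  # Simulated counts stand in for the real count matrix and sample sheet.
  dds_toy <- makeExampleDESeqDataSet(n = 200, m = 8)
  # DESeq() wraps these three steps; running them separately exposes maxit.
  dds_toy <- estimateSizeFactors(dds_toy)
  dds_toy <- estimateDispersions(dds_toy, quiet = TRUE)
  dds_toy <- nbinomWaldTest(dds_toy, maxit = 500, quiet = TRUE)
  # betaConv may be NA for rows that were not fitted, so count explicit FALSE only.
  n_unconverged <- sum(mcols(dds_toy)$betaConv %in% FALSE)
}
n_unconverged
```

An alternative is to drop the affected rows after fitting with something like dds[which(mcols(dds)$betaConv),], which is usually reasonable when they are few and lowly expressed.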

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange      lfcSE      stat
## ENSG00000234551.2 LINC01309     33.33921      2.3190333 0.19888030 11.660448
## ENSG00000287059.1 RP11-14A10.1  29.01227      1.6504563 0.21782313  7.577048
## ENSG00000223078.1 RNU2-55P      15.44212      1.7428476 0.23923642  7.285043
## ENSG00000241111.1 PRICKLE2-AS1  11.99614      1.8924611 0.28170825  6.717805
## ENSG00000249036.1 RP11-625I7.1  20.48683     -1.5987534 0.25234023 -6.335705
## ENSG00000251199.6 RP11-400D2.2  29.81371      1.4028800 0.24878909  5.638832
## ENSG00000282826.2 FRG1CP       514.33435      0.4860188 0.09781564  4.968723
## ENSG00000261618.2 LINC02605     38.40968      0.6865628 0.15310432  4.484281
## ENSG00000205611.5 LINC01597     72.82931      0.7217618 0.16444591  4.389053
## ENSG00000149531.15 FRG1BP       65.48543      0.8062481 0.18386032  4.385112
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    2.029715e-31 4.302794e-27
## ENSG00000287059.1 RP11-14A10.1 3.535059e-14 3.746985e-10
## ENSG00000223078.1 RNU2-55P     3.215679e-13 2.272306e-09
## ENSG00000241111.1 PRICKLE2-AS1 1.844822e-11 9.777095e-08
## ENSG00000249036.1 RP11-625I7.1 2.362579e-10 1.001686e-06
## ENSG00000251199.6 RP11-400D2.2 1.712071e-08 6.049032e-05
## ENSG00000282826.2 FRG1CP       6.739532e-07 2.041019e-03
## ENSG00000261618.2 LINC02605    7.316020e-06 1.938654e-02
## ENSG00000205611.5 LINC01597    1.138455e-05 2.457517e-02
## ENSG00000149531.15 FRG1BP      1.159261e-05 2.457517e-02

mean(abs(dge$stat))

## [1] 0.7764473

mvf_hi_eos_adj <- dge

dim(subset(mvf_hi_eos,padj<0.05))

## [1]  8 58
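
The two summaries used throughout this section, the number of genes with padj<0.05 and the mean absolute Wald statistic, can be wrapped in a small helper so that the three models are compared on the same footing. The function name summarise_dge is hypothetical and the toy table is invented.

```r
# Hypothetical helper: genes passing an FDR cutoff and the mean |stat|.
summarise_dge <- function(dge, fdr = 0.05) {
  c(n_sig = sum(dge$padj < fdr, na.rm = TRUE),
    mean_abs_stat = mean(abs(dge$stat), na.rm = TRUE))
}

# Toy results table; padj can be NA, for example after independent filtering.
dge_toy <- data.frame(stat = c(5, -4, 1, -0.5),
                      padj = c(0.001, 0.01, NA, 0.6))

summarise_dge(dge_toy)
```

Because subset() drops rows where the condition is NA, the n_sig value agrees with dim(subset(dge,padj<0.05))[1]. The helper could be applied to both mvf_hi_eos and mvf_hi_eos_adj.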

POD1

No correction for treatment group.

# load chromosome-to-gene table
chr2gene <- read.table("../ref/chr2gene.tsv")
xyg <- subset(chr2gene,V1=="chrX" | V1=="chrY")

mx <- xpod1

dim(mx)

## [1] 60649   109

mx <- mx[which(! sapply(strsplit(rownames(mx)," "),"[[",1) %in% xyg$V2),]
dim(mx)

## [1] 57660   109

ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2 <- subset(ss2,crp_group==4)
mx <- mx[,colnames(mx) %in% rownames(ss2)]
mx <- mx[which(rowMeans(mx)>10),]
dim(mx)

## [1] 20547    54

# base model: sex as the only predictor
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD )

## converting counts to integer mode

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 231 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309     28.88719      2.3303369 0.2131694 10.931854
## ENSG00000251199.6 RP11-400D2.2  24.04062      1.5721928 0.1783430  8.815555
## ENSG00000249036.1 RP11-625I7.1  16.02735     -1.8716484 0.2140172 -8.745319
## ENSG00000287059.1 RP11-14A10.1  17.45351      1.2737217 0.2194248  5.804820
## ENSG00000223078.1 RNU2-55P      10.63492      1.3736437 0.2454948  5.595408
## ENSG00000142606.16 MMEL1        42.71184      1.1433146 0.2127386  5.374269
## ENSG00000280384.1 RP4-695O20.1  14.18123      0.9356208 0.1807317  5.176848
## ENSG00000162069.16 BICDL2       33.70854     -2.7279023 0.5323684 -5.124087
## ENSG00000254873.1 RP11-770J1.5  48.30434      1.8877549 0.3853291  4.899071
## ENSG00000268758.7 ADGRE4P      473.39915     -1.2894914 0.2855855 -4.515255
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    8.117291e-28 1.667779e-23
## ENSG00000251199.6 RP11-400D2.2 1.190934e-18 1.223447e-14
## ENSG00000249036.1 RP11-625I7.1 2.223873e-18 1.523056e-14
## ENSG00000287059.1 RP11-14A10.1 6.443507e-09 3.309708e-05
## ENSG00000223078.1 RNU2-55P     2.201039e-08 9.044510e-05
## ENSG00000142606.16 MMEL1       7.689389e-08 2.633103e-04
## ENSG00000280384.1 RP4-695O20.1 2.256658e-07 6.623612e-04
## ENSG00000162069.16 BICDL2      2.989829e-07 7.678627e-04
## ENSG00000254873.1 RP11-770J1.5 9.629062e-07 2.198208e-03
## ENSG00000268758.7 ADGRE4P      6.324051e-06 1.299339e-02

mean(abs(dge$stat))

## [1] 0.6861829

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 5 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309     28.88719       2.385083 0.2268342 10.514655
## ENSG00000251199.6 RP11-400D2.2  24.04062       1.570040 0.1886671  8.321744
## ENSG00000249036.1 RP11-625I7.1  16.02735      -1.776728 0.2305346 -7.706991
## ENSG00000287059.1 RP11-14A10.1  17.45351       1.416739 0.2211889  6.405110
## ENSG00000223078.1 RNU2-55P      10.63492       1.447740 0.2598982  5.570413
## ENSG00000280384.1 RP4-695O20.1  14.18123       0.920274 0.1922327  4.787291
## ENSG00000255398.3 HCAR3        337.50562      -2.477792 0.5269334 -4.702287
## ENSG00000162069.16 BICDL2       50.61023      -2.985381 0.6528841 -4.572605
## ENSG00000137261.15 KIAA0319     44.83620      -2.267737 0.5122779 -4.426770
## ENSG00000288700.1 RP11-22E12.2  31.68989      -2.035255 0.4605020 -4.419644
##                                      pvalue         padj
## ENSG00000234551.2 LINC01309    7.395101e-26 1.519471e-21
## ENSG00000251199.6 RP11-400D2.2 8.667817e-17 8.904882e-13
## ENSG00000249036.1 RP11-625I7.1 1.288186e-14 8.822788e-11
## ENSG00000287059.1 RP11-14A10.1 1.502613e-10 7.718545e-07
## ENSG00000223078.1 RNU2-55P     2.541368e-08 1.044350e-04
## ENSG00000280384.1 RP4-695O20.1 1.690472e-06 5.789023e-03
## ENSG00000255398.3 HCAR3        2.572637e-06 7.551426e-03
## ENSG00000162069.16 BICDL2      4.816974e-06 1.237180e-02
## ENSG00000137261.15 KIAA0319    9.565469e-06 2.031348e-02
## ENSG00000288700.1 RP11-22E12.2 9.886349e-06 2.031348e-02

mean(abs(dge$stat))

## [1] 0.6998276

mvf_hi_pod1 <- dge

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + sexD )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 27 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange     lfcSE      stat
## ENSG00000234551.2 LINC01309     28.88719      2.4115310 0.2449942  9.843215
## ENSG00000251199.6 RP11-400D2.2  24.04062      1.6172506 0.2030531  7.964666
## ENSG00000249036.1 RP11-625I7.1  16.02735     -1.7745609 0.2596696 -6.833918
## ENSG00000287059.1 RP11-14A10.1  17.45351      1.5247645 0.2436644  6.257643
## ENSG00000223078.1 RNU2-55P      10.63492      1.3888619 0.2622136  5.296682
## ENSG00000175084.13 DES          28.27529     -1.6131153 0.3231908 -4.991216
## ENSG00000287763.1 RP11-153P14.1 50.70110     -2.4542507 0.5217216 -4.704138
## ENSG00000142606.16 MMEL1        42.71184      0.9755700 0.2083536  4.682281
## ENSG00000254873.1 RP11-770J1.5  48.30434      1.6729181 0.3625609  4.614171
## ENSG00000280384.1 RP4-695O20.1  14.18123      0.9225726 0.2086620  4.421374
##                                       pvalue         padj
## ENSG00000234551.2 LINC01309     7.332945e-23 1.506700e-18
## ENSG00000251199.6 RP11-400D2.2  1.656702e-15 1.702013e-11
## ENSG00000249036.1 RP11-625I7.1  8.262641e-12 5.659083e-08
## ENSG00000287059.1 RP11-14A10.1  3.908403e-10 2.007649e-06
## ENSG00000223078.1 RNU2-55P      1.179260e-07 4.846052e-04
## ENSG00000175084.13 DES          6.000027e-07 2.054709e-03
## ENSG00000287763.1 RP11-153P14.1 2.549403e-06 7.286498e-03
## ENSG00000142606.16 MMEL1        2.837007e-06 7.286498e-03
## ENSG00000254873.1 RP11-770J1.5  3.946675e-06 9.010259e-03
## ENSG00000280384.1 RP4-695O20.1  9.807504e-06 2.015148e-02

mean(abs(dge$stat))

## [1] 0.8264306

mvf_hi_pod1_adj <- dge

dim(subset(mvf_hi_pod1,padj<0.05))

## [1] 18 60

Effect of surgery in females and males with high CRP

In females (sexD==1)

Only 16 females have both T0 and POD1 samples.

ss2 <- merge(sscell,ss,by=0)
rownames(ss2) <- ss2$Row.names

ss2 <- subset(ss2,crp_group==4 & timepoint != "EOS" & sexD == 1 )

mx <- xx[,colnames(xx) %in% rownames(ss2)]
mx <- mx[which(rowMeans(mx)>10),]
dim(mx)

## [1] 21567    33

table(chr2gene[match(sapply(strsplit(rownames(mx)," "),"[[",1),chr2gene$V2),1])

## 
##  chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 chr20 
##  2142   809  1139  1125   375   806   739  1104  1352   336  1548  1412   545 
## chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrM  chrX  chrY 
##   266   600  1147   760   931  1087  1110   724   805    19   654    32

ss2 <- ss2[which(rownames(ss2) %in% colnames(mx)),]
ss2 <- ss2[order(rownames(ss2)),]

ss2$timepoint <- factor(ss2$timepoint,levels=c("T0","POD1"))
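
The paired design used below estimates the within-participant change from T0 to POD1, so a participant with a sample at only one time point contributes little to the timepoint contrast. An odd sample count, such as the 33 columns reported by dim(mx) above, would be consistent with at least one unpaired sample. This is a minimal sketch of a completeness check on an invented sample sheet, assuming PG_number identifies the participant.

```r
# Toy sample sheet: PG03 has a T0 sample but no POD1 sample.
ss_toy <- data.frame(
  PG_number = c("PG01", "PG01", "PG02", "PG02", "PG03"),
  timepoint = factor(c("T0", "POD1", "T0", "POD1", "T0"),
                     levels = c("T0", "POD1")),
  stringsAsFactors = FALSE
)

# Cross-tabulate participants by time point and keep those present at both.
pair_tab <- table(ss_toy$PG_number, ss_toy$timepoint)
complete <- rownames(pair_tab)[rowSums(pair_tab > 0) == 2]
complete

# Restrict the sample sheet to complete pairs.
ss_paired <- ss_toy[ss_toy$PG_number %in% complete, ]
nrow(ss_paired)
```

Setting T0 as the first factor level, as done above, makes a positive log2FoldChange mean higher expression at POD1.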

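# Sex-chromosome genes are retained for this within-sex contrast (see the chrX and chrY counts above),
# so the filter below is left disabled.
#dim(mx)
#mx <- mx[which(! sapply(strsplit(rownames(mx)," "),"[[",1) %in% xyg$V2),]
#dim(mx)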

# base model: paired by participant (PG_number), testing timepoint (POD1 vs T0)
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ PG_number + timepoint )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                             baseMean log2FoldChange      lfcSE     stat
## ENSG00000108950.12 FAM20A  1418.5179       3.869053 0.19174287 20.17834
## ENSG00000014257.16 ACP3     869.0445       1.338041 0.07937339 16.85755
## ENSG00000156414.19 TDRD9    896.8816       2.319293 0.13761906 16.85299
## ENSG00000132170.24 PPARG    123.3816       3.213308 0.20483828 15.68705
## ENSG00000168615.13 ADAM9   1574.2542       1.624515 0.10366969 15.67010
## ENSG00000161944.16 ASGR2   2906.2035       1.620646 0.10347070 15.66285
## ENSG00000169385.3 RNASE2   1250.4443       2.346518 0.15249733 15.38727
## ENSG00000164125.16 GASK1B  3608.4252       1.654531 0.10827604 15.28067
## ENSG00000183019.7 MCEMP1   4357.8963       2.863233 0.18810447 15.22151
## ENSG00000203710.12 CR1    11629.4813       2.374281 0.16726003 14.19515
##                                 pvalue         padj
## ENSG00000108950.12 FAM20A 1.517611e-90 3.273032e-86
## ENSG00000014257.16 ACP3   9.233298e-64 7.170255e-60
## ENSG00000156414.19 TDRD9  9.973926e-64 7.170255e-60
## ENSG00000132170.24 PPARG  1.854888e-55 9.757214e-52
## ENSG00000168615.13 ADAM9  2.421816e-55 9.757214e-52
## ENSG00000161944.16 ASGR2  2.714484e-55 9.757214e-52
## ENSG00000169385.3 RNASE2  1.992547e-53 6.139037e-50
## ENSG00000164125.16 GASK1B 1.028698e-52 2.773242e-49
## ENSG00000183019.7 MCEMP1  2.546033e-52 6.101143e-49
## ENSG00000203710.12 CR1    9.817353e-46 2.117308e-42

mean(abs(dge$stat))

## [1] 2.247041

surgfemale <- dge
dim(subset(surgfemale,padj<0.05))

## [1] 8135   39

# paired model with cell-type covariates (no clinical covariates in this design)
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ PG_number + Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + timepoint )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## 2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange     lfcSE      stat
## ENSG00000165092.13 ALDH1A1   556.0900     -2.9820987 0.4109257 -7.257027
## ENSG00000108950.12 FAM20A   1418.5179      2.5446023 0.3561134  7.145483
## ENSG00000152518.8 ZFP36L2  21784.6356     -1.3510264 0.2412505 -5.600098
## ENSG00000161944.16 ASGR2    2906.2035      1.1293842 0.2188673  5.160131
## ENSG00000132170.24 PPARG     123.3816      2.4161730 0.5013513  4.819322
## ENSG00000204642.14 HLA-F   11602.3630     -0.6359361 0.1322543 -4.808434
## ENSG00000135218.19 CD36    10743.9529      1.0554441 0.2204909  4.786793
## ENSG00000156414.19 TDRD9     896.8816      1.5403556 0.3221443  4.781570
## ENSG00000203710.12 CR1     11629.4813      1.5053257 0.3160264  4.763290
## ENSG00000019169.11 MARCO    1204.4460      1.3706841 0.2912783  4.705754
##                                  pvalue         padj
## ENSG00000165092.13 ALDH1A1 3.956894e-13 5.721273e-09
## ENSG00000108950.12 FAM20A  8.967968e-13 6.483393e-09
## ENSG00000152518.8 ZFP36L2  2.142310e-08 1.032522e-04
## ENSG00000161944.16 ASGR2   2.467767e-07 8.920360e-04
## ENSG00000132170.24 PPARG   1.440472e-06 3.059869e-03
## ENSG00000204642.14 HLA-F   1.521174e-06 3.059869e-03
## ENSG00000135218.19 CD36    1.694679e-06 3.059869e-03
## ENSG00000156414.19 TDRD9   1.739316e-06 3.059869e-03
## ENSG00000203710.12 CR1     1.904614e-06 3.059869e-03
## ENSG00000019169.11 MARCO   2.529292e-06 3.657104e-03

mean(abs(dge$stat))

## [1] 0.7913256

surgfemale_adj <- dge

dim(subset(surgfemale_adj,padj<0.05))

## [1] 41 39

(dim(subset(surgfemale,padj<0.05))[1] - dim(subset(surgfemale_adj,padj<0.05))[1]) / dim(subset(surgfemale,padj<0.05))[1]

## [1] 0.99496
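
The value above is the fraction of significant genes from the base model that are no longer significant after adjusting for cell-type proportions. The same arithmetic can be written as a small helper. The name frac_lost is hypothetical, and the inputs are the female counts reported above (8135 and 41).

```r
# Hypothetical helper: fraction of base-model hits lost after adjustment.
frac_lost <- function(n_base, n_adj) {
  (n_base - n_adj) / n_base
}

# Female T0 vs POD1: 8135 genes in the base model, 41 after cell-type adjustment.
frac_lost(8135, 41)
```

So about 99.5% of the base-model hits do not remain significant once cell-type proportions are included in the design.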

In males (sexD==2)

38 males have both T0 and POD1 samples.

ss2 <- merge(sscell,ss,by=0)
rownames(ss2) <- ss2$Row.names

ss2 <- subset(ss2,crp_group==4 & timepoint != "EOS" & sexD == 2 )

mx <- xx[,colnames(xx) %in% rownames(ss2)]
mx <- mx[which(rowMeans(mx)>10),]

table(chr2gene[match(sapply(strsplit(rownames(mx)," "),"[[",1),chr2gene$V2),1])

## 
##  chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 chr20 
##  2146   816  1148  1130   380   814   743  1109  1354   334  1553  1416   548 
## chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrM  chrX  chrY 
##   264   606  1150   758   937  1084  1120   730   805    18   647    48

ss2 <- ss2[which(rownames(ss2) %in% colnames(mx)),]
ss2 <- ss2[order(rownames(ss2)),]

ss2$timepoint <- factor(ss2$timepoint,levels=c("T0","POD1"))

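# Sex-chromosome genes are retained for this within-sex contrast (see the chrX and chrY counts above),
# so the filter below is left disabled.
#dim(mx)
#mx <- mx[which(! sapply(strsplit(rownames(mx)," "),"[[",1) %in% xyg$V2),]
#dim(mx)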

# base model: paired by participant (PG_number), testing timepoint (POD1 vs T0)
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ PG_number + timepoint )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## 1 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                baseMean log2FoldChange      lfcSE     stat
## ENSG00000108950.12 FAM20A    1528.14262       3.829099 0.16185989 23.65687
## ENSG00000132170.24 PPARG      149.75730       3.301257 0.15532787 21.25348
## ENSG00000170439.7 METTL7B     166.27177       4.936676 0.24257968 20.35074
## ENSG00000121316.11 PLBD1    15706.49246       2.007735 0.11104993 18.07957
## ENSG00000163221.9 S100A12   16539.12795       3.113329 0.18300044 17.01268
## ENSG00000174705.13 SH3PXD2B   504.64277       3.214712 0.19136611 16.79875
## ENSG00000137869.15 CYP19A1     81.26793       6.512124 0.39413773 16.52246
## ENSG00000168615.13 ADAM9     1617.51324       1.541855 0.09517653 16.19995
## ENSG00000166033.13 HTRA1      134.28398       2.514197 0.15543931 16.17478
## ENSG00000169385.3 RNASE2     1315.10460       2.148848 0.13303322 16.15272
##                                    pvalue          padj
## ENSG00000108950.12 FAM20A   1.002886e-123 2.172051e-119
## ENSG00000132170.24 PPARG    3.061301e-100  3.315083e-96
## ENSG00000170439.7 METTL7B    4.573255e-92  3.301585e-88
## ENSG00000121316.11 PLBD1     4.616370e-73  2.499533e-69
## ENSG00000163221.9 S100A12    6.613770e-65  2.864821e-61
## ENSG00000174705.13 SH3PXD2B  2.492307e-63  8.996398e-60
## ENSG00000137869.15 CYP19A1   2.528798e-61  7.824100e-58
## ENSG00000168615.13 ADAM9     5.047116e-59  1.366380e-55
## ENSG00000166033.13 HTRA1     7.596758e-59  1.828118e-55
## ENSG00000169385.3 RNASE2     1.086669e-58  2.353508e-55

mean(abs(dge$stat))

## [1] 3.040762

surgmale <- dge
dim(subset(surgmale,padj<0.05))

## [1] 11793    82

# paired model with cell-type covariates (no clinical covariates in this design)
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ PG_number + Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + timepoint )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange     lfcSE      stat
## ENSG00000108950.12 FAM20A    1528.1426      2.7390970 0.2890367  9.476642
## ENSG00000163221.9 S100A12   16539.1280      1.7317637 0.2119146  8.171988
## ENSG00000132170.24 PPARG      149.7573      2.1957935 0.2737136  8.022231
## ENSG00000137959.17 IFI44L    1295.0650     -2.1478366 0.2917842 -7.361045
## ENSG00000088827.13 SIGLEC1   1468.6407     -2.0825545 0.2913658 -7.147559
## ENSG00000170439.7 METTL7B     166.2718      3.3486625 0.4788782  6.992723
## ENSG00000183019.7 MCEMP1     5938.1573      1.5733238 0.2251231  6.988727
## ENSG00000165092.13 ALDH1A1    518.0640     -2.2354764 0.3218900 -6.944846
## ENSG00000174705.13 SH3PXD2B   504.6428      2.3935512 0.3493225  6.851982
## ENSG00000116574.6 RHOU       1117.3491      0.9279911 0.1416458  6.551490
##                                   pvalue         padj
## ENSG00000108950.12 FAM20A   2.625986e-21 5.246195e-17
## ENSG00000163221.9 S100A12   3.033483e-16 3.030146e-12
## ENSG00000132170.24 PPARG    1.038414e-15 6.915143e-12
## ENSG00000137959.17 IFI44L   1.824764e-13 9.113785e-10
## ENSG00000088827.13 SIGLEC1  8.833456e-13 3.529496e-09
## ENSG00000170439.7 METTL7B   2.696019e-12 7.916782e-09
## ENSG00000183019.7 MCEMP1    2.773925e-12 7.916782e-09
## ENSG00000165092.13 ALDH1A1  3.788738e-12 9.461427e-09
## ENSG00000174705.13 SH3PXD2B 7.283397e-12 1.616752e-08
## ENSG00000116574.6 RHOU      5.696589e-11 1.138065e-07

mean(abs(dge$stat))

## [1] 0.9878333

surgmale_adj <- dge

dim(subset(surgmale_adj,padj<0.05))

## [1] 487  82

table(chr2gene[match(sapply(strsplit(rownames(surgmale_adj)," "),"[[",1),chr2gene$V2),1])

## 
##  chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 chr20 
##  2146   816  1148  1130   380   814   743  1109  1354   334  1553  1416   548 
## chr21 chr22  chr3  chr4  chr5  chr6  chr7  chr8  chr9  chrM  chrX  chrY 
##   264   606  1150   758   937  1084  1120   730   805    18   647    48

# proportion of significant genes lost after adjusting for cell type covariates
n_unadj <- nrow(subset(surgmale,padj<0.05))
n_adj <- nrow(subset(surgmale_adj,padj<0.05))
(n_unadj - n_adj) / n_unadj

## [1] 0.9587043
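
The DESeq2 message printed for the cell-covariate model above recommends centering and scaling numeric design variables with a large mean or standard deviation. A minimal sketch of one way to do that before building the dataset is shown here. The sample sheet `ss_toy`, its values, and the helper `center_scale_numeric` are hypothetical and only stand in for `ss2`; this is not part of the original analysis.

```r
# Toy sample sheet standing in for ss2; all values are made up for illustration.
ss_toy <- data.frame(
  timepoint = c("T0", "EOS", "T0", "EOS"),
  Monocytes.C = c(12.1, 18.4, 9.7, 15.2),
  Neutrophils.LD = c(40.3, 55.8, 38.9, 61.0),
  row.names = c("s1", "s2", "s3", "s4")
)

# Hypothetical helper: z-score every numeric column and leave other columns untouched.
center_scale_numeric <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  # scale() returns a one-column matrix, so convert back to a plain numeric vector
  df[is_num] <- lapply(df[is_num], function(x) as.numeric(scale(x)))
  df
}

ss_scaled <- center_scale_numeric(ss_toy)

# After scaling, each numeric covariate has mean ~0 and standard deviation 1.
colMeans(ss_scaled[, c("Monocytes.C", "Neutrophils.LD")])
apply(ss_scaled[, c("Monocytes.C", "Neutrophils.LD")], 2, sd)
```

Under this assumption the scaled sample sheet would be passed as `colData` in place of `ss2`. Rescaling a covariate changes the units of its own coefficient but should leave the test on the last design term (here `timepoint`) essentially unchanged, apart from improved convergence.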

Compare male and female

surgmale_up <- rownames(subset(surgmale,padj<0.05 & log2FoldChange >0))
surgmale_dn <- rownames(subset(surgmale,padj<0.05 & log2FoldChange <0))

surgfemale_up <- rownames(subset(surgfemale,padj<0.05 & log2FoldChange >0))
surgfemale_dn <- rownames(subset(surgfemale,padj<0.05 & log2FoldChange <0))

v1 <- list("male_up"=surgmale_up, "male_dn"=surgmale_dn,
  "female_up"=surgfemale_up,"female_dn"=surgfemale_dn)

plot(euler(v1),quantities = TRUE)

common=3541+3402

uniq=1700+472+684+3114

common/(common+uniq) #54% common

## [1] 0.5376752
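
The region counts above were typed in from the Euler diagram. As a sketch, the same kind of fraction could be computed directly from the gene lists so that it updates when the input changes. The toy vectors below stand in for `surgmale_up`, `surgmale_dn`, `surgfemale_up` and `surgfemale_dn`, and `concordant_fraction` is a hypothetical helper, not a function used elsewhere in this report.

```r
# Toy gene sets standing in for the male and female up/down lists.
male_up   <- c("g1", "g2", "g3", "g4")
male_dn   <- c("g5", "g6", "g7")
female_up <- c("g2", "g3", "g4", "g8")
female_dn <- c("g6", "g7", "g9")

# Hypothetical helper: fraction of all significant genes that are significant
# in both sexes with the same direction of change.
concordant_fraction <- function(m_up, m_dn, f_up, f_dn) {
  common <- length(intersect(m_up, f_up)) + length(intersect(m_dn, f_dn))
  total  <- length(union(c(m_up, m_dn), c(f_up, f_dn)))
  common / total
}

frac <- concordant_fraction(male_up, male_dn, female_up, female_dn)
frac
```

Note one assumption in this sketch: a gene that is up in one sex and down in the other counts once in the denominator and not at all in the numerator.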

Now look at infection

Infection in all samples

TODO: Look at cell composition by infection status.

Infection in all samples T0

In the base model, CCL3 is positively associated with infection outcome at T0 (log2FC 2.2), while CD177 is negatively associated (log2FC -2.5). CCL3 is involved in macrophage activation, while CD177 is a neutrophil activator. After correction for cell types and clinical covariates, CXCL2, CCL3 and PRRG3 remain associated with infection.

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 21

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 304 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                            baseMean log2FoldChange     lfcSE      stat
## ENSG00000277632.2 CCL3    559.74862      2.1662530 0.3345312  6.475489
## ENSG00000081041.9 CXCL2    53.13096      2.1990827 0.3779392  5.818615
## ENSG00000232810.4 TNF     428.02933      1.2897952 0.2419915  5.329919
## ENSG00000204936.10 CD177  172.96072     -2.5210703 0.4954814 -5.088123
## ENSG00000115590.14 IL1R2  713.50209     -2.5648448 0.5141818 -4.988206
## ENSG00000177606.8 JUN    4050.31490      1.5953388 0.3206681  4.975047
## ENSG00000123358.20 NR4A1 1665.75695      1.3430362 0.2847870  4.715932
## ENSG00000137331.12 IER3   881.33195      0.9049066 0.1921988  4.708180
## ENSG00000145632.15 PLK2    64.29472      0.9055198 0.1969684  4.597284
## ENSG00000146122.17 DAAM2   54.32940     -2.1255210 0.4886383 -4.349887
##                                pvalue         padj
## ENSG00000277632.2 CCL3   9.450545e-11 2.072977e-06
## ENSG00000081041.9 CXCL2  5.933706e-09 6.507792e-05
## ENSG00000232810.4 TNF    9.825632e-08 7.184175e-04
## ENSG00000204936.10 CD177 3.616246e-07 1.983059e-03
## ENSG00000115590.14 IL1R2 6.094254e-07 2.384767e-03
## ENSG00000177606.8 JUN    6.523183e-07 2.384767e-03
## ENSG00000123358.20 NR4A1 2.406071e-06 6.852998e-03
## ENSG00000137331.12 IER3  2.499384e-06 6.852998e-03
## ENSG00000145632.15 PLK2  4.280331e-06 1.043212e-02
## ENSG00000146122.17 DAAM2 1.362079e-05 2.987720e-02

mean(abs(dge$stat))

## [1] 0.8396572

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 27 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                baseMean log2FoldChange     lfcSE     stat
## ENSG00000130032.17 PRRG3       37.78635      1.4726086 0.2445894 6.020737
## ENSG00000277632.2 CCL3        559.74862      1.7221417 0.3518844 4.894056
## ENSG00000081041.9 CXCL2        53.13096      1.8603134 0.3958621 4.699397
## ENSG00000287970.1 CH17-98J9.1  15.30396      2.2967264 0.4953308 4.636753
## ENSG00000237973.1 MTCO1P12     33.56076      1.0869755 0.2385261 4.557051
## ENSG00000110436.13 SLC1A2      29.47504      0.9824034 0.2157723 4.552965
## ENSG00000109321.11 AREG       132.29571      2.0856862 0.4588181 4.545780
## ENSG00000079215.15 SLC1A3     219.79787      1.2256380 0.2719961 4.506087
## ENSG00000154099.18 DNAAF1      32.52378      0.8387759 0.1875360 4.472612
## ENSG00000276085.1 CCL3L1      308.64747      1.7359990 0.3916430 4.432606
##                                     pvalue         padj
## ENSG00000130032.17 PRRG3      1.736247e-09 3.808457e-05
## ENSG00000277632.2 CCL3        9.877874e-07 1.083356e-02
## ENSG00000081041.9 CXCL2       2.609306e-06 1.715072e-02
## ENSG00000287970.1 CH17-98J9.1 3.539251e-06 1.715072e-02
## ENSG00000237973.1 MTCO1P12    5.187685e-06 1.715072e-02
## ENSG00000110436.13 SLC1A2     5.289518e-06 1.715072e-02
## ENSG00000109321.11 AREG       5.473218e-06 1.715072e-02
## ENSG00000079215.15 SLC1A3     6.603414e-06 1.810574e-02
## ENSG00000154099.18 DNAAF1     7.726981e-06 1.883237e-02
## ENSG00000276085.1 CCL3L1      9.310094e-06 2.042169e-02

mean(abs(dge$stat))

## [1] 0.7424039

infec_t0 <- dge
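
DESeq2 repeatedly notes above that some factor levels in the design contain characters other than letters, numbers, '_' and '.'. The note is harmless, but a sketch of how the levels could be cleaned beforehand with base R `make.names()` follows. The toy sample sheet, its level names, and the helper `sanitise_levels` are hypothetical; the real levels of `wound_typeOP` and `ethnicityCAT` are not shown in this report.

```r
# Toy sample sheet; the level names are invented purely for illustration.
ss_toy <- data.frame(
  wound_typeOP = c("Clean-contaminated", "Clean", "Contaminated/dirty"),
  ethnicityCAT = c("Asian", "Other (mixed)", "Asian"),
  duration_sx = c(120, 95, 240),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert the named columns to factors whose levels
# contain only letters, numbers, '_' and '.'.
sanitise_levels <- function(df, cols) {
  for (col in cols) {
    f <- factor(df[[col]])
    # make.names() replaces every disallowed character with '.'
    levels(f) <- make.names(levels(f), unique = TRUE)
    df[[col]] <- f
  }
  df
}

ss_clean <- sanitise_levels(ss_toy, c("wound_typeOP", "ethnicityCAT"))
levels(ss_clean$wound_typeOP)
levels(ss_clean$ethnicityCAT)
```

Converting to factors explicitly also avoids the "converting to factors" warning for character columns. Numeric columns such as `duration_sx` are left as they are in this sketch.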

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 17 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                baseMean log2FoldChange     lfcSE     stat
## ENSG00000081041.9 CXCL2        53.13096      2.6792765 0.4027935 6.651737
## ENSG00000277632.2 CCL3        559.74862      2.1983492 0.3538976 6.211823
## ENSG00000130032.17 PRRG3       37.78635      1.3506034 0.2517614 5.364616
## ENSG00000154099.18 DNAAF1      32.52378      0.8881684 0.1897045 4.681851
## ENSG00000276085.1 CCL3L1      308.64747      1.7480191 0.3933633 4.443777
## ENSG00000222043.2 AC079305.10  21.70668      0.9359953 0.2240505 4.177608
## ENSG00000166592.12 RRAD        16.82374      0.9340806 0.2265840 4.122446
## ENSG00000112137.18 PHACTR1    249.38474      0.3878710 0.0957853 4.049379
## ENSG00000161835.11 TAMALIN    430.67506      0.6506758 0.1671846 3.891959
## ENSG00000168502.17 MTCL1       43.92701      0.6761268 0.1737423 3.891549
##                                     pvalue         padj
## ENSG00000081041.9 CXCL2       2.896543e-11 6.353566e-07
## ENSG00000277632.2 CCL3        5.237324e-10 5.744035e-06
## ENSG00000130032.17 PRRG3      8.112159e-08 5.931340e-04
## ENSG00000154099.18 DNAAF1     2.842964e-06 1.559010e-02
## ENSG00000276085.1 CCL3L1      8.839307e-06 3.877804e-02
## ENSG00000222043.2 AC079305.10 2.945903e-05 1.076973e-01
## ENSG00000166592.12 RRAD       3.748697e-05 1.174681e-01
## ENSG00000112137.18 PHACTR1    5.135372e-05 1.408055e-01
## ENSG00000161835.11 TAMALIN    9.943791e-05 2.184865e-01
## ENSG00000168502.17 MTCL1      9.960634e-05 2.184865e-01

mean(abs(dge$stat))

## [1] 0.7232922

infec_t0_adj <- dge

Look at RPM of top genes in a box plot.

# make RPM
rpm <- apply(mx,2,function(x) { x/sum(x) *1e6} )
# separate by infection status
rpm_i0 <- rpm[,which(colnames(rpm) %in% rownames(ss2[which(ss2$infec==0),]))]
rpm_i1 <- rpm[,which(colnames(rpm) %in% rownames(ss2[which(ss2$infec==1),]))]

# get sig hits 
top <- union(rownames(head(subset(infec_t0,padj<0.05),10)) , rownames(head(subset(infec_t0_adj,padj<0.05),10)) )

g <- top[1]

par(mfrow=c(2,3))
par(mar=c(2.1, 3.1, 2.1, 1.1))
# invisible() suppresses the list of NULLs that lapply() would otherwise print
invisible(lapply(top,function(g) {
  g0 <- rpm_i0[which(rownames(rpm_i0) == g),]
  g1 <- rpm_i1[which(rownames(rpm_i1) == g),]
  gl <- list("Ctrl"=log10(g0+0.1),"Infec"=log10(g1+0.1))
  boxplot(gl,cex=0,col="white",ylab="log10(RPM)")
  beeswarm(gl,add=TRUE,pch=19)
  mtext(g,cex=0.7)
}))

par(mfrow=c(1,1))

par(mar= c(5.1, 4.1, 4.1, 2.1) )
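
To make the reads-per-million (RPM) step above concrete, here is a small worked example on a toy count matrix. The matrix `counts_toy` and its values are made up. The `sweep()` variant is an equivalent alternative shown for comparison, not the method used in this report.

```r
# Toy count matrix: 3 genes x 2 samples (values are made up).
counts_toy <- matrix(c(10, 0, 90, 200, 300, 500), nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Same approach as in the report: divide each column by its total, times 1e6.
rpm_apply <- apply(counts_toy, 2, function(x) { x / sum(x) * 1e6 })

# Equivalent vectorised form using sweep() and colSums().
rpm_sweep <- sweep(counts_toy, 2, colSums(counts_toy), "/") * 1e6

# Every sample now sums to one million.
colSums(rpm_apply)

# The +0.1 pseudocount keeps zero counts finite on the log10 scale.
log_rpm <- log10(rpm_apply + 0.1)
log_rpm["geneB", "s1"]
```

With the pseudocount of 0.1, a gene with zero reads plots at log10(0.1) = -1, which is where the lowest points in the box plots sit.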

Infection in all samples EOS

Infection is associated with NFKB activation, but this is less clear after correction for cell types. After cell type correction, ZC3H12A and CD83 remain associated with infection.

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 77 21

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 140 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange     lfcSE      stat
## ENSG00000081041.9 CXCL2     45.457083      2.5452648 0.3389850  7.508487
## ENSG00000100906.11 NFKBIA 7744.851242      1.2083811 0.2045627  5.907143
## ENSG00000177606.8 JUN     2333.003081      1.5914940 0.2882386  5.521446
## ENSG00000125968.9 ID1       50.805092      1.9987675 0.3880398  5.150934
## ENSG00000112149.10 CD83    184.858033      1.2212714 0.2387950  5.114309
## ENSG00000125538.12 IL1B    539.354039      1.4830131 0.3087587  4.803146
## ENSG00000162772.17 ATF3     99.087468      1.3333849 0.2833832  4.705237
## ENSG00000132972.19 RNF17     9.554713     -4.8037268 1.0336382 -4.647397
## ENSG00000181649.8 PHLDA2    10.653718      1.3053781 0.2838003  4.599635
## ENSG00000183496.6 MEX3B     89.095978      0.5351826 0.1163839  4.598424
##                                 pvalue         padj
## ENSG00000081041.9 CXCL2   5.981471e-14 1.319931e-09
## ENSG00000100906.11 NFKBIA 3.480915e-09 3.840668e-05
## ENSG00000177606.8 JUN     3.362211e-08 2.473130e-04
## ENSG00000125968.9 ID1     2.591926e-07 1.389744e-03
## ENSG00000112149.10 CD83   3.148919e-07 1.389744e-03
## ENSG00000125538.12 IL1B   1.561916e-06 5.744468e-03
## ENSG00000162772.17 ATF3   2.535715e-06 7.993661e-03
## ENSG00000132972.19 RNF17  3.361503e-06 9.272286e-03
## ENSG00000181649.8 PHLDA2  4.232309e-06 9.393914e-03
## ENSG00000183496.6 MEX3B   4.256997e-06 9.393914e-03

mean(abs(dge$stat))

## [1] 0.9416072

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 19 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                             baseMean log2FoldChange     lfcSE      stat
## ENSG00000081041.9 CXCL2     45.45708      1.9177576 0.3443618  5.569019
## ENSG00000100906.11 NFKBIA 7744.85124      1.1001093 0.2196977  5.007377
## ENSG00000112149.10 CD83    184.85803      1.2101339 0.2539642  4.764979
## ENSG00000155090.15 KLF10  2270.44476      0.8164591 0.1841295  4.434156
## ENSG00000183496.6 MEX3B     89.09598      0.5545294 0.1252629  4.426924
## ENSG00000278196.3 IGLV2-8  167.59369     -1.4846410 0.3388031 -4.382017
## ENSG00000181649.8 PHLDA2    10.65372      1.2575257 0.2994339  4.199677
## ENSG00000177606.8 JUN     2333.00308      1.2431176 0.3028638  4.104543
## ENSG00000105697.9 HAMP      44.89965      0.9849616 0.2415030  4.078466
## ENSG00000107719.9 PALD1     80.25012      0.9632821 0.2407615  4.000980
##                                 pvalue         padj
## ENSG00000081041.9 CXCL2   2.561770e-08 0.0005653057
## ENSG00000100906.11 NFKBIA 5.517680e-07 0.0060879323
## ENSG00000112149.10 CD83   1.888734e-06 0.0138928949
## ENSG00000155090.15 KLF10  9.243361e-06 0.0421861780
## ENSG00000183496.6 MEX3B   9.558657e-06 0.0421861780
## ENSG00000278196.3 IGLV2-8 1.175855e-05 0.0432459737
## ENSG00000181649.8 PHLDA2  2.672963e-05 0.0842632443
## ENSG00000177606.8 JUN     4.051140e-05 0.1111535256
## ENSG00000105697.9 HAMP    4.533383e-05 0.1111535256
## ENSG00000107719.9 PALD1   6.308058e-05 0.1375554202

mean(abs(dge$stat))

## [1] 0.8005392

infec_eos <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 17 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange      lfcSE     stat
## ENSG00000203811.1 H3C14      21.12624      0.8409487 0.14963233 5.620100
## ENSG00000203852.3 H3C15      21.12624      0.8409487 0.14963233 5.620100
## ENSG00000163874.11 ZC3H12A 1241.05455      0.4046799 0.08262408 4.897845
## ENSG00000112149.10 CD83     184.85803      1.1174654 0.22913969 4.876787
## ENSG00000166716.10 ZNF592  3052.71249      0.2779405 0.05725090 4.854780
## ENSG00000163545.11 NUAK2   1213.80073      0.3931730 0.08265567 4.756757
## ENSG00000135540.11 NHSL1     81.38326      0.4925112 0.10577313 4.656298
## ENSG00000205189.12 ZBTB10   263.45971      0.5202030 0.11327291 4.592475
## ENSG00000183496.6 MEX3B      89.09598      0.5839444 0.12822548 4.554043
## ENSG00000155090.15 KLF10   2270.44476      0.7291764 0.16041129 4.545667
##                                  pvalue         padj
## ENSG00000203811.1 H3C14    1.908466e-08 0.0002105706
## ENSG00000203852.3 H3C15    1.908466e-08 0.0002105706
## ENSG00000163874.11 ZC3H12A 9.689361e-07 0.0053190451
## ENSG00000112149.10 CD83    1.078280e-06 0.0053190451
## ENSG00000166716.10 ZNF592  1.205203e-06 0.0053190451
## ENSG00000163545.11 NUAK2   1.967275e-06 0.0072353102
## ENSG00000135540.11 NHSL1   3.219464e-06 0.0101491291
## ENSG00000205189.12 ZBTB10  4.380195e-06 0.0115985264
## ENSG00000183496.6 MEX3B    5.262450e-06 0.0115985264
## ENSG00000155090.15 KLF10   5.476145e-06 0.0115985264

mean(abs(dge$stat))

## [1] 0.8969433

infec_eos_adj <- dge

Look at RPM of top genes in a box plot.

# make RPM
rpm <- apply(mx,2,function(x) { x/sum(x) *1e6} )
# separate by infection status
rpm_i0 <- rpm[,which(colnames(rpm) %in% rownames(ss2[which(ss2$infec==0),]))]
rpm_i1 <- rpm[,which(colnames(rpm) %in% rownames(ss2[which(ss2$infec==1),]))]

# get sig hits 
top <- union(rownames(head(subset(infec_eos,padj<0.05),10)) , rownames(head(subset(infec_eos_adj,padj<0.05),10)) )

g <- top[1]

par(mfrow=c(2,3))
par(mar=c(2.1, 3.1, 2.1, 1.1))
# invisible() suppresses the list of NULLs that lapply() would otherwise print
invisible(lapply(top,function(g) {
  g0 <- rpm_i0[which(rownames(rpm_i0) == g),]
  g1 <- rpm_i1[which(rownames(rpm_i1) == g),]
  gl <- list("Ctrl"=log10(g0+0.1),"Infec"=log10(g1+0.1))
  boxplot(gl,cex=0,col="white",ylab="log10(RPM)")
  beeswarm(gl,add=TRUE,pch=19)
  mtext(g,cex=0.7)
}))

par(mfrow=c(1,1))

par(mar= c(5.1, 4.1, 4.1, 2.1) )

Infection in all samples POD1

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 19

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 132 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000183598.4 H3C13          14.00786      2.4730926 0.42424324  5.829421
## ENSG00000101292.8 PROKR2         11.87666      1.3793099 0.23850677  5.783106
## ENSG00000274276.4 CBSL           21.26459      2.9221416 0.51504791  5.673534
## ENSG00000240098.3 RN7SL351P      21.82756      1.3063648 0.24582354  5.314238
## ENSG00000272282.1 LINC02084     149.90716     -0.8521057 0.16940446 -5.030007
## ENSG00000253766.1 RP11-804N13.1  18.05579      0.7600895 0.15368667  4.945709
## ENSG00000203814.6 H2BC18        153.69664      1.8605421 0.37723414  4.932062
## ENSG00000114993.17 RTKN          36.98959     -0.8192192 0.16951098 -4.832838
## ENSG00000245750.10 DRAIC         63.92483      1.0817504 0.22623781  4.781475
## ENSG00000120053.12 GOT1         217.93193     -0.3333793 0.07019228 -4.749516
##                                       pvalue         padj
## ENSG00000183598.4 H3C13         5.561991e-09 7.663033e-05
## ENSG00000101292.8 PROKR2        7.333397e-09 7.663033e-05
## ENSG00000274276.4 CBSL          1.398817e-08 9.744627e-05
## ENSG00000240098.3 RN7SL351P     1.071046e-07 5.595946e-04
## ENSG00000272282.1 LINC02084     4.904610e-07 2.050029e-03
## ENSG00000253766.1 RP11-804N13.1 7.586732e-07 2.429245e-03
## ENSG00000203814.6 H2BC18        8.136617e-07 2.429245e-03
## ENSG00000114993.17 RTKN         1.346001e-06 3.516258e-03
## ENSG00000245750.10 DRAIC        1.740137e-06 4.040792e-03
## ENSG00000120053.12 GOT1         2.039045e-06 4.261399e-03

mean(abs(dge$stat))

## [1] 1.39089

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 32 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange      lfcSE      stat
## ENSG00000183598.4 H3C13      14.00786      2.5169037 0.43246793  5.819862
## ENSG00000110436.13 SLC1A2    29.63018      1.3167429 0.24648776  5.342022
## ENSG00000263006.6 ROCK1P1    96.76662     -2.5091103 0.47412086 -5.292132
## ENSG00000272282.1 LINC02084 149.90716     -0.9347387 0.17964375 -5.203291
## ENSG00000114993.17 RTKN      36.98959     -0.9381823 0.18398572 -5.099213
## ENSG00000120053.12 GOT1     217.93193     -0.3637241 0.07595466 -4.788700
## ENSG00000101292.8 PROKR2     11.87666      1.1987355 0.25254711  4.746582
## ENSG00000100505.14 TRIM9     24.99684      1.2583687 0.26679635  4.716589
## ENSG00000274276.4 CBSL       21.26459      2.6082167 0.55726258  4.680409
## ENSG00000002726.21 AOC1      20.67108      1.5866839 0.34485261  4.601049
##                                   pvalue        padj
## ENSG00000183598.4 H3C13     5.889618e-09          NA
## ENSG00000110436.13 SLC1A2   9.191575e-08 0.001138439
## ENSG00000263006.6 ROCK1P1   1.208983e-07 0.001138439
## ENSG00000272282.1 LINC02084 1.957899e-07 0.001229104
## ENSG00000114993.17 RTKN     3.410692e-07 0.001605839
## ENSG00000120053.12 GOT1     1.678653e-06 0.006322816
## ENSG00000101292.8 PROKR2    2.068834e-06          NA
## ENSG00000100505.14 TRIM9    2.398313e-06 0.007527906
## ENSG00000274276.4 CBSL      2.863033e-06 0.007702786
## ENSG00000002726.21 AOC1     4.203675e-06 0.008706667

mean(abs(dge$stat))

## [1] 1.220618

infec_pod1 <- dge
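H3C13 and PROKR2 have a p-value but an NA adjusted p-value in the table above. In DESeq2 this pattern is what independent filtering produces for genes with a low mean count, whereas a gene flagged as a count outlier has NA in both columns. A minimal sketch for tallying the two cases, using a toy data frame whose values are invented for illustration:

```r
# Toy results table; values are illustrative only, not taken from this dataset
toy <- data.frame(
  baseMean = c(14.0, 29.6, 11.9, 96.8, 5.2),
  pvalue   = c(5.9e-09, 9.2e-08, 2.1e-06, 1.2e-07, NA),
  padj     = c(NA, 1.1e-03, NA, 1.1e-03, NA),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# p-value present but padj missing: consistent with independent filtering
filtered <- !is.na(toy$pvalue) & is.na(toy$padj)

# both missing: consistent with an outlier flag or an all-zero row
no_test <- is.na(toy$pvalue) & is.na(toy$padj)

n_filtered <- sum(filtered)
n_no_test <- sum(no_test)
filtered_genes <- rownames(toy)[filtered]
```

The same two logical vectors could be computed on `infec_pod1` to see how many genes fall into each case.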

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 11 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                baseMean log2FoldChange     lfcSE      stat
## ENSG00000107719.9 PALD1        53.63349      1.3069650 0.2802571  4.663451
## ENSG00000270550.1 IGHV3-30    240.99832     -1.2196819 0.2760123 -4.418940
## ENSG00000110436.13 SLC1A2      29.63018      1.1149531 0.2608810  4.273799
## ENSG00000171236.10 LRG1      1365.81495     -0.6649035 0.1730714 -3.841787
## ENSG00000263006.6 ROCK1P1      96.76662     -1.8170221 0.4806323 -3.780483
## ENSG00000246560.2 UBE2D3-AS1   23.77403     -0.4975790 0.1322363 -3.762803
## ENSG00000120262.10 CCDC170    141.74081      0.4625635 0.1242062  3.724159
## ENSG00000278196.3 IGLV2-8     119.81421     -1.1859894 0.3200049 -3.706161
## ENSG00000137563.13 GGH         64.90089     -0.4969402 0.1342343 -3.702035
## ENSG00000211663.2 IGLV3-19     97.65317     -1.0127961 0.2751574 -3.680788
##                                    pvalue       padj
## ENSG00000107719.9 PALD1      3.109507e-06 0.06627293
## ENSG00000270550.1 IGHV3-30   9.918614e-06 0.10569771
## ENSG00000110436.13 SLC1A2    1.921703e-05 0.13652420
## ENSG00000171236.10 LRG1      1.221418e-04 0.47559855
## ENSG00000263006.6 ROCK1P1    1.565244e-04 0.47559855
## ENSG00000246560.2 UBE2D3-AS1 1.680196e-04 0.47559855
## ENSG00000120262.10 CCDC170   1.959676e-04 0.47559855
## ENSG00000278196.3 IGLV2-8    2.104247e-04 0.47559855
## ENSG00000137563.13 GGH       2.138768e-04 0.47559855
## ENSG00000211663.2 IGLV3-19   2.325143e-04 0.47559855

mean(abs(dge$stat))

## [1] 0.8319592

infec_pod1_adj <- dge
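The DESeq2 message printed when the cell-covariate model was built recommends centring and scaling numeric covariates. A minimal base R sketch of that step on a toy sample sheet follows. The column names mirror terms in the design formula, but the values are invented, and this report does not show whether the covariates in `ss2` were already scaled:

```r
# Toy sample sheet; values are invented for illustration
toy_ss <- data.frame(
  duration_sx    = c(120, 95, 240, 180, 60),
  Neutrophils.LD = c(55.2, 61.0, 48.7, 70.3, 66.1),
  sexD           = c("M", "F", "F", "M", "F")
)

# Centre and scale only the numeric columns; leave categorical columns untouched
num_cols <- vapply(toy_ss, is.numeric, logical(1))
toy_scaled <- toy_ss
toy_scaled[num_cols] <- lapply(toy_ss[num_cols], function(x) as.numeric(scale(x)))

scaled_means <- colMeans(toy_scaled[num_cols])
scaled_sds <- vapply(toy_scaled[num_cols], sd, numeric(1))
```

After this step each numeric column has mean 0 and standard deviation 1, so coefficients are expressed per standard deviation of the covariate.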

Box plots of RPM for the top genes, split by infection status.

# convert counts to reads per million (RPM)
rpm <- apply(mx,2,function(x) { x/sum(x) *1e6} )
# separate by infection status
rpm_i0 <- rpm[,which(colnames(rpm) %in% rownames(ss2[which(ss2$infec==0),]))]
rpm_i1 <- rpm[,which(colnames(rpm) %in% rownames(ss2[which(ss2$infec==1),]))]

# up to 10 significant genes (padj < 0.05) from each model, most significant first
top <- union(rownames(head(subset(infec_pod1,padj<0.05),10)) , rownames(head(subset(infec_pod1_adj,padj<0.05),10)) )

g <- top[1]

par(mfrow=c(2,3))
par(mar=c(2.1, 3.1, 2.1, 1.1))
# invisible() suppresses the list of NULLs that lapply would otherwise print
invisible(lapply(top,function(g) {
  g0 <- rpm_i0[which(rownames(rpm_i0) == g),]
  g1 <- rpm_i1[which(rownames(rpm_i1) == g),]
  gl <- list("Ctrl"=log10(g0+0.1),"Infec"=log10(g1+0.1))
  boxplot(gl,cex=0,col="white",ylab="log10(RPM)")
  beeswarm(gl,add=TRUE,pch=19)
  mtext(g,cex=0.7)
}))

par(mfrow=c(1,1))

par(mar= c(5.1, 4.1, 4.1, 2.1) )
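The RPM scaling and log transform used for the box plots can be checked on a toy matrix. This is a minimal sketch with invented counts; each column should sum to one million after scaling:

```r
# Toy count matrix: 3 genes x 2 samples (invented values)
toy_mx <- matrix(c(10, 30, 60, 5, 5, 90), nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Divide each column by its library size, then multiply by 1e6
toy_rpm <- apply(toy_mx, 2, function(x) { x / sum(x) * 1e6 })

# log10 transform with the same 0.1 pseudocount used for the box plots
toy_log <- log10(toy_rpm + 0.1)
```

The pseudocount keeps genes with zero reads on the plot, at log10(0.1) = -1.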

Infection in high CRP group

ss2 <- subset(ss2,crp_group==4)

Infection in high CRP group T0

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
ss2 <- subset(ss2,crp_group==4)
table(ss2$infec)

## 
##  0  1 
## 40 15

mx <- mx[,colnames(mx) %in% rownames(ss2)]
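Filtering with `%in%` keeps the matching columns but leaves them in the count matrix's original order. A minimal sketch, on toy objects with invented sample names, of reordering the counts explicitly so that they follow the sample sheet:

```r
# Toy counts and sample sheet (invented sample names)
toy_mx <- matrix(1:12, nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3", "s4")))
toy_ss <- data.frame(infec = factor(c(1, 0, 0)),
  row.names = c("s4", "s1", "s3"))

# Keep only samples present in both objects, in the sample sheet's order
shared <- rownames(toy_ss)[rownames(toy_ss) %in% colnames(toy_mx)]
toy_mx2 <- toy_mx[, shared, drop = FALSE]
toy_ss2 <- toy_ss[shared, , drop = FALSE]

# TRUE when counts and sample sheet line up column-for-row
aligned <- identical(colnames(toy_mx2), rownames(toy_ss2))
```

Checking `aligned` before calling `DESeqDataSetFromMatrix` makes the sample matching explicit.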

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 251 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange     lfcSE      stat
## ENSG00000081041.9 CXCL2       54.67578      2.4412085 0.4042774  6.038448
## ENSG00000277632.2 CCL3       566.52696      2.2413827 0.3723372  6.019765
## ENSG00000204936.10 CD177     132.67404     -2.7945103 0.5647378 -4.948332
## ENSG00000125538.12 IL1B      566.87608      1.4468628 0.3030643  4.774111
## ENSG00000122877.17 EGR2      185.00839      1.9815033 0.4184339  4.735523
## ENSG00000276085.1 CCL3L1     255.05022      2.0480682 0.4365353  4.691644
## ENSG00000271614.1 ATP2B1-AS1 335.81186      0.8540205 0.1881685  4.538595
## ENSG00000162772.17 ATF3      210.49710      1.7356235 0.3830255  4.531352
## ENSG00000145632.15 PLK2       67.63251      1.1523297 0.2602033  4.428575
## ENSG00000278196.3 IGLV2-8    176.52078     -1.6259988 0.3704989 -4.388674
##                                    pvalue         padj
## ENSG00000081041.9 CXCL2      1.556031e-09 1.841376e-05
## ENSG00000277632.2 CCL3       1.746705e-09 1.841376e-05
## ENSG00000204936.10 CD177     7.485213e-07 5.260608e-03
## ENSG00000125538.12 IL1B      1.805026e-06 9.213347e-03
## ENSG00000122877.17 EGR2      2.184914e-06 9.213347e-03
## ENSG00000276085.1 CCL3L1     2.710190e-06 9.523608e-03
## ENSG00000271614.1 ATP2B1-AS1 5.663039e-06 1.544596e-02
## ENSG00000162772.17 ATF3      5.860733e-06 1.544596e-02
## ENSG00000145632.15 PLK2      9.485759e-06 2.186665e-02
## ENSG00000278196.3 IGLV2-8    1.140439e-05 2.186665e-02

mean(abs(dge$stat))

## [1] 0.7545652

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 14 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange     lfcSE      stat
## ENSG00000234200.2 U82671.8        9.544245    -30.0000000 3.4122477 -8.791859
## ENSG00000277632.2 CCL3          566.526963      2.1308743 0.3959351  5.381878
## ENSG00000081041.9 CXCL2          54.675784      2.3432113 0.4458406  5.255715
## ENSG00000276085.1 CCL3L1        255.050222      2.3337190 0.4600686  5.072546
## ENSG00000154099.18 DNAAF1        30.756835      1.0192608 0.2126929  4.792171
## ENSG00000125538.12 IL1B         566.876075      1.4998324 0.3335427  4.496672
## ENSG00000263089.1 RP11-166P13.4  30.702829      0.5983402 0.1366917  4.377298
## ENSG00000164047.6 CAMP          706.158203     -1.9909703 0.4795642 -4.151624
## ENSG00000229989.4 MIR181A1HG     21.621435      0.8412098 0.2043464  4.116586
## ENSG00000188396.4 DYNLT4         73.796111      0.8026334 0.1953743  4.108183
##                                       pvalue         padj
## ENSG00000234200.2 U82671.8      1.471055e-18 3.226760e-14
## ENSG00000277632.2 CCL3          7.371268e-08 8.084438e-04
## ENSG00000081041.9 CXCL2         1.474504e-07 1.078108e-03
## ENSG00000276085.1 CCL3L1        3.925277e-07 2.152524e-03
## ENSG00000154099.18 DNAAF1       1.649864e-06 7.237952e-03
## ENSG00000125538.12 IL1B         6.902519e-06 2.523446e-02
## ENSG00000263089.1 RP11-166P13.4 1.201595e-05 3.765284e-02
## ENSG00000164047.6 CAMP          3.301247e-05 8.611395e-02
## ENSG00000229989.4 MIR181A1HG    3.845253e-05 8.611395e-02
## ENSG00000188396.4 DYNLT4        3.987840e-05 8.611395e-02

mean(abs(dge$stat))

## [1] 0.7837256

infec_hi_t0 <- dge
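U82671.8 tops the table above with a log2 fold change of exactly -30. That looks more like a boundary value from the model fit than a biological effect, and it may correspond to one of the rows reported as not converging. A minimal sketch for setting such rows aside before interpretation follows. The toy table and the cutoff of 10 are assumptions for illustration:

```r
# Toy results table (invented values)
toy <- data.frame(
  log2FoldChange = c(-30, 2.13, 2.34, 15.69, -1.99),
  padj = c(3.2e-14, 8.1e-04, 1.1e-03, 0.02, 0.086),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

lfc_cutoff <- 10  # hypothetical threshold for an implausible fold change
toy$suspect <- abs(toy$log2FoldChange) > lfc_cutoff

# Significant hits with the suspect rows set aside
sig_ok <- rownames(toy)[!toy$suspect & !is.na(toy$padj) & toy$padj < 0.05]
suspects <- rownames(toy)[toy$suspect]
```

Rows listed in `suspects` would merit a look at their raw counts before being reported.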

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 37 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE     stat
## ENSG00000081041.9 CXCL2          54.67578      2.9887595 0.5656907 5.283381
## ENSG00000222043.2 AC079305.10    22.44175      1.2690003 0.2578361 4.921732
## ENSG00000263089.1 RP11-166P13.4  30.70283      0.6578750 0.1455111 4.521134
## ENSG00000154099.18 DNAAF1        30.75683      1.0395823 0.2349983 4.423787
## ENSG00000225450.1 RP3-508I15.14  20.11515      0.8892843 0.2030298 4.380069
## ENSG00000230184.1 SMYD3-IT1      15.13075      0.9215237 0.2136594 4.313051
## ENSG00000260103.2 RP11-10O17.1   50.62430      0.7192927 0.1673209 4.298881
## ENSG00000271614.1 ATP2B1-AS1    335.81186      0.7270082 0.1774110 4.097876
## ENSG00000188396.4 DYNLT4         73.79611      0.8149057 0.1999687 4.075165
## ENSG00000125538.12 IL1B         566.87608      1.1953484 0.3113801 3.838872
##                                       pvalue       padj
## ENSG00000081041.9 CXCL2         1.268215e-07 0.00278183
## ENSG00000222043.2 AC079305.10   8.578154e-07 0.00940809
## ENSG00000263089.1 RP11-166P13.4 6.150928e-06 0.04497353
## ENSG00000154099.18 DNAAF1       9.698570e-06 0.05204815
## ENSG00000225450.1 RP3-508I15.14 1.186418e-05 0.05204815
## ENSG00000230184.1 SMYD3-IT1     1.610172e-05 0.05379173
## ENSG00000260103.2 RP11-10O17.1  1.716627e-05 0.05379173
## ENSG00000271614.1 ATP2B1-AS1    4.169581e-05 0.11206758
## ENSG00000188396.4 DYNLT4        4.598168e-05 0.11206758
## ENSG00000125538.12 IL1B         1.236007e-04 0.27111815

mean(abs(dge$stat))

## [1] 0.761354

infec_hi_t0_adj <- dge
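The fit above reports 37 rows that did not converge in beta. One option, suggested by the message itself, is to run the DESeq2 steps separately, raise `maxit` in `nbinomWaldTest`, and drop any rows that still fail. The sketch below uses simulated data from `makeExampleDESeqDataSet` rather than the study data, and `maxit = 500` is an arbitrary choice:

```r
suppressPackageStartupMessages(library("DESeq2"))

set.seed(1)
# Simulated counts stand in for mx and ss2
toy_dds <- makeExampleDESeqDataSet(n = 1000, m = 8)

# The core steps that DESeq() wraps (outlier replacement is not repeated here),
# with explicit control over the number of Wald test iterations
toy_dds <- estimateSizeFactors(toy_dds)
toy_dds <- estimateDispersions(toy_dds, quiet = TRUE)
toy_dds <- nbinomWaldTest(toy_dds, maxit = 500, quiet = TRUE)

# betaConv is TRUE for converged rows and NA for rows that were not fitted
conv <- mcols(toy_dds)$betaConv
toy_clean <- toy_dds[which(conv), ]
toy_res <- results(toy_clean)
```

Whether a larger `maxit` resolves the non-converging rows in this dataset would need to be checked by rerunning the fit.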

Infection in high CRP group EOS

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
ss2 <- subset(ss2,crp_group==4)
table(ss2$infec)

## 
##  0  1 
## 36 16

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 157 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange     lfcSE      stat
## ENSG00000081041.9 CXCL2           59.38579      2.2675913 0.4119587  5.504415
## ENSG00000233608.4 TWIST2          31.85175      2.4148913 0.5301118  4.555438
## ENSG00000100906.11 NFKBIA       8328.93207      1.2218830 0.2758840  4.428974
## ENSG00000211679.2 IGLC3          717.15136     -1.8750700 0.4673205 -4.012385
## ENSG00000270972.1 RP11-326C3.15   90.07297      1.3355253 0.3412391  3.913753
## ENSG00000164683.18 HEY1           11.35677      1.5525717 0.3972293  3.908503
## ENSG00000163734.4 CXCL3           32.53411      1.0431076 0.2788941  3.740157
## ENSG00000137331.12 IER3         1035.52928      0.9442025 0.2525552  3.738599
## ENSG00000203811.1 H3C14           25.08126      1.0987773 0.2961847  3.709770
## ENSG00000203852.3 H3C15           25.08126      1.0987773 0.2961847  3.709770
##                                       pvalue        padj
## ENSG00000081041.9 CXCL2         3.703965e-08 0.000817354
## ENSG00000233608.4 TWIST2        5.227662e-06 0.057679408
## ENSG00000100906.11 NFKBIA       9.468247e-06 0.069645265
## ENSG00000211679.2 IGLC3         6.010832e-05 0.331602548
## ENSG00000270972.1 RP11-326C3.15 9.087261e-05 0.341559968
## ENSG00000164683.18 HEY1         9.286989e-05 0.341559968
## ENSG00000163734.4 CXCL3         1.839053e-04 0.457774341
## ENSG00000137331.12 IER3         1.850487e-04 0.457774341
## ENSG00000203811.1 H3C14         2.074475e-04 0.457774341
## ENSG00000203852.3 H3C15         2.074475e-04 0.457774341

mean(abs(dge$stat))

## [1] 0.7518441

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 5 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                             baseMean log2FoldChange     lfcSE      stat
## ENSG00000081041.9 CXCL2     59.38579      2.0807484 0.4097794  5.077728
## ENSG00000100906.11 NFKBIA 8328.93207      1.2324683 0.2822394  4.366747
## ENSG00000233608.4 TWIST2    31.85175      2.1222317 0.5269734  4.027208
## ENSG00000211679.2 IGLC3    717.15136     -1.8493501 0.4775844 -3.872300
## ENSG00000158714.11 SLAMF8  139.06390     -0.5474946 0.1429130 -3.830963
## ENSG00000203814.6 H2BC18   155.89198      1.6969900 0.4522577  3.752263
## ENSG00000181649.8 PHLDA2    11.78556      1.2185954 0.3315817  3.675098
## ENSG00000278196.3 IGLV2-8  136.83437     -1.3444988 0.3664133 -3.669350
## ENSG00000137331.12 IER3   1035.52928      0.9396829 0.2570760  3.655272
## ENSG00000203811.1 H3C14     25.08126      1.1390341 0.3116235  3.655161
##                                 pvalue        padj
## ENSG00000081041.9 CXCL2   3.819752e-07 0.008429047
## ENSG00000100906.11 NFKBIA 1.261105e-05 0.139144070
## ENSG00000233608.4 TWIST2  5.644302e-05 0.415176056
## ENSG00000211679.2 IGLC3   1.078131e-04 0.469157816
## ENSG00000158714.11 SLAMF8 1.276425e-04 0.469157816
## ENSG00000203814.6 H2BC18  1.752454e-04 0.469157816
## ENSG00000181649.8 PHLDA2  2.377583e-04 0.469157816
## ENSG00000278196.3 IGLV2-8 2.431677e-04 0.469157816
## ENSG00000137331.12 IER3   2.569090e-04 0.469157816
## ENSG00000203811.1 H3C14   2.570207e-04 0.469157816

mean(abs(dge$stat))

## [1] 0.7599704

infec_hi_eos <- dge
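The repeated note about factor levels can be avoided by sanitising level names before building the design. A minimal sketch with `make.names` on a toy factor follows. The level labels are hypothetical, since the actual levels of the categorical covariates are not shown in this report:

```r
# Hypothetical level labels containing a hyphen and a slash
toy_wound <- factor(c("Clean-contaminated", "Clean", "Dirty/infected", "Clean"))

# Rename the levels only; the assignment of samples to levels is unchanged
safe_wound <- toy_wound
levels(safe_wound) <- make.names(levels(toy_wound))

safe_levels <- levels(safe_wound)
```

Because only the labels change, model estimates are unaffected, although coefficient names in `resultsNames()` would use the new labels.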

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 22 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000137331.12 IER3         1035.52928      0.7141676 0.15437720  4.626121
## ENSG00000203811.1 H3C14           25.08126      0.7919624 0.18066746  4.383536
## ENSG00000203852.3 H3C15           25.08126      0.7919624 0.18066746  4.383536
## ENSG00000166716.10 ZNF592       3122.46108      0.2702272 0.06886622  3.923944
## ENSG00000136244.12 IL6            13.47291      1.5966335 0.42245821  3.779388
## ENSG00000211821.2 TRDV2           32.69504     -1.7812902 0.47211784 -3.772978
## ENSG00000196652.12 ZKSCAN5       402.77747      0.2149270 0.05831216  3.685800
## ENSG00000253651.1 SOD1P3          25.72541     -0.7099419 0.19470290 -3.646283
## ENSG00000281383.1 CH507-513H4.5   12.92813     -1.1564546 0.31791056 -3.637673
## ENSG00000273018.7 FAM106A        605.82214     -0.8674392 0.23918780 -3.626603
##                                       pvalue       padj
## ENSG00000137331.12 IER3         3.725777e-06 0.08221671
## ENSG00000203811.1 H3C14         1.167681e-05 0.08589073
## ENSG00000203852.3 H3C15         1.167681e-05 0.08589073
## ENSG00000166716.10 ZNF592       8.711086e-05 0.48056882
## ENSG00000136244.12 IL6          1.572143e-04 0.55011063
## ENSG00000211821.2 TRDV2         1.613106e-04 0.55011063
## ENSG00000196652.12 ZKSCAN5      2.279850e-04 0.55011063
## ENSG00000253651.1 SOD1P3        2.660606e-04 0.55011063
## ENSG00000281383.1 CH507-513H4.5 2.751128e-04 0.55011063
## ENSG00000273018.7 FAM106A       2.871745e-04 0.55011063

mean(abs(dge$stat))

## [1] 0.7913034

infec_hi_eos_adj <- dge
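With the clinical-covariate and cell-adjusted results stored for each contrast, one quick summary is how many significant genes each model yields and how many are shared. A minimal sketch on toy tables with invented values follows. The helper `sig_genes` is hypothetical, and the 0.05 cutoff is carried over from the box plot code:

```r
# Return the row names with padj below a cutoff, ignoring NA
sig_genes <- function(dge, cutoff = 0.05) {
  rownames(dge)[!is.na(dge$padj) & dge$padj < cutoff]
}

# Toy result tables for two models fitted to the same genes
toy_a <- data.frame(padj = c(0.001, 0.03, 0.2, NA),
  row.names = c("g1", "g2", "g3", "g4"))
toy_b <- data.frame(padj = c(0.01, 0.4, 0.04, 0.6),
  row.names = c("g1", "g2", "g3", "g4"))

sig_a <- sig_genes(toy_a)
sig_b <- sig_genes(toy_b)
shared <- intersect(sig_a, sig_b)
only_a <- setdiff(sig_a, sig_b)
only_b <- setdiff(sig_b, sig_a)
```

On the real data the same function could be applied to pairs such as `infec_hi_eos` and `infec_hi_eos_adj`.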

Infection in high CRP group POD1

Nothing of particular interest was found here.

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
ss2 <- subset(ss2,crp_group==4)
table(ss2$infec)

## 
##  0  1 
## 39 15

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 134 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange     lfcSE      stat
## ENSG00000263006.6 ROCK1P1   143.51840     -2.8457150 0.6127849 -4.643905
## ENSG00000114993.17 RTKN      33.45040     -0.9727041 0.2246229 -4.330387
## ENSG00000133067.18 LGR6     332.98785     -1.0190232 0.2389664 -4.264295
## ENSG00000272282.1 LINC02084 133.42431     -0.8483765 0.1995052 -4.252403
## ENSG00000163564.15 PYHIN1   778.61688     -0.7263181 0.1722034 -4.217790
## ENSG00000188011.5 RTP5      110.63679     -1.2773824 0.3113386 -4.102871
## ENSG00000101292.8 PROKR2     14.88931      1.2783959 0.3141276  4.069671
## ENSG00000162062.15 TEDC2     12.78794     -1.0962136 0.2693922 -4.069211
## ENSG00000161249.21 DMKN      25.84487     -0.9221472 0.2284010 -4.037404
## ENSG00000205176.3 REXO1L1P   35.31082     -2.0578515 0.5150296 -3.995599
##                                   pvalue       padj
## ENSG00000263006.6 ROCK1P1   3.418846e-06 0.07286245
## ENSG00000114993.17 RTKN     1.488477e-05 0.10515679
## ENSG00000133067.18 LGR6     2.005344e-05 0.10515679
## ENSG00000272282.1 LINC02084 2.114893e-05 0.10515679
## ENSG00000163564.15 PYHIN1   2.467079e-05 0.10515679
## ENSG00000188011.5 RTP5      4.080540e-05 0.11460022
## ENSG00000101292.8 PROKR2    4.707960e-05 0.11460022
## ENSG00000162062.15 TEDC2    4.717255e-05 0.11460022
## ENSG00000161249.21 DMKN     5.404593e-05 0.11460022
## ENSG00000205176.3 REXO1L1P  6.453100e-05 0.11460022

mean(abs(dge$stat))

## [1] 0.9575974

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 11 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

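# combine the test results with the variance-stabilised expression values
zz <- cbind(as.data.frame(z),assay(vsd))
# rank genes by p-value; dge stays sorted, so no second ordering is needed
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10)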

##                                   baseMean log2FoldChange      lfcSE      stat
## ENSG00000272282.1 LINC02084      133.42431     -0.9863098 0.19601677 -5.031762
## ENSG00000274226.5 TBC1D3H         18.51873     15.6901825 3.32880289  4.713461
## ENSG00000163564.15 PYHIN1        778.61688     -0.7954948 0.17713864 -4.490803
## ENSG00000114993.17 RTKN           33.45040     -1.0353321 0.23289568 -4.445476
## ENSG00000274611.4 TBC1D3          56.78775    -14.8321478 3.35408796 -4.422111
## ENSG00000242516.2 LINC00960      319.14668      0.6196175 0.14078957  4.401018
## ENSG00000145649.8 GZMA          1281.54145     -0.8266697 0.19163297 -4.313817
## ENSG00000287644.1 CTD-2267D19.8   50.47381      1.0383331 0.24105860  4.307389
## ENSG00000164366.4 CCDC127        367.82694     -0.2970838 0.07000958 -4.243474
## ENSG00000214940.8 NPIPA8         119.02853     -3.2783231 0.77394460 -4.235863
##                                       pvalue       padj
## ENSG00000272282.1 LINC02084     4.859917e-07 0.01035794
## ENSG00000274226.5 TBC1D3H       2.435444e-06 0.02595331
## ENSG00000163564.15 PYHIN1       7.095507e-06 0.03827248
## ENSG00000114993.17 RTKN         8.769762e-06 0.03827248
## ENSG00000274611.4 TBC1D3        9.774142e-06 0.03827248
## ENSG00000242516.2 LINC00960     1.077440e-05 0.03827248
## ENSG00000145649.8 GZMA          1.604594e-05 0.04400948
## ENSG00000287644.1 CTD-2267D19.8 1.651930e-05 0.04400948
## ENSG00000164366.4 CCDC127       2.200859e-05 0.04852456
## ENSG00000214940.8 NPIPA8        2.276759e-05 0.04852456

mean(abs(dge$stat))

## [1] 1.055163

infec_hi_pod1 <- dge

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 29 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10) # dge is already ordered by p-value

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000211675.2 IGLC1         434.36001     -1.2567577 0.26755308 -4.697228
## ENSG00000137563.13 GGH           74.09123     -0.6996762 0.16333393 -4.283716
## ENSG00000214940.8 NPIPA8        119.02853     -3.6034008 0.87843696 -4.102060
## ENSG00000014914.21 MTMR11      1006.38773      0.4366826 0.11421380  3.823378
## ENSG00000286482.1 RP3-395C13.2   58.43179      0.6167469 0.16419987  3.756074
## ENSG00000000460.17 C1orf112     186.09589      0.3118072 0.08349367  3.734501
## ENSG00000143228.13 NUF2          31.00317     -0.6012651 0.16638822 -3.613628
## ENSG00000242516.2 LINC00960     319.14668      0.5545684 0.15669239  3.539217
## ENSG00000225101.6 OR52K3P       332.46030      0.6931745 0.19617302  3.533485
## ENSG00000001630.17 CYP51A1      327.64662     -0.5637684 0.16018014 -3.519590
##                                      pvalue       padj
## ENSG00000211675.2 IGLC1        2.637165e-06 0.05620589
## ENSG00000137563.13 GGH         1.837974e-05 0.19586366
## ENSG00000214940.8 NPIPA8       4.094886e-05 0.29091434
## ENSG00000014914.21 MTMR11      1.316355e-04 0.66811827
## ENSG00000286482.1 RP3-395C13.2 1.725994e-04 0.66811827
## ENSG00000000460.17 C1orf112    1.880875e-04 0.66811827
## ENSG00000143228.13 NUF2        3.019420e-04 0.75627722
## ENSG00000242516.2 LINC00960    4.013152e-04 0.75627722
## ENSG00000225101.6 OR52K3P      4.101189e-04 0.75627722
## ENSG00000001630.17 CYP51A1     4.322141e-04 0.75627722

mean(abs(dge$stat))

## [1] 0.7980182

infec_hi_pod1_adj <- dge
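
Several of the covariate-adjusted fits report rows that did not converge in beta, and the message suggests a larger maxit argument for nbinomWaldTest. Here is a minimal sketch of that approach. It uses a simulated dataset from makeExampleDESeqDataSet purely so the chunk runs on its own; the object names toy, conv, toy_conv and z_toy are hypothetical and are not part of this analysis, and maxit = 500 is an arbitrary choice.

```r
suppressPackageStartupMessages(library("DESeq2"))

# simulated counts standing in for mx/ss2 (assumption: not the PADDI data)
set.seed(1)
toy <- makeExampleDESeqDataSet(n = 500, m = 12)

# DESeq() wraps these three steps; running them separately exposes maxit
toy <- estimateSizeFactors(toy)
toy <- estimateDispersions(toy, quiet = TRUE)
toy <- nbinomWaldTest(toy, maxit = 500, quiet = TRUE) # default maxit is 100

# convergence flag per row; NA for rows that were not fitted (all-zero counts)
conv <- mcols(toy)$betaConv
table(conv, useNA = "ifany")

# optionally keep only the converged rows before extracting results
toy_conv <- toy[which(conv), ]
z_toy <- results(toy_conv)
```

If rows still fail to converge after raising maxit, excluding them as in the last two lines is one option; whether that is appropriate here would need to be checked against the genes involved.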

Infection in low CRP group

Infection in low CRP group T0

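# T0 counts and sample sheet (clinical data plus cell-type covariates)
mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
# look up the 30-day infection status of each sample by PG_number
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)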

## 
##  0  1 
## 90 21

# keep the low CRP group only
ss2 <- subset(ss2,crp_group==1)
table(ss2$infec)

## 
##  0  1 
## 50  6

# restrict the count matrix to the retained samples
mx <- mx[,colnames(mx) %in% rownames(ss2)]
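
The %in% filter keeps the columns in whatever order the count matrix already had, so it relies on the matrix and the sample sheet being sorted the same way. Here is a small sketch of an alternative that indexes by sample name, which puts the columns in sample sheet order explicitly. The objects toy_mx and toy_samples and their values are invented for illustration.

```r
# toy count matrix with columns in a different order to the sample sheet
toy_mx <- matrix(1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s3", "s1", "s4", "s2")))

# toy sample sheet: three retained samples, one row per sample
toy_samples <- data.frame(infec = factor(c(0, 1, 0)),
  row.names = c("s1", "s2", "s3"))

# index by name so that columns follow the sample sheet order
toy_mx <- toy_mx[, rownames(toy_samples), drop = FALSE]

# confirm that columns and sample sheet rows line up
stopifnot(identical(colnames(toy_mx), rownames(toy_samples)))
colnames(toy_mx)
```

Indexing by name also fails with an error if a sample in the sheet has no matching column, which can be useful as a check.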

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 287 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10) # dge is already ordered by p-value

##                                      baseMean log2FoldChange     lfcSE
## ENSG00000260287.4 TBC1D3G            28.40074    -22.6109180 4.0339406
## ENSG00000137563.13 GGH               62.69110      1.1709886 0.2328459
## ENSG00000274827.4 LINC01297          92.16914      1.2945366 0.2671907
## ENSG00000140488.16 CELF6            230.87659     -0.6575787 0.1586643
## ENSG00000099139.14 PCSK5           1203.88986      1.5452196 0.3805552
## ENSG00000225096.3 XXbac-BPG55C20.7   22.75294      1.2127085 0.3089678
## ENSG00000243708.11 PLA2G4B         1104.99936     -0.5151795 0.1313353
## ENSG00000205176.3 REXO1L1P           17.38896      1.5705138 0.4015593
## ENSG00000232810.4 TNF               441.02353      1.9155598 0.4949741
## ENSG00000242686.4 PDE6B-AS1          23.59407     -1.5222134 0.4013725
##                                         stat       pvalue         padj
## ENSG00000260287.4 TBC1D3G          -5.605169 2.080520e-08 0.0004553011
## ENSG00000137563.13 GGH              5.029028 4.929737e-07 0.0053941178
## ENSG00000274827.4 LINC01297         4.844991 1.266173e-06 0.0092363132
## ENSG00000140488.16 CELF6           -4.144465 3.406078e-05 0.1863465053
## ENSG00000099139.14 PCSK5            4.060434 4.898150e-05 0.2143822300
## ENSG00000225096.3 XXbac-BPG55C20.7  3.925031 8.671847e-05 0.2513932114
## ENSG00000243708.11 PLA2G4B         -3.922628 8.758828e-05 0.2513932114
## ENSG00000205176.3 REXO1L1P          3.911038 9.190028e-05 0.2513932114
## ENSG00000232810.4 TNF               3.870021 1.088262e-04 0.2646169227
## ENSG00000242686.4 PDE6B-AS1        -3.792521 1.491258e-04 0.3173303951

mean(abs(dge$stat))

## [1] 0.7832468

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 21 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10) # dge is already ordered by p-value

##                               baseMean log2FoldChange     lfcSE      stat
## ENSG00000237973.1 MTCO1P12    40.32129       3.002411 0.4119019  7.289141
## ENSG00000260287.4 TBC1D3G     42.10428     -29.665053 5.2096228 -5.694280
## ENSG00000274611.4 TBC1D3      82.13231     -29.166989 5.4345099 -5.366995
## ENSG00000152463.15 OLAH       21.52114       4.020448 0.7659512  5.248960
## ENSG00000239887.6 C1orf226    24.28151       2.631563 0.5103541  5.156348
## ENSG00000274827.4 LINC01297   92.16914       1.501144 0.2973802  5.047895
## ENSG00000130032.17 PRRG3      43.99543       2.456920 0.4944207  4.969290
## ENSG00000174705.13 SH3PXD2B  180.42091       1.863691 0.4011284  4.646121
## ENSG00000096060.15 FKBP5    6876.05993       1.399670 0.3153332  4.438702
## ENSG00000198929.13 NOS1AP     14.55122       2.218786 0.5050922  4.392834
##                                   pvalue         padj
## ENSG00000237973.1 MTCO1P12  3.119365e-13 6.842328e-09
## ENSG00000260287.4 TBC1D3G   1.238934e-08 1.358801e-04
## ENSG00000274611.4 TBC1D3    8.005912e-08 5.853656e-04
## ENSG00000152463.15 OLAH     1.529599e-07 8.387938e-04
## ENSG00000239887.6 C1orf226  2.518132e-07 1.104705e-03
## ENSG00000274827.4 LINC01297 4.467048e-07 1.633078e-03
## ENSG00000130032.17 PRRG3    6.719846e-07 2.105712e-03
## ENSG00000174705.13 SH3PXD2B 3.382342e-06 9.273960e-03
## ENSG00000096060.15 FKBP5    9.050294e-06 2.180410e-02
## ENSG00000198929.13 NOS1AP   1.118825e-05 2.180410e-02

mean(abs(dge$stat))

## [1] 0.7658495

infec_lo_t0 <- dge
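
Each contrast is saved as its own table (infec_hi_pod1, infec_lo_t0 and so on), so it can help to tabulate how many genes pass an FDR threshold in each. Below is a minimal sketch using invented padj values; the list toy_results, its element names and the 0.05 cutoff are assumptions for illustration only.

```r
# toy results tables standing in for the saved dge objects (invented values)
toy_results <- list(
  contrast_a = data.frame(padj = c(0.001, 0.04, 0.20, NA)),
  contrast_b = data.frame(padj = c(0.30, 0.06, 0.049, 0.9))
)

# count genes with padj below the cutoff; padj can be NA, so drop those
fdr_cutoff <- 0.05
n_sig <- vapply(toy_results,
  function(d) sum(d$padj < fdr_cutoff, na.rm = TRUE),
  integer(1))
n_sig
```

With the real objects, the list would be built from the saved tables, for example list(infec_lo_t0 = infec_lo_t0).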

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 35 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10) # dge is already ordered by p-value

##                                    baseMean log2FoldChange     lfcSE      stat
## ENSG00000237973.1 MTCO1P12        40.321285      2.9361037 0.4322990  6.791836
## ENSG00000130032.17 PRRG3          43.995433      2.4165029 0.5073411  4.763073
## ENSG00000280064.1 RP11-205M5.3   192.566181     -4.2862829 0.9480431 -4.521190
## ENSG00000174705.13 SH3PXD2B      180.420909      1.5718498 0.3863045  4.068940
## ENSG00000074803.20 SLC12A1        84.816349     -4.2756070 1.0595137 -4.035443
## ENSG00000185201.17 IFITM2       5170.750055      1.0596384 0.2667563  3.972309
## ENSG00000183873.18 SCN5A          22.637470      1.6069010 0.4062225  3.955717
## ENSG00000102962.5 CCL22            9.737015      1.3544661 0.3439459  3.938021
## ENSG00000274827.4 LINC01297       92.169136      1.1883708 0.3018729  3.936659
## ENSG00000251661.3 RP11-326C3.11  202.018525     -0.8809665 0.2263599 -3.891885
##                                       pvalue         padj
## ENSG00000237973.1 MTCO1P12      1.107154e-11 2.428543e-07
## ENSG00000130032.17 PRRG3        1.906667e-06 2.091137e-02
## ENSG00000280064.1 RP11-205M5.3  6.149298e-06 4.496161e-02
## ENSG00000174705.13 SH3PXD2B     4.722745e-05 2.013724e-01
## ENSG00000074803.20 SLC12A1      5.449943e-05 2.013724e-01
## ENSG00000185201.17 IFITM2       7.117918e-05 2.013724e-01
## ENSG00000183873.18 SCN5A        7.630555e-05 2.013724e-01
## ENSG00000102962.5 CCL22         8.215634e-05 2.013724e-01
## ENSG00000274827.4 LINC01297     8.262375e-05 2.013724e-01
## ENSG00000251661.3 RP11-326C3.11 9.946850e-05 2.142215e-01

mean(abs(dge$stat))

## [1] 0.7297472

infec_lo_t0_adj <- dge
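
When the cell-type covariates are added, DESeq2 notes that numeric variables with a mean or standard deviation above 5 can induce collinearity with the intercept, and recommends centring and scaling them. Here is a minimal sketch of that step in base R. The data frame toy_covars and its values are invented; only the column names are borrowed from the design formula.

```r
# toy covariate table (invented values, not PADDI data)
toy_covars <- data.frame(
  Monocytes.C    = c(12.1, 8.4, 15.3, 9.9, 11.0, 7.2),
  Neutrophils.LD = c(55.0, 61.2, 48.7, 70.1, 66.3, 52.4),
  infec          = factor(c(0, 0, 0, 1, 1, 0))
)

# find the numeric columns; factors such as infec are left untouched
num_cols <- names(toy_covars)[vapply(toy_covars, is.numeric, logical(1))]

# centre each numeric column to mean 0 and scale it to standard deviation 1
toy_covars[num_cols] <- lapply(toy_covars[num_cols],
  function(v) as.numeric(scale(v)))

# check the result
round(colMeans(toy_covars[num_cols]), 10)
sapply(toy_covars[num_cols], sd)
```

Centring and scaling a covariate changes the units of its own coefficient and the intercept, but should leave the infec coefficient unchanged in principle, so the main expected benefit is better GLM convergence.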

Infection in low CRP group EOS

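# EOS counts and sample sheet (clinical data plus cell-type covariates)
mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
# look up the 30-day infection status of each sample by PG_number
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)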

## 
##  0  1 
## 77 21

# keep the low CRP group only
ss2 <- subset(ss2,crp_group==1)
table(ss2$infec)

## 
##  0  1 
## 41  5

# restrict the count matrix to the retained samples
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 104 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10) # dge is already ordered by p-value

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000260287.4 TBC1D3G        61.94712    -23.0988811 3.1581305 -7.314100
## ENSG00000268734.1 CTB-61M7.2     23.54191    -22.0540131 3.1126254 -7.085341
## ENSG00000273513.1 TBC1D3K        28.79301    -22.3280408 3.2081270 -6.959837
## ENSG00000284554.2 CTA-150C2.22   15.49624    -21.4867745 3.4122270 -6.296995
## ENSG00000274611.4 TBC1D3         96.90177    -23.9534141 4.7239603 -5.070621
## ENSG00000112149.10 CD83         193.46983      2.1178279 0.4497498  4.708903
## ENSG00000081041.9 CXCL2          28.29751      2.6857651 0.6118343  4.389694
## ENSG00000123358.20 NR4A1        473.18397      1.5297318 0.3531655  4.331488
## ENSG00000269937.1 RP11-20I23.8  397.39661      0.9816369 0.2282927  4.299905
## ENSG00000261613.2 RP11-20I23.13 231.27242      0.8820845 0.2167940  4.068768
##                                       pvalue         padj
## ENSG00000260287.4 TBC1D3G       2.591127e-13 5.711622e-09
## ENSG00000268734.1 CTB-61M7.2    1.387021e-12 1.528705e-08
## ENSG00000273513.1 TBC1D3K       3.406667e-12 2.503105e-08
## ENSG00000284554.2 CTA-150C2.22  3.034719e-10 1.672358e-06
## ENSG00000274611.4 TBC1D3        3.965189e-07 1.748093e-03
## ENSG00000112149.10 CD83         2.490541e-06 9.149834e-03
## ENSG00000081041.9 CXCL2         1.135104e-05 3.574441e-02
## ENSG00000123358.20 NR4A1        1.481053e-05 4.080857e-02
## ENSG00000269937.1 RP11-20I23.8  1.708714e-05 4.185021e-02
## ENSG00000261613.2 RP11-20I23.13 4.726228e-05 1.041802e-01

mean(abs(dge$stat))

## [1] 0.7858115

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 6 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10) # dge is already ordered by p-value

##                                  baseMean log2FoldChange     lfcSE       stat
## ENSG00000275954.5 TBC1D3F        33.51534     -29.883726 2.6369270 -11.332784
## ENSG00000259948.2 RP11-326A19.5  19.78023     -23.914298 3.0688020  -7.792714
## ENSG00000260287.4 TBC1D3G        61.94712     -29.395781 4.0841725  -7.197487
## ENSG00000267303.1 CTD-2369P2.12  31.24200     -20.463790 2.9549062  -6.925360
## ENSG00000268734.1 CTB-61M7.2     23.54191     -24.973470 4.1729640  -5.984588
## ENSG00000284554.2 CTA-150C2.22   15.49624     -29.083720 4.9560115  -5.868372
## ENSG00000273513.1 TBC1D3K        28.79301     -23.718876 4.1649549  -5.694870
## ENSG00000107719.9 PALD1         108.35203       2.675816 0.4857488   5.508642
## ENSG00000274611.4 TBC1D3         96.90177     -30.000000 5.5380902  -5.417030
## ENSG00000259661.1 AC068831.15    24.17674     -29.953491 5.5405222  -5.406258
##                                       pvalue         padj
## ENSG00000275954.5 TBC1D3F       9.028801e-30 1.992386e-25
## ENSG00000259948.2 RP11-326A19.5 6.558479e-15 7.236297e-11
## ENSG00000260287.4 TBC1D3G       6.133219e-13 4.511391e-09
## ENSG00000267303.1 CTD-2369P2.12 4.348676e-12 2.399056e-08
## ENSG00000268734.1 CTB-61M7.2    2.169383e-09 9.574356e-06
## ENSG00000284554.2 CTA-150C2.22  4.400946e-09 1.618595e-05
## ENSG00000273513.1 TBC1D3K       1.234661e-08 3.892182e-05
## ENSG00000107719.9 PALD1         3.616131e-08 9.974646e-05
## ENSG00000274611.4 TBC1D3        6.059722e-08 1.420126e-04
## ENSG00000259661.1 AC068831.15   6.435520e-08 1.420126e-04

mean(abs(dge$stat))

## [1] 0.8521819

infec_lo_eos <- dge

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 34 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10) # dge is already ordered by p-value

##                              baseMean log2FoldChange     lfcSE      stat
## ENSG00000107719.9 PALD1     108.35203      2.7072742 0.5116337  5.291431
## ENSG00000179820.16 MYADM   2493.44944      0.9060031 0.1919774  4.719322
## ENSG00000205189.12 ZBTB10   299.56466      1.1745843 0.2663775  4.409473
## ENSG00000106571.15 GLI3      14.71645      1.7690416 0.4074469  4.341772
## ENSG00000152217.20 SETBP1   444.67981      0.9087408 0.2102917  4.321334
## ENSG00000232021.7 LEF1-AS1  131.81316     -1.1189064 0.2604249 -4.296465
## ENSG00000127124.16 HIVEP3   733.20898      0.9356917 0.2329499  4.016707
## ENSG00000204381.12 LAYN      41.75720     -2.1081428 0.5316181 -3.965521
## ENSG00000139610.2 CELA1      21.39799     -1.9637641 0.4998412 -3.928776
## ENSG00000165655.17 ZNF503    95.07212      1.0592072 0.2726987  3.884166
##                                  pvalue        padj
## ENSG00000107719.9 PALD1    1.213631e-07 0.002678121
## ENSG00000179820.16 MYADM   2.366317e-06 0.026108757
## ENSG00000205189.12 ZBTB10  1.036224e-05 0.063826322
## ENSG00000106571.15 GLI3    1.413384e-05 0.063826322
## ENSG00000152217.20 SETBP1  1.550891e-05 0.063826322
## ENSG00000232021.7 LEF1-AS1 1.735433e-05 0.063826322
## ENSG00000127124.16 HIVEP3  5.901696e-05 0.186046741
## ENSG00000204381.12 LAYN    7.323586e-05 0.196012808
## ENSG00000139610.2 CELA1    8.537936e-05 0.196012808
## ENSG00000165655.17 ZNF503  1.026817e-04 0.196012808

mean(abs(dge$stat))

## [1] 0.7993698

infec_lo_eos_adj <- dge
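
Several top hits in the low CRP EOS comparison (TBC1D3F, TBC1D3G, TBC1D3K and others) have log2 fold changes between about -20 and -30, and the infected group has only 5 samples. Estimates of that size may reflect genes that are detected in just a few samples rather than a consistent group difference, so it is worth checking the raw counts before interpreting them. Below is a minimal sketch of such a check. The objects toy_dge, toy_counts and toy_infec hold invented values, and the cutoff of 10 is arbitrary.

```r
# toy results table and count matrix (invented values, not PADDI data)
toy_dge <- data.frame(
  log2FoldChange = c(-29.9, 2.7, -23.9, 0.9),
  padj = c(2e-25, 1e-4, 7e-11, 0.03),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)
toy_counts <- rbind(
  geneA = c(0, 950, 0, 0, 0, 0, 0, 0),
  geneB = c(20, 25, 18, 22, 19, 24, 140, 160),
  geneC = c(0, 0, 0, 610, 0, 0, 0, 0),
  geneD = c(200, 210, 190, 205, 195, 220, 380, 400)
)
toy_infec <- factor(c(0, 0, 0, 0, 0, 0, 1, 1))

# flag genes whose absolute log2 fold change exceeds the cutoff
lfc_cutoff <- 10
extreme <- rownames(toy_dge)[abs(toy_dge$log2FoldChange) > lfc_cutoff]

# for each flagged gene, count the samples per group with non-zero counts
nonzero_by_group <- sapply(levels(toy_infec), function(g)
  rowSums(toy_counts[extreme, toy_infec == g, drop = FALSE] > 0))
nonzero_by_group
```

In this toy example both flagged genes are detected in a single uninfected sample and in no infected samples, which is the pattern to look for in the real count matrix.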

Infection in low CRP group POD1

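# POD1 counts and sample sheet (clinical data plus cell-type covariates)
mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
# look up the 30-day infection status of each sample by PG_number
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)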

## 
##  0  1 
## 90 19

# keep the low CRP group only
ss2 <- subset(ss2,crp_group==1)
table(ss2$infec)

## 
##  0  1 
## 51  4

# restrict the count matrix to the retained samples
mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 83 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[,1:6],10) # dge is already ordered by p-value

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000263244.2 RP11-473I1.9  136.22535     -22.023769 2.6787476 -8.221666
## ENSG00000275302.2 CCL4          434.08075       2.677717 0.4420782  6.057112
## ENSG00000260287.4 TBC1D3G        37.77301     -22.309321 3.8418428 -5.806932
## ENSG00000253797.2 UTP14C         15.75173     -21.093285 3.8247900 -5.514887
## ENSG00000277632.2 CCL3          340.81892       3.646567 0.6877949  5.301824
## ENSG00000118503.15 TNFAIP3     3568.09110       1.596738 0.3018511  5.289821
## ENSG00000081041.9 CXCL2          80.72454       4.040491 0.7679762  5.261219
## ENSG00000138738.11 PRDM5         60.80353       2.245809 0.4301535  5.220949
## ENSG00000115604.12 IL18R1       653.78759       1.682109 0.3237240  5.196119
## ENSG00000163661.4 PTX3          101.95976       1.676977 0.3240238  5.175474
##                                      pvalue         padj
## ENSG00000263244.2 RP11-473I1.9 2.006952e-16 4.271598e-12
## ENSG00000275302.2 CCL4         1.385870e-09 1.474843e-05
## ENSG00000260287.4 TBC1D3G      6.362784e-09 4.514183e-05
## ENSG00000253797.2 UTP14C       3.490041e-08 1.857051e-04
## ENSG00000277632.2 CCL3         1.146515e-07 4.343210e-04
## ENSG00000118503.15 TNFAIP3     1.224359e-07 4.343210e-04
## ENSG00000081041.9 CXCL2        1.431032e-07 4.351154e-04
## ENSG00000138738.11 PRDM5       1.780087e-07 4.735920e-04
## ENSG00000115604.12 IL18R1      2.034916e-07 4.812351e-04
## ENSG00000163661.4 PTX3         2.273337e-07 4.838569e-04

mean(abs(dge$stat))

## [1] 1.010522
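
Several of the top genes in this base model have log2 fold changes near -22. Only 4 of the 55 samples in this subset are from patients with infection, and estimates of that size often arise when a gene has zero counts in every sample of one group, so that the fold change is effectively unbounded. A minimal sketch for flagging such genes is given here. The function name flag_zero_group_genes and the toy matrix are hypothetical and are not part of the original analysis.

```r
# Flag genes that are expressed overall but have zero counts in every
# sample of at least one group. Hypothetical helper for illustration only.
flag_zero_group_genes <- function(counts, group) {
  group <- factor(group)
  # One logical column per group: TRUE where the gene is all-zero in that group
  zero_by_group <- vapply(
    levels(group),
    function(g) rowSums(counts[, group == g, drop = FALSE]) == 0,
    logical(nrow(counts))
  )
  zero_by_group <- matrix(zero_by_group, nrow = nrow(counts))
  rownames(counts)[rowSums(zero_by_group) > 0 & rowSums(counts) > 0]
}

# Toy data: 4 uninfected samples followed by 2 infected samples
toy_counts <- rbind(
  geneA = c(10, 12, 8, 9, 0, 0),  # absent in the infected group
  geneB = c(5, 6, 7, 5, 6, 4),    # expressed in both groups
  geneC = c(0, 0, 0, 0, 0, 0)     # not expressed at all
)
toy_group <- c(0, 0, 0, 0, 1, 1)

flagged <- flag_zero_group_genes(toy_counts, toy_group)
print(flagged)
```

In this report the same helper could be called as flag_zero_group_genes(mx, ss2$infec), assuming the columns of mx are in the same order as the rows of ss2.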

# model with clinical covariates
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 18 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange     lfcSE      stat
## ENSG00000284931.1 CTD-2643I7.4  182.253678     -19.687232 2.2731674 -8.660705
## ENSG00000263244.2 RP11-473I1.9  136.225348     -24.271770 2.9269177 -8.292604
## ENSG00000261759.1 RP11-626G11.3  17.177769     -16.471967 2.3549384 -6.994648
## ENSG00000107719.9 PALD1          66.882232       3.580906 0.5359352  6.681603
## ENSG00000273513.1 TBC1D3K        21.954589     -30.000000 4.7368710 -6.333295
## ENSG00000260287.4 TBC1D3G        37.773009     -29.994660 4.8750658 -6.152668
## ENSG00000242534.3 IGKV2D-28       8.309503     -29.976588 5.2334295 -5.727905
## ENSG00000254614.2 AP003068.23   433.793525      -1.990339 0.3648507 -5.455215
## ENSG00000183598.4 H3C13           7.695684       4.293642 0.7944061  5.404845
## ENSG00000273025.1 RP11-106M3.5    8.374942     -19.257691 3.8184518 -5.043324
##                                       pvalue         padj
## ENSG00000284931.1 CTD-2643I7.4  4.688548e-18 9.992702e-14
## ENSG00000263244.2 RP11-473I1.9  1.107954e-16 1.180692e-12
## ENSG00000261759.1 RP11-626G11.3 2.659248e-12 1.889218e-08
## ENSG00000107719.9 PALD1         2.363428e-11 1.259293e-07
## ENSG00000273513.1 TBC1D3K       2.399805e-10 1.022941e-06
## ENSG00000260287.4 TBC1D3G       7.619029e-10 2.706406e-06
## ENSG00000242534.3 IGKV2D-28     1.016783e-08 3.095815e-05
## ENSG00000254614.2 AP003068.23   4.891360e-08 1.303119e-04
## ENSG00000183598.4 H3C13         6.486450e-08 1.536063e-04
## ENSG00000273025.1 RP11-106M3.5  4.575129e-07 9.750972e-04

mean(abs(dge$stat))

## [1] 0.8394566

infec_lo_pod1 <- dge
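
The covariate model above repeatedly prints a DESeq2 note that some factor levels contain characters other than letters, numbers, '_' and '.'. It is only a message, but it can be silenced by cleaning the levels before building the dataset. The sketch below uses base R make.names() on a toy factor. The level strings are invented for illustration, since the report does not show which column or levels trigger the note.

```r
# Toy factor with levels containing '-' and '/' (hypothetical values)
wound_type <- factor(c("Clean-contaminated", "Contaminated/dirty",
                       "Clean-contaminated"))

# make.names() replaces characters that are not valid in R names with '.'
levels(wound_type) <- make.names(levels(wound_type))

# Guard against two distinct levels collapsing to the same cleaned name
stopifnot(!anyDuplicated(levels(wound_type)))

print(levels(wound_type))
```

The same two lines could be applied to each character or factor column of ss2 used in the design formula before calling DESeqDataSetFromMatrix().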

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 24 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000107719.9 PALD1          66.88223      3.5670498 0.5651323  6.311884
## ENSG00000254614.2 AP003068.23   433.79352     -2.0091426 0.3923548 -5.120729
## ENSG00000183246.8 RIMBP3C        13.03160     -6.7488223 1.6053959 -4.203837
## ENSG00000260238.6 PMF1-BGLAP    123.59233     -1.0347448 0.2465450 -4.196982
## ENSG00000137869.15 CYP19A1       19.03905     -4.2589501 1.0316283 -4.128377
## ENSG00000096696.15 DSP           80.75129     -5.7470014 1.4246909 -4.033858
## ENSG00000255200.1 PGAM1P8        82.13391     -1.5327583 0.3818111 -4.014442
## ENSG00000146232.17 NFKBIE       978.84088     -0.8316586 0.2107895 -3.945447
## ENSG00000279662.1 RP11-609N14.4  38.13266      1.1351499 0.2916021  3.892804
## ENSG00000171236.10 LRG1         623.62715     -1.2953112 0.3394857 -3.815510
##                                       pvalue         padj
## ENSG00000107719.9 PALD1         2.756589e-10 5.875119e-06
## ENSG00000254614.2 AP003068.23   3.043562e-07 3.243372e-03
## ENSG00000183246.8 RIMBP3C       2.624281e-05 1.441270e-01
## ENSG00000260238.6 PMF1-BGLAP    2.704960e-05 1.441270e-01
## ENSG00000137869.15 CYP19A1      3.653332e-05 1.557269e-01
## ENSG00000096696.15 DSP          5.486838e-05 1.814243e-01
## ENSG00000255200.1 PGAM1P8       5.958666e-05 1.814243e-01
## ENSG00000146232.17 NFKBIE       7.965124e-05 2.122008e-01
## ENSG00000279662.1 RP11-609N14.4 9.909221e-05 2.346614e-01
## ENSG00000171236.10 LRG1         1.359018e-04 2.524306e-01

mean(abs(dge$stat))

## [1] 0.894695

infec_lo_pod1_adj <- dge
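
The model with cell-type covariates triggers a DESeq2 message recommending that numeric design variables be centred and scaled to improve GLM convergence. The message appears only once the cell-type covariates are added, which suggests (but does not prove) that those columns are the ones on a larger scale. A minimal sketch of centring and scaling with base R scale() follows. The data frame and its values are toy data invented for illustration.

```r
# Toy covariate table; the values are invented, only the column names
# are borrowed from the design formula above
covariates <- data.frame(
  Monocytes.C    = c(6.1, 8.4, 7.7, 9.0),
  Neutrophils.LD = c(40.5, 55.2, 61.0, 48.3),
  sexD           = c("F", "M", "M", "F")
)

# Centre and scale every numeric column, leaving other columns untouched
numeric_cols <- vapply(covariates, is.numeric, logical(1))
covariates[numeric_cols] <- lapply(
  covariates[numeric_cols],
  function(x) as.numeric(scale(x))
)

print(round(colMeans(covariates[numeric_cols]), 10))
print(sapply(covariates[numeric_cols], sd))
```

After this step each numeric column has mean 0 and standard deviation 1, so coefficients are expressed per standard deviation of the covariate rather than per original unit.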

Infection in treatment group A

Infection in treatment group A T0

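CCL3 seems fairly robust at T0: it is in the top 10 of the base model and of the model adjusted for clinical and cell-type covariates, although not in the top 10 of the model with clinical covariates only.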

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 21

ss2 <- subset(ss2,treatment_group==1)
table(ss2$infec)

## 
##  0  1 
## 43  7

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 83 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                             baseMean log2FoldChange     lfcSE      stat
## ENSG00000260287.4 TBC1D3G   23.44166    -22.5667772 3.4251829 -6.588488
## ENSG00000137331.12 IER3   1016.45739      1.6800322 0.2919380  5.754758
## ENSG00000147576.17 ADHFE1  934.81114     -0.6596945 0.1267333 -5.205377
## ENSG00000277632.2 CCL3     860.85616      2.9825889 0.5933781  5.026456
## ENSG00000182013.18 PNMA8A   11.92564      2.2127877 0.4552821  4.860256
## ENSG00000161905.13 ALOX15   23.63455      1.9571138 0.4039780  4.844604
## ENSG00000167766.19 ZNF83  2629.92337     -0.6390342 0.1392730 -4.588357
## ENSG00000234709.2 UPF3AP3   21.14844      0.9677018 0.2137213  4.527869
## ENSG00000232810.4 TNF      532.70238      2.0738894 0.4711473  4.401786
## ENSG00000099860.9 GADD45B 2291.47951      1.0463573 0.2385711  4.385935
##                                 pvalue         padj
## ENSG00000260287.4 TBC1D3G 4.443274e-11 9.746322e-07
## ENSG00000137331.12 IER3   8.676609e-09 9.516071e-05
## ENSG00000147576.17 ADHFE1 1.936035e-07 1.415564e-03
## ENSG00000277632.2 CCL3    4.996276e-07 2.739833e-03
## ENSG00000182013.18 PNMA8A 1.172340e-06 4.637947e-03
## ENSG00000161905.13 ALOX15 1.268643e-06 4.637947e-03
## ENSG00000167766.19 ZNF83  4.467481e-06 1.399917e-02
## ENSG00000234709.2 UPF3AP3 5.958154e-06 1.633651e-02
## ENSG00000232810.4 TNF     1.073636e-05 2.533245e-02
## ENSG00000099860.9 GADD45B 1.154887e-05 2.533245e-02

mean(abs(dge$stat))

## [1] 0.8864522
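
The value of mean(abs(dge$stat)) is reported for every contrast as a rough summary of how much signal it carries. As a reference point, if the Wald statistics were standard normal under the null, the expected value would be sqrt(2/pi), about 0.80, so the 0.89 seen here is only modestly above that. This is an approximation (the statistics are not exactly normal, and a few genes with extreme fold changes can shift the mean), and the check below is an illustration rather than part of the original analysis.

```r
# Expected mean absolute value of a standard normal statistic
null_expectation <- sqrt(2 / pi)

# Confirm by simulation with a fixed seed
set.seed(1)
simulated <- mean(abs(rnorm(1e5)))

print(null_expectation)
print(simulated)
```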

# model with clinical covariates
# including crp_group in the model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 34 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange     lfcSE      stat
## ENSG00000274611.4 TBC1D3          61.02093    -29.8751994 5.0803431 -5.880548
## ENSG00000273117.1 INSIG1-DT      109.77167      1.0347456 0.2115610  4.891004
## ENSG00000233673.7 ANAPC1P1        32.40291     -2.0654713 0.4283281 -4.822171
## ENSG00000167766.19 ZNF83        2629.92337     -0.8179630 0.1696786 -4.820660
## ENSG00000147576.17 ADHFE1        934.81114     -0.6784607 0.1420030 -4.777792
## ENSG00000237754.1 RP11-521C10.1   13.72225     -1.7214553 0.3762302 -4.575537
## ENSG00000279359.1 RP11-36D19.9    48.17870      2.5914247 0.5818684  4.453627
## ENSG00000284606.1 RP11-556O5.7   154.73434     -0.8365505 0.1905379 -4.390467
## ENSG00000234420.7 ZNF37BP       1994.48390     -0.6039811 0.1377019 -4.386149
## ENSG00000054598.9 FOXC1           29.68356      2.1647702 0.4973820  4.352329
##                                       pvalue         padj
## ENSG00000274611.4 TBC1D3        4.089113e-09 0.0000896947
## ENSG00000273117.1 INSIG1-DT     1.003227e-06 0.0077751224
## ENSG00000233673.7 ANAPC1P1      1.420039e-06 0.0077751224
## ENSG00000167766.19 ZNF83        1.430839e-06 0.0077751224
## ENSG00000147576.17 ADHFE1       1.772310e-06 0.0077751224
## ENSG00000237754.1 RP11-521C10.1 4.749993e-06 0.0173651821
## ENSG00000279359.1 RP11-36D19.9  8.443181e-06 0.0264573100
## ENSG00000284606.1 RP11-556O5.7  1.131074e-05 0.0281195177
## ENSG00000234420.7 ZNF37BP       1.153753e-05 0.0281195177
## ENSG00000054598.9 FOXC1         1.346988e-05 0.0295461927

mean(abs(dge$stat))

## [1] 1.094275

infec_a_t0 <- dge
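
The covariate-adjusted fit above reports 34 rows that did not converge in beta and suggests a larger maxit argument for nbinomWaldTest. A minimal sketch of that suggestion follows, run on a simulated dataset from makeExampleDESeqDataSet() rather than on the PADDI counts. The main steps that DESeq() wraps are called one by one so that maxit can be raised, and rows that still fail are dropped using mcols(...)$betaConv. The object names dds_toy and dds_clean and the value 500 are assumptions for illustration. The sketch is skipped when DESeq2 is not installed.

```r
max_iterations <- 500  # assumed value; larger than the default

if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library("DESeq2"))

  set.seed(1)
  # Simulated counts: 200 genes, 8 samples, design ~ condition
  dds_toy <- makeExampleDESeqDataSet(n = 200, m = 8)

  # Run the steps separately so maxit can be passed to the Wald test
  dds_toy <- estimateSizeFactors(dds_toy)
  dds_toy <- estimateDispersions(dds_toy)
  dds_toy <- nbinomWaldTest(dds_toy, maxit = max_iterations)

  # betaConv is TRUE for converged rows and NA for all-zero rows;
  # which() keeps only the rows that are TRUE
  converged <- mcols(dds_toy)$betaConv
  dds_clean <- dds_toy[which(converged), ]
  print(table(converged, useNA = "ifany"))
}
```

Whether to refit with more iterations or simply exclude the non-converged rows is a judgment call; with a few dozen rows out of tens of thousands, either choice is unlikely to change the overall picture.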

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 64 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                    baseMean log2FoldChange     lfcSE      stat
## ENSG00000185201.17 IFITM2        4666.12989      1.4240423 0.2766663  5.147148
## ENSG00000167766.19 ZNF83         2629.92337     -0.6483684 0.1400962 -4.628023
## ENSG00000147576.17 ADHFE1         934.81114     -0.6560167 0.1479958 -4.432671
## ENSG00000255439.6 RP11-196G11.1   124.80644      1.0393206 0.2390768  4.347225
## ENSG00000237754.1 RP11-521C10.1    13.72225     -1.6625727 0.3841827 -4.327557
## ENSG00000277632.2 CCL3            860.85616      3.1002874 0.7179283  4.318380
## ENSG00000214413.9 BBIP1           996.88114     -0.5444700 0.1266365 -4.299473
## ENSG00000269711.1 CTD-3214H19.16  167.76977      1.7714729 0.4122681  4.296895
## ENSG00000099860.9 GADD45B        2291.47951      1.0125171 0.2390706  4.235222
## ENSG00000001631.16 KRIT1         1422.42000     -0.5380767 0.1274059 -4.223325
##                                        pvalue        padj
## ENSG00000185201.17 IFITM2        2.644764e-07 0.003776723
## ENSG00000167766.19 ZNF83         3.691735e-06 0.026358989
## ENSG00000147576.17 ADHFE1        9.307297e-06 0.030379214
## ENSG00000255439.6 RP11-196G11.1  1.378711e-05 0.030379214
## ENSG00000237754.1 RP11-521C10.1  1.507722e-05          NA
## ENSG00000277632.2 CCL3           1.571784e-05 0.030379214
## ENSG00000214413.9 BBIP1          1.712049e-05 0.030379214
## ENSG00000269711.1 CTD-3214H19.16 1.732069e-05 0.030379214
## ENSG00000099860.9 GADD45B        2.283265e-05 0.030379214
## ENSG00000001631.16 KRIT1         2.407242e-05 0.030379214

mean(abs(dge$stat))

## [1] 1.105629

infec_a_t0_adj <- dge

Infection in treatment group A EOS

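At EOS, CCL3 is not in the top 10 of any of the three models below. The genes that recur in the top 10 of all three are C15orf39 and NFKBIA.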

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 77 21

ss2 <- subset(ss2,treatment_group==1)
table(ss2$infec)

## 
##  0  1 
## 35  7

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 101 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange     lfcSE     stat
## ENSG00000100906.11 NFKBIA   9734.50485      1.9018267 0.2801957 6.787494
## ENSG00000167173.19 C15orf39 2198.74076      1.1854728 0.1754536 6.756617
## ENSG00000081041.9 CXCL2       53.31381      3.7502477 0.5670770 6.613296
## ENSG00000044574.9 HSPA5     7137.27302      1.4389376 0.2276968 6.319535
## ENSG00000125968.9 ID1         57.20937      3.4112787 0.5672157 6.014077
## ENSG00000125538.12 IL1B      436.12157      3.0466361 0.5084909 5.991526
## ENSG00000105697.9 HAMP        58.47161      2.5690836 0.4306887 5.965059
## ENSG00000185650.10 ZFP36L1  6744.81308      0.9258669 0.1563981 5.919937
## ENSG00000162772.17 ATF3       76.10013      2.0524677 0.3561658 5.762675
## ENSG00000163251.4 FZD5        85.90925      2.1821102 0.3871234 5.636730
##                                   pvalue         padj
## ENSG00000100906.11 NFKBIA   1.140979e-11 1.467806e-07
## ENSG00000167173.19 C15orf39 1.412506e-11 1.467806e-07
## ENSG00000081041.9 CXCL2     3.758564e-11 2.603808e-07
## ENSG00000044574.9 HSPA5     2.623513e-10 1.363112e-06
## ENSG00000125968.9 ID1       1.809150e-09 7.200661e-06
## ENSG00000125538.12 IL1B     2.078813e-09 7.200661e-06
## ENSG00000105697.9 HAMP      2.445451e-09 7.260544e-06
## ENSG00000185650.10 ZFP36L1  3.220658e-09 8.366868e-06
## ENSG00000162772.17 ATF3     8.279089e-09 1.911826e-05
## ENSG00000163251.4 FZD5      1.733091e-08 3.475487e-05

mean(abs(dge$stat))

## [1] 1.330536

# model with clinical covariates
# including crp_group in the model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 27 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange     lfcSE      stat
## ENSG00000132972.19 RNF17      13.11765    -18.8470748 2.0399879 -9.238817
## ENSG00000167173.19 C15orf39 2198.74076      1.2877505 0.1865780  6.901942
## ENSG00000100906.11 NFKBIA   9734.50485      1.7301002 0.3252893  5.318651
## ENSG00000162772.17 ATF3       76.10013      2.1917122 0.4165951  5.261014
## ENSG00000128203.7 ASPHD2     196.20728      0.7694590 0.1510066  5.095531
## ENSG00000162783.11 IER5     2200.29728      0.9752990 0.1924133  5.068770
## ENSG00000126003.7 PLAGL2     959.67879      0.9304217 0.1873070  4.967362
## ENSG00000155090.15 KLF10    2110.66056      1.6217528 0.3285993  4.935350
## ENSG00000081041.9 CXCL2       53.31381      3.2453213 0.6641432  4.886478
## ENSG00000185650.10 ZFP36L1  6744.81308      0.8335282 0.1711427  4.870369
##                                   pvalue         padj
## ENSG00000132972.19 RNF17    2.492379e-20           NA
## ENSG00000167173.19 C15orf39 5.129622e-12 6.271989e-08
## ENSG00000100906.11 NFKBIA   1.045394e-07 6.391017e-04
## ENSG00000162772.17 ATF3     1.432635e-07           NA
## ENSG00000128203.7 ASPHD2    3.477647e-07 1.223907e-03
## ENSG00000162783.11 IER5     4.003947e-07 1.223907e-03
## ENSG00000126003.7 PLAGL2    6.786985e-07 1.630411e-03
## ENSG00000155090.15 KLF10    8.000710e-07 1.630411e-03
## ENSG00000081041.9 CXCL2     1.026557e-06           NA
## ENSG00000185650.10 ZFP36L1  1.113899e-06 1.754794e-03

mean(abs(dge$stat))

## [1] 1.129203

infec_a_eos <- dge
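
In the table above, RNF17 has the smallest p-value but an adjusted p-value of NA, as do ATF3 and CXCL2. In DESeq2 a non-missing p-value with an NA padj indicates that the gene was removed by independent filtering on mean normalised count before multiple-testing correction. The sketch below illustrates the idea with base R p.adjust(). The function name adjust_with_filter, the fixed threshold and the toy values are assumptions for illustration; DESeq2 chooses its threshold automatically.

```r
# Benjamini-Hochberg adjustment applied only to genes above a mean-count
# threshold; filtered genes keep their p-value but get padj = NA.
# Hypothetical helper for illustration only.
adjust_with_filter <- function(pvalue, base_mean, min_mean) {
  padj <- rep(NA_real_, length(pvalue))
  keep <- !is.na(pvalue) & base_mean >= min_mean
  padj[keep] <- p.adjust(pvalue[keep], method = "BH")
  padj
}

# Toy values: the first gene has the smallest p-value but a low mean count
toy_pvalue    <- c(0.001, 0.01, 0.04)
toy_base_mean <- c(10, 200, 500)

toy_padj <- adjust_with_filter(toy_pvalue, toy_base_mean, min_mean = 100)
print(toy_padj)
```

If adjusted p-values are wanted for every tested gene, results() accepts independentFiltering = FALSE, at the cost of a heavier multiple-testing burden.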

# model with clinical and cell-type covariates
# cell-type covariates: Monocytes.C, NK, T.CD8.Memory, T.CD4.Naive, Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 50 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange     lfcSE      stat
## ENSG00000167173.19 C15orf39 2198.74076      1.1001375 0.1618267  6.798246
## ENSG00000128203.7 ASPHD2     196.20728      0.7344183 0.1474197  4.981819
## ENSG00000126003.7 PLAGL2     959.67879      0.7834552 0.1586995  4.936720
## ENSG00000162783.11 IER5     2200.29728      0.9946933 0.2029167  4.901978
## ENSG00000159403.18 C1R        88.12334      0.7921169 0.1670592  4.741535
## ENSG00000183779.7 ZNF703     228.78586      1.3184889 0.2781708  4.739854
## ENSG00000138744.16 NAAA     2419.06871      0.7627905 0.1611521  4.733358
## ENSG00000256039.2 LINC02446  229.89248     -1.6703850 0.3567986 -4.681591
## ENSG00000100906.11 NFKBIA   9734.50485      1.1479049 0.2469534  4.648266
## ENSG00000164086.10 DUSP7    1004.95339      0.6072267 0.1326262  4.578483
##                                   pvalue         padj
## ENSG00000167173.19 C15orf39 1.059005e-11 1.249520e-07
## ENSG00000128203.7 ASPHD2    6.298919e-07 2.798624e-03
## ENSG00000126003.7 PLAGL2    7.944723e-07 2.798624e-03
## ENSG00000162783.11 IER5     9.487667e-07 2.798624e-03
## ENSG00000159403.18 C1R      2.121050e-06           NA
## ENSG00000183779.7 ZNF703    2.138721e-06 4.342736e-03
## ENSG00000138744.16 NAAA     2.208358e-06 4.342736e-03
## ENSG00000256039.2 LINC02446 2.846567e-06 4.798092e-03
## ENSG00000100906.11 NFKBIA   3.347376e-06 4.936960e-03
## ENSG00000164086.10 DUSP7    4.683604e-06 6.140205e-03

mean(abs(dge$stat))

## [1] 0.9829517

infec_a_eos_adj <- dge

Infection in treatment group A POD1

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 19

ss2 <- subset(ss2,treatment_group==1)
table(ss2$infec)

## 
##  0  1 
## 43  6

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 87 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange     lfcSE      stat
## ENSG00000263244.2 RP11-473I1.9   96.95970    -24.0528429 2.6558198 -9.056655
## ENSG00000183598.4 H3C13          16.79863      4.1384840 0.5559402  7.444118
## ENSG00000167536.14 DHRS13       630.19092      2.0049535 0.2875645  6.972187
## ENSG00000138738.11 PRDM5         93.01623      2.6786805 0.3850836  6.956100
## ENSG00000173757.10 STAT5B      9447.69305      0.9030826 0.1320213  6.840429
## ENSG00000197324.9 LRP10        8557.80925      1.3644779 0.2004622  6.806658
## ENSG00000182541.18 LIMK2       4132.98151      1.4101831 0.2115824  6.664936
## ENSG00000169180.12 XPO6        8932.98053      1.5194701 0.2293696  6.624550
## ENSG00000100505.14 TRIM9         25.92048      2.3314459 0.3562837  6.543791
## ENSG00000089351.15 GRAMD1A     5423.79264      1.1032986 0.1686106  6.543470
##                                      pvalue         padj
## ENSG00000263244.2 RP11-473I1.9 1.345128e-19 2.864719e-15
## ENSG00000183598.4 H3C13        9.759450e-14 1.039235e-09
## ENSG00000167536.14 DHRS13      3.120512e-12 1.862531e-08
## ENSG00000138738.11 PRDM5       3.498204e-12 1.862531e-08
## ENSG00000173757.10 STAT5B      7.895652e-12 3.363074e-08
## ENSG00000197324.9 LRP10        9.989212e-12 3.545671e-08
## ENSG00000182541.18 LIMK2       2.647805e-11 8.055759e-08
## ENSG00000169180.12 XPO6        3.483072e-11 9.272373e-08
## ENSG00000100505.14 TRIM9       5.997847e-11 1.280113e-07
## ENSG00000089351.15 GRAMD1A     6.010765e-11 1.280113e-07

mean(abs(dge$stat))

## [1] 1.394202
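
The top hit in this base model, RP11-473I1.9, has an estimated log2 fold change of about -24, far outside the range of the other genes. With only six infected samples in this subset, such an estimate may reflect zero or near-zero counts in one group and not a real effect. One way to flag such rows for manual inspection is sketched below; the cutoff of 10 is an arbitrary assumption, and `dge_toy` copies three rounded rows from the table above.

```r
# Three rows from the base-model table, rounded.
dge_toy <- data.frame(
  baseMean = c(96.96, 16.80, 630.19),
  log2FoldChange = c(-24.05, 4.14, 2.00),
  row.names = c("RP11-473I1.9", "H3C13", "DHRS13"))

lfc_cutoff <- 10  # arbitrary threshold, for illustration only
extreme <- dge_toy[abs(dge_toy$log2FoldChange) > lfc_cutoff, , drop = FALSE]
extreme
```

For flagged genes, inspecting per-sample normalized counts (for example with `counts(res, normalized=TRUE)`) would show whether a handful of samples drives the estimate.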

# model with clinical covariates
# including crp_group in the model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
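
The message above means at least one design variable is stored as a number with whole-number values, so DESeq2 fits it as a continuous slope. If a variable is really a category, it can be converted before building the dataset. The sketch below finds such columns in a toy sample sheet. The report does not say which covariate triggered the message, so the column names here are placeholders.

```r
# Toy sample sheet; column names and values are hypothetical.
ss_toy <- data.frame(
  group_code = c(1, 2, 3, 1),            # categorical code stored as numeric
  age_scaled = c(-0.5, 0.1, 1.2, -0.8),  # genuinely continuous
  sex        = c("F", "M", "F", "M"))

# Numeric columns whose values are all whole numbers
is_int_valued <- vapply(ss_toy, function(x)
  is.numeric(x) && all(x == round(x), na.rm = TRUE), logical(1))
names(ss_toy)[is_int_valued]

# Convert only the ones that are truly categorical
ss_toy$group_code <- factor(ss_toy$group_code)
str(ss_toy)
```

Variables that are meant to be ordinal or continuous (a duration, for example) can reasonably stay numeric, and the message can then be ignored.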

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]
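
The note about unsafe characters is informational, but sanitizing factor levels up front keeps coefficient names predictable. A sketch using base R `make.names()` follows; the level strings are invented examples, not the study's actual levels.

```r
# Hypothetical factor with spaces and a slash in its levels
eth <- factor(c("Group A/B", "Group C", "Group A/B"))

# make.names() replaces characters that are not valid in R names with "."
levels(eth) <- make.names(levels(eth))
levels(eth)
```

Applying this once to the sample sheet would silence the repeated note in every later model.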

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 34 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
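
The convergence message suggests refitting with more iterations. DESeq2 allows the steps wrapped by `DESeq()` to be run separately so that `maxit` can be passed to `nbinomWaldTest()`. The sketch below uses the package's simulated example data instead of the study counts, and `maxit = 500` is an arbitrary choice.

```r
library("DESeq2")

# Simulated dataset shipped with DESeq2; stands in for mx/ss2.
set.seed(1)
dds_toy <- makeExampleDESeqDataSet(n = 200, m = 8)

# The main steps of DESeq(), run one at a time
dds_toy <- estimateSizeFactors(dds_toy)
dds_toy <- estimateDispersions(dds_toy)
dds_toy <- nbinomWaldTest(dds_toy, maxit = 500)

# Rows that still failed to converge can be counted or dropped
table(mcols(dds_toy)$betaConv, useNA = "ifany")
dds_conv <- dds_toy[which(mcols(dds_toy)$betaConv), ]
```

Note that `DESeq()` can also replace count outliers and refit, as in the base-model runs in this report. The manual sequence here does not do that, so results may differ slightly for affected genes.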

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange     lfcSE     stat
## ENSG00000183598.4 H3C13       16.79863      4.6624411 0.6164893 7.562890
## ENSG00000089351.15 GRAMD1A  5423.79264      1.3137004 0.2044049 6.426953
## ENSG00000197324.9 LRP10     8557.80925      1.5022401 0.2436527 6.165496
## ENSG00000167536.14 DHRS13    630.19092      2.1052288 0.3439057 6.121529
## ENSG00000136156.15 ITM2B   15392.40802      1.0002585 0.1660492 6.023867
## ENSG00000100505.14 TRIM9      25.92048      2.5550712 0.4286519 5.960713
## ENSG00000173757.10 STAT5B   9447.69305      0.9290043 0.1559647 5.956505
## ENSG00000100368.14 CSF2RB   8859.06475      1.5426925 0.2636278 5.851783
## ENSG00000160710.18 ADAR    11737.71868      0.6009997 0.1035677 5.802964
## ENSG00000068383.19 INPP5A    516.82621      0.9573477 0.1658020 5.774042
##                                  pvalue         padj
## ENSG00000183598.4 H3C13    3.942104e-14 7.912985e-10
## ENSG00000089351.15 GRAMD1A 1.301870e-10 1.306622e-06
## ENSG00000197324.9 LRP10    7.026244e-10 4.650995e-06
## ENSG00000167536.14 DHRS13  9.268161e-10 4.650995e-06
## ENSG00000136156.15 ITM2B   1.702978e-09 6.836777e-06
## ENSG00000100505.14 TRIM9   2.511396e-09 7.389443e-06
## ENSG00000173757.10 STAT5B  2.576899e-09 7.389443e-06
## ENSG00000100368.14 CSF2RB  4.863322e-09 1.220268e-05
## ENSG00000160710.18 ADAR    6.515278e-09 1.453124e-05
## ENSG00000068383.19 INPP5A  7.739192e-09 1.553488e-05

mean(abs(dge$stat))

## [1] 1.34301

infec_a_pod1 <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.
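
DESeq2's message recommends centering and scaling numeric covariates that have a large mean or standard deviation. The cell-type covariates added in this model are plausible candidates, though the report does not say which variable triggered the message. A base R sketch with made-up values:

```r
# Toy covariates; values are made up for illustration.
ss_toy <- data.frame(
  Neutrophils.LD = c(55, 62, 48, 70),
  NK             = c(6, 9, 4, 7))

num_cols <- c("Neutrophils.LD", "NK")
# scale() returns a one-column matrix; as.numeric() makes it a plain vector
ss_toy[num_cols] <- lapply(ss_toy[num_cols], function(x) as.numeric(scale(x)))

round(colMeans(ss_toy), 10)   # means after centering
apply(ss_toy, 2, sd)          # standard deviations after scaling
```

Scaling changes the units of the covariate coefficients but not the test for `infec`, and it may reduce the number of rows that fail to converge.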

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 49 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                               baseMean log2FoldChange     lfcSE      stat
## ENSG00000260903.3 XKR7        37.17335      1.2606218 0.2438369  5.169939
## ENSG00000173706.14 HEG1      656.38340      0.8946055 0.1784842  5.012239
## ENSG00000165028.12 NIPSNAP3B 111.36215     -1.2701025 0.2753142 -4.613284
## ENSG00000179981.11 TSHZ1     949.99984      0.5436810 0.1206120  4.507685
## ENSG00000104341.17 LAPTM4B    80.97473      1.3055591 0.3067875  4.255582
## ENSG00000233695.2 GAS6-AS1   813.60121     -1.5766908 0.3777491 -4.173910
## ENSG00000205571.14 SMN2      451.86820     -2.2984493 0.5722870 -4.016253
## ENSG00000071205.12 ARHGAP10  210.82833      0.7773440 0.1944938  3.996755
## ENSG00000186715.11 MST1L      49.85308     -1.7227578 0.4326843 -3.981558
## ENSG00000175764.16 TTLL11    241.50442      0.6236036 0.1594900  3.909986
##                                    pvalue       padj
## ENSG00000260903.3 XKR7       2.341702e-07 0.00499087
## ENSG00000173706.14 HEG1      5.380041e-07 0.00573324
## ENSG00000165028.12 NIPSNAP3B 3.963570e-06 0.02815852
## ENSG00000179981.11 TSHZ1     6.553873e-06 0.03492067
## ENSG00000104341.17 LAPTM4B   2.085063e-05 0.08887790
## ENSG00000233695.2 GAS6-AS1   2.994159e-05 0.10635750
## ENSG00000205571.14 SMN2      5.913089e-05 0.16213246
## ENSG00000071205.12 ARHGAP10  6.421675e-05 0.16213246
## ENSG00000186715.11 MST1L     6.846489e-05 0.16213246
## ENSG00000175764.16 TTLL11    9.230140e-05 0.18252081

mean(abs(dge$stat))

## [1] 0.9468285

infec_a_pod1_adj <- dge

Infection in treatment group B

Infection in treatment group B T0

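CXCL2 is the top hit in the base model (padj = 0.0025) and remains significant after adjusting for clinical covariates (padj = 0.010). It does not appear in the top 10 once cell-type covariates are added, so the signal is robust to clinical adjustment but not clearly to cell composition.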

mx <- xt0f
ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 21

ss2 <- subset(ss2,treatment_group==2)
table(ss2$infec)

## 
##  0  1 
## 47 14

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 396 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                    baseMean log2FoldChange     lfcSE      stat
## ENSG00000081041.9 CXCL2            35.63507      1.9439307 0.3665574  5.303209
## ENSG00000234745.13 HLA-B        62101.52290     -0.9532047 0.2093888 -4.552320
## ENSG00000281383.1 CH507-513H4.5    15.70766     -1.7158317 0.4215774 -4.070027
## ENSG00000146242.9 TPBG             42.16738      1.0944994 0.2693250  4.063862
## ENSG00000204936.10 CD177           87.28790     -2.0084278 0.5006215 -4.011869
## ENSG00000232117.2 LINC00384        11.61340     -1.1159899 0.2801978 -3.982864
## ENSG00000112149.10 CD83           411.63539      1.2198721 0.3066732  3.977759
## ENSG00000287087.1 CTD-2081C10.8    11.44057      0.7842126 0.1973394  3.973928
## ENSG00000008438.5 PGLYRP1         479.87684     -2.0009651 0.5116740 -3.910625
## ENSG00000162772.17 ATF3           160.05841      1.3211995 0.3410898  3.873466
##                                       pvalue        padj
## ENSG00000081041.9 CXCL2         1.137846e-07 0.002495865
## ENSG00000234745.13 HLA-B        5.305747e-06 0.058190781
## ENSG00000281383.1 CH507-513H4.5 4.700759e-05 0.193842544
## ENSG00000146242.9 TPBG          4.826743e-05 0.193842544
## ENSG00000204936.10 CD177        6.024001e-05 0.193842544
## ENSG00000232117.2 LINC00384     6.808968e-05 0.193842544
## ENSG00000112149.10 CD83         6.956769e-05 0.193842544
## ENSG00000287087.1 CTD-2081C10.8 7.069708e-05 0.193842544
## ENSG00000008438.5 PGLYRP1       9.205770e-05 0.224365070
## ENSG00000162772.17 ATF3         1.072986e-04 0.234109215

mean(abs(dge$stat))

## [1] 1.012181

# model with clinical covariates
# including crp_group in the model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 15 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                     baseMean log2FoldChange     lfcSE      stat
## ENSG00000130032.17 PRRG3           42.866897      1.6572563 0.3153272  5.255672
## ENSG00000081041.9 CXCL2            35.635073      1.8135859 0.3700844  4.900466
## ENSG00000278599.6 TBC1D3E           8.722472     17.2282728 3.6374850  4.736314
## ENSG00000234745.13 HLA-B        62101.522900     -1.0339314 0.2234914 -4.626269
## ENSG00000183260.7 ABHD16B          92.010146     -2.6572265 0.6007539 -4.423153
## ENSG00000286666.1 RP11-301G21.2    23.242360      0.6818548 0.1645395  4.144018
## ENSG00000276107.1 THBS1-IT1         9.083385      2.6551134 0.6415470  4.138611
## ENSG00000110436.13 SLC1A2          32.480826      1.2773138 0.3109516  4.107758
## ENSG00000232117.2 LINC00384        11.613398     -1.1541619 0.2818112 -4.095514
## ENSG00000287087.1 CTD-2081C10.8    11.440574      0.8546777 0.2088369  4.092561
##                                       pvalue        padj
## ENSG00000130032.17 PRRG3        1.474852e-07 0.003235088
## ENSG00000081041.9 CXCL2         9.560962e-07 0.010485985
## ENSG00000278599.6 TBC1D3E       2.176398e-06 0.015913096
## ENSG00000234745.13 HLA-B        3.723124e-06 0.020416683
## ENSG00000183260.7 ABHD16B       9.727079e-06 0.042672697
## ENSG00000286666.1 RP11-301G21.2 3.412726e-05 0.093582511
## ENSG00000276107.1 THBS1-IT1     3.494155e-05 0.093582511
## ENSG00000110436.13 SLC1A2       3.995187e-05 0.093582511
## ENSG00000232117.2 LINC00384     4.212327e-05 0.093582511
## ENSG00000287087.1 CTD-2081C10.8 4.266356e-05 0.093582511

mean(abs(dge$stat))

## [1] 1.052494

infec_b_t0 <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 43 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                   baseMean log2FoldChange     lfcSE      stat
## ENSG00000130032.17 PRRG3          42.86690      1.5875029 0.3591955  4.419607
## ENSG00000025434.19 NR1H3         257.06365     -0.4481272 0.1115340 -4.017852
## ENSG00000065923.10 SLC9A7        517.06274      0.4089127 0.1051076  3.890421
## ENSG00000234745.13 HLA-B       62101.52290     -0.8637873 0.2227917 -3.877107
## ENSG00000284554.2 CTA-150C2.22    11.74190     -9.6088060 2.4784090 -3.877006
## ENSG00000203668.3 CHML           556.47066      0.4844837 0.1251990  3.869708
## ENSG00000116194.13 ANGPTL1        13.35445      1.1580486 0.2996963  3.864074
## ENSG00000274134.1 MIR6774         17.89025      0.9233534 0.2430764  3.798615
## ENSG00000165949.12 IFI27         104.98135     -2.1383061 0.5643889 -3.788710
## ENSG00000281162.2 LINC01127      418.70683     -0.6907730 0.1826310 -3.782342
##                                      pvalue     padj
## ENSG00000130032.17 PRRG3       9.888033e-06 0.216894
## ENSG00000025434.19 NR1H3       5.873116e-05 0.340781
## ENSG00000065923.10 SLC9A7      1.000703e-04 0.340781
## ENSG00000234745.13 HLA-B       1.057060e-04 0.340781
## ENSG00000284554.2 CTA-150C2.22 1.057498e-04 0.340781
## ENSG00000203668.3 CHML         1.089656e-04 0.340781
## ENSG00000116194.13 ANGPTL1     1.115113e-04 0.340781
## ENSG00000274134.1 MIR6774      1.455071e-04 0.340781
## ENSG00000165949.12 IFI27       1.514314e-04 0.340781
## ENSG00000281162.2 LINC01127    1.553595e-04 0.340781

mean(abs(dge$stat))

## [1] 0.8981908

infec_b_t0_adj <- dge
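
To judge how stable a gene such as CXCL2 is across models, its rank and adjusted p-value can be pulled from each saved results table. The helper `gene_rank()` below is a hypothetical convenience function, and the toy tables only mimic the shape of `dge` with invented values.

```r
# Return rank (by p-value) and padj of genes whose row name matches a pattern.
gene_rank <- function(dge, pattern) {
  ord <- dge[order(dge$pvalue), , drop = FALSE]
  hit <- grep(pattern, rownames(ord))
  data.frame(gene = rownames(ord)[hit], rank = hit, padj = ord$padj[hit])
}

# Toy tables shaped like `dge`; values are invented.
clin_toy <- data.frame(pvalue = c(1e-7, 9e-7, 0.2), padj = c(0.003, 0.01, 0.6),
  row.names = c("ENSG1 PRRG3", "ENSG2 CXCL2", "ENSG3 HLA-B"))
cell_toy <- data.frame(pvalue = c(1e-5, 0.4, 0.03), padj = c(0.2, 0.9, 0.5),
  row.names = c("ENSG1 PRRG3", "ENSG2 CXCL2", "ENSG3 HLA-B"))

rbind(clinical = gene_rank(clin_toy, " CXCL2$"),
      cell_adj = gene_rank(cell_toy, " CXCL2$"))
```

With the objects saved in this report, the equivalent calls would be `gene_rank(infec_b_t0, " CXCL2$")` and `gene_rank(infec_b_t0_adj, " CXCL2$")`.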

Infection in treatment group B EOS

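CXCL2 is again upregulated in the base model (log2FC = 1.59), as at T0, but it does not reach significance here (padj = 0.11) and is absent from the top 10 of both adjusted models.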

mx <- xeosf
ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 77 21

ss2 <- subset(ss2,treatment_group==2)
table(ss2$infec)

## 
##  0  1 
## 42 14

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 150 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                    baseMean log2FoldChange     lfcSE      stat
## ENSG00000281383.1 CH507-513H4.5    15.11952     -1.5596348 0.3445350 -4.526782
## ENSG00000281741.2 CH17-118O6.6     37.17409     -1.9855820 0.4741593 -4.187584
## ENSG00000273221.1 RP5-1180E21.5    81.37678     -0.8909263 0.2145724 -4.152100
## ENSG00000154099.18 DNAAF1          32.13411      0.8499840 0.2056066  4.134030
## ENSG00000225840.2 AC010970.2    16407.05996     -1.1705777 0.2837503 -4.125379
## ENSG00000081041.9 CXCL2            39.75527      1.5920647 0.3885345  4.097615
## ENSG00000280614.1 CH507-513H4.4 14151.94639     -1.1096045 0.2717873 -4.082621
## ENSG00000280800.1 CH507-513H4.6 14151.94639     -1.1096045 0.2717873 -4.082621
## ENSG00000281181.1 CH507-513H4.3 14151.94639     -1.1096045 0.2717873 -4.082621
## ENSG00000269737.2 RP11-345P4.6    233.89006      0.8001306 0.2043683  3.915141
##                                       pvalue      padj
## ENSG00000281383.1 CH507-513H4.5 5.988877e-06 0.1091839
## ENSG00000281741.2 CH17-118O6.6  2.819395e-05 0.1091839
## ENSG00000273221.1 RP5-1180E21.5 3.294378e-05 0.1091839
## ENSG00000154099.18 DNAAF1       3.564568e-05 0.1091839
## ENSG00000225840.2 AC010970.2    3.701246e-05 0.1091839
## ENSG00000081041.9 CXCL2         4.174295e-05 0.1091839
## ENSG00000280614.1 CH507-513H4.4 4.453054e-05 0.1091839
## ENSG00000280800.1 CH507-513H4.6 4.453054e-05 0.1091839
## ENSG00000281181.1 CH507-513H4.3 4.453054e-05 0.1091839
## ENSG00000269737.2 RP11-345P4.6  9.035137e-05 0.1993784

mean(abs(dge$stat))

## [1] 0.8730035

# model with clinical covariates
# including crp_group in the model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 6 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                    baseMean log2FoldChange      lfcSE      stat
## ENSG00000281383.1 CH507-513H4.5    15.11952     -1.5796688 0.36836297 -4.288348
## ENSG00000286219.1 NOTCH2NLC      1082.05581      0.4138437 0.10731975  3.856174
## ENSG00000107719.9 PALD1            91.84364      1.1870472 0.31322499  3.789759
## ENSG00000273221.1 RP5-1180E21.5    81.37678     -0.8633386 0.22911710 -3.768111
## ENSG00000281741.2 CH17-118O6.6     37.17409     -1.9488395 0.52020549 -3.746288
## ENSG00000269737.2 RP11-345P4.6    233.89006      0.7876876 0.21762814  3.619420
## ENSG00000035141.8 FAM136A         931.14355     -0.2470166 0.06841642 -3.610487
## ENSG00000164687.11 FABP5          137.10393     -0.5587196 0.15776894 -3.541379
## ENSG00000225840.2 AC010970.2    16407.05996     -1.0565297 0.29924408 -3.530662
## ENSG00000173915.16 ATP5MK         540.05479     -0.4281250 0.12149361 -3.523848
##                                       pvalue      padj
## ENSG00000281383.1 CH507-513H4.5 1.800066e-05 0.3972205
## ENSG00000286219.1 NOTCH2NLC     1.151754e-04 0.6737808
## ENSG00000107719.9 PALD1         1.507936e-04 0.6737808
## ENSG00000273221.1 RP5-1180E21.5 1.644877e-04 0.6737808
## ENSG00000281741.2 CH17-118O6.6  1.794706e-04 0.6737808
## ENSG00000269737.2 RP11-345P4.6  2.952644e-04 0.6737808
## ENSG00000035141.8 FAM136A       3.056227e-04 0.6737808
## ENSG00000164687.11 FABP5        3.980413e-04 0.6737808
## ENSG00000225840.2 AC010970.2    4.145211e-04 0.6737808
## ENSG00000173915.16 ATP5MK       4.253286e-04 0.6737808

mean(abs(dge$stat))

## [1] 0.8610267

infec_b_eos <- dge

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 50 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                    baseMean log2FoldChange      lfcSE      stat
## ENSG00000281741.2 CH17-118O6.6     37.17409     -2.5980798 0.54862114 -4.735654
## ENSG00000143507.18 DUSP10         291.39486      0.3943481 0.09982852  3.950254
## ENSG00000203811.1 H3C14            21.14449      0.7947711 0.20119669  3.950219
## ENSG00000203852.3 H3C15            21.14449      0.7947711 0.20119669  3.950219
## ENSG00000271383.8 NBPF19        10978.51599      0.5342483 0.13755661  3.883843
## ENSG00000256427.2 RP11-118B22.2   128.34067     -0.7733993 0.20016420 -3.863824
## ENSG00000286219.1 NOTCH2NLC      1082.05581      0.4591214 0.11929688  3.848562
## ENSG00000154099.18 DNAAF1          32.13411      0.8247610 0.22924963  3.597655
## ENSG00000237248.5 LINC00987       156.01886     -0.6388614 0.17992509 -3.550708
## ENSG00000164850.15 GPER1          311.74828     -1.1288952 0.32071826 -3.519897
##                                       pvalue       padj
## ENSG00000281741.2 CH17-118O6.6  2.183500e-06 0.04818329
## ENSG00000143507.18 DUSP10       7.806818e-05 0.37455018
## ENSG00000203811.1 H3C14         7.807960e-05 0.37455018
## ENSG00000203852.3 H3C15         7.807960e-05 0.37455018
## ENSG00000271383.8 NBPF19        1.028184e-04 0.37455018
## ENSG00000256427.2 RP11-118B22.2 1.116255e-04 0.37455018
## ENSG00000286219.1 NOTCH2NLC     1.188132e-04 0.37455018
## ENSG00000154099.18 DNAAF1       3.210995e-04 0.81114477
## ENSG00000237248.5 LINC00987     3.841968e-04 0.81114477
## ENSG00000164850.15 GPER1        4.317148e-04 0.81114477

mean(abs(dge$stat))

## [1] 0.7381023

infec_b_eos_adj <- dge
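
The DESeq2 output above reports rows that did not converge in beta and suggests a larger maxit with nbinomWaldTest. A minimal sketch of one way to follow that hint is given here. It runs on simulated counts from makeExampleDESeqDataSet, not the PADDI data, and the object names (dds_demo, conv, dds_conv, res_conv) are placeholders introduced for illustration. Whether this changes the rankings in the real analysis has not been tested.

```r
# Sketch only: simulated counts stand in for the real count matrix.
suppressPackageStartupMessages(library("DESeq2"))

set.seed(1)
dds_demo <- makeExampleDESeqDataSet(n = 500, m = 12)

# Run the DESeq() steps one by one so that maxit can be raised for the Wald test
dds_demo <- estimateSizeFactors(dds_demo)
dds_demo <- estimateDispersions(dds_demo, quiet = TRUE)
dds_demo <- nbinomWaldTest(dds_demo, maxit = 500, quiet = TRUE)

# betaConv records, per gene, whether the coefficient fit converged
conv <- mcols(dds_demo)$betaConv
table(conv, useNA = "ifany")

# Optionally keep only the converged genes before extracting results
dds_conv <- dds_demo[which(conv), ]
res_conv <- results(dds_conv)
```

Genes with all-zero counts get an NA in betaConv, which is why which() is used for the subsetting step.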

Infection in treatment group B POD1

mx <- xpod1f
ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 19

ss2 <- subset(ss2,treatment_group==2)
table(ss2$infec)

## 
##  0  1 
## 47 13

mx <- mx[,colnames(mx) %in% rownames(ss2)]
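
Subsetting with %in% keeps the columns of the count matrix in their existing order, so the result only lines up with the sample sheet if both were already sorted the same way. The sketch below shows a more explicit alignment on toy objects. counts_demo and samples_demo are hypothetical stand-ins, not objects from this report.

```r
# Hypothetical toy objects standing in for the count matrix and sample sheet
counts_demo <- matrix(1:12, nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s3", "s1", "s2", "s4")))
samples_demo <- data.frame(infec = factor(c(0, 1, 0)),
  row.names = c("s1", "s2", "s3"))

# Keep only shared samples, in the order of the sample sheet
shared <- intersect(rownames(samples_demo), colnames(counts_demo))
counts_demo <- counts_demo[, shared, drop = FALSE]
samples_demo <- samples_demo[shared, , drop = FALSE]

# Fail early if the two objects are out of step
stopifnot(identical(colnames(counts_demo), rownames(samples_demo)))
```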

# base model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ infec )

## converting counts to integer mode

res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## -- replacing outliers and refitting for 376 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)

## estimating dispersions

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                 baseMean log2FoldChange      lfcSE      stat
## ENSG00000129535.13 NRL          57.14780     -0.5609943 0.11187913 -5.014289
## ENSG00000120053.12 GOT1        218.89056     -0.4240765 0.08922375 -4.752955
## ENSG00000234426.3 RP11-459O1.2  46.99935      1.8732252 0.40963550  4.572907
## ENSG00000119673.15 ACOT2       232.09990     -0.4556809 0.10389743 -4.385872
## ENSG00000225411.3 RP11-764K9.1  73.11837      0.8918663 0.20961454  4.254792
## ENSG00000196421.9 C20orf204     61.17974     -1.1312499 0.26662111 -4.242912
## ENSG00000133321.11 PLAAT4      965.23264     -0.5970574 0.14308900 -4.172629
## ENSG00000114993.17 RTKN         37.68934     -0.9210491 0.22358050 -4.119541
## ENSG00000287586.1 RP1-77N19.1   13.29168      1.5289914 0.37478338  4.079667
## ENSG00000176571.12 CNBD1        68.78388      0.7885928 0.19437260  4.057119
##                                      pvalue       padj
## ENSG00000129535.13 NRL         5.322982e-07 0.01134487
## ENSG00000120053.12 GOT1        2.004648e-06 0.02136253
## ENSG00000234426.3 RP11-459O1.2 4.810026e-06 0.03417203
## ENSG00000119673.15 ACOT2       1.155219e-05 0.06155297
## ENSG00000225411.3 RP11-764K9.1 2.092434e-05 0.07837427
## ENSG00000196421.9 C20orf204    2.206379e-05 0.07837427
## ENSG00000133321.11 PLAAT4      3.011045e-05 0.09167771
## ENSG00000114993.17 RTKN        3.796275e-05 0.09666658
## ENSG00000287586.1 RP1-77N19.1  4.510026e-05 0.09666658
## ENSG00000176571.12 CNBD1       4.968172e-05 0.09666658

mean(abs(dge$stat))

## [1] 1.00484

# model with clinical covariates
# including crp_group in the model
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 18 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                              baseMean log2FoldChange      lfcSE      stat
## ENSG00000129535.13 NRL       57.14780     -0.6161299 0.11710335 -5.261420
## ENSG00000120053.12 GOT1     218.89056     -0.4810951 0.09453293 -5.089180
## ENSG00000272282.1 LINC02084 148.15378     -0.9851775 0.20484157 -4.809461
## ENSG00000263006.6 ROCK1P1   125.27169     -2.6815259 0.57322844 -4.677936
## ENSG00000135069.14 PSAT1     74.18620     -0.8171086 0.17803113 -4.589695
## ENSG00000097021.20 ACOT7    165.93568     -0.7004217 0.15839008 -4.422131
## ENSG00000110436.13 SLC1A2    36.33599      1.4581064 0.34068871  4.279879
## ENSG00000205927.5 OLIG2      10.78384      1.2707752 0.30036315  4.230796
## ENSG00000184221.13 OLIG1    631.09997      0.9568114 0.23161817  4.130986
## ENSG00000116133.13 DHCR24   323.22445     -0.5030564 0.12223204 -4.115586
##                                   pvalue        padj
## ENSG00000129535.13 NRL      1.429470e-07 0.003046630
## ENSG00000120053.12 GOT1     3.596144e-07 0.003832231
## ENSG00000272282.1 LINC02084 1.513381e-06 0.010751565
## ENSG00000263006.6 ROCK1P1   2.897771e-06 0.015440047
## ENSG00000135069.14 PSAT1    4.438942e-06 0.018921433
## ENSG00000097021.20 ACOT7    9.773208e-06 0.034716062
## ENSG00000110436.13 SLC1A2   1.869952e-05 0.056934689
## ENSG00000205927.5 OLIG2     2.328657e-05 0.062038342
## ENSG00000184221.13 OLIG1    3.612108e-05 0.082310279
## ENSG00000116133.13 DHCR24   3.861975e-05 0.082310279

mean(abs(dge$stat))

## [1] 0.9990416

infec_b_pod1 <- dge
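
Two of the DESeq2 messages above concern the sample sheet rather than the counts: integer-valued numeric variables that may have been meant as factors, and factor levels containing characters other than letters, numbers, '_' and '.'. The sketch below shows how both could be addressed before building the design. The column names follow the design formula, but the values (including the "Asian/Other" label) are invented for illustration, and the report does not show which columns actually trigger the messages.

```r
# Hypothetical sample sheet; column names follow the report, values are made up
ss_demo <- data.frame(
  sexD = c(1L, 2L, 2L, 1L),
  ethnicityCAT = c("Asian/Other", "European", "European", "Asian/Other"),
  stringsAsFactors = FALSE)

# Integer-coded categories: convert explicitly so they are not fitted as a slope
ss_demo$sexD <- factor(ss_demo$sexD)

# make.names() replaces characters that are not valid in R names with '.'
ss_demo$ethnicityCAT <- factor(make.names(ss_demo$ethnicityCAT))
levels(ss_demo$ethnicityCAT)
```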

# model with clinical and cell covariates
# Monocytes.C NK T.CD8.Memory T.CD4.Naive Neutrophils.LD
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss2,
  design = ~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS +
    Monocytes.C + NK + T.CD8.Memory + T.CD4.Naive + Neutrophils.LD + crp_group + infec )

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function

##   the design formula contains one or more numeric variables that have mean or
##   standard deviation larger than 5 (an arbitrary threshold to trigger this message).
##   Including numeric variables with large mean can induce collinearity with the intercept.
##   Users should center and scale numeric variables in the design to improve GLM convergence.

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

res <- DESeq(dds)

## estimating size factors
##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

## final dispersion estimates

## fitting model and testing

## 23 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

z <- results(res)
vsd <- vst(dds, blind=FALSE)

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not a warning or an error]

zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
head(dge[order(dge$pvalue),1:6],10)

##                                  baseMean log2FoldChange      lfcSE      stat
## ENSG00000129535.13 NRL           57.14780     -0.6392066 0.12101568 -5.282015
## ENSG00000183010.17 PYCR1         33.69785     -0.9937114 0.19500612 -5.095796
## ENSG00000120053.12 GOT1         218.89056     -0.4072097 0.09348147 -4.356047
## ENSG00000258056.2 RP11-644F5.11 165.61102     -0.4330740 0.10509836 -4.120654
## ENSG00000135069.14 PSAT1         74.18620     -0.7265525 0.17680194 -4.109415
## ENSG00000000460.17 C1orf112     201.19205      0.3343957 0.08706870  3.840596
## ENSG00000275793.1 RIMBP3         94.73468     -0.7030986 0.18596807 -3.780749
## ENSG00000242611.2 AC093627.8     18.55195     -2.1900148 0.58037179 -3.773469
## ENSG00000246560.2 UBE2D3-AS1     25.20489     -0.5878541 0.15596389 -3.769168
## ENSG00000185875.13 THNSL1        73.66433     -0.5985449 0.16202704 -3.694105
##                                       pvalue        padj
## ENSG00000129535.13 NRL          1.277709e-07 0.002723182
## ENSG00000183010.17 PYCR1        3.472788e-07 0.003700777
## ENSG00000120053.12 GOT1         1.324322e-05 0.094084267
## ENSG00000258056.2 RP11-644F5.11 3.777985e-05 0.169081506
## ENSG00000135069.14 PSAT1        3.966628e-05 0.169081506
## ENSG00000000460.17 C1orf112     1.227360e-04 0.382420268
## ENSG00000275793.1 RIMBP3        1.563572e-04 0.382420268
## ENSG00000242611.2 AC093627.8    1.609934e-04 0.382420268
## ENSG00000246560.2 UBE2D3-AS1    1.637927e-04 0.382420268
## ENSG00000185875.13 THNSL1       2.206626e-04 0.382420268

mean(abs(dge$stat))

## [1] 0.8716056

infec_b_pod1_adj <- dge
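
When the cell-type covariates are added, DESeq2 notes that numeric variables with a mean or standard deviation above 5 can induce collinearity with the intercept, and recommends centering and scaling. A minimal sketch of that step is below. The values are simulated and only two of the covariate names from the design are used. Applying this to the real sample sheet is an untested suggestion.

```r
# Hypothetical covariate values; column names follow the design formula above
set.seed(42)
ss_demo <- data.frame(
  Monocytes.C = rnorm(20, mean = 21, sd = 8),
  Neutrophils.LD = rnorm(20, mean = 3, sd = 6))

num_cols <- c("Monocytes.C", "Neutrophils.LD")

# scale() centres to mean 0 and scales to sd 1; as.numeric() drops the matrix attributes
ss_demo[num_cols] <- lapply(ss_demo[num_cols], function(x) as.numeric(scale(x)))

round(colMeans(ss_demo[num_cols]), 10)
apply(ss_demo[num_cols], 2, sd)
```

Scaling changes the units of the covariate coefficients, so their log2 fold changes would then be per standard deviation rather than per percentage point.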

Blood composition associated with infection in all samples T0

mx <- dec2

ss2 <- as.data.frame(cbind(ss_t0,sscell_t0))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 21

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
ss2$infec <- as.numeric(ss2$infec) -1
design <- model.matrix(~ ss2$infec)
fit <- lmFit(mx, design)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)
bl_t0 <- topTable(fit,number=Inf)

## Removing intercept from test coefficients

bl_t0

##                       logFC    AveExpr            t    P.Value adj.P.Val
## B.Naive         1.485781854  3.2262095  2.447814145 0.01593594 0.2709109
## Plasmablasts   -0.013258764  0.1173774 -1.663095622 0.09910846 0.7976001
## T.CD8.Memory   -2.098142898  8.1875958 -1.483562645 0.14075296 0.7976001
## T.CD8.Naive    -0.791486921 14.5112581 -0.912231404 0.36361974 0.9539393
## T.gd.Vd2       -0.068599005  2.0044018 -0.838526580 0.40353333 0.9539393
## Monocytes.C     1.361884574 20.8800448  0.755627101 0.45146993 0.9539393
## NK              0.563023333  4.7371188  0.690214470 0.49149693 0.9539393
## Neutrophils.LD -0.868694955  3.2388988 -0.568220212 0.57103073 0.9539393
## mDCs           -0.031981746  0.7950039 -0.563555244 0.57419120 0.9539393
## T.gd.non.Vd2   -0.014173517  0.3715104 -0.484466194 0.62900762 0.9539393
## Monocytes.NC.I  0.431889032 10.5254623  0.433847984 0.66523912 0.9539393
## pDCs           -0.033540735  0.2066387 -0.422659273 0.67336895 0.9539393
## B.Memory        0.233854949  4.0288477  0.319990801 0.74957557 0.9604696
## T.CD4.Memory   -0.202910589 10.6556812 -0.237540456 0.81267450 0.9604696
## Basophils.LD    0.078462071  1.5507565  0.192797372 0.84747317 0.9604696
## T.CD4.Naive    -0.034433715 11.0483327 -0.023787443 0.98106478 0.9965077
## MAIT            0.002327032  3.9148616  0.004386756 0.99650774 0.9965077
##                        B
## B.Naive        -3.075068
## Plasmablasts   -4.273299
## T.CD8.Memory   -4.487441
## T.CD8.Naive    -5.011785
## T.gd.Vd2       -5.061622
## Monocytes.C    -5.112736
## NK             -5.149361
## Neutrophils.LD -5.208894
## mDCs           -5.210943
## T.gd.non.Vd2   -5.243129
## Monocytes.NC.I -5.261191
## pDCs           -5.264910
## B.Memory       -5.294568
## T.CD4.Memory   -5.312457
## Basophils.LD   -5.319951
## T.CD4.Naive    -5.334207
## MAIT           -5.334420

subset(bl_t0,P.Value<0.05)

##            logFC AveExpr        t    P.Value adj.P.Val         B
## B.Naive 1.485782 3.22621 2.447814 0.01593594 0.2709109 -3.075068
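
The message "Removing intercept from test coefficients" appears because topTable was called without a coef argument on a design built from ss2$infec. Naming the coefficient makes the test explicit. The sketch below uses simulated data, and mx_demo, ss_demo, design_demo, fit_demo and tab_demo are placeholder names. The trend and robust options used in the report are left out here only to keep the toy example small.

```r
suppressPackageStartupMessages(library("limma"))

set.seed(1)
# Simulated stand-in for a cell-type-by-sample matrix
mx_demo <- matrix(rnorm(5 * 20, mean = 5), nrow = 5,
  dimnames = list(paste0("celltype", 1:5), paste0("s", 1:20)))
ss_demo <- data.frame(infec = rep(c(0, 1), each = 10),
  row.names = colnames(mx_demo))

# Passing the data frame gives the coefficient a clean name
design_demo <- model.matrix(~ infec, data = ss_demo)
colnames(design_demo)   # "(Intercept)" "infec"

fit_demo <- eBayes(lmFit(mx_demo, design_demo))

# Request the infection coefficient by name
tab_demo <- topTable(fit_demo, coef = "infec", number = Inf)
```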

# model with clinical covariates
# reduced design (sex, ethnicity, age); the full clinical design is kept commented out for reference
ss3 <- ss2[,c("sexD", "wound_typeOP", "duration_sx", "ethnicityCAT", "ageCS", "crp_group", "infec")]

#design <- model.matrix(~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec, ss2  )
design <- model.matrix(~ sexD + ethnicityCAT + ageCS + infec, ss2  )

fit <- lmFit(mx, design)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)
bl_t0 <- topTable(fit,coef="infec",number=Inf)
bl_t0

##                       logFC    AveExpr            t    P.Value adj.P.Val
## B.Naive         1.357759364  3.2262095  2.132738308 0.03525085 0.5067739
## T.CD8.Memory   -2.245450340  8.1875958 -1.571628947 0.11901209 0.5067739
## mDCs           -0.075346792  0.7950039 -1.362257528 0.17599966 0.5067739
## NK              1.063898824  4.7371188  1.329623085 0.18649217 0.5067739
## T.CD8.Naive    -1.045819917 14.5112581 -1.297296290 0.19734113 0.5067739
## Plasmablasts   -0.010129114  0.1173774 -1.266658258 0.20804926 0.5067739
## Monocytes.C     2.294681743 20.8800448  1.264913664 0.20867162 0.5067739
## T.gd.Vd2       -0.089927045  2.0044018 -1.057315464 0.29276742 0.6221308
## pDCs           -0.044848971  0.2066387 -0.537962040 0.59174186 0.9836734
## T.gd.non.Vd2   -0.016060796  0.3715104 -0.531558555 0.59614219 0.9836734
## T.CD4.Naive    -0.660438040 11.0483327 -0.444785602 0.65738012 0.9836734
## MAIT           -0.173299037  3.9148616 -0.347928916 0.72858243 0.9836734
## B.Memory       -0.225996855  4.0288477 -0.316531218 0.75222083 0.9836734
## Monocytes.NC.I -0.211606320 10.5254623 -0.218957471 0.82710354 0.9942229
## T.CD4.Memory    0.066364341 10.6556812  0.077420343 0.93843487 0.9942229
## Basophils.LD    0.004967296  1.5507565  0.011642681 0.99073279 0.9942229
## Neutrophils.LD  0.011251659  3.2388988  0.007257702 0.99422286 0.9942229
##                        B
## B.Naive        -4.558305
## T.CD8.Memory   -4.579649
## mDCs           -4.586112
## NK             -4.587042
## T.CD8.Naive    -4.587943
## Plasmablasts   -4.588777
## Monocytes.C    -4.588824
## T.gd.Vd2       -4.593972
## pDCs           -4.602926
## T.gd.non.Vd2   -4.603001
## T.CD4.Naive    -4.603925
## MAIT           -4.604763
## B.Memory       -4.604991
## Monocytes.NC.I -4.605562
## T.CD4.Memory   -4.606021
## Basophils.LD   -4.606086
## Neutrophils.LD -4.606086

subset(bl_t0,P.Value<0.05)

##            logFC AveExpr        t    P.Value adj.P.Val         B
## B.Naive 1.357759 3.22621 2.132738 0.03525085 0.5067739 -4.558305
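
B.Naive passes the nominal P.Value<0.05 filter but its adj.P.Val is about 0.51, so it does not survive multiple-testing correction. The adj.P.Val column can be reproduced from the P.Value column with the Benjamini-Hochberg method, as sketched below with the 17 P values copied from the covariate-adjusted T0 table. The vector names p_t0 and adj_t0 are introduced here for illustration.

```r
# Nominal P values copied from the covariate-adjusted T0 table above
p_t0 <- c(
  B.Naive = 0.03525085, T.CD8.Memory = 0.11901209, mDCs = 0.17599966,
  NK = 0.18649217, T.CD8.Naive = 0.19734113, Plasmablasts = 0.20804926,
  Monocytes.C = 0.20867162, T.gd.Vd2 = 0.29276742, pDCs = 0.59174186,
  T.gd.non.Vd2 = 0.59614219, T.CD4.Naive = 0.65738012, MAIT = 0.72858243,
  B.Memory = 0.75222083, Monocytes.NC.I = 0.82710354, T.CD4.Memory = 0.93843487,
  Basophils.LD = 0.99073279, Neutrophils.LD = 0.99422286)

# Benjamini-Hochberg adjustment across the 17 cell types
adj_t0 <- p.adjust(p_t0, method = "BH")
round(adj_t0[1:3], 4)

# Cell types passing a 5% FDR threshold
names(adj_t0)[adj_t0 < 0.05]   # character(0)
```

The first seven cell types share the same adjusted value because the BH procedure takes the running minimum of p * n / rank from the bottom of the ranking upwards.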

Blood composition associated with infection in all samples EOS

mx <- dec2

ss2 <- as.data.frame(cbind(ss_eos,sscell_eos))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 77 21

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
ss2$infec <- as.numeric(ss2$infec) -1
design <- model.matrix(~ ss2$infec)
fit <- lmFit(mx, design)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)
bl_eos <- topTable(fit,number=Inf)

## Removing intercept from test coefficients

bl_eos

##                       logFC    AveExpr          t    P.Value adj.P.Val
## T.CD8.Naive    -1.954288396 15.1249961 -2.0747393 0.04054616 0.4169905
## B.Naive         0.928996873  2.5375035  1.7812462 0.07787020 0.4169905
## MAIT           -0.957805426  3.6323260 -1.7544315 0.08238137 0.4169905
## mDCs           -0.079644169  0.5948358 -1.6694567 0.09811542 0.4169905
## B.Memory       -0.868682889  3.1195397 -1.5103534 0.13406726 0.4558287
## Plasmablasts   -0.011988478  0.1201618 -1.3388267 0.18362474 0.4739035
## Monocytes.NC.I  0.849164716  6.2272662  1.1990483 0.23330730 0.4739035
## Neutrophils.LD  2.775590100  7.0969781  1.1750892 0.24273932 0.4739035
## T.CD4.Memory   -1.076926445 10.7921585 -1.1547997 0.25089010 0.4739035
## T.gd.Vd2       -0.093444647  1.9156645 -0.8782520 0.38188595 0.6031229
## NK             -1.129549232  7.3243506 -0.8628550 0.39025601 0.6031229
## Monocytes.C     1.442358484 23.6779806  0.5947583 0.55333163 0.7838865
## pDCs           -0.007956471  0.1671414 -0.3536932 0.72430410 0.8661475
## T.CD4.Naive     0.458724399  9.1757025  0.3316219 0.74086011 0.8661475
## Basophils.LD   -0.041957134  1.1171489 -0.2386489 0.81186014 0.8661475
## T.gd.non.Vd2    0.005999223  0.3327072  0.2265726 0.82121281 0.8661475
## T.CD8.Memory   -0.238590506  7.0435384 -0.1689825 0.86614747 0.8661475
##                        B
## T.CD8.Naive    -4.535140
## B.Naive        -4.555385
## MAIT           -4.557100
## mDCs           -4.562384
## B.Memory       -4.571642
## Plasmablasts   -4.580669
## Monocytes.NC.I -4.587275
## Neutrophils.LD -4.588339
## T.CD4.Memory   -4.589223
## T.gd.Vd2       -4.599805
## NK             -4.600313
## Monocytes.C    -4.607740
## pDCs           -4.612111
## T.CD4.Naive    -4.612401
## Basophils.LD   -4.613418
## T.gd.non.Vd2   -4.613526
## T.CD8.Memory   -4.613963

subset(bl_eos,P.Value<0.05)

##                 logFC AveExpr         t    P.Value adj.P.Val        B
## T.CD8.Naive -1.954288  15.125 -2.074739 0.04054616 0.4169905 -4.53514

# model with clinical covariates
# reduced design (sex, ethnicity, age); the full clinical design is kept commented out for reference
ss3 <- ss2[,c("sexD", "wound_typeOP", "duration_sx", "ethnicityCAT", "ageCS", "crp_group", "infec")]

#design <- model.matrix(~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec, ss2  )
design <- model.matrix(~ sexD + ethnicityCAT + ageCS + infec, ss2  )

fit <- lmFit(mx, design)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)
topTable(fit,coef="infec")

##                      logFC    AveExpr         t    P.Value adj.P.Val         B
## T.CD8.Naive    -1.95984109 15.1249961 -2.147611 0.03422174 0.3160929 -4.531220
## mDCs           -0.09047923  0.5948358 -1.825721 0.07095105 0.3160929 -4.553452
## B.Memory       -1.02027865  3.1195397 -1.756456 0.08214808 0.3160929 -4.557830
## B.Naive         0.91403437  2.5375035  1.698184 0.09266010 0.3160929 -4.561397
## MAIT           -0.91804390  3.6323260 -1.696558 0.09296851 0.3160929 -4.561495
## Plasmablasts   -0.01373720  0.1201618 -1.486451 0.14038426 0.3630459 -4.573448
## NK             -1.89747742  7.3243506 -1.452794 0.14948948 0.3630459 -4.575230
## Monocytes.C     2.85942899 23.6779806  1.194302 0.23525535 0.4687720 -4.587648
## Monocytes.NC.I  0.83666376  6.2272662  1.133862 0.25963060 0.4687720 -4.590223
## T.CD4.Memory   -1.05025753 10.7921585 -1.096065 0.27574822 0.4687720 -4.591768

bl_eos <- topTable(fit,coef="infec",number=Inf)
bl_eos

##                       logFC    AveExpr          t    P.Value adj.P.Val
## T.CD8.Naive    -1.959841094 15.1249961 -2.1476108 0.03422174 0.3160929
## mDCs           -0.090479231  0.5948358 -1.8257213 0.07095105 0.3160929
## B.Memory       -1.020278651  3.1195397 -1.7564562 0.08214808 0.3160929
## B.Naive         0.914034370  2.5375035  1.6981842 0.09266010 0.3160929
## MAIT           -0.918043901  3.6323260 -1.6965580 0.09296851 0.3160929
## Plasmablasts   -0.013737200  0.1201618 -1.4864508 0.14038426 0.3630459
## NK             -1.897477424  7.3243506 -1.4527941 0.14948948 0.3630459
## Monocytes.C     2.859428993 23.6779806  1.1943017 0.23525535 0.4687720
## Monocytes.NC.I  0.836663758  6.2272662  1.1338622 0.25963060 0.4687720
## T.CD4.Memory   -1.050257528 10.7921585 -1.0960653 0.27574822 0.4687720
## Neutrophils.LD  2.455781819  7.0969781  0.9948028 0.32233389 0.4981524
## T.gd.Vd2       -0.097351154  1.9156645 -0.8704465 0.38619302 0.5471068
## T.gd.non.Vd2    0.009793401  0.3327072  0.3756348 0.70800381 0.7488790
## pDCs           -0.008619337  0.1671414 -0.3675820 0.71398108 0.7488790
## T.CD4.Naive     0.505644379  9.1757025  0.3550411 0.72332511 0.7488790
## Basophils.LD   -0.062377239  1.1171489 -0.3415051 0.73345773 0.7488790
## T.CD8.Memory   -0.462883963  7.0435384 -0.3210237 0.74887903 0.7488790
##                        B
## T.CD8.Naive    -4.531220
## mDCs           -4.553452
## B.Memory       -4.557830
## B.Naive        -4.561397
## MAIT           -4.561495
## Plasmablasts   -4.573448
## NK             -4.575230
## Monocytes.C    -4.587648
## Monocytes.NC.I -4.590223
## T.CD4.Memory   -4.591768
## Neutrophils.LD -4.595662
## T.gd.Vd2       -4.599948
## T.gd.non.Vd2   -4.611435
## pDCs           -4.611547
## T.CD4.Naive    -4.611717
## Basophils.LD   -4.611894
## T.CD8.Memory   -4.612149

subset(bl_eos,P.Value<0.05)

##                 logFC AveExpr         t    P.Value adj.P.Val        B
## T.CD8.Naive -1.959841  15.125 -2.147611 0.03422174 0.3160929 -4.53122

Blood composition associated with infection in all samples POD1

mx <- dec2

ss2 <- as.data.frame(cbind(ss_pod1,sscell_pod1))
ss2$infec <- factor(infec[match(ss2$PG_number,infec$PG_number),"infection30d"])
table(ss2$infec)

## 
##  0  1 
## 90 19

mx <- mx[,colnames(mx) %in% rownames(ss2)]

# base model
ss2$infec <- as.numeric(ss2$infec) -1
design <- model.matrix(~ ss2$infec)
fit <- lmFit(mx, design)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)
# store the POD1 result under its own name so the EOS table is not overwritten
bl_pod1 <- topTable(fit,number=Inf)

## Removing intercept from test coefficients

bl_pod1

##                       logFC     AveExpr          t     P.Value  adj.P.Val
## Neutrophils.LD  7.383561154  6.04021961  2.8906243 0.004635183 0.04559221
## mDCs           -0.147177994  0.56031255 -2.8402997 0.005363790 0.04559221
## Monocytes.NC.I -1.562991987  6.80694095 -2.2609764 0.025712050 0.12205786
## T.CD8.Memory   -1.836842825  4.94705245 -2.2162038 0.028719497 0.12205786
## B.Naive        -1.130456941  2.67567424 -2.1160802 0.036575644 0.12435719
## T.gd.Vd2       -0.126542498  1.96550576 -2.0202968 0.045763056 0.12966199
## MAIT           -0.753010120  2.56410707 -1.9040002 0.059501973 0.14450479
## Plasmablasts   -0.009622197  0.11220385 -1.8034825 0.074027871 0.15730923
## T.CD8.Naive    -1.304776589 13.87974798 -1.7275268 0.086857286 0.16406376
## T.CD4.Naive    -1.832981825  5.63243153 -1.6236129 0.107297960 0.17316669
## NK             -0.694609309  2.51929775 -1.6017813 0.112049036 0.17316669
## pDCs           -0.017440865  0.08142436 -1.3547992 0.178263669 0.25254020
## B.Memory       -0.689658924  2.49671698 -1.2954699 0.197848592 0.25872508
## Monocytes.C     2.332479245 35.58800977  0.8822890 0.379529170 0.46085685
## Basophils.LD    0.089728502  0.60330084  0.5905224 0.556054451 0.63019504
## T.CD4.Memory    0.297015802 13.12416487  0.3966971 0.692353280 0.73562536
## T.gd.non.Vd2    0.003327371  0.40288944  0.1541699 0.877755986 0.87775599
##                        B
## Neutrophils.LD -2.138812
## mDCs           -2.262089
## Monocytes.NC.I -3.555972
## T.CD8.Memory   -3.645021
## B.Naive        -3.838286
## T.gd.Vd2       -4.015499
## MAIT           -4.220450
## Plasmablasts   -4.388462
## T.CD8.Naive    -4.509743
## T.CD4.Naive    -4.667674
## NK             -4.699673
## pDCs           -5.032772
## B.Memory       -5.104804
## Monocytes.C    -5.519100
## Basophils.LD   -5.718212
## T.CD4.Memory   -5.807159
## T.gd.non.Vd2   -5.869351

subset(bl_pod1,P.Value<0.05)

##                     logFC   AveExpr         t     P.Value  adj.P.Val         B
## Neutrophils.LD  7.3835612 6.0402196  2.890624 0.004635183 0.04559221 -2.138812
## mDCs           -0.1471780 0.5603126 -2.840300 0.005363790 0.04559221 -2.262089
## Monocytes.NC.I -1.5629920 6.8069409 -2.260976 0.025712050 0.12205786 -3.555972
## T.CD8.Memory   -1.8368428 4.9470524 -2.216204 0.028719497 0.12205786 -3.645021
## B.Naive        -1.1304569 2.6756742 -2.116080 0.036575644 0.12435719 -3.838286
## T.gd.Vd2       -0.1265425 1.9655058 -2.020297 0.045763056 0.12966199 -4.015499

# model with clinical covariates
ss3 <- ss2[,c("sexD", "wound_typeOP", "duration_sx", "ethnicityCAT", "ageCS", "crp_group", "infec")]
design <- model.matrix(~ sexD + ethnicityCAT + ageCS + infec, ss2  )
#design <- model.matrix(~ sexD + wound_typeOP + duration_sx + ethnicityCAT + ageCS + crp_group + infec, ss2  )
design <- model.matrix(~ sexD + ethnicityCAT + ageCS + infec, ss2  )
fit <- lmFit(mx, design)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)
bl_pod1 <- topTable(fit,coef="infec",number=Inf)
subset(bl_pod1,P.Value<0.05)

##                     logFC    AveExpr         t     P.Value  adj.P.Val         B
## Neutrophils.LD  8.1537013  6.0402196  3.042204 0.002969841 0.04113201 -1.799746
## mDCs           -0.1533597  0.5603126 -2.878483 0.004839061 0.04113201 -2.220296
## T.CD8.Naive    -1.7651999 13.8797480 -2.539088 0.012574308 0.06425516 -3.030962
## T.CD8.Memory   -2.0884487  4.9470524 -2.456531 0.015661482 0.06425516 -3.214747
## NK             -1.0169202  2.5192977 -2.369188 0.019647626 0.06425516 -3.403322
## MAIT           -0.9129348  2.5641071 -2.312760 0.022678293 0.06425516 -3.521910
## B.Naive        -1.2049898  2.6756742 -2.214346 0.028958021 0.07032662 -3.722578
## T.gd.Vd2       -0.1332896  1.9655058 -2.119442 0.036401229 0.07735261 -3.908594
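
The T0, EOS and POD1 blood composition blocks repeat the same limma steps with different sample sheets. One way to reduce copy-and-paste slips between them might be a small helper that is called once per timepoint. The sketch below is a suggestion under stated assumptions, not the code used for the results above: fit_composition is a hypothetical function, the data are simulated, and the same toy sample sheet is reused for all three timepoints.

```r
suppressPackageStartupMessages(library("limma"))

# Hypothetical helper: fit one design and return the full table for one coefficient
fit_composition <- function(mx, ss, formula, coef = "infec") {
  shared <- intersect(rownames(ss), colnames(mx))
  design <- model.matrix(formula, data = ss[shared, , drop = FALSE])
  # model.matrix() drops samples with missing covariates, so subset the matrix to match
  mx <- mx[, rownames(design), drop = FALSE]
  fit <- eBayes(lmFit(mx, design), trend = TRUE, robust = TRUE)
  topTable(fit, coef = coef, number = Inf)
}

# Simulated stand-ins: 40 features by 30 samples, with one missing covariate value
set.seed(1)
mx_demo <- matrix(
  rnorm(40 * 30, mean = rep(seq(1, 20, length.out = 40), times = 30)),
  nrow = 40,
  dimnames = list(paste0("feature", 1:40), paste0("s", 1:30)))
ss_demo <- data.frame(infec = rep(c(0, 1), 15), ageCS = rnorm(30),
  row.names = colnames(mx_demo))
ss_demo$ageCS[3] <- NA

# One named list entry per timepoint (the same toy sheet is reused here)
timepoints <- list(t0 = ss_demo, eos = ss_demo, pod1 = ss_demo)
bl <- lapply(timepoints, function(ss) fit_composition(mx_demo, ss, ~ ageCS + infec))
names(bl)
```

Keeping the results in a named list means each timepoint has its own slot, for example bl$pod1, rather than a separately typed variable name.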

Session information

For reproducibility

save.image("qc_dge_infec.Rdata") #should be "qc.Rdata"

sessionInfo()

## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] beeswarm_0.4.0              limma_3.64.0               
##  [3] eulerr_7.0.2                MASS_7.3-65                
##  [5] mitch_1.20.0                DESeq2_1.48.0              
##  [7] SummarizedExperiment_1.38.0 Biobase_2.68.0             
##  [9] MatrixGenerics_1.20.0       matrixStats_1.5.0          
## [11] GenomicRanges_1.60.0        GenomeInfoDb_1.44.0        
## [13] IRanges_2.42.0              S4Vectors_0.46.0           
## [15] BiocGenerics_0.54.0         generics_0.1.3             
## [17] dplyr_1.1.4                 WGCNA_1.73                 
## [19] fastcluster_1.2.6           dynamicTreeCut_1.63-1      
## [21] reshape2_1.4.4              gplots_3.2.0               
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3      rstudioapi_0.17.1       jsonlite_2.0.0         
##   [4] magrittr_2.0.3          farver_2.1.2            rmarkdown_2.29         
##   [7] vctrs_0.6.5             memoise_2.0.1.9000      base64enc_0.1-3        
##  [10] htmltools_0.5.8.1       S4Arrays_1.8.0          SparseArray_1.8.0      
##  [13] Formula_1.2-5           sass_0.4.10             bslib_0.9.0            
##  [16] KernSmooth_2.23-26      htmlwidgets_1.6.4       plyr_1.8.9             
##  [19] echarts4r_0.4.5         impute_1.82.0           cachem_1.1.0           
##  [22] mime_0.13               lifecycle_1.0.4         iterators_1.0.14       
##  [25] pkgconfig_2.0.3         Matrix_1.7-3            R6_2.6.1               
##  [28] fastmap_1.2.0           GenomeInfoDbData_1.2.14 shiny_1.10.0           
##  [31] digest_0.6.37           colorspace_2.1-1        GGally_2.2.1           
##  [34] AnnotationDbi_1.70.0    Hmisc_5.2-3             RSQLite_2.3.9          
##  [37] polyclip_1.10-7         httr_1.4.7              abind_1.4-8            
##  [40] compiler_4.5.0          bit64_4.6.0-1           doParallel_1.0.17      
##  [43] htmlTable_2.4.3         backports_1.5.0         BiocParallel_1.42.0    
##  [46] DBI_1.2.3               ggstats_0.9.0           DelayedArray_0.34.1    
##  [49] gtools_3.9.5            caTools_1.18.3          tools_4.5.0            
##  [52] foreign_0.8-90          httpuv_1.6.16           nnet_7.3-20            
##  [55] glue_1.8.0              promises_1.3.2          polylabelr_0.3.0       
##  [58] grid_4.5.0              checkmate_2.3.2         cluster_2.1.8.1        
##  [61] gtable_0.3.6            preprocessCore_1.70.0   tidyr_1.3.1            
##  [64] data.table_1.17.0       xml2_1.3.8              XVector_0.48.0         
##  [67] foreach_1.5.2           pillar_1.10.2           stringr_1.5.1          
##  [70] later_1.4.2             splines_4.5.0           lattice_0.22-7         
##  [73] survival_3.8-3          bit_4.6.0               tidyselect_1.2.1       
##  [76] GO.db_3.21.0            locfit_1.5-9.12         Biostrings_2.76.0      
##  [79] knitr_1.50              gridExtra_2.3           svglite_2.1.3          
##  [82] xfun_0.52               statmod_1.5.0           stringi_1.8.7          
##  [85] UCSC.utils_1.4.0        yaml_2.3.10             statnet.common_4.11.0  
##  [88] kableExtra_1.4.0        evaluate_1.0.3          codetools_0.2-20       
##  [91] tcltk_4.5.0             tibble_3.2.1            cli_3.6.5              
##  [94] rpart_4.1.24            xtable_1.8-4            systemfonts_1.2.2      
##  [97] jquerylib_0.1.4         network_1.19.0          dichromat_2.0-0.1      
## [100] Rcpp_1.0.14             coda_0.19-4.1           png_0.1-8              
## [103] parallel_4.5.0          ggplot2_3.5.2           blob_1.2.4             
## [106] bitops_1.0-9            viridisLite_0.4.2       scales_1.4.0           
## [109] purrr_1.0.4             crayon_1.5.3            rlang_1.1.6            
## [112] KEGGREST_1.48.0

PADDI RNA expression analysis

Mark Ziemann

2025-05-29
