Source: https://github.com/markziemann/SLE712_files/blob/master/BioinfoPrac2.Rmd

Introduction

In this session we will build directly on what we learned last session. Last session we introduced simple data structures in R: individual numbers, integers, strings, and vectors of these. In this session we will work with data tables, which are the most common type of data you will use. The aim of this session is to become so familiar with R data tables that you no longer need spreadsheets in your research.

The benefits of using R over Excel are many:

  1. The code is totally reproducible.

  2. The code documents each and every step that you have done.

  3. Errors can be picked up more easily in code than in large spreadsheets.

  4. Reanalysis is quicker.

  5. We can use source control to manage different versions of the analysis (more on this next week).

  6. R has many more functions than Excel.

Data table types

When I say data table, I mean R objects that have two dimensions, similar to the rows and columns of a spreadsheet. There are two main types of R data table object: the matrix and the data frame. They differ in that every element of a matrix must have the same type (usually numeric), while data frames are more flexible: each column can hold a different type of data, such as numeric, factor, or character (string). Because of these differences, some commands that work for matrices might not work for data frames, so it is important to know how to check which kind of object you are working with.
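A quick way to see this difference is to build one small object of each kind from the same mixed values. This is a minimal sketch on made-up toy data: the names m and df are hypothetical and not part of the lesson. stringsAsFactors = FALSE is set so the text column stays as character in older versions of R too.

```r
# Hypothetical toy values: one numeric column and one text column
# A matrix holds a single type, so the numbers are coerced to character
m <- cbind(id = c(1, 2, 3), label = c("a", "b", "c"))
class(m[, "id"])   # "character"

# A data frame keeps a separate type for each column
df <- data.frame(id = c(1, 2, 3), label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
class(df$id)       # "numeric"
class(df$label)    # "character"
```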

In the example below we look at freeny.x, which is a matrix, and mtcars, which is a data frame. Both are built into R. Again, the str() command tells us the structure of the data.

head(freeny.x)
##      lag quarterly revenue price index income level market potential
## [1,]               8.79636     4.70997      5.82110          12.9699
## [2,]               8.79236     4.70217      5.82558          12.9733
## [3,]               8.79137     4.68944      5.83112          12.9774
## [4,]               8.81486     4.68558      5.84046          12.9806
## [5,]               8.81301     4.64019      5.85036          12.9831
## [6,]               8.90751     4.62553      5.86464          12.9854
str(freeny.x)
##  num [1:39, 1:4] 8.8 8.79 8.79 8.81 8.81 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:4] "lag quarterly revenue" "price index" "income level" "market potential"
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

As an example of a command that works for matrices but not for data frames, try mean() on freeny.x and on mtcars.
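Here is a minimal sketch of that experiment. On a data frame, mean() gives a warning and returns NA rather than stopping with an error; the exact warning text may differ between R versions. The as.matrix() and colMeans() lines are one possible workaround, not the only one.

```r
# mean() on a matrix averages every element in it
mean(freeny.x)

# mean() on a data frame is not supported: it warns and returns NA
mean(mtcars)

# Workarounds: convert to a matrix first, or summarise column by column
mean(as.matrix(mtcars))
colMeans(mtcars)
```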

Basic commands that can be used for data frames and matrices

Find out the number of rows and columns, and perform simple operations on those rows and columns.

# number of columns
ncol(mtcars)
## [1] 11
# number of rows
nrow(mtcars)
## [1] 32
# dimensions
dim(mtcars)
## [1] 32 11
# analysing rows and columns
colMeans(freeny.x)
## lag quarterly revenue           price index          income level 
##              9.280718              4.496182              6.038596 
##      market potential 
##             13.066831
colSums(freeny.x)
## lag quarterly revenue           price index          income level 
##              361.9480              175.3511              235.5052 
##      market potential 
##              509.6064
rowMeans(freeny.x)
##  [1] 8.074333 8.073353 8.072332 8.080375 8.071665 8.095770 8.106082 8.117520
##  [9] 8.124863 8.140490 8.149078 8.158940 8.158470 8.180965 8.192552 8.205092
## [17] 8.209112 8.223212 8.230868 8.231193 8.240507 8.250258 8.249220 8.260175
## [25] 8.267057 8.269210 8.282000 8.293537 8.297823 8.310002 8.315660 8.317607
## [33] 8.317818 8.330682 8.330505 8.333490 8.339897 8.345975 8.354988
rowSums(freeny.x)
##  [1] 32.29733 32.29341 32.28933 32.32150 32.28666 32.38308 32.42433 32.47008
##  [9] 32.49945 32.56196 32.59631 32.63576 32.63388 32.72386 32.77021 32.82037
## [17] 32.83645 32.89285 32.92347 32.92477 32.96203 33.00103 32.99688 33.04070
## [25] 33.06823 33.07684 33.12800 33.17415 33.19129 33.24001 33.26264 33.27043
## [33] 33.27127 33.32273 33.32202 33.33396 33.35959 33.38390 33.41995
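To show what these shortcuts compute, the sketch below compares them with the more general apply() function, where MARGIN = 1 means rows and MARGIN = 2 means columns. apply() is standard base R but is not covered in this session; it appears here only as a cross-check, and all.equal() is used because the two routes may differ in the last floating-point digits.

```r
# dim() returns both numbers that nrow() and ncol() return separately
dim(mtcars) == c(nrow(mtcars), ncol(mtcars))   # TRUE TRUE

# colMeans() gives the same result as applying mean() to each column
all.equal(colMeans(freeny.x), apply(freeny.x, 2, mean))   # TRUE

# rowSums() gives the same result as applying sum() to each row
all.equal(rowSums(freeny.x), apply(freeny.x, 1, sum))     # TRUE

# These shortcuts also accept data frames whose columns are all numeric
colMeans(mtcars)
```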

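You can also transpose a matrix or data frame with t(). But be careful: transposing a data frame automatically converts it to a matrix, which could cause downstream errors. In the example below, the transposed mtcars is converted back into a data frame with as.data.frame().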

freeny_flip <- t(freeny.x)
head(freeny_flip)
##                           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
## lag quarterly revenue  8.79636  8.79236  8.79137  8.81486  8.81301  8.90751
## price index            4.70997  4.70217  4.68944  4.68558  4.64019  4.62553
## income level           5.82110  5.82558  5.83112  5.84046  5.85036  5.86464
## market potential      12.96990 12.97330 12.97740 12.98060 12.98310 12.98540
##                           [,7]     [,8]     [,9]    [,10]    [,11]    [,12]
## lag quarterly revenue  8.93673  8.96161  8.96044  9.00868  9.03049  9.06906
## price index            4.61991  4.61654  4.61407  4.60766  4.60227  4.58960
## income level           5.87769  5.89763  5.92574  5.94232  5.95365  5.96120
## market potential      12.99000 12.99430 12.99920 13.00330 13.00990 13.01590
##                          [,13]    [,14]    [,15]    [,16]    [,17]    [,18]
## lag quarterly revenue  9.05871  9.10698  9.12685  9.17096  9.18665  9.23823
## price index            4.57592  4.58661  4.57997  4.57176  4.56104  4.54906
## income level           5.97805  6.00377  6.02829  6.03475  6.03906  6.05046
## market potential      13.02120 13.02650 13.03510 13.04290 13.04970 13.05510
##                          [,19]    [,20]    [,21]    [,22]    [,23]    [,24]
## lag quarterly revenue  9.26487  9.28436  9.31378  9.35025  9.35835  9.39767
## price index            4.53957  4.51018  4.50352  4.49360  4.46505  4.44924
## income level           6.05563  6.06093  6.07103  6.08018  6.08858  6.10199
## market potential      13.06340 13.06930 13.07370 13.07700 13.08490 13.09180
##                          [,25]    [,26]    [,27]    [,28]    [,29]    [,30]
## lag quarterly revenue  9.42150  9.44223  9.48721  9.52374  9.53980  9.58123
## price index            4.43966  4.42025  4.41060  4.41151  4.39810  4.38513
## income level           6.11207  6.11596  6.12129  6.12200  6.13119  6.14705
## market potential      13.09500 13.09840 13.10890 13.11690 13.12220 13.12660
##                          [,31]    [,32]    [,33]    [,34]    [,35]    [,36]
## lag quarterly revenue  9.60048  9.64496  9.64390  9.69405  9.69958  9.68683
## price index            4.37320  4.32770  4.32023  4.30909  4.30909  4.30552
## income level           6.15336  6.15627  6.16274  6.17369  6.16135  6.18231
## market potential      13.13560 13.14150 13.14440 13.14590 13.15200 13.15930
##                          [,37]    [,38]    [,39]
## lag quarterly revenue  9.71774  9.74924  9.77536
## price index            4.29627  4.27839  4.27789
## income level           6.18768  6.19377  6.20030
## market potential      13.15790 13.16250 13.16640
mtcars_flip <- t(mtcars)
head(mtcars_flip)
##      Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
## mpg      21.00        21.000      22.80         21.400             18.70
## cyl       6.00         6.000       4.00          6.000              8.00
## disp    160.00       160.000     108.00        258.000            360.00
## hp      110.00       110.000      93.00        110.000            175.00
## drat      3.90         3.900       3.85          3.080              3.15
## wt        2.62         2.875       2.32          3.215              3.44
##      Valiant Duster 360 Merc 240D Merc 230 Merc 280 Merc 280C Merc 450SE
## mpg    18.10      14.30     24.40    22.80    19.20     17.80      16.40
## cyl     6.00       8.00      4.00     4.00     6.00      6.00       8.00
## disp  225.00     360.00    146.70   140.80   167.60    167.60     275.80
## hp    105.00     245.00     62.00    95.00   123.00    123.00     180.00
## drat    2.76       3.21      3.69     3.92     3.92      3.92       3.07
## wt      3.46       3.57      3.19     3.15     3.44      3.44       4.07
##      Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## mpg       17.30       15.20              10.40              10.400
## cyl        8.00        8.00               8.00               8.000
## disp     275.80      275.80             472.00             460.000
## hp       180.00      180.00             205.00             215.000
## drat       3.07        3.07               2.93               3.000
## wt         3.73        3.78               5.25               5.424
##      Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla Toyota Corona
## mpg             14.700    32.40      30.400         33.900        21.500
## cyl              8.000     4.00       4.000          4.000         4.000
## disp           440.000    78.70      75.700         71.100       120.100
## hp             230.000    66.00      52.000         65.000        97.000
## drat             3.230     4.08       4.930          4.220         3.700
## wt               5.345     2.20       1.615          1.835         2.465
##      Dodge Challenger AMC Javelin Camaro Z28 Pontiac Firebird Fiat X1-9
## mpg             15.50      15.200      13.30           19.200    27.300
## cyl              8.00       8.000       8.00            8.000     4.000
## disp           318.00     304.000     350.00          400.000    79.000
## hp             150.00     150.000     245.00          175.000    66.000
## drat             2.76       3.150       3.73            3.080     4.080
## wt               3.52       3.435       3.84            3.845     1.935
##      Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora
## mpg          26.00       30.400          15.80        19.70         15.00
## cyl           4.00        4.000           8.00         6.00          8.00
## disp        120.30       95.100         351.00       145.00        301.00
## hp           91.00      113.000         264.00       175.00        335.00
## drat          4.43        3.770           4.22         3.62          3.54
## wt            2.14        1.513           3.17         2.77          3.57
##      Volvo 142E
## mpg       21.40
## cyl        4.00
## disp     121.00
## hp       109.00
## drat       4.11
## wt         2.78
str(mtcars_flip)
##  num [1:11, 1:32] 21 6 160 110 3.9 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:11] "mpg" "cyl" "disp" "hp" ...
##   ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
mtcars_flip <- as.data.frame(mtcars_flip)
str(mtcars_flip)
## 'data.frame':    11 obs. of  32 variables:
##  $ Mazda RX4          : num  21 6 160 110 3.9 ...
##  $ Mazda RX4 Wag      : num  21 6 160 110 3.9 ...
##  $ Datsun 710         : num  22.8 4 108 93 3.85 ...
##  $ Hornet 4 Drive     : num  21.4 6 258 110 3.08 ...
##  $ Hornet Sportabout  : num  18.7 8 360 175 3.15 ...
##  $ Valiant            : num  18.1 6 225 105 2.76 ...
##  $ Duster 360         : num  14.3 8 360 245 3.21 ...
##  $ Merc 240D          : num  24.4 4 146.7 62 3.69 ...
##  $ Merc 230           : num  22.8 4 140.8 95 3.92 ...
##  $ Merc 280           : num  19.2 6 167.6 123 3.92 ...
##  $ Merc 280C          : num  17.8 6 167.6 123 3.92 ...
##  $ Merc 450SE         : num  16.4 8 275.8 180 3.07 ...
##  $ Merc 450SL         : num  17.3 8 275.8 180 3.07 ...
##  $ Merc 450SLC        : num  15.2 8 275.8 180 3.07 ...
##  $ Cadillac Fleetwood : num  10.4 8 472 205 2.93 ...
##  $ Lincoln Continental: num  10.4 8 460 215 3 ...
##  $ Chrysler Imperial  : num  14.7 8 440 230 3.23 ...
##  $ Fiat 128           : num  32.4 4 78.7 66 4.08 ...
##  $ Honda Civic        : num  30.4 4 75.7 52 4.93 ...
##  $ Toyota Corolla     : num  33.9 4 71.1 65 4.22 ...
##  $ Toyota Corona      : num  21.5 4 120.1 97 3.7 ...
##  $ Dodge Challenger   : num  15.5 8 318 150 2.76 ...
##  $ AMC Javelin        : num  15.2 8 304 150 3.15 ...
##  $ Camaro Z28         : num  13.3 8 350 245 3.73 ...
##  $ Pontiac Firebird   : num  19.2 8 400 175 3.08 ...
##  $ Fiat X1-9          : num  27.3 4 79 66 4.08 ...
##  $ Porsche 914-2      : num  26 4 120.3 91 4.43 ...
##  $ Lotus Europa       : num  30.4 4 95.1 113 3.77 ...
##  $ Ford Pantera L     : num  15.8 8 351 264 4.22 3.17 14.5 0 1 5 ...
##  $ Ferrari Dino       : num  19.7 6 145 175 3.62 2.77 15.5 0 1 5 ...
##  $ Maserati Bora      : num  15 8 301 335 3.54 3.57 14.6 0 1 5 ...
##  $ Volvo 142E         : num  21.4 4 121 109 4.11 2.78 18.6 1 1 4 ...
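mtcars survived the round trip because all of its columns are numeric. The sketch below, on a hypothetical two-column data frame (df and df_flip are illustrative names), shows the riskier case: when a data frame mixes text and numbers, t() has to produce a single-type matrix, so the numbers become text, and as.data.frame() does not turn them back into numbers.

```r
# Hypothetical data frame with one text column and one numeric column
df <- data.frame(name = c("a", "b"), value = c(1.5, 2),
                 stringsAsFactors = FALSE)

df_flip <- t(df)
is.matrix(df_flip)   # TRUE: a matrix, not a data frame
typeof(df_flip)      # "character": the numbers were converted to text

# Converting back does not restore the numeric type
str(as.data.frame(df_flip))
```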

Subsetting a data frame

One of the most common tasks in data analysis is filtering. Last prac, we saw how to do this with vectors using square bracket notation; for example, x[3] retrieves the third element of x. Square brackets can also be used for two-dimensional objects, but we need to provide two indexes separated by a comma. The syntax is df[rows,cols], and leaving either index blank selects all rows or all columns.

# get rows 1-10 of column 2
freeny.x[1:10,2]
##  [1] 4.70997 4.70217 4.68944 4.68558 4.64019 4.62553 4.61991 4.61654 4.61407
## [10] 4.60766
# get rows 1-6 of columns 1-3
freeny.x[1:6,1:3]
##      lag quarterly revenue price index income level
## [1,]               8.79636     4.70997      5.82110
## [2,]               8.79236     4.70217      5.82558
## [3,]               8.79137     4.68944      5.83112
## [4,]               8.81486     4.68558      5.84046
## [5,]               8.81301     4.64019      5.85036
## [6,]               8.90751     4.62553      5.86464
# get rows 1-6 of all columns
freeny.x[1:6,]
##      lag quarterly revenue price index income level market potential
## [1,]               8.79636     4.70997      5.82110          12.9699
## [2,]               8.79236     4.70217      5.82558          12.9733
## [3,]               8.79137     4.68944      5.83112          12.9774
## [4,]               8.81486     4.68558      5.84046          12.9806
## [5,]               8.81301     4.64019      5.85036          12.9831
## [6,]               8.90751     4.62553      5.86464          12.9854
# get all rows for columns 1 and 2 
freeny.x[,1:2]
##       lag quarterly revenue price index
##  [1,]               8.79636     4.70997
##  [2,]               8.79236     4.70217
##  [3,]               8.79137     4.68944
##  [4,]               8.81486     4.68558
##  [5,]               8.81301     4.64019
##  [6,]               8.90751     4.62553
##  [7,]               8.93673     4.61991
##  [8,]               8.96161     4.61654
##  [9,]               8.96044     4.61407
## [10,]               9.00868     4.60766
## [11,]               9.03049     4.60227
## [12,]               9.06906     4.58960
## [13,]               9.05871     4.57592
## [14,]               9.10698     4.58661
## [15,]               9.12685     4.57997
## [16,]               9.17096     4.57176
## [17,]               9.18665     4.56104
## [18,]               9.23823     4.54906
## [19,]               9.26487     4.53957
## [20,]               9.28436     4.51018
## [21,]               9.31378     4.50352
## [22,]               9.35025     4.49360
## [23,]               9.35835     4.46505
## [24,]               9.39767     4.44924
## [25,]               9.42150     4.43966
## [26,]               9.44223     4.42025
## [27,]               9.48721     4.41060
## [28,]               9.52374     4.41151
## [29,]               9.53980     4.39810
## [30,]               9.58123     4.38513
## [31,]               9.60048     4.37320
## [32,]               9.64496     4.32770
## [33,]               9.64390     4.32023
## [34,]               9.69405     4.30909
## [35,]               9.69958     4.30909
## [36,]               9.68683     4.30552
## [37,]               9.71774     4.29627
## [38,]               9.74924     4.27839
## [39,]               9.77536     4.27789
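To make the [rows, cols] order concrete, here is a tiny sketch on a hypothetical 2 x 3 matrix m, small enough to check every result by eye. Note that matrix() fills its values column by column.

```r
# Hypothetical 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

m[2, 3]          # row 2, column 3: 6
m[1, ]           # all of row 1: 1 3 5
m[, 2]           # all of column 2: 3 4
m[1:2, c(1, 3)]  # rows 1-2 of columns 1 and 3, a 2 x 2 matrix
```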

Now let's see what happens when we subset just one column or row. By default, R converts the result from a matrix to a vector. We can override this with drop=FALSE to keep it in matrix format.

# get all rows of column 1
freeny.x[,1]
##  [1] 8.79636 8.79236 8.79137 8.81486 8.81301 8.90751 8.93673 8.96161 8.96044
## [10] 9.00868 9.03049 9.06906 9.05871 9.10698 9.12685 9.17096 9.18665 9.23823
## [19] 9.26487 9.28436 9.31378 9.35025 9.35835 9.39767 9.42150 9.44223 9.48721
## [28] 9.52374 9.53980 9.58123 9.60048 9.64496 9.64390 9.69405 9.69958 9.68683
## [37] 9.71774 9.74924 9.77536
# to prevent conversion to vector, use drop=FALSE
freeny.x[,1,drop=FALSE]
##       lag quarterly revenue
##  [1,]               8.79636
##  [2,]               8.79236
##  [3,]               8.79137
##  [4,]               8.81486
##  [5,]               8.81301
##  [6,]               8.90751
##  [7,]               8.93673
##  [8,]               8.96161
##  [9,]               8.96044
## [10,]               9.00868
## [11,]               9.03049
## [12,]               9.06906
## [13,]               9.05871
## [14,]               9.10698
## [15,]               9.12685
## [16,]               9.17096
## [17,]               9.18665
## [18,]               9.23823
## [19,]               9.26487
## [20,]               9.28436
## [21,]               9.31378
## [22,]               9.35025
## [23,]               9.35835
## [24,]               9.39767
## [25,]               9.42150
## [26,]               9.44223
## [27,]               9.48721
## [28,]               9.52374
## [29,]               9.53980
## [30,]               9.58123
## [31,]               9.60048
## [32,]               9.64496
## [33,]               9.64390
## [34,]               9.69405
## [35,]               9.69958
## [36,]               9.68683
## [37,]               9.71774
## [38,]               9.74924
## [39,]               9.77536
# the same concept for rows
freeny.x[1,]
## lag quarterly revenue           price index          income level 
##               8.79636               4.70997               5.82110 
##      market potential 
##              12.96990
# use drop=FALSE to keep a single row as a matrix
freeny.x[1,,drop=FALSE]
##      lag quarterly revenue price index income level market potential
## [1,]               8.79636     4.70997       5.8211          12.9699
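The difference matters because functions written for two-dimensional data may reject a plain vector. This sketch (col_vec and col_mat are illustrative names) checks the dimensions directly and shows that colMeans(), which requires at least two dimensions, stops with an error on the dropped version.

```r
col_vec <- freeny.x[, 1]                 # dropped to a plain vector
col_mat <- freeny.x[, 1, drop = FALSE]   # kept as a one-column matrix

is.matrix(col_vec)   # FALSE
dim(col_vec)         # NULL: a vector has no dimensions
dim(col_mat)         # 39 1
colnames(col_mat)    # "lag quarterly revenue": the column name is kept

colMeans(col_mat)    # works on the one-column matrix
# colMeans(col_vec) would stop with an error, because a vector has no dimensions
```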

Square brackets also work for data frames, and drop=FALSE works in the same way.

mtcars[1:10,1:6]
##                    mpg cyl  disp  hp drat    wt
## Mazda RX4         21.0   6 160.0 110 3.90 2.620
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875
## Datsun 710        22.8   4 108.0  93 3.85 2.320
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440
## Valiant           18.1   6 225.0 105 2.76 3.460
## Duster 360        14.3   8 360.0 245 3.21 3.570
## Merc 240D         24.4   4 146.7  62 3.69 3.190
## Merc 230          22.8   4 140.8  95 3.92 3.150
## Merc 280          19.2   6 167.6 123 3.92 3.440
mtcars[,1]
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
mtcars[,1,drop=FALSE]
##                      mpg
## Mazda RX4           21.0
## Mazda RX4 Wag       21.0
## Datsun 710          22.8
## Hornet 4 Drive      21.4
## Hornet Sportabout   18.7
## Valiant             18.1
## Duster 360          14.3
## Merc 240D           24.4
## Merc 230            22.8
## Merc 280            19.2
## Merc 280C           17.8
## Merc 450SE          16.4
## Merc 450SL          17.3
## Merc 450SLC         15.2
## Cadillac Fleetwood  10.4
## Lincoln Continental 10.4
## Chrysler Imperial   14.7
## Fiat 128            32.4
## Honda Civic         30.4
## Toyota Corolla      33.9
## Toyota Corona       21.5
## Dodge Challenger    15.5
## AMC Javelin         15.2
## Camaro Z28          13.3
## Pontiac Firebird    19.2
## Fiat X1-9           27.3
## Porsche 914-2       26.0
## Lotus Europa        30.4
## Ford Pantera L      15.8
## Ferrari Dino        19.7
## Maserati Bora       15.0
## Volvo 142E          21.4
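A quick way to confirm what kind of object each subset returns is to ask for its class. This sketch only inspects mtcars; the last line reflects how base R treats a single row of a data frame with more than one column.

```r
class(mtcars[, 1])                 # "numeric": one column is dropped to a vector
class(mtcars[, 1, drop = FALSE])   # "data.frame"

# A single row of a multi-column data frame stays a data frame
class(mtcars[1, ])                 # "data.frame"
```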

We can also subset based on the name of a column or row. (This also works for matrices that have row or column names, such as the named columns of freeny.x.)

mtcars[,"cyl"]
##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars[,c("mpg","wt")]
##                      mpg    wt
## Mazda RX4           21.0 2.620
## Mazda RX4 Wag       21.0 2.875
## Datsun 710          22.8 2.320
## Hornet 4 Drive      21.4 3.215
## Hornet Sportabout   18.7 3.440
## Valiant             18.1 3.460
## Duster 360          14.3 3.570
## Merc 240D           24.4 3.190
## Merc 230            22.8 3.150
## Merc 280            19.2 3.440
## Merc 280C           17.8 3.440
## Merc 450SE          16.4 4.070
## Merc 450SL          17.3 3.730
## Merc 450SLC         15.2 3.780
## Cadillac Fleetwood  10.4 5.250
## Lincoln Continental 10.4 5.424
## Chrysler Imperial   14.7 5.345
## Fiat 128            32.4 2.200
## Honda Civic         30.4 1.615
## Toyota Corolla      33.9 1.835
## Toyota Corona       21.5 2.465
## Dodge Challenger    15.5 3.520
## AMC Javelin         15.2 3.435
## Camaro Z28          13.3 3.840
## Pontiac Firebird    19.2 3.845
## Fiat X1-9           27.3 1.935
## Porsche 914-2       26.0 2.140
## Lotus Europa        30.4 1.513
## Ford Pantera L      15.8 3.170
## Ferrari Dino        19.7 2.770
## Maserati Bora       15.0 3.570
## Volvo 142E          21.4 2.780
mtcars["Camaro Z28",c("mpg","wt")]
##             mpg   wt
## Camaro Z28 13.3 3.84
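To find the names available for this kind of lookup, use rownames() and colnames(). The sketch below also shows a pitfall worth knowing: asking for a row name that does not exist (here the made-up name "Batmobile") does not raise an error, it silently returns NA.

```r
# Row names identify each car; column names identify each variable
head(rownames(mtcars))
colnames(mtcars)

# Several rows by name, one column by name (returned as a vector)
mtcars[c("Fiat 128", "Honda Civic"), "mpg"]   # 32.4 30.4

# A row name that is not in the data gives NA rather than an error
mtcars["Batmobile", "mpg"]                    # NA
```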

Data frame columns can also be subsetted using the $ notation. The syntax is df$col.

mtcars$mpg
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
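The $ notation is a shortcut for selecting one column by name. This sketch checks that it returns exactly the same vector as the bracket form; the double-bracket form mtcars[["mpg"]] is a third standard base R spelling not otherwise used in this session. The last line shows that $ with a column name that does not exist returns NULL instead of an error, so a typo can go unnoticed.

```r
by_dollar  <- mtcars$mpg
by_name    <- mtcars[, "mpg"]
by_double  <- mtcars[["mpg"]]

identical(by_dollar, by_name)    # TRUE
identical(by_name, by_double)    # TRUE

# A missing column gives NULL, not an error
mtcars$colour                    # NULL
```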

This notation can even be used to create new columns. In the example below, we convert the miles per gallon values to liters per 100 km. We then round these values to three significant figures with signif().

mtcars$lper100km <- 235.215 / mtcars$mpg

mtcars[,c(1,ncol(mtcars))]
##                      mpg lper100km
## Mazda RX4           21.0 11.200714
## Mazda RX4 Wag       21.0 11.200714
## Datsun 710          22.8 10.316447
## Hornet 4 Drive      21.4 10.991355
## Hornet Sportabout   18.7 12.578342
## Valiant             18.1 12.995304
## Duster 360          14.3 16.448601
## Merc 240D           24.4  9.639959
## Merc 230            22.8 10.316447
## Merc 280            19.2 12.250781
## Merc 280C           17.8 13.214326
## Merc 450SE          16.4 14.342378
## Merc 450SL          17.3 13.596243
## Merc 450SLC         15.2 15.474671
## Cadillac Fleetwood  10.4 22.616827
## Lincoln Continental 10.4 22.616827
## Chrysler Imperial   14.7 16.001020
## Fiat 128            32.4  7.259722
## Honda Civic         30.4  7.737336
## Toyota Corolla      33.9  6.938496
## Toyota Corona       21.5 10.940233
## Dodge Challenger    15.5 15.175161
## AMC Javelin         15.2 15.474671
## Camaro Z28          13.3 17.685338
## Pontiac Firebird    19.2 12.250781
## Fiat X1-9           27.3  8.615934
## Porsche 914-2       26.0  9.046731
## Lotus Europa        30.4  7.737336
## Ford Pantera L      15.8 14.887025
## Ferrari Dino        19.7 11.939848
## Maserati Bora       15.0 15.681000
## Volvo 142E          21.4 10.991355
mtcars$lper100km <- signif(235.215 / mtcars$mpg ,3)

mtcars[,c(1,ncol(mtcars))]
##                      mpg lper100km
## Mazda RX4           21.0     11.20
## Mazda RX4 Wag       21.0     11.20
## Datsun 710          22.8     10.30
## Hornet 4 Drive      21.4     11.00
## Hornet Sportabout   18.7     12.60
## Valiant             18.1     13.00
## Duster 360          14.3     16.40
## Merc 240D           24.4      9.64
## Merc 230            22.8     10.30
## Merc 280            19.2     12.30
## Merc 280C           17.8     13.20
## Merc 450SE          16.4     14.30
## Merc 450SL          17.3     13.60
## Merc 450SLC         15.2     15.50
## Cadillac Fleetwood  10.4     22.60
## Lincoln Continental 10.4     22.60
## Chrysler Imperial   14.7     16.00
## Fiat 128            32.4      7.26
## Honda Civic         30.4      7.74
## Toyota Corolla      33.9      6.94
## Toyota Corona       21.5     10.90
## Dodge Challenger    15.5     15.20
## AMC Javelin         15.2     15.50
## Camaro Z28          13.3     17.70
## Pontiac Firebird    19.2     12.30
## Fiat X1-9           27.3      8.62
## Porsche 914-2       26.0      9.05
## Lotus Europa        30.4      7.74
## Ford Pantera L      15.8     14.90
## Ferrari Dino        19.7     11.90
## Maserati Bora       15.0     15.70
## Volvo 142E          21.4     11.00
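Where does 235.215 come from? Assuming US gallons (mtcars comes from the US magazine Motor Trend), 1 gallon is 3.785411784 L and 1 mile is 1.609344 km, so L/100km = 100 × 3.785411784 / (1.609344 × mpg). The sketch below checks that constant and contrasts signif(), which keeps significant figures, with round(), which keeps decimal places. The variable names are illustrative.

```r
# Assumed unit definitions (US gallon, international mile)
litres_per_gallon <- 3.785411784
km_per_mile <- 1.609344

# L/100km = 100 * litres_per_gallon / (km_per_mile * mpg)
conversion <- 100 * litres_per_gallon / km_per_mile
round(conversion, 3)   # 235.215

# signif() keeps significant figures; round() keeps decimal places
x <- 235.215 / 10.4    # Cadillac Fleetwood
signif(x, 3)           # 22.6
round(x, 3)            # 22.617
```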

You may also want to subset a data frame based on its values. Let’s say you want a car with fuel consumption of less than 10 L/100km. Let’s do it the hard way first.

mtcars$lper100km 
##  [1] 11.20 11.20 10.30 11.00 12.60 13.00 16.40  9.64 10.30 12.30 13.20 14.30
## [13] 13.60 15.50 22.60 22.60 16.00  7.26  7.74  6.94 10.90 15.20 15.50 17.70
## [25] 12.30  8.62  9.05  7.74 14.90 11.90 15.70 11.00
mtcars$lper100km < 10
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
## [25] FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
which(mtcars$lper100km < 10)
## [1]  8 18 19 20 26 27 28
mtcars[which(mtcars$lper100km < 10),]
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb lper100km
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2      9.64
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1      7.26
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2      7.74
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      6.94
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1      8.62
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2      9.05
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2      7.74
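The which() step is optional: a logical vector can index rows directly. The two forms only differ when the condition contains NA, because which() skips NA while logical indexing keeps an all-NA row. This sketch works on a copy called mt so the original mtcars is untouched; mt and the tiny toy data frame are hypothetical names.

```r
mt <- mtcars
mt$lper100km <- signif(235.215 / mt$mpg, 3)

# Same rows either way when the column has no missing values
by_which   <- mt[which(mt$lper100km < 10), ]
by_logical <- mt[mt$lper100km < 10, ]
identical(by_which, by_logical)   # TRUE

# With a missing value, the results differ
toy <- data.frame(x = c(5, NA, 12))
nrow(toy[which(toy$x < 10), , drop = FALSE])   # 1
nrow(toy[toy$x < 10, , drop = FALSE])          # 2 (one row is all NA)
```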

As you can see, this is quite complicated. There is an easier way using subset(). subset() is also well suited to filtering on more than one criterion using the & (AND) and | (OR) operators.

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
##                     lper100km
## Mazda RX4               11.20
## Mazda RX4 Wag           11.20
## Datsun 710              10.30
## Hornet 4 Drive          11.00
## Hornet Sportabout       12.60
## Valiant                 13.00
## Duster 360              16.40
## Merc 240D                9.64
## Merc 230                10.30
## Merc 280                12.30
## Merc 280C               13.20
## Merc 450SE              14.30
## Merc 450SL              13.60
## Merc 450SLC             15.50
## Cadillac Fleetwood      22.60
## Lincoln Continental     22.60
## Chrysler Imperial       16.00
## Fiat 128                 7.26
## Honda Civic              7.74
## Toyota Corolla           6.94
## Toyota Corona           10.90
## Dodge Challenger        15.20
## AMC Javelin             15.50
## Camaro Z28              17.70
## Pontiac Firebird        12.30
## Fiat X1-9                8.62
## Porsche 914-2            9.05
## Lotus Europa             7.74
## Ford Pantera L          14.90
## Ferrari Dino            11.90
## Maserati Bora           15.70
## Volvo 142E              11.00
subset(mtcars,lper100km < 10)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb lper100km
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2      9.64
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1      7.26
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2      7.74
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      6.94
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1      8.62
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2      9.05
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2      7.74
# you want an economical AND quick car
subset(mtcars,lper100km < 10 & qsec < 18)
##                mpg cyl  disp  hp drat    wt qsec vs am gear carb lper100km
## Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2      9.05
## Lotus Europa  30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2      7.74
# you want an economical OR quick car
subset(mtcars,lper100km < 10 | qsec < 18)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
##                     lper100km
## Mazda RX4               11.20
## Mazda RX4 Wag           11.20
## Hornet Sportabout       12.60
## Duster 360              16.40
## Merc 240D                9.64
## Merc 450SE              14.30
## Merc 450SL              13.60
## Cadillac Fleetwood      22.60
## Lincoln Continental     22.60
## Chrysler Imperial       16.00
## Fiat 128                 7.26
## Honda Civic              7.74
## Toyota Corolla           6.94
## Dodge Challenger        15.20
## AMC Javelin             15.50
## Camaro Z28              17.70
## Pontiac Firebird        12.30
## Fiat X1-9                8.62
## Porsche 914-2            9.05
## Lotus Europa             7.74
## Ford Pantera L          14.90
## Ferrari Dino            11.90
## Maserati Bora           15.70
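The same logical conditions can also be used directly inside square brackets. The sketch below works on a fresh copy of the data (`datasets::mtcars`) and rebuilds the `lper100km` column itself. It assumes that column was calculated as 235.215 / mpg (the usual US mpg to L/100 km conversion) and rounded to two decimals. The names `mt`, `keep` and so on are just for illustration.

```r
# work on a fresh copy so the session's mtcars is untouched
mt <- datasets::mtcars
# assumed conversion from US miles per gallon to litres per 100 km
mt$lper100km <- round(235.215 / mt$mpg, 2)

# subset() with an AND (&) condition
via_subset <- subset(mt, lper100km < 10 & qsec < 18)

# the same rows selected with a logical vector inside square brackets
keep <- mt$lper100km < 10 & mt$qsec < 18
via_brackets <- mt[keep, ]

rownames(via_subset)                                       # "Porsche 914-2" "Lotus Europa"
identical(rownames(via_subset), rownames(via_brackets))    # TRUE

# ! negates a condition: cars that are NOT economical
not_economical <- subset(mt, !(lper100km < 10))
nrow(not_economical)                                       # 25
```

One practical difference: subset() silently drops rows where the condition is NA, whereas square brackets return a row full of NAs for them.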

Subset also works with strings and factors. To see this, we will use the iris dataset, where the Species column is a factor.

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
setosa <- subset(iris,Species == "setosa")
head(setosa)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
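To keep rows that match any of several values, the `%in%` operator is handy. Note that subsetting keeps every original level of a factor column, even levels with no rows left. These can show up as empty categories in tables and plots, and droplevels() removes them. A short sketch (`two_species` is just an illustrative name):

```r
# keep two of the three species with %in%
two_species <- subset(iris, Species %in% c("setosa", "virginica"))
nrow(two_species)                  # 100 (50 per species)
table(two_species$Species)         # versicolor still listed, with a count of 0

# unused factor levels are kept after subsetting ...
levels(two_species$Species)        # "setosa" "versicolor" "virginica"
# ... and droplevels() removes them
two_species <- droplevels(two_species)
levels(two_species$Species)        # "setosa" "virginica"
```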

Row and column names

You can use the colnames() and rownames() functions to get the column or row names, and even modify them.

colnames(mtcars)
##  [1] "mpg"       "cyl"       "disp"      "hp"        "drat"      "wt"       
##  [7] "qsec"      "vs"        "am"        "gear"      "carb"      "lper100km"
rownames(mtcars)
##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [31] "Maserati Bora"       "Volvo 142E"
colnames(mtcars) <- c("miles per gallon",
              "number of cylinders",
              "displacement in cubic inches",
              "gross horsepower",
              "rear axle ratio",
              "weight (pounds/1000)",
              "quarter mile time in seconds",
              "V or straight cylinder configuration",
              "transmission type: auto (0) or manual (1)",
              "number of forward gears",
              "number of carburetors",
              "litres per 100km")

head(mtcars)
##                   miles per gallon number of cylinders
## Mazda RX4                     21.0                   6
## Mazda RX4 Wag                 21.0                   6
## Datsun 710                    22.8                   4
## Hornet 4 Drive                21.4                   6
## Hornet Sportabout             18.7                   8
## Valiant                       18.1                   6
##                   displacement in cubic inches gross horsepower rear axle ratio
## Mazda RX4                                  160              110            3.90
## Mazda RX4 Wag                              160              110            3.90
## Datsun 710                                 108               93            3.85
## Hornet 4 Drive                             258              110            3.08
## Hornet Sportabout                          360              175            3.15
## Valiant                                    225              105            2.76
##                   weight (pounds/1000) quarter mile time in seconds
## Mazda RX4                        2.620                        16.46
## Mazda RX4 Wag                    2.875                        17.02
## Datsun 710                       2.320                        18.61
## Hornet 4 Drive                   3.215                        19.44
## Hornet Sportabout                3.440                        17.02
## Valiant                          3.460                        20.22
##                   V or straight cylinder configuration
## Mazda RX4                                            0
## Mazda RX4 Wag                                        0
## Datsun 710                                           1
## Hornet 4 Drive                                       1
## Hornet Sportabout                                    0
## Valiant                                              1
##                   transmission type: auto (0) or manual (1)
## Mazda RX4                                                 1
## Mazda RX4 Wag                                             1
## Datsun 710                                                1
## Hornet 4 Drive                                            0
## Hornet Sportabout                                         0
## Valiant                                                   0
##                   number of forward gears number of carburetors
## Mazda RX4                               4                     4
## Mazda RX4 Wag                           4                     4
## Datsun 710                              4                     1
## Hornet 4 Drive                          3                     1
## Hornet Sportabout                       3                     2
## Valiant                                 3                     1
##                   litres per 100km
## Mazda RX4                     11.2
## Mazda RX4 Wag                 11.2
## Datsun 710                    10.3
## Hornet 4 Drive                11.0
## Hornet Sportabout             12.6
## Valiant                       13.0
colnames(mtcars)[1] <- "miles per US gallon"

head(mtcars)
##                   miles per US gallon number of cylinders
## Mazda RX4                        21.0                   6
## Mazda RX4 Wag                    21.0                   6
## Datsun 710                       22.8                   4
## Hornet 4 Drive                   21.4                   6
## Hornet Sportabout                18.7                   8
## Valiant                          18.1                   6
##                   displacement in cubic inches gross horsepower rear axle ratio
## Mazda RX4                                  160              110            3.90
## Mazda RX4 Wag                              160              110            3.90
## Datsun 710                                 108               93            3.85
## Hornet 4 Drive                             258              110            3.08
## Hornet Sportabout                          360              175            3.15
## Valiant                                    225              105            2.76
##                   weight (pounds/1000) quarter mile time in seconds
## Mazda RX4                        2.620                        16.46
## Mazda RX4 Wag                    2.875                        17.02
## Datsun 710                       2.320                        18.61
## Hornet 4 Drive                   3.215                        19.44
## Hornet Sportabout                3.440                        17.02
## Valiant                          3.460                        20.22
##                   V or straight cylinder configuration
## Mazda RX4                                            0
## Mazda RX4 Wag                                        0
## Datsun 710                                           1
## Hornet 4 Drive                                       1
## Hornet Sportabout                                    0
## Valiant                                              1
##                   transmission type: auto (0) or manual (1)
## Mazda RX4                                                 1
## Mazda RX4 Wag                                             1
## Datsun 710                                                1
## Hornet 4 Drive                                            0
## Hornet Sportabout                                         0
## Valiant                                                   0
##                   number of forward gears number of carburetors
## Mazda RX4                               4                     4
## Mazda RX4 Wag                           4                     4
## Datsun 710                              4                     1
## Hornet 4 Drive                          3                     1
## Hornet Sportabout                       3                     2
## Valiant                                 3                     1
##                   litres per 100km
## Mazda RX4                     11.2
## Mazda RX4 Wag                 11.2
## Datsun 710                    10.3
## Hornet 4 Drive                11.0
## Hornet Sportabout             12.6
## Valiant                       13.0
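Renaming by position, as in `colnames(mtcars)[1]`, breaks silently if the column order ever changes. A more robust pattern is to match on the current name. This sketch starts from a fresh copy via `datasets::mtcars`, because the session's mtcars has already been renamed at this point. `mt` is an illustrative name.

```r
mt <- datasets::mtcars   # fresh copy with the original short column names

# rename a column by matching its current name instead of its position
colnames(mt)[colnames(mt) == "wt"] <- "weight (pounds/1000)"
colnames(mt)[6]          # "weight (pounds/1000)"

# the same idea works for row names
rownames(mt)[rownames(mt) == "Valiant"] <- "Plymouth Valiant"
head(rownames(mt))
```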

If a column or row name contains whitespace, it can cause problems later on when subsetting. In that case the name needs to be wrapped in backticks, like this:

economical_cars <- subset(mtcars,`litres per 100km` < 10)

economical_cars
##                miles per US gallon number of cylinders
## Merc 240D                     24.4                   4
## Fiat 128                      32.4                   4
## Honda Civic                   30.4                   4
## Toyota Corolla                33.9                   4
## Fiat X1-9                     27.3                   4
## Porsche 914-2                 26.0                   4
## Lotus Europa                  30.4                   4
##                displacement in cubic inches gross horsepower rear axle ratio
## Merc 240D                             146.7               62            3.69
## Fiat 128                               78.7               66            4.08
## Honda Civic                            75.7               52            4.93
## Toyota Corolla                         71.1               65            4.22
## Fiat X1-9                              79.0               66            4.08
## Porsche 914-2                         120.3               91            4.43
## Lotus Europa                           95.1              113            3.77
##                weight (pounds/1000) quarter mile time in seconds
## Merc 240D                     3.190                        20.00
## Fiat 128                      2.200                        19.47
## Honda Civic                   1.615                        18.52
## Toyota Corolla                1.835                        19.90
## Fiat X1-9                     1.935                        18.90
## Porsche 914-2                 2.140                        16.70
## Lotus Europa                  1.513                        16.90
##                V or straight cylinder configuration
## Merc 240D                                         1
## Fiat 128                                          1
## Honda Civic                                       1
## Toyota Corolla                                    1
## Fiat X1-9                                         1
## Porsche 914-2                                     0
## Lotus Europa                                      1
##                transmission type: auto (0) or manual (1)
## Merc 240D                                              0
## Fiat 128                                               1
## Honda Civic                                            1
## Toyota Corolla                                         1
## Fiat X1-9                                              1
## Porsche 914-2                                          1
## Lotus Europa                                           1
##                number of forward gears number of carburetors litres per 100km
## Merc 240D                            4                     2             9.64
## Fiat 128                             4                     1             7.26
## Honda Civic                          4                     2             7.74
## Toyota Corolla                       4                     1             6.94
## Fiat X1-9                            4                     1             8.62
## Porsche 914-2                        5                     2             9.05
## Lotus Europa                         5                     2             7.74
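An alternative to typing backticks everywhere is to convert the names into syntactically valid ones with make.names(), which replaces spaces and other invalid characters with dots. This sketch uses a fresh copy (`datasets::mtcars`), and `thrifty` is an illustrative name.

```r
mt <- datasets::mtcars
colnames(mt)[1] <- "miles per US gallon"

# names containing spaces need backticks with $ and inside subset()
head(mt$`miles per US gallon`, 3)

# make.names() turns each name into a valid R name
colnames(mt) <- make.names(colnames(mt))
colnames(mt)[1]                       # "miles.per.US.gallon"

# no backticks needed any more
thrifty <- subset(mt, miles.per.US.gallon > 30)
rownames(thrifty)
```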

It is also useful to be able to subset a data frame based on its row names. Let's get all the Mercedes models. To do this, we need to introduce the grep() command, which searches a character vector for a pattern and returns the positions of the matching elements.

# let's look again at the car names
rownames(mtcars)
##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [31] "Maserati Bora"       "Volvo 142E"
# let's find all the ones with "Merc" in the name
grep("Merc",rownames(mtcars))
## [1]  8  9 10 11 12 13 14
# now let's extract all those rows
mercs <- mtcars[grep("Merc",rownames(mtcars)),]

mercs
##             miles per US gallon number of cylinders
## Merc 240D                  24.4                   4
## Merc 230                   22.8                   4
## Merc 280                   19.2                   6
## Merc 280C                  17.8                   6
## Merc 450SE                 16.4                   8
## Merc 450SL                 17.3                   8
## Merc 450SLC                15.2                   8
##             displacement in cubic inches gross horsepower rear axle ratio
## Merc 240D                          146.7               62            3.69
## Merc 230                           140.8               95            3.92
## Merc 280                           167.6              123            3.92
## Merc 280C                          167.6              123            3.92
## Merc 450SE                         275.8              180            3.07
## Merc 450SL                         275.8              180            3.07
## Merc 450SLC                        275.8              180            3.07
##             weight (pounds/1000) quarter mile time in seconds
## Merc 240D                   3.19                         20.0
## Merc 230                    3.15                         22.9
## Merc 280                    3.44                         18.3
## Merc 280C                   3.44                         18.9
## Merc 450SE                  4.07                         17.4
## Merc 450SL                  3.73                         17.6
## Merc 450SLC                 3.78                         18.0
##             V or straight cylinder configuration
## Merc 240D                                      1
## Merc 230                                       1
## Merc 280                                       1
## Merc 280C                                      1
## Merc 450SE                                     0
## Merc 450SL                                     0
## Merc 450SLC                                    0
##             transmission type: auto (0) or manual (1) number of forward gears
## Merc 240D                                           0                       4
## Merc 230                                            0                       4
## Merc 280                                            0                       4
## Merc 280C                                           0                       4
## Merc 450SE                                          0                       3
## Merc 450SL                                          0                       3
## Merc 450SLC                                         0                       3
##             number of carburetors litres per 100km
## Merc 240D                       2             9.64
## Merc 230                        2            10.30
## Merc 280                        4            12.30
## Merc 280C                       4            13.20
## Merc 450SE                      3            14.30
## Merc 450SL                      3            13.60
## Merc 450SLC                     3            15.50
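The pattern given to grep() is a regular expression, so "Merc" matches anywhere in a name. For example, it would also match a hypothetical "Mercury". A few common variations are shown below, using a fresh copy of the data (`mt` and `toyotas` are illustrative names):

```r
mt <- datasets::mtcars

# grep() returns the positions of the matching elements
grep("Merc", rownames(mt))                          # 8 9 10 11 12 13 14

# value = TRUE returns the matching strings instead
grep("Merc", rownames(mt), value = TRUE)

# "^" anchors the pattern to the start of the string, and
# ignore.case = TRUE makes the match case-insensitive
grep("^merc", rownames(mt), ignore.case = TRUE)     # 8 9 10 11 12 13 14

# grepl() returns TRUE/FALSE, so it can be used inside subset()
toyotas <- subset(mt, grepl("^Toyota", rownames(mt)))
rownames(toyotas)                                   # "Toyota Corolla" "Toyota Corona"
```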

Sorting data tables

We are going to sort our subset of economical cars by speed, using their quarter mile time. To do this, we use the order() command inside square brackets. order() only returns the positions (indices) of the values in sorted order; it doesn't do the sorting itself. By default order() sorts in ascending order, so the smallest values come to the top. For a numeric vector this can be reversed by putting a minus sign (-) before the vector being ordered, or by setting decreasing = TRUE.

economical_cars
##                miles per US gallon number of cylinders
## Merc 240D                     24.4                   4
## Fiat 128                      32.4                   4
## Honda Civic                   30.4                   4
## Toyota Corolla                33.9                   4
## Fiat X1-9                     27.3                   4
## Porsche 914-2                 26.0                   4
## Lotus Europa                  30.4                   4
##                displacement in cubic inches gross horsepower rear axle ratio
## Merc 240D                             146.7               62            3.69
## Fiat 128                               78.7               66            4.08
## Honda Civic                            75.7               52            4.93
## Toyota Corolla                         71.1               65            4.22
## Fiat X1-9                              79.0               66            4.08
## Porsche 914-2                         120.3               91            4.43
## Lotus Europa                           95.1              113            3.77
##                weight (pounds/1000) quarter mile time in seconds
## Merc 240D                     3.190                        20.00
## Fiat 128                      2.200                        19.47
## Honda Civic                   1.615                        18.52
## Toyota Corolla                1.835                        19.90
## Fiat X1-9                     1.935                        18.90
## Porsche 914-2                 2.140                        16.70
## Lotus Europa                  1.513                        16.90
##                V or straight cylinder configuration
## Merc 240D                                         1
## Fiat 128                                          1
## Honda Civic                                       1
## Toyota Corolla                                    1
## Fiat X1-9                                         1
## Porsche 914-2                                     0
## Lotus Europa                                      1
##                transmission type: auto (0) or manual (1)
## Merc 240D                                              0
## Fiat 128                                               1
## Honda Civic                                            1
## Toyota Corolla                                         1
## Fiat X1-9                                              1
## Porsche 914-2                                          1
## Lotus Europa                                           1
##                number of forward gears number of carburetors litres per 100km
## Merc 240D                            4                     2             9.64
## Fiat 128                             4                     1             7.26
## Honda Civic                          4                     2             7.74
## Toyota Corolla                       4                     1             6.94
## Fiat X1-9                            4                     1             8.62
## Porsche 914-2                        5                     2             9.05
## Lotus Europa                         5                     2             7.74
order(economical_cars$`quarter mile time in seconds`)
## [1] 6 7 3 5 2 4 1
sorted <- economical_cars[order(economical_cars$`quarter mile time in seconds`),]

sorted[,c(7,ncol(sorted))]
##                quarter mile time in seconds litres per 100km
## Porsche 914-2                         16.70             9.05
## Lotus Europa                          16.90             7.74
## Honda Civic                           18.52             7.74
## Fiat X1-9                             18.90             8.62
## Fiat 128                              19.47             7.26
## Toyota Corolla                        19.90             6.94
## Merc 240D                             20.00             9.64
reverse_sorted <- economical_cars[order(-economical_cars$`quarter mile time in seconds`),]

reverse_sorted[,c(7,ncol(reverse_sorted))]
##                quarter mile time in seconds litres per 100km
## Merc 240D                             20.00             9.64
## Toyota Corolla                        19.90             6.94
## Fiat 128                              19.47             7.26
## Fiat X1-9                             18.90             8.62
## Honda Civic                           18.52             7.74
## Lotus Europa                          16.90             7.74
## Porsche 914-2                         16.70             9.05
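To make the difference between order() and sort() concrete, and to show sorting on more than one column, here is a short sketch. It uses `datasets::mtcars` because the session's copy has renamed columns by now. `x` and `by_cyl_mpg` are illustrative names.

```r
x <- c(30, 10, 20)
order(x)          # 2 3 1: positions of the smallest, middle and largest values
sort(x)           # 10 20 30: the sorted values themselves
x[order(x)]       # indexing with order() gives the same result as sort()

mt <- datasets::mtcars

# decreasing = TRUE also works for character vectors, where a minus sign cannot be used
head(sort(rownames(mt), decreasing = TRUE), 2)

# several sort keys: by cylinders (ascending), then by mpg (descending) within each group
by_cyl_mpg <- mt[order(mt$cyl, -mt$mpg), c("cyl", "mpg")]
head(by_cyl_mpg, 3)
```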

Creating data frames and matrices

First we will create a data frame for some people who completed a survey about their height and weight. You should always run str() to check that the resulting data frame has the intended structure. You may need to include stringsAsFactors=FALSE to prevent character strings from being converted to factors. In R 4.0.0 and later this is already the default, which is why pnames is a character column in both of the str() outputs below.

pnames <- c("Jill", "Matt", "Sam", "Amy", "Bob", "Raj")

pnames
## [1] "Jill" "Matt" "Sam"  "Amy"  "Bob"  "Raj"
pgender <- as.factor(c("F", "M", "F", "F", "M", "M"))

pgender
## [1] F M F F M M
## Levels: F M
pheight <- c(164, 186, 170, 175, 178, 191)

pheight
## [1] 164 186 170 175 178 191
pweight <- c(54.1, 90.3, 64.8, 66.7, 80.4, 86.9)

pweight
## [1] 54.1 90.3 64.8 66.7 80.4 86.9
df <- data.frame(pnames,pgender,pheight,pweight)

str(df)
## 'data.frame':    6 obs. of  4 variables:
##  $ pnames : chr  "Jill" "Matt" "Sam" "Amy" ...
##  $ pgender: Factor w/ 2 levels "F","M": 1 2 1 1 2 2
##  $ pheight: num  164 186 170 175 178 191
##  $ pweight: num  54.1 90.3 64.8 66.7 80.4 86.9
df <- data.frame(pnames,pgender,pheight,pweight,stringsAsFactors = FALSE)

str(df)
## 'data.frame':    6 obs. of  4 variables:
##  $ pnames : chr  "Jill" "Matt" "Sam" "Amy" ...
##  $ pgender: Factor w/ 2 levels "F","M": 1 2 1 1 2 2
##  $ pheight: num  164 186 170 175 178 191
##  $ pweight: num  54.1 90.3 64.8 66.7 80.4 86.9
df
##   pnames pgender pheight pweight
## 1   Jill       F     164    54.1
## 2   Matt       M     186    90.3
## 3    Sam       F     170    64.8
## 4    Amy       F     175    66.7
## 5    Bob       M     178    80.4
## 6    Raj       M     191    86.9
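Setting stringsAsFactors explicitly makes the result the same on any R version. This sketch uses illustrative names (`demo_names`, `df_chr`, `df_fac`) so that it does not overwrite the survey objects above.

```r
demo_names <- c("Sam", "Jill", "Matt")

df_chr <- data.frame(demo_names, stringsAsFactors = FALSE)
df_fac <- data.frame(demo_names, stringsAsFactors = TRUE)

class(df_chr$demo_names)    # "character"
class(df_fac$demo_names)    # "factor"
levels(df_fac$demo_names)   # levels are sorted alphabetically: "Jill" "Matt" "Sam"
```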

Now we might want to make each person's name the row name. This makes the data tidier, but it won't work if more than one entry has the same name, because row names must be unique. You can assign NULL to a column to delete it. Rows can be deleted with square brackets and negative indices.

rownames(df) <- df$pnames

df
##      pnames pgender pheight pweight
## Jill   Jill       F     164    54.1
## Matt   Matt       M     186    90.3
## Sam     Sam       F     170    64.8
## Amy     Amy       F     175    66.7
## Bob     Bob       M     178    80.4
## Raj     Raj       M     191    86.9
df$pnames <- NULL

df
##      pgender pheight pweight
## Jill       F     164    54.1
## Matt       M     186    90.3
## Sam        F     170    64.8
## Amy        F     175    66.7
## Bob        M     178    80.4
## Raj        M     191    86.9
# delete rows 2 and 4 (Matt and Amy)
df <- df[-c(2,4),]
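Positions can shift as rows are added or removed, so it is often safer to delete rows and columns by name. This sketch rebuilds the survey data under the illustrative name `survey`, so the `df` used in the next step is left alone. It uses the `row.names` argument of data.frame() to set the names directly.

```r
survey <- data.frame(pgender = factor(c("F", "M", "F", "F", "M", "M")),
                     pheight = c(164, 186, 170, 175, 178, 191),
                     pweight = c(54.1, 90.3, 64.8, 66.7, 80.4, 86.9),
                     row.names = c("Jill", "Matt", "Sam", "Amy", "Bob", "Raj"))

# delete rows by name rather than position
survey_rows <- survey[!(rownames(survey) %in% c("Matt", "Amy")), ]
rownames(survey_rows)          # "Jill" "Sam" "Bob" "Raj"

# delete several columns at once by assigning NULL
survey_cols <- survey
survey_cols[c("pheight", "pweight")] <- NULL
colnames(survey_cols)          # "pgender"

# or keep only the columns you want, by name
survey[, c("pheight", "pweight")]
```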

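Now we will convert df into a matrix. A matrix can hold only one type of data, so the factor column forces every value to be converted to a character string. Converting pgender to numbers first (as.numeric() returns the factor's level codes, so F becomes 1 and M becomes 2) gives a numeric matrix.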

as.matrix(df)
##      pgender pheight pweight
## Jill "F"     "164"   "54.1" 
## Sam  "F"     "170"   "64.8" 
## Bob  "M"     "178"   "80.4" 
## Raj  "M"     "191"   "86.9"
df$pgender <- as.numeric(df$pgender)

mymat <- as.matrix(df)

mymat
##      pgender pheight pweight
## Jill       1     164    54.1
## Sam        1     170    64.8
## Bob        2     178    80.4
## Raj        2     191    86.9
str(mymat)
##  num [1:4, 1:3] 1 1 2 2 164 170 178 191 54.1 64.8 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:4] "Jill" "Sam" "Bob" "Raj"
##   ..$ : chr [1:3] "pgender" "pheight" "pweight"
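Because as.numeric() returns a factor's level codes rather than its labels, it gives surprising results for factors whose labels are numbers. A short sketch of the type conversion and this pitfall, using illustrative names (`demo_df`, `f`):

```r
# a data frame with a factor column becomes a character matrix
demo_df <- data.frame(pgender = factor(c("F", "M")), pheight = c(164, 186))
typeof(as.matrix(demo_df))      # "character"

# as.numeric() on a factor returns the level codes (F = 1, M = 2)
as.numeric(demo_df$pgender)     # 1 2

# pitfall: for a factor of numbers, the codes are NOT the numbers
f <- factor(c("10", "5", "20"))
levels(f)                       # "10" "20" "5" (sorted as text)
as.numeric(f)                   # 1 3 2
as.numeric(as.character(f))     # 10 5 20
```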

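We can also turn a plain vector into a matrix. To demonstrate this, I'll generate 100 random numbers with rnorm() and arrange them into a matrix with 20 rows and 5 columns. Because the numbers are random, your values will differ from those shown below unless you first fix the random seed with set.seed().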

mydata <- rnorm(n = 100, mean = 10, sd = 20)

mymatrix <- matrix(data = mydata, nrow = 20, ncol = 5)

mymatrix
##              [,1]        [,2]       [,3]       [,4]        [,5]
##  [1,]  -4.2801716  27.8042726   5.560975  37.711016  32.5423586
##  [2,]  26.0206213 -13.5250847  28.054877  -6.942423  -6.7270302
##  [3,]  16.1918011  -2.1972911  33.744376 -14.043753 -17.5920574
##  [4,]  11.6004951   5.5450123  11.106866  27.963182  18.3480520
##  [5,]  21.5811984  28.1508528 -21.518946  20.666844  14.5482229
##  [6,]  10.6465952  40.1319231  24.226176  -5.570186  31.8289240
##  [7,]  20.0885722   0.4503572  11.780728  48.622576   5.0014874
##  [8,] -18.2532148  -2.9604872  28.019194   3.735677   2.2180992
##  [9,]   0.1716520  21.6853002  26.868184 -23.726472   0.5105539
## [10,]  13.4265996   4.8371126   1.520408   1.831450  50.6481434
## [11,]  38.8093321  -0.3240359  19.267868  33.496460 -24.8902884
## [12,]  11.9257015  15.5416153  15.411834   1.310456   0.6568034
## [13,]   0.6394727  47.5262953  51.190076  21.856575  18.0834210
## [14,]  17.5129350  12.8474774 -12.472880  34.379648  16.1540475
## [15,]  24.7494196  14.0251167  11.802182   8.459479  34.9307680
## [16,]  30.9681002  16.2868426   1.858304  19.721188  10.3956476
## [17,]  16.4886661  37.2384536  10.482843  50.870806  37.0697074
## [18,]  17.5981333  -1.5361183  14.029610  22.431381   5.5375631
## [19,]  40.7664445  33.1114125  -8.072433  18.171389  32.1747771
## [20,] -19.3565872   0.5343738  -6.352488  30.039362  45.9022889
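matrix() fills its values column by column unless you set byrow = TRUE, which matters when the order of the input vector carries meaning. A small sketch with illustrative names (`m_col`, `m_row`):

```r
# the same six values, filled column-wise (the default) and row-wise
m_col <- matrix(1:6, nrow = 2)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
m_col[1, ]   # 1 3 5
m_row[1, ]   # 1 2 3

# dim(), nrow() and ncol() report the shape
dim(m_col)   # 2 3

# column names can be set just like for data frames
colnames(m_row) <- c("a", "b", "c")
m_row
```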

Creating charts from data frame and matrix objects

Making charts from data frames and matrix objects is really similar to what we did last week. Here are some examples.

# histogram of cylinders
# data(mtcars) restores the original mtcars, undoing the column renaming above
data(mtcars)
hist(mtcars$cyl,xlab="number of cylinders", main="number of cylinders in mtcars data")

# boxplot of qsec values for Toyota and Mercedes
toyota <- mtcars[grep("Toyota",rownames(mtcars)),]
merc <- mtcars[grep("Merc",rownames(mtcars)),]
boxplot(toyota$qsec, merc$qsec,
        ylab="quarter mile time in seconds",
        main="mtcars",
        names = c("Toyota","Mercedes"))

# scatterplot of petal length vs sepal length for setosa irises
# include a trend line using the lm function
setosa <- subset(iris,Species=="setosa")
head(setosa)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
mylm <- lm(setosa$Sepal.Length ~ setosa$Petal.Length)
plot(setosa$Petal.Length,setosa$Sepal.Length, 
     xlab="Petal length (cm)",
     ylab="Sepal length (cm)",
     main="Setosa petal and sepal length",
     pch=19)
abline(mylm,col="red",lwd=2,lty=2)
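The fitted model behind the red trend line can be inspected directly. The formula interface with `data =` also avoids repeating `setosa$` in every term. A minimal sketch:

```r
setosa <- subset(iris, Species == "setosa")

# same model as above, written with the data = argument
mylm <- lm(Sepal.Length ~ Petal.Length, data = setosa)
coef(mylm)                  # intercept and slope of the trend line
summary(mylm)$r.squared     # proportion of variance explained

# plot() also accepts a formula (y ~ x)
plot(Sepal.Length ~ Petal.Length, data = setosa, pch = 19,
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
abline(mylm, col = "red", lwd = 2, lty = 2)
```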

# a pairs plot is a special type of scatterplot
pairs(mymatrix)

# let's make a line chart of the four freeny.x variables
# first, normalise each column to its first value so every series starts at 1
head(freeny.x)
##      lag quarterly revenue price index income level market potential
## [1,]               8.79636     4.70997      5.82110          12.9699
## [2,]               8.79236     4.70217      5.82558          12.9733
## [3,]               8.79137     4.68944      5.83112          12.9774
## [4,]               8.81486     4.68558      5.84046          12.9806
## [5,]               8.81301     4.64019      5.85036          12.9831
## [6,]               8.90751     4.62553      5.86464          12.9854
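# t() puts the variables in rows, so dividing by the vector of first-row values
# (recycled down each column) scales each variable by its own starting value;
# the outer t() flips the result back
freeny_norm <- t(t(freeny.x) / freeny.x[1, ])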
head(freeny_norm)
##      lag quarterly revenue price index income level market potential
## [1,]             1.0000000   1.0000000     1.000000         1.000000
## [2,]             0.9995453   0.9983439     1.000770         1.000262
## [3,]             0.9994327   0.9956412     1.001721         1.000578
## [4,]             1.0021031   0.9948216     1.003326         1.000825
## [5,]             1.0018928   0.9851846     1.005027         1.001018
## [6,]             1.0126359   0.9820721     1.007480         1.001195
plot(freeny_norm[,1], type="b", ylim=c(0.9,1.12), col="blue",
     xlab="Quarters beginning 1962 Q2",
     ylab="Value relative to 1962 Q2")
lines(freeny_norm[,2], type="b", col="red")
lines(freeny_norm[,3], type="b", col="black")
lines(freeny_norm[,4], type="b", col="darkgreen")
legend("topleft", legend = colnames(freeny.x), lty=1, col = c("blue","red", "black", "darkgreen"))
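
As an alternative to the double transpose, base R's sweep() divides each column by its own value in a single call. matplot() then draws every column as a line at once, so you do not need separate lines() calls. The sketch below also checks that both normalisations agree. The object names freeny_norm2 and cols are illustrative.

```r
# divide every column of freeny.x by its first-row value
freeny_norm2 <- sweep(freeny.x, MARGIN = 2, STATS = freeny.x[1, ], FUN = "/")

# should match the double-transpose approach
all.equal(freeny_norm2, t(t(freeny.x) / freeny.x[1, ]))

# matplot() draws one line per column in a single call
cols <- c("blue", "red", "black", "darkgreen")
matplot(freeny_norm2, type = "b", pch = 1, lty = 1, col = cols,
        xlab = "Quarters beginning 1962 Q2",
        ylab = "Value relative to 1962 Q2")
legend("topleft", legend = colnames(freeny.x), lty = 1, col = cols)
```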

Some chart types in R need an add-on package. In the example below we load the gplots package and use its heatmap.2() function to make a heatmap, where colour indicates the numerical value.

library("gplots")
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
heatmap.2(mymatrix,trace="none",scale="none",main="heatmap")

heatmap.2(cor(mymatrix),trace="none",scale="none", main="correlation")
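
If gplots is not installed, base R's heatmap() function from the stats package draws a similar clustered heatmap. It lacks heatmap.2()'s colour key and trace options. The sketch below uses the iris measurements rather than mymatrix, so it runs on its own.

```r
# correlation matrix of the four numeric iris columns
iris_cor <- cor(iris[, 1:4])

# stats::heatmap() ships with R, so no extra package is needed;
# scale = "none" plots the correlations as they are
heatmap(iris_cor, scale = "none", main = "iris correlation")
```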

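This is also a good time to show you how to save R charts as files. Many file formats are available, but PDF and PNG are the most commonly used. The pattern has three steps:

  1. Open a file device with pdf() or png().

  2. Draw the chart.

  3. Close the device with dev.off(), which finishes writing the file.

The new files will appear in the Files pane. The "png 2" printed after dev.off() is not an error. It just reports which graphics device is now active.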

pdf("myplot.pdf")
plot(1:10)
dev.off()
## png 
##   2
png("myplot.png")
plot(1:10)
dev.off()
## png 
##   2
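
Both devices also accept size arguments. For pdf(), width and height are in inches, and the default is 7 x 7. For png(), they are in pixels by default, and there is also a res argument for resolution. The minimal pdf() sketch below writes to R's temporary folder so that it leaves no clutter behind. The file name myplot_sized.pdf is hypothetical.

```r
# hypothetical output path inside R's temporary folder
out_file <- file.path(tempdir(), "myplot_sized.pdf")

# pdf() sizes are in inches
pdf(out_file, width = 8, height = 5)
plot(1:10, main = "An 8 x 5 inch PDF")
dev.off()   # closes the device and finishes writing the file

file.exists(out_file)
```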

Check your understanding so far

Answer the questions which use concepts from this week and previous sessions.

  1. Create a scatterplot of mtcars weight (x axis) versus mpg (y axis). Include x and y axis labels and a main heading.

  2. Sort mtcars by weight (wt) and create a horizontal barplot of wt values so that the heaviest cars are shown at the top of the plot. Label the plot so it is clear which bar belongs to which car. Include an axis label and a main title.

  3. Create a box plot of iris petal lengths. Each species should be a different category. The chart needs axis labels and a main title.

We will discuss your solutions next week.

Reading in data files

So far we have been working with datasets that are built into R, but in the real world you will be working with data files. These could be in different formats like text (.txt), comma-separated values (.csv), tab-separated values (.tsv) and perhaps Excel files too (.xls and .xlsx).

We will be working with a TSV file of pipetting measurements. URL: https://raw.githubusercontent.com/markziemann/SLE712_files/master/pipette_test.tsv

We will use the read.table() command and show you some really important options: header, row.names and stringsAsFactors.

URL="https://raw.githubusercontent.com/markziemann/SLE712_files/master/pipette_test.tsv"

download.file(URL,"my.tsv")
# look closely at the structure of pip
# can you see what is wrong?
pip <- read.table("my.tsv")
pip
##         V1     V2     V3     V4
## 1 RefValue R100uL R150uL R200uL
## 2       M1  99.62 149.56 200.16
## 3       M2  98.48 147.06 199.88
## 4       M3 100.26 151.34 199.92
## 5       M4 101.12 150.12 200.62
## 6       M5  99.89 149.94 201.37
str(pip)
## 'data.frame':    6 obs. of  4 variables:
##  $ V1: chr  "RefValue" "M1" "M2" "M3" ...
##  $ V2: chr  "R100uL" "99.62" "98.48" "100.26" ...
##  $ V3: chr  "R150uL" "149.56" "147.06" "151.34" ...
##  $ V4: chr  "R200uL" "200.16" "199.88" "199.92" ...
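# try again, this time reading straight from the URL with stringsAsFactors = FALSE
# (FALSE has been the default since R 4.0.0, so the result below is unchanged)
pip <- read.table(URL, stringsAsFactors = FALSE)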
pip
##         V1     V2     V3     V4
## 1 RefValue R100uL R150uL R200uL
## 2       M1  99.62 149.56 200.16
## 3       M2  98.48 147.06 199.88
## 4       M3 100.26 151.34 199.92
## 5       M4 101.12 150.12 200.62
## 6       M5  99.89 149.94 201.37
str(pip)
## 'data.frame':    6 obs. of  4 variables:
##  $ V1: chr  "RefValue" "M1" "M2" "M3" ...
##  $ V2: chr  "R100uL" "99.62" "98.48" "100.26" ...
##  $ V3: chr  "R150uL" "149.56" "147.06" "151.34" ...
##  $ V4: chr  "R200uL" "200.16" "199.88" "199.92" ...
# try again, telling R that the first line holds the column names
# looking better
pip <- read.table(URL, stringsAsFactors = FALSE, header=TRUE)
pip
##   RefValue R100uL R150uL R200uL
## 1       M1  99.62 149.56 200.16
## 2       M2  98.48 147.06 199.88
## 3       M3 100.26 151.34 199.92
## 4       M4 101.12 150.12 200.62
## 5       M5  99.89 149.94 201.37
str(pip)
## 'data.frame':    5 obs. of  4 variables:
##  $ RefValue: chr  "M1" "M2" "M3" "M4" ...
##  $ R100uL  : num  99.6 98.5 100.3 101.1 99.9
##  $ R150uL  : num  150 147 151 150 150
##  $ R200uL  : num  200 200 200 201 201
# got it now: row.names=1 uses the first column (RefValue) as the row names
pip <- read.table(URL, stringsAsFactors = FALSE, header=TRUE, row.names=1)
pip
##    R100uL R150uL R200uL
## M1  99.62 149.56 200.16
## M2  98.48 147.06 199.88
## M3 100.26 151.34 199.92
## M4 101.12 150.12 200.62
## M5  99.89 149.94 201.37
str(pip)
## 'data.frame':    5 obs. of  3 variables:
##  $ R100uL: num  99.6 98.5 100.3 101.1 99.9
##  $ R150uL: num  150 147 151 150 150
##  $ R200uL: num  200 200 200 201 201
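
read.table() can also read from a character string through its text argument. This is handy for testing options without downloading anything. The sketch below types in the pipette values from the output above and then summarises each reference volume. Two names are assumptions for illustration: pip2 is an arbitrary object name, and the nominal volumes of 100, 150 and 200 uL are inferred from the column names.

```r
# the pipette table typed in as text (values copied from the output above)
pip_txt <- "RefValue R100uL R150uL R200uL
M1  99.62 149.56 200.16
M2  98.48 147.06 199.88
M3 100.26 151.34 199.92
M4 101.12 150.12 200.62
M5  99.89 149.94 201.37"

# same options as before: header row, first column as row names
pip2 <- read.table(text = pip_txt, header = TRUE, row.names = 1)

# mean of the 5 measurements at each reference volume
# (99.874, 149.604 and 200.390)
pip_means <- colMeans(pip2)
pip_means

# assumed nominal volumes in uL, inferred from the column names
nominal <- c(100, 150, 200)
pip_means - nominal   # about -0.13, -0.40 and +0.39 uL
```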

Now let’s try a CSV file containing monthly air travel figures for 1958 to 1960.

URL="https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv"

# sep="," tells read.table() that the fields are separated by commas
trav <- read.table(URL,sep=",")
trav
##       V1   V2   V3   V4
## 1  Month 1958 1959 1960
## 2    JAN  340  360  417
## 3    FEB  318  342  391
## 4    MAR  362  406  419
## 5    APR  348  396  461
## 6    MAY  363  420  472
## 7    JUN  435  472  535
## 8    JUL  491  548  622
## 9    AUG  505  559  606
## 10   SEP  404  463  508
## 11   OCT  359  407  461
## 12   NOV  310  362  390
## 13   DEC  337  405  432
str(trav)
## 'data.frame':    13 obs. of  4 variables:
##  $ V1: chr  "Month" "JAN" "FEB" "MAR" ...
##  $ V2: int  1958 340 318 362 348 363 435 491 505 404 ...
##  $ V3: int  1959 360 342 406 396 420 472 548 559 463 ...
##  $ V4: int  1960 417 391 419 461 472 535 622 606 508 ...
# header=TRUE again; note that R prefixes the year column names with "X"
trav <- read.table(URL,sep=",",header=TRUE)
trav
##    Month X1958 X1959 X1960
## 1    JAN   340   360   417
## 2    FEB   318   342   391
## 3    MAR   362   406   419
## 4    APR   348   396   461
## 5    MAY   363   420   472
## 6    JUN   435   472   535
## 7    JUL   491   548   622
## 8    AUG   505   559   606
## 9    SEP   404   463   508
## 10   OCT   359   407   461
## 11   NOV   310   362   390
## 12   DEC   337   405   432
str(trav)
## 'data.frame':    12 obs. of  4 variables:
##  $ Month: chr  "JAN" "FEB" "MAR" "APR" ...
##  $ X1958: int  340 318 362 348 363 435 491 505 404 359 ...
##  $ X1959: int  360 342 406 396 420 472 548 559 463 407 ...
##  $ X1960: int  417 391 419 461 472 535 622 606 508 461 ...
# read.csv() is read.table() with CSV-friendly defaults such as sep="," and header=TRUE
trav <- read.csv(URL)
trav
##    Month X1958 X1959 X1960
## 1    JAN   340   360   417
## 2    FEB   318   342   391
## 3    MAR   362   406   419
## 4    APR   348   396   461
## 5    MAY   363   420   472
## 6    JUN   435   472   535
## 7    JUL   491   548   622
## 8    AUG   505   559   606
## 9    SEP   404   463   508
## 10   OCT   359   407   461
## 11   NOV   310   362   390
## 12   DEC   337   405   432
str(trav)
## 'data.frame':    12 obs. of  4 variables:
##  $ Month: chr  "JAN" "FEB" "MAR" "APR" ...
##  $ X1958: int  340 318 362 348 363 435 491 505 404 359 ...
##  $ X1959: int  360 342 406 396 420 472 548 559 463 407 ...
##  $ X1960: int  417 391 419 461 472 535 622 606 508 461 ...
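
The "X" prefix on the year columns comes from read.table()'s check.names = TRUE default. This setting converts names into syntactically valid R names, and a name cannot start with a digit. The sketch below uses the first three months from the output above, supplied through the text argument so that no download is needed. It shows how to keep the original headers. The name trav2 is illustrative.

```r
# a few rows of the air travel table, typed in as CSV text
trav_txt <- "Month,1958,1959,1960
JAN,340,360,417
FEB,318,342,391
MAR,362,406,419"

# default: names are made syntactically valid, so "1958" becomes "X1958"
names(read.csv(text = trav_txt))

# check.names = FALSE keeps the header exactly as written in the file
trav2 <- read.csv(text = trav_txt, check.names = FALSE)
names(trav2)

# non-syntactic names need backticks (or [[ ]]) to access
trav2$`1958`
```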

Now let’s try a Microsoft Excel file.

URL="https://github.com/markziemann/SLE712_files/blob/master/misc/file_example_XLS_10.xls?raw=true"
NAME="file_example_XLS_10.xls"
# mode="wb" downloads in binary mode, which Excel files need on Windows
download.file(URL,destfile=NAME,mode="wb")
library("readxl")

mydata <- read_xls(NAME)
mydata
## # A tibble: 9 × 8
##     `0` `First Name` `Last Name` Gender Country         Age Date          Id
##   <dbl> <chr>        <chr>       <chr>  <chr>         <dbl> <chr>      <dbl>
## 1     1 Dulce        Abril       Female United States    32 15/10/2017  1562
## 2     2 Mara         Hashimoto   Female Great Britain    25 16/08/2016  1582
## 3     3 Philip       Gent        Male   France           36 21/05/2015  2587
## 4     4 Kathleen     Hanner      Female United States    25 15/10/2017  3549
## 5     5 Nereida      Magwood     Female United States    58 16/08/2016  2468
## 6     6 Gaston       Brumm       Male   United States    24 21/05/2015  2554
## 7     7 Etta         Hurn        Female Great Britain    56 15/10/2017  3598
## 8     8 Earlean      Melgar      Female United States    27 16/08/2016  2456
## 9     9 Vincenza     Weiland     Female United States    40 21/05/2015  6548
str(mydata)
## tibble [9 × 8] (S3: tbl_df/tbl/data.frame)
##  $ 0         : num [1:9] 1 2 3 4 5 6 7 8 9
##  $ First Name: chr [1:9] "Dulce" "Mara" "Philip" "Kathleen" ...
##  $ Last Name : chr [1:9] "Abril" "Hashimoto" "Gent" "Hanner" ...
##  $ Gender    : chr [1:9] "Female" "Female" "Male" "Female" ...
##  $ Country   : chr [1:9] "United States" "Great Britain" "France" "United States" ...
##  $ Age       : num [1:9] 32 25 36 25 58 24 56 27 40
##  $ Date      : chr [1:9] "15/10/2017" "16/08/2016" "21/05/2015" "15/10/2017" ...
##  $ Id        : num [1:9] 1562 1582 2587 3549 2468 ...
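
The Date column came through as character strings in day/month/year order, so R cannot yet sort it or do arithmetic on it. The sketch below converts the first three values shown above with base R's as.Date(). The format string %d/%m/%Y is an assumption based on how the values look. On the real data, the equivalent would be mydata$Date <- as.Date(mydata$Date, format = "%d/%m/%Y").

```r
# Date values as read_xls() returned them: character, day/month/year
date_chr <- c("15/10/2017", "16/08/2016", "21/05/2015")

# tell as.Date() the day/month/year layout explicitly
dates <- as.Date(date_chr, format = "%d/%m/%Y")
dates

# real dates support arithmetic and sorting
dates[1] - dates[2]   # Time difference of 425 days
sort(dates)
```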

Save and load session and single datasets

When working in R, it is convenient to save the session with save.image(). This creates an Rdata file containing all the data objects in your current (global) environment. Loaded packages are not saved, so rerun your library() calls after loading. To test that it’s actually working, clear your environment with the sweep/broom icon and then load the Rdata file with load().

save.image("mysession.Rdata")

rm(list=ls())

load("mysession.Rdata")

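That is really convenient, but sometimes we want to save a single object, such as a large data frame, in its own file. saveRDS() writes one object to an .Rds file, and readRDS() reads it back. You can give the object any name you like when you read it in (here, x).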

saveRDS(object = mymatrix , file = "mymatrix.Rds")

rm(list=ls())

x <- readRDS("mymatrix.Rds")

head(x)
##           [,1]       [,2]       [,3]       [,4]      [,5]
## [1,] -4.280172  27.804273   5.560975  37.711016  32.54236
## [2,] 26.020621 -13.525085  28.054877  -6.942423  -6.72703
## [3,] 16.191801  -2.197291  33.744376 -14.043753 -17.59206
## [4,] 11.600495   5.545012  11.106866  27.963182  18.34805
## [5,] 21.581198  28.150853 -21.518946  20.666844  14.54822
## [6,] 10.646595  40.131923  24.226176  -5.570186  31.82892
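
For completeness, save() sits between these two approaches. It writes just the objects you name to an Rdata file, and load() restores them under their original names. With readRDS(), by contrast, you choose the name. The sketch below uses two throwaway objects, a and b, and a hypothetical file name in R's temporary folder.

```r
# two example objects
a <- 1:5
b <- head(mtcars)

# save() writes only the named objects to an Rdata file
rdata_file <- file.path(tempdir(), "two_objects.Rdata")
save(a, b, file = rdata_file)

# remove them, then restore; load() brings them back under their original names
rm(a, b)
restored <- load(rdata_file)
restored   # names of the objects that were loaded
```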

Check your skills

For the TSV file located here: https://raw.githubusercontent.com/markziemann/SLE712_files/master/misc/mydata.tsv

  1. Read it in and show the first 6 rows of data.

  2. Calculate the column and row means.

  3. Use the cor() command to find the correlation coefficients between the 3 data sets. Which two datasets are the most similar?

  4. Make a pairs plot of the three datasets.

Session information

For reproducibility, sessionInfo() records the R version, platform and package versions used to produce this document.

sessionInfo()
## R version 4.2.2 Patched (2022-11-10 r83330)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] readxl_1.4.1 gplots_3.1.3
## 
## loaded via a namespace (and not attached):
##  [1] rstudioapi_0.14    knitr_1.40         magrittr_2.0.3     R6_2.5.1          
##  [5] rlang_1.0.6        fastmap_1.1.0      fansi_1.0.3        stringr_1.4.1     
##  [9] highr_0.9          caTools_1.18.2     tools_4.2.2        xfun_0.34         
## [13] utf8_1.2.2         KernSmooth_2.23-20 cli_3.4.1          jquerylib_0.1.4   
## [17] htmltools_0.5.3    gtools_3.9.3       yaml_2.3.6         digest_0.6.30     
## [21] lifecycle_1.0.3    tibble_3.1.8       vctrs_0.5.0        sass_0.4.2        
## [25] bitops_1.0-7       glue_1.6.2         cachem_1.0.6       evaluate_0.17     
## [29] rmarkdown_2.17     stringi_1.7.8      pillar_1.8.1       cellranger_1.1.0  
## [33] compiler_4.2.2     bslib_0.4.0        jsonlite_1.8.3     pkgconfig_2.0.3