2 Recap

2.1 Pre-requisites

  • This course assumes a working knowledge of R.
  • Ideally, you should have attended “Introduction to R and the tidyverse” or an equivalent R course before taking this one.

If you have never used R you should not be on this course!

2.2 Things you should know about

It is assumed that you are familiar with the following:

  1. Vectors
  2. Data frames
  3. Importing and exporting data
  4. Using functions to summarise or aggregate data (mean, sd, etc.)
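
As a quick self-check, the sketch below touches each of the four items using base R only. The weights and the temporary file are made up for illustration and are not part of the course data; if every line makes sense to you, you are probably at the right level.

```r
# 1. Vectors: a numeric vector of illustrative (made-up) body weights
wt <- c(79.6, 72.4, 70.5)

# 2. Data frames: one row per subject
subjects <- data.frame(SUBJID = 1:3, WT = wt)

# 3. Exporting and importing: write to a temporary CSV file and read it back
path <- tempfile(fileext = ".csv")
write.csv(subjects, path, row.names = FALSE)
subjects_in <- read.csv(path)

# 4. Summarising: mean and standard deviation of one column
mean_wt <- mean(subjects_in$WT)
sd_wt   <- sd(subjects_in$WT)
```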

2.3 The magrittr Pipe

If you’ve not used R recently then you may not have come across the magrittr pipe, %>%. The pipe is now very common in R code, particularly code written with the tidyverse. It is essentially an alternative to nesting that allows functions to be chained together: the output of one function becomes the first argument of the next.
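
As a minimal sketch of that idea, the example below writes the same calculation twice, once nested and once piped. The vector x is made up for illustration, and magrittr is loaded directly here rather than through the tidyverse.

```r
library(magrittr)  # provides the %>% pipe

# A small made-up vector for illustration
x <- c(1, 4, 9, 16)

# Nested: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped: read from left to right.
# x %>% sqrt() is sqrt(x), and ... %>% round(1) is round(..., 1)
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both forms give the same result
identical(nested, piped)
```

Note that any extra arguments, such as the 1 in round(1), are placed after the piped value.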

To see how it works we will use some theophylline data. The aim here is to extract the baseline (TIME == 0) concentration for each subject. We will use filter to keep the rows we want and select to keep the columns we want. Without the pipe there are two ways we might do this.

> # tidyverse loads the dplyr package for filter and select (and the pipe)
> library(tidyverse) 
> # Start with the theoph data
> head(theoph)
# A tibble: 6 × 5
  SUBJID    WT  DOSE  TIME  CONC
   <dbl> <dbl> <dbl> <dbl> <dbl>
1      1  79.6  4.02  0     0.74
2      1  79.6  4.02  0.25  2.84
3      1  79.6  4.02  0.57  6.57
4      1  79.6  4.02  1.12 10.5 
5      1  79.6  4.02  2.02  9.66
6      1  79.6  4.02  3.82  8.58
> # Method 1: nesting (this is horrible)
> bl_conc <- select(filter(theoph, TIME == 0), SUBJID, CONC)
> 
> # Method 2: traditional stepwise approach
> # (better, but leaves us with an object, bl_rows, that we may not want)
> bl_rows <- filter(theoph, TIME == 0)
> bl_conc <- select(bl_rows, SUBJID, CONC)
> 
> # Now let's look at the data
> head(bl_conc)
# A tibble: 6 × 2
  SUBJID  CONC
   <dbl> <dbl>
1      1  0.74
2      2  0   
3      3  0   
4      4  0   
5      5  0   
6      6  0   

The magrittr pipe provides a more readable syntax. It is similar to method 2 but avoids the intermediate dataset.

> bl_conc <- theoph %>%      # Take the theoph data, then ...
+   filter(TIME == 0) %>%   # ... filter the rows, then ... 
+   select(SUBJID, CONC)    # ... select columns
> 
> # Now let's look at the data
> bl_conc %>% head()
# A tibble: 6 × 2
  SUBJID  CONC
   <dbl> <dbl>
1      1  0.74
2      2  0   
3      3  0   
4      4  0   
5      5  0   
6      6  0   
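
To confirm that the pipe changes only how the code reads, not what it returns, the sketch below runs the nested and piped forms on a small stand-in tibble. The name toy is hypothetical; its rows are copied from the printed output above and include only the three columns needed here, so it is not the full theoph data.

```r
library(dplyr)  # filter(), select(), tibble() and the %>% pipe

# Hypothetical stand-in for theoph, built from rows shown in the output above
toy <- tibble(
  SUBJID = c(1, 1, 1, 2),
  TIME   = c(0, 0.25, 0.57, 0),
  CONC   = c(0.74, 2.84, 6.57, 0)
)

# Method 1: nesting
nested <- select(filter(toy, TIME == 0), SUBJID, CONC)

# Pipe: same steps, written in the order they happen
piped <- toy %>%
  filter(TIME == 0) %>%
  select(SUBJID, CONC)

# Compare the two results
all.equal(nested, piped)
```

Both objects should contain one baseline row per subject, so the comparison should report that they are equal.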