library(readr)
# Alternatively
library(tidyverse)
Natively, R has very good support for many file formats. For example, functions such as read.csv or the more generic read.table can be used to read in CSV files and other delimited text files. The foreign package can also be used to read in SAS transport files.
In this section we look at tidyverse functions for importing/exporting data. Although the tidyverse functions don’t always offer a great deal more in terms of functionality, they are generally faster and more consistent than their base R counterparts.
5.1 Importing Text Files
The tidyverse functions that we will use for importing and exporting text files are contained in the readr package. Note that readr is loaded by default when loading tidyverse.
The readr package has many functions, but the import functions all begin with read_ and the export functions all begin with write_.
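As a quick illustration of this naming convention, the sketch below lists the functions exported by readr whose names start with each prefix. It assumes readr is installed; the exact list returned depends on the installed version.

```r
library(readr)

# All readr functions whose names start with "read_" (import functions)
import_funs <- ls("package:readr", pattern = "^read_")

# All readr functions whose names start with "write_" (export functions)
export_funs <- ls("package:readr", pattern = "^write_")

import_funs
export_funs
```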
5.2 Reading in a CSV
The read_csv function is a special case of the read_delim function, a function that allows us to read in text files with a variety of different delimiters and structures. The read_csv function enables us to read in a CSV file. As always, we must give the imported data a name, else it is simply printed to screen. The named data is imported as a tbl_df data frame.
# Read in and save as `theoph`
# The "." represents the current working directory so this is a relative path
theoph <- read_csv("./data/theoph.csv")
# Now print
theoph
# A tibble: 132 × 5
   SUBJID    WT  DOSE  TIME  CONC
    <dbl> <dbl> <dbl> <dbl> <dbl>
 1      1  79.6  4.02  0     0.74
 2      1  79.6  4.02  0.25  2.84
 3      1  79.6  4.02  0.57  6.57
 4      1  79.6  4.02  1.12 10.5
 5      1  79.6  4.02  2.02  9.66
 6      1  79.6  4.02  3.82  8.58
 7      1  79.6  4.02  5.1   8.36
 8      1  79.6  4.02  7.03  7.47
 9      1  79.6  4.02  9.05  6.89
10      1  79.6  4.02 12.1   5.94
# ℹ 122 more rows
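To illustrate that read_csv is a special case of read_delim, here is a minimal sketch that reads the same file both ways. The file is a small made-up CSV written to a temporary location (not the course data), and show_col_types = FALSE is used only to suppress the column-type message; it assumes a reasonably recent version of readr.

```r
library(readr)

# Write a small, made-up CSV to a temporary file so the example is self-contained
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("SUBJID,TIME,CONC", "1,0,0.74", "1,0.25,2.84"), tmp_csv)

# read_csv assumes a comma delimiter ...
via_csv <- read_csv(tmp_csv, show_col_types = FALSE)

# ... whereas read_delim needs the delimiter to be stated explicitly
via_delim <- read_delim(tmp_csv, delim = ",", show_col_types = FALSE)

# Both calls should give the same columns and values
dim(via_csv)
dim(via_delim)
```

For tab-delimited files the same pattern applies with delim = "\t", or the read_tsv shortcut.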
5.2.1 File Paths
Note that we specified the file path using forward slashes, i.e. "/". The backslash, "\", is an escape character and has special meaning (or, more precisely, the character that follows it does). For example, "\n" means ‘new line’, "\t" means ‘tab’ and, confusingly, "\\" means ‘backslash’! So for any file path that contains backslashes, you must either replace all of the backslashes with forward slashes or add a second backslash at each location.
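The sketch below shows the two options in practice. The path is hypothetical and is never opened; the point is only how R stores and prints the strings.

```r
# Option 1: double every backslash (hypothetical Windows-style path)
win_path <- "C:\\Users\\me\\data\\theoph.csv"

# cat() shows the string as R stores it: each "\\" is a single backslash
cat(win_path, "\n")

# Option 2: use forward slashes instead; here we convert the path above
fwd_path <- gsub("\\", "/", win_path, fixed = TRUE)
fwd_path

# An escape sequence such as "\n" is one character, not two
nchar("\n")

# file.path() builds paths with forward slashes for us
rel_path <- file.path("data", "theoph.csv")
rel_path
```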
5.3 Reading in Data from SAS
We can read both the “.XPT” (a.k.a. “SAS transport file”) and “.SAS7BDAT” formats into R by making use of the haven package. The SAS transport file (version 5) is an open format and can be read in by making use of the haven read_xpt function. The “.SAS7BDAT” files can be read in by making use of the haven read_sas function.
RStudio understands the concept of labels and the RStudio data viewer is arguably better than the one in PC SAS!!!
There are other packages, e.g. foreign and SASxport, which have similar functionality when it comes to importing SAS transport files. However, we use haven for reasons of consistency and efficiency.
# Load the package
library(haven)
# Read in the data (remembering file extension)
dm <- read_sas("./data/dm.sas7bdat")
# View the data
dm # or try View(dm)
# A tibble: 30 × 7
   USUBJID            AGE SEX   COUNTRY RACE                      ETHNIC   ARM
   <chr>            <dbl> <chr> <chr>   <chr>                     <chr>    <chr>
 1 STD123456:000001    32 F     UK      BLACK OR AFRICAN AMERICAN NOT HIS… Comp…
 2 STD123456:000002    28 M     FRA     WHITE                     NOT HIS… Comp…
 3 STD123456:000003    55 M     USA     BLACK OR AFRICAN AMERICAN NOT HIS… Comp…
 4 STD123456:000004    35 F     GER     WHITE                     HISPANI… Comp…
 5 STD123456:000005    30 F     IRE     WHITE                     NOT HIS… Comp…
 6 STD123456:000006    22 F     GER     WHITE                     NOT HIS… Comp…
 7 STD123456:000007    59 F     USA     WHITE                     NOT HIS… Comp…
 8 STD123456:000008    53 M     GER     WHITE                     NOT HIS… GSK
 9 STD123456:000009    60 F     USA     WHITE                     NOT HIS… GSK
10 STD123456:000010    48 M     USA     WHITE                     NOT HIS… Comp…
# ℹ 20 more rows
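Transport files are read in the same way using read_xpt. The sketch below is self-contained rather than using the course data: it first writes a tiny made-up data set (with a hypothetical label on AGE) to a temporary ".xpt" file, purely as setup, and then reads it back and inspects the label that haven attaches to the column.

```r
library(haven)

# Setup only: a tiny, made-up data set with a variable label on AGE
demo <- data.frame(SUBJID = c(1, 2), AGE = c(32, 28))
attr(demo$AGE, "label") <- "Age (years)"

# Setup only: write it to a temporary transport file
tmp_xpt <- tempfile(fileext = ".xpt")
write_xpt(demo, tmp_xpt, version = 5)

# Read the transport file back in
demo_in <- read_xpt(tmp_xpt)
demo_in

# Variable labels are stored as a "label" attribute on each column
attr(demo_in$AGE, "label")
```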
5.4 The Working Directory
All R sessions have a working directory. On Windows this is usually something like "C:/Users/[mudid]/Documents". This is the directory that R looks in by default when we attempt to import data. Similarly, it’s where R writes to by default. It’s also the default if sourcing other R scripts. We can find out what our working directory is via the getwd function and list the files within it using the list.files function.
# What is my current working directory?
getwd()
[1] "/home/runner/work/intro_to_r_and_the_tidyverse_training/intro_to_r_and_the_tidyverse_training/data"
# What files are in the working directory?
list.files()
 [1] "act.sas7bdat"     "act.xpt"          "actFull.sas7bdat" "actFull.xpt"
 [5] "actLong.sas7bdat" "actLong.xpt"      "dataset.sas7bdat" "dm.sas7bdat"
 [9] "dm.xpt"           "pft.sas7bdat"     "pft.xpt"          "sl.sas7bdat"
[13] "sl.xpt"           "theoph.csv"       "vs.sas7bdat"      "vs.xpt"
We can change/set the working directory using the setwd function. The advantage of setting up a working directory is that we needn’t specify full file paths every time we import/export data. This also makes our code more transferable as our username isn’t hard-coded into our scripts!
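One detail that can be handy: setwd invisibly returns the previous working directory, so it is easy to switch back afterwards. The sketch below uses R's temporary directory as a stand-in for a data folder, and the file name example.csv is made up for illustration.

```r
# Use the session's temporary directory as a stand-in for a data folder
tmp_dir <- tempdir()

# setwd() invisibly returns the previous working directory, so keep it
old_wd <- setwd(tmp_dir)

# A relative path now resolves inside tmp_dir
writeLines(c("a,b", "1,2"), "example.csv")
found <- "example.csv" %in% list.files()
found

# Restore the original working directory
setwd(old_wd)
getwd()
```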
# Set my working directory to where some data are stored
setwd("/mnt/code/gsk_R_training/data")
In the example below we import data that is located in our current working directory. We can therefore simply specify the name of the file (including the extension) and omit the path.
dm <- read_sas("dm.sas7bdat")
5.5 Projects
We started this course by creating an RStudio project. One of the benefits of creating a project within a directory is that opening the project sets the working directory to that directory. This means that we can immediately use relative file paths for any local data.
For more information on RStudio projects see [RStudio Projects]
5.7 EXERCISE
- Import the theoph data into R using a relative file path (i.e. one that starts “data/”)
- Check that it has imported correctly - how many rows and columns does it have?
- Import the act data into R
- Check that it has imported correctly - how many rows and columns does it have?
5.8 Exporting Data
There is an experimental write_sas function within haven, but it is not currently possible to export data to the “.SAS7BDAT” format with any consistency. We can, however, use the haven function write_xpt to export data to the “.XPT” (a.k.a. “SAS transport file”) format following the SAS V5 standard (acceptable for submission to regulatory agencies).
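A minimal sketch of a transport file export is shown below, using a small made-up data set and a temporary file rather than the course data. write_xpt has a version argument that accepts 5 or 8; as far as I am aware, recent releases of haven default to version 8, so the sketch requests version 5 explicitly. Check the documentation for your installed version.

```r
library(haven)

# A small, made-up demographics data set (variable names of 8 characters or fewer)
dm_demo <- data.frame(
  USUBJID = c("STD1:001", "STD1:002"),
  AGE     = c(32, 28),
  SEX     = c("F", "M")
)

# Export to a temporary file, requesting the V5 transport format explicitly
out_xpt <- tempfile(fileext = ".xpt")
write_xpt(dm_demo, out_xpt, version = 5)

# Confirm the file was written and can be read back
file.exists(out_xpt)
dm_check <- read_xpt(out_xpt)
dim(dm_check)
```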
In addition, we may export data to various delimited file formats … using readr functions such as write_delim or write_csv. The format of such functions is extremely consistent - the first argument is the name of the data (i.e. the R object name) and the second argument is the name of the file that we wish to write to. Here is an example using write_csv.
write_csv(dm, "dm.csv")
Other useful arguments to write_csv include na, which controls the way missing values are written to the output file (defaults to "NA"), and append, which, when set to TRUE, allows us to append to an existing file rather than create a new one or overwrite an existing one.
write_csv(dm, "dm_saslike.csv", na = ".")
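To see how na and append behave together, here is a small self-contained sketch. The data are made up and written to a temporary file. When append = TRUE, write_csv leaves out the header row by default, so the appended rows simply follow the existing ones.

```r
library(readr)

# A small, made-up data set with one missing value
demo <- data.frame(USUBJID = c("A1", "A2"), AGE = c(32, NA))

out_csv <- tempfile(fileext = ".csv")

# First write: creates the file, with missing values written as "."
write_csv(demo, out_csv, na = ".")

# Second write: appends the same rows (no header is written when appending)
write_csv(demo, out_csv, na = ".", append = TRUE)

# Inspect the raw lines of the file
csv_lines <- readLines(out_csv)
csv_lines

# Read it back, telling read_csv that "." represents a missing value
demo_in <- read_csv(out_csv, na = ".", show_col_types = FALSE)
demo_in
```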