2. Importing Data: readr

Thomas Mailund  
Aarhus, Denmark
Before we can analyze data, we need to load it into R. The main Tidyverse package for this is called readr, and it is loaded when you load the tidyverse package:
library(tidyverse)
but you can also load it explicitly using
library(readr)

Tabular data is usually stored in text files or compressed text files with rows and columns matching the table’s structure. Each line in the file is a row in the table, and columns are separated by a known delimiter character. The readr package is made for such data representation and contains functions for reading and writing variations of files formatted in this way. It also provides functionality for determining the types of data in each column, either by inferring types or through user specifications.

Functions for Reading Data

The readr package provides the following functions for reading tabular data:

Function       File format
read_csv       Comma-separated values
read_csv2      Semicolon-separated values
read_tsv       Tab-separated values
read_delim     General column delimiters1
read_table     Space-separated values (fixed-length columns)
read_table2    Space-separated values (variable-length columns)

The interface to these functions differs little. In the following text, I describe read_csv, but I highlight when the other functions differ. The read_csv function reads data from a file with comma-separated values. Such a file could look like this:
A,B,C,D
1,a,a,1.2
2,b,b,2.1
3,c,c,13.0
Unlike the base R read.csv function, read_csv will also handle files with spaces between the columns, so it will interpret the following data the same as the preceding file:
A,  B,  C,    D
1,  a,  a,   1.2
2,  b,  b,   2.1
3,  c,  c,  13.0

If you use R’s read.csv function instead, the spaces before columns B and C will be included as part of the data, and in versions of R before 4.0, the text columns will be interpreted as factors.
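The whitespace difference can be checked directly. The following is a minimal sketch, assuming readr 2.x is installed; the two-column data and the variable names are made up for illustration.

```r
library(readr)

csv_text <- "A,  B\n1,  a\n2,  b\n"

# Base R keeps the leading spaces in unquoted text fields by default.
base_df <- read.csv(text = csv_text)
base_values <- as.character(base_df[[2]])

# readr trims whitespace around fields by default (trim_ws = TRUE).
readr_df <- read_csv(I(csv_text), show_col_types = FALSE)
readr_values <- readr_df$B

nchar(base_values)   # longer than 1 because of the leading spaces
nchar(readr_values)  # one character per value
```

The second column is accessed by position for the base R result because the header field also carries the leading spaces there.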

The first line in the file will be interpreted as a header, naming the columns, and the remaining three lines as data rows.

Assuming the file is named data/data.csv, you read its data like this:
my_data <- read_csv(file = "data/data.csv")
## Rows: 3 Columns: 4
##---- Column specification --------------------------
## Delimiter: ","
## chr (2): B, C
## dbl (2): A, D
## i Use 'spec()' to retrieve the full column specification for this data.
## i Specify the column types or set 'show_col_types = FALSE' to quiet this message.
The message you get from read_csv() tells you that you can get information about the type it has inferred for each column if you use the spec() function:
spec(my_data)
## cols(
##   A = col_double(),
##   B = col_character(),
##   C = col_character(),
##   D = col_double()
## )

When reading the file, read_csv will infer that columns A and D are numbers and columns B and C are strings.

If you are happy with that, and don’t want to be told about it in the future, you can use the option show_col_types = FALSE:
my_data <- read_csv(file = "data/data.csv",
                    show_col_types = FALSE)
If the file contains tab-separated values
A  B  C  D
1  a  a  1.2
2  b  b  2.1
3  c  c  13.0
you should use read_tsv() instead:
my_data <- read_tsv(file = "data/data.tsv",
                    show_col_types = FALSE)
The file you read with read_csv can be compressed. If the suffix of the file name is .gz, .bz2, .xz, or .zip, it will be uncompressed before read_csv loads the data.
my_data <- read_csv(file = "data/data.csv.gz",
                    show_col_types = FALSE)

If the file name is a URL (i.e., has prefix http://, https://, ftp://, or ftps://), the file will automatically be downloaded.

You can also provide a string as the file object:
read_csv(
    "A, B, C,    D
     1, a, a,  1.2
     2, b, b,  2.1
     3, c, c, 13.0
", show_col_types = FALSE)
## # A tibble: 3 × 4
##      A B     C         D
##  <dbl> <chr> <chr> <dbl>
## 1    1 a     a       1.2
## 2    2 b     b       2.1
## 3    3 c     c      13

This is rarely useful in a data analysis project, but you can use it to create examples or for debugging.
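In readr 2.0 and later, literal data can also be wrapped in I(), which makes it unambiguous that the string is data rather than a file path. A minimal sketch, assuming readr 2.x; the variable name example_data is only for illustration.

```r
library(readr)

# I() marks the string as literal data, so it is never
# mistaken for a file name.
example_data <- read_csv(
    I("A,B,C,D\n1,a,a,1.2\n2,b,b,2.1\n3,c,c,13.0\n"),
    show_col_types = FALSE
)
example_data
```

This form is handy for small reproducible examples and tests, where creating a file on disk would be a distraction.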

File Headers

The first line in a comma-separated file is not always the column names; that information might be available from elsewhere outside the file. If you do not want to interpret the first line as column names, you can use the option col_names = FALSE.
read_csv(
    file = "data/data.csv",
    col_names = FALSE,
    show_col_types = FALSE
)
## # A tibble: 4 × 4
##   X1    X2    X3    X4
##   <chr> <chr> <chr> <chr>
## 1 A     B     C     D
## 2 1     a     a     1.2
## 3 2     b     b     2.1
## 4 3     c     c     13.0
Since the data/data.csv file has a header, that is interpreted as part of the data, and because the header consists of strings, read_csv infers that all the column types are strings. If we did not have the header, for example, if we had the file data/data-no-header.csv:
1, a, a, 1.2
2, b, b, 2.1
3, c, c, 13.0
then we would get the same data frame as before, except that the names would be autogenerated:
read_csv(
    file = "data/data-no-header.csv",
    col_names = FALSE,
    show_col_types = FALSE
)
## # A tibble: 3 × 4
##      X1 X2    X3        X4
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a        1.2
## 2     2 b     b        2.1
## 3     3 c     c       13

The autogenerated column names all start with X and are followed by the number the columns have from left to right in the file.

If you have data in a file without a header, but you do not want the autogenerated names, you can provide column names to the col_names option:
read_csv(
    file = "data/data-no-header.csv",
    col_names = c("X", "Y", "Z", "W"),
    show_col_types = FALSE
)
## # A tibble: 3 × 4
##       X Y     Z         W
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c        13

If there is a header line, but you want to rename the columns, you cannot just provide the names to read_csv using col_names. The first row will still be interpreted as data. This gives you data you do not want in the first row, and it also affects the inferred types of the columns.

You can, however, skip lines before read_csv parses rows as data. Since we have a header line in data/data.csv, we can skip one line and set the column names.
read_csv(
    file = "data/data.csv",
    col_names = c("X", "Y", "Z", "W"),
    skip = 1,
    show_col_types = FALSE
)
## # A tibble: 3 × 4
##       X Y     Z         W
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c      13
You can also put a limit on how many data rows you want to load using the n_max option.
read_csv(
    file = "data/data.csv",
    col_names = c("X", "Y", "Z", "W"),
    skip = 1,
    n_max = 2,
    show_col_types = FALSE
)
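The effect of combining skip and n_max can be checked on literal data. This is a sketch under the assumption that readr 2.x is available; the inline data mirrors data/data.csv, and first_two is a hypothetical variable name.

```r
library(readr)

first_two <- read_csv(
    I("A,B,C,D\n1,a,a,1.2\n2,b,b,2.1\n3,c,c,13.0\n"),
    col_names = c("X", "Y", "Z", "W"),
    skip = 1,    # skip the header line
    n_max = 2,   # read at most two data rows
    show_col_types = FALSE
)
nrow(first_two)
```

Only the first two data rows after the skipped header are loaded, which is useful for peeking at a large file before reading all of it.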
If your input file has comments, marked by a character after which the rest of the line should be ignored, you can skip them by providing the comment option:
read_csv(
    "A, B, C,      D   # this is a comment
     1, a, a,   1.2     # another comment
     2, b, b,   2.1
     3, c, c, 13.0",
     comment = "#",
     show_col_types = FALSE)
## # A tibble: 3 × 4
##       A B     C         D
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c      13
You can leave a whole line as a comment, but then the comment character must be the first character on that line:
read_csv(
    "A, B, C,   D   # this is a comment
# whole line comment
     1, a, a,   1.2   # another comment
     2, b, b,   2.1
     3, c, c, 13.0",
     comment = "#",
     show_col_types = FALSE)
## # A tibble: 3 × 4
##       A B     C         D
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c      13
If there is whitespace before the comment character, the function cannot tell whether there is an error. The line looks like a row with missing columns rather than a blank line, so you will get a warning and a row with NA where the comment line was.
read_csv(
    "A, B, C,   D   # this is a comment
    # the indentation is a potential problem; missing columns?
    1, a, a,   1.2   # another comment
    2, b, b,   2.1
    3, c, c, 13.0",
    comment = "#",
    show_col_types = FALSE
)
## Warning: One or more parsing issues, see
## 'problems()' for details
## # A tibble: 4 × 4
##       A B     C         D
##   <dbl> <chr> <chr> <dbl>
## 1    NA <NA>  <NA>     NA
## 2     1 a     a       1.2
## 3     2 b     b       2.1
## 4     3 c     c      13

For more options affecting how input files are interpreted, read the function documentation: ?read_csv.

Column Types

When read_csv parses a file, it infers the type of each column. This inference can be slow or, worse, incorrect. If you know a priori what the types should be, you can specify this using the col_types option. If you do this, then read_csv will not guess at the types. It will, however, replace values that it cannot parse as the specified type with NA.2

String-Based Column Type Specification

In the simplest string specification format, you must provide a string with the same length as you have columns and where each character in the string specifies the type of one column. The characters specifying different types are these:

Code      Type
c         Character
i         Integer
n         Number
d         Double
l         Logical
f         Factor
D         Date
t         Time
T         Datetime
?         Guess (default)
_ or -    Skip the column

By default, read_csv guesses, so we could make this explicit using the type specification "????":
read_csv(
    file = "data/data.csv",
    col_types = "????"
)
## # A tibble: 3 × 4
##       A B     C         D
##   <dbl> <chr> <chr> <dbl>
## 1      1 a    a       1.2
## 2      2 b    b       2.1
## 3      3 c    c      13
The results of the guesses are double for columns A and D and character for columns B and C. If we wanted to make this explicit, we could use "dccd".
read_csv(
    file = "data/data.csv",
    col_types = "dccd"
)
## # A tibble: 3 × 4
##       A B     C         D
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c      13
If you want an integer type for column A, you can use "iccd":
read_csv(
    file = "data/data.csv",
    col_types = "iccd"
)
## # A tibble: 3 × 4
##       A B     C         D
##   <int> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c      13
If you try to interpret column D as integers as well, you will get a warning, and the values in column D will all be NA; the numbers in column D cannot be interpreted as integers, and read_csv will not round them to integers.
read_csv(
    file = "data/data.csv",
    col_types = "icci"
)
## Warning: One or more parsing issues, see
## 'problems()' for details
## # A tibble: 3 × 4
##       A B     C         D
##   <int> <chr> <chr> <int>
## 1     1 a     a        NA
## 2     2 b     b        NA
## 3     3 c     c        NA
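The warning points to problems(), which returns a data frame describing the values that failed to parse. A minimal sketch, assuming readr 2.x; the inline data stands in for data/data.csv, and the names parsed and issues are hypothetical.

```r
library(readr)

# Force column D to integer; the warning is suppressed here only
# because the problems are inspected explicitly afterward.
parsed <- suppressWarnings(read_csv(
    I("A,B,C,D\n1,a,a,1.2\n2,b,b,2.1\n3,c,c,13.0\n"),
    col_types = "icci"
))

# A data frame with one row per value that could not be parsed
# as the requested type.
issues <- problems(parsed)
issues
```

Checking problems() after a read with explicit types is a cheap way to catch silently introduced NA values.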

If you specify that a column should have type d, the numbers in the column must be integers or decimal numbers. If you use the type n (the default that read_csv will guess), you will also get doubles, but the latter type can handle strings that can be interpreted as numbers such as dollar amounts, percentages, and group separators in numbers. The column type n will ignore leading and trailing text and handle number separators:

With this function call
read_csv(
    'A, B, C,   D,   E
    $1,a,a,1.2%,"1,100,200"
    $2,b,b,2.1%,"140,000"
    $3,c,c,13%,"2,005,000"',
    col_types = "nccnn")
## # A tibble: 3 × 5
##       A B     C         D        E
##   <dbl> <chr> <chr> <dbl>    <dbl>
## 1     1 a     a       1.2  1100200
## 2     2 b     b       2.1   140000
## 3     3 c     c      13    2005000

columns A, D, and E will be read as numbers. If you use the type specification d, they would not, and all the values would be NA.
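The difference between the two parsers can be seen with parse_number and parse_double, the vector counterparts of the n and d column types. This is a small sketch with made-up input strings, assuming the default locale.

```r
library(readr)

# The number parser ignores surrounding text and grouping marks.
as_number <- parse_number(c("$1", "1.2%", "1,100,200"))
as_number

# The double parser is strict: the same kind of input yields NA
# (the parsing warnings are suppressed for brevity).
as_double <- suppressWarnings(parse_double(c("$1", "1.2%")))
as_double
```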

The decimal indicator and group delimiter vary around the world. By default, read_csv uses the US convention with a dot for decimal notation and comma for grouping in numbers. In many European countries, it is the opposite. You can use the locale option to change these:
read_csv(
    'A, B, C,   D,   E
    $1,a,a," 1,2%","1.100.200"
    $2,b,b," 2,1%","   140.000"
    $3,c,c,"13,0%","2.005.000"',
    locale = locale(decimal_mark = ",", grouping_mark = "."),
    col_types = "nccnn")
## # A tibble: 3 × 5
##       A B     C         D        E
##   <dbl> <chr> <chr> <dbl>    <dbl>
## 1     1 a     a       1.2  1100200
## 2     2 b     b       2.1   140000
## 3     3 c     c      13    2005000

In the preceding example, I explicitly specified how read_csv should interpret numbers, but you can also use ISO 639-1 language codes.3 If you do, you also get the local time conventions and local day and month names. The default is English, but if your data is from Denmark, for example, and you want to use Danish conventions, you would use locale("da"). For French data, you would use locale("fr"). If you type these into an R console, you will see the month and weekday names, including their abbreviated forms, in these languages.

See the ?locale documentation for more options.

In files that use commas as decimal points and “.” for number groupings, the column delimiter is usually “;” rather than “,”. This way, it is not necessary to put decimal numbers in quotes. The read_csv2 function works as read_csv but uses “;” as the column delimiter, “,” as the decimal mark, and “.” for number groupings.
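As an illustration, the following sketch reads semicolon-separated literal data with read_csv2. The data and the variable name euro are made up, and the column types are given explicitly as numbers so that nothing depends on type guessing.

```r
library(readr)

# ";" separates columns, "," is the decimal mark,
# and "." is the grouping mark.
euro <- read_csv2(
    I("A;B\n1,5;1.100.200\n2,25;140.000\n"),
    col_types = "nn"
)
euro
```

No quoting is needed around the decimal numbers because the comma no longer doubles as the column delimiter.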

The logical type is used for boolean values. If a column only contains TRUE and FALSE (case doesn’t matter)
read_csv(
    'A, B, C,   D
    TRUE, a, a,   1.2
    false, b, b,   2.1
    true, c, c, 13',
    show_col_types = FALSE
)
## # A tibble: 3 × 4
##   A     B     C         D
##   <lgl> <chr> <chr> <dbl>
## 1 TRUE  a     a       1.2
## 2 FALSE b     b       2.1
## 3 TRUE  c     c      13

then read_csv will guess that the type is logical.

It is not unusual to code boolean values as 0 and 1, however, and since these will be interpreted as numbers by default, you can make their type explicit using l:
read_csv(
    'A, B, C,   D
    1, a, a,   1.2
    0, b, b,   2.1
    1, c, c, 13',
    col_types = "lccn"
)
## # A tibble: 3 × 4
##   A     B     C         D
##   <lgl> <chr> <chr> <dbl>
## 1 TRUE  a     a       1.2
## 2 FALSE b     b       2.1
## 3 TRUE  c     c      13

If you use type l, you can mix TRUE/FALSE (ignoring case) with 0/1. Any other number or string will be translated into NA.

The D, t, and T types are for dates, time points, and datetimes, in that order. Dates and time are what you might expect. A date specifies a range of days, for example, a single day, a week, a month, or a year. A time point specifies a specific time of the day, for example, an hour, a minute, or a second. A datetime combines a day and a time, that is, it specifies a specific time during a specific day.
read_csv(
    'D, T, t
    "2018-08-23", "2018-08-23T14:30", 14:30',
    col_types = "DTt"
)
## # A tibble: 1 × 3
##   D          T                   t
##   <date>     <dttm>              <time>
## 1 2018-08-23 2018-08-23 14:30:00 14:30

If you use one of these type specifications, the time and dates should be in ISO 8601 format.4 Local conventions for writing time and date, however, differ substantially and are rarely ISO 8601. When your time data are not ISO 8601, you need to tell read_csv how to read them.

The default time parser handles times in the hh:mm and hh:mm:ss formats, with optional am and pm suffixes, and this suffices for most time formats. It is flexible in the number of digits you use for hours, and you can leave out seconds, but you cannot leave out minutes. Date and datetime formats vary much more than time formats, and there you usually need to specify the format explicitly.

You can use the locale option to change how read_csv parses dates (D) and time (t).
read_csv(
    'D, t
    "23 Oct 2018", 2pm',
    col_types = "Dt",
    locale = locale(
        date_format = "%d %b %Y",
        time_format = "%I%p"
    )
)
## # A tibble: 1 × 2
##   D          t
##   <date>     <time>
## 1 2018-10-23 14:00

The date_format "%d %b %Y" says that dates are written as day, three-letter month abbreviation, and year with four digits, and each of the three separated by a space. The time_format "%I%p" says that we want time to be written as a number from 1 to 12, with no minute information, the hour immediately followed by am/pm without any space between.

For datetimes (T), we cannot specify the format using locale. We need a more verbose type specification that we will return to later. We also return to formatting specifications for parsing dates and time later.

Columns that are not immediately parsed as numbers, booleans, dates, or times will be parsed as strings. If you want these to be factors instead, you use the f type specification.
read_csv(
    'A, B, C,   D
    1, a, a,   1.2
    0, b, b,   2.1
    1, c, c, 13',
    col_types = "lcfn")
## # A tibble: 3 × 4
##   A     B     C         D
##   <lgl> <chr> <fct> <dbl>
## 1 TRUE  a     a       1.2
## 2 FALSE b     b       2.1
## 3 TRUE  c     c      13
If you only want to use some of the columns, you can skip the rest using the “type” - or _:
read_csv(
    file = "data/data.csv",
    col_types = "_cc-"
)
## # A tibble: 3 × 2
##   B     C
##   <chr> <chr>
## 1 a     a
## 2 b     b
## 3 c     c
If you specify the column types using a string, you should specify the types of all columns. If you only want to define the types of a subset of columns, you can use the function cols() to specify types. You call this function with named parameters, where the names are column names and the arguments are types.
read_csv(
    file = "data/data.csv",
    col_types = cols(A = "c")
)
## # A tibble: 3 × 4
##   A     B     C        D
##   <chr> <chr> <chr> <dbl>
## 1 1     a     a       1.2
## 2 2     b     b       2.1
## 3 3     c     c      13
read_csv(
    file = "data/data.csv",
    col_types = cols(A = "c", D = "c")
)
## # A tibble: 3 × 4
##   A     B     C     D
##   <chr> <chr> <chr> <chr>
## 1 1     a     a     1.2
## 2 2     b     b     2.1
## 3 3     c     c     13.0
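Two related helpers are worth knowing here: cols() accepts a .default entry for all columns not named explicitly, and cols_only() reads just the columns you list. The following is a sketch on literal data standing in for data/data.csv; the variable names are hypothetical.

```r
library(readr)

csv_text <- I("A,B,C,D\n1,a,a,1.2\n2,b,b,2.1\n3,c,c,13.0\n")

# Every column is read as character unless stated otherwise.
all_chr <- read_csv(
    csv_text,
    col_types = cols(.default = "c", A = "i")
)

# Only the listed columns are read; the rest are dropped.
only_ad <- read_csv(
    csv_text,
    col_types = cols_only(A = "i", D = "d")
)
names(only_ad)
```

Using .default is a conservative choice for messy files, since nothing is guessed, while cols_only() avoids counting columns by hand when skipping most of them.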

Function-Based Column Type Specification

If you are like me, you might find it hard to remember the single-character codes for different types. If so, you can use longer type names that you specify using function calls. These functions have names that start with col_, so you can use autocomplete to get a list of them. The types you can specify using functions are the same as those you can specify using characters, of course, and the functions are as follows:

Function           Type
col_character()    Character
col_integer()      Integer
col_number()       Number
col_double()       Double
col_logical()      Logical
col_factor()       Factor
col_date()         Date
col_time()         Time
col_datetime()     Datetime
col_guess()        Guess (default)
col_skip()         Skip the column

You need to wrap the function-based type specifications in a call to cols.
read_csv(
    file = "data/data.csv",
    col_types = cols(A = col_integer())
)
## # A tibble: 3 × 4
##       A B     C         D
##   <int> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c      13
read_csv(
    file = "data/data.csv",
    col_types = cols(D = col_character())
)
## # A tibble: 3 × 4
##       A B     C     D
##   <dbl> <chr> <chr> <chr>
## 1     1 a     a     1.2
## 2     2 b     b     2.1
## 3     3 c     c     13.0

Most of the col_ functions do not take any arguments, but they are affected by the locale parameter the same way that the string specifications are.

For factors, date, time, and datetime types, however, you have more control over the format using the col_ functions. You can use arguments to these functions for specifying how read_csv should parse dates and how it should construct factors.
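For datetimes, for instance, col_datetime() takes a format argument. This is a minimal sketch with made-up data; the column names when and what and the chosen format string are assumptions for illustration.

```r
library(readr)

events <- read_csv(
    I("when,what\n15/02/1975 18:00,dinner\n"),
    col_types = cols(
        # day/month/4-digit year, then hours and minutes on a 24-hour clock
        when = col_datetime(format = "%d/%m/%Y %H:%M"),
        what = col_character()
    )
)
events$when
```

Since no time zone is given in the input or in a locale, the parsed value is in the default time zone, UTC.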

For factors, you can explicitly set the levels. If you do not, then the column parser will set the levels in the order it sees the different strings in the column. For example, in data/data.csv the strings in columns B and C are in the order a, b, and c:
A, B, C,   D
1, a, a,  1.2
2, b, b,  2.1
3, c, c, 13.0
By default, the two columns will be interpreted as characters, but if we specify that C should be a factor, we get one where the levels are a, b, and c, in that order.
my_data <- read_csv(
    file = "data/data.csv",
    col_types = cols(C = col_factor())
)
my_data$C
## [1] a b c
## Levels: a b c
If we want the levels in a different order, we can give col_factor() a levels argument.
my_data <- read_csv(
    "A, B, C
     Foo, 12.4, Medium
     Bar, 5.2,   Small
     Baz, 42.0, Large",
    col_types = cols(
        C = col_factor(levels = c("Small", "Medium", "Large"))
    )
)
my_data$C
## [1] Medium Small  Large
## Levels: Small Medium Large
We can also make factors ordered using the ordered argument.
my_data <- read_csv(
    file = "data/data.csv",
    col_types = cols(
        B = col_factor(ordered = TRUE),
        C = col_factor(levels = c("c", "b", "a"))
    )
)
my_data$B
## [1] a b c
## Levels: a < b < c
my_data$C
## [1] a b c
## Levels: c b a

Parsing Time and Dates

The most complex types to read (or write) are dates and time (and datetime), just because these are written in many different ways. You can specify the format that dates and datetime are in using a string with codes that indicate how time information is represented.

The codes are these:

Code  Time format                       Example string    Interpretation
%Y    4-digit year                      1975              The year 1975
%y    2-digit year5                     75                Also the year 1975
%m    2-digit month                     02                February
%b    Abbreviated month name6           Feb               February
%B    Full month name                   February          February
%d    2-digit day                       15                The 15th of a month
%H    Hour number on a 24-hour clock    18                Six o’clock in the evening
%I    Hour number on a 12-hour clock7   6 pm              18:00 hours
%p    AM/PM indicator                   6 pm              18:00 hours
%M    Two-digit minutes                 18:30             Half past six
%S    Integer seconds                   18:00:10          Ten seconds past 18:00
%Z    Time zone as name8                America/Chicago   Central Time
%z    Time zone as offset from UTC      +0100             Central European Time

There are shortcuts for frequently used formats:

Shortcut  Format
%D        %m/%d/%y
%F        %Y-%m-%d
%R        %H:%M
%T        %H:%M:%S

As we saw earlier, you can set the date and time format using the locale() function. If you do not, the default codes will be %AD for dates and %AT for time (there is no locale() argument for datetime). These codes specify YMD and H:M/H:M:S formats, respectively, but are more relaxed in matching the patterns. The date parser, for example, will allow different separators. For dates, both “1975-02-15” and “1975/02/15” will be read as February 15, 1975, and for time, both “18:00” and “6:00 pm” will be six o’clock in the evening.
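The relaxed defaults can be tried out directly with parse_date and parse_time when no format is given. A small sketch, using the strings mentioned above; the variable names are only for illustration.

```r
library(readr)

# No format argument: the flexible default parsers are used.
dates <- parse_date(c("1975-02-15", "1975/02/15"))
times <- parse_time(c("18:00", "6:00 pm"))

dates[1] == dates[2]   # same day, different separators
times[1] == times[2]   # same time, 24-hour versus 12-hour notation
```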

In the following text, I give a few examples. I will use the functions parse_date, parse_time, and parse_datetime rather than read_csv with column type specifications. These functions are used by read_csv when you specify a date, time, or datetime column type, but using read_csv for the examples would be unnecessarily verbose. Each takes a character vector with string representations of dates and times. For more examples, you can read the function documentation ?col_datetime.

Parsing time is simplest; there is not much variation in how time points are written. The main differences are in whether you use 24-hour clocks or 12-hour clocks. The %R and %T codes expect 24-hour clocks and differ in whether seconds are included or not.
parse_time(c("18:00"), format = "%R")
## 18:00:00
parse_time(c("18:00:30"), format = "%T")
## 18:00:30
There is no shortcut for 12-hour clocks, so you must combine %I with %p to read AM/PM formats.
parse_time(c("6 pm"), format = "%I %p")
## 18:00:00
Here, I have specified that the input only includes hours and not minutes. If we want hours (and not minutes) in 24-hour clocks, we need to use %H rather than %R.
parse_time(c("18"), format = "%H")
## 18:00:00
For dates, ISO 8601 says that the format should be YYYY-MM-DD. The default date parser will accept this format, but the explicit format string is
parse_date(c("1975-02-05"), format = "%Y-%m-%d")
## [1] "1975-02-05"
If you do not want to include the day, and you want to use two-digit years, you need
parse_date(c("75-02"), format = "%y-%m")
## [1] "1975-02-01"

This is February 1975; remember that the %y code assumes that numbers above 69 are in the 20th century.
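The century rule for two-digit years is easy to check with one value on each side of the cutoff. A minimal sketch; the second input, "15-02", is made up for illustration.

```r
library(readr)

# "75" lands in the 20th century, "15" in the 21st.
years <- parse_date(c("75-02", "15-02"), format = "%y-%m")
years
```

If your data mixes centuries in a way this rule does not match, a four-digit year with %Y avoids the ambiguity.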

Dates written in the form 05/02/75 can mean either February 5, 1975, or May 2, 1975, depending on where you are in the world. Europe uses the sensible DD/MM/YY format, where the order goes from the smallest time unit, days, to the medium time units, months, and then to years. In the United States, they use the MM/DD/YY format. To get the 15th of February, you need one of these formats:
parse_date(c("15/02/75"), format = "%d/%m/%y")
## [1] "1975-02-15"
parse_date(c("02/15/75"), format = "%m/%d/%y")
## [1] "1975-02-15"
Date specifications that only use numbers are not affected by the local language, but if you include the names of months, they are. The names of months and their abbreviations vary from language to language, obviously. So do the names of weekdays, but at the time of writing, parsing weeks and weekdays is not supported by readr. You can get the name information from locale() if you use a language code. In the following examples, I parse dates in English and Danish. The month names are almost the same, but abbreviations in Danish require a dot following them, and the day is followed by a dot as well.
parse_date(c("Feb 15 1975"), format = "%b %d %Y", locale = locale("en"))
## [1] "1975-02-15"
parse_date(c("15. feb. 1975"), format = "%d. %b %Y", locale = locale("da"))
## [1] "1975-02-15"
parse_date(c("February 15 1975"), format = "%B %d %Y", locale = locale("en"))
## [1] "1975-02-15"
parse_date(c("15. februar 1975"), format = "%d. %B %Y", locale = locale("da"))
## [1] "1975-02-15"
parse_date(c("Oct 15 1975"), format = "%b %d %Y", locale = locale("en"))
## [1] "1975-10-15"
parse_date(c("15. okt. 1975"), format = "%d. %b %Y", locale = locale("da"))
## [1] "1975-10-15"
parse_date(c("October 15 1975"), format = "%B %d %Y", locale = locale("en"))
## [1] "1975-10-15"
parse_date(c("15. oktober 1975"), format = "%d. %B %Y", locale = locale("da"))
## [1] "1975-10-15"

“Datetimes” can be parsed using combinations of date and time strings. With these, you also want to consider time zones. You can ignore those for dates and time, but unless you are sure that you will never have to consider time zones, you should not rely on the default time zone (which is UTC).9

If the time zone is given in the input, you can specify it either as an offset from UTC with %z or as a location with %Z. If it is the same for all the input, you can set it with locale() instead.

If you specify a time zone based on a location, R will automatically adjust for daylight saving time, but if you use offsets relative to UTC, it will not, since UTC does not have daylight saving time. Central European Time (CET) is “+0100” and with daylight saving time “+0200”. US Pacific Time (PST) is “-0800”, but with daylight saving time (PDT), it is “-0700”. When you switch back and forth between standard time and daylight saving time is determined by your location.

These two datetimes are the same:
parse_datetime(c("Feb 15 1975 18:00 US/Pacific"), format = "%b %d %Y %R %Z")
## [1] "1975-02-16 02:00:00 UTC"
parse_datetime(c("Feb 15 1975 18:00 -0800"), format = "%b %d %Y %R %z")
## [1] "1975-02-16 02:00:00 UTC"
as are these two:
parse_datetime(c("Jun 15 1975 18:00 US/Pacific"), format = "%b %d %Y %R %Z")
## [1] "1975-06-16 01:00:00 UTC"
parse_datetime(c("Jun 15 1975 18:00 -0700"), format = "%b %d %Y %R %z")
## [1] "1975-06-16 01:00:00 UTC"
If you use locale() to specify a time zone, you cannot use zones relative to UTC. The point of using locale() is local formats, not time zones. The parser will still handle daylight saving time for you, however. These two are the same datetimes:
parse_datetime(
   c("Aug 15 1975 18:00"),
   format = "%b %d %Y %R",
   locale = locale(tz = "US/Pacific")
)
## [1] "1975-08-15 18:00:00 PDT"
parse_datetime(
   c("Aug 15 1975 18:00 US/Pacific"),
   format = "%b %d %Y %R %Z"
)
## [1] "1975-08-16 01:00:00 UTC"

They are printed differently, but as we saw earlier, 6 pm (18:00) PDT is the same as 01:00 (the following day) in UTC.

If you print the objects you parse, there is a difference between using locale() and using %Z, but the time will be the same. Using %Z, you will automatically translate the time into UTC; using locale(), you will not. But you can compare the result of the two calls and see that they are equivalent objects:
x <- parse_datetime(
   c("Aug 15 1975 18:00"),
   format = "%b %d %Y %R",
   locale = locale(tz = "US/Pacific")
)
y <- parse_datetime(
   c("Aug 15 1975 18:00 US/Pacific"),
   format = "%b %d %Y %R %Z"
)
x == y
## [1] TRUE

The output of parse_datetime() looks like strings when you print them, but the object classes are not character, which, among other things, is why the comparison works.
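To see what the parsed objects actually are, you can inspect their class and underlying numeric value. This is a sketch reusing the two calls above; the names local_time and utc_time are hypothetical, and it assumes the US/Pacific zone is available on your system.

```r
library(readr)

local_time <- parse_datetime(
    "Aug 15 1975 18:00",
    format = "%b %d %Y %R",
    locale = locale(tz = "US/Pacific")
)
utc_time <- parse_datetime(
    "Aug 15 1975 18:00 US/Pacific",
    format = "%b %d %Y %R %Z"
)

# Both are POSIXct objects, not character strings.
class(local_time)

# Both store the same number of seconds since 1970-01-01 UTC;
# only the time zone used for printing differs.
as.numeric(local_time) == as.numeric(utc_time)
```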

Space-Separated Columns

The preceding functions all read delimiter-separated columns. They expect a single character to separate one column from the next. If the argument trim_ws is true, they ignore whitespace. This argument is true by default for read_csv, read_csv2, and read_tsv, but false for read_delim.

The function read_table instead separates columns by one or more spaces:
read_table(
    "A   B   C   D
     1   2   3   4
    15  16  17  18"
)
## # A tibble: 2 × 4
##       A     B     C     D
##   <dbl> <dbl> <dbl> <dbl>
## 1     1     2     3     4
## 2    15    16    17    18
The first line is interpreted as a header, just as for the previous functions, but you can disable that with col_names = FALSE again:
read_table(
    "A   B   C   D
     1   2   3   4
    15  16  17  18",
    col_names = FALSE
)
## # A tibble: 3 × 4
##   X1    X2    X3    X4
##   <chr> <chr> <chr> <chr>
## 1 A     B     C     D
## 2 1     2     3     4
## 3 15    16    17    18

The read_table function takes many of the same arguments as read_csv or read_tsv and mostly differs in what it considers the column separator—this function uses whitespace instead of a specific field separator such as a comma or a semicolon.

The package readxl is not loaded when you load the package tidyverse, but can be quite useful. Its read_excel function does exactly what it says on the tin; it reads Excel spreadsheets into R. Its interface is similar to the functions in readr. Where the interface differs is in Excel-specific options such as which sheet to read. Such options are clearly only needed when reading Excel files.

Functions for Writing Data

Writing data to a file is more straightforward than reading data because we have the data in the correct types and we do not need to deal with different formats. With readr’s writing functions, we have fewer options to format our output. For example, we cannot give the functions a locale(), and we cannot specify date and time formatting. We can, however, use different functions to specify delimiters, and dates and times will be output in ISO 8601, which is what the reading functions use as default.

The functions are write_delim, write_csv, write_csv2, and write_tsv, and for formats that Excel can read, write_excel_csv and write_excel_csv2. The difference between write_csv and write_excel_csv and between write_csv2 and write_excel_csv2 is that the Excel functions include a UTF-8 byte order mark so Excel knows that the file is UTF-8 encoded.

The first argument to these functions is the data we want to write, and the second is the path to the file we want to write to. If this file has suffix .gz, .bz2, or .xz, the output is automatically compressed.
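A round trip through a compressed file shows both halves of this behavior. The following is a minimal sketch; the data frame scores and the temporary path are made up for illustration.

```r
library(readr)

scores <- data.frame(A = c(1, 2, 3), B = c("a", "b", "c"))

# tempfile() gives a throwaway path; the .gz suffix triggers compression.
path <- tempfile(fileext = ".csv.gz")
write_csv(scores, path)

# read_csv decompresses based on the same suffix.
round_trip <- read_csv(path, show_col_types = FALSE)
round_trip
```

The data that comes back has the same column names and values as the data frame that was written.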

I will not list all the arguments for these functions here, but you can read the documentation for them from the R console. The argument you are most likely to use is col_names, which, if true, means that the function will write the column names as the first line in the output, and if false, it will not. If you use write_delim, you might also want to specify the delimiter character using the delim argument. By default, it is a single space; if you write to a file using write_delim with the default options, you get the data in a format that you can read using read_table.
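The effect of col_names and delim is easiest to see by writing to a temporary file and reading the raw lines back. A small sketch with made-up data; the "|" delimiter is just an example.

```r
library(readr)

scores <- data.frame(A = c(1, 2, 3), B = c("a", "b", "c"))
path <- tempfile(fileext = ".txt")

# Default delimiter (a single space), without the header line.
write_delim(scores, path, col_names = FALSE)
no_header <- readLines(path)

# Custom delimiter, with the header line (the default).
write_delim(scores, path, delim = "|")
with_header <- readLines(path)
with_header
```

The second call overwrites the file written by the first, so the two character vectors show the same data in the two layouts.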

The delimiter characters and the decimal points for write_csv, write_csv2, and write_tsv are the same as for the corresponding read functions.

