© Thomas Mailund 2019
Thomas Mailund, R Data Science Quick Reference, https://doi.org/10.1007/978-1-4842-4894-2_3

3. Representing Tables: tibble

Thomas Mailund, Aarhus, Denmark

The readr package returns its data as tibble objects. A tibble is a tabular data representation similar to the base R data frame, but it is a more modern version with more consistent behavior.

The package that implements tibbles is tibble. You can load it using:
library(tibble)
Or as part of the tidyverse:
library(tidyverse)

Creating Tibbles

Tidyverse functions that create tabular data will create tibbles rather than data frames. For example, when we use read_csv to read a file into memory, the result is a tibble:
x <- read_csv(file = "data/data.csv")
## Parsed with column specification:
## cols(
##   A = col_double(),
##   B = col_character(),
##   C = col_character(),
##   D = col_double()
## )
x
## # A tibble: 3 x 4
##       A B     C         D
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c      13
The table that read_csv() creates belongs to several classes, and the last of these is data.frame.
class(x)
## [1] "spec_tbl_df" "tbl_df" "tbl"
## [4] "data.frame"
This means that generic functions, if not specialized in the other classes, will use the data.frame version, and this, in turn, means that you can often use tibbles in functions that expect data frames. It does not mean that you can always use a tibble as a replacement for a data frame. If you run into a function where a tibble does not work, you can translate the tibble into a data frame using as.data.frame():
y <- as.data.frame(x)
y
##   A B C    D
## 1 1 a a  1.2
## 2 2 b b  2.1
## 3 3 c c 13.0
class(y)
## [1] "data.frame"
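The dispatch rule described above can be illustrated with a small S3 sketch. The generic describe() and its two methods are hypothetical names invented for this illustration; they are not part of tibble or base R.

```r
library(tibble)

# Hypothetical generic that, at first, only has a data.frame method.
describe <- function(d) UseMethod("describe")
describe.data.frame <- function(d) paste("data frame with", nrow(d), "rows")

tbl <- tibble(A = 1:3)

# No tibble-specific method exists yet, so dispatch walks through the
# class vector until it reaches "data.frame".
fallback <- describe(tbl)

# Once a method for "tbl_df" is defined, it takes precedence.
describe.tbl_df <- function(d) paste("tibble with", nrow(d), "rows")
specialized <- describe(tbl)
```

Here fallback holds the result of the data.frame method, while specialized holds the result of the tibble method.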

Notice that the two objects, when printed, give different output. When printing a tibble, there is more information about the column types.

If you have a data frame, you can translate it into a tibble using as_tibble():
z <- as_tibble(y)
z
## # A tibble: 3 x 4
##       A B     C         D
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
## 3     3 c     c      13
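As a minimal sketch of the round trip, you can start from a hand-built data frame, convert it to a tibble, and convert it back. The data here is made up for the illustration and does not come from data/data.csv.

```r
library(tibble)

# A base R data frame built by hand (hypothetical example data).
df <- data.frame(
    A = c(1, 2, 3),
    B = c("a", "b", "c"),
    stringsAsFactors = FALSE
)

tbl  <- as_tibble(df)        # data frame -> tibble
back <- as.data.frame(tbl)   # tibble -> data frame

is_tibble(tbl)    # TRUE
is_tibble(back)   # FALSE
class(back)       # "data.frame"
```

The column values are unchanged by the conversions; only the class, and with it the printing and indexing behavior, differs.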
You can create a tibble from vectors using the tibble() function:
x <- tibble(
    x = 1:100,
    y = x^2,
    z = y^2
)
x
## # A tibble: 100 x 3
##        x     y     z
##    <int> <dbl> <dbl>
##  1     1     1     1
##  2     2     4    16
##  3     3     9    81
##  4     4    16   256
##  5     5    25   625
##  6     6    36  1296
##  7     7    49  2401
##  8     8    64  4096
##  9     9    81  6561
## 10    10   100 10000
## # ... with 90 more rows

There are two things to notice here. The first is that when you print a tibble with many rows, you only see the first ten. A long tibble would otherwise flood the console. If a tibble has more than 20 rows, you will only see the first ten; if it has 20 or fewer, you will see all of them.

You can change how many lines you will see using the n option to print():
print(x, n = 2)
## # A tibble: 100 x 3
##       x     y     z
##   <int> <dbl> <dbl>
## 1     1     1     1
## 2     2     4    16
## # ... with 98 more rows
If a tibble has more columns than your console can show, only some will be printed. You can change the number of characters printed per line using the width option to print():
print(x, n = 2, width = 15)
## # A tibble:
## #   100 x 3
##       x     y
##   <int> <dbl>
## 1     1     1
## 2     2     4
## # ...  with 98
## #   more rows,
## #   and 1 more
## #   variable:
## #   z <dbl>

You can set either option to Inf. If n is Inf, you will see all rows, and if width is Inf, you will see all the columns.
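The 20-row threshold can be checked with a small sketch. It assumes the default print options are in effect and uses format(), which returns the lines that print() would write, so that the lines can be counted.

```r
library(tibble)

short <- tibble(i = 1:20)   # at the threshold: every row is shown
long  <- tibble(i = 1:21)   # above the threshold: only ten rows are shown

# Count the lines each tibble would print with the default settings.
n_short <- length(format(short))
n_long  <- length(format(long))

# Asking for all rows restores the full output.
n_long_all <- length(format(long, n = Inf))
```

With default settings, n_long is smaller than n_short even though long has one more row, and n_long_all is larger than both.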

The second thing to notice is that you can refer to previous columns when specifying later columns. When we created x, we used
tibble(
    x = 1:100,
    y = x^2,
    z = y^2
)
where column y refers to column x and column z refers to column y. You cannot refer to columns that are defined later, so this would be an error (assuming no other variable named x exists where the call is made):
tibble(
    w = x/2,
    x = 1:100
)
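A minimal sketch of the ordering rule, using tryCatch() to capture the failure instead of stopping the script. The column names are hypothetical, and the sketch assumes that no variable called later_col exists in your session.

```r
library(tibble)

# Later columns can use earlier ones.
ok <- tibble(
    base = 1:3,
    doubled = base * 2
)

# The reverse order fails, because `later_col` does not exist yet
# when `half` is computed.
result <- tryCatch(
    tibble(
        half = later_col / 2,
        later_col = 1:3
    ),
    error = function(e) "error"
)
```

If a variable with the same name does exist outside the tibble() call, R will find that variable instead, and you get a column computed from it rather than an error.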
When you use tibble() to create a tibble, you specify the columns as named arguments. If you indent your code as I have in the above example, then you can think of this as defining each column in one line. You can also create a tibble with one line per row, using tribble():
tribble(
    ~x, ~y,  ~z,
     1, 10, 100,
     2, 20, 200,
     3, 30, 300
)
## # A tibble: 3 x 3
##       x     y     z
##   <dbl> <dbl> <dbl>
## 1     1    10   100
## 2     2    20   200
## 3     3    30   300

The first line names the columns, and the ~ is necessary before the names. For large tibbles, using tribble() is not that helpful, but for example code or small tables, it can be.
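As a small sketch, the same table can be written row by row with tribble() or column by column with tibble(). The column names and values are made up for the illustration.

```r
library(tibble)

# One line per row; the first line names the columns.
by_row <- tribble(
    ~name, ~score,
    "a",     1.5,
    "b",     2.5
)

# One line per column.
by_col <- tibble(
    name  = c("a", "b"),
    score = c(1.5, 2.5)
)
```

Both calls produce a tibble with the same two columns; which form reads better depends on whether you think of the data as rows or as columns.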

Indexing Tibbles

You can index a tibble in much the same way as you can index a data frame. You can extract a column using single-bracket indexing ([]), either by name or by index:
x <- read_csv(file = "data/data.csv")
## Parsed with column specification:
## cols(
##   A = col_double(),
##   B = col_character(),
##   C = col_character(),
##   D = col_double()
## )
y <- as.data.frame(x)
x["A"]
## # A tibble: 3 x 1
##       A
##   <dbl>
## 1     1
## 2     2
## 3     3
y["A"]
##   A
## 1 1
## 2 2
## 3 3
x[1]
## # A tibble: 3 x 1
##       A
##   <dbl>
## 1     1
## 2     2
## 3     3
y[1]
##   A
## 1 1
## 2 2
## 3 3

The result is a tibble or data.frame, respectively, containing a single column.

If you use double brackets ([[]]), you will get the vector contained in a column rather than a tibble/data frame:
x[["A"]]
## [1] 1 2 3
y[["A"]]
## [1] 1 2 3
You will also get the underlying vector of a column if you use $-indexing:
x$A
## [1] 1 2 3
y$A
## [1] 1 2 3
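The three ways of indexing a single column can be compared in a short sketch. The tibble here is built by hand with hypothetical values so the example does not depend on data/data.csv.

```r
library(tibble)

tbl <- tibble(A = c(1, 2, 3), B = c("a", "b", "c"))

one_col <- tbl["A"]     # still a tibble, with a single column
vec_dbl <- tbl[["A"]]   # the underlying numeric vector
vec_dlr <- tbl$A        # the same vector, via $

is_tibble(one_col)    # TRUE
is.numeric(vec_dbl)   # TRUE
```

Single brackets keep the table structure, while double brackets and $ unwrap the column.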
Using [] you can extract more than one column.
x[c("A", "C")]
## # A tibble: 3 x 2
##       A C
##   <dbl> <chr>
## 1     1 a
## 2     2 b
## 3     3 c
y[c("A", "C")]
##   A C
## 1 1 a
## 2 2 b
## 3 3 c
x[1:2]
## # A tibble: 3 x 2
##       A B
##   <dbl> <chr>
## 1     1 a
## 2     2 b
## 3     3 c
y[1:2]
##   A B
## 1 1 a
## 2 2 b
## 3 3 c

You cannot do this using [[]], which extracts a single column.

You can extract a subset of rows and columns if you use two indices. For example, you can get the first two rows in the first two columns using [1:2,1:2]:
x[1:2,1:2]
## # A tibble: 2 x 2
##       A B
##   <dbl> <chr>
## 1     1 a
## 2     2 b
y[1:2,1:2]
##   A B
## 1 1 a
## 2 2 b
With a single index, you always extract a subset of columns. If you want to extract a subset of rows for all columns, you can leave the column index empty:
x[1:2,]
## # A tibble: 2 x 4
##       A B     C         D
##   <dbl> <chr> <chr> <dbl>
## 1     1 a     a       1.2
## 2     2 b     b       2.1
y[1:2,]
##   A B C   D
## 1 1 a a 1.2
## 2 2 b b 2.1
If you extract a subset of rows from a single column, tibbles and data frames no longer have the same behavior. A tibble will give you a tibble in return, while a data frame will give you a vector:
x[1:2,2]
## # A tibble: 2 x 1
##   B
##   <chr>
## 1 a
## 2 b
y[1:2,2]
## [1] "a" "b"

Tibbles are more consistent. When you extract part of a tibble with [], you always get a tibble in return. Data frames sometimes give you a data frame and sometimes (in this particular case) a vector. Any function that expects a data frame, and only uses subscripts that return a data frame, will also work with a tibble. Tibbles return tibbles, and since a tibble is also a data frame, this will not cause any problems. If a function expects such a subscript to return a vector, you cannot give it a tibble. This is where you will need to use as.data.frame() to get a data.frame with data frame behavior.
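The difference, and a few ways around it, can be sketched as follows. The tibble is hand-built with hypothetical values. The drop = TRUE argument is not covered in the text above; it is shown here as one option that the tibble version of [] accepts.

```r
library(tibble)

tbl <- tibble(A = c(1, 2, 3), B = c("a", "b", "c"))
df  <- as.data.frame(tbl)

sub_tbl <- tbl[1:2, 2]   # a tibble with one column
sub_df  <- df[1:2, 2]    # a character vector

# Ways to get a vector out of the tibble:
v1 <- tbl[[2]][1:2]                # extract the column, then pick rows
v2 <- tbl[1:2, 2, drop = TRUE]     # ask [] to drop down to a vector
v3 <- as.data.frame(tbl)[1:2, 2]   # convert first, then index
```

All three of v1, v2, and v3 hold the same character vector as sub_df. Converting with as.data.frame() is the option to reach for when you cannot change the indexing inside the function you are calling.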
