Working with Big Data inR | 219
8.3.3 data.table Package
The data.table package offers very fast data processing. It extends the data.frame object of
traditional R and speeds up data transformation, aggregation, subset generation, etc. Let’s do a
quick check to see how much faster data.table functions are than traditional R functions or, for
that matter, the functions from the ff and ffbase packages.
Let’s start with the same data import operation for the credit card data. The fread() function of
the data.table package imports data through its fast file reader.
  > library(data.table)
  > dt_ccard <- fread("creditcard.csv", stringsAsFactors = TRUE)
Read 284807 rows and 31 (of 31) columns from 0.140 GB file in 00:00:09
It is notable that fread() imported the credit card data in 9 seconds. For the same data, the
traditional R reader from the read.table() family (for example, read.csv()) took 42.5 seconds
and the read.table.ffdf() function took 19.9 seconds.
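Import timings like these depend on the hardware and the file, so it can help to measure them locally. The following is a minimal sketch of such a comparison. It assumes a small synthetic CSV written to a temporary file; the column names and row count are illustrative and are not the credit card data.

```r
library(data.table)

# Build a small synthetic data set (hypothetical columns, not the credit card data).
set.seed(42)
n <- 100000
synthetic <- data.frame(
  id     = seq_len(n),
  amount = round(runif(n, 0, 500), 2),
  class  = sample(c(0L, 1L), n, replace = TRUE)
)

# Write it to a temporary CSV file so that both readers import the same input.
csv_path <- tempfile(fileext = ".csv")
write.csv(synthetic, csv_path, row.names = FALSE)

# Time the base R reader and fread() on the same file.
time_base  <- system.time(df_base <- read.csv(csv_path))
time_fread <- system.time(dt_fast <- fread(csv_path))

print(time_base["elapsed"])
print(time_fread["elapsed"])

# fread() returns a data.table, which is also a data.frame.
print(class(dt_fast))

# Remove the temporary file.
unlink(csv_path)
```

On a file this small the gap may be modest. The difference generally becomes more visible as the file grows.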
Another advantage of the data.table package is its ease of use. Subsets and aggregates can
be generated very easily from data tables created with the fread() function. The general
pattern is as follows.
The general syntax is DTable[i, j, by], where DTable is the data table, i stands for WHERE,
i.e., the row-wise filter, j stands for SELECT, i.e., the columns to be selected or computed, and
by stands for GROUP BY, i.e., the grouping columns. Below is sample code and output for the
credit card data, where we select only the fraudulent transactions (Class == 1), group the
selected subset by the transaction amount, and create two derived attributes, one by averaging
and the other by summing the values in attribute V1.
  > dt_ccard[Class == 1,
  +          .(Avg_V1 = mean(V1, na.rm = TRUE), Tot_V1 = sum(V1)),
  +          by = .(Amount)]
     Amount     Avg_V1       Tot_V1
  1:   0.00 -3.0948811  -83.5617900
  2: 529.00 -3.0435406   -3.0435406
  3: 239.93 -2.3033496   -2.3033496
  4:  59.00 -4.3979744   -4.3979744
  5:   1.00 -5.1061397 -576.9937853
 ---
255: 349.08 -1.3744244   -1.3744244
256: 390.00 -1.9278833   -1.9278833
257:  77.89 -0.6761427   -0.6761427
258: 245.00 -3.1138316   -3.1138316
259:  42.53  1.9919761    1.9919761
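The DTable[i, j, by] pattern can also be tried on a toy table where the results are easy to check by hand. The sketch below is illustrative only. The table dt_toy and its values are hypothetical; only the column names Class, Amount, and V1 are borrowed from the credit card data.

```r
library(data.table)

# Toy transactions (hypothetical values; Class == 1 marks fraud, as in the text).
dt_toy <- data.table(
  Class  = c(1L, 1L, 0L, 1L, 0L, 1L),
  Amount = c(1.00, 1.00, 59.00, 59.00, 1.00, 0.00),
  V1     = c(-2.0, -4.0, 0.5, -1.5, 1.0, -3.0)
)

# i: keep fraudulent rows; j: compute two derived columns; by: group on Amount.
result <- dt_toy[Class == 1,
                 .(Avg_V1 = mean(V1, na.rm = TRUE),
                   Tot_V1 = sum(V1, na.rm = TRUE)),
                 by = .(Amount)]
print(result)

# .N counts the rows in each group.
counts <- dt_toy[Class == 1, .N, by = Amount]
print(counts)
```

When a group contains a single row, the average and the total coincide. This is consistent with the many rows in the credit card output where Avg_V1 equals Tot_V1.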