On-disk formats

R creates and stores the objects it works with in memory (RAM). This means that, if the size of your dataset exceeds the amount of available memory, you cannot load and process the corresponding data in the usual way.
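To get a rough sense of when a dataset will not fit, you can estimate its in-memory footprint before loading it. The following is a minimal sketch in base R, assuming a purely numeric dataset stored as doubles (8 bytes per cell); the helper name `estimate_matrix_gb` and the dataset dimensions are hypothetical and chosen only for illustration.

```r
# Estimate the in-memory size of a numeric (double) matrix in GiB,
# assuming 8 bytes per cell and ignoring the small object header.
estimate_matrix_gb <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell / 1024^3
}

# Hypothetical dataset: 50 million rows by 100 numeric columns.
estimate_matrix_gb(5e7, 100)

# Sanity check against a small matrix that R can actually allocate.
small <- matrix(0, nrow = 1000, ncol = 100)
print(object.size(small), units = "MB")
```

If the estimate is close to or above the RAM available on the machine, an on-disk approach is worth considering. Real data frames with character or factor columns will deviate from this simple estimate.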

There are, however, a few R packages that can store data locally on disk and operate on it without loading all of it into memory. A few of them are listed here for reference:

Package        Use

bigmemory      Stores and manipulates massive matrices as big.matrix objects
bigtabulate    Provides table, tapply, and similar operations on big.matrix objects
biganalytics   Extends the bigmemory package with functionality such as large-scale k-means and other analytical functions
biglm          Used for generalized linear modeling on large datasets
bigstatsr      Used for statistical analysis of large matrices (under development at the time of writing); available at https://github.com/cran/bigstatsr
ff             Used for storing large datasets on disk
SparkR         R connector to Spark

While other packages and connectors also exist, the preceding are some of the more common ones that you will encounter today.
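As an illustration of the on-disk idea, the following is a minimal sketch using the bigmemory package, assuming it is installed from CRAN. It creates a file-backed big.matrix, writes to it, and re-attaches it from its descriptor file. The file names `demo.bin` and `demo.desc`, the matrix dimensions, and the use of a temporary directory are assumptions made for this example only.

```r
library(bigmemory)

# Directory that will hold the backing file and its descriptor
# (a temporary directory here; use a permanent path for real data).
backing_dir <- tempdir()

# Create a file-backed matrix: the data lives in demo.bin on disk,
# and demo.desc records how to re-attach it later.
x <- filebacked.big.matrix(
  nrow = 1e5, ncol = 3, type = "double", init = 0,
  backingfile = "demo.bin",
  descriptorfile = "demo.desc",
  backingpath = backing_dir
)

# Assign values column by column, as with an ordinary matrix.
x[, 1] <- as.numeric(seq_len(nrow(x)))
x[, 2] <- 2

# Subsetting with [ copies the selected values into a regular R vector,
# so pull out only as much as fits comfortably in memory.
mean(x[, 1])

# Re-attach the same on-disk data through its descriptor file,
# as a separate R session could do.
y <- attach.big.matrix(file.path(backing_dir, "demo.desc"))
dim(y)
```

Because `x` and `y` point at the same backing file, a value written through one is visible through the other. The descriptor file is what lets a later session reuse the data without rebuilding it.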
