R creates and stores the objects it works with in memory. This means that if the size of your dataset exceeds the available RAM, you will not be able to read or process the data in the usual way.
There are, however, a few tools that let R store and operate on data kept locally on disk. Some of the more common ones are listed below:
| Package | Use |
| --- | --- |
| bigmemory | Stores and manipulates massive matrices through `big.matrix` objects |
| bigtabulate | Provides `table`, `tapply`, and similar operations on `big.matrix` objects |
| biganalytics | Extends bigmemory with analytical functions such as large-scale k-means |
| biglm | Fits generalized linear models on datasets too large to fit in memory |
| bigstatsr | Statistical analysis of large matrices (under development at the time of writing), available at https://github.com/cran/bigstatsr |
| ff | Stores large datasets on disk with fast memory-mapped access |
| SparkR | R connector to Apache Spark |
While other connectors also exist, the preceding are some of the more common ones that you will encounter today.
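To give a flavor of how disk-backed data works in practice, here is a minimal sketch using bigmemory. It assumes the package is installed from CRAN; the file names `data.bin` and `data.desc` are illustrative choices, not required names.

```r
library(bigmemory)

# Create a file-backed big.matrix: the data lives on disk, so its size
# is limited by disk space rather than by available RAM.
x <- filebacked.big.matrix(nrow = 1e6, ncol = 3,
                           type = "double",
                           backingfile = "data.bin",      # on-disk data file
                           descriptorfile = "data.desc")  # metadata for reattaching

# Columns can be filled and read much like an ordinary matrix;
# only the slices you touch are pulled into memory.
x[, 1] <- rnorm(1e6)
mean(x[, 1])

# In a later session, the same matrix can be reattached from its descriptor:
# x <- attach.big.matrix("data.desc")
```

The key design point is that `big.matrix` objects are memory-mapped: R holds only a pointer, and the operating system pages data in and out of RAM on demand, which is what allows the companion packages (bigtabulate, biganalytics) to run summaries and models over the same on-disk object.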