Summary

In this chapter, you learned a number of basic functions and several packages for data manipulation. Manipulating data with built-in functions alone can be verbose. Several packages are tailored for filtering and aggregating data, each based on different techniques and philosophies. The sqldf package uses an embedded SQLite database so that we can write SQL statements directly to query data frames in our working environment. On the other hand, data.table provides an enhanced version of data.frame with a powerful syntax, and dplyr defines a grammar of data manipulation by providing a set of pipeline-friendly verb functions. The rlist package provides a set of pipeline-friendly functions for non-tabular data manipulation. No single package is best for all situations. Each of them represents a way of thinking, and which one best fits a certain problem depends on how you understand the problem and on your experience of working with data.
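To make the comparison concrete, the following is a minimal sketch of one grouped mean written four ways. It assumes the sqldf, data.table, and dplyr packages are installed. The `sales` data frame and its columns are hypothetical and are not taken from the chapter.

```r
# Assumes sqldf, data.table, and dplyr are installed.
library(sqldf)
library(data.table)
library(dplyr)

# Hypothetical toy data, used only for illustration
sales <- data.frame(
  region = c("east", "west", "east", "west", "east"),
  amount = c(10, 20, 30, 60, 50),
  stringsAsFactors = FALSE
)

# Built-in functions: formula interface of aggregate()
base_res <- aggregate(amount ~ region, data = sales, FUN = mean)

# sqldf: query the data frame with a SQL statement
sql_res <- sqldf(
  "SELECT region, AVG(amount) AS amount
   FROM sales
   GROUP BY region
   ORDER BY region"
)

# data.table: compute in j, group (and sort) with keyby
dt <- as.data.table(sales)
dt_res <- dt[, .(amount = mean(amount)), keyby = region]

# dplyr: chain verb functions with the pipe
dplyr_res <- sales %>%
  group_by(region) %>%
  summarise(amount = mean(amount)) %>%
  arrange(region)
```

All four results hold one row per region with the same mean amounts. What differs is how the question is phrased: as a formula, as a SQL query, as an indexing expression, or as a pipeline of verbs.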

Processing data and running simulations require considerable computing power. However, performance has never been the top priority for R. Although R is very powerful in interactive analysis, visualization, and reporting, its implementation is considered slow compared to some other popular scripting languages when it is used to process a large amount of data. In the next chapter, we'll introduce several techniques, ranging from performance measurement and profiling to vectorization, an MKL-powered R kernel, parallel computing, and Rcpp. These techniques will help you achieve high performance when you really need it.
