Data Wrangling with R

"You can have data without information, but you cannot have information without data."
                                                                                                      – Daniel Keys Moran

Data wrangling has long been one of R's core strengths, thanks to its relatively fast in-memory processing and a wide array of packages that speed up the data curation work that wrangling involves.

R is especially valuable when working with datasets of more than about 1 million rows (a Microsoft Excel worksheet holds at most 1,048,576 rows) or with files on the order of gigabytes. Because it offers easy-to-use functions for common day-to-day tasks such as aggregations, joins, and pivots, R is also arguably simpler to use than some of the GUI-based tools available for similar tasks.
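
To make the three everyday tasks named above concrete, here is a minimal sketch in base R. The data frames sales and managers, and all of their columns and values, are hypothetical and invented purely for illustration; packages such as dplyr or data.table offer their own equivalents of each step.

```r
# Hypothetical toy data; every name and value here is illustrative only.
sales <- data.frame(
  region  = c("East", "East", "West", "West"),
  quarter = c("Q1", "Q2", "Q1", "Q2"),
  revenue = c(100, 150, 200, 250),
  stringsAsFactors = FALSE
)
managers <- data.frame(
  region  = c("East", "West"),
  manager = c("Ana", "Ben"),
  stringsAsFactors = FALSE
)

# Aggregation: total revenue per region.
totals <- aggregate(revenue ~ region, data = sales, FUN = sum)

# Join: attach the manager to every sales row by matching on region.
joined <- merge(sales, managers, by = "region")

# Pivot: regions as rows, quarters as columns, summed revenue in the cells.
wide <- xtabs(revenue ~ region + quarter, data = sales)
```

Each task is a single function call here, which is the kind of brevity the comparison with GUI-based tools refers to.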

At a high level, the core categories of data wrangling with R are data extraction, data cleansing, data transformation, and data consolidation. This is a simplified categorization of the basic tenets of data wrangling, and we'll delve deeper into these individual subject areas in the next few sections. The challenge of data wrangling stems largely from the fact that data comes in a range of data types and data formats from a diverse pool of data sources. Here, data type refers to the characteristics of the contents of the files, format refers to the file format in which the data is delivered, and source refers to the systems from which you receive the data. There is no universal convention for these: the data may exist in a CSV file or a binary SAS file, or be present in a database, each of which can have its own nuances and challenges.
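
As a rough illustration of how the four categories can follow one another, the base R sketch below runs a tiny inline CSV through each stage. The data, the column names, and the specific rule chosen for each stage are assumptions made for this example only; the later sections define each category properly.

```r
# Extraction: read raw CSV text (inline here; normally a file or database).
# The data and column names are hypothetical.
raw_csv <- "id,city,amount
1, boston ,10.5
2,Chicago,NA
3,boston,7.25"
orders <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Cleansing: trim stray whitespace, normalize case, drop rows with no amount.
orders$city <- tolower(trimws(orders$city))
clean <- orders[!is.na(orders$amount), ]

# Transformation: derive a new column from an existing one.
clean$amount_cents <- round(clean$amount * 100)

# Consolidation: combine with a second (hypothetical) source, a city lookup.
regions <- data.frame(
  city   = c("boston", "chicago"),
  region = c("Northeast", "Midwest"),
  stringsAsFactors = FALSE
)
consolidated <- merge(clean, regions, by = "city")
```

The same stages apply whatever the source is; only the extraction call changes when the data arrives as a binary SAS file or a database table rather than CSV.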

In this chapter, we will cover the following topics:

  • Introduction to data wrangling with R
  • The foundational tools of data wrangling: dplyr, data.table, and others
  • ETL with R: data extraction
  • ETL with R: data transformation
  • ETL with R: data load
  • Helpful data wrangling tools for everyday use
  • Tutorial