Checking data quality

R offers several built-in functions, as well as packages, for checking the quality of data. The most commonly used is the summary function in base R:

## Packages Used: 
## psych, pastecs, dataMaid, daff 
 
# install.packages(c("psych","pastecs","dataMaid","daff")) 
 
state <- data.frame(state.x77) 
state$State <- row.names(state) 
state 
 
 
summary(state) 

The output of the preceding code is as follows:
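Beyond summary, a few base R checks are often a useful first pass on data quality. The following is a minimal sketch rather than a complete audit; the variable names na_counts, dup_rows, and col_classes are illustrative and do not come from any package.

```r
# Rebuild the data frame from the text so this block is self-contained.
state <- data.frame(state.x77)
state$State <- row.names(state)

# Number of missing values in each column.
na_counts <- colSums(is.na(state))

# Number of rows that are exact duplicates of an earlier row.
dup_rows <- sum(duplicated(state))

# Class of each column, to spot unexpected types
# (for example, numbers that were read in as text).
col_classes <- sapply(state, class)

print(na_counts)
print(dup_rows)
print(col_classes)
```

For this dataset the checks should come back clean: no missing values, no duplicated rows, and every column numeric except State.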

library(psych) 
describe(state) 

The output of the preceding code is as follows:

You can also use describeBy (named describe.by in older versions of psych, where it is now deprecated) to get summary information on a per-group basis, as shown:

describeBy(state, group = state$State) 

The following is the output:
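Because every state name is unique, grouping by State produces groups of a single row each, so measures of spread are not meaningful. A grouping variable with several rows per level is usually more informative. The sketch below uses base R only and assumes that state.region (a factor shipped with R's datasets package, in the same order as state.x77) is an acceptable grouping; the Region column and the choice of three variables are illustrative.

```r
state <- data.frame(state.x77)

# state.region is a base R factor aligned row by row with state.x77.
state$Region <- state.region

# Mean of a few numeric columns within each region.
region_means <- aggregate(
  cbind(Population, Income, Murder) ~ Region,
  data = state,
  FUN = mean
)

# Number of states in each region, as a sanity check on group sizes.
region_sizes <- table(state$Region)

print(region_means)
print(region_sizes)
```

The same Region column could be passed as the group argument of describeBy to obtain the fuller set of statistics for each region.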

Or, for a comprehensive statistical description, you can use stat.desc from pastecs, as shown:

library(pastecs) 
stat.desc(state) 

The output of the preceding code is as follows:
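Several of the rows that stat.desc reports, such as the standard error of the mean, the half-width of the 95% confidence interval, and the coefficient of variation, can be reproduced by hand. The following base R sketch does so for the Income column, on the assumption that the usual textbook definitions apply; the variable names are illustrative.

```r
state <- data.frame(state.x77)
x <- state$Income

# Number of non-missing observations.
n <- sum(!is.na(x))

# Standard error of the mean: sd / sqrt(n).
se_mean <- sd(x) / sqrt(n)

# Half-width of the 95% confidence interval for the mean,
# based on the t distribution with n - 1 degrees of freedom.
ci_half <- qt(0.975, df = n - 1) * se_mean

# Coefficient of variation: sd relative to the mean.
coef_var <- sd(x) / mean(x)

print(c(n = n, se_mean = se_mean, ci_half = ci_half, coef_var = coef_var))
```

Comparing these values with the corresponding rows of stat.desc(state) is a quick way to confirm what each statistic means.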

Among other utilities, a more recent package called dataMaid makes it easy to generate a high-level overview report of every variable in the dataset using a one-line command, as follows:

library(dataMaid) 
makeDataReport(state) 

The output of the preceding code is as follows:

We often need to find the differences between two versions of a dataset after some of its values change. This can be done manually by comparing individual columns one at a time, but the daff package can now be used to get very nice visual renderings of the changes, similar to the diffs you may have seen on sites such as GitHub:

library(daff) 
state <- data.frame(state.x77) 
state2 <- state 
identical(state, state2) 
 
state2$Population <- state2$Population+1 
diff_data(state,state2) 

The output of the preceding code is as follows:

diff_info <- diff_data(state,state2) 
render_diff(diff_info) 

The output of the preceding code is as follows:
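For comparison, the manual approach of locating changed cells can be sketched in base R. This is a minimal illustration that assumes both data frames have the same shape, the same column order, only numeric columns, and no missing values; it is not a substitute for daff, which also handles added or removed rows and columns.

```r
state <- data.frame(state.x77)
state2 <- state
state2$Population <- state2$Population + 1

# Logical matrix marking the cells whose values differ.
changed <- as.matrix(state) != as.matrix(state2)

# Row and column indices of every changed cell.
changed_cells <- which(changed, arr.ind = TRUE)

# Names of the columns that contain at least one change.
changed_cols <- colnames(state)[unique(changed_cells[, "col"])]

print(changed_cols)
print(nrow(changed_cells))
```

Here the only changed column is Population, with one changed cell for each of the 50 states.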

You can also patch the data using patch_data and merge datasets using merge_data. More information can be found on the developer's website.
