Chapter 7. Statistical Data Analysis with Incanter

In this chapter, we will cover the following recipes:

  • Generating summary statistics with $rollup
  • Working with changes in values
  • Scaling variables to simplify variable relationships
  • Working with time series data with Incanter Zoo
  • Smoothing variables to decrease variation
  • Validating sample statistics with bootstrapping
  • Modeling linear relationships
  • Modeling non-linear relationships
  • Modeling multinomial Bayesian distributions
  • Finding data errors with Benford's law

Introduction

So far, we've focused on data and process. We've seen how to get data and how to get it ready to analyze. We've also looked at how to organize and partition our processing to keep things simple and get the best performance.

We'll now look at how to leverage statistics to gain insights into our data. This is a subject that is both broad and deep, and covering statistics in any meaningful way is far beyond the scope of this chapter. For more information about some of the procedures and functions described here, you should refer to a textbook, class, your local statistician, or another resource. For instance, Coursera has an online statistics course (https://www.coursera.org/course/stats1), and Harvard has a course on probability on iTunes (https://itunes.apple.com/us/course/statistics-110-probability/id502492375).

Some of the recipes in this chapter will involve generating simple summary statistics. Some will involve further massaging our data to make trends and relationships clearer. We'll then look at different ways to model the relationships in our data. Finally, we'll look at Benford's law, a curious observation about the behavior of naturally occurring sequences of numbers, which we can leverage to discover problems with our data.
