Data Exploration and Predictive Modeling with R

Using the R language inside SQL Server lets you extract knowledge from your data. The previous chapter introduced R and R support in SQL Server; this chapter demonstrates how you can use R for advanced data exploration, statistical analysis, and predictive modeling, going well beyond what is possible with T-SQL alone.

You will start with intermediate statistics: exploring associations between two discrete variables, between two continuous variables, and between one discrete and one continuous variable. You will also learn about linear regression, where you explain the values of a continuous dependent variable with a linear formula that uses one or more continuous input variables.
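As a rough preview of the linear regression idea described above, the following minimal sketch fits a linear formula with base R's lm() function. It assumes the built-in mtcars data set that ships with R, and the choice of variables (mpg as the dependent variable, wt and hp as continuous inputs) is purely illustrative and not taken from this book's examples.

```r
# Illustrative sketch only: mtcars is a demo data set bundled with base R,
# not the data used elsewhere in this book.

# Explain the continuous dependent variable mpg with a linear formula
# of two continuous input variables, wt (weight) and hp (horsepower).
model <- lm(mpg ~ wt + hp, data = mtcars)

# Estimated intercept and one slope per input variable.
print(coef(model))

# Use the fitted formula to predict mpg for a hypothetical car.
new_car <- data.frame(wt = 3.0, hp = 110)
predicted <- predict(model, newdata = new_car)
print(predicted)
```

The formula mpg ~ wt + hp reads as "mpg explained by wt and hp"; adding or removing terms on the right side changes which input variables the model uses.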

The second section of this chapter introduces advanced multivariate data mining and machine learning methods. You will learn about so-called undirected methods, which do not use a target variable.

In the third part, you will learn about the most popular directed methods, which do use a target variable.

Finally, to finish the chapter and the book in a slightly lighter way, you will play with graphs again. The last section introduces ggplot2, the most popular package for visualizing data in R.

The target audience of this book is database developers and solution architects who plan to use the new SQL Server 2016 and 2017 features, or simply want to know what is now available and which limitations from previous versions have been removed.

Most readers deal with simple statistics daily, and with data mining and machine learning far less often. For that reason, this chapter not only shows how to write the code for advanced analysis in R, but also introduces the mathematics behind the code and explains when to use which method.

This chapter will cover the following points:

  • Associations between two or more variables
  • Undirected data mining and machine learning methods
  • Directed data mining and machine learning methods
  • Advanced graphing in R
  • The mathematics behind the advanced methods explained
  • A mapping between problems and algorithms