Chapter 2. Visualizing and Manipulating Data Using R

Data visualization is one of the most important processes in data science. Relationships between variables are sometimes more easily understood visually than through predictive modeling or statistics alone, and visualizing them most often requires data manipulation. Visualization is the art of examining distributions and relationships between variables using visual representations (graphics), with the aim of discovering patterns in data. In fact, a number of software companies provide data visualization tools as their sole or primary product (for example, Tableau and Visual.ly). R has built-in capabilities for data visualization, which, as with almost everything in R, can be extended with external packages. Furthermore, graphics made for a particular dataset can be reused and adapted for another with relatively little effort. Another great advantage of R is that it is a fully functional statistical software package, unlike most of the alternatives.

In this chapter, we will do the following:

  • Examine the basic capabilities of R with regard to data visualization, using some of the most important visualization tools: histograms, bar plots, line plots, boxplots, and scatterplots.
  • Generate datasets based on virtual European roulette spins and develop basic data manipulation skills (for example, subsetting) and programming skills (use of conditions and loops).
  • Apply the visualization tools to data for which the theoretical distributions and relationships are known in advance. Usually, the theoretical distribution is unknown and the aim of visualization is to understand the structures and patterns in the data; working with a known theoretical distribution makes it possible to observe deviations from what is expected.

The roulette case

Roulette is a betting game that rewards the player's correct prediction of its outcome. The game consists of a ball spinning around a wheel that rotates in the opposite direction. The wheel features 37 pockets, numbered from 0 to 36. Each number has a color: 18 are red, 18 are black, and one, the zero, is green. The aim of the game is to bet on one or several outcomes regarding the pocket in which the ball lands. Several types of bets are available, such as the color of the number, whether it is even or odd, and several other characteristics related to the number or its position on the wheel (as marked on the betting grid). The image below is a representation of a European roulette wheel. The ball is represented by the tiny white circle. In this example, it landed in the pocket corresponding to the number 3.

A representation of a roulette wheel
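
The wheel described above can be sketched in a few lines of R. This is only a minimal illustration, not necessarily the approach used later in the chapter: the function names `pocket_color` and `spin_wheel` are hypothetical, and the list of red numbers is the standard European layout, stated here as an assumption because the text only says that 18 numbers are red.

```r
# Red numbers on a standard European roulette layout (an assumption;
# the text only states that 18 of the 36 non-zero numbers are red).
red_numbers <- c(1, 3, 5, 7, 9, 12, 14, 16, 18,
                 19, 21, 23, 25, 27, 30, 32, 34, 36)

# Hypothetical helper: color of each pocket number.
# Zero is green; every other number is either red or black.
pocket_color <- function(number) {
  ifelse(number == 0, "green",
         ifelse(number %in% red_numbers, "red", "black"))
}

# Hypothetical helper: simulate n spins, assuming that each of the
# 37 pockets (0 to 36) is equally likely on every spin.
spin_wheel <- function(n) {
  sample(0:36, size = n, replace = TRUE)
}

set.seed(1)                       # make the simulated spins reproducible
spins  <- spin_wheel(1000)
colors <- pocket_color(spins)

table(colors)                     # observed counts per color
table(pocket_color(0:36))         # counts on the wheel itself
```

The second table counts the colors of the 37 pockets themselves, which gives a reference against which the simulated counts can be compared.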

Numbers are ordered on the wheel in such a way that the position of a number is as unrelated as possible to the available bets (except, of course, bets on the position itself, which we will not describe here). The order of the numbers, starting from 0, is visible in the image above. As you can see, the color of the numbers alternates when moving around the wheel (red, black, red, and so on). The order also seems to be unrelated to the betting grid. We will see whether this is the case at the end of the chapter.

The table below is a schematic representation of the betting grid in European roulette. Red numbers are italicized. A bet on a single number returns 35 times the amount bet (plus the initial bet) if that number is drawn. Bets on color (red or black), odd versus even, and 1-18 versus 19-36 each return the amount bet (plus the initial bet) if the drawn number has that attribute. The probability of winning any of these bets is 18/37, or about 0.486. Bets on the 1st dozen, 2nd dozen, 3rd dozen, and each of the 2:1 columns return 2 times the amount bet (plus the initial bet) if the drawn number falls in that category. The probability of winning any of these bets is 12/37, or about 0.324. Bets are lost if the drawn number does not fall within the category bet on. In the wheel example above, the ball stopped on number 3. Examples of winning bets in that case are red, the first dozen, 1-18, the 3rd column, and of course the number 3 itself.

The following table depicts a betting grid at roulette.

A representation of a betting grid at roulette
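
The probabilities and payouts described above can be checked with a short calculation. The sketch below is illustrative only: the data frame `bets` and its column names are hypothetical, and the expected net gain is computed under the assumption that all 37 pockets are equally likely and that a losing bet forfeits the full stake.

```r
n_pockets <- 37                   # pockets numbered 0 to 36

# Bet types from the text: number of winning pockets and payout ratio
# (winnings per unit staked, not counting the returned stake).
bets <- data.frame(
  bet     = c("single number", "even-money bet", "dozen or column"),
  winning = c(1, 18, 12),
  payout  = c(35, 1, 2)
)

# Probability of winning each bet, assuming equally likely pockets.
bets$p_win <- bets$winning / n_pockets

# Expected net gain per unit staked: gain `payout` with probability p_win,
# lose the stake (1 unit) otherwise.
bets$expected <- bets$p_win * bets$payout - (1 - bets$p_win)

print(bets, digits = 3)
```

Under these assumptions, the expected net gain is the same for all three bet types, namely -1/37 (about -0.027) per unit staked, because the payouts are those of a wheel with 36 pockets while the wheel actually has 37.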
