Histograms and bar plots

Roulette is a fascinating example of a betting game based on random outcomes. To explore some properties of roulette spins, let's visualize some randomly drawn numbers in the range used in a European roulette game (0 to 36). Histograms provide a graphical representation of the distribution of a variable. Let's have a look! Type in the following code:

1  set.seed(1)
2  drawn = sample(0:36, 100, replace = T)
3  hist(drawn, main = "Frequency of numbers drawn",
4     xlab = "Numbers drawn", breaks=37)

Here we first set the seed number to 1 (see line 1). Computer-generated random numbers are generally not truly random (they are in fact called pseudo-random): they are produced by a deterministic algorithm, which is what makes them reproducible. Exceptions exist, such as numbers generated on the website http://www.random.org (which bases the numbers on atmospheric noise). Setting the seed to 1 (or any other number) ensures that the numbers we generate here are the same as those on your screen, provided you use the same code and the same seed. In short, setting the seed makes random draws reproducible. On line 2, we use the sample() function to draw 100 random numbers in the range 0 to 36 (0:36). The replace argument is set to TRUE (abbreviated T), which means that the same number can be drawn several times.
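A quick way to convince yourself of this is to draw twice with the same seed and compare the results. The following is a minimal sketch; the object names first, second and third are illustrative only. Note that the values drawn for a given seed can differ between R versions (the default sampling algorithm changed in R 3.6.0), so your numbers may not match those shown in older material.

```r
# Draw 100 roulette numbers twice, resetting the seed before each draw
set.seed(1)
first = sample(0:36, 100, replace = TRUE)
set.seed(1)
second = sample(0:36, 100, replace = TRUE)

# Same seed and same call: the two vectors are identical
identical(first, second)

# Without resetting the seed, a further draw continues the random sequence
# and will almost certainly differ from the first one
third = sample(0:36, 100, replace = TRUE)
identical(first, third)
```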

The hist() function (lines 3 and 4) plots the frequency of these numbers. It takes a multitude of arguments, of which we use four here: the data to plot (drawn); main, which sets the title of the graphic; xlab, which sets the title of the horizontal axis (similarly, ylab would set the title of the vertical axis); and breaks, which requests 37 cells (corresponding to the number of possible outcomes of the random draws). Note that R treats a single number given to breaks as a suggestion only, so the number of bars actually displayed may differ. For more information about the hist() function, simply type ?hist in your R console.
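If you want exactly one bar per roulette number, one option is to pass explicit break points instead of a single number, or to tabulate the counts and draw them with barplot(). This is only a sketch of that idea; the names edges, h and counts are ours and do not appear in the code above.

```r
set.seed(1)
drawn = sample(0:36, 100, replace = TRUE)

# Break points centred on the integers 0..36 give exactly 37 cells
edges = seq(-0.5, 36.5, by = 1)
h = hist(drawn, breaks = edges, main = "Frequency of numbers drawn",
   xlab = "Numbers drawn")
length(h$counts)   # one count per possible outcome

# Equivalent tabulation; factor() keeps the numbers that were never drawn
counts = table(factor(drawn, levels = 0:36))
barplot(counts, main = "Frequency of numbers drawn",
   xlab = "Numbers drawn")
```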

As you can see in the figure below, the frequencies differ considerably between numbers, even though each number has the same theoretical probability of being drawn on each spin:

A histogram of the frequency of numbers drawn

Let's now turn to the representation of mean values using bar plots. This will allow us to look at other properties of the roulette draws. The mean values will represent the proportion of outcomes that have a given characteristic (for example, the proportion of red numbers drawn). We will therefore build some new functions.

The buildDf() function returns a data frame with as many rows as the numbers we want to draw, and as many columns as the attributes we are interested in (the number drawn, its position on the wheel, and several possible bets), totaling 14 columns. The data frame is first filled with zeroes, and will be populated at a later stage:

1    buildDf = function(howmany) {
2       Matrix=matrix(rep(0, howmany * 14), nrow=howmany,ncol=14)
3       DF=data.frame(Matrix)
4       names(DF)=c("number","position","isRed","isBlack",
5         "isOdd","isEven","is1to18","is19to36","is1to12",
6         "is13to24","is25to36","isCol1","isCol2","isCol3")
7       return(DF)
8    }

Let's examine the code in detail. On line 1, we declare the function, which we call buildDf, and tell R that it takes an argument called howmany. On line 2, we assign a matrix of howmany rows and 14 columns to an object called Matrix. At this stage the matrix is filled with zeroes. On line 3, we convert the matrix into a data frame called DF, which will make some operations easier later. On lines 4 to 6, we name the columns of the data frame using the names() function. The first column will hold the number drawn, the second its position on the wheel (the position of 0 is 1, the position of 32 is 2, and so on). The other names correspond to possible bets on the betting grid. We will describe these later when declaring the function that fills in the data frame. On line 7, we specify that we want the function to return the data frame. On line 8, we close the function code block (using a closing bracket), which we opened on line 1 (using an opening bracket).
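To check that the function behaves as described, we can call it for a small number of rows and inspect the result. The function is repeated here without line numbers so that the snippet runs on its own; the object name empty is illustrative.

```r
buildDf = function(howmany) {
   Matrix = matrix(rep(0, howmany * 14), nrow = howmany, ncol = 14)
   DF = data.frame(Matrix)
   names(DF) = c("number", "position", "isRed", "isBlack",
      "isOdd", "isEven", "is1to18", "is19to36", "is1to12",
      "is13to24", "is25to36", "isCol1", "isCol2", "isCol3")
   return(DF)
}

empty = buildDf(3)
dim(empty)          # number of rows, number of columns
names(empty)[1:4]   # first few column names
all(empty == 0)     # every cell starts at zero
```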

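Our next function, attributes() (note that a function of this name defined in your workspace masks the base R function attributes()), will fill the data frame with the numbers drawn from the roulette, their position on the wheel, their color, and other attributes (more about this below):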

1    attributes = function(howmany,Seed=9999) { 
2       if (Seed != 9999) set.seed(Seed)
3       DF = buildDf(howmany)
4       drawn = sample(0:36, howmany, replace = T)
5       DF$number=drawn
6       numbers = c(0, 32, 15, 19, 4, 21, 2, 25, 17, 34, 6, 27, 
7          13, 36, 11, 30, 8, 23, 10, 5, 24, 16, 33, 1, 20, 14, 
8          31, 9, 22, 18, 29, 7, 28, 12, 35, 3, 26)

The function is not fully declared at this stage. We will break it down into several parts in order to explain what we are doing. On line 1, we assign the function to the object attributes, specifying two arguments: howmany, for the number of rows (that is, how many numbers we want to draw), and Seed, for the seed number we will use (with default value 9999). On line 2, we set the seed to the provided value unless it is 9999 (we need the function to be able to leave the seed unset for analyses we will do later). On line 3, we create the data frame by calling the buildDf() function we created before. On line 4, we sample the specified quantity of numbers. On line 5, we assign these numbers to the column of the data frame called number. On lines 6 to 8, we create a vector called numbers, which contains the numbers 0 to 36 in the order in which they appear on the roulette wheel (starting with 0, then 32, 15 …).

In the remainder of the function (presented below), we populate the rest of the attributes:

9       for (i in 1:nrow(DF)){
10         DF$position[i]= match(DF$number[i],numbers)
11         if (DF$number[i] != 0) { if (DF$position[i]%%2) {
12            DF$isBlack[i] = 1} else {DF$isRed[i] = 1}
13         if (DF$number[i]%%2) { DF$isOdd[i]=1} 
14         else {DF$isEven[i]=1}
15         if (DF$number[i] <= 18){ DF$is1to18[i]=1} 
16         else { DF$is19to36[i]=1}
17         if(DF$number[i] <= 12){ DF$is1to12[i]=1} 
18         else if (DF$number[i]<25) { DF$is13to24[i] = 1} 
19            else { DF$is25to36[i] = 1}
20         if(!(DF$number[i]%%3)){ DF$isCol3[i] = 1} 
21         else if ((DF$number[i] %% 3 ) == 2) {
22           DF$isCol2[i] = 1}  
23           else { DF$isCol1[i] = 1}
24           }
25         }
26       return(DF)
27    }

On line 9, we create a loop, meaning that the code block will iterate from i = 1 to i = the number of numbers we have drawn (the number of rows of the data frame). We open the code block using an opening bracket. On line 10, we assign to the attribute position the position of the drawn number on the wheel, using the match() function. On lines 11 and 12, we create a nested condition: if the number is not 0, we assign 1 to attribute isBlack if the position of the number is odd, or 1 to isRed if the position is even (remember that the colors alternate around the wheel: red, black, red ...). The block opened on line 11 for non-zero numbers also encloses all the tests that follow, so a drawn 0 keeps the value 0 for every bet. On lines 13 and 14, we assign 1 to attribute isOdd if the number is odd, or 1 to attribute isEven if the number is even. On lines 15 and 16, we assign 1 to attribute is1to18 if the number is smaller than or equal to 18, or 1 to attribute is19to36 if the number is higher than 18. On lines 17 to 19, we assign 1 to either is1to12, is13to24 or is25to36 depending on the value of the number. Finally, on lines 20 to 23, we assign the column on the betting grid by setting the value of either isCol1, isCol2, or isCol3 (on the table representing the betting grid, isCol1 is the left 2:1 column, isCol2 the middle 2:1 column and isCol3 the right one). On line 24, we close the block for non-zero numbers opened on line 11, and on line 25 we close the loop. On line 26, we tell R that we want the function to return the resulting data frame. On line 27, we close the code block of the function (that we opened on line 1).
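The color rule on lines 11 and 12 relies on red and black alternating around the wheel. As a sanity check, the sketch below applies the same rule to every number from 1 to 36 and compares the result with the list of red numbers on a standard European wheel. The vector standard_red and the other object names are ours and are not part of the function above.

```r
# Wheel order, as in the attributes() function
numbers = c(0, 32, 15, 19, 4, 21, 2, 25, 17, 34, 6, 27,
   13, 36, 11, 30, 8, 23, 10, 5, 24, 16, 33, 1, 20, 14,
   31, 9, 22, 18, 29, 7, 28, 12, 35, 3, 26)

# Position of each number 1..36 on the wheel (0 sits at position 1)
position = match(1:36, numbers)

# Rule used in the loop: even position means red, odd position means black
red_by_position = (1:36)[position %% 2 == 0]

# Red numbers on a standard European roulette wheel
standard_red = c(1, 3, 5, 7, 9, 12, 14, 16, 18,
   19, 21, 23, 25, 27, 30, 32, 34, 36)

# TRUE if the position rule reproduces the standard coloring
setequal(red_by_position, standard_red)
```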

Now that we have our functions ready, we can focus on visualizing some data. The following code will generate 1,000 roulette spins (let's use a seed number of 2 so that the random numbers are the same on your machine as on mine):

Data=attributes(1000,2)

It is now time to explore the relationship between our variables in the following graph. We will first ask R to plot several graphs on the plotting area. To do so, we rely on the mfrow argument of the par() function (line 1), which here requests 2 rows of 3 graphs. The first row shows the proportion of red numbers (the mean of the values, as these are coded 1 for presence and 0 for absence) in the 2:1 columns 1, 2 and 3, and the second row shows the proportion of even numbers in the same columns. Notice that for all 6 plots we use subsetting (with the subset() function here) to select the portion of the data we are interested in. We use the argument ylim to define the range of the vertical axis (from 0 to 1), and the argument main to set the title of each plot.

1    par(mfrow = c(2,3))
2    barplot(mean(subset(Data, isCol1 == 1)$isRed), ylim=(c(0,1)), 
3       main = "Prop. of red in Col. 1")
4    barplot(mean(subset(Data, isCol2 == 1)$isRed), ylim=(c(0,1)), 
5       main = "Prop. of red in Col. 2")
6    barplot(mean(subset(Data, isCol3 == 1)$isRed), ylim=(c(0,1)), 
7       main = "Prop. of red in Col. 3")
8    barplot(mean(subset(Data, isCol1 == 1)$isEven), ylim=(c(0,1)), 
9       main = "Prop. of even numbers in Col. 1")
10    barplot(mean(subset(Data, isCol2 == 1)$isEven), ylim=(c(0,1)),
11       main = "Prop. of even numbers in Col. 2")
12    barplot(mean(subset(Data, isCol3 == 1)$isEven), ylim=(c(0,1)), 
13       main = "Prop. of even numbers in Col. 3")

Bar plots of the proportion of red, and even numbers drawn from Columns 1, 2 and 3

As can be seen on the graphs, the proportion of red numbers differs between columns 1, 2 and 3, whereas the proportion of even numbers is relatively similar across the columns. This is consistent with the layout of the betting grid, where red numbers are unevenly spread over the three columns but even numbers are not.
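The proportions we should expect from the grid itself can be computed without drawing any numbers. The sketch below assumes the standard European coloring (the vector red is ours) and uses the same column rule as the attributes() function.

```r
n = 1:36

# Red numbers on a standard European roulette wheel
red = c(1, 3, 5, 7, 9, 12, 14, 16, 18,
   19, 21, 23, 25, 27, 30, 32, 34, 36)

# Same column rule as in attributes():
# remainder 1 is Col. 1, remainder 2 is Col. 2, remainder 0 is Col. 3
column = ifelse(n %% 3 == 0, 3, n %% 3)

# Theoretical proportion of red numbers in each column (6, 4 and 8 out of 12)
tapply(n %in% red, column, mean)

# Theoretical proportion of even numbers in each column (6 out of 12 each)
tapply(n %% 2 == 0, column, mean)
```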

You might have noticed that we have lost important information in the process: the total quantity of numbers drawn from each column, and the number of zeros. We also needed to produce one bar plot per column, which is a bit cumbersome. Let's solve these problems by first adding a single attribute that indicates whether each drawn number belongs to Column 1, Column 2 or Column 3.

1    for (i in 1:nrow(Data)){
2       if(Data$isCol1[i]== 1){ Data$Column[i]=1 } 
3          else if (Data$isCol2[i] == 1 ) { Data$Column[i] = 2 } 
4          else if (Data$isCol3[i] == 1 ) { Data$Column[i] = 3 } 
5          else {Data$Column[i] = 0 }
6    }

On line 1, we start a for loop that will iterate from i = 1 to i = the number of rows in the data frame Data. We use a nested condition on lines 2 to 5 to determine the column number (1 if attribute isCol1 equals 1, 2 if attribute isCol2 equals 1, 3 if attribute isCol3 equals 1, or 0 if none of these conditions is satisfied). We close the code block on line 6.
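Because at most one of the three indicators equals 1 in any row, the same result can also be obtained without a loop, as a weighted sum of the indicator columns. The sketch below uses a small hypothetical data frame called toy in place of Data so that it runs on its own.

```r
# Toy data frame standing in for Data (hypothetical values):
# one row per case (Col. 1, Col. 2, Col. 3, and a drawn 0)
toy = data.frame(isCol1 = c(1, 0, 0, 0),
   isCol2 = c(0, 1, 0, 0),
   isCol3 = c(0, 0, 1, 0))

# Each row has at most one indicator set, so the weighted sum
# gives the column number, and 0 when no indicator is set
toy$Column = toy$isCol1 * 1 + toy$isCol2 * 2 + toy$isCol3 * 3
toy$Column
```

The loop in the text has the advantage of spelling out each case, whereas this form is shorter and operates on all rows at once.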

We can now plot the column in relation to the proportion of red and even numbers. For now, our attributes isRed and isEven are ordered with 0 coming first and 1 second. We want just the opposite, as we want the count of numbers coded 1 to appear at the bottom of the graph. We therefore reorder the values of our attributes using the levels argument of the factor() function (lines 1 and 2). We use par() again to get both graphs on the same plotting area. We then generate the stacked bar plots using the barplot() function again. Notice that we do not plot mean values this time, but the content of the table in which the cells correspond to the intersections of the attribute Column with isRed or isEven. We rely on the argument names.arg to name the bars of the plots:

1    Data$isRed = factor(Data$isRed, levels = c(1,0))
2    Data$isEven = factor(Data$isEven, levels = c(1,0))
3    par(mfrow = c(2,1))
4    barplot(table(Data$isRed,Data$Column), 
5       main = "Red numbers in Columns 1, 2 and 3", 
6       names.arg = (c("0","Column 1", "Column 2", "Column 3")) )
7    barplot(table(Data$isEven,Data$Column), 
8       main = "Even numbers in Columns 1, 2 and 3", 
9       names.arg = (c("0","Column 1", "Column 2", "Column 3")) )

A bar plot of the number of Red and Even numbers drawn

As can be seen on this stacked bar plot, approximately the same quantity of numbers has been drawn from each of the columns. The number 0 has been drawn around 50 times, which is about twice as often as expected given that its theoretical probability equals that of the other numbers (1000 * (1/37) ≈ 27).
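The expected count used above, together with a rough idea of how much the count of any single number should vary between runs of 1,000 spins, can be computed as follows. This is a sketch under the assumption that every spin is independent and every number equally likely; the object names are ours.

```r
spins = 1000
p = 1 / 37   # probability of any single number on one spin

# Expected number of times a given number is drawn
expected = spins * p

# Standard deviation of that count under a binomial model
spread = sqrt(spins * p * (1 - p))

round(expected, 1)
round(spread, 1)

# With the data frame from the text, the observed count of zeros
# could be obtained with: sum(Data$number == 0)
```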
