Probability distributions

R makes it very easy to plot and get statistical information on many probability distributions. For those who are not familiar with them, a probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. Many common probability distributions available in R are summarized in the following table:

Probability distribution	R name
Beta	beta
Binomial	binom
Cauchy	cauchy
Chi square	chisq
Exponential	exp
F	f
Gamma	gamma
Geometric	geom
Hypergeometric	hyper
Logistic	logis
Lognormal	lnorm
Negative Binomial	nbinom
Normal	norm
Poisson	pois
Student t	t
Uniform	unif
Tukey	tukey
Weibull	weibull
Wilcoxon	wilcox

You can also get this summary in R by entering help("distributions"). For additional probability distributions, and the packages needed to load them, you can consult the CRAN distributions page at http://cran.r-project.org/web/views/Distributions.html.

For each probability distribution, you can obtain the probability density function (or the probability mass function for a discrete distribution) by adding the d prefix, the cumulative distribution function by adding the p prefix, and the quantile function by adding the q prefix to the R name shown in the previous table. You can also generate random numbers from these probability distributions by adding the r prefix to the R name. A few entries, such as tukey, provide only some of these functions. For example, you can use qnorm() to call the quantile function for a normal distribution and rpois() to generate random numbers from a Poisson distribution.

For the 0.65 quantile of a normal distribution with a mean of 7.5 and standard deviation of 4, we would enter:

> qnorm(0.65, mean=7.5, sd=4)
[1] 9.041282
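As a quick sanity check on this result, the sketch below feeds the quantile back into pnorm(), which should return the original probability because the p and q functions are inverses of each other. The variable names (q65, p_back, q65_manual) are illustrative and not part of the original example.

```r
# qnorm() and pnorm() are inverses when given the same mean and sd
q65 <- qnorm(0.65, mean = 7.5, sd = 4)
p_back <- pnorm(q65, mean = 7.5, sd = 4)
all.equal(p_back, 0.65)

# The same quantile obtained from the standard normal and rescaled by hand
q65_manual <- 7.5 + 4 * qnorm(0.65)
all.equal(q65, q65_manual)
```

Both comparisons should report TRUE, since a normal quantile is just the standard normal quantile shifted by the mean and scaled by the standard deviation.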

To generate seven random numbers from a Poisson distribution with a lambda equal to 4, we would enter:

> rpois(7, lambda=4)
[1] 2 3 5 4 6 3 5
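Because rpois() draws random numbers, your output will most likely differ from the values shown here. A minimal sketch of how to make such draws reproducible with set.seed() follows; the seed value 42 and the variable names are arbitrary choices for illustration.

```r
# Fixing the seed makes the random draws repeatable
set.seed(42)                      # arbitrary seed, any integer works
first <- rpois(7, lambda = 4)
set.seed(42)
second <- rpois(7, lambda = 4)
identical(first, second)          # the two vectors match element by element

# With many draws, the sample mean should settle close to lambda
big <- rpois(10000, lambda = 4)
mean(big)
```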

Now, let's consider a more detailed example using probability distribution functions to solve a particular problem. Say the average number of liters of water consumed per day for children under the age of 12 has a normal distribution with a mean of 7.5 and a standard deviation of 1.5. Since the 68–95–99.7 rule (also known as the three-sigma rule or empirical rule) states that about 99.7 percent of the values in a normal distribution fall within three standard deviations of the mean, we can approximate the interval to be used for the x values in our plot, as follows:

> ld.mean <- 7.5
> ld.sd <- 1.5
> ld.mean+3*ld.sd
[1] 12
> ld.mean-3*ld.sd
[1] 3

So, from these calculations, we can use an interval of [0, 16] because most random numbers generated will fall between 3 and 12:

> x <- seq(0, 16, length=100)
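The empirical rule itself can be checked numerically with pnorm(). The following sketch, which reuses ld.mean and ld.sd from above and introduces the illustrative names k and coverage, computes the probability of falling within one, two, and three standard deviations of the mean.

```r
ld.mean <- 7.5
ld.sd <- 1.5

# Probability of falling within k standard deviations of the mean, k = 1, 2, 3
k <- 1:3
coverage <- pnorm(ld.mean + k * ld.sd, mean = ld.mean, sd = ld.sd) -
  pnorm(ld.mean - k * ld.sd, mean = ld.mean, sd = ld.sd)
round(coverage, 4)
# [1] 0.6827 0.9545 0.9973
```

These values do not depend on the particular mean and standard deviation, which is why the rule applies to any normal distribution.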

Next, we will use the dnorm() function, along with our mean and standard deviation, to return the density curve for average liters of water consumed per day for children under the age of 12:

> nd.height <- dnorm(x, mean = 7.5, sd = 1.5)

Now, we can plot the normal density curve in R using the plot() function. We will set type = "l" in the plot() function to graph a line instead of points, as shown in the following command:

> plot(x, nd.height, type = "l", xlab = "Liters per day", ylab = "Density", main = "Liters of water drunk by school children < 12 years old")

The graph for this normal curve is shown in the following plot:

Suppose we want to evaluate the probability of a child drinking less than 4 liters of water per day. We can get this information by measuring the area under the curve to the left of 4 using pnorm(), which returns the cumulative distribution function, as shown in the following code. Since we want the area to the left of 4, we set lower.tail=TRUE (the default) in pnorm(); to measure the area to the right of 4 instead, we would set lower.tail=FALSE:

> pnorm(4, mean = 7.5, sd = 1.5, lower.tail = TRUE)
 [1] 0.009815329
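To make the "area under the curve" interpretation concrete, the sketch below compares pnorm() with a numerical integration of the density using integrate() from base R, and confirms that the lower and upper tails add up to 1. The variable names lower, upper, and area are illustrative.

```r
# Lower and upper tail probabilities at 4 are complementary
lower <- pnorm(4, mean = 7.5, sd = 1.5)                      # lower.tail = TRUE by default
upper <- pnorm(4, mean = 7.5, sd = 1.5, lower.tail = FALSE)
lower + upper

# Numerically integrating the density up to 4 approximates the same area;
# mean and sd are passed through to dnorm()
area <- integrate(dnorm, lower = -Inf, upper = 4, mean = 7.5, sd = 1.5)
area$value
```

The integrated value should agree with pnorm() up to the numerical tolerance of integrate().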

We can plot the cumulative distribution function for x, as follows:

> ld.cdf <- pnorm(x, mean = 7.5, sd = 1.5, lower.tail = TRUE)
> plot(x, ld.cdf, type = "l", xlab = "Liters per day", ylab = "Cumulative Probability")

The result is shown in the following graph:

We can also show the probability of a child drinking more than 8 liters of water per day on our normal curve by setting lower and upper boundaries and then coloring in that area using the polygon() function. By looking at our cumulative distribution function plot (shown in the previous diagram), we can see that the probability of a child drinking more than 15 liters per day approaches zero, so we can set our upper limit to 15.

Plot the normal curve using the plot() function, as follows:

> plot(x, nd.height, type = "l", xlab = "Liters per day", ylab = "Density")

Set the lower and upper limits, as follows:

> ld.lower <- 8
> ld.upper <- 15

Get all values of x that fall between 8 and 15:

> i <- x >= ld.lower & x <= ld.upper # returns a logical vector

Now, we can use the polygon() function to highlight in red the area under the curve that corresponds to the probability of a child drinking more than 8 liters of water:

> polygon(c(ld.lower, x[i], ld.upper), c(0, nd.height[i], 0), col = "red")
> abline(h = 0, col = "gray")

Calculate the probability of a child drinking more than 8 liters of water per day, rounded to two decimal places:

> pb <- round(pnorm(8, mean = 7.5, sd = 1.5, lower.tail = FALSE), 2)
> pb
[1] 0.37
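Since the shaded region stops at 15 while the probability above covers everything beyond 8, it is worth confirming that the truncation is harmless. The following sketch, using the illustrative names upper_tail, shaded, and beyond_15, compares the two quantities.

```r
# Full upper tail beyond 8 liters
upper_tail <- pnorm(8, mean = 7.5, sd = 1.5, lower.tail = FALSE)

# Probability of the shaded region only, between 8 and 15 liters
shaded <- pnorm(15, mean = 7.5, sd = 1.5) - pnorm(8, mean = 7.5, sd = 1.5)

# Probability left out by stopping the shading at 15 liters
beyond_15 <- pnorm(15, mean = 7.5, sd = 1.5, lower.tail = FALSE)

round(c(upper_tail, shaded), 4)
beyond_15
```

The part beyond 15 liters lies five standard deviations above the mean, so it is far too small to affect the rounded result.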

Use the paste() function to create a character vector that joins our text and the pb value:

> pb.results <- paste("Cumulative probability of a child drinking > 8L/day", pb, sep=": ")

Add the pb.results text as the title of our plot:

> title(pb.results)

The result is shown in the following graph:
