Histograms can represent the distribution of values either as frequency (the absolute number of times values fall within specific ranges) or as probability density (bar heights scaled so that the total area of the bars equals 1, which makes the area of each bar the proportion of values falling within its range). In this recipe, we will learn how to choose one or the other.
We are only using base graphics functions for this recipe. So, just open up the R prompt and type in the following code. We will use the airpollution.csv example dataset for this recipe. So, let's first load it:
air<-read.csv("airpollution.csv")
We will use the hist() base graphics function to make our histogram, first showing the frequency and then the probability density of nitrogen oxide concentrations:
hist(air$Nitrogen.Oxides, xlab="Nitrogen Oxide Concentrations", main="Distribution of Nitrogen Oxide Concentrations")
Now, let's make the same histogram but with probability density instead of frequency:
hist(air$Nitrogen.Oxides, freq=FALSE, xlab="Nitrogen Oxide Concentrations", main="Distribution of Nitrogen Oxide Concentrations")
The first example, which shows the frequency counts of different value ranges of nitrogen oxides, simply uses a call to the hist() function in the base graphics library. The variable is passed as the first argument; by default, the histogram plotted shows frequency. In the second example, we pass an extra freq argument and set it to FALSE, which results in a histogram that shows probability densities. This suggests that, by default, freq is set to TRUE. More precisely, the help section on hist() (?hist) states that freq defaults to TRUE if, and only if, the breaks are equidistant and the probability argument is not specified.
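To see how the two scales relate, the following minimal sketch inspects the object returned by hist() instead of drawing it. It uses synthetic data as a stand-in for air$Nitrogen.Oxides (the nox vector and the uneven_breaks values are assumptions made for illustration, not part of the airpollution.csv dataset), so it can be run without the file:

```r
# Synthetic stand-in for air$Nitrogen.Oxides (an assumption, not the real data)
set.seed(42)
nox <- rexp(200, rate = 1 / 10)

# plot = FALSE returns the histogram object without drawing anything
h <- hist(nox, plot = FALSE)

# Density is the count divided by (number of values * bin width)
bin_width <- diff(h$breaks)
manual_density <- h$counts / (sum(h$counts) * bin_width)
all.equal(h$density, manual_density)

# Bar areas (height * width) add up to 1
sum(h$density * bin_width)

# Hypothetical unequal breaks that cover the whole range of the data
uneven_breaks <- c(0, 5, 10, 20, 40, max(nox, 40) + 1)
h2 <- hist(nox, breaks = uneven_breaks, plot = FALSE)

# equidist is FALSE here, so plotting h2 without freq would show densities
h2$equidist
```

With unequal bin widths, a frequency histogram would make wide bins look over-represented, which is why hist() switches to densities in that case unless freq is set explicitly.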