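A histogram visualizes the distribution of a numeric variable. The range of values is divided into intervals (bins), and the height of each bar shows how many observations fall into that bin. The x axis represents the values of the variable and the y axis the count. The following code demonstrates the use of the hist function in R: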
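To make the binning concrete, here is a minimal base-R sketch on a small vector of made-up values (x is purely illustrative, not from the Salaries dataset). With plot = FALSE, hist() only computes the bins. The counts match what cut() and table() give for the same breaks, because hist() by default uses right-closed intervals (a, b] and includes the lowest value in the first bin.

```r
# Illustrative data (hypothetical values, not from the Salaries dataset)
x <- c(1.2, 2.5, 3.1, 3.8, 4.4, 5.0, 5.6, 6.3, 7.9, 9.5)

# Explicit break points give the bins [0,2], (2,4], (4,6], (6,8], (8,10]
h <- hist(x, breaks = seq(0, 10, by = 2), plot = FALSE)
h$counts
# [1] 1 3 3 2 1

# The same counts "by hand": assign each value to a bin, then count per bin
bins <- cut(x, breaks = seq(0, 10, by = 2), right = TRUE, include.lowest = TRUE)
as.vector(table(bins))
# [1] 1 3 3 2 1
```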
# Data Visualisation
library(car)
library(RColorBrewer)
data("Salaries")

# Histogram of Salaries of Professors from the Salaries dataset
hist(Salaries$salary/1000, main="Histogram of Salaries",
     xlab="Salary (in '000s)", ylab="Count")
The output of the preceding code is as follows:
Next, we run the same code with one change, adding col=brewer.pal(8,"Reds"), and store the result in h1. The histogram is the same as before, but its bars are filled with shades of red from the RColorBrewer "Reds" palette:
# Same as above with a Brewer Palette
h1 <- hist(Salaries$salary/1000, main="Histogram of Salaries",
           xlab="Salary (in '000s)", ylab="Count",
           col=brewer.pal(8,"Reds"))
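Note that brewer.pal(8, "Reds") returns only eight colours, while this histogram has ten bars (h1$counts has ten entries). R therefore recycles the palette, and the last two bars reuse the first two, lightest shades. Sequential Brewer palettes such as "Reds" provide at most nine colours. One way to get exactly one shade per bar is to interpolate with colorRampPalette(). The sketch below assumes RColorBrewer is installed and uses hypothetical random data so that it is self-contained:

```r
library(RColorBrewer)

# Hypothetical example data; any numeric vector works the same way
set.seed(42)
x <- rnorm(500, mean = 100, sd = 20)

# Compute the bins without drawing, to learn how many bars there will be
h <- hist(x, plot = FALSE)
n_bars <- length(h$counts)

# brewer.pal() gives at most 9 colours for "Reds", so interpolate between
# them to obtain exactly one colour per bar (light to dark, left to right)
reds <- colorRampPalette(brewer.pal(9, "Reds"))(n_bars)

# Draw the precomputed histogram object with the gradient fill
plot(h, col = reds, main = "Histogram with one shade per bar",
     xlab = "Value", ylab = "Count")
```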
The output of the preceding code is as follows:
Besides drawing the plot, hist returns an object of class "histogram". This object is a list whose components can be accessed by name (in this case h1$breaks, h1$counts, and so on), as shown:
names(h1)
# [1] "breaks" "counts" "density" "mids" "xname" "equidist"
h1$counts
# [1] 1 50 90 114 60 48 23 8 2 1
h1$breaks
# [1] 40 60 80 100 120 140 160 180 200 220 240
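These components are related to one another:

- mids holds the bin midpoints.
- density is each count divided by (total count × bin width), so the bar areas sum to 1.
- equidist is TRUE when all bins have the same width.

The short sketch below checks these relationships on a small vector of hypothetical values (x is illustrative only):

```r
# Hypothetical data for illustration
x <- c(1.2, 2.5, 3.1, 3.8, 4.4, 5.0, 5.6, 6.3, 7.9, 9.5)
h <- hist(x, breaks = seq(0, 10, by = 2), plot = FALSE)

widths <- diff(h$breaks)   # width of each bin (all equal to 2 here)

# mids: midpoint of each bin
all.equal(h$mids, head(h$breaks, -1) + widths / 2)            # TRUE

# density: count / (n * bin width), so the total bar area is 1
all.equal(h$density, h$counts / (sum(h$counts) * widths))     # TRUE
all.equal(sum(h$density * widths), 1)                         # TRUE

# equidist: TRUE because every bin has the same width
h$equidist
```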
The pros and cons of using histograms are as follows:
Pros:
- Give a quick overview of the shape of a distribution (center, spread, skew, gaps), even for large datasets
Cons:
- Can only be used for numeric values
- The shape of the histogram depends on the chosen bin width and break points (and on whether equal-width bins are used); a poor choice can hide or exaggerate important features of the data
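The sketch below shows how much the picture can depend on the binning. It bins the same hypothetical data (two separated clusters of made-up values) with three different bin widths:

```r
# Hypothetical data: two clusters of values with a gap between them
x <- c(1.1, 1.5, 1.9, 2.2, 2.8, 7.1, 7.4, 7.9, 8.3, 8.8)

# Same data, three bin widths
wide   <- hist(x, breaks = c(0, 5, 10), plot = FALSE)
medium <- hist(x, breaks = seq(0, 10, by = 2), plot = FALSE)
narrow <- hist(x, breaks = seq(0, 10, by = 1), plot = FALSE)

wide$counts    # 5 5                 : looks flat, the gap is invisible
medium$counts  # 3 2 0 3 2           : the gap appears as an empty bin
narrow$counts  # 0 3 2 0 0 0 0 3 2 0 : two clusters, but a noisier outline
```

Instead of explicit break points, the breaks argument also accepts the name of a rule. Examples are "Sturges" (the default), "Scott", or "FD" (Freedman-Diaconis). R treats the resulting number of bins as a suggestion and rounds to "pretty" break points.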