Histograms

Histograms are used to visualize the distribution of a numeric variable. The values are grouped into bins along the x axis, and the height of each bar on the y axis represents the count of observations that fall into that bin. The following code demonstrates the use of the hist function in R:

# Data Visualisation 
 
library(car) 
library(RColorBrewer) 
 
data("Salaries") 
 
# Histogram of Salaries of Professors from the Salaries dataset 
hist(Salaries$salary/1000, main="Histogram of Salaries", xlab="Salary (in '000s)", ylab="Count") 

The output of the preceding code is as follows:

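Next, we run the same code with one minor change, col=brewer.pal(8,"Reds"), and store the result in h1. This produces the same histogram as before, with the bars filled in shades of red from the ColorBrewer "Reds" palette. Because the palette supplies 8 colors and the histogram has 10 bars, the colors are recycled for the last two bars: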

# Same as above with a Brewer Palette 
h1 <- hist(Salaries$salary/1000,main="Histogram of Salaries", xlab="Salary (in '000s)", ylab="Count", col=brewer.pal(8,"Reds")) 

The output of the preceding code is as follows:

The hist function returns an object of class "histogram", a list whose components can be accessed by name (for example, in this case, h1$breaks, h1$counts, and so on), as shown:

names(h1) 
# [1] "breaks"   "counts"   "density"  "mids"     "xname"    "equidist" 
 
h1$counts 
#  [1]   1  50  90 114  60  48  23   8   2   1 
 
h1$breaks 
#  [1]  40  60  80 100 120 140 160 180 200 220 240 
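The components of the returned object can also be used directly in further calculations. The following is a minimal sketch, assuming a small made-up vector of salaries (hypothetical values, not the Salaries dataset) and plot=FALSE so that the object is computed without drawing anything:

```r
# Hypothetical salaries in '000s (illustrative values, not from Salaries)
x <- c(45, 58, 62, 71, 75, 79, 83, 88, 95, 102, 110, 118, 125, 140, 155)

# Compute the histogram object without drawing it
h <- hist(x, breaks = seq(40, 160, by = 20), plot = FALSE)

# Every observation falls into exactly one bin
total <- sum(h$counts)

# Midpoint of the bin with the highest count
modal_mid <- h$mids[which.max(h$counts)]

# Densities are scaled so that the bar areas sum to 1
area <- sum(h$density * diff(h$breaks))

print(h$counts)
print(c(total = total, modal_mid = modal_mid, area = area))
```

The same expressions can be applied to h1 from the preceding code, where sum(h1$counts) gives the number of rows in the Salaries dataset.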
 

The pros and cons of using histograms are as follows:

Pros:

  • Can be used to get a general overview quickly from large datasets

Cons:

  • Can only be used for numeric values
  • The result depends on the choice of bins: a bin width that is too wide or too narrow (or bins of unequal width) might fail to capture the important aspects of the data
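The effect of the bin width can be seen by passing different values to the breaks argument of hist. The following is a rough sketch, assuming a small made-up vector (hypothetical values chosen to form two clusters) rather than the Salaries data:

```r
# Hypothetical data with two clusters and a gap between 50 and 70
x <- c(41, 43, 44, 46, 48, 49, 71, 72, 74, 76, 77, 79)

# One wide bin of width 40 hides the gap completely
coarse <- hist(x, breaks = seq(40, 80, by = 40), plot = FALSE)

# Bins of width 10 reveal the two clusters
fine <- hist(x, breaks = seq(40, 80, by = 10), plot = FALSE)

print(coarse$counts)
print(fine$counts)
```

With the wide bin, all observations land in a single bar, whereas the narrower bins show two groups separated by empty bins. It is therefore worth trying more than one set of breaks before drawing conclusions from a histogram.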