Understanding histograms

In computer vision, histograms are simply graphs that represent the distribution of pixel values over the range of values those pixels can take, or, in other words, the probability distribution of pixel values. This might not be as crystal clear as you would expect, so let's take single-channel grayscale images as a simple example to describe what histograms are, and then expand the idea to multi-channel color images. We already know that the pixels in a standard 8-bit grayscale image can contain values between 0 and 255. Considering this fact, a graph similar to the following, which depicts the number of pixels (or, when normalized, the ratio of pixels) containing each possible grayscale value in an arbitrary image, is simply the histogram of that image:

Keeping in mind what we just learned, it can be easily guessed that the histogram of a three-channel image, for example, would be three graphs representing the distribution of values for each channel, similar to what we just saw with the histogram of a single-channel grayscale image.
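To make the idea concrete before turning to OpenCV, here is a minimal sketch in plain C++ that counts pixel values per channel. It assumes the pixels are stored in a flat, interleaved 8-bit buffer (for example B, G, R, B, G, R, and so on); the Histogram alias and the channelHistograms function are hypothetical names used only for illustration and are not part of OpenCV.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One counter for each of the 256 possible 8-bit values.
using Histogram = std::array<int, 256>;

// Build one 256-bin histogram per channel from an interleaved 8-bit buffer.
// A grayscale image is simply the channelCount == 1 case.
// (Hypothetical helper for illustration; not an OpenCV function.)
std::vector<Histogram> channelHistograms(const std::vector<std::uint8_t>& data,
                                         std::size_t channelCount)
{
    // The buffer must hold a whole number of pixels.
    assert(channelCount > 0 && data.size() % channelCount == 0);

    // Histogram{} value-initializes every bin to zero.
    std::vector<Histogram> hists(channelCount, Histogram{});

    for (std::size_t i = 0; i < data.size(); ++i)
        ++hists[i % channelCount][data[i]];   // the pixel value is the bin index

    return hists;
}
```

Calling channelHistograms(data, 1) on a grayscale buffer yields a single histogram, while channelHistograms(data, 3) on a three-channel buffer yields three, one per channel. Dividing every count by the number of pixels would turn the counts into the ratios mentioned earlier.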

You can use the calcHist function in the OpenCV library to calculate the histogram of one or multiple images that can be single-channel or multi-channel themselves. This function requires a number of parameters that must be provided carefully for it to produce the desired results. Let's see how this function is used with a few examples.

The following example code (followed by a description of all the parameters) demonstrates how you can calculate the histogram of a single grayscale image:

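// Assumes <opencv2/opencv.hpp> is included and `using namespace cv;` is in effect.
Mat image = imread("Test.png");   // loaded as a 3-channel BGR image by default
if(image.empty())
    return -1;                    // the file is missing or could not be read
Mat grayImg;
cvtColor(image, grayImg, COLOR_BGR2GRAY);

int bins = 256;                       // one bin per possible 8-bit value
int nimages = 1;                      // number of input images
int channels[] = {0};                 // use the first (and only) channel
Mat mask;                             // empty mask: every pixel is counted
int dims = 1;                         // one-dimensional histogram
int histSize[] = { bins };            // number of bins in each dimension
float rangeGS[] = {0, 256};           // lower bound inclusive, upper bound exclusive
const float* ranges[] = { rangeGS };
bool uniform = true;                  // equal-width bins
bool accumulate = false;              // start from a cleared histogram
Mat histogram;                        // output: bins x 1 matrix of type CV_32F
calcHist(&grayImg,
         nimages,
         channels,
         mask,
         histogram,
         dims,
         histSize,
         ranges,
         uniform,
         accumulate);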

We infer the following from the preceding code:

grayImg is the input grayscale image whose histogram we want to calculate, and histogram will contain the result.
nimages must contain the number of images for which we want histograms calculated, which, in this case, is just one image.
channels is an array that is supposed to contain the zero-based index number of the channels in each image for which we want their histogram calculated. For instance, if we want to calculate the histogram of the first, second, and fourth channels in a multi-channel image, the channels array must contain the values of 0, 1, and 3. In our example, channels only contained 0, since we're calculating the histogram of the only channel in a grayscale image.
mask, which is common to many other OpenCV functions, is a parameter that is used to mask (or ignore) certain pixels, or, in other words, prevent them from participating in the calculated result. When a mask is provided, only the pixels at which the mask is non-zero are counted. In our case, and as long as we are not working on a certain portion of an image, mask must contain an empty matrix.
dims, or the dimensionality parameter, corresponds to the dimensionality of the resulting histogram. It must not be greater than CV_MAX_DIM, which is 32 in current OpenCV versions. We'll be using 1 in most cases, since we expect our histogram to be a simple array-shaped matrix. Consequently, the index number of each element in the resulting histogram will correspond to the bin number.
histSize is an array that must contain the size of the histogram in each dimension. In our example, since the dimensionality was 1, histSize must contain a single value. The size of the histogram, in this case, is the same as the number of bins in a histogram. In the preceding example code, bins is used to define the number of bins in the histogram, and it is also used as the single histSize value. Think of bins as the number of groups of pixels in a histogram. This will be further clarified with examples later on, but for now, it is important to note that a value of 256 for bins will result in a histogram containing the count of all individual possible pixel values.
ranges must contain, for each dimension, the lower and upper bounds of the values to be counted. The lower bound is inclusive and the upper bound is exclusive, so in our example the single range [0, 256) covers every possible pixel value from 0 to 255, which is what we have provided to this parameter.
The uniform parameter defines whether the bins of the histogram are uniform, that is, of equal width. When it is true, as in our example, each entry in ranges needs only the lower and upper bounds of its dimension. If the histogram is non-uniform, each entry in ranges must instead contain histSize[i] + 1 values that list the boundaries of every bin in that dimension.
The accumulate parameter is used to decide whether the histogram should be cleared before it is calculated, or the calculated values should be added to an existing histogram. This can be quite useful when you need to calculate a single histogram using multiple images.
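
The roles of histSize, ranges, uniform, and accumulate described above can be sketched in plain C++ as follows. This is only an illustration of the semantics under stated assumptions, not OpenCV's actual implementation; uniformBin, nonUniformBin, and addToHistogram are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Uniform case: `bins` equal-width bins covering [lower, upper).
// Returns -1 for values outside the range, which are simply not counted.
int uniformBin(float value, int bins, float lower, float upper)
{
    assert(bins > 0 && lower < upper);
    if (value < lower || value >= upper)
        return -1;
    int index = static_cast<int>((value - lower) * bins / (upper - lower));
    return std::min(index, bins - 1);   // guard against floating-point rounding
}

// Non-uniform case: `edges` holds (number of bins + 1) increasing boundaries,
// and bin i covers [edges[i], edges[i + 1]).
int nonUniformBin(float value, const std::vector<float>& edges)
{
    assert(edges.size() >= 2 && std::is_sorted(edges.begin(), edges.end()));
    if (value < edges.front() || value >= edges.back())
        return -1;
    // Find the first boundary strictly greater than value;
    // the bin is the one that starts just before it.
    auto it = std::upper_bound(edges.begin(), edges.end(), value);
    return static_cast<int>(it - edges.begin()) - 1;
}

// Add the values of one image to `hist` using uniform bins.
// When `accumulate` is false the histogram is cleared first;
// when it is true the new counts are added on top of the existing ones.
void addToHistogram(const std::vector<float>& values, std::vector<int>& hist,
                    float lower, float upper, bool accumulate)
{
    if (!accumulate)
        std::fill(hist.begin(), hist.end(), 0);
    for (float v : values) {
        int bin = uniformBin(v, static_cast<int>(hist.size()), lower, upper);
        if (bin >= 0)
            ++hist[bin];
    }
}
```

With 256 bins over [0, 256), every pixel value gets its own bin, whereas with 16 bins over the same range, each bin groups 16 consecutive values (the value 255 lands in bin 15). Calling addToHistogram once with accumulate set to false and then again with it set to true for a second image produces a single histogram covering both images.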

We'll cover the parameters mentioned here as much as possible in the examples provided in this chapter. However, you can also refer to the online documentation of the calcHist function for more information.
