OpenCV API

There are many ways to read, process, and display a digital image file in Python, but OpenCV provides some of the easiest and most intuitive APIs for doing so. One important thing to note is that OpenCV stores color channels in BGR order rather than RGB order when it reads an image, so each tuple in an image matrix represents blue, green, and red, in that order, instead of red, green, and blue.
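The channel order is easy to see on a tiny array. The following is a minimal sketch that uses only NumPy and assumes an image laid out the way OpenCV lays it out (height by width by 3, in BGR order); the one-pixel array and the variable names are made up for illustration.

```python
import numpy as np

# A hypothetical 1x1 "image" in OpenCV's BGR channel order.
bgr = np.array([[[199, 136, 86]]], dtype=np.uint8)

# Reversing the last axis swaps the channel order from BGR to RGB.
rgb = bgr[..., ::-1]

# Unpack the single pixel using the BGR order.
blue, green, red = (int(v) for v in bgr[0, 0])

print('BGR pixel:', bgr[0, 0])
print('RGB pixel:', rgb[0, 0])
```

When OpenCV itself is available, cv2.cvtColor(im, cv2.COLOR_BGR2RGB) performs the same conversion, which is useful before passing an image to a library that expects RGB order.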

Let's look at an example of interacting with OpenCV in Python by taking a look at the Chapter08/example1.py file:

# Chapter08/example1.py

import cv2

im = cv2.imread('input/ship.jpg')
cv2.imshow('Test', im)
cv2.waitKey(0) # press any key to move forward here

print(im)
print('Type:', type(im))
print('Shape:', im.shape)
print('Top-left pixel:', im[0, 0])

print('Done.')

This script uses a few OpenCV functions that we need to discuss:

cv2.imread(): This method takes in a path to an image file (common file extensions include .jpeg, .jpg, .png, and so on) and returns an image object, which, as we will see later, is represented by a NumPy array.
cv2.imshow(): This method takes in a string and an image object and displays the image in a separate window. The title of the window is specified by the passed-in string. The method should always be followed by the cv2.waitKey() method.
cv2.waitKey(): This method takes in a number and blocks the program for up to that many milliseconds while waiting for a key press. If the number 0 is passed in, it blocks indefinitely until the user presses a key on their keyboard. This method should always follow the cv2.imshow() method.

After cv2.imshow() displays the ship.jpg file from the input subfolder, the call to cv2.waitKey(0) pauses the program until a key is pressed, at which point the rest of the script runs. If run successfully, the script will display the following image:

You should also obtain the following output for the rest of the main program after pressing any key to close the displayed picture:

> python example1.py
[[[199 136 86]
  [199 136 86]
  [199 136 86]
  ..., 
  [198 140 81]
  [197 139 80]
  [201 143 84]]

[...Truncated for readability...]

 [[ 56 23 4]
  [ 59 26 7]
  [ 60 27 7]
  ..., 
  [ 79 43 7]
  [ 80 44 8]
  [ 75 39 3]]]
Type: <class 'numpy.ndarray'>
Shape: (1118, 1577, 3)
Top-left pixel: [199 136 86]
Done.

The output confirms a few of the things that we discussed earlier:

First, when printing out the image object returned from the cv2.imread() function, we obtained a matrix of numbers.
Using Python's built-in type() function, we found out that the class of this matrix is indeed a NumPy array: numpy.ndarray.
Reading the shape attribute of the array, we can see that the image is a three-dimensional matrix of the shape (1118, 1577, 3), which corresponds to a table with 1118 rows and 1577 columns, each element of which is a pixel (a three-number tuple). The numbers of rows and columns are also the height and width of the image in pixels.
Focusing on the top-left pixel in the matrix (the first pixel in the first row, that is, im[0, 0]), we obtained the BGR value of (199, 136, 86): 199 blue, 136 green, and 86 red. Looking up this color in any online converter (entered in RGB order, it is (86, 136, 199)) shows that it is a light blue that corresponds to the sky, which is the upper part of the image.
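The observations above can be reproduced without the image file. The following sketch builds a stand-in NumPy array, assuming only the shape and the top-left pixel reported in the output (every other pixel is zero, which is not true of the real image), and formats the pixel as an RGB hex string suitable for a color picker. The variable names are illustrative.

```python
import numpy as np

# Stand-in for the array returned by cv2.imread(); only the shape and the
# top-left pixel are taken from the output above, the rest is zeros.
im = np.zeros((1118, 1577, 3), dtype=np.uint8)
im[0, 0] = (199, 136, 86)

# shape is (rows, columns, channels), that is, (height, width, channels).
height, width, channels = im.shape

# The pixel is stored in BGR order.
blue, green, red = (int(v) for v in im[0, 0])

# Color pickers usually expect RGB order, so reorder before formatting.
hex_color = '#{:02x}{:02x}{:02x}'.format(red, green, blue)

print('Size: {} x {} pixels, {} channels'.format(width, height, channels))
print('Top-left pixel as RGB hex:', hex_color)
```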
