Chapter 2. Handling Files, Cameras, and GUIs

Installing OpenCV and running samples is fun, but at this stage, we want to try it out ourselves. This chapter introduces OpenCV's I/O functionality. We also discuss the concept of a project and the beginnings of an object-oriented design for this project, which we will flesh out in subsequent chapters.

By starting with a look at the I/O capabilities and design patterns, we will build our project in the same way we would make a sandwich: from the outside in. Bread slices and spread, or endpoints and glue, come before fillings or algorithms. We choose this approach because computer vision is mostly extroverted—it contemplates the real world outside our computer—and we want to apply all our subsequent algorithmic work to the real world through a common interface.

Basic I/O scripts

Most CV applications need to get images as input. Most also produce images as output. An interactive CV application might require a camera as an input source and a window as an output destination. However, other possible sources and destinations include image files, video files, and raw bytes. For example, raw bytes might be transmitted via a network connection, or they might be generated by an algorithm if we incorporate procedural graphics into our application. Let's look at each of these possibilities.

Reading/writing an image file

OpenCV provides the imread() and imwrite() functions that support various file formats for still images. The supported formats vary by system but should always include the BMP format. Typically, PNG, JPEG, and TIFF should be among the supported formats too.

Let's explore the anatomy of the representation of an image in Python and NumPy.

No matter the format, each pixel has a value, but the difference is in how the pixel is represented. For example, we can create a black square image from scratch by simply creating a 2D NumPy array:

import numpy

img = numpy.zeros((3, 3), dtype=numpy.uint8)

If we print this image to a console, we obtain the following result:

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=uint8)

Each pixel is represented by a single 8-bit integer, which means that the values for each pixel are in the 0-255 range.

Let's now convert this image to the blue-green-red (BGR) color format using cv2.cvtColor():

img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

Let's observe how the image has changed:

array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]], dtype=uint8)

As you can see, each pixel is now represented by a three-element array, with each integer representing the B, G, and R channels, respectively. Other color spaces, such as HSV, will be represented in the same way, albeit with different value ranges (for example, the hue value of the HSV color space has a range of 0-180) and different numbers of channels.
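For instance, here is a minimal sketch (the printed values are illustrative) that converts a pure red BGR image to HSV and inspects one pixel:

import cv2
import numpy

# Create a 3x3 BGR image filled with pure red.
bgr = numpy.zeros((3, 3, 3), dtype=numpy.uint8)
bgr[:, :] = [0, 0, 255]

# Convert to HSV and inspect the upper-left pixel.
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
print(hsv[0, 0])  # Expect [0, 255, 255]: hue 0 (red), full saturation and value.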

You can check the structure of an image by inspecting the shape property, which returns rows, columns, and the number of channels (if there is more than one).

Consider this example:

>>> img = numpy.zeros((3,3), dtype=numpy.uint8)
>>> img.shape

The preceding code will print (3, 3). If you then converted the image to BGR, the shape would be (3, 3, 3), which indicates the presence of three channels per pixel.

Images can be loaded from one file format and saved to another. For example, let's convert an image from PNG to JPEG:

import cv2

image = cv2.imread('MyPic.png')
cv2.imwrite('MyPic.jpg', image)

Note

Most of the OpenCV functionalities that we use are in the cv2 module. You might come across other OpenCV guides that instead rely on the cv or cv2.cv modules, which are legacy versions. The reason why the Python module is called cv2 is not because it is a Python binding module for OpenCV 2.x.x, but because it has introduced a better API, which leverages object-oriented programming as opposed to the previous cv module, which adhered to a more procedural style of programming.

By default, imread() returns an image in the BGR color format even if the file uses a grayscale format. BGR represents the same color space as red-green-blue (RGB), but the byte order is reversed.

Optionally, we may specify the mode of imread() to be one of the following enumerators:

  • IMREAD_ANYCOLOR = 4
  • IMREAD_ANYDEPTH = 2
  • IMREAD_COLOR = 1
  • IMREAD_GRAYSCALE = 0
  • IMREAD_LOAD_GDAL = 8
  • IMREAD_UNCHANGED = -1

For example, let's load a PNG file as a grayscale image (losing any color information in the process), and then save it as a grayscale PNG image:

import cv2

grayImage = cv2.imread('MyPic.png', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('MyPicGray.png', grayImage)

To avoid unnecessary headaches, use absolute paths to your images (for example, C:\Users\Joe\Pictures\MyPic.png on Windows or /home/joe/pictures/MyPic.png on Unix), at least while you're familiarizing yourself with OpenCV's API. The path of an image, unless absolute, is relative to the current working directory (the directory from which you run the Python script), so in the preceding example, MyPic.png would have to be in the current working directory or the image won't be found.

Except when we load an image in the IMREAD_UNCHANGED mode, imread() discards any alpha channel (transparency). The imwrite() function requires an image to be in the BGR or grayscale format with a number of bits per channel that the output format can support. For example, BMP requires 8 bits per channel, while PNG allows either 8 or 16 bits per channel.
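As a quick check, the following sketch (MyPicWithAlpha.png is a hypothetical PNG file that contains an alpha channel) compares the channel counts produced by the two modes:

import cv2

# The default mode discards the alpha channel.
bgr = cv2.imread('MyPicWithAlpha.png')
print(bgr.shape)   # Something like (height, width, 3).

# IMREAD_UNCHANGED preserves the alpha channel as a fourth channel.
bgra = cv2.imread('MyPicWithAlpha.png', cv2.IMREAD_UNCHANGED)
print(bgra.shape)  # Something like (height, width, 4).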

Converting between an image and raw bytes

Conceptually, a byte is an integer ranging from 0 to 255. In real-time graphics applications today, a pixel is typically represented by one byte per channel, though other representations are also possible.

An OpenCV image is a 2D or 3D array of the numpy.array type. An 8-bit grayscale image is a 2D array containing byte values. A 24-bit BGR image is a 3D array, which also contains byte values. We may access these values by using an expression, such as image[0, 0] or image[0, 0, 0]. The first index is the pixel's y coordinate or row, 0 being the top. The second index is the pixel's x coordinate or column, 0 being the leftmost. The third index (if applicable) represents a color channel.

For example, in an 8-bit grayscale image with a white pixel in the upper-left corner, image[0, 0] is 255. For a 24-bit BGR image with a blue pixel in the upper-left corner, image[0, 0] is [255, 0, 0].
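Here is a minimal sketch that reproduces both cases:

import numpy

# An 8-bit grayscale image, all black except a white upper-left pixel.
gray = numpy.zeros((2, 2), dtype=numpy.uint8)
gray[0, 0] = 255
print(gray[0, 0])  # Prints 255.

# A 24-bit BGR image with a blue upper-left pixel.
bgr = numpy.zeros((2, 2, 3), dtype=numpy.uint8)
bgr[0, 0] = [255, 0, 0]
print(bgr[0, 0])   # Prints [255   0   0].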

Note

As an alternative to using an expression, such as image[0, 0] or image[0, 0] = 128, we may use an expression, such as image.item((0, 0)) or image.itemset((0, 0), 128). The latter expressions are more efficient for single-pixel operations. However, as we will see in subsequent chapters, we usually want to perform operations on large slices of an image rather than on single pixels.

Provided that an image has 8 bits per channel, we can cast it to a standard Python bytearray, which is one-dimensional:

byteArray = bytearray(image)

Conversely, provided that the bytearray contains bytes in an appropriate order, we can cast and then reshape it to get a numpy.array type that is an image:

grayImage = numpy.array(grayByteArray).reshape(height, width)
bgrImage = numpy.array(bgrByteArray).reshape(height, width, 3)

As a more complete example, let's convert a bytearray containing random bytes into a grayscale image and a BGR image:

import cv2
import numpy
import os

# Make an array of 120,000 random bytes.
randomByteArray = bytearray(os.urandom(120000))
flatNumpyArray = numpy.array(randomByteArray)

# Convert the array to make a 400x300 grayscale image.
grayImage = flatNumpyArray.reshape(300, 400)
cv2.imwrite('RandomGray.png', grayImage)

# Convert the array to make a 400x100 color image.
bgrImage = flatNumpyArray.reshape(100, 400, 3)
cv2.imwrite('RandomColor.png', bgrImage)

After running this script, we should have a pair of randomly generated images, RandomGray.png and RandomColor.png, in the script's directory.

Note

Here, we use Python's standard os.urandom() function to generate random raw bytes, which we will then convert to a NumPy array. Note that it is also possible to generate a random NumPy array directly (and more efficiently) using a statement, such as numpy.random.randint(0, 256, 120000).reshape(300, 400). The only reason we use os.urandom() is to help demonstrate a conversion from raw bytes.

Accessing image data with numpy.array

Now that you have a better understanding of how an image is formed, we can start performing basic operations on it. We know that the easiest (and most common) way to load an image in OpenCV is to use the imread function. We also know that this will return an image, which is really an array (either a 2D or 3D one, depending on the parameters you passed to imread()).

The numpy.array structure is well optimized for array operations, and it allows certain kinds of bulk manipulations that are not available in a plain Python list. These numpy.array type-specific operations come in handy for image manipulations in OpenCV. Let's explore image manipulations step by step, starting with a basic example: say you want to manipulate the pixel at the coordinates (0, 0) of a BGR image and turn it into a white pixel.

import cv2

img = cv2.imread('MyPic.png')
img[0, 0] = [255, 255, 255]

If you then show the image with a standard imshow() call, you will see a white dot in the top-left corner of the image. Naturally, this isn't very useful, but it shows what can be accomplished. Let's now leverage the ability of numpy.array to apply transformations to an array much faster than a plain Python list could.

Let's say that you want to change the blue value of a particular pixel, for example, the pixel at the coordinates (150, 120). The numpy.array type provides a very handy method, item(), which takes three parameters: the y (or row) position, the x (or column) position, and the index within the array at the (y, x) position (remember that in a BGR image, the data at a certain position is a three-element array containing the B, G, and R values in this order), and it returns the value at the index position. Another method, itemset(), sets the value of a particular channel of a particular pixel to a specified value (itemset() takes two arguments: a three-element tuple (y, x, and index) and the new value).

In this example, we will change the value of blue at (150, 120) from its current value to an arbitrary 255:

import cv2

img = cv2.imread('MyPic.png')
print(img.item(150, 120, 0))  # Prints the current value of B for that pixel.
img.itemset((150, 120, 0), 255)
print(img.item(150, 120, 0))  # Prints 255.

Remember that we do this with numpy.array for two reasons: NumPy is extremely optimized for these kinds of operations, and we obtain more readable code through NumPy's elegant methods than through the raw index access of the first example.

This particular code doesn't do much in itself, but it does open a world of possibilities. It is, however, advisable that you utilize built-in filters and methods to manipulate an entire image; the above approach is only suitable for small regions of interest.
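To illustrate the difference, here is a hedged sketch that inverts an image in two ways: once with a slow per-pixel loop and once with a single vectorized NumPy operation. Both produce the same result; only the second is suitable for whole images:

import cv2

img = cv2.imread('MyPic.png')

# Slow: a per-pixel Python loop (tolerable only for tiny regions).
inverted = img.copy()
for y in range(inverted.shape[0]):
    for x in range(inverted.shape[1]):
        inverted[y, x] = 255 - inverted[y, x]

# Fast: the same inversion as one bulk NumPy operation.
inverted = 255 - img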

Now, let's take a look at a very common operation, namely, manipulating channels. Sometimes, you'll want to zero-out all the values of a particular channel (B, G, or R).

Tip

Using loops to manipulate Python arrays is very costly in terms of runtime and should be avoided, especially when processing video, where slow per-frame manipulations produce jittery output. Instead, a feature called array indexing allows pixels to be manipulated efficiently in bulk. Setting all G (green) values of an image to 0 is as simple as using this code:

import cv2

img = cv2.imread('MyPic.png')
img[:, :, 1] = 0

This is a fairly impressive piece of code and easy to understand. The relevant line is the last one, which instructs the program to take all pixels from all rows and columns and set index one (the green channel) of each pixel's three-element color array to 0. If you display this image, you will notice a complete absence of green.

There are a number of interesting things we can do by accessing raw pixels with NumPy's array indexing; one of them is defining regions of interest (ROI). Once the region is defined, we can perform a number of operations, for example, binding this region to a variable, and then defining a second region and assigning it the value of the first one (visually copying a portion of the image to another position in the image):

import cv2

img = cv2.imread('MyPic.png')
my_roi = img[0:100, 0:100]
img[300:400, 300:400] = my_roi

It's important to make sure that the two regions correspond in terms of size. If not, NumPy will (rightly) complain that the two shapes are mismatched.
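One way to guarantee matching sizes, sketched below under the assumption that the image is at least 400 x 400 pixels, is to derive the destination slice from the ROI's own shape:

import cv2

img = cv2.imread('MyPic.png')
my_roi = img[0:100, 0:100]

# Derive the destination slice from the ROI's shape so the sizes always match.
h, w = my_roi.shape[:2]
img[300:300 + h, 300:300 + w] = my_roi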

Finally, there are a few interesting details we can obtain from numpy.array, such as the image properties using this code:

import cv2

img = cv2.imread('MyPic.png')
print(img.shape)
print(img.size)
print(img.dtype)

These three properties are in this order:

  • Shape: NumPy returns a tuple containing the height, the width, and—if the image is in color—the number of channels. This is useful for debugging the type of an image; if the image is monochromatic or grayscale, the tuple will not contain a channels entry.
  • Size: This property refers to the number of elements in the array, that is, the number of pixels multiplied by the number of channels.
  • Datatype: This property refers to the datatype used for the image's elements (normally an unsigned integer type of a certain bit depth, such as uint8).
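For example, for a hypothetical 640 x 480 BGR image, the three print() calls would output something like this:

(480, 640, 3)
921600
uint8

Here, 921600 is simply 480 x 640 x 3, the total number of byte values in the array.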

All in all, it is strongly advisable that you familiarize yourself with NumPy in general, and numpy.array in particular, when working with OpenCV, as it is the foundation of image processing done with Python.

Reading/writing a video file

OpenCV provides the VideoCapture and VideoWriter classes that support various video file formats. The supported formats vary by system but should always include AVI. Via its read() method, a VideoCapture object may be polled for new frames until it reaches the end of its video file. Each frame is an image in the BGR format.

Conversely, an image may be passed to the write() method of the VideoWriter class, which appends the image to the file that the VideoWriter object represents. Let's look at an example that reads frames from one AVI file and writes them to another with a YUV encoding:

import cv2

videoCapture = cv2.VideoCapture('MyInputVid.avi')
fps = videoCapture.get(cv2.CAP_PROP_FPS)
size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'), fps, size)

success, frame = videoCapture.read()
while success: # Loop until there are no more frames.
    videoWriter.write(frame)
    success, frame = videoCapture.read()

The arguments to the VideoWriter class constructor deserve special attention. A video's filename must be specified. Any preexisting file with this name is overwritten. A video codec must also be specified. The available codecs may vary from system to system. These are the options that are included:

  • cv2.VideoWriter_fourcc('I','4','2','0'): This option is an uncompressed YUV encoding, 4:2:0 chroma subsampled. This encoding is widely compatible but produces large files. The file extension should be .avi.
  • cv2.VideoWriter_fourcc('P','I','M','1'): This option is MPEG-1. The file extension should be .avi.
  • cv2.VideoWriter_fourcc('X','V','I','D'): This option is MPEG-4 and a preferred option if you want the resulting video size to be average. The file extension should be .avi.
  • cv2.VideoWriter_fourcc('T','H','E','O'): This option is Ogg Theora. The file extension should be .ogv.
  • cv2.VideoWriter_fourcc('F','L','V','1'): This option is a Flash video. The file extension should be .flv.

A frame rate and frame size must be specified too. Since we are copying video frames from another video, these properties can be read from the get() method of the VideoCapture class.

Capturing camera frames

A stream of camera frames is represented by the VideoCapture class too. However, for a camera, we construct a VideoCapture class by passing the camera's device index instead of a video's filename. Let's consider an example that captures 10 seconds of video from a camera and writes it to an AVI file:

import cv2

cameraCapture = cv2.VideoCapture(0)
fps = 30 # an assumption
size = (int(cameraCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cameraCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'), fps, size)

success, frame = cameraCapture.read()
numFramesRemaining = 10 * fps - 1
while success and numFramesRemaining > 0:
    videoWriter.write(frame)
    success, frame = cameraCapture.read()
    numFramesRemaining -= 1
cameraCapture.release()

Unfortunately, the get() method of a VideoCapture class does not return an accurate value for the camera's frame rate; it always returns 0. The official documentation at http://docs.opencv.org/modules/highgui/doc/reading_and_writing_images_and_video.html reads:

"When querying a property that is not supported by the backend used by the VideoCapture class, value 0 is returned."

This occurs most commonly on systems where the driver only supports basic functionalities.

For the purpose of creating an appropriate VideoWriter class for the camera, we have to either make an assumption about the frame rate (as we did in the code previously) or measure it using a timer. The latter approach is better and we will cover it later in this chapter.
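As a preview, here is a minimal sketch of the timer approach (the version we cover later in the chapter is more careful):

import time

import cv2

cameraCapture = cv2.VideoCapture(0)

# Estimate the frame rate by timing how long it takes to read 60 frames.
numFrames = 60
start = time.time()
for i in range(numFrames):
    success, frame = cameraCapture.read()
elapsed = time.time() - start
print(numFrames / elapsed)  # An estimate of the camera's frames per second.
cameraCapture.release()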

The number of cameras and their order is, of course, system-dependent. Unfortunately, OpenCV does not provide any means of querying the number of cameras or their properties. If an invalid index is used to construct a VideoCapture object, it will not yield any frames; its read() method will return (False, None). A good way to prevent attempts to retrieve frames from a VideoCapture object that was not opened correctly is to use its isOpened() method, which returns a Boolean.
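For example, a minimal guard might look like this:

import cv2

cameraCapture = cv2.VideoCapture(0)
if not cameraCapture.isOpened():
    # Fail fast instead of looping over (False, None) results.
    raise IOError('Could not open the camera at device index 0')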

The read() method is inappropriate when we need to synchronize a set of cameras or a multihead camera (such as a stereo camera or Kinect). Then, we use the grab() and retrieve() methods instead. For a set of cameras, we use this code:

success0 = cameraCapture0.grab()
success1 = cameraCapture1.grab()
if success0 and success1:
    success0, frame0 = cameraCapture0.retrieve()
    success1, frame1 = cameraCapture1.retrieve()

Displaying images in a window

One of the most basic operations in OpenCV is displaying an image. This can be done with the imshow() function. If you come from any other GUI framework background, you might think it sufficient to call imshow() to display an image. This is only partially true: the image will be displayed, but it will disappear immediately unless we pause the script with waitKey(). This behavior is by design, to enable the constant refreshing of a window's contents when working with video. Here's a very simple example code to display an image:

import cv2

img = cv2.imread('my-image.png')
cv2.imshow('my image', img)
cv2.waitKey()
cv2.destroyAllWindows()

The imshow() function takes two parameters: the name of the window in which we want to display the image, and the image itself. We'll talk about waitKey() in more detail when we explore displaying frames in a window.

The aptly named destroyAllWindows() function disposes of all the windows created by OpenCV.

Displaying camera frames in a window

OpenCV allows named windows to be created, redrawn, and destroyed using the namedWindow(), imshow(), and destroyWindow() functions. Also, any window may capture keyboard input via the waitKey() function and mouse input via the setMouseCallback() function. Let's look at an example where we show the frames of a live camera input:

import cv2

clicked = False
def onMouse(event, x, y, flags, param):
    global clicked
    if event == cv2.EVENT_LBUTTONUP:
        clicked = True

cameraCapture = cv2.VideoCapture(0)
cv2.namedWindow('MyWindow')
cv2.setMouseCallback('MyWindow', onMouse)

print('Showing camera feed. Click window or press any key to stop.')
success, frame = cameraCapture.read()
while success and cv2.waitKey(1) == -1 and not clicked:
    cv2.imshow('MyWindow', frame)
    success, frame = cameraCapture.read()

cv2.destroyWindow('MyWindow')
cameraCapture.release()

The argument for waitKey() is a number of milliseconds to wait for keyboard input. The return value is either -1 (meaning that no key has been pressed) or an ASCII keycode, such as 27 for Esc. For a list of ASCII keycodes, see http://www.asciitable.com/. Also, note that Python provides a standard function, ord(), which can convert a character to its ASCII keycode. For example, ord('a') returns 97.

Tip

On some systems, waitKey() may return a value that encodes more than just the ASCII keycode. (A bug is known to occur on Linux when OpenCV uses GTK as its backend GUI library.) On all systems, we can ensure that we extract just the ASCII keycode by reading the last byte from the return value like this:

keycode = cv2.waitKey(1)
if keycode != -1:
    keycode &= 0xFF
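Putting waitKey(), the masking trick, and ord() together, here is a minimal sketch of a display loop that exits when the Q key is pressed (the window name and image file are illustrative):

import cv2

img = cv2.imread('MyPic.png')
while True:
    cv2.imshow('MyWindow', img)
    # Mask to the low byte so the comparison works across GUI backends.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cv2.destroyWindow('MyWindow')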

OpenCV's window functions and waitKey() are interdependent. OpenCV windows are only updated when waitKey() is called, and waitKey() only captures input when an OpenCV window has focus.

The mouse callback passed to setMouseCallback() should take five arguments, as seen in our code sample. The callback's param argument is set as an optional third argument to setMouseCallback(). By default, it is 0. The callback's event argument is one of the following actions:

  • cv2.EVENT_MOUSEMOVE: This event refers to mouse movement
  • cv2.EVENT_LBUTTONDOWN: This event refers to the left button down
  • cv2.EVENT_RBUTTONDOWN: This event refers to the right button down
  • cv2.EVENT_MBUTTONDOWN: This event refers to the middle button down
  • cv2.EVENT_LBUTTONUP: This event refers to the left button up
  • cv2.EVENT_RBUTTONUP: This event refers to the right button up
  • cv2.EVENT_MBUTTONUP: This event refers to the middle button up
  • cv2.EVENT_LBUTTONDBLCLK: This event refers to the left button being double-clicked
  • cv2.EVENT_RBUTTONDBLCLK: This event refers to the right button being double-clicked
  • cv2.EVENT_MBUTTONDBLCLK: This event refers to the middle button being double-clicked

The mouse callback's flags argument may be some bitwise combination of the following flags:

  • cv2.EVENT_FLAG_LBUTTON: This flag indicates that the left button is pressed
  • cv2.EVENT_FLAG_RBUTTON: This flag indicates that the right button is pressed
  • cv2.EVENT_FLAG_MBUTTON: This flag indicates that the middle button is pressed
  • cv2.EVENT_FLAG_CTRLKEY: This flag indicates that the Ctrl key is pressed
  • cv2.EVENT_FLAG_SHIFTKEY: This flag indicates that the Shift key is pressed
  • cv2.EVENT_FLAG_ALTKEY: This flag indicates that the Alt key is pressed

Unfortunately, OpenCV does not provide any means of handling window events. For example, we cannot stop our application when a window's close button is clicked. Due to OpenCV's limited event handling and GUI capabilities, many developers prefer to integrate it with other application frameworks. Later in this chapter, we will design an abstraction layer to help integrate OpenCV into any application framework.
