Installing OpenCV and running samples is fun, but at this stage, we want to try it out ourselves. This chapter introduces OpenCV's I/O functionality. We also discuss the concept of a project and the beginnings of an object-oriented design for this project, which we will flesh out in subsequent chapters.
By starting with a look at the I/O capabilities and design patterns, we will build our project in the same way we would make a sandwich: from the outside in. Bread slices and spread, or endpoints and glue, come before fillings or algorithms. We choose this approach because computer vision is mostly extroverted—it contemplates the real world outside our computer—and we want to apply all our subsequent algorithmic work to the real world through a common interface.
Most CV applications need to get images as input. Most also produce images as output. An interactive CV application might require a camera as an input source and a window as an output destination. However, other possible sources and destinations include image files, video files, and raw bytes. For example, raw bytes might be transmitted via a network connection, or they might be generated by an algorithm if we incorporate procedural graphics into our application. Let's look at each of these possibilities.
OpenCV provides the imread() and imwrite() functions that support various file formats for still images. The supported formats vary by system but should always include the BMP format. Typically, PNG, JPEG, and TIFF should be among the supported formats too.
Let's explore the anatomy of the representation of an image in Python and NumPy.
No matter the format, each pixel has a value, but the difference is in how the pixel is represented. For example, we can create a black square image from scratch by simply creating a 2D NumPy array:
img = numpy.zeros((3,3), dtype=numpy.uint8)
If we print this image to a console, we obtain the following result:
array([[0, 0, 0], [0, 0, 0], [0, 0, 0]], dtype=uint8)
Each pixel is represented by a single 8-bit integer, which means that the values for each pixel are in the 0-255 range.
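As a quick aside (my own sketch, not part of the original example), it is worth remembering that NumPy's uint8 arithmetic wraps around modulo 256, so adding to pixel values near the top of the 0-255 range silently overflows:

```python
import numpy

# A single uint8 "pixel" near the top of the 0-255 range.
pixel = numpy.array([250], dtype=numpy.uint8)
pixel += 10  # 250 + 10 = 260, which wraps modulo 256 to 4
```

This is why brightness adjustments are usually done with saturating functions such as cv2.add() rather than raw + on uint8 arrays.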
Let's now convert this image into the blue-green-red (BGR) format using cv2.cvtColor():
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
Let's observe how the image has changed:
array([[[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0], [0, 0, 0]]], dtype=uint8)
As you can see, each pixel is now represented by a three-element array, with each integer representing the B, G, and R channels, respectively. Other color spaces, such as HSV, will be represented in the same way, albeit with different value ranges (for example, the hue value of the HSV color space has a range of 0-180) and different numbers of channels.
You can check the structure of an image by inspecting the shape property, which returns the rows, columns, and the number of channels (if there is more than one).
Consider this example:
>>> img = numpy.zeros((3,3), dtype=numpy.uint8)
>>> img.shape
The preceding code will print (3, 3). If you then converted the image to BGR, the shape would be (3, 3, 3), which indicates the presence of three channels per pixel.
Images can be loaded from one file format and saved to another. For example, let's convert an image from PNG to JPEG:
import cv2
image = cv2.imread('MyPic.png')
cv2.imwrite('MyPic.jpg', image)
Most of the OpenCV functionalities that we use are in the cv2 module. You might come across other OpenCV guides that instead rely on the cv or cv2.cv modules, which are legacy versions. The Python module is called cv2 not because it is a binding module for OpenCV 2.x.x, but because it introduced a better API that leverages object-oriented programming, as opposed to the previous cv module, which adhered to a more procedural style of programming.
By default, imread() returns an image in the BGR color format even if the file uses a grayscale format. BGR represents the same color space as red-green-blue (RGB), but the byte order is reversed.
Optionally, we may specify the mode of imread() to be one of the following enumerators:
IMREAD_ANYCOLOR = 4
IMREAD_ANYDEPTH = 2
IMREAD_COLOR = 1
IMREAD_GRAYSCALE = 0
IMREAD_LOAD_GDAL = 8
IMREAD_UNCHANGED = -1
For example, let's load a PNG file as a grayscale image (losing any color information in the process), and then save it as a grayscale PNG image:
import cv2
grayImage = cv2.imread('MyPic.png', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('MyPicGray.png', grayImage)
To avoid unnecessary headaches, use absolute paths to your images (for example, C:\Users\Joe\Pictures\MyPic.png on Windows or /home/joe/pictures/MyPic.png on Unix), at least while you're familiarizing yourself with OpenCV's API. The path of an image, unless absolute, is relative to the working directory (the folder from which the Python script is run), so in the preceding example, MyPic.png would have to be in that folder or the image won't be found.
Regardless of the mode, imread() discards any alpha channel (transparency). The imwrite() function requires an image to be in the BGR or grayscale format, with a number of bits per channel that the output format can support. For example, BMP requires 8 bits per channel, while PNG allows either 8 or 16 bits per channel.
Conceptually, a byte is an integer ranging from 0 to 255. In all real-time graphic applications today, a pixel is typically represented by one byte per channel, though other representations are also possible.
An OpenCV image is a 2D or 3D array of the numpy.array type. An 8-bit grayscale image is a 2D array containing byte values. A 24-bit BGR image is a 3D array, which also contains byte values. We may access these values by using an expression such as image[0, 0] or image[0, 0, 0]. The first index is the pixel's y coordinate or row, 0 being the top. The second index is the pixel's x coordinate or column, 0 being the leftmost. The third index (if applicable) represents a color channel.
For example, in an 8-bit grayscale image with a white pixel in the upper-left corner, image[0, 0] is 255. For a 24-bit BGR image with a blue pixel in the upper-left corner, image[0, 0] is [255, 0, 0].
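These claims are easy to verify with a couple of throwaway arrays (a sketch of my own, using NumPy only):

```python
import numpy

# An 8-bit grayscale image with a white pixel in the upper-left corner.
gray = numpy.zeros((2, 2), dtype=numpy.uint8)
gray[0, 0] = 255

# A 24-bit BGR image with a blue pixel in the upper-left corner.
bgr = numpy.zeros((2, 2, 3), dtype=numpy.uint8)
bgr[0, 0] = [255, 0, 0]  # index 0 is the blue channel
```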
As an alternative to using an expression such as image[0, 0] or image[0, 0] = 128, we may use an expression such as image.item((0, 0)) or image.itemset((0, 0), 128). The latter expressions are more efficient for single-pixel operations. However, as we will see in subsequent chapters, we usually want to perform operations on large slices of an image rather than on single pixels.
Provided that an image has 8 bits per channel, we can cast it to a standard Python bytearray, which is one-dimensional:
byteArray = bytearray(image)
Conversely, provided that the bytearray contains bytes in an appropriate order, we can cast and then reshape it to get a numpy.array type that is an image:
grayImage = numpy.array(grayByteArray).reshape(height, width)
bgrImage = numpy.array(bgrByteArray).reshape(height, width, 3)
As a more complete example, let's convert a bytearray containing random bytes to a grayscale image and a BGR image:
import cv2
import numpy
import os
# Make an array of 120,000 random bytes.
randomByteArray = bytearray(os.urandom(120000))
flatNumpyArray = numpy.array(randomByteArray)
# Convert the array to make a 400x300 grayscale image.
grayImage = flatNumpyArray.reshape(300, 400)
cv2.imwrite('RandomGray.png', grayImage)
# Convert the array to make a 400x100 color image.
bgrImage = flatNumpyArray.reshape(100, 400, 3)
cv2.imwrite('RandomColor.png', bgrImage)
After running this script, we should have a pair of randomly generated images, RandomGray.png and RandomColor.png, in the script's directory.
Here, we use Python's standard os.urandom() function to generate random raw bytes, which we then convert to a NumPy array. Note that it is also possible to generate a random NumPy array directly (and more efficiently) using a statement such as numpy.random.randint(0, 256, 120000).reshape(300, 400). The only reason we use os.urandom() is to help demonstrate a conversion from raw bytes.
Now that you have a better understanding of how an image is formed, we can start performing basic operations on it. We know that the easiest (and most common) way to load an image in OpenCV is to use the imread() function. We also know that this will return an image, which is really an array (either a 2D or 3D one, depending on the parameters you passed to imread()).
The numpy.array structure is well optimized for array operations, and it allows certain kinds of bulk manipulations that are not available in a plain Python list. These numpy.array type-specific operations come in handy for image manipulations in OpenCV. Let's explore image manipulations step by step, starting with a basic example: say you want to manipulate a pixel at the coordinates (0, 0) of a BGR image and turn it into a white pixel.
import cv2
import numpy as np
img = cv2.imread('MyPic.png')
img[0, 0] = [255, 255, 255]
If you then showed the image with a standard imshow() call, you would see a white dot in the top-left corner of the image. Naturally, this isn't very useful, but it shows what can be accomplished. Let's now leverage the ability of numpy.array to perform transformations on an array much faster than a plain Python list could.
Let's say that you want to change the blue value of a particular pixel, for example, the pixel at coordinates (150, 120). The numpy.array type provides a very handy method, item(), which takes three parameters: the row (y) position, the column (x) position, and the index within the array at that position (remember that in a BGR image, the data at a certain position is a three-element array containing the B, G, and R values in that order); it returns the value at the index position. Another method, itemset(), sets the value of a particular channel of a particular pixel to a specified value; itemset() takes two arguments: a three-element tuple (row, column, and index) and the new value.
In this example, we will change the value of blue at (150, 120) from its current value (127) to an arbitrary 255:
import cv2
import numpy as np
img = cv2.imread('MyPic.png')
print(img.item(150, 120, 0))  # prints the current value of B for that pixel
img.itemset((150, 120, 0), 255)
print(img.item(150, 120, 0))  # prints 255
Remember that we do this with numpy.array for two reasons: NumPy is extremely optimized for these kinds of operations, and we obtain more readable code through NumPy's elegant methods than through the raw index access of the first example.
This particular code doesn't do much in itself, but it does open a world of possibilities. It is, however, advisable that you utilize built-in filters and methods to manipulate an entire image; the above approach is only suitable for small regions of interest.
Now, let's take a look at a very common operation, namely, manipulating channels. Sometimes, you'll want to zero-out all the values of a particular channel (B, G, or R).
Using Python loops to manipulate arrays pixel by pixel is very costly in terms of runtime and should be avoided; especially if you manipulate videos, you'll find yourself with a jittery output. Instead, a NumPy feature called array indexing allows for efficient bulk manipulation of pixels. Setting all G (green) values of an image to 0 is as simple as using this code:
import cv2
img = cv2.imread('MyPic.png')
img[:, :, 1] = 0
This is a fairly impressive piece of code and easy to understand. The relevant line is the last one, which instructs the program to take all pixels from all rows and columns and set the value at index one of each pixel's three-element color array to 0. If you display this image, you will notice a complete absence of green.
There are a number of interesting things we can do by accessing raw pixels with NumPy's array indexing; one of them is defining regions of interest (ROI). Once a region is defined, we can perform a number of operations on it, namely, binding this region to a variable, and then even defining a second region and assigning it the value of the first one (visually copying a portion of the image over to another position in the image):
import cv2
img = cv2.imread('MyPic.png')
my_roi = img[0:100, 0:100]
img[300:400, 300:400] = my_roi
It's important to make sure that the two regions correspond in terms of size. If not, NumPy will (rightly) complain that the two shapes mismatch.
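As a self-contained sketch of my own (using a synthetic array instead of a loaded file), we can make the shape requirement explicit by checking it before assigning:

```python
import numpy

# A synthetic 400x400 BGR image.
img = numpy.zeros((400, 400, 3), dtype=numpy.uint8)
img[0:100, 0:100] = 255  # make the source region white

my_roi = img[0:100, 0:100]
target = img[300:400, 300:400]

# NumPy raises a ValueError on assignment if the shapes differ,
# so an explicit check makes the requirement visible.
assert my_roi.shape == target.shape
img[300:400, 300:400] = my_roi
```

After the copy, the lower-right 100x100 block is white, while the untouched middle of the image stays black.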
Finally, we can obtain a few interesting details from numpy.array, such as the image properties, using this code:
import cv2
img = cv2.imread('MyPic.png')
print(img.shape)
print(img.size)
print(img.dtype)
These three properties are, in this order:
shape: the number of rows, the number of columns, and (if there is more than one) the number of channels
size: the total number of elements in the array
dtype: the datatype of the array's elements (usually uint8)
All in all, it is strongly advisable that you familiarize yourself with NumPy in general, and numpy.array in particular, when working with OpenCV, as it is the foundation of image processing done with Python.
OpenCV provides the VideoCapture and VideoWriter classes that support various video file formats. The supported formats vary by system but should always include AVI. Via its read() method, a VideoCapture object may be polled for new frames until it reaches the end of its video file. Each frame is an image in a BGR format.
Conversely, an image may be passed to the write() method of the VideoWriter class, which appends the image to the video file. Let's look at an example that reads frames from one AVI file and writes them to another with a YUV encoding:
import cv2

videoCapture = cv2.VideoCapture('MyInputVid.avi')
fps = videoCapture.get(cv2.CAP_PROP_FPS)
size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'),
    fps, size)

success, frame = videoCapture.read()
while success:  # Loop until there are no more frames.
    videoWriter.write(frame)
    success, frame = videoCapture.read()
The arguments to the VideoWriter class constructor deserve special attention. A video's filename must be specified. Any preexisting file with this name is overwritten. A video codec must also be specified. The available codecs may vary from system to system. These are the options that are included:
cv2.VideoWriter_fourcc('I','4','2','0'): This option is an uncompressed YUV encoding, 4:2:0 chroma subsampled. This encoding is widely compatible but produces large files. The file extension should be .avi.
cv2.VideoWriter_fourcc('P','I','M','1'): This option is MPEG-1. The file extension should be .avi.
cv2.VideoWriter_fourcc('X','V','I','D'): This option is MPEG-4 and a preferred option if you want the resulting video size to be average. The file extension should be .avi.
cv2.VideoWriter_fourcc('T','H','E','O'): This option is Ogg Theora. The file extension should be .ogv.
cv2.VideoWriter_fourcc('F','L','V','1'): This option is a Flash video. The file extension should be .flv.
A frame rate and frame size must be specified too. Since we are copying video frames from another video, these properties can be read from the get() method of the VideoCapture class.
A stream of camera frames is represented by the VideoCapture class too. However, for a camera, we construct a VideoCapture object by passing the camera's device index instead of a video's filename. Let's consider an example that captures 10 seconds of video from a camera and writes it to an AVI file:
import cv2

cameraCapture = cv2.VideoCapture(0)
fps = 30  # an assumption
size = (int(cameraCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cameraCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'),
    fps, size)

success, frame = cameraCapture.read()
numFramesRemaining = 10 * fps - 1
while success and numFramesRemaining > 0:
    videoWriter.write(frame)
    success, frame = cameraCapture.read()
    numFramesRemaining -= 1
cameraCapture.release()
Unfortunately, the get() method of a VideoCapture class does not return an accurate value for the camera's frame rate; it always returns 0. The official documentation at http://docs.opencv.org/modules/highgui/doc/reading_and_writing_images_and_video.html reads:
"When querying a property that is not supported by the backend used by the VideoCapture class, value 0 is returned."
This occurs most commonly on systems where the driver only supports basic functionalities.
For the purpose of creating an appropriate VideoWriter class for the camera, we have to either make an assumption about the frame rate (as we did in the code previously) or measure it using a timer. The latter approach is better, and we will cover it later in this chapter.
The number of cameras and their order is of course system-dependent. Unfortunately, OpenCV does not provide any means of querying the number of cameras or their properties. If an invalid index is used to construct a VideoCapture class, the VideoCapture class will not yield any frames; its read() method will return (false, None). A good way to prevent it from trying to retrieve frames from a VideoCapture that was not opened correctly is to use the VideoCapture.isOpened() method, which returns a Boolean.
The read() method is inappropriate when we need to synchronize a set of cameras or a multihead camera (such as a stereo camera or Kinect). Then, we use the grab() and retrieve() methods instead. For a set of cameras, we use this code:
success0 = cameraCapture0.grab()
success1 = cameraCapture1.grab()
if success0 and success1:
    # retrieve() returns a (success, frame) tuple, like read().
    _, frame0 = cameraCapture0.retrieve()
    _, frame1 = cameraCapture1.retrieve()
One of the most basic operations in OpenCV is displaying an image. This can be done with the imshow() function. If you come from any other GUI framework background, you might think it sufficient to call imshow() to display an image. This is only partially true: the image will be displayed, but it will disappear immediately. This is by design, to enable the constant refreshing of a window frame when working with videos. Here's a very simple example code to display an image:
import cv2

img = cv2.imread('my-image.png')
cv2.imshow('my image', img)
cv2.waitKey()
cv2.destroyAllWindows()
The imshow() function takes two parameters: the name of the window in which we want to display the image, and the image itself. We'll talk about waitKey() in more detail when we explore the displaying of frames in a window.
The aptly named destroyAllWindows() function disposes of all the windows created by OpenCV.
OpenCV allows named windows to be created, redrawn, and destroyed using the namedWindow(), imshow(), and destroyWindow() functions. Also, any window may capture keyboard input via the waitKey() function and mouse input via the setMouseCallback() function. Let's look at an example where we show the frames of a live camera input:
import cv2

clicked = False
def onMouse(event, x, y, flags, param):
    global clicked
    if event == cv2.EVENT_LBUTTONUP:
        clicked = True

cameraCapture = cv2.VideoCapture(0)
cv2.namedWindow('MyWindow')
cv2.setMouseCallback('MyWindow', onMouse)

print('Showing camera feed. Click window or press any key to stop.')
success, frame = cameraCapture.read()
while success and cv2.waitKey(1) == -1 and not clicked:
    cv2.imshow('MyWindow', frame)
    success, frame = cameraCapture.read()

cv2.destroyWindow('MyWindow')
cameraCapture.release()
The argument for waitKey() is a number of milliseconds to wait for keyboard input. The return value is either -1 (meaning that no key has been pressed) or an ASCII keycode, such as 27 for Esc. For a list of ASCII keycodes, see http://www.asciitable.com/. Also, note that Python provides a standard function, ord(), which can convert a character to its ASCII keycode. For example, ord('a') returns 97.
On some systems, waitKey() may return a value that encodes more than just the ASCII keycode. (A bug is known to occur on Linux when OpenCV uses GTK as its backend GUI library.) On all systems, we can ensure that we extract just the ASCII keycode by reading the last byte from the return value like this:
keycode = cv2.waitKey(1)
if keycode != -1:
    keycode &= 0xFF
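The same masking can be wrapped in a small helper (a sketch of my own; the name normalize_keycode is not an OpenCV function) so the sentinel -1 survives while any extra high bits are stripped:

```python
def normalize_keycode(raw):
    # Reduce waitKey()'s return value to a plain ASCII keycode,
    # preserving -1 (no key pressed) as-is.
    if raw == -1:
        return -1
    return raw & 0xFF
```

For example, a GTK-style value such as 0x10001B would be reduced to 27 (Esc), and ordinary keycodes pass through unchanged.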
OpenCV's window functions and waitKey() are interdependent. OpenCV windows are only updated when waitKey() is called, and waitKey() only captures input when an OpenCV window has focus.
The mouse callback passed to setMouseCallback() should take five arguments, as seen in our code sample. The callback's param argument is set as an optional third argument to setMouseCallback(). By default, it is 0. The callback's event argument is one of the following actions:
cv2.EVENT_MOUSEMOVE: This event refers to mouse movement
cv2.EVENT_LBUTTONDOWN: This event refers to the left button down
cv2.EVENT_RBUTTONDOWN: This event refers to the right button down
cv2.EVENT_MBUTTONDOWN: This event refers to the middle button down
cv2.EVENT_LBUTTONUP: This event refers to the left button up
cv2.EVENT_RBUTTONUP: This event refers to the right button up
cv2.EVENT_MBUTTONUP: This event refers to the middle button up
cv2.EVENT_LBUTTONDBLCLK: This event refers to the left button being double-clicked
cv2.EVENT_RBUTTONDBLCLK: This event refers to the right button being double-clicked
cv2.EVENT_MBUTTONDBLCLK: This event refers to the middle button being double-clicked
The mouse callback's flags argument may be some bitwise combination of the following events:
cv2.EVENT_FLAG_LBUTTON: This event refers to the left button being pressed
cv2.EVENT_FLAG_RBUTTON: This event refers to the right button being pressed
cv2.EVENT_FLAG_MBUTTON: This event refers to the middle button being pressed
cv2.EVENT_FLAG_CTRLKEY: This event refers to the Ctrl key being pressed
cv2.EVENT_FLAG_SHIFTKEY: This event refers to the Shift key being pressed
cv2.EVENT_FLAG_ALTKEY: This event refers to the Alt key being pressed
Unfortunately, OpenCV does not provide any means of handling window events. For example, we cannot stop our application when a window's close button is clicked. Due to OpenCV's limited event handling and GUI capabilities, many developers prefer to integrate it with other application frameworks. Later in this chapter, we will design an abstraction layer to help integrate OpenCV into any application framework.