Image-feature extraction

When dealing with unstructured data, be it text or images, we must first convert the data into a numerical representation that's usable by our machine learning model. The process of converting non-numeric data into a numerical representation is called feature extraction. For image data, our features are the pixel values of the image.

First, let's imagine a 1,150 x 1,150 pixel grayscale image. Such an image is represented as a 1,150 x 1,150 matrix of pixel intensities. For grayscale images, pixel values range from 0 to 255, with 0 being a completely black pixel, 255 being a completely white pixel, and shades of gray in between.
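
To make this concrete, here's a minimal sketch, using NumPy and a made-up 3 x 3 "image", of what such a matrix of pixel intensities looks like:

import numpy as np

# A hypothetical 3 x 3 grayscale "image": 0 is black, 255 is white,
# and values in between are shades of gray
tiny_image = np.array([[0,   128, 255],
                       [64,  192, 32],
                       [255, 0,   96]], dtype=np.uint8)
print(tiny_image.shape)  # (3, 3): one intensity value per pixel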

To demonstrate what this looks like in code, let's extract the features from our grayscale cat burrito. The image is available on GitHub at https://github.com/PacktPublishing/Python-Machine-Learning-Blueprints-Second-Edition/tree/master/Chapter08 as grayscale_cat_burrito.jpg.

I've made the image assets used throughout this chapter available to you at https://github.com/mroman09/packt-image-assets. You can find our cat burritos there!

Let's now take a look at a sample of this in the following code:

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

# Read the image file into a NumPy ndarray of pixel intensities
cat_burrito = mpimg.imread('images/grayscale_cat_burrito.jpg')
cat_burrito
If you're unable to read a .jpg file by running the preceding code, install PIL by running pip install pillow.

In the preceding code, we imported pandas and two matplotlib submodules: image and pyplot. We used the imread method from matplotlib.image to read in the image.

Running the preceding code prints the contents of the ndarray, a grid of the image's pixel-intensity values.

The output is a two-dimensional NumPy ndarray that contains the features for our model. As with most applied machine learning applications, there are several preprocessing steps you'll want to perform on these extracted features, some of which we'll explore together on the Zalando fashion dataset later in this chapter. But these are the raw extracted features of the image!
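
As a taste of what that preprocessing can look like, here's one common, illustrative step (not something the chapter requires yet): rescaling the 0 to 255 intensities to the 0 to 1 range:

# Rescale pixel intensities from [0, 255] to [0, 1];
# many models train more stably on features at this scale
cat_burrito_scaled = cat_burrito / 255.0
print(cat_burrito_scaled.min(), cat_burrito_scaled.max())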

The shape of the extracted features for our grayscale image is image_height rows x image_width columns. We can check the shape easily by running the following:

cat_burrito.shape

The preceding code returns (1150, 1150): image_height rows by image_width columns, matching the dimensions of our image.

We can check the maximum and minimum pixel values in our ndarray easily, too:

print(cat_burrito.max())
print(cat_burrito.min())

This returns the maximum and minimum pixel intensities in the array, which for an 8-bit grayscale image fall within the 0 to 255 range.

Finally, we can display our grayscale image from our ndarray by running this code:

plt.axis('off')  # hide the axis ticks and labels
plt.imshow(cat_burrito, cmap='gray');  # render the 2D array with a grayscale colormap

The preceding code returns our image, which is available at https://github.com/PacktPublishing/Python-Machine-Learning-Blueprints-Second-Edition/tree/master/Chapter08 as output_grayscale_cat_burrito.png.

The feature-extraction process for color images is identical; however, with color images, the shape of our ndarray output will be three-dimensional—a tensor—representing the red, green, and blue (RGB) pixel values of our image. Here, we'll carry out the same process as before, this time on a color version of the cat burrito. The image is available on GitHub at https://github.com/PacktPublishing/Python-Machine-Learning-Blueprints-Second-Edition/tree/master/Chapter08 as color_cat_burrito.jpg.

Let's extract the features from our color version of the cat burrito by using the following code:

color_cat_burrito = mpimg.imread('images/color_cat_burrito.jpg')
color_cat_burrito.shape

Running this code returns a three-element tuple: image_height rows x image_width columns x 3 channels.

Here we see that this image contains three channels. Our color_cat_burrito variable is a tensor containing three stacked matrices, one per channel, that tell us the red, green, and blue values for each pixel in our image.
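
If you want to inspect those channels individually, you can slice the last axis of the tensor. A minimal sketch (matplotlib reads .jpg files with the channels in red, green, blue order):

# Slice out each color channel; the last axis holds R, G, B in that order
red_channel = color_cat_burrito[:, :, 0]
green_channel = color_cat_burrito[:, :, 1]
blue_channel = color_cat_burrito[:, :, 2]
print(red_channel.shape)  # image_height x image_width, one matrix per channel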

We can display the color image from our ndarray by running the following:

plt.axis('off')
plt.imshow(color_cat_burrito);  # no cmap needed; the RGB channels render in color

This returns our color image. The image is available on GitHub at https://github.com/PacktPublishing/Python-Machine-Learning-Blueprints-Second-Edition/tree/master/Chapter08 as output_color_cat_burrito.png.

This is the first step of our image-feature extraction. We've taken one image at a time and converted it into numeric values using just a few lines of code. In doing so, we saw that extracting features from grayscale images produces a two-dimensional ndarray, while extracting features from color images produces a tensor of pixel-intensity values.
However, there's a slight problem. Remember, this is just a single image, a single training sample, a single row of our data. In the case of our grayscale image, if we were to flatten this matrix into a single row, we would have image_height x image_width columns, or in our case, 1,322,500 columns. We can confirm that in code by running the following snippet:

# flattening our grayscale cat_burrito and checking the length
len(cat_burrito.flatten())

This is a problem! As with other machine learning modeling tasks, high dimensionality hurts model performance. At this magnitude of dimensionality, any model we build will likely overfit, and training times will be slow.

This dimensionality problem is endemic to computer-vision tasks of this sort. Even a lower-resolution dataset of 400 x 400 pixel grayscale cat burritos would leave us with 160,000 features per image.
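
To see how quickly this grows, here's a quick, illustrative back-of-the-envelope calculation of the flattened feature count at a few square resolutions:

# Flattened feature count for square grayscale images at various resolutions
for side in (400, 800, 1150):
    print(f'{side} x {side} pixels -> {side * side:,} features per image')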

There is, however, a known solution to this problem: convolutional neural networks. In the next section, we'll continue our feature-extraction process using convolutional neural networks to build lower-dimensional representations of these raw image pixels. We'll go over the mechanics of how they work and continue to build an intuition for why they're so performant in image-classification tasks.
