Pooling

Now, in this section, we'll move on to pooling. We'll be learning about the one-dimensional pooling operation; the two-dimensional pooling operation, such as you would use on an image; and then, finally, we'll discuss image channels and how they're represented in the data.

Okay, from the top, we're going to be importing Keras and some additional layers this time, particularly MaxPooling1D and MaxPooling2D, and we're also going to import the convolutional 2D layer, which we'll be using a little bit later on. If you take a peek at the code, what we're doing is setting up a matrix with some values in it. You can think of it as a square matrix of almost all ones, but I've sprinkled some higher values in there: a 2, a 3, a 4, and a 5. What the max pooling is going to do is extract those highest values. We're also going to be using a bit of a trick. So far, we've used Keras to train machine learning models, but it turns out you can also run the layers directly and do a little bit of math:

Importing packages

So, as you can see from the values that popped up on the screen, 2, 3, 4, and 5 are the maximum values picked out along the single dimension we pooled over:

Max pooling operation single matrix

You can see in the preceding screenshot that the sequential model that's been put together just has the max pooling operation, and we pass our NumPy array directly to predict as a batch. We're basically skipping the training step here and just running the model as a mathematical engine. This gives you a sense of what the max pooling operation does: it pulls out the maximum value in each pooling window along the dimension.
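If you don't have the screenshot handy, here's a minimal sketch of the same idea. The matrix values and the pool size are assumptions chosen so that the sprinkled-in values survive the pooling; they aren't copied from the screenshot:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import MaxPooling1D

# A square matrix of mostly ones, with a few larger values sprinkled in
# (these exact values are just for illustration).
matrix = np.array([
    [1, 1, 2, 4],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [3, 1, 1, 5],
], dtype=np.float32)

# Keras works on batches, so wrap the matrix in a batch of one:
# (batch, steps, features) = (1, 4, 4).
model = Sequential([MaxPooling1D(pool_size=2)])
pooled = model.predict(np.expand_dims(matrix, 0))

# Drop the size-one batch dimension to get a flat array back.
print(np.squeeze(pooled))

With a pool size of 2, each column keeps the larger value from every pair of rows, so the higher values we sprinkled in are exactly what survives.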

I also want to point out np.squeeze. What does this do? Well, squeeze eliminates dimensions that have a size of one. Remember that Keras almost always works on a batch. Here, our batch has only one entry: the matrix from the preceding screenshot. So, squeezing eliminates the batch dimension and leaves us with a nice flat array as output.
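As a quick illustration of that squeeze (the shapes here are just hypothetical):

import numpy as np

batched = np.ones((1, 2, 4))        # a batch containing a single 2 x 4 result
print(batched.shape)                # (1, 2, 4)
print(np.squeeze(batched).shape)    # (2, 4): the size-one batch axis is gone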

Moving up to two dimensions, we're going to be using the MaxPooling2D operator with a pool size of 2. This means we're going to be using a 2 x 2 square pool to extract the maximum values. Looking at the values on the screenshot (1, 4, 3, and 5) and back up at the input matrix, you'll see that the upper left-hand 1 is the maximum value of the upper left-hand region of the input, and that the 4 is the maximum value of the upper right-hand region:

Max pooling operation matrices

You get the basic idea! It took the 4 x 4 and turned it into a 2 x 2 by pulling out the maximum value from each region. Okay, so if this is just two dimensions, why do we have three dimensions here: 4, 4, and 1? The answer is that pixels have color: they could be red, green, and blue, in which case you'd have a three in the final channel dimension. In the black and white images we're working with, you simply have a one in that dimension. So, when we pool, what we're really doing is pooling within a specific channel. In this case, we're pooling the black and white pixels together.
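As a quick, hedged illustration of those channel shapes (the array contents here are just placeholders):

import numpy as np

# One 4 x 4 grayscale image: a single value per pixel, so one channel.
gray_batch = np.ones((1, 4, 4, 1))

# The same-sized image in color would carry three values per pixel.
rgb_batch = np.ones((1, 4, 4, 3))

print(gray_batch.shape, rgb_batch.shape)   # (1, 4, 4, 1) (1, 4, 4, 3)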

You can also see an additional call here, np.expand_dims with -1. What this does is take our perfectly square array (4 x 4 on the input) and add an extra dimension of size one to the end to encode the channel, so that the shape fits what MaxPooling2D expects. Then, we undo that on the output with np.squeeze again, which removes all of the size-one axes and tosses them away, so we get a nice square matrix back on output.
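Putting that together, here's a small sketch of the two-dimensional case. Again, the matrix values are assumptions picked to reproduce the 1, 4, 3, 5 result described above, not a copy of the screenshot:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import MaxPooling2D

matrix = np.array([
    [1, 1, 2, 4],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [3, 1, 1, 5],
], dtype=np.float32)

# Add the batch axis at the front and a size-one channel axis at the end,
# giving the (batch, height, width, channels) = (1, 4, 4, 1) shape
# that MaxPooling2D expects.
batch = np.expand_dims(np.expand_dims(matrix, 0), -1)

model = Sequential([MaxPooling2D(pool_size=2)])
pooled = model.predict(batch)

# Squeeze away both size-one axes to get a plain 2 x 2 matrix back.
print(np.squeeze(pooled))
# [[1. 4.]
#  [3. 5.]]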

Alright, so why do we carry out pooling? Well, it extracts the strong signals. The pooling operation reduces the size of the image and focuses on the strongest values, which effectively helps the machine learner identify the most important pixels and regions in the image.
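You can see the size reduction directly from the shapes. This sketch assumes a made-up 28 x 28 grayscale image purely for illustration:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import MaxPooling2D

# A batch containing one hypothetical 28 x 28 grayscale image.
image_batch = np.random.rand(1, 28, 28, 1).astype(np.float32)

model = Sequential([MaxPooling2D(pool_size=2)])
pooled = model.predict(image_batch)

# Each 2 x 2 pool halves both spatial dimensions.
print(image_batch.shape, '->', pooled.shape)   # (1, 28, 28, 1) -> (1, 14, 14, 1)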
