How to do it...

First, create the save_to_npy.py file and add the following lines, which point the script at the python3 interpreter (via the shebang line) and import the necessary dependencies:

#!/usr/bin/env python3
from PIL import Image
import numpy as np
import os
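The LSUN images are stored as WebP, so before running the script it may be worth confirming that your Pillow installation was built with WebP support. The following is a minimal sketch using Pillow's features module; the variable name has_webp is only illustrative.

```python
# Sketch: check whether this Pillow build can decode WebP images.
from PIL import features

# features.check("webp") returns True if Pillow's WebP module is available.
has_webp = features.check("webp")
print("Pillow WebP support:", has_webp)
```

If this prints False, Image.open will most likely fail to identify the .webp files, so Pillow would need to be reinstalled with libwebp available.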

Next, build the first core method for this script, which will grab a list of the files from a given folder based on an extension, as follows:

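def grabListOfFiles(startingDirectory, extension=".webp"):
    # Collect the full path of every file in startingDirectory
    # (top level only) whose name ends with the given extension
    listOfFiles = []
    for file in os.listdir(startingDirectory):
        if file.endswith(extension):
            listOfFiles.append(os.path.join(startingDirectory, file))
    return listOfFiles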

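The preceding function scans the given directory and returns the full path of every file with the given extension. Note that os.listdir only lists the top level of the directory; it does not descend into subdirectories, so the images need to sit directly in that folder. In our case, the LSUN dataset has all of its images saved in WebP format.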
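If your exported images end up spread across nested subdirectories, a recursive search is needed. The following is a sketch of one way to do that with os.walk; grab_list_of_files_recursive is a hypothetical helper, not part of the original recipe, and the case-insensitive extension match and sorted output are assumptions added for robustness.

```python
import os


def grab_list_of_files_recursive(starting_directory, extension=".webp"):
    """Hypothetical recursive variant of grabListOfFiles.

    Walks starting_directory and all of its subdirectories and returns the
    full paths of files whose names end with `extension` (case-insensitive).
    """
    list_of_files = []
    for root, _dirs, files in os.walk(starting_directory):
        for name in files:
            if name.lower().endswith(extension.lower()):
                list_of_files.append(os.path.join(root, name))
    # Sort so the file order is reproducible across runs and file systems.
    return sorted(list_of_files)
```

It could be dropped in place of grabListOfFiles(direc) in the final step if the images are nested.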

The second method, grabArrayOfImages, reads each image and converts it to RGB or grayscale depending on the gray flag, as follows:

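def grabArrayOfImages(listOfFiles, resizeW=64, resizeH=64, gray=False):
    imageArr = []
    for f in listOfFiles:
        # Open each file, convert it to grayscale ("L") or RGB, then close it
        with Image.open(f) as img:
            im = img.convert("L") if gray else img.convert("RGB")
        im = im.resize((resizeW, resizeH))
        imData = np.asarray(im)
        if gray:
            # "L" images give a 2D (H, W) array; add a channel axis -> (H, W, 1)
            imData = np.expand_dims(imData, axis=-1)
        imageArr.append(imData)
    return imageArr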

We use Pillow's Image class in the preceding function because it can decode WebP files (provided Pillow was built with WebP support). Pillow's built-in resize method shrinks each image to a small, square size; 64 x 64 keeps memory use manageable on most graphics cards. Once an image has been read and resized, we convert it to a NumPy array and append it to a list. When the list of files has been exhausted, we return that list.
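The array shapes matter when these images are later fed to a network. A quick way to see them is a sketch on a synthetic image (an assumption, so no files are needed): np.asarray on an RGB image gives (H, W, 3), but on a grayscale "L" image it gives a 2D (H, W) array, so an explicit channel axis has to be added with np.expand_dims to obtain (H, W, 1).

```python
import numpy as np
from PIL import Image

# Synthetic stand-in for a decoded LSUN image (assumption: no file I/O here).
src = Image.new("RGB", (256, 192), color=(200, 120, 40))

# Pillow's resize takes (width, height); NumPy arrays come back as (height, width[, channels]).
color = np.asarray(src.convert("RGB").resize((64, 64)))
gray = np.asarray(src.convert("L").resize((64, 64)))
print(color.shape, gray.shape)  # (64, 64, 3) (64, 64)

# Add an explicit channel axis so grayscale data is (H, W, 1).
gray_with_channel = np.expand_dims(gray, axis=-1)

# Stacking a list of per-image arrays yields (N, H, W, C).
batch_gray = np.stack([gray_with_channel] * 5)
batch_color = np.stack([color] * 5)
print(batch_gray.shape, batch_color.shape)  # (5, 64, 64, 1) (5, 64, 64, 3)
```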

Finally, let's use these functions to grab the list of files, process the images in both grayscale and color, and then use NumPy's built-in save function to write each set of images to its own .npy file, as follows:

direc = "/data/church_outdoor_train_lmdb/expanded/"

listOfFiles = grabListOfFiles(direc)
imageArrGray = grabArrayOfImages(listOfFiles, resizeW=64, resizeH=64,
                                 gray=True)
imageArrColor = grabArrayOfImages(listOfFiles, resizeW=64, resizeH=64)

print("Shape of ImageArr Gray: ", np.shape(imageArrGray))
print("Shape of ImageArr Color: ", np.shape(imageArrColor))

np.save('/data/church_outdoor_train_lmdb_gray.npy', imageArrGray)
np.save('/data/church_outdoor_train_lmdb_color.npy', imageArrColor)
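To verify a saved file, it can be reloaded with np.load. For the full dataset the color array is roughly 1.55 GB at uint8 (126227 x 64 x 64 x 3 bytes), so passing mmap_mode="r" maps the file read-only instead of reading it all into memory. The sketch below assumes a small random array and a temporary path in place of the /data paths used above.

```python
import os
import tempfile

import numpy as np

# Small synthetic stand-in for imageArrColor (assumption: 10 random uint8 images).
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(10)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "color.npy")
    # np.save turns the list of equally shaped arrays into one (N, H, W, C) array.
    np.save(path, images)

    # mmap_mode="r" memory-maps the file read-only rather than loading it fully.
    loaded = np.load(path, mmap_mode="r")
    loaded_shape, loaded_dtype = loaded.shape, loaded.dtype
    print(loaded_shape, loaded_dtype)  # (10, 64, 64, 3) uint8
    round_trip_ok = bool(np.array_equal(loaded, np.stack(images)))
    # Release the memory map before the temporary directory is removed.
    del loaded
```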

Now, make sure that the create_data.sh shell script is executable (chmod +x create_data.sh is sufficient; chmod 777 also works but gives every user write access). Then run the following command to download the data, unpack it, and save it to the relevant files for training:

sudo ./create_data.sh

When the script finishes, you should see output similar to the following:

Archive: church_outdoor_val_lmdb.zip
  creating: church_outdoor_val_lmdb/
  inflating: church_outdoor_val_lmdb/lock.mdb
  inflating: church_outdoor_val_lmdb/data.mdb
.
.
Exporting /data/church_outdoor_train_lmdb to 
Finished 1000 images
Finished 2000 images
Finished 3000 images
Finished 4000 images
Finished 5000 images
Finished 6000 images
Finished 7000 images
.
.
Shape of ImageArr Gray: (126227, 64, 64, 1)
Shape of ImageArr Color: (126227, 64, 64, 3)

Let's move on to the next recipe!
