Making a TFRecord

Before we start, let's break down how a TFRecord works. After you open a TFRecord file for writing, you create something called an Example. This is just a protocol buffer that holds all the data we want to save. Within an Example, we store our data in Features, which describe the data inside the Example. Each Feature can be one of three types: bytes list, float list, or int64 list. Once we have put all our data into Features and added them to the Example, we serialize the whole protocol buffer to a string, and this string is what we write to the TFRecord file.

Let's see how this can work in practice. We will keep using our previous example of image classification and create a TFRecord to store the relevant data.

First, we create our file; this also returns a writer object that we use to write to it:

writer = tf.python_io.TFRecordWriter('/data/dataset.tfrecord') 

Next, we are going to assume that our images have been loaded and are in memory as a numpy array already; we will see later how we can store encoded images as well:

# labels is a list of integer labels.
# image_data is an NxHxWxC numpy array of images.
for index in range(len(labels)):
    # Convert one image to a flat string of raw bytes.
    image_raw = image_data[index, ...].tobytes()
    # Create our features, keyed by names we choose.
    my_features = {
        'image_raw': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_raw])),
        'label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[labels[index]]))}
    # The Example protocol buffer.
    example = tf.train.Example(
        features=tf.train.Features(feature=my_features))
    # Serialize the Example and append it to the TFRecord file.
    writer.write(example.SerializeToString())

writer.close() # Close our tfrecord file after finishing writing to it.

We loop over the list of labels, converting each image array to raw bytes one at a time.
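One detail worth keeping in mind about this step is that tobytes() produces a flat byte string and does not record the array's shape or dtype. The following is a minimal sketch of that behaviour, assuming small uint8 images; the array contents and sizes here are made up for illustration and are not the dataset used in this chapter:

```python
import numpy as np

# Hypothetical stand-in for image_data: 2 images, 4x4 pixels, 3 channels.
image_data = np.arange(2 * 4 * 4 * 3, dtype=np.uint8).reshape(2, 4, 4, 3)

image = image_data[0, ...]
# Flat byte string: the shape and dtype are not stored in it.
image_raw = image.tobytes()

# 4 * 4 * 3 values at one byte each.
num_bytes = len(image_raw)

# To recover the array, the reader has to supply the dtype and shape itself.
restored = np.frombuffer(image_raw, dtype=np.uint8).reshape(4, 4, 3)
round_trip_ok = np.array_equal(restored, image)
```

If the images do not all share one known size, one option is to store the height, width, and channel count as extra int64 features next to the raw bytes.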

To store data in our Example, we need to add Features to it. We store our Features in a dictionary where each key is a string name that we choose, for instance, 'label', and each value is a tf.train.Feature holding our data.

The data going into the tf.train.Feature must be converted to the correct type that it expects using either tf.train.BytesList, tf.train.Int64List, or tf.train.FloatList.
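Because this wrapping is repetitive, it is common to hide it behind small helper functions. The sketch below is one possible way to do that; the helper names bytes_feature, int64_feature, and float_feature, and the 'score' key, are hypothetical and chosen only for illustration:

```python
import tensorflow as tf


def bytes_feature(value):
    """Wrap a single bytes object in a bytes-list Feature."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def int64_feature(value):
    """Wrap a single integer in an int64-list Feature."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def float_feature(value):
    """Wrap a single float in a float-list Feature."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


# Illustrative values only; 'score' is not part of the chapter's dataset.
my_features = {
    'image_raw': bytes_feature(b'\x00\x01\x02'),
    'label': int64_feature(3),
    'score': float_feature(0.5),
}
```

Each list type takes a Python list in its value argument, so a feature can hold several values of the same type if needed.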

Next, we create a tf.train.Example protocol buffer and pass the Features to it. Finally, we serialize our Example to a string and write it to our TFRecord file. Once we have looped through the whole array of images, we must remember to close the file.
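As a quick sanity check on these last steps, the serialized string can be parsed back into an Example before anything is written to disk. The sketch below assumes a TensorFlow version that exposes the writer as tf.io.TFRecordWriter (the newer name for the tf.python_io.TFRecordWriter used above), writes to a temporary path made up for this example, and uses a with block so the file is closed automatically:

```python
import os
import tempfile

import tensorflow as tf

# A one-feature Example with an illustrative label value.
example = tf.train.Example(features=tf.train.Features(feature={
    'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
}))

# Serialization produces a plain bytes string.
serialized = example.SerializeToString()

# Parse the string back to confirm that nothing was lost.
parsed = tf.train.Example.FromString(serialized)
label = parsed.features.feature['label'].int64_list.value[0]

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'check.tfrecord')

# The with block closes the writer even if an error occurs while writing.
with tf.io.TFRecordWriter(path) as writer:
    writer.write(serialized)

file_size = os.path.getsize(path)
```

The file on disk is larger than the serialized string because each record is stored together with its length and checksums.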
