Sharding

Although we said earlier that it is best to keep all our data in one file, this is not entirely true. Because TFRecords are read sequentially, we cannot properly shuffle a dataset that is stored in a single file. Each time you reach the end of the TFRecord after an epoch of training, you go back to the start of the dataset, and the data comes out in the same order on every pass through the file.

To make shuffling possible, we can shard our data: create multiple TFRecord files and spread the examples across them. We can then shuffle the order in which the TFRecord files are loaded each epoch, so that our data is effectively shuffled as we train. Something like 1,000 shards for every million images (roughly 1,000 images per shard) is a good baseline to follow.
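
The sharding idea can be sketched in plain Python, as a minimal illustration rather than a full implementation. The helper names (`shard_filenames`, `assign_to_shards`, `epoch_file_order`), the `train-00000-of-00004.tfrecord` naming scheme, and the round-robin assignment are all assumptions made for this sketch. Writing the records themselves is left out; in practice each shard would be written with something like `tf.io.TFRecordWriter`.

```python
import random


def shard_filenames(prefix, num_shards):
    """Build one file name per shard (the naming scheme is an assumption)."""
    return [
        f"{prefix}-{i:05d}-of-{num_shards:05d}.tfrecord"
        for i in range(num_shards)
    ]


def assign_to_shards(num_examples, num_shards):
    """Spread example indices across shards round-robin."""
    shards = [[] for _ in range(num_shards)]
    for example_index in range(num_examples):
        shards[example_index % num_shards].append(example_index)
    return shards


def epoch_file_order(filenames, epoch, seed=0):
    """Return the shard files in a pseudo-random order for the given epoch."""
    order = list(filenames)  # copy so the caller's list is left untouched
    random.Random(seed + epoch).shuffle(order)
    return order


# Baseline from the text: about 1,000 shards per million images.
num_images = 1_000_000
num_shards = max(1, num_images // 1000)
images_per_shard = num_images // num_shards

# Small demonstration with 4 shards: print the load order for 3 epochs.
files = shard_filenames("train", 4)
for epoch in range(3):
    print(epoch, epoch_file_order(files, epoch))
```

Seeding the shuffle with the epoch number keeps each epoch's file order reproducible. Note that only the order of the files changes here; the examples inside each shard are still read in the order they were written.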

In the next section, we will see how to use our TFRecords to make efficient data feeding pipelines.
