Epochs, iterations, and batch sizes

As our dataset is much larger now, we also need to think about the practicalities of training on it. Training on an item-by-item basis works, but we can also train on items in batches. Instead of feeding all 60,000 MNIST items one at a time, we can split the data into 600 iterations, with a batch of 100 items in each. For our dataset, this means feeding our model a 100 x 784 matrix as input instead of a single 784-value-long vector. We could also feed it a three-dimensional tensor of 100 x 28 x 28, but we'll do that in a later chapter, when we cover a model architecture that makes good use of this structure.
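
To make the arithmetic concrete, here's a minimal sketch of how those numbers relate, using the same names (numExamples, bs, and batches) that the training code later in this section uses:

numExamples := 60000        // size of the MNIST training set
bs := 100                   // batch size
batches := numExamples / bs // 600 batches (iterations) per epoch
// each batch is fed to the model as a bs x 784 matrix
// (or, in a later chapter, as a bs x 28 x 28 tensor)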

Since we are doing this in a programming language, we can just build a loop as follows:

for b := 0; b < batches; b++ {
    // compute the bounds of this batch within the dataset
    start := b * bs
    end := start + bs
    if start >= numExamples {
        break
    }
    if end > numExamples {
        end = numExamples
    }
}

Then, within each iteration of the loop, we can insert the logic that extracts the batch we need and feeds it into our machine:

var xVal, yVal tensor.Tensor
// slice out this batch's rows from the inputs and targets
if xVal, err = inputs.Slice(sli{start, end}); err != nil {
    log.Fatal("Unable to slice x")
}

if yVal, err = targets.Slice(sli{start, end}); err != nil {
    log.Fatal("Unable to slice y")
}
// if err = xVal.(*tensor.Dense).Reshape(bs, 1, 28, 28); err != nil {
//     log.Fatalf("Unable to reshape: %v", err)
// }
if err = xVal.(*tensor.Dense).Reshape(bs, 784); err != nil {
    log.Fatalf("Unable to reshape: %v", err)
}

// bind the batch to the graph's input and target nodes, run the VM,
// then take a solver step and reset the VM for the next batch
gorgonia.Let(x, xVal)
gorgonia.Let(y, yVal)
if err = vm.RunAll(); err != nil {
    log.Fatalf("Failed at epoch %d: %v", i, err)
}
solver.Step(m.learnables())
vm.Reset()
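
The sli type used in the Slice calls is a small helper defined elsewhere in the chapter. As a minimal sketch (assuming it only needs to describe a contiguous range with a step of 1), it satisfies the tensor package's Slice interface along these lines:

// sli describes a half-open range [start, end) with a step of 1,
// satisfying the tensor.Slice interface.
type sli struct {
    start, end int
}

func (s sli) Start() int { return s.start }
func (s sli) End() int   { return s.end }
func (s sli) Step() int  { return 1 }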

Another term you'll hear a lot in deep learning is epoch. An epoch is simply one complete pass of your input data through the network; running multiple epochs means feeding the same data through the network multiple times. If you recall, gradient descent is an iterative process: it depends heavily on repetition to converge to an optimal solution. This means that we have a simple way of improving our model despite having only 60,000 training images: we can repeat the process a number of times until our network converges.

We can certainly manage this in several different ways. For example, we can stop repeating when the difference in our loss between the previous epoch and the current epoch is small enough. We can also run a champion-challenger approach and take the weights from whichever epoch emerges as the champion on our test set. However, as we want to keep our example simple, we'll pick an arbitrary number of epochs; in this case, 100.
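
As a rough sketch (not part of this chapter's code), an early-stopping check based on the change in loss between epochs might look like the following. Here, trainOneEpoch, maxEpochs, and tol are hypothetical names: trainOneEpoch would run every batch once and return the average loss, and the standard library's math package supplies Inf and Abs.

const tol = 1e-4 // stop once the loss changes by less than this between epochs
prevLoss := math.Inf(1)
for i := 0; i < maxEpochs; i++ {
    currLoss := trainOneEpoch() // hypothetical helper: runs all batches, returns average loss
    if math.Abs(prevLoss-currLoss) < tol {
        log.Printf("Converged at epoch %d", i)
        break
    }
    prevLoss = currLoss
}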

While we're at it, let's also add a progress bar so we can watch our model train:

batches := numExamples / bs
log.Printf("Batches %d", batches)
bar := pb.New(batches)
bar.SetRefreshRate(time.Second / 20)
bar.SetMaxWidth(80)

for i := 0; i < *epochs; i++ {
    // for i := 0; i < 1; i++ {
    bar.Prefix(fmt.Sprintf("Epoch %d", i))
    bar.Set(0)
    bar.Start()
    // insert the iteration and batch logic from earlier here
    bar.Update()
    log.Printf("Epoch %d | cost %v", i, costVal)
}
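
Putting the pieces together, here is a hedged sketch of how the epoch loop, the batch loop, and the progress bar nest. The call to bar.Increment is our assumption about where to advance the bar, once per batch; the slicing and VM logic is elided, as it is identical to the code shown earlier:

for i := 0; i < *epochs; i++ {
    bar.Prefix(fmt.Sprintf("Epoch %d", i))
    bar.Set(0)
    bar.Start()
    for b := 0; b < batches; b++ {
        start := b * bs
        end := start + bs
        if start >= numExamples {
            break
        }
        if end > numExamples {
            end = numExamples
        }
        // ... slice xVal and yVal, Let, RunAll, Step, and Reset as shown earlier ...
        bar.Increment() // advance the bar by one batch
    }
    bar.Update()
    log.Printf("Epoch %d | cost %v", i, costVal)
}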