Running the neural network

Observe that up to this point, we've merely described the computations we need to perform. The neural network doesn't actually run anything yet; what we have is simply a description of the neural network to be run.

We need to be able to evaluate the mathematical expression. To do so, we compile the expression into a program that can be executed. Here's the code to do it:

    prog, locMap, _ := gorgonia.Compile(g) // compile the expression graph into a program
    vm := gorgonia.NewTapeMachine(g,
        gorgonia.WithPrecompiled(prog, locMap),
        gorgonia.BindDualValues(m.learnables()...))
    solver := gorgonia.NewRMSPropSolver(gorgonia.WithBatchSize(float64(bs)))
    defer vm.Close()

It's not strictly necessary to call gorgonia.Compile(g). This was done for pedagogical reasons, to showcase that the mathematical expression can indeed be compiled down into an assembly-like program. In production systems, I often just do something like this: vm := gorgonia.NewTapeMachine(g, gorgonia.BindDualValues(m.learnables()...)).
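If you want to inspect the compiled program yourself, printing it is enough; the pseudo-assembly listing further below is exactly this kind of output. A minimal sketch, assuming prog came from the gorgonia.Compile(g) call above:

    // Sketch: the compiled program prints itself as a
    // pseudo-assembly listing.
    log.Printf("%v", prog)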

There are two provided vm types in Gorgonia, each representing a different mode of computation. In this project, we're merely using NewTapeMachine to get a *gorgonia.tapeMachine. The function that creates a vm takes many options; the BindDualValues option simply binds the gradients of each variable in the model to the variable itself. This allows for cheaper gradient descent.
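To see what BindDualValues buys us: once the gradients live alongside the values, reading a gradient back after a run is a cheap lookup rather than a separate traversal. A minimal sketch, assuming w0 is one of the learnable weight nodes and vm.RunAll() has been called at least once:

    // Sketch: with BindDualValues, each learnable node carries its
    // gradient as part of a dual value, so Grad() is a direct read.
    if grad, err := w0.Grad(); err == nil {
        fmt.Printf("gradient of cost w.r.t. w0: %v\n", grad)
    }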

Lastly, note that a VM is a resource. You should think of a VM as if it were an external CPU, a computing resource. It is good practice to close any external resources after we use them and, fortunately, Go has a very convenient way of handling cleanups: defer vm.Close().

Before we move on to talk about gradient descent, here's what the compiled program looks like, in pseudo-assembly:

```
Instructions:
0 loadArg 0 (x) to CPU0
1 loadArg 1 (y) to CPU1
2 loadArg 2 (w0) to CPU2
3 loadArg 3 (w1) to CPU3
4 loadArg 4 (w2) to CPU4
5 loadArg 5 (w3) to CPU5
6 loadArg 6 (w4) to CPU6
7 im2col<(3,3), (1, 1), (1,1) (1, 1)> [CPU0] CPU7 false false false
8 Reshape(32, 9) [CPU2] CPU8 false false false
9 Reshape(78400, 9) [CPU7] CPU7 false true false
10 Alloc Matrix float64(78400, 32) CPU9
11 A × Bᵀ [CPU7 CPU8] CPU9 true false true
12 DoWork
13 Reshape(100, 28, 28, 32) [CPU9] CPU9 false true false
14 Aᵀ{0, 3, 1, 2} [CPU9] CPU9 false true false
15 const 0 [] CPU10 false false false
16 >= true [CPU9 CPU10] CPU11 false false false
17 ⊙ false [CPU9 CPU11] CPU9 false true false
18 MaxPool{100, 32, 28, 28}(kernel: (2, 2), pad: (0, 0), stride: (2, 2)) [CPU9] CPU12 false false false
19 0(0, 1) - (100, 32, 14, 14) [] CPU13 false false false
20 const 0.2 [] CPU14 false false false
21 > true [CPU13 CPU14] CPU15 false false false
22 ⊙ false [CPU12 CPU15] CPU12 false true false
23 const 5 [] CPU16 false false false
24 ÷ false [CPU12 CPU16] CPU12 false true false
25 im2col<(3,3), (1, 1), (1,1) (1, 1)> [CPU12] CPU17 false false false
26 Reshape(64, 288) [CPU3] CPU18 false false false
27 Reshape(19600, 288) [CPU17] CPU17 false true false
28 Alloc Matrix float64(19600, 64) CPU19
29 A × Bᵀ [CPU17 CPU18] CPU19 true false true
30 DoWork
31 Reshape(100, 14, 14, 64) [CPU19] CPU19 false true false
32 Aᵀ{0, 3, 1, 2} [CPU19] CPU19 false true false
33 >= true [CPU19 CPU10] CPU20 false false false
34 ⊙ false [CPU19 CPU20] CPU19 false true false
35 MaxPool{100, 64, 14, 14}(kernel: (2, 2), pad: (0, 0), stride: (2, 2)) [CPU19] CPU21 false false false
36 0(0, 1) - (100, 64, 7, 7) [] CPU22 false false false
37 > true [CPU22 CPU14] CPU23 false false false
38 ⊙ false [CPU21 CPU23] CPU21 false true false
39 ÷ false [CPU21 CPU16] CPU21 false true false
40 im2col<(3,3), (1, 1), (1,1) (1, 1)> [CPU21] CPU24 false false false
41 Reshape(128, 576) [CPU4] CPU25 false false false
42 Reshape(4900, 576) [CPU24] CPU24 false true false
43 Alloc Matrix float64(4900, 128) CPU26
44 A × Bᵀ [CPU24 CPU25] CPU26 true false true
45 DoWork
46 Reshape(100, 7, 7, 128) [CPU26] CPU26 false true false
47 Aᵀ{0, 3, 1, 2} [CPU26] CPU26 false true false
48 >= true [CPU26 CPU10] CPU27 false false false
49 ⊙ false [CPU26 CPU27] CPU26 false true false
50 MaxPool{100, 128, 7, 7}(kernel: (2, 2), pad: (0, 0), stride: (2, 2)) [CPU26] CPU28 false false false
51 Reshape(100, 1152) [CPU28] CPU28 false true false
52 0(0, 1) - (100, 1152) [] CPU29 false false false
53 > true [CPU29 CPU14] CPU30 false false false
54 ⊙ false [CPU28 CPU30] CPU28 false true false
55 ÷ false [CPU28 CPU16] CPU28 false true false
56 Alloc Matrix float64(100, 625) CPU31
57 A × B [CPU28 CPU5] CPU31 true false true
58 DoWork
59 >= true [CPU31 CPU10] CPU32 false false false
60 ⊙ false [CPU31 CPU32] CPU31 false true false
61 0(0, 1) - (100, 625) [] CPU33 false false false
62 const 0.55 [] CPU34 false false false
63 > true [CPU33 CPU34] CPU35 false false false
64 ⊙ false [CPU31 CPU35] CPU31 false true false
65 const 1.8181818181818181 [] CPU36 false false false
66 ÷ false [CPU31 CPU36] CPU31 false true false
67 Alloc Matrix float64(100, 10) CPU37
68 A × B [CPU31 CPU6] CPU37 true false true
69 DoWork
70 exp [CPU37] CPU37 false true false
71 Σ[1] [CPU37] CPU38 false false false
72 SizeOf=10 [CPU37] CPU39 false false false
73 Repeat[1] [CPU38 CPU39] CPU40 false false false
74 ÷ false [CPU37 CPU40] CPU37 false true false
75 ⊙ false [CPU37 CPU1] CPU37 false true false
76 Σ[0 1] [CPU37] CPU41 false false false
77 SizeOf=100 [CPU37] CPU42 false false false
78 SizeOf=10 [CPU37] CPU43 false false false
79 ⊙ false [CPU42 CPU43] CPU44 false false false
80 ÷ false [CPU41 CPU44] CPU45 false false false
81 neg [CPU45] CPU46 false false false
82 DoWork
83 Read CPU46 into 0xc43ca407d0
84 Free CPU0
Args: 11 | CPU Memories: 47 | GPU Memories: 0
CPU Mem: 133594448 | GPU Mem []
```

Printing the program gives you a feel for the complexity of the neural network. At 85 instructions, this convnet is among the simpler programs I've seen. There are, however, quite a few expensive operations (the matrix multiplications and im2col calls in particular), which gives us a hint of how long each run will take. The output also tells us roughly how much memory will be used: 133594448 bytes, or about 133 megabytes.

Now it's time to talk about gradient descent. Gorgonia comes with a number of gradient descent solvers. For this project, we'll be using the RMSProp algorithm, so we create a solver by calling solver := gorgonia.NewRMSPropSolver(gorgonia.WithBatchSize(float64(bs))). Because we plan to perform our operations in batches, we need to correct the solver by providing it with the batch size, lest the solver overshoot its target.
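The solver constructor accepts other options as well; for instance, the learning rate can be set explicitly. A hedged sketch (WithLearnRate and the 0.001 value are illustrative only; this project relies on the solver's defaults):

    // Sketch: an RMSProp solver corrected for the batch size, with
    // an explicit learning rate. The 0.001 is illustrative, not the
    // value used in this chapter.
    solver := gorgonia.NewRMSPropSolver(
        gorgonia.WithBatchSize(float64(bs)),
        gorgonia.WithLearnRate(0.001),
    )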

To run the neural network, we simply run it for a number of epochs (which is passed in as an argument to the program):

    batches := numExamples / bs
    log.Printf("Batches %d", batches)
    bar := pb.New(batches)
    bar.SetRefreshRate(time.Second)
    bar.SetMaxWidth(80)

    for i := 0; i < *epochs; i++ {
        bar.Prefix(fmt.Sprintf("Epoch %d", i))
        bar.Set(0)
        bar.Start()
        for b := 0; b < batches; b++ {
            start := b * bs
            end := start + bs
            if start >= numExamples {
                break
            }
            if end > numExamples {
                end = numExamples
            }

            var xVal, yVal tensor.Tensor
            if xVal, err = inputs.Slice(sli{start, end}); err != nil {
                log.Fatal("Unable to slice x")
            }
            if yVal, err = targets.Slice(sli{start, end}); err != nil {
                log.Fatal("Unable to slice y")
            }
            if err = xVal.(*tensor.Dense).Reshape(bs, 1, 28, 28); err != nil {
                log.Fatalf("Unable to reshape %v", err)
            }

            gorgonia.Let(x, xVal)
            gorgonia.Let(y, yVal)
            if err = vm.RunAll(); err != nil {
                log.Fatalf("Failed at epoch %d: %v", i, err)
            }
            solver.Step(gorgonia.NodesToValueGrads(m.learnables()))
            vm.Reset()
            bar.Increment()
        }
        log.Printf("Epoch %d | cost %v", i, costVal)
    }

Because I was feeling a bit fancy, I decided to add a progress bar. To draw it, I'm using the cheggaaa/pb.v1 library. To install it, simply run go get gopkg.in/cheggaaa/pb.v1, and to use it, add import "gopkg.in/cheggaaa/pb.v1" to the imports.

The rest is fairly straightforward. From the training dataset, we slice out a small portion (specifically, bs rows at a time). Because our program takes a rank-4 tensor as its input, the data has to be reshaped with xVal.(*tensor.Dense).Reshape(bs, 1, 28, 28).
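Note that sli is not part of the tensor package; it's a small helper type that satisfies the tensor package's Slice interface. A minimal sketch of how such a helper might be defined:

    // sli implements tensor.Slice, describing the half-open
    // range [start, end) with a step of 1.
    type sli struct{ start, end int }

    func (s sli) Start() int { return s.start }
    func (s sli) End() int   { return s.end }
    func (s sli) Step() int  { return 1 }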

Finally, we feed the values into the function using gorgonia.Let. Where gorgonia.Read reads a value out of the execution environment, gorgonia.Let puts a value into the execution environment. After that, vm.RunAll() executes the program, evaluating the mathematical function. As a programmed and intentional side effect, each call to vm.RunAll() populates costVal with the freshly computed cost.
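That side effect is wired up at graph-construction time with gorgonia.Read. A minimal sketch, assuming the loss node is named cost:

    // Sketch: Read records the value of the cost node into costVal
    // every time the VM runs.
    var costVal gorgonia.Value
    gorgonia.Read(cost, &costVal)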

Once the equation has been evaluated, the variables of the equation are ready to be updated. As such, we use solver.Step(gorgonia.NodesToValueGrads(m.learnables())) to perform the actual gradient updates. After this, vm.Reset() is called to reset the VM state, ready for the next iteration.
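Note that Step returns an error, which the loop above quietly discards; in your own code, you may prefer to check it:

    // Sketch: checking the error returned by the solver.
    if err := solver.Step(gorgonia.NodesToValueGrads(m.learnables())); err != nil {
        log.Fatalf("Failed to update weights at epoch %d: %v", i, err)
    }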

Gorgonia, in general, is pretty efficient. As of the version current when this book was written, it managed to use all eight cores of my CPU.
