Running the neural network

Observe that up to this point, we've merely described the computations we need to perform. The neural network doesn't actually run anything yet; what we have is simply a description of the neural network to be run.

We need to be able to evaluate the mathematical expression. To do so, we compile the expression into a program that can be executed. Here's the code to do it:

    prog, locMap, _ := gorgonia.Compile(g) // compile the expression graph into a program
    vm := gorgonia.NewTapeMachine(g,
        gorgonia.WithPrecompiled(prog, locMap),
        gorgonia.BindDualValues(m.learnables()...))
    solver := gorgonia.NewRMSPropSolver(gorgonia.WithBatchSize(float64(bs)))
    defer vm.Close()

It's not strictly necessary to call gorgonia.Compile(g). This was done for pedagogical reasons, to showcase that the mathematical expression can indeed be compiled down into an assembly-like program. In production systems, I often just do something like this: vm := gorgonia.NewTapeMachine(g, gorgonia.BindDualValues(m.learnables()...)).
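If you want to inspect the compiled program yourself, printing it is enough; the pseudo-assembly listing further below is exactly this kind of output. A minimal sketch, assuming prog came from the gorgonia.Compile(g) call above:

    // Sketch: the compiled program prints itself as a
    // pseudo-assembly listing.
    log.Printf("%v", prog)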

There are two provided vm types in Gorgonia, each representing a different mode of computation. In this project, we're merely using NewTapeMachine to get a *gorgonia.tapeMachine. The function that creates a vm takes many options; the BindDualValues option simply binds the gradients of each variable in the model to the variable itself. This allows for cheaper gradient descent.
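To see what BindDualValues buys us: once the gradients live alongside the values, reading a gradient back after a run is a cheap lookup rather than a separate traversal. A minimal sketch, assuming w0 is one of the learnable weight nodes and vm.RunAll() has been called at least once:

    // Sketch: with BindDualValues, each learnable node carries its
    // gradient as part of a dual value, so Grad() is a direct read.
    if grad, err := w0.Grad(); err == nil {
        fmt.Printf("gradient of cost w.r.t. w0: %v\n", grad)
    }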

Lastly, note that a VM is a resource. You should think of a VM as if it were an external CPU, a computing resource. It is good practice to close any external resources after we use them and, fortunately, Go has a very convenient way of handling cleanups: defer vm.Close().

Before we move on to talk about gradient descent, here's what the compiled program looks like, in pseudo-assembly:

```
Instructions:
0 loadArg 0 (x) to CPU0
1 loadArg 1 (y) to CPU1
2 loadArg 2 (w0) to CPU2
3 loadArg 3 (w1) to CPU3
4 loadArg 4 (w2) to CPU4
5 loadArg 5 (w3) to CPU5
6 loadArg 6 (w4) to CPU6
7 im2col<(3,3), (1, 1), (1,1) (1, 1)> [CPU0] CPU7 false false false
8 Reshape(32, 9) [CPU2] CPU8 false false false
9 Reshape(78400, 9) [CPU7] CPU7 false true false
10 Alloc Matrix float64(78400, 32) CPU9
11 A × Bᵀ [CPU7 CPU8] CPU9 true false true
12 DoWork
13 Reshape(100, 28, 28, 32) [CPU9] CPU9 false true false
14 Aᵀ{0, 3, 1, 2} [CPU9] CPU9 false true false
15 const 0 [] CPU10 false false false
16 >= true [CPU9 CPU10] CPU11 false false false
17 ⊙ false [CPU9 CPU11] CPU9 false true false
18 MaxPool{100, 32, 28, 28}(kernel: (2, 2), pad: (0, 0), stride: (2, 2)) [CPU9] CPU12 false false false
19 0(0, 1) - (100, 32, 14, 14) [] CPU13 false false false
20 const 0.2 [] CPU14 false false false
21 > true [CPU13 CPU14] CPU15 false false false
22 ⊙ false [CPU12 CPU15] CPU12 false true false
23 const 5 [] CPU16 false false false
24 ÷ false [CPU12 CPU16] CPU12 false true false
25 im2col<(3,3), (1, 1), (1,1) (1, 1)> [CPU12] CPU17 false false false
26 Reshape(64, 288) [CPU3] CPU18 false false false
27 Reshape(19600, 288) [CPU17] CPU17 false true false
28 Alloc Matrix float64(19600, 64) CPU19
29 A × Bᵀ [CPU17 CPU18] CPU19 true false true
30 DoWork
31 Reshape(100, 14, 14, 64) [CPU19] CPU19 false true false
32 Aᵀ{0, 3, 1, 2} [CPU19] CPU19 false true false
33 >= true [CPU19 CPU10] CPU20 false false false
34 ⊙ false [CPU19 CPU20] CPU19 false true false
35 MaxPool{100, 64, 14, 14}(kernel: (2, 2), pad: (0, 0), stride: (2, 2)) [CPU19] CPU21 false false false
36 0(0, 1) - (100, 64, 7, 7) [] CPU22 false false false
37 > true [CPU22 CPU14] CPU23 false false false
38 ⊙ false [CPU21 CPU23] CPU21 false true false
39 ÷ false [CPU21 CPU16] CPU21 false true false
40 im2col<(3,3), (1, 1), (1,1) (1, 1)> [CPU21] CPU24 false false false
41 Reshape(128, 576) [CPU4] CPU25 false false false
42 Reshape(4900, 576) [CPU24] CPU24 false true false
43 Alloc Matrix float64(4900, 128) CPU26
44 A × Bᵀ [CPU24 CPU25] CPU26 true false true
45 DoWork
46 Reshape(100, 7, 7, 128) [CPU26] CPU26 false true false
47 Aᵀ{0, 3, 1, 2} [CPU26] CPU26 false true false
48 >= true [CPU26 CPU10] CPU27 false false false
49 ⊙ false [CPU26 CPU27] CPU26 false true false
50 MaxPool{100, 128, 7, 7}(kernel: (2, 2), pad: (0, 0), stride: (2, 2)) [CPU26] CPU28 false false false
51 Reshape(100, 1152) [CPU28] CPU28 false true false
52 0(0, 1) - (100, 1152) [] CPU29 false false false
53 > true [CPU29 CPU14] CPU30 false false false
54 ⊙ false [CPU28 CPU30] CPU28 false true false
55 ÷ false [CPU28 CPU16] CPU28 false true false
56 Alloc Matrix float64(100, 625) CPU31
57 A × B [CPU28 CPU5] CPU31 true false true
58 DoWork
59 >= true [CPU31 CPU10] CPU32 false false false
60 ⊙ false [CPU31 CPU32] CPU31 false true false
61 0(0, 1) - (100, 625) [] CPU33 false false false
62 const 0.55 [] CPU34 false false false
63 > true [CPU33 CPU34] CPU35 false false false
64 ⊙ false [CPU31 CPU35] CPU31 false true false
65 const 1.8181818181818181 [] CPU36 false false false
66 ÷ false [CPU31 CPU36] CPU31 false true false
67 Alloc Matrix float64(100, 10) CPU37
68 A × B [CPU31 CPU6] CPU37 true false true
69 DoWork
70 exp [CPU37] CPU37 false true false
71 Σ[1] [CPU37] CPU38 false false false
72 SizeOf=10 [CPU37] CPU39 false false false
73 Repeat[1] [CPU38 CPU39] CPU40 false false false
74 ÷ false [CPU37 CPU40] CPU37 false true false
75 ⊙ false [CPU37 CPU1] CPU37 false true false
76 Σ[0 1] [CPU37] CPU41 false false false
77 SizeOf=100 [CPU37] CPU42 false false false
78 SizeOf=10 [CPU37] CPU43 false false false
79 ⊙ false [CPU42 CPU43] CPU44 false false false
80 ÷ false [CPU41 CPU44] CPU45 false false false
81 neg [CPU45] CPU46 false false false
82 DoWork
83 Read CPU46 into 0xc43ca407d0
84 Free CPU0
Args: 11 | CPU Memories: 47 | GPU Memories: 0
CPU Mem: 133594448 | GPU Mem []
```

Printing the program gives you a feel for the complexity of the neural network. At 85 instructions, this convnet is among the simpler programs I've seen. There are, however, quite a few expensive operations (the matrix multiplications and im2col calls in particular), which gives us a hint of how long each run will take. The output also tells us roughly how much memory will be used: 133594448 bytes, or about 133 megabytes.

Now it's time to talk about gradient descent. Gorgonia comes with a number of gradient descent solvers. For this project, we'll be using the RMSProp algorithm, so we create a solver by calling solver := gorgonia.NewRMSPropSolver(gorgonia.WithBatchSize(float64(bs))). Because we plan to perform our operations in batches, we need to correct the solver by providing it with the batch size, lest the solver overshoot its target.
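The solver constructor accepts other options as well; for instance, the learning rate can be set explicitly. A hedged sketch (WithLearnRate and the 0.001 value are illustrative only; this project relies on the solver's defaults):

    // Sketch: an RMSProp solver corrected for the batch size, with
    // an explicit learning rate. The 0.001 is illustrative, not the
    // value used in this chapter.
    solver := gorgonia.NewRMSPropSolver(
        gorgonia.WithBatchSize(float64(bs)),
        gorgonia.WithLearnRate(0.001),
    )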

To run the neural network, we simply run it for a number of epochs (which is passed in as an argument to the program):

    batches := numExamples / bs
    log.Printf("Batches %d", batches)
    bar := pb.New(batches)
    bar.SetRefreshRate(time.Second)
    bar.SetMaxWidth(80)

    for i := 0; i < *epochs; i++ {
        bar.Prefix(fmt.Sprintf("Epoch %d", i))
        bar.Set(0)
        bar.Start()
        for b := 0; b < batches; b++ {
            start := b * bs
            end := start + bs
            if start >= numExamples {
                break
            }
            if end > numExamples {
                end = numExamples
            }

            var xVal, yVal tensor.Tensor
            if xVal, err = inputs.Slice(sli{start, end}); err != nil {
                log.Fatal("Unable to slice x")
            }
            if yVal, err = targets.Slice(sli{start, end}); err != nil {
                log.Fatal("Unable to slice y")
            }
            if err = xVal.(*tensor.Dense).Reshape(bs, 1, 28, 28); err != nil {
                log.Fatalf("Unable to reshape %v", err)
            }

            gorgonia.Let(x, xVal)
            gorgonia.Let(y, yVal)
            if err = vm.RunAll(); err != nil {
                log.Fatalf("Failed at epoch %d: %v", i, err)
            }
            solver.Step(gorgonia.NodesToValueGrads(m.learnables()))
            vm.Reset()
            bar.Increment()
        }
        log.Printf("Epoch %d | cost %v", i, costVal)
    }

Because I was feeling a bit fancy, I decided to add a progress bar. To draw it, I'm using the cheggaaa/pb.v1 library. To install it, simply run go get gopkg.in/cheggaaa/pb.v1, and to use it, add import "gopkg.in/cheggaaa/pb.v1" to the imports.

The rest is fairly straightforward. From the training dataset, we slice out a small portion (specifically, bs rows at a time). Because our program takes a rank-4 tensor as its input, the data has to be reshaped with xVal.(*tensor.Dense).Reshape(bs, 1, 28, 28).
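Note that sli is not part of the tensor package; it's a small helper type that satisfies the tensor package's Slice interface. A minimal sketch of how such a helper might be defined:

    // sli implements tensor.Slice, describing the half-open
    // range [start, end) with a step of 1.
    type sli struct{ start, end int }

    func (s sli) Start() int { return s.start }
    func (s sli) End() int   { return s.end }
    func (s sli) Step() int  { return 1 }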

Finally, we feed the values into the function using gorgonia.Let. Where gorgonia.Read reads a value out of the execution environment, gorgonia.Let puts a value into the execution environment. After that, vm.RunAll() executes the program, evaluating the mathematical function. As a programmed and intentional side effect, each call to vm.RunAll() populates costVal with the freshly computed cost.
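That side effect is wired up at graph-construction time with gorgonia.Read. A minimal sketch, assuming the loss node is named cost:

    // Sketch: Read records the value of the cost node into costVal
    // every time the VM runs.
    var costVal gorgonia.Value
    gorgonia.Read(cost, &costVal)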

Once the equation has been evaluated, the variables of the equation are ready to be updated. As such, we use solver.Step(gorgonia.NodesToValueGrads(m.learnables())) to perform the actual gradient updates. After this, vm.Reset() is called to reset the VM state, ready for the next iteration.
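Note that Step returns an error, which the loop above quietly discards; in your own code, you may prefer to check it:

    // Sketch: checking the error returned by the solver.
    if err := solver.Step(gorgonia.NodesToValueGrads(m.learnables())); err != nil {
        log.Fatalf("Failed to update weights at epoch %d: %v", i, err)
    }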

Gorgonia, in general, is pretty efficient. As of the version current when this book was written, it managed to use all eight cores of my CPU.
