The structure of a neural network

The purpose of this example is clearly not to build a cutting-edge computer vision system but, rather, to demonstrate how to use these fundamental operations (and how Gorgonia handles them) in the context of a parameterized function where the parameters are learned over time. The key goal of this section is to understand the idea of a network that learns. This learning really just means the continuous, deliberate re-parameterization of the network (updating the weights). This is done by an optimization method that is, essentially, a small amount of code representing some basic undergraduate-level calculus.

The Sigmoid function (and activation functions more generally), Stochastic Gradient Descent (SGD), and backpropagation will each receive detailed treatment in later sections of this chapter. For now, we will talk about them in the context of the code; that is, where and how they are used and what their role is in the function we are computing.

By the time you reach the end of this book, or if you are an experienced ML practitioner, the following will look like an absurdly simple first step into the world of neural network architectures. But if this is your first rodeo, pay close attention. All of the fundamentals that make the magic happen are here.

What is the network made of? The following are the major components of our toy example neural network:

  • Input data: This is a 4 x 3 matrix.
  • Validation data: This is a 4 x 1 column vector; in other words, a matrix with four rows and one column, expressed in Gorgonia as WithShape(4,1). It holds the expected output our network should learn to reproduce (a sketch of how both tensors might be declared follows this list).
  • An activation (Sigmoid) function: This introduces nonlinearity into our network and the function we are learning.
  • A synapse: Also called a trainable weight, this is the key parameter of the network, and the one we will be optimizing with SGD.
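
To make these shapes concrete, the following is a minimal sketch (not the book's exact code) of how the input and validation tensors could be declared. The variable names and the example feature values are assumptions for illustration, and the snippet assumes a dot-import of gorgonia.org/gorgonia (as the unqualified *ExprGraph later in this section suggests) along with an import of gorgonia.org/tensor:

g := NewGraph()

// Hypothetical input values: four examples with three features each (4 x 3).
xT := tensor.New(tensor.WithBacking([]float64{
    0, 0, 1,
    0, 1, 1,
    1, 0, 1,
    1, 1, 1,
}), tensor.WithShape(4, 3))
x := NewMatrix(g, tensor.Float64, WithShape(4, 3), WithName("X"), WithValue(xT))

// The expected output: the columnar sequence 0, 0, 1, 1, as a 4 x 1 column vector.
yT := tensor.New(tensor.WithBacking([]float64{0, 0, 1, 1}), tensor.WithShape(4, 1))
y := NewMatrix(g, tensor.Float64, WithShape(4, 1), WithName("y"), WithValue(yT))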

Each of these components, along with its associated operations, is represented as a node on our computational graph. As we move through the explanation of what the network is doing, we will generate visualizations of the graph using the techniques we learned in Chapter 1, Introduction to Deep Learning in Go.

We are also going to over-engineer our network a little. What does this mean? Consider the following chunk of code:

type nn struct {
    g      *ExprGraph // the expression graph that the network's nodes live on
    w0, w1 *Node      // trainable weights, one per layer

    pred *Node // the node holding the network's prediction
}

We are embedding the key components of the network in a struct named nn. This not only makes our code readable, but also scales well when we want to perform our optimization process (SGD/backpropagation) on a number of weights for each layer of a deep (many-layered) network. As you can see, beyond the weights for each layer, we also have a node representing the prediction our network makes, as well as the *ExprGraph itself.
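
As a rough illustration of how such a struct might be initialized, here is a minimal constructor sketch. The function name newNN, the 3 x 1 weight shape, and the Glorot initialization are assumptions for this example, not the book's exact code:

func newNN(g *ExprGraph) *nn {
    // w0 maps the three input features to a single output, so it is a
    // 3 x 1 matrix, randomly initialized (Glorot initialization is an
    // assumption here).
    w0 := NewMatrix(g,
        tensor.Float64,
        WithShape(3, 1),
        WithName("w0"),
        WithInit(GlorotU(1.0)),
    )
    return &nn{g: g, w0: w0}
}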

Our network has two layers, which are computed during the forward pass. A forward pass represents all of the numerical transformations we want to perform on the value nodes in our computation graph.

Specifically, we have the following:

  • l0: The input matrix, our X
  • w0: The trainable parameter, our network weight that will be optimized by the SGD algorithm
  • l1: The value of the Sigmoid applied to the dot product of l0 and w0
  • pred: A node that represents the prediction of the network, fed back to the appropriate field in the nn struct (a sketch of this forward pass follows the list)
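
The following is a minimal sketch of what such a forward pass could look like as a method on our nn struct; the method name fwd and the intermediate variable names are assumptions, but the operations mirror the list above:

func (m *nn) fwd(x *Node) error {
    // l0 is simply the input matrix, our X.
    l0 := x

    // The dot product of l0 and w0.
    dot, err := Mul(l0, m.w0)
    if err != nil {
        return err
    }

    // l1: the Sigmoid applied to that dot product, introducing nonlinearity.
    l1, err := Sigmoid(dot)
    if err != nil {
        return err
    }

    // The prediction is fed back into the struct for later use.
    m.pred = l1
    return nil
}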

So, what are we aiming to achieve here?

We want to build a system that learns a function that best models the columnar sequence of 0, 0, 1, 1. Time to dive in!
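
To preview how the pieces fit together before SGD and backpropagation get their detailed treatment later in this chapter, here is a hedged training-loop sketch. The mean-squared-error cost, the learning rate of 0.1, and the method name train are assumptions for illustration, not the book's exact code:

func (m *nn) train(x, y *Node, epochs int) error {
    if err := m.fwd(x); err != nil {
        return err
    }

    // cost = mean((pred - y)^2): a simple squared-error measure (an assumption).
    losses := Must(Square(Must(Sub(m.pred, y))))
    cost := Must(Mean(losses))

    // Symbolic backpropagation: gradients of cost with respect to w0.
    if _, err := Grad(cost, m.w0); err != nil {
        return err
    }

    vm := NewTapeMachine(m.g, BindDualValues(m.w0))
    defer vm.Close()
    solver := NewVanillaSolver(WithLearnRate(0.1))

    for i := 0; i < epochs; i++ {
        if err := vm.RunAll(); err != nil {
            return err
        }
        // Re-parameterize the network: SGD nudges w0 using its gradient.
        if err := solver.Step(NodesToValueGrads(Nodes{m.w0})); err != nil {
            return err
        }
        vm.Reset()
    }
    return nil
}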
