All expressions are graphs

Now we can finally return to the preceding example.

Our problem, if you recall, was that we had to specify the neural network twice: once for prediction and once for learning. We refactored the program so that we no longer had to specify the network twice, but we still had to write out the expressions for backpropagation by hand. This is error-prone, especially when dealing with larger neural networks like the one we're about to build in this chapter. Is there a better way? The answer is yes.

Once we understand and fully internalize that neural networks are essentially mathematical expressions, we can take what we learned about tensors and model the entire neural network as a flow of tensors.

Recall that tensors can only be defined in the presence of transformations; any operation that transforms tensors, used in concert with the data structures that hold the data, is therefore part of what makes those structures tensors. Also recall that computer programs can be represented as abstract syntax trees, and that mathematical expressions can be written as programs. Therefore, mathematical expressions can also be represented as abstract syntax trees.

More accurately, however, mathematical expressions can be represented as a graph: a directed acyclic graph, to be specific. We call this the expression graph.

This distinction matters. Trees cannot share nodes. Graphs can. Let's consider, for example, the following mathematical expression:

Here are the representations as a graph and as a tree:

On the left, we have a directed acyclic graph, and on the right, we have a tree. Note that in the tree variant of the mathematical expression, there are repeated nodes. Both are rooted at the same node. Each arrow should be read as "depends on": the root depends on two other nodes, each of which in turn depends on other nodes, and so on and so forth.

Both the graph and tree are valid representations of the same mathematical equation, of course.
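To make the sharing concrete, here is a minimal sketch of a hand-rolled expression graph in Go. It is a hypothetical example for illustration (not the expression from the figure, and not any particular library): it builds f = (x + y) * (x + y), in which the subexpression x + y is a single shared node, and then counts how many nodes the same expression needs as a DAG versus as a tree:

```go
package main

import "fmt"

// node is one vertex in an expression graph. Leaves have no children;
// interior nodes name the operation that combines their children.
type node struct {
	op       string // "+", "*", or "" for a leaf
	name     string // only set on leaves, e.g. "x"
	children []*node
}

func leaf(name string) *node { return &node{name: name} }
func add(a, b *node) *node   { return &node{op: "+", children: []*node{a, b}} }
func mul(a, b *node) *node   { return &node{op: "*", children: []*node{a, b}} }

// countDAG counts distinct nodes: a shared node is counted once.
func countDAG(n *node, seen map[*node]bool) int {
	if seen[n] {
		return 0
	}
	seen[n] = true
	total := 1
	for _, c := range n.children {
		total += countDAG(c, seen)
	}
	return total
}

// countTree counts nodes as if the graph were unrolled into a tree:
// a shared node is counted once per path that reaches it.
func countTree(n *node) int {
	total := 1
	for _, c := range n.children {
		total += countTree(c)
	}
	return total
}

func main() {
	x, y := leaf("x"), leaf("y")
	sum := add(x, y)   // the shared subexpression x + y
	f := mul(sum, sum) // f = (x + y) * (x + y); both children are the same node

	fmt.Println("nodes as a DAG: ", countDAG(f, map[*node]bool{})) // 4
	fmt.Println("nodes as a tree:", countTree(f))                  // 7
}
```

The same expression takes four nodes as a graph but seven as a tree, because the tree must spell out x + y once for every place it is used.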

Why bother representing a mathematical expression as a graph or a tree? Recall that an abstract syntax tree represents a computation. A mathematical expression represented as a graph or a tree carries that same notion of computation, so it, too, represents an abstract syntax tree.

Indeed, we can take each node in the graph or tree and perform a computation on it. If each node represents a computation, then it stands to reason that fewer nodes mean fewer computations to perform (and less memory used). Therefore, we should prefer the directed acyclic graph representation, in which a shared subexpression appears, and is computed, only once.
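Here is a minimal sketch of that preference in action, again using a hypothetical hand-rolled node type rather than any real library: evaluating the graph with a memo table guarantees that a shared node is computed exactly once, no matter how many parents refer to it:

```go
package main

import "fmt"

// node is one vertex in an expression graph.
type node struct {
	op       string  // "+", "*", or "" for a leaf
	value    float64 // only meaningful for leaves
	children []*node
}

// eval walks the graph from the leaves up. The memo map guarantees that a
// node shared by several parents is computed only once.
func eval(n *node, memo map[*node]float64) float64 {
	if v, ok := memo[n]; ok {
		return v
	}
	var v float64
	switch n.op {
	case "":
		v = n.value
	case "+":
		v = eval(n.children[0], memo) + eval(n.children[1], memo)
	case "*":
		v = eval(n.children[0], memo) * eval(n.children[1], memo)
	}
	memo[n] = v
	return v
}

func main() {
	x := &node{value: 2}
	y := &node{value: 3}
	sum := &node{op: "+", children: []*node{x, y}} // shared subexpression
	f := &node{op: "*", children: []*node{sum, sum}}

	fmt.Println(eval(f, map[*node]float64{})) // (2+3)*(2+3) = 25
}
```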

And now we come to the major benefit of representing a mathematical expression as a graph: we get differentiation for free.

If you recall from the previous chapter, backpropagation is essentially differentiating the cost with respect to the inputs. The gradients, once calculated, can then be used to update the values of the weights themselves. With a graph structure, we don't have to write the backpropagation parts ourselves. Instead, if we have a virtual machine that executes the graph, starting at the leaves and moving toward the root, the virtual machine can automatically perform differentiation on the values as it traverses the graph from leaf to root.
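One common way to do this is reverse-mode automatic differentiation: a forward pass computes the value of every node from the leaves to the root, and a backward pass then pushes gradients from the root back to the leaves, applying the chain rule at each node. The following is a minimal, hypothetical sketch of the idea on a hand-rolled graph; it is not the code of any particular virtual machine or library:

```go
package main

import "fmt"

// node is one vertex in an expression graph. It carries its computed value
// and the gradient of the root with respect to this node.
type node struct {
	op       string // "+", "*", or "" for a leaf
	value    float64
	grad     float64
	children []*node
}

// forward computes the value of every node, leaves first.
func forward(n *node) float64 {
	switch n.op {
	case "+":
		n.value = forward(n.children[0]) + forward(n.children[1])
	case "*":
		n.value = forward(n.children[0]) * forward(n.children[1])
	}
	return n.value
}

// backward pushes gradients from the root toward the leaves, applying the
// chain rule at every node. Gradients accumulate, so a shared node receives
// a contribution from every path that reaches it.
func backward(n *node, grad float64) {
	n.grad += grad
	switch n.op {
	case "+":
		backward(n.children[0], grad)
		backward(n.children[1], grad)
	case "*":
		backward(n.children[0], grad*n.children[1].value)
		backward(n.children[1], grad*n.children[0].value)
	}
}

func main() {
	x := &node{value: 2}
	y := &node{value: 3}
	sum := &node{op: "+", children: []*node{x, y}}
	f := &node{op: "*", children: []*node{sum, sum}} // f = (x + y)^2

	forward(f)
	backward(f, 1) // df/df = 1

	fmt.Println("f     =", f.value) // 25
	fmt.Println("df/dx =", x.grad)  // 2*(x+y) = 10
	fmt.Println("df/dy =", y.grad)  // 10
}
```

Nothing in this sketch is specific to the expression being differentiated; the same forward and backward walk works for any graph built from these operations.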

Alternatively, if we don't want to do automatic differentiation, we can also perform symbolic differentiation by manipulating the graph in the same way that we manipulated the AST in the What is programming section, by adding and coalescing nodes.
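A minimal sketch of that idea, assuming only the sum and product rules (d(u + v) = du + dv and d(u * v) = du*v + u*dv): the derivative is produced by adding new nodes to form a second graph, which can afterwards be simplified by coalescing nodes such as the constant 1 and 0 leaves:

```go
package main

import "fmt"

// node is one vertex in an expression graph.
type node struct {
	op       string // "+", "*", or "" for a leaf
	name     string // leaf name, e.g. "x", or a constant such as "0"
	children []*node
}

func leaf(name string) *node { return &node{name: name} }
func add(a, b *node) *node   { return &node{op: "+", children: []*node{a, b}} }
func mul(a, b *node) *node   { return &node{op: "*", children: []*node{a, b}} }

// diff builds a new graph representing the derivative of n with respect to
// the variable named wrt, by adding nodes according to the sum and product rules.
func diff(n *node, wrt string) *node {
	switch n.op {
	case "": // a leaf: d(x)/dx = 1, d(anything else)/dx = 0
		if n.name == wrt {
			return leaf("1")
		}
		return leaf("0")
	case "+": // d(u + v) = du + dv
		return add(diff(n.children[0], wrt), diff(n.children[1], wrt))
	default: // "*": d(u * v) = du*v + u*dv
		u, v := n.children[0], n.children[1]
		return add(mul(diff(u, wrt), v), mul(u, diff(v, wrt)))
	}
}

// render prints a graph back out as an infix expression, for inspection.
func render(n *node) string {
	if n.op == "" {
		return n.name
	}
	return "(" + render(n.children[0]) + " " + n.op + " " + render(n.children[1]) + ")"
}

func main() {
	x, y := leaf("x"), leaf("y")
	f := mul(add(x, y), add(x, y)) // f = (x + y) * (x + y)

	// Prints (((1 + 0) * (x + y)) + ((x + y) * (1 + 0))), which coalesces to 2 * (x + y).
	fmt.Println(render(diff(f, "x")))
}
```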

In this way, we can now shift our view of a neural network: the entire network, from inputs to cost, is itself one expression graph.
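To make that shift concrete, here is one last sketch with the same hand-rolled machinery (a hypothetical single neuron, not the network we build later in this chapter). The prediction w*x + b, the squared-error cost against a target, and the gradients needed to update w and b all live in one expression graph, with no backpropagation code written by hand:

```go
package main

import "fmt"

// node is one vertex in an expression graph.
type node struct {
	op       string // "+", "-", "*", or "" for a leaf
	value    float64
	grad     float64
	children []*node
}

func leaf(v float64) *node { return &node{value: v} }
func bin(op string, a, b *node) *node {
	return &node{op: op, children: []*node{a, b}}
}

// forward computes values from the leaves toward the root.
func forward(n *node) float64 {
	if n.op == "" {
		return n.value
	}
	a, b := forward(n.children[0]), forward(n.children[1])
	switch n.op {
	case "+":
		n.value = a + b
	case "-":
		n.value = a - b
	case "*":
		n.value = a * b
	}
	return n.value
}

// backward pushes gradients from the root toward the leaves (the chain rule).
func backward(n *node, grad float64) {
	n.grad += grad
	switch n.op {
	case "+":
		backward(n.children[0], grad)
		backward(n.children[1], grad)
	case "-":
		backward(n.children[0], grad)
		backward(n.children[1], -grad)
	case "*":
		backward(n.children[0], grad*n.children[1].value)
		backward(n.children[1], grad*n.children[0].value)
	}
}

func main() {
	// A single "neuron": prediction = w*x + b, cost = (prediction - target)^2.
	w, b := leaf(0.5), leaf(0.1) // the parameters we want to learn
	x, target := leaf(2.0), leaf(1.0)

	pred := bin("+", bin("*", w, x), b)
	diff := bin("-", pred, target)
	cost := bin("*", diff, diff)

	forward(cost)
	backward(cost, 1)

	// The same graph yields the prediction, the cost, and the gradients
	// needed to update w and b.
	fmt.Println("prediction:", pred.value, "cost:", cost.value)
	fmt.Println("dcost/dw:", w.grad, "dcost/db:", b.grad)
}
```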
