What is a tensor?

Fundamentally, tensors are very much like vectors; the idea is borrowed from physics. Imagine pushing a box on a two-dimensional plane. If you push the box with a force of 1 Newton along the x axis, there is no force applied along the y axis, and you would write the vector as such: [1, 0]. If the box were moving along the x axis at a speed of 10 km/h and along the y axis at a speed of 2 km/h, you would write the vector as such: [10, 2]. Note that the vectors themselves are written without units: the first example was a vector of Newtons, while the second was a vector of km/h.

In short, a vector is a representation of something with magnitude and direction (a force, a velocity, and so on). From this idea, computer science co-opted the name vector, although in Go, the corresponding data structure is called a slice.

So what is a tensor? Eliding a lot of detail, but without loss of generality, a tensor is like a vector, except multidimensional. Imagine describing two velocities along the plane (picture a piece of Silly Putty being stretched in two directions at different speeds): [1, 0] and [10, 2]. You would write them as such:
⎡ 1 0⎤
⎣10 2⎦

This is also called a matrix (when it's two-dimensional). It's called a 3-Tensor when it's three-dimensional, a 4-Tensor when it's four-dimensional, and so on and so forth. Note that if you had a third speed (that is, the Silly Putty being stretched in a third direction), you wouldn't have a 3-Tensor. Instead, you'd still have a matrix, just with three rows.

To visualize a 3-Tensor while building on the previous example, imagine, if you will, that the two directions in which the Silly Putty was being pulled represent one slice in time. Then imagine another slice in time where the same Silly Putty is pulled in two directions again. Now you'd have two matrices. A 3-Tensor is what happens when you stack these matrices together.
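To make the stacking idea concrete, here is a minimal sketch in plain Go (no tensor library) that stores two 2x2 matrices as one flat slice with shape (2, 2, 2). The at helper and the specific numbers are purely illustrative:

```go
package main

import "fmt"

// shape is (2, 2, 2): 2 time slices, 2 rows, 2 columns.
var shape = []int{2, 2, 2}

// backing holds the two stacked matrices in one flat slice,
// laid out one matrix after the other, row by row.
var backing = []float64{
	1, 0, // t=0, row 0
	10, 2, // t=0, row 1
	2, 1, // t=1, row 0
	12, 3, // t=1, row 1
}

// at returns the element at (t, i, j) of the 3-Tensor by
// translating the three coordinates into one flat index.
func at(t, i, j int) float64 {
	return backing[t*shape[1]*shape[2]+i*shape[2]+j]
}

func main() {
	fmt.Println(at(0, 1, 0)) // row 1, column 0 of the first matrix: 10
}
```

This is, in miniature, how tensor libraries store n-dimensional data: a flat slice plus shape bookkeeping.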

To convert a []RawImage to a tensor.Tensor, the code is as follows:

func prepareX(M []RawImage) (retVal tensor.Tensor) {
	rows := len(M)
	cols := len(M[0])

	b := make([]float64, 0, rows*cols)
	for i := 0; i < rows; i++ {
		for j := 0; j < len(M[i]); j++ {
			b = append(b, pixelWeight(M[i][j]))
		}
	}
	return tensor.New(tensor.WithShape(rows, cols), tensor.WithBacking(b))
}

Gorgonia may be a bit confusing to beginners, so let me explain the code line by line. But first, you must be aware that, like Gonum matrices, Gorgonia tensors, no matter how many dimensions they have, are internally represented as a flat slice. Gorgonia tensors are a little more flexible in that the backing slice does not have to be a []float64; slices of other types work too. This flat slice is called the backing slice or backing array. This flat representation is one of the fundamental reasons why linear algebra operations are more efficient in Gonum and Gorgonia than with a plain [][]float64.
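To see why a flat slice is efficient, consider this small sketch (the names flat and at are illustrative, not part of any library). Indexing element (i, j) of a row-major flat slice is one multiply and one add on a single contiguous block of memory, whereas a [][]float64 requires chasing a pointer into a separately allocated row first:

```go
package main

import "fmt"

const (
	rows = 2
	cols = 3
)

// flat stores a 2x3 matrix in row-major order, the same layout
// Gonum and Gorgonia use internally: row 0 first, then row 1.
var flat = []float64{
	1, 2, 3,
	4, 5, 6,
}

// at indexes element (i, j): index arithmetic on one contiguous
// slice, which is also friendlier to the CPU cache than the
// scattered row allocations of a [][]float64.
func at(i, j int) float64 {
	return flat[i*cols+j]
}

func main() {
	fmt.Println(at(1, 2)) // last element of row 1: 6
}
```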

rows := len(M) and cols := len(M[0]) are pretty self-explanatory: we want the number of rows (that is, the number of images) and the number of columns (the number of pixels per image).

b := make([]float64, 0, rows*cols) creates the backing array with a capacity of rows * cols. It's called a backing array because, throughout the lifetime of b, its size will not change. Here we start with a length of 0 because we want to use the append function later on.

a := make([]T, 0, capacity) is a good pattern to use to pre-allocate a slice. Consider a snippet that looks like this:
a := make([]int, 0)
for i := 0; i < 10; i++ {
	a = append(a, i)
}

During the first call to append, the Go runtime looks at the capacity of a and finds that it's 0, so it allocates memory for a slice of capacity 1. On the second call to append, the capacity is 1, which is insufficient, so the runtime allocates a new array of twice the current capacity and copies the elements over. The same happens on the third call (capacity 2 grows to 4), the fifth call (4 grows to 8), and so on. (The exact growth strategy is an internal detail of the runtime, but for small slices it roughly doubles the capacity each time.)
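We can observe this growth directly. The following sketch counts how often append has to move the data to a bigger backing array; countReallocs is an illustrative helper, and the exact count depends on the runtime's growth strategy, so the comments avoid promising specific capacities:

```go
package main

import "fmt"

// countReallocs appends n ints to a zero-capacity slice and counts
// how many times append had to allocate a bigger backing array
// (detected by watching cap change).
func countReallocs(n int) int {
	a := make([]int, 0)
	prevCap := cap(a)
	grows := 0
	for i := 0; i < n; i++ {
		a = append(a, i)
		if cap(a) != prevCap {
			grows++ // the runtime allocated and copied
			prevCap = cap(a)
		}
	}
	return grows
}

func main() {
	fmt.Println("reallocations for 10 appends:", countReallocs(10))
}
```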

The thing about allocation is that it is an expensive operation. Worse, the Go runtime may not only have to allocate memory but also copy the existing elements to the new location, which adds to the cost of appending to a slice.

So instead, if we know the final size of the slice upfront, it's best to allocate it all in one shot. We could specify the length instead, but that is often a cause of indexing errors. My recommendation is therefore to allocate with the full capacity and a length of 0; that way, you can safely use append without having to worry about indexing errors.
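The two pre-allocation styles can be contrasted in a short sketch; the function names are illustrative, not from the book's code:

```go
package main

import "fmt"

// fillWithAppend pre-allocates capacity only: append never needs to
// reallocate, and the slice never contains placeholder zero values.
func fillWithAppend(n int) []int {
	a := make([]int, 0, n) // length 0, capacity n
	for i := 0; i < n; i++ {
		a = append(a, i*i)
	}
	return a
}

// fillWithIndex pre-allocates length: it works, but an off-by-one in
// the index silently leaves a stale zero behind (or panics) instead of
// failing in an obvious way.
func fillWithIndex(n int) []int {
	a := make([]int, n) // length n, all elements start at zero
	for i := 0; i < n; i++ {
		a[i] = i * i
	}
	return a
}

func main() {
	fmt.Println(fillWithAppend(5)) // [0 1 4 9 16]
	fmt.Println(fillWithIndex(5))  // same result, different failure modes
}
```

Both produce identical slices; the capacity-plus-append form simply leaves less room for indexing mistakes.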

After creating the backing slice, we simply populate it with the pixel values, each converted to a float64 using the pixelWeight function that we described earlier.

Finally, we call tensor.New(tensor.WithShape(rows, cols), tensor.WithBacking(b)), which returns a *tensor.Dense. The tensor.WithShape(rows, cols) construction option creates a *tensor.Dense with the specified shape, while tensor.WithBacking(b) simply reuses the already pre-allocated and pre-filled b as the backing slice.

The tensor library simply reuses the entire backing slice, so fewer allocations are made. What this means is that you have to be careful when handling b: modifying its contents afterwards will change the contents of the *tensor.Dense as well. Given that b was created inside the prepareX function, once the function has returned there is no other reference through which to modify b, which is a good way to prevent accidental modification.
