Preprocessing

What we are going to do next is "whiten" our data using Zero-phase Component Analysis (ZCA). A full definition of ZCA is beyond the scope of this chapter, but briefly, ZCA is very much like Principal Component Analysis (PCA). In our 784-pixel slice, there is a high probability that the pixels are correlated with one another. What PCA does is find the set of components that are uncorrelated with one another. It does this by looking at all the images at once and figuring out how the columns correlate with one another:

func zca(data tensor.Tensor) (retVal tensor.Tensor, err error) {
    var dataᵀ, data2, sigma tensor.Tensor
    // Work on a clone so that the original data is left untouched.
    data2 = data.Clone().(tensor.Tensor)

    if err := minusMean(data2); err != nil {
        return nil, err
    }
    if dataᵀ, err = tensor.T(data2); err != nil {
        return nil, err
    }

    // sigma = dataᵀ · data2: the (as yet unnormalized) covariance matrix.
    if sigma, err = tensor.MatMul(dataᵀ, data2); err != nil {
        return nil, err
    }

    // Divide by (cols - 1), in place, for an unbiased estimate.
    cols := sigma.Shape()[1]
    if _, err = tensor.Div(sigma, float64(cols-1), tensor.UseUnsafe()); err != nil {
        return nil, err
    }

    s, u, _, err := sigma.(*tensor.Dense).SVD(true, true)
    if err != nil {
        return nil, err
    }

    // Build diag(1/√(sᵢ + ε)); ε = 0.1 keeps tiny singular values
    // from blowing up the result.
    var diag, uᵀ, tmp tensor.Tensor
    if diag, err = s.Apply(invSqrt(0.1), tensor.UseUnsafe()); err != nil {
        return nil, err
    }
    diag = tensor.New(tensor.AsDenseDiag(diag))

    if uᵀ, err = tensor.T(u); err != nil {
        return nil, err
    }

    // tmp = U · diag(1/√(sᵢ + ε)) · Uᵀ: the ZCA whitening matrix.
    if tmp, err = tensor.MatMul(u, diag); err != nil {
        return nil, err
    }

    if tmp, err = tensor.MatMul(tmp, uᵀ); err != nil {
        return nil, err
    }

    if err = tmp.T(); err != nil {
        return nil, err
    }

    // Whiten: multiply the original data by the whitening matrix.
    return tensor.MatMul(data, tmp)
}

func invSqrt(epsilon float64) func(float64) float64 {
    return func(a float64) float64 {
        return 1 / math.Sqrt(a+epsilon)
    }
}

This is a pretty large chunk of code, so let's go through it. But first, let's understand the key ideas behind ZCA before walking through the code that implements it.

First, recall what PCA does: it finds the set of inputs (columns and pixels, used interchangeably here) that are least correlated with one another. What ZCA then does is take the principal components it found and multiply them with the inputs, transforming the inputs so that they become decorrelated.
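Concretely, this is exactly what the zca function above computes: if the SVD of the covariance matrix of the data yields singular values sᵢ and the matrix of left singular vectors U, the whitening matrix is

    W = U · diag(1/√(sᵢ + ε)) · Uᵀ

where ε is a small smoothing constant (0.1 in our code) that keeps near-zero singular values from blowing up the result. The whitened images are then simply the matrix product of the data and W.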

First, we want to subtract the means: each row's mean, then each column's. To do that, we first make a clone of the data (we'll see why later), then subtract the means with this function:

func minusMean(a tensor.Tensor) error {
    // nat is a [][]float64 view that shares the tensor's backing memory.
    nat, err := native.MatrixF64(a.(*tensor.Dense))
    if err != nil {
        return err
    }
    // Subtract each row's mean from that row.
    for _, row := range nat {
        mean := avg(row)
        vecf64.Trans(row, -mean)
    }

    rows, cols := a.Shape()[0], a.Shape()[1]

    // Compute the mean of each column.
    mean := make([]float64, cols)
    for j := 0; j < cols; j++ {
        var colMean float64
        for i := 0; i < rows; i++ {
            colMean += nat[i][j]
        }
        colMean /= float64(rows)
        mean[j] = colMean
    }

    // Subtract the column means from each row, element-wise.
    for _, row := range nat {
        vecf64.Sub(row, mean)
    }

    return nil
}

After all the preceding spiel about the efficiency of a flat slice versus a [][]float64, what I am going to suggest next is going to sound counter-intuitive. But please bear with me. native.MatrixF64 takes a *tensor.Dense and returns a [][]float64, which we call nat. nat shares the same allocation as the tensor a: no copies of the data are made, and any modification made to nat will show up in a.
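To make the aliasing concrete, here is a minimal, self-contained sketch (hypothetical, not part of our program) showing a write through the [][]float64 view turning up in the tensor:

    package main

    import (
        "fmt"

        "gorgonia.org/tensor"
        "gorgonia.org/tensor/native"
    )

    func main() {
        // A 2×2 matrix backed by a flat []float64.
        a := tensor.New(tensor.WithShape(2, 2), tensor.WithBacking([]float64{1, 2, 3, 4}))

        nat, err := native.MatrixF64(a)
        if err != nil {
            panic(err)
        }

        // Writing through the [][]float64 view...
        nat[0][1] = 42

        // ...shows up in the tensor, because the data was never copied.
        fmt.Println(a) // prints the matrix with 42 in place of 2
    }

In minusMean, then, we should treat the [][]float64 as an easy way to iterate through the values in the tensor. This can be seen here: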

    for j := 0; j < cols; j++ {
        var colMean float64
        for i := 0; i < rows; i++ {
            colMean += nat[i][j]
        }
        colMean /= float64(rows)
        mean[j] = colMean
    }

As in the visualize function, we first iterate through the columns, albeit for a different purpose: we want to find the mean of each column, which we store in the mean variable. This allows us to subtract the column means:

    for _, row := range nat {
        vecf64.Sub(row, mean)
    }

This block of code uses the vecf64 package that comes with Gorgonia to subtract one slice from another, element-wise. It's equivalent to the following:

    for _, row := range nat {
        for j := range row {
            row[j] -= mean[j]
        }
    }

The only real reason to use vecf64 is that it's optimized to perform the operation with SIMD instructions: instead of doing row[j] -= mean[j] one at a time, it performs row[j] -= mean[j], row[j+1] -= mean[j+1], row[j+2] -= mean[j+2], and row[j+3] -= mean[j+3] simultaneously.
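For completeness, here is a tiny, self-contained sketch (hypothetical, not part of our program) of the two vecf64 operations minusMean relies on; Trans translates every element by a constant, while Sub subtracts element-wise:

    package main

    import (
        "fmt"

        "gorgonia.org/vecf64"
    )

    func main() {
        xs := []float64{1, 2, 3, 4}

        // Trans: xs[i] += -2.5 for every i.
        vecf64.Trans(xs, -2.5)
        fmt.Println(xs) // [-1.5 -0.5 0.5 1.5]

        // Sub: xs[i] -= ys[i], element-wise.
        ys := []float64{1, 1, 1, 1}
        vecf64.Sub(xs, ys)
        fmt.Println(xs) // [-2.5 -1.5 -0.5 0.5]
    }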

After we've subtracted the means, we find the transpose of the data, which comes back as a copy:

    if dataᵀ, err = tensor.T(data2); err != nil {
        return nil, err
    }

Typically, you would find the transpose of a tensor.Tensor by using something like data2.T(). But that does not return a copy. Instead, the tensor.T function clones the data structure, then performs the transposition on the clone (a short sketch of the difference follows the next snippet). The reason for this? We're about to use both the transpose and data2 to find sigma (matrix multiplication will be expounded on in the next chapter):

    var sigma tensor.Tensor
    if sigma, err = tensor.MatMul(dataᵀ, data2); err != nil {
        return nil, err
    }
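As an aside, here is a minimal, hypothetical sketch contrasting the two transpose APIs (the shapes are made up for illustration):

    package main

    import (
        "fmt"

        "gorgonia.org/tensor"
    )

    func main() {
        a := tensor.New(tensor.WithShape(2, 3), tensor.WithBacking(tensor.Range(tensor.Float64, 0, 6)))

        // tensor.T clones a first, then transposes the clone.
        b, err := tensor.T(a)
        if err != nil {
            panic(err)
        }
        fmt.Println(b.Shape()) // (3, 2)
        fmt.Println(a.Shape()) // (2, 3) — a is untouched

        // (*Dense).T transposes a itself; no copy is returned.
        if err := a.T(); err != nil {
            panic(err)
        }
        fmt.Println(a.Shape()) // (3, 2)
    }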

After we have found sigma, we divide it by the number of columns minus one; dividing by N - 1 rather than N gives us an unbiased estimate of the covariance. The tensor.UseUnsafe option is used to indicate that the result should be stored back into the sigma tensor:

    cols := sigma.Shape()[1]
    if _, err = tensor.Div(sigma, float64(cols-1), tensor.UseUnsafe()); err != nil {
        return nil, err
    }
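If tensor.UseUnsafe is unfamiliar, this hypothetical sketch contrasts the default behavior (the result goes into a freshly allocated tensor) with the unsafe, in-place variant:

    package main

    import (
        "fmt"

        "gorgonia.org/tensor"
    )

    func main() {
        a := tensor.New(tensor.WithShape(3), tensor.WithBacking([]float64{2, 4, 6}))

        // Without options, Div returns the result as a new tensor.
        b, err := tensor.Div(a, 2.0)
        if err != nil {
            panic(err)
        }
        fmt.Println(a.Data())                 // [2 4 6] — a is unchanged
        fmt.Println(b.(*tensor.Dense).Data()) // [1 2 3]

        // With UseUnsafe, the result is written back into a.
        if _, err := tensor.Div(a, 2.0, tensor.UseUnsafe()); err != nil {
            panic(err)
        }
        fmt.Println(a.Data()) // [1 2 3]
    }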

All this is done so that we can perform an SVD on sigma:

    s, u, _, err := sigma.(*tensor.Dense).SVD(true, true)
    if err != nil {
        return nil, err
    }

Singular Value Decomposition, if you are not familiar with it, is one of many methods that break a matrix down into its constituent parts. Why would you want to do so? For one, it makes parts of some calculations easier. What it does is factorize A, an (M, N) matrix, into an (M, N) rectangular diagonal matrix called Σ, an (M, M) matrix called U, and an (N, N) matrix called V. To reconstruct A, the formula is simply:

    A = UΣVᵀ

The decomposed parts will then be used. In our case, we're not particularly interested in the right singular vectors V, so we'll ignore them for now. The decomposed parts are simply used to transform the images, which can be seen at the tail end of the function body.
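Because sigma is symmetric, U and V coincide (up to sign), which is why zca can build its whitening matrix from u alone. As a sanity check, here is a minimal, hypothetical sketch that reconstructs a small symmetric matrix as U·diag(s)·Uᵀ using the same tensor calls zca uses; only the sample matrix is made up:

    package main

    import (
        "fmt"

        "gorgonia.org/tensor"
    )

    func main() {
        // A small symmetric matrix, standing in for sigma.
        a := tensor.New(tensor.WithShape(2, 2), tensor.WithBacking([]float64{3, 1, 1, 3}))

        s, u, _, err := a.SVD(true, true)
        if err != nil {
            panic(err)
        }

        // Rebuild U · diag(s) · Uᵀ — the same pattern zca uses, except
        // zca first replaces each sᵢ with 1/√(sᵢ + ε).
        sDiag := tensor.New(tensor.AsDenseDiag(s))
        uᵀ, err := tensor.T(u)
        if err != nil {
            panic(err)
        }
        us, err := tensor.MatMul(u, sDiag)
        if err != nil {
            panic(err)
        }
        recon, err := tensor.MatMul(us, uᵀ)
        if err != nil {
            panic(err)
        }
        fmt.Println(recon) // ≈ the original matrix a
    }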

After preprocessing, we can once more visualize the first 100 or so images.
