Getting the data

The MNIST data can be found in the repository for this chapter. In its original form, it is not stored in a standard image format, so we will need to parse it into a usable one.

The dataset comes in two parts: labels and images. Here are a couple of functions designed to read and parse the MNIST files:

import (
  "encoding/binary"
  "io"
  "os"
)

// RawImage holds the pixel intensities of an image.
// 255 is foreground (black), 0 is background (white).
type RawImage []byte

// Label is a digit label in 0 to 9
type Label uint8


const numLabels = 10
const pixelRange = 255

const (
  imageMagic = 0x00000803
  labelMagic = 0x00000801
  Width = 28
  Height = 28
)

func readLabelFile(r io.Reader, e error) (labels []Label, err error) {
  if e != nil {
    return nil, e
  }

  var magic, n int32
  if err = binary.Read(r, binary.BigEndian, &magic); err != nil {
    return nil, err
  }
  if magic != labelMagic {
    return nil, os.ErrInvalid
  }
  if err = binary.Read(r, binary.BigEndian, &n); err != nil {
    return nil, err
  }
  labels = make([]Label, n)
  for i := 0; i < int(n); i++ {
    var l Label
    if err := binary.Read(r, binary.BigEndian, &l); err != nil {
      return nil, err
    }
    labels[i] = l
  }
  return labels, nil
}

func readImageFile(r io.Reader, e error) (imgs []RawImage, err error) {
  if e != nil {
    return nil, e
  }

  var magic, n, nrow, ncol int32
  if err = binary.Read(r, binary.BigEndian, &magic); err != nil {
    return nil, err
  }
  if magic != imageMagic {
    // err is nil at this point, so report the bad magic number explicitly.
    return nil, os.ErrInvalid
  }
  if err = binary.Read(r, binary.BigEndian, &n); err != nil {
    return nil, err
  }
  if err = binary.Read(r, binary.BigEndian, &nrow); err != nil {
    return nil, err
  }
  if err = binary.Read(r, binary.BigEndian, &ncol); err != nil {
    return nil, err
  }
  imgs = make([]RawImage, n)
  m := int(nrow * ncol)
  for i := 0; i < int(n); i++ {
    imgs[i] = make(RawImage, m)
    // io.ReadFull returns an error unless it fills the whole slice,
    // so no separate length check is needed.
    if _, err = io.ReadFull(r, imgs[i]); err != nil {
      return nil, err
    }
  }
  return imgs, nil
}

First, each function reads from an io.Reader, starting with a set of big-endian int32 values that make up the file's metadata. The first int32 is a magic number that indicates whether the file holds labels or images. n is the number of images or labels the file contains. nrow and ncol, which are present only in the image file, indicate how many rows and columns there are in each image.
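To make the header layout concrete, here is a minimal sketch that builds a tiny image file in memory and decodes its metadata. It is a variation on the chapter's code rather than a copy of it: it reads all four fields with a single binary.Read into a struct instead of field by field. The names header, fakeImageFile, and parseHeader are hypothetical and exist only for this illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header mirrors the four big-endian int32 fields at the start of an image file.
// The type and its field names are illustrative, not part of the chapter's code.
type header struct {
	Magic, N, NRow, NCol int32
}

// fakeImageFile builds an in-memory image file: the header followed by
// n*nrow*ncol zero-valued pixel bytes.
func fakeImageFile(n, nrow, ncol int32) []byte {
	var buf bytes.Buffer
	h := header{Magic: 0x00000803, N: n, NRow: nrow, NCol: ncol}
	// Writing fixed-size values into a bytes.Buffer does not fail.
	binary.Write(&buf, binary.BigEndian, h)
	buf.Write(make([]byte, int(n*nrow*ncol)))
	return buf.Bytes()
}

// parseHeader decodes the four metadata fields in one call.
func parseHeader(data []byte) (header, error) {
	var h header
	err := binary.Read(bytes.NewReader(data), binary.BigEndian, &h)
	return h, err
}

func main() {
	h, err := parseHeader(fakeImageFile(2, 28, 28))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("magic=%#x n=%d rows=%d cols=%d\n", h.Magic, h.N, h.NRow, h.NCol)
	// prints: magic=0x803 n=2 rows=28 cols=28
}
```

When reading from disk, the (io.Reader, error) parameter pair in the chapter's functions lets the two results of os.Open be passed straight through, as in readImageFile(os.Open(path)), where path is a placeholder for the file's location.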

Zooming into the readImageFile function, we can see that once all the metadata has been read, we can create a []RawImage of length n. Each image in the MNIST dataset is essentially a slice of 784 bytes (28 rows of 28 columns), so each byte represents one pixel. The value of each byte is the intensity of that pixel, ranging from 0 (background) to 255 (foreground).

The preceding image is an example of an MNIST image blown up. The pixels are stored row by row in a flat slice. At the top-left corner, the index of the pixel is 0. At the top-right corner, the index is 27. At the bottom-left corner, the index is 756 (27 × 28). And, finally, at the bottom-right corner, the index is 783. This is an important concept to keep in mind: a 2D image can be represented as a 1D slice.
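The mapping between 2D positions and flat indices can be sketched in a few lines. The example below assumes row-major order (index = row × Width + column), which matches the top-right pixel having index 27. The helpers flatIndex and toGray are hypothetical names introduced for this illustration; toGray shows that Go's standard image.Gray type uses the same flat layout, which makes it one possible target when converting a RawImage to a standard image format.

```go
package main

import (
	"fmt"
	"image"
)

const (
	Width  = 28
	Height = 28
)

// RawImage matches the chapter's type: a flat slice of Width*Height bytes.
type RawImage []byte

// flatIndex (hypothetical helper) maps a (row, col) position to its index
// in a row-major flat slice.
func flatIndex(row, col int) int {
	return row*Width + col
}

// toGray (hypothetical helper) copies a RawImage into a standard image.Gray,
// which also stores its pixels as a flat, row-major slice (Pix).
// Values are inverted because MNIST uses 255 for foreground (black),
// while image.Gray uses 255 for white.
func toGray(raw RawImage) (*image.Gray, error) {
	if len(raw) != Width*Height {
		return nil, fmt.Errorf("expected %d bytes, got %d", Width*Height, len(raw))
	}
	g := image.NewGray(image.Rect(0, 0, Width, Height))
	for i, px := range raw {
		g.Pix[i] = 255 - px
	}
	return g, nil
}

func main() {
	// The four corners: top-left, top-right, bottom-left, bottom-right.
	fmt.Println(flatIndex(0, 0), flatIndex(0, Width-1),
		flatIndex(Height-1, 0), flatIndex(Height-1, Width-1))
	// prints: 0 27 756 783

	raw := make(RawImage, Width*Height)
	raw[flatIndex(Height-1, Width-1)] = 255 // foreground pixel in the bottom-right corner
	g, err := toGray(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// GrayAt takes (x, y), that is, (col, row).
	fmt.Println(g.GrayAt(Width-1, Height-1).Y, g.GrayAt(0, 0).Y)
	// prints: 0 255
}
```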
