Getting the data

The MNIST dataset can be found in the repository for this chapter. In its original form, it is not stored in a standard image format, so we will need to parse it into a usable one.

The dataset comes in two parts: labels and images. Here are a couple of functions designed to read and parse the MNIST files:

import (
	"encoding/binary"
	"io"
	"os"
)

// RawImage holds the pixel intensities of an image.
// 255 is foreground (black), 0 is background (white).
type RawImage []byte

// Label is a digit label in 0 to 9
type Label uint8

const numLabels = 10
const pixelRange = 255

const (
	imageMagic = 0x00000803
	labelMagic = 0x00000801
	Width      = 28
	Height     = 28
)

// readLabelFile parses an MNIST label file. Taking an error as the second
// argument allows calls of the form readLabelFile(os.Open(name)).
func readLabelFile(r io.Reader, e error) (labels []Label, err error) {
	if e != nil {
		return nil, e
	}

	// Header: magic number, then the number of labels.
	var magic, n int32
	if err = binary.Read(r, binary.BigEndian, &magic); err != nil {
		return nil, err
	}
	if magic != labelMagic {
		return nil, os.ErrInvalid
	}
	if err = binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, err
	}

	// Body: one byte per label.
	labels = make([]Label, n)
	for i := 0; i < int(n); i++ {
		var l Label
		if err = binary.Read(r, binary.BigEndian, &l); err != nil {
			return nil, err
		}
		labels[i] = l
	}
	return labels, nil
}

// readImageFile parses an MNIST image file. Taking an error as the second
// argument allows calls of the form readImageFile(os.Open(name)).
func readImageFile(r io.Reader, e error) (imgs []RawImage, err error) {
	if e != nil {
		return nil, e
	}

	// Header: magic number, image count, rows and columns per image.
	var magic, n, nrow, ncol int32
	if err = binary.Read(r, binary.BigEndian, &magic); err != nil {
		return nil, err
	}
	if magic != imageMagic {
		// err is nil at this point, so return an explicit error.
		return nil, os.ErrInvalid
	}
	if err = binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, err
	}
	if err = binary.Read(r, binary.BigEndian, &nrow); err != nil {
		return nil, err
	}
	if err = binary.Read(r, binary.BigEndian, &ncol); err != nil {
		return nil, err
	}

	// Body: n images of nrow*ncol bytes each.
	imgs = make([]RawImage, n)
	m := int(nrow * ncol)
	for i := 0; i < int(n); i++ {
		imgs[i] = make(RawImage, m)
		m_, err := io.ReadFull(r, imgs[i])
		if err != nil {
			return nil, err
		}
		if m_ != m {
			return nil, os.ErrInvalid
		}
	}
	return imgs, nil
}

First, each function reads from an io.Reader, starting with a set of int32s. These are the metadata of the file. The first int32 is a magic number that indicates whether the file is a labels file or an images file. n indicates the number of images or labels the file contains. nrow and ncol are metadata that exist only in the images file, and indicate how many rows and columns there are in each image.

Zooming into the readImageFile function, we can see that after all the metadata has been read, we know to create a []RawImage of size n. The image format used in the MNIST dataset is essentially a slice of 784 bytes (28 columns and 28 rows). Each byte therefore represents a pixel in the image. The value of each byte is the intensity of the pixel, ranging from 0 (background) to 255 (foreground):

The preceding image is an example of an MNIST image blown up. At the top-left corner, the index of the pixel in a flat slice is 0. At the top-right corner, the index is 27. At the bottom-left corner, the index is 756 (27 full rows of 28 pixels come before it). And, finally, at the bottom-right corner, the index is 783. This is an important concept to keep in mind: a 2D image can be represented as a 1D slice.
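The mapping between a (row, column) position and an index in the flat slice can be sketched as follows, assuming the pixels are stored row by row (which matches the top-right pixel having index 27). The helpers flatIndex and rowCol are hypothetical names, not functions from the chapter's code.

```go
package main

import "fmt"

const (
	Width  = 28
	Height = 28
)

// flatIndex returns the position of the pixel at (row, col) in a flat,
// row-major slice of Width*Height bytes.
func flatIndex(row, col int) int {
	return row*Width + col
}

// rowCol is the inverse of flatIndex.
func rowCol(i int) (row, col int) {
	return i / Width, i % Width
}

func main() {
	fmt.Println(flatIndex(0, 0))                 // top-left: 0
	fmt.Println(flatIndex(0, Width-1))           // top-right: 27
	fmt.Println(flatIndex(Height-1, 0))          // bottom-left: 756
	fmt.Println(flatIndex(Height-1, Width-1))    // bottom-right: 783
}
```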
