Downloading training data

You should start by downloading the training data from the following links:

We will download the data programmatically later, but a manual download first lets us peek at the data and at the structure of the archive. This matters when we write the pipeline, because we need to understand that structure in order to manipulate the data.

The small set is ideal for peeking. You can fetch it with the following commands, or download the file in a browser and extract it with an unarchiver (I suggest getting familiar with the command line, as all of this will need to be automated):

mkdir -p ~/workdir && cd ~/workdir
wget http://yaroslavvb.com/upload/notMNIST/notMNIST_small.tar.gz
tar xzvf notMNIST_small.tar.gz
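
The same manual steps can also be scripted. What follows is only a minimal sketch using the Python standard library, not a definitive pipeline; the function names download and extract are made up for illustration, and the URL is the one from the wget command above.

```python
import os
import tarfile
import urllib.request

# URL copied from the wget command above.
SMALL_URL = "http://yaroslavvb.com/upload/notMNIST/notMNIST_small.tar.gz"


def download(url, dest_dir):
    """Fetch url into dest_dir unless the file is already there; return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir; return its top-level entry names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        top_level = sorted({m.name.split("/")[0] for m in tar.getmembers()})
        # Newer Python versions offer a safer "data" extraction filter;
        # fall back to the plain call where it is not available.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return top_level
```

Assuming ~/workdir is the working folder, calling extract(download(SMALL_URL, workdir), workdir) with workdir = os.path.expanduser("~/workdir") would mirror the three commands. Skipping the download when the archive already exists keeps repeated runs cheap.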

The preceding commands produce a container folder called notMNIST_small with ten subfolders underneath, one for each of the letters A through J. Each lettered folder holds well over a thousand 28x28 pixel images of that letter. It is also worth noting the filename of each image (for example, QnJhbmRpbmcgSXJvbi50dGY=). It looks like a random string, but it is actually the Base64-encoded name of the source font file (here, Branding Iron.ttf). Either way, it carries no information we need: the label comes from the folder, not the filename.
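
To check the layout described above without clicking through folders, a short script can help. This is a sketch under assumptions: count_images and decode_stem are hypothetical helper names, and decode_stem simply tries to Base64-decode a filename, which works for the example name quoted above but is not guaranteed for every file.

```python
import base64
import os


def count_images(root):
    """Return {subfolder_name: number_of_entries} for each subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            counts[name] = len(os.listdir(folder))
    return counts


def decode_stem(filename):
    """Base64-decode the part of a filename that precedes its extension."""
    stem, _ext = os.path.splitext(filename)
    return base64.b64decode(stem).decode("utf-8", errors="replace")


# The example filename from the text decodes to a font file name.
print(decode_stem("QnJhbmRpbmcgSXJvbi50dGY="))  # prints "Branding Iron.ttf"
```

Running count_images on the extracted notMNIST_small folder should list the ten letter folders with their image counts, which is a quick way to spot an incomplete extraction.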
