Getting data into Pachyderm

Let's prepare our data. In this case, we are using the CIFAR-10 dataset from Chapter 6, Object Recognition with Convolutional Neural Networks. If you no longer have a copy, download it again from the source at the University of Toronto, like so:

wget https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
...
cifar-10-binary.tar.gz 100%[==================================>] 162.17M 833KB/s in 2m 26s

Extract the data to a temporary directory, and create a repo in Pachyderm:

# pachctl create repo data
# pachctl list repo
NAME CREATED       SIZE (MASTER)
data 8 seconds ago 0B
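The extraction step is mentioned above but not shown. A minimal sketch of it follows, assuming the archive sits in the current directory; the helper name extract_cifar and the destination directory are illustrative choices, not something the text prescribes.

```shell
#!/usr/bin/env bash
# Sketch: unpack the CIFAR-10 archive into a scratch directory.
# extract_cifar is a hypothetical helper name, not part of tar or pachctl.
#   $1 = path to the downloaded archive (for example cifar-10-binary.tar.gz)
#   $2 = directory to unpack into (created if it does not exist)
extract_cifar() {
  mkdir -p "$2" || return 1
  # -x extract, -z decompress gzip, -f read from the named archive,
  # -C switch to the destination directory before unpacking
  tar -xzf "$1" -C "$2"
}

# Example call, left commented out so that sourcing this file has no side effects:
# extract_cifar cifar-10-binary.tar.gz /tmp/cifar
```

The archive should unpack into a cifar-10-batches-bin directory holding five training batches (data_batch_1.bin to data_batch_5.bin), one test batch (test_batch.bin), and a batches.meta.txt file with the class names.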

Now that we've got a repository, let's fill it with our CIFAR-10 image data. First, let's create individual directories and split the CIFAR-10 files between them, so that we can upload an entire directory of files (from our training or test set) in one go.
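One way to arrange the files is sketched below. The train and test subdirectory names and the helper name split_cifar are assumptions made for illustration; the text does not specify the layout, only that the top-level directory is called data.

```shell
#!/usr/bin/env bash
# Sketch: sort the extracted CIFAR-10 files into per-set directories.
# split_cifar is a hypothetical helper; the train/ and test/ names are assumed.
#   $1 = extracted source directory (for example /tmp/cifar/cifar-10-batches-bin)
#   $2 = destination directory to upload later (for example data)
split_cifar() {
  mkdir -p "$2/train" "$2/test" || return 1
  # The five training batches are named data_batch_1.bin .. data_batch_5.bin
  cp "$1"/data_batch_*.bin "$2/train/" || return 1
  # The single held-out batch is named test_batch.bin
  cp "$1"/test_batch.bin "$2/test/" || return 1
  # Keep the class-name list alongside the data if it is present
  if [ -f "$1/batches.meta.txt" ]; then
    cp "$1/batches.meta.txt" "$2/"
  fi
}

# Example call, left commented out so that sourcing this file has no side effects:
# split_cifar /tmp/cifar/cifar-10-batches-bin data
```

Copying rather than moving leaves the extracted files untouched, so the step can be rerun if the layout needs to change.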

Now we can execute the following command and then confirm that the data has made it into the repo:

# pachctl put file -r data@master -f data/
# pachctl list repo
NAME CREATED       SIZE (MASTER)
data 2 minutes ago 202.8MiB

We can drill down into the details of the files that the repo contains:

# pachctl list file data@master
COMMIT                           NAME  TYPE COMMITTED     SIZE
b22db05d23324ede839718bec5ff219c /data dir  6 minutes ago 202.8MiB