Exploring Pachyderm

Our focus for this book is on developing deep learning systems in Go. So, naturally, now that we are talking about how to manage the data that we feed to our networks, let's take a look at a tool for doing so that is itself written in Go.

Pachyderm is a mature and scalable tool that offers containerized data pipelines. In these, everything you could possibly need, from data to tools, is held together in a single place where deployments can be maintained and managed, and where the data itself is versioned. The Pachyderm team sells its tool as "Git for data", which is a useful analogy. Ideally, we want to version the entire pipeline so that we know which data was used to train a given model and which model, in turn, gave us the specific prediction of X.
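
To make the "Git for data" analogy more concrete, here is a minimal sketch in plain Go of the underlying idea: derive an identifier from the contents of a dataset, then record that identifier alongside the model it was used to train. This is not Pachyderm's API or its internal implementation. The names Dataset, CommitID, and Provenance, as well as the file path and version strings, are hypothetical and exist only for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Dataset is a hypothetical in-memory stand-in for a versioned data
// repository: it maps file paths to file contents.
type Dataset map[string][]byte

// CommitID derives a deterministic identifier from the dataset's paths
// and contents, so any change to the data yields a different ID.
func CommitID(d Dataset) string {
	// Sort the paths so that map iteration order does not affect the hash.
	paths := make([]string, 0, len(d))
	for p := range d {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		// Hash each file on its own so that the boundary between a path
		// and its content is unambiguous in the combined digest.
		sum := sha256.Sum256(d[p])
		fmt.Fprintf(h, "%s\x00%x\n", p, sum)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Provenance records which version of the data and which version of the
// training code produced a given model (all fields are illustrative).
type Provenance struct {
	DataCommit  string
	CodeVersion string
	ModelName   string
}

func main() {
	v1 := Dataset{"train/labels.csv": []byte("cat,0\ndog,1\n")}
	v2 := Dataset{"train/labels.csv": []byte("cat,0\ndog,1\nbird,2\n")}

	p := Provenance{
		DataCommit:  CommitID(v1),
		CodeVersion: "trainer:1.0",
		ModelName:   "classifier",
	}
	fmt.Printf("model %s was trained on data %s\n", p.ModelName, p.DataCommit[:12])
	fmt.Println("data changed between versions:", CommitID(v1) != CommitID(v2))
}
```

With a record like this stored next to each trained model, a surprising prediction can be traced back to the exact version of the data that produced it. A tool such as Pachyderm handles this bookkeeping for us, at a scale and with a level of robustness that a sketch like this does not attempt.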

Pachyderm removes much of the complexity of managing these pipelines. Both Docker and Kubernetes run under the hood. We will explore each of these tools in greater detail in the next chapter, but for now, all we need to know is that they are critical for enabling reproducible builds, as well as scalable distributed training of our models.
