Chapter 4: Installing Pachyderm Locally

In the previous chapters, we learned about Pachyderm's architecture, the internals of the Pachyderm solution, and version control primitives such as repositories, branches, and commits. We also reviewed why reproducibility is essential and why it should be a part of a successful data science process.

There are many ways to run your end-to-end Machine Learning (ML) workflows with Pachyderm, on a variety of platforms. We will start with the most common and easiest method to configure: a local deployment on your computer. Then, in the following chapters, we will review the deployment process on cloud platforms.

This chapter walks you through installing Pachyderm locally so that you can get started quickly, test Pachyderm, and prepare to run your first pipeline. We will provide an overview of the system requirements and guide you through installing all the prerequisite software that Pachyderm needs to run smoothly.

In this chapter, we're going to cover the following main topics:

  • Installing the required tools
  • Installing minikube
  • Installing Docker Desktop
  • Installing the Pachyderm Command-Line Interface (CLI)
  • Enabling autocompletion for Pachyderm
  • Preparing the Kubernetes environment
  • Deploying Pachyderm
  • Accessing the Pachyderm Console
  • Deleting an existing Pachyderm deployment

Technical requirements

Depending on whether you are on macOS, Windows, or Linux, you need to install some or all of the following tools:

  • Homebrew (macOS; optional on Linux)
  • The Kubernetes CLI, kubectl
  • Helm
  • minikube
  • Docker Desktop (as an alternative to minikube)
  • The Pachyderm CLI, pachctl
  • Windows Subsystem for Linux (WSL) (Windows only)

We will get into the specifics of installing and configuring these tools as we go through this chapter. If you already know how to do this, you can go ahead and set them up now.
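Before you start, you may want to see which of these tools are already installed. The following is a minimal sketch: check_required_tools is a hypothetical helper (not part of any of these tools), and it only checks whether each command is on your PATH, not whether its version is recent enough:

```shell
#!/usr/bin/env bash
# Report which of the given commands are on your PATH.
# Pass the tools that apply to your platform (for example, skip brew on Linux).
check_required_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # The exit status is 0 only when every tool was found.
  [ "$missing" -eq 0 ]
}

# Example on macOS with Docker Desktop:
# check_required_tools brew kubectl helm docker pachctl
```

On Windows, run the check from WSL once WSL is installed.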

Installing the required tools

In this section, we will cover how to install the system tools that we will use to prepare our environment before installing Pachyderm.

Installing Homebrew (macOS only)

While Linux distributions have many package management options, there is no default package manager for macOS users. Homebrew (brew) fills this gap: it lets you easily install and manage software from the macOS Terminal, and it also runs in a Linux shell as an alternative to the apt, yum, or flatpak package managers that Linux distributions provide.

Homebrew uses Git to download its updates. In Homebrew, packages are installed based on definitions known as formulae. Homebrew installs software packages to the Cellar, which is located in the /usr/local/Cellar directory on Intel Macs (/opt/homebrew/Cellar on Apple silicon Macs). Another term you will hear often is tap: a tap is a Git repository of formulae.

In this chapter, we will frequently use brew to install various software packages on macOS. Therefore, you need to install it if you are using macOS. The same brew commands we will use in this chapter run on Linux as well, but we will keep the use of brew for Linux optional:

  1. Execute the following command to install Homebrew on your computer:

    $ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

  2. Execute the following command to verify that Homebrew has been installed, as well as to list the available commands:

    $ brew commands

The following screenshot shows the system's output:

Figure 4.1 – List of brew commands

  1. Let's learn about some of the useful Homebrew commands you may need in the future. You can update Homebrew itself by executing the following command:

    $ brew update

  2. Execute the following command to find any outdated formulae:

    $ brew outdated

  3. Execute the following command to upgrade all the outdated formulae:

    $ brew upgrade

Now that you have installed the Homebrew package manager on your computer, let's install the remaining tools. If you are on Windows, start with WSL; otherwise, skip ahead to installing kubectl.

Installing Windows Subsystem for Linux (for Windows only)

WSL is a tool that enables Windows users to run Linux commands and utilities natively in Windows. If you are using Windows, you can install WSL on your machine by following these steps:

  1. Open PowerShell or the Windows Command Prompt as an administrator.
  2. Install WSL by running the following command:

    wsl --install

    Important note

    If you are on Windows, run all the Linux and Pachyderm commands described in this book from WSL.

For more information, see the official Microsoft Windows documentation at https://docs.microsoft.com/en-us/windows/wsl/install.
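Because the commands in this book should be run from WSL on Windows, it can be handy to confirm which environment a shell is running in. The following sketch relies on a common heuristic rather than an official API: WSL kernels include "microsoft" (in varying case) in /proc/version. The is_wsl function name is hypothetical, and its optional argument exists only so that the check can be tried against a sample file:

```shell
#!/usr/bin/env bash
# Succeed if the kernel version string mentions Microsoft, a common
# heuristic for detecting WSL. Reads /proc/version unless a file is given.
is_wsl() {
  grep -qi microsoft "${1:-/proc/version}" 2>/dev/null
}

# Usage:
# if is_wsl; then echo "Running inside WSL"; else echo "Not running inside WSL"; fi
```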

Installing the Kubernetes command-line tool

Before you create your first Kubernetes cluster, you need to install the Kubernetes command-line tool, kubectl, to execute commands against the cluster. Now, let's learn how to install kubectl on a computer.

See the official Kubernetes documentation for more information: https://kubernetes.io/docs/home/.

Follow these steps:

  1. Execute the following commands to download and install kubectl on your computer.

    If you are using Linux, run the following commands:

    $ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

    $ chmod +x ./kubectl && sudo mv ./kubectl /usr/local/bin/kubectl

    If you are on macOS (Intel), run the following commands. On Apple silicon, replace amd64 with arm64 in the URL:

    $ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/darwin/amd64/kubectl"

    $ chmod +x ./kubectl && sudo mv ./kubectl /usr/local/bin/kubectl

    If you are on Windows, the following command will do the trick. Afterward, add the folder that contains kubectl.exe to your PATH environment variable:

    curl -LO https://dl.k8s.io/release/v1.22.0/bin/windows/amd64/kubectl.exe

  2. Verify the version you are using and make sure that kubectl is installed by executing the following command:

    $ kubectl version --short --client

Here is an example of the system's output:

Client Version: v1.22.3

The commands in this chapter require kubectl v1.19 or later.
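If you script your setup, you can check this requirement automatically. The following minimal sketch assumes a sort command that supports version sorting (-V), as GNU coreutils does; version_at_least and kubectl_client_version are hypothetical helper names, and the parser expects the Client Version: line shown above:

```shell
#!/usr/bin/env bash
# Succeed if version $1 is greater than or equal to version $2.
# A leading "v" (as in "v1.22.3") is ignored on both arguments.
version_at_least() {
  local current="${1#v}" minimum="${2#v}"
  # After a version sort, the smaller value comes first.
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Print the version from a line such as "Client Version: v1.22.3",
# the format shown above for "kubectl version --short --client".
kubectl_client_version() {
  awk '/^Client Version:/ { print $3 }'
}

# Usage:
# v=$(kubectl version --short --client | kubectl_client_version)
# version_at_least "$v" 1.19 || echo "kubectl $v is older than v1.19" >&2
```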

Now that you have kubectl installed on your computer to execute commands against your Kubernetes cluster, let's install Helm.

Installing Helm v3

Helm is a popular package manager for Kubernetes clusters. Before you deploy Pachyderm by using its Helm chart, you need to install the Helm binary in your environment so that you can manage the life cycle of your Helm chart. Follow these steps to install Helm on your computer:

  1. Execute the following command to download and install Helm on your computer:

    $ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3

    $ chmod 700 get_helm.sh

    $ ./get_helm.sh

  2. Verify the version you are using and make sure Helm is installed by executing the following command:

    $ helm version --short

Here is an example of the system's output:

v3.7.1+g1d11fcb
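This book uses Helm 3, so a setup script may want to stop early if a different major version is found. A minimal sketch, where helm_major_version is a hypothetical helper that expects output in the form shown above (a leading v followed by the version):

```shell
#!/usr/bin/env bash
# Print the major version from a string such as "v3.7.1+g1d11fcb",
# the format printed by "helm version --short".
helm_major_version() {
  local v="${1#v}"   # drop the leading "v"
  echo "${v%%.*}"    # keep everything before the first dot
}

# Usage: warn if the installed client is not Helm 3.
# [ "$(helm_major_version "$(helm version --short)")" = "3" ] || echo "Helm 3 is required" >&2
```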

Next, you must install the tools needed to prepare your local Kubernetes cluster environment before you can deploy Pachyderm. If you are familiar with containers in Linux, you are probably already familiar with these tools. If you are using Linux as your local machine, follow the instructions provided in the Installing minikube section to prepare your environment. If you are using macOS or Windows, follow the instructions provided in the Installing Docker Desktop section. Docker Desktop is recommended due to its simplicity.

Installing minikube

minikube is a popular, lightweight, cross-platform Kubernetes implementation that helps users quickly create a single-node local Kubernetes cluster. minikube supports multiple container runtimes, including CRI-O, containerd, and Docker. It can be deployed as a Virtual Machine (VM), as a container, or on bare metal. Since Pachyderm supports the Docker runtime only, we will cover how to use the Docker container runtime and deploy minikube as a container. For additional configuration details, you can refer to the official minikube documentation at https://minikube.sigs.k8s.io/docs/start/. Let's install the latest version of minikube:

  1. Execute the following commands to install minikube on your computer.

    If you are using Linux, run the following commands:

    $ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64

    $ sudo install minikube-linux-amd64 /usr/local/bin/minikube

    If you are on Windows (the Chocolatey package manager is required), run the following command:

    choco install minikube

  2. Verify the minikube version you are using and make sure that minikube has been installed by executing the following command:

    $ minikube version

The following is an example of the command's response:

minikube version: v1.22.0

commit: a03fbcf166e6f74ef224d4a63be4277d017bb62e

Now that you have installed minikube, let's install Docker Desktop.

Installing Docker Desktop

Docker simplifies developing, delivering, and running applications by separating applications from the infrastructure and its dependencies. Pachyderm supports the Docker container runtime only; therefore, Docker tools must be installed before you deploy Pachyderm.

On macOS, Docker Desktop runs as a native application that uses the macOS sandbox security model, and it installs all the Docker tools on your Mac, including Docker Engine, the CLI, Docker Compose, Credential Helper, Notary, and Kubernetes.

If you do not have Docker Desktop installed already, follow the instructions in the next section for your operating system. Otherwise, skip to the Preparing the Kubernetes environment section. You can also refer to the official Docker documentation at https://docs.docker.com/get-docker/.

Installing Docker Desktop for macOS

Follow these steps to install Docker Desktop on macOS. The latest version of Docker Desktop supports the current macOS release and the two previous releases; if your Mac runs an older version, upgrade to the latest version of macOS first:

  1. Visit the Docker Hub download link at https://hub.docker.com/editions/community/docker-ce-desktop-mac/ to download the Docker Desktop installer.
  2. Click on the Get Docker button to download the Docker.dmg file to your workstation.
  3. Once the download has completed, double-click on the Docker.dmg file to open the image and drag the Docker icon from the window to the Applications folder to complete the installation:
Figure 4.2 – Docker Desktop installation screen on macOS

  4. In the Applications folder, double-click on the Docker icon to start Docker Desktop.
  5. Confirm that you have privileged access to install Docker components on your workstation.
  6. If you are using Docker for the first time, follow the quick tutorial. Otherwise, click Skip Tutorial to start Docker:
Figure 4.3 – Docker Desktop graphical user interface

Installing Docker Desktop on Windows

Install Docker Desktop on your Windows machine by following these steps:

  1. Go to https://docs.docker.com/desktop/windows/install/.
  2. Click Docker Desktop for Windows.
  3. Click Docker Desktop Installer.exe.
  4. Follow the interactive prompts to install Docker Desktop.
  5. On the Configuration page, make sure that you select Install required Windows components for WSL 2.
  6. When you're done, click Close.
  7. Start Docker Desktop by searching for it in Windows Search, and accept the terms and conditions. Docker starts automatically after that.

Now that you have installed Docker Desktop on your machine, let's install the Pachyderm CLI, called pachctl.

Installing the Pachyderm command-line interface

The Pachyderm CLI, pachctl, is used to deploy and interact with Pachyderm clusters. Follow these steps to install pachctl:

  1. Let's get the latest release version tag of pachctl and keep it in a variable called PACHYDERMVERSION:

    $ PACHYDERMVERSION=$(curl --silent "https://api.github.com/repos/pachyderm/pachyderm/releases/latest" | grep '"tag_name":' |
      sed -E 's/.*"v([^"]+)".*/\1/')

  2. Execute the following commands to install pachctl on your computer.

    If you are using macOS, run the following command. The formulae in the Pachyderm tap are named after the major and minor version (for example, pachctl@2.0), so ${PACHYDERMVERSION%.*} drops the patch number:

    $ brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@${PACHYDERMVERSION%.*}

    If you are using Debian-based Linux or WSL on Windows, run the following command:

    $ curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v${PACHYDERMVERSION}/pachctl_${PACHYDERMVERSION}_amd64.deb && sudo dpkg -i /tmp/pachctl.deb

  3. Execute the following command to verify that you have installed pachctl:

    $ pachctl version --client-only

The following is an example of the system's output:

COMPONENT           VERSION

pachctl             2.0.1
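The version handling in these steps can be wrapped in small functions, which makes it easier to notice an empty PACHYDERMVERSION (for example, when the GitHub API rejects a request and the reply has no tag_name field). This is a minimal sketch with hypothetical function names. It assumes that the formulae in the pachyderm/tap Homebrew tap are named after the major and minor version (such as pachctl@2.0); check the tap if the install fails. The GitHub latest-release endpoint skips pre-releases, so the version normally has the plain X.Y.Z form:

```shell
#!/usr/bin/env bash
# Read the JSON returned by the GitHub "latest release" endpoint on stdin
# and print the tag without its leading "v" (for example, 2.0.1).
# Prints nothing if the reply has no tag_name field.
extract_release_version() {
  grep '"tag_name":' | sed -E 's/.*"v([^"]+)".*/\1/'
}

# Homebrew formula for a version. Assumption: formulae in the
# pachyderm/tap tap are named pachctl@<major>.<minor> (2.0.1 -> pachctl@2.0).
brew_formula_for() {
  echo "pachyderm/tap/pachctl@${1%.*}"
}

# Debian package URL, built with the same pattern as the dpkg step above.
deb_url_for() {
  local v="$1"
  echo "https://github.com/pachyderm/pachyderm/releases/download/v${v}/pachctl_${v}_amd64.deb"
}

# Usage (requires network access):
# PACHYDERMVERSION=$(curl --silent "https://api.github.com/repos/pachyderm/pachyderm/releases/latest" | extract_release_version)
# [ -n "$PACHYDERMVERSION" ] || echo "Could not read the latest release tag" >&2
# brew tap pachyderm/tap && brew install "$(brew_formula_for "$PACHYDERMVERSION")"
```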

With that, you have installed the prerequisites to run Pachyderm locally. Before we prepare our cluster and deploy Pachyderm on our local Kubernetes cluster, let's enable autocompletion for pachctl.

Enabling autocompletion for Pachyderm

Autocompletion is a feature of Unix shells that fills in commands and their parameters for you on the command line. Depending on the shell used in your system, autocompletion suggests or completes partially typed commands as you type, typically when you press the Tab key. Pachyderm supports autocompletion for the Bourne Again Shell (bash) and the Z shell (zsh), an extended Bourne shell. bash and zsh are the most common Unix command-line interpreters on macOS and Linux. In this section, you will learn how to enable Pachyderm autocompletion so that your shell can complete pachctl commands and their parameters.

If you don't know which shell you use by default, type the following command to find out:

$ echo "$SHELL"

If you are using bash, the output of the preceding command should look as follows:

/bin/bash

If you are using zsh, the output of the preceding command should look as follows:

/bin/zsh

Since we now know which shell we are using, we can install Pachyderm autocompletion.

Enabling Pachyderm autocompletion for bash

Follow these steps to enable Pachyderm autocompletion on your computer:

  1. Execute the following commands to install bash-completion.

    If you are using macOS or Linux with Homebrew, use the following command:

    $ brew install bash-completion

    If you are on Ubuntu Linux, use the following command:

    $ sudo apt install bash-completion

    If you are using RHEL or CentOS Linux, use the following command:

    $ sudo yum install bash-completion bash-completion-extras

  2. Execute the following command to verify that bash-completion is enabled on your computer.

    If you are on macOS, run the following command:

    $ brew info bash-completion

    If you are using Linux, run the following command:

    $ complete -p

  3. Confirm that the path for bash-completion is pointing to the correct directory. Then, execute one of the following commands to enable pachctl autocompletion.

    If you are on macOS, run the following command:

    $ pachctl completion bash --install --path /usr/local/etc/bash_completion.d/pachctl

    If you are using Linux, run the following command:

    $ pachctl completion bash --install --path /usr/share/bash-completion/completions/pachctl

With that, Pachyderm's command-line autocompletion has been enabled in your bash shell.

Enabling Pachyderm autocompletion for zsh

The Z shell, or zsh, is an extended interactive shell with many advanced features. Starting with macOS Catalina, zsh is the default interactive shell on Macs. Follow these steps to enable Pachyderm autocompletion on your computer:

Important note

If you do not wish to enable autocompletion, you can try using pachctl shell instead. To start it, type pachctl shell.

  1. Execute the following commands to enable zsh autocompletion.

    If you are using macOS or Linux with Homebrew, use the following command:

    $ brew install zsh-completions

    If you are on Linux, visit the https://github.com/zsh-users/zsh-completions page and follow the instructions for your Linux distribution to enable zsh completion. As an example, for Ubuntu 19.10, this would look as follows:

    $ echo 'deb http://download.opensuse.org/repositories/shells:/zsh-users:/zsh-completions/xUbuntu_19.10/ /' | sudo tee /etc/apt/sources.list.d/shells:zsh-users:zsh-completions.list

    $ curl -fsSL https://download.opensuse.org/repositories/shells:zsh-users:zsh-completions/xUbuntu_19.10/Release.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/shells_zsh-users_zsh-completions.gpg > /dev/null

    $ sudo apt update && sudo apt install zsh-completions

  2. Execute the following command to verify that zsh autocompletion is enabled on your computer.

    If you are on macOS, run the following command:

    $ brew info zsh-completions

    If you are using Linux, run the following command:

    $ complete -p

  3. Confirm that the path for zsh autocompletion is pointing to the correct directory. Then, execute one of the following commands to enable pachctl autocompletion.

    On macOS, run the following command:

    $ pachctl completion zsh --install --path /usr/local/share/zsh-completions/_pachctl

    If you are using Linux, run the following command:

    $ pachctl completion zsh --install --path /home/linuxbrew/.linuxbrew/share/zsh-completions/_pachctl
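The completion-file path depends on both your shell and your operating system. The following sketch collects the four paths used in this section in one place; pachctl_completion_path is a hypothetical helper. On macOS, it assumes Homebrew's Intel prefix (/usr/local) unless you pass another prefix as the third argument, for example the output of brew --prefix on Apple silicon:

```shell
#!/usr/bin/env bash
# Print the completion-file path used in this chapter for a shell ("bash"
# or "zsh") and an operating system (as reported by "uname -s").
# Optional third argument: the Homebrew prefix. Defaults mirror the paths
# above: /usr/local on macOS, /home/linuxbrew/.linuxbrew for zsh on Linux.
pachctl_completion_path() {
  local shell_name="$1" os="$2" prefix="$3"
  case "$shell_name:$os" in
    bash:Darwin) echo "${prefix:-/usr/local}/etc/bash_completion.d/pachctl" ;;
    bash:Linux)  echo "/usr/share/bash-completion/completions/pachctl" ;;
    zsh:Darwin)  echo "${prefix:-/usr/local}/share/zsh-completions/_pachctl" ;;
    zsh:Linux)   echo "${prefix:-/home/linuxbrew/.linuxbrew}/share/zsh-completions/_pachctl" ;;
    *)           echo "unsupported shell/OS: $shell_name/$os" >&2; return 1 ;;
  esac
}

# Usage: detect your default shell and OS, then install the completion file
# (writing to system directories may require elevated permissions).
# shell_name=$(basename "$SHELL")
# path=$(pachctl_completion_path "$shell_name" "$(uname -s)") &&
#   pachctl completion "$shell_name" --install --path "$path"
```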

With that, Pachyderm command-line autocompletion is now enabled in your zsh shell. Next, let's prepare the Kubernetes environment.

Preparing the Kubernetes environment

In this section, you will provision a Kubernetes cluster by using the tool of your choice from those you installed in the Installing the required tools section.

Enabling Kubernetes on Docker Desktop

If you are using Docker Desktop as your container platform on Windows or macOS, follow these steps to enable its built-in single-node Kubernetes cluster:

  1. Open the Docker UI.
  2. At the top-right corner of the Docker UI, click the Settings icon.
  3. Switch to the Kubernetes settings panel and click the Enable Kubernetes button to start a single-node Kubernetes cluster with Docker Desktop. Apply these settings by clicking the Apply & Restart button:
Figure 4.4 – Enabling Kubernetes in Docker Desktop

  4. Open a Terminal window and confirm that Kubernetes is running by executing the following command:

    $ kubectl get node

The following is an example of the system's response:

NAME           STATUS ROLES                AGE  VERSION

docker-desktop Ready  control-plane,master 7m9s v1.21.5

With that, you have a single-node Kubernetes cluster configured on Docker Desktop. Now, we are ready to deploy Pachyderm on our local Kubernetes environment.

Enabling Kubernetes using minikube

Follow these steps to run Kubernetes locally when using minikube:

  1. Make Docker the default driver for minikube:

    $ minikube config set driver docker

  2. Start a Kubernetes cluster:

    $ minikube start

  3. Execute the following command to verify that your Kubernetes cluster is ready:

    $ kubectl get node

    NAME       STATUS   ROLES                  AGE   VERSION

    minikube   Ready    control-plane,master   29m   v1.20.2

With that, your Kubernetes cluster has been configured using minikube. Now, we are ready to deploy Pachyderm on our local Kubernetes environment.
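Whichever option you chose, the node should report Ready before you continue. The following minimal sketch checks the output of kubectl get node from a script; all_nodes_ready is a hypothetical helper that assumes the default table layout, with STATUS in the second column:

```shell
#!/usr/bin/env bash
# Read "kubectl get node" output on stdin and succeed only if there is at
# least one node and every node row has STATUS "Ready" (second column).
all_nodes_ready() {
  awk 'NR > 1 && NF > 0 { total++; if ($2 == "Ready") ready++ }
       END { exit !(total > 0 && ready == total) }'
}

# Usage:
# kubectl get node | all_nodes_ready && echo "Cluster is ready"
# kubectl can also block until the nodes are ready:
# kubectl wait --for=condition=Ready node --all --timeout=120s
```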

Deploying Pachyderm

When you run Pachyderm in production, it is recommended to start in an environment where resources can scale up to handle the computational needs of larger pipelines. Pachyderm can be installed on any Kubernetes cluster, including the managed Kubernetes services provided by AWS, Google Cloud, Microsoft Azure, and IBM Cloud, on OpenShift, and locally on your workstation. In this section, we focus on a small test deployment, so a local cluster is good enough to get started.

Pachyderm provides sample Helm charts to help you deploy Pachyderm to all major cloud platforms. You can read more about Helm charts in Chapter 2, Pachyderm Basics. Because Helm charts are flexible, you can pick the components that you want to install. For example, you can choose whether to install the Pachyderm in-browser interface, called the Console.

The Pachyderm Console is the Pachyderm user interface. It provides a bird's-eye view of your pipelines through a Directed Acyclic Graph (DAG), as well as other useful features.

Some components, such as the Pachyderm Console, require an Enterprise license but are also available for testing with a free trial license for 30 days. You can request a free trial license at https://www.pachyderm.com/trial/.

Follow these steps to install Pachyderm on your local Kubernetes cluster:

  1. A Pachyderm deployment consists of multiple resources. The helm command manages the life cycle of complex Kubernetes applications and creates all the necessary components at once with a single command. You can learn more about the options for Helm charts in Chapter 5, Installing Pachyderm on a Cloud Platform. For now, execute the following command to add the Pachyderm Helm chart repository to your local list of Helm repositories:

    $ helm repo add pach https://helm.pachyderm.com  

  2. Execute the following command to get the latest chart information from the chart repository:

    $ helm repo update

  3. Execute the following command to deploy the latest version of Pachyderm on your cluster without the console:

    $ helm install pachd pach/pachyderm --set deployTarget=LOCAL

If you have an Enterprise key and would like to deploy Pachyderm with its Console user interface, create a file called license.txt and paste your Enterprise token into that file. Then, run the following command:

$ helm install pachd pach/pachyderm --set deployTarget=LOCAL --set pachd.enterpriseLicenseKey="$(cat license.txt)" --set console.enabled=true

Once the Console has been deployed successfully, follow the instructions provided in the Accessing the Pachyderm Console section to access the Console.

The preceding commands return the following output:

Figure 4.5 – Pachyderm Helm chart getting deployed on Kubernetes

  4. You can inspect the values.yaml file in the Helm chart repository (https://github.com/pachyderm/pachyderm/tree/master/etc/helm/pachyderm) to learn more about the Kubernetes objects that are created when you deploy the chart. The Pachyderm Helm chart creates Kubernetes service accounts, Services, Deployments, and PostgreSQL and etcd instances, all of which are needed to run Pachyderm.
  5. A Kubernetes Deployment describes the desired state of an application; its controller rolls out a ReplicaSet of Pods based on the requirements defined in your manifest file. A ReplicaSet maintains a set of identical Pod replicas. Execute the following command to verify the state of the Deployments that have been created during the installation:

    $ kubectl get deployments

The output of the preceding command should look as follows:

Figure 4.6 – List of Pachyderm Deployment objects

  6. Execute the following command to verify that the installation was successful and view the Pods that were created as part of the Deployments:

    $ kubectl get pods

The output of the preceding command should look as follows:

Figure 4.7 – List of Pachyderm Pods

  7. Execute the following commands to connect to your new Pachyderm instance. Run pachctl port-forward in a separate terminal window, because it keeps running in the foreground until you stop it:

    $ pachctl config import-kube local --overwrite

    $ pachctl config set active-context local

    $ pachctl port-forward

  8. Go to your browser and point it to http://localhost:4000.
  9. Log in as the mock user admin, with password as the password.
  10. Go back to your terminal and enable authentication:

    $ pachctl auth activate

    You'll be prompted to log in to the UI again. Log in as the mock user admin, with password as the password.

  11. Execute the following command to verify that Pachyderm has been installed successfully:

    $ pachctl version

The output of the preceding command should look as follows:

COMPONENT           VERSION

pachctl             2.0.1

pachd               2.0.1
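The checks in the preceding steps can also be scripted, for example in a setup script that should stop when the deployment is not healthy. The following is a minimal sketch: check_pods_running and client_server_versions_match are hypothetical helpers that parse the default table output of kubectl get pods (STATUS in the third column) and of pachctl version, as shown above:

```shell
#!/usr/bin/env bash
# Read "kubectl get pods" output on stdin and print the name and status of
# every Pod whose STATUS is neither Running nor Completed. The exit status
# is 0 only when every Pod is Running or Completed.
check_pods_running() {
  awk 'NR > 1 && NF > 0 && $3 != "Running" && $3 != "Completed" { print $1, $3; bad++ }
       END { exit (bad > 0) }'
}

# Read "pachctl version" output on stdin and succeed only if the pachctl
# client and the pachd server report the same version.
client_server_versions_match() {
  awk '$1 == "pachctl" { client = $2 }
       $1 == "pachd"   { server = $2 }
       END { exit !(client != "" && client == server) }'
}

# Usage against a live cluster:
# kubectl get pods | check_pods_running || echo "Some Pods are not ready yet" >&2
# pachctl version | client_server_versions_match || echo "pachctl and pachd versions differ" >&2
```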

Now that we have installed Pachyderm on our cluster, we are ready to create our first pipeline.

Accessing the Pachyderm Console

If you have installed the Console with your Pachyderm cluster, you can access it and view your pipelines, repositories, and other Pachyderm objects in the UI. The Pachyderm Console is available as a free trial for 30 days. Follow these steps to access the Pachyderm Console:

  1. Execute the following command to verify the status of your Enterprise activation:

    pachctl enterprise get-state

The output of the preceding command should look as follows:

Pachyderm Enterprise token state: ACTIVE

Expiration: 2022-02-02 22:35:21 +0000 UTC

  2. If port-forwarding is not running, execute the following command to start it. This step forwards the Pachyderm Console service on port 4000 to port 4000 on your local workstation:

    $ pachctl port-forward

  3. In a web browser, open http://localhost:4000.
  4. On the login screen, use the default credentials, admin and password:

Figure 4.8 – Login screen of the Pachyderm Console

  5. After you click Login to access the Pachyderm Console, click the View Project button to view your repositories and pipelines:

Figure 4.9 – Pachyderm dashboard

Because we have not created any Pachyderm objects, this page is empty.
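If you script these steps, you can check the Enterprise state before you open the Console. A minimal sketch, assuming the output format of pachctl enterprise get-state shown earlier in this section; enterprise_active is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Read "pachctl enterprise get-state" output on stdin and succeed only if
# the token state line reports ACTIVE.
enterprise_active() {
  grep -q '^Pachyderm Enterprise token state: ACTIVE$'
}

# Usage:
# pachctl enterprise get-state | enterprise_active || echo "Enterprise features are not active" >&2
```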

Now that you have learned how to access the Pachyderm Console, you are ready to create your first pipeline in Pachyderm.

Deleting an existing Pachyderm deployment

Only perform the steps in this section if you want to delete your cluster. If you want to continue working on the examples in other chapters, then please skip this section.

If you need to delete your deployment and start afresh, wipe out your environment and start again from the steps in the Preparing the Kubernetes environment section. When you delete an existing Pachyderm deployment, all the components except for the Helm repository and pachctl are removed from your machine.

Follow these steps to delete your existing Pachyderm deployment:

  1. If you used a release name other than pachd for your Helm installation, execute the following command to find the name of the Pachyderm release that you deployed using the Helm chart:

    $ helm ls | grep -i pachyderm

The output of the preceding command should look as follows:

pachd default 1 2021-11-08 21:33:44 deployed Pachyderm-2.0.1 2.0.1

  2. Execute the following command, using your Pachyderm release name, to remove the Pachyderm components from your cluster:

    $ helm uninstall pachd

  3. If you're using minikube, delete the entire Kubernetes cluster, and recreate it before you deploy Pachyderm again:

    $ minikube stop

    $ minikube delete

With that, you have removed Pachyderm from your cluster and, if you are using minikube, deleted the local Kubernetes cluster from your computer.
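If you have several Helm releases and are not sure which one is Pachyderm, you can filter the helm ls output by chart name. A minimal sketch, where pachyderm_release_names is a hypothetical helper that prints the first column of every row whose chart name starts with pachyderm- (matched case-insensitively):

```shell
#!/usr/bin/env bash
# Read "helm ls" output on stdin and print the release name (first column)
# of every row that mentions a pachyderm-<version> chart, ignoring case.
pachyderm_release_names() {
  awk 'tolower($0) ~ /pachyderm-[0-9]/ { print $1 }'
}

# Usage: list the Pachyderm releases, then uninstall the one you want.
# helm ls | pachyderm_release_names
# helm uninstall <release-name>
```

By default, helm ls lists releases in the current namespace only; add -A (--all-namespaces) to search every namespace.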

Summary

In this chapter, we learned about the software prerequisites for getting Pachyderm up and running on your local computer for testing purposes.

We gained basic knowledge about minikube and Docker Desktop and learned how to install them on our local machine. We also learned how to install the Pachyderm CLI and enable autocompletion on different operating systems.

We then installed Helm and the Pachyderm Helm repository on our system. We learned about Helm charts and how to obtain a free trial Pachyderm license.

We deployed a single-node, local Kubernetes cluster by using the most popular options available based on our desktop operating system. Finally, we deployed Pachyderm and learned how to access the Pachyderm Console.

We covered all three major platforms: macOS, Linux, and Windows.

In the next chapter, we will learn about how to install Pachyderm via the cloud and explain the software requirements needed to run a Pachyderm cluster in production. We will also learn about Pachyderm Hub, the Software-as-a-Service (SaaS) version of Pachyderm that is great for both testing and production environments.
