Chapter 9: Persistent Storage in Kubernetes

So far, we've covered Kubernetes' key concepts, and this chapter is going to be the last one dedicated to its core concepts. By now, you've understood that Kubernetes is all about creating objects in the etcd datastore that are then converted into actual computing resources on the Nodes that are part of your cluster.

This chapter will focus on a concept called PersistentVolume. This is another object that you will need to master in order to get persistent storage on your cluster. Persistent storage is achieved in Kubernetes by using the PersistentVolume resource kind, which has its own mechanics: these can be relatively difficult to approach at first, but we are going to discover all of them!

In this chapter, we're going to cover the following main topics:

  • Why you would want to use PersistentVolume
  • Understanding how to mount a PersistentVolume to your Pods using PersistentVolumeClaim
  • Understanding the life cycle of PersistentVolume in Kubernetes
  • Static and dynamic PersistentVolume provisioning

Technical requirements

  • A working Kubernetes cluster (either local or cloud-based)
  • A working kubectl CLI configured to communicate with the cluster

If you do not meet these technical requirements, you can follow Chapter 2, Kubernetes Architecture – From Docker Images to Running Pods, and Chapter 3, Installing Your Kubernetes Cluster, to get these two prerequisites.

Why you would want to use PersistentVolume

When you're creating your Pods, you have the opportunity to create volumes in order to share files between the containers created by them. However, these volumes can represent a massive problem: they are bound to the life cycle of the Pod that created them.

That is why Kubernetes offers another object called PersistentVolume, which is a way to create storage in Kubernetes that will not be bound to the life cycle of a Pod.

Introducing PersistentVolumes

Just like the Pod or the ConfigMap, PersistentVolume is a resource kind that is exposed through kube-apiserver: you can create, update, and delete persistent volumes using YAML and kubectl just like any other Kubernetes object.

The following command will demonstrate how to list the PersistentVolume resource kind currently provisioned within your Kubernetes cluster:

$ kubectl get persistentvolume

No resources found

The persistentvolume object is also accessible with the plural form of persistentvolumes along with the alias of pv. The following three commands are essentially the same:

$ kubectl get persistentvolume

No resources found

$ kubectl get persistentvolumes

No resources found

$ kubectl get pv

No resources found

You'll find that the pv alias is very commonly used in the Kubernetes world, and a lot of people refer to persistent volumes as simply pv, so be aware of that. As of now, no PersistentVolume object has been created within my Kubernetes cluster, and that is why I don't see any resource listed in the output of the preceding command.
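If you want to explore the fields that the PersistentVolume resource kind accepts before writing any YAML, the kubectl explain command is handy. These two commands print the API documentation of the spec and of the accessModes field; the exact description text depends on your Kubernetes version:

$ kubectl explain persistentvolume.spec

$ kubectl explain persistentvolume.spec.accessModes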

PersistentVolume is the object that, essentially, represents a piece of storage that you can attach to your Pods. That piece of storage is referred to as persistent because it is not supposed to be tied to the life cycle of a Pod.

Indeed, as mentioned in Chapter 5, Using Multi-Container Pods and Design Patterns, Kubernetes Pods use the notion of volumes. There, we discovered the emptyDir and hostPath volume types: the former initializes an empty directory that the containers of your Pod can share, while the latter exposes a path from the worker Node's filesystem to your Pods. Both of these volume types are tied to the life cycle of the Pod, which means that once the Pod is destroyed, the data stored within the volumes is destroyed as well.

However, sometimes, you don't want the volume to be destroyed. You want it to have its own life cycle, so that both the volume and its data stay alive even if the Pod fails. That's where PersistentVolumes come into play: essentially, they are volumes that are not tied to the life cycle of a Pod. Since they are a resource kind just like the Pods themselves, they can live on their own!

Important Note

Bear in mind that PersistentVolume objects are just entries within the etcd datastore; they are not actual disks on their own.

PersistentVolume is just a kind of pointer within Kubernetes to an external piece of storage, such as an NFS, a disk, an Amazon EBS volume, and more. This is so that you can access these technologies from within Kubernetes and in a Kubernetes way.

Simply put, PersistentVolume is essentially made up of two different things:

  • A backend technology called a PersistentVolume type
  • An access mode, such as ReadWriteOnce

You need to master both concepts in order to understand how to use PersistentVolumes. Let's begin by explaining what PersistentVolume types are.

Introducing PersistentVolume types

Kubernetes is supposed to be able to run on as much infrastructure as possible, and even though it started as a Google project, it can be used on many platforms, whether they are public clouds or private solutions.

As you already know, the simplest Kubernetes setup consists of a simple minikube installation, whereas the most complex Kubernetes setup can be made of dozens of servers on massively scalable infrastructure. All of these different setups will necessarily have different ways of managing persistent storage. For example, the three biggest public cloud providers offer a number of different solutions. Let's name a few, as follows:

  • Amazon AWS EBS volumes
  • Amazon AWS EFS filesystems
  • Google GCE Persistent Disk (PD)
  • Microsoft Azure disks

These solutions have their own design and set of principles along with their own logic and mechanics. Kubernetes was built on the principle that all of these technologies should be abstracted behind a single object, and that single object is the PersistentVolume resource kind. The PersistentVolume resource kind is the object that is going to be attached to a running Pod. Indeed, a Pod is a Kubernetes resource and does not know what an EBS volume or a PD is; Kubernetes Pods only play well with PersistentVolumes, which are also Kubernetes resources.

Whether your Kubernetes cluster is running on Google GKE or Amazon EKS, or whether it is a single minikube cluster on your local machine, has no importance. When you wish to manage persistent storage, you are going to create, use, and deploy PersistentVolume objects, and then bind them to your Pods!

Here are some of the backend technologies supported by Kubernetes out of the box:

  • awsElasticBlockStore: Amazon EBS volumes
  • gcePersistentDisk: Google Cloud PD
  • azureDisk: Azure Disk
  • azureFile: Azure File
  • cephfs: Ceph-based filesystems
  • csi: Container storage interface
  • glusterfs: GlusterFS-based filesystems
  • nfs: Regular network file storage

The preceding list is not exhaustive: Kubernetes is extremely versatile and can be used with many storage solutions that can be abstracted as PersistentVolume objects in your cluster.

When you create a PersistentVolume object, essentially, you are creating a YAML file. However, this YAML file is going to have a different key/value configuration based on the backend technology used by the PersistentVolume objects.

The benefits brought by PersistentVolume

There are three major benefits of PersistentVolume:

  • PersistentVolume is not bound to the life cycle of a Pod. If you want to remove a Pod that is attached to a PersistentVolume object, then the volume will survive.
  • The preceding statement is also valid when a Pod crashes: the PersistentVolume object will survive the fault and not be removed from the cluster.
  • PersistentVolume is cluster-wide; this means that it can be attached to any Pod running on any Node.

Bear in mind that these three statements are not always 100% valid. Indeed, sometimes, a PersistentVolume object can be affected by its underlying technology.

To demonstrate this, let's consider a PersistentVolume object that is, for example, a pointer to an Amazon EBS volume created in your AWS cloud. In this case, the worker Nodes will be Amazon EC2 instances. In such a setup, the PersistentVolume won't be available to every Node.

The reason is that AWS has some limitations around EBS volumes: an EBS volume can only be attached to one instance at a time, and that instance must be provisioned in the same availability zone as the EBS volume. From a Kubernetes perspective, this makes the PersistentVolume (the EBS volume) accessible only from EC2 instances (that is, worker Nodes) in the same AWS availability zone, and several Pods running on different Nodes (EC2 instances) won't be able to access the PersistentVolume object at the same time.

However, if you take another example, such as an NFS setup, it wouldn't be the same. Indeed, you can access an NFS share from multiple machines at once; therefore, a PersistentVolume object that is backed by an NFS share would be accessible from several different Pods running on different Nodes without much problem. To understand how to make a PersistentVolume object available on several different Nodes at a time, we need to consider the concept of access modes.

Introducing access modes

As the name suggests, access modes are an option you can set when you create a PersistentVolume object that tells Kubernetes how the volume can be mounted.

PersistentVolumes support three access modes, as follows:

  • ReadWriteOnce: This volume allows read/write by only one Node at the same time.
  • ReadOnlyMany: This volume allows read-only mode by many Nodes at the same time.
  • ReadWriteMany: This volume allows read/write by multiple Nodes at the same time.

It is necessary to set at least one access mode on a PersistentVolume object, even if said volume supports multiple access modes. Indeed, not all PersistentVolume types support all access modes.
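To make this concrete, here is a minimal sketch of how access modes are declared in a PersistentVolume manifest. The NFS server address and path are hypothetical; NFS is used here only because it is one of the types that can support several modes at once:

# ~/pv-nfs-access-modes.yaml (illustrative example)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-example
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany   # many Nodes can mount it read/write
    - ReadOnlyMany    # many Nodes can mount it read-only
  nfs:
    server: nfs.example.com   # hypothetical NFS server
    path: /exports/data       # hypothetical exported path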

Understanding that not all access modes are available to all PersistentVolume types

As mentioned earlier, PersistentVolume types are only a pointer to an external piece of storage. And that piece of storage is constrained by the backend technology that is providing it.

As mentioned earlier, one good example that we can use to explain this is the Amazon EBS volume technology that is available in the AWS cloud. When you create a PersistentVolume in Kubernetes that is a pointer to an Amazon EBS volume, that PersistentVolume will only support the ReadWriteOnce access mode, whereas NFS supports all three. This is because of the hard limitation mentioned earlier: an EBS volume can only be attached to one Amazon EC2 instance at a time, and this is a hard limit set by AWS. So, in the Kubernetes world, it can only be represented by a PersistentVolume object with an access mode set to ReadWriteOnce.

Put simply, these PersistentVolume types, and the concepts surrounding them, are Kubernetes concepts that are only valid within the Kubernetes scope and have absolutely no meaning outside of Kubernetes.

Some PersistentVolume objects will be permissive, while others will have a lot of constraints. And all of this is determined by the underlying technology they are pointing to. No matter what you do with PersistentVolume, you'll have to deal with the restrictions set by your cloud provider or underlying infrastructure.

Now, let's create our first PersistentVolume object.

Creating our first PersistentVolume

So, let's create a PersistentVolume on your Kubernetes cluster using the declarative way. Since this is a rather complex resource, I strongly recommend that you do not use the imperative way to create such resources:

# ~/pv-hostpath.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-hostpath
spec:
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: "/tmp/pv-hostpath" # example path on the worker Node backing this volume

This is the simplest form of PersistentVolume. Essentially, this YAML file creates a PersistentVolume entry within the Kubernetes cluster, and this PersistentVolume is a hostPath type.

It could be backed by something more complex, such as a cloud-based disk or an NFS share, but in its simplest form, a PersistentVolume can simply be a hostPath directory on the worker Node running your Pod.

How does Kubernetes PersistentVolumes handle cloud-based storage?

A bare PersistentVolume entry in our cluster can do nothing on its own and must be seen as a layer of abstraction on the Kubernetes level: outside Kubernetes, the PersistentVolume resource kind has no meaning.

That being said, the PersistentVolume resource kind is a pointer to something else, and that something else can be, for example, a disk, an NFS drive, a Google Cloud PD, or an Amazon EBS volume. All of these different technologies are managed differently. However, fortunately for us, in Kubernetes, they are all represented by the PersistentVolume object.

Simply put, the YAML file to build a PersistentVolume will be a little bit different depending on the backend technology that the PersistentVolume is backed by. For example, if you want your PersistentVolume to be a pointer to an Amazon EBS volume, you have to meet the following two conditions:

  • The Amazon EBS volume must already be provisioned in your AWS cloud.
  • The YAML file for your PersistentVolume must include the ID of the EBS volume, as it will be displayed in the AWS console.

And the same logic goes for everything else. For a PersistentVolume to work properly, it must be able to make the link between Kubernetes and the actual storage. So, you create or provision the piece of storage outside of Kubernetes first, and then create the PersistentVolume entry, including the unique ID of the disk or volume provided by that external storage technology. Next, let's take a closer look at some examples of PersistentVolume YAML files.

Amazon EBS PersistentVolume YAML

This example displays a PersistentVolume object that is pointing to an Amazon EBS volume on AWS:

# ~/persistent-volume-ebs.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistent-volume-ebs
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-xxxx
    fsType: ext4

As you can see, in this YAML file, awsElasticBlockStore indicates that this PersistentVolume object points to a volume in my AWS account. The exact Amazon EBS volume is identified by the volumeID key. And that's pretty much it: with this YAML file, Kubernetes is capable of finding the proper EBS volume and maintaining a pointer to it through this PersistentVolume entry.

Of course, since EBS volumes are pure AWS technology, they can only be mounted on EC2 instances, which means this volume will never work if you attempt to attach it to anything else. Now, let's examine a very similar YAML file; however, this time, it's going to point to a GCE PD.

GCE PersistentDisk PersistentVolume YAML

Here is the YAML file that is creating a PersistentVolume object that is pointing to an existing GCE PD:

# ~/persistent-volume-pd.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistent-volume-pd
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: xxxx
    fsType: ext4

Once again, please note that it is the same kind: PersistentVolume object as the one used by the Amazon EBS PersistentVolume object. In fact, it is the same object and the same interface from the Kubernetes side. The only difference is the configuration under gcePersistentDisk, which, this time, points to a PD created on Google Cloud. Kubernetes is so versatile that it can fetch and use different cloud storage solutions just like that.

Next, let's explore one last example in YAML, this time using NFS.

NFS PersistentVolume YAML

Here is an example YAML file that can create a PersistentVolume object that is backed by an NFS drive:

# ~/persistent-volume-nfs.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistent-volume-nfs
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteMany
  nfs:
    path: /opt/nfs
    server: nfsxxxx

Again, note that this time, we're still using the kind: PersistentVolume entry. Additionally, we now specify an nfs configuration with the exported path as well as the server address. Now, let's discuss the provisioning of the storage resources a little bit.

Can Kubernetes handle the provisioning or creation of the resource itself?

The fact that you need to create the actual storage resource separately and then create a PersistentVolume in Kubernetes might be tedious.

Fortunately for us, Kubernetes is also capable of communicating with the APIs of your cloud provider in order to create the volumes or disks on the fly. This is called dynamic provisioning, and you can use it when it comes to managing PersistentVolume provisioning. It makes things a lot simpler, but it only works with supported storage providers.

However, this is an advanced topic, so we will discuss it, in more detail, later in this chapter.

Now that we know how to provision PersistentVolume objects inside our cluster, we can try to mount them. Indeed, in Kubernetes, once you create a PersistentVolume, you need to mount it to a Pod so that it can actually be used. Things will get slightly more advanced and conceptual here: Kubernetes uses an intermediate object in order to mount a PersistentVolume to Pods, and this intermediate object is called PersistentVolumeClaim. Let's focus on it next.

Understanding how to mount a PersistentVolume to your Pods using PersistentVolumeClaim

So far, we've learned that Kubernetes makes use of two objects to deal with persistent storage technologies. The first one is PersistentVolumes, which represents a piece of storage, and we quoted Google Cloud PD and Amazon EBS volumes as possible backends for PersistentVolume. Additionally, we discovered that depending on the technology that PersistentVolume is relying on, it is going to be exposed to one or more Pods using access modes.

That being said, we can now try to mount a PersistentVolume object to a Pod. To do that, we will need to use another object, which is the second object we need to explore in this chapter, called PersistentVolumeClaim.

Introducing PersistentVolumeClaim

Just like PersistentVolume and ConfigMap, PersistentVolumeClaim is another independent resource kind living within your Kubernetes cluster and is the second resource kind that we're going to examine in this chapter.

This object can appear to be a little bit more complex to understand compared to the others. First, bear in mind that even if both names are almost the same, PersistentVolume and PersistentVolumeClaim are two distinct resources that represent two different things.

You can list the PersistentVolumeClaim resource kind created within your cluster using kubectl, as follows:

$ kubectl get persistentvolumeclaims

No resources found in default namespace.

The preceding output tells me that I don't have any PersistentVolumeClaim resources created within my cluster. Please note that the pvc alias works, too:

$ kubectl get pvc

No resources found in default namespace.

You'll quickly find that a lot of people working with Kubernetes refer to the PersistentVolumeClaim resources simply with pvc. So, don't be surprised if you see the term pvc here and there while working with Kubernetes. That being said, let's explain what PersistentVolumeClaim resources are in Kubernetes.

Splitting storage creation and storage consumption

The key to understanding the difference between PersistentVolume and PersistentVolumeClaim is that one represents the storage itself, whereas the other represents the request for storage that a Pod makes in order to get that actual storage.

The reason is that Kubernetes is supposed to be used by two types of people:

  • Kubernetes administrator: This person is supposed to maintain the cluster, operate it, and also add computation resources and persistent storage.
  • Kubernetes application developer: This person is supposed to develop and deploy applications and, put simply, to consume the computation resources and storage offered by the administrator.

In fact, there is no problem if you handle both roles in your organization; however, this distinction is crucial to understanding the workflow used to mount a PersistentVolume to Pods.

Kubernetes was built with the idea that a PersistentVolume object should belong to the cluster administrator scope, whereas PersistentVolumeClaim objects belong to the application developer scope. It is up to the cluster administrator to add PersistentVolumes, since they might map to actual hardware resources, whereas developers have a better understanding of how much storage, and what kind of storage, their application needs. That's why the PersistentVolumeClaim object was built.

Essentially, a Pod cannot mount a PersistentVolume object directly. It needs to explicitly ask for it. And that asking action is achieved by creating a PersistentVolumeClaim object and attaching it to the Pod that needs a PersistentVolume object.

This is the only reason why this additional layer of abstraction exists.

The summarized PersistentVolume workflow

Once the developer has built the application, it is their responsibility to ask for a PersistentVolume object if needed. To do that, the developer will write two YAML manifest files:

  • One file is for the Pod application.
  • The other file is for PersistentVolumeClaim.

The Pod manifest must be written so that the PersistentVolumeClaim object is referenced as a volume and mounted through the volumeMounts configuration key in the YAML file. Please note that for this to work, the PersistentVolumeClaim object needs to be in the same namespace as the application Pod that is mounting it. The PersistentVolume object is never mounted directly to the Pod.

When both YAML files are applied and both resources are created in the cluster, the PersistentVolumeClaim object will look for a PersistentVolume object that matches the criteria required in the claim. Assuming a PersistentVolume object capable of fulfilling the claim exists and is ready in the Kubernetes cluster, the PersistentVolume object will be bound to the PersistentVolumeClaim object.

If everything is okay, the claim is considered fulfilled, and the volume is correctly mounted to the Pod: if you understand this workflow, essentially, you understand everything related to PersistentVolume usage.

Let's summarize this as follows:

  1. A Kubernetes administrator creates a PersistentVolume object.
  2. A Kubernetes developer requests a PersistentVolume object for their application by creating a PersistentVolumeClaim object.
  3. The developer writes the application's YAML files so that the PersistentVolumeClaim object is configured as a volume mount on the Pod.
  4. Once the Pod and its PersistentVolumeClaim are created, Kubernetes fetches a PersistentVolume object that answers what is requested in the PVC.
  5. Then, the PersistentVolume object is accessible from the Pod and is ready to receive read or write operations based on its access mode.

This setup might seem complex to understand at first, but you will quickly become used to it.

Creating a Pod with a PersistentVolumeClaim object

In this section, I will create a Pod that mounts a PersistentVolume within a minikube cluster. It is going to be a hostPath volume again, but this time, it will not be bound to the life cycle of the Pod: since it will be managed as a real PersistentVolume object, the hostPath volume will have a life cycle that is independent of the Pod.

The very first thing to do is to create the PersistentVolume object, which will be a hostPath type. Here is the YAML file to do that. Please note that I created this PersistentVolume object with some arbitrary labels in the metadata section, so that it will be easier to fetch it from the PersistentVolumeClaim object later:

# ~/pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-hostpath-pv
  labels:
    type: hostpath
    env: prod
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/Users/me/test"

We can now create it and list the PersistentVolume entries available in our cluster, where we should observe that this one exists. Please note that the pv alias works, too:

$ kubectl create -f pv.yaml

persistentvolume/my-hostpath-pv created

$ kubectl get pv

NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
my-hostpath-pv   1Gi        RWO            Retain           Available                                    49s

We can see that the PersistentVolume was successfully created.

Now, we need to create two things in order to mount the PersistentVolume object:

  • A PersistentVolumeClaim object that targets this specific PersistentVolume object
  • A Pod that uses the PersistentVolumeClaim object

Let's proceed, in order, with the creation of the PersistentVolumeClaim object:

# ~/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-hostpath-pvc
spec:
  accessModes:        # at least one access mode is required on a PVC
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  selector:
    matchLabels:
      type: hostpath
      env: prod

The important aspect of this PersistentVolumeClaim object is that it fetches the proper volume by its labels, using the selector key. Let's create it and check that it was successfully created in the cluster. Please note that the pvc alias also works here:

$ kubectl create -f pvc.yaml

persistentvolumeclaim/my-hostpath-pvc created

$ kubectl get pvc

NAME              STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE

my-hostpath-pvc   Pending                                      standard       53s

Now that the PersistentVolume object and the PersistentVolumeClaim object exist, I can create a Pod that will mount the PV through the PVC. Note that the claim can remain stuck in Pending if it has been assigned the default StorageClass (as in the preceding output) instead of matching our manually created PV; if that happens, setting storageClassName: "" explicitly in the claim tells Kubernetes to bind it to a pre-provisioned PV rather than wait for dynamic provisioning.

Let's create an NGINX Pod that will do the job:

# ~/Pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
      - mountPath: "/var/www/html"
        name: mypersistentvolume
  volumes:
    - name: mypersistentvolume
      persistentVolumeClaim:
        claimName: my-hostpath-pvc

As you can see, in the volumes section, the PersistentVolumeClaim object is referenced by its name, and that volume is then mounted into the container through the volumeMounts section. Note that the PVC must live in the same namespace as the Pod that mounts it. This is because PVCs are namespace-scoped resources, whereas PVs are not. There are no labels and selectors for this one; to bind a PVC to a Pod, you simply need to use the PVC name.

That way, the Pod is attached to the PersistentVolumeClaim object, which finds the corresponding PersistentVolume object. This, in the end, makes the host path available and mounted on my NGINX Pod.

Now we can create the three objects in the following order:

  1. The PersistentVolume object
  2. The PersistentVolumeClaim object
  3. The Pod object

Note that before you go any further, you need to make sure that the /Users/me/test directory exists on your host machine or your worker Node, because this is the path specified in the PV definition.
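If you are running on minikube, remember that the worker Node is the minikube virtual machine or container, so the directory has to exist there (with some drivers, paths under /Users are already shared from your macOS host). Here is a minimal sketch of creating it directly on the Node; the path simply mirrors the one used in the PV definition:

$ minikube ssh -- sudo mkdir -p /Users/me/test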

You can achieve that using the following commands if you have not already created these resources in your cluster:

$ kubectl create -f pvc.yaml

persistentvolumeclaim/my-hostpath-pvc created

$ kubectl create -f Pod.yaml

pod/nginx created

Now, let's check that everything is okay by looking at the status of our PersistentVolumeClaim object with the kubectl get pvc command: once the Pod is running and bound to it, the claim should show a Bound status.

Everything seems to be okay! We have just demonstrated a typical workflow. No matter what kind of storage you need, it's always going to be the same:

  1. First, the storage must be provisioned.
  2. Second, you create a PV entry in order to have a pointer to it in Kubernetes.
  3. Third, you provision a PVC capable of fetching this PV.
  4. Fourth, you mount the PVC (not the PV directly) as a volume mount to a Pod.

And that's it!
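If you want to convince yourself that the volume really is mounted, a quick sanity check is to write a file through the container and read it back. This is just an illustrative check rather than part of the official workflow; the commands assume the nginx Pod created above is running:

$ kubectl exec nginx -- sh -c 'echo hello > /var/www/html/index.html'

$ kubectl exec nginx -- cat /var/www/html/index.html

hello

The file should also appear under /Users/me/test on the worker Node, since that is the hostPath backing the PV.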

So far, we have learned what PersistentVolume and PersistentVolumeClaim objects are and how to use them to mount persistent storage on your Pods.

Next, we must continue our exploration of the PersistentVolume and PersistentVolumeClaim mechanics by explaining the life cycle of these two objects. Because they are independent of the Pods, their life cycles have some dedicated behaviors that you need to be aware of.

Understanding the life cycle of a PersistentVolume object in Kubernetes

PersistentVolume objects are good if you want to keep the state of your app without being constrained by the life cycle of the Pods or containers that are running them.

However, since PersistentVolume objects get their very own life cycle, they have some very specific mechanics that you need to be aware of when you're using them. We'll take a closer look at them next.

Understanding that PersistentVolume objects are not bound to namespaces

The first thing to be aware of when you're using PersistentVolume objects is that they are not namespaced resources, but PersistentVolumeClaims objects are.

That's something very important to know, because when a Pod uses a PersistentVolume object, it only references the PersistentVolumeClaim object. The one requirement of the PersistentVolumeClaim is that it must be created in the same namespace as the Pod that is using it.

On the other hand, PersistentVolume objects are not constrained by namespaces, unlike PersistentVolumeClaim objects: they are created cluster-wide. So, do bear that in mind: PersistentVolumeClaim objects need to be created in the same namespace as the Pods using them, but they are able to fetch PersistentVolume resources that are not in any namespace at all.

To see this for yourself, I invite you to create a PersistentVolume object called new-pv-hostpath using the following YAML file:

# ~/new-pv-hostpath.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: new-pv-hostpath
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: "/home/user/mydirectory"

Once the file has been created, we can apply it against our cluster using the kubectl create -f new-pv-hostpath.yaml command:

$ kubectl create -f new-pv-hostpath.yaml

persistentvolume/new-pv-hostpath created

Then, we can run the kubectl get pv/new-pv-hostpath -o yaml | grep -i "namespace" command, which outputs nothing: there is no namespace field in the object's metadata. We can double-check this by listing the resource kinds that are not namespaced (the exact columns may vary slightly depending on your kubectl version):

$ kubectl api-resources --namespaced=false | grep persistentvolumes

persistentvolumes    pv    v1    false    PersistentVolume

As you can see, the persistentvolumes resource kind appears in the output of the command, which means it does not live in any namespace!

Now, let's examine another important aspect of PersistentVolume, known as the reclaim policy. This is something that is going to be important when you want to unmount a PVC from a running Pod.

Reclaiming a PersistentVolume object

When it comes to PersistentVolume, there is a very important option that you need to understand, which is the reclaim policy. But what does this option do?

This option will tell Kubernetes what treatment it should give to your PersistentVolume object when you delete the corresponding PersistentVolumeClaim object that was attaching it to the Pods.

Indeed, deleting a PersistentVolumeClaim object consists of deleting the link between the Pod(s) and your PersistentVolume object: it is as if you unmounted the volume, and the volume then becomes available again for another application to use. However, in some cases, you don't want that behavior; instead, you want your PersistentVolume object to be automatically removed when its corresponding PersistentVolumeClaim object is deleted. That's exactly what the reclaim policy lets you configure.

The reclaim policy can be set to one of three values, as follows:

  • Delete
  • Retain
  • Recycle

Let's explain these three reclaim policies.

The Delete policy is the simplest of the three. When you set your reclaim policy to Delete, the underlying storage is wiped out and the PersistentVolume entry is removed from the Kubernetes cluster when the corresponding PersistentVolumeClaim object is deleted. That's the behavior you want for sensitive data that should not be reused by any other application. Bear in mind that this deletion is permanent, so you might want to build a backup strategy with your underlying storage provider if you need to recover anything.

The Retain policy is the second one and is the opposite of the Delete policy. If you set this reclaim policy, the PersistentVolume object won't be deleted when you delete its corresponding PersistentVolumeClaim object. Instead, the PersistentVolume object enters the Released status, which means it is still present in the cluster, and all of its data can be manually retrieved by the cluster administrator.

The third policy is the Recycle reclaim policy, which is a kind of combination of the previous two. First, the volume is wiped of all its data, as a basic rm -rf volume/* would do; however, the volume itself remains available in the cluster, so you can mount it again on another application. Note that the Recycle policy is deprecated in favor of dynamic provisioning, so in practice you will mostly deal with Delete and Retain.

The reclaim policy can be set in your cluster directly in the YAML definition file at the PersistentVolume level.
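Here is a minimal sketch of where that setting lives in a PV manifest; the field is called persistentVolumeReclaimPolicy, and everything else in the snippet is only illustrative:

# Reclaim policy declared at creation time (illustrative hostPath example)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-with-retain
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # Delete, Retain, or Recycle
  hostPath:
    path: "/tmp/pv-with-retain"           # hypothetical path on the worker Node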

Updating a reclaim policy

The good news with a reclaim policy is that you can change it after the PersistentVolume object has been created; it is a mutable setting. To do that, you can simply list the PVs in your cluster and then issue a kubectl patch command to update the PV of your choice:

$ kubectl get pv

NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
my-hostpath-pv   1Gi        RWO            Retain           Available                                    24m

As you can see, this PV has a Retain reclaim policy. I'll now update it to Delete using the kubectl patch command against the my-hostpath-pv PersistentVolume:

$ kubectl patch pv/my-hostpath-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'

persistentvolume/my-hostpath-pv patched

$ kubectl get pv

NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
my-hostpath-pv   1Gi        RWO            Delete           Available                                    3h13m

We can observe that the reclaim policy was updated from Retain to Delete!
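If you only want to read the reclaim policy field rather than scan the whole table, a jsonpath query works too; after the preceding patch, it should simply print Delete:

$ kubectl get pv my-hostpath-pv -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'

Delete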

Now, let's discuss the different statuses that PVs and PVCs can have.

Understanding PersistentVolume and PersistentVolumeClaims statuses

Just like Pods can be in different states, such as Pending, ContainerCreating, Running, and more, PersistentVolume and PersistentVolumeClaim objects can also hold different states. You can identify their states by issuing the kubectl get pv and kubectl get pvc commands.

PersistentVolume objects have the following states that you need to be aware of:

  • Available
  • Bound
  • Released (which we encountered earlier when discussing the Retain reclaim policy)

On their side, PersistentVolumeClaim objects can be Pending or Bound, and they hold one additional status, which is the Terminating one.

Let's explain these different states in more detail.

The Available status indicates that the PersistentVolume object is created and ready to be mounted by a PersistentVolumeClaim object. There's nothing wrong with it, and the PV is just ready to be used.

The Bound status indicates that the PersistentVolume object is currently mounted to one or several Pods. The PersistentVolume is bound to a PersistentVolumeClaim object. Essentially, it indicates that the volume is currently in use. When this status is applied to a PersistentVolumeClaim object, this indicates that the PVC is currently in use: that is, a Pod is using it and has access to a PV through it.

The Terminating status applies to a PersistentVolumeClaim object. This is the status the PVC enters after you issue a kubectl delete pvc command. It is during this phase that the PV the PVC was bound to is reclaimed: the PV is wiped out and removed if its reclaim policy is set to Delete, and kept around in the Released state if the policy is set to Retain.

We have now covered all the basics of PersistentVolume and PersistentVolumeClaim, which should be enough to start using persistent storage in Kubernetes. However, there's still something important to know about this topic, and it is called dynamic provisioning. This is a very impressive aspect of Kubernetes: it can communicate with cloud provider APIs to create persistent storage in the cloud and make that storage available in the cluster by dynamically creating PV objects. In the next section, we will compare static and dynamic provisioning.

Static and dynamic PersistentVolume provisioning

So far, we've only provisioned PersistentVolume by doing static provisioning. Now we're going to discover dynamic PersistentVolume provisioning, which enables PersistentVolume provisioning directly from the Kubernetes cluster.

Static versus dynamic provisioning

So far, when using static provisioning, you have learned that you have to follow this workflow:

  1. You create the piece of storage against the cloud provider or the backend technology.
  2. Then, you create the PersistentVolume object to serve as a Kubernetes pointer to this actual storage.
  3. Following this, you create a Pod and a PVC to bind the PV to the Pod.

That is called static provisioning. It is static because you have to create the piece of storage before creating the PV and the PVC in Kubernetes. It works well; however, at scale, it becomes more and more difficult to manage, especially if you are dealing with dozens of PVs and PVCs. Let's say you want to create an Amazon EBS volume and mount it as a PersistentVolume object; with static provisioning, you would do it like this:

  1. Authenticate against the AWS console.
  2. Create an EBS volume.
  3. Copy/paste its unique ID to a PersistentVolume YAML definition file.
  4. Create the PV using your YAML file.
  5. Create a PVC to fetch this PV.
  6. Mount the PVC to the Pod object.

Again, it should work, but it would become extremely time-consuming to do at scale, with possibly dozens and dozens of PVs and PVCs.

That's why Kubernetes developers decided that it would be better if Kubernetes was capable of provisioning the piece of actual storage on your behalf along with the PersistentVolume object to serve as a pointer to it. This is known as dynamic provisioning.

Introducing dynamic provisioning

When using dynamic provisioning, you configure your Kubernetes cluster so that it can authenticate against your AWS account on your behalf. Then, when you create a PersistentVolumeClaim, Kubernetes provisions the EBS volume for you, automatically creates the corresponding PersistentVolume object, and binds it to your Pod.

That way, you can save a huge amount of time by getting things automated. Dynamic provisioning is so useful because Kubernetes supports a wide range of storage technologies. We already introduced a few of them earlier in this chapter, when we mentioned NFS, Google PD, Amazon EBS volumes, and more.

But how does Kubernetes achieve this versatility? Well, the answer is that it makes use of a third resource kind, which we're going to discover in this chapter: the StorageClass object.

Introducing StorageClasses

StorageClass is another resource kind exposed by kube-apiserver. This resource kind is the one that grants Kubernetes the ability to deal with several underlying technologies transparently.

You can access and list the storageclass resources created within your Kubernetes cluster by using kubectl. Here is the command to list the storage classes:

$ kubectl get storageclass

NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE

standard (default)   k8s.io/minikube-hostpath   Delete          Immediate           false                  24d

Additionally, you can use the plural form of storageclasses along with the sc alias. The following three commands are essentially the same:

$ kubectl get storageclass

$ kubectl get storageclasses

$ kubectl get sc

Note that I haven't included the output of these commands for simplicity, but it is essentially the same for all three. There are two fields within the command output that are important to us:

  • NAME: This is the name and the unique identifier of the storageclass object.
  • PROVISIONER: This is the name of the underlying storage plugin: basically, a piece of code the Kubernetes cluster uses to interact with the underlying storage technology.

    Important Note

    Note that you can create multiple StorageClass objects that use the same provisioner.

As I'm currently using a minikube cluster, I have a storageclass resource called standard that is using the k8s.io/minikube-hostpath provisioner.

This provisioner deals with my host filesystem to automatically provision hostPath volumes for my Pods, but the same mechanism applies to Amazon EBS volumes or Google PDs.

Here is the same command for a Kubernetes cluster based on Google GKE (the default class is typically backed by a GCE PD provisioner):

$ kubectl get sc

And here is the same command for a Kubernetes cluster based on Amazon EKS (the default class is typically backed by an EBS provisioner):

$ kubectl get sc

As you might have gathered, by default, we get different storage classes because all of these clusters need to access different kinds of storage. In GKE, Google built a storage class with a provisioner that is capable of interacting with the Google PD API, which is a pure Google Cloud feature. In contrast, in AWS, we get a storageclass object with a provisioner that is capable of dealing with the EBS volume APIs. These provisioners are just libraries that interact with the APIs of these different cloud providers.

The storageclass objects are the reason why Kubernetes is capable of dealing with so many different storage technologies. From a Pod's perspective, no matter whether it is an EBS volume, an NFS drive, or a GCE PD, the Pod will only see a PersistentVolume object. All of the underlying logic dealing with the actual storage technology is implemented by the provisioner that the storageclass object uses.

The good news is that you can add as many storageclass objects with their provisioner as you want to your Kubernetes cluster in a plugin-like fashion. As of writing, the following is a list of PersistentVolume types that are supported in Kubernetes:

  • awsElasticBlockStore: Amazon EBS volumes
  • gcePersistentDisk: Google Cloud PD
  • azureDisk: Azure Disk
  • azureFile: Azure File
  • cephfs: Ceph-based filesystems
  • csi: Container storage interface
  • glusterfs: GlusterFS-based filesystems
  • nfs: Regular network file storage

By the way, nothing is preventing you from expanding your cluster by adding more storageclasses to it: you'll simply add the ability to deal with additional storage technologies from your cluster. For example, I could add an Amazon EBS storageclass object to my minikube cluster. But while it is possible, it would be completely useless: since my minikube setup is running on my local machine and not on an EC2 instance, I wouldn't be able to attach an EBS volume to it.
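For reference, here is a minimal sketch of what such a StorageClass definition could look like, assuming the in-tree AWS EBS provisioner; the class name fast-ebs and the gp2 parameter are purely illustrative:

# ~/storageclass-ebs.yaml (hypothetical example)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ebs                      # hypothetical class name
provisioner: kubernetes.io/aws-ebs    # in-tree EBS provisioner
parameters:
  type: gp2                           # EBS volume type to provision
reclaimPolicy: Delete                 # applied to dynamically created PVs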

Understanding the role of PersistentVolumeClaim for dynamic storage provisioning

When using dynamic storage provisioning, the PersistentVolumeClaim object takes on an entirely new role. Since you no longer create the PersistentVolume object yourself in this use case, the only object left for you to manage is the PersistentVolumeClaim one.

Let's demonstrate this by creating an NGINX Pod that mounts a dynamically provisioned hostPath volume. In this example, the administrator won't have to provision a PersistentVolume object at all, because the PersistentVolumeClaim object and the StorageClass object will create and provision it together.

Let's start by creating a new namespace, called dynamicstorage, where we will run our examples:

$ kubectl create ns dynamicstorage

namespace/dynamicstorage created

Now, let's run a kubectl get sc command to check that we have a storage class capable of provisioning hostPath volumes in our cluster.

For this specific Kubernetes setup (minikube), we don't have to do anything to get such a storageclass object, as it is created by default at cluster installation. However, this might not be the case depending on your Kubernetes distribution.

Bear that in mind because it is very important: clusters that have been set up on GKE might have default storage classes capable of dealing with Google's storage offerings, whereas an AWS-based cluster might have storageclasses that communicate with Amazon's storage offerings, and so on. With minikube, we have at least one default storageclass object capable of provisioning hostPath-based PersistentVolume objects. If you understand that, you should understand that the output of the kubectl get sc command will differ depending on where your cluster has been set up:

$ kubectl get sc

NAME                 PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE

standard (default)   k8s.io/minikube-hostpath   Delete          Immediate           false                  11h

As you can see, we do have a storage class called standard on our cluster that is capable of dealing with hostPath.

Important Note

Some complex clusters spanning multiple clouds and/or on-premises environments might be provisioned with a lot of different storageclass objects in order to communicate with a lot of different storage technologies. Bear in mind that Kubernetes is not tied to any cloud provider and, therefore, does not force or limit you in your choice of backing storage solutions.
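Since a PVC that doesn't specify storageClassName falls back to the default class, it is useful to know how a class gets marked as the default: it is simply an annotation on the StorageClass object. Here is a sketch using the minikube standard class as an example (on minikube it is already the default, so this is a no-op there):

$ kubectl patch storageclass standard -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

storageclass.storage.k8s.io/standard patched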

Now, we will create a PersistentVolumeClaim object that will dynamically provision a hostPath volume. Here is the YAML file to create the PVC. Please note that storageClassName is set to standard:

# ~/pvc-dynamic.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-dynamic-hostpath-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard # VERY IMPORTANT !
  resources:
    requests:
      storage: 1Gi

Following this, we can create it in the proper namespace:

$ kubectl create -f pvc-dynamic.yaml -n dynamicstorage

persistentvolumeclaim/my-dynamic-hostpath-pvc created

Now that this PVC has been created, we can add a new Pod that will mount this PersistentVolumeClaim object. As soon as the claim is created (or, depending on the volume binding mode, as soon as a Pod first uses it), the provisioner creates a PersistentVolume object and the claim is bound to it.

That's how dynamic provisioning works, and the behavior is the same whether you are on-premises or in the cloud. Here is the YAML definition file of a Pod that mounts the PersistentVolumeClaim object created earlier:

# ~/Pod-dynamic.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-dynamic-storage
spec:
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
      - mountPath: "/var/www/html"
        name: mypersistentvolume
  volumes:
    - name: mypersistentvolume
      persistentVolumeClaim:
        claimName: my-dynamic-hostpath-pvc

Now let's create it in the correct namespace:

$ kubectl create -f Pod-dynamic.yaml -n dynamicstorage

pod/nginx-dynamic-storage created

Next, let's list the PersistentVolume objects. If everything worked, we should see a brand new PersistentVolume object that has been dynamically created and is in the Bound state:

$ kubectl get pv

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
pvc-56b79a65-86f6-4db5-b800-2ec415156097   1Gi        RWO            Delete           Bound    dynamicstorage/my-dynamic-hostpath-pvc   standard                7m19s

Everything is OK! We're finally done with dynamic provisioning! Please note that, by default, the reclaim policy of a dynamically provisioned PV is set to Delete (it is inherited from the StorageClass), so the PV is removed when the PVC that created it is removed. Don't hesitate to change the reclaim policy if you need to retain sensitive data.
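To see that behavior in action, and to clean up the example, you can delete the Pod and then the claim, and check that the dynamically created PV disappears; the resource names below are the ones used in this example:

$ kubectl delete -n dynamicstorage pod/nginx-dynamic-storage

pod "nginx-dynamic-storage" deleted

$ kubectl delete -n dynamicstorage pvc/my-dynamic-hostpath-pvc

persistentvolumeclaim "my-dynamic-hostpath-pvc" deleted

$ kubectl get pv

After a few seconds, the pvc-56b79a65-... entry should no longer be listed; any PVs you created manually earlier in the chapter will still be there, since their reclaim policies are handled separately.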

Summary

We have arrived at the end of this chapter, which taught you how to manage persistent storage in Kubernetes. You discovered that PersistentVolume is a resource kind that acts as a pointer to an underlying storage technology, such as hostPath and NFS, along with cloud-based solutions such as Amazon EBS volumes and Google PDs.

Additionally, you discovered that you cannot use your PersistentVolume objects without PersistentVolumeClaim, which is the object that fetches and mounts PersistentVolume objects to your Pods. You learned that PersistentVolume can hold different reclaim policies, which make it possible to delete, recycle, or retain them when their corresponding PersistentVolumeClaim object gets removed.

Finally, we discovered what dynamic provisioning is and how it can help us. Bear in mind that you need to be aware of this feature because if you create and retain too many volumes, it can have a negative impact on your cloud bill at the end of the month.

We're now done with the basics of Kubernetes, and this chapter is also the end of this section. In the next section, you're going to discover Kubernetes controllers, which are objects designed to automate certain tasks in Kubernetes, such as maintaining a number of replicas of your Pods, either using the Deployment resource kind or the StatefulSet resource kind. There are still a lot of things to learn!
