7 Pod storage and the CSI

This chapter covers

  • Introducing the virtual filesystem (VFS)
  • Exploring Kubernetes in-tree and out-of-tree storage providers
  • Running dynamic storage in a kind cluster with multiple containers
  • Defining the Container Storage Interface (CSI)

Storage is complex, and this book won’t cover all the storage types available to the modern app developer. Instead, we’ll start with a concrete problem to solve: our Pod needs to store a file. The file needs to persist between container restarts, and it needs to be schedulable to new nodes in our cluster. In this case, the default baked-in storage volumes that we’ve already covered in this book won’t “cut the mustard”:

  • Our Pod can’t rely on hostPath because the node itself may not have a unique writeable directory on its host disk.

  • Our Pod also can’t rely on emptyDir because it is a database, and databases can’t afford to lose information stored on an ephemeral volume.

  • Our Pod might be able to use Secrets to retain the certificate or password credentials it needs to access services like databases, but a Secret is generally not considered application storage for workloads running on Kubernetes.

  • Our Pod can write data to the top layer of its container filesystem. This is generally slow and not recommended for high-volume write traffic. And, in any case, it simply won’t work here: the data disappears as soon as the Pod is restarted!

Thus, we’ve stumbled upon an entirely new dimension of Kubernetes storage for our Pod: fulfilling the needs of the application developer. Kubernetes applications, like regular cloud applications, often need to be able to mount EBS volumes, NFS shares, or data from S3 buckets inside containers and read from or write to these data sources. To solve this application storage problem, we’ll need a cloud-friendly data model and API for storage. Kubernetes represents this data model using the concepts of PersistentVolume (PV), PersistentVolumeClaim (PVC), and StorageClass:

  • PVs give administrators a way to manage disk volumes in a Kubernetes environment.

  • PVCs define a claim to these volumes that can be requested by an application (by a Pod) and fulfilled by the Kubernetes API under the hood.

  • StorageClass gives application developers a way to request a volume without needing to know exactly which type of PersistentVolume will back it under the hood. (A minimal, hand-made PV and a matching PVC appear right after this list.)
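
To make the PV-to-PVC relationship concrete before we let Kubernetes automate it, here is a minimal, hand-made sketch (the names and the hostPath location are ours, chosen for illustration; later in this chapter a provisioner will create the PV for us instead):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 100Mi
  accessModes:
  - ReadWriteOnce
  storageClassName: manual      # No StorageClass object is needed for static binding; the claim just asks for the same class name.
  hostPath:
    path: /tmp/manual-pv        # Only suitable for single-node experiments
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual      # Kubernetes binds this claim to a PV with a matching class and enough capacity.
  resources:
    requests:
      storage: 100Mi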

StorageClasses allow applications to request volumes or storage types that fulfill different end-user requirements in a declarative way. This lets you design StorageClasses for your data center that fulfill various needs (a sketch of one such class follows this list), such as

  • Complex data SLAs (what to keep, how long to keep it, and what not to keep)

  • Performance requirements (batch-processing applications versus low-latency applications)

  • Security and multi-tenancy semantics (for users to access particular volumes)
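
As a sketch of what such a class might look like (the class name, the provisioner, and the parameters here are hypothetical; substitute your own CSI driver’s registered name and its documented parameters), a low-latency tier could be declared like this:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: low-latency                 # Hypothetical name for a latency-sensitive tier
provisioner: csi.example.com        # Hypothetical CSI driver name; use your vendor's
parameters:
  tier: fast                        # Vendor-specific knobs (disk type, IOPS, replication) go here
reclaimPolicy: Retain               # Keep the underlying volume after the claim is deleted
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

A Pod never references this object directly; its PVC simply names the class (or relies on the default), and the provisioner does the rest.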

Keep in mind that many containers (for example, a CFSSL server for managing application certificates) might not need a lot of storage, but they will need some storage in case they restart and need to reload basic caching or certificate data, for example. In the next chapter, we’ll dig further into the high-level concepts of how you might manage StorageClasses. If you’re new to Kubernetes, you might be wondering if Pods can maintain any state without a volume.

Do Pods retain state?

In short, the answer is no. Don’t forget that a Pod is an ephemeral construct in almost all cases. In some cases (for example, with a StatefulSet) some aspects of a Pod (such as the IP address or, potentially, a locally mounted host volume directory) might persist between restarts.

If a Pod dies for any reason, it will be recreated by a process in the Kubernetes controller manager (KCM). When new Pods are created, it is the Kubernetes scheduler’s job to make sure that a given Pod lands on a node capable of running it. Hence, the ephemeral nature of Pod storage that allows this real-time decision making is integral to the flexibility of managing large fleets of applications.

7.1 A quick detour: The virtual filesystem (VFS) in Linux

Before going head-on into the abstractions that Kubernetes offers for Pod storage, it’s worth noting that the OS itself also provides these abstractions to programs. In fact, the filesystem itself is an abstraction for a complicated schematic that connects applications to a simple set of APIs that we’ve seen before. You likely know this already, but recall that accessing a file is like accessing any other API. A file in a Linux OS supports a variety of obvious and basic commands (as well as some more opaque ones not listed here); you can even watch these calls happen with strace, as shown after the list:

  • read()—Reads a few bytes from a file that is open

  • write()—Writes a few bytes to a file that is open

  • open()—Creates and/or opens a file so that reads and writes can take place

  • stat()—Returns some basic information about a file

  • chmod()—Changes which users and groups can read, write, or execute a file
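
You can see these calls being made on any Linux machine with strace. The transcript below is abbreviated and illustrative (the hostname, buffer sizes, and the library-loading calls that strace would also print will differ on your system):

$ strace -e trace=openat,read,write,close cat /etc/hostname
openat(AT_FDCWD, "/etc/hostname", O_RDONLY) = 3
read(3, "kind-control-plane\n", 131072)    = 19
write(1, "kind-control-plane\n", 19)       = 19
close(3)                                   = 0
+++ exited with 0 +++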

All of these operations are called against what is known as the virtual filesystem (VFS), which ultimately is a wrapper around your system’s concrete filesystems and block-device drivers in most cases. In the cloud, and in the case of FUSE (Filesystem in Userspace), the Linux VFS is just a wrapper to what is ultimately a network call. Even if you are writing data to a disk outside of your Linux machine, you are still accessing that data via the Linux kernel through the VFS. The only difference is that, because you are writing to a remote disk, the VFS uses its NFS client, FUSE client, or whatever other filesystem client it needs, based on your OS, to send this write over the wire. This is depicted in figure 7.1, where all of the various container write operations are actually talking through the VFS API:

  • In the case of Docker or CRI storage, the VFS sends filesystem operations to a device mapper or OverlayFS, which ultimately sends traffic to local devices through your system’s block-device drivers.

  • In the case of Kubernetes infrastructure storage, the VFS sends filesystem operations to locally attached disks on your node.

  • In the case of applications, the VFS often sends writes over the network, especially in “real” Kubernetes clusters running in the cloud or in a data center with many computers, because those clusters typically don’t use the local volume types.

What about Windows?

In Windows nodes, the kubelet mounts and provides storage to containers in a similar way as does Linux. Windows kubelets typically run the CSI Proxy (https://github.com/kubernetes-csi/csi-proxy) that makes low-level calls to the Windows OS, which mounts and unmounts volumes when the kubelet instructs it to do so. The same concepts around filesystem abstraction exist in the Windows ecosystem (https://en.wikipedia.org/wiki/Installable_File_System).

In any case, you don’t need to understand the Linux storage API in order to mount PersistentVolumes in Kubernetes. It is, however, helpful to understand the basis for filesystems when creating Kubernetes solutions because, ultimately, your Pods will interact with these low-level APIs. Now, let’s get back to our Kubernetes-centric view of Pod storage.

7.2 Three types of storage requirements for Kubernetes

The term storage is overloaded. Before we go down the rabbit hole, let’s distinguish the types of storage that typically cause problems in Kubernetes environments (the node-level paths sketched after this list show where each one lives):

  • Docker/containerd/CRI storage—The copy-on-write filesystem that runs your containers. Containers require special filesystems on their resident runtimes because they need a writable top layer (this is why, for example, you can run rm -rf /tmp in a container without actually deleting anything from your host). Typically, the Kubernetes environment uses a storage driver such as btrfs, overlay, or overlay2.

  • Kubernetes infrastructure storage—The hostPath or Secret volumes that are used on individual kubelets for local information sharing (for example, as a home for a secret that is going to be mounted in a Pod or a directory from where a storage or networking plugin is called).

  • Application storage—The storage volumes that Pods use in a Kubernetes cluster. When Pods need to write data to disk, they need to mount a storage volume, and this is done in a Pod specification. Common backing technologies are OpenEBS, NFS, GCE persistent disks, EC2 EBS volumes, vSphere virtual disks, and so on.
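
To make these three buckets concrete, here is roughly where each one lives on a node. This is a sketch that assumes containerd with the overlayfs snapshotter and the default kubelet root directory; <pod-uid> is a placeholder for a real Pod UID:

# 1. Container runtime storage: the copy-on-write snapshot layers
$ sudo ls /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/

# 2. Kubernetes infrastructure storage: Secret, ConfigMap, and emptyDir volumes managed by the kubelet
$ sudo ls /var/lib/kubelet/pods/<pod-uid>/volumes/

# 3. Application storage: externally provisioned volumes mounted through the CSI
$ sudo ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/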

In figure 7.1, which is extended by figure 7.2, we visually depict how all three types of storage are fundamental steps in starting a Pod. Previously, we only looked at the CNI-related Pod startup sequence steps. As a reminder, there are several checks done by the scheduler before a Pod starts to confirm storage is ready. Then, before a Pod is started, the kubelet and the CSI provider mount external application volumes on a node for the Pod to use. A Pod that is running might write data to its own OverlayFS, and this is completely ephemeral. For example, it might have a /tmp directory that it uses for scratch space. Finally, once a Pod is running, it reads local volumes and might write other remote volumes.

Figure 7.1 The three types of storage in the startup of a Pod

Now, the first figure ends with the CSIDriver, but there are many other layers to the sequence diagram that it depicts. In figure 7.2, we can see that the CSIDriver, containerd, layered filesystem, and CSI volume itself are all targeted downstream from the processes of the Pod. Specifically, when the kubelet starts a process, it sends a message to containerd, which then creates a new writeable layer in the filesystem. Once the containerized process starts, it needs to read secrets from files that are mounted to it. Thus, there are many different types of storage calls made in a single Pod. In a typical production scenario, each has its own semantics and purpose in the life cycle of an app.

The CSI volume mounting step is one of the final events occurring before a Pod starts. To understand this step, recall the quick detour we took earlier into how Linux organizes its filesystems.

Figure 7.2 The three types of storage in the startup of a Pod, part 2

7.3 Let’s create a PVC in our kind cluster

Enough with the theory; let’s give some application storage to a simple NGINX Pod. We defined PVs, PVCs, and StorageClasses earlier. Now, let’s see how they are used to provide a real Pod with a scratch directory to store some files:

  • The PV is created by a dynamic storage provisioner that runs on our kind cluster. This is a container that provides Pods with storage by fulfilling PVCs on demand.

  • The PVC will not be available until the PersistentVolume is ready because the scheduler needs to ensure that it can mount storage into the Pod’s namespace before starting it.

  • The kubelet will not start the Pod until the VFS has successfully mounted the PVC into the Pod’s filesystem namespace as a writable storage location.

Luckily, our kind cluster comes out of the box with a storage provider. Let’s see what happens when we ask for a Pod with a new PVC, one that hasn’t been created yet and that has no associated volume already in our cluster. We can check which storage providers are available in a Kubernetes cluster by running the kubectl get sc command as follows:

$ kubectl get sc
NAME                   PROVISIONER             RECLAIMPOLICY
standard (default)     rancher.io/local-path   Delete
 
VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION    AGE
WaitForFirstConsumer   false                   9d

In order to demonstrate how Pods share data between containers, as well as mount multiple storage points with different semantics, this time around we’ll run a Pod with two containers and two volumes. In summary,

  • The containers in a Pod can share information with each other.

  • Persistent storage can be created on the fly in kind by its dynamic hostPath provisioner.

  • Any container can have multiple volume mounts in a Pod.

The following manifest (simple.yaml) defines the PVC and the two-container Pod:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100k              # The amount of storage requested; the provisioner determines whether it can be fulfilled.
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: busybox               # The first container shares a folder with the second container.
    name: busybox
    volumeMounts:
      - mountPath: /shared
        name: shared
  - image: nginx                 # The second container mounts a dynamic storage volume in addition to the shared folder.
    imagePullPolicy: Always
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    volumeMounts:
      - mountPath: /var/www
        name: dynamic1           # Mounts the volume that the provisioner creates for our claim
      - mountPath: /shared
        name: shared
  volumes:
  - name: dynamic1
    persistentVolumeClaim:
      claimName: dynamic1        # Because the volumes stanza sits outside the container stanzas, multiple containers can mount the same data.
  - name: shared
    emptyDir: {}                 # The shared emptyDir volume is accessible from both containers if needed.

Now, let’s create these objects and watch the Pod come up:
$ kubectl create -f simple.yaml
pod/nginx created
 
$ kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Pending   0          3s         
 
$ kubectl get pods
NAME    READY   STATUS              RESTARTS   AGE
nginx   0/1     ContainerCreating   0          5s
 
$ kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          13s        

In the transcript, the Pod starts out Pending because the volume it needs doesn’t exist yet. Once the volume exists (via the PVC) and can be mounted, the kubelet starts the Pod, and it moves to Running.

Now, we can create a file from one container and read it from the other. For example, from the nginx container, run a simple command such as echo ASDF > /shared/a. Because the emptyDir volume named shared is mounted at /shared/ in both containers, the busybox container sees the file immediately:

$ kubectl exec -i -t nginx -- /bin/sh
Defaulting container name to busybox.
Use 'kubectl describe pod/nginx -n default' to see all of the containers in this pod.
/ # cat /shared/a
ASDF

We now have a Pod that has two volumes: one ephemeral and one permanent. How did this happen? If we look at the logs that come with our kind cluster for the local-path-provisioner, it becomes obvious:

$ kubectl logs local-path-provisioner-77..f-5fg2w
    -n local-path-storage
controller.go:1027] provision "default/dynamic1" class "standard":
    volume "pvc-ddf3ff41-5696-4a9c-baae-c12f21406022" provisioned
controller.go:1041] provision "default/dynamic1" class "standard":
    trying to save persistentvolume
    "pvc-ddf3ff41-5696-4a9c-baae-c12f21406022"
controller.go:1048] provision "default/dynamic1" class "standard":
    persistentvolume "pvc-ddf3ff41-5696-4a9c-baae-c12f21406022" saved
controller.go:1089] provision "default/dynamic1" class "standard": succeeded
event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim",
    Namespace:"default", Name:"dynamic1",
    UID:"ddf3ff41-5696-4a9c-baae-c12f21406022", APIVersion:"v1",
    ResourceVersion:"11962", FieldPath:""}): type: 'Normal'
    reason: 'ProvisioningSucceeded'
    Successfully provisioned volume pvc-ddf3ff41-5696-4a9c-baae-c12f21406022

The local-path-provisioner container runs as a controller in our cluster at all times. When it sees that we want a volume for our dynamic1 claim, it creates the volume for us. Once this succeeds, the volume is bound to the PVC by Kubernetes itself: in the Kubernetes core, if a volume exists that satisfies the needs of a PVC, a binding event occurs.

At this point, the Kubernetes scheduler confirms that this particular PVC is now deployable on a node, and if this check passes, the Pod moves from the Pending state to the ContainerCreating state, as we saw earlier. As you know by now, the ContainerCreating state is simply the state wherein cgroups and mounts are set up by the kubelet for a Pod before it enters the Running state. The fact that this volume was made for us (we did not manually make a PersistentVolume) is an example of dynamic storage in a cluster. We can take a look at the dynamically generated volumes like so:

$ kubectl get pv
NAME                                       CAPACITY   ACCESS
pvc-74879bc4-e2da-4436-9f2b-5568bae4351a   100k       RWO
 
RECLAIM POLICY   STATUS  CLAIM             STORAGECLASS
Delete           Bound   default/dynamic1  standard

Looking a little closer, we can see that the StorageClass standard is used for this volume. In fact, that storage class is the reason Kubernetes was able to make this volume at all. When a default storage class is defined, a PVC that specifies no storage class automatically receives the default one. This happens via an admission controller that mutates new PVCs as they arrive at the API server, filling in the default storage class name. With that field in place, the volume provisioner that runs in your cluster (in our case, this is called local-path-provisioner and comes bundled with kind) detects the new storage request and provisions a volume for it; we’ll verify this on our dynamic1 claim right after the following output:

$ kubectl get sc -o yaml
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"storage.k8s.io/v1",
           "kind":"StorageClass","metadata":{
         "annotations":{
                "storageclass.kubernetes.io/is-default-class": "true"}
               ,"name":"standard"
             },
             "provisioner":"rancher.io/local-path",
             "reclaimPolicy":"Delete",
          "volumeBindingMode":"WaitForFirstConsumer"}
      storageclass.kubernetes.io/is-default-class: "true"    
    name: standard
  provisioner: rancher.io/local-path
kind: List                                                   

The is-default-class annotation makes this the go-to StorageClass for Pods wanting storage without explicitly requesting a storage class.

You can have many different storage classes in a cluster.
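
As a quick check, we can ask our dynamic1 claim which class it ended up with. Even though our manifest never set one, the field should come back filled in with standard in this kind cluster:

$ kubectl get pvc dynamic1 -o jsonpath='{.spec.storageClassName}'
standard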

Once we realize that Pods can have many different types of storage, it becomes clear that we need a pluggable storage provider for Kubernetes. That is the purpose of the CSI interface (https://kubernetes-csi.github.io/docs/).

7.4 The container storage interface (CSI)

The Kubernetes CSI defines an interface (figure 7.3) so that vendors providing storage solutions can easily plug themselves into any Kubernetes cluster and provide applications with a broad range of storage solutions to meet different needs. It is the alternative to in-tree storage, where the kubelet itself bakes the drivers for a volume type into its startup process for a Pod.

Figure 7.3 The architecture of the Kubernetes CSI model

The purpose of defining the CSI is to make it easy to manage storage solutions from a vendor’s perspective. To frame this problem, let’s consider the underlying storage implementation for a few Kubernetes PVCs:

  • vSphere’s CSI driver can create VMFS- or vSAN-based PersistentVolume objects.

  • Filesystems such as GlusterFS have CSI drivers that allow you to run volumes in a distributed fashion in containers if you want.

  • Pure Storage has a CSI driver that directly creates volumes on a Pure Storage disk array.

Many other vendors also provide CSI-based storage solutions for Kubernetes. Before we describe how the CSI makes this easy, we’ll take a quick look at the in-tree provider problem in Kubernetes, because the CSI was largely a response to the challenges of managing storage volumes under the in-tree storage model.

7.4.1 The in-tree provider problem

Since the inception of Kubernetes, vendors have spent a lot of time building interoperability into its core codebase. The consequence of this was that vendors of different storage types had to contribute vendor-specific code to the Kubernetes core itself! There are still remnants of this in the Kubernetes codebase, as we can see at http://mng.bz/J1NV:

package glusterfs
 
import (
    "context"
    ...
    gcli "github.com/heketi/heketi/client/api/go-client"
    gapi "github.com/heketi/heketi/pkg/glusterfs/api"
)

The importation of GlusterFS’s API package (Heketi is the REST API for Gluster) implies that Kubernetes is aware of, and dependent on, GlusterFS. Looking a little further, we can see how this dependency manifests:

func (p *glusterfsVolumeProvisioner) CreateVolume(gid int)
    (r *v1.GlusterfsPersistentVolumeSource, size int,
     volID string, err error) {
  ...
    // GlusterFS/heketi creates volumes in units of GiB.
    sz, err := volumehelpers.RoundUpToGiBInt(capacity)
  ...
    cli := gcli.NewClient(p.url, p.user, p.secretValue)
  ...

The Kubernetes volume package ultimately makes calls to the GlusterFS API to create new volumes. The same pattern appears for other vendors, such as VMware’s vSphere. In fact, many vendors, including VMware, Portworx, ScaleIO, and so on, have their own directories under the pkg/volume directory in Kubernetes. This is an obvious anti-pattern for any open source project because it conflates vendor-specific code with that of the broader open source framework. This comes with obvious baggage:

  • Users have to align their version of Kubernetes with specific storage drivers.

  • Vendors have to continually commit code to Kubernetes itself to keep their storage offerings up to date.

These two scenarios are obviously unsustainable over time. Hence, the need for a standard that defines externalized volume creation, mounting, and life cycle management was born. Similar to our look at the CNI earlier, the CSI standard typically results in a DaemonSet running on all nodes that handles mounting (much like the CNI agents that handled IP injection for a namespace). The CSI also allows us to easily swap out one storage type for another and even to run multiple storage types at once (something not easily done with networks) because it specifies a volume-naming convention.
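
If you have a cluster with a CSI driver installed, you can see both halves of this arrangement with a couple of kubectl queries (the resources below are standard built-ins; the DaemonSet name and labels vary by vendor):

$ kubectl get csidrivers                    # drivers registered with the cluster
$ kubectl get csinodes                      # which drivers each kubelet can use
$ kubectl get daemonsets -A | grep -i csi   # the vendor's node plugin DaemonSet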

Note that the in-tree problem isn’t specific to storage. The CRI, CNI, and CSI are all born of polluted code that’s lived in Kubernetes for a long time. In the first versions of Kubernetes, the codebase was coupled to tools such as Docker, Flannel, and many other filesystems. These couplings are being moved out over time, and the CSI is just one prominent example of how code can move from in-tree to out-of-tree once the proper interfaces are in place. In practice, however, there is still quite a bit of vendor-specific life cycle code that lives in Kubernetes, and it will potentially take years to truly decouple these add-on technologies.

7.4.2 CSI as a specification that works inside of Kubernetes

Figure 7.4 demonstrates the workflow for provisioning a PVC with a CSI driver. It’s much more transparent and decoupled than what we saw with GlusterFS: different components accomplish different tasks in a discrete manner.

Figure 7.4 Provisioning a PVC with a CSI driver

The CSI specification abstractly defines a generic set of functionality that allows a storage service to be defined without specifying any implementation. In this section, we’ll go through some aspects of this interface in the context of Kubernetes itself. The operations it defines are in three general categories: identity services, controller services, and node services. At its heart is the notion of, as you might have guessed, a controller that negotiates the need for storage with a backend provider (your expensive NAS solution) and the Kubernetes control plane by fulfilling a dynamic storage request. Let’s take a quick peek at these three categories:

  • Identity services—Allow a plugin service to self-identify (provide metadata about itself). This allows the Kubernetes control plane to confirm that a particular type of storage plugin is running and available for a volume type.

  • Node services—Allow the kubelet itself to talk to a local service, which can do operations for the kubelet that are specific to a storage provider. For example, a CSI provider’s node service might call a vendor-specific binary when it is prompted to mount a particular type of storage. This is requested over a socket, communicating via gRPC.

  • Controller services—Implement the creation, deletion, and other life cycle-related events for a vendor’s storage volume. Keep in mind that in order for the NodeService to be of any value, the backend storage system being used needs to first create a volume that can be attached at the right moment to a kubelet. Thus, the controller services play a “glue” role, connecting Kubernetes to the storage vendor. As you might expect, this is implemented by running a watch against the Kubernetes API for volume operations.

The following code snippet provides a brief overview of the CSI specification. We don’t show all the methods here as they are available at http://mng.bz/y4V7:

service Identity {
  rpc GetPluginInfo(GetPluginInfoRequest)                  
  rpc GetPluginCapabilities(GetPluginCapabilitiesRequest)
  rpc Probe (ProbeRequest)
}
 
service Controller {
  rpc CreateVolume (CreateVolumeRequest)
  rpc DeleteVolume (DeleteVolumeRequest)                   
  rpc ControllerPublishVolume (ControllerPublishVolumeRequest)
}
 
service Node {
  rpc NodeStageVolume (NodeStageVolumeRequest)             
  rpc NodeUnstageVolume (NodeUnstageVolumeRequest)
  rpc NodePublishVolume (NodePublishVolumeRequest)
  rpc NodeUnpublishVolume (NodeUnpublishVolumeRequest)
  rpc NodeGetInfo (NodeGetInfoRequest)
  ...
}

The Identity service tells Kubernetes what type of volumes can be created by the controllers running in a cluster.

The Create and Delete methods are called before a node can mount a volume into a Pod, implementing dynamic storage.

The Node service is the part of CSI that runs on a kubelet, mounting the volume created previously into a specific Pod on demand.

7.4.3 CSI: How a storage driver works

A CSI storage plugin decomposes the operations necessary for mounting a Pod’s storage into three distinct phases. This includes registering a storage driver, requesting a volume, and publishing a volume.

Registering a storage driver is done via the Kubernetes API. This involves telling Kubernetes how to deal with this particular driver (whether certain things need to happen before a storage volume is writable) and letting Kubernetes know that a particular type of storage is available for the kubelet. The name of a CSI driver is important, as we will see shortly:

type CSIDriverInfoSpec struct {
    Name string `json:"name"`
    ...
}
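
For instance, the NFS CSI driver that we look at later in this chapter registers itself under the name nfs.csi.k8s.io. A minimal CSIDriver object for such a driver might look like the following sketch (the spec values are illustrative, not copied from the driver’s actual manifests):

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: nfs.csi.k8s.io        # Must match the name the plugin reports over its socket
spec:
  attachRequired: false       # NFS-style volumes don't need a separate attach step
  podInfoOnMount: true        # Pass Pod metadata to the driver at mount time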

When requesting a volume (by making an API call to your $200,000 NAS solution, for instance), the vendor storage mechanism is called upon to create a storage volume. This is done using the CreateVolume function we introduced previously. The call to CreateVolume is typically made by a separate service known as the external provisioner, which isn’t part of the node DaemonSet. Rather, it’s a standard Pod that watches the Kubernetes API server and responds to volume requests by calling the storage vendor’s API. This service looks at newly created PVC objects and then calls CreateVolume against a registered CSI driver. It knows which driver to call because the StorageClass’s provisioner field names the registered CSI driver (hence, it is important to get the name field right). In this scenario, the request for a volume in a CSI driver is separate from the mounting of that volume.

When publishing a volume, the volume is attached (mounted) to a Pod. This is done by a CSI storage driver that typically lives on every node of your cluster. Publishing a volume is a fancy way to say mounting a volume to a location the kubelet requests so a Pod can write data to it. The kubelet is in charge of making sure the Pod’s container is launched with the correct mount namespaces to access this directory.

7.4.4 Bind mounting

You might recall that earlier we defined mounts as simple Linux operations that expose a directory at a new place under the / tree. This is a fundamental part of the contract between the attacher and the kubelet, which is defined by the CSI interface. In Linux, the specific operation that makes a directory available at a second location, whether for a Pod or for any other process, is called a bind mount. Thus, in any CSI-provisioned storage environment, Kubernetes has several services running that coordinate the delicate interplay of API calls back and forth to reach the ultimate end goal of mounting external storage volumes into Pods.
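
Bind mounts are easy to experiment with outside of Kubernetes. On any Linux box (a generic example, not tied to any CSI driver), you can mirror one directory onto another path and inspect the result with findmnt:

$ mkdir -p /tmp/source /tmp/target
$ sudo mount --bind /tmp/source /tmp/target
$ findmnt /tmp/target     # shows /tmp/target backed by the same filesystem path as /tmp/source
$ sudo umount /tmp/target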

Because CSI drivers are a set of containers often maintained by vendors, the kubelet itself needs to be able to accept that mounts might be created from inside a container. This is known as mount propagation and is an important part of the low-level Linux requirements for certain aspects of Kubernetes to work properly.
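
In practice, this shows up as a volumeMounts entry in the CSI node plugin’s DaemonSet. The fragment below is a sketch of that pattern (the container name and image are hypothetical; the field names are the real Kubernetes ones):

# Fragment of a typical CSI node-plugin DaemonSet Pod template
containers:
- name: csi-node-driver                            # hypothetical name
  image: registry.example.com/csi-driver:latest    # hypothetical image
  securityContext:
    privileged: true                     # needed to perform mounts on the host
  volumeMounts:
  - name: kubelet-dir
    mountPath: /var/lib/kubelet
    mountPropagation: Bidirectional      # mounts made inside the container propagate back to the host
volumes:
- name: kubelet-dir
  hostPath:
    path: /var/lib/kubelet
    type: Directory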

7.5 A quick look at a few running CSI drivers

We’ll conclude with a few concrete examples of real CSI providers. Because this may require a running cluster, rather than creating a walkthrough where we reproduce CSI behavior step by step (as we did with CNI providers), we’ll instead just share the running logs of the various components of a CSI provider. That way, you can see how the interfaces in this chapter are implemented and monitored in real time.

7.5.1 The controller

The controller is the brains of any CSI driver, connecting requests for storage with backend storage providers, such as vSAN, EBS, and so on. The interface it implements needs to be able to create, delete, and publish volumes on the fly for our Pods to use. We can see the continuous monitoring of the Kubernetes API server if we look directly at the logs of a running vSphere CSI controller:

I0711 05:38:07.057037       1 controller.go:819] Started provisioner
   controller csi.vsphere.vmware.com_vsphere-csi-controller-...-
I0711 05:43:25.976079       1 reflector.go:389] sigs.k8s.io/sig-
     storage-lib-external-provisioner/controller/controller.go:807:
        Watch close - *v1.StorageClass total 0 items received
I0711 05:45:13.975291       1 reflector.go:389] sigs.k8s.io/sig-
     storage-lib-external-provisioner/controller/controller.go:804:
        Watch close - *v1.PersistentVolume total 3 items received
I0711 05:46:32.975365       1 reflector.go:389] sigs.k8s.io/sig-
     storage-lib-external-provisioner/controller/controller.go:801:
        Watch close - *v1.PersistentVolumeClaim total 3 items received

Once these PVCs are observed, the controller can request storage from vSphere itself. The volumes that vSphere creates then have their metadata synchronized across PVCs and PVs to confirm that a PVC is now mountable. After this, the CSI node component takes over (the scheduler will first confirm that a healthy vSphere CSI node plugin is running on the Pod’s destination node).

7.5.2 The node interface

The node interface is responsible for communicating with kubelets and mounting storage to Pods. We can concretely see this by looking at the running logs of volumes in production. Previously, we attempted to run the NFS CSI driver in a hostile environment as a way to uncover lower-level VFS utilization by Linux. Now that we’ve covered the CSI interface, let’s again look back at how the NFS CSI driver looks in production.

The first thing we’ll look at is how both the NFS and vSphere CSI plugins use a socket for communicating with the kubelet. This is how the node components of the interface are called. When we look into the details of a CSI node container, we should see something like this:

$ kubectl logs
 csi-nodeplugin-nfsplugin-dbj6r  -c nfs
I0711 05:41:02.957011  1 nfs.go:47]
 Driver: nfs.csi.k8s.io version: 2.0.0       
I0711 05:41:02.963340  1 server.go:92] Listening for connections on address:
   &net.UnixAddr{
     Name:"/plugin/csi.sock",
     Net:"unix"}                               
 
$ kubectl logs csi-nodeplugin-nfsplugin-dbj6r
    -c node-driver-registrar
I0711 05:40:53.917188   1 main.go:108] Version: v1.0.2-rc1-0-g2edd7f10
I0711 05:41:04.210022   1 main.go:76] Received GetInfo call: &InfoRequest{}

Name of the CSI driver

The channel for the kubelet to talk to the CSI plugins it uses for storage

The naming of CSI drivers is important because the name is part of the CSI protocol. The csi-nodeplugin prints its exact version on startup. Note that the csi.sock socket in the plugin directory is the common channel that the kubelet uses to talk to its CSI plugins:

$ kubectl logs -f vsphere-csi-node-6hh7l  -n kube-system
 -c vsphere-csi-node
{"level":"info","time":"2020-07-08T21:07:52.623267141Z",
  "caller":"logger/logger.go:37",
  "msg":"Setting default log level to :"PRODUCTION""}
{"level":"info","time":"2020-07-08T21:07:52.624012228Z",
   "caller":"service/service.go:106",
   "msg":"configured: "csi.vsphere.vmware.com"
      with clusterFlavor: "VANILLA"
      and mode: "node"",
      "TraceId":"72fff590-523d-46de-95ca-fd916f96a1b6"}
 
level=info msg="identity service registered"    
level=info msg="node service registered"
level=info msg=serving endpoint=
 "unix:///csi/csi.sock"                       

Shows that the identity of the driver is registered

Shows that the CSI socket is used

This concludes our treatment of the CSI interface and why it exists. Unlike other components of Kubernetes, this is not easy to discuss or reason about without a cluster with real workloads running in front of you. As a follow-up exercise, we highly recommend installing the NFS CSI provider (or any other CSI driver) on a cluster of your choice (VMs or bare metal). One exercise worthy of running through is measuring whether the creation of volumes slows over time and, if so, what the bottlenecks are.

We don’t include a live example of a CSI driver in this chapter because most of the current CSI drivers that are used in production clusters aren’t runnable inside of a simple kind environment. In general, as long as you understand that the provisioning of volumes is distinct from the mounting of those volumes, you should be well prepared to debug CSI failures in a production system by treating these two independent operations as distinct failure modes.

7.5.3 CSI on non-Linux OSs

Similar to the CNI, the CSI interface is OS-agnostic; however, its implementation comes most naturally on Linux, where privileged containers are readily available. As with networking, the way the CSI is implemented outside of Linux is a little different. For example, if you are running Kubernetes on Windows, you might find a lot of value in the CSI proxy project (https://github.com/kubernetes-csi/csi-proxy), which runs a service on every kubelet of your cluster and abstracts away many of the PowerShell commands that implement CSI node functionality. This is because, on Windows, the concept of privileged containers is quite new and only works on certain, more recent versions of containerd.

In time, we expect that many people running Windows kubelets will also be able to run their CSI implementations as Windows DaemonSets, with behavior similar to that of the Linux DaemonSets we’ve demoed in this chapter. Ultimately, the need to abstract storage arises at many levels of the computing stack, and Kubernetes is just one more abstraction on top of an ever-increasing ecosystem of storage and persistence support for applications.

Summary

  • Pods can acquire storage dynamically at run time; when a Pod is created, the kubelet executes the mount operations that make that storage available to it.

  • The simplest way to experiment with Kubernetes storage providers is to make a PVC in a Pod in a kind cluster.

  • The CSI provider for NFS is one of many CSI providers, all of which conform to the same CSI standard for container storage mounting. This decouples Kubernetes source code from storage vendor source code.

  • When implemented, the CSI-defined identity, controller, and node services, each of which includes several abstract functions, allow providers to dynamically provide storage to Pods through the CSI API.

  • The CSI interface can be made to work on non-Linux OSs, with the CSI proxy for Windows kubelets as the leading example of this type of implementation.

  • The Linux virtual filesystem (VFS) includes anything that can be opened, read, and written to. Operations on disks happen beneath its API.
