Chapter 20: Autoscaling Kubernetes Pods and Nodes

Needless to say, having autoscaling capabilities for your cloud-native application is considered the holy grail of running applications in the cloud. In short, by autoscaling, we mean a method of automatically and dynamically adjusting the amount of computational resources, such as CPU and RAM memory, available to your application. The goal is to intelligently add or remove resources based on the activity and demand of end users. For example, the application may require more CPU and RAM during daytime hours, when users are most active, but much less during the night. Similarly, if you are running an e-commerce business, you can expect a huge spike in demand around Black Friday. In this way, you can not only provide a better, highly available service to users but also reduce the cost of goods sold (COGS) for the business. The fewer resources you consume in the cloud, the less you pay, and the business can invest the money elsewhere – this is a win-win situation. There is, of course, no single rule that fits all use cases, so good autoscaling needs to be based on key usage metrics and, ideally, should have predictive features that anticipate workloads based on history.

Kubernetes, as the most mature container orchestration system available, comes with a variety of built-in autoscaling features. Some of these features are natively supported in every Kubernetes cluster, and some require installation or a specific type of cluster deployment. There are also multiple dimensions along which you can scale:

  • Vertical for Pods: This involves adjusting the amount of CPU and memory resources available to a Pod. Pods can run under limits specified for CPU and memory to prevent excessive consumption, but these limits may require automatic adjustment rather than being guessed by a human operator. This is implemented by a VerticalPodAutoscaler (VPA).
  • Horizontal for Pods: This involves dynamically changing the number of Pod replicas for your Deployment or StatefulSet. These objects come with nice scaling features out of the box, but adjusting the number of replicas can be automated using a HorizontalPodAutoscaler (HPA).
  • Horizontal for Nodes: Another dimension of horizontal scaling (scaling out), but this time at the level of Kubernetes Nodes. You can scale your whole cluster by adding or removing Nodes. This requires, of course, a Kubernetes cluster deployed in an environment that supports the dynamic provisioning of machines, such as a cloud environment. This is implemented by a Cluster Autoscaler (CA), available for some cloud vendors.

In this chapter, we will cover the following topics:

  • Pod resource requests and limits
  • Autoscaling Pods vertically using a Vertical Pod Autoscaler
  • Autoscaling Pods horizontally using a Horizontal Pod Autoscaler
  • Autoscaling Kubernetes Nodes using a Cluster Autoscaler

Technical requirements

For this chapter, you will need the following:

  • A Kubernetes cluster deployed. We recommend using a multi-node, cloud-based Kubernetes cluster.
  • Having a multi-node Google Kubernetes Engine (GKE) cluster is a recommended prerequisite to follow the second section relating to the Vertical Pod Autoscaler (VPA). AKS and EKS currently require the manual installation of a VPA, which we are going to demonstrate, but GKE has support for it out of the box.
  • Having a multi-node AKS, EKS, or GKE cluster is a prerequisite for following the final section regarding a CA.
  • A Kubernetes CLI (kubectl) installed on your local machine and configured to manage your Kubernetes cluster.

Basic Kubernetes cluster deployment (local and cloud-based) and kubectl installation have been covered in Chapter 3, Installing Your First Kubernetes Cluster.

The following chapters can provide you with an overview of how to deploy a fully functional Kubernetes cluster on different cloud platforms and install the requisite CLIs to manage them:

  • Chapter 14, Kubernetes Clusters on Google Kubernetes Engine
  • Chapter 15, Launching a Kubernetes Cluster on Amazon Web Services with the Amazon Elastic Kubernetes Service
  • Chapter 16, Kubernetes Clusters on Microsoft Azure with the Azure Kubernetes Service

You can download the latest code samples for this chapter from the official GitHub repository at https://github.com/PacktPublishing/The-Kubernetes-Bible/tree/master/Chapter20.

Pod resource requests and limits

Before we dive into the topics of autoscaling in Kubernetes, we need to explain a bit more about how you can control CPU and memory usage (known as compute resources) by Pod containers in Kubernetes. Controlling the use of compute resources is important since, in this way, you can enforce resource governance – this allows better planning of cluster capacity and, most importantly, prevents situations in which a single container consumes most of the compute resources and prevents other Pods from serving requests.

When you create a Pod, it is possible to specify how much compute resources its containers require and what the limits are in terms of permitted consumption. The Kubernetes resource model provides an additional distinction between two classes of resources: compressible and incompressible. In short, a compressible resource can be easily throttled, without severe consequences. A perfect example of such a resource is the CPU – if you need to throttle CPU usage for a given container, the container will operate normally, just slower. On the other hand, we have incompressible resources that cannot be throttled without severe consequences – RAM memory allocation is an example of such a resource. If you do not allow a process running in a container to allocate more memory, the process will crash and result in a container restart.

Important Note

If you want to know more about the philosophy and design decisions for the Kubernetes resource governance model, we recommend reading the official design proposal documents. Resource model: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scheduling/resources.md. Resource quality of service: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md.

To control the resources for a Pod container, you can specify two values in its specification:

  • requests: This specifies the guaranteed amount of a given resource provided by the system. You can also think of it the other way round – this is the amount of a given resource that the Pod container requires from the system in order to function properly. This is important, as Pod scheduling depends on the requests value (not limits), namely, through the PodFitsResources predicate and the BalancedResourceAllocation priority.
  • limits: This specifies the maximum amount of a given resource provided by the system. If specified together with requests, this value must be greater than or equal to requests. Depending on whether the resource is compressible or incompressible, exceeding the limit has different consequences – compressible resources (CPU) will be throttled, whereas exceeding the limit for incompressible resources (RAM) may result in the container being killed and restarted.

If you use different values for requests and limits, you can allow for resource overcommit. This technique is useful for efficiently handling short bursts of resource usage while allowing better resource usage on average. The reasoning behind this is that you will rarely have all containers on the Node requiring maximum resources, as they specify in limits, at the same time. This gives you better bin packing of your Pods for the majority of the time. The concept is similar to overprovisioning for virtual machine hypervisors or, in the real world, overbooking for airplane flights.

If you do not specify limits at all, the container can consume as much of the resource on a Node as it wants. This can be controlled by namespace resource quotas and limit ranges – you can read more about these objects in the official documentation: https://kubernetes.io/docs/concepts/policy/limit-range/.
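As a quick illustration, a LimitRange similar to the following could apply default requests and limits to containers in a namespace that do not declare their own (a minimal sketch – the name and values here are only examples and are not used later in this chapter):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-compute-resources
  namespace: default
spec:
  limits:
  - type: Container
    # Applied as .resources.limits when a container does not specify its own
    default:
      cpu: 200m
      memory: 256Mi
    # Applied as .resources.requests when a container does not specify its own
    defaultRequest:
      cpu: 100m
      memory: 128Mi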

Tip

In more advanced scenarios, you can also control huge pages and ephemeral storage requests and limits.
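For instance, such a container resources section could look as follows (a hypothetical fragment for illustration only – note that huge page requests and limits must be equal, and the Node must have 2Mi huge pages pre-allocated):

resources:
  requests:
    cpu: 100m
    memory: 128Mi
    ephemeral-storage: 1Gi
    hugepages-2Mi: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
    ephemeral-storage: 2Gi
    hugepages-2Mi: 128Mi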

Before we dive into the configuration details, we need to look at the units used for measuring CPU and memory in Kubernetes. For CPU, the base unit is the Kubernetes CPU (KCU), where 1 is equivalent to, for example, 1 vCPU on Azure, 1 core on GCP, or 1 hyperthreaded core on a bare-metal machine. Fractional values are allowed: 0.1 can also be specified as 100m (milliKCUs). For memory, the base unit is the byte; you can, of course, use standard suffixes, such as M, Mi, G, or Gi.

To enable compute resource requests and limits for Pod containers in our nginx Deployment that we used in the previous chapters, you can make the following changes to the YAML manifest, nginx-deployment.yaml:

apiVersion: apps/v1

kind: Deployment

metadata:

  name: nginx-deployment-example

spec:

  replicas: 5

  selector:

    matchLabels:

      app: nginx

      environment: test

  template:

    metadata:

      labels:

        app: nginx

        environment: test

    spec:

      containers:

      - name: nginx

        image: nginx:1.17

        ports:

        - containerPort: 80

        resources:

          limits:

            cpu: 200m

            memory: 60Mi

          requests:

            cpu: 100m

            memory: 50Mi

For each container that you have in the Pod, you can specify the .spec.template.spec.containers[*].resources field. In this case, we have set limits at 200m KCU and 60Mi for RAM, and requests at 100m KCU and 50Mi for RAM.

When you apply the manifest to the cluster using kubectl apply -f ./nginx-deployment.yaml, you can describe one of the Nodes that runs Pods for this Deployment, and you will see detailed information about compute resource quotas and allocation:

$ kubectl describe node aks-nodepool1-77120516-vmss000000

...

Non-terminated Pods:          (5 in total)

  Namespace                   Name                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE

  ---------                   ----                                 ------------  ----------  ---------------  -------------  ---

  default                     nginx-deployment-example-5d8b9979d4-9sd9x    100m (5%)     200m (10%)  50Mi (1%)        60Mi (1%)      8m12s

  default                     nginx-deployment-example-5d8b9979d4-rbwv2    100m (5%)     200m (10%)  50Mi (1%)        60Mi (1%)      8m10s

  default                     nginx-deployment-example-5d8b9979d4-sfzx9    100m (5%)     200m (10%)  50Mi (1%)        60Mi (1%)      8m10s

  kube-system                 kube-proxy-q6xdq                             100m (5%)     0 (0%)      0 (0%)           0 (0%)         10d

  kube-system                 omsagent-czm6q                               75m (3%)      500m (26%)  225Mi (4%)       600Mi (13%)    17d

Allocated resources:

  (Total limits may be over 100 percent, i.e., overcommitted.)

  Resource                       Requests    Limits

  --------                       --------    ------

  cpu                            475m (25%)  1100m (57%)

  memory                         375Mi (8%)  780Mi (17%)

  ephemeral-storage              0 (0%)      0 (0%)

  hugepages-1Gi                  0 (0%)      0 (0%)

  hugepages-2Mi                  0 (0%)      0 (0%)

  attachable-volumes-azure-disk  0           0

Now, based on this information, you could experiment and set the CPU request for the container to a value higher than the capacity of a single Node in the cluster – in our case, 2000m KCU. The change to the container resources in nginx-deployment.yaml could look as shown below.
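Only the modified resources fragment is shown here – since limits must stay greater than or equal to requests, the CPU limit is raised as well:

        resources:
          limits:
            cpu: 2000m
            memory: 60Mi
          requests:
            cpu: 2000m
            memory: 50Mi

When you apply the changes to the Deployment, you will notice that the new Pods hang in the Pending state because they cannot be scheduled on any Node. In such cases, inspecting the Pod will reveal the following: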

$ kubectl describe pod nginx-deployment-example-56868549b-5n6lj

...

Events:

  Type     Reason            Age   From               Message

  ----     ------            ----  ----               -------

  Warning  FailedScheduling  25s   default-scheduler  0/3 nodes are available: 3 Insufficient cpu.

There are no Nodes that can accommodate a Pod with a container requiring 2000m KCU, and therefore the Pod cannot be scheduled at this moment.

With knowledge of how to manage compute resources, we will move on to autoscaling topics: first, we are going to explain the vertical autoscaling of Pods.

Autoscaling Pods vertically using a Vertical Pod Autoscaler

In the previous section, we managed requests and limits for compute resources manually. Setting these values correctly requires accurate guesswork, observing metrics, and running benchmarks to fine-tune them. Using overly high requests values will result in wasted compute resources, whereas setting them too low may result in Pods being packed too densely and suffering performance issues. Also, in some cases, the only way to scale the Pod workload is to do it vertically, by increasing the amount of compute resources it can consume. For bare-metal machines, this would mean upgrading the CPU hardware and adding more physical RAM. For containers, it is as simple as allowing them larger compute resource quotas. This works, of course, only up to the capacity of a single Node – you cannot scale vertically beyond that unless you add more powerful Nodes to the cluster.

To help resolve these issues, Kubernetes offers a Vertical Pod Autoscaler (VPA), which can increase and decrease CPU and memory resource requests for Pod containers dynamically. The goal is to better match the actual usage rather than rely on hardcoded, predefined values. Controlling limits within specified ratios is also supported.

The VPA is configured by means of a Custom Resource Definition (CRD) object named VerticalPodAutoscaler. This means that this object is not part of the standard Kubernetes API groups and needs to be installed in the cluster. The VPA is developed as part of the autoscaler project (https://github.com/kubernetes/autoscaler) in the Kubernetes ecosystem.

There are three main components of a VPA:

  • Recommender: Monitors the current and past resource consumption and provides recommended CPU and memory request values for a Pod container.
  • Updater: Checks for Pods with incorrect resources and deletes them, so that the Pods can be recreated with the updated requests and limits values.
  • Admission plugin: Sets the correct resource requests and limits on new Pods created or recreated by their controller (for example, a Deployment object) due to changes made by the updater.

The reason why the updater needs to terminate Pods and the VPA has to rely on the admission plugin is that Kubernetes does not support dynamic changes to resource requests and limits. The only way to change them is to terminate the Pod and create a new one with the new values. In-place modification of these values is tracked in KEP 1287 (https://github.com/kubernetes/enhancements/pull/1883) and, when implemented, will make the design of the VPA much simpler and improve the availability of workloads during vertical scaling.

Important note

A VPA can run in recommendation-only mode, where you see the suggested values in the VPA object, but the changes are not applied to the Pods. A VPA is currently considered experimental, and using it in a mode that recreates the Pods may lead to downtime for your application. This should change when in-place updates of Pod requests and limits are implemented.

Some Kubernetes offerings come with one-click support for installing a VPA. Two good examples are OpenShift and GKE. We will now quickly explain how you can do that if you are running a GKE cluster.

Enabling a VPA in GKE

Assuming that your GKE cluster is named k8sforbeginners, as in Chapter 14, Kubernetes Clusters on Google Kubernetes Engine, enabling a VPA is as simple as running the following command:

$ gcloud container clusters update k8sforbeginners --enable-vertical-pod-autoscaling

Note that this operation causes a restart of the Kubernetes control plane.

If you want to enable a VPA for a new cluster, you can use the additional argument --enable-vertical-pod-autoscaling, for example:

$ gcloud container clusters create k8sforbeginners --num-nodes=2 --zone=us-central1-a --enable-vertical-pod-autoscaling

The GKE cluster will have a VPA CRD available, and you can use it to control the vertical autoscaling of Pods.

Enabling a VPA for other Kubernetes clusters

In the case of different platforms such as AKS or EKS (or even local deployments for testing), you need to install a VPA manually by adding a VPA CRD to the cluster. The exact, most recent steps are documented in the corresponding GitHub repository: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler#installation.

To install a VPA in your cluster, please perform the following steps:

  1. Clone the Kubernetes autoscaler repository (https://github.com/kubernetes/autoscaler):

    $ git clone https://github.com/kubernetes/autoscaler

  2. Navigate to the VPA component directory:

    $ cd autoscaler/vertical-pod-autoscaler

  3. Begin installation using the following command. This assumes that your current kubectl context is pointing to the desired cluster:

    $ ./hack/vpa-up.sh

  4. This will create a bunch of Kubernetes objects. You can verify that the main component Pods are started correctly using the following command:

    $ kubectl get pods -n kube-system

    NAME                     READY   STATUS    RESTARTS   AGE

    vpa-admission-controller-688857d5c4-4l9c2   1/1     Running   0          10s

    vpa-recommender-74849cc845-qbfpg            1/1     Running   0          11s

    vpa-updater-6dbd6569d6-9np22                1/1     Running   0          12s

The VPA components are running, and we can now proceed to testing a VPA on real Pods.

Using a VPA

For demonstration purposes, we need a Deployment with Pods that cause actual consumption of CPU. The Kubernetes autoscaler repository has a good, simple example that has predictable CPU usage: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/examples/hamster.yaml. We are going to modify this example a bit and do a step-by-step demonstration. Let's prepare the Deployment first:

  1. Create the hamster-deployment.yaml YAML manifest file:

    apiVersion: apps/v1

    kind: Deployment

    metadata:

      name: hamster

    spec:

      selector:

        matchLabels:

          app: hamster

      replicas: 5

      template:

        metadata:

          labels:

            app: hamster

        spec:

          containers:

          - name: hamster

            image: ubuntu:20.04

            resources:

              requests:

                cpu: 100m

                memory: 50Mi

            command:

            - /bin/sh

            - -c

            - while true; do timeout 0.5s yes >/dev/null; sleep 0.5s; done

    It's a real hamster! The command used in the Pod's ubuntu container consumes the maximum available CPU for 0.5 seconds and then does nothing for 0.5 seconds, in an endless loop. This means that the actual CPU usage will stay, on average, at around 500m KCU. However, the resource requests specify that it requires only 100m KCU. This means that the Pod will consume more than it declares, but since there are no limits set, Kubernetes will not throttle the container CPU. This could potentially lead to incorrect scheduling decisions by the Kubernetes Scheduler.

  2. Apply the manifest to the cluster using the following command:

    $ kubectl apply -f ./hamster-deployment.yaml

    deployment.apps/hamster created

  3. Let's verify what the CPU usage of the Pod is. The simplest way is to use the kubectl top command:

    $ kubectl top pod

    NAME                       CPU(cores)   MEMORY(bytes)

    hamster-779cfd69b4-5bnbf   475m         1Mi

    hamster-779cfd69b4-8dt5h   497m         1Mi

    hamster-779cfd69b4-mn5p5   492m         1Mi

    hamster-779cfd69b4-n7nss   496m         1Mi

    hamster-779cfd69b4-rl29j   484m         1Mi

    As we expected, the CPU consumption for each Pod in the deployment oscillates at around 500m KCU.

With that, we can move on to creating a VPA for our Pods. VPAs can operate in four modes that you specify by means of the .spec.updatePolicy.updateMode field:

  • Recreate: Pod container limits and requests are assigned on Pod creation and dynamically updated based on calculated recommendations. To update the values, the Pod must be restarted. Please note that this may be disruptive to your application.
  • Auto: Currently equivalent to Recreate, but when in-place updates for Pod container requests and limits are implemented, this can automatically switch to the new update mechanism.
  • Initial: Pod container limits and requests are assigned on Pod creation only.
  • Off: A VPA runs in recommendation-only mode. The recommended values can be inspected in the VPA object, for example, by using kubectl.

We are going to first create a VPA for the hamster Deployment, which runs in Off mode, and later we will enable Auto mode. To do this, please perform the following steps:

  1. Create a VPA YAML manifest named hamster-vpa.yaml:

    apiVersion: autoscaling.k8s.io/v1

    kind: VerticalPodAutoscaler

    metadata:

      name: hamster-vpa

    spec:

      targetRef:

        apiVersion: apps/v1

        kind: Deployment

        name: hamster

      updatePolicy:

        updateMode: "Off"

      resourcePolicy:

        containerPolicies:

        - containerName: '*'

          minAllowed:

            cpu: 100m

            memory: 50Mi

          maxAllowed:

            cpu: 1

            memory: 500Mi

          controlledResources:

          - cpu

          - memory

    This VPA is created for a Deployment object with the name hamster, as specified in .spec.targetRef. The mode is set to "Off" in .spec.updatePolicy.updateMode ("Off" needs to be specified in quotes to avoid being interpreted as a Boolean) and the container resource policy is configured in .spec.resourcePolicy.containerPolicies. The policy that we used allows Pod container requests for CPU to be adjusted automatically between 100m KCU and 1000m KCU, and for memory between 50Mi and 500Mi.

  2. Apply the manifest file to the cluster:

    $ kubectl apply -f ./hamster-vpa.yaml

    verticalpodautoscaler.autoscaling.k8s.io/hamster-vpa created

  3. You need to wait a while for the recommendation to be calculated for the first time. Then, you can check what the recommendation is by describing the VPA:

    $ kubectl describe vpa hamster-vpa

    ...

    Status:

      Conditions:

        Last Transition Time:  2021-03-28T14:33:33Z

        Status:                True

        Type:                  RecommendationProvided

      Recommendation:

        Container Recommendations:

          Container Name:  hamster

          Lower Bound:

            Cpu:     551m

            Memory:  262144k

          Target:

            Cpu:     587m

            Memory:  262144k

          Uncapped Target:

            Cpu:     587m

            Memory:  262144k

          Upper Bound:

            Cpu:     1

            Memory:  378142066

    The VPA has recommended allocating a bit more CPU than the expected 500m KCU, along with 262144k of memory. This makes sense, as the Pod should have a safe buffer for CPU consumption.
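    If you prefer machine-readable output, the same recommendation can also be read directly from the status of the VPA object, for example (a convenience command based on the status structure shown above):

    $ kubectl get vpa hamster-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'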

  4. Now we can check the VPA in practice and change its mode to Auto. Modify hamster-vpa.yaml:

    apiVersion: autoscaling.k8s.io/v1

    kind: VerticalPodAutoscaler

    metadata:

      name: hamster-vpa

    spec:

    ...

      updatePolicy:

        updateMode: Auto

    ...

  5. Apply the manifest to the cluster:

    $ kubectl apply -f ./hamster-vpa.yaml

    verticalpodautoscaler.autoscaling.k8s.io/hamster-vpa configured

  6. After a while, you will notice that the Pods for the Deployment are being restarted by the VPA:

    $ kubectl get pod

    NAME                 READY   STATUS        RESTARTS   AGE

    hamster-779cfd69b4-5bnbf   1/1     Running       0          45m

    hamster-779cfd69b4-8dt5h   1/1     Terminating   0          45m

    hamster-779cfd69b4-9tqfx   1/1     Running       0          60s

    hamster-779cfd69b4-n7nss   1/1     Running       0          45m

    hamster-779cfd69b4-wdz8t   1/1     Running       0          60s

  7. We can inspect one of the restarted Pods to see the current requests for resources:

    $ kubectl describe pod hamster-779cfd69b4-9tqfx

    ...

    Annotations:  vpaObservedContainers: hamster

                  vpaUpdates: Pod resources updated by hamster-vpa: container 0: cpu request, memory request

    ...

    Containers:

      hamster:

    ...

        Requests:

          cpu:        587m

          memory:     262144k

    ...

    As you can see, the newly started Pod has CPU and memory requests set to the values recommended by the VPA!

    Important note

    A VPA should not be used with an HPA running on CPU/memory metrics at this moment. However, you can use a VPA in conjunction with an HPA running on custom metrics.

Next, we are going to discuss how you can horizontally autoscale Pods using a Horizontal Pod Autoscaler (HPA).

Autoscaling Pods horizontally using a Horizontal Pod Autoscaler

While a VPA acts like an optimizer of resource usage, the true scaling of your Deployments and StatefulSets that run multiple Pod replicas can be done using a Horizontal Pod Autoscaler (HPA). At a high level, the goal of the HPA is to automatically scale the number of replicas in a Deployment or StatefulSet depending on the current CPU utilization or other custom metrics (including multiple metrics at once). The details of the algorithm that determines the target number of replicas based on metric values can be found here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details. HPAs are highly configurable and, in this chapter, we will cover a standard scenario in which we would like to autoscale based on a target CPU usage.

Important note

An HPA is represented by a built-in HorizontalPodAutoscaler API resource in Kubernetes in the autoscaling API group. The current stable version that supports CPU autoscaling only can be found in the autoscaling/v1 API version. The beta version that supports autoscaling based on RAM and custom metrics can be found in the autoscaling/v2beta2 API version.

The role of the HPA is to monitor the configured metric for Pods, for example, CPU usage, and determine whether the number of replicas needs to change. Usually, the HPA calculates the average of the current metric value across all Pods and determines whether adding or removing replicas will bring the metric value closer to the specified target value. For example, say you set the target CPU usage to 50%. At some point, increased demand for the application causes the Deployment Pods to reach 80% CPU usage. The HPA will decide to add more Pod replicas so that the average usage across all replicas falls closer to 50%. And the cycle repeats. In other words, the HPA tries to keep the average CPU usage as close to 50% as possible. This is like a continuous, closed-loop controller – in real life, a thermostat reacting to temperature changes in a building is a good, similar example. The HPA additionally uses mechanisms such as a stabilization window to prevent replicas from scaling down too quickly and causing unwanted replica flapping.
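The core of this behavior can be summarized with a single formula (simplified here – the full algorithm described at the link above also accounts for missing metrics, not-yet-ready Pods, and a tolerance around the target):

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, with 5 replicas, a current average CPU utilization of 80%, and a target of 50%, the HPA would aim for ceil(5 * 80 / 50) = 8 replicas.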

Tip

GKE has added support for multidimensional Pod autoscaling that combines horizontal scaling using CPU metrics and vertical scaling based on memory usage at the same time. You can read more about this feature in the official documentation: https://cloud.google.com/kubernetes-engine/docs/how-to/multidimensional-pod-autoscaling.

As an HPA is a built-in feature of Kubernetes, there is no need to perform any installation. We just need to prepare a Deployment for testing and create a HorizontalPodAutoscaler API object.

Using an HPA

To test an HPA, we are going to rely on the standard CPU usage metric. This means that we need to configure CPU requests on the Deployment Pods; otherwise, autoscaling is not possible, as there is no absolute baseline against which the utilization percentage can be calculated. On top of that, we again need a Deployment that consumes a predictable amount of CPU resources. Of course, in real use cases, the varying CPU usage would come from actual end user demand for your application.

Unfortunately, there is no simple way to have predictable and varying CPU usage in a container out of the box, so we have to prepare a Deployment with a Pod template that will do that. We will modify our hamster Deployment approach and create an elastic-hamster Deployment. The small shell script running continuously in the container will behave slightly differently. We will assign the total desired work to be done by the hamsters in all Pods together. Each Pod will query the Kubernetes API to check how many replicas are currently running for the Deployment. Then, we will divide the total desired work by the number of replicas to get the amount of work that a single hamster needs to do. So, for example, we will say that all hamsters together should do 1.0 of work, which roughly maps to the total consumption of KCU in the cluster. Then, if you deploy five replicas for the Deployment, each of the hamsters will do 1.0/5 = 0.2 work, so they will work for 0.2 seconds and sleep for 0.8 seconds. Now, if we scale the Deployment manually to 10 replicas, the amount of work per hamster falls to 0.1 seconds, and they will sleep for 0.9 seconds. As you can see, they collectively always do 1.0 second of work, no matter how many replicas we use. This roughly reflects a real-life scenario in which end users generate some amount of traffic to handle, and you distribute it among the Pod replicas. The more Pod replicas you have, the less traffic each of them has to handle and, in the end, the lower the average CPU usage metric will be.

Querying Deployments via the Kubernetes API will require some additional RBAC setup. You can find more details in Chapter 18, Authentication and Authorization on Kubernetes. To create the deployment for the demonstration, please perform the following steps:

  1. Create an elastic-hamster ServiceAccount manifest file named elastic-hamster-serviceaccount.yaml:

    apiVersion: v1

    kind: ServiceAccount

    metadata:

      name: elastic-hamster

      namespace: default

  2. Create a deployment-reader Role manifest file named deployment-reader-role.yaml. This Role allows reading Deployment objects from the Kubernetes API:

    apiVersion: rbac.authorization.k8s.io/v1

    kind: Role

    metadata:

      namespace: default

      name: deployment-reader

    rules:

    - apiGroups: ["apps"]

      resources: ["deployments"]

      verbs: ["get", "watch", "list"]

  3. Create a read-deployments RoleBinding manifest file named read-deployments-rolebinding.yaml. This RoleBinding associates the ServiceAccount with the role:

    apiVersion: rbac.authorization.k8s.io/v1

    kind: RoleBinding

    metadata:

      name: read-deployments

      namespace: default

    subjects:

    - kind: ServiceAccount

      name: elastic-hamster

      namespace: default

    roleRef:

      kind: Role

      name: deployment-reader

      apiGroup: rbac.authorization.k8s.io

  4. Finally, create an elastic-hamster Deployment manifest file named elastic-hamster-deployment.yaml, which will have Pods running on the elastic-hamster ServiceAccount. Let's take a look at the first part, without the shell command (the full file is also available in the book's GitHub repository: https://github.com/PacktPublishing/Kubernetes-for-Beginners/blob/master/Chapter20/03_hpa/elastic-hamster-deployment.yaml):

    apiVersion: apps/v1

    kind: Deployment

    metadata:

      name: elastic-hamster

    spec:

      selector:

        matchLabels:

          app: elastic-hamster

      replicas: 5

      template:

        metadata:

          labels:

            app: elastic-hamster

        spec:

          serviceAccountName: elastic-hamster

          containers:

          - name: hamster

            image: ubuntu:20.04

            resources:

              requests:

                cpu: 200m

                memory: 50Mi

            env:

            - name: TOTAL_HAMSTER_USAGE

              value: "1.0"

            command:

            - /bin/sh

            - -c

            - |

    ... shell command available in the next step ...

    While it is not good practice to keep long shell scripts in YAML manifest definitions, it is easier for demonstration purposes than creating a dedicated container image, pushing it to an image registry, and consuming it. Let's take a look at what is happening in the manifest file. Initially, we start with five replicas. Each Pod container has requests with cpu set to 200m KCU and memory set to 50Mi. We also define an environment variable, TOTAL_HAMSTER_USAGE, with an initial value of "1.0", mostly for readability. This variable defines the total collective work that the hamsters are expected to do.

  5. Now, let's take a look at the continuation of the file, at the part with the shell script for the container (the indentation has been removed and, in the YAML file, you need to correctly indent the script, as in the GitHub repository):

    # Install curl and jq

    apt-get update && apt-get install -y curl jq || exit 1

    SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount

    TOKEN=$(cat ${SERVICEACCOUNT}/token)

    while true

    do

      # Calculate CPU usage by hamster. This will dynamically adjust to be 1.0 / num_replicas. So for initial 5 replicas, it will be 0.2

      HAMSTER_USAGE=$(curl -s --cacert $SERVICEACCOUNT/ca.crt --header "Authorization: Bearer $TOKEN" -X GET https://kubernetes/apis/apps/v1/namespaces/default/deployments/elastic-hamster | jq ${TOTAL_HAMSTER_USAGE}/'.spec.replicas')

      # Hamster sleeps for the rest of the time, with a small adjustment factor

      HAMSTER_SLEEP=$(jq -n 1.2-$HAMSTER_USAGE)

      echo "Hamster uses $HAMSTER_USAGE and sleeps $HAMSTER_SLEEP"

      timeout ${HAMSTER_USAGE}s yes >/dev/null

      sleep ${HAMSTER_SLEEP}s

    done

    The shell script, as the very first step, installs curl and jq packages from the APT repository. We define SERVICEACCOUNT and TOKEN variables, which we need to query the Kubernetes API. Then, we retrieve the elastic-hamster Deployment from the API using https://kubernetes/apis/apps/v1/namespaces/default/deployments/elastic-hamster. The result is parsed using the jq command, we extract the .spec.replicas field, and use it to divide the total work between all hamsters. Based on this number, we make the hamster work for a calculated period of time and then sleep for the rest. As you can see, if the number of replicas for the Deployment changes, either by means of a manual action or autoscaling, the amount of work to be done by an individual hamster will change. And therefore, the CPU usage will decrease the more Pod replicas we have.

  6. We are now ready to apply all manifest files in the directory with the following command:

    $ kubectl apply -f ./

    role.rbac.authorization.k8s.io/deployment-reader created

    deployment.apps/elastic-hamster created

    serviceaccount/elastic-hamster created

    rolebinding.rbac.authorization.k8s.io/read-deployments created

  7. When the Pods are fully started, you will be able to see in the logs that the hamster work and sleep cycle has begun:

    $ kubectl logs elastic-hamster-5897858459-26bdd

    ...

    Running hooks in /etc/ca-certificates/update.d...

    done.

    Hamster uses 0.2 and sleeps 1

    Hamster uses 0.2 and sleeps 1

    ...

  8. After a while, you will see in the output of the kubectl top command that the CPU usage is about the expected 200m KCU. Of course, this method is not precise because there is more CPU usage by the container than just the work and sleep cycle:

    $ kubectl top pods

    NAME                          CPU(cores)   MEMORY(bytes)

    elastic-hamster-5897858459-26bdd   229m         40Mi

    elastic-hamster-5897858459-f2856   210m         40Mi

    elastic-hamster-5897858459-lmphl   236m         40Mi

    elastic-hamster-5897858459-m6j58   225m         40Mi

    elastic-hamster-5897858459-qfh76   227m         41Mi

  9. We can test how it reacts to a change in the number of replicas. Scale the Deployment down imperatively to two replicas using the kubectl scale command:

    $ kubectl scale deploy elastic-hamster --replicas=2

    deployment.apps/elastic-hamster scaled

  10. You can inspect the Pod logs again and, after a while, when metrics are processed, you will see the CPU usage change in the kubectl top command output, which is, as expected, around 500m KCU per Pod:

    $ kubectl top pods

    NAME                           CPU(cores)   MEMORY(bytes)

    elastic-hamster-5897858459-m6j58   462m         40Mi

    elastic-hamster-5897858459-qfh76   474m         40Mi

With the Deployment ready, we can start using the HPA to automatically adjust the number of replicas, which will target 75% of average CPU utilization across individual Pods. To do that, perform the following steps:

  1. Create an elastic-hamster-hpa.yaml YAML manifest file for the HPA:

    apiVersion: autoscaling/v1

    kind: HorizontalPodAutoscaler

    metadata:

      name: elastic-hamster-hpa

    spec:

      minReplicas: 1

      maxReplicas: 10

      targetCPUUtilizationPercentage: 75

      scaleTargetRef:

        apiVersion: apps/v1

        kind: Deployment

        name: elastic-hamster

    The HPA targets the elastic-hamster Deployment, which we specified using .spec.scaleTargetRef. The configuration ensures that the HPA will always keep the number of replicas between minReplicas: 1 and maxReplicas: 10. The most important parameter in an HPA targeting the CPU metric is targetCPUUtilizationPercentage, which we have set to 75%. This means that the HPA will target 75% of the container requests value for cpu, which we set to 200m KCU. As a result, the HPA will try to keep per-Pod CPU consumption at around 150m KCU. Our current Deployment, with only two replicas, is consuming much more – around 500m KCU per Pod on average.

  2. Apply the manifest file to the cluster:

    $ kubectl apply -f ./elastic-hamster-hpa.yaml

    horizontalpodautoscaler.autoscaling/elastic-hamster-hpa created

  3. After a while, the HPA will start adjusting the number of replicas to match the target CPU usage. Describe the HPA using the kubectl command to see the details:

    $ kubectl describe hpa elastic-hamster-hpa

    ...

    Metrics:                                               ( current / target )

      resource cpu on pods  (as a percentage of request):  79% (159m) / 75%

    ...

    Events:

      Type    Reason       Age   From                 Message

      ----    ------       ----  ----                 -------

      Normal  SuccessfulRescale  15m   horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target

      Normal  SuccessfulRescale  14m   horizontal-pod-autoscaler  New size: 6; reason: cpu resource utilization (percentage of request) above target

      Normal  SuccessfulRescale  13m   horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) above target

      Normal  SuccessfulRescale  11m   horizontal-pod-autoscaler  New size: 9; reason: cpu resource utilization (percentage of request) above target

    In the output, you can see that the Deployment was gradually scaled up over time until it eventually stabilized at 9 replicas. Note that the numbers may vary slightly for you. If you hit the maximum number of allowed replicas (10), you may try increasing maxReplicas or adjusting the targetCPUUtilizationPercentage parameter.
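    If you want to observe these scaling decisions as they happen, you can also watch the HPA object directly (an optional convenience command, not required for the steps above):

    $ kubectl get hpa elastic-hamster-hpa --watch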

    Tip

    It is possible to use an imperative command to achieve a similar result: kubectl autoscale deploy elastic-hamster --cpu-percent=75 --min=1 --max=10.

Congratulations! You have successfully configured horizontal autoscaling for your Deployment using an HPA. In the next section, we will take a look at autoscaling Kubernetes Nodes using a CA, which gives even more flexibility when combined with an HPA.

Autoscaling Kubernetes Nodes using a Cluster Autoscaler

So far, we have discussed scaling at the level of individual Pods, but this is not the only way in which you can scale your workloads on Kubernetes. It is possible to scale the cluster itself to accommodate changes in demand for compute resources – at some point, we will need more Nodes to run more Pods. This is solved by the CA, which is part of the Kubernetes autoscaler repository (https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler). The CA must be able to provision and deprovision Nodes for the Kubernetes cluster, so this means that vendor-specific plugins must be implemented. You can find the list of supported cloud service providers here: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#deployment.

The CA periodically checks the status of Pods and Nodes and decides whether it needs to take action:

  • If there are Pods that cannot be scheduled and are in the Pending state because of insufficient resources in the cluster, the CA will add more Nodes, up to the predefined maximum size.
  • If Nodes are under-utilized and all Pods could be scheduled even with a smaller number of Nodes in the cluster, the CA will remove Nodes from the cluster, unless it has reached the predefined minimum size. Nodes are gracefully drained before they are removed from the cluster.
  • For some cloud service providers, the CA can also choose between different SKUs for VMs to better optimize the cost of operating the cluster.

    Important note

    Pod containers must specify requests for compute resources for the CA to work properly. Additionally, these values should reflect real usage; otherwise, the CA will not be able to make correct decisions for your type of workload.

As you can see, the CA can complement HPA capabilities. If the HPA decides that there should be more Pods for a Deployment or StatefulSet, but no more Pods can be scheduled, then the CA can intervene and increase the cluster size.

Enabling the CA entails different steps depending on your cloud service provider. Additionally, some configuration values are specific for each of them. We will first take a look at GKE.

Enabling the cluster autoscaler in GKE

For GKE, it is easiest to create a cluster with CA enabled from scratch. To do that, you need to run the following command to create a cluster named k8sforbeginners:

$ gcloud container clusters create k8sforbeginners --num-nodes=2 --zone=us-central1-a --enable-autoscaling --min-nodes=2 --max-nodes=10

You can control the minimum number of Nodes in autoscaling by using the --min-nodes parameter, and the maximum number of Nodes by using the --max-nodes parameter.

In the case of an existing cluster, you need to enable the CA on an existing Node pool. For example, if you have a cluster named k8sforbeginners with one Node pool named nodepool1, then you need to run the following command:

$ gcloud container clusters update k8sforbeginners --enable-autoscaling --min-nodes=2 --max-nodes=10 --zone=us-central1-a --node-pool=nodepool1

The update will take a few minutes.

You can learn more in the official documentation: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler.

Once configured, you can move on to Using the cluster autoscaler.

Enabling the cluster autoscaler in the Amazon Elastic Kubernetes Service

Setting up the CA in Amazon EKS cannot currently be achieved with a one-click or one-command action. You need to create an appropriate IAM policy and role, deploy the CA resources to the Kubernetes cluster, and perform manual configuration steps. For this reason, we will not cover this in the book and instead ask that you refer to the official instructions: https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html.

Once configured, you can move on to Using the cluster autoscaler.

Enabling the cluster autoscaler in the Azure Kubernetes Service

AKS provides a similar CA setup experience to GKE – you can use a one-command procedure to either deploy a new cluster with CA enabled or update the existing one to use the CA. To create a new cluster named k8sforbeginners-aks from scratch in the k8sforbeginners-rg resource group, execute the following command:

$ az aks create --resource-group k8sforbeginners-rg --name k8sforbeginners-aks --node-count 2 --enable-cluster-autoscaler --min-count 2 --max-count 10

You can control the minimum number of Nodes in autoscaling by using the --min-count parameter, and the maximum number of Nodes by using the --max-count parameter.

To enable the CA on an existing AKS cluster named k8sforbeginners-aks, execute the following command:

$ az aks update --resource-group k8sforbeginners-rg --name k8sforbeginners-aks --enable-cluster-autoscaler --min-count 2 --max-count 10

The update will take a few minutes.

You can learn more in the official documentation: https://docs.microsoft.com/en-us/azure/aks/cluster-autoscaler. Additionally, the CA in AKS has more parameters that you can configure using the autoscaler profile. Further details are provided in the official documentation at https://docs.microsoft.com/en-us/azure/aks/cluster-autoscaler#using-the-autoscaler-profile.
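For example, tuning the scan interval and the scale-down delay might look similar to the following command (a sketch only – the exact profile keys supported depend on your AKS and Azure CLI versions, so check the documentation linked above first):

$ az aks update --resource-group k8sforbeginners-rg --name k8sforbeginners-aks --cluster-autoscaler-profile scan-interval=30s scale-down-delay-after-add=15m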

Now, let's take a look at how you can use the CA.

Using the cluster autoscaler

We have just configured the CA for the cluster, and it may now take a bit of time until the CA performs its first actions. This depends on the CA configuration, which may be vendor-specific. For example, in the case of AKS, the cluster is evaluated every 10 seconds (scan-interval) to determine whether it needs to be scaled up or down. If scaling down needs to happen after scaling up, there is a 10-minute delay (scale-down-delay-after-add). Scaling down of a Node is triggered if the sum of requested resources divided by its capacity is below 0.5 (scale-down-utilization-threshold).

As a result, the cluster may automatically scale up, scale down, or remain unchanged after the CA was enabled. If you are using exactly the same cluster setup as we did in the examples, you will have the following situation:

  • There are three Nodes, each with a capacity of 2000m KCU, which means that the total KCU in the cluster is 6000m.
  • elastic-hamster Deployment is currently automatically scaled by the HPA to 9 replicas, each consuming 200m KCU, which gives us the total 1800m KCU requested.
  • There is a bit of KCU consumed by the kube-system namespace Pods.
  • Roughly, the current usage should be around 40%-50% of KCU. You can check the exact number using the kubectl top nodes command.

This means that the cluster with the current workload will either scale down by one Node or remain unchanged.

Instead, we can make some modifications to our elastic-hamster Deployment to trigger a firmer decision from the CA. We will increase the total amount of work requested from the elastic-hamster Deployment and also increase the CPU requests of its Pods. Additionally, we will allow more replicas to be created by the HPA. This will quickly exceed the cluster capacity of 6000m KCU and cause the CA to scale the cluster up. To run the demonstration, please perform the following steps:

  1. In elastic-hamster-deployment.yaml, introduce the following changes. Set the number of replicas to 7 and TOTAL_HAMSTER_USAGE to "7.0" (the total usage should be at least as large as the number of replicas, so that each Pod tries to consume a full CPU core). Set the requests for cpu to 500m:

    apiVersion: apps/v1

    kind: Deployment

    metadata:

      name: elastic-hamster

    spec:

    ...

      replicas: 7

      template:

    ...

        spec:

          serviceAccountName: elastic-hamster

          containers:

          - name: hamster

            image: ubuntu:20.04

            resources:

              requests:

                cpu: 500m

                memory: 50Mi

            env:

            - name: TOTAL_HAMSTER_USAGE

              value: "7.0"

    ...

  2. In the elastic-hamster-hpa.yaml file, change the number of maxReplicas to 25:

    apiVersion: autoscaling/v1

    kind: HorizontalPodAutoscaler

    metadata:

      name: elastic-hamster-hpa

    spec:

      minReplicas: 1

      maxReplicas: 25

    ...

  3. Apply all YAML manifests in the directory to the cluster again:

    $ kubectl apply -f ./

    role.rbac.authorization.k8s.io/deployment-reader unchanged

    deployment.apps/elastic-hamster configured

    horizontalpodautoscaler.autoscaling/elastic-hamster-hpa configured

    serviceaccount/elastic-hamster unchanged

    rolebinding.rbac.authorization.k8s.io/read-deployments unchanged

  4. If you check the status of the Pods in the cluster shortly afterward, you will see that some of them are Pending because of insufficient resources:

    $ kubectl get pods

    NAME                               READY   STATUS        RESTARTS   AGE

    ...

    elastic-hamster-5854d5f967-cjsmg   0/1     Pending       0          23s

    elastic-hamster-5854d5f967-nsnqd   0/1     Pending       0          23s

    ...

  5. Check the status of the Nodes in the cluster. The CA should already have started provisioning new Nodes by then. In our case, Node 3 has been provisioned successfully, and Node 4 is still being provisioned:

    $ kubectl get node

    NAME                                STATUS     ROLES    AGE     VERSION

    aks-nodepool1-77120516-vmss000000  Ready   agent   22d    v1.18.14

    aks-nodepool1-77120516-vmss000001   Ready   agent   22d   v1.18.14

    aks-nodepool1-77120516-vmss000002   Ready    agent  29h   v1.18.14

    aks-nodepool1-77120516-vmss000003   Ready   agent  2m47s  v1.18.14

    aks-nodepool1-77120516-vmss000004  NotReady   <none>  5s  v1.18.14

  6. If you inspect some of the Pods that were in the Pending state, you will see that their events contain information about the CA trigger to create a new Node:

    $ kubectl describe pod elastic-hamster-5854d5f967-grjbj

    ...

    Events:

      Type     Reason            Age    From                Message

      ----     ------            ----   ----                -------

      Warning  FailedScheduling  5m28s  default-scheduler   0/7 nodes are available: 7 Insufficient cpu.

      Warning  FailedScheduling  3m6s   default-scheduler   0/8 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 7 Insufficient cpu.

      Normal   Scheduled         2m55s  default-scheduler   Successfully assigned default/elastic-hamster-5854d5f967-grjbj to aks-nodepool1-77120516-vmss000007

      Normal   TriggeredScaleUp  4m55s  cluster-autoscaler  pod triggered scale-up: [{aks-nodepool1-77120516-vmss 7->8 (max: 10)}]

  7. Eventually, scaling up using the HPA will finish, all Pods will become ready, and the CA will not need to add more Nodes. In our example, we ended up with 16 Pod replicas running on 8 Nodes in total, with the average CPU usage stabilizing at 82%:

    $ kubectl describe hpa elastic-hamster-hpa

    ...

    Metrics:                                               ( current / target )

      resource cpu on pods  (as a percentage of request):  82% (410m) / 75%

    Min replicas:                                          1

    Max replicas:                                          25

    Deployment pods:                                       16 current / 16 desired

  8. Node CPU usage is not distributed evenly though – the reason for this is that scaling to new Nodes does not trigger any rescheduling of Pods:

    $ kubectl top nodes

    NAME                           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%

    aks-nodepool1-77120516-vmss000000   981m    51%    2212Mi      48%

    aks-nodepool1-77120516-vmss000001   1297m    68%    2121Mi     46%

    aks-nodepool1-77120516-vmss000002   486m     25%    883Mi      19%

    aks-nodepool1-77120516-vmss000003   475m     25%    933Mi      20%

    aks-nodepool1-77120516-vmss000004   507m     26%    945Mi      20%

    aks-nodepool1-77120516-vmss000005   902m     47%    987Mi      21%

    aks-nodepool1-77120516-vmss000006   1304m    68%    1028Mi     22%

    aks-nodepool1-77120516-vmss000007   1263m     66%    1018Mi    22%

This shows how the CA has worked together with the HPA to seamlessly scale the Deployment and cluster at the same time to accommodate the workload. We will now show what automatic scaling down looks like. Perform the following steps:

  1. To decrease the load in the cluster, we can simply change the value of the TOTAL_HAMSTER_USAGE environment variable, for example, to "1.0". This will cause a rapid decrease in the load on Pods – if we currently have 16 replicas, the CPU utilization will be roughly 63m KCU per Pod, which gives 13% average CPU usage per Pod. This will cause the HPA to scale down after the stabilization window time has passed, which is, by default, 5 minutes. Introduce the changes to the elastic-hamster-deployment.yaml manifest file:

    apiVersion: apps/v1

    kind: Deployment

    metadata:

      name: elastic-hamster

    spec:

    ...

      template:

    ...

        spec:

    ...

          containers:

          - name: hamster

    ...

            env:

            - name: TOTAL_HAMSTER_USAGE

              value: "7.0"

    ...

  2. Apply the manifest file to the cluster:

    $ kubectl apply -f ./elastic-hamster-deployment.yaml

    deployment.apps/elastic-hamster configured

  3. Now, you have to wait patiently for a bit. First, the HPA must get past the stabilization window, which can take around 5 minutes, and after the Deployment is scaled down to around 3 replicas, you will still have to wait around 10 minutes for the cluster to scale down following the recent scale-up. It's time for a good cup of coffee!
  4. The HPA has eventually scaled down the Deployment to three Pods and stabilized the CPU usage at 66%:

    $ kubectl describe hpa elastic-hamster-hpa

    ...

    Metrics:                                               ( current / target )

      resource cpu on pods  (as a percentage of request):  66% (331m) / 75%

    Min replicas:                                          1

    Max replicas:                                          25

    Deployment pods:                                       3 current / 3 desired

  5. At some point, you will notice that the Nodes are being deprovisioned:

    $ kubectl get nodes

    NAME                                STATUS     ROLES   AGE   VERSION

    aks-nodepool1-77120516-vmss000000   Ready      agent   22d   v1.18.14

    aks-nodepool1-77120516-vmss000001   Ready   agent   22d   v1.18.14

    aks-nodepool1-77120516-vmss000003   NotReady  agent  56m  v1.18.14

    aks-nodepool1-77120516-vmss000004   Ready   agent   53m   v1.18.14

    aks-nodepool1-77120516-vmss000005   NotReady  agent  51m  v1.18.14

    aks-nodepool1-77120516-vmss000006   NotReady  agent  47m  v1.18.14

    aks-nodepool1-77120516-vmss000007   NotReady  agent  42m  v1.18.14

  6. And finally, you will end up with a cluster with only two Nodes, which is the minimum number that we preconfigured:

    $ kubectl get nodes

    NAME                       STATUS   ROLES   AGE   VERSION

    aks-nodepool1-77120516-vmss000000   Ready    agent   22d   v1.18.14

    aks-nodepool1-77120516-vmss000001   Ready   agent  22d   v1.18.14

This shows how efficiently the CA can react to a decrease in the load in the cluster when the HPA has scaled down the Deployment. Earlier, without any intervention, the cluster scaled to eight Nodes for a short period of time, and then scaled down to just two Nodes. Imagine the cost difference between having an eight-Node cluster running all the time and using the CA to cleverly autoscale on demand!

Tip

To avoid being charged for unwanted cloud resources, remember to clean up the cluster or at least disable cluster autoscaling so that you are not left running more Nodes than you need.

This demonstration concludes our chapter about autoscaling in Kubernetes. Let's summarize what we have learned in this chapter.

Summary

In this chapter, you learned about autoscaling techniques in Kubernetes clusters. We first explained the basics of Pod resource requests and limits and why they are crucial for the autoscaling and scheduling of Pods. Next, we introduced the VPA, which can automatically change requests and limits for Pods based on current and past metrics. After that, you learned about the HPA, which can be used to automatically change the number of Deployment or StatefulSet replicas. The changes are made based on CPU, memory, or custom metrics. Lastly, we explained the role of the CA in cloud environments. We also demonstrated how you can efficiently combine the HPA with the CA to scale your workload together with the cluster itself.

There is much more that can be configured in the VPA, HPA, and CA, so we have just scratched the surface of powerful autoscaling in Kubernetes!

In the last chapter, we will explain how you can use Ingress in Kubernetes for advanced traffic routing.

