Autoscaling pods

Azure Kubernetes Service (AKS) also supports autoscaling. The horizontal pod autoscaler then adjusts the number of pods depending on CPU utilization or other available metrics.

Kubernetes uses the Metrics Server for this. The Metrics Server collects resource metrics from the Summary API that the kubelet exposes on each node in the cluster.

The Metrics Server is available by default if you are using Kubernetes version 1.10 or above. If you are using an older version, you will have to install the Metrics Server manually.

The autoscale functionality also requires some configuration on the deployment side of Kubernetes. For a deployment, you need to specify the requests and limits for each running container. These values are specified per resource, for example, CPU.

In the following example, requests and limits are specified for the CPU resource. CPU is measured in CPU units, and fractions are allowed: 0.25 is a quarter of a unit, which can also be written as 250m (250 millicores). In Azure, one unit corresponds to one vCore. On other platforms, it maps to that platform's equivalent, such as one vCPU or one hyperthread:

resources:
  requests:
    cpu: 0.25
  limits:
    cpu: 0.5

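Add this section to the container specification in the deployment file. The autoscaler needs the CPU request because it expresses utilization as a percentage of the requested value, and with it in place, the pods can be autoscaled when large numbers of requests need to be served.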
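To make the CPU unit notation concrete, here is a minimal Python sketch that converts a CPU quantity to millicores. The function name cpu_to_millicores is hypothetical and is not part of Kubernetes or any client library; it only handles plain numbers and the "m" suffix.

```python
from decimal import Decimal


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as "0.25" or "250m" to millicores.

    Hypothetical helper for illustration only, not a Kubernetes API.
    """
    text = str(quantity).strip()
    if text.endswith("m"):
        # The "m" suffix already means millicores (thousandths of a CPU unit).
        return int(Decimal(text[:-1]))
    # A plain number counts whole CPU units; one unit is 1000 millicores.
    return int(Decimal(text) * 1000)


print(cpu_to_millicores("0.25"))  # 250
print(cpu_to_millicores("500m"))  # 500
```

With the values from the example above, the container requests 250 millicores and is limited to 500 millicores.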

Deploy the updated deployment file, and then create an autoscale rule within the Kubernetes cluster:

kubectl autoscale deployment [deployment name] --cpu-percent=60 --min=1 --max=10

This rule attaches a horizontal pod autoscaler to the deployment. If average CPU utilization across all pods exceeds 60% of their requested CPU, the autoscaler increases the number of pods, up to a maximum of 10 instances. The deployment never drops below the minimum of one instance.
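The scaling decision behind this rule can be approximated with the formula from the Kubernetes documentation: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the configured minimum and maximum. The following Python sketch is a simplification under stated assumptions: the function names are hypothetical, every pod is assumed to have the same CPU request, and the real controller additionally applies a tolerance band, pod readiness checks, and stabilization delays that are left out here.

```python
import math


def average_utilization_pct(usages_millicores, request_millicores):
    """Average CPU usage across pods as a percentage of the per-pod request."""
    total_requested = request_millicores * len(usages_millicores)
    return 100 * sum(usages_millicores) / total_requested


def desired_replicas(current_replicas, utilization_pct, target_pct=60,
                     min_replicas=1, max_replicas=10):
    """Simplified scaling rule: ceil(current * utilization / target), clamped."""
    raw = math.ceil(current_replicas * utilization_pct / target_pct)
    return max(min_replicas, min(max_replicas, raw))


# Three pods, each requesting 250 millicores, currently using 200, 250, and 225.
utilization = average_utilization_pct([200, 250, 225], 250)
print(utilization)                       # 90.0
print(desired_replicas(3, utilization))  # 5
print(desired_replicas(5, 300))          # 10 (capped at the maximum)
print(desired_replicas(3, 10))           # 1 (held at the minimum)
```

The defaults mirror the values passed to the command above (60% target, minimum of 1, maximum of 10).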

After creating the autoscaler, you can check it by running the following command:

kubectl get hpa

HPA stands for horizontal pod autoscaler.

Try creating a CPU-intensive operation within an application and watching pods being created automatically during execution. The autoscaler will notice the high CPU usage and will scale out the deployment automatically by creating additional pods.

Once the intensive operation is finished, Kubernetes will scale the number of pods back down to the minimum. This does not happen immediately: by default, the autoscaler waits for a stabilization period of about five minutes before scaling down.
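One simple way to generate such load is a busy loop. The Python sketch below is only an illustration: burn_cpu is a hypothetical helper, and how you trigger it inside your application (for example, from a request handler) is left to you. A single loop like this keeps at most one core busy.

```python
import time


def burn_cpu(seconds):
    """Spin in a tight loop for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        # No sleeping here: the loop deliberately keeps the CPU busy.
        iterations += 1
    return iterations


# Short demonstration run; use a much longer duration inside the pod.
print(burn_cpu(0.1) > 0)  # True
```

For the autoscaler to react, the load has to last long enough for the metrics to be collected, so a duration of a few minutes is a reasonable starting point. While it runs, kubectl get hpa shows the current utilization and replica count.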
