
8. Application Self-Healing

Philippe Martin, Gif-sur-Yvette, France

When you start a Pod on a cluster, it is scheduled on a specific node of the cluster. If that node, at some point, is no longer able to host the Pod, the Pod will not be restarted on another node: by itself, the application is not self-healing.

Let’s try this on a cluster with more than one worker node (e.g., the cluster installed in Chapter 1).

First, run a Pod; then examine on which node it has been scheduled:
$ kubectl run nginx --image=nginx
pod/nginx created
$ kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP           NODE
nginx   1/1     Running   0          12s   10.244.1.8   worker-0

Here, the Pod has been scheduled on the node worker-0.

Let’s put this node in maintenance mode, to see what happens to the Pod:
$ kubectl drain worker-0 --force
node/worker-0 cordoned
WARNING: deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: default/nginx
evicting pod "nginx"
pod/nginx evicted
node/worker-0 evicted
$ kubectl get pods
No resources found in default namespace.

You can see that the Pod you created has disappeared and has not been recreated on another node. This concludes the experiment; you can make the node schedulable again:
$ kubectl uncordon worker-0
node/worker-0 uncordoned

Controller to the Rescue

We have seen in Chapter 5, section “Pod Controllers,” that using a Pod controller ensures your Pod is rescheduled on another node if the node it runs on stops working.

Let’s repeat the experiment, this time with a Deployment:
$ kubectl create deployment nginx --image=nginx
deployment.apps/nginx created
$ kubectl get pods -o wide
NAME                     READY   STATUS   RESTARTS   AGE   IP          NODE
nginx-554b9c67f9-ndtsz   1/1     Running  0          11s   10.244.1.9 worker-0
$ kubectl drain worker-0
node/worker-0 cordoned
evicting pod "nginx-554b9c67f9-ndtsz"
pod/nginx-554b9c67f9-ndtsz evicted
node/worker-0 evicted
$ kubectl get pods -o wide
NAME                     READY   STATUS   RESTARTS   AGE      IP            NODE
nginx-554b9c67f9-5kz5v   1/1     Running  0          4s      10.244.2.9    worker-1

This time, we can see that a Pod has been recreated on another node of the cluster; our app now survives a node eviction.

Liveness Probes

It is possible to define a liveness probe for each container of a Pod. If the kubelet fails to execute the probe successfully a given number of times in a row, the container is considered unhealthy and is restarted in place, within the same Pod.

This probe should be used to detect that the container is not responsive.

There are three possibilities for the liveness probe:
  • Make an HTTP request.

    If your container is an HTTP server, you can add an endpoint that always replies with a success response and define the probe with this endpoint. If your backend is not healthy anymore, it is probable that this endpoint will not respond either.

  • Execute a command.

    Most server applications have an associated CLI tool. You can use this CLI to execute a very simple operation on the server. If the server is not healthy, it is probable it will not respond to this simple request either.

  • Make a TCP connection.

    When a server running in a container communicates via a non-HTTP protocol (on top of TCP), you can try to open a socket to the application. If the server is not healthy, it is probable that it will not respond to this connection request.

You have to use the declarative form (a manifest file) to define liveness probes; the following sections show an example of each kind.

A Note About Readiness Probes

Note that it is also possible to define a readiness probe for a container. The main role of the readiness probe is to indicate if a Pod is ready to serve network requests. The Pod will be added to the list of backends of matching Services when the readiness probe succeeds.

Later, during the container execution, if a readiness probe fails, the Pod is removed from the list of backends of these Services. This can be useful to detect that a container cannot handle more connections (e.g., because it is already handling a lot of them) and to stop sending it new ones.
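
As with liveness probes, readiness probes are declared in the Pod manifest and support the same mechanisms (httpGet, exec, tcpSocket); only the consequence of a failure differs. Here is a minimal sketch, assuming an HTTP server that exposes a /ready endpoint (the endpoint path is a hypothetical example):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    readinessProbe:
      httpGet:
        scheme: HTTP
        port: 80
        path: /ready   # hypothetical endpoint replying 200 only when ready to serve

While this probe fails, the Pod stays Running but is not listed as a backend of the Services selecting it; the container is not restarted.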

HTTP Request Liveness Probe

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    livenessProbe:
      httpGet:
        scheme: HTTP
        port: 80
        path: /healthz

Here, we define a probe that queries the /healthz endpoint. As nginx is not configured by default to reply to this path, it will reply with a 404 response code, and the probe will fail. This is not a realistic case, but it simulates an nginx server that replies with an error to a simple request.

You can see in the Pod events that after three failed probes, the container is restarted:
$ kubectl describe pod nginx
[...]
Events:
  Type     Reason     Age                From              Message
  ----     ------     ----               ----              -------
  Normal   Started    31s                kubelet, minikube Started container nginx
  Normal   Pulling    0s (x2 over 33s)   kubelet, minikube Pulling image "nginx"
  Warning  Unhealthy  0s (x3 over 20s)   kubelet, minikube Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    0s                 kubelet, minikube Container nginx failed liveness probe, will be restarted
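
The three failures come from the probe’s failureThreshold, which defaults to 3, with one probe every periodSeconds (10 seconds by default). As a sketch, the livenessProbe of the previous manifest could be tuned like this (the values are arbitrary examples):
    livenessProbe:
      httpGet:
        scheme: HTTP
        port: 80
        path: /healthz
      initialDelaySeconds: 5   # wait 5 seconds before the first probe
      periodSeconds: 5         # probe every 5 seconds
      timeoutSeconds: 2        # count the probe as failed after 2 seconds without a reply
      failureThreshold: 3      # restart the container after 3 consecutive failures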

Command Liveness Probe

apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - image: postgres
    name: postgres
    livenessProbe:
      initialDelaySeconds: 10
      exec:
        command:
        - "psql"
        - "-h"
        - "localhost"
        - "-U"
        - "unknownUser"
        - "-c"
        - "select 1"

Here, the liveness probe tries to connect to the server using the psql command and execute a very simple SQL query (SELECT 1) as user unknownUser. As this user does not exist, the query will fail.

You can see in the Pod events that after three failed probes, the container is restarted:
$ kubectl describe pod postgres
[...]
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>                default-scheduler  Successfully assigned default/postgres to minikube
  Warning  Unhealthy  0s (x3 over 20s)         kubelet, minikube  Liveness probe failed: psql: error: could not connect to server: FATAL: role "unknownUser" does not exist
  Normal   Killing    0s                       kubelet, minikube  Container postgres failed liveness probe, will be restarted
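
In a real manifest, you would run a command expected to succeed while the server is healthy. Here is a minimal sketch using the pg_isready tool shipped with the official postgres image (the user name is an assumption; adapt it to your setup):
    livenessProbe:
      initialDelaySeconds: 10
      exec:
        command:
        - "pg_isready"
        - "-h"
        - "localhost"
        - "-U"
        - "postgres"

pg_isready exits with a zero status when the server accepts connections, so the probe succeeds as long as postgres is responsive.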

TCP Connection Liveness Probe

apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - image: postgres
    name: postgres
    livenessProbe:
      initialDelaySeconds: 10
      tcpSocket:
        port: 5433

Here, the liveness probe tries to connect to the container on port 5433. As postgres listens on port 5432, the connection will fail.

You can see in the Pod events that after three failed probes, the container is restarted:
$ kubectl describe pod postgres
[...]
Events:
  Type     Reason     Age               From                Message
  ----     ------     ----              ----                -------
  Normal   Started    25s               kubelet, minikube   Started container postgres
  Warning  Unhealthy  0s (x3 over 15s)  kubelet, minikube   Liveness probe failed: dial tcp 172.17.0.3:5433: connect: connection refused
  Normal   Killing    0s                kubelet, minikube   Container postgres failed  liveness probe, will be restarted

Resource Limits and Quality of Service (QoS) Classes

You can define resource (CPU and memory) requests and limits for each container of a Pod.

The resource requests values are used to schedule the Pod on a node that has at least the requested resources available (see Chapter 9, section “Resource Requests”).

If you do not declare limits, each container still has access to all the resources of the node; in this case, if some Pods are not using all their requested resources at a given time, other containers are able to use them, and vice versa.

In contrast, if a limit is declared for a container, the container is constrained to these resources: if it tries to allocate more memory than its limit, it gets a memory allocation error and will probably crash or run in a degraded mode, and it has access only to the CPU within its limit.
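
Requests and limits are declared per container, under the resources field of the manifest. Here is a minimal sketch (the values are arbitrary examples):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    resources:
      requests:
        cpu: 100m        # used by the scheduler to select a suitable node
        memory: 128Mi
      limits:
        cpu: 100m        # CPU usage is throttled above this value
        memory: 128Mi    # memory allocations beyond this value fail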

Depending on whether the requests and limits values are declared or not, a different Quality of Service (QoS) class is assigned to the Pod (you can check the assigned class as shown after this list):
  • If all containers of a Pod declare requests and limits for all resources (CPU and memory) and the limits equal the requests, the Pod runs with the Guaranteed QoS class.

  • Otherwise, if at least one container of the Pod has a CPU or memory request or limit, the Pod runs with the Burstable QoS class.

  • Otherwise, if no request or limit is declared for any of its containers, the Pod runs with the BestEffort QoS class.
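
Because the sketch above declares limits equal to requests for both CPU and memory, that Pod would run with the Guaranteed QoS class; keeping only the requests would make it Burstable, and declaring nothing would make it BestEffort. Assuming the Pod has been created under the name nginx, you can read the class assigned by Kubernetes:
$ kubectl get pod nginx -o jsonpath='{.status.qosClass}'
Guaranteed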

If a node runs out of an incompressible resource (memory), its kubelet can decide to evict one or more Pods to prevent total starvation of the resource.

The Pods to evict are chosen depending on their Quality of Service class: first BestEffort ones, then Burstable ones, and finally Guaranteed ones.
