When you start a Pod on a cluster, it is scheduled on a specific node of the cluster. If, at some point, that node is no longer able to host the Pod, the Pod will not be restarted on another node: by itself, the application is not self-healing.
Let’s try this on a cluster with more than one worker (e.g., the cluster installed in Chapter 1).
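You can start a bare Pod and check which node it was scheduled on; a minimal sketch, assuming kubectl is configured for this cluster and the nginx image:

kubectl run nginx --image=nginx
kubectl get pods -o wide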
The NODE column shows that the Pod has been scheduled on the node worker-0.
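To simulate a node that can no longer host the Pod, you can drain it (a sketch; --force is required here because this Pod is not managed by a controller, and --ignore-daemonsets may be needed on a real cluster):

kubectl drain worker-0 --force --ignore-daemonsets
kubectl get pods -o wide

The Pod is deleted, and nothing recreates it on another node.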
Controller to the Rescue
We have seen in Chapter 5, section “Pod Controllers,” that using a Pod controller ensures your Pod is rescheduled on another node if the node hosting it stops working.
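A quick way to verify this, using a Deployment (a sketch; run kubectl uncordon worker-0 first if you drained the node in the previous step):

kubectl create deployment nginx --image=nginx
kubectl get pods -o wide
kubectl drain worker-0 --ignore-daemonsets   # assuming the Pod landed on worker-0
kubectl get pods -o wide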
This time, we can see that a Pod has been recreated on another node of the cluster: our app now survives a node eviction.
Liveness Probes
It is possible to define a liveness probe for each container of a Pod. If the kubelet fails to execute the probe successfully a given number of times in a row, the container is considered unhealthy and is restarted within the same Pod.
This probe should be used to detect that a container has become unresponsive.
Make an HTTP request.
If your container is an HTTP server, you can add an endpoint that always replies with a success response and define the probe with this endpoint. If your backend is no longer healthy, it is probable that this endpoint will not respond either.
Execute a command.
Most server applications come with an associated CLI application. You can use this CLI to execute a very simple operation on the server. If the server is not healthy, it is probable that it will not respond to this simple request either.
Make a TCP connection.
When a server running in a container communicates via a non-HTTP protocol (on top of TCP), you can try to open a socket to the application. If the server is not healthy, it is probable that it will not respond to this connection request.
You have to use the declarative form (a YAML manifest) to declare liveness probes.
A Note About Readiness Probes
Note that it is also possible to define a readiness probe for a container. The main role of the readiness probe is to indicate whether a Pod is ready to serve network requests. The Pod is added to the list of backends of the matching Services only when its readiness probe succeeds.
Later, during the container execution, if a readiness probe fails, the Pod is removed from the list of backends of these Services. This can be useful to detect that a container is not able to handle more connections (e.g., because it is already handling a lot of them) and to stop sending it new ones.
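A readiness probe is declared like a liveness probe, under the readinessProbe field of the container; a minimal sketch (the Pod name and values are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
spec:
  containers:
  - name: nginx
    image: nginx
    readinessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 5   # probe every 5 seconds; the Pod is Ready while it succeeds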
HTTP Request Liveness Probe
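Here is a sketch of such a probe (the probe fields are standard Pod spec fields; the Pod name and timing values are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: nginx
    image: nginx
    livenessProbe:
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 5   # wait before the first probe
      periodSeconds: 10        # then probe every 10 seconds
      failureThreshold: 3      # restart after 3 consecutive failures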
Here, we define a probe that queries the /healthz endpoint. As nginx is not configured by default to reply on this path, it will reply with a 404 response code, and the probe will fail. This is not a realistic case, but it simulates an nginx server that would reply in error to a simple request.
Command Liveness Probe
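A sketch of such a Pod, assuming the standard postgres image (which requires the POSTGRES_PASSWORD environment variable); the Pod name and timing values are arbitrary:

apiVersion: v1
kind: Pod
metadata:
  name: liveness-command
spec:
  containers:
  - name: postgres
    image: postgres
    env:
    - name: POSTGRES_PASSWORD   # required by the postgres image
      value: admin
    livenessProbe:
      exec:
        command:                # run inside the container; a non-zero exit fails the probe
        - psql
        - -U
        - unknownUser
        - -c
        - SELECT 1
      initialDelaySeconds: 10
      periodSeconds: 10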
Here, the liveness probe uses the psql command to connect to the server and execute a very simple SQL query (SELECT 1) as the user unknownUser. As this user does not exist, the query will fail.
TCP Connection Liveness Probe
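A sketch of such a Pod (same postgres image assumption as above; the Pod name and timing values are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
spec:
  containers:
  - name: postgres
    image: postgres
    env:
    - name: POSTGRES_PASSWORD
      value: admin
    livenessProbe:
      tcpSocket:
        port: 5433   # postgres actually listens on 5432
      initialDelaySeconds: 10
      periodSeconds: 10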
Here, the liveness probe tries to connect to the container on port 5433. As postgres listens on port 5432, the connection will fail.
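In all three cases, you can observe the probe failures and the resulting restarts (Pod names as in the sketches above):

kubectl describe pod liveness-tcp   # shows the probe failure events
kubectl get pods -w                 # shows the RESTARTS counter increasing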
Resource Limits and Quality of Service (QoS) Classes
For each container of a Pod, you can define resource (CPU and memory) requests and limits.
The resource requests values are used to schedule a Pod in a node having at least the requested resources available (see Chapter 9, section “Resource Requests”).
If you do not declare limits, each container still has access to all the resources of the node; in this case, if some Pods are not using all their requested resources at a given time, other containers will be able to use them, and vice versa.
In contrast, if a limit is declared for a container, the container will be constrained to those limits: if it tries to allocate more memory than its limit, the allocation will fail or the container will be killed, so it will probably crash or run in a degraded mode; and its CPU usage will be throttled so it cannot exceed its CPU limit.
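For example, a container declaring requests and limits for both resources could look like this (a sketch; the Pod name and values are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: with-resources
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: 100m        # 0.1 CPU core
        memory: 128Mi
      limits:            # equal to the requests here
        cpu: 100m
        memory: 128Mi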
If all the containers of a Pod declare requests and limits for all resources (CPU and memory), and the limits equal the requests, the Pod will be running with the Guaranteed QoS class.
Otherwise, if at least one container of the Pod declares a resource request or limit, the Pod will be running with the Burstable QoS class.
Finally, if no request or limit is declared for any of its containers, the Pod will be running with the BestEffort QoS class.
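You can read the class assigned to a Pod from its status (for the Pod sketched above, this returns Guaranteed, since its limits equal its requests):

kubectl get pod with-resources -o jsonpath='{.status.qosClass}'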
If a node runs out of an incompressible resource (memory), the associated kubelet can decide to evict one or more Pods, to prevent total starvation of the resource.
Which Pods are evicted depends on their Quality of Service class: BestEffort Pods first, then Burstable ones, and finally Guaranteed ones.