Health checks

Kubernetes provides three kinds of health checks. With HTTP or TCP checks, K8s attempts to connect to a particular endpoint or port and reports a status of healthy on a successful connection. Application-specific health checks can be performed with an exec check, which runs a command-line script or binary from within your container; anything that exits with a 0 status is considered healthy.
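
We'll use the httpGet form in the examples that follow. Purely as a sketch of the other two handlers (the port, file path, and timings here are placeholders, not part of this chapter's controller), TCP and exec probes look like this:

livenessProbe:
  # A TCP health check: healthy if the port accepts a connection
  tcpSocket:
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 1

livenessProbe:
  # An exec health check: healthy if the command exits with status 0
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 30
  timeoutSeconds: 1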

Let's take a look at a few health checks in action. First, we'll create a new controller definition named nodejs-health-controller.yaml with a health check:

apiVersion: v1
kind: ReplicationController
metadata:
  name: node-js
  labels:
    name: node-js
spec:
  replicas: 3
  selector:
    name: node-js
  template:
    metadata:
      labels:
        name: node-js
    spec:
      containers:
      - name: node-js
        image: jonbaier/node-express-info:latest
        ports:
        - containerPort: 80
        livenessProbe:
          # An HTTP health check
          httpGet:
            path: /status/
            port: 80
          initialDelaySeconds: 30
          timeoutSeconds: 1

Note the addition of the livenessProbe element. This is our core health check element. From here, we can specify httpGet, tcpSocket, or exec. In this example, we use httpGet to perform a simple check for a URI on our container. The probe will check the path and port specified and restart the container if the check doesn't return successfully.

Status codes between 200 and 399 are all considered healthy by the probe.

Finally, initialDelaySeconds gives us the flexibility to delay health checks until the pod has finished initializing, and timeoutSeconds is how long the probe waits for a response before the attempt is counted as a failure.
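
Probes accept a few other tuning fields that we don't use in this chapter. As an illustrative sketch only (the periodSeconds and failureThreshold values shown here match the Kubernetes defaults rather than settings from our controller):

livenessProbe:
  httpGet:
    path: /status/
    port: 80
  initialDelaySeconds: 30
  timeoutSeconds: 1
  # How often to run the probe, in seconds
  periodSeconds: 10
  # How many consecutive failures before the container is restarted
  failureThreshold: 3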

Let's use our new health check-enabled controller to replace the old node-js RC. We can do this using the replace command, which will replace the replication controller definition:

$ kubectl replace -f nodejs-health-controller.yaml
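
If you want to confirm the new definition is in place before touching any pods, you can dump the RC back out and look for the livenessProbe section (a quick sanity check, not a required step):

$ kubectl get rc node-js -o yaml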

Replacing the RC on its own won't replace our containers because it still has three healthy pods from our first run. Let's kill off those pods and let the updated ReplicationController replace them with containers that have health checks:

$ kubectl delete pods -l name=node-js
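
If you'd rather watch the replacements come up than simply wait, kubectl can stream pod changes for the label our template uses:

$ kubectl get pods -l name=node-js --watch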

Now, after waiting a minute or two, we can list the pods in our RC and grab one of the pod IDs to inspect it a bit deeper with the describe command:

$ kubectl describe rc/node-js

The following screenshot is the result of the preceding command:

Description of node-js replication controller
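
The pod names in your cluster will differ from the ones shown in these screenshots; listing the pods by label is the easiest way to pick one to describe:

$ kubectl get pods -l name=node-js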

Now, use the following command for one of the pods:

$ kubectl describe pods/node-js-7esbp

The following screenshot is the result of the preceding command:

Description of node-js-1m3cs pod

At the top, we'll see the overall pod details. Depending on your timing, under State, it will either show Running or Waiting with a CrashLoopBackOff reason and some error information. A bit below that, we can see information on our Liveness probe, and we will likely see a failure count above 0. Further down, we have the pod events. Again, depending on your timing, you are likely to have a number of events for the pod. Within a minute or two, you'll note a pattern of Killing, Started, and Created events repeating over and over again. You should also see a note in the Killing entry that the container is unhealthy. This is our health check failing because we don't have a page responding at /status.
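
If you'd like to see output from the instance of the container that was just killed, kubectl can pull logs from the previous run (substitute one of your own pod names for the example name used here):

$ kubectl logs node-js-7esbp --previous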

You may note that if you open a browser to the service load balancer address, it still responds with a page. You can find the load balancer IP with a kubectl get services command.
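
For example (substitute the EXTERNAL-IP value reported for the service created earlier in the chapter):

$ kubectl get services
$ curl http://<EXTERNAL-IP>/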

This is happening for a couple of reasons. First, the health check is simply failing because /status doesn't exist, but the page that the service points to is still functioning normally between restarts. Second, the livenessProbe is only charged with restarting the container when a health check fails. There is a separate readinessProbe that removes a pod from the pool of endpoints answering for a service.

Let's modify the health check to use a page that does exist in our container, so we have a proper health check. We'll also add a readiness check and point it at the nonexistent status page. Open the nodejs-health-controller.yaml file, modify the spec section to match the following listing, and save it as nodejs-health-controller-2.yaml:

apiVersion: v1
kind: ReplicationController
metadata:
  name: node-js
  labels:
    name: node-js
spec:
  replicas: 3
  selector:
    name: node-js
  template:
    metadata:
      labels:
        name: node-js
    spec:
      containers:
      - name: node-js
        image: jonbaier/node-express-info:latest
        ports:
        - containerPort: 80
        livenessProbe:
          # An HTTP health check
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          timeoutSeconds: 1
        readinessProbe:
          # An HTTP health check
          httpGet:
            path: /status/
            port: 80
          initialDelaySeconds: 30
          timeoutSeconds: 1

This time, we'll delete the old RC, which will kill the pods with it, and create a new RC with our updated YAML file:

$ kubectl delete rc -l name=node-js
$ kubectl create -f nodejs-health-controller-2.yaml
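
Once the new pods are up, a quick way to see the readiness probe failing without digging through the describe output is the READY column in the pod listing and the endpoints object for our service, which will show no addresses while the pods are unready:

$ kubectl get pods -l name=node-js
$ kubectl get endpoints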

Now, when we describe one of the pods, we only see the creation of the pod and the container. However, you'll note that the service load balancer IP no longer works. If we run the describe command on one of the new pods, we'll note a Readiness probe failed error message, but the pod itself continues running. If we change the readiness probe path to path: /, we'll again be able to fulfill requests from the main service. Open up nodejs-health-controller-2.yaml in an editor and make that update now. Then, once again, remove and recreate the replication controller:

$ kubectl delete rc -l name=node-js
$ kubectl create -f nodejs-health-controller-2.yaml

Now the load balancer IP should work once again. Keep these pods around as we will use them again in Chapter 3, Networking, Load Balancers, and Ingress.
