Kubernetes Health Checks Using Probes

Using health checks gives your Kubernetes services a solid foundation, better reliability and higher uptime.

Feb 16th, 2022 6:29am by Roshan Shetty

Roshan is a site reliability engineer at Squadcast. He is an open source enthusiast and mostly focuses on building tools to solve enterprise reliability problems. He likes to contribute to open source projects such as slo-tracker (github.com/roshan8/slo-tracker) in his free time.

Kubernetes is an open source container orchestration platform that significantly simplifies an application’s creation and management. Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts and all of them must work for the system to function. If even a small part breaks, the failure needs to be detected, routed around and fixed, and these actions need to be automated. Kubernetes allows us to do that with the help of readiness and liveness probes. In this post, we will discuss these probes in detail. But before that, let’s first discuss health checks.

What Is a Health Check?

Health checks are a simple way to let the system know whether an instance of your app is working. If the instance of your app is not working, the other services should not access it or send requests to it. Instead, requests should be sent to another instance that is ready or you should retry sending requests. The system should be able to bring your app to a healthy state. By default, Kubernetes will start sending traffic to the pod when all the containers inside the pod have started. Kubernetes will restart containers when they crash. This default behavior should be enough to get started. Making deployments more robust becomes relatively straightforward as Kubernetes helps create custom health checks. But before we do that, let’s discuss the pod life cycle.

Pod Life Cycle

A Kubernetes pod follows a defined life cycle. These are the different phases:

When the pod is first created, it starts in the Pending phase. The scheduler tries to figure out where to place the pod. If the scheduler can’t find a suitable node, the pod remains Pending. (To check why a pod is pending, run the kubectl describe pod <pod name> command.)
Once the pod is scheduled, the images required for the application are pulled and the containers are created. During this step, kubectl shows the status ContainerCreating, although the pod phase is still Pending.
Once the containers have been created and at least one is running, the pod moves to the Running phase, where it stays until its containers complete successfully or are terminated.

To check the status of the pod, run the kubectl get pod command and check the STATUS column. As you can see, in this case all the pods are in the Running state. The READY column shows how many of each pod’s containers are ready (1/1 here), which means the pods can accept user traffic.

# kubectl get pod
NAME                        READY   STATUS    RESTARTS   AGE
my-nginx-6b74b79f57-fldq6   1/1     Running   0          20s
my-nginx-6b74b79f57-n67wp   1/1     Running   0          20s
my-nginx-6b74b79f57-r6pcq   1/1     Running   0          20s

Different Types of Probes in Kubernetes

Kubernetes gives you the following types of health checks:

Readiness probes: This probe tells Kubernetes when your app is ready to serve traffic. Kubernetes will ensure the readiness probe passes before allowing a service to send traffic to the pod. If the readiness probe fails, Kubernetes will not send traffic to the pod until it passes.
Liveness probes: Liveness probes let Kubernetes know whether your app is healthy. If your app is healthy, Kubernetes will not interfere with the pod, but if it is unhealthy, Kubernetes will kill the container and restart it according to the pod’s restart policy.

To understand this further, let’s use a real-world scenario as an example. You have an application that needs some time to warm up or download the application content from some external source like GitHub. Your application shouldn’t receive traffic until it’s fully ready. By default, Kubernetes will start sending traffic as soon as the process inside the container starts. Using the readiness probe, Kubernetes will wait until the app has fully started before it allows the service to send traffic to the new copy.

Let’s take another scenario where your application hits a bug in the code (maybe an edge case), hangs indefinitely and stops serving requests. Because the process continues to run, Kubernetes will by default keep sending traffic to the broken pod. Using the liveness probe, Kubernetes will detect that the app is no longer serving requests and restart the malfunctioning container.

With the theory part done, let us see how to define the probes. This article covers three types of probes:

HTTP
TCP
Command

Note: You have an option to start by defining either the readiness or liveness probes, as the implementation for both requires a similar template. For example, if we first define livenessProbe, we can use it to define readinessProbe or vice-versa.

HTTP probes (httpGet): This is the most common probe type. Even if your app isn’t an HTTP server, you can usually create a lightweight HTTP server inside your app to respond to the liveness probe. Kubernetes will send an HTTP GET request to a path (for example, /healthz) at a given port (8080 in this example). If it gets a response with a status code of at least 200 and below 400, the container will be marked as healthy. Otherwise, it will be marked as unhealthy. Here is how you can define an HTTP livenessProbe:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080

An HTTP readiness probe is defined just like the HTTP livenessProbe; you just have to replace liveness with readiness.

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
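
As a rough sketch of how the two snippets above fit into a full manifest, the pod below declares both probes on one container. It is an illustration rather than a recommended configuration: the pod name probes-http-example is made up, and the stock nginx image is assumed, which answers on / at port 80, so those values stand in for /healthz and 8080.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probes-http-example      # hypothetical name, chosen for this sketch
spec:
  containers:
    - name: web
      image: nginx               # assumption: stock nginx, serving / on port 80
      ports:
        - name: http
          containerPort: 80
      # Gates traffic: the pod receives requests only while this passes.
      readinessProbe:
        httpGet:
          path: /
          port: http             # refers to the named container port above
        periodSeconds: 5
      # Gates restarts: the container is restarted if this keeps failing.
      livenessProbe:
        httpGet:
          path: /
          port: http
        initialDelaySeconds: 10
        periodSeconds: 10
```

Both probes may point at the same endpoint, as they do here. An application with a slow warm-up might instead expose a separate path for readiness, which is a design choice this sketch does not cover.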

TCP probes (tcpSocket): With TCP probes, Kubernetes will try to establish a TCP connection on the specified port (for example, port 8080 in the below example). If it can establish a connection, the container is considered healthy. If it can’t, it’s considered a failure. These probes will be handy where HTTP or command probes don’t work well. For example, the FTP service will be able to use this type of probe.

readinessProbe:
  tcpSocket:
    port: 8080
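
For a service that does not speak HTTP, a minimal sketch might look like the following. It assumes the public redis image, which listens on TCP port 6379 by default; the pod name probes-tcp-example is hypothetical and the timing values are only illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probes-tcp-example       # hypothetical name, chosen for this sketch
spec:
  containers:
    - name: cache
      image: redis               # assumption: listens on TCP 6379 by default
      ports:
        - containerPort: 6379
      # Ready as soon as the port accepts connections.
      readinessProbe:
        tcpSocket:
          port: 6379
        initialDelaySeconds: 5
        periodSeconds: 10
      # Restarted if the port stops accepting connections.
      livenessProbe:
        tcpSocket:
          port: 6379
        initialDelaySeconds: 15
        periodSeconds: 20
```

Keep in mind that a TCP probe only proves the port is open. It says nothing about whether the service behind it answers requests correctly.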

Command probes (exec): In the case of command probes, Kubernetes will run a command inside your container. If the command returns exit code zero, the container will be marked as healthy. Otherwise, it will be marked as unhealthy. This type of probe is useful when you can’t or don’t want to run an HTTP server, but you can run a command that will check whether your app is healthy. In the example below, the probe runs cat /tmp/healthy, which exits with code zero only if the file /tmp/healthy exists and can be read.

livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
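
One detail worth knowing: the exec command is executed directly, not through a shell, so pipes, && and other shell syntax only work if you invoke a shell yourself. The sketch below wraps the check in /bin/sh -c and uses test -f, so only the file's existence is checked. It assumes the busybox image, and the pod name probes-exec-example is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probes-exec-example      # hypothetical name, chosen for this sketch
spec:
  containers:
    - name: app
      image: busybox
      args:
        - /bin/sh
        - -c
        # Create the marker file, then keep the container alive.
        - touch /tmp/healthy; sleep 3600
      livenessProbe:
        exec:
          command:
            # Run through a shell so shell syntax could be used if needed.
            - /bin/sh
            - -c
            - test -f /tmp/healthy
        periodSeconds: 5
```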

Probes can be configured in many ways based on how often they need to run, the success and failure thresholds, and how long to wait for responses.

initialDelaySeconds (default value 0): If you know your application needs n seconds (for example, 30 seconds) to warm up, you can add delay in seconds until the first check is executed by using initialDelaySeconds.
periodSeconds (default value 10): If you want to specify how often you execute a check, you can define that using periodSeconds.
timeoutSeconds (default value 1): This defines the maximum number of seconds until the probe operation is timed out.
successThreshold (default value 1): This is the minimum number of consecutive successes required for the probe to be considered successful after it has failed. It must be 1 for liveness probes.
failureThreshold (default value 3): This is the number of consecutive probe failures Kubernetes tolerates before it treats the check as failed and acts on it.
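
Put together, a probe that sets all five fields explicitly could look like the fragment below, which belongs under a container entry in a pod spec. The values are illustrative assumptions (a 30-second warm-up and the /healthz endpoint on port 8080 from the earlier snippet), not recommendations.

```yaml
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # wait 30s after the container starts before the first check
  periodSeconds: 10         # then check every 10s (same as the default)
  timeoutSeconds: 1         # a check with no response within 1s counts as a failure
  successThreshold: 1       # one success is enough to be considered ready again
  failureThreshold: 3       # three consecutive failures mark the pod as not ready
```

With these values, a pod that stops responding is taken out of rotation after roughly failureThreshold x periodSeconds, or about 30 seconds, which is the trade-off to weigh when tuning them.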

Note: With the default failureThreshold, Kubernetes acts on a probe after three consecutive failures. In the case of a liveness probe, it restarts the container. In the case of a readiness probe, it marks the pod as not ready and stops sending it traffic, while continuing to run the probe. For more information about probe configuration, refer to the Kubernetes documentation.

Let’s combine everything we have discussed so far. The key thing to note here is the use of readinessProbe with httpGet. The first check will be executed after 10 seconds, and then it will be repeated every 5 seconds.

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5

To create a pod, use the kubectl create command and specify the YAML manifest file with the -f flag. You can give the file any name, but by convention it ends with a .yaml extension.

kubectl create -f readinessprobe.yaml
pod/nginx created

If you check the pod’s status now, it should show the status as Running under the STATUS column. But if you check the READY column, it will still show 0/1, which means it’s not ready to accept a new connection.

kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Running   0          16s

Check the status again after a few seconds, as we set an initial delay of 10 seconds. By now, the pod should be ready.

kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          28s

To check the detailed status of all the parameters (for example, initialDelaySeconds, periodSeconds, etc.) used when defining readiness probe, run the kubectl describe command.

kubectl describe pod nginx | grep -i readiness
Readiness: http-get http://:80/ delay=10s timeout=1s period=5s #success=1 #failure=3

Let’s further reinforce the concept of liveness and readiness probes with the help of an example. First, let’s start with a liveness probe. In the example below, the container runs the command ‘touch healthy; sleep 20; rm -rf healthy; sleep 600’. This creates a file named “healthy” using the touch command. The file exists in the container for the first 20 seconds, then it is removed by the rm -rf command. Lastly, the container sleeps for 600 seconds.

Then we define the liveness probe. It checks whether the file exists using the cat healthy command, starting after an initial delay of five seconds. The periodSeconds parameter makes Kubernetes repeat the liveness probe every five seconds. Once the file is deleted after 20 seconds, the probe starts failing, and after three consecutive failures (the default failureThreshold) Kubernetes restarts the container.

apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-exec
spec:
  containers:
  - name: liveness-probe
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch healthy; sleep 20; rm -rf healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - healthy
      initialDelaySeconds: 5
      periodSeconds: 5

To create a pod, store the above code in a file that ends with .yaml (for example, liveness-probe.yaml) and execute the kubectl create command with -f <file name>, which will create the pod.

# kubectl create -f liveness-probe.yaml
pod/liveness-probe-exec created

Run the kubectl get events command, and you will see that the liveness probe has failed, and the container has been killed and restarted.

54s   Normal    Scheduled   pod/liveness-probe-exec   Successfully assigned default/liveness-probe-exec to controlplane
53s   Normal    Pulling     pod/liveness-probe-exec   Pulling image "busybox"
52s   Normal    Pulled      pod/liveness-probe-exec   Successfully pulled image "busybox" in 384.330188ms
52s   Normal    Created     pod/liveness-probe-exec   Created container liveness-probe
52s   Normal    Started     pod/liveness-probe-exec   Started container liveness-probe
18s   Warning   Unhealthy   pod/liveness-probe-exec   Liveness probe failed: cat: can't open 'healthy': No such file or directory
18s   Normal    Killing     pod/liveness-probe-exec   Container liveness-probe failed liveness probe, will be restarted

You can also verify it by using the kubectl get pods command. As you can see in the RESTARTS column, the container has been restarted once.

# kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
liveness-probe-exec   1/1     Running   1          24s

Now that you understand how the liveness probe works, let’s see how the readiness probe works by tweaking the above example to define it as a readiness probe. In the example below, we execute a command inside the container (sleep 20; touch healthy; sleep 600), which first sleeps for 20 seconds, creates a file and finally sleeps for 600 seconds. As the initial delay is set to 15 seconds, the first check is executed 15 seconds after the container starts. Because the file does not exist until the 20-second mark, this first check fails, and the pod is reported as ready only once a later check finds the file.

apiVersion: v1
kind: Pod
metadata:
  name: readiness-probe-exec
spec:
  containers:
  - name: readiness-probe
    image: busybox
    args:
    - /bin/sh
    - -c
    - sleep 20; touch healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - healthy
      initialDelaySeconds: 15
      periodSeconds: 5

To create a pod, store the above code in a file that ends with .yaml, and execute the kubectl create command, which will create the pod.

# kubectl create -f readiness-probe.yaml
pod/readiness-probe-exec created

If you execute kubectl get events here, you will see that the probe failed as the file is not present.

63s   Normal    Scheduled   pod/readiness-probe-exec   Successfully assigned default/readiness-probe-exec to controlplane
62s   Normal    Pulling     pod/readiness-probe-exec   Pulling image "busybox"
62s   Normal    Pulled      pod/readiness-probe-exec   Successfully pulled image "busybox" in 156.57701ms
61s   Normal    Created     pod/readiness-probe-exec   Created container readiness-probe
61s   Normal    Started     pod/readiness-probe-exec   Started container readiness-probe
42s   Warning   Unhealthy   pod/readiness-probe-exec   Readiness probe failed: cat: can't open 'healthy': No such file or directory

If you check the status of the container initially, it is not in a ready state.

# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
readiness-probe-exec   0/1     Running   0          5s

But if you check it again once the file has been created and a probe has succeeded (a little over 20 seconds after the container starts), the READY column should show 1/1.

# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
readiness-probe-exec   1/1     Running   0          27s
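
The two walkthroughs above can also be combined in a single pod. The following is a minimal sketch under stated assumptions: the pod name probes-combined-exec and the marker files /tmp/alive and /tmp/ready are hypothetical, and separate files are used so that the liveness probe does not fail while the container is still warming up.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probes-combined-exec     # hypothetical name, chosen for this sketch
spec:
  containers:
    - name: app
      image: busybox
      args:
        - /bin/sh
        - -c
        # Alive immediately, ready only after a simulated 20s warm-up.
        - touch /tmp/alive; sleep 20; touch /tmp/ready; sleep 600
      # Fails until /tmp/ready exists, so READY stays 0/1 during warm-up.
      readinessProbe:
        exec:
          command:
            - cat
            - /tmp/ready
        initialDelaySeconds: 15
        periodSeconds: 5
      # Passes from the start, so the container is not restarted.
      livenessProbe:
        exec:
          command:
            - cat
            - /tmp/alive
        initialDelaySeconds: 5
        periodSeconds: 5
```

If both probes checked the same file here, the liveness probe would likely restart the container before the warm-up finished, which is why slow-starting apps usually need distinct readiness and liveness conditions or a longer liveness initial delay.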

Conclusion

Health checks are required for any distributed system, and Kubernetes is no exception. Using health checks gives your Kubernetes services a solid foundation, better reliability and higher uptime.

Plug: Use K8s With Squadcast for Faster Resolution

Squadcast is an incident management tool that’s purpose-built for site reliability engineering. It allows you to get rid of unwanted alerts, receive relevant notifications and integrate with popular ChatOps tools. You also can work in collaboration using virtual incident war rooms and use automation to eliminate toil.