Kubernetes is an open source container orchestration platform that significantly simplifies an application’s creation and management.
Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts and all of them must work for the system to function. If even a small part breaks, the failure needs to be detected, routed around and fixed.
These actions also need to be automated. Kubernetes allows us to do that with the help of readiness and liveness probes. In this blog, we will discuss these probes in detail. But before that, let’s first discuss health checks.
What Is a Health Check?
Health checks are a simple way to let the system know whether an instance of your app is working. If the instance of your app is not working, the other services should not access it or send requests to it. Instead, requests should be sent to another instance that is ready or you should retry sending requests.
The system should be able to bring your app to a healthy state. By default, Kubernetes will start sending traffic to the pod when all the containers inside the pod have started. Kubernetes will restart containers when they crash. This default behavior should be enough to get started. Making deployments more robust becomes relatively straightforward as Kubernetes helps create custom health checks. But before we do that, let’s discuss the pod life cycle.
Pod Life Cycle
A Kubernetes pod follows a defined life cycle. These are the different phases:
When the pod is first created, it starts in the pending phase. The scheduler tries to figure out where to place the pod. If the scheduler can’t find a node to place the pod on, the pod will remain pending. (To check why the pod is in the pending state, run the kubectl describe pod <pod name> command.)
Once the pod is scheduled, it goes to the container creating stage, where the images required for the application are pulled and the containers start.
Once the containers in the pod have started, the pod moves to the running phase, where it stays until the program completes successfully or is terminated.
To check the status of the pod, run the kubectl get pod command and check the STATUS column. As you can see, in this case all the pods are in running state. Also, the READY column states the pod is ready to accept user traffic.
# kubectl get pod
NAME                        READY   STATUS    RESTARTS   AGE
my-nginx-6b74b79f57-fldq6   1/1     Running   0          20s
my-nginx-6b74b79f57-n67wp   1/1     Running   0          20s
my-nginx-6b74b79f57-r6pcq   1/1     Running   0          20s
Different Types of Probes in Kubernetes
Kubernetes gives you the following types of health checks:
Readiness probes: This probe tells Kubernetes when your app is ready to serve traffic. Kubernetes will ensure the readiness probe passes before allowing a service to send traffic to the pod. If the readiness probe fails, Kubernetes will not send traffic to the pod until it passes.
Liveness probes: Liveness probes let Kubernetes know whether your app is healthy. If your app is healthy, Kubernetes will not interfere with pod functioning, but if it is unhealthy, Kubernetes will kill the container and restart it.
To understand this further, let’s use a real-world scenario as an example. You have an application that needs some time to warm up or download the application content from some external source like GitHub. Your application shouldn’t receive traffic until it’s fully ready. By default, Kubernetes will start sending traffic as soon as the process inside the container starts. Using the readiness probe, Kubernetes will wait until the app has fully started before it allows the service to send traffic to the new copy.
Let’s take another scenario where your application hits a bug in code (maybe an edge case), hangs indefinitely and stops serving requests. Because your process continues to run, by default Kubernetes will keep sending traffic to the broken pod. Using a liveness probe, Kubernetes will detect that the app is no longer serving requests and restart the malfunctioning container.
With the theory part done, let us see how to define the probes. We will look at three ways a probe can check your app:
HTTP
TCP
Command
Note: You have an option to start by defining either the readiness or liveness probes, as the implementation for both requires a similar template. For example, if we first define livenessProbe, we can use it to define readinessProbe or vice-versa.
HTTP probes (httpGet): This is the most common probe type. Even if your app isn’t an HTTP server, you can usually create a lightweight HTTP server inside your app to respond to the liveness probe. Kubernetes will send an HTTP GET request to a path (for example, /healthz) at a given port (8080 in this example). If it gets an HTTP response in the 200 or 300 range, the container will be marked as healthy. Otherwise, it will be marked as unhealthy. (For more information regarding HTTP response codes, refer to this link.) Here is how you can define an HTTP livenessProbe:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
An HTTP readiness probe is defined just like the HTTP livenessProbe; you just have to replace liveness with readiness.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
TCP probes (tcpSocket): With TCP probes, Kubernetes will try to establish a TCP connection on the specified port (for example, port 8080 in the below example). If it can establish a connection, the container is considered healthy. If it can’t, it’s considered a failure. These probes will be handy where HTTP or command probes don’t work well. For example, the FTP service will be able to use this type of probe.
readinessProbe:
  tcpSocket:
    port: 8080
Command probes (exec): With command probes, Kubernetes will run a command inside your container. If the command returns exit code zero, the container will be marked as healthy. Otherwise, it will be marked as unhealthy. This type of probe is useful when you can’t or don’t want to run an HTTP server, but you can run a command that will check whether your app is healthy. In the example below, cat succeeds only if the file /tmp/healthy exists, so the container is marked as healthy for as long as that file is present.
livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy
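A single container can carry both kinds of probe at once, and each can use a different mechanism. The fragment below is a minimal sketch of that idea, not a manifest from this article: the container name and image are hypothetical placeholders, and the path and port simply reuse the values from the examples above.

```yaml
containers:
  - name: my-app            # hypothetical container name
    image: my-app:1.0       # hypothetical image
    # Readiness: only route traffic once the HTTP endpoint answers.
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
    # Liveness: restart the container if the port stops accepting connections.
    livenessProbe:
      tcpSocket:
        port: 8080
```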
Probes can be configured in many ways based on how often they need to run, the success and failure thresholds, and how long to wait for responses.
initialDelaySeconds (default value 0): If you know your application needs n seconds (for example, 30 seconds) to warm up, you can add delay in seconds until the first check is executed by using initialDelaySeconds.
periodSeconds (default value 10): If you want to specify how often you execute a check, you can define that using periodSeconds.
timeoutSeconds (default value 1): This defines the maximum number of seconds until the probe operation is timed out.
successThreshold (default value 1): This is the minimum number of consecutive successes required for the probe to be considered successful after it has failed.
failureThreshold (default value 3): This is the number of consecutive failures Kubernetes tolerates before it marks the probe as failed.
Note: By default, Kubernetes gives up on a probe after three consecutive failed attempts. In the case of a liveness probe, giving up means restarting the container. In the case of a readiness probe, it means marking the pod as not ready.
For more information about probe configuration, refer to this link.
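As an illustration of the settings listed above, here is a sketch of a liveness probe with all five parameters written out. The 30-second initial delay is the warm-up example mentioned earlier; the other values are just the defaults made explicit, and the path and port are the ones assumed in the HTTP probe example.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # wait 30 seconds before the first check
  periodSeconds: 10         # run the check every 10 seconds
  timeoutSeconds: 1         # a check with no response within 1 second counts as failed
  successThreshold: 1       # one success marks the probe as passing (must be 1 for liveness probes)
  failureThreshold: 3       # three consecutive failures trigger a container restart
```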
Let’s combine everything we have discussed so far. The key thing to note here is the use of readinessProbe with httpGet. The first check will be executed after 10 seconds, and then it will be repeated after every 5 seconds.
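A manifest matching that description might look like the sketch below. The pod name nginx is taken from the command output that follows; the container name, image, probe path and port are assumptions chosen to fit a stock nginx container, not values confirmed by this article.

```yaml
# readinessprobe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx              # assumed container name
      image: nginx             # assumed image
      ports:
        - containerPort: 80    # assumed: nginx listens on port 80 by default
      readinessProbe:
        httpGet:
          path: /              # assumed: the default nginx welcome page
          port: 80
        initialDelaySeconds: 10   # first check after 10 seconds
        periodSeconds: 5          # then repeat every 5 seconds
```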
To create a pod, use the kubectl create command and specify the YAML manifest file with -f flag. You can give any name to the file, but it should end with a .yaml extension.
kubectl create -f readinessprobe.yaml
pod/nginx created
If you check the pod’s status now, it should show the status as Running under the STATUS column. But if you check the READY column, it will still show 0/1, which means it’s not ready to accept a new connection.
kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Running   0          16s
Verify the status again after a few seconds, as we set an initial delay of 10 seconds. By now, the pod should be ready.
kubectl get pod
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          28s
To check the detailed status of all the parameters (for example, initialDelaySeconds, periodSeconds, etc.) used when defining readiness probe, run the kubectl describe command.
Let’s further reinforce the concept of liveness and readiness probe with the help of an example. First, let’s start with a liveness probe. In the below example, we are executing a command, ‘touch healthy; sleep 20; rm -rf healthy; sleep 600’.
With this command, we create a file named “healthy” using the touch command. This file will exist in the container for the first 20 seconds, then it will be removed by the rm -rf command. Lastly, the container will sleep for 600 seconds.
Then we define the liveness probe. It checks whether the file exists by running the cat healthy command, starting after an initial delay of five seconds. We also set periodSeconds so that the liveness probe runs every five seconds. Once the file is deleted after 20 seconds, the probe starts failing.
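Putting that description together, the manifest could be sketched as follows. The pod name, container name and image come from the command output and events shown below, and the command and probe timings come from the text; the use of /bin/sh -c to run the command and the reliance on the default failureThreshold are assumptions.

```yaml
# liveness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-exec
spec:
  containers:
    - name: liveness-probe
      image: busybox
      args:                    # assumed: run the command through a shell
        - /bin/sh
        - -c
        - touch healthy; sleep 20; rm -rf healthy; sleep 600
      livenessProbe:
        exec:
          command:
            - cat
            - healthy          # fails once the file has been removed
        initialDelaySeconds: 5 # first check after five seconds
        periodSeconds: 5       # then check every five seconds
```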
To create a pod, store the above code in a file that ends with .yaml (for example, liveness-probe.yaml) and execute the kubectl create command with -f <file name>, which will create the pod.
# kubectl create -f liveness-probe.yaml
pod/liveness-probe-exec created
Run the kubectl get events command, and you will see that the liveness probe has failed, and the container has been killed and restarted.
54s Normal Scheduled pod/liveness-probe-exec Successfully assigned default/liveness-probe-exec to controlplane
53s Normal Pulling pod/liveness-probe-exec Pulling image "busybox"
52s Normal Pulled pod/liveness-probe-exec Successfully pulled image "busybox" in 384.330188ms
52s Normal Created pod/liveness-probe-exec Created container liveness-probe
52s Normal Started pod/liveness-probe-exec Started container liveness-probe
18s Warning Unhealthy pod/liveness-probe-exec Liveness probe failed: cat: can't open 'healthy': No such file or directory
18s Normal Killing pod/liveness-probe-exec Container liveness-probe failed liveness probe, will be restarted
You can also verify it by using the kubectl get pods command, and as you can see in the restart column, the container is restarted once.
# kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
liveness-probe-exec   1/1     Running   1          24s
Now that you understand how the liveness probe works, let’s see how the readiness probe works by tweaking the above example to define it as a readiness probe. In the example below, we execute a command inside the container (sleep 20; touch healthy; sleep 600), which first sleeps for 20 seconds, then creates a file and finally sleeps for 600 seconds. As the initial delay is set to 15 seconds, the first check runs 15 seconds after the container starts, before the file exists, so it fails.
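A manifest along these lines might look like the sketch below. The pod name, container name and image are taken from the command output and events that follow, and the command and initial delay come from the text; running the command through /bin/sh -c and leaving periodSeconds at its default of 10 are assumptions.

```yaml
# readiness-probe.yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-probe-exec
spec:
  containers:
    - name: readiness-probe
      image: busybox
      args:                     # assumed: run the command through a shell
        - /bin/sh
        - -c
        - sleep 20; touch healthy; sleep 600
      readinessProbe:
        exec:
          command:
            - cat
            - healthy           # succeeds only after the file has been created
        initialDelaySeconds: 15 # first check after 15 seconds
```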
To create a pod, store the above code in a file that ends with .yaml, and execute the kubectl create command, which will create the pod.
# kubectl create -f readiness-probe.yaml
pod/readiness-probe-exec created
If you execute kubectl get events here, you will see that the probe failed as the file is not present.
63s Normal Scheduled pod/readiness-probe-exec Successfully assigned default/readiness-probe-exec to controlplane
62s Normal Pulling pod/readiness-probe-exec Pulling image "busybox"
62s Normal Pulled pod/readiness-probe-exec Successfully pulled image "busybox" in 156.57701ms
61s Normal Created pod/readiness-probe-exec Created container readiness-probe
61s Normal Started pod/readiness-probe-exec Started container readiness-probe
42s Warning Unhealthy pod/readiness-probe-exec Readiness probe failed: cat: can't open 'healthy': No such file or directory
If you check the status of the container initially, it is not in a ready state.
# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
readiness-probe-exec   0/1     Running   0          5s
But if you check it after 20 seconds, once the file has been created, it should be in the ready state.
# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
readiness-probe-exec   1/1     Running   0          27s
Conclusion
Health checks are required for any distributed system, and Kubernetes is no exception. Using health checks gives your Kubernetes services a solid foundation, better reliability and higher uptime.
Easiest On-Call with SRE – Squadcast is a cloud-based software that unites on-call alerting & incident management along with Site Reliability Engineering (SRE) workflows under one platform, empowering organizations to improve their incident resolution metrics and the reliability of their systems.
Plug: Use K8s With Squadcast for Faster Resolution
Squadcast is an incident management tool that’s purpose-built for site reliability engineering. It allows you to get rid of unwanted alerts, receive relevant notifications and integrate with popular ChatOps tools. You also can work in collaboration using virtual incident war rooms and use automation to eliminate toil.