When your application is user-facing, ensuring continuous availability and minimal downtime is a challenge. Hence, monitoring the health of the application is essential to avoid any outages.
Cattle provided the ability to add HTTP or TCP healthchecks for the deployed services in Rancher 1.6. Healthcheck support is provided by Rancher’s own healthcheck microservice. You can read more about it here.
In brief, a Cattle user can add a TCP healthcheck to a service. Rancher’s healthcheck containers, which are launched on a different host, will test if a TCP connection opens at the specified port for the service containers. Note that with the latest release (v1.6.20), healthcheck containers are also scheduled on the same host as the service containers, along with other hosts.
HTTP healthchecks can also be added while deploying services. You can ask Rancher to make an HTTP request at a specified path and specify what response is expected.
These healthchecks run periodically at a configurable interval, and the retries and timeouts are also configurable. You can also tell Rancher whether and when to recreate a container that fails its healthcheck.
Consider a service running an Nginx image on Cattle, with an HTTP healthcheck configured as below.
The healthcheck parameters appear in the rancher-compose.yml file and not the docker-compose.yml because healthcheck functionality is implemented by Rancher.
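As a rough sketch of what such a rancher-compose.yml could look like for the Nginx service, consider the fragment below. The service name and all the values are assumptions chosen for illustration, not the exact configuration used in this post; intervals and timeouts in 1.6 are given in milliseconds.

```yaml
# rancher-compose.yml (illustrative sketch; name and values are assumed)
version: '2'
services:
  nginx:
    scale: 1
    start_on_create: true
    health_check:
      port: 80                                    # port the check connects to
      request_line: GET "/index.html" "HTTP/1.0"  # presence of this line makes it an HTTP check
      interval: 2000                              # time between checks, in ms
      response_timeout: 2000                      # how long to wait for a response, in ms
      healthy_threshold: 2                        # successes before the container is healthy
      unhealthy_threshold: 3                      # failures before the container is unhealthy
      initializing_timeout: 60000                 # grace period at startup, in ms
      strategy: recreate                          # what to do with an unhealthy container
```

Leaving out request_line would turn this into a plain TCP check against the same port.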
Let’s see if we can configure corresponding healthchecks in Rancher 2.0.
In 2.0, Rancher uses the native Kubernetes healthcheck mechanisms: livenessProbe and readinessProbe.
As documented here, probes are diagnostics performed periodically by the Kubelet on a container. In Rancher 2.0, healthchecks are done by the Kubelet running locally, as compared to the cross-host healthchecks in Rancher 1.6.
A livenessProbe is an action performed on a container to check if the container is running. If the probe reports failure, Kubernetes kills the container and restarts it according to the restartPolicy defined in the pod spec.
A readinessProbe is used to check if a container is ready to accept and serve requests. When a readinessProbe fails, the pod is removed from the endpoints of the services that select it, so that no requests are routed to the container.
If your workload is busy doing some startup routine before it can serve requests, it is a good idea to configure a readinessProbe for the workload.
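The following types of livenessProbe and readinessProbe can be configured for Kubernetes workloads: tcpSocket (open a TCP connection to a port), httpGet (make an HTTP GET request to a path), and exec (run a command inside the container).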
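To make the probe handlers concrete, here is a minimal sketch of how each one is written in a container spec. The top-level keys (tcpExample, httpExample, execExample) are only labels invented for this illustration and are not Kubernetes fields; the ports, path, and command are assumed values for an Nginx container.

```yaml
# Illustrative fragments only; each probe uses exactly one handler.
tcpExample:
  readinessProbe:
    tcpSocket:
      port: 80                 # succeeds if a TCP connection can be opened
httpExample:
  readinessProbe:
    httpGet:
      path: /index.html        # succeeds on a 2xx or 3xx response
      port: 80
execExample:
  livenessProbe:
    exec:
      command:                 # succeeds if the command exits with status 0
        - cat
        - /usr/share/nginx/html/index.html
```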
More configuration details for the above probes can be found here.
Via the Rancher UI, users can add TCP or HTTP healthchecks to Kubernetes workloads. By default, Rancher asks you to configure a readinessProbe for the workload and applies a livenessProbe using the same configuration. You can choose to define a separate livenessProbe.
If the healthchecks fail, the container is restarted per the restartPolicy defined in the workload specs. This is equivalent to the strategy parameter in rancher-compose.yml files for 1.6 services using healthchecks in Cattle.
While deploying a workload in Rancher 2.0, users can configure TCP healthchecks to check if a TCP connection can be opened at a specific port.
Here are the Kubernetes YAML specs showing the TCP readinessProbe configured for the Nginx workload as shown above. Rancher also adds a livenessProbe to your workload using the same config.
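A minimal sketch of such a spec is shown below, assuming a Deployment named nginx with a single container listening on port 80. The names, labels, and timing values are illustrative assumptions rather than the exact output Rancher generates. Note that Kubernetes requires successThreshold to be 1 for a livenessProbe, so that field may differ from the readinessProbe even when the rest of the config is shared.

```yaml
# Illustrative Deployment with TCP probes (names and values are assumed)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      restartPolicy: Always        # failed containers are restarted
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          readinessProbe:
            tcpSocket:
              port: 80             # ready once the port accepts connections
            initialDelaySeconds: 10
            periodSeconds: 2
            timeoutSeconds: 2
            successThreshold: 2
            failureThreshold: 3
          livenessProbe:
            tcpSocket:
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 2
            timeoutSeconds: 2
            successThreshold: 1    # must be 1 for liveness probes
            failureThreshold: 3
```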
Healthcheck parameters from 1.6 to 2.0:
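One way to read the correspondence is to annotate a Kubernetes probe with the 1.6 parameter that plays a similar role. The pairing below is an approximate reading, not an official conversion table, and the values are placeholders; only the strategy to restartPolicy pairing is stated directly in this post. Keep in mind that 1.6 expresses times in milliseconds while Kubernetes uses seconds.

```yaml
# Kubernetes probe field         # closest Rancher 1.6 health_check parameter
readinessProbe:
  tcpSocket:
    port: 80                     # port
  initialDelaySeconds: 10        # initializing_timeout (ms in 1.6), roughly
  periodSeconds: 2               # interval (ms in 1.6)
  timeoutSeconds: 2              # response_timeout (ms in 1.6)
  successThreshold: 2            # healthy_threshold
  failureThreshold: 3            # unhealthy_threshold
# restartPolicy on the pod spec  # strategy
```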
You can also specify an HTTP healthcheck and provide a path in the pod container at which the Kubelet will make HTTP/HTTPS GET requests. Note that Kubernetes supports only GET requests here, whereas healthchecks in Rancher 1.6 could use any HTTP method.
Here are the Kubernetes YAML specs showing the HTTP readinessProbe and livenessProbe configured for the Nginx workload as shown above.
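A sketch of the relevant container section might look like the fragment below. It assumes the Nginx container serves /index.html on port 80 over plain HTTP; the timing values are placeholders rather than the ones used in this post.

```yaml
# Container section of the pod template (illustrative values)
containers:
  - name: nginx
    image: nginx
    readinessProbe:
      httpGet:
        path: /index.html      # path the Kubelet requests with GET
        port: 80
        scheme: HTTP           # use HTTPS to probe over TLS
      initialDelaySeconds: 10
      periodSeconds: 2
      timeoutSeconds: 2
      successThreshold: 2
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 2
      timeoutSeconds: 2
      successThreshold: 1      # must be 1 for liveness probes
      failureThreshold: 3
```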
Now let’s see what happens when a healthcheck fails and how the workload recovers in Kubernetes.
Consider the above HTTP healthcheck on our Nginx workload doing an HTTP GET on the /index.html path.
To make the healthcheck fail, I opened a shell in the pod container using the Execute Shell UI option in Rancher.
Once inside the container, I moved the file that the healthcheck does a GET on.
The readinessProbe and livenessProbe checks failed, and the workload status changed to unavailable.
Kubernetes soon killed and recreated the pod, and the workload came back up since the restartPolicy was set to Always.
Using kubectl, you can see these healthcheck events, for example with kubectl get events or kubectl describe pod.
As a quick tip, the Rancher 2.0 UI provides the helpful option to Launch Kubectl from the Kubernetes Cluster view, where you can run native Kubernetes commands on the cluster objects.
Rancher 1.6 provided healthchecks via its own microservice, which is why the healthcheck parameters that a Cattle user added to the services appear in the rancher-compose.yml file and not in the docker-compose.yml config file. The Kompose tool we used earlier in this blog series works on standard docker-compose.yml parameters and therefore cannot parse the Rancher healthcheck constructs. So as of now, we cannot use this tool to convert the Rancher healthchecks from compose config to Kubernetes YAML.
As seen in this blog post, the configuration parameters available to add TCP or HTTP healthchecks in Rancher 2.0 are very similar to those in Rancher 1.6. The healthcheck config used by Cattle services can be transitioned completely to 2.0 without loss of any functionality.
In the upcoming article, I plan to explore how to map scheduling options that Cattle supports to Kubernetes in Rancher 2.0. Stay tuned!