The CNCF recently released 9 Kubernetes Security Best Practices Everyone Must Follow, in which they outline nine basic actions that they recommend people take with their Kubernetes clusters.
Although their recommendations are a good start, the article leans heavily on GKE. For those of you who are committed to using Google’s services, GKE is a good solution. However, others want to run in Amazon, Azure, DigitalOcean, on their own infrastructure, or anywhere else they can think of, and having solutions that point to GKE doesn’t help them.
For these people, Rancher is a great open source solution.
Rancher Labs takes security seriously. Darren Shepherd, one of the company’s founders, discovered the bug that resulted in CVE-2018-1002105 in December 2018. Security isn’t an afterthought or something you remember to do after you deploy an insecure cluster. You don’t, for example, build a house, move all of your belongings into it, and then put locks on the door.
In this article, I’ll respond to each of the points raised by the CNCF and walk you through how Rancher and RKE satisfy these security recommendations by default.
Upgrade to the Latest Version
This is sound advice that doesn’t only apply to Kubernetes. Unpatched software is one of the most common entry points for attackers when they breach systems. When a CVE is released and proof-of-concept code is made publicly available, tool suites such as Metasploit quickly include the exploits in their standard kit. Anyone with the skill to copy and paste commands from the Internet can then find themselves in control of your systems.
When using Rancher Kubernetes Engine (RKE), either standalone or when installed by Rancher, you can choose the version of Kubernetes to install. Rancher Labs uses native upstream Kubernetes, and this enables the company to quickly respond to security alerts, releasing patched versions of the software. Because RKE runs the Kubernetes components within Docker containers, operations teams can perform zero-downtime upgrades of critical infrastructure.
I recommend that you follow Rancher Labs on Twitter to receive announcements about new releases. I also strongly recommend that you test new versions in a staging environment before upgrading, but in the event that an upgrade goes awry, Rancher makes it just as easy to roll back to a previous version.
Enable Role-Based Access Control (RBAC)
RKE installs with RBAC enabled by default. If you’re only using RKE, or any other standalone Kubernetes deployment, you’re responsible for configuring the accounts, roles, and bindings to secure your cluster.
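To make “accounts, roles, and bindings” concrete, here is a minimal sketch of what that configuration looks like on a standalone cluster. It builds a namespaced read-only Role and binds it to a single user, then prints the result as JSON, which kubectl accepts alongside YAML. The namespace `billing`, the role name `pod-reader`, and the user `jane` are placeholders I made up for illustration, not anything Rancher or RKE creates for you.

```python
import json


def read_only_role(namespace, role_name="pod-reader"):
    """Build a Role that grants read-only access to Pods in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": role_name},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where Pods live
                "resources": ["pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def bind_role_to_user(namespace, role_name, user):
    """Build a RoleBinding that grants an existing Role to a single user."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"namespace": namespace, "name": f"{role_name}-{user}"},
        "subjects": [
            {"kind": "User", "name": user, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "Role",
            "name": role_name,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and user, for illustration only.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            read_only_role("billing"),
            bind_role_to_user("billing", "pod-reader", "jane"),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it would create both objects. Multiply this by every user, namespace, and cluster you operate, and the case for managing roles in one place becomes clear.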
If you’re using Rancher, it not only installs secure clusters but also proxies all communication to those clusters through the Rancher server. Rancher plugs into a number of backend authentication providers, such as Active Directory, LDAP, SAML, GitHub, and more. When connected in this way, Rancher enables you to extend your existing corporate authentication out to all of the Kubernetes clusters under Rancher’s umbrella, no matter where they’re running.
Rancher enables roles at the global, cluster, and project level, and it makes it possible for administrators to define roles in a single place and apply them to all clusters.
This combination of RBAC-by-default and strong controls for authentication and authorization means that from the moment you deploy a cluster with Rancher or RKE, that cluster is secure.
Use Namespaces to Establish Security Boundaries
Because of the special way that Kubernetes treats the default namespace, I don’t recommend that you use it. Instead, create a namespace for each of your applications, defining them as logical groups.
Rancher defines an additional layer of abstraction called a Project. A Project is a collection of namespaces, onto which roles can be mapped. Users with access to one Project cannot see or interact with any workload running in another Project to which they do not have access. This effectively creates single-cluster multi-tenancy.
Using Projects makes it easier for administrators to grant access to multiple namespaces within a single cluster. It minimizes duplicated configuration and reduces human error.
Separate Sensitive Workloads
This is a good suggestion, in that it raises the question, “What happens if a workload is compromised?” Acting in advance to reduce the blast radius of a breach makes it harder for an attacker to escalate privileges, but it doesn’t make it impossible. If anything, this might buy you additional time.
Kubernetes allows you to set taints and tolerations, which control where a Pod might be deployed.
Rancher also lets you control scheduling of workloads through Kubernetes labels. In addition to taints and tolerations, when deploying a workload you can set the labels that a host must have, should have, or can have for a Pod to land there. You can also schedule workloads to a specific node if your environment is that static.
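As a rough illustration of how taints, tolerations, and label-based scheduling fit together, the sketch below builds a Pod manifest that tolerates a taint and also requires a matching node label. The key and value `dedicated=sensitive`, the Pod name, and the image are assumptions chosen for the example. The toleration only permits the Pod to land on tainted nodes; the required node affinity is what keeps it off every other node.

```python
import json


def sensitive_pod(name, image, key="dedicated", value="sensitive"):
    """Build a Pod that may only be scheduled onto dedicated, tainted nodes.

    Assumes the target nodes were prepared beforehand with:
      kubectl taint nodes NODE dedicated=sensitive:NoSchedule
      kubectl label nodes NODE dedicated=sensitive
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Allows scheduling onto nodes carrying the taint.
            "tolerations": [
                {
                    "key": key,
                    "operator": "Equal",
                    "value": value,
                    "effect": "NoSchedule",
                }
            ],
            # Requires the node label, so the Pod cannot land elsewhere.
            "affinity": {
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [
                            {
                                "matchExpressions": [
                                    {"key": key, "operator": "In", "values": [value]}
                                ]
                            }
                        ]
                    }
                }
            },
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical workload name and image, for illustration only.
    print(json.dumps(sensitive_pod("payments", "example/payments:1.0"), indent=2))
```

The taint keeps ordinary workloads off the dedicated nodes, and the affinity rule keeps the sensitive workload on them. You generally want both halves.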
Secure Cloud Metadata Access
This suggestion states that sensitive metadata “can sometimes be stolen or misused,” but it fails to outline when or how. The article references a disclosure from Shopify, presented at KubeCon NA on December 13, 2018. Although this piece of the article points out a GKE feature for “metadata concealment,” it’s worth noting that the service that leaked the credentials in the first place was the Google Cloud metadata API.
There is nothing that shows the same vulnerability exists with any other cloud provider.
The only place this vulnerability might exist would be in a hosted Kubernetes service such as GKE. If you deploy RKE onto bare metal or cloud compute instances, either directly or via Rancher, you’ll end up with a cluster that cannot have credentials leaked via the cloud provider’s metadata API.
If you’re using GKE, I recommend that you activate this feature to prevent any credentials from leaking via the metadata service.
I would also argue that cloud providers should never embed credentials into metadata accessible via an API. Even if this exists for convenience, it’s an unnecessary risk with unimaginable consequences.
Create and Define Cluster Network Policies
RKE clusters, deployed directly or by Rancher, use Canal by default, although you can also choose Calico or Flannel. Both Canal and Calico include support for NetworkPolicies. Rancher-deployed clusters, when using Canal as a network provider, also support ProjectNetworkPolicies. When activated, workloads can speak to other workloads within their Project, and the System project, which includes cluster-wide components such as ingress controllers, can communicate with all projects.
Earlier versions of Rancher enabled ProjectNetworkPolicies by default, but this created confusion for some users who weren’t aware of the extra security. To provide the best experience across the entire user base, this feature is now off by default but can be easily activated at launch time or later if you change your mind.
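If you are not using ProjectNetworkPolicies, you can get a similar effect for a single namespace with plain Kubernetes NetworkPolicies on a provider that enforces them, such as Canal or Calico. The following is a minimal sketch, not what Rancher generates internally: one policy denies all ingress to Pods in a namespace, and a second allows traffic from Pods in that same namespace. The namespace name `billing` is a placeholder.

```python
import json


def default_deny_ingress(namespace):
    """Deny all inbound traffic to every Pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},  # an empty selector matches every Pod
            "policyTypes": ["Ingress"],
        },
    }


def allow_same_namespace(namespace):
    """Allow inbound traffic only from Pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A podSelector without a namespaceSelector only matches Pods
            # in the namespace where the policy itself lives.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace, for illustration only.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [default_deny_ingress("billing"), allow_same_namespace("billing")],
    }
    print(json.dumps(manifests, indent=2))
```

NetworkPolicies are additive, so the second policy opens a hole in the first rather than replacing it. On Flannel alone, these objects would be accepted by the API server but not enforced.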
Run a Cluster-wide Pod Security Policy
A Pod Security Policy (PSP) controls what capabilities and configuration Pods must have in order to run within your cluster. For example, you can block privileged mode, host networking, or containers running as root. When installing a cluster via Rancher or RKE, you choose if you want a restricted PSP enabled by default. If you choose to enable it, your cluster will immediately enforce strong limitations on the workload permissions.
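For a sense of what such a policy contains, here is a minimal sketch of a restrictive PSP that blocks the three things mentioned above: privileged mode, host networking, and containers running as root. This is my own illustrative policy, not the restricted template that Rancher or RKE ships, and the name `restricted-example` is a placeholder. Note that it uses the `policy/v1beta1` API, which exists only on clusters older than Kubernetes 1.25.

```python
import json


def restricted_psp(name="restricted-example"):
    """Build a PodSecurityPolicy that blocks privileged, host-network, and root Pods."""
    return {
        "apiVersion": "policy/v1beta1",
        "kind": "PodSecurityPolicy",
        "metadata": {"name": name},
        "spec": {
            "privileged": False,  # no privileged containers
            "allowPrivilegeEscalation": False,
            "hostNetwork": False,  # no access to the node's network namespace
            "hostPID": False,
            "hostIPC": False,
            "runAsUser": {"rule": "MustRunAsNonRoot"},  # reject UID 0
            # The remaining strategies are required fields; they are left
            # permissive here to keep the sketch short.
            "seLinux": {"rule": "RunAsAny"},
            "supplementalGroups": {"rule": "RunAsAny"},
            "fsGroup": {"rule": "RunAsAny"},
            # Only volume types that do not expose the host filesystem.
            "volumes": ["configMap", "secret", "emptyDir", "persistentVolumeClaim"],
        },
    }


if __name__ == "__main__":
    print(json.dumps(restricted_psp(), indent=2))
```

A PSP on its own does nothing until the admission controller is enabled and the policy is authorized for a user or service account through RBAC.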
The restricted and unrestricted PSPs are the same within RKE and Rancher, so what they activate at install is identical. Rancher allows an unlimited number of additional PSP templates, all handled at the global level. Administrators define PSPs and then apply them to every cluster that Rancher manages. This, like the RBAC configuration discussed earlier, keeps security configuration in a single place and dramatically simplifies the configuration and application of the policies.
When something is easy to do, more people will do it.
Harden Node Security
This isn’t a Kubernetes-specific suggestion, but it’s a good general policy. Anything that interacts with traffic that you don’t control, such as user traffic hitting an application running within Kubernetes, should be running on nodes with a small attack surface. Disable and uninstall unneeded services. Restrict root access via SSH and require a password for sudo. Use passphrases on SSH keys, or use 2FA, U2F keys, or a service like Krypton to bind keys to devices that your users have. These are examples of basic, standard configurations for secure systems.
Rancher requires nothing on the host beyond a supported version of Docker. RKE requires nothing but SSH access, and it will install the latest version of Docker supported by Kubernetes before continuing to install Kubernetes itself.
If you want to reduce the attack surface even more, take a look at RancherOS, a lightweight Linux operating system that runs all processes as Docker containers. The System Docker runs only the smallest number of processes necessary to provide access and run an instance of Docker in userspace for the actual workloads. Both lightweight and secure, RancherOS is what an operating system should be: secure by default.
Turn on Audit Logging
The Rancher Server runs inside of an RKE cluster, so in addition to the Kubernetes audit logging, it’s important to activate audit logging for API calls to the server itself. This log will show every action that users take against any cluster, including what happened, who did it, when they did it, and which cluster they did it to.
It’s also important to ship these logs off of the servers in question. Rancher connects to Splunk, Elasticsearch, Fluentd, Kafka, or any syslog endpoint, and from these you can generate dashboards and alerts for suspicious activity.
Information on enabling audit logging for the Rancher Server is available in our documentation.
For information on enabling audit logging for RKE clusters, please see the next section.
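On the Kubernetes side, audit logging is driven by a policy file that tells the API server what to record and at what level of detail. The sketch below builds one possible policy and is only an assumption about what a reasonable starting point looks like, not the policy that RKE or Rancher configures. It logs metadata only for Secrets and ConfigMaps so that their contents never reach the log, records request bodies for write operations, and falls back to metadata for everything else.

```python
import json


def audit_policy():
    """Build a Kubernetes audit policy. Rules are matched in order, first match wins."""
    return {
        "apiVersion": "audit.k8s.io/v1",
        "kind": "Policy",
        # Skip the initial stage to avoid two entries per request.
        "omitStages": ["RequestReceived"],
        "rules": [
            # Never log the bodies of Secrets or ConfigMaps.
            {
                "level": "Metadata",
                "resources": [{"group": "", "resources": ["secrets", "configmaps"]}],
            },
            # Record the request body for anything that changes state.
            {"level": "Request", "verbs": ["create", "update", "patch", "delete"]},
            # Everything else: who, what, and when, without bodies.
            {"level": "Metadata"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(audit_policy(), indent=2))
```

The resulting file is what the API server reads through its `--audit-policy-file` flag. Because the first matching rule wins, the Secrets rule has to come before the broader write rule.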
It takes more than nine changes to truly secure a Kubernetes cluster. Rancher has a hardening guide and a self-assessment guide that cover more than 100 controls from the CIS Benchmark for Securing Kubernetes.
If you’re serious about security, Rancher, RKE, and RancherOS will help you stay that way.
- 2019-01-24: added clarification around ProjectNetworkPolicies and additional images
- 2019-01-23: added main image at top of article