Continental Innovates with Rancher and Kubernetes
In the Rancher UI, etcd backup and recovery for Rancher launched Kubernetes clusters can be easily performed.
Rancher recommends configuring recurrent etcd snapshots for all production clusters. Additionally, one-time snapshots can easily be taken as well.
Snapshots of the etcd database are taken and saved either locally onto the etcd nodes or to a S3 compatible target. The advantages of configuring S3 is that if all etcd nodes are lost, your snapshot is saved remotely and can be used to restore the cluster.
This section covers the following topics:
When Rancher creates a snapshot, it includes three components:
Because the Kubernetes version is now included in the snapshot, it is possible to restore a cluster to a prior Kubernetes version.
The multiple components of the snapshot allow you to select from the following options if you need to restore a cluster from a snapshot:
It’s always recommended to take a new snapshot before any upgrades.
For each etcd node in the cluster, the etcd cluster health is checked. If the node reports that the etcd cluster is healthy, a snapshot is created from it and optionally uploaded to S3.
The snapshot is stored in /opt/rke/etcd-snapshots. If the directory is configured on the nodes as a shared mount, it will be overwritten. On S3, the snapshot will always be from the last node that uploads it, as all etcd nodes upload it and the last will remain.
In the case when multiple etcd nodes exist, any created snapshot is created after the cluster has been health checked, so it can be considered a valid snapshot of the data in the etcd cluster.
The name of the snapshot is auto-generated. The --name option can be used to override the name of the snapshot when creating one-time snapshots with the RKE CLI.
When Rancher creates a snapshot of an RKE cluster, the snapshot name is based on the type (whether the snapshot is manual or recurring) and the target (whether the snapshot is saved locally or uploaded to S3). The naming convention is as follows:
Some example snapshot names are:
On restore, the following process is used:
Select how often you want recurring snapshots to be taken as well as how many snapshots to keep. The amount of time is measured in hours. With timestamped snapshots, the user has the ability to do a point-in-time recovery.
By default, Rancher launched Kubernetes clusters are configured to take recurring snapshots (saved to local disk). To protect against local disk failure, using the S3 Target or replicating the path on disk is advised.
During cluster provisioning or editing the cluster, the configuration for snapshots can be found in the advanced section for Cluster Options. Click on Show advanced options.
In the Advanced Cluster Options section, there are several options available to configure:
In addition to recurring snapshots, you may want to take a “one-time” snapshot. For example, before upgrading the Kubernetes version of a cluster it’s best to backup the state of the cluster to protect against upgrade failure.
Result: Based on your snapshot backup target, a one-time snapshot will be taken and saved in the selected backup target.
Rancher supports two different backup targets:
By default, the local backup target is selected. The benefits of this option is that there is no external configuration. Snapshots are automatically saved locally to the etcd nodes in the Rancher launched Kubernetes clusters in /opt/rke/etcd-snapshots. All recurring snapshots are taken at configured intervals. The downside of using the local backup target is that if there is a total disaster and all etcd nodes are lost, there is no ability to restore the cluster.
The S3 backup target allows users to configure a S3 compatible backend to store the snapshots. The primary benefit of this option is that if the cluster loses all the etcd nodes, the cluster can still be restored as the snapshots are stored externally. Rancher recommends external targets like S3 backup, however its configuration requirements do require additional effort that should be considered.
The backup snapshot can be stored on a custom S3 backup like minio. If the S3 back end uses a self-signed or custom certificate, provide a custom certificate using the Custom CA Certificate option to connect to the S3 backend.
Custom CA Certificate
The S3 backup target supports using IAM authentication to AWS API in addition to using API credentials. An IAM role gives temporary permissions that an application can use when making API calls to S3 storage. To use IAM authentication, the following requirements must be met:
To give an application access to S3, refer to the AWS documentation on Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances.
The list of all available snapshots for the cluster is available in the Rancher UI.
Snapshot files are timestamped to simplify processing the files using external tools and scripts, but in some S3 compatible backends, these timestamps were unusable.
The option safe_timestamp is added to support compatible file names. When this flag is set to true, all special characters in the snapshot filename timestamp are replaced.
This option is not available directly in the UI, and is only available through the Edit as Yaml interface.
Edit as Yaml
If you have any Rancher launched Kubernetes clusters that were created before v2.2.0, after upgrading Rancher, you must edit the cluster and save it, in order to enable the updated snapshot features. Even if you were already creating snapshots before v2.2.0, you must do this step as the older snapshots will not be available to use to back up and restore etcd through the UI.