This procedure describes how to use RKE to restore a snapshot of the Rancher Kubernetes cluster. This will restore the Kubernetes configuration and the Rancher database and state.
Note: This document covers clusters set up with RKE v0.2.x or later. For older RKE versions, refer to the RKE Documentation.
We recommend running the restore from your local host or from a jump box/bastion where your cluster YAML file, RKE state file, and kubeconfig are stored. You will need the RKE and kubectl CLI utilities installed locally.
Prepare by creating 3 new nodes to be the target for the restored Rancher instance. We recommend that you start with fresh nodes and a clean state. For clarification on the requirements, review the Installation Requirements.
Alternatively, you can reuse the existing nodes after clearing their Kubernetes and Rancher configurations. This destroys the data on these nodes. See Node Cleanup for the procedure.
You must restore each of your etcd nodes to the same snapshot. Copy the snapshot you’re using from one of your nodes to the others before running the etcd snapshot-restore command.
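You can script the copy from your workstation. The sketch below is an illustration, not part of RKE. It assumes OpenSSH's scp, whose -3 option relays the transfer through the local host so the nodes do not need SSH access to each other. It also assumes a login user (the hypothetical SSH_USER, default root) that can write to /opt/rke/etcd-snapshots on every node. The host names you pass are placeholders.

```shell
# Sketch only: copy one etcd snapshot from a source etcd node to the other
# etcd nodes. SSH_USER and the host names passed in are hypothetical.
SNAPSHOT_DIR=/opt/rke/etcd-snapshots

# copy_snapshot FILE SOURCE_HOST TARGET_HOST...
copy_snapshot() {
  file=$1
  src=$2
  shift 2
  user=${SSH_USER:-root}
  for dst in "$@"; do
    # -3 relays the data through the local host instead of node-to-node
    scp -3 "$user@$src:$SNAPSHOT_DIR/$file" "$user@$dst:$SNAPSHOT_DIR/$file" || return 1
  done
}

# Example (hypothetical names): copy_snapshot mysnapshot.zip etcd-1 etcd-2 etcd-3
```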
IMPORTANT: Before starting the restore, make sure all the Kubernetes services on the old cluster nodes are stopped. We recommend powering off the nodes to be sure.
As of RKE v0.2.0, snapshots can be saved to an S3-compatible backend. To restore your cluster from a snapshot stored in an S3-compatible backend, you can skip this step and retrieve the snapshot in step 4, "Restore the Database and bring up the Cluster". Otherwise, you will need to place the snapshot directly on one of the etcd nodes.
Pick one of the clean nodes that will have the etcd role assigned and place the zip-compressed snapshot file in /opt/rke/etcd-snapshots on that node.
Note: Because of a current limitation in RKE, the restore process does not work correctly if /opt/rke/etcd-snapshots is an NFS share that is mounted on all nodes with the etcd role. The easiest options are either to keep /opt/rke/etcd-snapshots as a local folder during the restore process and mount the NFS share there only after the restore has completed, or to mount the NFS share on only one node with the etcd role at the start.
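Before you start, you can check on an etcd node whether /opt/rke/etcd-snapshots is itself an NFS mount point by looking it up in the mount table. This is a minimal sketch that assumes a Linux node, where the mount table is /proc/mounts. It only detects the case where the directory is the mount point, not the case where a parent directory lives on NFS.

```shell
# Sketch only: succeed (exit 0) if DIR is listed as an nfs/nfs4 mount point.
# Assumes the Linux mount table format: device mountpoint fstype options ...
# is_nfs_mount DIR [MOUNTS_FILE]
is_nfs_mount() {
  awk -v dir="$1" '$2 == dir && $3 ~ /^nfs/ { found = 1 } END { exit (found ? 0 : 1) }' \
    "${2:-/proc/mounts}"
}

# Example, run on an etcd node:
#   if is_nfs_mount /opt/rke/etcd-snapshots; then
#     echo "keep the NFS share mounted on only one etcd node during the restore"
#   fi
```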
Use your original rancher-cluster.yml and rancher-cluster.rkestate files. If they are not stored in a version control system, it is a good idea to back them up before making any changes.
cp rancher-cluster.yml rancher-cluster.yml.bak
cp rancher-cluster.rkestate rancher-cluster.rkestate.bak
If the replaced or cleaned nodes have been configured with new IP addresses, modify the rancher-cluster.yml file to ensure the address and optional internal_address fields reflect the new addresses.
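If only the IP addresses changed, you can script the edit instead of making it by hand. The helper below is hypothetical and not an RKE feature. It rewrites any address: or internal_address: value that exactly equals the old IP and writes the new value unquoted. A line with a trailing comment does not match and is left unchanged. Run it only after backing up the file, and review the result.

```shell
# Sketch only (hypothetical helper): replace OLD with NEW in address: and
# internal_address: fields of a cluster YAML file, matching whole values only.
# replace_node_address OLD NEW [FILE]
replace_node_address() {
  old=$1
  new=$2
  file=${3:-./rancher-cluster.yml}
  tmp="$file.tmp.$$"
  awk -v old="$old" -v new="$new" -v q="'" '
    match($0, /address:[ \t]*/) {
      v = substr($0, RSTART + RLENGTH)
      gsub(/[" \t]/, "", v); gsub(q, "", v)    # strip quotes and blanks
      if (v == old) { print substr($0, 1, RSTART + RLENGTH - 1) new; next }
    }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }
}

# Example (hypothetical IPs): replace_node_address 10.0.0.1 10.0.1.1 ./rancher-cluster.yml
```

Matching whole values rather than substrings keeps 10.0.0.1 from also rewriting 10.0.0.10.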
IMPORTANT: You should not rename the rancher-cluster.yml or rancher-cluster.rkestate files. It is important that the filenames match each other.
You will now use the RKE command-line tool with the rancher-cluster.yml and the rancher-cluster.rkestate configuration files to restore the etcd database and bring up the cluster on the new nodes.
Note: Ensure your rancher-cluster.rkestate is present in the same directory as the rancher-cluster.yml file before starting the restore, as this file contains the certificate data for the cluster.
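A quick preflight check can confirm that both files are in place before you run RKE. This sketch assumes that the state file shares the base name of the cluster YAML, with .rkestate replacing .yml or .yaml, as the rancher-cluster.yml / rancher-cluster.rkestate pair on this page does. The function names are hypothetical.

```shell
# Sketch only: check that the cluster YAML and its matching RKE state file
# sit side by side before running the restore.

# state_file_for CONFIG: print the .rkestate path paired with CONFIG
state_file_for() {
  case $1 in
    *.yml)  printf '%s\n' "${1%.yml}.rkestate" ;;
    *.yaml) printf '%s\n' "${1%.yaml}.rkestate" ;;
    *)      printf '%s\n' "$1.rkestate" ;;   # assumed fallback, no extension
  esac
}

# preflight [CONFIG]: fail with a message if either file is missing
preflight() {
  cfg=${1:-./rancher-cluster.yml}
  state=$(state_file_for "$cfg")
  if [ ! -f "$cfg" ]; then
    echo "missing cluster config: $cfg" >&2
    return 1
  fi
  if [ ! -f "$state" ]; then
    echo "missing state file: $state (its name must match the config file)" >&2
    return 1
  fi
  echo "found $cfg and $state"
}
```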
When restoring etcd from a local snapshot, the snapshot is assumed to be located on the target node in the directory /opt/rke/etcd-snapshots.
rke etcd snapshot-restore --name snapshot-name --config ./rancher-cluster.yml
Note: The --name parameter expects the filename of the snapshot without the extension.
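To derive the --name value from a snapshot file, strip the directory and the extension. This sketch assumes the extension is .zip, matching the zip-compressed snapshot files described earlier. The helper name is hypothetical.

```shell
# Sketch only: turn a snapshot file path into the value --name expects,
# dropping the directory and a trailing .zip (assumed snapshot extension).
snapshot_name() {
  basename "$1" .zip
}

# Example (hypothetical file):
#   rke etcd snapshot-restore \
#     --name "$(snapshot_name /opt/rke/etcd-snapshots/mysnapshot.zip)" \
#     --config ./rancher-cluster.yml
```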
Available as of RKE v0.2.0
When restoring etcd from a snapshot located in an S3 compatible backend, the command needs the S3 information in order to connect to the S3 backend and retrieve the snapshot.
$ rke etcd snapshot-restore --config ./rancher-cluster.yml --name snapshot-name \
  --s3 --access-key S3_ACCESS_KEY --secret-key S3_SECRET_KEY \
  --bucket-name s3-bucket-name --s3-endpoint s3.amazonaws.com \
  --folder folder-name # Available as of v2.3.0
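To avoid typing credentials into the command, you can wrap the S3 restore in a function that reads them from the environment. The variable names S3_ACCESS_KEY, S3_SECRET_KEY, S3_BUCKET, S3_ENDPOINT, S3_FOLDER, S3_REGION and RKE_CONFIG are hypothetical. Only the rke flags come from this page. The function adds --folder and --region only when the corresponding variables are set.

```shell
# Sketch only: run the S3 restore with settings taken from (hypothetical)
# environment variables. Only the rke flags come from the documentation.
# s3_restore SNAPSHOT_NAME
s3_restore() {
  name=$1
  if [ -z "${S3_ACCESS_KEY:-}" ] || [ -z "${S3_SECRET_KEY:-}" ] || [ -z "${S3_BUCKET:-}" ]; then
    echo "set S3_ACCESS_KEY, S3_SECRET_KEY and S3_BUCKET first" >&2
    return 1
  fi
  set -- rke etcd snapshot-restore --config "${RKE_CONFIG:-./rancher-cluster.yml}" \
    --name "$name" --s3 \
    --access-key "$S3_ACCESS_KEY" --secret-key "$S3_SECRET_KEY" \
    --bucket-name "$S3_BUCKET" --s3-endpoint "${S3_ENDPOINT:-s3.amazonaws.com}"
  # Optional flags are appended only when set (--folder is noted as v2.3.0+)
  if [ -n "${S3_FOLDER:-}" ]; then
    set -- "$@" --folder "$S3_FOLDER"
  fi
  if [ -n "${S3_REGION:-}" ]; then
    set -- "$@" --region "$S3_REGION"
  fi
  "$@"
}
```

The credentials still appear in the rke process arguments while it runs. This approach only keeps them out of scripts and shell history.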
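Options for rke etcd snapshot-restore:
S3-specific options are only available for RKE v0.2.0+.
--name: name of the snapshot to restore
--config: cluster YAML file to use (here, rancher-cluster.yml)
--s3: retrieve the snapshot from an S3-compatible backend
--s3-endpoint: S3 endpoint URL (default: s3.amazonaws.com)
--access-key: S3 access key
--secret-key: S3 secret key
--bucket-name: name of the bucket that holds the snapshot
--folder: folder inside the bucket that holds the snapshot (available as of v2.3.0)
--region: region of the bucket (optional)
--ssh-agent-auth: authenticate using the SSH agent defined by SSH_AUTH_SOCK
--ignore-docker-version: skip the Docker version check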
Once RKE completes, it creates a credentials file, kube_config_rancher-cluster.yml, in the local directory. Configure kubectl to use this credentials file and check on the state of the cluster. See Installing and Configuring kubectl for details.
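One way to point kubectl at the new credentials is the KUBECONFIG environment variable. This sketch assumes RKE writes the file next to the cluster YAML, following the kube_config_<config file name> pattern shown on this page. The helper name is hypothetical.

```shell
# Sketch only: print the kubeconfig path RKE is assumed to write for a given
# cluster YAML (kube_config_ prefixed to the YAML file name, same directory).
kubeconfig_for() {
  printf '%s/kube_config_%s\n' "$(dirname "$1")" "$(basename "$1")"
}

# Example:
#   export KUBECONFIG="$(kubeconfig_for ./rancher-cluster.yml)"
#   kubectl get nodes
```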
Wait for the pods running in kube-system and ingress-nginx, and the rancher pod in cattle-system, to return to the Running state.
Note: The cattle-cluster-agent and cattle-node-agent pods will be in an Error or CrashLoopBackOff state until the Rancher server is up and the DNS/load balancer has been pointed at the new cluster.
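Instead of re-running kubectl by hand, you can wrap a check in a small retry loop. The sketch below is a generic, hypothetical helper. The example assumes the Rancher deployment is named rancher, as the pod names suggest. It uses kubectl rollout status with a short --timeout per attempt.

```shell
# Sketch only: retry a check command until it succeeds or attempts run out.
# wait_until TRIES INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
  tries=$1
  interval=$2
  shift 2
  i=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$tries" ]; then
      echo "gave up after $tries attempts: $*" >&2
      return 1
    fi
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example (assumes a deployment named "rancher" in cattle-system):
#   wait_until 30 10 kubectl -n cattle-system rollout status deploy/rancher --timeout=10s
```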
kubectl get pods --all-namespaces

NAMESPACE       NAME                                    READY   STATUS    RESTARTS   AGE
cattle-system   cattle-cluster-agent-766585f6b-kj88m    0/1     Error     6          4m
cattle-system   cattle-node-agent-wvhqm                 0/1     Error     8          8m
cattle-system   rancher-78947c8548-jzlsr                0/1     Running   1          4m
ingress-nginx   default-http-backend-797c5bc547-f5ztd   1/1     Running   1          4m
ingress-nginx   nginx-ingress-controller-ljvkf          1/1     Running   1          8m
kube-system     canal-4pf9v                             3/3     Running   3          8m
kube-system     cert-manager-6b47fc5fc-jnrl5            1/1     Running   1          4m
kube-system     kube-dns-7588d5b5f5-kgskt               3/3     Running   3          4m
kube-system     kube-dns-autoscaler-5db9bbb766-s698d    1/1     Running   1          4m
kube-system     metrics-server-97bc649d5-6w7zc          1/1     Running   1          4m
kube-system     tiller-deploy-56c4cf647b-j4whh          1/1     Running   1          4m
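To spot the pods that are still not ready in output like the listing above, you can filter it with awk. This is a hypothetical helper. It flags a pod if its STATUS is not Running, or if its READY column (x/y) shows x different from y. Completed pods are ignored, and so are pods whose name matches an optional extended regex, such as the cattle agents that stay in Error until DNS is switched over.

```shell
# Sketch only: read `kubectl get pods --all-namespaces` output on stdin and
# print NAMESPACE/NAME for pods that are not Running or not fully ready.
# pods_not_ready [SKIP_REGEX]
pods_not_ready() {
  awk -v skip="${1:-}" '
    $1 == "NAMESPACE" { next }                # header row
    skip != "" && $2 ~ skip { next }          # deliberately ignored pods
    $4 == "Completed" { next }                # finished job pods
    {
      split($3, ready, "/")
      if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2
    }
  '
}

# Example:
#   kubectl get pods --all-namespaces | pods_not_ready 'cattle-(cluster|node)-agent'
```

Given the listing above with that skip pattern, only cattle-system/rancher-78947c8548-jzlsr is reported, because it is Running but shows 0/1 ready.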
Rancher should now be running and available to manage your Kubernetes clusters.
IMPORTANT: Remember to save your updated RKE config (rancher-cluster.yml), state file (rancher-cluster.rkestate), and kubectl credentials (kube_config_rancher-cluster.yml) in a safe place for future maintenance, for example in a version control system.
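As a lightweight complement to version control, you can bundle the three files into a timestamped tar archive. This sketch assumes the default filenames used on this page. The helper name is hypothetical. The archive contains cluster credentials and certificate data, so store it accordingly.

```shell
# Sketch only: archive the RKE config, state file and kubeconfig from DIR
# into DIR/rancher-cluster-backup-<timestamp>.tar and print the archive path.
# archive_rke_files [DIR]
archive_rke_files() {
  dir=${1:-.}
  files="rancher-cluster.yml rancher-cluster.rkestate kube_config_rancher-cluster.yml"
  for f in $files; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  out="$dir/rancher-cluster-backup-$(date +%Y%m%d-%H%M%S).tar"
  # $files is intentionally unquoted so it splits into three arguments
  tar -cf "$out" -C "$dir" $files || return 1
  echo "$out"
}
```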