Restoring Backups—High Availability Installs
This procedure describes how to use RKE to restore a snapshot of the Rancher Kubernetes cluster. The cluster snapshot will include Kubernetes configuration and the Rancher database and state.
- 1. Preparation
- 2. Place Snapshot and PKI Bundle
- 3. Configure RKE
- 4. Restore Database
- 5. Bring Up the Cluster
1. Preparation
Prepare by creating 3 new nodes to be the target for the restored Rancher instance. See HA Install for node requirements.
We recommend that you start with fresh nodes and a clean state. Alternatively, you can clear the Kubernetes and Rancher configurations from the existing nodes. This destroys the data on those nodes. See Node Cleanup for the procedure.
IMPORTANT: Before starting the restore, make sure all Kubernetes services on the old cluster nodes are stopped. We recommend powering off the nodes to be sure.
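If you are reusing nodes, a quick check can help confirm they are really in a clean state. The sketch below assumes a small, illustrative subset of directories that RKE-built nodes typically contain. That list and the function name find_leftover_state are assumptions, and the Node Cleanup procedure remains the authoritative list.

```shell
#!/usr/bin/env bash
# Sketch: report Kubernetes/RKE state left on a node before reusing it.
# ASSUMPTION: this directory list is a partial, illustrative subset of typical
# RKE paths; the Node Cleanup procedure is the authoritative list.
RKE_STATE_DIRS=(
  /etc/kubernetes
  /opt/rke
  /var/lib/etcd
  /var/lib/kubelet
  /etc/cni
  /var/lib/cni
)

# find_leftover_state [ROOT]
# Prints each state directory that exists under ROOT (default: the real
# filesystem root). Returns 0 if nothing was found, 1 otherwise.
find_leftover_state() {
  local root="${1:-}"
  local found=0 dir
  for dir in "${RKE_STATE_DIRS[@]}"; do
    if [ -e "${root}${dir}" ]; then
      echo "$dir"
      found=1
    fi
  done
  [ "$found" -eq 0 ]
}
```

On a candidate node, run find_leftover_state || echo "not clean, see Node Cleanup". The optional ROOT argument only exists so the check can be tried against a scratch directory.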
2. Place Snapshot and PKI Bundle
Pick one of the clean nodes. That node will be the “target node” for the initial restore. Place the snapshot and PKI certificate bundle files in the /opt/rke/etcd-snapshots directory on the “target node”:
- Snapshot: <snapshot>.db
- PKI Bundle
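The placement step can be scripted so that it fails loudly if either file is missing. This is a minimal sketch. The function name stage_restore_files and its argument order are hypothetical. Only the default destination, /opt/rke/etcd-snapshots, comes from this procedure. The PKI bundle file name is passed in rather than assumed.

```shell
#!/usr/bin/env bash
# Sketch: place the etcd snapshot and PKI bundle where the restore expects them.
# ASSUMPTION: hypothetical helper run on the "target node"; only the default
# destination directory comes from the restore procedure.
SNAPSHOT_DIR_DEFAULT=/opt/rke/etcd-snapshots

# stage_restore_files SNAPSHOT PKI_BUNDLE [DEST_DIR]
stage_restore_files() {
  local snapshot="$1" bundle="$2" dest="${3:-$SNAPSHOT_DIR_DEFAULT}"
  local f
  # Refuse to continue if either input file is missing.
  for f in "$snapshot" "$bundle"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done
  mkdir -p "$dest" || return 1
  cp "$snapshot" "$bundle" "$dest"/ || return 1
  # List what is now staged so it can be checked by eye.
  ls -1 "$dest"
}
```

Run it from a root shell on the target node, for example stage_restore_files ./<snapshot>.db ./<pki-bundle-file>, because /opt is normally writable only by root.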
3. Configure RKE
Make a copy of your original rancher-cluster.yml file.
cp rancher-cluster.yml rancher-cluster-restore.yml
Modify the copy and make the following changes.
- Remove or comment out the entire addons: section. The Rancher deployment and supporting configuration are already in the etcd database.
- Change your nodes: section to point to the restore nodes.
- Comment out the nodes that are not your “target node”. The cluster should start on that one node only.
nodes:
  - address: 220.127.116.11 # New Target Node
    user: ubuntu
    role: [ etcd, controlplane, worker ]
  # - address: 18.104.22.168
  #   user: ubuntu
  #   role: [ etcd, controlplane, worker ]
  # - address: 22.214.171.124
  #   user: ubuntu
  #   role: [ etcd, controlplane, worker ]

# addons: |-
#   ---
#   kind: Namespace
#   apiVersion: v1
#   metadata:
#     name: cattle-system
#   ---
# ...
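Before restoring, you can check the edited copy for the two mistakes that matter most: more than one active node, or a top-level addons: key that is still present. The sketch below uses grep heuristics that assume the layout shown above. It is not a YAML parser, and the function name check_restore_config is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check rancher-cluster-restore.yml before running the restore.
# ASSUMPTION: the file uses the layout shown above (one "- address:" line per
# node, top-level "addons:" key). Grep heuristic only, not a YAML parser.
check_restore_config() {
  local file="${1:-rancher-cluster-restore.yml}"
  local active
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  # Count node entries that are not commented out ("  - address: ...").
  active=$(grep -cE '^[[:space:]]*-[[:space:]]*address:' "$file")
  if [ "$active" -ne 1 ]; then
    echo "expected 1 active node, found $active" >&2
    return 1
  fi
  # A top-level "addons:" key means the section is still active.
  if grep -qE '^addons:' "$file"; then
    echo "addons: section is still active" >&2
    return 1
  fi
  echo "ok: 1 active node, addons disabled"
}
```

Run check_restore_config ./rancher-cluster-restore.yml before step 4. A non-zero exit status means the copy still needs editing.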
4. Restore Database
Use RKE with the new rancher-cluster-restore.yml configuration to restore the database to the single “target node”.
rke etcd snapshot-restore --name <snapshot>.db --config ./rancher-cluster-restore.yml
Note: RKE will create an etcd container with the restored database on the “target node”. This container will not complete the etcd initialization and will stay in a running state until the cluster is brought up in the next step.
5. Bring Up the Cluster
Use RKE to bring up the cluster on the single “target node”.
rke up --config ./rancher-cluster-restore.yml
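Steps 4 and 5 always run back to back against the same config, so they can be wrapped in a small helper that stops if the restore fails. This sketch uses only the two rke commands shown above. The function name run_restore and the RKE variable, which points at a specific rke binary and defaults to rke on the PATH, are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: run steps 4 and 5 back to back, stopping if the restore fails.
# ASSUMPTION: hypothetical helper; RKE may name a specific rke binary
# (default: "rke" on PATH).
run_restore() {
  local snapshot="$1" config="${2:-./rancher-cluster-restore.yml}"
  local rke="${RKE:-rke}"
  # Step 4: restore the etcd snapshot onto the single target node.
  "$rke" etcd snapshot-restore --name "$snapshot" --config "$config" || return 1
  # Step 5: bring the cluster up on that node.
  "$rke" up --config "$config"
}
```

Example: run_restore <snapshot>.db ./rancher-cluster-restore.yml. Keeping the two commands in one function mainly ensures that rke up is never run after a failed restore.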
Testing the Cluster
When RKE completes, it will have created a credentials file in the local directory. Configure kubectl to use the kube_config_rancher-cluster-restore.yml credentials file and check the state of the cluster. See Installing and Configuring kubectl for details.
Your new cluster will take a few minutes to stabilize. Once you see the new “target node” transition to Ready and the three old nodes show NotReady, you are ready to continue.
kubectl get nodes
NAME             STATUS     ROLES                      AGE   VERSION
220.127.116.11   Ready      controlplane,etcd,worker   1m    v1.10.5
126.96.36.199    NotReady   controlplane,etcd,worker   16d   v1.10.5
188.8.131.52     NotReady   controlplane,etcd,worker   16d   v1.10.5
184.108.40.206   NotReady   controlplane,etcd,worker   16d   v1.10.5
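Instead of re-running kubectl get nodes by hand while the cluster stabilizes, you can poll until the target node reports Ready. The sketch below assumes the default table output of kubectl get nodes, with NAME in column 1 and STATUS in column 2. The function name node_is_ready is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide from "kubectl get nodes" output (on stdin) whether a node is Ready.
# ASSUMPTION: default kubectl table layout (NAME column 1, STATUS column 2);
# hypothetical helper name.
node_is_ready() {
  local node="$1"
  awk -v n="$node" 'NR > 1 && $1 == n && $2 == "Ready" { found = 1 }
                    END { exit !found }'
}
```

Usage, with the target node address from the restore config: until kubectl get nodes | node_is_ready 220.127.116.11; do sleep 10; done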
Cleaning up Old Nodes
Use kubectl to delete the old nodes from the cluster.
kubectl delete node 126.96.36.199 188.8.131.52 184.108.40.206
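The old nodes are the ones kubectl reports as NotReady, so their names can be pulled from the kubectl get nodes output instead of typed by hand. This is a sketch that assumes the default table layout. The helper name not_ready_nodes is hypothetical. It also matches combined statuses such as NotReady,SchedulingDisabled.

```shell
#!/usr/bin/env bash
# Sketch: list the nodes that kubectl reports as NotReady (the old cluster
# members restored from the snapshot). Reads "kubectl get nodes" on stdin.
# ASSUMPTION: default table layout; hypothetical helper name.
not_ready_nodes() {
  # Match "NotReady" alone or combined, e.g. "NotReady,SchedulingDisabled".
  awk 'NR > 1 && ($2 == "NotReady" || $2 ~ /^NotReady,/) { print $1 }'
}
```

Review the list first with kubectl get nodes | not_ready_nodes, and only then run kubectl delete node $(kubectl get nodes | not_ready_nodes). Any node that is NotReady for another reason would also be listed.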
Reboot the Target Node
Reboot the target node to ensure the cluster networking and services are in a clean state before continuing.
Check Kubernetes Pods
Wait for the pods running in ingress-nginx and the rancher pod in cattle-system to return to the Running state. The cattle-cluster-agent and cattle-node-agent pods will be in an Error or CrashLoopBackOff state until Rancher server is up and the DNS/Load Balancer has been pointed at the new cluster.
kubectl get pods --all-namespaces
NAMESPACE       NAME                                    READY   STATUS    RESTARTS   AGE
cattle-system   cattle-cluster-agent-766585f6b-kj88m    0/1     Error     6          4m
cattle-system   cattle-node-agent-wvhqm                 0/1     Error     8          8m
cattle-system   rancher-78947c8548-jzlsr                0/1     Running   1          4m
ingress-nginx   default-http-backend-797c5bc547-f5ztd   1/1     Running   1          4m
ingress-nginx   nginx-ingress-controller-ljvkf          1/1     Running   1          8m
kube-system     canal-4pf9v                             3/3     Running   3          8m
kube-system     cert-manager-6b47fc5fc-jnrl5            1/1     Running   1          4m
kube-system     kube-dns-7588d5b5f5-kgskt               3/3     Running   3          4m
kube-system     kube-dns-autoscaler-5db9bbb766-s698d    1/1     Running   1          4m
kube-system     metrics-server-97bc649d5-6w7zc          1/1     Running   1          4m
kube-system     tiller-deploy-56c4cf647b-j4whh          1/1     Running   1          4m
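To see which pods are still holding things up, you can filter this output down to pods that are not Running or whose READY count is incomplete. This is a sketch that assumes the default column order shown above (NAMESPACE, NAME, READY, STATUS, ...). The function name pods_not_ready is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print pods from "kubectl get pods --all-namespaces" (on stdin) that
# are not yet Running with all containers ready.
# ASSUMPTION: default table layout (NAMESPACE NAME READY STATUS ...);
# hypothetical helper name.
pods_not_ready() {
  awk 'NR > 1 {
         split($3, r, "/")   # READY column, e.g. "0/1" -> r[1]=0, r[2]=1
         if ($4 != "Running" || r[1] != r[2])
           print $1 "/" $2, $3, $4
       }'
}
```

Run kubectl get pods --all-namespaces | pods_not_ready. Against the sample above it would list the two cattle agents and the rancher pod. The agents are expected to stay listed until the DNS/Load Balancer is switched.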
Adding in Additional Nodes
Edit the rancher-cluster-restore.yml RKE config file and uncomment the additional nodes.
nodes:
  - address: 220.127.116.11 # New Target Node
    user: ubuntu
    role: [ etcd, controlplane, worker ]
  - address: 18.104.22.168
    user: ubuntu
    role: [ etcd, controlplane, worker ]
  - address: 22.214.171.124
    user: ubuntu
    role: [ etcd, controlplane, worker ]

# addons: |-
#   ---
#   kind: Namespace
# ...
Run RKE and add the nodes to the new cluster.
rke up --config ./rancher-cluster-restore.yml
Rancher should now be running and available to manage your Kubernetes clusters. Swap your Rancher DNS or Load Balancer endpoints to target the new cluster. Once this is done, the agents on your managed clusters should reconnect automatically. This may take 10-15 minutes because of reconnect backoff timeouts.
IMPORTANT: Remember to save your new RKE config (rancher-cluster-restore.yml) and kubectl credentials (kube_config_rancher-cluster-restore.yml) files in a safe place for future maintenance.
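A small helper can copy both files into a dated backup folder. This is a minimal sketch. The function name save_rke_files, the backup root argument, and the rke-YYYYmmdd-HHMMSS folder naming are hypothetical. In practice the copies should live somewhere off the cluster nodes. Permissions are tightened because the kubeconfig contains credentials for the cluster.

```shell
#!/usr/bin/env bash
# Sketch: copy the restore config and kubeconfig into a dated backup folder.
# ASSUMPTION: hypothetical helper; backup location and naming are illustrative.
# save_rke_files BACKUP_ROOT [FILE...]
save_rke_files() {
  if [ -z "${1:-}" ]; then
    echo "usage: save_rke_files BACKUP_ROOT [FILE...]" >&2
    return 1
  fi
  local backup_root="$1"
  shift
  local files=("$@")
  # Default to the two files named in this procedure.
  if [ "${#files[@]}" -eq 0 ]; then
    files=(rancher-cluster-restore.yml kube_config_rancher-cluster-restore.yml)
  fi
  local dest
  dest="$backup_root/rke-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return 1
  cp "${files[@]}" "$dest"/ || return 1
  # The kubeconfig holds cluster credentials, so keep the copies private.
  chmod 600 "$dest"/* || return 1
  echo "$dest"
}
```

Run it from the directory where you ran rke, for example save_rke_files /path/to/secure/backups. It prints the folder it created.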