The commands and steps on this page can be used to troubleshoot name resolution issues in your cluster.

Make sure you have configured the correct kubeconfig (for example, export KUBECONFIG=$PWD/kube_config_rancher-cluster.yml for Rancher HA) or are using the embedded kubectl via the UI.

Before running the DNS checks, make sure that the overlay network is functioning correctly, as a broken overlay network can also cause DNS resolution to fail partially or completely.

Check if DNS pods are running

kubectl -n kube-system get pods -l k8s-app=kube-dns

Example output:

NAME                        READY   STATUS    RESTARTS   AGE
kube-dns-5fd74c7488-h6f7n   3/3     Running   0          4m13s
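
The check above can be scripted. The following is a minimal sketch, assuming the default kubectl table layout shown in the example output (READY in the second column, STATUS in the third); dns_pods_ready is a hypothetical helper name, not part of kubectl or Rancher.

```shell
# Hypothetical helper: succeed only if every kube-dns pod is Running
# and all of its containers are ready (for example 3/3).
dns_pods_ready() {
  kubectl -n kube-system get pods -l k8s-app=kube-dns --no-headers |
    awk '
      {
        seen = 1
        split($2, ready, "/")   # READY column, e.g. "3/3"
        if ($3 != "Running" || ready[1] != ready[2]) {
          print "not ready: " $1
          bad = 1
        }
      }
      END {
        if (!seen) { print "no kube-dns pods found"; exit 1 }
        exit (bad ? 1 : 0)
      }
    '
}
```

Running dns_pods_ready prints nothing and returns 0 when all DNS pods are healthy; otherwise it lists the affected pods and returns 1.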

Check if the DNS service is present with the correct cluster-ip

kubectl -n kube-system get svc -l k8s-app=kube-dns

Example output:

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
service/kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP   4m13s

Check if domain names are resolving

Check if internal cluster names are resolving (in this example, kubernetes.default). The IP shown after Server: should be the same as the CLUSTER-IP of the kube-dns service.

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup kubernetes.default

Example output:

Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.43.0.1 kubernetes.default.svc.cluster.local
pod "busybox" deleted
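
The comparison between the Server: address and the CLUSTER-IP can be automated. This is a sketch only: dns_server_matches_service is a hypothetical helper name, and it assumes the busybox nslookup output format shown above, passed in on standard input.

```shell
# Hypothetical helper: compare the "Server:" address in nslookup output
# (read from stdin) with the ClusterIP of the kube-dns service.
dns_server_matches_service() {
  # First "Server:" line of the nslookup output; strip carriage returns
  # in case the output was captured from a TTY session.
  server_ip=$(awk '/^Server:/ { gsub(/\r/, ""); print $2; exit }')
  # ClusterIP of the kube-dns service, for example 10.43.0.10
  cluster_ip=$(kubectl -n kube-system get svc kube-dns \
    -o jsonpath='{.spec.clusterIP}' </dev/null)
  if [ -n "$server_ip" ] && [ "$server_ip" = "$cluster_ip" ]; then
    echo "OK: nameserver $server_ip matches the kube-dns ClusterIP"
  else
    echo "MISMATCH: nslookup used ${server_ip:-unknown}, kube-dns ClusterIP is ${cluster_ip:-unknown}"
    return 1
  fi
}
```

For example, save the nslookup output to a file (nslookup-output.txt is an assumed name) and run dns_server_matches_service < nslookup-output.txt.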

Check if external names are resolving (in this example, www.google.com)

kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup www.google.com

Example output:

Server:    10.43.0.10
Address 1: 10.43.0.10 kube-dns.kube-system.svc.cluster.local

Name:      www.google.com
Address 1: 2a00:1450:4009:80b::2004 lhr35s04-in-x04.1e100.net
Address 2: 216.58.211.100 ams15s32-in-f4.1e100.net
pod "busybox" deleted

If you want to check resolving of domain names on all of the hosts, execute the following steps:

  1. Save the following file as ds-dnstest.yml

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: dnstest
    spec:
      selector:
        matchLabels:
          name: dnstest
      template:
        metadata:
          labels:
            name: dnstest
        spec:
          tolerations:
          - operator: Exists
          containers:
          - image: busybox:1.28
            imagePullPolicy: Always
            name: alpine
            command: ["sh", "-c", "tail -f /dev/null"]
            terminationMessagePath: /dev/termination-log
    
  2. Launch it using kubectl create -f ds-dnstest.yml

  3. Wait until kubectl rollout status ds/dnstest -w returns: daemon set "dnstest" successfully rolled out.

  4. Set the environment variable DOMAIN to a fully qualified domain name (FQDN) that the host should be able to resolve (www.google.com is used as an example), then run the following single-line command to let each container on every host resolve the configured domain name.

    export DOMAIN=www.google.com; echo "=> Start DNS resolve test"; kubectl get pods -l name=dnstest --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | while read pod host; do kubectl exec $pod -- /bin/sh -c "nslookup $DOMAIN > /dev/null 2>&1"; RC=$?; if [ $RC -ne 0 ]; then echo $host cannot resolve $DOMAIN; fi; done; echo "=> End DNS resolve test"
    
  5. When the command has finished running, the following output indicates that every host resolved the name:

    => Start DNS resolve test
    => End DNS resolve test
    

If you see an error in the output, the listed hosts are not able to resolve the given FQDN.

Example error output for a situation where the host with IP 209.97.182.150 had its UDP ports blocked:

=> Start DNS resolve test
command terminated with exit code 1
209.97.182.150 cannot resolve www.google.com
=> End DNS resolve test
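
For scripting, the single-line command from step 4 can also be written as a function that returns a non-zero exit status when any host fails. This is a sketch under the same assumptions as the steps above (the dnstest DaemonSet is running in the current namespace); dns_resolve_test is a hypothetical name.

```shell
# Sketch of the step 4 loop as a function. Unlike the one-liner, it hides
# kubectl's own "command terminated" message and reports failure through
# its exit status.
dns_resolve_test() {
  domain=${1:-www.google.com}
  failed=0
  echo "=> Start DNS resolve test"
  # One line per dnstest pod: "<pod name> <host IP>"
  pods=$(kubectl get pods -l name=dnstest --no-headers \
    -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP)
  while read -r pod host; do
    [ -n "$pod" ] || continue
    # </dev/null keeps kubectl from reading the pod list on stdin
    if ! kubectl exec "$pod" -- nslookup "$domain" >/dev/null 2>&1 </dev/null; then
      echo "$host cannot resolve $domain"
      failed=$((failed + 1))
    fi
  done <<EOF
$pods
EOF
  echo "=> End DNS resolve test"
  [ "$failed" -eq 0 ]
}
```

Calling dns_resolve_test www.google.com prints the same start and end markers as the one-liner, with one line per host that cannot resolve the name.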

Clean up the dnstest DaemonSet by running kubectl delete ds/dnstest.

Check upstream nameservers in kubedns container

By default, the nameservers configured on the host (in /etc/resolv.conf) are used as upstream nameservers for kube-dns. Sometimes the host runs a local caching DNS nameserver, in which case /etc/resolv.conf points to an address in the loopback range (127.0.0.0/8) that containers cannot reach. On Ubuntu 18.04, this local nameserver is systemd-resolved. Since Rancher v2.0.7, Rancher detects whether systemd-resolved is running and automatically uses the resolv.conf file that contains the correct upstream nameservers (located at /run/systemd/resolve/resolv.conf).

Use the following command to check the upstream nameservers used by the kubedns container:

kubectl -n kube-system get pods -l k8s-app=kube-dns --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | while read pod host; do echo "Pod ${pod} on host ${host}"; kubectl -n kube-system exec "$pod" -c kubedns -- cat /etc/resolv.conf; done

Example output:

Pod kube-dns-667c7cb9dd-z4dsf on host x.x.x.x
nameserver 1.1.1.1
nameserver 8.8.4.4
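
To spot loopback nameservers in this kind of output without reading it by eye, a small filter can help. This is a minimal sketch that only checks the IPv4 loopback range 127.0.0.0/8; find_loopback_nameservers is a hypothetical name.

```shell
# Hypothetical filter: read resolv.conf content on stdin and print every
# nameserver in the loopback range (127.0.0.0/8).
# Exit status 0 means no loopback nameserver was found.
find_loopback_nameservers() {
  awk '
    $1 == "nameserver" && $2 ~ /^127\./ { print $2; found = 1 }
    END { exit (found ? 1 : 0) }
  '
}
```

The output of the command above can be piped into find_loopback_nameservers; lines such as the "Pod ... on host ..." header are ignored because they do not start with nameserver.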

If the output shows an address in the loopback range (127.0.0.0/8), you can correct this in two ways:

  • Make sure the correct nameservers are listed in /etc/resolv.conf on the nodes in your cluster. Consult your operating system documentation on how to do this. Make this change before provisioning a cluster, or reboot the nodes after making it.
  • Configure the kubelet to use a different file for resolving names, by using extra_args as shown below (where /run/resolvconf/resolv.conf is the file with the correct nameservers):
services:
  kubelet:
    extra_args:
      resolv-conf: "/run/resolvconf/resolv.conf"

Note: As the kubelet is running inside a container, files located in /etc and /usr on the host are available under /host/etc and /host/usr inside the kubelet container.

See Editing Cluster as YAML for how to apply this change. When the provisioning of the cluster has finished, you have to remove the kube-dns pod so that the replacement pod picks up the new setting:

kubectl delete pods -n kube-system -l k8s-app=kube-dns

Example output:

pod "kube-dns-5fd74c7488-6pwsf" deleted

Try to resolve the name again using the steps in Check if domain names are resolving.