Prometheus is a modern and popular monitoring and alerting system, built at SoundCloud and eventually open sourced in 2012. It handles multi-dimensional time series data really well, and friends at InfinityWorks have already developed a Rancher template to deploy Prometheus at the click of a button.

In hybrid cloud environments, you are likely to be using multiple orchestration engines such as Kubernetes and Mesos, in which case it helps to have the stack or application portable across environments. In this short tutorial, we will convert the template for Prometheus from Cattle format to make it work in a Kubernetes environment. It is assumed that the reader has a basic understanding of Kubernetes concepts such as pods, replication controllers (RCs), services and so on. If you need a refresher on the basic concepts, the Kubernetes 101 and concept guide are excellent starting points.

Prometheus Cattle Template Components

If you look at the latest version of the Prometheus template here you will notice:

  • docker-compose.yml – defines containers in docker compose format
  • rancher-compose.yml – adds additional Rancher functionality to manage container lifecycle.

Below is a quick overview of each component’s role (as defined in docker-compose.yml):

  • Prometheus: the core component, which scrapes and stores data.
  • Prometheus node exporter: gets host-level metrics and exposes them to Prometheus.
  • Ranch-eye: an HAProxy instance that exposes cAdvisor stats to Prometheus.
  • Grafana: visualization for the data.
  • InfluxDB: a time series database used specifically to store data from the Rancher server, which is exported via the Graphite connector.
  • Prom-ranch-exporter: a simple Node.js application that queries the Rancher server for the state of a stack or service.

You will also notice that the template uses two data containers, prom-conf and graf-db, which house the configuration and data and provide them to the respective app containers as volumes. Additional behavior, such as scaling, health checks and upgrade strategy, is defined in rancher-compose.yml.
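
To make the data-container pattern concrete, here is a minimal sketch of how such a pairing is typically written in a Cattle docker-compose.yml. It is not copied from the actual template: the prom-conf image name and the volume path are placeholders chosen for illustration.

```yaml
# Sketch only: image name and path are placeholders, not the real template's values.
prom-conf:
  image: example/prom-conf            # hypothetical data-container image
  volumes:
    - /etc/prom-conf/                 # directory holding the Prometheus configuration
  net: none                           # a data container needs no network

prometheus:
  image: prom/prometheus
  volumes_from:
    - prom-conf                       # mount the data container's volumes
  labels:
    io.rancher.sidekicks: prom-conf   # keep the data container on the same host
```

The app container never copies the configuration; it only sees whatever the data container exposes, which is exactly the behavior we will need to reproduce in Kubernetes.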

Designing Kubernetes templates

We will define a pod for every component using replication controllers and expose these pods using Kubernetes service objects. Let’s start with the Prometheus service:

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: "default"
spec:
  type: NodePort
  ports:
  - name: "prometheus"
    port: 9090
    protocol: TCP
  selector:
    name: prometheus

We have defined a service and exposed port 9090, through which we can then wire up the other components. Finally, the selector is name: prometheus, so the service will pick up pods that carry the label name: prometheus.

Data Containers and Volumes

Now let’s create a replication controller for Prometheus. As you might have noticed, we need to handle two containers here. One is the app container and the other is the data container for the configuration file used by Prometheus. However, Kubernetes does not support data containers (read about the issue here) in the same way that Docker does.

So how do we handle this? Kubernetes has volumes of different types, and for our purposes we could use EBS (awsElasticBlockStore in Kubernetes) or a Google Cloud disk (gcePersistentDisk in Kubernetes) and map it to the Prometheus pod on the fly. For this article, let’s assume that we are not using a public cloud provider and need to make this work without a cloud disk. That leaves us two options:

  • hostPath: It is possible to provide the data from hostPath by mapping the hostPath to a container path (much like native Docker). But this poses one potential issue: what if the pod moves to a different node during a restart and that host does not have the hostPath files available? The files left behind on the old host also have to be cleaned up when the container moves to a different node. So although this option is feasible, it is not a clean design. Let’s move on.
  • gitRepo: is another type of volume that Kubernetes supports. All you have to do is map a Git repo as a volume for a container and the container will fetch it before running:
volumes:
- name: git-volume
  gitRepo:
    repository: "[email protected]:me/my-git-repository.git"
    revision: "22f1d8406d464b0c0874075539c1f2e96c253775"
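
For comparison, the hostPath option from the list above could be sketched as follows. This is an illustration under assumptions rather than part of the final template: the pod name, the host directory /opt/prom-conf and the mount path /etc/prom-conf are all hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prometheus-hostpath-demo    # hypothetical name
spec:
  containers:
  - name: prometheus
    image: prom/prometheus:0.18.0
    volumeMounts:
    - name: prom-conf
      mountPath: /etc/prom-conf     # hypothetical; the config.file flag would have to point here
  volumes:
  - name: prom-conf
    hostPath:
      path: /opt/prom-conf          # hypothetical; must already exist on whichever node runs the pod
```

The comment on the last line is the weak spot: nothing in this manifest guarantees that the directory is present on the node where the pod lands.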

Volumes of type gitRepo are not yet supported on Rancher in a Kubernetes environment (see the issue filed here). You can also check out which Kubernetes features are supported in the issue filed here. These features will be supported in the next release of Rancher.

To move on and make this work without gitRepo, we will build new images for the Prometheus and Grafana containers (because Grafana also needs data). We will simply extend the official images and add the extra files to the image itself. When gitRepo volume support is available, we can switch back to the official Docker images and use gitRepo volumes. The Dockerfile for Prometheus looks like this:

FROM prom/prometheus:0.18.0
ADD prometheus.yml /etc/prometheus/prometheus.yml
ENTRYPOINT [ "/bin/prometheus" ]
CMD [ "-config.file=/etc/prometheus/prometheus.yml", \
"-storage.local.path=/prometheus", \
"-web.console.libraries=/etc/prometheus/console_libraries", \
"-web.console.templates=/etc/prometheus/consoles" ]

You can find both images on Docker Hub, for Prometheus here and Grafana here.
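
The prometheus.yml that the Dockerfile adds is not reproduced in this article. As a rough idea of its shape, here is a minimal sketch for the 0.18.x configuration format; the scrape interval, job names and the node-exporter target are assumptions made for illustration, not the contents of the real file.

```yaml
# prometheus.yml (sketch for the 0.18.x config format; all values are assumptions)
global:
  scrape_interval: 15s                    # assumed interval

scrape_configs:
  - job_name: prometheus
    target_groups:                        # called static_configs in later Prometheus releases
      - targets: ['localhost:9090']       # Prometheus scraping itself

  - job_name: node-exporter
    target_groups:
      - targets: ['node-exporter:9100']   # hypothetical service name; 9100 is the exporter's default port
```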

Back to Kubernetes Templates

With that, our final replication controller for Prometheus looks like this:

apiVersion: v1
kind: ReplicationController
metadata:
  name: prometheus-rc
  namespace: default
spec:
  replicas: 1
  selector:
    name: prometheus
  template:
    metadata:
      labels:
        name: prometheus
    spec:
      restartPolicy: Always
      containers:
      - image: infracloud/prometheus
        # Each argument is its own list item; a single multi-line string
        # would be treated as one executable path and fail to start.
        command:
        - /bin/prometheus
        - --alertmanager.url=http://alertmanager:9093
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.local.path=/prometheus
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        imagePullPolicy: Always
        name: prometheus
        ports:
        - containerPort: 9090

You can check out the definitions of the other services and replication controllers at this GitHub repo.
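
The remaining components follow the same service-plus-replication-controller pattern. As a sketch of what that looks like for Grafana (not copied from the repo: the image name is a placeholder, and port 3000 is simply Grafana's default HTTP port):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: default
spec:
  type: NodePort
  ports:
  - name: grafana
    port: 3000                      # Grafana's default HTTP port
    protocol: TCP
  selector:
    name: grafana                   # must match the pod label below
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: grafana-rc
  namespace: default
spec:
  replicas: 1
  selector:
    name: grafana
  template:
    metadata:
      labels:
        name: grafana
    spec:
      containers:
      - name: grafana
        image: example/grafana      # placeholder for the extended Grafana image
        ports:
        - containerPort: 3000
```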

The Magic of Labels

Looking more closely, you’ll notice that the Cattle template for Prometheus-rancher-exporter uses Rancher labels:

prometheus-rancher-exporter:
  tty: true
  labels:
    io.rancher.container.create_agent: true
    io.rancher.container.agent.role: environment

The labels create a temporary Rancher API key and expose environment variables to the container. In the case of the Kubernetes template, CATTLE_URL, CATTLE_ACCESS_KEY and CATTLE_SECRET_KEY are provided as configuration options while launching the template. To get the API keys, head over to “API”, the rightmost tab in the Rancher UI. Copy the Endpoint URL listed there, as it varies from one environment to another. Also create an API key and copy both the access and secret keys. You will be asked for these values when you launch the catalog entry.

If you added the repo to your catalog, then you can click on “Launch”, and in a few minutes you should have the cluster beating to life!

Now let’s head over to the Grafana UI and check its stats, which will serve as a test of what is working and what is broken. Grafana has five dashboards; you will notice that “Rancher Stats” is not showing any data at all.
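
Going back to the exporter for a moment: since Kubernetes has no equivalent of the Rancher agent labels, the three CATTLE_* values have to reach the container explicitly. One plausible way, sketched here under assumptions, is plain environment variables in the pod spec. The image name is a placeholder and the angle-bracket values stand in for whatever you entered at launch time.

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: prom-ranch-exporter-rc
  namespace: default
spec:
  replicas: 1
  selector:
    name: prom-ranch-exporter
  template:
    metadata:
      labels:
        name: prom-ranch-exporter
    spec:
      containers:
      - name: prom-ranch-exporter
        image: example/prom-ranch-exporter   # placeholder image name
        env:
        - name: CATTLE_URL
          value: "<ENDPOINT_URL>"            # the Endpoint URL copied from the API tab
        - name: CATTLE_ACCESS_KEY
          value: "<ACCESS_KEY>"
        - name: CATTLE_SECRET_KEY
          value: "<SECRET_KEY>"
```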

Kubernetes Network & Graphite Port

The issue here is that the Rancher Statistics dashboard gets its data from InfluxDB, which in turn receives data from the Rancher server through the Graphite connector. Since the Kubernetes cluster creates its own network, assigns IPs and ports to containers dynamically, and is on a different network than the Rancher server, we have to configure this after the Prometheus cluster is up. InfluxDB runs in a private network, but it exposes its ports on the host network by using the service type NodePort:

spec:
  type: NodePort
  ports:
  - port: 2003
    protocol: TCP
    name: idb-4

This is visible if you click on Service: InfluxDB and open the “Ports” tab. Essentially, we have to configure the host machine’s IP and the exposed port so that the Rancher server can talk to the InfluxDB Graphite connector.
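
Because the host port is assigned dynamically, it has to be looked up after every fresh deployment. If your environment honors it, one way to make the port predictable is to pin it in the service definition. The following is a sketch under assumptions, not the template's actual definition: the nodePort value and the selector label are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: influxdb
  namespace: default
spec:
  type: NodePort
  ports:
  - name: idb-4
    port: 2003          # Graphite listener inside the cluster
    protocol: TCP
    nodePort: 32003     # assumed value; must be free and inside the NodePort range (30000-32767 by default)
  selector:
    name: influxdb      # assumed pod label
```

With a fixed nodePort, the Graphite port configured on the Rancher server would not need to change between redeployments.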

Go to http://<RANCHER_SERVER_IP>:8080/v1/settings/graphite.host and click on the edit button at the top right. This gives you a value field; enter the IP of the host on which the InfluxDB container is running and send the request. You’ll see the new IP if you refresh the URL above. Now go to the Kubernetes console, open Services, click on the InfluxDB service and find the host port mapped to 2003 (in our case 32435). Update this port at the URL http://<RANCHER_SERVER_IP>:8080/v1/settings/graphite.port

For the above settings to take effect, you will have to restart the rancher-server container. Once this is done, you’ll see stats reported in the Grafana UI, and also when you query the InfluxDB UI.

Conclusion

In this article, we saw how the Prometheus template can be converted from the Cattle format to a Kubernetes format. The networking model, the linking of containers, and the semantics of data volumes in Kubernetes are different from those in Docker. Hence, when converting a Cattle template, which is in native Docker format, we need to apply these Kubernetes semantics and redesign the template.