In this blog, we’ll discuss how and why Weave developed the best-practice RED method for monitoring apps with Prometheus.
What is Prometheus Monitoring?
You may have heard a lot about Prometheus lately, especially in the context of monitoring applications on Kubernetes. To provide a bit of background before we delve into the RED method: apps running in containers and orchestrated by Kubernetes are highly automated and dynamic, so traditional server-based monitoring tools designed for static services are not sufficient for these environments.
This is where Prometheus comes in.
Prometheus is an open source project originally developed by engineers at SoundCloud. It was designed specifically to monitor microservices running in containers. Data is scraped from running services at regular intervals and saved to a time-series database, where it can be queried via the PromQL language. Because the data is stored as a time series, you can explore those intervals to diagnose problems at the time they occurred, and also analyze long-term monitoring trends in your infrastructure, two awesomely powerful features of Prometheus.
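As a minimal sketch of what querying that time-series data looks like, the snippet below builds a range query against the Prometheus HTTP API. The server address, metric name, and timestamps are assumptions for illustration; adapt them to your own deployment.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical Prometheus server address; adjust to your deployment.
PROMETHEUS_URL = "http://localhost:9090/api/v1/query_range"

# A PromQL range query: per-second HTTP request rate averaged over
# 5-minute windows, evaluated every 60s across a one-hour window.
params = {
    "query": "rate(http_requests_total[5m])",  # metric name is illustrative
    "start": "1609459200",                     # window start (Unix time)
    "end": "1609462800",                       # window end, one hour later
    "step": "60s",                             # evaluation interval
}

request_url = PROMETHEUS_URL + "?" + urlencode(params)
parsed = parse_qs(urlparse(request_url).query)  # round-trip check
print(request_url)
```

A GET on that URL returns a JSON series of (timestamp, value) pairs for each matching time series, which is what lets you look back at the exact interval when a problem occurred.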
At Weaveworks we built on the open source distribution of Prometheus and created a scalable, multi-tenant version that is part of our Software-as-a-Service called Weave Cloud.
After running this service for several months, and using Weave Cloud to monitor itself, we’ve learned a few things about monitoring cloud-native applications and devised a system we use to determine what to measure before instrumenting code.
What to Instrument?
One of the most important decisions to make when setting up Prometheus Monitoring is deciding on the type of metrics you need to collect about your app. The metrics you choose simplify troubleshooting when a problem occurs and also enable you to stay on top of the stability of your services and infrastructure. To help us think about what’s important to instrument, we defined a system that we call the RED method. Read more
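The RED method tracks three signals per service: Rate (requests per second), Errors (failed requests), and Duration (request latency). In practice you would instrument code with a Prometheus client library; the following is only a minimal, stdlib-only sketch of the idea, with hypothetical endpoint names.

```python
import time
from collections import defaultdict

class REDMetrics:
    """Toy in-process tracker for the RED method's three signals:
    Rate, Errors, and Duration, keyed by endpoint."""

    def __init__(self):
        self.requests = defaultdict(int)    # total requests per endpoint
        self.errors = defaultdict(int)      # failed requests per endpoint
        self.durations = defaultdict(list)  # observed latencies (seconds)
        self.started = time.monotonic()

    def observe(self, endpoint, duration_s, error=False):
        self.requests[endpoint] += 1
        if error:
            self.errors[endpoint] += 1
        self.durations[endpoint].append(duration_s)

    def rate(self, endpoint):
        """Requests per second since tracking began."""
        elapsed = max(time.monotonic() - self.started, 1e-9)
        return self.requests[endpoint] / elapsed

    def error_ratio(self, endpoint):
        total = self.requests[endpoint]
        return self.errors[endpoint] / total if total else 0.0

    def mean_duration(self, endpoint):
        obs = self.durations[endpoint]
        return sum(obs) / len(obs) if obs else 0.0

metrics = REDMetrics()
metrics.observe("/api/users", 0.120)              # hypothetical endpoint
metrics.observe("/api/users", 0.340, error=True)  # one failed request
```

A real Prometheus setup would expose these as a counter (requests), a counter (errors), and a histogram (duration), and let PromQL compute rates and percentiles server-side.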
If you use containers as part of your day-to-day operations, you need to monitor them — ideally, by using a monitoring solution that you already have in place, rather than implementing an entirely new tool. Containers are often deployed quickly and at a high volume, and they frequently consume and release system resources at a rapid rate. You need to have some way of measuring container performance, and the impact that container deployment has on your system.
In this article, we’ll take a look at four widely used monitoring platforms—Netuitive, New Relic, Splunk, and AppDynamics—that support containers, and compare how they measure up when it comes to monitoring containers.
First, though, a question: When you monitor containers, what kind of metrics do you expect to see? The answer, as we’ll see below, varies with the monitoring platform. But in general, container metrics fall into two categories—those that measure overall container impact on the system, and those that focus on the performance of individual containers. Read more
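As an example of a per-container performance metric, the function below computes CPU utilisation from a Docker stats sample, following the usual calculation: the container's CPU-time delta divided by the whole-system CPU delta, scaled by core count. The sample document is trimmed down and illustrative; field names follow the shape of the Docker stats API.

```python
def cpu_percent(stats):
    """CPU utilisation of one container from a Docker stats sample:
    container CPU delta / system CPU delta * number of cores * 100."""
    cpu = stats["cpu_stats"]
    precpu = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - precpu["cpu_usage"]["total_usage"]
    system_delta = cpu["system_cpu_usage"] - precpu["system_cpu_usage"]
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0
    num_cpus = len(cpu["cpu_usage"].get("percpu_usage", [])) or 1
    return (cpu_delta / system_delta) * num_cpus * 100.0

# A trimmed, illustrative sample (nanosecond counters, two cores):
sample = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 400_000_000, "percpu_usage": [0, 0]},
        "system_cpu_usage": 10_000_000_000,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 200_000_000},
        "system_cpu_usage": 8_000_000_000,
    },
}
print(round(cpu_percent(sample), 1))  # 10% of the system delta on 2 cores
```

Aggregating this per-container figure across all containers on a host gives the other category of metric: overall container impact on the system.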
As a reminder, Elasticsearch is the cornerstone of the ELK platform (ELK stands for Elasticsearch/Logstash/Kibana). In this article, we’ll deploy the stack using Rancher Catalog, and use it to track tags and brands on Twitter.
Tracking hashtags on Twitter can be very useful for measuring the impact of a Twitter-based marketing campaign. You can pull information like the number of times your announcement has been retweeted, or how many new followers your marketing campaign has brought in.
Installing the ELK Stack
Following the previous article, you should now have a fully working Elasticsearch cluster. For our example, we just need to tweak its configuration a bit by creating an index template using a JSON configuration.
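A sketch of such a template is shown below. An index template is applied automatically to every new index whose name matches the pattern, so daily tweet indices all pick up the same settings and mappings. The index pattern, field names, and mapping types here are assumptions for illustration; adapt them to your Elasticsearch version and data.

```python
import json

# Illustrative index template for daily tweet indices (e.g. "tweets-2017.01.31").
# Field names and types are assumptions; check them against your cluster version.
template = {
    "template": "tweets-*",  # pattern of index names this template applies to
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
    "mappings": {
        "tweet": {
            "properties": {
                "hashtags": {"type": "keyword"},
                "user": {"type": "keyword"},
                "text": {"type": "text"},
                "@timestamp": {"type": "date"},
            }
        }
    },
}

body = json.dumps(template, indent=2)
# Registered with a PUT to the cluster's _template endpoint, e.g.:
#   PUT http://<elasticsearch-host>:9200/_template/tweets
print(body)
```

With `hashtags` mapped as a non-analyzed keyword field, aggregations over hashtags count each tag exactly as tweeted rather than splitting it into tokens.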
The Rancher Community Catalog just got two new gems – SPM and Logsene – monitoring and logging tools from Sematext. If you are familiar with Logstash, Kibana, Prometheus, Grafana, and friends, this post explains what SPM and Logsene bring to the Rancher users’ table, and how they are different from other monitoring or logging solutions.
Meet Sematext Docker Agent
Sematext Docker Agent is a modern, Docker-native monitoring and log collection agent. It runs as a tiny container on every Docker host, and collects logs, metrics, and events for all cluster nodes and their containers. The agent discovers all containers on all nodes managed by Rancher. After the deployment of Sematext Docker Agent, all logs, Docker events, and metrics are immediately available out of the box.
Why is this valuable? It means you don’t have to spend the next N hours or days figuring out which data to collect, or how to chart it. Read more
Monitoring your container-based infrastructure is crucial to ensure good performance, identify issues early, and gain the insight necessary to maximize its efficiency. When you are dealing with a large number of often short-lived containers spread over multiple hosts and even data centers, understanding the operational health of your infrastructure means aggregating performance data from both the physical hosts and the container cluster running on top of them. Ideally, you want to capture application performance and correlate it with the underlying infrastructure to troubleshoot and identify bottlenecks. Implementing a monitoring system that satisfies these requirements can be a complex endeavor.
Update (October 2017): Gord Sissons revisited this topic and compared the top 10 container-monitoring solutions for Rancher in a recent blog post.
Update (October 2016): Our October online meetup demonstrated and compared Sysdig, Datadog, and Prometheus in one go. Check out the recording.
As Docker is used for larger deployments, it becomes more important to get visibility into the status and health of Docker environments. In this article, I aim to go over some of the common tools used to monitor containers. I will be evaluating these tools based on the following criteria: 1) ease of deployment, 2) level of detail of information presented, 3) level of aggregation of information from the entire deployment, 4) ability to raise alerts from the data, 5) ability to monitor non-Docker resources, and 6) cost. This list is by no means comprehensive; however, I have tried to highlight the most common tools and those that best fit our six evaluation criteria.