Illumina Innovates with Rancher and Kubernetes
Cassandra is a popular database
technology which is gaining popularity these days. It provides
adjustable consistency guarantees, is horizontally scalable, is built to
be fault-tolerant and provides very low latency, (sub-millisecond)
writes. This is why Cassandra is used heavily by large companies such as
Facebook and Twitter. Furthermore, Cassandra uses application layer
replication for its data which makes it ideal for a containerized
environment. However, Cassandra, like most databases, assumes that
database nodes are fairly static. This assumption is at odds with a
containerized infrastructure, which caters to more dynamic environments.
In this post, I will cover some of the challenges this poses for
managing Cassandra and other database workloads in containerized
A basic Cassandra deployment on Rancher is fairly straight forward
thanks to the work done by Patric Boos to
container image in the community catalog. The docker-compose.yaml file
to create a service stack with the Cassandra service is shown below.
Note that we are creating a volume at /var/lib/cassandra. This is
because the union file system containers are very slow and not suitable
for high IO use cases. By using a volume, we defer to the host file
system, which should be much faster.
In addition, we set the RANCHER_ENABLE environment variable to true. If
this variable is set, the container will use the Rancher Meta Data
Service to get the IPs of all containers in the service (other than the
current container) and set them up as seeds. Seeds are very important
for Cassandra and this setup has a few very important impacts. First,
seeds help bootstrap the cluster because all new nodes must report to
the seeds so that they can give these new nodes a list of all the
current servers and also tell the older connected servers about the new
servers. If any server in the cluster did not know about any other, you
would not be able to provide consistency guarantees. This is why
Cassandra will refuse to start up if it cannot talk to all the seed
servers. If you had multiple nodes launching at the same time, they
would see each other as seeds. Since the nodes are not ready to accept
connections, they would both fail to start, stating that they cannot
connect to the seed. You can add more containers at a later time (once
the first container is ready to serve traffic) but, you must do so one
at a time and wait for that container to be ready before launching
In our fork of the container image,
(usman/docker-rancher-cassandra:3.1) I slightly change the behavior by
marking all nodes in the service as seeds (even the newly launched
node). Seed nodes do not need to wait for other nodes to be ready.
Hence, we can get around the requirement to launch one node at a time.
Note we are only able to get around this requirement when launching
nodes into a newly launched empty cluster. Once we have data in the
cluster, we need to allow nodes to stream data from peers so they can
start serving requests.
If you Execute Shell in one of the containers and run *nodetool status
y*ou will see that all your nodes are in the same data center and more
problematically in the same rack. If you are launching nodes across
cloud providers or geographical regions, the latency will be such that
your cluster will not be very performant. In these scenarios, you have
to partition your nodes into data centers. Multi-data center clusters
are able to provide low latency writes (and potentially low latency
reads based on consistency requirements) across high latency network
links because of asynchronous synchronization.
In addition to single data center, we are also using a single rack,
which is more problematic. We mentioned earlier that Cassandra has
configurable consistency guarantees. Essentially, you can specify how
many copies of your data are written and read to service requests. If
you split your cluster into racks, then copies of your data are spread
out across racks. However, if you have just one rack then copies of the
data are on arbitrary nodes. This means if you need to take nodes
offline it is very difficult to decide which nodes to take offline
without loosing ALL copies of SOME data. If you had racks, its
always possible to take down N/2 racks without loosing any data. i.e. if
you have 3 racks you can take one offline without data loss.
If you are using a cloud like AWS to run your containers, then it is
important to create a separate rack for each Availability Zone (AZ).
This is because Amazon guarantees that separate AZs have independent
power sources, and network hardware etc. If there is an outage in one AZ
it is likely to impact many nodes in that AZ. If your Cassandra cluster
is not rack aware, you can loose multiple nodes and potentially all
copies of some of your data. However, if you data is laid out in racks
and all nodes in one AZ are not available you are still guaranteed to
have redundant copies of your data.
To specify the Rack and Data Center, add the CASSANDRA_RACK and
CASSANDRA_DC parameters into your environment. We will need to create
separate services for each of the racks so that we can specify the
properties separately. You will also need to set the
CASSANDRA_ENDPOINT_SNITCH parameter to one of the snitches defined
The default SimpleSnitch is not able to process racks and data centers
but, the others such as the GossipingPropertyFileSnitch will be able
to setup clusters which are data center and rack aware. For example, the
docker-config.yaml below sets up a single data center (called
aws-us-east) and three rack cluster. You can copy the CassandraX section
to create more racks and data centers as needed. Note that you can use
scale to increase the number of nodes in each of the racks as needed.
Note that we still are limited by the requirement that only one node can
be launched per Rancher service (Rack/DC). Also note that since we use
the containers parent, Rancher service to determine seed nodes right now
to each of your racks is disjointed from the others, essentially each
rack is its own Cassandra cluster. In order to unify these into one
cluster, we will need to use a shared set of seed nodes. In the next
section, we will cover how to setup a separate shared seed service.
So far, we have used the default Rancher Cassandra seed behavior, which
means that every node will see all other nodes in its own Rancher
service as seed nodes. If you ran the docker-compose.yaml file in the
previous section, you will notice that your cluster is partitioned (i.e.
none of the racks know about each other). This is because nodes only
knows about seeds in its own Rancher service. Without shared seeds, the
racks do not know about each other. To get around this limitation, we
setup a separate seed service which will be used by all nodes in the
rack. We then use the RANCHER_SEED_SERVICE property to specify that
all members of this service will be considered seed nodes. A snippet of
the updated docker-compose.yaml file is shown below. The complete file
can be seen
Note that now that all your Rancher services are in one cluster, you can
only add one node to the entire cluster at a given time. Once you add a
node, you have to wait for it to be bootstrapped by its peers before we
can launch more nodes. This means that we can only launch one of the
non-seed nodes at a given time.
Today, we have covered how to setup a data center and rack aware
Cassandra cluster on top of Docker using Rancher. We saw that the
Rancher/Docker service model does not fit cleanly with Cassandra’s
seeding model and we had to use multiple services create a workable
solution. In addition, Cassandra only allows us to add one new node at
any given time which means we have to be careful about how we launch
services. We have to launch the seed service before the rack services.
We have to make sure we only scale up each rack service at any given
time. In addition, we can only scale up one rack services at a time.
This requirement would be familiar to anyone who runs Cassandra but is
at odds with most services running on Docker and Rancher.
Furthermore, containers are designed to be disposable and come up and
down often. With our Cassandra setup, we are assuming (and have to
ensure) that our containers are launched, and scaled in a coordinated
controlled fashion. For these reasons, running production database
workloads in Docker and Rancher is still a risky proposition and should
only be attempted if you are able to bring the commit database and
infrastructure expertise to this part of your infrastructure.
To learn more about running other types of applications in Rancher,
download our eBook: Continuous Integration and Deployment with Docker