Administrators who are new to Docker containers are often surprised to learn that a container’s native storage is non-persistent. When a container is removed, so too is the container’s storage.
Of course, containerized applications would be of very limited use if there were no way of enabling persistent storage. Fortunately, there are ways to implement persistent storage in a containerized environment. Although a container’s own native storage is non-persistent, a container can be connected to storage that is external to the container. This allows for the storage of persistent data, since this external storage is not removed when the container is removed.
The first step in implementing persistent storage for your containers is to determine the underlying type of storage system that you will use. There are three main options that are generally available: file system storage, block storage, and object storage. Below, I explain the differences between the three types and what each means when it comes to setting up storage for a containerized environment.
File System Storage
File system storage has been around for decades, and stores data as files. Each file is referenced by a filename, and typically has attributes associated with it. Some of the more commonly used file systems include NFS and NTFS.
When it comes to configuring a container to store data persistently, file system storage is one of the simplest options to implement. Probably the best-known example of file system storage, as it relates to containers, is host-based persistence.
The idea behind host-based persistence is really simple. Containers reside on a host server. This host server contains its own operating system, and its own file system. Containers can be configured to store persistent data within a dedicated folder on the host server’s file storage. Docker containers normally use a union file system to assemble the container layers into a cohesive file structure. Host-based persistence bypasses the union file system for the data that needs to be stored persistently, and stores that data using the same file system that is in use on the host.
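As a rough illustration of host-based persistence, the Python sketch below builds a `docker run` command that maps a folder on the host over a path inside the container, using Docker's `-v HOST:CONTAINER` bind-mount flag. The helper name `bind_mount_run_args`, the folder paths, and the image name are all assumptions made up for this example; they do not come from any particular deployment.

```python
import shlex


def bind_mount_run_args(host_dir, container_dir, image):
    """Build a `docker run` argument list that bind-mounts a host folder.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    if not host_dir.startswith("/") or not container_dir.startswith("/"):
        raise ValueError("this sketch expects absolute paths on both sides")
    # -v HOST:CONTAINER maps the host folder over the container path, so
    # writes to that path land on the host's file system rather than in
    # the container's union file system layers.
    return ["docker", "run", "-d", "-v", f"{host_dir}:{container_dir}", image]


# Example values are assumptions, not taken from a real environment.
args = bind_mount_run_args("/srv/appdata", "/var/lib/app", "example/app:latest")
print(shlex.join(args))
```

Anything the application writes under `/var/lib/app` would then survive the removal of the container, because it lives in `/srv/appdata` on the host.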
The primary problem with simple host persistence is that it undermines container portability. When host persistence is used, a resource that the container depends on (the persistent storage) resides outside of the container, on the host server’s native file system, so a container that is moved to a different host loses access to its data. To get around this problem, other flavors of host persistence have been created. For example, multi-host persistence uses a distributed file system to replicate persistent storage across multiple host servers.
The takeaway: File system storage is probably the most awkward match for containers because file systems were not originally designed with portability in mind. As I’ve noted, however, there are ways to implement container-friendly file storage systems; this is usually done by distributing a file system across multiple servers.
Block Storage
Block storage is another storage option for containers. As previously mentioned, file system storage organizes data into a hierarchy of files and folders. In contrast, block storage stores chunks of data in blocks. A block is identified only by its address. A block has no filename, nor does it have any metadata of its own. Blocks only become meaningful when they are combined with other blocks to form a complete piece of data.
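To make the idea of address-only blocks more concrete, here is a minimal Python sketch of a toy block device backed by an in-memory buffer. The `BlockDevice` class, the 4,096-byte block size, and the chosen block addresses are assumptions for illustration; a real block storage system is far more involved.

```python
import io

BLOCK_SIZE = 4096  # assumed block size for this sketch


class BlockDevice:
    """Toy block device: fixed-size blocks identified only by address."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self._disk = io.BytesIO(bytes(num_blocks * BLOCK_SIZE))

    def _check(self, address):
        if not 0 <= address < self.num_blocks:
            raise IndexError(f"no block at address {address}")

    def write_block(self, address, data):
        self._check(address)
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        self._disk.seek(address * BLOCK_SIZE)
        # Pad short writes with zero bytes so every block stays the same size.
        self._disk.write(data.ljust(BLOCK_SIZE, b"\0"))

    def read_block(self, address):
        self._check(address)
        self._disk.seek(address * BLOCK_SIZE)
        return self._disk.read(BLOCK_SIZE)


device = BlockDevice(num_blocks=8)
payload = b"x" * 5000      # larger than one block, so it spans two
addresses = [2, 5]         # the blocks need not be adjacent

for i, address in enumerate(addresses):
    device.write_block(address, payload[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])

# The device knows nothing about the payload. Something above it (a file
# system or a database) has to remember which blocks to combine, and how
# long the data really is.
restored = b"".join(device.read_block(a) for a in addresses)[:len(payload)]
```

Note that the device itself stores no filename and no length. The list of addresses and the payload size are kept by the caller, which is the role a file system or database plays on top of real block storage.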
Block storage is commonly used for database applications because of its performance. Block storage is also generally used to provide snapshotting capabilities, which allow a volume to be rolled back to a specific point in time, without having to restore a backup.
In the case of containers, block storage is sometimes implemented in the form of container-defined storage. Container-defined storage is a form of software-defined storage, but is specifically intended for use in a containerized environment. This storage is often implemented inside of a dedicated storage container.
Rancher Labs has introduced its own distributed block storage project, called Project Longhorn. The basic idea behind Longhorn is relatively simple. A storage system can contain numerous block storage volumes, and each of these volumes can only be mounted by a single host. That being the case, Longhorn seeks to partition block storage controllers into large numbers of smaller block storage controllers, each of which can be mapped to a different block storage volume. If all of these block storage volumes reside within a common pool of physical disks, then the Longhorn approach will allow an orchestration engine to create block storage volumes on an on-demand basis. For example, a block storage volume could conceivably be automatically created at the same time that a container is created.
The takeaway: Block storage is more flexible than file system storage, which makes it easier to adapt block storage for container environments. The only big challenge is making sure that block storage data is available across an environment composed of multiple hosts. This can be resolved through distributed storage.
Object Storage
Object storage works differently from file system storage or block storage. Rather than referencing data by a block address or a file name, data is stored as an object and is referenced by an object ID. The advantages of object storage are that it is massively scalable, and allows for a high degree of flexibility with regard to associating attributes with objects. The disadvantage to using object storage is that it does not perform as well as block storage.
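The object model can be sketched in a few lines of Python. The `ObjectStore` class below is a toy, in-memory stand-in invented for this example (it is not any vendor's API), but it shows the two defining traits: data is looked up by an object ID rather than by a path or block address, and arbitrary attributes can travel with each object.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store: a flat namespace keyed by object ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        # The store, not the caller, assigns the ID. There is no folder
        # hierarchy and no block address, only the ID.
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        # Returns the data together with whatever attributes were attached.
        return self._objects[object_id]


store = ObjectStore()
object_id = store.put(b"quarterly report", content_type="text/plain", owner="finance")
data, metadata = store.get(object_id)
```

Because the namespace is flat, a store like this can in principle be spread across many servers by partitioning on the object ID, which is one reason the model scales well.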
Because object storage is designed primarily for scalability, it is a popular choice for public cloud providers. Docker containers can be linked to object storage on Amazon Web Services or Microsoft Azure, but doing so requires that the containerized application be specifically designed to take advantage of object storage. Whereas a typical application might be designed to access data through file system or SCSI calls, object storage requires HTTP-based REST calls such as GET or PUT. As such, object storage should generally be saved for applications that need massively scaled storage, or storage that needs to traverse geographic boundaries.
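The difference in access style can be sketched with Python's standard `urllib.request` module. The endpoint `https://objects.example.com`, the bucket and key names, and the two helper functions are hypothetical placeholders. Real services such as those on Amazon Web Services or Microsoft Azure also require authentication, which this sketch leaves out, so the requests are only constructed here and never sent.

```python
from urllib.request import Request

ENDPOINT = "https://objects.example.com"  # hypothetical endpoint, not a real service


def put_request(bucket, key, data):
    """Build (but do not send) an HTTP PUT that would store an object."""
    return Request(
        f"{ENDPOINT}/{bucket}/{key}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def get_request(bucket, key):
    """Build (but do not send) an HTTP GET that would fetch an object."""
    return Request(f"{ENDPOINT}/{bucket}/{key}", method="GET")


upload = put_request("reports", "2024-q1", b"quarterly report")
download = get_request("reports", "2024-q1")
print(upload.get_method(), upload.full_url)
print(download.get_method(), download.full_url)
```

An application written against file system calls would instead open a path and read or write it directly, which is why existing applications usually have to be redesigned before they can use object storage.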
The takeaway: Object storage can be more complex to implement because it relies on REST calls, but the scalability that object storage provides makes it a good choice for container environments where massive scalability is a priority.
Brien Posey is a freelance technology author and 15-time Microsoft MVP. Prior to going freelance, Posey was a CIO for a national chain of hospitals and healthcare facilities. He also served as Lead Network Engineer for the United States Department of Defense at Fort Knox. In addition to Posey’s continued work in IT, Posey is in his third year of training as a commercial scientist-astronaut candidate.