Not so long ago, I presented a couple of sessions at the IOUG Collaborate 17 conference. During my session ‘Docker 101 for Oracle DBAs’ there were a number of questions about Docker images, the concept of layers, and their benefits. This post summarizes the discussion I had with the audience during that session.
What are Docker Images and how do they compare to VM Images?
For anyone who has used VMs, the concept of a VM image is not new. A Docker image serves the same purpose, in that it is used to create containers, but that is where the similarity ends. While a VM image is typically a single large file, a Docker image references a list of read-only layers that represent differences in the filesystem. These layers are stacked one over the other, as shown in the accompanying image, and form the basis of the container root filesystem. The Docker storage driver stacks and maintains the different layers, and it also manages the sharing of layers across images. This sharing makes building, pulling, pushing, and copying images fast and saves storage.
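To make the storage benefit of shared layers concrete, here is a minimal sketch in Python. It is a toy model, not how the storage driver is implemented: the image names, layer names, and sizes are all made up for illustration.

```python
# Toy model: each image is an ordered list of layer names (base layer first).
# All names and sizes below are hypothetical.
layer_sizes_mb = {
    "os-base": 200,
    "jdk": 350,
    "weblogic": 900,
    "tomcat": 60,
}

images = {
    "weblogic-app": ["os-base", "jdk", "weblogic"],
    "tomcat-app": ["os-base", "jdk", "tomcat"],
}


def size_without_sharing(images, sizes):
    """Storage needed if every image kept a private copy of each layer."""
    return sum(sizes[layer] for layers in images.values() for layer in layers)


def size_with_sharing(images, sizes):
    """Storage needed when each distinct layer is stored only once."""
    unique_layers = {layer for layers in images.values() for layer in layers}
    return sum(sizes[layer] for layer in unique_layers)


naive_mb = size_without_sharing(images, layer_sizes_mb)
shared_mb = size_with_sharing(images, layer_sizes_mb)
print(f"without sharing: {naive_mb} MB, with sharing: {shared_mb} MB")
```

In this made-up example the two images share their first two layers, so those layers are counted once instead of twice.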
How are Images used to create Containers?
When you spawn a container (docker run <image-name>), it gets its own thin writable container layer on top of the image’s read-only layers, and all changes are stored in this container layer. This means that multiple containers can share access to the same underlying image and yet have their own data state.
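The idea of a private writable layer over shared read-only layers can be sketched with Python's `collections.ChainMap`, which looks keys up from top to bottom but sends all writes to the first mapping. This is only an analogy: the paths and contents are invented, and real storage drivers work on files and also handle deletions, which this sketch ignores.

```python
from collections import ChainMap

# Read-only image layers, modeled as path -> content dictionaries.
# Paths and contents are hypothetical.
base_layer = {"/etc/os-release": "linux", "/app/config": "default"}
app_layer = {"/app/config": "tuned", "/app/server.jar": "v1"}


def new_container(*image_layers):
    """Return a container view: an empty writable layer over the image layers.

    image_layers are given top-most first, so upper layers shadow lower ones.
    """
    return ChainMap({}, *image_layers)


c1 = new_container(app_layer, base_layer)
c2 = new_container(app_layer, base_layer)

# The write lands in c1's own writable layer; the image layers are untouched.
c1["/app/config"] = "changed-in-c1"

print(c1["/app/config"])         # value from c1's writable layer
print(c2["/app/config"])         # value from the shared app layer
print(app_layer["/app/config"])  # the image layer itself is unchanged
```

Both containers read the same underlying dictionaries, yet only `c1` sees its own change, which mirrors the "shared image, separate data state" behaviour described above.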
When a container is deleted, all data stored in its writable layer is lost. For databases and data-centric apps, which require persistent storage, Docker allows mounting the host’s filesystem directly into the container. This ensures that the data persists even after the container is deleted, and that the data can be shared across multiple containers. Docker also allows mounting data volumes from external storage arrays and storage services like AWS EBS via its Docker Volume Plug-ins.
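As a small illustration of the host mount described above, the following Python sketch builds a `docker run` command that uses the `-v host_dir:container_dir` option. The image name and both directories are placeholders I made up; substitute your own.

```python
import shlex


def run_with_bind_mount(image, host_dir, container_dir):
    """Build a `docker run` command that mounts host_dir at container_dir."""
    return ["docker", "run", "-d", "-v", f"{host_dir}:{container_dir}", image]


# Hypothetical image name and paths, for illustration only.
cmd = run_with_bind_mount("my-database-image", "/data/db1", "/var/lib/data")
print(shlex.join(cmd))

# To actually start the container (requires a running Docker daemon):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Anything the container writes under the mounted directory ends up on the host, so it survives deletion of the container.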
How do I find the layers of an Image?
Older versions of Docker provided `docker images --tree`, which showed a tree view of all images and layers. Unfortunately, that option is no longer available, so we have to look for other options.
For illustration, I am going to use an Oracle WebLogic server image I built. This image was built using multiple other images, each with their own set of layers.
You can see the total size of the image, currently 1.62GB, and the image ID.
Next, we will use the `docker history <image>` command to see the layers of this image. The output shows all the layers that make up the weblogic image, but what are all these commands in the CREATED BY column? For this, we have to refer to the Dockerfile. Every instruction in the Dockerfile creates a new layer (instructions that only change metadata show up with a size of 0B), and since we used multiple Dockerfiles to create the different images, this view shows the aggregate of all instructions run to build the final image we are using. Here is a good reference on how to write good Dockerfiles.
While this information is good, it still doesn’t give us the hierarchical or tree view of images and layers. For this, we will use a hack. Run the following command to download an image from Docker Hub that prints the tree view we want.
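If you want to work with the `docker history` output in a script, one option is to ask for tab-separated fields with `--format` and parse them. The sketch below assumes a local Docker daemon for `image_history`; the sample text is hypothetical (made-up IDs and commands) and only demonstrates the parsing.

```python
import subprocess

# Go-template fields supported by `docker history --format`.
HISTORY_FORMAT = "{{.ID}}\t{{.Size}}\t{{.CreatedBy}}"


def parse_history(text):
    """Turn tab-separated `docker history` output into a list of layer dicts."""
    layers = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice so tabs inside the command stay in created_by.
        layer_id, size, created_by = line.split("\t", 2)
        layers.append({"id": layer_id, "size": size, "created_by": created_by})
    return layers


def image_history(image):
    """Run `docker history` for an image (requires a local Docker daemon)."""
    result = subprocess.run(
        ["docker", "history", "--no-trunc", "--format", HISTORY_FORMAT, image],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_history(result.stdout)


# Hypothetical sample, newest layer first, as `docker history` lists them.
sample = (
    "aaa111\t0B\t/bin/sh -c #(nop)  CMD [\"startWebLogic.sh\"]\n"
    "bbb222\t1.2GB\t/bin/sh -c ./install.sh\n"
)
for layer in parse_history(sample):
    print(layer["id"], layer["size"], layer["created_by"])
```

Each parsed entry corresponds to one Dockerfile instruction, so the `created_by` field is what you would match against your Dockerfiles.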
>> docker run --rm -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -t
This view shows the same set of layers as the history command, but it also shows the lineage. If I focus on just the hierarchy of images and layers used to build my Oracle WebLogic image, I see that I used 4 different Dockerfiles to build 4 different images. In fact, the last two images were derived from the same set of shared layers up to layer ‘bd54831efb16 Virtual Size: 1.2 GB’, after which the tree splits. Below is the hierarchy of images I built and used.
Now that I know the hierarchy of images being used, I can run the `docker history` command for each image and see its individual layers and the Dockerfile instructions used to create them.
Once I completed mapping all layers to their respective images, I ended up with the following breakdown.
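The mapping step described above can also be scripted. The sketch below is not the procedure I followed by hand, just one way to automate it, and it assumes you already have each image's layer IDs as a list ordered from the base layer up (note that `docker history` prints the newest layer first, and shows `<missing>` instead of an ID for layers that were not built locally). The image names and layer IDs are hypothetical.

```python
def layers_added_by_each_image(lineage):
    """Attribute each layer to the first image in the lineage that contains it.

    lineage is a list of (image_name, layer_ids) pairs ordered from the base
    image to the final image; each layer_ids list is ordered base layer first.
    """
    seen = set()
    breakdown = {}
    for image_name, layer_ids in lineage:
        breakdown[image_name] = [layer for layer in layer_ids if layer not in seen]
        seen.update(layer_ids)
    return breakdown


# Hypothetical image names and layer IDs, for illustration only.
lineage = [
    ("os", ["l1", "l2"]),
    ("jdk", ["l1", "l2", "l3"]),
    ("weblogic", ["l1", "l2", "l3", "l4", "l5"]),
]

breakdown = layers_added_by_each_image(lineage)
for image, added in breakdown.items():
    print(image, added)
```

Layers inherited from a parent image are credited to that parent, so each image is left with only the layers its own Dockerfile contributed.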
Success!!
If you want to learn more about Docker and other container formats, take a look at my blog – Containers Deep Dive – LXC vs Docker
Shameless Plug:
I currently work for Robin Systems, and we provide an excellent container-based platform for both stateless and persistent stateful applications, especially for big data applications and relational and NoSQL databases. This platform includes the 4 key components required to run stateful applications: containers (Docker and LXC), scale-out block storage, virtual networking, and orchestration. The platform can be deployed either on-premises on commodity hardware or on public clouds like AWS.
Try our free community edition!