During my Docker learning journey, while going through blogs, tutorials, and conference videos, I noticed that most of them mention a number of anti-patterns. I would like to collate the ones I have learned so far and keep adding to the list as I learn more.
Data or Logs in Containers
Containers are ideal for stateless applications and are meant to be ephemeral (they live only for a short period of time). This means no data or logs should be stored in the container — otherwise, they’ll be lost when the container terminates. Instead, use volume mapping to persist them outside the containers. The ELK stack could be used to store and process logs. If managed volumes are used early in the testing process, remove them using the -v switch of the docker rm command.
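As a minimal sketch (the image name and host path are placeholders), map a host directory for logs and clean up anonymous volumes when the container is removed:

# Map a host directory so application logs survive the container
docker run -d --name web -v /var/log/myapp:/app/logs myapp:1.0

# Later, remove the container together with its anonymous volumes
docker stop web
docker rm -v web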
IP Addresses of Containers
Each container is assigned an IP address. Multiple containers communicate with each other to form an application; for example, an application deployed on an application server will need to talk to a database. Existing containers are terminated and new containers are started all the time.
Relying upon the IP address of the container would require constantly updating the application configuration and would make the application fragile. Instead, create services. This provides a logical name that can be referred to independently of the growing and shrinking number of containers, and it also provides basic load balancing.
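One way to do this, sketched with Docker Swarm services (the image and network names are placeholders), is to let containers reach the database by its service name rather than an IP address:

# Create an overlay network for the stack (swarm mode)
docker network create --driver overlay app-net

# Other services resolve the database by its name "db", not an IP address
docker service create --name db --network app-net postgres:15
docker service create --name web --network app-net -e DB_HOST=db myapp:1.0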
Run a Single Process in a Container
A Dockerfile has one CMD and ENTRYPOINT. Often, CMD will use a script that performs some configuration of the image and then starts the container's process. Don’t try to start multiple processes using that script: it makes managing your containers, collecting logs, and updating each individual process that much harder. It's important to follow the separation-of-concerns pattern when creating Docker images, so consider breaking the application up into multiple containers and managing them independently.
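A rough sketch of the idea (the jar path is illustrative):

# Anti-pattern: a start.sh that launches several daemons in one container
#   nginx && cron && java -jar /app/app.jar &
# Preferred: one process per container, managed independently
ENTRYPOINT ["java", "-jar", "/app/app.jar"]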
Don’t Use docker exec
The docker exec command starts a new command in a running container. This is useful for attaching a shell using docker exec -it {cid} bash. But other than that, the container is already running the process that it's supposed to be running.
Keep Your Image Lean
Create a new directory and include the Dockerfile and other relevant files in that directory. Also consider using a .dockerignore file to exclude logs, source code, etc. before creating the image. Make sure to remove any downloaded artifacts after they are unzipped. We can also prefer Docker multi-stage builds.
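For instance, a minimal .dockerignore sketch (the entries are only illustrative) that keeps build context and image small:

# .dockerignore — keep logs, sources and VCS data out of the build context
.git
*.log
src/
docs/
target/classes/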
Create Images From a Running Container
A new image can be created using the docker commit command. This is useful when changes have been made inside a running container. But images created this way are non-reproducible. Instead, make the changes in the Dockerfile, terminate existing containers, and start a new container with the updated image.
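A small sketch of the contrast (container and image names are placeholders):

# Anti-pattern: snapshot a hand-modified container into an image
docker commit my-container myapp:patched

# Preferred: encode the change in the Dockerfile and rebuild
docker build -t myapp:1.1 .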
Security Credentials in a Docker Image
Do not store security credentials in the Dockerfile. They are in clear text and checked into a repository, which makes them completely vulnerable. Use -e to specify passwords as runtime environment variables. Alternatively, --env-file can be used to read environment variables from a file. Another approach is to use CMD or ENTRYPOINT to specify a script that pulls the credentials from a third party and then configures your application.
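For example, a hedged sketch (variable name, file and image are placeholders) of passing credentials at runtime instead of baking them into the image:

# Pass a secret as a runtime environment variable
docker run -d -e DB_PASSWORD=changeme myapp:1.0

# Or read several variables from a file kept out of version control
docker run -d --env-file ./app.env myapp:1.0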
The latest Tag
Tagging the Docker image is important. If no tag is given, the image is tagged as latest, but that gives no assurance of whether it is actually the latest image or an old one. In a production environment it's better to tag images with a version.
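For instance (image name and version are placeholders):

# Tag the image with an explicit version instead of relying on :latest
docker build -t myapp:1.4.2 .
docker run -d myapp:1.4.2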
Impedance Mismatch
Don’t use different images, or even different tags, in the dev, test, staging, and production environments. The image that is the “source of truth” should be created once and pushed to a repo. That image should be used for different environments going forward. In some cases, you may consider running your unit tests on the WAR file as part of the Maven build and then creating the image. But any system integration testing should be done on the image that will be pushed to production.
Publishing Ports
Don’t use -P to publish all the exposed ports. This will allow you to run multiple containers and publish their exposed ports, but it also means that all the ports will be published. Instead, use -p to publish specific ports.
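A short sketch (ports and image are illustrative):

# Anti-pattern: publish every exposed port on random host ports
docker run -d -P myapp:1.0

# Preferred: publish only the port you need, on a known host port
docker run -d -p 8080:8080 myapp:1.0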
Treating Docker containers as virtual machines
We should think of a container as a simple, stateless, immutable, short-lived box that runs a single process and can be recreated again and again. Yet developers keep asking questions like: how do I SSH into a container? How do I get logs out of a container? How do I run multiple programs in a container?
If you regularly find yourself wanting to open SSH sessions to running containers in order to “upgrade” them, or to manually get logs or files out of them, you are definitely using Docker the wrong way and need to do some extra reading on how containers work.
Creating docker images with magic folders
Consider the code example below:
FROM alpine:3.4

RUN apk add --no-cache ca-certificates pciutils ruby ruby-irb ruby-rdoc && \
    echo http://dl-4.alpinelinux.org/alpine/edge/community/ >> /etc/apk/repositories && \
    apk add --no-cache shadow && \
    gem install puppet:"5.5.1" facter:"2.5.1" && \
    /usr/bin/puppet module install puppetlabs-apk

# Install Java application
RUN /usr/bin/puppet agent --onetime --no-daemonize

ENTRYPOINT ["java","-jar","/app/spring-boot-application.jar"]
Here the Dockerfile depends on the Puppet tooling and on whatever state your machine and its root access rights happen to be in at build time. If the image is built while Puppet is running against your machine, it is difficult to reproduce the same image again: Puppet may be down, or it may have been upgraded, so the image built today cannot be guaranteed to match the one built tomorrow. We should create images that do not depend on these magic folders.
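By contrast, a hedged sketch of a self-contained build (the base image, path, and artifact name are illustrative) where everything the image needs is copied in at build time:

FROM eclipse-temurin:17-jre-alpine

# Copy the pre-built artifact into the image instead of fetching it via Puppet
COPY target/spring-boot-application.jar /app/spring-boot-application.jar

ENTRYPOINT ["java", "-jar", "/app/spring-boot-application.jar"]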
Storing data inside containers
The ephemeral nature of container filesystems means you shouldn't be writing data within them. Persistent data created by your application's users, such as uploads and databases, should be stored in Docker volumes or it will be lost when your containers restart.
Avoid writing other kinds of useful data to the filesystem wherever possible. Stream logs to your container's output stream, where they can be consumed via the docker logs command, instead of dumping them to a directory that would be lost after a container failure.
Container filesystem writes can also incur a significant performance penalty when modifying existing files. Docker's use of the "copy-on-write" layering strategy means files that exist in lower filesystem layers are read from that layer, rather than your image's final layer. If a change is made to the file, Docker must first copy it into the uppermost layer, then apply the change. This process can take several seconds for larger files.
Storing zip, tar and other archives
It is generally a bad idea to add an archive (zip, tar.gz or otherwise) to a container image. It is certainly a bad idea if the container unpacks that archive when it starts, because it will waste time and disk space, without providing any gain whatsoever!
It turns out that Docker images are already compressed when they are stored on a registry and when they are pushed to, or pulled from, a registry. This means two things:
storing compressed files in a container image doesn’t take less space,
storing uncompressed files in a container image doesn’t use more space.
If we include an archive (e.g. a tarball) and decompress it when the container starts:
we waste time and CPU cycles, compared to a container image where the data would already be uncompressed and ready to use;
we waste disk space, because we end up storing both the compressed and uncompressed data in the container filesystem;
if the container runs multiple times, we waste more time, CPU cycles, and disk space each time we run an additional copy of the container.
If you notice that a Dockerfile is copying an archive, it is almost always better to uncompress the archive (e.g. using a multi-stage build) and copy the uncompressed files.
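A minimal sketch of that idea, assuming a hypothetical data.tar.gz shipped alongside the Dockerfile:

# Stage 1: unpack the archive once, at build time
FROM alpine:3.19 AS unpack
COPY data.tar.gz /tmp/
RUN mkdir -p /data && tar -xzf /tmp/data.tar.gz -C /data

# Stage 2: copy only the uncompressed files into the final image
FROM alpine:3.19
COPY --from=unpack /data /data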
Using the root user
Another common Dockerfile anti-pattern is using the root user to run your application or your commands. This can expose your container to security risks, as any malicious code or user can gain full access to your container and your host system. To avoid this, use a non-root user to run your application or your commands, and use the USER directive in your Dockerfile to specify it. You should also follow the least-privilege principle and grant only the necessary permissions to your user.
Not using BuildKit
BuildKit is a new backend for docker build. It’s a complete overhaul with a ton of new features, including parallel builds, cross-arch builds (e.g. building ARM images on Intel and vice versa), building images in Kubernetes Pods, and much more, while remaining fully compatible with the existing Dockerfile syntax. It’s like switching to a fully electric car: we still drive it with a wheel and two pedals, but internally it is completely different from the old thing.
If you are using a recent version of Docker Desktop, you are probably already using BuildKit, so that’s great. Otherwise (in particular, if you’re on Linux), set the environment variable DOCKER_BUILDKIT=1 and run your docker build or docker-compose command; for instance:

DOCKER_BUILDKIT=1 docker build . --tag test
When build times are compared, BuildKit generally takes less time than the classic builder.
Conflicting names for scripts and images
Avoid naming your scripts in a way that could conflict with other popular programs. Some folks will notice and be careful; others might not, and accidentally run the wrong thing.
This is particularly true with 2-letter commands, because UNIX has so many of them! For instance:
bc and dc (“build container” and “deploy container” for some folks, but also some relatively common text-mode calculators on UNIX)
cc (“create container” but also the standard C compiler on UNIX)
go (conflicts with the Go toolchain)
Building a Docker container image “on the fly” right before deployment
This is somewhat similar to the above antipattern, but goes beyond just doing a git clone directly into an image. This involves cloning, building, and then running the newly created image without ever pushing the image to an intermediary Docker registry.
This is an antipattern for several reasons. First off, pushing the image to a registry gives you a “backup” of the image. This confers several benefits, the most important of which is that you can easily do a “quick rollback” should your deployment fail. You simply pull that last functioning image and run that, then go fix the current deployment.
Additionally, many current container registries also offer the benefit of scanning your images for potential vulnerabilities. The value of this cannot be overstated – scanning a container image for vulnerabilities helps keep your data and your users safe.
Another reason to avoid this is that the newly created Docker image has not been tested at all. You should always test your images before deploying them, especially to a production environment.
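A rough sketch of the preferred flow (the registry URL and tag are placeholders): build and push once, test, then deploy by pulling the same image:

# Build, tag and push the image to a registry first
docker build -t registry.example.com/myapp:1.4.2 .
docker push registry.example.com/myapp:1.4.2

# Deployment then pulls the tested, registry-backed image
docker pull registry.example.com/myapp:1.4.2
docker run -d registry.example.com/myapp:1.4.2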
You can’t upgrade inside an unprivileged container
In a whole bunch of places you will be told not to install security updates when building your Docker image. But actually, you should.
In order to install security updates, you need to be running as root or some other privileged user. And it’s true, we should not run the container as root (one of the anti-patterns above).
But just because you’re installing security updates doesn’t mean your image needs to run as root. Behold the not-so-secret, quite obvious solution: first you install security updates, then you switch to another user:
FROM debian:buster

# Runs as root:
RUN apt-get update && apt-get -y upgrade

# Switch to non-root user:
RUN useradd --create-home appuser
WORKDIR /home/appuser
USER appuser

# Runs as non-root user:
ENTRYPOINT ["whoami"]
What if your base image already changes to a non-root user? That’s still not a problem: you can switch back and forth between different users throughout your Dockerfile. So you can switch to root, install security updates, and then switch back to the non-root user.
Just to demonstrate:
FROM debian:buster

# Switch to non-root user:
RUN useradd --create-home appuser
WORKDIR /home/appuser
USER appuser
RUN whoami

# Switch back to root.
USER root
RUN whoami

# Runs as non-root user:
USER appuser
RUN whoami
If we run this:
$ docker build .
Sending build context to Docker daemon  2.048kB
...
Step 4/9 : USER appuser
 ---> Running in bd9f962c3173
Removing intermediate container bd9f962c3173
 ---> 30c7b4932cfd
Step 5/9 : RUN whoami
 ---> Running in c763f389036f
appuser
Removing intermediate container c763f389036f
 ---> 305bf441eb99
Step 6/9 : USER root
 ---> Running in a7f1d6ae91b8
Removing intermediate container a7f1d6ae91b8
 ---> 5ac4d87a852f
Step 7/9 : RUN whoami
 ---> Running in 81f4bc596dad
root
Removing intermediate container 81f4bc596dad
 ---> 4bc187b4892a
Step 8/9 : USER appuser
 ---> Running in 08db9249418a
Removing intermediate container 08db9249418a
 ---> 651753d0a56e
Step 9/9 : RUN whoami
 ---> Running in c9fb60a9627d
appuser
...