Amar Prakash Pandey

Docker - the right way

docker, best-practices, infrastructure, iac · 5 min read


Docker is a software framework for building, running, and managing containers on servers and in the cloud. Here are several best practices for using Docker in production to improve security, optimize image size, and write cleaner, more maintainable Dockerfiles.


1. Use Official Docker Image as Base Image

Always use an official or verified base image when writing a Dockerfile. Let's say you are developing a Java application and want to build and run it as a Docker image. Instead of taking a base operating system image and installing Java, Maven, and the other tools you need for your application:

Dockerfile

FROM ubuntu

RUN apt-get update && \
    apt-get install -y openjdk-8-jdk && \
    apt-get install -y ant && \
    apt-get clean

Use the official Java image for your application. This not only makes your Dockerfile cleaner but also lets you use an official and verified image that is already built using best practices.

Dockerfile

FROM openjdk

2. Use specific Image Version

As you can see from the previous snippet, we have chosen OpenJDK as our base image. But when we build our application image from the above Dockerfile, it will always use the latest tag of the OpenJDK image.

Dockerfile

# Same as FROM openjdk:latest
FROM openjdk

The problem here is that we might get a different image version than in the previous build, and the new version may break things or cause unexpected behavior. The latest tag is unpredictable: we don't know exactly which image we are getting. So instead of the random latest tag, we should pin the version, and be as specific as possible with it.

Dockerfile

FROM openjdk:11-alpine
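When reproducibility matters even more, a tag can additionally be pinned to an image digest, which guarantees the exact same image bytes on every build. The digest below is a placeholder, not a real value:

```dockerfile
# <digest> is a placeholder; look up the real value with `docker images --digests`
FROM openjdk:11-alpine@sha256:<digest>
```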

3. Use a Small-Sized Official Image

There are multiple official images of openjdk, not only with different version numbers but also based on different operating system distributions. So which one should we choose, and does it even matter?

An image based on a full-blown operating system distribution like Ubuntu or CentOS comes with a bunch of tools already packaged in, which makes the image large. Most of the time, we don't need these tools in our application image.

In contrast, smaller images need less storage space in the image repository as well as on the deployment server, and of course we can transfer them faster when pulling or pushing from the repository.

In addition to size, there is a security issue with images based on a full-blown operating system with lots of tools installed: such a base usually contains hundreds of known vulnerabilities and creates a larger attack surface for your application image.

In comparison, by using smaller images based on leaner operating system distributions that bundle only the necessary system tools and libraries, we minimize the attack surface and build more secure images.
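To see the size difference concretely, pull two variants of the same OpenJDK version and compare the SIZE column of docker images (exact sizes vary by version, so none are quoted here):

```console
$ docker pull openjdk:11          # Debian-based variant
$ docker pull openjdk:11-alpine   # Alpine-based variant
$ docker images openjdk           # compare the SIZE column
```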


4. Minimize the Number of Layers

Every instruction in our Dockerfile is treated as an image layer, but only some layers add size to the image. As the number of size-adding layers increases, the image size increases too. It's always a good idea to combine RUN, COPY, and ADD commands as much as possible, since these are the instructions that create such layers.

You can test this out with the docker history command:

Console

$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED          SIZE
dockerfile   latest   194f98552a02   37 seconds ago   218MB

$ docker history 194f98552a02

IMAGE          CREATED              CREATED BY                                      SIZE     COMMENT
194f98552a02   37 seconds ago       COPY . . # buildkit                             6.71kB   buildkit.dockerfile.v0
<missing>      37 seconds ago       RUN /bin/sh -c pip install -r requirements.t…   35.5MB   buildkit.dockerfile.v0
<missing>      About a minute ago   COPY requirements.txt . # buildkit              58B      buildkit.dockerfile.v0
<missing>      About a minute ago   WORKDIR /app

Looking carefully at the logs above, we can see that only the RUN, COPY, and ADD commands add size to the image. We can reduce the image size by combining commands wherever possible. For example:

Dockerfile

RUN apt-get update
RUN apt-get install -y openjdk-8-jdk

Can be combined into a single RUN command:

Dockerfile

RUN apt-get update && apt-get install -y openjdk-8-jdk

Thus, creating a single layer instead of multiple, which reduces the size of the final image.
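Combining commands has a second benefit: files deleted by a later RUN instruction still occupy space in the earlier layer that created them, so cleanup only reduces the image size when it happens in the same layer. A sketch of the install-and-clean-up pattern:

```dockerfile
# Install and clean up in ONE layer; a separate `RUN rm -rf ...`
# afterwards would not shrink the layer that downloaded the package lists.
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-8-jdk && \
    rm -rf /var/lib/apt/lists/*
```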


5. Optimize Caching Image Layers

Docker images are built from a Dockerfile. Each instruction generates its own layer during the build process, and layers are cached and reused across builds when no changes are detected.

Let's take a look at a Dockerfile based on a Node Alpine image:

Dockerfile

FROM node:17.0.1-alpine

WORKDIR /app

COPY project /app

RUN npm install --production

CMD ["node", "src/index.js"]

As discussed before, each instruction creates its own cached layer. Let's build this Docker image and see what happens.

Console

Step 1/5 : FROM node:17.0.1-alpine
17.0.1-alpine: Pulling from library/node
Digest: sha256:959c4fc79a753b8b797c4fc9da967c7a81b4a3a3ff93d484dfe00092bf9fd584
Status: Downloaded newer image for node:17.0.1-alpine
 ---> c0fc1c9c473b
Step 2/5 : WORKDIR /app
 ---> Using cache
 ---> f665e3b63c98
Step 3/5 : COPY project /app
 ---> 8d4971fa2f3b
Step 4/5 : RUN npm install --production
 ---> Running in a5eac87912ce

up to date, audited 1 package in 371ms

found 0 vulnerabilities
Removing intermediate container a5eac87912ce
 ---> 9c21576cad06
Step 5/5 : CMD ["node", "src/index.js"]
 ---> Running in 1ff9c5bb72e7
Removing intermediate container 1ff9c5bb72e7
 ---> 9783eef2c1d3
Successfully built 9783eef2c1d3
Successfully tagged dockerfile:latest

The Docker image was built completely from scratch, so it took about a minute to build. Let's build it again and see what happens.

Console

Step 1/5 : FROM node:17.0.1-alpine
 ---> c0fc1c9c473b
Step 2/5 : WORKDIR /app
 ---> Using cache
 ---> f665e3b63c98
Step 3/5 : COPY project /app
 ---> Using cache
 ---> 8d4971fa2f3b
Step 4/5 : RUN npm install --production
 ---> Using cache
 ---> 9c21576cad06
Step 5/5 : CMD ["node", "src/index.js"]
 ---> Using cache
 ---> 9783eef2c1d3
Successfully built 9783eef2c1d3
Successfully tagged dockerfile:latest

This time the log is much shorter. As you can see, "Using cache" appears on multiple lines, and the whole process took less than a second. This is the power of layer caching: nothing was built from scratch, every layer came from the cache.

Importantly, if any layer has to be rebuilt because of a change in the source files, every subsequent layer is rebuilt from scratch too.

The best practice here is to order Dockerfile commands from least to most frequently changing to take advantage of caching; this way we can optimize how fast the image gets built.
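Applied to the Node example above, this means copying the rarely-changing dependency manifests and installing dependencies before copying the frequently-changing source code, so that editing a source file no longer invalidates the cached npm install layer. This sketch assumes the project has a standard package.json and package-lock.json:

```dockerfile
FROM node:17.0.1-alpine

WORKDIR /app

# Dependency manifests change rarely: copy them first
COPY project/package.json project/package-lock.json ./
RUN npm install --production

# Source code changes often: copy it last, so edits only invalidate this layer
COPY project /app

CMD ["node", "src/index.js"]
```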


6. Use .dockerignore to Exclude Files and Folders

We should use a .dockerignore file to list all the files and folders we want to exclude from the build context. Create the .dockerignore file in the root directory and list everything to ignore.

When building the image, Docker will look at its contents and ignore anything specified inside. Matching is done using Go's filepath.Match rules.

A sample .dockerignore file would look like:

.dockerignore

# ignore .git and .cache folders
.git
.cache

# ignore all markdown files
*.md

# ignore sensitive files
private.key
settings.json

7. Make use of Multi-stage builds

Let's assume there are some contents in our project that we need during the build process but not in the final image that runs the application.

For example, in a Java-based application we need the JDK to compile the Java source code, but the JDK is not needed to run the application. In addition, we use build tools like Maven or Gradle to build our Java application, and those are also not needed in the final image.

Multi-stage builds allow us to use multiple temporary images during the build process but keep only the last stage as the final image. Let's see how it's done.

Dockerfile

# Build stage
FROM tomcat AS build

RUN apt-get update \
    && apt-get -y install maven

WORKDIR /app

COPY project /app

RUN mvn package

# Runtime stage
FROM tomcat

COPY --from=build /app/target/file.war /usr/local/tomcat/webapps

EXPOSE 8080

ENTRYPOINT ["java", "-jar", "/usr/local/tomcat/webapps/file.war"]

Let's also look at the size comparison between the single-stage and multi-stage builds:

Console

REPOSITORY      TAG      IMAGE ID       CREATED          SIZE
docker-single   latest   8d6b6a4d7fb6   16 seconds ago   259MB
docker-multi    latest   813c2fa9b114   3 minutes ago    156MB
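A common variant of this pattern, sketched here assuming a standard Maven project layout, uses an official build-tool image for the build stage instead of installing Maven by hand; the runtime stage stays the same:

```dockerfile
# Build stage: the official Maven image bundles the JDK and Maven already
FROM maven:3.8-openjdk-11 AS build

WORKDIR /app
COPY project /app
RUN mvn package

# Runtime stage: only the built artifact is carried over;
# the tomcat image's default command deploys and serves the war
FROM tomcat
COPY --from=build /app/target/file.war /usr/local/tomcat/webapps/
EXPOSE 8080
```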

8. Use the Least Privileged User or Non-Root User

By default, Docker runs container processes as root inside the container. This is bad practice, since a process running as root inside the container may effectively run as root on the Docker host. Thus, if an attacker gains access to our container, they have all root privileges and can perform several attacks against the Docker host, such as:

  • Copying sensitive info from the host's filesystem to the container.
  • Executing remote commands.

To prevent this, we should run container processes as a non-root or less privileged user.

Dockerfile

...

# create group and user
RUN groupadd -r amar && useradd -g amar amar

# set ownership and permissions
RUN chown -R amar:amar /app

# switch to user
USER amar

...

Some base images already bundle a generic user which we can use. For example, the node image already bundles a user called node.
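For instance, the Node example from earlier can switch to the bundled node user directly; the --chown flag on COPY ensures the app files are readable by that user:

```dockerfile
FROM node:17.0.1-alpine

WORKDIR /app

# give ownership to the non-root user bundled with the node image
COPY --chown=node:node project /app
RUN npm install --production

# switch to the bundled non-root user
USER node

CMD ["node", "src/index.js"]
```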


9. Scan Images for Vulnerabilities

Once we build an image, we should scan it for security vulnerabilities using the docker scan command. We need to be logged in to Docker Hub to run it.

Console

$ docker scan hello-world

Testing hello-world...

Organization: docker-desktop-test
Package manager: linux
Project name: docker-image|hello-world
Docker image: hello-world
Licenses: enabled

✓ Tested 0 dependencies for known issues, no vulnerable paths found.

Note that we do not currently have vulnerability data for your image.

Under the hood, Docker uses a service called Snyk to do the vulnerability scanning of the images. The scan uses a database of vulnerabilities that is constantly updated, so newly discovered vulnerabilities are added all the time for different images.


Summary

  1. Use Official Docker Image as Base Image.
  2. Use specific Image Version.
    • Do not use a random latest image tag
    • Fixate the version
    • The more specific, the better
  3. Use a Small-Sized Official Image.
    • Base images should not be based on a full-blown OS
    • Use images based on a leaner, smaller OS distribution like Alpine
    • Full-blown operating systems introduce more security vulnerabilities
  4. Minimize the Number of Layers.
    • RUN, COPY, and ADD each create layers.
    • Each layer contains the difference from the previous layer.
    • Layers increase the size of the final image.
  5. Optimize Caching Image Layers.
    • Order Dockerfile commands from least to most frequently changing.
  6. Use .dockerignore to Exclude Files and Folders.
    • Use .dockerignore to explicitly exclude files and folders
  7. Make use of Multi-stage builds.
    • Multi-stage builds can decrease the size of our production images.
    • A smaller image size potentially means a smaller attack surface.
  8. Use the Least Privileged User or Non-Root User
  9. Scan Images for Vulnerabilities

Follow all the above-mentioned practices to make your Docker images leaner and more secure.


Lastly, thank you for reading this post. For more awesome posts, you can also follow me on Twitter — iamarpandey, Github — amarlearning.

© 2022 by Amar Prakash Pandey. All rights reserved.