— docker, best-practices, infrastructure, iac — 5 min read
Docker is a software framework for building, running, and managing containers on servers and in the cloud. Here are several best practices for using Docker in production to improve security, optimize image size, and write cleaner, more maintainable Dockerfiles.
Always use an official or verified base image when writing your Dockerfile. Let's say you are developing a Java application and want to build and run it as a Docker image. Don't take a base operating system image and install Java, Maven, and the other tools you need yourself:
FROM ubuntu

RUN apt-get update && \
    apt-get install -y openjdk-8-jdk && \
    apt-get install -y ant && \
    apt-get clean;
Instead, use the official Java image for your application. This will not only make your Dockerfile cleaner but also let you use an official, verified image that is already built using best practices.
FROM openjdk
As you can see from the script above, we have chosen OpenJDK as our base image. But when we build our application image from this Dockerfile, it will always use the latest tag of the OpenJDK image.
# Is the same as FROM openjdk:latest
FROM openjdk
The problem is that each build might pull a different image version than the previous one, and the new version may break things or cause unexpected behavior. The latest tag is unpredictable; we never know exactly which image we are getting. So instead of the latest tag, we should pin the version and be as specific as possible:
FROM openjdk:11-alpine
There are multiple official images of openjdk, not only with different version numbers but also with different operating system distributions. So which one should we choose, and does it even matter?
An image based on a full-blown operating system distribution like Ubuntu or CentOS has a bunch of tools already packaged in, which makes the image large. Most of the time, we don't need these tools in our application image.
In contrast, smaller images need less storage space in the image repository and on the deployment server, and of course they transfer faster when we pull or push them.
Besides size, there is another issue with images based on a full-blown operating system with lots of tools installed: security. Such a base usually contains hundreds of known vulnerabilities and creates a larger attack surface for your application image.
By using smaller images based on leaner operating system distributions, which bundle only the necessary system tools and libraries, we minimize the attack surface and build more secure images.
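If you want to verify the size difference yourself, one way (a quick sketch; exact sizes vary by tag and date, so no numbers are quoted here) is to pull both variants and compare:

$ docker pull openjdk:11
$ docker pull openjdk:11-alpine
$ docker images openjdk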
Every instruction in our Dockerfile is treated as an image layer. Layers are cached, and each one adds to the image size; as the number of layers increases, so does the size. It's always a good idea to combine RUN, COPY, and ADD instructions as much as possible, since they are the ones that create layers.
You can test this out with the docker history command:
$ docker images
REPOSITORY     TAG       IMAGE ID       CREATED          SIZE
dockerfile     latest    194f98552a02   37 seconds ago   218MB

$ docker history 194f98552a02

IMAGE          CREATED              CREATED BY                                      SIZE      COMMENT
194f98552a02   37 seconds ago       COPY . . # buildkit                             6.71kB    buildkit.dockerfile.v0
<missing>      37 seconds ago       RUN /bin/sh -c pip install -r requirements.t…   35.5MB    buildkit.dockerfile.v0
<missing>      About a minute ago   COPY requirements.txt . # buildkit              58B       buildkit.dockerfile.v0
<missing>      About a minute ago   WORKDIR /app
If we look at the logs above carefully, we can see that only the RUN, COPY, and ADD instructions add size to the image. We can reduce the image size by combining commands wherever possible. For example:
RUN apt-get update
RUN apt-get install -y openjdk-8-jdk
Can be combined into a single RUN command:
RUN apt-get update && apt-get install -y openjdk-8-jdk
This creates a single layer instead of two, which reduces the size of the final image.
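Taking this further, cleanup can live in the same RUN instruction, so temporary files never persist in a layer of their own. A minimal sketch (the rm -rf of the apt lists is an addition for illustration, not from the original example):

# update, install, and clean up in a single layer so the apt
# package cache never adds size to the final image
RUN apt-get update && \
    apt-get install -y openjdk-8-jdk && \
    rm -rf /var/lib/apt/lists/*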
Docker images are built from a Dockerfile, in which each instruction generates its own layer during the build process. Layers are also cached and reused across builds if no changes are detected.
Let's take a look at a Dockerfile based on a Node Alpine image:
FROM node:17.0.1-alpine

WORKDIR /app

COPY project /app

RUN npm install --production

CMD ["node", "src/index.js"]
As we discussed before, each instruction creates its own cached layer. Let's build this Docker image and see what happens.
Step 1/5 : FROM node:17.0.1-alpine
17.0.1-alpine: Pulling from library/node
Digest: sha256:959c4fc79a753b8b797c4fc9da967c7a81b4a3a3ff93d484dfe00092bf9fd584
Status: Downloaded newer image for node:17.0.1-alpine
 ---> c0fc1c9c473b
Step 2/5 : WORKDIR /app
 ---> Using cache
 ---> f665e3b63c98
Step 3/5 : COPY project /app
 ---> 8d4971fa2f3b
Step 4/5 : RUN npm install --production
 ---> Running in a5eac87912ce

up to date, audited 1 package in 371ms

found 0 vulnerabilities
Removing intermediate container a5eac87912ce
 ---> 9c21576cad06
Step 5/5 : CMD ["node", "src/index.js"]
 ---> Running in 1ff9c5bb72e7
Removing intermediate container 1ff9c5bb72e7
 ---> 9783eef2c1d3
Successfully built 9783eef2c1d3
Successfully tagged dockerfile:latest
The Docker image was built completely from scratch, so it took about a minute. Let's build it again and see what happens.
Step 1/5 : FROM node:17.0.1-alpine
 ---> c0fc1c9c473b
Step 2/5 : WORKDIR /app
 ---> Using cache
 ---> f665e3b63c98
Step 3/5 : COPY project /app
 ---> Using cache
 ---> 8d4971fa2f3b
Step 4/5 : RUN npm install --production
 ---> Using cache
 ---> 9c21576cad06
Step 5/5 : CMD ["node", "src/index.js"]
 ---> Using cache
 ---> 9783eef2c1d3
Successfully built 9783eef2c1d3
Successfully tagged dockerfile:latest
Just a few lines of logs this time. As you can see, "Using cache" appears on multiple lines, and the whole process took less than a second. This is the power of layer caching: nothing was built from scratch; every layer came from the cache.
The important thing is: if any layer has to be rebuilt because of a change in the source files, every subsequent layer is rebuilt from scratch too.
The best practice here is to order Dockerfile instructions from least to most frequently changing, to take full advantage of caching and optimize how fast the image gets built.
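For instance, the Node Dockerfile from earlier can be reordered so that the rarely changing dependency installation comes before the frequently changing source code. This is a sketch; it assumes the project keeps its package.json inside the project folder:

FROM node:17.0.1-alpine

WORKDIR /app

# dependency manifests change rarely; copy and install them first
# so these layers stay cached across most builds
COPY project/package*.json ./
RUN npm install --production

# source code changes often; copying it last means edits only
# invalidate the layers from this point onward
COPY project /app

CMD ["node", "src/index.js"]

Now editing the application source only invalidates the final COPY layer, and npm install is replayed from cache.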
We should use a .dockerignore file to list all the files and folders that we want to exclude from the build context. We can create the .dockerignore file in the root directory and list everything we want to ignore.
When building the image, Docker will look at its contents and ignore anything specified inside. Matching is done using Go's filepath.Match rules.
A sample .dockerignore file would look like this:
# ignore .git and .cache folders
.git
.cache

# ignore all markdown files
*.md

# ignore sensitive files
private.key
settings.json
Let's assume there are some contents in our project that we need during the build process but not in the final image that runs the application.
For example, in a Java-based application, we need the JDK to compile the source code, but the JDK is not needed to run the application. We also use build tools like Maven or Gradle to build the application, and those are not needed in the final image either.
Multi-stage builds allow us to use multiple intermediate images during the build process while keeping only the last stage as the final image. Let's see how it's done.
# Build stage
FROM tomcat AS build

RUN apt-get update \
    && apt-get -y install maven

WORKDIR /app

COPY project /app

RUN mvn package

# Runtime stage
FROM tomcat

COPY --from=build /app/target/file.war /usr/local/tomcat/webapps

EXPOSE 8080

ENTRYPOINT ["java", "-jar", "/usr/local/tomcat/webapps/file.war"]
Let's also look at the size comparison between the single-stage and multi-stage builds:
REPOSITORY      TAG       IMAGE ID       CREATED          SIZE
docker-single   latest    8d6b6a4d7fb6   16 seconds ago   259MB
docker-multi    latest    813c2fa9b114   3 minutes ago    156MB
By default, Docker runs container processes as root inside the container. This is a bad practice, because a process running as root inside the container is effectively running as root on the Docker host. So if an attacker gains access to our container, they get all those root privileges and can mount several attacks against the Docker host.
To prevent this, we should run container processes as a non-root, less privileged user.
...

# create group and user
RUN groupadd -r amar && useradd -g amar amar

# set ownership and permissions
RUN chown -R amar:amar /app

# switch to user
USER amar

...
Some base images already have a generic user bundled in, which we can use. For example, the node image already bundles a user called node.
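A minimal sketch of how that bundled user could be used in the Node image from earlier (the --chown flag and the /app path are illustrative choices, not from the original):

FROM node:17.0.1-alpine

WORKDIR /app

# copy files owned by the bundled non-root "node" user
COPY --chown=node:node project /app

RUN npm install --production

# drop root privileges before the container starts
USER node

CMD ["node", "src/index.js"]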
Once we build the image, we should scan it for security vulnerabilities using the docker scan command. We need to be logged in to Docker Hub to run docker scan on our images.
$ docker scan hello-world

Testing hello-world...

Organization:      docker-desktop-test
Package manager:   linux
Project name:      docker-image|hello-world
Docker image:      hello-world
Licenses:          enabled

✓ Tested 0 dependencies for known issues, no vulnerable paths found.

Note that we do not currently have vulnerability data for your image.
In the background, Docker uses a service called Snyk to do the vulnerability scanning of the images. The scan relies on a database of vulnerabilities that is constantly updated, so newly discovered vulnerabilities are added all the time for different images.
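docker scan also takes a few useful options. For example (assuming a Snyk-backed version of docker scan; check docker scan --help for the flags your version supports), passing the Dockerfile gives base image remediation advice, and --severity filters the report; myapp:latest below is a hypothetical image tag:

$ docker scan --file Dockerfile myapp:latest
$ docker scan --severity high myapp:latest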
Follow all the practices mentioned above to make your Docker images leaner and more secure.
Lastly, thank you for reading this post. For more awesome posts, you can also follow me on Twitter — iamarpandey, Github — amarlearning.