Building your own Docker images (Dockerfiles)

2024-06-08

A Docker image is built by:

Writing a Dockerfile with instructions on how to construct the image
Running the docker build command and passing it the path to the directory containing the Dockerfile

A Dockerfile is a file that contains line-by-line instructions. This file is conventionally named "Dockerfile"; if it's named something else, you have to tell docker build where to find it with the -f option. For an example of a Dockerfile, below is the one that was used to create the bmcase/hello-world-app image used in a previous example:

FROM busybox:1.31.1
COPY script.sh /script.sh
CMD ["sh","/script.sh"]

Here's what each line does:

FROM: This indicates what base image to use. Base images will serve as the foundation for your image, and they might have some important functionality that your image needs. Base images are discussed more in "Selecting a base image" below.
COPY: This instruction copies files from the host (specifically, from the directory passed to docker build) into the file system of the image, similar to the Linux "cp" command. In this case, the file "script.sh" is copied into the root directory of the image and keeps the same filename.
CMD: This specifies the default command that is run when a container is started from the image. In this case, it runs sh /script.sh.
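
As mentioned earlier, a Dockerfile with a non-default name needs an extra argument. Here is a minimal sketch of that using the -f option of docker build. The file name Dockerfile.dev, the image name my-image, and the contents of script.sh are all placeholders I made up for illustration, and the build step is skipped if the Docker CLI isn't installed:

```shell
# Work in a throwaway directory so nothing on the host is overwritten.
ctx="$(mktemp -d)"

# Stand-in script (hypothetical contents; the real script.sh isn't shown here).
printf 'echo "hello from the container"\n' > "$ctx/script.sh"

# The same three instructions, saved under a non-default file name.
cat > "$ctx/Dockerfile.dev" <<'EOF'
FROM busybox:1.31.1
COPY script.sh /script.sh
CMD ["sh","/script.sh"]
EOF

# -f names the Dockerfile explicitly; the last argument is still the
# directory whose files are available to COPY.
if command -v docker >/dev/null 2>&1; then
  docker build -f "$ctx/Dockerfile.dev" -t my-image "$ctx" \
    || echo "build failed (is the Docker daemon running?)"
fi
```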

Selecting a base image

The most important consideration in selecting the base image for your FROM instruction is the dependencies your application will have.

If you're creating a Java application, you'll need to use a JVM and so will probably want to use an openjdk base image.
Similarly, if your application is written in Python, you'll want a base image that has a Python interpreter (such as the "python" base image).
If you're creating a web server or reverse proxy, you'll want nginx or something like it as your base image.

But sometimes your application will have no runtime dependencies. For example, if it is written in Go, then what you put in your image may be nothing more than a compiled Go binary. In this case, there are some much simpler base images available for you to use. You'll typically want to pick the smallest base image that still gives you the functionality you need, because a larger base image takes longer to download, uses more disk space, and ships software you never use.

Here are some selected base images, arranged from least to most complex:

scratch -> busybox -> alpine -> ubuntu (or debian, or centos, etc.)

scratch: an empty image, and therefore the smallest and least complex you can get. It would be the best choice for the aforementioned image containing a Go application, provided the binary is statically linked.
busybox: a lightweight image that still provides many of the common Unix/Linux command-line tools.
alpine: a very lightweight image built around busybox that adds a package manager (apk) and a large package library.
ubuntu: quite a bit larger than alpine; it contains the tools and file system layout of a minimal Ubuntu installation.
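
To make the scratch case concrete, here is a rough sketch. It assumes a hypothetical statically linked Go binary called my-app (for instance one built with CGO_ENABLED=0 go build -o my-app) sitting in a directory named by a made-up APP_DIR variable. None of those names come from this post, and the build only runs if both Docker and the binary are present:

```shell
# APP_DIR is a hypothetical variable: the directory that holds my-app.
# If it isn't set, fall back to an empty temporary directory.
APP_DIR="${APP_DIR:-$(mktemp -d)}"

# scratch contains no shell, so CMD has to use the JSON-array form,
# and the binary can't rely on shared libraries from the image.
cat > "$APP_DIR/Dockerfile" <<'EOF'
FROM scratch
COPY my-app /my-app
CMD ["/my-app"]
EOF

# Build only when the Docker CLI and the binary both exist.
if command -v docker >/dev/null 2>&1 && [ -f "$APP_DIR/my-app" ]; then
  docker build -t my-go-app "$APP_DIR" || echo "build failed"
else
  echo "skipping build: docker or $APP_DIR/my-app not found"
fi
```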

docker build

After creating your Dockerfile, you can build an image with it by using docker build. An example of how this is done:

docker build -t my-image .

The -t option names the image and, optionally, tags it. In this case, the image is just named "my-image"; since no tag is given, Docker applies the default tag "latest". If you wanted to use a tag to designate it as version 1.0.0, you could run docker build -t my-image:1.0.0 .
Notice that there is a dot (.) at the end of the command. It tells docker build to use whatever directory you're running the command from: that is where it looks for the Dockerfile and for any files named in COPY instructions. If you want it to build from a different directory, substitute the dot with a path to that directory.
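
The tagging and directory options above can be sketched like this. The Dockerfile contents and the temporary directory are stand-ins of mine, and the docker commands are skipped when Docker isn't installed:

```shell
# A separate directory holding the Dockerfile, to show a path replacing the dot.
ctx="$(mktemp -d)"
cat > "$ctx/Dockerfile" <<'EOF'
FROM busybox:1.31.1
CMD ["echo","hello"]
EOF

if command -v docker >/dev/null 2>&1; then
  # Name only: the image is stored as my-image:latest.
  docker build -t my-image "$ctx" || echo "build failed"

  # Name plus an explicit version tag.
  docker build -t my-image:1.0.0 "$ctx" || echo "build failed"

  # List the local images stored under the name my-image.
  docker image ls my-image || echo "could not list images"
fi
```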

Once the image is built, Docker will store it locally, allowing it to be used for other commands that do things with Docker images, such as docker run or docker push.

Try it out

Create a file called my-text.txt, and write anything you want in it. Then, in the same folder, create a file called Dockerfile and give it the below contents:

FROM busybox:1.31.1
COPY my-text.txt /my-text.txt
CMD ["cat","/my-text.txt"]

Build it with docker build -t file-regurgitator:0.0.1 . (if not running the command from the same folder, replace the dot at the end with a path to the folder containing your files), and then run it with docker run file-regurgitator:0.0.1. It'll display the text you wrote.
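
If you'd rather run the whole exercise as one script, here is a sketch of the steps above. The temporary directory and the sample text are my own choices, and --rm (which deletes the stopped container afterwards) is an addition that isn't part of the steps:

```shell
# Create a scratch folder and move into it, so the dot means "this folder".
dir="$(mktemp -d)"
cd "$dir"

# Any text will do here.
printf 'Anything you want goes here.\n' > my-text.txt

cat > Dockerfile <<'EOF'
FROM busybox:1.31.1
COPY my-text.txt /my-text.txt
CMD ["cat","/my-text.txt"]
EOF

# Build the image, then run it; it should print the contents of my-text.txt.
if command -v docker >/dev/null 2>&1; then
  docker build -t file-regurgitator:0.0.1 . \
    && docker run --rm file-regurgitator:0.0.1 \
    || echo "build or run failed (is the Docker daemon running?)"
fi
```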

docker push

docker push <image> pushes an image you built to a remote registry, such as Docker Hub.

To push Docker images to a registry, you need an account with access to it, and you must first run the "docker login" command to submit your credentials.

For Docker Hub, this means creating a free Docker Hub account (which I recommend). Create one at https://hub.docker.com/signup
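
Roughly, a push to Docker Hub looks like the sketch below. DOCKERHUB_USER is a hypothetical variable for your account name, "your-username" is only a placeholder, and the image is the file-regurgitator one from the exercise above. Docker Hub expects the image name to be prefixed with your account name, so the image is re-tagged first. Nothing is pushed unless you set DOCKERHUB_USER to a real account:

```shell
# Hypothetical placeholder; set DOCKERHUB_USER to your own account name.
DOCKERHUB_USER="${DOCKERHUB_USER:-your-username}"
LOCAL_IMAGE="file-regurgitator:0.0.1"
REMOTE_IMAGE="$DOCKERHUB_USER/$LOCAL_IMAGE"

echo "target: $REMOTE_IMAGE"

if [ "$DOCKERHUB_USER" != "your-username" ] && command -v docker >/dev/null 2>&1; then
  # docker login with no server argument logs in to Docker Hub.
  docker login \
    && docker tag "$LOCAL_IMAGE" "$REMOTE_IMAGE" \
    && docker push "$REMOTE_IMAGE" \
    || echo "login, tag, or push failed"
fi
```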

Layers in an image

When an image is built from a Dockerfile, each instruction produces an intermediate image. The result of each subsequent instruction is layered on top of the ones before it. Docker keeps a cache of these intermediate images so that later builds can skip work it has already done.

If Docker has previously built an image whose first instruction is exactly the same as the first one in the Dockerfile it is building now, then instead of building that layer again it uses the cached image.
If it has previously built one sharing the exact same two instructions as the first two in the Dockerfile, then it does the same for the second layer.
And so on.

For example, let's say that you have previously built an image from this Dockerfile (we'll call it the script Runner):

FROM busybox:1.31.1
COPY script.sh /script.sh
CMD ["sh","/script.sh"]

And then you later build another image with this Dockerfile (we'll call it the script Reader; notice that "cat" has replaced "sh"):

FROM busybox:1.31.1
COPY script.sh /script.sh
CMD ["cat","/script.sh"]

When building the Reader, Docker will see in the cache that it already has images for the first two layers, from when it built the Runner. But since the third instruction is different, it'll build a new image for the third layer.
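
You can watch this happen with something like the sketch below. The file names Dockerfile.runner and Dockerfile.reader, the image names, and the contents of script.sh are placeholders of mine. On the second build, Docker typically reports the first two steps as cached, though the exact wording of that output depends on your Docker version:

```shell
ctx="$(mktemp -d)"

# Stand-in script (hypothetical contents).
printf 'echo "hello from script.sh"\n' > "$ctx/script.sh"

# The Runner.
cat > "$ctx/Dockerfile.runner" <<'EOF'
FROM busybox:1.31.1
COPY script.sh /script.sh
CMD ["sh","/script.sh"]
EOF

# The Reader: identical except that "sh" becomes "cat" in the CMD line.
sed 's/"sh"/"cat"/' "$ctx/Dockerfile.runner" > "$ctx/Dockerfile.reader"

if command -v docker >/dev/null 2>&1; then
  # First build: fills the cache with the FROM, COPY, and CMD layers.
  docker build -f "$ctx/Dockerfile.runner" -t script-runner "$ctx" || echo "build failed"

  # Second build: FROM and COPY match the cache, only CMD is new.
  docker build -f "$ctx/Dockerfile.reader" -t script-reader "$ctx" || echo "build failed"
fi
```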

Behavior of the cache

Returning to the file-regurgitator example:

Open my-text.txt and make some noticeable changes in it
Then run docker build -t file-regurgitator:0.0.2 ., incrementing the version number in the tag
And then run docker run file-regurgitator:0.0.2

Notice the following:

Your changes are reflected in the output of the 0.0.2 image.
But you made no changes at all to the Dockerfile.

Since the Dockerfile's instructions were the same as before, you may have expected that it would re-use the cache at all steps, and so you wouldn't see any change to the output. But the text of the line from the Dockerfile isn't the only thing Docker uses to check the distinctness of the COPY instruction. It also calculates and remembers a hash of the file that was copied. So when the copied file is changed and a new image created, it compares the hashes of the files, detects the difference, and avoids using the cache.
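
The idea can be illustrated without Docker at all. The sketch below uses the standard cksum tool as a stand-in for whatever checksum Docker computes internally (that's an assumption for illustration, not Docker's actual algorithm): the instruction text never changes, but the checksum of the file does, and that difference is what rules out the cached layer.

```shell
# A temporary file standing in for my-text.txt.
file="$(mktemp)"

# Checksum of the file as it was during the first build.
printf 'first version\n' > "$file"
before="$(cksum < "$file")"

# Edit the file, then take the checksum again, as for the second build.
printf 'second version\n' > "$file"
after="$(cksum < "$file")"

if [ "$before" != "$after" ]; then
  echo "checksum changed: the cached COPY layer would not be reused"
else
  echo "checksum unchanged: the cached COPY layer could be reused"
fi
```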

When to avoid the cache

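There are situations in which Docker isn't smart enough to know that the cache shouldn't be used. This most often happens in images that build an application or install its dependencies as part of the build. For a RUN instruction, Docker compares only the text of the command, not what the command would download or produce, so the instruction can match the cache even though running it again would give a different result. An example would be any Dockerfile having a RUN instruction that does any of the following, or anything like these: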

npm install
mvn clean package
pip install -r requirements.txt

An example of such a Dockerfile will be discussed in a later page. Whenever the result of such a command may have changed, you'll want to have Docker ignore the cache.

Docker will disregard the cache if the --no-cache option is provided. For example: docker build --no-cache -t my-image . This forces Docker to run every instruction again instead of reusing cached layers.
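
Here is a small sketch of that situation. The python:3.12-slim base image, the unpinned "requests" dependency, and the image name are my own picks for illustration. Both builds use the identical Dockerfile; only the second one is forced to run pip install again:

```shell
ctx="$(mktemp -d)"

# Unpinned dependency: what pip downloads for it can change over time,
# even though this file and the Dockerfile stay the same.
printf 'requests\n' > "$ctx/requirements.txt"

cat > "$ctx/Dockerfile" <<'EOF'
FROM python:3.12-slim
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
EOF

if command -v docker >/dev/null 2>&1; then
  # Normal build: later rebuilds may reuse the cached RUN layer.
  docker build -t my-image "$ctx" || echo "build failed"

  # Forced rebuild: every instruction runs again, including pip install.
  docker build --no-cache -t my-image "$ctx" || echo "build failed"
fi
```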
