What are we trying to do?
We would like a consistent environment for developing and deploying applications across devices so that, in theory, the code runs everywhere and just works - not "it works on my machine". Different devices may have different packages installed at different versions, and we don't want the application to break because we updated the system. We may also not want to configure servers on a development machine. We need a way to repeatably specify what a system should look like in order to run the application; we could then build such a system on any device and run the application inside it without worrying about what's outside it. We would also like to introduce separation between applications for security reasons - sandboxing
Operating system
An OS is fundamentally a resource manager, which allocates resources such as processor time, memory, files, or device access to different processes. The kernel is the actual program loaded by the bootloader which provides these services.
Virtualization and emulation
- Emulation: Software which imitates the behaviour of a piece of hardware
- Virtualization: Logical abstraction of the hardware
One approach would be to run multiple kernels, with a different environment running in each. The problem is that the kernel has special access to the CPU. When an x86 based system starts, the processor is in real mode and the kernel has full control of the hardware; it uses privileged instructions and registers that are not available in the unprivileged mode it switches to in order to run userspace code. This separation, in theory, limits what applications can do to interfere with each other or the system. It obviously causes a problem for trying to run two kernels at the same time: the second one also needs that privileged access, and even then it would overwrite the setup done by the first.
One way of addressing this is emulation: we go a step up and write a program which acts like a processor, then use it to interpret the binary files. The problem is that this is slow. An improvement is to natively execute the parts of the code which can safely run unprivileged, and replace the privileged instructions with a different set of instructions which emulates just that behaviour in our virtualization program. This is the difference between emulation and virtualization: virtualization is just an abstraction and doesn't require the hardware to be emulated. More recently, processor features like VT-x added instructions for entering a virtual execution mode which the guest kernel sees as running with full privileges.
Chroot
Another approach would be to only have one kernel running but try to separate the project from the rest of the system.
The chroot command changes the apparent root directory for the current process and its children. It is useful for trying to repair a system from a bootable drive and is also used to package software for a distribution's repositories
sudo debootstrap --variant=minbase jammy . http://archive.ubuntu.com/ubuntu/
This command fetches the Ubuntu jammy base system and writes the files to the current directory.
sudo chroot .
This command only changes the root path so that / now refers to what was the current directory.
We can still access other resources, such as processes and network devices:
ps -a
apt install iproute2 -y
ip addr
At first it may look like we've at least hidden the filesystem, we can't ls or cd above the root, but we can get around that.
apt install python3
>>> import os
>>> os.getcwd()
'/'
>>> os.listdir()
['proc', 'home', 'mnt', 'usr', 'sbin', 'media', 'srv', 'boot', 'opt', 'bin', 'lib32', 'libx32', 'tmp', 'root', 'run', 'lib64', 'lib', 'dev', 'etc', 'sys', 'var']
If we are currently in the root directory of the chroot, it looks as you would expect. But what if we chroot again?
>>> os.chroot('home')
>>> os.getcwd()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory
>>> os.listdir()
['proc', 'home', 'mnt', 'usr', 'sbin', 'media', 'srv', 'boot', 'opt', 'bin', 'lib32', 'libx32', 'tmp', 'root', 'run', 'lib64', 'lib', 'dev', 'etc', 'sys', 'var']
We use /home as an existing directory we can change the root to. Our current working directory did not change - this is also why we couldn't use the chroot command itself, which rather than just changing the root runs a command in the new root and returns. We are now at an invalid path, above the root, but we can still list it. Can we go higher?
>>> os.chdir('..')
>>> os.listdir()
['jail']
Since our working directory is outside the new root, we are no longer constrained to stay within it. Note that this particular method must be run from a root shell, otherwise we wouldn't be able to perform the chroot into the subdirectory.
Containers
A container is a sandboxed group of processes that are isolated from the host system. In the early 2000s Linux added namespaces, cgroups (control groups), and seccomp (secure computing mode), which are the main functionalities containers are built on, though they weren't really feature complete until around 2013.
Namespaces provide the ability to create separate sets of resources (of supported types) for different processes. For example, creating a new PID namespace for a process allows it and its children to have and interact with PID numbers separate from the system running in the default namespace.
Cgroups allow the kernel to limit resources, such as processor, memory, disk bandwidth, etc... for a group of processes.
Seccomp allows the kernel to restrict a process's access to system calls.
With the ability to create a separate set of common resources, limit resource usage, and block access to system calls which manipulate resources we don't have namespaces for, we have everything we need to isolate a set of processes from the rest of the system.
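As a rough sketch of namespaces in action, the unshare tool from util-linux can create them directly (run as root; --mount-proc remounts /proc so process listings only show the new PID namespace):
sudo unshare --pid --fork --mount-proc bash
ps aux   # only shows this bash and ps itself, with bash as PID 1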
Container vs virtualization
Every running container shares the same OS kernel with the host. This makes it easy to share resources between them, since they are all running within the same resource manager. In contrast, virtualization requires creating an emulated hardware interface the guest can use to access the outside world. While this has improved, it can still be a pain to do something as simple as sharing files with the guest. The higher overhead of virtualization also reduces performance.
The main downsides of containers are that, since they share the kernel, a kernel exploit can lead to a container escape, and that only resources which have namespace support can be isolated from the host and other containers. For example, time and the kernel keyring do not have namespaces. By default Docker blocks the relevant syscalls so that the container remains isolated; if they were allowed, changes to these resources would affect the host and other containers.
Images and containers
While a container is a running group of processes, an image is a filesystem (similar to our chroot) to use as the environment as well as other metadata such as the user to run as, working directory, environment variables, etc... A container can be thought of as a running instance of an image.
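For example, we can list the images stored locally and the containers currently running from them:
docker image ls
docker ps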
Running a container
- We can run a container using the official ubuntu image
docker run ubuntu
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
2ab09b027e7f: Pull complete
Digest: sha256:67211c14fa74f070d27cc59d69a7fa9aeff8e28ea118ef3babc295a0428a6d21
Status: Downloaded newer image for ubuntu:latest
- Notice that it doesn't find an image called ubuntu locally, so it pulls it from Docker Hub
- It then just exited - by default no input is attached, so the default command (bash) exits immediately
- We can run it interactively which will connect our shell's standard streams to the container
docker run -i ubuntu
ls
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
- But we don't get a shell prompt because bash is not aware it's running in a terminal
- Docker can be told to set up a pseudo-terminal with the -t flag
docker run -it ubuntu
root@a6b978a183b9:/# ls
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
- You can see that you are running as the root user in the container and the hostname is the container id
- If you look around the file system you will see that it's different to your system
- There is also only 1 process running - bash
root@d0d4b391e433:/# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   4624  3612 pts/0    Ss   04:32   0:00 /bin/bash
root           9  0.0  0.0   7060  1600 pts/0    R+   04:38   0:00 ps aux
Deleting containers
- Stopped containers are still around and can be started again
- -a will list stopped containers, by default only running containers are listed
docker ps -a
CONTAINER ID   IMAGE     COMMAND       CREATED          STATUS          PORTS     NAMES
d0d4b391e433   ubuntu    "/bin/bash"   31 seconds ago   Up 30 seconds             hardcore_euler
- We can delete containers we're done with using docker rm
- This works with either the container name or ID
docker rm d0d4b391e433
- Note that you need to stop a container before it can be deleted
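For example, using the auto-generated name from the listing above:
docker stop hardcore_euler
docker rm hardcore_euler
Alternatively, docker rm -f will stop and remove a running container in one step.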
Pruning containers
If you have multiple stopped containers that you want to clean up you can use prune, which will delete every stopped container
docker container prune
Naming containers
We can name our container to make it easier to refer to later. Names must be unique - we cannot run a new container with the same name before deleting the old one.
docker run -it --name=ubuntu-container ubuntu
Attaching and detaching from containers
We can use ctrl-p followed by ctrl-q to detach the terminal's streams from the container. We can reattach with
docker attach ubuntu-container
Starting and stopping containers
We can stop a container with docker stop and restart it with docker start. This is not a suspend - docker will send a SIGTERM to the container's main process, followed by a SIGKILL if it doesn't stop after a grace period.
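For example, with the named container from earlier - the -a and -i flags of docker start reattach the output and input streams:
docker stop ubuntu-container
docker start -ai ubuntu-container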
Executing a shell in a container
We can run a command in a running container with docker exec
docker exec -it ubuntu-container bash
Publishing ports
- Let's try running a server in the container
root@d0d4b391e433:/# apt update && apt install ncat
root@d0d4b391e433:/# ncat -l 1337
- And try connecting to it from the host
ncat 127.0.0.1 1337
Ncat: Connection refused.
- The container has a network namespace with its own interfaces, routing tables, firewall rules, etc...
- We need to publish ports for them to be accessible from outside the container
- Format: ip:host_port:container_port
- If the IP isn't specified, the ports will be published on every interface
- If the host port isn't specified, the container port will be mapped to a random host port
docker run -itp 1337:1337 ubuntu
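We can also bind to a specific address - for example, publishing only on loopback and using an arbitrary host port such as 8080:
docker run -itp 127.0.0.1:8080:1337 ubuntu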
Mounting
- We can mount part of the filesystem into the container to share files with it
- -v for volume
- -v /host_path:/container_path
docker run -itv ~/:/volume ubuntu
- Note that the slash after the tilde is required to make a valid tilde prefix - see tilde expansion in the bash manual
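Bind mounts also accept options after a further colon - for instance, to mount a hypothetical ~/project directory read-only inside the container:
docker run -itv ~/project:/volume:ro ubuntu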
Creating an image
To tell docker how to create an image we write a Dockerfile, which is a series of instructions for docker to follow to create the image.
Create a file called Dockerfile
The Dockerfile must start by inheriting from an existing image
FROM ubuntu
We can run commands to install dependencies
RUN apt update && apt install -y ncat
It's recommended to not run your application as root in the container
RUN useradd -ms /bin/bash app
USER app
We copy the application files into the container
COPY . /app
We can set the directory further commands should run in
WORKDIR /app
Document a port that should be published
EXPOSE 1337
And specify a command to run when the container starts
CMD ncat -l 1337
We can build the image using
docker build . -t test-image
[+] Building 48.8s (10/10) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 188B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:latest 0.0s
=> CACHED [1/5] FROM docker.io/library/ubuntu 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 32B 0.0s
=> [2/5] RUN apt update && apt install -y ncat 47.5s
=> [3/5] RUN useradd -ms /bin/bash app 0.6s
=> [4/5] COPY . /app 0.1s
=> [5/5] WORKDIR /app 0.1s
=> exporting to image 0.4s
=> => exporting layers 0.4s
=> => writing image sha256:d63525149978da6ac92d55fbdfde54f1be80aee2ce77ba542e7ecb45595776af 0.0s
=> => naming to docker.io/library/test-image
-t specifies a tag we can use to refer to the image later, rather than having to query the generated id
We can then run our image as before, note that EXPOSE in the dockerfile doesn't actually publish the port
docker run -itp 1337:1337 --name=test-container test-image
There are two different forms of the CMD instruction. Shell form
CMD ncat -l 1337
And exec form
CMD ["ncat", "-l", "1337"]
Shell form will be run with /bin/sh -c and will also interpolate environment variables.
Exec form will treat the first value as the program to run followed by arguments to pass.
If an entrypoint is defined with the ENTRYPOINT instruction then CMD in exec form will pass the values as arguments to that instead. This allows an image to specify e.g. a server to start with ENTRYPOINT and default arguments using CMD which can be overridden by the user.
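A minimal sketch of that pattern, reusing the ncat setup from above:
ENTRYPOINT ["ncat"]
CMD ["-l", "1337"]
Running docker run test-image -l 8080 would then replace the default arguments while keeping ncat as the program that is executed.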
Layers and caching
Images are stored as layers, each of which stores the changes made relative to the previous layer. Image layers are read-only; containers add another filesystem layer which is writable by that container. If a file from the image is modified, it is copied up into the container layer.
Each instruction in the Dockerfile creates a new layer, which can contain filesystem and metadata changes. Metadata includes the user to run as, working directory, environment variables, etc... We can see the layers using the following command
docker image history test-image
IMAGE CREATED CREATED BY SIZE COMMENT
d63525149978 About an hour ago CMD ["/bin/sh" "-c" "ncat -l 1337"] 0B buildkit.dockerfile.v0
<missing> About an hour ago EXPOSE map[1337/tcp:{}] 0B buildkit.dockerfile.v0
<missing> About an hour ago WORKDIR /app 0B buildkit.dockerfile.v0
<missing> About an hour ago COPY . /app # buildkit 149B buildkit.dockerfile.v0
<missing> About an hour ago USER app 0B buildkit.dockerfile.v0
<missing> About an hour ago RUN /bin/sh -c useradd -ms /bin/bash app # b… 334kB buildkit.dockerfile.v0
<missing> About an hour ago RUN /bin/sh -c apt update && apt install -y … 47MB buildkit.dockerfile.v0
<missing> 7 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 7 weeks ago /bin/sh -c #(nop) ADD file:c8ef6447752cab254… 77.8MB
<missing> 7 weeks ago /bin/sh -c #(nop) LABEL org.opencontainers.… 0B
<missing> 7 weeks ago /bin/sh -c #(nop) LABEL org.opencontainers.… 0B
<missing> 7 weeks ago /bin/sh -c #(nop) ARG LAUNCHPAD_BUILD_ARCH 0B
<missing> 7 weeks ago /bin/sh -c #(nop) ARG RELEASE 0B
Docker uses the layers to cache the build process: when building, docker only recreates the changed layer and all the layers after it. If we re-run the build you can see that the layers were found in the cache.
docker build . -t test-image
[+] Building 0.1s (10/10) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 188B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:latest 0.0s
=> [1/5] FROM docker.io/library/ubuntu 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 32B 0.0s
=> CACHED [2/5] RUN apt update && apt install -y ncat 0.0s
=> CACHED [3/5] RUN useradd -ms /bin/bash app 0.0s
=> CACHED [4/5] COPY . /app 0.0s
=> CACHED [5/5] WORKDIR /app 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:d63525149978da6ac92d55fbdfde54f1be80aee2ce77ba542e7ecb45595776af 0.0s
=> => naming to docker.io/library/test-image
Docker will rerun a step if its line in the Dockerfile changes; in the case of ADD and COPY it also checks the hashes of the files being copied into the container and will re-copy if they have changed. Also note that this cache is shared between images and is not keyed on the Dockerfile's location or the image tag.
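For example, with the test-image build above, changing any file in the build context invalidates the COPY layer and everything after it, while the earlier RUN layers stay cached:
touch newfile
docker build . -t test-image   # [2/5] and [3/5] stay CACHED, [4/5] COPY onwards is re-run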
Deploying a node app
Let's create a basic express server
Make a new node project with
npm init
Install express
npm install express
Add the following to src/index.js
const express = require("express");
const server = express();
server.get("/", (req, res) => {
res.send("Hello world");
});
server.listen(3000);
Add a start script to package.json
{
"name": "app",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"start": "node old/index.js"
},
"author": "",
"license": "ISC",
"dependencies": {
"express": "^4.18.2"
}
}
Check that the server is working by running
npm start
And visiting 127.0.0.1:3000 in a web browser
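Or from a terminal, assuming curl is installed:
curl 127.0.0.1:3000   # should respond with "Hello world"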
Now let's create a basic Dockerfile
FROM node
WORKDIR /app
COPY . .
RUN npm install
CMD ["npm", "start"]
docker build . -t nodeapp
docker run -itp 3000:3000 nodeapp
This works, but we can improve it.
Firstly, we are using a larger image than necessary, and have not specified a version which means that next time we build we could get a different result if there's an update.
Let's switch to the alpine3.17 tag
FROM node:alpine3.17
WORKDIR /app
COPY . .
RUN npm install
CMD ["npm", "start"]
We shouldn't run the process inside the container as root. The node image creates a user called node which we can switch to. We also need to chown the copied files so they are owned by the node user, which we can do with COPY's --chown flag.
FROM node:alpine3.17
USER node
WORKDIR /app
COPY --chown=node:node . .
RUN npm install
CMD ["npm", "start"]
We should also stop copying the node_modules folder into the container. We can add a .dockerignore file which contains patterns specifying files that should not be copied.
echo 'node_modules/' > .dockerignore
Next, notice that every time we change the source file we need to reinstall the dependencies, since we invalidated the layer cache.
echo -e "\n" >> src/index.js
docker build . -t nodeapp
[+] Building 2.7s (9/9) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 110B 0.0s
=> [internal] load metadata for docker.io/library/node:alpine3.17 0.9s
=> [1/4] FROM docker.io/library/node:alpine3.17@sha256:cc4e8f3d78a276fa05eae1803b6f8cbb43145441f54c828ab14e0c19dd95c6fd 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 27.77kB 0.0s
=> CACHED [2/4] WORKDIR /app 0.0s
=> [3/4] COPY . . 0.1s
=> [4/4] RUN npm install 1.4s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:7ea580abd1c10a42d81c81fa44045408bc6fedb4eda999851cdce7467510a524 0.0s
=> => naming to docker.io/library/nodeapp
We could copy the package.json and package-lock.json in first and run the install before copying the source.
FROM node:alpine3.17
USER node
WORKDIR /app
COPY --chown=node:node package.json package-lock.json .
RUN npm install
COPY --chown=node:node . .
CMD ["npm", "start"]
Notice that this time the npm install is cached
echo -e "\n" >> src/index.js
docker build . -t nodeapp
[+] Building 1.2s (10/10) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 196B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 54B 0.0s
=> [internal] load metadata for docker.io/library/node:alpine3.17 1.0s
=> [1/5] FROM docker.io/library/node:alpine3.17@sha256:cc4e8f3d78a276fa05eae1803b6f8cbb43145441f54c828ab14e0c19dd95c6fd 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 351B 0.0s
=> CACHED [2/5] WORKDIR /app 0.0s
=> CACHED [3/5] COPY --chown=node:node package.json package-lock.json . 0.0s
=> CACHED [4/5] RUN npm install 0.0s
=> [5/5] COPY --chown=node:node . . 0.1s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:f2833dc99e0d12bee5d8961ca6e4adc7aeef93441eb12d9b39db27b89bba2567 0.0s
=> => naming to docker.io/library/nodeapp
You may have noticed that node doesn't seem particularly happy with a keyboard interrupt, or with docker stopping the container.
> app@1.0.0 start
> node src/index.js
npm ERR! path /app
npm ERR! command failed
npm ERR! signal SIGTERM
npm ERR! command sh -c node src/index.js
npm ERR! A complete log of this run can be found in: /home/node/.npm/_logs/2023-04-26T12_25_00_931Z-debug-0.log
Node is not designed to run as PID 1; that role is usually reserved for an init system like systemd, which is responsible for setting up services and user sessions, reaping orphaned child processes, and forwarding signals, among other things. We can use dumb-init, a simplified init system that will handle the responsibilities of PID 1 correctly.
FROM node:alpine3.17
RUN apk add dumb-init
USER node
WORKDIR /app
COPY --chown=node:node package.json package-lock.json .
RUN npm install
COPY --chown=node:node . .
CMD ["dumb-init", "npm", "start"]
npm install will use semantic versioning to decide if it should update to a newer package version, and will update package-lock.json. We do not want this in the container - builds could suddenly start failing if a package updates. We can use npm ci to install exactly the versions in the package-lock, and pass --only=production to skip development dependencies
FROM node:alpine3.17
RUN apk add dumb-init
USER node
WORKDIR /app
COPY --chown=node:node package.json package-lock.json .
RUN npm ci --only=production
COPY --chown=node:node . .
CMD ["dumb-init", "npm", "start"]
We should set the environment variable NODE_ENV to production so that packages can use more efficient code rather than code designed for debugging. Not using production mode can also be a security issue if the software allows the client more access for debugging.
FROM node:alpine3.17
RUN apk add dumb-init
USER node
WORKDIR /app
ENV NODE_ENV production
COPY --chown=node:node package.json package-lock.json .
RUN npm ci --only=production
COPY --chown=node:node . .
CMD ["dumb-init", "npm", "start"]
You should delete temporary files (e.g. downloaded archives) in the same RUN command that creates them, so they are not added to the layer.
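As a sketch of this pattern on the Ubuntu-based image from earlier, the apt package lists can be cleaned up in the same layer that downloads them:
RUN apt update && apt install -y ncat && rm -rf /var/lib/apt/lists/*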
Multistage builds
We can use webpack to do a multistage build. First, install webpack as a dev dependency
npm install --save-dev webpack webpack-cli
We can now build the app using
npx webpack-cli --entry ./src/index.js --mode production --target node
This will bundle all the dependencies in a single file at ./dist/main.js
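We can check that the bundle runs on its own - it should listen on port 3000 as before:
node ./dist/main.js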
Add the dist directory to the .dockerignore as well
echo 'dist/' >> .dockerignore
Let's put this into a Dockerfile
FROM node:alpine3.17 as builder
USER node
WORKDIR /app
ENV NODE_ENV production
COPY --chown=node:node package.json package-lock.json .
RUN npm ci
COPY --chown=node:node . .
RUN npx webpack-cli --entry ./src/index.js --mode production --target node
FROM node:alpine3.17
RUN apk add dumb-init
USER node
WORKDIR /app
ENV NODE_ENV production
COPY --from=builder --chown=node:node /app/dist/main.js .
CMD ["dumb-init", "node", "/app/main.js"]
We create two images: one that we refer to as builder, which runs the build step, and another which runs the application. This allows us to publish a smaller image for the application. It also means that any secrets needed in the build process can be kept out of what is published.
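To see the effect, we can build the multistage Dockerfile and compare image sizes - nodeapp-multi here is just an illustrative tag:
docker build . -t nodeapp-multi
docker image ls   # compare the size of nodeapp-multi with the earlier nodeapp image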