(July 2017, Feb 2018)
Docker is a container system. Docker containers are single-app rather than full system containers. In fact, a Docker container more closely resembles a single, isolated process (albeit one that may spawn child processes) than a traditional virtual machine. A container might spin up only to handle one or a handful of requests before exiting. Docker best matches applications that either are stateless or keep their state in an external store, like a database outside Docker.
Docker containers are light-weight, and operate using standard Linux technologies like cgroups and namespaces. Beyond neatly bundling existing technologies, Docker adds a powerful API for container administration.
Docker has a Docker Server that hosts containers. One Docker server/daemon instance runs on a box, and manages multiple containers. The same binary provides both server and client.† The client sends commands to the server. Optionally, a third component, the Docker Registry, stores images and metadata. A registry may service multiple Docker daemons on multiple hosts. There is a central public Docker registry (Docker Hub), but Docker also supports private registries.
The Docker client speaks to the server in one of three ways: via Unix socket, via unencrypted TCP (port 2375), or encrypted TCP (port 2376).
On systemd boxes, the default is a Unix socket, but one controlled by systemd socket activation (try systemctl status docker.socket).
Docker images use a “union” file system that starts each container instance from a known state. I.e., changes that occur in a container spun up from a master image do not affect new containers that use the same base. When a running container modifies a file, the change affects only a read-write layer that sits on top of and masks the base image. A Docker image may consist of numerous stacked (mostly read-only) layers that nevertheless present a coherent image to an instance that uses it. To an instance, the file system appears writable, but changes don’t affect the underlying layers. The union file system uses copy-on-write for efficiency.
So these are the core Docker things: the daemon/server, the client, the registry, images, and containers.
† Although one binary provides all functionality, various manual pages split the documentation. The dockerd(8) page covers daemon mode, for example, while the docker-run(1) page covers running containers.
The Docker project provides packages and good directions for several Linux distributions, including Debian and CentOS.
Verify the installation:
# docker run hello-world
# docker ps -a
# docker images -a
# docker pull fedora
# docker run -it fedora /bin/bash
[root@2ab31fa5597a /]# dnf update
[root@2ab31fa5597a /]# dnf install asterisk
# docker run -it alpine ash
# docker run -it --rm busybox:glibc
For development work, it’s handy to add your user account to the docker group:
# usermod -a -G docker paulgorman
The Docker install fails, and journalctl -xe shows:
dockerd: Error starting daemon: Error initializing network controller: list bridge addresses failed: no available network
Fix this by adding a bridge for Docker:
# ip link add name docker0 type bridge
# ip addr add dev docker0 172.17.0.1/16
# install docker-ce
The Docker daemon creates a number of iptables rules. It also sets the default policy on the FORWARD chain to DROP.
If we, for example, want KVM guests to have unfettered access to br0, do something like:
# iptables -I FORWARD -i br0 -o br0 -j ACCEPT
A lot of auxiliary and third-party tooling is available, much of it focusing on orchestration and cluster management.
The docker binary itself provides core container management tooling in two ways. The docker client lets us do things like run, stop, and inspect containers, while the dockerd daemon offers a powerful, flexible, and well-documented API.
By default, Docker isolates the containers to a bridge on the host called docker0.
Containers on the same host can talk to each other, but not to the external/real network.
Forward host ports into the bridge to connect containers to the outside.
Use the --bridge flag of dockerd(8) to specify a non-default bridge, or to disable networking altogether.
Disable inter-container communication with the --icc=false daemon flag.
Transience and consistency are selling points of containers, certainly of Docker containers. However, sometimes we need to persist data beyond one run of the container, or to share changes between running instances.
Through the years, Docker has offered various solutions for persistence. Initially, these included injecting data into the container at launch or mounting volumes over NFS. Those solutions proved inadequate. Docker 1.8 and earlier advocated “data only containers” — barebones containers that do nothing besides exposing a data volume. Docker 1.9 and above shifts the recommendation to the “volume” API:
$ docker volume create --name my_data
$ docker run -d -v my_data:/container/path/for/volume container_image my_command
$ docker volume ls
$ docker volume inspect volume_name
$ docker volume ls -f dangling=true
$ docker volume rm my_unwanted_data
Data volumes avoid the copy-on-write mechanism of regular containers, expose the data for management by the host, and can be accessed by multiple instances. The data exists outside Docker’s union file system. Note that this may cause challenges, e.g., file locking.
Another aspect of persistence is application configuration data. Ideally, a Dockerized application gets all its configuration in the form of environment variables passed to it by the container.
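As a sketch of that pattern, a containerized app can read everything it needs from the environment, falling back to development defaults when a variable is unset (the variable names here are hypothetical, for illustration):

```shell
#!/bin/sh
# Hypothetical app startup script: read config from the environment.
# "${VAR:-default}" uses the default only when VAR is unset or empty.
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"
echo "connecting to ${DB_HOST}:${DB_PORT}"
```

At runtime, pass the real values in with something like docker run -e DB_HOST=db.example.com -e DB_PORT=5432 my_image.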
Note that, because of the overlay filesystem, writes inside the container perform poorly, so extensive writes are discouraged, even if we don’t care about persisting them.
Logs are not written inside the container. See “Docker Logs” below.
Docker containers use a union file system (like union mounts from Plan 9, or a bit like qcow2 sparse images). This makes new container instances very cheap to create — in terms of disk space, a new container might only add a few KB on top of the space used by the underlying image.
Changes to an image accumulate in thin layers, presenting a “union” of the layers as one coherent file system to the container.
A Dockerfile specifies how to build a container.
FROM fedora:23
MAINTAINER Paul Gorman <firstname.lastname@example.org>
LABEL "thing"="important note" "another"="reference this later"
ENV astuser asterisk
RUN dnf -y update && dnf -y install asterisk && dnf clean all
ADD ./*.conf /etc/asterisk/
EXPOSE 5060-5061/tcp
EXPOSE 10000-20000/udp
USER $astuser
CMD ["/usr/sbin/asterisk"]
Assuming ‘Dockerfile’ is in our current directory, build the container with:
$ docker build --tag "my_build" .
By default, Docker runs processes in the container as “root”, unless changed with the “USER” instruction. Don’t run production containers as “root”. (Even though the container provides some isolation, the container still uses the host kernel, where we don’t want it mucking around as root!)
Because each command in the dockerfile adds a layer to the union filesystem, it’s good to combine lines like the dnf commands above.
If the setup instructions are extensive enough to create a lot of layers, consider pulling down a shell script with ADD and feeding it to RUN instead of including all the setup directly in the dockerfile.
The CMD instruction sets what gets run when the container starts. The dockerfile should contain only one CMD (or, anyhow, only the last one actually takes effect).
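A minimal sketch of that last-one-wins behavior (a hypothetical throwaway image):

```
FROM alpine
CMD ["echo", "first"]
CMD ["echo", "second"]
```

A container built from this prints only “second” when run.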
Run our newly-built container:
$ docker run -d -p 8080:8080 my_build
docker run is a convenience wrapper that masks two commands: first docker create to create the container, and then docker start to start it.
By default, Docker gives the container a name like peaceful_blackwell. Override the default with docker run --name "mycontainer".
“Tags” name image builds. “Names” name container instances. Container names must be unique per Docker host.
Labels let us apply arbitrary key/value metadata to images or individual containers.
Add labels during image creation and/or at container runtime.
See the labels on a container with docker inspect peaceful_blackwell.
Search containers for labels like:
$ docker ps -a -f label=deployer=Paul
To build an image, feed a dockerfile to the docker tool with the build subcommand.
Each command in the dockerfile generates an additional layer on top of the image, so it’s easy to understand how Docker composes the image.
Delete an unwanted container, then remove the image for that container:
# docker images -a
REPOSITORY               TAG      IMAGE ID       CREATED        SIZE
rocketchat/rocket.chat   latest   2cd49c2c326d   8 days ago     769MB
busybox                  glibc    2bbf44aed9f8   7 weeks ago    4.4MB
busybox                  latest   5b0d59026729   7 weeks ago    1.15MB
alpine                   latest   3fd9065eaf02   2 months ago   4.14MB
hello-world              latest   f2a91732366c   3 months ago   1.85kB
# docker rmi rocketchat/rocket.chat
Error response from daemon: conflict: unable to remove repository reference "rocketchat/rocket.chat" (must force) - container df3f9b43f8c7 is using its referenced image 2cd49c2c326d
# docker rm df3f9b43f8c7
df3f9b43f8c7
# docker rmi rocketchat/rocket.chat
Untagged: rocketchat/rocket.chat:latest
Untagged: rocketchat/rocket.chat@sha256:061dcb056431eccc6f7dce1e7ea400ccd31278dea2181c558b9c891bf3f0e141
Deleted: sha256:2cd49c2c326d8361fb8333db65e9bd0c551fb36ae3b64e2d8e534da8f5a4aafd
Deleted: sha256:d2ae5a7ae8b0a9526e20fbd8a4956ceffe5396934306d2f2736dcb3706eb327b
Deleted: sha256:7f11c62ba8995a6b7692fb3ff3a501984b068d2b454e409ff564b390ffb81903
Deleted: sha256:04f266c56df2117c1ecfad0c32711484c540f6949dace10dbb1e7fe5e8040c71
Deleted: sha256:e0f2864d8ad8234bf233bd3848be65e7a7358f2cfb3cd7e2792ca2c4c6aefc6f
Deleted: sha256:4bcdffd70da292293d059d2435c7056711fab2655f8b74f48ad0abe042b63687
Docker logs anything written to STDOUT or STDERR from inside a container. The logging method is configurable. By default, Docker logs to a per-container JSON file.
$ docker logs --since 1h 1bd4c783ad93
$ docker logs --follow 1bd4c783ad93
Docker saves the JSON files under /var/lib/docker/containers/.
With long-running or chatty containers, the defaults may be inadequate.
For example, Docker does not rotate logs by default, although docker run has the options --log-opt max-size and --log-opt max-file.
Other supported logging mechanisms include syslog and journald.
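Rather than per-container flags, the log driver and rotation can be set daemon-wide in /etc/docker/daemon.json (the values here are illustrative, not recommendations):

```
{
	"log-driver": "json-file",
	"log-opts": {
		"max-size": "10m",
		"max-file": "3"
	}
}
```

Restart the Docker daemon after editing the file; existing containers keep their old logging config until recreated.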
$ docker container ls
The daemon gets most (all?) of its config as command-line arguments. On systemd boxes, the command invocation happens in the service file. If we want to customize the service (on Debian):
# cp /lib/systemd/system/docker.service /etc/systemd/system/
# vim /etc/systemd/system/docker.service
# systemctl daemon-reload
# systemctl restart docker.service
The stock docker.service file only sets the method of client communication. With -H fd://, Docker expects the process that spawned it (i.e., systemd) to hand it an already-activated socket.
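A systemd drop-in is a tidier alternative to copying the whole unit file. A sketch that additionally listens on local TCP (remember: unencrypted TCP on 2375 should never face an untrusted network):

```
# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// -H tcp://127.0.0.1:2375
```

The empty ExecStart= line clears the original command before setting the new one. Then systemctl daemon-reload && systemctl restart docker.service.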
When a container starts, Docker copies various files from the host (such as /etc/resolv.conf) into /var/lib/docker/containers/mycontainer/, then bind mounts them into the container.
Override or augment this behavior with arguments to docker run.
Docker allocates CPU in terms of “shares”. Shares are relative weights, with 1024 as the default per container; a container allocated 512 shares gets half the CPU time of a default container when the two contend for CPU.
Configure this with the --cpu-shares argument to docker run.
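Since shares are weights, the split under contention is each container’s shares over the total. A back-of-the-envelope sketch (plain shell arithmetic, not a Docker command):

```shell
#!/bin/sh
# Two hypothetical containers contending for the CPU:
a=512    # container A: started with --cpu-shares 512
b=1024   # container B: default 1024 shares
total=$((a + b))
echo "A gets $((100 * a / total))%, B gets $((100 * b / total))%"
```

This prints “A gets 33%, B gets 66%” (integer truncation). When B is idle, A may use all the CPU; shares only matter under contention.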
Constrain memory with docker run -m 1g ….
This allocates RAM and a matching amount of swap.
Set swap separately with the --memory-swap argument.
Constrain IO with the --blkio-weight argument. Use a value between 10 and 1000 (default 500).
These constraints are enforced by cgroups.
Is it possible to adjust the constraints of a running container? By default, no.
docker stop doesn’t end a misbehaving container?
$ docker kill 1bd4c783ad93
Just like the system kill, docker kill can send other signals with --signal=HUP or whatever.
Remove an unwanted container with docker rm. Get the details of a container:
$ docker inspect 75625e1f51a0
But what’s going on right now? This is like top for running containers:
$ docker stats
$ docker exec -t -i 75625e1f51a0 /bin/bash
It’s also possible to use nsenter to directly break into the container namespace from the host.
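A sketch of the nsenter approach (assumes root on the host and a running container named mycontainer):

```
# pid=$(docker inspect --format '{{.State.Pid}}' mycontainer)
# nsenter --target $pid --mount --uts --ipc --net --pid /bin/sh
```

docker inspect --format '{{.State.Pid}}' reports the container’s init PID, and nsenter joins that process’s namespaces.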
We might expect to create a namespace like ip netns add foo, then run the container like docker run --netns=foo. That doesn’t work.
The next-best thing is to create the namespace like docker network create foo, and then docker run --network=foo.
However, ip netns list will not include networks created with docker network create. ip netns list looks for files in /var/run/netns/, but docker network create deletes its files from that directory, so ip netns list isn’t aware of them.
We could, if we had any reason to, expose a docker network namespace by re-linking the container’s /proc/$PID/ns/net file into /var/run/netns/.
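A sketch of that re-linking (assumes root and a running container named mycontainer):

```
# pid=$(docker inspect --format '{{.State.Pid}}' mycontainer)
# mkdir -p /var/run/netns
# ln -s /proc/$pid/ns/net /var/run/netns/mycontainer
# ip netns list
mycontainer
# ip netns exec mycontainer ip addr
```

Remove the symlink when done; it dangles after the container exits.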
It’s also possible to start a container with --network=none and afterwards attach it to a network namespace with a veth pair.
The atomic host concept involves a light-weight container-supervisor OS — a minimal, immutable OS image. The host configuration comes from the network — e.g., by cloud-init and OSTree. To update the host, simply swap out that OS image atomically, and let the new instance pull down its config from the network again.
Project Atomic is a Red Hat-based atomic host project. CoreOS and RancherOS are similar.
CentOS has Atomic Host builds available as ISO for bare-metal install, Amazon AMI image, and QCOW2 image for KVM.
Many technologies come together in Project Atomic: Docker, OSTree, and cloud-init, among others.
Before spinning up our first Atomic host, we need cloud-init in place to handle early initialization of the instance. Cloud-init does things like:
Cloud-init is not magic. Essentially, it creates config files in an ISO image that gets attached to a booting Atomic Host virtual machine.
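A minimal #cloud-config user-data sketch of the sort baked into that ISO (the values are illustrative examples):

```
#cloud-config
password: atomic
ssh_pwauth: True
chpasswd: { expire: False }
ssh_authorized_keys:
  - ssh-rsa AAAA... user@example.com
```

Together with a small meta-data file naming the instance, this gets packed into an ISO (e.g., with genisoimage) and attached to the VM as a CD-ROM, where cloud-init’s NoCloud datasource finds it at boot.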
UPDATE: As of 2018, Red Hat has acquired CoreOS, leaving it uncertain how CoreOS will merge with Project Atomic.
I expect that over the next year or so, Fedora Atomic Host will be replaced by a new thing combining the best from Container Linux and Project Atomic. This new thing will be “Fedora CoreOS” and serve as the upstream to Red Hat CoreOS.
Project Atomic is an umbrella project consisting of two flavors of Atomic Host (Fedora and CentOS) as well as various other container-related projects. Project Atomic as a project name will be sunset by the end of 2018 with a stronger individual focus on its successful projects such as Buildah and Cockpit.