(July 2017, Feb 2018)
Docker is a container system. Docker containers are single-app rather than full-system containers. In fact, a Docker container more closely resembles a single, isolated process (albeit one that may spawn child processes) than a traditional virtual machine. A container might spin up only to handle one or a handful of requests before exiting. Docker best matches applications that either are stateless or keep their state in an external store, like a database outside Docker.
Docker containers are light-weight, and operate using standard Linux technologies like cgroups and namespaces. Beyond neatly bundling existing technologies, Docker adds a powerful API for container administration.
Docker has a Docker Server that hosts containers. One Docker server/daemon instance runs on a box, and manages multiple containers. The same binary provides both server and client.† The client sends commands to the server. Optionally, a third component, the Docker Registry, stores images and metadata. A registry may serve multiple Docker daemons on multiple hosts. There is a central public Docker registry, but Docker also supports private registries.
The Docker client speaks to the server in one of three ways: via Unix socket, via unencrypted TCP (port 2375), or via encrypted TCP (port 2376).
On systemd boxes, the default is a Unix socket, but one controlled by systemd socket activation [1] [2] (try systemctl status docker.socket).
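For reference, a socket-activation unit for Docker looks something like the sketch below; the stock unit shipped by a given distribution may differ in details:

```ini
# /lib/systemd/system/docker.socket (sketch)
[Unit]
Description=Docker Socket for the API

[Socket]
ListenStream=/var/run/docker.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target
```

systemd listens on the socket itself, and starts the daemon on the first connection.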
Docker images use a “union” file system that starts each container instance from a known state. I.e., changes that occur in a container spun up from a master image do not affect new containers that use the same base. When a running container modifies a file, the change affects only a read-write layer that sits on top of and masks the base image. A Docker image may consist of numerous stacked (mostly read-only) layers that nevertheless present a coherent image to an instance that uses it. To an instance, the file system appears writable, but changes don’t affect the underlying layers. The union file system uses copy-on-write for efficiency.
So these are the core Docker things: the client, the server/daemon, and the registry.
† Although one binary provides all functionality, various manual pages split the documentation. The dockerd(8) page covers daemon mode, for example, while the docker-run(1) page covers running containers.
The Docker project provides packages and good directions for several Linux distributions, including Debian and CentOS.
Verify the installation:
# docker run hello-world
# docker ps -a
# docker images -a
# docker pull fedora
# docker run -it fedora /bin/bash
[root@2ab31fa5597a /]# dnf update
[root@2ab31fa5597a /]# dnf install asterisk
# docker run -it alpine ash
# docker run -it --rm busybox:glibc
For development work, it’s handy to add your user account to the docker
group:
# usermod -a -G docker paulgorman
Sometimes the Docker install fails, and journalctl -xe shows:
dockerd[18376]: Error starting daemon: Error initializing network controller: list bridge addresses failed: no available network
Fix this by adding a bridge for Docker:
# ip link add name docker0 type bridge
# ip addr add dev docker0 172.17.0.1/16
# install docker-ce
The Docker daemon creates a number of iptables rules. It also sets the default policy on the FORWARD chain to DROP. If we, for example, want KVM guests to have unfettered access to br0, do something like:
# iptables -I FORWARD -i br0 -o br0 -j ACCEPT
A lot of auxiliary and third-party tooling is available, much of it focusing on orchestration and cluster management.
The docker binary itself provides core container management tooling in two ways: the docker command-line client lets us do things like run, stop, and inspect containers, and the docker daemon offers a powerful, flexible, and well-documented API.
By default, Docker isolates containers to a bridge on the host called docker0.
Containers on the same host can talk to each other, but not to the external/real network.
Forward host ports into the bridge to connect containers to the outside.
See the --bridge flag in dockerd(8) to specify a non-default bridge, or to disable networking altogether. Disable inter-container communication with the --icc=false option.
Transience and consistency are selling points of containers, certainly of Docker containers. However, sometimes we need to persist data beyond one run of the container, or to share changes between running instances.
Through the years, Docker has offered various solutions for persistence. Initially, these included injecting data into the container at launch or mounting volumes over NFS. Those solutions proved inadequate. Docker 1.8 and earlier advocated “data-only containers” — barebones containers that do nothing besides expose a data volume. Docker 1.9 and above shifted the recommendation to the “volume” API:
$ docker volume create --name my_data
$ docker run -d -v my_data:/container/path/for/volume container_image my_command
$ docker volume ls
$ docker volume inspect volume_name
$ docker volume ls -f dangling=true
$ docker volume rm my_unwanted_data
Data volumes avoid the copy-on-write mechanism of regular containers, expose the data for management by the host, and can be accessed by multiple instances. The data exists outside Docker’s union file system. Note that sharing a volume between instances may cause challenges, e.g., with file locking.
Another aspect of persistence is application configuration data. Ideally, a Dockerized application gets all its configuration in the form of environment variables passed to it by the container.
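An entrypoint script can read such configuration from the environment with fallbacks. A sketch; DB_HOST and DB_PORT are hypothetical knobs that docker run -e DB_HOST=… would set, not settings of any real image:

```shell
#!/bin/sh
# Sketch: take configuration from environment variables, with defaults.
# DB_HOST and DB_PORT are hypothetical settings for illustration only.
db_host="${DB_HOST:-localhost}"
db_port="${DB_PORT:-5432}"
echo "would connect to ${db_host}:${db_port}"
```

The container image itself stays generic; each deployment differs only in the environment passed at docker run time.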
Note that, because of the overlay filesystem, writes inside the container perform poorly, so extensive writes are discouraged, even if we don’t care about persisting them.
Logs are not written inside the container. See “Docker Logs” below.
Docker containers use a union file system (like union mounts from Plan 9, or a bit like qcow2 sparse images). This makes new container instances very cheap to create — in terms of disk space, a new container might only add a few KB on top of the space used by the underlying image.
Changes to an image accumulate in thin layers, presenting a “union” of the layers as one coherent file system to the container.
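A toy sketch of the layering idea, using plain directories to stand in for layers (an illustration of the lookup rule, not real overlayfs):

```shell
# Two "layers": a read-only base and a writable upper layer.
demo=$(mktemp -d)
mkdir -p "$demo/base" "$demo/upper"
echo "from base" > "$demo/base/a.txt"
echo "from base" > "$demo/base/b.txt"
echo "modified"  > "$demo/upper/b.txt"   # copy-on-write: upper masks the base copy

# A union presents the layers as one tree: search top-down, first hit wins.
lookup() {
	for layer in upper base; do
		f="$demo/$layer/$1"
		[ -f "$f" ] && { cat "$f"; return; }
	done
}

lookup a.txt   # "from base" (falls through to the base layer)
lookup b.txt   # "modified" (the upper layer wins)
```

The base layer never changes, which is why many containers can cheaply share one image.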
A Dockerfile specifies how to build a container.
See docker-build(1).
FROM fedora:23
MAINTAINER Paul Gorman <paul@example.com>
LABEL "thing"="important note" "another"="reference this later"
ENV astuser=asterisk
RUN dnf -y update && dnf -y install asterisk && dnf clean all
ADD ./*.conf /etc/asterisk/
EXPOSE 5060-5061/tcp
EXPOSE 10000-20000/udp
USER $astuser
CMD ["/usr/sbin/asterisk"]
Assuming ‘Dockerfile’ is in our current directory, build the container with:
$ docker build --tag "my_build" .
By default, Docker runs processes in the container as “root”, unless changed with the “USER” instruction. Don’t run production containers as “root”. (Even though the container provides some isolation, the container still uses the host kernel, where we don’t want it mucking around as root!)
Because each command in the dockerfile adds a layer to the union filesystem, it’s good to combine lines like the dnf commands above. If the setup instructions are extensive enough to create a lot of layers, consider pulling down a shell script with ADD and feeding it to RUN instead of including all the setup directly in the dockerfile.
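A sketch of that pattern; setup.sh is a hypothetical script assumed to sit in the build context:

```dockerfile
FROM fedora:23
# One ADD and one RUN yield two layers, no matter how much setup.sh does.
ADD setup.sh /tmp/setup.sh
RUN /tmp/setup.sh && rm /tmp/setup.sh
CMD ["/usr/sbin/asterisk"]
```

The trade-off: the build no longer documents every setup step in the dockerfile itself.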
The CMD instruction sets what gets run when the container starts. The dockerfile only contains one CMD (or, anyhow, only the last one actually happens).
Run our newly-built container:
$ docker run -d -p 8080:8080 my_build
docker run is a convenience wrapper that masks two commands: first docker create to create the container, and then docker start to start it. By default, Docker gives the container a name like peaceful_blackwell (i.e., adjective_famousname). Override the default like docker run --name "mycontainer".
“Tags” name image builds. “Names” name container instances. Container names must be unique per Docker host.
Labels let us apply arbitrary key/value metadata to images or individual containers.
Add labels during image creation and/or at container runtime.
See the labels on a container with docker inspect peaceful_blackwell. Search containers for labels like:
$ docker ps -a -f label=deployer=Paul
To build an image, feed a dockerfile to the docker tool with the build subcommand. Each command in the dockerfile generates an additional layer on top of the image, so it’s easy to understand how Docker composes the image. See docker-build(1).
Delete an unwanted container, then remove the image for that container:
# docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
rocketchat/rocket.chat latest 2cd49c2c326d 8 days ago 769MB
busybox glibc 2bbf44aed9f8 7 weeks ago 4.4MB
busybox latest 5b0d59026729 7 weeks ago 1.15MB
alpine latest 3fd9065eaf02 2 months ago 4.14MB
hello-world latest f2a91732366c 3 months ago 1.85kB
# docker rmi rocketchat/rocket.chat
Error response from daemon: conflict: unable to remove repository reference "rocketchat/rocket.chat" (must force) - container df3f9b43f8c7 is using its referenced image 2cd49c2c326d
# docker rm df3f9b43f8c7
df3f9b43f8c7
# docker rmi rocketchat/rocket.chat
Untagged: rocketchat/rocket.chat:latest
Untagged: rocketchat/rocket.chat@sha256:061dcb056431eccc6f7dce1e7ea400ccd31278dea2181c558b9c891bf3f0e141
Deleted: sha256:2cd49c2c326d8361fb8333db65e9bd0c551fb36ae3b64e2d8e534da8f5a4aafd
Deleted: sha256:d2ae5a7ae8b0a9526e20fbd8a4956ceffe5396934306d2f2736dcb3706eb327b
Deleted: sha256:7f11c62ba8995a6b7692fb3ff3a501984b068d2b454e409ff564b390ffb81903
Deleted: sha256:04f266c56df2117c1ecfad0c32711484c540f6949dace10dbb1e7fe5e8040c71
Deleted: sha256:e0f2864d8ad8234bf233bd3848be65e7a7358f2cfb3cd7e2792ca2c4c6aefc6f
Deleted: sha256:4bcdffd70da292293d059d2435c7056711fab2655f8b74f48ad0abe042b63687
Docker logs anything written to STDOUT or STDERR from inside a container. The logging method is configurable. By default, Docker logs to a per-container JSON file.
$ docker logs --since 1h 1bd4c783ad93
$ docker logs --follow 1bd4c783ad93
Docker saves the JSON files in /var/lib/docker/containers/mycontainer/. With long-running or chatty containers, the defaults may be inadequate. For example, Docker does not rotate logs by default, although docker run has the options --log-opt max-size and --log-opt max-file.
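Each json-file entry is one JSON object per line, something like the sample below (the exact field names are an assumption from memory of the json-file driver):

```shell
# A sample json-file log entry, and a quick-and-dirty extraction of the
# message field with sed (a real tool would use a proper JSON parser).
line='{"log":"hello from the container","stream":"stdout","time":"2018-02-19T12:00:00.000000000Z"}'
printf '%s\n' "$line" | sed -E 's/.*"log":"([^"]*)".*/\1/'
```

docker logs does this unwrapping for us; the raw files matter mostly when shipping logs elsewhere.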
Other supported logging mechanisms include syslog and journald.
See --log-driver in docker-run(1).
Mostly in /var/lib/docker/.
And see:
$ docker container ls
The daemon gets most (all?) of its config as command-line arguments. On systemd boxes, the command invocation happens in the service file. If we want to customize the service (on Debian):
# cp /lib/systemd/system/docker.service /etc/systemd/system/
# vim /etc/systemd/system/docker.service
# systemctl daemon-reload
# systemctl restart docker.service
The default docker.service file only sets the method of client communication. With -H fd://, Docker expects the process that spawned it (i.e., systemd) to hand it an already-activated socket.
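An alternative to copying the whole unit file is a systemd drop-in (created with systemctl edit docker.service), which overrides just the settings we care about. A sketch; the extra dockerd flag here is illustrative:

```ini
# /etc/systemd/system/docker.service.d/override.conf
[Service]
# Clear the packaged ExecStart, then restate it with our options.
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --log-driver=journald
```

Drop-ins survive package upgrades, unlike an edited copy of the whole unit.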
When a container starts, Docker copies various files from the host (hostname, hosts, resolv.conf) to /var/lib/docker/containers/mycontainer/, then bind mounts them into the container. Override or augment this behavior with arguments to docker run: --hostname, --dns, --dns-search, --add-host.
Docker allocates CPU in terms of “shares”. Shares are relative weights, not hard caps: the default is 1024 per container, and the weights only matter when CPU is contended. Under contention, a container with 512 shares gets half the CPU of a container with 1024. Configure this with the --cpu-shares argument to docker run.
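The arithmetic under contention is just proportional weights. A sketch for two containers:

```shell
# Two containers contending for CPU: each gets shares/total of the
# contended CPU, where total is the sum of all runnable containers' shares.
a=1024
b=512
total=$((a + b))
echo "A: $((100 * a / total))% of contended CPU"   # 66%
echo "B: $((100 * b / total))% of contended CPU"   # 33%
```

When only one container is busy, it can use all the CPU regardless of its shares.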
Constrain memory with -m, like docker run -m 1g. This allocates RAM and a matching amount of swap. Set swap separately with --memory-swap. Constrain IO like --blkio-weight=500; use a value between 10 and 1000 (default 500). These constraints are enforced by cgroups. It’s possible to adjust the constraints of a running container. See docker-update(1).
By default, no: Docker does not restart exited containers. Set --restart, like --restart="on-failure:3".
What if docker stop doesn’t end a misbehaving container?
$ docker kill 1bd4c783ad93
Just like the system kill, docker-kill can send other signals with --signal=HUP or whatever.
Delete containers or images with docker rm or docker rmi.
$ docker inspect 75625e1f51a0
But what’s going on right now?
This is like top for running containers:
$ docker stats
$ docker exec -t -i 75625e1f51a0 /bin/bash
It’s also possible to use nsenter to directly break into the container namespace from the host.
Detach from a container’s interactive session (without stopping it) with Ctrl-p Ctrl-q.
We might expect to create a namespace like ip netns add foo, then run the container like docker run --netns=foo. That doesn’t work. The next-best thing is to create the network like docker network create foo, and then docker run --network=foo. However, ip netns list will not include foo. Why? ip netns list looks for files in /run/netns/, and docker network create deletes its files from /run/netns/, so ip netns list isn’t aware of them. We could, if we had any reason to, expose a docker network’s namespace by re-linking the /proc/$PID/ns/net file into /run/netns/. It’s also possible to start a container with --network=none and afterwards attach it to a network namespace with a veth pair.
Use docker pull. Grab the new image version, tear down the old container, and spin up a new container with the new image.
$ docker pull theimage
$ docker stop mycontainer
$ docker rm mycontainer
$ docker run -d --restart unless-stopped --name mycontainer theimage
The atomic host concept involves a light-weight container-supervisor OS — a minimal, immutable OS image. The host configuration comes from the network — e.g., by cloud-init and OSTree. To update the host, simply swap out that OS image atomically, and let the new instance pull down its config from the network again.
Project Atomic is a Red Hat-based atomic host project. CoreOS and RancherOS are similar.
CentOS has Atomic Host builds available as ISO for bare-metal install, Amazon AMI image, and QCOW2 image for KVM.
Many technologies come together in Project Atomic:
Before spinning up our first Atomic host, we need cloud-init in place to handle early initialization of the instance. Cloud-init does things like setting the hostname, creating users, and installing SSH public keys in ~/.ssh/authorized_keys.
Cloud-init is not magic. Essentially, it creates config files in an ISO image that gets attached to a booting Atomic Host virtual machine.
See https://paulgorman.org/technical/cloud-init.txt.html.
UPDATE: As of 2018, Red Hat has acquired CoreOS, leaving it uncertain how CoreOS will merge with Project Atomic.
I expect that over the next year or so, Fedora Atomic Host will be replaced by a new thing combining the best from Container Linux and Project Atomic. This new thing will be “Fedora CoreOS” and serve as the upstream to Red Hat CoreOS.
https://lwn.net/Articles/757878/
Project Atomic is an umbrella project consisting of two flavors of Atomic Host (Fedora and CentOS) as well as various other container-related projects. Project Atomic as a project name will be sunset by the end of 2018 with a stronger individual focus on its successful projects such as Buildah and Cockpit.
https://coreos.fedoraproject.org/