Docker
============================================================================

(July 2017, Feb 2018)

Docker is a container system.
Docker containers are single-app rather than full system containers.
In fact, a Docker container more closely resembles a single, isolated process (albeit one that may spawn child processes) than a traditional virtual machine.
A container might spin up only to handle one or a handful of requests before exiting.
Docker best matches applications that either are stateless or keep their state in an external store, like a database outside Docker.

Docker containers are lightweight and operate using standard Linux technologies like cgroups and namespaces.
Beyond neatly bundling existing technologies, Docker adds a powerful API for container administration.

Docker has a Docker Server that hosts containers.
One Docker server/daemon instance runs on a box, and manages multiple containers.
Historically one binary provided both server and client; since Docker 1.12 the daemon ships as a separate `dockerd` binary.†
The client sends commands to the server.
Optionally, a third component, the Docker Registry, stores images and metadata.
A registry may service multiple Docker daemons on multiple hosts.
There is a central public Docker registry (Docker Hub), but Docker also supports private registries.

The Docker client speaks to the server in one of three ways: via Unix socket, via unencrypted TCP (port 2375), or encrypted TCP (port 2376).
On systemd boxes, the default is a Unix socket, but one controlled by systemd socket activation [[1]](http://0pointer.de/blog/projects/socket-activation.html) [[2]](https://stackoverflow.com/questions/43303507/what-does-fd-mean-exactly-in-dockerd-h-fd/43408869) (try `systemctl status docker.socket`).
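
The three transports correspond to the URL forms accepted by the client's `-H` flag (or the `DOCKER_HOST` environment variable). A minimal sketch, assuming the usual socket path `/var/run/docker.sock`; the helper name `docker_host_url` and the host `build01.example.com` are made up for illustration:

```shell
# docker_host_url TRANSPORT [HOST]: print the URL to hand to `docker -H`.
# (Hypothetical helper, not part of Docker.)
docker_host_url() {
    case "$1" in
        unix) printf 'unix:///var/run/docker.sock\n' ;;   # local Unix socket
        tcp)  printf 'tcp://%s:2375\n' "$2" ;;            # unencrypted TCP
        tls)  printf 'tcp://%s:2376\n' "$2" ;;            # encrypted TCP
        *)    printf 'unknown transport: %s\n' "$1" >&2; return 1 ;;
    esac
}

docker_host_url unix
docker_host_url tls build01.example.com
```

Something like `docker -H "$(docker_host_url tls build01.example.com)" --tlsverify version` would then query a remote daemon, provided that daemon listens on TCP and the TLS certificates are in place.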

Docker images use a "union" file system that starts each container instance from a known state.
I.e., changes that occur in a container spun up from a master image do not affect new containers that use the same base.
When a running container modifies a file, the change affects only a read-write layer that sits on top of and masks the base image.
A Docker image may be comprised of numerous stacked (mostly read-only) layers that nevertheless present a coherent image to an instance that uses it.
To an instance, the file system appears writable, but changes don't affect the underlying layers.
The union file system uses copy-on-write for efficiency.
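
The layering idea can be imitated with plain directories. This is only a toy model, not how Docker or its storage drivers work internally: one directory stands in for the read-only image layer, and each "container" gets its own writable directory on top. The function names are invented for the illustration.

```shell
# union_read LOWER UPPER FILE: print FILE from the topmost layer that has it.
union_read() {
    if [ -e "$2/$3" ]; then
        cat "$2/$3"     # the container's own layer masks the image
    else
        cat "$1/$3"     # fall through to the read-only image layer
    fi
}

# union_write UPPER FILE TEXT: writes only ever land in the upper layer.
# (A real union file system would first copy the existing file up.)
union_write() {
    printf '%s\n' "$3" > "$1/$2"
}

demo=$(mktemp -d)
mkdir "$demo/image" "$demo/container_a" "$demo/container_b"
printf 'from the image\n' > "$demo/image/motd"

# Container A changes the file; container B never touches it.
union_write "$demo/container_a" motd 'changed in container A'

a_view=$(union_read "$demo/image" "$demo/container_a" motd)
b_view=$(union_read "$demo/image" "$demo/container_b" motd)
image_content=$(cat "$demo/image/motd")
rm -rf "$demo"

echo "A sees: $a_view"       # A sees: changed in container A
echo "B sees: $b_view"       # B sees: from the image
```

Container A's change stays in its own layer, so container B and the image itself still show the original content.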

So these are the core Docker things:

- **Docker server** that launches and tears down containers
- **Docker client** that sends commands to a Docker server (the client is `docker`; the daemon is `dockerd`)
- **Docker images** contain all the files to launch a container, consisting of union filesystem layers
- **Docker container** a running instance launched from an image
- **Atomic host** a minimal OS for running containers

† Whether packaged as one binary or two, the manual pages split the documentation by function.
The `dockerd(8)` page covers daemon mode, for example, while the `docker-run(1)` page covers running containers.


Installation
----------------------------------------------------------------------------

The Docker project provides packages and good directions for several Linux distributions, including Debian and CentOS.

- https://docs.docker.com/install/linux/docker-ce/debian/
- https://docs.docker.com/install/linux/docker-ce/centos/

Verify the installation:

	# docker run hello-world
	# docker ps -a
	# docker images -a

	# docker pull fedora
	# docker run -it fedora /bin/bash
	[root@2ab31fa5597a /]# dnf update
	[root@2ab31fa5597a /]# dnf install asterisk

	# docker run -it alpine ash

	# docker run -it --rm busybox:glibc

For development work, it's handy to add your user account to the `docker` group (bearing in mind that members of that group effectively have root on the host):

	#  usermod -a -G docker paulgorman

---

**The Docker install fails**, and `journalctl -xe` shows:

	dockerd[18376]: Error starting daemon: Error initializing network controller: list bridge addresses failed: no available network

Fix this by adding a bridge for Docker:

```
#  ip link add name docker0 type bridge
#  ip addr add dev docker0 172.17.0.1/16
#  apt-get install docker-ce    # or `yum install docker-ce` on CentOS
```

The Docker daemon creates a number of `iptables` rules.
It also sets the default policy on the `FORWARD` chain to `DROP`.
If we, for example, want KVM guests to have unfettered access to `br0`, do something like:

```
#  iptables -I FORWARD -i br0 -o br0 -j ACCEPT
```

---


Tooling
----------------------------------------------------------------------------

A lot of auxiliary and third-party tooling is available, much of it focusing on orchestration and cluster management.
The `docker` binary itself provides core container management tooling in two ways:

- as a command-line tool, `docker` lets us do things like:
	- pull/push images from/to a registry
	- build container images
	- view Docker logs, even on a remote host
	- open a shell inside a container, even on a remote host
- the `docker` daemon offers a powerful, flexible, and well-documented [API](https://docs.docker.com/develop/sdk/examples/)


Networking
----------------------------------------------------------------------------

By default, Docker isolates the containers to a bridge on the host called `docker0`.
Containers on the same host can talk to each other and can reach the outside world through NAT on the host, but nothing outside can initiate a connection to them.
Publish (forward) host ports into the bridge to make a container's service reachable from outside.

See the `--bridge` flag in `dockerd(8)` to specify a non-default bridge, or to disable networking altogether.
Disable inter-container communication with the `--icc=false` daemon option.
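
Forwarding a host port into the bridge happens at `docker run` time with `-p`. A sketch only: the wrapper name `run_published` is hypothetical, and `nginx` is just an example of an image that listens on port 80.

```shell
# run_published HOST_PORT CONTAINER_PORT IMAGE
# Start IMAGE detached, forwarding HOST_PORT on the host to CONTAINER_PORT
# inside the container. (Hypothetical wrapper around `docker run -p`.)
run_published() {
    docker run -d -p "$1:$2" "$3"
}

# Example (needs a running Docker daemon):
#   run_published 8080 80 nginx
#   curl http://localhost:8080/
```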


Persistent Data
----------------------------------------------------------------------------

Transience and consistency are selling points of containers, certainly of Docker containers.
However, sometimes we need to persist data beyond one run of the container, or to share changes between running instances.

Through the years, Docker has offered various solutions for persistence.
Initially, these included injecting data into the container at launch or mounting volumes over NFS.
Those solutions proved inadequate.
Docker 1.8 and earlier advocated "data only containers": barebones containers that do nothing besides exposing a data volume.
Docker 1.9 and above shift the recommendation to the "volume" API:

	$ docker volume create --name my_data
	$ docker run -d -v my_data:/container/path/for/volume container_image my_command
	$ docker volume ls
	$ docker volume inspect volume_name
	$ docker volume ls -f dangling=true
	$ docker volume rm my_unwanted_data

Data volumes avoid the copy-on-write mechanism of regular containers, expose the data for management by the host, and can be accessed by multiple instances.
The data exists outside Docker's union file system.
Note that this may cause challenges, e.g., file locking.

- https://docs.docker.com/storage/volumes/
- https://docs.docker.com/engine/admin/volumes/volumes/
- https://docs.docker.com/storage/

Another aspect of persistence is application configuration data.
Ideally, a Dockerized application gets all its configuration in the form of environment variables passed to it by the container.

Note that, because of the overlay filesystem, writes inside the container perform poorly, so extensive writes are discouraged, even if we don't care about persisting them.

Logs are not written inside the container.
See "Docker Logs" below.


Union File System
----------------------------------------------------------------------------

Docker containers use a union file system (like [union mounts from Plan 9](https://en.wikipedia.org/wiki/Union_mount), or a bit like [qcow2 sparse images](https://paulgorman.org/technical/linux-virtualization.txt.html#qcow2-files)).
This makes new container instances very cheap to create — in terms of disk space, a new container might only add a few KB on top of the space used by the underlying image.

Changes to an image accumulate in thin layers, presenting a "union" of the layers as one coherent file system to the container.


Dockerfiles
----------------------------------------------------------------------------

A Dockerfile specifies how to build an image.
See `docker-build(1)`.

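	FROM fedora:23

	MAINTAINER Paul Gorman <paul@example.com>

	LABEL "thing"="important note" "another"="reference this later"

	# ENV needs a key and a value; the user name "asterisk" is an assumption.
	ENV astuser=asterisk

	# -y answers dnf's prompts, since the build is not interactive.
	RUN dnf -y update && dnf -y install asterisk && dnf clean all

	ADD ./*.conf /etc/asterisk/

	EXPOSE 5060-5061/tcp
	EXPOSE 10000-20000/udp

	USER $astuser

	# -f keeps Asterisk in the foreground so the container stays up.
	CMD ["/usr/sbin/asterisk", "-f"]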

Assuming `Dockerfile` is in our current directory, build the image with:

	$  docker build --tag "my_build" .

By default, Docker runs processes in the container as "root", unless changed with the "USER" instruction.
Don't run production containers as "root".
(Even though the container provides some isolation, the container still uses the host kernel, where we don't want it mucking around as root!)

Because each command in the dockerfile adds a layer to the union filesystem, it's good to combine lines like the `dnf` commands above.
If the setup instructions are extensive enough to create a lot of layers, consider pulling down a shell script with `ADD` and feeding it to `RUN` instead of including all the setup directly in the dockerfile.

The `CMD` instruction sets what gets run when the container starts.
Only one `CMD` takes effect; if the Dockerfile contains several, the last one wins.

Run a container from our newly built image:

	$  docker run -d -p 8080:8080 my_build

`docker run` is a convenience wrapper that masks two commands: first `docker create` to construct the container from the image, and then `docker start` to start it.

By default, Docker gives the container a name like `peaceful_blackwell` (i.e., `adjective_famousname`).
Override the default like `docker run --name "mycontainer"`.

"Tags" name image builds.
"Names" name container instances.
Container names must be unique per Docker host.

Labels let us apply arbitrary key/value metadata to images or individual containers.
Add labels during image creation and/or at container runtime.
See the labels on a container with `docker inspect peaceful_blackwell`.
Search containers for labels like:

	$  docker ps -a -f label=deployer=Paul
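
For that search to find anything, some container must carry the label. A minimal sketch of attaching a `deployer` label at run time and searching for it afterwards; both function names are hypothetical wrappers around `docker run --label` and `docker ps -f`:

```shell
# run_labeled DEPLOYER IMAGE [ARGS...]: start a detached container that
# carries a "deployer" label. (Hypothetical wrapper.)
run_labeled() {
    deployer=$1
    shift
    docker run -d --label "deployer=$deployer" "$@"
}

# find_by_deployer DEPLOYER: list containers, running or not, with that label.
find_by_deployer() {
    docker ps -a -f "label=deployer=$1"
}

# Example (needs a running Docker daemon):
#   run_labeled Paul my_build
#   find_by_deployer Paul
```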


Building a Docker Image
----------------------------------------------------------------------------

To build an image, feed a Dockerfile to the `docker` tool with the `build` subcommand.
Each command in the dockerfile generates an additional layer on top of the image, so it's easy to understand how Docker composes the image.

See `docker-build(1)`.


Managing Images and Containers
----------------------------------------------------------------------------

Delete an unwanted container, then remove the image for that container:

	# docker images -a
	REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
	rocketchat/rocket.chat   latest              2cd49c2c326d        8 days ago          769MB
	busybox                  glibc               2bbf44aed9f8        7 weeks ago         4.4MB
	busybox                  latest              5b0d59026729        7 weeks ago         1.15MB
	alpine                   latest              3fd9065eaf02        2 months ago        4.14MB
	hello-world              latest              f2a91732366c        3 months ago        1.85kB
	# docker rmi rocketchat/rocket.chat
	Error response from daemon: conflict: unable to remove repository reference "rocketchat/rocket.chat" (must force) - container df3f9b43f8c7 is using its referenced image 2cd49c2c326d
	# docker rm df3f9b43f8c7
	df3f9b43f8c7
	# docker rmi rocketchat/rocket.chat
	Untagged: rocketchat/rocket.chat:latest
	Untagged: rocketchat/rocket.chat@sha256:061dcb056431eccc6f7dce1e7ea400ccd31278dea2181c558b9c891bf3f0e141
	Deleted: sha256:2cd49c2c326d8361fb8333db65e9bd0c551fb36ae3b64e2d8e534da8f5a4aafd
	Deleted: sha256:d2ae5a7ae8b0a9526e20fbd8a4956ceffe5396934306d2f2736dcb3706eb327b
	Deleted: sha256:7f11c62ba8995a6b7692fb3ff3a501984b068d2b454e409ff564b390ffb81903
	Deleted: sha256:04f266c56df2117c1ecfad0c32711484c540f6949dace10dbb1e7fe5e8040c71
	Deleted: sha256:e0f2864d8ad8234bf233bd3848be65e7a7358f2cfb3cd7e2792ca2c4c6aefc6f
	Deleted: sha256:4bcdffd70da292293d059d2435c7056711fab2655f8b74f48ad0abe042b63687


Docker Logs
----------------------------------------------------------------------------

Docker logs anything written to STDOUT or STDERR from inside a container.
The logging method is configurable.
By default, Docker logs to a per-container JSON file.

	$  docker logs --since 1h 1bd4c783ad93
	$  docker logs --follow 1bd4c783ad93

Docker saves the JSON files in `/var/lib/docker/containers/mycontainer/`, where `mycontainer` stands for the full container ID.
With long-running or chatty containers, the defaults may be inadequate.
For example, Docker does not rotate logs by default, although `docker run` has the options `--log-opt max-size` and `--log-opt max-file`.
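
A sketch of turning rotation on for a single container. The sizes (three files of 10 MB each) are arbitrary example values, and the wrapper name `run_with_log_rotation` is hypothetical:

```shell
# run_with_log_rotation IMAGE [ARGS...]
# Start IMAGE detached, capping the default JSON log at three files of
# 10 MB each. (Hypothetical wrapper; the limits are example values.)
run_with_log_rotation() {
    docker run -d \
        --log-opt max-size=10m \
        --log-opt max-file=3 \
        "$@"
}

# Example (needs a running Docker daemon):
#   run_with_log_rotation my_build
```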

Other supported logging mechanisms include syslog and journald.
See `--log-driver` in `docker-run(1)`.


Q & A
----------------------------------------------------------------------------

### Where does Docker store stuff?

Mostly in `/var/lib/docker/`.

And see:

	$  docker container ls


### How is Docker itself configured? How do we set where it listens for client connections?

The daemon gets its config from command-line arguments and, optionally, from `/etc/docker/daemon.json`.
On systemd boxes, the command invocation happens in the service file.
If we want to customize the service (on Debian):

	#  cp /lib/systemd/system/docker.service /etc/systemd/system/
	#  vim /etc/systemd/system/docker.service
	#  systemctl daemon-reload
	#  systemctl restart docker.service

The default `docker.service` file only sets the method of client communication.
With `-H fd://`, Docker expects the process that spawned it (i.e., systemd) to hand it an already-activated socket.


### How does a container know which DNS server to use, etc.?

When a container starts, Docker copies various files from the host (`hostname`, `hosts`, `resolv.conf`) to `/var/lib/docker/containers/mycontainer/`, then bind mounts them into the container.
Override or augment this behavior with arguments to `docker run` — `--hostname`, `--dns`, `--dns-search`, `--add-host`.


### How do we constrain resources used by a container?

Docker allocates CPU in terms of relative "shares"; each container gets 1024 unless told otherwise.
Shares only matter under contention: a container with 512 shares gets half as much CPU time as a competing container with 1024, but may use the whole CPU when nothing else wants it.
Configure this with the `--cpu-shares` argument to `docker run`.
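
When several busy containers compete, each one's slice is roughly its own shares divided by the sum of everyone's shares. A back-of-the-envelope sketch with integer arithmetic (the function name `cpu_percent` is made up, and the real scheduler is less tidy than this):

```shell
# cpu_percent MINE [OTHER...]: approximate percentage of CPU for a container
# holding MINE shares while competing with containers holding OTHER shares.
cpu_percent() {
    mine=$1
    total=0
    for s in "$@"; do          # "$@" includes MINE itself
        total=$((total + s))
    done
    echo $((100 * mine / total))
}

cpu_percent 512 512        # two equal containers: prints 50
cpu_percent 512 1024       # 512 of 1536 shares:   prints 33
cpu_percent 512 1024 512   # 512 of 2048 shares:   prints 25
```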

Constrain memory with `-m`, like `docker run -m 1g …`.
This caps RAM _and_, unless told otherwise, allows a matching amount of swap on top.
Set the combined memory-plus-swap limit with `--memory-swap`.

Constrain IO like `--blkio-weight=500`.
Use a value between 10 and 1000 (default 500).

These constraints are enforced by cgroups.

It's possible to adjust the constraints of a running container.
See `docker-update(1)`.


### Will a container automatically restart?

By default, no.
Set a policy with `--restart`, like `--restart="on-failure:3"` (retry up to three times after a non-zero exit).


### What if `docker stop` doesn't end a misbehaving container?

	$  docker kill 1bd4c783ad93

Just like the system `kill`, `docker kill` can send other signals with `--signal=HUP` or whatever.


### How do we get rid of unwanted containers and images?

`docker rm` or `docker rmi`.


### What's up with this container?

	$  docker inspect 75625e1f51a0

But what's going on _right now_?
This is like `top` for running containers:

	$  docker stats


### How do we open a shell in a running container?

	$  docker exec -t -i 75625e1f51a0 /bin/bash

It's also possible to use `nsenter` to directly break into the container namespace from the host.


### How do we disconnect from a container's shell without letting the container die?

Ctrl-p Ctrl-q


### What if we want multiple containers to share a custom network namespace?

We might expect to create a namespace like `ip netns add foo`, then run the container like `docker run --netns=foo`.
That doesn't work.

The next-best thing is to create the namespace like `docker network create foo`, and then `docker run --network=foo`.
However, `ip netns list` will not include `foo`.
Why?
`ip netns list` looks for files in `/run/netns/`.
`docker network create` deletes its files from `/run/netns/`, so `ip netns list` isn't aware of them.
We _could_, if we had any reason to, expose a `docker-network` namespace by re-linking the `/proc/$PID/ns/net` file into `/run/netns/`.
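
A sketch of that re-linking, using the PID of a running container attached to the network. The helper name `expose_netns` and its optional third argument are invented for illustration; with the default directory it needs root.

```shell
# expose_netns CONTAINER NAME [NETNS_DIR]
# Link the network namespace of a running container into NETNS_DIR
# (default /run/netns) so that `ip netns` lists it as NAME.
# (Hypothetical helper.)
expose_netns() {
    dir=${3:-/run/netns}
    # Ask Docker for the PID of the container's main process.
    pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
    mkdir -p "$dir" &&
        ln -sfn "/proc/$pid/ns/net" "$dir/$2"
}

# Example (as root, with a running container called mycontainer):
#   expose_netns mycontainer foo
#   ip netns exec foo ip addr
```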

It's also possible to start a container with `--network=none` and afterwards attach it to a network namespace with a veth pair.


### How do we update our existing container?

Use `docker pull`.
Grab the new image version, tear down the old container, and spin up a new container with the new image.

```
$  docker pull theimage
$  docker stop mycontainer
$  docker rm mycontainer
$  docker run -d --restart unless-stopped --name mycontainer theimage
```


Atomic Hosts
----------------------------------------------------------------------------

The atomic host concept involves a light-weight container-supervisor OS — a minimal, immutable OS image.
The host configuration comes from the network — e.g., by cloud-init and OSTree.
To update the host, simply swap out that OS image atomically, and let the new instance pull down its config from the network again.

Project Atomic is a Red Hat-based atomic host project.
CoreOS and RancherOS are similar.

http://www.projectatomic.io/

CentOS has Atomic Host builds available as ISO for bare-metal install, Amazon AMI image, and QCOW2 image for KVM.

Many technologies come together in Project Atomic:

- Docker for containers
- Flannel to make an overlay network for inter-container communication
- OSTree for host management
- etcd as a key-value store for state and configuration
- Kubernetes for orchestration (scheduling "pods", small groups of co-located containers, across multiple hosts to make up an application)


### cloud-init ###

Before spinning up our first Atomic host, we need cloud-init in place to handle early initialization of the instance.
Cloud-init does things like:

- set default locale
- set instance hostname
- generate ssh keys
- add ssh keys to `~/.ssh/authorized_keys`
- set up ephemeral mount points

Cloud-init is not magic.
Essentially, it creates config files in an ISO image that gets attached to a booting Atomic Host virtual machine.

See https://paulgorman.org/technical/cloud-init.txt.html.

---

**UPDATE:** In 2018, Red Hat acquired CoreOS, leaving it uncertain how CoreOS would merge with Project Atomic.

> I expect that over the next year or so, Fedora Atomic Host will be replaced by a new thing combining the best from Container Linux and Project Atomic. This new thing will be “Fedora CoreOS” and serve as the upstream to Red Hat CoreOS.

https://lwn.net/Articles/757878/

> Project Atomic is an umbrella project consisting of two flavors of Atomic Host (Fedora and CentOS) as well as various other container-related projects. Project Atomic as a project name will be sunset by the end of 2018 with a stronger individual focus on its successful projects such as Buildah and Cockpit.

https://coreos.fedoraproject.org/

---


Links
----------------------------------------------------------------------------

- https://docs.docker.com/get-started/
- https://docs.docker.com/
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html-single/getting_started_with_containers/
- https://stackoverflow.com/questions/18496940/how-to-deal-with-persistent-storage-e-g-databases-in-docker
- https://www.networkcomputing.com/storage/docker-containers-and-persistent-storage-4-options/1320691891
- https://wiki.debian.org/Docker
- http://docker-saigon.github.io/post/Docker-Internals/
- https://accelazh.github.io/docker/Play-with-Docker-Network