
Linux LXC

(Written in 2016, and revised in 2017. Tested on Debian Stretch.)

LXC provides lightweight, kernel-based containers on Linux, similar to FreeBSD jails.

See lxc(7), which is pretty useful as Linux man pages go, and https://linuxcontainers.org/lxc/manpages/

# apt-get install lxc
# lxc-checkconfig

In the examples below, we assume the container host has the IP address 10.0.0.99. Furthermore, we assume a NAT bridge of 10.100.0.0/24 and a guest container at 10.100.0.10.

Networking for Containers

There are two different ways to network containers:

Create either a host bridge (if the containers will have public IP addresses) or a NAT bridge (if the containers will hide behind the host’s IP address). If using a NAT bridge with lxc-net, be sure to read the notes below about firewalling.

UPDATE 2019 on Debian Buster

Some of the config options have changed. Notably lxc.network.foo options have become lxc.net.0.foo, like:

lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = br0
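
For configs written with the old key names, a small sed filter can do the bulk of the renaming. This is only a sketch: the function name is made up for illustration, it handles just the first interface (index 0), and it assumes lxc.network.ipv4 becomes lxc.net.0.ipv4.address, as lxc.container.conf(5) for LXC 3.x describes. Where it's available, lxc-update-config(1) is probably the better tool.

```shell
# Rewrite legacy lxc.network.* keys (LXC 2.x) to lxc.net.0.* (LXC 3.x).
# Reads a config on stdin and writes the converted config to stdout.
# Sketch only: single interface, and only the keys used on this page.
convert_lxc_net_keys() {
    sed -E \
        -e 's/^([[:space:]]*)lxc\.network\.ipv4[[:space:]]*=/\1lxc.net.0.ipv4.address =/' \
        -e 's/^([[:space:]]*)lxc\.network\./\1lxc.net.0./'
}
```

Something like "convert_lxc_net_keys < config > config.new" leaves the original untouched so the result can be checked with diff before swapping it in.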

NAT Bridge

lxc-net.service handles the NAT bridge.

Edit /etc/lxc/default.conf:

lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx

Create /etc/default/lxc-net:

USE_LXC_BRIDGE="true"
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.100.0.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.100.0.0/24"
LXC_DHCP_RANGE="10.100.0.101,10.100.0.249"
LXC_DHCP_MAX="148"
LXC_DHCP_CONFILE="/etc/lxc/dnsmasq.conf"
LXC_DOMAIN=""

And do:

# touch /etc/lxc/dnsmasq.conf
# systemctl enable lxc-net
# systemctl start lxc-net

Host Bridge

To create a host bridge, edit /etc/network/interfaces:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

auto br0
iface br0 inet static
	address 10.0.0.99
	network 10.0.0.0
	netmask 255.255.255.0
	gateway 10.0.0.1
	bridge_ports eth0
	bridge_fd 9
	bridge_hello 2
	bridge_maxage 12
	bridge_stp off

… and bring it up: sudo ifup br0.

Edit /etc/lxc/default.conf:

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0

https://linuxcontainers.org/lxc/manpages/man5/lxc.container.conf.5.html#lbAM

iptables for LXC Containers

TL;DR: It’s safest to set firewall rules in the container’s *filter table INPUT chain.

Whether the containers hook up to a NAT bridge or a host bridge, pay careful attention to the host’s rules. In either case, packets directed to a container never hit rules in the host’s *filter table INPUT chain; rules for traffic destined to the host do not protect containers.

Creating firewall rules for containers can be done in several ways. The most straightforward way is to filter traffic in the container’s *filter table INPUT chain. That’s a good way to do it.
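
As a rough illustration of filtering in the container, the function below prints a minimal ruleset in iptables-restore format. The function name and the default-drop policy are assumptions for the sake of the example, not something lxc provides. It accepts loopback and established traffic plus whatever TCP ports are passed as arguments.

```shell
# Print a minimal iptables-restore ruleset for use inside a container.
# Arguments: TCP ports to accept (e.g. 443). Other inbound traffic is dropped.
container_input_rules() {
    echo '*filter'
    echo ':INPUT DROP [0:0]'
    echo ':FORWARD DROP [0:0]'
    echo ':OUTPUT ACCEPT [0:0]'
    echo '-A INPUT -i lo -j ACCEPT'
    echo '-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT'
    for port in "$@"; do
        echo "-A INPUT -p tcp -m tcp --dport ${port} -j ACCEPT"
    done
    echo 'COMMIT'
}
```

Inside the container (with iptables installed there), "container_input_rules 443 | iptables-restore" would load it. Review the output first, since a mistake here can cut off services the container is supposed to offer.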

However, before traffic hits the container, it passes through the host’s *filter table FORWARD chain. It’s possible to filter traffic there. FORWARD rules are more straightforward for containers networked via a host bridge than for containers NAT’d with lxc-net. When it starts, the lxc-net service creates the following rule on the host’s *filter table FORWARD chain:

-I FORWARD -o lxcbr0 -j ACCEPT

Since lxc-net inserts this rule at the head of the chain, it effectively nullifies any further restrictions added to FORWARD. Boo. If we wanted to add rules on FORWARD and still use lxc-net we’d need minimally to do the following (but it’s a fragile hack):

# iptables -D FORWARD -o lxcbr0 -j ACCEPT
# sed -i '/iptables $use_iptables_lock -I FORWARD -o ${LXC_BRIDGE} -j ACCEPT/d' /usr/lib/x86_64-linux-gnu/lxc/lxc-net

When using lxc-net, don’t bother writing FORWARD rules; filter in the container, or create very selective NAT rules. Creating very specific NAT rules in the PREROUTING chain of the *nat table is reasonable, since we need to NAT service ports to our container anyhow.
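
A "very specific" NAT rule names the host address, the protocol, and the port. The helper below is a hypothetical convenience, not part of lxc-net. It prints one such rule for a single TCP port and forwards it to the same port on the container.

```shell
# Print an iptables rule spec that DNATs one TCP port on the host's address
# to the same port on a container.
# Arguments: host IP, container IP, TCP port.
dnat_rule() {
    host_ip=$1 container_ip=$2 port=$3
    printf '%s\n' "-A PREROUTING -d ${host_ip}/32 -p tcp -m tcp --dport ${port} -j DNAT --to-destination ${container_ip}:${port}"
}
```

The output can be pasted into the *nat section of /etc/iptables/rules.v4, or applied directly:

# iptables -t nat $(dnat_rule 10.0.0.99 10.100.0.10 443)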

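So, when firewalling an lxc-net NAT bridge, either filter in the container’s *filter table INPUT chain, or create very specific NAT rules in the host’s *nat table PREROUTING chain.

When firewalling a container on a host bridge, either filter in the container’s *filter table INPUT chain, or filter in the host’s *filter table FORWARD chain.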

Basic lxc-net iptables rules

The lxc-net service is supposed to create these rules, but sometimes it doesn’t. (A systemd concurrency iptables locking issue??) These are minimal rules, plus an example of NAT’ing port 443 to the container at 10.100.0.10:

# iptables -A FORWARD -o lxcbr0 -j ACCEPT -m comment --comment "Note the WARNING below!"
# iptables -A FORWARD -i lxcbr0 -j ACCEPT
# iptables -A INPUT -i lxcbr0 -p tcp -m tcp --dport 53 -j ACCEPT
# iptables -A INPUT -i lxcbr0 -p udp -m udp --dport 53 -j ACCEPT
# iptables -A INPUT -i lxcbr0 -p tcp -m tcp --dport 67 -j ACCEPT
# iptables -A INPUT -i lxcbr0 -p udp -m udp --dport 67 -j ACCEPT
# iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
# iptables -t nat -A POSTROUTING -s 10.100.0.0/24 ! -d 10.100.0.0/24 -j MASQUERADE
# iptables -t nat -A PREROUTING -d 10.0.0.99/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.100.0.10

Or, as /etc/iptables/rules.v4 for iptables-restore:

*mangle
-A POSTROUTING -o lxcbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
*nat
-A PREROUTING -d 10.0.0.99/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.100.0.10
-A POSTROUTING -s 10.100.0.0/24 ! -d 10.100.0.0/24 -j MASQUERADE
COMMIT
*filter
:INPUT DROP [17:984]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [71:11664]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -i lxcbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i lxcbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i lxcbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i lxcbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 53 -j ACCEPT
-A INPUT -p udp --dport 53 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A FORWARD -o lxcbr0 -j ACCEPT
-A FORWARD -i lxcbr0 -j ACCEPT
COMMIT

Creating containers

Create a container:

# lxc-create -n mycontainer -t debian -- -r jessie
# lxc-create -n test-alpine -t alpine -- -r v3.6

See the templates located in /usr/share/lxc/templates. Download updated/additional templates like:

# lxc-create -t download -n test

(This may require opening of outbound TCP port 11371 to allow access to PGP key servers.)

Alpine containers are much smaller than Debian containers:

# du -sh /var/lib/lxc/test-alpine/ /var/lib/lxc/test-debian/
6.5M    /var/lib/lxc/test-alpine/
272M    /var/lib/lxc/test-debian/

Destroy/delete a container:

# lxc-destroy -n mycontainer

Using containers

List containers:

# lxc-ls -f
# lxc-ls --fancy

Get info about a container:

# lxc-info -n mycontainer

Start a container:

# lxc-start -n mycontainer

Stop a container:

# lxc-stop -n mycontainer

Connect to the container console:

# lxc-console -n mycontainer

(Ctrl-a q to exit console. Or Ctrl-a Ctrl-a q if we’re already in a tmux/screen session.)

Connect to container as root:

# lxc-attach -n mycontainer

(lxc-attach apparently doesn’t count as a login shell. Do su -.)

Run a command inside the container:

# lxc-attach -n mycontainer -- sh -c 'mysqldump -uroot --all-databases --events > /var/backups/mysql-backup.sql'

To start the container when the host boots, add this to the container’s config (e.g. /var/lib/lxc/mycontainer/config):

lxc.start.auto = 1

See lxc.container.conf(5) for container configuration options.

Host filesystems can be mounted inside the container by adding the following to /var/lib/lxc/mycontainer/config (in more-or-less fstab(5) format):

lxc.mount.entry=/path/in/host/mount_point /var/lib/lxc/mycontainer/rootfs/mount_point none ro,bind 0 0

Create the directory/mount point in the container first, or starting the container will fail!
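
Since forgetting the mount point stops the container from starting, it may help to do both steps together. A minimal sketch, assuming the usual /var/lib/lxc/NAME layout with a rootfs directory and a config file. The function name is made up, and paths containing spaces won't survive the fstab-style format.

```shell
# Create the mount point inside a container's rootfs, then append a
# read-only bind-mount entry to the container's config.
# Arguments: container directory, host path, mount point relative to rootfs.
add_bind_mount() {
    container_dir=$1 host_path=$2 mount_point=$3
    mkdir -p "${container_dir}/rootfs/${mount_point}" || return 1
    printf '%s\n' "lxc.mount.entry=${host_path} ${container_dir}/rootfs/${mount_point} none ro,bind 0 0" \
        >> "${container_dir}/config"
}
```

For the example above, that would be:

# add_bind_mount /var/lib/lxc/mycontainer /path/in/host/mount_point mount_point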

Container Backup and Restore (even to another host)

On the originating host:

# lxc-stop -n mycontainer
# cd /var/lib/lxc/mycontainer
# tar --numeric-owner -czf ../mycontainer.tgz .
# scp ../mycontainer.tgz newserver:

On the new server:

# mkdir /var/lib/lxc/mycontainer
# tar --numeric-owner -xzf mycontainer.tgz -C /var/lib/lxc/mycontainer
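
The same steps as a pair of shell functions, in case this gets scripted. This is a sketch with made-up function names. It stores paths relative to the container directory so the archive unpacks cleanly with -C, and it assumes the container is already stopped and the tarball is written somewhere outside the container directory.

```shell
# Archive a stopped container directory, with paths relative to it.
# Arguments: container directory, output tarball (outside that directory).
backup_container() {
    tar --numeric-owner -czf "$2" -C "$1" .
}

# Unpack a tarball made by backup_container into a container directory,
# creating the directory if needed.
# Arguments: tarball, destination container directory.
restore_container() {
    mkdir -p "$2" && tar --numeric-owner -xzf "$1" -C "$2"
}
```

Run both as root so numeric ownership inside the rootfs is preserved. For example, "backup_container /var/lib/lxc/mycontainer /root/mycontainer.tgz" on the old host (the /root path is only an example), then "restore_container mycontainer.tgz /var/lib/lxc/mycontainer" on the new one.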

It is possible to use LVM as a backing store for LXC, and to use LVM snapshots. Consistent live snapshots depend on the backing store (e.g. LVM, ZFS).

See also lxc-copy(1).

Cgroups: Measuring and Limiting

# lxc-info -n mycontainer -p

…gives us the PID of a container and the cgroups to which it belongs. With the PID, we can look in /proc/PID/ to find info.

% cat /proc/12945/cgroup

…reveals the cgroup of our container, which probably looks like “/lxc/mycontainer”. With the cgroup name, we can poke around /sys/fs/cgroup/*/lxc/mycontainer/*.stat.

% cat /sys/fs/cgroup/cpuacct/lxc/test/cpu.stat
% cat /sys/fs/cgroup/memory/lxc/test/memory.stat
% cat /sys/fs/cgroup/blkio/lxc/test/blkio.io_service_bytes
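
To avoid reading the cgroup path off by eye, a small awk wrapper can pull it out of /proc/PID/cgroup for a given controller. This is a sketch with a hypothetical function name, and it assumes the cgroup v1 line format "ID:controller[,controller]:path" that the examples above rely on.

```shell
# Print the cgroup path for one controller from a /proc/PID/cgroup file.
# Arguments: controller name (e.g. memory), file (default /proc/self/cgroup).
cgroup_path_for() {
    awk -F: -v want="$1" '{
        n = split($2, ctrl, ",")
        for (i = 1; i <= n; i++) {
            if (ctrl[i] == want) {
                path = $0
                sub(/^[^:]*:[^:]*:/, "", path)   # keep everything after the 2nd colon
                print path
                exit
            }
        }
    }' "${2:-/proc/self/cgroup}"
}
```

With the PID from lxc-info:

% cat /sys/fs/cgroup/memory$(cgroup_path_for memory /proc/12945/cgroup)/memory.stat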

Add cgroup usage restrictions to /var/lib/lxc/mycontainer/config, like:

lxc.cgroup.memory.soft_limit_in_bytes = 256M
lxc.cgroup.memory.limit_in_bytes = 512M
lxc.cgroup.memory.memsw.limit_in_bytes = 1G
lxc.cgroup.blkio.weight = 200
lxc.cgroup.cpu.shares = 200

These settings persist, but only take effect after restarting the container. Immediately but impermanently set cgroup controls like:

# lxc-cgroup -n mycontainer memory.limit_in_bytes 512M

“cpu.shares” defaults to 1024. “blkio.weight” defaults to 1000. See systemd.resource-control(5) for more.

top

Filter top output by cgroup. See top(1). In top, hit f to select fields, and toggle display (d) of the field “CGNAME”. Filter by hitting o, and entering something like:

CGNAME=systemd:/lxc/mycontainer

= removes the filter.

Example Configuration

On host:

$ sudo sh -c "echo 'lxc.network.name = eth0' >> /var/lib/lxc/mycontainer/config"
$ sudo sh -c "echo 'lxc.network.ipv4 = 10.100.0.10/24' >> /var/lib/lxc/mycontainer/config"
$ sudo sh -c "echo 'lxc.network.ipv4.gateway = 10.100.0.1' >> /var/lib/lxc/mycontainer/config"

Assuming we’re using NAT bridging:

$ sudo iptables -t nat -A PREROUTING -p tcp --dport 8000 -j DNAT --to 10.100.0.10:8000
$ sudo iptables -t nat -vL

In container:

# sed -i 's/dhcp/manual/' /etc/network/interfaces
# apt update
# apt install inetutils-ping netcat-openbsd

# apt-get clean

If we’ve specified LXC_DHCP_CONFILE, it’s easier to reserve a DHCP address than to set a static one. Edit /etc/lxc/dnsmasq.conf:

dhcp-host=mycontainer,10.100.0.10

This depends on the hostname inside the container, not just the LXC name or directory name. If the container doesn’t have the hostname “mycontainer” it may not match the reservation in dnsmasq.conf.
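
A quick way to check for that mismatch is to compare the hostname the container reports against the reservations. This is a sketch with a made-up function name, and it only understands lines in the "dhcp-host=NAME,ADDRESS" form used above (dnsmasq accepts other forms too).

```shell
# Succeed if dnsmasq.conf has a dhcp-host reservation for the given hostname.
# Arguments: hostname as reported inside the container, path to dnsmasq.conf.
has_dhcp_reservation() {
    awk -F'[=,]' -v h="$1" '
        $1 == "dhcp-host" && $2 == h { found = 1 }
        END { exit !found }
    ' "$2"
}
```

For example:

# has_dhcp_reservation "$(lxc-attach -n mycontainer -- hostname)" /etc/lxc/dnsmasq.conf && echo reserved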

And maybe:

# systemctl restart lxc-net

Proxy ARP

[THIS SECTION IS UNTESTED AND LIKELY INCOMPLETE.]

In certain environments, notably Amazon EC2, only one public MAC address may be available. For our container to use a public IP address (non-NAT) despite this, use proxy ARP. With proxy ARP, the host acts as a router, answering ARP requests on behalf of the container(s).

# echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
# echo 1 > /proc/sys/net/ipv4/ip_forward
# ip route add 10.100.0.10 dev eth0

(Remember to add the proxy_arp and ip_forward settings to the host’s /etc/sysctl.conf for them to persist.)
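
The /proc paths map onto sysctl keys by dropping /proc/sys/ and turning slashes into dots. A tiny sketch (hypothetical function name, and as untested as the rest of this section) that prints the matching sysctl.conf lines for a given interface:

```shell
# Print sysctl.conf lines that persist the proxy_arp and ip_forward settings.
# Argument: host interface name (eth0 in the example above).
proxy_arp_sysctl_lines() {
    printf '%s\n' "net.ipv4.conf.$1.proxy_arp = 1" 'net.ipv4.ip_forward = 1'
}
```

Append the output with "proxy_arp_sysctl_lines eth0 >> /etc/sysctl.conf", then load it with "sysctl -p".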

Or possibly:

# echo 1 > /proc/sys/net/ipv4/conf/br0/forwarding
# echo 1 > /proc/sys/net/ipv4/conf/br0/proxy_arp_pvlan
# echo 1 > /proc/sys/net/ipv4/conf/br0/proxy_arp
# echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
# echo 0 > /proc/sys/net/ipv4/conf/br0/send_redirects

Systemd issue with /dev ??

[THIS MAY ONLY BE AN ISSUE WITH wheezy AND OLDER.]

There may or may not be some issue with running systemd in a container. It seems to involve conflicts in /dev, and may be avoided by enabling autodev mode in /etc/lxc/default.conf:

lxc.autodev = 1
lxc.kmsg = 0