Linux LXC
=========

(Written in 2016, and revised in 2017. Tested on Debian Stretch.)

LXC provides light-weight, kernel-based containers on Linux, similar to FreeBSD jails.

See lxc(7), which is pretty useful, as Linux man pages go. And https://linuxcontainers.org/lxc/manpages/

    # apt-get install lxc
    # lxc-checkconfig

In the examples below, we assume the container host has the IP address 10.0.0.99. Furthermore, we assume a NAT bridge of 10.100.0.0/24 and a guest container at 10.100.0.10.


## Networking for Containers ##

There are two different ways to network containers:

- the containers get public IP addresses, and connect through a traditional bridge (`bridge-utils`, `br0`, etc.)
- the containers DON'T get public IP addresses; the `lxc-net` service sets up a bridge (`lxcbr0`), NAT-ing, and DHCP for containers

Create either a host bridge (if the containers will have public IP addresses) or a NAT bridge (if the containers will hide behind the host's IP address).

If using a NAT bridge with `lxc-net`, be sure to read the notes below about firewalling.

**UPDATE 2019 on Debian Buster** Some of the config options have changed. Notably `lxc.network.foo` options have become `lxc.net.0.foo`, like:

```
lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = br0
```

### NAT Bridge ###

`lxc-net.service` handles the NAT bridge.

Edit `/etc/lxc/default.conf`:

    lxc.network.type = veth
    lxc.network.link = lxcbr0
    lxc.network.flags = up
    lxc.network.hwaddr = 00:16:3e:xx:xx:xx

Create `/etc/default/lxc-net`:

    USE_LXC_BRIDGE="true"
    LXC_BRIDGE="lxcbr0"
    LXC_ADDR="10.100.0.1"
    LXC_NETMASK="255.255.255.0"
    LXC_NETWORK="10.100.0.0/24"
    LXC_DHCP_RANGE="10.100.0.101,10.100.0.249"
    LXC_DHCP_MAX="148"
    LXC_DHCP_CONFILE="/etc/lxc/dnsmasq.conf"
    LXC_DOMAIN=""

And do:

    # touch /etc/lxc/dnsmasq.conf
    # systemctl enable lxc-net
    # systemctl start lxc-net

### Host Bridge ###

To create a host bridge, edit `/etc/network/interfaces`:

    auto lo
    iface lo inet loopback

    auto eth0
    iface eth0 inet manual

    auto br0
    iface br0 inet static
        address 10.0.0.99
        network 10.0.0.0
        netmask 255.255.255.0
        gateway 10.0.0.1
        bridge_ports eth0
        bridge_fd 9
        bridge_hello 2
        bridge_maxage 12
        bridge_stp off

... and bring it up: `sudo ifup br0`.

Edit `/etc/lxc/default.conf`:

    lxc.network.type = veth
    lxc.network.flags = up
    lxc.network.link = br0

https://linuxcontainers.org/lxc/manpages/man5/lxc.container.conf.5.html#lbAM

### iptables for LXC Containers ###

TL;DR: It's safest to set the firewall rules on the container's *filter table INPUT chain.

Whether the containers hook up to a NAT bridge or a host bridge, pay careful attention to the host's rules. In either case, packets directed to a container never hit rules in the host's *filter table INPUT chain; rules for traffic destined to the host do not protect containers.

Creating firewall rules for containers can be done in several ways. The most straightforward way is to filter traffic in the container's *filter table INPUT chain. That's a good way to do it.

However, before traffic hits the container, it passes through the host's *filter table FORWARD chain. It's possible to filter traffic there. FORWARD rules are more straightforward for containers networked via a host bridge than for containers NAT'd with `lxc-net`.

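As a rough sketch of the host-bridge case, the function below prints (rather than applies) FORWARD rules that admit only selected TCP ports to one bridged container. The function name `forward_rules`, the container address 10.0.0.50, and the port list are hypothetical, chosen only for illustration. It also assumes that bridged traffic is actually handed to iptables (the `br_netfilter` module loaded and `net.bridge.bridge-nf-call-iptables` set to 1), which is worth checking on the host.

```shell
#!/bin/sh
# Sketch: print iptables-restore style FORWARD rules for one bridged container.
# forward_rules, 10.0.0.50 and the ports are illustrative assumptions.
forward_rules() {
    container_ip=$1
    shift
    # Let replies to connections the container opened come back in.
    printf '%s\n' "-A FORWARD -d $container_ip/32 -m state --state ESTABLISHED,RELATED -j ACCEPT"
    # Admit new connections only on the listed TCP ports.
    for port in "$@"; do
        printf '%s\n' "-A FORWARD -d $container_ip/32 -p tcp -m tcp --dport $port -j ACCEPT"
    done
    # Drop everything else addressed to the container.
    printf '%s\n' "-A FORWARD -d $container_ip/32 -j DROP"
}

forward_rules 10.0.0.50 80 443
```

The printed lines can be reviewed and then pasted into the `*filter` section of an iptables-restore file.
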
When it starts, the `lxc-net` service creates the following rule on the host's *filter table FORWARD chain:

    -I FORWARD -o lxcbr0 -j ACCEPT

Since `lxc-net` _inserts_ this rule at the head of the chain, it effectively nullifies any further restrictions added to FORWARD. Boo.

If we wanted to add rules on FORWARD and still use `lxc-net` we'd need minimally to do the following (but it's a fragile hack):

    # iptables -D FORWARD -o lxcbr0 -j ACCEPT
    # sed -i '/iptables $use_iptables_lock -I FORWARD -o ${LXC_BRIDGE} -j ACCEPT/d' /usr/lib/x86_64-linux-gnu/lxc/lxc-net

When using `lxc-net`, don't bother writing FORWARD rules; filter in the container, or create _very selective_ NAT rules. Creating very specific NAT rules in the PREROUTING chain of the *nat table is reasonable, since we need to NAT service ports to our container anyhow.

So, when firewalling an `lxc-net` NAT bridge, do one of the following:

- Set rules in the container's *filter INPUT chain
- Set very specific NAT rules in the host's *nat PREROUTING chain

When firewalling a container on a host bridge, do one of the following:

- Set rules in the container's *filter INPUT chain
- Set rules in the host's *filter FORWARD chain

#### Basic lxc-net iptables rules ####

The `lxc-net` service is supposed to create these rules, but sometimes it doesn't. (A systemd concurrency iptables locking issue??)

These are minimal rules, plus the example of NAT'ing port 443 to the container at 10.100.0.10:

    # iptables -A FORWARD -o lxcbr0 -j ACCEPT -m comment --comment "Note the WARNING below!"
    # iptables -A FORWARD -i lxcbr0 -j ACCEPT
    # iptables -A INPUT -i lxcbr0 -p tcp -m tcp --dport 53 -j ACCEPT
    # iptables -A INPUT -i lxcbr0 -p udp -m udp --dport 53 -j ACCEPT
    # iptables -A INPUT -i lxcbr0 -p tcp -m tcp --dport 67 -j ACCEPT
    # iptables -A INPUT -i lxcbr0 -p udp -m udp --dport 67 -j ACCEPT
    # iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
    # iptables -t nat -A POSTROUTING -s 10.100.0.0/24 ! -d 10.100.0.0/24 -j MASQUERADE
    # iptables -t nat -A PREROUTING -d 10.0.0.99/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.100.0.10

Or, as `/etc/iptables/rules.v4` for iptables-restore:

    *mangle
    -A POSTROUTING -o lxcbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
    COMMIT

    *nat
    -A PREROUTING -d 10.0.0.99/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.100.0.10
    -A POSTROUTING -s 10.100.0.0/24 ! -d 10.100.0.0/24 -j MASQUERADE
    COMMIT

    *filter
    :INPUT DROP [17:984]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [71:11664]
    -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    -A INPUT -i lxcbr0 -p tcp -m tcp --dport 53 -j ACCEPT
    -A INPUT -i lxcbr0 -p udp -m udp --dport 53 -j ACCEPT
    -A INPUT -i lxcbr0 -p tcp -m tcp --dport 67 -j ACCEPT
    -A INPUT -i lxcbr0 -p udp -m udp --dport 67 -j ACCEPT
    -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
    -A INPUT -p tcp --dport 53 -j ACCEPT
    -A INPUT -p udp --dport 53 -j ACCEPT
    -A INPUT -i lo -j ACCEPT
    -A FORWARD -o lxcbr0 -j ACCEPT
    -A FORWARD -i lxcbr0 -j ACCEPT
    COMMIT


## Creating containers ##

Create a container:

    # lxc-create -n mycontainer -t debian -- -r jessie
    # lxc-create -n test-alpine -t alpine -- -r v3.6

See the templates located in /usr/share/lxc/templates.

Download updated/additional templates like:

    # lxc-create -t download -n test

(This may require opening of outbound TCP port 11371 to allow access to PGP key servers.)

Alpine containers are much smaller than Debian containers:

    # du -sh /var/lib/lxc/test-alpine/ /var/lib/lxc/test-debian/
    6.5M    /var/lib/lxc/test-alpine/
    272M    /var/lib/lxc/test-debian/

Destroy/delete a container:

    # lxc-destroy -n mycontainer


## Using containers ##

List containers:

    # lxc-ls -f
    # lxc-ls --fancy

Get info about a container:

    # lxc-info -n mycontainer

Start a container:

    # lxc-start -n mycontainer

Stop a container:

    # lxc-stop -n mycontainer

Connect to the container console:

    # lxc-console -n mycontainer

(Ctrl-a q to exit console. Or Ctrl-a Ctrl-a q if we're already in a tmux/screen session.)

Connect to the container as root:

    # lxc-attach -n mycontainer

(`lxc-attach` apparently doesn't count as a login shell. Do `su -`.)

Run a command inside the container:

    # lxc-attach -n mycontainer -- sh -c 'mysqldump -uroot --all-databases --events > /var/backups/mysql-backup.sql'

To start the container when the host boots, add this to the container's config (e.g. `/var/lib/lxc/mycontainer/config`):

    lxc.start.auto = 1

See lxc.container.conf(5) for container configuration options.

Host filesystems can be mounted inside the container by adding the following to `/var/lib/lxc/mycontainer/config` (in more-or-less `fstab(5)` format):

    lxc.mount.entry=/path/in/host/mount_point /var/lib/lxc/mycontainer/rootfs/mount_point none ro,bind 0 0

Create the directory/mount point in the container first, or starting the container will fail!


## Container Backup and Restore (even to another host) ##

On the originating host:

    # lxc-stop -n mycontainer
    # cd /var/lib/lxc/mycontainer
    # tar --numeric-owner -czf mycontainer.tgz ./*
    # scp mycontainer.tgz newserver:

On the new server:

    # mkdir /var/lib/lxc/mycontainer
    # tar --numeric-owner -xzf mycontainer.tgz -C /var/lib/lxc/mycontainer

The archive holds paths relative to the container directory, so it unpacks directly into `/var/lib/lxc/mycontainer` on the new server.

It is possible to use LVM as a backing store for LXC, and to use LVM snapshots. Consistent live snapshots depend on the backing store (e.g. LVM, ZFS). See also lxc-copy(1).


## Cgroups: Measuring and Limiting ##

    # lxc-info -n mycontainer -p

...gives us the PID of the container. With the PID, we can look in `/proc/PID/` to find info, including the cgroups to which it belongs.

    % cat /proc/12945/cgroup

...reveals the cgroup of our container, which probably looks like "/lxc/mycontainer".

With the cgroup name, we can poke around `/sys/fs/cgroup/*/lxc/mycontainer/*.stat`.

For example:

    % cat /sys/fs/cgroup/cpuacct/lxc/test/cpu.stat
    % cat /sys/fs/cgroup/memory/lxc/test/memory.stat
    % cat /sys/fs/cgroup/blkio/lxc/test/blkio.io_service_bytes

Add cgroup usage restrictions to `/var/lib/lxc/mycontainer/config`, like:

    lxc.cgroup.memory.soft_limit_in_bytes = 256M
    lxc.cgroup.memory.limit_in_bytes = 512M
    lxc.cgroup.memory.memsw.limit_in_bytes = 1G
    lxc.cgroup.blkio.weight = 200
    lxc.cgroup.cpu.shares = 200

These settings persist, but only take effect after restarting the container.

To set cgroup controls immediately but impermanently:

    # lxc-cgroup -n mycontainer memory.limit_in_bytes 512M

"cpu.shares" defaults to 1024. "blkio.weight" defaults to 1000. See systemd.resource-control(5) for more.

### top ###

Filter `top` output by cgroup. See top(1).

In `top`, hit `f` to select fields, and toggle display (`d`) of the field "CGNAME". Filter by hitting `o`, and entering something like:

    CGNAME=systemd:/lxc/mycontainer

`=` removes the filter.


## Example Configuration ##

On host:

    $ sudo sh -c "echo 'lxc.network.name = eth0' >> /var/lib/lxc/mycontainer/config"
    $ sudo sh -c "echo 'lxc.network.ipv4 = 10.100.0.10/24' >> /var/lib/lxc/mycontainer/config"
    $ sudo sh -c "echo 'lxc.network.ipv4.gateway = 10.100.0.1' >> /var/lib/lxc/mycontainer/config"

Assuming we're using NAT bridging:

    $ sudo iptables -t nat -A PREROUTING -p tcp --dport 8000 -j DNAT --to 10.100.0.10:8000
    $ sudo iptables -t nat -vL

In container:

    # sed -i 's/dhcp/manual/' /etc/network/interfaces
    # apt update
    # apt install inetutils-ping netcat-openbsd
    # apt-get clean

If we've specified LXC_DHCP_CONFILE, it's easier to reserve a DHCP address than to set a static one. Edit `/etc/lxc/dnsmasq.conf`:

    dhcp-host=mycontainer,10.100.0.10

This depends on the hostname inside the container, not just the LXC name or directory name. If the container doesn't have the hostname "mycontainer", it may not match the reservation in `dnsmasq.conf`.

And maybe:

    # systemctl restart lxc-net

### Proxy ARP ###

[THIS SECTION IS UNTESTED AND LIKELY INCOMPLETE.]

In certain environments, notably Amazon EC2, only one public MAC address may be available. For our container to use a public IP address (non-NAT) despite this, use proxy ARP. With proxy ARP, the host acts as a router, answering ARP requests on behalf of the container(s).

    # echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
    # echo 1 > /proc/sys/net/ipv4/ip_forward
    # ip route add 10.100.0.10 dev eth0

(Remember to add the `proxy_arp` and `ip_forward` settings to the host's `/etc/sysctl.conf` for them to persist.)

Or possibly:

    echo 1 > /proc/sys/net/ipv4/conf/br0/forwarding
    echo 1 > /proc/sys/net/ipv4/conf/br0/proxy_arp_pvlan
    echo 1 > /proc/sys/net/ipv4/conf/br0/proxy_arp
    echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
    echo 0 > /proc/sys/net/ipv4/conf/br0/send_redirects


## Systemd issue with /dev ?? ##

[THIS MAY ONLY BE AN ISSUE WITH wheezy AND OLDER.]

There may or may not be some issue with running systemd in a container.
It seems to involve conflicts in /dev, and may be avoided by enabling autodev mode in `/etc/lxc/default.conf`:

    lxc.autodev = 1
    lxc.kmsg = 0


## Links ##

- https://wiki.debian.org/LXC
- https://wiki.archlinux.org/index.php/Linux_Containers
- https://help.ubuntu.com/lts/serverguide/lxc.html
- https://linuxcontainers.org/
- https://linuxcontainers.org/lxc/manpages/
- https://www.flockport.com/flockport-labs-extending-layer-2-across-container-hosts/
- http://askubuntu.com/questions/446831/how-to-let-built-in-dhcp-assign-a-static-ip-to-lxc-container-based-on-name-not
- https://blog.docker.com/2013/10/gathering-lxc-docker-containers-metrics/
- https://www.stgraber.org/2016/03/26/lxd-2-0-resource-control-412/
- https://docs.oracle.com/cd/E37670_01/E37355/html/ol_control_containers.html
- https://www.janoszen.com/2013/02/06/limiting-linux-processes-cgroups-explained/
- https://lists.linuxcontainers.org/pipermail/lxc-users/2016-January/010787.html
- https://wiki.debian.org/BridgeNetworkConnectionsProxyArp
- http://serverfault.com/questions/522478/lxc-container-with-bridge-networking-exposes-fake-mac-address-to-external-networ
- http://lxc-users.linuxcontainers.narkive.com/UFV141nB/networking-issues-with-lxc-containers-in-ec2
- https://www.mail-archive.com/lxc-users@lists.linuxcontainers.org/msg06075.html
- https://linuxcontainers.org/lxc/manpages/man5/lxc.container.conf.5.html