(Written in 2016, and revised in 2017. Tested on Debian Stretch.)
LXC provides light-weight, kernel-based containers on Linux, similar to FreeBSD jails.
See lxc(7), which is pretty useful, as Linux man pages go. And https://linuxcontainers.org/lxc/manpages/
# apt-get install lxc
# lxc-checkconfig
In the examples below, we assume the container host has the IP address 10.0.0.99. Furthermore, we assume a NAT bridge of 10.100.0.0/24 and a guest container at 10.100.0.10.
There are two different ways to network containers:

- A host bridge (bridge-utils, br0, etc.)
- The lxc-net service, which sets up a bridge (lxcbr0), NAT-ing, and DHCP for containers

Create either a host bridge (if the containers will have public IP addresses) or a NAT bridge (if the containers will hide behind the host's IP address). If using a NAT bridge with lxc-net, be sure to read the notes below about firewalling.
UPDATE 2019 on Debian Buster
Some of the config options have changed.
Notably, lxc.network.foo options have become lxc.net.0.foo, like:
lxc.net.0.type = veth
lxc.net.0.flags = up
lxc.net.0.link = br0
lxc-net.service handles the NAT bridge.
Edit /etc/lxc/default.conf:
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx
Create /etc/default/lxc-net:
USE_LXC_BRIDGE="true"
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.100.0.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.100.0.0/24"
LXC_DHCP_RANGE="10.100.0.101,10.100.0.249"
LXC_DHCP_MAX="148"
LXC_DHCP_CONFILE="/etc/lxc/dnsmasq.conf"
LXC_DOMAIN=""
And do:
# touch /etc/lxc/dnsmasq.conf
# systemctl enable lxc-net
# systemctl start lxc-net
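Once lxc-net is running, a quick sanity check (a sketch, guarded so it degrades gracefully; the interface name assumes the LXC_BRIDGE value above):

```shell
# Show the NAT bridge lxc-net created (assumes LXC_BRIDGE="lxcbr0").
ip addr show lxcbr0 2>/dev/null || echo "lxcbr0 not up"
# lxc-net runs its own dnsmasq instance for DHCP/DNS on the bridge.
ps -C dnsmasq -o args= 2>/dev/null || true
```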
To create a host bridge, edit /etc/network/interfaces:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet manual
auto br0
iface br0 inet static
address 10.0.0.99
network 10.0.0.0
netmask 255.255.255.0
gateway 10.0.0.1
bridge_ports eth0
bridge_fd 9
bridge_hello 2
bridge_maxage 12
    bridge_stp off
… and bring it up: sudo ifup br0.
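To confirm the bridge actually enslaved the NIC (names as in the stanza above; guarded so it is safe to run anywhere):

```shell
# List interfaces attached to br0; expect to see eth0 here.
ip link show master br0 2>/dev/null || echo "br0 not up"
```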
Edit /etc/lxc/default.conf:
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
https://linuxcontainers.org/lxc/manpages/man5/lxc.container.conf.5.html#lbAM
TL;DR — It’s safest to set firewall rules on the container’s *filter table INPUT chain.
Whether the containers hook up to a NAT bridge or a host bridge, pay careful attention to the host’s rules. In either case, packets directed to a container never hit rules in the host’s *filter table INPUT chain; rules for traffic destined to the host do not protect containers.
Creating firewall rules for containers can be done in several ways. The most straightforward way is to filter traffic in the container’s *filter table INPUT chain. That’s a good way to do it.
However, before traffic hits the container, it passes through the host’s *filter table FORWARD chain.
It’s possible to filter traffic there.
FORWARD rules are more straightforward for containers networked via a host bridge than for containers NAT’d with lxc-net.
When it starts, the lxc-net service creates the following rule on the host’s *filter table FORWARD chain:
-I FORWARD -o lxcbr0 -j ACCEPT
Since lxc-net inserts this rule at the head of the chain, it effectively nullifies any further restrictions added to FORWARD. Boo.
If we wanted to add rules on FORWARD and still use lxc-net, we’d need minimally to do the following (but it’s a fragile hack):
# iptables -D FORWARD -o lxcbr0 -j ACCEPT
# sed -i '/iptables $use_iptables_lock -I FORWARD -o ${LXC_BRIDGE} -j ACCEPT/d' /usr/lib/x86_64-linux-gnu/lxc/lxc-net
When using lxc-net, don’t bother writing FORWARD rules; filter in the container, or create very selective NAT rules.
Creating very specific NAT rules in the PREROUTING chain of the *nat table is reasonable, since we need to NAT service ports to our container anyhow.
So, when firewalling an lxc-net NAT bridge, either filter traffic in the container’s INPUT chain or write narrowly-scoped NAT rules. When firewalling a container on a host bridge, either filter in the container’s INPUT chain or use the host’s FORWARD chain.
The lxc-net service is supposed to create these rules, but sometimes it doesn’t.
(A systemd concurrency iptables locking issue??)
These are minimal rules, plus an example of NAT’ing port 443 to the container at 10.100.0.10:
# iptables -A FORWARD -o lxcbr0 -j ACCEPT -m comment --comment "Note the WARNING below!"
# iptables -A FORWARD -i lxcbr0 -j ACCEPT
# iptables -A INPUT -i lxcbr0 -p tcp -m tcp --dport 53 -j ACCEPT
# iptables -A INPUT -i lxcbr0 -p udp -m udp --dport 53 -j ACCEPT
# iptables -A INPUT -i lxcbr0 -p tcp -m tcp --dport 67 -j ACCEPT
# iptables -A INPUT -i lxcbr0 -p udp -m udp --dport 67 -j ACCEPT
# iptables -t mangle -A POSTROUTING -o lxcbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
# iptables -t nat -A POSTROUTING -s 10.100.0.0/24 ! -d 10.100.0.0/24 -j MASQUERADE
# iptables -t nat -A PREROUTING -d 10.0.0.99/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.100.0.10
Or, as /etc/iptables/rules.v4 for iptables-restore:
*mangle
-A POSTROUTING -o lxcbr0 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
*nat
-A PREROUTING -d 10.0.0.99/32 -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.100.0.10
-A POSTROUTING -s 10.100.0.0/24 ! -d 10.100.0.0/24 -j MASQUERADE
COMMIT
*filter
:INPUT DROP [17:984]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [71:11664]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -i lxcbr0 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i lxcbr0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i lxcbr0 -p tcp -m tcp --dport 67 -j ACCEPT
-A INPUT -i lxcbr0 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 53 -j ACCEPT
-A INPUT -p udp --dport 53 -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A FORWARD -o lxcbr0 -j ACCEPT
-A FORWARD -i lxcbr0 -j ACCEPT
COMMIT
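A ruleset in this format can be checked and loaded with iptables-restore (on Debian, the iptables-persistent package loads /etc/iptables/rules.v4 at boot). A guarded sketch, assuming the path above:

```shell
# Parse the saved ruleset without committing it (--test), then apply it.
# Applying requires root.
if iptables-restore --test /etc/iptables/rules.v4 2>/dev/null; then
    iptables-restore < /etc/iptables/rules.v4
else
    echo "ruleset missing or did not parse"
fi
```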
Create a container:
# lxc-create -n mycontainer -t debian -- -r jessie
# lxc-create -n test-alpine -t alpine -- -r v3.6
See the templates located in /usr/share/lxc/templates. Download updated/additional templates like:
# lxc-create -t download -n test
(This may require opening outbound TCP port 11371 to allow access to PGP key servers.)
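The download template also runs non-interactively if given a distribution, release, and architecture, e.g. (the values here are just examples):

```
# lxc-create -t download -n test-alpine -- -d alpine -r 3.6 -a amd64
```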
Alpine containers are much smaller than Debian containers:
# du -sh /var/lib/lxc/test-alpine/ /var/lib/lxc/test-debian/
6.5M /var/lib/lxc/test-alpine/
272M /var/lib/lxc/test-debian/
Destroy/delete a container:
# lxc-destroy -n mycontainer
List containers:
# lxc-ls -f
# lxc-ls --fancy
Get info about a container:
# lxc-info -n mycontainer
Start a container:
# lxc-start -n mycontainer
Stop a container:
# lxc-stop -n mycontainer
Connect to the container console:
# lxc-console -n mycontainer
(Ctrl-a q to exit console. Or Ctrl-a Ctrl-a q if we’re already in a tmux/screen session.)
Connect to container as root:
# lxc-attach -n mycontainer
(lxc-attach apparently doesn’t count as a login shell. Do su -.)
Run a command inside the container:
# lxc-attach -n mycontainer -- sh -c 'mysqldump -uroot --all-databases --events > /var/backups/mysql-backup.sql'
To start the container when the host boots, add this to the container’s config (e.g. /var/lib/lxc/mycontainer/config):
lxc.start.auto = 1
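lxc.start.delay (the number of seconds to wait before starting the next auto-started container) can accompany it; the value here is arbitrary:

```
lxc.start.auto = 1
lxc.start.delay = 5
```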
See lxc.container.conf(5) for container configuration options.
Host filesystems can be mounted inside the container by adding the following to /var/lib/lxc/mycontainer/config (in more-or-less fstab(5) format):
lxc.mount.entry=/path/in/host/mount_point /var/lib/lxc/mycontainer/rootfs/mount_point none ro,bind 0 0
Create the directory/mount point in the container first, or starting the container will fail!
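For example, to expose a hypothetical host directory /srv/data read-only at /data inside the guest, first create the mount point:

```
# mkdir -p /var/lib/lxc/mycontainer/rootfs/data
```

…then add to /var/lib/lxc/mycontainer/config:

```
lxc.mount.entry=/srv/data /var/lib/lxc/mycontainer/rootfs/data none ro,bind 0 0
```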
On the originating host:
# lxc-stop -n mycontainer
# cd /var/lib/lxc
# tar --numeric-owner -czf /tmp/mycontainer.tgz mycontainer
# scp /tmp/mycontainer.tgz newserver:
On the new server:
# tar --numeric-owner -xzf mycontainer.tgz -C /var/lib/lxc
It is possible to use LVM as a backing store for lxc, and to use lvm snapshots. Consistent live snapshots depend on the filesystem (e.g. lvm, zfs).
See also lxc-copy(1).
# lxc-info -n mycontainer -p
…gives us the PID of a container and the cgroups to which it belongs.
With the PID, we can look in /proc/PID/ to find info.
% cat /proc/12945/cgroup
…reveals the cgroup of our container, which probably looks like “/lxc/mycontainer”.
With the cgroup name, we can poke around /sys/fs/cgroup/*/lxc/mycontainer/*.stat.
% cat /sys/fs/cgroup/cpuacct/lxc/test/cpu.stat
% cat /sys/fs/cgroup/memory/lxc/test/memory.stat
% cat /sys/fs/cgroup/blkio/lxc/test/blkio.io_service_bytes
Add cgroup usage restrictions to /var/lib/lxc/mycontainer/config
, like:
lxc.cgroup.memory.soft_limit_in_bytes = 256M
lxc.cgroup.memory.limit_in_bytes = 512M
lxc.cgroup.memory.memsw.limit_in_bytes = 1G
lxc.cgroup.blkio.weight = 200
lxc.cgroup.cpu.shares = 200
These settings persist, but only take effect after restarting the container. Immediately but impermanently set cgroup controls like:
# lxc-cgroup -n mycontainer memory.limit_in_bytes 512M
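To confirm a value took effect, read it back from the cgroup filesystem (the path assumes cgroup v1, as on Debian Stretch, and is guarded in case the container isn’t running):

```shell
# Read the current memory limit for the running container's cgroup.
cat /sys/fs/cgroup/memory/lxc/mycontainer/memory.limit_in_bytes 2>/dev/null \
  || echo "container cgroup not found"
```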
“cpu.shares” defaults to 1024. “blkio.weight” defaults to 1000. See systemd.resource-control(5) for more.
Filter top output by cgroup. See top(1). In top, hit f to select fields, and toggle display (d) of the field “CGNAME”. Filter by hitting o, and entering something like:
CGNAME=systemd:/lxc/mycontainer
Hitting = removes the filter.
On host:
$ sudo sh -c "echo 'lxc.network.name = eth0' >> /var/lib/lxc/mycontainer/config"
$ sudo sh -c "echo 'lxc.network.ipv4 = 10.100.0.10/24' >> /var/lib/lxc/mycontainer/config"
$ sudo sh -c "echo 'lxc.network.ipv4.gateway = 10.100.0.1' >> /var/lib/lxc/mycontainer/config"
Assuming we’re using NAT bridging:
$ sudo iptables -t nat -A PREROUTING -p tcp --dport 8000 -j DNAT --to 10.100.0.10:8000
$ sudo iptables -t nat -vL
In container:
# sed -i 's/dhcp/manual/' /etc/network/interfaces
# apt update
# apt install inetutils-ping netcat-openbsd
# apt-get clean
If we’ve specified LXC_DHCP_CONFILE, it’s easier to reserve a DHCP address than to set a static one. Edit /etc/lxc/dnsmasq.conf:
dhcp-host=mycontainer,10.100.0.10
This depends on the hostname inside the container, not just the LXC name or directory name. If the container doesn’t have the hostname “mycontainer”, it may not match the reservation in dnsmasq.conf.
And maybe:
# systemctl restart lxc-net
[THIS SECTION IS UNTESTED AND LIKELY INCOMPLETE.]
In certain environments, notably Amazon EC2, only one public MAC address may be available. For our container to use a public IP address (non-NAT) despite this, use proxy ARP. With proxy ARP, the host acts as a router, translating ARP requests for the container(s).
# echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
# echo 1 > /proc/sys/net/ipv4/ip_forward
# ip route add 10.100.0.10 dev eth0
(Remember to add the proxy_arp and ip_forward settings to the host’s /etc/sysctl.conf for them to persist.)
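The equivalent /etc/sysctl.conf entries would be (matching the echoes above):

```
net.ipv4.conf.eth0.proxy_arp = 1
net.ipv4.ip_forward = 1
```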
Or possibly:
echo 1 > /proc/sys/net/ipv4/conf/br0/forwarding
echo 1 > /proc/sys/net/ipv4/conf/br0/proxy_arp_pvlan
echo 1 > /proc/sys/net/ipv4/conf/br0/proxy_arp
echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
echo 0 > /proc/sys/net/ipv4/conf/br0/send_redirects
[THIS MAY ONLY BE AN ISSUE WITH wheezy AND OLDER.]
There may or may not be some issue with running systemd in a container. It seems to involve conflicts in /dev, and may be avoided by enabling autodev mode in /etc/lxc/default.conf:
lxc.autodev = 1
lxc.kmsg = 0