nftables
============================================================================

(February 2017, January 2018)

Nftables is a Linux firewall that replaces the older iptables. Nftables reuses existing Netfilter kernel hooks, NAT, and userspace queuing and logging. The nftables syntax, inspired by `tcpdump`, adds features like _sets_ to make rules more concise.

Like iptables, nftables organizes rules with tables and chains. Chains order rules. Tables logically group chains for administrative convenience. For example, iptables has `filter`, `nat`, and `mangle` tables; the `filter` table includes packet filtering rules from the `INPUT`, `FORWARD`, and `OUTPUT` chains. Unlike iptables, nftables isn't limited to pre-defined tables or chains.

How does nftables make decisions about packets? A packet traverses the stages in the kernel's networking stack, with Netfilter providing a _hook_ for each stage. Nftables attaches user-defined rules to those hooks. At each stage, a packet is evaluated against each rule in turn until it matches a rule with a terminal verdict (accept/drop). A packet dropped at an early stage is never evaluated at a later stage.

On Debian, install the user tools (i.e., the `nft` command) with `apt-get install nftables`. On RHEL/CentOS 7, run `yum install nftables`. RHEL 8 uses nftables by default.

The man page `nft(8)` is helpful. Also see `/usr/share/doc/nftables/examples/` or `/usr/share/doc/nftables/`. The official documentation is at https://wiki.nftables.org.


Overview of a common configuration and packet flow
----------------------------------------------------------------------------

A host acting as a simple firewall and gateway may define only a small number of nft chains, each matching a kernel hook:

- a `prerouting` chain, for all newly-arrived IP traffic
- an `input` chain, for traffic addressed to the local host itself
- an `output` chain, for traffic originating from the local host itself
- a `forward` chain, for packets the host is asked to simply pass from one network to another
- a `postrouting` chain, for all IP traffic leaving the firewall

For configuration convenience and by convention, we group the `input`, `output`, and `forward` chains into a `filter` table. Most rules in setups like this attach to the `forward` chain.

If NAT is required, we follow the convention of creating a `nat` table to hold the `prerouting` and `postrouting` chains. Source-NAT rules (where we rewrite the packet's source) attach to the `postrouting` chain, and destination-NAT rules (where we rewrite the packet's destination) attach to the `prerouting` chain.

In this configuration, packet flow is straightforward: only one chain attaches to each hook, so the first `accept` or `drop` rule a packet matches on a hook decides the packet's fate at that hook.

Hooks
----------------------------------------------------------------------------

A **hook** is a callback into a particular stage in the kernel's networking stack. A chain may register on one of the following hooks:

- `ingress` The ingress hook sees all traffic, immediately after it arrives from the NIC. Ingress rules help with load balancing or very efficient early filtering, like DDoS protection. Rules on the ingress hook apply to the netdev family (i.e., everything, when it first arrives from the NIC). Ingress rules do not completely replace [tc](https://paulgorman.org/technical/linux-tc.txt.html) for complex traffic shaping.
- `prerouting` Packets hit the prerouting hook regardless of their destination. For the ip and ip6 families.
- `input` Packets destined for the local system hit the input hook. For the arp, ip, and ip6 families.
- `output` Packets that originate from the local system hit the output hook. For the arp, ip, and ip6 families.
- `forward` Packets not destined for the local system hit the forward hook. For the ip and ip6 families.
- `postrouting` All packets leaving the machine hit the postrouting hook after the routing decision. For the ip and ip6 families.

[Diagram: packet flow through the Netfilter hooks]

- inbound traffic ▸ PREROUTING hook ▸ routing decision ▸ INPUT hook ▸ local system
- inbound traffic ▸ PREROUTING hook ▸ routing decision ▸ FORWARD hook ▸ POSTROUTING hook ▸ outbound traffic
- local system ▸ OUTPUT hook ▸ routing decision ▸ POSTROUTING hook ▸ outbound traffic
- some outbound traffic (e.g., loopback traffic) loops back around and arrives again as inbound traffic

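
The gateway layout described in the overview above might look roughly like the rule file below. This is only a sketch under stated assumptions: the interface names `lan0` and `wan0`, the internal web server address `192.168.1.10`, and the forwarded port are hypothetical placeholders, not values from any real setup.

```
flush ruleset

table ip filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iifname "lo" accept
        # assumption: administer the gateway over SSH from the LAN only
        iifname "lan0" tcp dport ssh accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
        ct state established,related accept
        # LAN hosts may open connections to the outside
        iifname "lan0" oifname "wan0" accept
        # let the destination-NATed web traffic through (address already rewritten)
        iifname "wan0" oifname "lan0" ip daddr 192.168.1.10 tcp dport 80 accept
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

table ip nat {
    chain prerouting {
        type nat hook prerouting priority -100; policy accept;
        # destination NAT: hand inbound web traffic to the internal server
        iifname "wan0" tcp dport 80 dnat to 192.168.1.10
    }
    chain postrouting {
        type nat hook postrouting priority 100; policy accept;
        # source NAT: rewrite LAN sources to the outbound interface's address
        oifname "wan0" masquerade
    }
}
```

A file like this can be validated with `nft --check --file` and loaded with `nft -f`. Note that the forward chain has to admit the destination-NATed traffic explicitly, because the `nat` chains only rewrite addresses and do not filter.
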

Example of simple NFT configuration
----------------------------------------------------------------------------

Allow everything out, and filter everything incoming except for SSH and pings:

```
🐚 ~ $ sudo apt install nftables
🐚 ~ $ sudo nft flush ruleset
🐚 ~ $ sudo nft add table inet filter
🐚 ~ $ sudo nft add chain filter input { type filter hook input priority 0\; policy accept\; }
🐚 ~ $ sudo nft add chain filter forward { type filter hook forward priority 0\; policy accept\; }
🐚 ~ $ sudo nft add chain filter output { type filter hook output priority 0\; policy accept\; }
🐚 ~ $ sudo nft add rule filter input ct state invalid drop
🐚 ~ $ sudo nft add rule filter input meta iif lo ct state new accept
🐚 ~ $ sudo nft add rule filter input ct state established,related accept
🐚 ~ $ sudo nft add rule filter input tcp dport ssh accept
🐚 ~ $ sudo nft add rule filter input icmp type echo-request accept
🐚 ~ $ sudo nft add rule inet filter input icmpv6 type { nd-neighbor-solicit, echo-request, nd-router-advert, nd-neighbor-advert } accept
🐚 ~ $ sudo nft add chain filter input { type filter hook input priority 999\; policy drop\; }
🐚 ~ $ sudo nft list ruleset
table ip filter {
    chain input {
        type filter hook input priority 999; policy drop;
        ct state invalid drop
        iif "lo" ct state new accept
        ct state established,related accept
        tcp dport ssh accept
        icmp type echo-request accept
    }

    chain forward {
        type filter hook forward priority 0; policy accept;
    }

    chain output {
        type filter hook output priority 0; policy accept;
    }
}
🐚 ~ $ sudo sh -c 'echo "flush ruleset" > /etc/nftables.conf'
🐚 ~ $ sudo sh -c 'nft list ruleset >> /etc/nftables.conf'
🐚 ~ $ sudo nft --check --file /etc/nftables.conf
🐚 ~ $ sudo systemctl enable nftables.service
🐚 ~ $ sudo systemctl start nftables.service
```

Families
----------------------------------------------------------------------------

Nftables recognizes these six protocol families. All traffic belongs to a family.
Each rule applies to a particular family.

- ip: for IPv4
- ip6: for IPv6
- inet: for mixed IPv4 and IPv6 traffic
- arp: for ARP (No more `arptables` command!)
- bridge: for traffic traversing bridges (No more `ebtables` command!)
- netdev: for _all_ traffic when it first arrives from the NIC

Tables
----------------------------------------------------------------------------

A **table** simply groups chains for easier management (e.g., we can flush all chains grouped in a table with one command). The only restriction on which chains may be included in a particular table is that the chains must all affect the same protocol family.

```
🐚 # nft list tables
🐚 # nft add table inet foo
🐚 # nft delete table inet foo
🐚 # nft flush table inet foo
```

Tables themselves do not process packets. They are a user convenience for grouping/categorizing chains. For example, we create a "filter" table to group together chains with filtering rules. We're likely to include the "input" chain (if we have one) in the filter table, because we do lots of filtering on input. But other tables can hold their own chains on the input hook too.

Chains
----------------------------------------------------------------------------

A **chain** groups together rules, and attaches those rules to a Netfilter kernel hook for packet processing. A chain can have a type, hook, priority, and policy.

A chain's **policy** determines what happens to a packet that hasn't matched any rule in the chain.

A chain is one of three **types**:

- `filter` For filtering packets. Supported in the arp, bridge, ip, ip6 and inet families.
- `route` For rerouting packets when a rule modifies IP header fields or packet marks, like mangle for the `output` hook. Supported for ip and ip6.
- `nat` For network address translation. Only the first packet of a flow hits this chain, so don't use `nat` for general filtering. Supported for the ip and ip6 families.

```
🐚 # nft add chain [<family>] <table> <name> { type <type> hook <hook> priority <priority> \; [policy <policy> \;] }
🐚 # nft add chain ip foo input { type filter hook input priority 0 \; }
🐚 # nft delete chain ip foo input
🐚 # nft flush chain foo input
```

(The semicolon is escaped only to keep the shell from interpreting it.)

Chains not registered with a hook do not receive packets directly, but rules in other chains can `jump` or `goto` them, which helps organize a ruleset.

Iptables comes standard with one chain for each hook, with the chains named the same as the hooks. A user can give nftables chains any name, and create many chains per hook. In practice, however, many nftables configurations follow the convention of one chain per hook.

Nftables evaluates each packet against _all_ chains on the same hook, in order of ascending **priority**. Each chain has its own priority, which the user may set. A `drop` verdict is final: evaluation stops immediately and no later chain sees the packet. An `accept` verdict only ends evaluation in the current chain; the packet still traverses any later chains on the same hook, and one of them may drop it.

Note: a chain's priority may place the chain before or after some Netfilter internal operations, like:

- NF_IP_PRI_CONNTRACK_DEFRAG (-400): priority of defragmentation
- NF_IP_PRI_RAW (-300): traditional priority of the raw table, placed before connection tracking operations
- NF_IP_PRI_SELINUX_FIRST (-225): SELinux operations
- NF_IP_PRI_CONNTRACK (-200): connection tracking operations
- NF_IP_PRI_MANGLE (-150): mangle operation
- NF_IP_PRI_NAT_DST (-100): destination NAT
- NF_IP_PRI_FILTER (0): filtering operation, the filter table
- NF_IP_PRI_SECURITY (50): place of the security table, where secmark can be set, for example
- NF_IP_PRI_NAT_SRC (100): source NAT
- NF_IP_PRI_SELINUX_LAST (225): SELinux at packet exit
- NF_IP_PRI_CONNTRACK_HELPER (300): connection tracking at exit

Chains with a lower priority number (negative, zero) are evaluated before chains with a higher one (positive). A later chain can still drop a packet that an earlier chain on the same hook accepted, but it cannot rescue a packet that an earlier chain dropped.

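
To see how priorities interact, consider two chains registered on the same hook. The rule file below is a minimal sketch: the table name `demo` and the chain names `early` and `late` are hypothetical. It assumes the behavior the nftables wiki describes, where a `drop` ends processing at once while an `accept` only ends the current chain.

```
table inet demo {
    chain early {
        # lower priority number, so this chain runs first
        type filter hook input priority -10; policy accept;
        # accept ends evaluation in "early" only
        tcp dport 22 accept
        # drop is final, so "late" never sees this packet
        tcp dport 23 drop
    }
    chain late {
        # higher priority number, so this chain runs second
        type filter hook input priority 10; policy accept;
        # this still runs for port 22, so SSH ends up dropped
        tcp dport 22 drop
    }
}
```

With this ruleset loaded, SSH traffic is dropped even though `early` accepted it, and telnet traffic is dropped without `late` ever being consulted. Keeping a single filtering chain per hook avoids this kind of surprise.
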

Rules
----------------------------------------------------------------------------

A rule describes the type of traffic that matches it, and the action to take for matching traffic. Rules are evaluated in order within a chain. When ordering rules, a user may need to refer to either a rule's "handle" or "position". The handle is an internal ID that identifies the rule. The position names the handle of an existing rule, and the new rule is placed relative to it (i.e., put this rule right before or right after that other rule).

```
🐚 # nft add rule mytable mychain ip daddr 8.8.8.8 counter
🐚 # nft add rule mytable mychain position 8 ip daddr 127.0.0.8 drop
🐚 # nft insert rule mytable mychain position 8 ip daddr 127.0.0.6 drop
🐚 # nft delete rule mytable mychain handle 5
🐚 # nft replace rule mytable mychain handle 9 ip daddr 127.0.0.3 drop
🐚 # nft -n -a list table mytable
```

(`add` places the rule after the position, `insert` places the rule before the position.)

In iptables, each rule has one target (e.g., `-j ACCEPT` or `-j LOG`). In nftables, one rule may perform several actions.

Nftables provides these comparison operators for rules:

- `eq` for "equals" (also `==`)
- `ne` for "not equal" (also `!=`)
- `lt` for "less than" (also `<`)
- `gt` for "greater than" (also `>`)
- `le` for "less than or equal to" (also `<=`)
- `ge` for "greater than or equal to" (also `>=`)

Remember to escape `<` and `>` in the shell, like `\<` and `\>`.

Match all TCP traffic not destined for port 22:

```
🐚 # nft add rule mytable mychain tcp dport != 22
```

Match traffic to high ports:

```
🐚 # nft add rule mytable mychain tcp dport \>= 1024
```

Nftables provides a number of matching criteria. The available criteria vary somewhat by type (i.e., `ip`, `tcp`, `ip6`, `udp`, `arp`, `ct`, `vlan`, etc.). See https://wiki.nftables.org/wiki-nftables/index.php/Quick_reference-nftables_in_10_minutes#Rules.
These examples are non-exhaustive:

```
🐚 # nft add rule mytable mychain ip length 333-435 drop
🐚 # nft add rule mytable mychain ip ttl \> 200 drop
🐚 # nft add rule mytable mychain ip protocol icmp drop
🐚 # nft add rule mytable mychain ip saddr 192.168.2.0/24 accept
🐚 # nft add rule mytable mychain ip daddr { 192.168.0.1-192.168.0.250 } drop
🐚 # nft add rule mytable mychain tcp dport { telnet, http, https } accept
🐚 # nft add rule mytable mychain icmp type { echo-reply, destination-unreachable, redirect, echo-request } accept
🐚 # nft add rule mytable mychain ct status expected accept
🐚 # nft add rule mytable mychain ct helper "ftp" log
🐚 # nft add rule mytable mychain meta iifname "eth2" continue
```

The "statement" of a rule is the action performed on matching packets. A statement may be "terminal" or "non-terminal". A rule may include several non-terminal statements but only one terminal statement.

These verdict statements alter control flow in the ruleset and issue policy decisions for packets:

- `accept` allows the packet and ends evaluation of further rules in the chain
- `drop` silently discards the packet and ends all further evaluation
- `queue` sends the packet to userspace for further processing
- `continue` continues evaluation with the next rule
- `return` returns from the current chain and continues at the next rule of the calling chain (in a base chain, the chain's policy applies)
- `jump _chain_` continues at the first rule in _chain_, but returns to the next rule in the current chain after a `return`
- `goto _chain_` continues at the first rule in _chain_, and never returns to the current chain

Additional actions:

- `log` logs the packet
- `mark` sets metainformation about a packet (e.g., priority, ct)
- `dup` duplicates packets to another ip or ip6 destination
- `counter` counts packets (note that nftables does not count by default, unlike iptables)

```
🐚 # nft add rule filter input iif lo log tcp dport 22 accept
🐚 # nft add rule nat postrouting ip saddr 192.168.1.0/24 oif eth0 snat 1.2.3.4
🐚 # nft add rule mangle prerouting dup to 172.20.0.2
🐚 # nft add rule filter input ip protocol tcp counter
```

Sets
----------------------------------------------------------------------------

Nftables adds another basic type not found in iptables: **sets**. In many scenarios, the use of sets dramatically increases performance versus implementing the functionality with individual rules. Use sets liberally!

Sets can be anonymous or named. Anonymous sets are bound to a rule, and can't be updated without replacing the rule.

```
🐚 # nft add rule filter output tcp dport { 22, 23 } counter
```

Named sets are not tied to rules and may be updated.

```
🐚 # nft add set filter myset { type ipv4_addr\;}
🐚 # nft add element filter myset { 192.168.3.4 }
🐚 # nft add element filter myset { 192.168.1.4, 192.168.1.5 }
🐚 # nft add rule ip filter input ip saddr @myset accept
🐚 # nft list set filter myset
```

Named sets can have several characteristics:

- type:
    - ipv4_addr: IPv4 address
    - ipv6_addr: IPv6 address
    - ether_addr: Ethernet address
    - inet_proto: Inet protocol type
    - inet_service: Internet service (a TCP or UDP port, for example)
    - mark: Mark type
- timeout: how long an element stays in the set (with a time string like "1d2h3m4s")
- flags:
    - constant: set content may not change while bound
    - interval: set contains intervals
    - timeout: elements can be added with a timeout
- gc-interval: garbage collection interval; can only be used if `timeout` or the flag `timeout` is set; same time format as `timeout`
- elements: initializes the set with member elements
- size: limits the maximum number of elements of the set
- policy: determines set selection policy; either `performance` (the default) or `memory`

Example 0
----------------------------------------------------------------------------

Filter traffic for a workstation (so we don't need a `forward` chain):

```
🐚 # nft add table ip filter
🐚 # nft add chain ip filter input { type filter hook input priority 999 \; policy drop \; }
🐚 # nft add chain ip filter output { type filter hook output priority 0 \; policy accept \; }
🐚 # nft add rule filter input ct state established,related accept
🐚 # nft add rule filter output ip daddr 8.8.8.8 counter
```

Example 1
----------------------------------------------------------------------------

```
flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 999;
        iif lo accept
        ct state established,related accept
        tcp dport { 22, 80, 443 } ct state new accept
        ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert } accept
        counter drop
    }
}
```

### Example 1 Discussion ###

This is the simple example ruleset for a workstation found in `/usr/share/doc/nftables/examples/syntax/workstation`.

`flush ruleset` clears existing rules. Nftables can flush individual tables too (i.e., `nft flush table mytable`).

`table inet filter {` begins and declares a new table for IPv4 and IPv6 traffic (`inet` family) named "filter".

`chain input {` creates the "input" chain. The "input" here is simply a name.

`type filter hook input priority 999;` sets the chain's type to `filter`, attaches the chain to the `input` hook, and sets its priority (zero is the conventional priority for filtering; 999 makes this chain run after most other chains on the hook).

`iif lo accept` accepts all traffic coming in on the loopback `lo` interface. (`iif` matches on the interface index, which `nft` looks up from the name when it loads the rule, so the interface must already exist; `iifname` compares the name as a string for every packet.)

`ct state established,related accept` uses connection tracking to accept traffic with an existing state (i.e., related to connections that originated from us).

`tcp dport { 22, 80, 443 } ct state new accept` lets us serve ssh and web traffic. Note use of a set to specify the ports.

`ip6 nexthdr icmpv6 icmpv6 type { nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert } accept` allows IPv6 neighbor discovery (or else IPv6 breaks!). Note use of a set to specify the ICMPv6 message types.

`counter drop` counts and drops any traffic not covered by an earlier rule.

Example 2
----------------------------------------------------------------------------

```
flush ruleset

table firewall {
    chain incoming {
        type filter hook input priority 999; policy drop;
        ct state established,related accept
        iifname lo accept
        icmp type echo-request accept
        tcp dport {ssh, http} accept
    }
}

table ip6 firewall {
    chain incoming {
        type filter hook input priority 999; policy drop;
        ct state established,related accept
        ct state invalid drop
        iifname lo accept
        # routers may also want: mld-listener-query, nd-router-solicit
        icmpv6 type {echo-request,nd-neighbor-solicit} accept
        tcp dport {ssh, http} accept
    }
}
```

### Example 2 Discussion ###

This is the simple firewall ruleset from https://wiki.nftables.org/wiki-nftables/index.php/Quick_reference-nftables_in_10_minutes.

`flush ruleset` clears existing rules.

`table firewall {` declares a new table. Note the declaration does not specify a protocol family, so it defaults to the `ip` (IPv4) family.

`chain incoming {` declares a new chain.

`type filter hook input priority 999; policy drop;` sets the chain's type to `filter`, attaches the chain to the `input` hook, sets its priority (zero is the conventional priority for filtering; 999 makes this chain run after most other chains on the hook), and sets the default policy to `drop`.

`ct state established,related accept` uses connection tracking to accept traffic with an existing state (i.e., related to connections that originated from us).

`iifname lo accept` accepts all traffic coming in on the loopback `lo` interface.

`icmp type echo-request accept` allows pings.

`tcp dport {ssh, http} accept` allows ssh and web serving.

`table ip6 firewall {` declares a new table for IPv6.

`chain incoming {` declares a new chain named "incoming".

`type filter hook input priority 999; policy drop;` sets the chain's type, hook, priority, and default policy, exactly as in the IPv4 table.

`ct state established,related accept` uses connection tracking to accept traffic with an existing state (i.e., related to connections that originated from us).

`ct state invalid drop` drops packets that connection tracking classifies as invalid.

`iifname lo accept` accepts traffic coming in on the loopback interface.

`icmpv6 type {echo-request,nd-neighbor-solicit} accept` accepts some ICMPv6 traffic (pings and neighbor solicitation).

`tcp dport {ssh, http} accept` allows ssh and web serving.

A Brief Refresher on Connection Tracking
----------------------------------------------------------------------------

Connection tracking isn't unique to nftables; Netfilter provides it. Connection tracking filters packets based on criteria that IP header information alone cannot provide. In other words: stateful firewalling.

Connection tracking keeps facts about a connection: its source and destination addresses, protocol, ports, timeout, etc.
A connection may have one of the following states:

- "new" for a valid, just-initiated connection where traffic has thus far only appeared from one direction
- "established" for a connection where the firewall sees two-way communication
- "related" for an expected connection (see "helpers")
- "invalid" for packets that deviate from the behavior expected for any connection in the connection tracking table

These states have nothing to do with TCP states; even UDP connections can be stateful in the sense of connection tracking.

Connection tracking works primarily at layer 3, although some of the modules operate at higher layers.

Connection tracking facilitates some application-layer protocols with hard-to-track properties, like FTP. A connection tracking "helper" has a set of _expectations_ about the properties of connections. The FTP helper expects that, within a given time and between a given source and destination, a passive FTP session will open a second connection on a high-numbered port for data transmission. The helper inspects packet contents in order to find the necessary information. The helper is application-aware. In the case of FTP, the helper digs through packet payloads looking for the address and port announced for the data connection (the client's PORT command in active mode, or the server's reply to PASV in passive mode). When its expectations are met, connection tracking marks the new connection as related. Helpers exist for IRC, SIP, SNMP, H323, etc.

Saving and Restoring Rule Sets
----------------------------------------------------------------------------

With iptables, a common configuration method used a shell script to execute a series of iptables commands. Unfortunately, that was not an atomic operation. Nftables loads a rule file atomically with the `-f` flag:

```
🐚 # nft -f myrulefile
```

Most Linux distributions read nftables rules from `/etc/nftables.conf`.
Save the current rules to this file so they persist after a reboot:

```
🐚 # echo "flush ruleset" > /etc/nftables.conf
🐚 # nft list ruleset >> /etc/nftables.conf
```

In a rule file, nftables treats a line beginning with `#` as a comment.

Check the syntax of a rules file:

```
🐚 $ /usr/sbin/nft --check --file /etc/nftables.conf
```

Under systemd, make sure to enable the nftables service so that the rules load on reboot:

```
🐚 # sudo systemctl enable nftables.service
🐚 # sudo systemctl start nftables.service
```

Links
----------------------------------------------------------------------------

- https://wiki.nftables.org/wiki-nftables/index.php/Main_Page
- https://paulgorman.org/technical/linux-iptables.txt
- This is a good overview: https://developers.redhat.com/blog/2016/10/28/what-comes-after-iptables-its-successor-of-course-nftables/
- http://ral-arturo.org/2017/05/05/debian-stretch-stable-nftables.html
- https://netfilter.org
- https://wiki.archlinux.org/index.php/nftables
- https://wiki.gentoo.org/wiki/Nftables
- https://wiki.debian.org/nftables
- https://stosb.com/blog/explaining-my-configs-nftables/
- https://developers.redhat.com/blog/2017/01/10/migrating-my-iptables-setup-to-nftables/
- https://lwn.net/Articles/324989/
- https://linoxide.com/firewall/configure-nftables-serve-internet/
- https://developers.redhat.com/blog/2017/04/11/benchmarking-nftables/
- https://access.redhat.com/errata/RHEA-2016:2558
- https://news.ycombinator.com/item?id=14286016
- http://conntrack-tools.netfilter.org/manual.html
- https://www.netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO-3.html