Linux Traffic Control

(January 2019)

Traffic control determines which network packets to accept, in what order to send them, and at what rates. By default traffic flows into a single queue, and leaves that queue on a first-in-first-out basis. More complex schemes are possible, addressing problems like fair sharing of limited bandwidth or prioritizing latency-sensitive traffic (e.g., VoIP).

The term “QoS” (quality of service) is often used as a synonym for “traffic control”.

tc is the user-space utility to control the Linux kernel packet scheduler. Most Linux distros include tc with their iproute2 package. See tc(8).
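For a first look, tc can show what is already attached to an interface (eth0 here is just a placeholder device name; substitute your own):

```shell
# Show the qdisc(s) attached to a device, with packet/byte statistics.
tc -s qdisc show dev eth0
```

On recent distros the default root qdisc is typically fq_codel or pfifo_fast.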

A commented example of adding a qdisc using tc:

$  tc qdisc add     $(# Add a queuing discipline )             \
      dev vti0      $(# Attach qdisc to this device )          \
      root          $(# Apply to egress )                      \
      handle 1:0    $(# Name qdisc with major:minor numbers )  \
      htb           $(# Apply HTB queuing discipline )

Note that (as of 2019) tc-tbf(8) still discusses burst in terms of “timer tick” and HZ. However, modern Linux is tickless, with high-resolution timers, so the optimum setting of burst depends mostly on the application. Burst is the size in bytes of the TBF bucket — i.e., the number of bytes to pass through the queue at once, at full device speed, before the rate limit kicks in.
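A sketch of a TBF qdisc, with rate, burst, and latency values chosen purely for illustration:

```shell
# Cap egress on eth0 at 1mbit. Up to 32kb may burst through at full
# device speed before the rate limit applies; packets that would wait
# longer than 50ms in the queue are dropped.
tc qdisc add dev eth0 root tbf rate 1mbit burst 32kb latency 50ms
```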



A queue is a buffer that holds a finite number of packets waiting to be sent. A queue only becomes interesting for traffic control when we can delay, rearrange, drop, and prioritize queued packets, or even shift packets to other queues with different properties.


A flow is a unique set of packets sent from one particular source IP address to a particular destination IP address (often further distinguished by protocol and port numbers). Traffic control mechanisms often classify packets into flows, which can be managed in aggregate.

Tokens and buckets

Controlling the number of packets or bytes dequeued by setting timers and counting is expensive and complex. A cheaper, simpler method generates tokens at a fixed rate, and only dequeues packets when a token is available.

Imagine people waiting in line for an amusement park ride. The ride runs on rails, and a car arrives every minute to pick up another rider. The car represents a token; the rider represents a packet.

A bucket is like a small, connected train of cars that can pick up several riders every minute. If not enough people are waiting to fill the train, it departs on schedule anyhow. A bucket holds multiple tokens.

A basic qdisc built on this idea is the “token bucket filter” (TBF). It transmits packets to match the available tokens, and defers any packets that exceed the number of tokens.
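The mechanism can be sketched as a toy simulation (this is an illustration of the idea, not tc’s actual implementation — the tick, capacity, and arrival pattern are made up):

```shell
#!/usr/bin/env bash
# Toy token bucket: one token is generated per tick, up to a bucket
# capacity of 3 tokens. A packet dequeues only when a token is
# available; otherwise it waits in the queue.
bucket=0; capacity=3; sent=0; queue=0
arrivals=(2 0 0 6 0 0 0 1)   # packets arriving at each tick
for a in "${arrivals[@]}"; do
    if [ "$bucket" -lt "$capacity" ]; then
        bucket=$((bucket + 1))               # token generated this tick
    fi
    queue=$((queue + a))
    while [ "$queue" -gt 0 ] && [ "$bucket" -gt 0 ]; do
        queue=$((queue - 1))                 # dequeue one packet...
        bucket=$((bucket - 1))               # ...consuming one token
        sent=$((sent + 1))
    done
done
echo "sent=$sent backlog=$queue"             # prints: sent=8 backlog=1
```

Nine packets arrive over eight ticks, but only eight tokens are ever generated, so one packet remains queued — the rate limit in miniature.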

Packets and frames

Chunks of data in layer 2 are “frames”. Ethernet, for example, sends frames.

Chunks of data in layer 3 are “packets”. IP, for example, sends packets.

Discussions of traffic control generally call all chunks of data “packets”, despite “frame” sometimes being the more correct term.

Elements Common to Traffic Control Systems


Shapers delay output of packets according to a set rate.


Schedulers arrange/rearrange packets for dequeuing.

FIFO is the simplest scheduler. A “fair queuing” scheduler (e.g., SFQ) tries to keep any one client/flow from dominating. A round-robin scheduler (e.g., WRR) gives each flow a turn to dequeue packets.
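As a concrete example, SFQ can replace the root (egress) scheduler on a device (eth0 is a placeholder):

```shell
# Use stochastic fairness queueing on egress. "perturb 10" rehashes
# the flow-to-queue mapping every 10 seconds, so no flow stays
# permanently lucky or unlucky in the hash.
tc qdisc add dev eth0 root sfq perturb 10
```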


Classifiers sort packets into different queues, and optionally mark them. Classifiers may work together with policers.

Linux traffic control can cascade a packet through a series of classifiers.


Policers limit traffic in a particular queue, usually to restrain a peer to a certain allocated bandwidth. Excess traffic might be dropped or (better) reclassified.

Unlike shaping, which can delay a packet, policing is a binary decision: either enqueue the packet, or take some other action (typically drop or reclassify it).


Dropping discards a packet.


Marking alters the packet itself, e.g. by setting its DSCP field, which other routers in the DiffServ domain can read and use.

This is not the same thing as iptables/Netfilter marks, which only affect packet metadata.
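One common way to set a DSCP (here via Netfilter rather than tc; the port and chain are illustrative):

```shell
# Mark outbound SIP signaling with the EF (expedited forwarding)
# class. This rewrites the DSCP bits in the IP header itself, so
# downstream DiffServ routers can act on it.
iptables -t mangle -A POSTROUTING -p udp --dport 5060 \
    -j DSCP --set-dscp-class EF
```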

Components of the Linux Traffic Control Implementation


A qdisc is a scheduler. “qdisc” means “queuing discipline”.

These can be either classful qdiscs or classless qdiscs. Classful qdiscs contain classes and provide handles to attach filters. Classless qdiscs contain no classes or filters.

Confusingly, the “root qdisc” and “ingress qdisc” are not qdiscs in the sense we mean here — they’re just a pair of hooks that come with each interface. We can attach qdiscs to these hooks — most commonly to the “root qdisc”, which corresponds to egress traffic.

We can attach qdiscs with classes or filters to the root qdisc. The ingress qdisc only accepts qdiscs with filters, not classes. So, ingress is more limited than root.
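The classic ingress idiom illustrates this limitation: all we can do is attach a filter and police with it (rates and device are arbitrary):

```shell
# Attach the special ingress hook, then police ALL inbound IPv4
# traffic to 1mbit; anything over the limit is dropped.
# "match u32 0 0" is a u32 classifier that matches every packet.
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 \
    match u32 0 0 police rate 1mbit burst 100k drop flowid :1
```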


A class contains either several child classes or one child qdisc.

We can also attach any number of filters to a class; these send traffic into a child class, or reclassify or drop it.


A filter combines a classifier and a policer.

A filter can be attached to a qdisc or a class.

A packet always gets screened through the filter attached to the root qdisc first, before it can be directed to any subclasses.


A classifier is one component of a filter. A classifier identifies a packet based on the packet’s characteristics or metadata. We manipulate a classifier with tc.


A policer is used as part of a filter. A policer sets a threshold, and takes one action for traffic rates above that threshold, and another action for traffic rates below that threshold.

A policer never delays traffic.


A drop discards a packet. In this component model, a drop only happens as a decision by a policer attached to a filter.

However, a drop might also happen as a side-effect. A shaper or scheduler might cause a traffic drop if a buffer fills during especially bursty traffic.


A handle uniquely identifies a class or classful qdisc in the traffic control structure. The handle has two parts: a major number and a minor number. The major and minor numbers may be assigned arbitrarily by the user, but classes must reuse their parent qdisc’s major number, and minor number 0 always refers to a qdisc rather than a class — so 1:0 names a qdisc, while 1:1 names a class beneath it.
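To make the numbering concrete, a sketch of a small HTB hierarchy (device, rates, and the port match are all made up for illustration):

```shell
# Root qdisc gets handle 1: (shorthand for 1:0); its classes must
# share major number 1. Unclassified traffic defaults to class 1:20.
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:10 htb rate 5mbit  # priority
tc class add dev eth0 parent 1: classid 1:20 htb rate 1mbit  # bulk

# A filter on the root qdisc steers SSH into the priority class;
# everything else falls through to 1:20.
tc filter add dev eth0 parent 1: protocol ip u32 \
    match ip dport 22 0xffff flowid 1:10
```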

Making Lossy/Jittery Interfaces for Fun and Testing

Make a thing we can ping that always has packet loss! Note that netem alterations only work on output, which is why we can’t just qdisc some dummy interfaces in the namespace. See tc-netem(8).

set -euf

# Create troubled interfaces to test network monitoring tools.
# Run with `sudo`.

ip netns add ns-trouble

ip link add veth0-trouble type veth peer name veth1-trouble
ip link set veth1-trouble netns ns-trouble
ip addr add dev veth0-trouble
ip link set veth0-trouble up

ip link add veth2-trouble type veth peer name veth3-trouble
ip link set veth3-trouble netns ns-trouble
ip addr add dev veth2-trouble
ip link set veth2-trouble up

ip netns exec ns-trouble ip addr add dev veth1-trouble
ip netns exec ns-trouble ip link set veth1-trouble up

ip netns exec ns-trouble ip addr add dev veth3-trouble
ip netns exec ns-trouble ip link set veth3-trouble up

ip netns exec ns-trouble ip route add default via

ip route add via
ip route add via

ip netns exec ns-trouble ip link add lossy0 type dummy
ip netns exec ns-trouble ip link set dev lossy0 up
ip netns exec ns-trouble ip addr add dev lossy0
tc qdisc add dev veth0-trouble root netem loss 30% 25% delay 3ms 30ms

ip netns exec ns-trouble ip link add latency0 type dummy
ip netns exec ns-trouble ip link set dev latency0 up
ip netns exec ns-trouble ip addr add dev latency0
tc qdisc add dev veth2-trouble root netem delay 50ms 500ms

# To clean up, run:
#    sudo ip netns del ns-trouble
🐚 ~ $ ping -c 5
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.091 ms
64 bytes from icmp_seq=4 ttl=64 time=0.047 ms

--- ping statistics ---
5 packets transmitted, 2 received, 60% packet loss, time 88ms
rtt min/avg/max/mdev = 0.047/0.069/0.091/0.022 ms
🐚 ~ $ ping -c 5
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=647 ms
64 bytes from icmp_seq=2 ttl=64 time=16.7 ms
64 bytes from icmp_seq=3 ttl=64 time=0.060 ms
64 bytes from icmp_seq=4 ttl=64 time=881 ms
64 bytes from icmp_seq=5 ttl=64 time=0.051 ms

--- ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 16ms
rtt min/avg/max/mdev = 0.051/308.972/880.818/378.855 ms