
systemd

(2016? Updated slightly 2018)

systemd is a new Linux init system.

SysV init                                systemd
ls /etc/init.d/                          systemctl
service foo start/stop/restart/reload    systemctl start/stop/restart/reload foo
cat /etc/init.d/foo.sh                   systemctl cat foo.service

Like it or not, systemd will be the Linux init system for at least the next few years.

Init is the first process on a unix system, PID 1. All other processes descend from it. systemd fulfills that role, but does significantly more. In these notes, we’ll focus on how systemd fulfills its core duties as a replacement for SysV init.

On CentOS 7:

$  ps 1
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:07 /lib/systemd/systemd --system --deserialize 16

On Debian 9:

$  ps 1
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:03 /sbin/init
$  ls -l /sbin/init 
lrwxrwxrwx 1 root root 20 Apr  8 06:51 /sbin/init -> /lib/systemd/systemd

systemd manages units. A unit is any object systemd knows how to manage: a service, socket, device, mount point, timer, and so on.

systemd reads unit files for configuration.

Documentation

Documentation for systemd is thin. [UPDATE: as of 2018, documentation coverage is significantly improved.]

Misc

systemctl --failed                Show failed units/services
systemd-cgtop                     Top-like display of cgroups
systemctl kill foo                Send SIGTERM to foo _and_ its children
systemctl kill -s SIGKILL foo     Send SIGKILL to foo _and_ its children
systemctl list-dependencies ssh.service                Show dependencies
systemctl show -p CPUShares ssh.service                Show CPU shares
systemctl set-property ssh.service CPUShares=999999    Set CPU shares

systemctl and boot

Running systemctl without arguments shows the state of each service loaded on boot. A few more details about any of these services can be seen with systemctl status ssh.service, for example.

$ systemctl status ssh.service
● ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled)
   Active: active (running) since Thu 2015-01-29 20:01:09 EST; 3 weeks 2 days ago
 Main PID: 759 (sshd)
   CGroup: /system.slice/ssh.service
           └─759 /usr/sbin/sshd -D

Unit Files

Unit files configure units. They’re scattered all over the place. From systemd.unit(5):

       /etc/systemd/system/*
       /run/systemd/system/*
       /lib/systemd/system/*
       ...

       $XDG_CONFIG_HOME/systemd/user/*
       $HOME/.config/systemd/user/*
       /etc/systemd/user/*
       $XDG_RUNTIME_DIR/systemd/user/*
       /run/systemd/user/*
       $XDG_DATA_HOME/systemd/user/*
       $HOME/.local/share/systemd/user/*
       /lib/systemd/user/*

(The ellipsis is from the man page. Nice.)

(Red Hat says: “Systemd unit files are stored in ‘/usr/lib/systemd/system/’, whereas unit files created or modified by users are stored in ‘/etc/systemd/system/’.”)

After adding or changing a unit file, systemctl daemon-reload will make systemd take notice of the change, although it will not automatically start a new service without systemctl start foo.service.

If creating a user unit (e.g., $HOME/.config/systemd/user/foo.service), enable it with:

$ systemctl --user daemon-reload
$ systemctl --user enable foo
$ systemctl --user start foo
$ systemctl --user status foo
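
A minimal sketch of such a user unit, saved as $HOME/.config/systemd/user/foo.service (the foo name and /usr/local/bin/foo path are hypothetical):

[Unit]
Description=My foo user service

[Service]
ExecStart=/usr/local/bin/foo

[Install]
WantedBy=default.target

(User units that should start at login are typically wanted by default.target rather than multi-user.target.)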

What’s the difference, if any, between a service file and a unit file?

None. A *.service file is just a unit file for a service. *.socket, *.mount, etc., are also unit files.

The [Unit] section contains generic information about the service. systemd not only manages system services, but also devices, mount points, timers, and other components of the system. The generic term for all these objects in systemd is a unit, and the [Unit] section encodes information about it that might be applicable not only to services but also to the other unit types systemd maintains.

Here’s /etc/systemd/system/myexample.service:

[Unit]
Description=MyApp
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
# The leading "-" lets these cleanup commands fail harmlessly (e.g., on first run,
# when no busybox1 container exists yet).
ExecStartPre=-/usr/bin/docker kill busybox1
ExecStartPre=-/usr/bin/docker rm busybox1
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name busybox1 busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"

[Install]
WantedBy=multi-user.target

…and then we do systemctl enable /etc/systemd/system/myexample.service and systemctl start myexample.service for our new service.

A real unit file I wrote (/etc/systemd/system/asterisk.service):

[Unit]
Description=Asterisk PBX and telephony daemon
Documentation=man:asterisk(8)
Wants=network.target
After=network.target

[Service]
Type=simple
User=asterisk
Group=asterisk
PermissionsStartOnly=true
ExecStart=/usr/sbin/asterisk -g -f -C /etc/asterisk/asterisk.conf
ExecStop=/usr/sbin/asterisk -rx 'core stop now'
ExecReload=/usr/sbin/asterisk -rx 'core reload'
ExecStartPost=/home/admin/bin/asterisk_status.pl
ExecStartPost=/bin/sh -c 'echo "The Asterisk service on gab restarted. See https://gab.example.com/asterisk-status.txt" | mail -s "Asterisk service restarted" root'
ExecStopPost=/bin/sh -c 'echo "The Asterisk service on gab stopped." | mail -s "Asterisk service stopped" root'

Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Things to note about the above unit file: the service runs as the unprivileged asterisk user, but PermissionsStartOnly=true makes the ExecStartPre/ExecStartPost/ExecStopPost commands run as root; a unit may carry multiple ExecStartPost= lines, which run in order; and Restart=always with RestartSec=5 brings Asterisk back five seconds after it dies for any reason, with the mail commands notifying root on each restart or stop.

Check the syntax of a unit file:

$  systemd-analyze verify myunit.service

Targets

Targets are a way of grouping units, and are vaguely similar to SysV’s run levels. Common targets include multi-user.target and graphical.target.

systemctl get-default shows the default target.

systemctl list-units --type target shows the current target(s).

systemctl list-units --type target --all lists all targets. (Jesus, nice command structure, guys.)

systemctl isolate foo.target changes the current target. systemctl isolate rescue.target is as close as we get to dropping to single user runlevel; systemctl rescue is shorthand for this. (In practice systemctl emergency might be preferable, since it sends a warning to all users.)
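
For example, to make a box boot to a console login rather than a graphical one (a sketch, assuming the box currently defaults to graphical.target):

$ systemctl get-default
graphical.target
$ sudo systemctl set-default multi-user.target
$ sudo systemctl isolate multi-user.target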

cgroups

cgroups are a feature of the Linux kernel, not systemd. But systemd makes cgroups easier to use.

Imagine we have a web app that forks a bunch of apache processes. It would be handy to manage and measure those processes as a group, apart from any unrelated apache processes on the box. cgroups let us do that. Furthermore, cgroups let us keep that gaggle of apache processes from starving other processes on the box by dint of the scheduling advantage their great number gives them.

A cgroup does two things: it groups and labels/tags related processes as a single service, and it lets us control/measure that service. systemd relies on the first feature of cgroups to function; the second is just a useful feature of cgroups.

Processes on traditional *nix systems are a single hierarchy (i.e., all processes descend from init). cgroups (“control groups”) bundle processes together such that each cgroup appears to be its own independent process hierarchy. The processes which are part of a cgroup are called “tasks”. A cgroup can spawn a new child cgroup, which inherits the attributes of the parent cgroup. Tasks can be moved between cgroups. Tasks can belong to more than one cgroup at a time, so long as those cgroups are not part of the same hierarchy of descent. (If a task is added to a second cgroup that’s part of the same hierarchy as its original cgroup, the task is automatically removed from the original cgroup.) When a task forks off a child task, the child is automatically part of the same cgroup (though it can subsequently be moved to a different cgroup). Forked tasks are otherwise independent; child and parent can be moved to different cgroups without affecting each other.

By grouping related tasks, we can think about managing resources for services rather than individual processes.

Each cgroup can attach to one or more resource “subsystems”. Subsystems include cpu, blkio, net_cls (tagging packets with their originating cgroup), memory, namespaces, etc. (enumerate them with ls /sys/fs/cgroup/). We could, for example, pin the tasks in a cgroup to a particular CPU core. Kernel literature sometimes calls these resource subsystems “controllers” or “resource controllers”.

The point of cgroups is the ability to provide accounting (e.g., for billing purposes or provisioning planning), limits/prioritization (e.g., use only so much memory or disk I/O), and isolation (e.g., namespaces) for a group of processes.

(Namespace isolation is technically a separate feature from cgroups.)
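
We can reach those limits through systemd. A sketch that caps a hypothetical httpd.service (CPUShares= and MemoryLimit= were the directives of this era; newer systemd prefers CPUWeight= and MemoryMax=):

$ sudo systemctl set-property httpd.service CPUShares=512 MemoryLimit=1G

…or in the [Service] section of the unit file or a drop-in:

[Service]
CPUShares=512
MemoryLimit=1G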

In practice, systemd seems to use only a few of the features of cgroups, mainly to organize related processes in a way that makes it easier for administrators to keep track of them. Under sysvinit without using cgroups, orphaned processes are re-parented to PID 1, making it sometimes difficult to know where such a process originated; systemd keeps related processes together in a cgroup, even if the parent dies.

ps can show cgroups:

$ ps axw -o pid,user,cgroup,args
[...snip...]
  729 root     4:devices:/system.slice/rpc /sbin/rpcbind -w
  738 statd    4:devices:/system.slice/nfs /sbin/rpc.statd
  743 root     -                           [rpciod]
  745 root     -                           [nfsiod]
  752 root     4:devices:/system.slice/nfs /usr/sbin/rpc.idmapd
  756 root     4:devices:/system.slice/cro /usr/sbin/cron -f
  757 root     4:devices:/system.slice/sma /usr/sbin/smartd -n
  758 daemon   4:devices:/system.slice/atd /usr/sbin/atd -f
  759 root     4:devices:/system.slice/ssh /usr/sbin/sshd -D
[...snip...]
28267 paulgor+ 4:devices:/user.slice,1:nam rxvt
28268 paulgor+ 4:devices:/user.slice,1:nam rxvt
28269 paulgor+ 4:devices:/user.slice,1:nam bash
28734 paulgor+ 4:devices:/user.slice,1:nam /bin/bash
28736 paulgor+ 4:devices:/user.slice,1:nam iceweasel

The systemd-cgls command gives this information as a tree.

Signaling

Using systemd to signal a service like systemctl kill -s SIGTERM foo.service ensures that all processes that make up the service receive the signal.

Stopping services

systemctl stop foo.service terminates the running service. It will turn back on at the next boot or if something triggers activation for it (hardware plugging, socket activation, etc.).

systemctl disable foo.service unhooks a service from any activation triggers. It will not start on reboot. The service can still be started manually. (Note that disabling a service will not actually stop the currently running instance, if any, so you may also want to send systemctl stop foo.service.)

We can also mask a service, which both disables it and prevents it from being started manually: systemctl mask foo.service.

Finally, doing something like ln -s /dev/null /etc/systemd/system/foo.service; systemctl daemon-reload will block the service from being started, even manually, because entries in /etc/systemd/ override those in /lib/systemd/.

Services can be brought back up in the way we’d expect (enable, start).
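
Putting it together, a sketch of the full teardown and revival of a service:

$ sudo systemctl stop foo.service      # stop the running instance
$ sudo systemctl disable foo.service   # unhook activation triggers
$ sudo systemctl mask foo.service      # block even manual starts
$ sudo systemctl unmask foo.service    # ...and bring it back
$ sudo systemctl enable foo.service
$ sudo systemctl start foo.service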

/run and changes to /etc

There’s a new top-level directory called /run. This contains things that once went in /var/run (or, worse, got stuck in /dev/.foo/ because /var wasn’t available early enough in the boot process). See this mailing list post about /run.

/run/ isn’t strictly systemd-related, but part of a larger (some might say “overreaching”) cleanup, like the newly standardized config files (the point of those new config files is that systemd can parse them directly, without executing a shell, so systemd directly reads /etc/fstab and /etc/hostname).

Because systemd unit files are capable of doing the same job (i.e., offering config options for init scripts that have become too complex for admins to safely edit), systemd has the ambition to phase out /etc/default/ (and /etc/sysconfig on Red Hat-based distros).

Temp files

systemd-tmpfiles creates, deletes, and tidies up temp files based on configuration files in /etc/tmpfiles.d/ and /usr/lib/tmpfiles.d/. The syntax of these files is concise (see tmpfiles.d(5)):

$ cat /usr/lib/tmpfiles.d/sshd.conf
d /var/run/sshd 0755 root root
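
The fields are type, path, mode, owner, group, and an optional maximum age for cleanup. A hypothetical pair of entries that create a runtime directory and prune a cache of files older than ten days:

d /run/myapp 0750 myapp myapp
d /var/cache/myapp 0755 myapp myapp 10d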

Timers

systemd can do cron-like stuff, configured with .timer unit files. See systemd.timer(5).

A unit file named like foo.timer controls execution of a foo.service file. Both files must exist in one of the standard paths for unit files (e.g., $XDG_CONFIG_HOME/systemd/user/). When enabling/disabling, do so for the .timer file, not the associated .service file.

The .timer file has a [Timer] section that sets when and how the timer runs. The time may be either realtime (wall-clock time, set with OnCalendar=) or monotonic (relative to an event like boot or unit activation, set with e.g. OnBootSec= or OnUnitActiveSec=); the two examples below show one of each.

List timers (--all includes inactive ones):

🐚 ~ $ systemctl list-timers
🐚 ~ $ systemctl list-timers --all

If a timer gets out of sync, delete the stamp file that marks when the timer last ran (in /var/lib/systemd/timers or ~/.local/share/systemd/). Systemd will recreate the stamp file on the next timer run.

foo.timer (realtime)

[Unit]
Description=Run foo weekly

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target

bar.timer (monotonic)

[Unit]
Description=Run bar weekly and on boot

[Timer]
OnBootSec=15min
OnUnitActiveSec=1w

[Install]
WantedBy=timers.target
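
Each timer fires a service unit of the same name. A minimal sketch of the matching foo.service (the /usr/local/bin/foo-weekly path is hypothetical):

[Unit]
Description=The weekly foo job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/foo-weekly

(No [Install] section is needed; the timer starts the service, so nothing else has to want it.)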

Mount

systemd handles mounting filesystems. See systemd.mount(5). It can do automounting, and includes various additions to the traditional /etc/fstab syntax.
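
For example, these /etc/fstab options make systemd mount a share on first access and unmount it after a minute idle (a sketch; the NFS export is hypothetical):

fileserver:/export/media  /mnt/media  nfs  noauto,x-systemd.automount,x-systemd.idle-timeout=60  0  0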

timedatectl and systemd-timesyncd

systemd has a built-in SNTP client, systemd-timesyncd. It more or less backs off if a full ntpd is running. The timedatectl utility can change or monitor the system clock.

$  timedatectl status
      Local time: Tue 2018-04-03 19:48:50 EDT
  Universal time: Tue 2018-04-03 23:48:50 UTC
        RTC time: Tue 2018-04-03 23:48:50
       Time zone: America/Detroit (EDT, -0400)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: yes
 Last DST change: DST began at
                  Sun 2018-03-11 01:59:59 EST
                  Sun 2018-03-11 03:00:00 EDT
 Next DST change: DST ends (the clock jumps one hour backwards) at
                  Sun 2018-11-04 01:59:59 EDT
                  Sun 2018-11-04 01:00:00 EST
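
Beyond showing status, timedatectl can set things:

$ timedatectl list-timezones
$ sudo timedatectl set-timezone America/Detroit
$ sudo timedatectl set-ntp true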

How systemd handles /etc/init.d scripts

For each script systemd finds in /etc/init.d/, the systemd-sysv-generator creates an ad hoc unit file. The generator places such files in /run/systemd/generator.late/ or similar.

If the generator doesn’t do exactly what we want, we can override it with a drop-in file (just like we can override any other unit file). systemd.unit(5) says:

There are two methods of overriding vendor settings in unit files: copying the unit file from /lib/systemd/system to /etc/systemd/system and modifying the chosen settings. Alternatively, one can create a directory named unit.d/ within /etc/systemd/system and place a drop-in file name.conf there that only changes the specific settings one is interested in. Note that multiple such drop-in files are read if present. […] Note that for drop-in files, if one wants to remove entries from a setting that is parsed as a list (and is not a dependency), such as ConditionPathExists= (or e.g. ExecStart= in service units), one needs to first clear the list before re-adding all entries except the one that is to be removed. See below for an example.

For example, make systemd restart a service if it dies:

--- ~ $  cat /run/systemd/generator.late/qemu-guest-agent.service
# Automatically generated by systemd-sysv-generator

[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/etc/init.d/qemu-guest-agent
Description=LSB: QEMU Guest Agent startup script
Before=multi-user.target
Before=multi-user.target
Before=multi-user.target
Before=graphical.target
After=remote-fs.target

[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
SuccessExitStatus=5 6
ExecStart=/etc/init.d/qemu-guest-agent start
ExecStop=/etc/init.d/qemu-guest-agent stop
--- ~ $  sudo mkdir /etc/systemd/system/qemu-guest-agent.service.d
--- ~ $  sudo vi /etc/systemd/system/qemu-guest-agent.service.d/local.conf
--- ~ $  cat /etc/systemd/system/qemu-guest-agent.service.d/local.conf
[Service]
PIDFile=
PIDFile=/var/run/qemu-ga.pid
RemainAfterExit=
RemainAfterExit=no
Restart=
Restart=always
RestartSec=20
--- ~ $  sudo systemctl daemon-reload
--- ~ $  sudo systemctl restart qemu-guest-agent.service
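
Confirm the override took effect with systemctl show (Restart=always is what we expect, given the drop-in above):

--- ~ $  systemctl show -p Restart qemu-guest-agent.service
Restart=always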

References