(2016? Updated slightly 2018)
systemd is a new Linux init system.
SysV init | systemd
---|---
ls /etc/init.d/ | systemctl
service foo start/stop/restart/reload | systemctl start/stop/restart/reload foo
cat /etc/init.d/foo.sh | systemctl cat foo.service
Like it or not, systemd will be the Linux init system for at least the next few years.
Init is the first process on a unix system. PID 1. All other processes descend from it. systemd fulfills that role, but does significantly more. In these notes, we’ll focus on how systemd fulfills its core duties as a replacement for SysV init.
On CentOS 7:
$ ps 1
PID TTY STAT TIME COMMAND
1 ? Ss 0:07 /lib/systemd/systemd --system --deserialize 16
On Debian 9:
$ ps 1
PID TTY STAT TIME COMMAND
1 ? Ss 0:03 /sbin/init
$ ls -l /sbin/init
lrwxrwxrwx 1 root root 20 Apr 8 06:51 /sbin/init -> /lib/systemd/systemd
systemd controls units. A unit is anything systemd manages: services, sockets, devices, mount points, timers, etc.
systemd reads unit files for configuration.
Documentation for systemd is thin. [UPDATE: as of 2018 the documentation coverage is significantly improved.]
systemctl --failed Show failed units/services
systemd-cgtop Top-like display of cgroups
systemctl kill foo Kill foo _and_ its children
systemctl kill -s SIGKILL foo Same, but send SIGKILL instead of the default SIGTERM
systemctl list-dependencies ssh.service Show dependencies
systemctl show -p CPUShares ssh.service Show CPU shares
systemctl set-property ssh.service CPUShares=999999 Set CPU shares
Running systemctl without arguments shows the state of each service loaded on boot. A few more details about any of these services can be seen with systemctl status ssh.service, for example.
$ systemctl status ssh.service
● ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/lib/systemd/system/ssh.service; enabled)
Active: active (running) since Thu 2015-01-29 20:01:09 EST; 3 weeks 2 days ago
Main PID: 759 (sshd)
CGroup: /system.slice/ssh.service
└─759 /usr/sbin/sshd -D
Unit files configure units. They’re scattered all over the place. From systemd.unit(5):
/etc/systemd/system/*
/run/systemd/system/*
/lib/systemd/system/*
...
$XDG_CONFIG_HOME/systemd/user/*
$HOME/.config/systemd/user/*
/etc/systemd/user/*
$XDG_RUNTIME_DIR/systemd/user/*
/run/systemd/user/*
$XDG_DATA_HOME/systemd/user/*
$HOME/.local/share/systemd/user/*
/lib/systemd/user/*
(The ellipsis is from the man page. Nice.)
(Red Hat says: “Systemd unit files are stored in /usr/lib/systemd/system/, whereas unit files created or modified by the users are stored in /etc/systemd/system/.”)
After adding or changing a unit file, systemctl daemon-reload will make systemd take notice of the change, although it will not automatically start a new service without systemctl start foo.service.
If creating a user unit (e.g., $HOME/.config/systemd/user/foo.service), enable it with:
$ systemctl --user daemon-reload
$ systemctl --user enable foo
$ systemctl --user start foo
$ systemctl --user status foo
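A minimal user unit along those lines might look like this (a sketch; the foo daemon and its path are invented, and user units are typically wanted by default.target rather than multi-user.target):

$HOME/.config/systemd/user/foo.service:

[Unit]
Description=My per-user foo daemon

[Service]
ExecStart=/usr/local/bin/foo --no-daemon

[Install]
WantedBy=default.target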
What’s the difference, if any, between a service file and a unit file? None. A *.service file is just a unit file for a service. *.socket, *.mount, etc., are also unit files.
The [Unit] section contains generic information about the service. systemd not only manages system services, but also devices, mount points, timers, and other components of the system. The generic term for all these objects in systemd is a unit, and the [Unit] section encodes information about it that might be applicable not only to services but also to the other unit types systemd maintains.
Here’s /etc/systemd/system/myexample.service:
[Unit]
Description=MyApp
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
ExecStartPre=/usr/bin/docker kill busybox1
ExecStartPre=/usr/bin/docker rm busybox1
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name busybox1 busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
[Install]
WantedBy=multi-user.target
…and then we do systemctl enable /etc/systemd/system/myexample.service and systemctl start myexample.service for our new service.
A real unit file I wrote (/etc/systemd/system/asterisk.service):
[Unit]
Description=Asterisk PBX and telephony daemon
Documentation=man:asterisk(8)
Wants=network.target
After=network.target
[Service]
Type=simple
User=asterisk
Group=asterisk
PermissionsStartOnly=true
ExecStart=/usr/sbin/asterisk -g -f -C /etc/asterisk/asterisk.conf
ExecStop=/usr/sbin/asterisk -rx 'core stop now'
ExecReload=/usr/sbin/asterisk -rx 'core reload'
ExecStartPost=/home/admin/bin/asterisk_status.pl
ExecStartPost=/bin/sh -c 'echo "The Asterisk service on gab restarted. See https://gab.example.com/asterisk-status.txt" | mail -s "Asterisk service restarted" root'
ExecStopPost=/bin/sh -c 'echo "The Asterisk service on gab stopped." | mail -s "Asterisk service stopped" root'
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Things to note about the above unit file:
- PermissionsStartOnly=true applies the User=/Group= settings only to ExecStart=, so the other Exec* commands (the status script, the mail notifications, the stop command) run as root while asterisk itself runs as the asterisk user.
- Restart=always with RestartSec=5 brings Asterisk back five seconds after it dies for any reason (a manual systemctl stop does not trigger a restart).
- The ExecStartPost=/ExecStopPost= shell commands email root on every (re)start and stop.
Check the syntax of a unit file:
$ systemd-analyze verify myunit.service
Targets are a way of grouping units, and are vaguely similar to SysV’s run levels. Common targets include multi-user.target and graphical.target.
systemctl get-default shows the default target.
systemctl list-units --type target shows the current target(s).
systemctl list-units --type target --all lists all targets.
(Jesus, nice command structure, guys.)
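For example, on a headless box the default target is usually:

$ systemctl get-default
multi-user.target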
systemctl isolate foo.target changes the current target.
systemctl isolate rescue.target is as close as we get to dropping to single-user runlevel; systemctl rescue is shorthand for this.
(In practice systemctl emergency might be preferable, since it sends a warning to all users.)
cgroups are a feature of the linux kernel, not systemd. But systemd makes cgroups easier to use.
Imagine we have a web app that forks a bunch of apache processes. It would be handy to be able to manage and measure those processes as a group, apart from any unrelated apache processes on the box. cgroups lets us do that. Furthermore, cgroups let us restrain that gaggle of apache processes from starving other processes on the box by dint of the scheduling advantage their great number gives them.
A cgroup does two things: it groups and labels/tags related processes as a single service, and it lets us control/measure that service. systemd relies on the first feature of cgroups to function; the second is just a useful feature of cgroups.
Processes on traditional *nix systems are a single hierarchy (i.e., all processes descend from init). cgroups (“control groups”) bundle processes together such that each cgroup appears to be its own independent process hierarchy. The processes which are part of a cgroup are called “tasks”. A cgroup can spawn a new child cgroup, which inherits the attributes of the parent cgroup. Tasks can be moved between cgroups. Tasks can belong to more than one cgroup at a time, so long as those cgroups are not part of the same hierarchy of descent. (If a task is added to a second cgroup that’s part of the same hierarchy as its original cgroup, the task is automatically removed from the original cgroup.) When a task forks off a child task, that child is automatically part of the same cgroup (though it can subsequently be moved to a different cgroup). Forked tasks are independent; child and parent can be moved to different cgroups without affecting each other.
By grouping related tasks, we can think about managing resources for services rather than individual processes.
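As a sketch of what that buys us (the unit name apache2.service and the values here are assumptions, not from any real box): a drop-in can cap the whole group of processes at once.

/etc/systemd/system/apache2.service.d/limits.conf:

[Service]
CPUShares=512
MemoryLimit=1G

Then systemctl daemon-reload and systemctl restart apache2.service apply the limits to every process in the service’s cgroup.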
Each cgroup can attach to one or more resource “subsystems”. Subsystems include cpu, blkio, net_cls (tagging packets with their originating cgroup), memory, namespaces, etc. (enumerate them with ls /sys/fs/cgroup/). We could, for example, attach the tasks in a cgroup to a particular cpu core. Kernel literature sometimes calls these resource subsystems “controllers” or “resource controllers”.
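On a cgroup-v1 system of this era, that enumeration looks roughly like this (exact entries vary by kernel and distro):

$ ls /sys/fs/cgroup/
blkio  cpu  cpuacct  cpuset  devices  freezer  memory
net_cls  perf_event  pids  systemd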
The point of cgroups is the ability to provide accounting (e.g., for billing purposes or provisioning planning), limits/prioritization (e.g., use only so much memory or disk I/O), and isolation (e.g., namespaces) for a group of processes.
(Namespace isolation is technically a separate feature from cgroups.)
In practice, systemd seems to use only a few of the features of cgroups, mainly to organize related processes in a way that makes it easier for administrators to keep track of them. Under sysvinit without using cgroups, orphaned processes are re-parented to PID 1, making it sometimes difficult to know where such a process originated; systemd keeps related processes together in a cgroup, even if the parent dies.
ps can show cgroups:
$ ps axw -o pid,user,cgroup,args
[...snip...]
729 root 4:devices:/system.slice/rpc /sbin/rpcbind -w
738 statd 4:devices:/system.slice/nfs /sbin/rpc.statd
743 root - [rpciod]
745 root - [nfsiod]
752 root 4:devices:/system.slice/nfs /usr/sbin/rpc.idmapd
756 root 4:devices:/system.slice/cro /usr/sbin/cron -f
757 root 4:devices:/system.slice/sma /usr/sbin/smartd -n
758 daemon 4:devices:/system.slice/atd /usr/sbin/atd -f
759 root 4:devices:/system.slice/ssh /usr/sbin/sshd -D
[...snip...]
28267 paulgor+ 4:devices:/user.slice,1:nam rxvt
28268 paulgor+ 4:devices:/user.slice,1:nam rxvt
28269 paulgor+ 4:devices:/user.slice,1:nam bash
28734 paulgor+ 4:devices:/user.slice,1:nam /bin/bash
28736 paulgor+ 4:devices:/user.slice,1:nam iceweasel
The systemd-cgls command gives this information as a tree.
Using systemd to signal a service, like systemctl kill -s SIGTERM foo.service, ensures that all processes that make up the service receive the signal.
systemctl stop foo.service terminates the running service. It will turn back on at the next boot or if something triggers activation for it (hardware plugging, socket activation, etc.).
systemctl disable foo.service unhooks a service from any activation triggers. It will not start on reboot. The service can still be started manually. (Note that disabling a service will not actually stop the currently running instance, if any, so you may also want to send systemctl stop foo.service.)
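A typical stop-then-disable session (the Removed line is the usual shape of the output, which varies a little across systemd versions; foo is a stand-in):

$ sudo systemctl stop foo.service
$ sudo systemctl disable foo.service
Removed /etc/systemd/system/multi-user.target.wants/foo.service.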
We can also mask a service, which both disables it and prevents it from being started manually: systemctl mask foo.service.
Finally, doing something like ln -s /dev/null /etc/systemd/system/foo.service; systemctl daemon-reload will block the service from being started, even manually, because entries in /etc/systemd/ override those in /lib/systemd/.
Services can be brought back up in the way we’d expect (enable, start).
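Masking and unmasking in practice (again, the output shapes are typical for this era of systemd):

$ sudo systemctl mask foo.service
Created symlink /etc/systemd/system/foo.service → /dev/null.
$ sudo systemctl start foo.service
Failed to start foo.service: Unit foo.service is masked.
$ sudo systemctl unmask foo.service
Removed /etc/systemd/system/foo.service.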
/run and changes to /etc
There’s a new top-level directory called /run. This contains things that once went in /var/run (or, worse, got stuck in /dev/.foo/ because /var wasn’t available early enough in the boot process).
See this mailing list post about /run.
/run/ isn’t strictly systemd-related, but part of a larger (some might say “overreaching”) clean up, like the newly standardized config files (although the point of those new config files is that they can be read directly by systemd, without executing a shell, so systemd directly reads /etc/fstab and /etc/hostname).
Because systemd unit files are capable of doing the same job (i.e., offering config options for init scripts that have become too complex for admins to safely edit), systemd has the ambition to phase out /etc/default/ (and /etc/sysconfig on Red Hat-based distros).
systemd-tmpfiles creates, deletes, and tidies up temp files based on configuration files in /etc/tmpfiles.d/ and /usr/lib/tmpfiles.d/. The syntax of these files is concise (see tmpfiles.d(5)):
$ cat /usr/lib/tmpfiles.d/sshd.conf
d /var/run/sshd 0755 root root
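A hypothetical entry of our own could create a runtime directory and age out a cache; myapp and its paths are invented, and the fields are type, path, mode, user, group, age:

$ cat /etc/tmpfiles.d/myapp.conf
d /run/myapp 0750 myapp myapp -
d /var/cache/myapp 0750 myapp myapp 10d

systemd-tmpfiles --create /etc/tmpfiles.d/myapp.conf applies it immediately; entries in the cache directory older than 10 days get removed by the periodic cleanup.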
systemd can do cron-like stuff, configured with .timer unit files. See systemd.timer(5).
A unit file named like foo.timer controls execution of a foo.service file. Both files must exist in one of the standard paths for unit files (e.g., $XDG_CONFIG_HOME/systemd/user/).
When enabling/disabling, do so for the .timer file, not the associated .service file.
The .timer file has a [Timer] section that sets when and how the timer runs. The time may be either:
OnCalendar= (realtime, i.e., wall-clock)
OnBootSec, OnActiveSec, etc. (monotonic)

🐚 ~ $ systemctl list-timers
🐚 ~ $ systemctl list-timers --all
If a timer gets out of sync, delete the stamp file that marks when the timer last ran (in /var/lib/systemd/timers or ~/.local/share/systemd/). systemd will recreate the stamp file on the next timer run.
foo.timer (realtime)
[Unit]
Description=Run foo weekly
[Timer]
OnCalendar=weekly
Persistent=true
[Install]
WantedBy=timers.target
bar.timer (monotonic)
[Unit]
Description=Run bar weekly and on boot
[Timer]
OnBootSec=15min
OnUnitActiveSec=1w
[Install]
WantedBy=timers.target
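Both timers need a service unit to fire; a minimal one might be (a sketch; the script path is invented):

foo.service:

[Unit]
Description=The foo job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/foo-weekly-job

Then systemctl enable foo.timer and systemctl start foo.timer put the schedule into effect.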
systemd handles mounting filesystems. See systemd.mount(5). It can do automounting, and includes various additions to the traditional /etc/fstab syntax.
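A native mount unit, as a sketch (the device and mount point are assumptions; note that the unit file name must be derived from the mount point, so /mnt/data becomes mnt-data.mount):

/etc/systemd/system/mnt-data.mount:

[Unit]
Description=Data disk

[Mount]
What=/dev/sdb1
Where=/mnt/data
Type=ext4

[Install]
WantedBy=multi-user.target

Among the fstab additions: an option like noauto,x-systemd.automount mounts the filesystem on first access instead of at boot.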
systemd has a built-in SNTP client (systemd-timesyncd). It more or less backs off if an ntpd is running.
The timedatectl utility can change or monitor the system clock.
$ timedatectl status
Local time: Tue 2018-04-03 19:48:50 EDT
Universal time: Tue 2018-04-03 23:48:50 UTC
RTC time: Tue 2018-04-03 23:48:50
Time zone: America/Detroit (EDT, -0400)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: yes
Last DST change: DST began at
Sun 2018-03-11 01:59:59 EST
Sun 2018-03-11 03:00:00 EDT
Next DST change: DST ends (the clock jumps one hour backwards) at
Sun 2018-11-04 01:59:59 EDT
Sun 2018-11-04 01:00:00 EST
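Changing things goes through the same tool, e.g.:

$ sudo timedatectl set-timezone America/Detroit
$ sudo timedatectl set-ntp true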
For each script systemd finds in /etc/init.d, its systemd-sysv-generator creates an ad hoc unit file. The generator places such files in /run/systemd/generator.late/ or similar.
If the generator doesn’t do exactly what we want, we can override it with a drop-in file (just like we can override any other unit file).
systemd.unit(5) says:
There are two methods of overriding vendor settings in unit files: copying the unit file from /lib/systemd/system to /etc/systemd/system and modifying the chosen settings. Alternatively, one can create a directory named unit.d/ within /etc/systemd/system and place a drop-in file name.conf there that only changes the specific settings one is interested in. Note that multiple such drop-in files are read if present. […] Note that for drop-in files, if one wants to remove entries from a setting that is parsed as a list (and is not a dependency), such as ConditionPathExists= (or e.g. ExecStart= in service units), one needs to first clear the list before re-adding all entries except the one that is to be removed. See below for an example.
For example, make systemd restart a service if it dies:
--- ~ $ cat /run/systemd/generator.late/qemu-guest-agent.service
# Automatically generated by systemd-sysv-generator
[Unit]
Documentation=man:systemd-sysv-generator(8)
SourcePath=/etc/init.d/qemu-guest-agent
Description=LSB: QEMU Guest Agent startup script
Before=multi-user.target
Before=multi-user.target
Before=multi-user.target
Before=graphical.target
After=remote-fs.target
[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
SuccessExitStatus=5 6
ExecStart=/etc/init.d/qemu-guest-agent start
ExecStop=/etc/init.d/qemu-guest-agent stop
--- ~ $ sudo mkdir /etc/systemd/system/qemu-guest-agent.service.d
--- ~ $ sudo vi /etc/systemd/system/qemu-guest-agent.service.d/local.conf
--- ~ $ cat /etc/systemd/system/qemu-guest-agent.service.d/local.conf
[Service]
PIDFile=
PIDFile=/var/run/qemu-ga.pid
RemainAfterExit=
RemainAfterExit=no
Restart=
Restart=always
RestartSec=20
--- ~ $ sudo systemctl daemon-reload
--- ~ $ sudo systemctl restart qemu-guest-agent.service
man systemd.index and http://www.freedesktop.org/software/systemd/man/index.html