paulgorman.org

Linux Virtualization (KVM)

This document assumes Debian as the host OS. Also, the fiddly bits focus more on virtualizing Windows guests (as virtualizing Linux guests seems more straightforward).

Linux supports a number of virtualization solutions. KVM is the solution supported in the official kernel, starting with kernel 2.6.20.

Overview

Virtualization has several layers. At the lowest level is the hardware of the host. The next layer is the host operating system. Then, there is the hypervisor (a.k.a. virtual machine monitor). In some virtualization solutions, the hypervisor is part of the host operating system. Guest virtual machines are the layer above the hypervisor. User-space applications run on top of the guests.

KVM adds hypervisor functionality to the Linux kernel with a loadable kernel module, which exposes the /dev/kvm device.

Guest operating systems run as user-space processes, with memory allocation, scheduling, etc. handled by the kernel.
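For example, each running guest shows up as an ordinary process on the host (the exact process name depends on the QEMU build, e.g. qemu-system-x86_64 or kvm):

ps -ef | egrep 'qemu|kvm'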

Requirements & Getting Started

KVM requires the processor to support hardware virtualization.

cat /proc/cpuinfo | grep flags | head -n1 | grep -oE 'vmx|svm|ept|vpid|npt|tpr_shadow|flexpriority|vnmi'

Either vmx (Intel) or svm (AMD) is required. The other flags are icing.
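As a quicker check, count the virtualization flags directly; a result of zero means the CPU (or a BIOS setting) lacks hardware virtualization support:

egrep -c '(vmx|svm)' /proc/cpuinfo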

Install stuff:

sudo apt-get install kvm qemu-kvm libvirt-bin virtinst virt-manager virt-viewer bridge-utils spice-vdagent

After installation completes, running sudo virsh -c qemu:///system list should give output like:

 Id    Name                           State
----------------------------------------------------

Make sure the kvm kernel module is loaded:

lsmod | grep kvm

If not, load it by hand (kvm_intel on Intel hardware, kvm_amd on AMD):

modprobe kvm_intel

Check for any errors in loading the module:

dmesg | grep kvm

Network Setup

Edit /etc/network/interfaces to look something like:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet manual

# Bridge to support virtual machines
auto br0
iface br0 inet static
    address 10.0.0.76
    network 10.0.0.0
    netmask 255.255.255.0
    gateway 10.0.0.1
    bridge_ports eth0
    bridge_fd 9
    bridge_hello 2
    bridge_maxage 12
    bridge_stp off

Bring up the bridge:

sudo ifup br0
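To confirm that eth0 is enslaved to the bridge, brctl (from the bridge-utils package installed earlier) lists each bridge and its member interfaces:

sudo brctl show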

The output of sudo ifconfig should look something like:

br0       Link encap:Ethernet  HWaddr 00:0a:5e:5d:00:74  
          inet addr:10.0.0.76  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20a:5eff:fe5d:74/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:914 errors:0 dropped:1 overruns:0 frame:0
          TX packets:841 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:134387 (131.2 KiB)  TX bytes:148180 (144.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:0a:5e:5d:00:74  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:980 errors:0 dropped:0 overruns:0 frame:0
          TX packets:838 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:163403 (159.5 KiB)  TX bytes:154784 (151.1 KiB)
          Interrupt:17 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:576 errors:0 dropped:0 overruns:0 frame:0
          TX packets:576 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:50496 (49.3 KiB)  TX bytes:50496 (49.3 KiB)

A Note/Warning Regarding NetworkManager

Doing bridging with NetworkManager has some nasty gotchas at this time (2014), and is not recommended. To see if NetworkManager is controlling any interfaces: nmcli dev status. Fortunately, NetworkManager will ignore any interface configured in /etc/network/interfaces if /etc/NetworkManager/NetworkManager.conf contains this:

[ifupdown]
managed=false

Creating A Guest

virt-install --connect qemu:///system \
    --name=window_server_2012_test_01 \
    --ram=4096 \
    --vcpus=1,maxvcpus=2 \
    --cdrom=/home/paulgorman/iso/Window_Server_2012_RTM.iso \
    --os-type=windows \
    --disk path=/home/paulgorman/var/kvm/windows_server_2012_test_01.qcow2,size=12,sparse=false \
    --network bridge=br0,mac=RANDOM \
    --graphics vnc

(The above fails if you have VirtualBox running, with "kvm: enabling virtualization on CPU0 failed" seen in /var/log/syslog.)

Once the installation finishes allocating the disk image, virt-viewer will open with the installer for the guest. Running virsh -c qemu:///system list at this point will show a running virtual machine.
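If you close the viewer window, you can reconnect to the guest's console later (guest name taken from the virt-install example above):

virt-viewer --connect qemu:///system window_server_2012_test_01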

Storage Pools

A storage pool is a file, directory, or storage device used by libvirt to provide storage to guests. By default, libvirt uses /var/lib/libvirt/images/ as a directory-based storage pool.

If you're doing this on a production server, you probably want to at least put the guest's disk on its own LVM logical volume (and dedicate an LVM volume group as the KVM storage pool) rather than in a file on an existing filesystem. See the LVM section below.

Beyond such local storage pool types, libvirt also supports networked/shared storage pools, like Fibre Channel (LUNs and FCoE), iSCSI, and NFS.
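As a rough sketch, an LVM-backed pool like the one listed below could be defined from an existing volume group (the pool and volume group names here are only examples):

virsh pool-define-as virt-lvm-pool logical --source-name falstaff-vg --target /dev/falstaff-vg
virsh pool-start virt-lvm-pool
virsh pool-autostart virt-lvm-pool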

List the known/existing storage pools:

# virsh pool-list
 Name                 State      Autostart
-------------------------------------------
 default              active     yes
 virt-lvm-pool        active     yes

# virsh pool-info virt-lvm-pool
Name:           virt-lvm-pool
UUID:           6f7d52d7-2311-4e6b-a59b-51580f66f36d
State:          running
Persistent:     yes
Autostart:      yes
Capacity:       931.27 GiB
Allocation:     566.27 GiB
Available:      365.00 GiB

# virsh vol-list --pool virt-lvm-pool
 Name                 Path
------------------------------------------------------------------------------
 home                 /dev/falstaff-vg/home
 lv-openbsd-30gb      /dev/falstaff-vg/lv-openbsd-30gb
 lv-pi2-test          /dev/falstaff-vg/lv-pi2-test
 root                 /dev/falstaff-vg/root
 swap_1               /dev/falstaff-vg/swap_1

Local storage pools do not support live migration of guests. If you're running a handful of hosts with a few guests each, it may be practical to support live migration with NFS.
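To carve out a new volume in the LVM pool for a guest (the volume name and size here are only illustrative):

virsh vol-create-as virt-lvm-pool lv-newguest-20gb 20G

The new volume can then be handed to virt-install with something like --disk vol=virt-lvm-pool/lv-newguest-20gb.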

virsh

virsh is a virtual machine management shell. Running virsh puts you into the shell, where you can type help for a list of available commands. These include list, start, shutdown, reboot, destroy, define, undefine, edit, dumpxml, and console.

The config files for guests are stored under /etc/libvirt/ as XML. These files are mostly generated by virsh, and shouldn't be edited by hand.

qemu:///session & qemu:///system

virsh takes a URI as an argument.

URI's like "qemu:///session" connect to a libvirtd instance running as the user, so QEMU instances spawned from it share that user's privileges. Use this for desktop virtualization, with virtual machines storing their disk images in a user's home directory and managed from the local desktop session.

URI's like "qemu:///system" connect to a libvirtd instance running as root, so QEMU instances spawned from it have more elevated privileges than the client managing them. Use this for server virtualization, where the guests may need access to host resources (block, network devices, etc.) that require elevated privileges.

virt-manager

virt-manager is a GUI utility for managing virtual machines. It can start, stop, resume, suspend, and connect to virtual machines. It also offers modest monitoring and guest creation features. It's pretty friendly, but virt-manager can't do all the things virsh handles.

For example, to run virt-manager as root over an SSH session with X11 forwarding (adjust DISPLAY and XAUTHORITY to match your own session):

sudo su -
DISPLAY=localhost:10.0 XAUTHORITY=/home/paulgorman/.Xauthority /usr/bin/virt-manager

Editing a guest config

virsh edit guestname

...edits the XML config of the guest. Changes are applied on the next restart of the guest.
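To keep a copy of a guest's config outside libvirt, or to import one back, dump and define it (the file name is arbitrary):

virsh dumpxml guestname > guestname.xml
virsh define guestname.xml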

Tuning & Performance

See the KVM tuning docs, which mention three things: pass the actual hardware CPU flags to the guest (qemu -cpu host); make sure networking uses a bridged setup, and use the VirtIO driver in the guest rather than the default RTL8139 NIC driver; and back the guest's disk with a raw block device (a dedicated partition or a logical volume) rather than a qcow2 file sitting in /var/whatever.

I believe that the virt-manager Virtual Machine -> Details -> Processor -> Configuration -> Copy Host CPU Configuration button passes the actual hardware CPU flags to the guest.

virt-viewer uses VNC by default, which can seem slightly laggy. In virt-manager, change the video model to 'qxl' and the display to 'Spice'.

Download an ISO of the Windows VirtIO Drivers on your host. In virt-manager, set the guest's NIC device model to 'virtio'. Reboot the guest, connect the CD drive to the virtio driver ISO, then update the NIC drivers in the guest.
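In the guest's XML config, the bridged virtio NIC ends up looking something like this (the MAC address is just an example):

  <interface type='bridge'>
    <mac address='52:54:00:12:34:56'/>
    <source bridge='br0'/>
    <model type='virtio'/>
  </interface>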

If you have sufficient RAM to reserve a static amount for all your guests, do so. The virtio balloon driver allows you to change the amount of RAM assigned to a guest without the need to pause or restart it. In practice, however, I found that setting a guest's current/initial memory to less than the guest's maximum memory causes a lot of unnecessary, almost constant disk activity.
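To reserve a fixed amount of RAM and sidestep ballooning, set the maximum and current memory to the same value in the guest's XML (4 GiB here, as an example):

  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>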

LVM

The Logical Volume Manager offers a number of benefits: logical volumes can be grown or shrunk without repartitioning disks, a volume group can pool several physical disks, and snapshots make live backups possible (see below).

LVM managed storage has several components/layers:


    hda1   hdc1          (Physical Volumes on partitions or whole disks)
       \   /
        \ /
       diskvg            (Volume Group)
       /  |  \
      /   |   \
  usrlv rootlv varlv     (Logical Volumes)
    |      |     |
 ext2  reiserfs  xfs     (filesystems)

Physical Volume
a hard disk, a partition, or a RAID device
Volume Group
the highest level abstraction used within the LVM. It gathers together a collection of Logical Volumes and Physical Volumes into one administrative unit.
Logical Volume
equivalent of a disk partition in a non-LVM system. The LV is visible as a standard block device; as such the LV can contain a file system (e.g. /home).
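Building the layers from scratch looks roughly like this (device name, size, and volume names are illustrative only):

pvcreate /dev/sdb1                          # physical volume
vgcreate diskvg /dev/sdb1                   # volume group
lvcreate --name usrlv --size 20G diskvg     # logical volume
mkfs.ext4 /dev/diskvg/usrlv                 # filesystem on the logical volume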

LVM inspection and monitoring

Note that vgdisplay shows the free and allocated space on a volume group, which we will want to know if we're using the volume group as a storage pool for guests.

paulgorman@falstaff $ sudo pvscan
  PV /dev/sda5   VG falstaff   lvm2 [297.85 GiB / 0    free]
  Total: 1 [297.85 GiB] / in use: 1 [297.85 GiB] / in no VG: 0 [0   ]
paulgorman@falstaff $ sudo pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               falstaff
  PV Size               297.85 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              76249
  Free PE               0
  Allocated PE          76249
  PV UUID               EHKLtd-8c8K-v4sx-oE9V-lvK3-StsI-t7w4PZ
   
paulgorman@falstaff $ sudo vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "falstaff" using metadata type lvm2

paulgorman@falstaff $ sudo vgdisplay
  --- Volume group ---
  VG Name               falstaff
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  12
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               297.85 GiB
  PE Size               4.00 MiB
  Total PE              76249
  Alloc PE / Size       76249 / 297.85 GiB
  Free  PE / Size       0 / 0   
  VG UUID               8cazW7-4vAv-mIZb-CB6K-xhV3-SdIT-zOLPkW
   
paulgorman@falstaff $ sudo lvscan
  ACTIVE            '/dev/falstaff/swap_1' [11.47 GiB] inherit
  ACTIVE            '/dev/falstaff/root' [37.25 GiB] inherit
  ACTIVE            '/dev/falstaff/home' [249.13 GiB] inherit
paulgorman@falstaff $ sudo lvdisplay
  --- Logical volume ---
  LV Path                /dev/falstaff/swap_1
  LV Name                swap_1
  VG Name                falstaff
  LV UUID                v8AM25-fT2c-gsjA-Pawp-Krnd-hvZQ-HwkAGQ
  LV Write Access        read/write
  LV Creation host, time falstaff, 2012-07-16 13:53:50 -0400
  LV Status              available
  # open                 2
  LV Size                11.47 GiB
  Current LE             2936
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:1
   
  --- Logical volume ---
  LV Path                /dev/falstaff/root
  LV Name                root
  VG Name                falstaff
  LV UUID                SJ3WkH-h150-GEjZ-dWT7-FYIJ-opDk-HmH9rd
  LV Write Access        read/write
  LV Creation host, time falstaff, 2012-07-16 14:07:50 -0400
  LV Status              available
  # open                 1
  LV Size                37.25 GiB
  Current LE             9536
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:0
   
  --- Logical volume ---
  LV Path                /dev/falstaff/home
  LV Name                home
  VG Name                falstaff
  LV UUID                5pN42V-kkNQ-fMdX-GZ5u-xleA-iz3d-umR29a
  LV Write Access        read/write
  LV Creation host, time falstaff, 2012-07-16 14:08:02 -0400
  LV Status              available
  # open                 1
  LV Size                249.13 GiB
  Current LE             63777
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:2

Live Guest Backup with LVM Snapshots

You need a volume group with unused space. vgdisplay shows the allocated and free space on a volume group.

Create a new logical volume of the snapshot type, like lvcreate --snapshot --size 1500M --name mySnapshotVolume /dev/myLvmGroup/myOriginalVolume. This snapshot volume must be large enough to hold all writes to the original volume made during the lifetime of the snapshot. If the writes overflow the capacity of the snapshot volume, the snapshot volume will be automatically dropped.

lvs shows the status. You will likely see the snapshot is mostly empty, since it's only storing the new writes to the original volume.

Create the backup by duplicating the snapshot volume onto a backup volume with sufficient space: dd if=/dev/myLvmGroup/mySnapshotVolume of=/dev/myLvmGroup/backupVolume. Of course, you can also compress the output: dd if=/dev/myLvmGroup/mySnapshotVolume | gzip -c > /var/backups/myVirtualMachine.img.gz

When you're done with the backup, lvremove /dev/lvmgroup/mySnapshot.

SSDs

Solid state drives aren't specific to virtualization, but the Arch Linux wiki has a pretty good article on SSD use under Linux.

Further Reading