Linux Virtualization (KVM)


Linux offers a number of virtualization solutions. KVM is the solution with official kernel support, starting with Linux 2.6.20.


Virtualization has several layers. At the lowest level is the hardware of the host. The next layer is the host operating system. Then, there is the hypervisor (a.k.a. virtual machine monitor). In some virtualization solutions, the hypervisor is part of the host operating system. Guest virtual machines are the layer above the hypervisor. User-space applications run on top of the guests.

KVM adds hypervisor functionality to the Linux kernel with a loadable kernel module, which exposes the /dev/kvm device.

Guest operating systems run as user-space processes, with memory allocation, scheduling, etc. handled by the kernel.

What are the parts of Linux virtualization?

libvirt calls guests “domains”.

Requirements and Getting Started

KVM requires the processor to support hardware virtualization.

$ cat /proc/cpuinfo | grep flags | head -n1 | grep -oE 'vmx|svm|ept|vpid|npt|tpr_shadow|flexpriority|vnmi'

Either vmx (Intel) or svm (AMD) is required.
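The same check as a reusable function, as a rough sketch. The name virt_flag is made up for illustration; it reads a cpuinfo-style file (defaulting to /proc/cpuinfo) and prints whichever extension it finds.

```shell
# Print "vmx" (Intel VT-x) or "svm" (AMD-V) if the first "flags" line of a
# cpuinfo-style file lists one of them; return non-zero if neither appears.
# virt_flag is a hypothetical helper name, not part of any package.
virt_flag() {
    flag=$(grep -m1 '^flags' "${1:-/proc/cpuinfo}" | grep -owE 'vmx|svm' | head -n1)
    [ -n "$flag" ] && printf '%s\n' "$flag"
}
```

Something like virt_flag || echo "no hardware virtualization support found" works as a guard at the top of a setup script.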

Install the packages:

# apt-get install kvm qemu-kvm libvirt-bin virtinst virt-manager virt-viewer bridge-utils spice-vdagent

After installation completes, running sudo virsh -c qemu:///system list should give output like:

Id    Name                           State

Make sure the kvm kernel module is loaded:

$ lsmod | grep kvm

If not, load it (kvm_intel on Intel hosts, kvm_amd on AMD hosts):

# modprobe kvm_intel

Check for any errors in loading the module:

# dmesg | grep kvm

Network Setup

Attach guests to the network using a bridge device, which works like a virtual network switch.

Edit /etc/network/interfaces to look something like:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet manual

# Bridge to support virtual machines
auto br0
iface br0 inet static
    bridge_ports eth0
    bridge_fd 9
    bridge_hello 2
    bridge_maxage 12
    bridge_stp off

Bring up the bridge:

# ifup br0

The output of ifconfig should look something like:

br0       Link encap:Ethernet  HWaddr 00:0a:5e:5d:00:74
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::20a:5eff:fe5d:74/64 Scope:Link
          RX packets:914 errors:0 dropped:1 overruns:0 frame:0
          TX packets:841 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:134387 (131.2 KiB)  TX bytes:148180 (144.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:0a:5e:5d:00:74  
          RX packets:980 errors:0 dropped:0 overruns:0 frame:0
          TX packets:838 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:163403 (159.5 KiB)  TX bytes:154784 (151.1 KiB)

lo        Link encap:Local Loopback
          inet addr:  Mask:
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:576 errors:0 dropped:0 overruns:0 frame:0
          TX packets:576 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:50496 (49.3 KiB)  TX bytes:50496 (49.3 KiB)

A Note/Warning Regarding NetworkManager

[UPDATE: as of 2018, NetworkManager seems to handle bridges OK.]

Doing bridging with NetworkManager has some nasty gotchas (as of 2014), and is not recommended. To see if NetworkManager is controlling any interfaces: nmcli dev status. Fortunately, NetworkManager ignores any interface configured in /etc/network/interfaces if the [ifupdown] section of /etc/NetworkManager/NetworkManager.conf contains managed=false.


Creating A Guest

# virt-install --connect qemu:///system \
	--name=windows_server_2012_test_01 \
	--ram=4096 \
	--vcpus=1 \
	--cdrom=/home/paulgorman/iso/Window_Server_2012_RTM.iso \
	--os-type=windows \
	--disk path=/home/paulgorman/var/kvm/windows_server_2012_test_01.qcow2,size=12,sparse=false \
	--network bridge=br0,mac=RANDOM \
	--graphics vnc

(The above fails if you have VirtualBox running, with “kvm: enabling virtualization on CPU0 failed” seen in /var/log/syslog.)

Once the installation finishes allocating the disk image, virt-viewer opens with the installer for the guest. Running virsh -c qemu:///system list at this point will show a running virtual machine.

Storage Pools

A storage pool is a file, directory, or storage device used by libvirt to provide storage to guests. By default, libvirt uses /var/lib/libvirt/images/ as a directory-based storage pool.

If you’re doing this as a production server, you probably want to at least put the machine on its own LVM volume (and dedicate an LVM volume group as the KVM storage pool) rather than as a file on the host filesystem. See the LVM section below. (UPDATE 2017: the performance difference between a qcow2 file and an LVM volume is negligible in most cases, and qcow2 offers some advantages and flexibility.)

Beyond local storage pool types, libvirt also supports networked/shared storage pools, like Fibre Channel (LUNs and FCoE), iSCSI, and NFS.

List the known/existing storage pools:

# virsh pool-list
 Name                 State      Autostart
 default              active     yes
 virt-lvm-pool        active     yes

# virsh pool-info virt-lvm-pool
Name:           virt-lvm-pool
UUID:           6f7d52d7-2311-4e6b-a59b-51580f66f36d
State:          running
Persistent:     yes
Autostart:      yes
Capacity:       931.27 GiB
Allocation:     566.27 GiB
Available:      365.00 GiB

# virsh vol-list --pool virt-lvm-pool
 Name                 Path
 home                 /dev/falstaff-vg/home
 lv-openbsd-30gb      /dev/falstaff-vg/lv-openbsd-30gb
 lv-pi2-test          /dev/falstaff-vg/lv-pi2-test
 root                 /dev/falstaff-vg/root
 swap_1               /dev/falstaff-vg/swap_1

Local storage pools do not support live migration of guests. If you’re running a handful of hosts with a few guests each, it may be practical to support live migration with NFS. (UPDATE 2017: local storage pools can support live migration in some scenarios.)

See the “qcow2” section below.


virsh is a virtual machine management shell. Running virsh enters the shell. Type help for a list of available commands.

The config files for guests are stored under /etc/libvirt/ as XML. Libvirt generates these files, so don’t manually edit them (use virsh edit or virt-manager).

qemu:///session and qemu:///system

virsh takes a URI as an argument.

URIs like qemu:///session connect to a libvirtd instance running as the user, so QEMU instances spawned from it share that user’s privileges. Use this for desktop virtualization, with virtual machines storing their disk images in a user’s home directory and managed from the local desktop session.

URIs like qemu:///system connect to a libvirtd instance running as root, so QEMU instances spawned from it have more elevated privileges than the client managing them. Use this for server virtualization, where the guests may need access to host resources (block, network devices, etc.) that require elevated privileges.
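To avoid passing -c on every call, virsh also reads the LIBVIRT_DEFAULT_URI environment variable. A minimal sketch; the helper name libvirt_uri and its server/desktop arguments are made up for illustration.

```shell
# Map a use case to a libvirt URI.
# libvirt_uri and its "server"/"desktop" arguments are hypothetical names.
libvirt_uri() {
    case "$1" in
        server)  echo 'qemu:///system'  ;;  # libvirtd running as root
        desktop) echo 'qemu:///session' ;;  # per-user libvirtd
        *)       echo 'usage: libvirt_uri server|desktop' >&2; return 1 ;;
    esac
}

# Later virsh calls in this shell use this URI unless -c overrides it.
LIBVIRT_DEFAULT_URI=$(libvirt_uri server)
export LIBVIRT_DEFAULT_URI
```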


virt-manager is a GUI utility for managing virtual machines. It can start, stop, resume, suspend, and connect to virtual machines. It also offers modest monitoring and guest creation features. However, the virt-manager GUI does not expose all the functionality offered by virsh.

sudo su -
DISPLAY=localhost:10.0 XAUTHORITY=/home/paulgorman/.Xauthority /usr/bin/virt-manager

Editing a Guest Config

# virsh edit myguest

…edits the XML config of the guest. Changes are applied on the next restart of the guest.

Tuning and Performance

See the KVM tuning docs, which mention three things:

I believe that the virt-manager Virtual Machine -> Details -> Processor -> Configuration -> Copy Host CPU Configuration button passes the actual hardware CPU flags to the guest.

virt-viewer uses VNC by default, which can seem slightly laggy. In virt-manager, change the video model to ‘qxl’ and the display to ‘Spice’.

Download an ISO of the Windows VirtIO Drivers on your host. In virt-manager, set the guest’s NIC device model to ‘virtio’. Reboot the guest, connect the CD drive to the virtio driver ISO, then update the NIC drivers in the guest.

If you have sufficient RAM to reserve a static amount for all your guests, do so. The virtio balloon driver allows you to change the amount of RAM assigned to a guest without the need to pause or restart it. In practice, however, setting a guest’s current/initial memory to less than the guest’s maximum memory causes a lot of paging to disk.


The Logical Volume Manager offers a number of benefits as a local storage pool. By abstracting physical storage, LVM provides great flexibility to move, resize, and add logical volumes on live systems.

LVM-managed storage has several components/layers:

    hda1   hdc1          (Physical Volumes on partitions or whole disks)
       \   /
        \ /
       diskvg            (Volume Group)
       /  |  \
      /   |   \
  usrlv rootlv varlv     (Logical Volumes)
    |      |     |
 ext4  reiserfs  xfs     (filesystems)

Set an LVM Volume Group as a Storage Pool

# virsh pool-list --all --details
# virsh pool-define-as --name vg0 --type logical
# virsh pool-autostart vg0
# virsh pool-start vg0
# virsh pool-list --all --details

LVM inspection and monitoring

Note that vgdisplay shows the free and allocated space on a volume group, which we will want to know if we’re using the volume group as a storage pool for guests.

# pvscan
  PV /dev/sda5   VG falstaff   lvm2 [297.85 GiB / 0    free]
  Total: 1 [297.85 GiB] / in use: 1 [297.85 GiB] / in no VG: 0 [0   ]
# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               falstaff
  PV Size               297.85 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              76249
  Free PE               0
  Allocated PE          76249
  PV UUID               EHKLtd-8c8K-v4sx-oE9V-lvK3-StsI-t7w4PZ

# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "falstaff" using metadata type lvm2

# vgdisplay
  --- Volume group ---
  VG Name               falstaff
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  12
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               297.85 GiB
  PE Size               4.00 MiB
  Total PE              76249
  Alloc PE / Size       76249 / 297.85 GiB
  Free  PE / Size       0 / 0   
  VG UUID               8cazW7-4vAv-mIZb-CB6K-xhV3-SdIT-zOLPkW
# lvscan
  ACTIVE            '/dev/falstaff/swap_1' [11.47 GiB] inherit
  ACTIVE            '/dev/falstaff/root' [37.25 GiB] inherit
  ACTIVE            '/dev/falstaff/home' [249.13 GiB] inherit
# lvdisplay
  --- Logical volume ---
  LV Path                /dev/falstaff/swap_1
  LV Name                swap_1
  VG Name                falstaff
  LV UUID                v8AM25-fT2c-gsjA-Pawp-Krnd-hvZQ-HwkAGQ
  LV Write Access        read/write
  LV Creation host, time falstaff, 2012-07-16 13:53:50 -0400
  LV Status              available
  # open                 2
  LV Size                11.47 GiB
  Current LE             2936
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:1
  --- Logical volume ---
  LV Path                /dev/falstaff/root
  LV Name                root
  VG Name                falstaff
  LV UUID                SJ3WkH-h150-GEjZ-dWT7-FYIJ-opDk-HmH9rd
  LV Write Access        read/write
  LV Creation host, time falstaff, 2012-07-16 14:07:50 -0400
  LV Status              available
  # open                 1
  LV Size                37.25 GiB
  Current LE             9536
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:0
  --- Logical volume ---
  LV Path                /dev/falstaff/home
  LV Name                home
  VG Name                falstaff
  LV UUID                5pN42V-kkNQ-fMdX-GZ5u-xleA-iz3d-umR29a
  LV Write Access        read/write
  LV Creation host, time falstaff, 2012-07-16 14:08:02 -0400
  LV Status              available
  # open                 1
  LV Size                249.13 GiB
  Current LE             63777
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:2

How much unprovisioned disk space do we have on the hypervisor?

# vgdisplay
  --- Volume group ---
  VG Name               vg0
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  207
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                9
  Open LV               6
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               2.73 TiB
  PE Size               4.00 MiB
  Total PE              715242
  Alloc PE / Size       227215 / 887.56 GiB
  Free  PE / Size       488027 / 1.86 TiB
  VG UUID               Q6VvMd-Jbge-xyX5-2LLk-3ozp-BCIP-tmiO65
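For scripting, the free space can be pulled out of that output. A rough sketch, assuming PE Size is reported in MiB as it is above; vg_free_mib is a hypothetical helper name.

```shell
# Read `vgdisplay` output on stdin and print "<VG name> <free MiB>" for each
# volume group, computed as free extents times extent size.
# vg_free_mib is a hypothetical helper; it assumes PE Size is given in MiB.
vg_free_mib() {
    awk '
        $1 == "VG"   && $2 == "Name" { name = $3 }
        $1 == "PE"   && $2 == "Size" { pe_mib = $3 }
        $1 == "Free" && $2 == "PE"   { printf "%s %d\n", name, $5 * pe_mib }
    '
}
```

Usage: vgdisplay | vg_free_mib. For the vg0 output above, 488027 free extents of 4 MiB each come to 1952108 MiB, which matches the reported 1.86 TiB.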

Live Guest Backup with LVM Snapshots

Snapshots require a volume group with unused space. vgdisplay shows the allocated and free space on a volume group.

Create a new logical volume of the snapshot type, like lvcreate --snapshot --size 1000M --name mySnapshotVolume /dev/myLvmGroup/myOriginalVolume. This snapshot volume must be large enough to hold all writes to the original volume made during the lifetime of the snapshot. If the writes overflow the capacity of the snapshot volume, the snapshot volume will be automatically dropped.

lvs shows the status. You will likely see the snapshot is mostly empty, since it’s only storing writes that occurred since its creation.

Create a backup by duplicating the snapshot volume onto a backup volume with sufficient space: dd if=/dev/myLvmGroup/mySnapshotVolume of=/dev/myLvmGroup/backupVolume. Or compress the output: dd if=/dev/myLvmGroup/mySnapshotVolume | gzip -c > /var/backups/myVirtualMachine.img.gz

Delete the snapshot volume like: lvremove /dev/myLvmGroup/mySnapshotVolume.
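The snapshot, copy, and remove steps can be strung together. A sketch that only prints the commands, so they can be reviewed before running. The function name lvm_snapshot_plan, the "-snap" suffix, and the bs=4M block size are assumptions for illustration. It also assumes the names contain no whitespace or shell metacharacters.

```shell
# Print the commands for a snapshot-based backup of one logical volume.
# Arguments: volume group, logical volume, output file, [snapshot size].
# lvm_snapshot_plan and the "-snap" suffix are hypothetical; the 1000M
# default is just the example size used in the text.
lvm_snapshot_plan() {
    vg=$1; lv=$2; out=$3; size=${4:-1000M}
    snap="${lv}-snap"
    printf 'lvcreate --snapshot --size %s --name %s /dev/%s/%s\n' "$size" "$snap" "$vg" "$lv"
    printf 'dd if=/dev/%s/%s bs=4M | gzip -c > %s\n' "$vg" "$snap" "$out"
    printf 'lvremove -f /dev/%s/%s\n' "$vg" "$snap"
}
```

Review the output of lvm_snapshot_plan vg0 mail /var/backups/mail.img.gz, then pipe it to sh -e as root to execute it. Stopping at the first failure matters here: if lvcreate fails, there is no snapshot to copy.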

Grow an Existing Volume

What if we have an existing volume attached to a VM, and need to grow it live?

# virsh qemu-monitor-command postoffice --hmp "info block"
drive-virtio-disk0: removable=0 io-status=ok file=/dev/vg0/postoffice-os ro=0 drv=raw encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
drive-virtio-disk1: removable=0 io-status=ok file=/dev/vg0/postoffice-data ro=0 drv=raw encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
drive-virtio-disk2: removable=0 io-status=ok file=/dev/vg0/florida-mail ro=0 drv=raw encrypted=0 bps=0 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0
drive-ide0-1-0: removable=1 locked=0 tray-open=0 io-status=ok [not inserted]

# virsh domblklist postoffice --details
Type       Device     Target     Source
block      disk       vda        /dev/vg0/postoffice-os
block      disk       vdb        /dev/vg0/postoffice-data
block      disk       vdc        /dev/vg0/florida-mail
file       cdrom      hdc        -


# lvextend -L+25G /dev/vg0/florida-mail
  Extending logical volume florida-mail to 525.00 GiB
  Logical volume florida-mail successfully resized
# virsh qemu-monitor-command postoffice block_resize drive-virtio-disk2 525G --hmp

The qemu-monitor-command ... block_resize command lets the VM know about the resize. We can then grow the partition inside the VM (e.g., using Disk Manager in Windows).
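libvirt also has a native blockresize command, which takes the guest target name (e.g., vdc) rather than the QEMU drive name. A small sketch for looking up the target from domblklist output; blk_target is a hypothetical helper name.

```shell
# Read `virsh domblklist DOMAIN --details` output on stdin and print the
# target device (e.g. vdc) whose source matches the given path.
# blk_target is a hypothetical helper name.
blk_target() {
    awk -v src="$1" '$4 == src { print $3 }'
}
```

For example, virsh domblklist postoffice --details | blk_target /dev/vg0/florida-mail prints vdc for the guest above, after which virsh blockresize postoffice vdc 525G should do the same job as the monitor command.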

Duplicate a Block Device Across the Network for a Not-Live VM

#  dd if=/dev/vg0/myvolume bs=10M | pv -trabep -s 525g | gzip -1 -c | ssh zol 'gunzip -c | dd of=/dev/vg0/myvolume'

qcow2 files

The real-world performance difference between VMs on LVM block devices versus qcow2 files is negligible. It’s easy to live snapshot qcow2 files. Use of qcow2 over LVM gives the option of serving VM images over NFS, rather than iSCSI or storage that presents block devices.

# virsh snapshot-create-as --domain {VM-NAME} --name "{SNAPSHOT-NAME}" --description "Snapshot before upgrade to version 11.0" --live
# virsh snapshot-list --domain {VM-NAME}
# virsh shutdown --domain {VM-NAME}
# virsh snapshot-revert --domain {VM-NAME} --snapshotname "{SNAPSHOT-NAME}" --running
# virsh snapshot-delete --domain {VM-NAME} --snapshotname "{SNAPSHOT-NAME}"

Besides libvirt’s virsh, QEMU itself provides tools for qcow2.

The qemu-img tool works with offline images. Do not use qemu-img on images attached to running virtual machines! Although libvirt can create a qcow2 image when it creates a guest, qemu-img can pre-create the qcow2 file:

$ qemu-img create -f qcow2 /tmp/myvm.qcow2 5G

Initially, such an image occupies only a couple hundred kilobytes on disk. The image file grows as the VM uses more space. If disk space is not at a premium, preallocating space improves guest performance. Preallocating with “full” writes zeros to the whole space, whereas “falloc” reserves space but does not zero it out. Most of the performance overhead of growing an image comes from calculating and writing metadata. The “metadata” preallocation setting is the best of both worlds: space on disk grows only as necessary, but calculating metadata during image creation gives VM performance on par with full preallocation. An image preallocated with “metadata” is sparse (i.e., the physical size is smaller than the logical size).

$ qemu-img create -f qcow2 -o preallocation=full /tmp/myvm-full.qcow2 100M
$ qemu-img create -f qcow2 -o preallocation=metadata /tmp/myvm-meta.qcow2 100M
$ ls -lh /tmp/ | grep myvm
-rw-r--r-- 1 paulgorman paulgorman 101M Mar 30 17:48 myvm-full.qcow2
-rw-r--r-- 1 paulgorman paulgorman 101M Mar 30 17:49 myvm-meta.qcow2
$ du -h /tmp/myvm-full.qcow2 /tmp/myvm-meta.qcow2
101M    /tmp/myvm-full.qcow2
264K    /tmp/myvm-meta.qcow2

In many scenarios, the disk cache mode (e.g., cache=writeback versus cache=writethrough) also significantly affects performance. (QEMU has defaulted to cache=writeback since version 1.2.)

However, QEMU made preallocation=off the default for good reason. Many scenarios prioritize disk space and other concerns over the minor performance cost of growing the disk image. This enables thin provisioning, but also brings a number of useful ancillary benefits.

“qcow” is QEMU copy on write. This enables snapshots, but also enables the use of template/base/backing images to accelerate new image creation and conserve disk space. The idea of backing images resembles Docker’s overlay filesystem (though at a block/cluster* level). When creating an image, the backing-file option causes the new/child image to hold only differences from the backing file; with this, it’s unnecessary to supply a size for the new image.

$ qemu-img create -f qcow2 -b base.qcow2 -F qcow2 new.qcow2
$ qemu-img info --backing-chain new.qcow2

The original/backing file remains untouched, unless the commit command writes changes from the new image to the backing file. Note that a successful commit command “empties” the new file.

$ qemu-img commit -b base.qcow2 new.qcow2

The rebase command changes the backing file for an image. Supplying an empty string as the backing file rebases it to itself (i.e., all content gets included, severing the link with any former backing file).

	$ qemu-img rebase -b other.qcow2 myvm.qcow2
	$ qemu-img rebase -b "" myvm.qcow2

Resizing an image is possible (though take care to shrink any filesystems inside before shrinking the image!):

$ qemu-img resize /tmp/myvm.qcow2 10G

Check the consistency of an image (include the -r all flag to attempt automatic repair):

$ qemu-img check myvm.qcow2

Snapshots. qemu-img can list snapshots of a file, create snapshots, or delete them. It can also apply a snapshot, which reverts the base image to the state captured in the snapshot.

$ qemu-img snapshot -c snap1 myvm.qcow2
$ qemu-img snapshot -c snap2 myvm.qcow2
$ qemu-img snapshot -l myvm.qcow2
$ qemu-img snapshot -d snap1 myvm.qcow2
$ qemu-img snapshot -a snap2 myvm.qcow2

What about wringing the empty/whitespace out of a fragmented sparse image?

$ mv myvm.qcow2 backup-myvm.qcow2
$ qemu-img convert -O qcow2 backup-myvm.qcow2 myvm.qcow2

How do we mount a qcow2 image to examine its contents without starting the VM? Use the QEMU network block device server to link the image to a block device on the host, then mount as normal:

# modprobe nbd max_part=32
# qemu-nbd -c /dev/nbd0 myvm.qcow2
# fdisk -l /dev/nbd0
# mount /dev/nbd0p1 /mnt

After unmounting the image, clean up with qemu-nbd --disconnect /dev/nbd0.

How do we attach an extra qcow2 image to an existing VM?

# virsh attach-disk myvm --source /var/lib/libvirt/images/thing.qcow2 --target vdb [--persistent]

* A cluster is the smallest amount of data that can be read or written from/to a qcow2 image in one operation. Because of the copy-on-write implementation of qcow2, changing one bit requires a rewrite of the whole cluster. Set cluster size during image creation, between 512B and 2M. Cluster size affects size on disk and performance, much like the block size of filesystems. Lots of little, random reads/writes will be faster with a smaller cluster size, whereas larger random reads/writes or contiguous IO will be faster with larger cluster sizes. Obviously, cluster size also affects base/child images, rebasing, etc. qemu-img create (on Debian in 2018) defaults to a reasonable cluster size of 64K.
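Cluster size is set at creation time with -o cluster_size=. QEMU also requires the value to be a power of two. A small sketch of a sanity check to run before creating an image; valid_cluster_size is a hypothetical helper name.

```shell
# Succeed if $1 (in bytes) is a usable qcow2 cluster size: a power of two
# between 512 and 2097152 (2M). valid_cluster_size is a hypothetical helper.
valid_cluster_size() {
    n=$1
    [ "$n" -ge 512 ] && [ "$n" -le 2097152 ] || return 1
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ $(( n & (n - 1) )) -eq 0 ]
}
```

For example: valid_cluster_size 131072 && qemu-img create -f qcow2 -o cluster_size=131072 /tmp/myvm.qcow2 5G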


What is copy-on-write? Normally, in a non-COW system, when data changes, the system overwrites that data in place. With COW, the system writes the changed data to a different, unused place on disk, leaving the original data (for the moment) intact.

This is what makes part of the magic of qcow2 base/child images possible. If we’re writing to another part of the drive rather than overwriting, we can use that to create derivative images of the original, preserved data.

How to shrink a sparse qcow2 file

  1. Zero-out disk space in the guest by writing a file to fill most space on each partition: dd if=/dev/zero of=/mytempfile. Do this as a normal user, not root, to leave the reserve space intact and the system operational.
  2. Delete the zero file.
  3. Shut down the VM.
  4. mv myvm.qcow2 backup-myvm.qcow2
  5. qemu-img convert -O qcow2 backup-myvm.qcow2 myvm.qcow2
  6. Restart the VM.

Live backup of a qcow2 VM with libvirt

  1. Create an overlay of the qcow2 image backing the VM.
  2. Copy, rsync, or otherwise archive the original qcow2 image.
  3. Merge any changes from the overlay back to the original.

If the VM runs the QEMU guest agent, pass the --quiesce flag to snapshot-create-as.

$ virsh domblklist myvm
Target     Source
vda        /var/virtimg/myvm.qcow2

$ virsh snapshot-create-as myvm myvm-backup \
	--diskspec vda,file=/var/virtimg/myvm-backup.qcow2 \
	--disk-only --atomic
$ virsh domblklist myvm
Target     Source
vda        /var/virtimg/myvm-backup.qcow2

$ cp /var/virtimg/myvm.qcow2 /backup/
$ virsh blockcommit myvm vda --active --verbose --pivot
$ virsh domblklist myvm
Target     Source
vda        /var/virtimg/myvm.qcow2

A shell script:

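#!/bin/sh
set -euf

# Redirect guest writes to a temporary overlay so the base image stops changing.
virsh snapshot-create-as myvm myvm-backup \
    --diskspec hda,file=/var/virtimg/myvm-backup.qcow2 \
    --disk-only --atomic
# Archive the base image while the guest keeps running on the overlay.
gzip -3 < /var/virtimg/myvm.qcow2 > /smb/backup/myvm.qcow2.gz
# Merge the overlay back into the base image and pivot the guest onto it.
virsh blockcommit myvm hda --active --verbose --pivot
# Drop the snapshot metadata and the leftover overlay file.
virsh snapshot-delete myvm --metadata myvm-backup
rm /var/virtimg/myvm-backup.qcow2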

Another (possibly less safe but faster) approach is to copy changed blocks with rsync, something like:

$  rsync --existing --inplace /var/virtimg/*qcow2 /smb/backup
$  rsync --ignore-existing --sparse /var/virtimg/*qcow2 /smb/backup

qcow2 v2p (virtual to physical)

#  qemu-img convert -f qcow2 -O raw test.qcow2 /dev/sdf

Further Reading