Users of cloud computing platforms like AWS, Azure, and OpenStack want flexible instance types and the ability to customize their instances. At a minimum, users need to inject SSH keys into new instances, but they also want to set the hostname, timezone, etc., and even run custom scripts. OS vendors don’t want to master different qcow2 images for different cloud vendors, and they definitely don’t want a different image for each instance type. Cloud providers, like Amazon, want to meet customer needs while keeping the whole thing as simple and automated as possible.
Cloud-init solves the problem. Different cloud providers can use the same image to automatically provision various machine types, and users can set many options before spinning up new instances.
cloud-init is a service pre-installed in the instance.
cloud-config is a set of scripts/directives executed after the instance starts (on first boot).
Where does cloud-init get its customization data? How does it know where to look?
The default configuration file for cloud-init is /etc/cloud/cloud.cfg.
The cloud.cfg, like cloud-config, uses YAML format.
It can contain a datasource section:
# Example datasource config
# datasource:
#   Ec2:
#     metadata_urls: [ 'blah.com' ]
#     timeout: 5 # (defaults to 50 seconds)
#     max_wait: 10 # (defaults to 120 seconds)
But a big part of cloud-init is using stock images, where we won’t have the opportunity to customize cloud.cfg prior to the first boot.
So, how does cloud-init know where to find its metadata?
To what values does datasource default?
Cloud-init has a built-in list of supported datasources.
During early boot, before the network comes up, cloud-init looks for local data sources.
The details vary depending on the provider, but often it’s a disk (labeled cidata or config-2) attached during initial boot.
In what order does it check the dozen+ potential datasource providers? I don’t know, but it seems to check the local ones before the network ones.
Apart from /etc/cloud/, cloud-init stores files in /var/lib/cloud/.
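To see what cloud-init left behind on a booted instance, a quick check like the following can help. This is only a sketch: the function name cloud_init_state and its optional root-directory argument are made up for illustration, and the two files under /var/lib/cloud/instance/ are where cloud-init usually records the chosen datasource and the user data it received, which may differ between versions.

```shell
#!/bin/sh
# Sketch: report which of cloud-init's usual files exist on this system.
# cloud_init_state and its optional ROOT argument are illustrative names,
# not part of cloud-init. The paths under /var/lib/cloud are assumptions.
cloud_init_state() {
    root="${1:-}"   # empty means the real filesystem root
    for path in \
        /etc/cloud/cloud.cfg \
        /var/lib/cloud/instance/datasource \
        /var/lib/cloud/instance/user-data.txt
    do
        if [ -e "$root$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
        fi
    done
}

cloud_init_state
```

If the datasource file is reported as found, reading it with cat should show which datasource cloud-init settled on.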
The cloud-config syntax uses YAML.
On AWS, using the instance creation wizard, we add cloud-config data in the “User Data” field of the “Instance Details” step.
The official cloud-init docs offer many examples.
Cloud-config example of adding a user:
#cloud-config
users:
  - name: paul
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDf0q4P7G0doiBQYV7OlOxbRjle026hJPBWDAaaaaxcdxd+eKHWuVXIpAiQlSE4EBqQn0pOqNJZ3IBCvSLnrdZTUph4czNC4885AArS9NkyM7lK27Oo8RV8+NI5xPB/QT3Um2ZiBGRkIwIgNPN5uqUtXvjgAaaaaaaffcdc+i1CS0Ku4ld8vndXvr504jV9BcQoZrXEST3YlriOb8Wf7hYqphVMpF3b+8df96Pxsj0+iZqayS9wFcL8ITPApHi0yVwS8TjxEtI3FDpCbf7Y/DmTGOv49+AWBkFhS2ZwwGTX65L61PDlTSAzL+rPFmHaQBHnsli8U9N6E4bHDEOjbSnYX paul@example.com
Cloud-config example of adding a repo:
#cloud-config
yum_repos:
  epel-testing:
    baseurl: http://download.fedoraproject.org/pub/epel/testing/7/$basearch
    enabled: false
    failovermethod: priority
    gpgcheck: true
    gpgkey: file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL
    name: Extra Packages for Enterprise Linux 7 - Testing
Cloud-config example of updating a Debian-flavored instance and installing packages:
#cloud-config
apt_update: true
apt_upgrade: true
packages:
  - apache2
  - rsync
Cloud-config example of running arbitrary commands:
#cloud-config
runcmd:
  - [ wget, "https://10.0.0.2", -O, /tmp/index.html ]
  - /bin/echo 'Hello, world!' > /var/www/index.html
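All of the examples above start with a #cloud-config line. Cloud-init looks at the first line of the user data to decide how to treat it, so a missing or misspelled header is an easy mistake to make. The function below is a minimal sketch of that check; the name user_data_kind is made up for illustration, and it only covers two of the formats cloud-init understands (cloud-config and #! scripts).

```shell
#!/bin/sh
# Sketch: classify a user-data file by its first line.
# user_data_kind is an illustrative helper, not a cloud-init command.
user_data_kind() {
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        '#cloud-config') echo "cloud-config" ;;
        '#!'*)           echo "script" ;;
        *)               echo "unknown" ;;
    esac
}

# Example use, only if a user-data file exists in the current directory.
if [ -f user-data ]; then
    user_data_kind user-data
fi
```

Anything reported as "unknown" is worth a second look before pasting it into the "User Data" field.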
Use KVM, libvirt, QEMU, and cloud-init to spin up instances for testing and experimentation. Create each instance from a qcow2 base image supplied by the OS distro, then customize the instances with cloud-init.
The NoCloud datasource accepts data from an attached disk image.
The libguestfs-tools package provides the tool virt-make-fs, which helps create the “cidata” disk image.
Cloud-init expects two files on the NoCloud “cidata” disk image: a meta-data file and a user-data file.
The idea behind the two files is for the cloud provider to supply the metadata, and the customer/user to supply the user data.
Since we’re acting as our own cloud provider, we supply both.
The meta-data file looks like:
instance-id: iid-012345
local-hostname: myvm-CentOS-7-x86_64
Our YAML cloud-config user-data file looks something like:
#cloud-config
timezone: America/Detroit
locale: en_US.UTF-8
users:
  - name: paul
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCqia4sRZ+v6Xuvv4HrTf8R4wIHAilul7LIlR5LTKzHvaCXrTAVkIqUhK4cJMEXCOKyS7kOnAkwNzKj/NPQNfw42JQznGF3I6VT8HTqi2NfTQ4lqe+vaMiCZU9mW8q33J3SJzLD/UciA2SqqSr3QMI9urDLn3ovpyKqYjxnSo1f9i+LInmXGgF7muWbGAXpWJyiUJxNB0WzZa3NpCmexlXw6f27tR1+fSJUDDP/HfwimmZms7Q8hT03dohI5OJmbI2YAX+OrHfO4Bya1jeQ5vLKoEX7jywLXHhuJORJ+6fUZBGL6sMQGfWKbYThdpU8MDmNSlOdHl9inyoSi69QZ2bJ
Stick the two files in a directory, and make the disk image:
$ mkdir -p ~/libvirt/cidata
$ cp meta-data ~/libvirt/cidata/
$ cp user-data ~/libvirt/cidata/
$ virt-make-fs --type=vfat --label=cidata ~/libvirt/cidata ~/libvirt/cidata.vfat
$ virt-filesystems -a ~/libvirt/cidata.vfat --all --long -h
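The meta-data and user-data files can also be written from a script, which is handy when spinning up several test instances. A minimal sketch, assuming a hypothetical helper write_seed that takes a target directory, an instance ID, and a hostname; the user-data here is trimmed to two settings from the example above, and the output directory is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: write NoCloud seed files into a directory.
# write_seed and its arguments are illustrative, not part of cloud-init.
write_seed() {
    seed_dir=$1
    instance_id=$2
    host_name=$3
    mkdir -p "$seed_dir"

    # meta-data: what a cloud provider would normally supply
    cat > "$seed_dir/meta-data" <<EOF
instance-id: $instance_id
local-hostname: $host_name
EOF

    # user-data: the first line must be the #cloud-config header
    cat > "$seed_dir/user-data" <<'EOF'
#cloud-config
timezone: America/Detroit
locale: en_US.UTF-8
EOF
}

write_seed "${TMPDIR:-/tmp}/cidata-example" iid-012345 myvm-CentOS-7-x86_64
```

The resulting directory can then be handed to virt-make-fs in place of ~/libvirt/cidata.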
Here’s how we create an instance from a base qcow2 image, customized with cloud-init:
$ wget https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-1802.qcow2.xz
$ xz -d CentOS-7-x86_64-GenericCloud-1802.qcow2.xz
# mv CentOS-7-x86_64-GenericCloud-1802.qcow2 /var/lib/libvirt/images/
$ mkdir -p ~/libvirt
$ qemu-img create -f qcow2 -F qcow2 -b /var/lib/libvirt/images/CentOS-7-x86_64-GenericCloud-1802.qcow2 ~/libvirt/myvm-CentOS-7-x86_64.qcow2 10G
$ virt-install --import \
    --name=myvm-CentOS-7-x86_64 \
    --ram=1024 \
    --vcpus=1 \
    --network bridge=br0 \
    --os-variant rhel7.4 \
    --graphics none \
    --disk ~/libvirt/myvm-CentOS-7-x86_64.qcow2 \
    --disk ~/libvirt/cidata.vfat
A different way? NoCloud also looks for meta-data in the SMBIOS “serial” string, like:
serial=ds=nocloud-net;s=http://10.10.0.1:8000/
virt-install accepts SMBIOS settings through its --sysinfo option.
Presumably, serving a meta-data and user-data file from “http://10.0.0.10/” (or wherever) would then be sufficient.
$ virt-install --sysinfo 'type=smbios,system_serial=ds=nocloud-net;s=http://10.0.0.10/' [...]
…or maybe:
$ virt-install --qemu-commandline="-smbios type=1,serial=ds=nocloud-net;s=http://10.0.0.10/" [...]
I have not yet gotten this to work.
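As far as the cloud-init documentation describes it, the NoCloud seed URL is used as a prefix: the file names meta-data and user-data are appended to it, which is why the trailing slash matters. The sketch below just spells out that assumption so the web server can be checked before booting an instance; the function name seed_urls is made up for illustration, and nothing here has been verified against a running instance.

```shell
#!/bin/sh
# Sketch: list the URLs a NoCloud seed URL is expected to resolve to.
# seed_urls is an illustrative helper; the file names follow the NoCloud
# convention of a meta-data and a user-data file.
seed_urls() {
    base=$1
    # The names are appended directly, so make sure there is a trailing slash.
    case "$base" in
        */) ;;
        *)  base="$base/" ;;
    esac
    echo "${base}meta-data"
    echo "${base}user-data"
}

seed_urls http://10.0.0.10/
```

Fetching each printed URL with something like curl -f from another machine on the same network is a cheap way to confirm that the server side is in place.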
virt-install graphics
Changing the virt-install graphics from default to --graphics none shows much more cloud-init debug information.
I don’t know why output to vnc or qxl graphics seems to end before showing the cloud-init information.
Hmm. The console output makes it clear that “nocloud-net” isn’t simply trying to grab the meta-data and user-data files from the URL. It’s trying to access values in some sort of tree-like key-value store.
[ 141.714579] cloud-init[849]: 2018-04-02 10:43:10,403 - url_helper.py[WARNING]: Calling 'http://10.0.0.1/latest/meta-data/instance-id' failed [0/120s]: bad status code [404]
[ 142.724408] cloud-init[849]: 2018-04-02 10:43:11,412 - url_helper.py[WARNING]: Calling 'http://10.0.0.1/latest/meta-data/instance-id' failed [1/120s]: bad status code [404]
[...]
[ 261.103099] cloud-init[849]: 2018-04-02 10:45:09,790 - url_helper.py[WARNING]: Calling 'http://10.0.0.1/latest/meta-data/instance-id' failed [119/120s]: request error [('Connection aborted.', error(115, 'Operation now in progress'))]
[ 268.120181] cloud-init[849]: 2018-04-02 10:45:16,798 - DataSourceCloudStack.py[CRITICAL]: Giving up on waiting for the metadata from ['http://10.0.0.1/latest/meta-data/instance-id'] after 126 seconds
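One way to see which datasource modules were actually doing the talking is to pull the DataSource*.py names out of the console log. A minimal sketch, assuming the console output has been saved to a file; the function name datasources_in_log is made up for illustration. In the excerpt above, the only such name is DataSourceCloudStack.py, which may mean these requests came from the CloudStack datasource rather than from nocloud-net.

```shell
#!/bin/sh
# Sketch: list the cloud-init datasource modules mentioned in a saved log.
# datasources_in_log is an illustrative helper, not a cloud-init command.
datasources_in_log() {
    # -o prints only the matching part of each line; sort -u removes repeats
    grep -o 'DataSource[A-Za-z0-9]*\.py' "$1" | sort -u
}
```

With the console output saved as, say, console.log, running datasources_in_log console.log prints one module name per line.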
It’s probably simpler after all to just attach a vfat drive with the meta-data and user-data files.