Tuesday, May 27, 2014

Building and booting vanilla Xen on vanilla Linux with systemd



If you want to do Xen development you should be working with upstream sources, and you should be sending your patches upstream ASAP, that is, before they are even in production. There should be no ifs or doubts about this; doing it any other way is detrimental in the long run. I'm new to virtualization, but from an architectural point of view I consider kvm a good reaction to the evolution of virtualization: it focuses on a clean new architecture that pairs up with the latest hardware enhancements only. Deciding not to implement in software what can instead be designed with hardware support eliminates tons of work on the software side, but it obviously relies on the assumption that folks will upgrade their hardware and that the hardware was designed properly. Xen, however, has a rich history, experience, and flexibility, so it's important to realize that there is no easy way to claim which is the better solution right now.

One thing I'm sure of: both solutions at this point have a rich set of expertise and design goals to be learned from. The one thing I see kvm doing right is pushing Upstream First (TM) as a motto. Xen should learn from that strategy, as there are markets and innovative groups who appreciate this tremendously. With the rapid pace of evolution of the Linux kernel there is simply no other way, and because of this Xen development should change to a must-be-working-upstream-only model and join the Upstream First (TM) bandwagon. In this post I will dive into the recipes required to get the latest Xen and vanilla Linux sources and get you started on the Upstream First (TM) bandwagon with Xen. I provide instructions for getting both Xen and the upstream Linux kernel configured properly. I will ignore anything not upstream on the Linux kernel, as what we need to do with that delta is just get it upstream. Additionally, since even Debian has cast votes on supporting systemd as a Linux init replacement, I'll also provide instructions on how to get systemd support on Xen with active socket support, as it seems that's the way of the future for all Linux distributions. Both Fedora 20 and OpenSUSE 13.1 have already jumped on systemd, so you'll want proper systemd support for these. As it stands right now Xen does not have service unit files as part of its upstream sources, though patches are in the works. This post also illustrates some corner cases found while implementing support, some general systemd autotools library helpers defined to make it easier for others to integrate support for systemd, and an example code base which makes elaborate use of these helpers.

Please note that compiling xen with systemd support, using the v5 series of integration patches documented here, produces binaries that can be used on systems running either legacy init or systemd. The systemd support patches are not yet merged upstream, but to help provide wider coverage you should enable systemd support as per the instructions below and report any issues you find to me. Since I wish for as many folks as possible to jump on the upstream bandwagon, I'll cover instructions only for getting the latest xen to run on the latest stable vanilla kernel over a slew of Linux distributions; this includes building the Linux kernel as well as xen, and resolving all your dependencies. I recommend building and embracing oxenstored for reasons I've stated before. If you run into issues with the latest systemd series of patches you can easily revert back to cxenstored by a simple flip in the configuration file, either /etc/sysconfig/xencommons (rpm based distributions) or /etc/default/xencommons (Debian based distributions). (Note: this last part still needs to be worked on, right now this requires a bit more work for systemd.)

I have build tested the instructions below on OpenSUSE Tumbleweed, Debian testing, and Fedora 20. I have only run-time tested them on OpenSUSE Tumbleweed and Debian testing. Reports of any run-time issues on Fedora 20 and Ubuntu are appreciated. Instructions for other Linux distributions are welcome so I can extend the documentation here while the systemd support patches get baked upstream; after that I will move all documentation to the xen wiki.




Getting an updated /sbin/installkernel 

 

Linux distributions shipping with grub2 will need to ensure that their /sbin/installkernel script, which has to be provided by each Linux distribution, copies the kernel configuration when a custom kernel is installed. The requirement for the config file comes from upstream grub2's /etc/grub.d/20_linux_xen, which will add xen as an instance to your grub.cfg if and only if it finds either of the following in your config file:

CONFIG_XEN_DOM0=y
CONFIG_XEN_PRIVILEGED_GUEST=y

Without this, a user who compiles and installs their own kernel with proper support for xen, and who has the xen hypervisor present, will not get their grub2 update script to pick up the xen hypervisor. Debian testing has proper support for this. OpenSUSE required this change upstream on mkinitrd, so OpenSUSE folks will want to get the latest /sbin/installkernel hosted on the OpenSUSE mkinitrd repository on github.

# If on OpenSUSE update your /sbin/installkernel
git clone https://github.com/openSUSE/mkinitrd.git
cd mkinitrd
sudo cp sbin/installkernel /sbin/installkernel 

Fedora might need a similar update. I welcome feedback on confirming this.
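
Before rebooting it can save time to confirm that the config file copied to /boot would satisfy that check. What follows is a minimal sketch, not grub2's own code: the helper name has_xen_dom0_config is made up for this example, the pattern only mirrors the two options listed above, and the /boot/config-<release> location is an assumption about where your installkernel puts the file. Note that uname -r reports the running kernel, so substitute the release you just installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel config enables either
# of the two options listed above.
has_xen_dom0_config() {
    grep -Eq '^CONFIG_XEN_(DOM0|PRIVILEGED_GUEST)=y$' "$1" 2>/dev/null
}

# Assumption: installkernel copied the config to /boot/config-<release>.
config="/boot/config-$(uname -r)"
if has_xen_dom0_config "$config"; then
    echo "$config: has a Xen dom0 option, grub2 should list Xen entries"
else
    echo "$config: missing, or no Xen dom0 option set"
fi
```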


Xen systemd build dependencies on OpenSUSE

 

 

# If you're now on the latest OpenSUSE you'll note it's now a
# rolling distribution base (also called Factory).
# The default instructions do not actually encourage you to
# install the source repositories, and even if you did
# install them the instructions disable them by default, so
# be sure to install them and enable them otherwise
# the command zypper source-install -d won't work.
# To enable the required repository if you already had it
# installed:
sudo zypper mr -e repo-src-oss

# Get the build dependencies for Xen
sudo zypper source-install -d xen

# Things not picked up by the build dependencies
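# Note the space before each trailing backslash: without it the shell
# joins the last word of one line with the first word of the next.
sudo zypper install systemd-devel gettext-tools \
ocaml ocaml-compiler-libs ocaml-runtime \
ocaml-ocamldoc ocaml-findlib glibc-devel-32bit make patch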

# Get build dependencies for Linux
sudo zypper source-install -d kernel-desktop



Xen systemd build dependencies on Debian testing and maybe Ubuntu

 

Note that these instructions are not to enable systemd as the init process on Debian, although there are some instructions here to help you with that if you wish to venture into that.

sudo apt-get build-dep xen linux
sudo apt-get install git libsystemd-daemon-dev \
libpixman-1-dev texinfo


Xen systemd build dependencies on Fedora 20 

 

Fedora may need an update to /sbin/installkernel as OpenSUSE did for grub2 support, see the notes above for more details on that. Verification on this is appreciated.

# Get build dependencies for xen
sudo yum-builddep xen

# Things not picked up by the build dependencies
sudo yum install glibc-devel.x86_64 systemd-devel.x86_64

# Get build dependencies for Linux
sudo yum-builddep kernel 


Getting the code



Next go get Linux and Xen sources.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git clone git://xenbits.xen.org/xen.git



Configuring vanilla Linux with xen support

 

cd linux
wget http://drvbp1.linux-foundation.org/~mcgrof/patches/2014/05/15/linux-xen-defconfig.patch
patch -p1 < linux-xen-defconfig.patch
cp /boot/config-your-distro-config .config
make xendom0config
make -j $(getconf _NPROCESSORS_ONLN)
sudo make install

 

Configuring xen with oxenstored and systemd support

 

cd xen
wget http://drvbp1.linux-foundation.org/~mcgrof/patches/2014/05/27/all-v5-series-xen-systemd.patch
git reset --hard 86216963fd1d89883bb8120535704fdc79fdad50
git am all-v5-series-xen-systemd.patch
./configure --with-xenstored=oxenstored --enable-systemd
make dist -j $(getconf _NPROCESSORS_ONLN)
sudo make install
sudo ldconfig

# If on systemd, that is, if you have /run/systemd/system/
sudo systemctl daemon-reload
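
The comment above keys on /run/systemd/system/, which is the same directory test that sd_booted() performs. If you want to script the install for machines that may or may not run systemd, a small guard along these lines can help. This is only a sketch; the function name systemd_is_init and its optional directory argument are made up for the example.

```shell
#!/bin/sh
# Succeed if systemd is the running init. The optional argument exists only
# so the check can be pointed at another directory when experimenting.
systemd_is_init() {
    [ -d "${1:-/run/systemd/system}" ]
}

if systemd_is_init; then
    echo "systemd detected: run 'sudo systemctl daemon-reload'"
else
    echo "legacy init detected: nothing to reload"
fi
```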

The next step is to enable the systemd units you want. If you want to test the active socket stuff, just enable xenstored.socket; after reboot you can use netcat as root to tickle the socket as described below. If you just want to have the xenstored service already running, enable xenstored.service, which will also enable xenstored.socket as it's a dependency.

sudo systemctl enable xenstored.socket
sudo systemctl enable xenstored.service

The last step is to ensure the grub config is updated to pick up the xen hypervisor. This varies depending on the Linux distribution. Below we cover the distributions that I have tested booting on.

Updating grub for Xen on OpenSUSE

 

sudo update-bootloader --refresh

Updating grub for Xen on Debian and maybe Ubuntu

 

sudo update-grub

Reboot and test 

 

That's all, reboot and make sure you pick the right grub entry. Typically grub2 will list regular kernel entries and hypervisor entries separately, with the option to go into advanced settings for each one. Entering the advanced settings for the hypervisor will let you pick the exact kernel you want to boot. If you have hardware with virtualization capabilities you'll want to enable them; this is done through the BIOS / UEFI menu. Below are some pictures of enabling the features on a Thinkpad T440p, and then the flow through grub2.


Get into the virtualization menu on the system BIOS / UEFI menu.


On Intel hardware this will be labeled as Intel Virtualization Technology and Intel VT-d Feature. On AMD hardware the equivalent features are typically labeled AMD-V (or SVM) and AMD-Vi (or IOMMU).


Boot into grub and you should now see an option for your distribution with the Xen hypervisor, pick that if you want to go with the defaults, but if instead you want to browse each hypervisor available pick the advanced options.






 If you picked the default hypervisor option you should be booting into the Xen Hypervisor and that in turn will boot your kernel / distribution. If you picked the advanced option you'll see the options for the hypervisor as below. In my case I have only the bleeding edge unstable version from git of the Xen hypervisor.


Next it will let you pick the kernel you want to boot your hypervisor with. All of the kernels with support for Xen will be displayed.



After this you should be booting into the Xen hypervisor and this in turn will boot Linux as dom0.

After bootup

 

 Starting xen with old init

 

First verify that you booted into a xen hypervisor as follows:

mcgrof@garbanzo ~ $ cat /sys/hypervisor/type
xen
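
If you want to make the same check from a script before starting any Xen services, something like the following would do. It is a sketch only: the function name hypervisor_type is made up, and treating a missing or unreadable sysfs file as "none" is an assumption about how you want to report a boot without Xen.

```shell
#!/bin/sh
# Print the hypervisor type, or "none" if the sysfs file is absent or
# unreadable. The optional argument is only there for experimenting.
hypervisor_type() {
    cat "${1:-/sys/hypervisor/type}" 2>/dev/null || echo none
}

if [ "$(hypervisor_type)" = xen ]; then
    echo "running on Xen"
else
    echo "not running on Xen, check which grub entry you booted"
fi
```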

You're all set, the next step is to start Xen. On Linux distributions stuck on old init like Debian right now you just have to spawn the old init script. This is done as follows:

mcgrof@garbanzo ~ $ sudo /etc/init.d/xencommons start
Starting /usr/local/sbin/oxenstored...
Setting domain 0 name and domid...
Starting xenconsoled...
Starting QEMU as disk backend for dom0

mcgrof@garbanzo ~ $ echo $?
0

You are ready to start creating guests!

Starting xen with systemd

 

First thing is to ensure your dom0 is now booted on the xen hypervisor. If you have systemd you can do this easily with:

mcgrof@ergon ~ $ sudo systemd-detect-virt
xen

Under the hood this is the same as the following:

mcgrof@garbanzo ~ $ cat /sys/hypervisor/type
xen

If you only enabled xenstored.socket you can verify the sockets by:

mcgrof@ergon ~ $ sudo netstat -lpn | grep xen
unix  2      [ ACC ]     STREAM     LISTENING     13976  1/init              /var/run/xenstored/socket
unix  2      [ ACC ]     STREAM     LISTENING     13979  1/init              /var/run/xenstored/socket_ro

You can also use systemd:

mcgrof@ergon ~ $ sudo systemctl list-sockets| grep xen
/var/run/xenstored/socket    xenstored.socket             xenstored.service
/var/run/xenstored/socket_ro xenstored.socket             xenstored.service

You can also verify the socket unit:

mcgrof@ergon ~ $ sudo systemctl status xenstored.socket
xenstored.socket - Xen xenstored / oxenstored Activation Socket
   Loaded: loaded (/usr/local/lib/systemd/system/xenstored.socket; enabled)
   Active: active (listening) since Thu 2014-05-15 01:12:53 PDT; 16min ago
   Listen: /var/run/xenstored/socket (Stream)
           /var/run/xenstored/socket_ro (Stream)

May 15 01:12:53 ergon systemd[1]: Starting Xen xenstored / oxenstored Activation Socket.
May 15 01:12:53 ergon systemd[1]: Listening on Xen xenstored / oxenstored Activation Socket.

Next, you can check to see if xenstored.service is running, it should not be if you didn't enable it and only enabled xenstored.socket:

mcgrof@ergon ~ $ sudo systemctl status xenstored.service
xenstored.service - Xenstored - daemon managing xenstore file system
   Loaded: loaded (/usr/local/lib/systemd/system/xenstored.service; disabled)
   Active: inactive (dead)

Next to see the active socket magic trigger you can just use netcat to tickle any of the sockets. Since the permissions are only to grant access to the root user you'll need root to tickle the socket.

mcgrof@ergon ~ $ sudo nc -w 1 -U /var/run/xenstored/socket_ro
mcgrof@ergon ~ $ echo $?
0

Now verify that xenstored.service has been started:

mcgrof@ergon ~ $ sudo systemctl status xenstored.service
xenstored.service - Xenstored - daemon managing xenstore file system
   Loaded: loaded (/usr/local/lib/systemd/system/xenstored.service; disabled)
   Active: active (running) since Tue 2014-05-20 04:33:09 PDT; 1 day 16h ago
 Main PID: 1621 (oxenstored)
   CGroup: /system.slice/xenstored.service
           └─1621 /usr/local/sbin/oxenstored --no-fork

May 21 21:24:24 ergon oxenstored[1621]: xenstored is ready

Why you want active sockets

 

Systemd has support for "active sockets" or "socket based activation", but this concept is not new. Socket based activation was pioneered by Apple's Launchd, software that was released under the Apache 2.0 license; that project got its first release in 2005, while systemd's initial release dates to 2010. Go and watch Dave Zarzycki's talk at Google about Launchd. There are tons of talks about systemd; here's an old introduction talk about it by Lennart Poettering, and Lennart does give Apple proper kudos here. Systemd is simply über optimized for Linux, it takes advantage of tons of special Linux kernel enhancements. Socket based activation is ideal for local services using AF_UNIX sockets, although support does exist for inet sockets as well. There are two reasons why you want active sockets:
  1. On demand auto-spawning
  2. Help with bootup parallelization
The on demand auto-spawning can be taken advantage of by xen if and only if its tools are converted to try to open the unix socket when they run, but they currently don't do this, and some communication uses the kernel ring interface, not the unix domain sockets. If you use the stubdoms you also never end up using the unix domain sockets. The gains from parallelization however are always welcome: you essentially let systemd figure out how to bring things up by associating dependencies rather than trying to pile things up in a specific strict numbered order, and this is all controlled by the service unit files and the requirements specified. Udev lends a hand here as well, which is now merged as part of systemd, but I'll have to cover udev in another post. If you had an ecosystem that you were sure did not require the service to be running all the time, and you didn't need the kernel ring interface immediately up, you could either enable only xenstored.socket or remove this line from xenstored.service:

WantedBy=multi-user.target

A few things worth noting for daemons and systemd that I do not see covered clearly in documentation are the exact expectations on the different service types. Systemd supports different types of daemons; for those that don't fork you should declare in your service unit file a type of:

Type=simple

For daemons that do call fork() you should use the following:

Type=forking

In the legacy init world this covers most of the daemons out there. There's a bit of a caveat here though: systemd expects you to behave in a certain way if you use Type=forking. Your first parent process should be the one to call sd_notify_fds(); you should not let child processes make the sd_notify_fds() call. What daemons do varies, and systemd's assumption that daemons set up sockets in the parent rather than in children means daemons will need a bit of a change in order to work with systemd properly, as there is no way to tell systemd a child is going to be the main process, even if you try sd_notifyf() with the process ID of the child. Arguably there's a good reason for this though: you should consider using Type=notify, and when you use this type of service you don't fork as part of your daemonizing effort, instead you just tell systemd when your service is ready with sd_notify(). There are some curious architectural design principles worth elaborating on that come with this, and they highlight a mistake typically in place in some daemons that do fork. When daemonizing and forking, having the parent exit immediately is the easiest and fastest way from a programmer's perspective, but it should typically not be done: a regular legacy init that spawns daemons in order will let other processes make use of the daemon under the impression that the daemon is ready, leaving a small window of time for a race condition to trigger. Typically this is addressed with nasty undocumented workarounds, for example retrying connections to unix domain sockets that the daemon is expected to create after initialization. Mind you, the race window is small but very real, especially if we want to boot up fast. This is one of the races that systemd services using sd_notify() avoid by design. This is pretty cool.
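
To make the workaround concrete, the retry pattern typically looks something like the following in a legacy init script. This is only a sketch of the pattern, not Xen's actual init script: the function name wait_for_path, the retry count, and the one second delay are all assumptions, and a real script would more likely test for a socket with -S rather than for any path with -e.

```shell
#!/bin/sh
# Hypothetical helper: poll until PATH exists or TRIES attempts are used up.
# Returns 0 as soon as the path shows up, 1 if it never does.
wait_for_path() {
    path=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        tries=$((tries - 1))
        # Only sleep if another attempt is coming.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}

# Single attempt here so the example returns immediately.
if wait_for_path /var/run/xenstored/socket 1; then
    echo "xenstored socket is present"
else
    echo "xenstored socket not present yet"
fi
```

The point of the paragraph above is that a daemon which reports readiness itself makes this kind of polling unnecessary.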

 

funk-systemd - example complex systemd daemon 

 

 

Apart from corner cases there are also the complexities introduced by the different types of build systems and target systems, especially for projects which really want to support multiple operating systems and init systems, such as Xen. To address different build environments and targets a lot of projects use autotools. Xen follows this practice, so integrating support for systemd on Xen required proper autotools support. Autotools support with systemd can get complicated fast. You see, systemd does not allow variable placements on ExecStart settings for the binary you wish to run, which means that if your project uses configure to dynamically place the path of the binary you will also need proper replacement of the paths at configure time. With autotools this is accomplished with the AC_CONFIG_FILES() helper, but in order to make use of some paths with AC_CONFIG_FILES() you'll want to eval them and call AC_SUBST() on them. This is not only useful for ExecStart; also consider the different placements of the socket files. If using ${prefix} for any of the paths you will need to work with a not-so-well documented $ac_default_prefix. You also have to consider the different types of build environments and the different types of target systems that a project wishes to support with a single produced daemon binary. Build environments vary: a project may wish to force systemd to be present, some may wish to only use systemd if the development libraries are present, and others may wish to require you to specify explicitly that you want systemd. Target systems vary as well: in the worst case scenario a project may wish to support legacy init with and without systemd libraries present, and then the case where systemd is the init process.
In this situation, if it's desirable to support a single binary for all types of init systems, the dynamic link loader (using dlopen() and dlsym()) can be used, or an in-place replacement for sd_booted() can be implemented instead of relying on the systemd helper sd_booted(). A project such as Xen that supports two daemons for the same type of service also needs to consider which route to take for supporting and maintaining service unit files for the different possible daemons. There are different strategies for this. A lot of this is not well documented, and good examples for projects with build systems as complex as Xen's are not readily available, let alone ones that cover all the cases I've described. Because of all this, and since I ended up doing the work for systemd Xen integration, I made sure to try to generalize a solution and address all types of environments as described above. I have also stuffed in a sample daemon which covers and documents the legacy init corner case that sd_notify() explicitly addresses. You can find the sample code here; the autoconf helpers defined and documented here are also being submitted as part of the xen systemd integration patches:

https://github.com/mcgrof/funk-systemd

For an example solution to the legacy init race condition, look at the usage of funk_wait_ready(), which is called on the parent process that forks. As for xen, the legacy init script has a retry counter; we should be able to remove that code with a similar solution for the legacy socket implementation. In this tree you will also find a few helpers, which xen's systemd integration patches make use of, if you want to get ramped up with systemd and autoconf:
  • src/m4/systemd.m4 - systemd autoconf library which enables easy build integration support for systemd. There are four build options supported
    • AX_ENABLE_SYSTEMD() - enables systemd by default and requires an explicit --disable-systemd option flag to configure if you want to disable systemd support.
    • AX_ALLOW_SYSTEMD() - systemd will be disabled by default and requires you to run configure with --enable-systemd to look for and enable systemd
    • AX_AVAILABLE_SYSTEMD() - systemd will be disabled by default but if your build system is detected to have systemd build libraries it will be enabled. You can always force disable with --disable-systemd. This is the option we have decided to use for Xen.
    • If you want to use the dynamic link loader you should use AX_AVAILABLE_SYSTEMD() but must then ensure to use -rdynamic -ldl when linking. If using automake, autotools will deal with this for you, otherwise you must ensure this is in place in your Makefile.
  • src/m4/paths.m4 - Implements AX_LOCAL_EXPAND_CONFIG() which you can use to replace meta @VAR@ variables on files defined with AC_CONFIG_FILES(). You might want to make use of this for example on systemd service unit file ExecStart, on the socket definition file, and/or the code that connects to the sockets.
  • src/funk_dynamic_helpers.c - example systemd integration implementation using the dynamic link loader, that is dlopen() and dlsym(), which can be used for one-binary-fits-all solutions. Although a solution with this strategy was tested for systemd, this is not the option we are going to support on Xen.
  • funk daemon with-autoconf implementation  - example implementation with the above helpers with autoconf support alone
  • funk daemon with-automake implementation - example implementation with the above helpers with automake support
  • README and INSTALL - read these for more details on this example
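
The @VAR@ replacement that AC_CONFIG_FILES() performs on a unit file template can be pictured with plain sed. The sketch below is not how configure implements it and is not taken from the funk-systemd tree; the placeholder name @SBINDIR@ and the function name expand_unit_template are made up for illustration, and the sed expression would break on paths containing | or &.

```shell
#!/bin/sh
# Replace the hypothetical @SBINDIR@ placeholder on stdin with a concrete
# path, roughly what happens to a template listed in AC_CONFIG_FILES().
expand_unit_template() {
    sed -e "s|@SBINDIR@|$1|g"
}

# A two-line stand-in for a service unit template.
expand_unit_template /usr/local/sbin <<'EOF'
[Service]
ExecStart=@SBINDIR@/oxenstored --no-fork
EOF
```

Because systemd will not expand a variable for the binary in ExecStart, this substitution has to happen at configure time rather than when the unit is loaded.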

 

Systemd support for projects with multiple daemon replacements

 

 

Xen is a good example of a project that requires support for multiple alternative binaries that can run as the daemon. For this type of situation there are a few possible solutions. This has been discussed only briefly on the systemd-devel list; you can end up implementing one of:
  1. Define a service unit file for each daemon, and define one target which represents the overall service. Service unit files that require the service will require the target, not the actual service unit file. The service unit files are then mutually exclusive with each other, and the system administrator would have to manually select which service unit to enable. The downside to this strategy is that you end up with multiple service unit files which in the worst case are identical and only differ in the ExecStart path.
  2. Define a service unit file for each daemon and define an Alias=foo.service for the general service. Services that need to depend on this service would then require the alias, not the specific service file for each binary. The same downside is present with this solution.
  3. Use one service file and environment variables read by a binary launcher, which will use getenv() and execve() to launch the preferred daemon. This option gives the flexibility to be easily compatible with legacy init daemons that typically require /etc/sysconfig/ or /etc/default/ configuration files. Although Lennart has clarified that ideally the systemd way would be to ignore /etc/sysconfig and /etc/default altogether, this solution would still let you ignore /etc/sysconfig/ and /etc/default/ by requiring the default variable to be set via Environment=FOO_DEFAULT_DAEMON=/usr/local/sbin/bar. For support with legacy init systems, EnvironmentFile=-/etc/sysconfig/foodaemon and EnvironmentFile=-/etc/default/foodaemon can be used.
No example code or service unit files are provided at this point; what we end up doing for Xen remains to be decided.
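
Purely to illustrate the idea behind option 3, and not what Xen will necessarily do, the selection logic of such a launcher fits in a few lines. The real launcher described above would be a binary using getenv() and execve(); this shell stand-in, including the function name pick_daemon, is an assumption made for the example, while the variable name and default path are the placeholders used in the list.

```shell
#!/bin/sh
# Hypothetical stand-in for the launcher: print the daemon named by
# FOO_DEFAULT_DAEMON, falling back to a built-in default.
pick_daemon() {
    printf '%s\n' "${FOO_DEFAULT_DAEMON:-/usr/local/sbin/bar}"
}

echo "would exec: $(pick_daemon)"
# A real launcher would end by replacing itself with the chosen daemon:
#   exec "$(pick_daemon)" "$@"
```

The unit's Environment= or EnvironmentFile= settings then decide which daemon runs without needing a second service unit file.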

Ocaml and systemd support

 

Xen has an ocaml implementation of the xenstore, so as you can imagine we also had to add some support for systemd with ocaml. I won't provide examples here, but just note that support has been provided using a C interface wrapper. For details please review the posted patches.