Monday, April 07, 2014

Summary of the gains of Xen oxenstored over cxenstored

Apart from upkeeping ongoing FOSS projects I help maintain and push forward one of the first things I've been asked to help with at SUSE is Xen, specifically helping address the huge delta in place with upstream. Before you give me the kvm lecture, realize that I'm very well aware of kvm now and while architecturally I think its beautiful tons of folks are still investing a lot into Xen and even new industries are considering it. As an example at the Linux Collaboration summit there was a talk by Alex Agizim about using Xen in the automotive industry by the folks at Global Logic. They prefixed their talk with a great video of on Steeri - Driverless car parody, hope is that that's not what things will be like. As we move forward with Xen my goal will also be to see what folks are doing on kvm and see if there might be anything to share or learn from. Before starting at SUSE I knew squat about Xen so I figure as I ramp up I can help with the documentation as well. Learning about Xen has been fun as it involves tons of areas of the kernel, and the history is very rich. As I ramp up I intend on helping with the the documentation on its wiki. As a collateral in dealing with the delta for upstream and documentation at times I may look for better way to do things, specially if it reduces our delta or if it can help the project, or at the very least socialize the ideas for a future feature enhancement. Apart from helping on the wiki, which I think is critical, I'll try to post things every now and then about parts of its architecture which perhaps don't yet belong on the wiki, or may use my blog post things first here, and then go curate them over into the wiki. I've now sent my brain dump to a few people over the summary of Thomas Gazagnaire and Vincent Hanquez's paper (both at Citrix) on their implementation of a xenstore in an OCaml implementation called oxenstored. I will likely want to point more folks to this summary later given I'm actually also interested in alternatives and I don't expect folks to read a full paper to evaluate alternatives. I'm not going to get into the specifics of what I hope to see in alternatives now though other than mentioning that this came about in discussions at the Linux Collaboration summit in Napa and that it involves git. In this post I'll just cover the basic generals of the xenstore, a review of the first implementation and a summary of oxenstored.

The paper: OXenstored - An Efficient Hierarchical and Transactional Database using Functional Programming with Reference Cell Comparisons

First a general description of the xenstore and its first implementation. The xenstore is where Xen stores the information over its systems. It covers dom0 and guests and it uses a filesystem type of layout kind of how we keep a layout of a system on the Linux kernel in sysfs. The original xenstored, which the paper refers to a Cxenstored was written in C. Since all information needs to be stored in a filesystem layout any library or tool that supports designing a tree to have key <--> value store of information should suffice to upkeep the xenstore. The Xen folks decided to use the Trival Database, tdb, which as it turns out was designed and implemented by the Samba folks for its own database. Xen then has a deamon sitting in the background which listens to reads / write requests onto this database, that's what you see running in the background if you 'ps -ef | grep xen' on dom0. dom0 is the first host, the rest are guests. dom0 uses Unix domain sockets to talk to the xenstore while guests talk to it using the kernel through the xenbus. The code for opening up a connection onto the c version of the xenstore is in tools/xenstore/xs.c and the the call is xs_open(). The first attempt by code will be to open the Unix domain socket with get_handle(xs_daemon_socket()) and if that fails it will try get_handle(xs_domain_dev()), the later will vary depending on your Operating System and you can override first by setting the environment variable XENSTORED_PATH. On Linux this is at /proc/xen/xenbus. All the xenstore is doing is brokering access to the database. The xenstore represents all data known to Xen, we build it upon bootup and can throw it out the window when shutting down, which is why we should just use a tmpfs for it (Debian does, OpenSUSE should be changed to it). The actual database for the C implementation is by default stored under the directory /var/lib/xenstored, the file that has the database there is called tdb. On OpenSUSE that's /var/lib/xenstored/tdb, on Debian (as of xen-utils-4.3) that's /run/xenstored/tdb. The C version of the xenstore therefore puts out a database file that can actually be used with tdb-tools (actual package name for Debian and SUSE). xentored does not use libtdb which is GPLv3+, Xen in-takes the tdb implementation which is licensed under the LGPL and carries a copy under tools/xenstore/tdb.c. Although you shouldn't be using tdb-tools to poke at the database you can still read from it using these tools, you can read the entire database as follows:
 tdbtool /run/xenstored/tdb dump
The biggest issue with the C version implementation and relying on tdb is that you can live lock it if you have have have a guest or any entity doing short quick accesses onto the xenstore. We need Xen to scale though and the research and development behind oxenstored was an effort to help with that. What follows next is my brain dump of the paper. I don't get into the details of the implementation because as can be expected I don't want to read OCaml code. Keep in mind that if I look for a replacement I'm looking also for something that Samba folks might want to consider.

OXenstored has the following observed gains:
  • 1/5th the size in terms of line of code in comparison to the C xenstored
  • better performance increasing support for the number of guests, it supports 3 times number of guests for an upper limit of 160 guests
The performance gains come from two things:
  • how it deals with transactions through an immutable prefix tree. Each transaction is associated with a triplet (T1, T2, p) where T1 is the root of the database just before a transaction, T2 is the local copy of the database with all updates made by the transaction made up to that point, p is the path to the furthest node from the root T2 whose subtree contains all the updates made by the transaction up that point.
  • how it deals with sharing immutable subtrees and uses 'reference cell equality', a limited form of pointer equality, which compares the location of values instead of the values themselves. Two values are shared if they share the same location. Functional programming languages enforce that multiple copies of immutable structures share the same location in memory. oxenstored takes avantage of this functional programming feature to design trie library which enforces sharing of  subtrees as much as possible. This lets them simpilfy how to determine and merge / coalesce concurrent transactions.
The complexity of the algorithms used by oxenstored is confined only to the
length of the path, which is rarely over 10. This gives predictable performance
regardless of the number of guests present.
Post a Comment