Saturday, April 05, 2014

Automatic kernel backporting with Coccinelle



Exponential growth does not only apply to technology and biology, it applies to software too. We need to adapt and scale better. I've written before on the implications on rapid software growth and freedoms, this post addresses the pure technical implications of accelerated software growth specifically on the Linux kernel in consideration of backporting. Tools to help rapid software growth are huge assets to large software projects. git is one example, and let's recall that Linus refused to use anything other than BitKeeper to maintain the Linux kernel until he replaced it with git. Its not only important, these tools are required. Its best if you already have Free Software tools available to help software evolve though. Another good example, but one which has been Free Software since the start, is Coccinelle, which came about through the research and development spearheaded by Julia Lawall and her team in trying to address the concept of collateral evolutions to help the Linux kernel grow faster, safely, and to make us lazier. I've written before on my conjecture on usage of Coccinelle for automatically backporting Linux. I looked at this as the Linux kernel and Linux kernel backports project grew, to the point I could not stop it, even if I wanted to -- I've tried twice now. I've tested my conjecture recently and Coccinelle has exceeded my expectations both on what it is already capable of and also on what might be possible in the future. We have been grooming a slew of techniques within the Linux backports project to do backporting automatically for a series of years now and I'm happy to report that Coccinelle will be one of those technologies that we embrace that will help us scale faster. In light of the high pace of evolution on hardware, Linux and Linux backports I'm convinced now that striving to backport the Linux kernel automatically is the only reasonable way to do backporting.


Before I dive into details I'll have to dedicate one paragraph to provide context under which evaluation of Coccinelle took place. One of the many reasons I left my job at Qualcomm was I found out some folks were using backports, a project I started with the community since way before I joined Atheros, with a proprietary driver. It gave me great reason to focus more on strengthening backports' core architecture, but also making it crystal clear that since our project's inception we are nothing more than a derivative works of the Linux kernel but we're also making it crystal clear that proprietary drivers are not allowed when you use backports. In case you haven't gotten any of the memos: proprietary drivers on backports are not allowed. Our documentation cannot be any more clearer I think. Since I strongly suspected we could help evolve backports more efficiently with Coccinelle I decided to reach out to Julia Lawall. I had met Julia at the 2011 Linux Plumbers conference during my talk on Backporting the Linux kernel for good where she provided to me a high level architectural hope on the ability to further automatically backport the Linux kernel. Julia happens to be the main research scientists leading Coccinelle research and development. I'm forever grateful that Julia managed to secure funding for me to do a 2-month research collaboration effort with her team and folks at LIP6 and IRILL, in Paris. Part of the work was seeing how we can better integrate and grow Coccinelle with the backports project and also collaborate on ideas at both LIP6 and IRILL. Given the lack of good faith I saw at Qualcomm, like dragging their feat on my own legal employee agreement for over one year and even being told to my face that they claimed to own copyright even on my personal public blog posts over backports, I also knew I couldn't continue there and had to eventually quit. I also couldn't let any work I do fall into the cracks of questionable gray area, specially if it was going to be of use for the community -- I didn't want my contributions to be used by proprietary drivers on Linux. I decided to spend one month learning as much as I could with the Coccinelle folks and pushing out a slew of enhancements for work on CRDA. I decided then it would also be a great idea to relicense CRDA to copyleft-next, of course. I knew I had to quit though so that I could do some work without it being skewed or used in some other proprietary PHB project. I was prepared to quit without a job at hand, if anything I could just go to Costa Rica and live by the beach for a while and sell bananas, or something. I was very fortunate that by the time I took a trip to Germany to visit some friends I was negotiating with two companies which seemed to respect everything I cared about. I decided to join SUSE, I'd start with them in January -- this was a huge relief -- I could quit my job early in my trip which gave me the perfect chance to focus on solid research and development my last month in Paris. None of this was planned. Even though I was doing research and development in Paris the first month was tough as I still had my other job and I was working late nights to be able to focus my attention on both. This also meant I didn't have much time to even keep backports up to date... I'm forever gratIeful Hauke Mehrtens was willing to step up and take on maintainership over the project while as I was in Limbo. Things turned out well.


I'd like to prefix the technical details that I will provide below by also thanking everyone in France -- the Atheros France office which despite the circumstances were simply awesome, folks at LIP6, IRILL, and specially Julia, for the warm welcome and great time both on research, development, and simply an amazing time. Specifically at IRILL -- Roberto Di Cosmo and Stefano Zacchiroli. At LIP6 -- Gilles Muller, David Lo. And then my strong French coffee buddies Peter Senna Tschudin (blog) and Lisong Guo. I hope to be back some time for some more strong good French coffee. Recruiters, researchers, and kernel developers, pay close attention to what these folks are doing and will be doing, they're bad ass. Now that the context is all fleshed lets dive into the technical details, I'll provide a status update of where were we stand with regards to Coccinelle integration on backports and what you can expect in the near future.


Let's start by reviewing the development flow on Coccinelle -- its a research project and as such the work flow of ideas are pretty flexible. Although Julia Lawall leads its research and development she's also incredibly busy giving students feedback, reading / evaluating papers, giving talks, or traveling, and of course, hacking on the Linux kernel. I'm surprised she manages to get so much done. All this means that the way that code flows in Coccinelle is to do most of the evolution loosely within an internal tree and full releases get pushed out to coccinelle's repo on github. My hope is that with a bit of volunteering we can strive to change this to follow a more linear work flow, as with the Linux kernel, but we'd have to prioritizing not disrupting any existing evolutions by its researchers. It'd also require learning OCaml -- eek. This is only required if you don't have access to the internal repository, which I do now, and if you're expected to try test patches as features or issues are addressed. If you're in that situation you might as well create an account and request read access to the Coccinelle INRIA gforge repository. As I'll explain below, we might be able to live without a linear tree on Coccinelle.


Now go get Coccinelle code on github. At this point you likely want the latest and greatest from github, you want to compile it yourself as any distro release right now is probably too old. If you're like me and use vim, go get Coccinelle vim syntax highlighting. Coccinelle is written in OCaml, a functional programming language. Although I tried to pick it up, Julia expressed that kernel developers should not have to pick it up, she'd be happy to receive feedback as problems or new ideas are found and address them. That made me happy :) Fortunately Johannes Berg had added the first SmPL (Semantic Patch Language) patch to backports and it was merged by the time I was in Paris. The code generation recipe backports follows is to copy a target set of code listed in a copy-list, copy over our backports module, its header files, and throw in some Kconfig logic. The last step involves patching the code. Any patch in the patches/ directory will be applied linearly in alphabetical order. If any patch happens to end with a .cocci file extension we treat it as a Coccinelle SmPL patch and call out on spatch to apply it. The first performance concerns expressed were spotted by Johannes, you see once you use Coccinelle within backports you use it at the very least once a day, and the Intel folks were already using backports in production deployments and relying on it. Johannes noted that using the Coccinelle spatch --dir option, which we use to specify applying a cocci file to the entire kernel, triggers a 'sh -c egrep -q' call for every file. In short Johannes' feedback was that calling out to the shell just to call another binary was pretty expensive. The alternatives that are documented for Coccinelle support are to use software indexing utilities, one is called Glimpse, and the other ideutils. Glimpse unfortunately is not free software so not even sure why that is supported, and one issue with using ideutils in a kernel development work flow is you are always updating your trees and not everyone may wish to be updating indexes with additional software. I personally never use ideutils or regular grep on git trees, I use 'git grep' as it skips tons files you don't care over. The issue spotted by Johannes can be seen with an example: 

sys/wireless$ time (find . -name *.c | xargs egrep -q '(\bgenl_ops\b)')
real    0m0.239s
user    0m0.208s
sys     0m0.088s

sys/wireless$ time (find . -name *.c | xargs -n1 sh -c "egrep -q '(\bgenl_ops\b)'")
real    0m20.120s
user    0m0.216s
sys     0m1.072s

Julia experimented with a few things, even using 'git grep' but that resulted in slower code evaluation, in the end an internal Cocci_grep was used and that in turn ended up speeding things quite significantly for both the cases where you are indexing code and where you are not. This change made it into the Coccinelle 1.0.0-rc19 release, you want to have at least 1.0.0-rc20 right now though. Hat tip to Johannes Berg for reporting this with great detail and Julia for fixing it like a speed daemon.


The Linux backports project backports automatically through a few strategies.It's main objective is to keep code it takes from upstream as-is as much as possible and only as as last resort does it use legacy patches and now SmPL patches. Patches are used typically to address core data structure changes which we cannot carry over, or to address complex functionality which we simply cannot tuck away under headers or newly exported symbols. All these changes are addressed typically with #ifdefs but as it turns out those pesky things are actually quite a bit problematic if you want to ensure you keep style using grammar rules. I remember there's at least one old mailing list thread (but I can't find it now) that had actually encouraged folks to not consider using Coccinelle for #ifdef'ing code. This is the next area I explored with Julia, mainly because #ifdef'ing is a huge part of what backports does on its patches, and keeping code usable and sane to read is of utmost importance to us. In so far as style -- you can end up with functional code that compiles and works but if it looks odd can you easily debug it? Helping with Coccinelle in this area has consisted in doing transformation of legacy patches to SmPL form and providing feedback where I don't see Coccinelle respecting the expected style / form. This work is ongoing on Coccinelle but a lot of fixes have been done already.


Once we have #ifdef style being respected perfectly, for all corner cases, it should be feasible to perform a proof that an SmPL patch can replace a series of legacy patches. This can also be useful for folks who may want to test converting an upstream patch into SmPL form as an exercise or to become an even lazier developer and only do a collateral evolutions for one driver, test an SmPL grammar replacement and prove that it yields the same change, and then kick it of to generate the full patch for the entire Linux kernel. Maintainers who do not trust SmPL patches from people can use this to prove the provided SmPL patch also matches the provided full patch supplied. Doing an SmPL proof means you have to start reformatting legacy patches into a consistent form, as that is how you will need to write your SmPL rules and how Coccinelle will generate patches. Remember, we're now trying to work with computers doing code modifications for us, not just patches, we need to be precise. There are several strategies that can be used to do a proof of SmPL patch replacement correctness. We could evaluate code compilation but that would take long. Another strategy is to see if we can infer the SmPL grammar from the patches and check for isomorphism using Coccinelle somehow, but if we had the ability do do that we wouldn't really need to write rules, right? Even so if we're replacing legacy patches we may wish to not trust SmPL patch inference or at least desire to want proof of correctness. For backports's sake once we have all the #ifdef spacing / style respected we can use the following strategy to prove an SmPL patch can replace a series of legacy patches. If you don't need to work with #ifdefs then you likely can already use this general approach.
  1. Use one tree, g1, and another base tree, b1, without git - and on each import all code before applying any patches
  2. Apply all legacy patches to g1 and commit
  3. Apply SmPL patches to b1
  4. rm -rf all code on g1 and replace it with all code in b1
  5. git diff --stat
I have this all implemented in Python and will be sending it soon for integration, but onto the backports project. If a generalized tool is desired we could consider doing that and merging it into Coccinelle. The 'git diff --stat' should yield 0 results if you have a proper SmPL replacement, it will show extra code additions if the SmPL patch actually added more code than what you had originally, and will show code removal for areas that perhaps the SmPL did not manage to address. It turns out though that you typically end up with more code changes but in files you hadn't addressed before. The reason for this is you have generalized a backport for a collateral evolutions and are using Coccinelle to perform the backport for all code, not just the target code you originally had addressed the collateral evolutions for. This is exactly why I have been cleaning up patches over time on the backports project into collateral evolutions. It makes you think of backports as atomic pieces, each addressing one collateral evolutions, one or a few series of commits upstream. Since Coccinelle lets you peg grammar rules to a structure granularity it means you can strictly localize a change, therefore avoiding similar structural changes in other parts of code for data structures that resemble the structure you are trying to change. Because of this there is no harm to having the additional changes applied since if a driver didn't have the change in place it meant that the driver was not enabled to be compiled for an older kernel, so Coccinelle was just extending collateral evolutions backport to more code and you won't use it where it wasn't there before. In the case that a driver was not enabled for older kernels the additional code will simply never run, although it will increase the binary mildly. This could however mean enabling drivers for older kernels. Getting Coccinelle to generate more backport code is a great examples of taking automatic backporting further. It also means that you have to do less work. Since we don't yet have SmPL grammar patch inferences, or support for the inverse of an SmPL patch, it means we have to translate collateral evolutions to SmPL form manually today. In the future this will change though and my hope is that we also get developers to write new collateral evolutions with SmPL, so that we can simply backport it with SmPL.


To what extent can we use Coccinelle? Can we backport everything with Coccinelle? I wanted to explore the limits of what we could do with Coccinelle. My original goal was to categorize collateral evolutions by types, and try to see which ones Coccinelle could not address. I started this by reviewing the list of collateral evolutions backports we had, and looking for the collateral evolution series of patches that had the largest amount of code or that I thought was more complex to address. Turns out that the more complex collateral evolutions that do have #ifefs are exactly what Coccinelle can be used to backport. Remember that backports already does most of its backport work by using a series of header files and exported symbols that implements backport functionality not available on older kernels. The series of legacy patches simply address the corner cases, things that we cannot backport through helpers or through exported symbols, and these typically require #ifdef'ing code. There's really only a few types of changes left then. Patches for collateral evolutions are split by the subsystem that they affect and then for each specific collateral evolutions by labeling it to describe the change backported. For networking we currently have 76 collateral evolutions. Of those only about 10 have extensive series of changes and the rest are really small. Using Coccinelle for small changes doesn't make sense -- specially if all we're doing is carrying over the patch as the kernel evolves and simply updating its hunks. Its the long patches, the complex patches, which are error prone, which are a bitch to backport, and that are applicable to a lot of drivers which we want to address with Coccinelle. This lets us simplify maintenance with a simple set of grammar rules and removes the nightmare of complex patch management. I decided to test the limits of Coccinelle by working on the craziest patch I could find that I was sure could not be backported. Obviously I was wrong. Here's what this looks like.


Thread IRQ support is an option that lets drivers let their IRQ context be run in a thread, which let it sleep. In order to backported threaded IRQ support Michael Buesch decided in 2009 to build our own struct compat_threaded_irq  that older kernels can use to queue_work() onto as the kernel thread will be running in process context, all the magic is dealt with behind the scenes through a header file that older kernels use, go check out the interrupt.h file. For now its required that each driver that wants to use this helper must extend one of their own private data structures with a struct compat_threaded_irq, to be later used on older kernels when a driver uses request_threaded_irq(), for older kernels we call compat_request_threaded_irq() instead. Then other IRQ calls for older kernels must call the respective backported IRQ version call, so when synchronize_irq(dev->dev->irq) is called on newer kernels older kernels must call compat_synchronize_threaded_irq(&dev->irq_compat). Likewise when free_irq() is called two helpers are now needed, compat_free_threaded_irq() and compat_destroy_threaded_irq(). All this is addressed in the 09-thread-irq series of legacy patches. I'm providing links to the code in place on the linux-3.13.y branch of backports as this code will soon disappear from the master branch.


To replace this series we end up with 4 SmPL rules, each rule has a name, pegged in between @ symbols. So @ rule_name @ is an example for a rule named rule_name. For complex examples we may want to tell Coccinelle only to proceed if a series of rules match on the expressed expressions or types declared. Lets review each rule separately.

Let's take a look at one change on a driver through the legacy patch approach. We'll review first the extension which I figured would be hard to generalize through grammar.

--- a/drivers/net/wireless/b43/b43.h
+++ b/drivers/net/wireless/b43/b43.h
@@ -805,6 +805,9 @@ enum {

 /* Data structure for one wireless device (802.11 core) */
 struct b43_wldev {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,31)
+    struct compat_threaded_irq irq_compat;
+#endif
     struct b43_bus_dev *dev;
     struct b43_wl *wl;
     /* a completion event structure needed if this call is asynchronous */ 


This change should be pretty straight forward, but consider figuring out somehow telling Coccinelle that what we need to do is modify this specific data structure. How can we give a hint to Coccinelle that this is the exact data structure? Remember that we don't want to make a grammar rule only for this driver, we want to generalize this change, this is the purpose of breaking things out with collateral evolutions, they are not specific to a driver but rather a change was done to them as a collateral to an evolution on the Linux kernel. To get a better idea of what we should tell Coccinelle we should change we must look at where this data structure is used in the series of patches. 
--- a/drivers/net/wireless/b43/main.c
+++ b/drivers/net/wireless/b43/main.c
@@ -4290,9 +4299,17 @@ static int b43_wireless_core_start(struc
             goto out;
         }
     } else {
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31)
         err = request_threaded_irq(dev->dev->irq,

                        b43_interrupt_handler,
                        b43_interrupt_thread_handler,
                        IRQF_SHARED, KBUILD_MODNAME, dev);   

+#else
+        err = compat_request_threaded_irq(&dev->irq_compat,
+                          dev->dev->irq,
+                          b43_interrupt_handler,
+                          b43_interrupt_thread_handler,
+                          IRQF_SHARED, KBUILD_MODNAME, dev);
+#endif
         if (err) {
             b43err(dev->wl, "Cannot request IRQ-%d\n",
                    dev->dev->irq);

The  struct b43_wldev is used for the dev variable above, its declared in its entry routine as an argument as struct b43_wldev *dev. So we know we can tell Coccinelle somehow that the target data structure we want to modify is used on request_threaded_irq(), but we also have to add our own #ifdef there for the older kernel's case as well.

The trick to getting this change in is using the generic type, and referencing it as a pointer, and later using that to tell Coccinelle which data structure you want modified. Pay close attention to the type T and T *private declarations. 


@ threaded_irq @
identifier ret;
expression irq, irq_handler, irq_thread_handler, flags, name;
type T;
T *private;
@@

+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31)
ret = request_threaded_irq(irq,
               irq_handler,
               irq_thread_handler,
               flags,
               name,
               private);
+#else
+ret = compat_request_threaded_irq(&private->irq_compat,
+                   irq,
+                   irq_handler,
+                   irq_thread_handler,
+                   flags,
+                   name,
+                   private);
+#endif

By using Type T and then T *private we are telling Coccinelle that private is a pointer, and that we want its type declared as T. This lets us later reference that type and if we want to extend it, which is exactly what we want to do! Let's look at the last piece that does this magic. 

@ modify_private_header depends on threaded_irq @
type threaded_irq.T;
@@

T {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,31)
+      struct compat_threaded_irq irq_compat;
+#endif

...
};


The key is to notice the usage of threaded_irq on the type on this rule, that is actually the name of the first rule, the name is put in between the two @ symbols. Delcaring it as type threaded_irq.T means that we want to use whatever Coccinelle picked up on the threaded_irq rule for T. This is actually the last rule, and the first one I showed is the first in the series. I decided that if we're going to extend driver data structure it'd be best to do that in the beginning as the end of a data structure at times can be used for dynamic arrays of variable size. The full thing looks as follow: 

/*
Backports threaded IRQ support
The 2.6.31 kernel introduced threaded IRQ support, in order to
backport threaded IRSs on older kernels we built our own struct
compat_threaded_irq to queue_work() onto it as the kernel thread
will be running the thread in process context as well.
For now each driver's private data structure is modified to add
the their own struct compat_threaded_irq, and that is used by
the backports module to queue_work() onto it. We can likely avoid
having to backport this feature by requiring to modify the private
driver's data structure by relying on an internal worker thread
within the backports module, this should be revised later.
*/
@ threaded_irq @
identifier ret;
expression irq, irq_handler, irq_thread_handler, flags, name;
type T;
T *private;
@@
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31)
ret = request_threaded_irq(irq,
               irq_handler,
               irq_thread_handler,
               flags,
               name,
               private);
+#else
+ret = compat_request_threaded_irq(&private->irq_compat,
+                   irq,
+                   irq_handler,
+                   irq_thread_handler,
+                   flags,
+                   name,
+                   private);
+#endif
@ sync_irq depends on threaded_irq @
expression irq;
type threaded_irq.T;
T *threaded_irq.private;
@@
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31)
synchronize_irq(irq);
+#else
+compat_synchronize_threaded_irq(&private->irq_compat);
+#endif
@ free depends on threaded_irq @
expression irq, dev;
type threaded_irq.T;
T *threaded_irq.private;
@@
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,31)
free_irq(irq, dev);
+#else
+compat_free_threaded_irq(&private->irq_compat);
+compat_destroy_threaded_irq(&dev->irq_compat);
+#endif
@ modify_private_header depends on threaded_irq @
type threaded_irq.T;
@@
T {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,31)
+      struct compat_threaded_irq irq_compat;
+#endif
...
};

The beautiful thing about this is that 09-thread-irq collateral evolution backport only addressed 3 drivers and that using the above SmPL patch in backports means we can transpose this backport onto all of the drivers that we are carrying, 13 of which are using request_threaded_irq(), so 10 new drivers get this collateral evolution backported, automatically. This is simply one of the architectural backport gain of backporting collateral evolutions with Coccinelle. There's a few other interesting things worth mentioning that converting the 09-thread-irq legacy patch to SmPL revealed:
  1. The backport was inconsistently programmed. This was revealed when Coccinelle ended up adding the struct compat_threaded_irq
    on the struct iwl_trans data structure rather than the struct iwl_trans_pcie! The reason this worked however was that it didn't matter which data structure we used in the driver, so long as it was consistent. Coccinelle deserves a brownie point here, security programmers should be excited about the prospects about precision here (also check out the Coccinelle library maintained by Peter).
  2. The 09-thread-irq backport was mildly sloppy -- it backported two collateral evolutions not one! We tuck this way under a new series, in case other drivers wish to backport this. Its important to do this to document and track why things are done. This new series backports commit b25c340c1 added by Thomas Gleixner through kernel v2.6.32 which added support for IRQF_ONESHOT. This lets drivers that use threaded IRQ support to request that the IRQ is not masked after the hard interrupt handler completes as this requires device access in hard IRQ context and for buses such as i2c and spi this at times is not possible. The TI driver uses this when a platform quirk with
    WL12XX_PLATFORM_QUIRK_EDGE_IRQ is detected. In retrospect this quirk
    does not seem backportable unless IRQF_ONESHOT is really not a requirement, but desired. If WL12XX_PLATFORM_QUIRK_EDGE_IRQ is indeed a requirement for IRQF_ONESHOT then we should not probe complete. Its unclear if this is a universal thing or not.
  3. The amount of time to generate the backports target code is reduced by embracing this new SmPL patch, even though it actually automatically backported the collateral evolution to 10 other drivers. The run time used to be 1 minute and 33 seconds, so it turns out 27 seconds were shaved off somehow magically. The new run time after this patch:
real    1m6.023s
user    10m0.276s
sys     0m26.196s
Now how wicked cool would it be if you didn't have to write SmPL grammar patches in the first place, you'd instead get Coccinelle to infer the SmPL grammar patch for you by giving it a series of legacy patches? Theoretically this is possible and I have hinted in the past that this is what drove me to consider Coccinelle more seriously as I found it hard to believe we could get a lot of developers to pick up a language for a grammar, which at this point seems rather important do embrace anyway, much like learning C. Another interesting prospect is getting Coccinelle to grok an -R option similar to patch -R, which would tell Coccinelle to apply the inverse of the patch. This way if we could get some kernel developers to write collateral evolutions with SmPL it would mean we wouldn't even have to infer the SmPL patch but instead extract it from the commit git log which could be used to keep record of the SmPL grammar rule. This is the ideal situation we need to strive to to perform automatic Linux kernel backporting. During my trip in Paris I found out though that the above two sets of objectives require a bit more research and development, however there is hope that this is achievable. Computer science students: if the above sounds interesting to you reach out to Julia, you might be able to help advance Linux with helping with research and development in this area.


Parallelism was the last aspect I tried to address on Coccinelle. Coccinelle already already has support for parallelism, it does this by splitting up all the target files it needs to address into a series of buckets, and each process addresses one bucket. Coccinelle requires you right now to specify that if you need j threads you iterate and kick it j times separately and for each iteration you specify j as the max number of threads you want and keep bumping an index on each run. In short it is not doing the spawning of children / threads automatically for you. As with the first performance observation made by Johannes it should be clear that this incurs a performance hit, requiring you to hit the shell for each spawned thread. A change to do this all internally on Coccinelle is welcomed but Julia notes making this change poses major problems for the Coccinelle internal structure, it can be addressed but will require a student to tackle this, and hope is that it can be tackled some time in 2014, unless a daring OCaml guru with spare time wishes to help and jump in. In the meantime we have to work with the solution in place. For shell this looks as follows:
 

#!/bin/bash
# By Kees Cook
# http://comments.gmane.org/gmane.comp.version-control.coccinelle/680
set -e
MAX=$(getconf _NPROCESSORS_ONLN)
dir=$(mktemp -d)
for i in $(seq 0 $(( MAX - 1 )) ); do
    spatch -max $MAX -index $i -very_quiet "$@" > $dir/$i.out &
done
wait
cat $dir/*.out
rm -f $dir/*.out
rmdir $dir

 


For now I've created a Linux backports python library that takes care of this for us but it still has the incurred overhead of creating j number of processes if its going to use j threads. (Update: you can now download pycocci, a standalone Python wrapper with sensible default for Coccinelle and multithread support as was implemented in backports, this has been submitted for inclusion into upstream Coccinelle). The overhead isn't as significant as with grep though as grep was used for every file during inspection, the overhead currently should therefore be extremely minimal. In practice I've observed the best performance using Linux as the target source using 3 times the number of CPUs available on the monster build box donated by SUSE, HP, and the Linux Foundation. The explanation for this is that Coccinelle threads that have nothing to do will bail out, leaving the CPU idle. If using Coccinelle on the Linux kernel this can happen quite a bit as an SmPL patch only has rules that should apply to certain files. If you are not using indexing this happens quite a bit. On this system embracing parallelism on Coccinelle as of backports-20140311 yields for a gain of 94.38% on run time at code generation time! On this system it meant reducing the amount of time at code generation time from 19 minutes 34 second to 1 minute and 6 second, after the new threaded IRQ SmPL patch! For those interested further in performance it is worth mentioning that all of the statistics provided, as well as statistics reported on the backports commit log are from monster build box that we use, and that we have all of our code and git trees in RAM. I suspect we can do better, maybe by using RCU. I've socialized a few other ideas as well but Julia recently expressed interest in exploring usage of GNU make's jobserver which I have written about before. Let's see!


Can using Coccinelle scale? The answer is yes, specially with parallelism in place. And let's remember that the current performance improvement is without software indexing. Recall that we've now determined that you won't be using Coccinelle for every single backport. You only want to use Coccinelle for long series of patches backport collateral evolutions. My focus in Paris was to use my time efficiently and review these and only bug Julia about the series which I deemed likely impossible for Coccinelle to address. The only patches Coccinelle can't address are the obvious ones, where there is no form that can be expressed. Its also not worth it to use Coccinelle for tiny changes too.


Do you want to test a Coccinelle patch? You can either use spatch directly with the above bash script for parallelism (note I use 3 num CPUs threads, the above just uses the number of CPU threads). If using backports you can use gentree.py within backports with --test-cocci. Johannes added a while ago git support into backports whereby if you specify --gitdebug a git tree will be generated for the code generated. A commit will be triggered after the first initial code import, and then one commit per patch, and one commit per SmPL patch. We take advantage of this feature and --test-cocci will skip all legacy patches, and only applies the Coccinelle SmPL patch specified. Can you can then just change into the directory where the code was generated and issue 'git show'.


What about profiling Coccinelle within backports? Sure -- just use --profile-cocci for gentree.py and specify the Coccinelle patch you want to test and profile. Coccinelle will generate a profile report for the time spent on each routine within Coccinelle for each process that ran. Remember that each process will only work on a bucket list of files, so at times a process will yield no interesting profile results, which could likely explain why I get best results with 3 * number of CPU threads -- some processes likely just die out fast. Here is an example output of a profile run on the 11-dev-pm-ops.cocci collateral evolution. First the SmPL rule, this one should be pretty straight forward.

// The 2.6.29 kernel has new struct dev_pm_ops [1] which are used
// on the pci device to distinguish power management hooks for suspend
// to RAM and hibernation. Older kernels don't have these so we need
// to resort back to the good ol' suspend/resume. Fortunately the calls
// are not so different so it should be possible to resuse the same
// calls on compat code with only slight modifications.
//
// [1] http://lxr.linux.no/#linux+v2.6.29/include/linux/pm.h#L170

@ module_pci @
declarer name MODULE_DEVICE_TABLE;
identifier pci_ids;
@@

MODULE_DEVICE_TABLE(pci, pci_ids);

@ simple_dev_pm depends on module_pci @

identifier ops, pci_suspend, pci_resume;
declarer name SIMPLE_DEV_PM_OPS;
declarer name compat_pci_suspend;
declarer name compat_pci_resume;
@@

+compat_pci_suspend(pci_suspend);
+compat_pci_resume(pci_resume);

SIMPLE_DEV_PM_OPS(ops, pci_suspend, pci_resume);

@@
identifier backport_driver;
expression pm_ops;
fresh identifier backports_pci_suspend = simple_dev_pm.pci_suspend ## "_compat";
fresh identifier backports_pci_resume = simple_dev_pm.pci_resume ## "_compat";
@@

struct pci_driver backport_driver = {
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,29))
    .driver.pm  = pm_ops,
+#elif defined(CONFIG_PM_SLEEP)
+    .suspend    = backports_pci_suspend,
+    .resume     = backports_pci_resume,
+#endif

};


Although the profile support currently uses 3 * number CPU threads on Coccinelle here  is one example output with only 1 process. This is with an older version of Coccinelle.

$ ./gentree.py --clean --verbose --profile-cocci \
    patches/collateral-evolutions/network/11-dev-pm-ops.cocci \
    /home/mcgrof/linux-next/ \
    /home/mcgrof/build/backports-20131206

On big iron backports server:

---------------------
profiling result
---------------------
Main total                               :    275.327 sec          1 count
Main.outfiles computation                :    274.761 sec          1 count
full_engine                              :    272.034 sec        291 count
C parsing                                :    158.143 sec        346 count
TOTAL                                    :    158.141 sec        346 count
HACK                                     :     69.680 sec        696 count
C parsing.tokens                         :     54.000 sec        693 count
Parsing: 1st pass                        :     43.870 sec      22728 count
YACC                                     :     42.921 sec      22615 count
C parsing.fix_cpp                        :     36.855 sec        349 count
MACRO mgmt prep 2                        :     32.787 sec        346 count
TAC.annotate_program                     :     32.779 sec        318 count
flow                                     :     31.251 sec      21004 count
LEXING                                   :     26.415 sec        346 count
bigloop                                  :     15.298 sec        291 count
process_a_ctl_a_env_a_toplevel           :     14.840 sec      46661 count
C parsing.lookahead                      :     13.254 sec    1965389 count
mysat                                    :     12.777 sec      46661 count
show_xxx                                 :     11.933 sec      49645 count
C parsing.fix_define                     :      6.957 sec        693 count
Type_c.type_of_s                         :      6.729 sec     135896 count
module_pci                               :      6.653 sec        291 count
rule starting on line 28                 :      6.630 sec        291 count
fix_flow                                 :      6.498 sec      20038 count
C parsing.lex_ident                      :      6.388 sec    1695857 count
C consistencycheck                       :      6.213 sec        346 count
Pattern3.match_re_node                   :      6.110 sec     969235 count
Common.full_charpos_to_pos_large         :      5.684 sec        693 count
C parsing.mk_info_item                   :      3.556 sec      22728 count
worth_trying                             :      2.683 sec       1902 count
Parsing: multi pass                      :      2.574 sec        206 count
simple_dev_pm                            :      2.011 sec        291 count
TAC.unwrap_unfold_env                    :      2.007 sec     171026 count
TAC.typedef_fix                          :      1.948 sec     273295 count
TAC.lookup_env                           :      1.583 sec     236568 count
TAC.add_binding                          :      0.896 sec      57620 count
MACRO managment                          :      0.439 sec        118 count
Main.result analysis                     :      0.418 sec          1 count
Common.=~                                :      0.305 sec      80558 count
C unparsing                              :      0.168 sec         41 count
MACRO mgmt prep 1                        :      0.148 sec        346 count
parse cocci                              :      0.115 sec          1 count
pre_engine                               :      0.115 sec          1 count
Common.info_from_charpos                 :      0.102 sec         54 count
Main.infiles computation                 :      0.033 sec          1 count
ctl                                      :      0.019 sec         94 count
Transformation3.transform                :      0.006 sec         27 count
TAC.lookup_typedef                       :      0.004 sec        332 count
check_duplicate                          :      0.003 sec          1 count
Common.group_assoc_bykey_eff             :      0.003 sec          1 count
merge_env                                :      0.003 sec        649 count
post_engine                              :      0.000 sec          1 count
get_glimpse_constants                    :      0.000 sec          1 count
Common.full_charpos_to_pos               :      0.000 sec          2 count
asttoctl2                                :      0.000 sec          1 count

On a Chromebook Pixel:

---------------------
profiling result
---------------------
Main total                               :    379.349 sec          1 count
Main.outfiles computation                :    379.139 sec          1 count
full_engine                              :    372.905 sec       1902 count
HACK                                     :     96.134 sec       2785 count
C parsing.tokens                         :     76.053 sec       2769 count
Parsing: 1st pass                        :     57.708 sec      82287 count
YACC                                     :     56.033 sec      81918 count
C parsing.fix_cpp                        :     52.167 sec       1400 count
TAC.annotate_program                     :     46.560 sec       1356 count
MACRO mgmt prep 2                        :     43.976 sec       1384 count
flow                                     :     41.027 sec      80563 count
LEXING                                   :     38.111 sec       1384 count
bigloop                                  :     17.631 sec       1329 count
process_a_ctl_a_env_a_toplevel           :     17.174 sec     161123 count
C parsing.lookahead                      :     15.138 sec    6737589 count
mysat                                    :     14.537 sec     161123 count
Type_c.type_of_s                         :     11.815 sec     675989 count
Common.full_charpos_to_pos_large         :      9.507 sec       2769 count
fix_flow                                 :      8.685 sec      77269 count
module_pci                               :      8.459 sec       1329 count
rule starting on line 28                 :      8.400 sec       1329 count
C consistencycheck                       :      8.137 sec       1384 count
C parsing.lex_ident                      :      7.810 sec    5665983 count
C parsing.fix_define                     :      7.343 sec       2769 count
Pattern3.match_re_node                   :      6.827 sec    3046889 count
C parsing.mk_info_item                   :      5.787 sec      82287 count
show_xxx                                 :      4.825 sec     174022 count
Parsing: multi pass                      :      2.802 sec        679 count
TAC.lookup_env                           :      2.556 sec     858456 count
TAC.typedef_fix                          :      2.540 sec     976842 count
TAC.unwrap_unfold_env                    :      2.278 sec     587709 count
TAC.add_binding                          :      1.157 sec     204618 count
simple_dev_pm                            :      0.762 sec       1329 count
MACRO managment                          :      0.610 sec        394 count
Common.=~                                :      0.377 sec     253422 count
MACRO mgmt prep 1                        :      0.299 sec       1384 count
Main.result analysis                     :      0.200 sec          1 count
Common.info_from_charpos                 :      0.135 sec        248 count
C unparsing                              :      0.132 sec         41 count
pre_engine                               :      0.050 sec          1 count
parse cocci                              :      0.050 sec          1 count
C parsing                                :      0.016 sec       1384 count
check_duplicate                          :      0.012 sec          1 count
Common.group_assoc_bykey_eff             :      0.012 sec          1 count
TAC.lookup_typedef                       :      0.011 sec       1314 count
Main.infiles computation                 :      0.010 sec          1 count
ctl                                      :      0.009 sec         94 count
merge_env                                :      0.005 sec       2725 count
TOTAL                                    :      0.004 sec       1384 count
Transformation3.transform                :      0.003 sec         27 count
Common.full_charpos_to_pos               :      0.002 sec          2 count
C unparsing.new_tabbing                  :      0.000 sec        149 count
get_glimpse_constants                    :      0.000 sec          1 count
asttoctl2                                :      0.000 sec          1 count
post_engine                              :      0.000 sec          1 count


OK one last SmPL example, this one was added by Johannes: 

@@
expression dev;
expression ops;
@@
-dev->netdev_ops = ops;
+netdev_attach_ops(dev, ops);

 
We do all the dirty trickery of backporting the net_device data structure changes for device operation callbacks into a helper, this is an example of how we minimize code changes. I documented how and why we do this in my last post on automatically backporting the Linux kernel, go read that to learn how we backport this. I stated above that you can use data structures to ask Coccinelle to only make changes specific to the data structure you are interested in modifying, therefore reducing the namespace for changes. This SmPL patch can be modified to be data structure specific. 

@@
struct net_device *dev;
struct net_device_ops ops;
@@
-dev->netdev_ops = &ops;
+netdev_attach_ops(dev, &ops);

 
To see what I mean let me give you an example simple code snippet we can use to test this on, I've also put this on github on a netdev-ops git tree. After installing Coccinelle, you just want to run:
  1. make test1
  2. git checkout -f
  3. make test2
The difference in output should be enough to provide clarity for what I'm about to explain. Below is the netdev.c code snippet that we deal to test transformations.

#include

struct net_device_ops {
};


struct net_device {
    struct net_device_ops *netdev_ops;
};

struct bubble_ops {
};

struct bubbles {
    struct bubble_ops *netdev_ops;
};

static struct net_device_ops my_netdev_ops = {
};

static struct bubble_ops my_bubble_ops = {
};

static struct parent {
    struct net_device *dev;
    int b;
};

static struct parent_usb {
    struct net_device *net;
    int b;
};

int main(void)
{
    struct parent *p = malloc(sizeof(struct parent));
    struct parent_usb *p_usb = malloc(sizeof(struct parent));
    struct net_device *dev = malloc(sizeof(struct net_device));
    struct bubbles *bubble = malloc(sizeof(struct bubbles));

    dev->netdev_ops = &my_netdev_ops;
    bubble->netdev_ops = &my_bubble_ops;

    free(dev);
    free(bubble);
    free(p);
    free(p_usb);

    p->dev = dev;
    p->dev->netdev_ops = &my_netdev_ops;
    p_usb->net->netdev_ops = &my_netdev_ops;

    return 0;
}


Using the second version of the SmPL rules file will ensure we do not modify bubble->netdev_ops, and we only change  the dev->netdev_ops line. The p_usb case was added as an example where the dev name was not used but it still works. Notice that for the Linux kernel source it does however mean that you should use --recursive-includes on Coccinelle. Security folks should be excited about this though. The reason why this works is that the code generation and inspection is happening on the latest upstream code always. The above change to the SmPL rule actually incurs an added 50 second penalty but that is because of the --recursive-includes flag. Now, the interesting thing about this specific collateral evolution, and why I've dedicated so much attention to it in previous posts and in this post is that if we added a static inline upstream on the Linux kernel we can get rid of this SmPL file completely and the backport would just be done automatically through the backports module. I'm trying to do as much homework as I can to ensure that before I send a patch upstream its well understood and documented exactly why I believe an upstream change should be considered in light of the gains of automatically backporting Linux. Even though we'd end up removing that SmPL patch using SmPL however helps formalize this entire process, and we likely wouldn't end up removing all SmPL patches.

Our SmPL patch count per backports release so far:
  • linux-3.13.y - 3 SmPL patches (all thanks to Johannes!):
    • 25-multicast.cocci
    • 0005-netlink-portid.cocci
    • 0001-netdev_ops.cocci
  •  linux-3.14.y - 5 SmPL patches:
    • 25-multicast.cocci
    • 0005-netlink-portid.cocci
    • 0001-netdev_ops.cocci
    • 11-dev-pm-ops.cocci
    • 62-usb_driver_lpm.cocci
  • master - 6 SmPL patches
    • 25-multicast.cocci
    • 0005-netlink-portid.cocci
    • 0001-netdev_ops.cocci
    • 11-dev-pm-ops.cocci
    • 62-usb_driver_lpm.cocci
    • 0015-threaded-irq.cocci


The last thing we'll need to address somehow is to ensure that we don't provide support for a ton of kernels. That just doesn't scale. We currently provide backports down to the last 30 kernels. Before we make a release we test compile each release against 30 kernels. We need to shave that down but the reason we haven't been doing much of this is that most silicon industry have customers using tons of ancient random kernels and they never upgrade. It should be clear this is a security concern and I hope that with education, and perhaps using backports as a carrot, we can streamline only getting the entire industry to work on and embrace usage of only the kernels listed on kernel.org. Folks -- if we want to scale, we gotta do this.


In so far as my new role at SUSE is concerned, we'll be primarily focusing only on kernels greater than 3.0. We will also need to start being pickier about what drivers we backport. We should not backport / compile / test carry anything ancient. There is simply no point. You should expect a good cleaning on backports soon to address all this. We should strive so that kernel developers never have to do backporting -- in order to help with this maintainers can help be considerate of software architecture strategies that can be embraced to help do backporting efficiently. We're still figuring out what those things are, but you will soon see some example patches posted by me upstream that try to help with this. One example will be to introduce usage of static inlines for usage on data structure assignment. We should be conservative about what things make sense though. We're still learning the ropes as to what can help, but we're on our way.
Post a Comment