Rapidly detecting Linux kernel source features for driver deployment

A while back I wrote about genconfig.sh, a technique for auto-detecting Linux kernel features, in order to improve portability of an open source, out-of-tree filesystem driver I developed as part of ElectricAccelerator. genconfig.sh has enabled me to maintain compatibility across a wide range of Linux kernels with relative ease, but recently I noticed that the script was unacceptably slow. On the virtual machines in our test lab, genconfig.sh required nearly 65 seconds to execute. For my 11-person team, a conservative estimate of time wasted waiting for genconfig.sh is nearly an entire person-month per year. With a little effort, I was able to reduce execution time to about 7 seconds, nearly 10x faster! Here’s how I did it.

A brief review of genconfig.sh

genconfig.sh is a technique for detecting the source code features of a Linux kernel. Like autoconf configure scripts, genconfig.sh uses a series of trivial C source files to probe for various kernel source features, like the presence or absence of the big kernel lock. For example, here’s the code used to determine whether the kernel has the set_nlink() helper function:

#include <linux/fs.h>
void dummy(struct inode *i)
{
    set_nlink(i, 0);
}

If a particular test file compiles successfully, the feature is present; if the compile fails, the feature is absent. The test results are used to set a series of C preprocessor #define directives, which in turn are used to conditionally compile driver code suitable for the host kernel.

Reaching the breaking point

When I first implemented genconfig.sh in 2009 we only supported a few versions of the Linux kernel. Since then our support matrix has swollen to include every variant from 2.6.9 through 3.5.0, including quirky “enterprise” distributions that habitually backport advanced features without changing the reported kernel version. But platform support tends to be a mostly one-way street: once something is in the matrix, it’s very hard to pull it out. As a consequence, the number of feature tests in genconfig.sh has grown too, from about a dozen in the original implementation to over 50 in the latest version. Here’s a real-time recording of a recent version of genconfig.sh on one of the virtual machines in our test lab:

genconfig.sh executing on a test system

Accelerator actually has two instances of genconfig.sh, one for each of the two kernel drivers we bundle, which means every time we install Accelerator we burn about 2 minutes waiting for genconfig.sh — 25% of the 8 minutes total it takes to run the install. All told I think a conservative estimate is that this costs my team nearly one full person-month of time per year, between time waiting for CI builds (which do automated installs), time waiting for manual installs (for testing and verification) and my own time spent waiting when I’m making updates to support new kernel versions.

genconfig.sh: The Next Generation

I had a hunch about the source of the performance problem: the Linux kernel build system. Remember, the pattern repeated for each feature test in the original genconfig.sh is as follows:

  1. Emit a simple C source file, called test.c.
  2. Invoke the kernel build system, using make and a trivial kernel module makefile:
    conftest-objs := test.o
    obj-m := conftest.o
    EXTRA_CFLAGS += -Werror
  3. Check the exit status from make to decide whether the test succeeded or failed. (A minimal sketch of this per-test loop follows below.)
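
Putting those three steps together, the per-test loop looks roughly like the sketch below. This is a simplification rather than the actual genconfig.sh: KERNEL_SRC and config.h are placeholder names, the make invocation is just the standard out-of-tree module build, and the real script wraps all of this in more bookkeeping.

# Sketch of the original one-probe-per-invocation pattern.
# KERNEL_SRC and config.h are placeholder names used for illustration.
KERNEL_SRC="${KERNEL_SRC:-/lib/modules/$(uname -r)/build}"

cat > test.c <<'EOF'
#include <linux/fs.h>
void dummy(struct inode *i)
{
    set_nlink(i, 0);
}
EOF

cat > Makefile <<'EOF'
conftest-objs := test.o
obj-m := conftest.o
EXTRA_CFLAGS += -Werror
EOF

# One full kernel build system invocation per feature test; this is
# where the roughly two seconds of per-probe overhead comes from.
if make -C "$KERNEL_SRC" M="$PWD" modules >/dev/null 2>&1 ; then
    echo "#define HAVE_SET_NLINK" >> config.h
fi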

The C sources used to probe for features are trivial, so it seemed unlikely that the compilation itself would take so long. But we don’t have to speculate — if we use Electric Make instead of GNU make to run the test, we can use the annotated build log and ElectricInsight to see exactly what’s going on:

ElectricInsight visualization of Linux kernel module build.

Overall, using the kernel build system to compile this single file takes nearly 2 seconds — not a big deal with only a few tests, but it adds up quickly. To be clear, the only part we actually care about is the box labeled /root/__conftest__/test.o, which is about 1/4 second. The remaining 1 1/2 seconds? Pure overhead. Perhaps most surprising is the amount of time burned just parsing the kernel makefiles — the huge bright cyan box on the left edge, as well as the smaller bright cyan boxes in the middle. Nearly 50% of the total time is just spent parsing!

At this point an idea struck me: there’s no particular reason genconfig.sh must invoke the kernel build system separately for each probe. Why not write out all the probe files upfront and invoke the kernel build system just once to compile them all in a single pass? In fact, with this strategy you can even use parallel make (e.g., make -j 4) to eke out a little bit more speed.

Of course, you can’t use the exit status from make with this approach, since there’s only one invocation for many tests. Instead, genconfig.sh can give each test file a descriptive name, and then check for the existence of the corresponding .o file after make finishes. If the file is present, the feature is present; otherwise the feature is absent. Here’s the revised algorithm:

  1. Emit a distinct C source file for each kernel feature test. For example, the sample shown above might be created as set_nlink.c. Another might be write_begin.c.
  2. Invoke the kernel build system, using make -j 4 and a slightly less trivial kernel module makefile:
    conftest-objs := set_nlink.o write_begin.o ...
    obj-m := conftest.o
    EXTRA_CFLAGS += -Werror
  3. Check for the existence of each .o file, using something like if [ -f set_nlink.o ] ; then … ; fi to decide whether the test succeeded or failed. (The whole single-pass flow is sketched below.)
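
Putting the revised algorithm together, the single-pass flow looks roughly like the sketch below. As before this is a simplification with placeholder names (KERNEL_SRC, config.h); only two of the 50-plus probes are shown, the body of write_begin.c is merely illustrative, and the sketch adds make's -k flag so that a probe that fails to compile doesn't stop the remaining probes from building.

# Sketch of the single-pass variant; KERNEL_SRC and config.h are
# placeholder names used for illustration.
KERNEL_SRC="${KERNEL_SRC:-/lib/modules/$(uname -r)/build}"

cat > set_nlink.c <<'EOF'
#include <linux/fs.h>
void dummy(struct inode *i) { set_nlink(i, 0); }
EOF

# Illustrative probe body; only the file name appears in the text above.
cat > write_begin.c <<'EOF'
#include <linux/fs.h>
void dummy(struct address_space_operations *a) { a->write_begin = NULL; }
EOF

cat > Makefile <<'EOF'
conftest-objs := set_nlink.o write_begin.o
obj-m := conftest.o
EXTRA_CFLAGS += -Werror
EOF

# One kernel build system invocation for all probes: -j 4 compiles them
# in parallel, -k keeps going past probes that fail to compile.
make -C "$KERNEL_SRC" M="$PWD" -k -j 4 modules >/dev/null 2>&1 || true

# A feature is present exactly when its probe's object file exists.
for probe in set_nlink write_begin ; do
    if [ -f "$probe.o" ] ; then
        echo "#define HAVE_$(echo "$probe" | tr 'a-z' 'A-Z')" >> config.h
    fi
done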

The net result? After an afternoon of refactoring, genconfig.sh now completes in about 7 seconds, nearly 10x faster than the original:

Updated genconfig.sh executing on a test system.

The only drawback I can see is that the script no longer has that satisfying step-by-step output, instead dumping everything out at once after a brief pause. I’m perfectly happy to trade that for the improved performance!

Future Work and Availability

This new strategy has significantly improved the usability of genconfig.sh. After I finished the conversion, I wondered if the same technique could be applied to autoconf configure scripts. Unfortunately I find autoconf nearly impossible to work with, so I didn’t make much progress exploring that idea. Perhaps one of my more daring (or stubborn) readers will take the ball and run with it there. If you do, please comment below to let us know the results!

The new version of genconfig.sh is used in ElectricAccelerator 6.2, and can also be seen in the open source Loopback File System (lofs) on my GitHub repo.

Measuring the Electric File System

Somebody asked me the other day what portion of the Electric File System (EFS) is shared code versus platform-specific code. The EFS, if you don’t know, is a custom filesystem driver and part of ElectricAccelerator. It enables us to virtualize the filesystem, so that each build job sees the correct view of the filesystem according to its logical position in the build, even if jobs are run out of order. It also provides the file usage information that powers our conflict detection algorithm. As a filesystem driver, the EFS is obviously tightly coupled to the platforms it’s used on, and the effort of porting it is one reason we don’t support more platforms than we do now — not that a dozen variants of Windows, 16 flavors of Linux and several versions of Solaris is anything to be ashamed of!

Anyway, the question intrigued me, and I found the answer quite surprising:

Note that here I’m measuring only actual code lines, exclusive of comments and whitespace, as of the upcoming 6.1 release. In total, the Windows version of the EFS has about 2x the lines of code that either the Solaris or Linux ports have. The platform-specific portion of the code is more than 6x greater!
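
I won't go into the exact tooling here; any line counter that excludes comments and blank lines will produce comparable numbers. Here's a rough sketch using cloc, with placeholder directory names standing in for the real source tree:

# Rough sketch of the measurement. The efs/windows, efs/linux,
# efs/solaris and efs/common directories are placeholder names; cloc
# reports code lines, excluding comments and blank lines.
for port in windows linux solaris ; do
    echo "== $port =="
    cloc --quiet "efs/$port"
done
echo "== shared =="
cloc --quiet efs/common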

Why is the Windows port so much bigger? To answer that question, I started looking at the historical size of the three ports, which led to the following graph showing the total lines of code for (almost) every release we’ve made. Unfortunately our first two releases, 1.0 and 2.0, have been lost to the ether at some point over the past ten years, but I was able to collect data for every release starting from 2.1:

Here you can see that the Windows port has always been larger than the others, but it’s really just a handful of Windows-specific features that blew up the footprint so much. The first of those was support for the Windows FastIO filesystem interface, an alternative “fast path” through the kernel that in theory enables higher throughput to and from the filesystem. It took us two tries to get that feature implemented, as shown on the graph, and all-told that contributed about 7,000 lines of code. The addition of FastIO to the filesystem means that the Windows driver has essentially three I/O interfaces: standard I/O, memory-mapped I/O and FastIO. In comparison, on Linux and Solaris you have only two: standard I/O and memory-mapped I/O.

The second significant difference between the platforms is that on Windows the EFS has to virtualize the registry in addition to the filesystem. In the 4.3 release we significantly enhanced that portion of the driver to allow full versioning of the registry along the same lines that the filesystem has always done. That feature added about 1,000 lines of code.

I marked a couple other points of interest on this graph as well. First, the addition of the “lightweight EFS” feature, which is when we switched from using RAM to store the contents of all files in the EFS to using temporary storage on the local filesystem for large files. That enabled Accelerator to handle significantly larger files, but added a fair amount of code. Finally, in the most recent release you can actually see a small decrease in the total lines of code on Solaris and Linux, which reflects the long-overdue removal of code that was needed to support legacy versions of those platforms (such as the 2.4.x Linux kernel).

I was surprised as well by the low rate of growth in the Solaris and Linux ports. Originally the Linux port supported only one version of the Linux kernel, but today it supports about sixteen. I guess this result reveals that the difficulty in porting to each new Linux version is not so much in the amount of code to be added as in figuring out exactly which few lines to add!

In fact, after the 4.4 release in early 2009, the growth has been relatively slow on all platforms, despite the addition of major new features to Accelerator as a whole over the last several releases. The reason is simply that most feature development involves changes to other components (primarily emake), rather than to the filesystem driver.

One last metric to look at is the number of unit tests for the code in the EFS. We don’t practice test-driven development for the most part, but we do place a strong emphasis on unit testing. Developers are expected to write unit tests for every change, and we strive to keep the tests isomorphic to the code to help ensure we have good coverage. Here’s how the total number of unit tests has grown over the years:

Note that this is looking only at unit tests for the EFS! We have thousands more unit tests for the other components of Accelerator, as well as thousands of integration tests.

Thankfully for my credibility, the growth in number of tests roughly matches the growth in lines of code! But I’m surprised that the ratio of tests to code is not more consistent across the platforms — that suggests a possible area for improvement. Rather than being discouraged by that though, I’m pleased with this result. After all, what’s the point of measuring something if you don’t use that data to improve?

HOWTO: ship a custom kernel driver for Linux

Pop quiz, hotshot: your company has developed a Linux kernel driver as part of its product offering. How do you deliver this driver such that your product is compatible with a significant majority of the Linux variants you are likely to encounter in the field? Consider the following:

  • RedHat Enterprise Linux 4 is based on kernel version 2.6.9
  • RHEL 5 is based on kernel version 2.6.18
  • RHEL 6 is based on 2.6.32
  • openSUSE 11.0 is based on 2.6.25
  • openSUSE 11.1 is based on 2.6.27
  • Ubuntu 9.04 is based on 2.6.28
  • Ubuntu 9.10 is based on 2.6.31
  • Ubuntu 10.04 is based on 2.6.32
  • Ubuntu 10.10 is based on 2.6.35

I could go on, but hopefully you get the point — “Linux” is not a single, identifiable entity, but rather a collection of related operating systems. And thus the question: how do you ship your driver such that you can install and use it on a broad spectrum of Linux variants? This is a problem that I’ve had to solve in my work.

Fundamentally, the solution is simple: ship the driver in source form. But that answer isn’t much help unless you can make your driver source-compatible with a wide range of kernel versions, spanning several years of Linux development. The solution to that problem is simple too, in hindsight, and yet I haven’t seen it used or described elsewhere: test for specific kernel features using something like a configure script; set preprocessor macros based on the results of the tests; and use the macros in the driver source to conditionally include code as needed. But before I get into the details of this solution, let’s look briefly at a few alternative solutions and why each was rejected.

Rejected alternatives: how NOT to ship a custom driver for Linux

Based on my informal survey of the state-of-the-art in this field, it seems there are three common approaches to solving this problem:

  1. Arrange for your driver to be bundled with the Linux kernel. If you can pull this off, fantastic! You’ve just outsourced the effort of porting your driver to the people who build and distribute the kernel. Unfortunately, kernel developers are not keen on bundling drivers that are not generally useful — that is, your driver has to have some utility outside of your specific application, or you can forget getting it bundled into the official kernel. Also, if you have any interesting IP in your driver, open-sourcing it is probably not an option.
  2. Prebuild your driver for every conceivable Linux variant. If you know which Linux variants your product will support, you could build the driver for each, then choose one of the prebuilt modules at installation time based on the information in /etc/issue and uname -r. VMware uses this strategy — after installing VMware Workstation, take a look in /usr/lib/vmware/modules/binary: you’ll find about a hundred different builds of their kernel modules, for various combinations of kernel versions, distributions and SMP status. The trouble with this strategy is that it adds significant complexity to your build and release process: you need a build environment for every one of those variants. And all those modules bloat your install bundle. Finally, no matter how many distros you prebuild for, it will never be enough: somebody will come along and insist that your code install on their favorite variant.
  3. Ship source that uses the LINUX_VERSION_CODE and KERNEL_VERSION macros. These macros, defined by the Linux kernel build system, allow you to conditionally include code based on the version of the kernel being built. In theory this is all you need, if you know which version introduced a particular feature. But there are two big problems. First, you probably don’t know exactly which version introduced each feature. You could figure it out with some detective work, but who’s got the time to do that? Second, and far more troublesome, most enterprise Linux distributions (RHEL, SUSE, etc.) backport features and fixes from later kernels to their base kernel — without changing the value of LINUX_VERSION_CODE. Of course that renders this mechanism useless.

genconfig.sh: a configure script for kernel modules

Conceptually, genconfig.sh works the same way as an autoconf configure script: it uses a series of trivial test programs to check for different kernel features or constructs. The success or failure of each test to compile determines whether the corresponding feature is present, and by extension whether or not a particular bit of code ought to be included in the driver.

For example, in some versions of the Linux kernel (such as 2.6.9), struct inode includes a member called i_blksize. If present, this field should be set to the blocksize of the filesystem that owns the inode. It’s used in the implementation of the stat(2) system call. It’s a minor detail, but if you’re implementing a filesystem driver, it’s important to get it right.

We can determine whether or not to include code for this field by trying to compile a trivial kernel module containing just this code:

#include <linux/fs.h>
void dummy(void)
{
    struct inode i;
    i.i_blksize = 0;
    return;
}

If this code compiles, then we know to include code for managing the i_blksize field. We can create a header file containing a #define corresponding to this knowledge:

#define HAVE_INODE_I_BLKSIZE

Finally, the driver code uses that definition:

#ifdef HAVE_INODE_I_BLKSIZE
  inode->i_blksize = FS_BLOCKSIZE;
#endif

We can construct an equally trivial test case for each feature that is relevant to our driver. In the end we get a header with a series of defines, something like this:

#define HAVE_INODE_I_BLKSIZE
#define HAVE_3_ARG_INT_POSIX_TEST_LOCK
#define HAVE_KMEM_CACHE_T
#define HAVE_MODE_IN_VFS_SYMLINK
#define HAVE_3_ARG_PERMISSION
#define HAVE_2_ARG_UMOUNT_BEGIN
#define HAVE_PUT_INODE
#define HAVE_CLEANUP_IN_KMEM_CACHE_CREATE
#define HAVE_WRITE_BEGIN
#define HAVE_ADDRESS_SPACE_OPS_EXT
#define HAVE_SENDFILE
#define HAVE_DENTRY_IN_FSYNC

By referencing these definitions in the driver source code, we can make it source-compatible with a wide range of Linux kernel versions. To add support for a new kernel, we just have to determine which changes affect our module, write tests to check for those features, and update only the affected parts of our driver source.

This is more nimble, and far more manageable, than shipping prebuilt binaries for an endless litany of kernel variants. And it’s much more robust than relying on LINUX_VERSION_CODE: rather than implicitly trusting that a feature is present or absent based on an unreliable version string, we know for certain whether that feature is present, because we explicitly tried to use it.

Belt and suspenders: ensuring the driver works correctly

Now we have a strategy for shipping a driver that will build and load on a broad array of Linux variants. But this approach has introduced a new problem: how can we be sure that this driver that was just auto-configured and compiled on-the-fly will actually work as expected?

The solution to this problem has two components. First, we identified about a dozen specific Linux variants that are critical to our customers. The driver is exhaustively tested on each of these “tier 1” variants in every continuous integration build — over 3,000 automated unit tests are run against the driver on each. Of course, 12 variants is only a tiny fraction of the thousands of permutations that are possible, but by definition these variants represent the most important permutations to get right. We will know immediately if something has broken the driver on one of these variants.

Next, we ship a stripped down version of that unit test suite, and execute that automatically when the driver is built. This suite has only about 25 tests, but those tests cover every major piece of functionality — a reasonable compromise between coverage and simplicity. With this install-time test suite, we’ll know if there’s a problem with the driver on a particular platform as soon as somebody tries to install it.

Demonstration code

For demonstration purposes I have placed a trivial filesystem driver on my GitHub repo. This driver, base0fs, was generated using the FiST filesystem generator, patched to make use of the genconfig.sh concept.

HOWTO: install kernel debuginfo packages on SUSE Linux Enterprise Server 11

I needed to debug a kernel crash on SUSE Linux Enterprise Server 11 today. If you’re not familiar with debugging Linux kernel crashes, you need the kernel debug symbols in order to analyze the crash dump. These are typically not part of the kernel image itself, but instead are bundled into a kernel debuginfo package corresponding to the kernel that produced the crash dump.

Although I’ve done this on RedHat Enterprise Linux many times, I had never debugged a kernel crash on SUSE before, so I was not familiar with the process for acquiring the debuginfo packages with that distro. I couldn’t find any single set of instructions explaining how to get the packages, and although it wasn’t hard, I figured I’d try to save somebody else a little time by writing down the steps I followed.

SUSE uses a package manager called ZYpp. I used zypper, the command-line interface to ZYpp, to install the packages.

Step 1: enable the debuginfo repositories

Before zypper can install the debuginfo packages, it must be able to find them. The packages reside in specialized debuginfo repositories, which are normally not enabled, although the system is aware of them. Use zypper repos to get a list of the repositories:

lin4-ea6:~ # zypper repos
# | Alias                                                    | Name                                                   | Enabled | Refresh
--+----------------------------------------------------------+--------------------------------------------------------+---------+--------
1 | SUSE-Linux-Enterprise-Server-11 11-0                     | SUSE-Linux-Enterprise-Server-11 11-0                   | Yes     | No     
2 | SUSE-Linux-Enterprise-Software-Development-Kit-11_11-0   | SUSE-Linux-Enterprise-Software-Development-Kit-11 11-0 | Yes     | No     
3 | SUSE-Linux-Enterprise-Software-Development-Kit-11_11-0_1 | SUSE-Linux-Enterprise-Software-Development-Kit-11 11-0 | Yes     | No     
4 | nu_novell_com:SLE11-Debuginfo-Pool                       | SLE11-Debuginfo-Pool                                   | No      | Yes    
5 | nu_novell_com:SLE11-Debuginfo-Updates                    | SLE11-Debuginfo-Updates                                | No      | Yes    
6 | nu_novell_com:SLES11-Extras                              | SLES11-Extras                                          | No      | Yes    
7 | nu_novell_com:SLES11-Pool                                | SLES11-Pool                                            | No      | Yes    
8 | nu_novell_com:SLES11-Updates                             | SLES11-Updates                                         | Yes     | Yes    

You want the two Debuginfo repos. To enable them, use zypper modifyrepo with the alias of the repo:

lin4-ea6:~ # zypper modifyrepo --enable nu_novell_com:SLE11-Debuginfo-Pool
Repository 'nu_novell_com:SLE11-Debuginfo-Pool' has been sucessfully enabled.
lin4-ea6:~ # zypper modifyrepo --enable nu_novell_com:SLE11-Debuginfo-Updates
Repository 'nu_novell_com:SLE11-Debuginfo-Updates' has been sucessfully enabled.

Step 2: find the debuginfo package for your crash

It’s critical to get the debuginfo package that matches the kernel that created your crash dump. It’s easy to determine the version you need: check the README.txt alongside the vmcore in the crash directory:

lin4-ea6:~ # cat /var/crash/2010-11-30-00:43/README.txt
Kernel crashdump
----------------

Crash time     : 2010-11-30 00:43 (+0000)
Kernel version : 2.6.27.45-0.1-pae
Host           : SLES-11-32
Dump level     : 0
Dump format    : compressed

In this case, I need the debuginfo for the pae variant of kernel version 2.6.27.45-0.1. Now, search the package repository for kernel debuginfo packages with zypper search:

lin4-ea6:~ # zypper search -s kernel-*-debuginfo*

Loading repository data...
Reading installed packages...

S | Name                     | Type    | Version          | Arch | Repository             
--+--------------------------+---------+------------------+------+------------------------
  | kernel-default-debuginfo | package | 2.6.27.54-0.2.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.48-0.12.1 | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.48-0.6.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.48-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.45-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.42-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.39-0.3.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.37-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.29-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.25-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.23-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.21-0.1.2  | i586 | SLE11-Debuginfo-Updates
  | kernel-default-debuginfo | package | 2.6.27.19-5.1    | i586 | SLE11-Debuginfo-Pool   
  | kernel-pae-debuginfo     | package | 2.6.27.54-0.2.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.48-0.12.1 | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.48-0.6.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.48-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.45-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.42-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.39-0.3.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.37-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.29-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.25-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.23-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.21-0.1.2  | i586 | SLE11-Debuginfo-Updates
  | kernel-pae-debuginfo     | package | 2.6.27.19-5.1    | i586 | SLE11-Debuginfo-Pool   
  | kernel-source-debuginfo  | package | 2.6.27.54-0.2.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.48-0.12.1 | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.48-0.6.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.48-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.45-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.42-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.39-0.3.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.37-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.29-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.25-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.23-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.21-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-source-debuginfo  | package | 2.6.27.19-5.1    | i586 | SLE11-Debuginfo-Pool   
  | kernel-vmi-debuginfo     | package | 2.6.27.54-0.2.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.48-0.12.1 | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.48-0.6.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.48-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.45-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.42-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.39-0.3.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.37-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.29-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.25-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.23-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.21-0.1.2  | i586 | SLE11-Debuginfo-Updates
  | kernel-vmi-debuginfo     | package | 2.6.27.19-5.1    | i586 | SLE11-Debuginfo-Pool   
  | kernel-xen-debuginfo     | package | 2.6.27.54-0.2.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.48-0.12.1 | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.48-0.6.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.48-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.45-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.42-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.39-0.3.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.37-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.29-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.25-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.23-0.1.1  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.21-0.1.2  | i586 | SLE11-Debuginfo-Updates
  | kernel-xen-debuginfo     | package | 2.6.27.19-5.1    | i586 | SLE11-Debuginfo-Pool   

You can see there are several versions of each variant available. One tricky thing is that there isn’t an exact match for the kernel version I need. I’m looking for 2.6.27.45-0.1; the closest thing to it is 2.6.27.45-0.1.1. This seems to be nothing more than a minor inconsistency in labeling: 2.6.27.45-0.1.1 is the correct package.

Step 3: install the kernel debuginfo package

Having identified the package, you are ready to install it with zypper install:

lin4-ea6:~ # zypper install kernel-pae-debuginfo=2.6.27.45-0.1.1
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following NEW package is going to be installed:
  kernel-pae-debuginfo 


The following package is not supported by its vendor:
  kernel-pae-debuginfo 


Overall download size: 153.1 M. After the operation, additional 673.8 M will be used.
Continue? [YES/no]: 

At this prompt, you should type YES and hit Enter. zypper will download the package (which may take a while depending on the speed of your internet connection), showing a progress bar as it does:

Retrieving package kernel-pae-debuginfo-2.6.27.45-0.1.1.i586 (1/1), 153.1 M (673.8 M unpacked)
Retrieving: kernel-pae-debuginfo-2.6.27.45-0.1.1.i586.rpm [90% (1.1 M/s)]

Then zypper will install the package, again with a progress bar:

Retrieving package kernel-pae-debuginfo-2.6.27.45-0.1.1.i586 (1/1), 153.1 M (673.8 M unpacked)
Retrieving: kernel-pae-debuginfo-2.6.27.45-0.1.1.i586.rpm [done (244.7 K/s)]
Installing: kernel-pae-debuginfo-2.6.27.45-0.1.1 [84%]

When installation is complete, the only notification is that the progress bar reads “done”:

Retrieving package kernel-pae-debuginfo-2.6.27.45-0.1.1.i586 (1/1), 153.1 M (673.8 M unpacked)
Retrieving: kernel-pae-debuginfo-2.6.27.45-0.1.1.i586.rpm [done (244.7 K/s)]
Installing: kernel-pae-debuginfo-2.6.27.45-0.1.1 [done]

Step 4: symlink the debuginfo into the crash directory

The debuginfo package, for some reason, does not install the debuginfo files to a location known to the crash utility, so the final step is to make the debuginfo file available to crash before we invoke it. We do so by creating a symlink to the debuginfo file alongside the kernel image in the crash directory:

lin4-ea6:~ # ln -s /usr/lib/debug/boot/vmlinux-2.6.27.45-0.1-pae.debug /var/crash/2010-11-30-00:43

Decompress the kernel image and you’re in business:

lin4-ea6:~ # gzip -d /var/crash/2010-11-30-00:43/vmlinux-2.6.27.45-0.1-pae.gz
lin4-ea6:~ # cd /var/crash/2010-11-30-00:43
lin4-ea6:/var/crash/2010-11-30-00:43 # crash vmlinux-2.6.27.45-0.1-pae vmcore