post

Rapidly detecting Linux kernel source features for driver deployment

A while back I wrote about genconfig.sh, a technique for auto-detecting Linux kernel features, in order to improve portability of an open source, out-of-tree filesystem driver I developed as part of ElectricAccelerator. genconfig.sh has enabled me to maintain compatibility across a wide range of Linux kernels with relative ease, but recently I noticed that the script was unacceptably slow. On the virtual machines in our test lab, genconfig.sh required nearly 65 seconds to execute. For my 11-person team, a conservative estimate of time wasted waiting for genconfig.sh is nearly an entire person-month per year. With a little effort, I was able to reduce execution time to about 7 seconds, nearly 10x faster! Here’s how I did it.

A brief review of genconfig.sh

genconfig.sh is a technique for detecting the source code features of a Linux kernel. Like autoconf configure scripts, genconfig.sh uses a series of trivial C source files to probe for various kernel source features, like the presence or absence of the big kernel lock. For example, here’s the code used to determine whether the kernel has the set_nlink() helper function:

1
2
3
4
5
#include <linux/fs.h>
void dummy(struct inode *i)
{
set_nlink(i, 0);
}

If a particular test file compiles successfully, the feature is present; if the compile fails, the feature is absent. The test results are used to set a series of C preprocessor #define directives, which in turn are used to conditionally compile driver code suitable for the host kernel.

Reaching the breaking point

When I first implemented genconfig.sh in 2009 we only supported a few versions of the Linux kernel. Since then our support matrix has swollen to include every variant from 2.6.9 through 3.5.0, including quirky “enterprise” distributions that habitually backport advanced features without changing the reported kernel version. But platform support tends to be a mostly one-way street: once something is in the matrix, it’s very hard to pull it out. As a consequence, the number of feature tests in genconfig.sh has grown too, from about a dozen in the original implementation to over 50 in the latest version. Here’s a real-time recording of a recent version of genconfig.sh on one of the virtual machines in our test lab:

genconfig.sh executing on a test system; click for full size

Accelerator actually has two instances of genconfig.sh, one for each of the two kernel drivers we bundle, which means every time we install Accelerator we burn about 2 minutes waiting for genconfig.sh — 25% of the 8 minutes total it takes to run the install. All told I think a conservative estimate is that this costs my team nearly one full person-month of time per year, between time waiting for CI builds (which do automated installs), time waiting for manual installs (for testing and verification) and my own time spent waiting when I’m making updates to support new kernel versions.

genconfig.sh: The Next Generation

I had a hunch about the source of the performance problem: the Linux kernel build system. Remember, the pattern repeated for each feature test in the original genconfig.sh is as follows:

  1. Emit a simple C source file, called test.c.
  2. Invoke the kernel build system, using make and a trivial kernel module makefile:
    1
    2
    3
    conftest-objs := test.o
    obj-m := conftest.o
    EXTRA_CFLAGS += -Werror
  3. Check the exit status from make to decide whether the test succeeded or failed.

The C sources used to probe for features are trivial, so it seemed unlikely that the compilation itself would take so long. But we don’t have to speculate — if we use Electric Make instead of GNU make to run the test, we can use the annotated build log and ElectricInsight to see exactly what’s going on:

ElectricInsight visualization of Linux kernel module build, click for full size.

Overall, using the kernel build system to compile this single file takes nearly 2 seconds — not a big deal with only a few tests, but it adds up quickly. To be clear, the only part we actually care about is the box labeled /root/__conftest__/test.o, which is about 1/4 second. The remaining 1 1/2 seconds? Pure overhead. Perhaps most surprising is the amount of time burned just parsing the kernel makefiles — the huge bright cyan box on the left edge, as well as the smaller bright cyan boxes in the middle. Nearly 50% of the total time is just spent parsing!

At this point an idea struck me: there’s no particular reason genconfig.sh must invoke the kernel build system separately for each probe. Why not write out all the probe files upfront and invoke the kernel build system just once to compile them all in a single pass? In fact, with this strategy you can even use parallel make (eg, make -j 4) to eke out a little bit more speed.

Of course, you can’t use the exit status from make with this approach, since there’s only one invocation for many tests. Instead, genconfig.sh can give each test file a descriptive name, and then check for the existence of the corresponding .o file after make finishes. If the file is present, the feature is present; otherwise the feature is absent. Here’s the revised algorithm:

  1. Emit a distinct C source file for each kernel feature test. For example, the sample shown above might be created as set_nlink.c. Another might be write_begin.c.
  2. Invoke the kernel build system, using make -j 4 and a slightly less trivial kernel module makefile:
    1
    2
    3
    conftest-objs := set_nlink.o write_begin.o ...
    obj-m := conftest.o
    EXTRA_CFLAGS += -Werror
  3. Check for the existence of each .o file, using something like if [ -f set_nlink.o ] ; then … ; fi to decide whether the test succeeded or failed.

The net result? After an afternoon of refactoring, genconfig.sh now completes in about 7 seconds, nearly 10x faster than the original:

Updated genconfig.sh executing on a test system, click for full size.

The only drawback I can see is that the script no longer has that satisfying step-by-step output, instead dumping everything out at once after a brief pause. I’m perfectly happy to trade that for the improved performance!

Future Work and Availability

This new strategy has significantly improved the usability of genconfig.sh. After I finished the conversion, I wondered if the same technique could be applied to autoconf configure scripts. Unfortunately I find autoconf nearly impossible to work with, so I didn’t make much progress exploring that idea. Perhaps one of my more daring (or stubborn) readers will take the ball and run with it there. If you do, please comment below to let us know the results!

The new version of genconfig.sh is used in ElectricAccelerator 6.2, and can also be seen in the open source Loopback File System (lofs) on my Github repo.

post

The ElectricAccelerator 6.1 “Ship It!” Award

Having shipped ElectricAccelerator 6.1, I thought you might like to see the LEGO-based “Ship It!” award that I gave each member of the development team. I started this tradition with the 6.0 release last fall. Here’s the baseball card that accompanied the detective minifig I chose for this release:

The great detective is on the case!

The Accelerator 6.1 team

I picked the detective minifig for the 6.1 release in recognition of the significant improvements to Accelerator’s diagnostic capabilities (like cyclic redundancy checks to detect faulty networks, and MD5 checksums to detect faulty disks). Compared to the 6.0 award not much has changed in the design, although I did get my hands on the “official” corporate font this time. It strikes me that there’s a lot of wasted space on the back of the card though. Next time I’ll make better use of the space by incorporating statistics about the release. I actually have the design all ready to go, but you’ll have to wait until after the release to see it. Don’t fret though, the 6.2 release is expected soon!

post

Fixing recursive make

Recursive make is one of those things that everybody loves to hate. It’s even been the subject of one of those tired “… Considered Harmful” diatribes. According to popular opinion, recursive make will sap performance from your build, make it nigh impossible to ensure correctness in parallel builds, and may render the user sterile. OK, maybe not that last one. But seriously, the arguments against recursive make are legion, and deeply entrenched. The problem? They’re flawed. That’s because they assume there’s only one way to implement recursive make — when the submake is invoked, the parent make is blocked until the submake completes. That’s how almost everybody does it. But in Electric Make, part of ElectricAccelerator, we developed a novel new approach called non-blocking recursive make. This design eliminates the biggest problems attributed to recursive make, without requiring a painful and costly conversion of your build system to non-recursive make.

The problem with traditional recursive make

There’s really just two problems at the heart of complaints with traditional recursive make: first, there’s no way to ensure correctness of a parallel recursive make based build without overserializing the submakes, because there’s no way to articulate dependencies between individual targets in different submakes. That means you can’t have a dependency graph that is both correct and precise. Instead you either leave out the critical dependency entirely, which makes parallel (ie, fast) builds unreliable; or you serialize submakes in their entirety, which shackles build performance because no part of a submake with even a single dependency on some portion of an earlier submake can begin until the entire ealier submake completes. Second, even if there were a way to specify precise dependencies between targets in different submakes, most versions of make have implemented recursive make such that the parent make is blocked from proceeding until the submake has completed. Consider a typical use of recursive make with implicit serializations between submakes:

1
2
3
4
all:
@for dir in util client server ; do \
$(MAKE) -C $$dir; \
done

Each submake compiles a bunch of source files, then links them together into a library (util) or an executable (client and server). The only actual dependency between the work in the three make instances is that the client and server programs need the util library. Everything else is parallelizable, but with traditional recursive make, gmake is unable to exploit that parallelism: all of the work in the util submake must finish before any part of the client submake begins!

Conflict detection and non-blocking recursive make

If you’re familiar with Electric Make, you already know how it solves the first half of the recursive make problem: conflict detection and correction. I’ve written about conflict detection before, but here’s a quick recap: using the explicit dependencies given in the makefiles and information about the files accessed as each target is built, emake is able to dynamically determine when targets have been built too early due to missing explicit dependencies, and rerun those targets to generate the correct output. Electric Make can ensure the correctness of parallel builds even in the face of incomplete dependencies, even if the missing dependencies are between targets in different submakes. That means you need not serialize entire submakes to ensure the build will run correctly in parallel.

Like an acrobat’s safety net, conflict detection allows us to consider solutions to the other half of the problem that would otherwise be considered risky, if not outright madness. In fact, our solution would not be possible without conflict detection: non-blocking recursive make. This is analogous to the difference between blocking and non-blocking I/O: rather than waiting for a recursive make to finish, emake carries on executing subsequent commands in the build immediately, including other recursive makes. Conflict detection ensures that only the commands in each submake which require serialization are executed sequentially, so the build runs as quickly as possible, but the final build output is identical to a serial build.

The impact of this change is dramatic. Here I’ve plotted the execution of the simple build defined above on four cores, using both gmake (normal recursive make) and emake (non-blocking recursive make):

Recursive make build with gmake


Recursive make build with emake

Electric Make is able to execute this build about 20% faster than gmake, with no changes to the Makefiles or the execution environment. emake is literally able to squeeze more parallelism out of recursive-make-based builds than gmake. In fact, we can precisely quantify just how much more parallelism emake gets through an application of Amdahl’s law. First, we compute the best possible speedup for the build — that’s just the serial runtime divided by the best possible parallel runtime, which we can figure out through analysis of the depedency graph and runtime of individual jobs in the build (the Longest Serial Chain report in ElectricInsight can do this for you). Then we can compute the parallelizable portion P of the build by plugging the speedup S into this equation: P = 1 - (1 / S). Here’s how that works out for gmake and emake:

gmake emake
Serial baseline 65s 65s
Best build time 13.5s 7.5s
Best speedup 4.8x 8.7x
Parallel portion 79% 89%

On this build, non-blocking recursive make increases the parallel portion of the build by 10%. That may not seem like much, but Amdahl’s law shows how dramatically that difference affects the speedup you can expect as you apply more cores:

Implementation

On the backend, non-blocking recursive make is handled by conflict detection — the jobs from the recursive make are checked for conflicts in the serial order defined by the makefile structure. Any issues caused by aggressively running recursive makes early are detected during the conflict check, and the target that ran too early is rerun to generate the correct result.

On the frontend, emake uses a strategy that is at once both brilliant in its simplicity, and diabolical in its trickery. It starts with an environment variable. When emake is invoked recursively, it checks the value of EMAKE_BUILD_MODE. If it is set to node, emake runs in so-called stub mode: rather than executing the submake (parsing the makefile and building targets), emake captures the invocation context (working directory, command-line and environment) in a file on disk, prints a “magic” string and exits with a zero status code.

The file containing the invocation context is identified by a second environment variable, ECLOUD_RECURSIVE_COMMAND_FILE. The Accelerator agent (which handles invoking commands on behalf of emake) checks for the presence of that file after every command that is run. If it is found, the agent relays the content to the toplevel emake invocation, where a new make instance is created to represent the submake invocation. That instance comes with it’s own parse job of course, which gets inserted into the queue of jobs. Some (short) time later, the parse job will run, discover whatever work must be run by the submake, and create additional rule jobs.

The magic string — EMAKE_FNORD — serves as a placeholder in the stdout stream for the jobs, so emake can figure out which portion of the output text comes before and which portion comes after the submake. This ensures that the build output log is identical to that generated by a serialized gmake build. For example, given the following rule that invokes a submake, you’d expect to see the “Before” and “After” messages printed before and after the output generated by commands in the submake itself:

1
2
3
4
all:
@echo Before util ; \
@$(MAKE) -C util ; \
@echo After util

With non-blocking recursive make, the submake has not actually executed when the “echo After util” command runs. If emake doesn’t account for that reordering, both the “Before” and “After” messages will appear before any of the output from the submake. EMAKE_FNORD allows emake to “stitch” the output together so the build log matches a serial log.

Limitations

Conflict detection and non-blocking recursive make together solve the main problems associated with recursive make. But there are a couple scenarios where non-blocking recursive make does not work well. Fortunately, these are uncommon in practice and easily addressed.

Capturing recursive make stdout

The first scenario is when the build captures the output of the recursive make invocation, rather than letting it print to stdout as normal. Since emake defers the execution of the submake and prints only EMAKE_FNORD to stdout, this will not work. There are two reasons you might do this: first, you might want to have separate build logs for each submake, to simplify error detection and management. In this situation, the simplest workaround is to remove the redirection and instead us emake’s annotated build log, an XML version of the build output log which can be easily processed using standard tools. Second, you may be using make as a text-processing tool (sort of a “poor man’s” Perl), rather than for building per se:

1
2
3
all:
@$(MAKE) -f genlist.mk > objects.txt
@cat objects.txt | xargs rm

In this case, the workaround is to explicitly force emake to run in so-called “local” mode, which means emake will handle the recursive make invocation as a blocking invocation, just like traditional make would. You can force emake into local mode by adding EMAKE_BUILD_MODE=local to the environment before the recursive make invocation.

Immediate consumption of build products

The second scenario is when the build consumes the product of the submake in the same command that contains the invocation. For example:

1
2
all:
@$(MAKE) -C sub foo && cp sub/foo ./foo

Here the build assumes that the output files generated by the submake will be available for use immediately after the submake completes. Obviously this is not the case with non-blocking recursive make — when the invocation of $(MAKE) -C sub foo completes, only the submake stub has actually finished. The build products will not be available until after the submake is actually processed later. Note that in this build both the recursive make invocation and the commands that use the build products from that invocation are treated as a single command from the perspective of make: make actually invokes the shell, and the shell then runs the recursive make and cp commands.

The workaround is simple: split the consumer into a distinct command, from the perspective of make:

1
2
3
all:
@$(MAKE) -C sub foo
@cp sub/foo ./foo

With that trivial change, emake is able to treat the cp as a continuation job, which can be serialized against the completion of the recursive make as needed.

A fix for recursive make

For years, people have heaped scorn and criticism on recursive make. They’ve nearly convinced everybody that even considering its use is automatically wrong — you probably can’t help feeling a little bit guilty when you use recursive make. But the reality is that recursive make is a reasonable way to structure a large build. You just need a better make. With conflict detection and non-blocking recursive make, Electric Make has fixed the problems usually associated with recursive make, so you can get parallel builds that are both fast and correct. Give it a try!

post

What’s new in ElectricAccelerator 6.1

Electric Cloud announced the release of ElectricAccelerator 6.1 a few weeks ago, the 24th feature release of ElectricAccelerator since the company was founded in 2002. This release incorporates several enhancements that make the product more robust and more flexible. Here are the key additions:

Workload reporting

In many large organizations, a single massive Accelerator cluster is shared across multiple development teams, in order to reduce administration overhead and improve hardware utilization — naturally each team has “their” agents, but if those are not in use, they can be easily made available to other teams. Often, the maintenance cost is shared by the teams using the cluster, according to the amount they use the cluster. In order to facilitate this use case, Accelerator 6.1 adds workload reporting. At the conclusion of each build, emake reports the total CPU usage for the build to the cluster manager. The administrator can see a summary of usage by running the “Build Users” report from the cluster manager web interface:

Example build users report; click for full size

Example build users report. Click to view full size.

Multi-interface network support

In data centers it’s not uncommon to find servers configured with two network interfaces: for example, a gigabit connection for communication with systems outside the data center, and a 10 GigE, fiber-optic or Infiniband connection for extremely high-bandwidth communication with other systems inside the data center. Accelerator 6.1 adds explicit support for this configuration, so data transfers between agents in the cluster utilize the high-bandwidth secondary interface for increased performance.

Strong checksums for data integrity

For years Accelerator has relied on the checksum built into the TCP standard to ensure integrity of network data transfers. Unfortunately that checksum is relatively weak, and in rare cases we found it was insufficient to detect data errors introduced by faulty network hardware. In this release, we added an application-layer checksum to further guard against such problems. Accelerator 6.1 uses CRC32-c, chosen for its robust error detection capabilities and high performance.

Expanded platform support

Accelerator 6.1 adds support for a few new platforms and third-party tools, including:

  • Ubuntu 11.04
  • ClearCase 8

What’s next?

It feels great to have wrapped up Accelerator 6.1, but we’re not resting on our laurels. We’ve already done a lot of work for 6.2, which includes some key robustness improvements that have been literally years in the making; and we’re getting started on 7.0, which currently is planned to include some really exciting new performance enhancements around incremental builds — more on that soon.

Accelerator 6.1 is available immediately for current customers. New customers should contact sales@electric-cloud.com.

post

Measuring the Electric File System

Somebody asked me the other day what portion of the Electric File System (EFS) is shared code versus platform-specific code. The EFS, if you don’t know, is a custom filesystem driver and part of ElectricAccelerator. It enables us to virtualize the filesystem, so that each build job sees the correct view of the filesystem according to its logical position in the build, even if jobs are run out of order. It also provides the file usage information that powers our conflict detection algorithm. As a filesystem driver, the EFS is obviously tightly coupled to the platforms it’s used on, and the effort of porting it is one reason we don’t support more platforms than we do now — not that a dozen variants of Windows, 16 flavors of Linux and several versions of Solaris is anything to be ashamed of!

Anyway, the question intrigued me, and I found the answer quite surprising (click for full-size):

Note that here I’m measuring only actual code lines, exclusive of comments and whitespace, as of the upcoming 6.1 release. In total, the Windows version of the EFS has about 2x the lines of code that either the Solaris or Linux ports have. The platform-specific portion of the code is more than 6x greater!

Why is the Windows port so much bigger? To answer that question, I started looking at the historical size of the three ports, which lead to the following graph showing the total lines of code for (almost) every release we’ve made. Unfortunately our first two releases, 1.0 and 2.0, have been lost to the ether at some point over the past ten years, but I was able to collect data for every release starting from 2.1 (click for full-size):

Here you can see that the Windows port has always been larger than the others, but it’s really just a handful of Windows-specific features that blew up the footprint so much. The first of those was support for the Windows FastIO filesystem interface, an alternative “fast path” through the kernel that in theory enables higher throughput to and from the filesystem. It took us two tries to get that feature implemented, as shown on the graph, and all-told that contributed about 7,000 lines of code. The addition of FastIO to the filesystem means that the Windows driver has essentially three I/O interfaces: standard I/O, memory-mapped I/O and FastIO. In comparison, on Linux and Solaris you have only two: standard I/O and memory-mapped I/O.

The second significant difference between the platforms is that on Windows the EFS has to virtualize the registry in addition to the filesystem. In the 4.3 release we significantly enhanced that portion of the driver to allow full versioning of the registry along the same lines that the filesystem has always done. That feature added about 1,000 lines of code.

I marked a couple other points of interest on this graph as well. First, the addition of the “lightweight EFS” feature, which is when we switched from using RAM to store the contents of all files in the EFS to using temporary storage on the local filesystem for large files. That enabled Accelerator to handle significantly larger files, but added a fair amount of code. Finally, in the most recent release you can actually see a small decrease in the total lines of code on Solaris and Linux, which reflects the long-overdue removal of code that was needed to support legacy versions of those platforms (such as the 2.4.x Linux kernel).

I was surprised as well by the low rate of growth in the Solaris and Linux ports. Originally the Linux port supported only one version of the Linux kernel, but today it supports about sixteen. I guess this result reveals that the difficulty in porting to each new Linux version is not so much the amount of code to be added, but in figuring out exactly which few lines to add!

In fact, after the 4.4 release in early 2009, the growth has been relatively slow on all platforms, despite the addition of major new features to Accelerator as a whole over the last several releases. The reason is simply that most feature development involves changes to other components (primarily emake), rather than to the filesystem driver.

One last metric to look at is the number of unit tests for the code in the EFS. We don’t practice test-driven development for the most part, but we do place a strong emphasis on unit testing. Developers are expected to write unit tests for every change, and we strive to keep the tests isomorphic to the code to help ensure we have good coverage. Here’s how the total number of unit tests has grown over the years (click for full-size):

Note that this is looking only at unit tests for the EFS! We have thousands more unit tests for the other components of Accelerator, as well as thousands of integration tests.

Thankfully for my credibility, the growth in number of tests roughly matches the growth in lines of code! But I’m surprised that the ratio of tests to code is not more consistent across the platforms — that suggests a possible area for improvement. Rather than being discouraged by that though, I’m pleased with this result. After all, what’s the point of measuring something if you don’t use that data to improve?

post

Another confusing conflict in ElectricAccelerator

After solving the case of the confounding conflict, my user came back with another scenario where ElectricAccelerator produced an unexpected (to him) conflict:

1
2
3
4
5
6
all:
@$(MAKE) foo
@cp foo bar
foo:
@sleep 2 && echo hello world > foo

If you run this build without a history file, using at least two agents, you will see a conflict on the continuation job that executes the cp foo bar command, because that job is allowed to run before the job that creates foo in the recursive make invocation. After one run of course, emake records the dependency in history, so later builds don’t make the same mistake.

This situation is a bit different from the symlink conflict I showed you previously. In that case, it was not obvious what caused the usage that triggered the conflict (the GNU make stat cache). In this case, it’s readily apparent: the continuation job reads (or attempts to read) foo before foo has been created. That’s pretty much a text-book example of the sort of thing that causes conflicts.

What’s surprising in this example is that the continuation job is not automatically serialized with the recursive make that precedes it. In a very real sense, a continuation job is an artificial construct that we created for bookkeeping reasons internal to the implementation of emake. Logically we know that the commands in the continuation job should follow the commands in the recursive make. In fact it would be absolutely trivial for emake to just go ahead and stick in a dependency to ensure that the continuation is not allowed to start until after the recursive make finishes, thereby avoiding this conflict even when you have no history file.

Given a choice between two strategies that both produce correct output, emake uses the strategy that produces the best performance in the general case.

Absolutely trivial to do, yes — but also absolutely wrong. Not for correctness reasons, this time, but for performance. Remember, emake is all about maximizing performance across a broad range of builds. Given a choice between two strategies that both produce correct output, emake uses the strategy that produces the best performance in the general case. For continuation jobs, that means not automatically serializing the continuation against the preceding recursive make. I could give you a wordy, theoretical explanation, but it’s easier to just show you. Suppose that your makefile looked like this instead of the original — the difference here is that the continuation job itself launches another recursive make, rather than just doing a simple cp:

1
2
3
4
5
6
7
8
9
all:
@$(MAKE) foo
@$(MAKE) bar
foo:
@sleep 2 && echo hello world > foo
bar:
@sleep 2 && echo goodbye > bar

Hopefully you agree that the ideal execution of this build would have both foo and bar running in parallel. Forcing the continuation job to be serialized with the preceding recursive make would choke the performance of this build. And just in case you’re thinking that emake could be really clever by looking at the commands to be executed in the continuation job, and only serializing “when it needs to”: it can’t. First, that would require emake to implement an entire shell syntax parser (or several, really, since you can override SHELL in your makefile). Second, even if emake had that ability, it would be thwarted the instant the command is something like my_custom_script.pl — there’s no way to tell what will happen when that gets invoked. It could be a simple filesystem access. It could be a recursive make. It could be a whole series of recursive makes. Even when the command is something you think you recognize, can emake really be sure? Maybe cp is not our trustworthy standard Unix cp, but something else entirely.

Again, all is not lost for this user. If you want to avoid this conflict, you have a couple options:

  1. Use a good history file from a previous build. This is the simplest solution. You’ll only get conflicts in this build if you run without a history file.
  2. Refactor the makefile. You can explicitly describe the dependency between the commands in the continuation job and the recursive make by refactoring the makefile so that the stuff in the continuation is instead its own target, thus taking the decision out of emake’s hands. Here’s one way to do that:
    1
    2
    3
    4
    5
    6
    7
    8
    all: do_foo
    @cp foo bar
    do_foo:
    @$(MAKE) foo
    foo:
    @sleep 2 && echo hello world > foo

Either of these will eliminate the conflict from your build.

post

ElectricAccelerator and the Case of the Confounding Conflict

A user recently asked me why ElectricAccelerator reports a conflict in this simple build, when executed without a history file from a previous run:

1
2
3
4
5
6
7
all: foo symlink_to_foo
foo:
@sleep 2 && echo hello world > foo
symlink_to_foo:
@ln -s foo symlink_to_foo

Specifically, if you have at least two agents, emake will report a conflict between symlink_to_foo and foo, indicating that symlink_to_foo somehow read or otherwise accessed foo during execution! But ln does not access the target of a symlink when creating the symlink — in fact, you can even create a symlink to a non-existent file if you like. It seems obvious that there should be no conflict. What’s going on?

To understand why this conflict occurs, you have to wrap your head around two things. First, there’s more going on during a gmake-driven build than just the commands you see gmake invoke. That causes the usage that provokes the conflict. Second, emake considers a serial gmake build the “gold standard” — if a serial gmake build produces a particular result, so too must emake. That’s why the additional usage must result in a conflict.

In this case, the usage that triggers the conflict comes from management of the gmake stat cache. This is a gmake feature that was added to improve performance by avoiding redundant calls to stat() — once you’ve stat()‘d a file once, you don’t need to do it again. Unless the file is changed of course, which happens quite a lot during a build. To keep the stat cache up-to-date as the build progresses, gmake re-stat()‘s each target after it finishes running the commands for the target. So after the commands for symlink_to_foo complete, gmake stat()‘s symlink_to_foo again, using the standard stat() system call, which follows the symlink (in contrast to lstat(), which does not follow the symlink). That means gmake will actually cache the attributes of foo for symlink_to_foo.

To ensure compatibility with gmake, emake has to do the same. In Accelerator parlance, that means we get read usage on symlink_to_foo (because you have to read the symlink itself to determine the target of the symlink), and lookup usage on foo. The lookup on foo causes the conflict, because, of course, you will get a different result if you lookup foo before the job that creates it than you would get if you do the lookup after that job. Before the job, you’ll find that foo does not exist, obviously; after, you’ll find that it does.

But what difference does that make, really? In truth, if there’s no detectable difference in behavior, then it doesn’t matter at all. And in the example build there is no detectable difference — the build output is the same regardless of when exactly you stat() symlink_to_foo relative to when foo is created. But with a small modification to the build, it is suddenly becomes possible to see the impact:

1
2
3
4
5
6
7
8
9
10
all: foo symlink_to_foo reader
foo:
@sleep 2 && echo hello world > foo
symlink_to_foo:
@ln -s foo symlink_to_foo
reader: foo symlink_to_foo
@echo newer prereqs are: $?

Compare the output when this build is run serially with the output when the build is run in parallel — and note that I’m using gmake, so you can be certain I’m not trying to trick you with some peculiarity of emake’s implementation:

You can plainly see the difference: in the parallel build gmake stat()‘s symlink_to_foo before foo exists, so the stat cache records symlink_to_foo as non-existent. Then when gmake generates the value of $? for reader, symlink_to_foo is excluded, because non-existent files are never considered newer than existing files. In the serial build, gmake stat()‘s symlink_to_foo after foo has been created, so the stat cache indicates that symlink_to_foo exists and is newer than reader, so it is included in $?.

Hopefully you see now both what causes the conflict, and why it is necessary. The conflict occurs because of lookup usage generated when updating the stat cache. The conflict is necessary to ensure that the build output matches that produced by a serial gmake — the “gold standard” for build correctness. If no conflict is declared, there is the possibility for a detectable difference in build output compared to serial gmake.

However, you might be thinking that although it makes sense to treat this as a conflict in the general case, isn’t it possible to do something smarter in this specific case? After all, the orignal example build does not use $?, and without that there isn’t any detectable difference in the build output. So why not skip the conflict?

The answer is simple, if a bit disappointing. In theory it may be possible to elide the conflict by checking to see if the symlink is used by a later job in a manner that would produce a detectable difference (for example, by scanning the commands for subsequent targets for references to $?), but in reality the logistics of that check are daunting, and I’m not confident that we could guarantee correct behavior in all cases.

Fortunately all is not lost. If you wish to avoid this conflict, you have several options:

  1. Use a good history file from a previous build. This is the most obvious solution. You’ll only get conflicts if you run without a history file.
  2. Add an explicit dependency. If you make foo an explicit prereq of symlink_to_foo, then you will avoid the conflict. Here’s how that would look:
    1
    symlink_to_foo: foo
  3. Change the serial order. If you reorder the makefile so that symlink_to_foo has an earlier serial order than foo you will avoid the conflict. That just requires a reordering of the prereqs of all:
    1
    all: symlink_to_foo foo

Any one of these will eliminate the conflict from your build, and you’ll enjoy fast and correct parallel builds.

Case closed.

post

ElectricMake debug log levels

Often when analyzing builds executed with ElectricMake, all the information you need is in the annotation file — an easily digested XML file containing data such as the relationships between the jobs, the commands run, and the timing of each job. But sometimes you need more detail, and that’s where the emake debug log comes in.

To enable emake debug logging, you specify a pair of command-line arguments: ––emake-debug=value, which specifies the types of debug logging to enable as a set of single-letter values, such as “jng”; and ––emake-logfile=path, which specifies the location of the debug log. In this article I’ll explain each of emake’s debug log levels. Use this index to jump to the definition of a specific log level:

DISCLAIMER: emake debug logs are intended for use by Electric Cloud engineering and support staff. Debug logging contents and availability are subject to change in any release, for any or no reason. Enter at your own risk, your mileage may vary, etc. etc. The information in this article refers to ElectricAccelerator 6.0.

a: agent allocation

Agent allocation logging provides detailed information about emake’s attempts to procure agents from the cluster manager during the build. If you think emake may be stalled trying to acquire agents, allocation logging will help to understand what’s happening.

c: cache

Cache logging records details about the filesystem cache used by emake to accelerate parse jobs in cluster builds. For example, it logs when a directory’s contents are added to the cache, and the result of lookups in the cache. Since it is only used during remote parse jobs, you’ll have to use it with the ––emake-rdebug=value option. Use cache logging if you suspect a problem with the cached local filesystem.

e: environment

Environment logging augments node logging with a dump of the entire environment block for every job as it is sent to an agent. Normally this is omitted because it’s quite verbose (could be as much as 32KB per job). Usually you’re better off using env-level annotation, which is more compact and easier to parse.

f: filesystem

Filesystem logging records numerous details about emake’s interaction with its versioned filesystem data structure. In particular, it logs every time that emake looks up a file (when doing up-to-date checks, for example), and it logs every update to the versioned file system caused by file usage during the build’s execution. This level of logging is very verbose, so you shouldn’t enable it as a general rule. It’s most often used when diagnosing issues related to the versioned filesystem and conflicts.

g: profiling

Profiling logging is one of the easiest to interpret and most useful types of debug logging. When enabled, emake will emit hundreds of performance metrics at the end of the build. This is a very lightweight logging level, and is safe (even advisable) to enable for all builds.

h: history

History logging prints messages related to the data tracked in the emake history file — both filesystem dependencies and autodep information. When history logging is enabled, emake will print a message every time a dependency is added to the history file, and it will print information about the files checked during up-to-date checks based on autodep data. Enable history logging if you suspect a problem with autodep behavior.

j: job

Job logging prints minimal messages related to the creation and execution of jobs. For each job you’ll see a message when it starts running, when it finishes running, and when emake checks the job for conflicts. If there is a conflict in the job, you’ll see a message about that too. If you just want a general overview of how the build is progressing, j-level logging is a good choice.

L: nmake lexer

emake uses a generated parser to process portions of nmake makefiles. Lexer debug logging enables the debug logging in that generated code. This is generally not useful to end users as it is too low-level.

l: ledger

Ledger debug logging prints information about build decisions based on data in the ledger file, as well as updates made to the ledger file. Enable it if you believe the ledger is not functioning correctly.

m: memory

When memory logging is enabled, emake will print memory usage metrics to the debug log once per second. This includes the total process memory usage as well as current and peak memory usage grouped into several “buckets” corresponding to various types of data in emake. For example, the “Operation” bucket indicates the amount of memory used to store file operations; the “Variable” bucket is the amount of memory used for makefile variables. This is most useful when you are experiencing an out-of-memory failure in emake, as it can provide guidance as to how memory is being utilitized during the build, and how quickly it is growing.

n: node

Node logging prints detailed information about all messages between emake and the agents, including filesystem data and commands executed. Together with job logging, this can give a very comprehensive picture of the behavior of a build. However, node logging is extremely verbose, so you should enable it only when you are chasing a specific problem.

o: parse output

When parse output logging is enabled, emake will preserve the raw result of parsing a makefile. The result is a binary file containing information about all the targets, rules, dependencies and variables extracted from makefiles read during a parse job. This can be useful when investigating parser incompatibility issues and scheduling issues (for example, if a rule is not being scheduled for execution when you expect). Note that this debug level only makes sense when parsing, which means you have to specify it in the ––emake-rdebug option. The parse results will be saved in the ––emake-rlogdir directory, named as parse_jobid.out. Note that the directory may be on the local disk of the remote nodes, depending on the value you specify!

p: parse

Parse debug logging prints extremely detailed information about the reading and interpretation of makefiles during a parse job. This is most useful when investigating parser compatibility issues. This output is very verbose, so you should only enable this when pursuing a specific problem. Note that like parse output logging, this debug level only makes sense during parsing, which means you have to specify it in the ––emake-rdebug option. The parse log files will be saved in the ––emake-rlogdir directory, named as parse_jobid.dlog. Note that the directory may be on the local disk of the remote nodes, depending on the value you specify!

r: parse relocation

Parse relocation logging prints low-level information about the process of transmitting parse result data to emake at the end of a parse job. It’s only used internally when we are extending the parse result format, and so is unlikely to be of interest to end users.

s: subbuild

Subbuild logging prints details about decisions made while using the emake subbuild feature. You should enable it if you believe that the subbuild feature is not working correctly.

Y: authentication

Authentication logging is a subset of node logging that prints only those messages related to authenticating emake to agents and vice-versa. If you are having problems using the authentication feature, you should enable this debug level.

post

6 reasons your development team should be using instant messaging

The ElectricAccelerator development team sits at desks less than 30 feet apart, but despite our close proximity, we don’t often speak to one another. To an outside observer this may seem to be a sign of disfunction in the team — after all, developers have to communicate to work effectively. Some people think we’re obviously not communicating, but the truth is that we’re not obviously communicating! That’s because we use instant messaging for most of our communications, including status updates, technical collaboration and even code reviews, rather than face-to-face conversations. I believe this has made my team more connected and more productive. Here are six reasons why instant messaging trumps face-to-face conversations for software teams.

1. Logging

The key advantage of instant messaging is that all conversations are logged automatically. As a result I’ve got records of every conversation with every member of my team for the past two years. That’s proven invaluable on a few occasions, to provide additional context for decisions made weeks or months earlier. Obviously this is not a replacement for other types of project documentation, but it is a fantastic supplement.

2. Non-intrusive

The second most important advantage of instant messaging is that it’s relatively non-intrusive, at least compared to a face-to-face conversation. We all know how important it is to get into and preserve a state of flow when programming. Spoken conversations, by social convention, command your immediate attention — effectively an interrupt of the highest order. When somebody comes to my desk to ask me something in person, they are implicitly saying, “What I have to say to you is more important than anything else you might be doing right now.” Sometimes that’s true, but many times it’s not. And yet every time somebody initiates a face-to-face conversation with me, it destroys whatever flow I might have developed.

In contrast, instant messaging allows me to defer a response until I reach a good breaking point, so people can ask questions without interrupting me.

3. Non-disruptive

Our office has an open floor plan, which means that instead of individual offices or cubicles, we have a single big room. This layout worked very well when the company had only 6 people, who were all working on the same project. Now the company employs over 100 people, with two separate development teams working on completely different products, so the open layout doesn’t work quite so well. Conversations between other people can be very distracting when you’re heads down on a tricky technical problem. By using instant messaging instead of face-to-face conversations, we significantly reduce the distraction for our collegues.

4. Simultaneous conversations

Carrying on multiple face-to-face conversations on disparate topics is practically impossible, but doing the same via instant messenger is simple. Every IM client I’ve seen displays the last several messages of each active conversation, so you have context when a new message arrives. That signficiantly reduces the mental burden associated with each conversation, so it becomes possible to sustain several simultaneously. I often have five conversations “active” during the work day, and sometimes even more.

5. Consistency

Unlike face-to-face conversations, IM works well regardless of the relative locations of the conversants. That means that it doesn’t matter if my colleague is in the office with me, or working from home, or working from a customer site, or halfway around the world. I can use the same tool to communicate with them, which in turn means I don’t have to change the way I work to accomodate changes in the way they are working.

6. Versatility

One final advantage of instant messaging compared to face-to-face conversation is the versatility of the medium. I can trivially share a code fragment with somebody via IM, or a link to an online resource. Try doing that in a face-to-face conversation: “Yeah, you should check out the STL reference docs, at aich tee tee pee colon slash slash double you double you double you dot …”.

Instant messaging: give it a try

If you’re not already using instant messaging in your development team, give it a try. There are multiple free IM services out there, and there are good free IM clients on every platform, including smart phones, so you’ve really got nothing to lose — but you might gain a more efficient, productive team. It worked for us.

post

LEGO “Ship It!” Awards

Scriptics Connect 1.1 "Ship It!" Award

What am I supposed to do with this?


When we wrapped up the ElectricAccelerator 6.0 release recently, I wanted to give my teammates something to commemorate the release. Traditionally these are called “Ship It!” awards, and they often take the form of a Lucite plaque or trophy, or even a physical copy of the product (on DVD or CD, for example) locked inside an acrylic block. I’ve gotten a couple of those over the years, and honestly I think they’re kind of a waste. The last one I got went on a shelf to collect dust for a few years before being relocated to the trash heap, which is a shame because those things are expensive. Really expensive. I can’t even imagine the cost of the monster awards that Microsoft gives out.

So I don’t really like the usual embodiment of the “Ship It” award, but I do really like the underlying idea. After all, shipping a software release is a significant accomplishment, the culmination of months or even years of effort by a team of smart individuals. And unlike many other human endeavors, there’s nothing tangible when you’re finished — no bridge spanning the bay nor tower reaching to the heavens. Having something to commemorate the accomplishment seems fitting, and it’s another small way that I can show my appreciation for everybody’s contributions.

The Ideal “Ship It!” Award

To me, the ideal “Ship It!” award has the following attributes:

  • Themeable: I wanted something I could customize for each release, while maintaining consistency across releases. I plan to make this a tradition.
  • Inexpensive: I wanted something I could bankroll myself, so I could retain complete creative control.
  • Compact: I wanted something that wouldn’t take up much space, so it would be portable and easy to display.
  • Geek appeal: I wanted something that my teammates would think is cool. Chunks of Lucite just don’t cut it.

LEGO “Ship It!” Awards

LEGO race car driver minifig

The winner!


After a few days of idle brainstorming and bouncing ideas off my manager and co-conspirator, I had what seemed like a great idea: LEGO minifigs. I could get a bunch of a specific LEGO minifig and give one to each person on the team. It fit all my criteria. There have been over 4,000 different minifigs released since 1978, according to The Cult of LEGO. In the last two years alone LEGO has release five minifig packs, each with 16 completely new figures, so I can count on having a unique character for every feature release for the next several years. Minifigs are cheap, too — the majority can be bought for as little as a couple dollars each on Amazon or ebay. They’re obviously small. And of course, minifigs are dripping with geek appeal. What techie doesn’t like LEGO?

There was just one small problem. Minifigs are a little bit too small. There’s nowhere to put the information that would identify what it represented — the product name, release version and date, and so on. A couple more days of brainstorming gave me the solution: custom baseball cards. There are several companies that will print custom baseball cards. These outfits are obviously intended for children’s sports teams, but they will happily print cards with whatever graphic you want. You just have to create images of the front and back of your card and upload to their website. And like the minifigs themselves, the cards are inexpensive, at about $1 per card.

The ElectricAccelerator 6.0 “Ship It!” Award

For the ElectricAccelerator 6.0 “Ship It!” Award, I chose the race car driver shown above (because Accelerator is all about performance, of course!). I bought the minifigs on ebay. I spent a couple hours designing the card, then ordered them from CustomSportsProducts.com. The front shows the minifig, the product name, version, and release date, and the major new features; the back lists the names of everybody on the team. Total cost for awards for the entire team was about $40 for materials — about the cost of just one traditional Lucite-based “Ship It” award.

I was a little nervous when I presented the awards to my team a couple weeks ago, but as it turns out I needn’t have been! The reception was overwhelmingly positive. Although I hadn’t explicitly planned it this way, the minifigs actually arrived unassembled, in individual pouches. Immediately upon getting theirs, each person dumped out the pieces and started assembly — it was practically instinctive! Several people commented out loud that the award was “Awesome!” or “Really cool,” and, of course, “Kinda nerdy, but cool!” With that kind of reaction, you can bet that I’m already planning for the next release.

And finally, here’s a picture of the ElectricAccelerator 6.0 “Ship It!” Award, as it is proudly displayed on my desk:

ElectricAccelerator 6.0 "Ship It!" Award

Who doesn't love LEGO?

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: